Double abstraction...

In [2]:
from rich import print as rprint
series = {'index':[0,1,2,3], 'data':[145,142,38,13],'name':'songs'}

def get(series, idx):
    value_idx = series['index'].index(idx)
    return series['data'][value_idx]

get(series,1)


142

Because the index can hold non-integer values, the data structure also supports other index types, such as strings and dates...

In [3]:
songs = {'index':['Paul', 'John', 'George', 'Ringo'], 'data': [145,142,38,13], 'name':'counts'}
rprint(get(songs,'John'))
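
For the date case, the same dictionary-based sketch can be indexed by `datetime.date` objects. The dates and counts below are made-up placeholder values (not from the data above), and `get` is repeated so the snippet runs on its own.

```python
from datetime import date

# Hypothetical data: the dates and counts are placeholders, not real figures.
by_date = {
    'index': [date(2024, 1, 1), date(2024, 1, 2), date(2024, 1, 3)],
    'data': [10, 20, 30],
    'name': 'daily_counts',
}

def get(series, idx):
    # Find the position of the label, then fetch the value at that position.
    value_idx = series['index'].index(idx)
    return series['data'][value_idx]

print(get(by_date, date(2024, 1, 2)))  # 20
```

Any object that supports equality comparison can serve as a label here, since `list.index` only needs `==` to find it.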

In [4]:
import pandas as pd
songs2 = pd.Series([145,142,38,13], index=['Paul','John','George','Ringo'])
rprint(songs2)
songs2.index


Index(['Paul', 'John', 'George', 'Ringo'], dtype='object')

1. The generic name for an index is an axis
2. The index values are also called axis labels
3. The data is also called the `values` of the series

The `dtype` printed at the bottom of the series output is the type of the values, not of the index. Even though the printed series looks two-dimensional, it is one-dimensional: the index is not part of the values.
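
One way to check this is to look at the values and the index separately. A minimal sketch, rebuilding `songs2` so it runs on its own; the dtype reported for the index depends on the pandas version, so no output is shown for it.

```python
import pandas as pd

songs2 = pd.Series([145, 142, 38, 13], index=['Paul', 'John', 'George', 'Ringo'])

print(songs2.values)       # the data only, without the labels
print(songs2.values.ndim)  # 1: the values are one-dimensional
print(songs2.dtype)        # int64: dtype of the values
print(songs2.index.dtype)  # dtype of the index labels, tracked separately
```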

In [5]:
songs3 = pd.Series([145,142,38,13], name='counts', dtype='int64[pyarrow]')
rprint(songs3)
songs3.index

RangeIndex(start=0, stop=4, step=1)

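In the case below, `dtype` is `object`, the fallback pandas uses for arbitrary Python objects. Here the values mix strings, an integer, and a class instance, so no single native NumPy or PyArrow type can hold them.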

In [6]:
class Foo:
    pass
ringo = pd.Series(['Richard','Starkey',13,Foo()],name='Ringo')
ringo

0                                 Richard
1                                 Starkey
2                                      13
3    <__main__.Foo object at 0x105b70b30>
Name: Ringo, dtype: object

In [12]:
import numpy as np
nan_series = pd.Series([2,np.nan], index=['Ono','Clapton'])
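# size counts every entry, including missing values (2 here)
rprint("Series size:", nan_series.size)
# count() counts only the non-missing entries (1 here)
rprint("Series count:", nan_series.count())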

Pandas supports indexing by label and by position. Use `.iloc` for positional indexing (and `.loc` for label-based indexing).

In [14]:
numpy_ser = np.array([145,142,38,13])
rprint("indexing by position:",songs3.iloc[1])
rprint("indexing np array:", numpy_ser[1])
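
A minimal sketch contrasting the two styles on a label-indexed series: `.loc` looks up by label, `.iloc` by position. The series is rebuilt here with the Beatles labels used earlier so the snippet is self-contained.

```python
import pandas as pd

songs = pd.Series([145, 142, 38, 13],
                  index=['Paul', 'John', 'George', 'Ringo'],
                  name='counts')

by_label = songs.loc['John']   # label-based lookup
by_position = songs.iloc[1]    # position-based lookup (second entry)
last = songs.iloc[-1]          # negative positions count from the end

print(by_label, by_position, last)  # 142 142 13
```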

A pandas Series and a NumPy array have many methods in common

In [17]:
rprint(songs3.mean())
rprint(numpy_ser.mean())

Use set operations to determine the methods that are common to both types

In [18]:
len(set(dir(numpy_ser)) & set(dir(songs3)))

112
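
The same set operations can also show which names are shared and which belong to only one type. A small sketch, using plain `int64` data instead of the pyarrow-backed `songs3`; the exact counts depend on the installed pandas and NumPy versions, so none are shown.

```python
import numpy as np
import pandas as pd

arr = np.array([145, 142, 38, 13])
ser = pd.Series([145, 142, 38, 13])

arr_names = set(dir(arr))
ser_names = set(dir(ser))

# Public names available on both types (dunder and private names dropped).
common = sorted(n for n in arr_names & ser_names if not n.startswith('_'))
# Names that only the Series has, such as pandas-specific indexers.
only_series = ser_names - arr_names

print(len(common))
print('mean' in common)        # True: both types have .mean
print('iloc' in only_series)   # True: .iloc exists only on the Series
```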

The mask below represents the locations with a value higher than the median value of the series

In [23]:
songs3 = pd.Series([145,142,38,13],index=["Paul","John","George","Ringo"],name="counts", dtype="int64[pyarrow]")
mask = songs3 > songs3.median()
rprint(mask)
rprint("use mask as filter:", songs3[mask])

NumPy also supports filtering with boolean arrays but lacks the .median method on an array. Instead, NumPy provides a median function in the NumPy namespace.

In [24]:
numpy_ser[numpy_ser > np.median(numpy_ser)]

array([145, 142])
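
Breaking the one-liner above into steps makes the intermediate values visible. A sketch with the same four numbers:

```python
import numpy as np

numpy_ser = np.array([145, 142, 38, 13])

median = np.median(numpy_ser)  # mean of the two middle values: (38 + 142) / 2
mask = numpy_ser > median      # element-wise comparison yields a boolean array
filtered = numpy_ser[mask]     # keep only the positions where mask is True

print(median)    # 90.0
print(mask)      # [ True  True False False]
print(filtered)  # [145 142]
```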