# Series

The Series object is a one-dimensional, labeled data structure. It can hold numeric data, dates and times, strings, or arbitrary Python objects. For numeric data, a Series is usually preferable to a Python list: it is typically faster for bulk operations, uses less memory, and comes with built-in methods for manipulating the data.
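To make the "what a Series can hold" point concrete, here is a minimal sketch. The variable names and sample values are made up for illustration; the exact datetime resolution shown by pandas may differ between versions, so it is not printed as an expected value.

```python
import pandas as pd

# Hypothetical sample values, chosen only to show different kinds of data.
numbers = pd.Series([1.5, 2.5, 4.0])
dates = pd.Series(pd.to_datetime(["2021-01-01", "2021-06-15"]))
mixed = pd.Series([1, "two", [3, 4]])  # arbitrary Python objects

print(numbers.dtype)  # float64
print(dates.dtype)    # a datetime64 dtype
print(mixed.dtype)    # object
```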

In [1]:
import pandas as pd
import numpy as np

Construct a Series from a Python list

In [2]:
scores = [90, 67, 60, 70, 50]
names = ['Saheen', 'Dom', 'Jia', 'Kamal', 'Tim']

data_series = pd.Series(data=scores, index=names, name='scores')

In [3]:
data_series

Saheen    90
Dom       67
Jia       60
Kamal     70
Tim       50
Name: scores, dtype: int64

In [4]:
print(data_series.shape)
print(data_series.ndim)

(5,)
1


Accessing the index

In [5]:
data_series.index

Index(['Saheen', 'Dom', 'Jia', 'Kamal', 'Tim'], dtype='object')

Like a Python dict or a NumPy array, a pandas Series supports indexing

In [6]:
data_series['Saheen']

90

Append values to a pandas Series (including a null)

In [7]:
new_data = pd.Series(data=[78, None], index=['Kim', 'Jimmy'])
# Series.append was removed in pandas 2.0; pd.concat is the replacement
data_series = pd.concat([data_series, new_data])

In [8]:
data_series

Saheen    90.0
Dom       67.0
Jia       60.0
Kamal     70.0
Tim       50.0
Kim       78.0
Jimmy      NaN
dtype: float64
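The dtype changed from int64 to float64 because the missing value is stored as NaN, which is a float. The sketch below illustrates this on a small stand-alone Series; using the nullable "Int64" dtype is an option introduced here for comparison, not something the notebook itself uses.

```python
import pandas as pd

# None in numeric data becomes NaN, so the Series is upcast to float64.
with_nan = pd.Series([78, None])
print(with_nan.dtype)  # float64

# Assumption for this sketch: opting in to the nullable "Int64" dtype,
# which keeps the integers and stores the missing entry as <NA>.
with_na = pd.Series([78, None], dtype="Int64")
print(with_na.dtype)   # Int64
```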

Count null values

In [9]:
data_series.isnull().sum()

1
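Beyond counting nulls, a few related methods are commonly used to inspect or handle them. This is a small sketch on a stand-alone Series with illustrative values; each call returns a new object and leaves the original unchanged.

```python
import pandas as pd

s = pd.Series([78, None], index=["Kim", "Jimmy"])

print(s.isnull().sum())  # number of missing values
print(s.notna().sum())   # number of present values
print(s.dropna())        # new Series without the missing entry
print(s.fillna(0))       # new Series with NaN replaced by 0
```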

## Like NumPy arrays, Series support aggregate operations

In [10]:
print(f'sum:{data_series.sum()}')
print(f'mean:{data_series.mean():.2f}')

sum:415.0
mean:69.17
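The mean above is 415 / 6, not 415 / 7, because aggregations skip NaN by default. The following sketch rebuilds the same data in a stand-alone Series (named `s` here for brevity) to show that behavior, along with element-wise arithmetic.

```python
import pandas as pd

s = pd.Series(
    [90, 67, 60, 70, 50, 78, None],
    index=["Saheen", "Dom", "Jia", "Kamal", "Tim", "Kim", "Jimmy"],
)

print(s.count())             # 6, only non-missing values are counted
print(s.sum())               # 415.0, NaN is skipped
print(s.mean(skipna=False))  # nan, NaN propagates when it is not skipped

# Arithmetic is applied element-wise, as with NumPy arrays.
curved = s + 5
print(curved["Saheen"])      # 95.0
```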


## Filter data

In [11]:
score = 60
# a person passes only with a score of 60 or above
passed = data_series >= score
passed  # a boolean mask (NaN compares as False)

Saheen     True
Dom        True
Jia        True
Kamal      True
Tim       False
Kim        True
Jimmy     False
dtype: bool

In [12]:
# keep only the entries where the mask is True
data_series[passed]

Saheen    90.0
Dom       67.0
Jia       60.0
Kamal     70.0
Kim       78.0
dtype: float64
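Masks can also be combined. The sketch below uses a stand-alone copy of the original five scores; the thresholds 60 and 80 are arbitrary values picked for illustration.

```python
import pandas as pd

s = pd.Series([90, 67, 60, 70, 50], index=["Saheen", "Dom", "Jia", "Kamal", "Tim"])

# & means "and", | means "or", ~ means "not".
# Each comparison needs its own parentheses.
mid_range = s[(s >= 60) & (s < 80)]
print(mid_range)

failed = s[~(s >= 60)]
print(failed)
```

Python's `and`, `or`, and `not` keywords do not work element-wise on a Series, which is why the operator forms are used.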

NumPy arrays can be filtered in the same way

In [13]:
numpy_series = np.array(scores, dtype=float)
filtered_data = numpy_series[numpy_series >= score]
filtered_data

array([90., 67., 60., 70.])

Hint: NumPy arrays and pandas Series behave very similarly
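One way to see the similarity is to move between the two. In this sketch (stand-alone data, illustrative names), `to_numpy()` exposes the values as an array, and a NumPy function applied to a Series returns a Series that keeps the index.

```python
import numpy as np
import pandas as pd

s = pd.Series([90, 67, 60, 70, 50], index=["Saheen", "Dom", "Jia", "Kamal", "Tim"])

arr = s.to_numpy()   # the values as a NumPy array, without the labels
print(type(arr))

roots = np.sqrt(s)   # NumPy ufuncs accept a Series and preserve its index
print(roots.index.tolist())
```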

# Series CRUD

In [14]:
# creation
george_dupe = pd.Series(
    data=[10, 7, 1, 22],  # array-like, iterable, dict, or scalar
    index=['1968', '1969', '1970', '1970'],  # explicit index labels; duplicates are allowed
    name='George Songs'
)

In [26]:
george_dupe

1968    10
1969     7
1970     1
1970    22
Name: George Songs, dtype: int64

In [23]:
george_dupe2 = pd.Series(
    data={
        '1968': 10,
        '1969': 7,
        '1970': [2, 22],
    },
    index=['1968', '1969', '1970', '1970'],
    name='George Songs'
)

In [25]:
george_dupe2

1968         10
1969          7
1970    [2, 22]
1970    [2, 22]
Name: George Songs, dtype: object

For this data, creating the Series from a dictionary is less suitable: a dictionary key maps to a single value, so the repeated index label '1970' cannot receive two different values. Here both '1970' rows end up holding the same list.
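The underlying limitation is in the dictionary itself, as this short sketch shows. The variable names are illustrative; the data mirrors the example above.

```python
import pandas as pd

# A dict literal with a repeated key keeps only the last value.
songs = {"1968": 10, "1969": 7, "1970": 1, "1970": 22}
print(songs)  # {'1968': 10, '1969': 7, '1970': 22}

from_dict = pd.Series(songs, name="George Songs")
print(len(from_dict))   # 3

# Parallel lists for data and index keep both '1970' entries.
from_lists = pd.Series(
    [10, 7, 1, 22],
    index=["1968", "1969", "1970", "1970"],
    name="George Songs",
)
print(len(from_lists))  # 4
```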

## Reading

In [31]:
# reading
george_dupe['1968']

10

The result is a scalar, because the label '1968' occurs only once

In [32]:
george_dupe['1970']

1970     1
1970    22
Name: George Songs, dtype: int64

The result is another Series, because the label '1970' occurs more than once
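Because the return type depends on whether the label is duplicated, code that needs a predictable type can pass a list of labels. This is a minimal sketch on a stand-alone Series (named `george` here) with the same data as above.

```python
import pandas as pd

george = pd.Series([10, 7, 1, 22], index=["1968", "1969", "1970", "1970"])

unique_hit = george.loc["1968"]  # scalar: the label is unique
dupe_hit = george.loc["1970"]    # Series: the label is repeated
print(type(unique_hit))
print(type(dupe_hit))

# Passing a list of labels returns a Series in both cases.
always_series = george.loc[["1968"]]
print(type(always_series))
```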

In [34]:
# iterate 
for item in george_dupe:
    print(item)

10
7
1
22


### Check membership

In [39]:
# `in` on a Series checks the index labels, not the values, so this is False
22 in george_dupe

False

In [36]:
22 in george_dupe.values

True

In [37]:
22 in set(george_dupe)

True

Membership in the index, however, can be checked directly on the Series

In [38]:
'1970' in george_dupe

True
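The membership checks above can also be written in a more explicit, vectorized form. A short sketch on a stand-alone Series with the same data:

```python
import pandas as pd

george = pd.Series([10, 7, 1, 22], index=["1968", "1969", "1970", "1970"])

print("1970" in george.index)   # True, label check made explicit
print(george.isin([22]).any())  # True, value check
print((george == 22).any())     # True, same check with a comparison

# isin also accepts several candidates and returns a boolean mask.
print(george.isin([1, 22]).tolist())  # [False, False, True, True]
```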

In [41]:
george_dupe.items()
# a lazy iterator of (index label, value) pairs

<zip at 0x7f6cf43eea80>

In [43]:
for index, value in george_dupe.items():  # iteritems() was removed in pandas 2.0; use items()
    print(index, value)

1968 10
1969 7
1970 1
1970 22


## Update

In [47]:
george_dupe.loc['1969'] = 6
george_dupe['1969']

6

In [49]:
# assigning to a label that does not exist in the Series appends a new entry
george_dupe['1973'] = 11  # label assignment either updates or appends
george_dupe

1968    10
1969     6
1970     1
1970    22
1973    11
Name: George Songs, dtype: int64

In [51]:
# assigning to the duplicated label '1970' updates both entries
george_dupe['1970'] = 2
george_dupe

1968    10
1969     6
1970     2
1970     2
1973    11
Name: George Songs, dtype: int64

In [52]:
# update a single value purely by position
george_dupe.iloc[3] = 22
george_dupe

1968    10
1969     6
1970     2
1970    22
1973    11
Name: George Songs, dtype: int64
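When the position of a duplicated label is not known in advance, it can be looked up first. The sketch below is one possible approach, using `np.flatnonzero` on a comparison against the index; it runs on a stand-alone Series rather than the notebook's `george_dupe`.

```python
import numpy as np
import pandas as pd

george = pd.Series([10, 6, 2, 2], index=["1968", "1969", "1970", "1970"])

# Integer positions of every entry labelled '1970'.
positions = np.flatnonzero(george.index == "1970")
print(positions)  # [2 3]

# Update only the last of the duplicates, by position.
george.iloc[positions[-1]] = 22
print(george.tolist())  # [10, 6, 2, 22]
```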

Note that the .append method worked like Python's list.extend: it required another Series. It was deprecated in pandas 1.4 and removed in 2.0; use pd.concat to combine Series instead.
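Since `Series.append` is no longer available in pandas 2.0 and later, `pd.concat` is the usual way to join one Series onto another. A minimal sketch with illustrative stand-alone data:

```python
import pandas as pd

george = pd.Series([10, 6], index=["1968", "1969"], name="George Songs")
extra = pd.Series([11, 9], index=["1973", "1974"], name="George Songs")

# pd.concat returns a new Series; the inputs are left unchanged.
combined = pd.concat([george, extra])
print(combined)
```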

In [56]:
# equivalent to george_dupe['1974'] = 9, which is the idiomatic form
george_dupe.__setitem__('1974', 9)
george_dupe

1968    10
1969     6
1970     2
1970    22
1973    11
1974     9
Name: George Songs, dtype: int64

## Deletion

In [59]:
# del works on a Series, but it is uncommon in pandas code
del george_dupe['1973']
george_dupe

1968    10
1969     6
1970     2
1970    22
1974     9
Name: George Songs, dtype: int64

**Prefer filtering the Series over deleting entries from it**

In [62]:
george_dupe[george_dupe > 6] # mask

1968    10
1970    22
1974     9
Name: George Songs, dtype: int64
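Another non-mutating alternative to `del` is the `drop` method, which returns a new Series. This sketch uses a stand-alone Series with illustrative data; note what happens with a duplicated label.

```python
import pandas as pd

george = pd.Series(
    [10, 6, 2, 22, 11],
    index=["1968", "1969", "1970", "1970", "1973"],
)

without_1973 = george.drop("1973")  # new Series, original untouched
print(without_1973.index.tolist())  # ['1968', '1969', '1970', '1970']
print("1973" in george.index)       # True

# Dropping a duplicated label removes every entry with that label.
print(george.drop("1970").index.tolist())  # ['1968', '1969', '1973']
```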

# Series indexing

In [63]:
george_dupe.index

Index(['1968', '1969', '1970', '1970', '1974'], dtype='object')

In [65]:
george_dupe.keys()

Index(['1968', '1969', '1970', '1970', '1974'], dtype='object')

In [66]:
george_dupe.index.is_unique

False

In [67]:
george_dupe[george_dupe>2].index.is_unique # this filter happens to drop one of the duplicate '1970' entries

True
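The filter above makes the index unique only by coincidence. To deal with duplicate labels deliberately, two common options are keeping the first occurrence or aggregating per label. A hedged sketch on a stand-alone Series; which option is appropriate depends on what the duplicates mean in the data.

```python
import pandas as pd

george = pd.Series(
    [10, 6, 2, 22, 9],
    index=["1968", "1969", "1970", "1970", "1974"],
)

# True for every repeat of a label after its first occurrence.
dupes = george.index.duplicated(keep="first")
print(dupes)

# Option 1: keep only the first entry per label.
first_only = george[~dupes]
print(first_only.index.is_unique)  # True

# Option 2: combine the entries that share a label.
per_year = george.groupby(level=0).sum()
print(per_year["1970"])  # 24
```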

In [72]:
print(george_dupe)
george_dupe.iloc[3] # positional indexing; plain george_dupe[3] is deprecated for non-integer indexes

1968    10
1969     6
1970     2
1970    22
1974     9
Name: George Songs, dtype: int64


22
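The difference between label-based and position-based access matters most when the index itself holds integers. The following sketch uses a hypothetical Series named `by_year` with integer year labels to show why `.loc` and `.iloc` are the unambiguous choices.

```python
import pandas as pd

by_year = pd.Series([10, 6, 2], index=[1968, 1969, 1970])

print(by_year.loc[1968])  # 10, lookup by label
print(by_year.iloc[0])    # 10, lookup by position

# With an integer index, plain [] is treated as a label lookup,
# so 0 is not found even though position 0 exists.
try:
    by_year[0]
except KeyError:
    print("no label 0 in the index")
```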