
# 1. Pandas Series:

In [1]:
import pandas as pd
import numpy as np

## 1.1. Creating Series objects:

In [3]:
my_index=['a','b','c']
my_value=[222,333,444]
my_dict={'a':222,'b':333,'c':444}
my_arr=np.array(my_value)

In [4]:
pd.Series(my_value)

Unnamed: 0,0
0,222
1,333
2,444


In [6]:
pd.Series(data=my_value,index=my_index)

Unnamed: 0,0
a,222
b,333
c,444


In [5]:
pd.Series(my_arr)

Unnamed: 0,0
0,222
1,333
2,444


In [7]:
pd.Series(my_arr,index=my_index)

Unnamed: 0,0
a,222
b,333
c,444


In [8]:
pd.Series(my_dict)

Unnamed: 0,0
a,222
b,333
c,444


## 1.2. Series attributes and indexing:

In [9]:
s=pd.Series(data=[111,222,333,444],index=['a','b','c','d'],name='MySeries')
s

Unnamed: 0,MySeries
a,111
b,222
c,333
d,444


Series attributes:

In [10]:
s.index

Index(['a', 'b', 'c', 'd'], dtype='object')

In [11]:
s.name

'MySeries'

In [12]:
s.dtype

dtype('int64')

In [13]:
s.values

array([111, 222, 333, 444])

Series indexing and slicing:

In [14]:
s.iloc[1]

np.int64(222)

In [15]:
s['a']

np.int64(111)

In [16]:
s[2:4]

Unnamed: 0,MySeries
c,333
d,444


In [17]:
s[['a','d']]

Unnamed: 0,MySeries
a,111
d,444
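
The square-bracket form mixes label and positional lookups, which can be ambiguous. A minimal sketch of the explicit accessors, reusing the `s` defined above (the variable names are illustrative only):

```python
import pandas as pd

s = pd.Series(data=[111, 222, 333, 444], index=['a', 'b', 'c', 'd'], name='MySeries')

# .iloc selects by integer position, .loc selects by index label.
second_by_position = s.iloc[1]
first_by_label = s.loc['a']

# Positional slices exclude the end position; label slices include the end label.
positional_slice = s.iloc[2:4]   # positions 2 and 3
label_slice = s.loc['b':'c']     # labels 'b' and 'c', both included

print(second_by_position, first_by_label)
print(list(positional_slice.index), list(label_slice.index))
```

Using `.loc` and `.iloc` makes the intent clear even when the index itself contains integers.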


## 1.3. Series operations:

In [18]:
s1 = pd.Series(data=[1,2,3,4], index = ['d','b','c','a'])
s2 = pd.Series(data=[1,2,3,4], index = ['a','b','d','e'])

In [19]:
s1+s2

Unnamed: 0,0
a,5.0
b,4.0
c,
d,4.0
e,


In [20]:
s1-s2

Unnamed: 0,0
a,3.0
b,0.0
c,
d,-2.0
e,


In [21]:
s1*s2

Unnamed: 0,0
a,4.0
b,4.0
c,
d,3.0
e,


In [22]:
s1/s2

Unnamed: 0,0
a,4.0
b,1.0
c,
d,0.333333
e,


In [23]:
2*s1

Unnamed: 0,0
d,2
b,4
c,6
a,8
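
The NaN entries above come from label alignment: the operands are matched on index labels, not on position. As a rough sketch of that step, `align()` can be called directly to inspect the aligned operands (the names `left`, `right` and `total` are illustrative):

```python
import pandas as pd

s1 = pd.Series(data=[1, 2, 3, 4], index=['d', 'b', 'c', 'a'])
s2 = pd.Series(data=[1, 2, 3, 4], index=['a', 'b', 'd', 'e'])

# align() reindexes both Series to the union of their labels (outer join by default).
left, right = s1.align(s2)
print(list(left.index))

# A label missing from either side is NaN after alignment, so the sum is NaN there too.
total = left + right
print(total.isna().tolist())
```

Scalar operations such as `2*s1` need no alignment, which is why that result keeps the original label order and integer dtype.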


## 1.4. Series methods:

In [24]:
s1.sum()

np.int64(10)

In [25]:
s1.mean()

np.float64(2.5)

In [26]:
s1.median()

2.5

In [29]:
s1.max()

4

In [30]:
s1.std()

1.2909944487358056

In [31]:
s1.sort_values()

Unnamed: 0,0
d,1
b,2
c,3
a,4


In [32]:
s1.sort_index()

Unnamed: 0,0
a,4
b,2
c,3
d,1
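
One detail worth knowing about `std()`: pandas computes the sample standard deviation (`ddof=1`) by default, while NumPy's `np.std` defaults to the population form (`ddof=0`). A small sketch comparing the two on `s1`:

```python
import numpy as np
import pandas as pd

s1 = pd.Series(data=[1, 2, 3, 4], index=['d', 'b', 'c', 'a'])

sample_std = s1.std()               # pandas default: ddof=1, divides by n - 1
population_std = s1.std(ddof=0)     # divides by n instead
numpy_std = np.std(s1.to_numpy())   # NumPy default: ddof=0

print(sample_std, population_std, numpy_std)
```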


apply() method:

In [33]:
ser_height = pd.Series([165.3, 170.1, 175.0, 182.1, 168.0, 162.0, 155.2, 176.9, 178.5, 176.1,
                        167.1, 180.0, 162.2, 176.1, 158.2, 168.6, 169.2],name='height')
ser_height

Unnamed: 0,height
0,165.3
1,170.1
2,175.0
3,182.1
4,168.0
5,162.0
6,155.2
7,176.9
8,178.5
9,176.1


In [34]:
ser_height.apply(lambda x:x/100)

Unnamed: 0,height
0,1.653
1,1.701
2,1.75
3,1.821
4,1.68
5,1.62
6,1.552
7,1.769
8,1.785
9,1.761
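
For simple arithmetic like this, `apply()` is not strictly needed. The sketch below, using a shortened copy of `ser_height`, shows that plain vectorized division gives the same values:

```python
import pandas as pd

# Shortened copy of the heights above, for illustration.
ser_height = pd.Series([165.3, 170.1, 175.0, 182.1, 168.0], name='height')

# apply() calls the Python function once per element.
via_apply = ser_height.apply(lambda x: x / 100)

# Arithmetic on the Series itself is vectorized and gives the same values.
via_vectorized = ser_height / 100

print(via_apply.equals(via_vectorized))
```

`apply()` remains useful when the per-element logic cannot be expressed with built-in Series operations.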


Create a Series object from a Python dictionary.

In [35]:
sdata = {'Ohio': 35000, 'Texas': 71000, 'Oregon': 16000, 'Utah': 5000}
obj1=pd.Series(sdata)
obj1

Unnamed: 0,0
Ohio,35000
Texas,71000
Oregon,16000
Utah,5000


When a Series is created from a dictionary alone, the dictionary keys become the index, in their original order. You can also pass an explicit index yourself.

In [36]:
states = ['California', 'Ohio', 'Oregon', 'Texas']
obj2=pd.Series(sdata,index=states)
obj2

Unnamed: 0,0
California,
Ohio,35000.0
Oregon,16000.0
Texas,71000.0


Only three of the values in sdata appear in this result. No value is found for 'California', so it is shown as NaN (not a number), which pandas treats as a missing (NA) value. 'Utah' is not included in states, so it is excluded from the result.
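
Passing an index together with a dictionary behaves much like building the Series first and then selecting labels with `reindex()`. A minimal sketch of that equivalence (the names `from_dict`, `with_index` and `reindexed` are illustrative):

```python
import pandas as pd

sdata = {'Ohio': 35000, 'Texas': 71000, 'Oregon': 16000, 'Utah': 5000}
states = ['California', 'Ohio', 'Oregon', 'Texas']

from_dict = pd.Series(sdata)                 # no missing values, integer dtype
with_index = pd.Series(sdata, index=states)  # 'California' has no value

# Selecting the same labels with reindex() gives the same result.
reindexed = from_dict.reindex(states)

# NaN is a float, so the presence of a missing value forces a float dtype.
print(from_dict.dtype, with_index.dtype)
print(with_index.equals(reindexed))
```

This also explains why the values display as 35000.0 rather than 35000 once a missing label is involved.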

The isnull and notnull functions are used to detect missing data.

In [37]:
pd.isnull(obj2)

Unnamed: 0,0
California,True
Ohio,False
Oregon,False
Texas,False


In [38]:
pd.notnull(obj2)

Unnamed: 0,0
California,False
Ohio,True
Oregon,True
Texas,True


These functions are also available as instance methods of the Series.

In [40]:
obj2.isnull()

Unnamed: 0,0
California,True
Ohio,False
Oregon,False
Texas,False
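
Once missing values are detected, they are usually counted, dropped, or filled. A short sketch of those follow-up steps on `obj2` (filling with 0 is an assumption made for illustration, not a recommendation):

```python
import pandas as pd

sdata = {'Ohio': 35000, 'Texas': 71000, 'Oregon': 16000, 'Utah': 5000}
obj2 = pd.Series(sdata, index=['California', 'Ohio', 'Oregon', 'Texas'])

n_missing = obj2.isnull().sum()   # each True counts as 1
present = obj2[obj2.notnull()]    # boolean mask keeps the non-missing entries
dropped = obj2.dropna()           # same result as the mask above
filled = obj2.fillna(0)           # replace NaN with a chosen value

print(n_missing)
print(list(dropped.index))
```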


A useful feature of the Series is that arithmetic operations automatically align the operands by index label.

In [41]:
obj1

Unnamed: 0,0
Ohio,35000
Texas,71000
Oregon,16000
Utah,5000


In [42]:
obj2

Unnamed: 0,0
California,
Ohio,35000.0
Oregon,16000.0
Texas,71000.0


In [43]:
obj1+obj2

Unnamed: 0,0
California,
Ohio,70000.0
Oregon,32000.0
Texas,142000.0
Utah,
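
If NaN is not wanted for labels that exist in only one operand, the method form `add()` accepts a `fill_value`. A hedged sketch, where treating an absent label as 0 is an assumption made for illustration:

```python
import pandas as pd

sdata = {'Ohio': 35000, 'Texas': 71000, 'Oregon': 16000, 'Utah': 5000}
obj1 = pd.Series(sdata)
obj2 = pd.Series(sdata, index=['California', 'Ohio', 'Oregon', 'Texas'])

# fill_value substitutes for a value that is missing on one side only.
total = obj1.add(obj2, fill_value=0)

# 'Utah' exists only in obj1, so it keeps its value instead of becoming NaN.
print(total['Utah'])

# 'California' is missing on both sides (absent in obj1, NaN in obj2), so it stays NaN.
print(pd.isna(total['California']))
```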


Both the Series object and its index have a name attribute, which other pandas functionality makes use of.

In [44]:
obj2.name = 'population'
obj2.index.name = 'state'
obj2

state,population
California,
Ohio,35000.0
Oregon,16000.0
Texas,71000.0
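
One place these names show up is when a Series is converted to a DataFrame. A small sketch, rebuilding `obj2` so the block is self-contained:

```python
import pandas as pd

sdata = {'Ohio': 35000, 'Texas': 71000, 'Oregon': 16000, 'Utah': 5000}
obj2 = pd.Series(sdata, index=['California', 'Ohio', 'Oregon', 'Texas'])
obj2.name = 'population'
obj2.index.name = 'state'

# The Series name becomes the column label; the index name is carried over.
frame = obj2.to_frame()
print(list(frame.columns), frame.index.name)

# reset_index() turns the named index into an ordinary column.
flat = obj2.reset_index()
print(list(flat.columns))
```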


The index of a Series can be changed in place by assignment.

In [45]:
obj1

Unnamed: 0,0
Ohio,35000
Texas,71000
Oregon,16000
Utah,5000


In [46]:
obj1.index = ['Bob', 'Steve', 'Jeff', 'Ryan']
obj1

Unnamed: 0,0
Bob,35000
Steve,71000
Jeff,16000
Ryan,5000
