### Notes on Series with the pandas library
File: `pd_01_Series.ipynb` <br>
Xuhua Huang <br>
Last updated: July 8, 2022 <br>
Created on: June 18, 2022

In [1]:
import pandas as pd
from pandas import Series, DataFrame
import numpy as np

#### Introduction to `pandas` Data Structures

In [2]:
obj = pd.Series([4, 7, -5, 3])
obj

0    4
1    7
2   -5
3    3
dtype: int64

In [3]:
obj.index

RangeIndex(start=0, stop=4, step=1)

In [4]:
obj.values

array([ 4,  7, -5,  3], dtype=int64)

In [5]:
obj2 = pd.Series([4, 7, -5, 3], index=['d', 'b', 'a', 'c'])
obj2.index

Index(['d', 'b', 'a', 'c'], dtype='object')

In [6]:
obj2['d'] = 6
obj2[obj2 > 0] * 2

d    12
b    14
c     6
dtype: int64

In [7]:
'b' in obj2

True

In [8]:
'e' in obj2

False
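The two membership checks above test index labels, not values, which is easy to overlook. A minimal sketch of the difference, rebuilding `obj2` in its state after the assignment in cell 6; the names `label_found`, `value_as_label`, `value_found`, and `as_dict` are illustrative only:

```python
import pandas as pd

# Same Series as obj2 after obj2['d'] = 6
obj2 = pd.Series([6, 7, -5, 3], index=['d', 'b', 'a', 'c'])

label_found = 'b' in obj2        # True: 'b' is an index label
value_as_label = 7 in obj2       # False: 7 is a value, not a label
value_found = 7 in obj2.values   # True: searches the underlying NumPy array

# A Series behaves much like an ordered dict from labels to values
as_dict = obj2.to_dict()
print(label_found, value_as_label, value_found)
print(as_dict)
```

To test whether a value is present, check against `obj2.values` rather than the Series itself.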

#### Using a `dict` of Lists to Construct a `DataFrame`

In [9]:
data = {
    'state': ['Ohio', 'Ohio', 'Ohio', 'Nevada', 'Nevada', 'Nevada'],
    'year':  [2000, 2001, 2002, 2001, 2002, 2003],
    'pop':   [1.5, 1.7, 3.6, 2.4, 2.9, 3.2]
}

In [10]:
df_states_pop = pd.DataFrame(data=data, columns=['year', 'state', 'pop'])
df_states_pop

   year   state  pop
0  2000    Ohio  1.5
1  2001    Ohio  1.7
2  2002    Ohio  3.6
3  2001  Nevada  2.4
4  2002  Nevada  2.9
5  2003  Nevada  3.2
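The `columns` argument both selects and orders the columns taken from the dict. As a small sketch, a column name that is not a key in the dict yields a column filled with `NaN`; the name `debt` here is hypothetical and does not come from the data above:

```python
import pandas as pd

data = {
    'state': ['Ohio', 'Ohio', 'Ohio', 'Nevada', 'Nevada', 'Nevada'],
    'year':  [2000, 2001, 2002, 2001, 2002, 2003],
    'pop':   [1.5, 1.7, 3.6, 2.4, 2.9, 3.2]
}

# 'debt' is a made-up column with no matching key in data,
# so every entry in that column is NaN
frame = pd.DataFrame(data, columns=['year', 'state', 'pop', 'debt'])
print(frame)
```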


#### Arithmetic and Data Alignment

In [11]:
s1: pd.Series = pd.Series([7.3, -2.5, 3.4, 1.5], index=['a', 'c', 'd', 'e'])
s1

a    7.3
c   -2.5
d    3.4
e    1.5
dtype: float64

In [12]:
s2: pd.Series = pd.Series([-2.1, 3.6, -1.5, 4, 3.1], index=['a', 'c', 'e', 'f', 'g'])
s2

a   -2.1
c    3.6
e   -1.5
f    4.0
g    3.1
dtype: float64

In [13]:
"""
Adding two Series aligns them by index label.
Any label present in only one of the Series
(here 'd', 'f', and 'g') gets NaN in the result.
"""
s1 + s2

a    5.2
c    1.1
d    NaN
e    0.0
f    NaN
g    NaN
dtype: float64
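If the `NaN` entries are not wanted, one option is the `Series.add` method with its `fill_value` parameter, which substitutes a default for a label missing from one side before adding. A minimal sketch, assuming 0 is a sensible default for this data; the names `plain` and `filled` are illustrative:

```python
import pandas as pd

s1 = pd.Series([7.3, -2.5, 3.4, 1.5], index=['a', 'c', 'd', 'e'])
s2 = pd.Series([-2.1, 3.6, -1.5, 4, 3.1], index=['a', 'c', 'e', 'f', 'g'])

plain = s1 + s2                     # NaN for 'd', 'f', 'g'
filled = s1.add(s2, fill_value=0)   # a missing side is treated as 0

print(plain.isna().sum())   # number of NaN entries in the plain sum
print(filled)
```

With `fill_value=0`, labels found in only one Series keep their original value instead of becoming `NaN`.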

In [14]:
df1: pd.DataFrame = pd.DataFrame(np.arange(9.).reshape((3, 3)), columns=list('bcd'), index=['Ohio', 'Texas', 'Colorado'])
df1

            b    c    d
Ohio      0.0  1.0  2.0
Texas     3.0  4.0  5.0
Colorado  6.0  7.0  8.0


In [15]:
df2: pd.DataFrame = pd.DataFrame(np.arange(12.).reshape((4, 3)), columns=list('bde'), index=['Utah', 'Ohio', 'Texas', 'Oregon'])
df2

          b     d     e
Utah    0.0   1.0   2.0
Ohio    3.0   4.0   5.0
Texas   6.0   7.0   8.0
Oregon  9.0  10.0  11.0


**Adding** these two `pd.DataFrame` objects aligns them on both rows and columns and returns a new `pd.DataFrame`. <br>
The result's index and columns are the union (an outer join) of the indexes and columns of the two DataFrames. <br>

Columns `c` and `e` each appear in only one of the DataFrames, so they are filled entirely with `NaN`. The same applies to the rows `Colorado`, `Oregon`, and `Utah`, which each appear in only one DataFrame.

In [16]:
df1 + df2

            b   c     d   e
Colorado  NaN NaN   NaN NaN
Ohio      3.0 NaN   6.0 NaN
Oregon    NaN NaN   NaN NaN
Texas     9.0 NaN  12.0 NaN
Utah      NaN NaN   NaN NaN
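As with Series, `DataFrame.add` accepts a `fill_value` for cells that exist in only one operand. A minimal sketch, assuming 0 is an acceptable default; the names `total` and `filled` are illustrative. Cells missing from both DataFrames still come out as `NaN`, because there is nothing on either side to add:

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame(np.arange(9.).reshape((3, 3)),
                   columns=list('bcd'),
                   index=['Ohio', 'Texas', 'Colorado'])
df2 = pd.DataFrame(np.arange(12.).reshape((4, 3)),
                   columns=list('bde'),
                   index=['Utah', 'Ohio', 'Texas', 'Oregon'])

total = df1 + df2                    # NaN wherever either side is missing
filled = df1.add(df2, fill_value=0)  # a cell missing on one side counts as 0

print(filled)
print(filled.isna().sum().sum())     # cells missing from both inputs
```

Here the cells that stay `NaN` are (`Colorado`, `e`), (`Oregon`, `c`), and (`Utah`, `c`), since neither input has a value at those positions.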
