# CPSC380: 3_Pandas_3_Essential_Functionality

In this notebook, you will learn about the following topics:
 - Arithmetic and Data Alignment
 - Function Application and Mapping
 
Read more:
 - Python Data Analysis textbook (chapter 5)
 - [Pandas website](https://pandas.pydata.org/pandas-docs/stable/user_guide/dsintro.html)

In [1]:
import pandas as pd
import numpy as np

## 1. Arithmetic and Data Alignment

### 1.1 Between Series and Series objects

In [2]:
ser_1 = pd.Series([1,2,3,4], index=['a', 'b', 'c', 'd'])
ser_2 = pd.Series([10,20,30,40], index=['a', 'b', 'c', 'd'])

ser_1+ser_2

a    11
b    22
c    33
d    44
dtype: int64

In [3]:
ser_1 = pd.Series([1,2,3,4,5], index=['a', 'b', 'c', 'd', 'e'])
ser_2 = pd.Series([10,20,30,40],index=['a', 'b', 'c', 'd'])

ser_1+ser_2

a    11.0
b    22.0
c    33.0
d    44.0
e     NaN
dtype: float64
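The `NaN` for label `e` comes from alignment: the result's index is the union of both indexes, and `e` has no partner in `ser_2`. A minimal sketch of the method form `Series.add`, whose `fill_value` argument substitutes a value for the missing side (0 here, an illustrative choice) instead of producing `NaN`:

```python
import pandas as pd

ser_1 = pd.Series([1, 2, 3, 4, 5], index=['a', 'b', 'c', 'd', 'e'])
ser_2 = pd.Series([10, 20, 30, 40], index=['a', 'b', 'c', 'd'])

# The + operator aligns on labels; 'e' exists only in ser_1, so it becomes NaN
plain = ser_1 + ser_2

# Series.add with fill_value treats the missing ser_2['e'] as 0 before adding
filled = ser_1.add(ser_2, fill_value=0)

print(plain, '\n')
print(filled)
```

The result is still `float64`, because aligning `ser_2` to the union index introduces a missing value before the fill is applied.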

In [4]:
df = pd.DataFrame({'col1': ser_1, 'col2': ser_2})
print(df,'\n')

# elementwise operation
df['addition'] = df['col1']+df['col2']
df['subtraction'] = df['col1']-df['col2']
df['multiplication'] = df['col1']*df['col2']
df['division'] = df['col1']/df['col2']
df

   col1  col2
a     1  10.0
b     2  20.0
c     3  30.0
d     4  40.0
e     5   NaN 

   col1  col2  addition  subtraction  multiplication  division
a     1  10.0      11.0         -9.0            10.0       0.1
b     2  20.0      22.0        -18.0            40.0       0.1
c     3  30.0      33.0        -27.0            90.0       0.1
d     4  40.0      44.0        -36.0           160.0       0.1
e     5   NaN       NaN          NaN             NaN       NaN


### 1.2 Between DataFrame and Series objects

By default, arithmetic between a DataFrame and a Series aligns the **index** of the Series with the **column index** of the DataFrame and broadcasts the operation down the rows: the Series is combined with every row.

In [5]:
df = pd.DataFrame({'col1': ser_1, 'col2': ser_2})

# subtract the first row from every row (the row Series is broadcast down the rows)
print(df, '\n')
print(df.iloc[0],'\n')
print(df-df.iloc[0],'\n')

   col1  col2
a     1  10.0
b     2  20.0
c     3  30.0
d     4  40.0
e     5   NaN 

col1     1.0
col2    10.0
Name: a, dtype: float64 

   col1  col2
a   0.0   0.0
b   1.0  10.0
c   2.0  20.0
d   3.0  30.0
e   4.0   NaN 
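Broadcasting matches the Series index against the column labels, so labels that appear on only one side produce a union of columns filled with `NaN`. A small sketch with hypothetical data (`df` and `row` below are illustrative, not the frames above):

```python
import pandas as pd

df = pd.DataFrame({'col1': [1, 2, 3], 'col2': [10.0, 20.0, 30.0]},
                  index=['a', 'b', 'c'])

# hypothetical Series: only 'col1' matches a column of df
row = pd.Series({'col1': 1, 'col3': 100})

# columns are aligned with the Series index; the result has the union of labels,
# and 'col2' (missing from row) and 'col3' (missing from df) are all NaN
result = df - row
print(result)
```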



To broadcast over the columns instead, matching the Series index against the DataFrame's row index, use an arithmetic method such as `sub`/`subtract` and pass the `axis` keyword (`axis=0` or `axis='index'`).

In [6]:
print(df, '\n')
print(df['col1'],'\n')
df.subtract(df['col1'],axis=0) # {0 or ‘index’, 1 or ‘columns’}

   col1  col2
a     1  10.0
b     2  20.0
c     3  30.0
d     4  40.0
e     5   NaN 

a    1
b    2
c    3
d    4
e    5
Name: col1, dtype: int64 



   col1  col2
a     0   9.0
b     0  18.0
c     0  27.0
d     0  36.0
e     0   NaN


In [7]:
# the same result with apply: each column x (a Series) has df['col1'] subtracted from it
df.apply(lambda x: x - df['col1'], axis=0)

   col1  col2
a     0   9.0
b     0  18.0
c     0  27.0
d     0  36.0
e     0   NaN


### 1.3 Between DataFrame and DataFrame objects

In [8]:
df_1 = pd.DataFrame(np.arange(1,17).reshape(4,4),
                    index= ['Fi', 'Se', 'Th', 'Fo'],
                    columns = ['a', 'b', 'c', 'd'])

df_2 = pd.DataFrame(np.arange(1,17).reshape(4,4) * 10,
                    index= ['Fi', 'Se', 'Th', 'Fo'],
                    columns = ['a', 'b', 'c', 'd'])
print(df_1,'\n')
print(df_2, '\n')

     a   b   c   d
Fi   1   2   3   4
Se   5   6   7   8
Th   9  10  11  12
Fo  13  14  15  16 

      a    b    c    d
Fi   10   20   30   40
Se   50   60   70   80
Th   90  100  110  120
Fo  130  140  150  160 



In [9]:
df_1+df_2

      a    b    c    d
Fi   11   22   33   44
Se   55   66   77   88
Th   99  110  121  132
Fo  143  154  165  176


In [10]:
df_1-df_2

       a     b     c     d
Fi    -9   -18   -27   -36
Se   -45   -54   -63   -72
Th   -81   -90   -99  -108
Fo  -117  -126  -135  -144
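Both frames above share exactly the same labels. When they do not, `+` aligns rows and columns separately and leaves `NaN` wherever a cell has no partner. A sketch with two small hypothetical frames, `left` and `right`, also showing `DataFrame.add(..., fill_value=0)`, which fills a cell missing from only one frame before adding:

```python
import pandas as pd

# hypothetical frames that only partially overlap
left = pd.DataFrame([[1, 2], [3, 4]], index=['Fi', 'Se'], columns=['a', 'b'])
right = pd.DataFrame([[10, 20], [30, 40]], index=['Fi', 'Th'], columns=['b', 'c'])

# + aligns both axes; only the cell ('Fi', 'b') exists in both frames
summed = left + right

# fill_value=0 stands in for a cell missing from ONE frame;
# cells missing from both frames remain NaN
summed_filled = left.add(right, fill_value=0)

print(summed, '\n')
print(summed_filled)
```

Cells missing from both frames, here `('Se', 'c')` and `('Th', 'a')`, stay `NaN` even with `fill_value`.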


In [11]:
df_1

     a   b   c   d
Fi   1   2   3   4
Se   5   6   7   8
Th   9  10  11  12
Fo  13  14  15  16


In [48]:
print(df_1['a'])
print(type(df_1['a']))
print(df_1['a'].keys())

Fi     1
Se     5
Th     9
Fo    13
Name: a, dtype: int32
<class 'pandas.core.series.Series'>
Index(['Fi', 'Se', 'Th', 'Fo'], dtype='object')


In [47]:
print(df_1[['a', 'b']])
print(type(df_1[['a', 'b']]))
print(df_1[['a', 'b']].keys())

     a   b
Fi   1   2
Se   5   6
Th   9  10
Fo  13  14
<class 'pandas.core.frame.DataFrame'>
Index(['a', 'b'], dtype='object')


## 2. Function Application and Mapping

### 2.1 DataFrame.apply(func, axis=0)
 - Apply a function **along an axis** of the DataFrame.
 - Objects passed to the function are Series objects whose index is either the DataFrame’s index (axis=0) or the DataFrame’s columns (axis=1).
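To see what `func` actually receives, a minimal sketch that returns the `name` and length of each Series passed in (the lambdas are illustrative; `df` matches the frame used in the next cell):

```python
import pandas as pd

df = pd.DataFrame([[4, 9]] * 3, columns=['A', 'B'])

# axis=0: one call per column; each argument is a Series named after its column
col_names = df.apply(lambda s: s.name, axis=0)
col_lengths = df.apply(lambda s: len(s), axis=0)   # 3 = number of rows

# axis=1: one call per row; each argument is a Series named after its row label
row_names = df.apply(lambda s: s.name, axis=1)
row_lengths = df.apply(lambda s: len(s), axis=1)   # 2 = number of columns

print(col_names.tolist(), col_lengths.tolist())
print(row_names.tolist(), row_lengths.tolist())
```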

In [14]:
df = pd.DataFrame([[4, 9]] * 3, columns=['A', 'B'])

print(df, '\n')
print(df.apply(np.sqrt),'\n') # np.sqrt receives whole columns (Series), not single elements

print(df.apply(np.sqrt, axis=0),'\n')  # axis=0 (default): one call per column
print(df.apply(np.sqrt, axis=1))       # axis=1: one call per row

   A  B
0  4  9
1  4  9
2  4  9 

     A    B
0  2.0  3.0
1  2.0  3.0
2  2.0  3.0 

     A    B
0  2.0  3.0
1  2.0  3.0
2  2.0  3.0 

     A    B
0  2.0  3.0
1  2.0  3.0
2  2.0  3.0


In [15]:
print(df, '\n')
print(df.apply(np.sum, axis=0),'\n')  # first column [4, 4, 4] sum()= 12
print(df.apply(np.sum, axis=1))       # first row [4, 9] sum = 13

   A  B
0  4  9
1  4  9
2  4  9 

A    12
B    27
dtype: int64 

0    13
1    13
2    13
dtype: int64
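Because `np.sum` is a reduction, `apply` collapses each column (or row) to a single number. For built-in reductions like this, the DataFrame's own method is usually the simpler choice; a sketch using `DataFrame.sum` with the same meaning of `axis`:

```python
import pandas as pd

df = pd.DataFrame([[4, 9]] * 3, columns=['A', 'B'])

col_sums = df.sum(axis=0)  # per column, like df.apply(np.sum, axis=0): A=12, B=27
row_sums = df.sum(axis=1)  # per row, like df.apply(np.sum, axis=1): 13 for each row

print(col_sums, '\n')
print(row_sums)
```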


In [16]:
print(df, '\n')
print(df.apply(lambda x: [1, 2], axis=0),'\n')  # each column returns [1, 2]; the lists become rows 0 and 1
print(df.apply(lambda x: [1, 2], axis=1))       # each row returns [1, 2]; the result is a Series of lists

   A  B
0  4  9
1  4  9
2  4  9 

   A  B
0  1  1
1  2  2 

0    [1, 2]
1    [1, 2]
2    [1, 2]
dtype: object
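Whether a list returned by `func` is expanded is governed by the `result_type` parameter of `DataFrame.apply`, which only takes effect with `axis=1`. A sketch of the `'expand'` and `'broadcast'` options on the same frame:

```python
import pandas as pd

df = pd.DataFrame([[4, 9]] * 3, columns=['A', 'B'])

# default (result_type=None): each row's list is kept as one value
as_lists = df.apply(lambda x: [1, 2], axis=1)

# 'expand': each returned list is spread across new columns 0 and 1
expanded = df.apply(lambda x: [1, 2], axis=1, result_type='expand')

# 'broadcast': the result keeps the original shape and column labels A, B
broadcast = df.apply(lambda x: [1, 2], axis=1, result_type='broadcast')

print(expanded, '\n')
print(broadcast)
```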


In [17]:
# random data: the numbers below will differ from run to run
frame = pd.DataFrame(np.random.randn(4, 3), columns=list('bde'),
                     index=['Utah', 'Ohio', 'Texas', 'Oregon'])
frame = np.abs(frame)

In [18]:
print(frame,'\n')

# range (max - min) of each column (axis=0, the default) or of each row (axis=1)
print(frame.apply(lambda x: x.max() - x.min(), axis=0),'\n') # along row axis (each column)
# first column b [1.062016, 0.233104, 1.044360, 1.050718]: max - min = 1.062016 - 0.233104 = 0.828912

print(frame.apply(lambda x: x.max() - x.min(), axis=1),'\n') # along column axis (each row)
# first row Utah [1.062016, 0.188901, 1.153915]: max - min = 1.153915 - 0.188901 ≈ 0.965015 (displayed values are rounded)

               b         d         e
Utah    1.062016  0.188901  1.153915
Ohio    0.233104  2.561892  1.328863
Texas   1.044360  0.056821  1.393812
Oregon  1.050718  1.946996  0.867579 

b    0.828912
d    2.505071
e    0.526233
dtype: float64 

Utah      0.965015
Ohio      2.328788
Texas     1.336992
Oregon    1.079417
dtype: float64 
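Since `frame` is random, a small frame with fixed, hypothetical values (`small`, with an illustrative helper `spread`) makes the two directions easy to check by hand:

```python
import pandas as pd

# hypothetical fixed values, chosen so the ranges are easy to verify
small = pd.DataFrame({'b': [1.0, 4.0], 'd': [2.0, 8.0], 'e': [0.5, 3.0]},
                     index=['Utah', 'Ohio'])

def spread(x):
    # range of one column or one row
    return x.max() - x.min()

per_column = small.apply(spread, axis=0)  # b: 4-1, d: 8-2, e: 3-0.5
per_row = small.apply(spread, axis=1)     # Utah: 2-0.5, Ohio: 8-3

print(per_column, '\n')
print(per_row)
```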



In [19]:
print(frame,'\n')

# given an input row or column,
# find its (min, max)
# and return the pair as a Series object
def f(x):
    return pd.Series([x.min(), x.max()], index=['min', 'max'])

print(frame.apply(f, axis=0), '\n')         # along row axis (each column)
# the first column b [1.062016, 0.233104, 1.044360, 1.050718]: min=0.233104, max=1.062016

print(frame.apply(f, axis=1),'\n')          # along column axis (each row)

               b         d         e
Utah    1.062016  0.188901  1.153915
Ohio    0.233104  2.561892  1.328863
Texas   1.044360  0.056821  1.393812
Oregon  1.050718  1.946996  0.867579 

            b         d         e
min  0.233104  0.056821  0.867579
max  1.062016  2.561892  1.393812 

             min       max
Utah    0.188901  1.153915
Ohio    0.233104  2.561892
Texas   0.056821  1.393812
Oregon  0.867579  1.946996 
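When `f` returns a Series, `apply` stacks the results into a new DataFrame whose index is that Series' index (`min`, `max`). For these particular statistics, `DataFrame.agg` with a list of reduction names builds the same table; a sketch comparing the two on a seeded random frame (the generator and the seed 0 are assumptions, used only to make the run repeatable):

```python
import numpy as np
import pandas as pd

# seeded random data; any fixed seed would do
rng = np.random.default_rng(0)
frame = pd.DataFrame(np.abs(rng.standard_normal((4, 3))), columns=list('bde'),
                     index=['Utah', 'Ohio', 'Texas', 'Oregon'])

def f(x):
    # same helper as above: (min, max) of one column as a Series
    return pd.Series([x.min(), x.max()], index=['min', 'max'])

via_apply = frame.apply(f, axis=0)   # rows 'min' and 'max', one column per original column
via_agg = frame.agg(['min', 'max'])  # built-in reductions requested by name

print(via_apply, '\n')
print(via_apply.equals(via_agg))
```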



### 2.2 DataFrame.applymap():
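Apply a function to a DataFrame **elementwise**.

This method applies a function that accepts and returns a scalar to every element of a DataFrame. In pandas 2.1 and later, `applymap` is deprecated and the same operation is called `DataFrame.map`.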

In [20]:
print(frame,'\n')
print(frame.applymap(lambda x: '%.2f' % x),'\n')

               b         d         e
Utah    1.062016  0.188901  1.153915
Ohio    0.233104  2.561892  1.328863
Texas   1.044360  0.056821  1.393812
Oregon  1.050718  1.946996  0.867579 

           b     d     e
Utah    1.06  0.19  1.15
Ohio    0.23  2.56  1.33
Texas   1.04  0.06  1.39
Oregon  1.05  1.95  0.87 
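Because `applymap` is deprecated from pandas 2.1 in favor of `DataFrame.map`, code meant to run on both older and newer versions can pick whichever method exists. A minimal sketch, using a two-row excerpt of the values printed above as fixed data:

```python
import pandas as pd

# two rows of the random frame above, copied as fixed values
frame = pd.DataFrame({'b': [1.062016, 0.233104], 'e': [1.153915, 1.328863]},
                     index=['Utah', 'Ohio'])

# DataFrame.map exists from pandas 2.1; earlier versions only provide applymap
elementwise = frame.map if hasattr(pd.DataFrame, 'map') else frame.applymap

formatted = elementwise(lambda x: '%.2f' % x)  # func receives one scalar at a time
print(formatted)
```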



In [21]:
# with apply, the function receives a whole column (a Series), not a scalar, so it must loop over the elements itself
print(frame,'\n')
print(frame.apply(lambda x: ['%.2f' % i for i in x], axis=0))

               b         d         e
Utah    1.062016  0.188901  1.153915
Ohio    0.233104  2.561892  1.328863
Texas   1.044360  0.056821  1.393812
Oregon  1.050718  1.946996  0.867579 

           b     d     e
Utah    1.06  0.19  1.15
Ohio    0.23  2.56  1.33
Texas   1.04  0.06  1.39
Oregon  1.05  1.95  0.87


### 2.3 Series.map(): 
Map values of **Series** according to input correspondence.

Used to substitute each value in a Series with another value, which may be derived from a function, a dict, or a Series.

In [22]:
frame['e'].map(lambda x: '%.2f' % x)

Utah      1.15
Ohio      1.33
Texas     1.39
Oregon    0.87
Name: e, dtype: object
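Besides a function, `Series.map` also accepts a dict or a Series as the correspondence. A sketch with a hypothetical state-abbreviation lookup; values with no matching key become `NaN`:

```python
import pandas as pd

states = pd.Series(['Utah', 'Ohio', 'Texas', 'Oregon'])

# hypothetical lookup table; 'Oregon' is intentionally missing
abbrev = {'Utah': 'UT', 'Ohio': 'OH', 'Texas': 'TX'}

# with a dict, each value is replaced by its mapped value; unmatched values become NaN
codes = states.map(abbrev)

# with a Series, its index plays the role of the dict keys
codes_from_series = states.map(pd.Series(abbrev))

print(codes, '\n')
print(codes_from_series)
```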