
# Operating on data in Pandas

In [None]:
import numpy as np
import pandas as pd

One of NumPy's essential features is its ability to perform element-wise operations, both basic arithmetic and more sophisticated operations such as trigonometric functions.

Pandas inherits this functionality from NumPy, including the ufuncs (universal functions) discussed earlier.

Pandas adds a couple of twists, however. For *unary* operations such as negation and trigonometric functions, the index and column labels are preserved in the output. For *binary* operations such as addition and subtraction, Pandas aligns the indices of the operands before passing them to the ufunc.

## Ufuncs: Index preservation

Because Pandas is designed to work with NumPy, any NumPy ufunc will work on Pandas Series and DataFrame objects.

In [None]:
r=np.random.RandomState(42)
s=pd.Series(r.randint(0,10,4))
s

0    6
1    3
2    7
3    4
dtype: int64

In [None]:
df=pd.DataFrame(r.randint(0,10,(3,4)),columns=['A','B','C','D'])
df

   A  B  C  D
0  3  1  7  3
1  1  5  5  9
2  3  5  1  9


If we apply a NumPy ufunc to either of these objects, the result is another Pandas object with the indices preserved:

In [None]:
np.exp(s)

0     403.428793
1      20.085537
2    1096.633158
3      54.598150
dtype: float64

In [None]:
np.exp(df)

           A           B            C            D
0  20.085537    2.718282  1096.633158    20.085537
1   2.718282  148.413159   148.413159  8103.083928
2  20.085537  148.413159     2.718282  8103.083928


Other NumPy ufuncs can be applied in the same way.
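As a slightly more involved illustration, the sketch below applies a trigonometric ufunc to a DataFrame. The frame is rebuilt locally under the hypothetical name `df_demo` so the snippet runs on its own; the values are random, and only the labels matter here.

```python
import numpy as np
import pandas as pd

# Illustrative data: a 3x4 frame of random integers (df_demo is a made-up name)
rng = np.random.RandomState(42)
df_demo = pd.DataFrame(rng.randint(0, 10, (3, 4)), columns=list("ABCD"))

# Arithmetic followed by a unary ufunc; the result is still a DataFrame
result = np.sin(df_demo * np.pi / 4)

# The row index and column labels carry over unchanged
print(result.index.equals(df_demo.index))
print(list(result.columns))
```

The point is only that `result` keeps the same row index and column labels as the input, whatever the values turn out to be.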

## Ufuncs: Index alignment

When performing binary operations on two Series or DataFrame objects, Pandas aligns their indices as part of the operation.

Suppose we are combining two data sources:
1. The top three US states by area
2. The top three US states by population

In [None]:
area = pd.Series({'Alaska': 1723337, 'Texas': 695662,
 'California': 423967}, name='area')
population = pd.Series({'California': 38332521, 'Texas': 26448193,
 'New York': 19651127}, name='population')

In [None]:
population/area

Alaska              NaN
California    90.413926
New York            NaN
Texas         38.018740
dtype: float64

In [None]:
area.index.union(population.index)

Index(['Alaska', 'California', 'New York', 'Texas'], dtype='object')

The resulting array contains the union of the indices of the two input arrays.

Any item for which one of the inputs has no entry is marked NaN, which is how Pandas marks missing data by default.

Here is another example.

Suppose we have two Series with different indices:

In [None]:
A=pd.Series([2,6,7],index=[0,1,2])
B=pd.Series([2,6,7],index=[1,2,4])
A+B

0     NaN
1     8.0
2    13.0
4     NaN
dtype: float64

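When we add them, the result is NaN at every index label that appears in only one of the two Series (here, 0 and 4).

If NaN is not the desired behavior, use the `add()` method instead of the `+` operator and pass `fill_value` to substitute a value for the missing entries, for example 0: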

In [None]:
A.add(B,fill_value=0)

0     2.0
1     8.0
2    13.0
4     7.0
dtype: float64
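
The same `fill_value` idea carries over to the other arithmetic methods. A minimal sketch, assuming the same two Series as above (re-created here so the snippet is self-contained); `diff` and `prod` are illustrative names:

```python
import pandas as pd

A = pd.Series([2, 6, 7], index=[0, 1, 2])
B = pd.Series([2, 6, 7], index=[1, 2, 4])

# Subtraction: a label missing from one side is treated as 0
diff = A.sub(B, fill_value=0)
print(diff.tolist())   # [2.0, 4.0, 1.0, -7.0]

# Multiplication: a label missing from one side is treated as 1
prod = A.mul(B, fill_value=1)
print(prod.tolist())   # [2.0, 12.0, 42.0, 7.0]
```

Choosing a neutral fill value (0 for addition, 1 for multiplication) leaves the entry from the other Series as it is.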

### Index alignment in DataFrame

A similar alignment takes place for both columns and row indices when operating on DataFrames.

In [None]:
A=pd.DataFrame(r.randint(0,20,(2,2)),columns=list('AB'))

In [None]:
B=pd.DataFrame(r.randint(0,10,(2,3)),columns=list('BAC'))

In [None]:
A+B

    A   B   C
0  17   9 NaN
1   9  10 NaN


Notice that the labels are aligned correctly irrespective of their order in the two objects, and that the labels in the result are sorted.
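
Because the frames above hold random values, the alignment is easier to follow with fixed numbers. A small sketch with made-up data (`left`, `right`, and `total` are hypothetical names):

```python
import pandas as pd

# Columns appear in a different order in each frame
left = pd.DataFrame([[1, 2], [3, 4]], columns=list("AB"))
right = pd.DataFrame([[10, 20, 30], [40, 50, 60]], columns=list("BAC"))

total = left + right

# Values are matched by column label, not by position
print(list(total.columns))   # ['A', 'B', 'C']
print(total["A"].tolist())   # left A + right A
print(total["B"].tolist())   # left B + right B
print(total["C"].isna().all())   # C exists only in `right`, so it is all NaN
```

Column `A` of `left` is added to column `A` of `right` even though it sits in a different position there.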

In [None]:
# Fill missing entries with the mean of all values in A (stack() flattens A into a single Series)
fill=A.stack().mean()
A.add(B,fill_value=fill)

    A   B     C
0  17   9  14.0
1   9  10  13.0


## Ufuncs: Operations between DataFrame and Series

Operations between a DataFrame and a Series are similar to operations between a two-dimensional and a one-dimensional NumPy array.
The index and column labels are aligned in the same way.
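
For comparison, here is a sketch of the plain NumPy case the analogy refers to. The array values are arbitrary and chosen only for illustration:

```python
import numpy as np

arr = np.array([[3, 5, 2],
                [7, 2, 1]])

# Subtracting a 1D row from a 2D array broadcasts the row across every row
row_diff = arr - arr[0]
print(row_diff)
# [[ 0  0  0]
#  [ 4 -3 -1]]

# Subtracting a column needs an explicit column shape, here (2, 1)
col_diff = arr - arr[:, [0]]
print(col_diff)
# [[ 0  2 -1]
#  [ 0 -5 -6]]
```

NumPy matches elements purely by position, whereas Pandas matches them by label.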

In [None]:
A=pd.DataFrame(r.randint(0,8,(2,3)),columns=list('ABC'))

In [None]:
# Difference between a DataFrame and its first row
A - A.iloc[0]

   A  B  C
0  0  0  0
1  4 -3 -1


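By default, this operates row-wise: the Series is aligned with the DataFrame's columns and subtracted from every row.
To operate column-wise instead, use the method form and specify the `axis` keyword: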

In [None]:
A.subtract(A['A'],axis=0)

   A  B  C
0  0  2 -1
1  0 -5 -6


Like the operations discussed earlier, these DataFrame/Series operations automatically align the indices of the two operands.
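
One way to see this alignment is to subtract a Series that covers only some of the columns. A minimal sketch with made-up values (`frame` and `halfrow` are hypothetical names):

```python
import pandas as pd

frame = pd.DataFrame([[3, 5, 2], [7, 2, 1]], columns=list("ABC"))

# Take every second entry of the first row: labels A and C only
halfrow = frame.iloc[0, ::2]

# The Series is aligned to the columns by label; B has no match
result = frame - halfrow
print(result["A"].tolist())       # [0.0, 4.0]
print(result["C"].tolist())       # [0.0, -1.0]
print(result["B"].isna().all())   # True
```

Column `B` has no counterpart in `halfrow`, so it comes back as NaN rather than being matched to the wrong value by position.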