
# **Operating on Data in Pandas**

This section shows how NumPy universal functions (ufuncs) apply to Pandas Series and DataFrames, and how Pandas aligns data by index label when an operation combines two objects.

## **Ufuncs: Index Preservation**

Applying a NumPy ufunc to a Pandas object returns another Pandas object whose index (and, for a DataFrame, column labels) matches the input.

In [55]:
# importing necessary libraries

import pandas as pd
import numpy as np

In [56]:
rng = np.random.default_rng(42)

# Create a Pandas Series with random integers
ser = pd.Series(rng.integers(0, 10, 4))
ser

Unnamed: 0,0
0,0
1,7
2,6
3,4


In [57]:
# Create a Pandas DataFrame with random integers
df = pd.DataFrame(rng.integers(0, 10, (3, 4)), columns=["A", "B", "C", "D"])
df

Unnamed: 0,A,B,C,D
0,4,8,0,6
1,2,0,5,9
2,7,7,7,7


In [58]:
# Apply the exponential function to each element in the Series
np.exp(ser)

Unnamed: 0,0
0,1.0
1,1096.633158
2,403.428793
3,54.59815


In [59]:
# Apply the sine function to each element in the DataFrame after scaling
np.sin(df * np.pi / 4)

Unnamed: 0,A,B,C,D
0,1.224647e-16,-2.449294e-16,0.0,-1.0
1,1.0,0.0,-0.707107,0.707107
2,-0.7071068,-0.7071068,-0.707107,-0.707107
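
As a small check of the behavior shown above, the following sketch uses a Series with string labels to confirm that the ufunc output carries the same index as its input. The names `temps` and `result` are illustrative and do not appear elsewhere in these notes.

```python
import numpy as np
import pandas as pd

# Illustrative Series with a non-default, string index
temps = pd.Series([0.0, 1.0, 2.0], index=["a", "b", "c"])

# Applying a NumPy ufunc returns a new Series, not a bare array
result = np.exp(temps)

print(type(result).__name__)              # Series
print(result.index.equals(temps.index))   # True
```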


## **Ufuncs: Index Alignment**

When an operation combines two Series or two DataFrames, Pandas aligns the operands by index label before computing the result, so objects with different or differently ordered indices can still be combined.

### **Index Alignment in Series**

When two Series are combined, the index of the result is the union of the two input indices, and any label that appears in only one Series produces NaN.

In [60]:
# Create two Pandas Series with state populations and areas
area = pd.Series({'Alaska': 1723337, 'Texas': 695662,
                  'California': 423967}, name='area')

population = pd.Series({'California': 38332521, 'Texas': 26448193,
                        'New York': 19651127}, name='population')

In [61]:
area

Unnamed: 0,area
Alaska,1723337
Texas,695662
California,423967


In [62]:
population

Unnamed: 0,population
California,38332521
Texas,26448193
New York,19651127


In [63]:
# Divide the population Series by the area Series.
# Pandas aligns the data based on the index labels.
population / area

Unnamed: 0,0
Alaska,
California,90.413926
New York,
Texas,38.01874


In [64]:
# Find the union of the indices of the two Series
area.index.union(population.index)

Index(['Alaska', 'California', 'New York', 'Texas'], dtype='object')
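
To make the NaN entries easier to inspect, here is a minimal sketch that rebuilds the two Series from the cells above and separates the labels that failed to match from those that did. The names `density`, `missing`, and `common` are illustrative.

```python
import pandas as pd

area = pd.Series({'Alaska': 1723337, 'Texas': 695662,
                  'California': 423967}, name='area')
population = pd.Series({'California': 38332521, 'Texas': 26448193,
                        'New York': 19651127}, name='population')

density = population / area

# Labels present in only one of the two Series come back as NaN
missing = density[density.isna()].index.tolist()

# dropna() keeps only the labels that both Series share
common = density.dropna()

print(missing)
print(common)
```

Dropping the NaN rows is one option; whether that or a fill value is appropriate depends on what a missing label means in the data.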

In [65]:
# Add two Series with overlapping but not identical indices
# The result will have the union of the indices, and non-matching entries will be NaN
A = pd.Series([2, 4, 6], index=[0, 1, 2])
B = pd.Series([1, 3, 5], index=[1, 2, 3])
A + B

Unnamed: 0,0
0,
1,5.0
2,9.0
3,


In [66]:
# Add two Series with overlapping but not identical indices, filling missing values with 0
A.add(B, fill_value=0)

Unnamed: 0,0
0,2.0
1,5.0
2,9.0
3,5.0
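
The same pattern applies to the other arithmetic methods. As a hedged sketch using the two Series from the cells above, `sub` and `mul` also accept `fill_value`, and the plain operator matches the method when no fill value is given. The names `same`, `diff`, and `prod` are illustrative.

```python
import pandas as pd

A = pd.Series([2, 4, 6], index=[0, 1, 2])
B = pd.Series([1, 3, 5], index=[1, 2, 3])

# The operator and the method agree when no fill_value is used
same = (A + B).equals(A.add(B))

# fill_value stands in for whichever side lacks the label
diff = A.sub(B, fill_value=0)   # label 0: 2 - 0, label 3: 0 - 5
prod = A.mul(B, fill_value=1)   # label 0: 2 * 1, label 3: 1 * 5

print(same)
print(diff)
print(prod)
```

Choosing 0 for subtraction and 1 for multiplication treats the missing side as the identity element for that operation, which is an assumption about the data rather than a rule.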


### **Index Alignment in DataFrames**

For operations between two DataFrames, Pandas aligns on both the row index and the column labels, regardless of the order in which rows or columns appear in either operand.

In [67]:
# Create a DataFrame A with random integers
A = pd.DataFrame(rng.integers(0, 20, (2, 2)), columns=['a','b'])
A

Unnamed: 0,a,b
0,10,2
1,16,9


In [68]:
A['a']

Unnamed: 0,a
0,10
1,16


In [69]:
# Create a DataFrame B with random integers and different columns/index
B = pd.DataFrame(rng.integers(0, 10, (3, 3)), columns = ['b','a','c'])
B

Unnamed: 0,b,a,c
0,5,3,1
1,9,7,6
2,4,8,5


In [70]:
# Add two DataFrames with different indices and columns
# Pandas aligns based on both index and column labels
A + B

Unnamed: 0,a,b,c
0,13.0,7.0,
1,23.0,18.0,
2,,,


In [71]:
# Add two DataFrames with different indices and columns, filling missing values with the mean of DataFrame A
A.add(B, fill_value=A.values.mean())

Unnamed: 0,a,b,c
0,13.0,7.0,10.25
1,23.0,18.0,15.25
2,17.25,13.25,14.25
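
One detail worth checking: `fill_value` only substitutes for a cell that is missing in one operand. If a cell is missing in both, the result stays NaN. The sketch below uses two small hand-built DataFrames (`left` and `right`, illustrative names and values) whose rows and columns only partly overlap.

```python
import pandas as pd

left = pd.DataFrame({'a': [1, 2], 'b': [3, 4]}, index=[0, 1])
right = pd.DataFrame({'b': [10, 20], 'c': [30, 40]}, index=[1, 2])

# Plain addition: only the cell at row 1, column 'b' exists in both frames
summed = left + right

# With fill_value=0, cells present in exactly one frame are kept;
# cells absent from both, such as (0, 'c') and (2, 'a'), remain NaN
filled = left.add(right, fill_value=0)

print(summed)
print(filled)
```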


## **Ufuncs: Operations Between DataFrames and Series**

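Operations between a DataFrame and a Series behave much like NumPy broadcasting between a two-dimensional and a one-dimensional array. By default, Pandas matches the Series index against the DataFrame's columns and applies the operation to every row; passing `axis=0` matches the Series against the row index instead.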

In [72]:
# Create a NumPy array
A = rng.integers(10, size=(3, 4))
A

array([[4, 4, 2, 0],
       [5, 8, 0, 8],
       [8, 2, 6, 1]])

In [73]:
# Subtract the first row of the NumPy array from all rows
A - A[0]

array([[ 0,  0,  0,  0],
       [ 1,  4, -2,  8],
       [ 4, -2,  4,  1]])

In [74]:
# Create a DataFrame from the NumPy array
df = pd.DataFrame(A, columns=list('QRST'))
# Subtract the first row of the DataFrame from all rows
df - df.iloc[0]


Unnamed: 0,Q,R,S,T
0,0,0,0,0
1,1,4,-2,8
2,4,-2,4,1


In [75]:
df['Q']

Unnamed: 0,Q
0,4
1,5
2,8


In [76]:
# Subtract the 'R' column (a Series) from every column of the DataFrame
# axis=0 matches the Series against the row index, so the operation runs down each column
df.subtract(df['R'], axis=0)

Unnamed: 0,Q,R,S,T
0,0,0,-2,-4
1,-3,0,-8,0
2,6,0,4,-1
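
To relate the two axis choices back to NumPy broadcasting, the following sketch rebuilds the same array by hand (rather than from the random generator) and compares each Pandas call with its array equivalent. The names `arr`, `row_wise`, and `col_wise` are illustrative.

```python
import numpy as np
import pandas as pd

arr = np.array([[4, 4, 2, 0],
                [5, 8, 0, 8],
                [8, 2, 6, 1]])
df = pd.DataFrame(arr, columns=list('QRST'))

# Default behavior: the Series index is matched against the columns,
# which is the same as passing axis='columns' explicitly
row_wise = df.sub(df.iloc[0], axis='columns')
print(row_wise.equals(df - df.iloc[0]))

# axis='index' (same as axis=0) matches the Series against the row labels,
# like subtracting a column vector in NumPy
col_wise = df.sub(df['R'], axis='index')
print((col_wise.to_numpy() == arr - arr[:, [1]]).all())
```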


In [77]:
# Select every other column of the first row of the DataFrame
halfrow = df.iloc[0, ::2]
halfrow

Unnamed: 0,0
Q,4
S,2


In [78]:
# Subtract the selected halfrow Series from the DataFrame
# Pandas aligns the Series to the DataFrame's columns; columns absent from the Series (R, T) become NaN
df - halfrow

Unnamed: 0,Q,R,S,T
0,0.0,,0.0,
1,1.0,,-2.0,
2,4.0,,4.0,
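
If the NaN columns are not wanted, one possible approach (an assumption about intent, not something the notes prescribe) is to reindex the Series to the DataFrame's columns with a fill value before subtracting. The sketch rebuilds `df` by hand so it runs on its own; `padded` and `result` are illustrative names.

```python
import pandas as pd

df = pd.DataFrame([[4, 4, 2, 0],
                   [5, 8, 0, 8],
                   [8, 2, 6, 1]], columns=list('QRST'))
halfrow = df.iloc[0, ::2]          # Series with labels 'Q' and 'S'

# Give the Series an entry for every column; missing labels become 0
padded = halfrow.reindex(df.columns, fill_value=0)

# Columns R and T are now left unchanged instead of turning into NaN
result = df - padded
print(result)
```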
