
# EXPERIMENT 2 – Pandas GroupBy Operations
**Aim:**  
To perform aggregation, filtering, and transformation operations using Pandas `groupby()`.

**Tools Used:**  
- Python  
- NumPy  
- Pandas  

---


*Cell 1 – Import Libraries & Create DataFrame*

In [1]:
```python
import numpy as np
import pandas as pd

# Set random seed for reproducibility
rng = np.random.RandomState(0)

# Create DataFrame
df = pd.DataFrame({
    'name': ['chanti', 'nagendra', 'shannu', 'sundhar', 'madhu',
             'kitta', 'nagendra', 'shannu', 'sundhar', 'madhu'],
    'age': [20, 21, 22, 21, 23, 19, 20, 21, 23, 18],
    'fee': rng.randint(100000, 150000, 10)
})

print("Original DataFrame")
df
```


Original DataFrame


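|   | name | age | fee |
|---|------|-----|-----|
| 0 | chanti | 20 | 102732 |
| 1 | nagendra | 21 | 143567 |
| 2 | shannu | 22 | 142613 |
| 3 | sundhar | 21 | 145891 |
| 4 | madhu | 23 | 121243 |
| 5 | kitta | 19 | 130403 |
| 6 | nagendra | 20 | 132103 |
| 7 | shannu | 21 | 141993 |
| 8 | sundhar | 23 | 120757 |
| 9 | madhu | 18 | 146884 |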
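To see how `groupby('name')` splits the rows before anything is applied, one can iterate over the GroupBy object, which yields one `(key, sub-DataFrame)` pair per distinct name. This is a minimal sketch: it rebuilds the same DataFrame with the fee values copied from the table above (instead of regenerating them with `rng`), and the name `row_groups` is illustrative only.

```python
import pandas as pd

# Same data as the table above; fee values are copied from the printed output
df = pd.DataFrame({
    'name': ['chanti', 'nagendra', 'shannu', 'sundhar', 'madhu',
             'kitta', 'nagendra', 'shannu', 'sundhar', 'madhu'],
    'age': [20, 21, 22, 21, 23, 19, 20, 21, 23, 18],
    'fee': [102732, 143567, 142613, 145891, 121243,
            130403, 132103, 141993, 120757, 146884],
})

# Each iteration gives a group key and the rows that belong to it (keys sorted)
row_groups = {}
for name, group in df.groupby('name'):
    row_groups[name] = group.index.tolist()
    print(name, row_groups[name])

# Number of rows in each group
print(df.groupby('name').size())
```

Only `chanti` and `kitta` form single-row groups, which matters for the standard deviation and filtering steps later on.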


*Cell 2 – Aggregation*

In [2]:
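```python
# Aggregation using min, median, and max on every non-grouping column.
# The string 'median' replaces np.median, which recent pandas versions
# flag with a FutureWarning when passed to aggregate().
df.groupby('name').aggregate(['min', 'median', 'max'])
```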




| name | age min | age median | age max | fee min | fee median | fee max |
|------|---------|------------|---------|---------|------------|---------|
| chanti | 20 | 20.0 | 20 | 102732 | 102732.0 | 102732 |
| kitta | 19 | 19.0 | 19 | 130403 | 130403.0 | 130403 |
| madhu | 18 | 20.5 | 23 | 121243 | 134063.5 | 146884 |
| nagendra | 20 | 20.5 | 21 | 132103 | 137835.0 | 143567 |
| shannu | 21 | 21.5 | 22 | 141993 | 142303.0 | 142613 |
| sundhar | 21 | 22.0 | 23 | 120757 | 133324.0 | 145891 |
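Passing a list of functions produces two-level (MultiIndex) column headers such as `age` / `min`. As an alternative, pandas' named aggregation (available since pandas 0.25) gives flat, self-describing column names. A short sketch, where the output names `age_min`, `age_median` and `fee_max` are illustrative choices:

```python
import pandas as pd

# Same data as the original DataFrame; fee values copied from the printed output
df = pd.DataFrame({
    'name': ['chanti', 'nagendra', 'shannu', 'sundhar', 'madhu',
             'kitta', 'nagendra', 'shannu', 'sundhar', 'madhu'],
    'age': [20, 21, 22, 21, 23, 19, 20, 21, 23, 18],
    'fee': [102732, 143567, 142613, 145891, 121243,
            130403, 132103, 141993, 120757, 146884],
})

# Each keyword becomes one output column: output_name=(input column, aggregation)
summary = df.groupby('name').agg(
    age_min=('age', 'min'),
    age_median=('age', 'median'),
    fee_max=('fee', 'max'),
)
print(summary)
```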


In [3]:
```python
# Aggregation with different functions for different columns
df.groupby('name').aggregate({
    'age': 'min',
    'fee': 'max'
})
```


| name | age | fee |
|------|-----|-----|
| chanti | 20 | 102732 |
| kitta | 19 | 130403 |
| madhu | 18 | 146884 |
| nagendra | 20 | 143567 |
| shannu | 21 | 142613 |
| sundhar | 21 | 145891 |
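The dictionary values do not have to be names of built-in aggregations; a Python callable also works and is applied once per group to that column. The sketch below uses a hypothetical helper `fee_range` to compute the spread of fees within each name, next to the mean age:

```python
import pandas as pd

# Same data as the original DataFrame; fee values copied from the printed output
df = pd.DataFrame({
    'name': ['chanti', 'nagendra', 'shannu', 'sundhar', 'madhu',
             'kitta', 'nagendra', 'shannu', 'sundhar', 'madhu'],
    'age': [20, 21, 22, 21, 23, 19, 20, 21, 23, 18],
    'fee': [102732, 143567, 142613, 145891, 121243,
            130403, 132103, 141993, 120757, 146884],
})

def fee_range(s):
    # Hypothetical helper: largest minus smallest fee within one group
    return s.max() - s.min()

# 'age' uses a built-in aggregation by name; 'fee' uses the custom callable
result = df.groupby('name').agg({'age': 'mean', 'fee': fee_range})
print(result)
```

Single-row groups such as `chanti` get a range of 0, since their maximum and minimum are the same value.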


*Cell 3 – Filtering*

In [4]:
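```python
# Filter function: keep a group only if the standard deviation of its fees exceeds 3.
# A single-row group has an undefined (NaN) sample standard deviation, and
# NaN > 3 evaluates to False, so such groups are dropped.
def filter_func(x):
    return x['fee'].std() > 3

# Standard deviation of each column within each group
df.groupby('name').std()
```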


| name | age | fee |
|------|-----|-----|
| chanti | NaN | NaN |
| kitta | NaN | NaN |
| madhu | 3.535534 | 18130.924976 |
| nagendra | 0.707107 | 8106.27214 |
| shannu | 0.707107 | 438.406204 |
| sundhar | 1.414214 | 17772.421838 |
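The `NaN` entries for `chanti` and `kitta` appear because each of them has only one row, and the default sample standard deviation (`ddof=1`) is undefined for a single observation. A small sketch that puts the group size next to the fee standard deviation, and compares it with the population version (`ddof=0`), which is 0 for a single row:

```python
import pandas as pd

# Same data as the original DataFrame; fee values copied from the printed output
df = pd.DataFrame({
    'name': ['chanti', 'nagendra', 'shannu', 'sundhar', 'madhu',
             'kitta', 'nagendra', 'shannu', 'sundhar', 'madhu'],
    'age': [20, 21, 22, 21, 23, 19, 20, 21, 23, 18],
    'fee': [102732, 143567, 142613, 145891, 121243,
            130403, 132103, 141993, 120757, 146884],
})

# Group size next to the sample standard deviation (ddof=1, the default)
stats = df.groupby('name')['fee'].agg(['count', 'std'])
print(stats)

# Population standard deviation (ddof=0) is defined even for one row
pop_std = df.groupby('name')['fee'].std(ddof=0)
print(pop_std)
```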


In [5]:
```python
# Keep only the rows of groups for which filter_func returns True
df.groupby('name').filter(filter_func)
```


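|   | name | age | fee |
|---|------|-----|-----|
| 1 | nagendra | 21 | 143567 |
| 2 | shannu | 22 | 142613 |
| 3 | sundhar | 21 | 145891 |
| 4 | madhu | 23 | 121243 |
| 6 | nagendra | 20 | 132103 |
| 7 | shannu | 21 | 141993 |
| 8 | sundhar | 23 | 120757 |
| 9 | madhu | 18 | 146884 |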
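Because the fee spreads are in the thousands, the threshold of 3 in `filter_func` effectively removes only the single-row groups whose standard deviation is `NaN`. The same rows can also be selected with a boolean mask built from `transform('std')`, which broadcasts each group's statistic back to its rows. This is an alternative technique shown for comparison, not a description of how `filter()` works internally:

```python
import pandas as pd

# Same data as the original DataFrame; fee values copied from the printed output
df = pd.DataFrame({
    'name': ['chanti', 'nagendra', 'shannu', 'sundhar', 'madhu',
             'kitta', 'nagendra', 'shannu', 'sundhar', 'madhu'],
    'age': [20, 21, 22, 21, 23, 19, 20, 21, 23, 18],
    'fee': [102732, 143567, 142613, 145891, 121243,
            130403, 132103, 141993, 120757, 146884],
})

# filter(): keep whole groups whose fee standard deviation exceeds 3
filtered = df.groupby('name').filter(lambda g: g['fee'].std() > 3)

# Mask alternative: one group-level std per row; NaN > 3 evaluates to False
group_std = df.groupby('name')['fee'].transform('std')
masked = df[group_std > 3]

print(masked)
```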


*Cell 4 – Transformation*

In [6]:
```python
# Transform: subtract each group's mean from its values;
# the result keeps the original row index of df
df.groupby('name').transform(lambda x: x - x.mean())
```


|   | age | fee |
|---|-----|-----|
| 0 | 0.0 | 0.0 |
| 1 | 0.5 | 5732.0 |
| 2 | 0.5 | 310.0 |
| 3 | -1.0 | 12567.0 |
| 4 | 2.5 | -12820.5 |
| 5 | 0.0 | 0.0 |
| 6 | -0.5 | -5732.0 |
| 7 | -0.5 | -310.0 |
| 8 | 1.0 | -12567.0 |
| 9 | -2.5 | 12820.5 |
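Since `transform()` returns a result indexed like the original rows, it can be assigned straight back to the DataFrame as a new column. The sketch below passes the string `'mean'` instead of a lambda and stores the deviation in a hypothetical `fee_dev` column; within each group the deviations sum to zero.

```python
import pandas as pd

# Same data as the original DataFrame; fee values copied from the printed output
df = pd.DataFrame({
    'name': ['chanti', 'nagendra', 'shannu', 'sundhar', 'madhu',
             'kitta', 'nagendra', 'shannu', 'sundhar', 'madhu'],
    'age': [20, 21, 22, 21, 23, 19, 20, 21, 23, 18],
    'fee': [102732, 143567, 142613, 145891, 121243,
            130403, 132103, 141993, 120757, 146884],
})

# Group mean broadcast to every row of that group, then subtracted row by row
df['fee_dev'] = df['fee'] - df.groupby('name')['fee'].transform('mean')
print(df)

# Deviations from a group's own mean cancel out within the group
dev_sums = df.groupby('name')['fee_dev'].sum()
print(dev_sums)
```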


## Conclusion
In this experiment, Pandas `groupby()` was used to perform:
- Aggregation
- Filtering
- Transformation

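Aggregation reduces each group to summary values, filtering keeps or drops entire groups based on a group-level condition, and transformation returns a result aligned row for row with the original DataFrame.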
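The three operations can also be told apart by the shape of what they return. A minimal sketch on the same data (fee values copied from the tables above; the `len(g) > 1` filter condition is an illustrative choice, not the one used in the experiment):

```python
import pandas as pd

# Same data as the original DataFrame; fee values copied from the printed output
df = pd.DataFrame({
    'name': ['chanti', 'nagendra', 'shannu', 'sundhar', 'madhu',
             'kitta', 'nagendra', 'shannu', 'sundhar', 'madhu'],
    'age': [20, 21, 22, 21, 23, 19, 20, 21, 23, 18],
    'fee': [102732, 143567, 142613, 145891, 121243,
            130403, 132103, 141993, 120757, 146884],
})

grouped = df.groupby('name')

aggregated = grouped.agg('mean')                          # one row per group
filtered = grouped.filter(lambda g: len(g) > 1)           # original rows of kept groups
transformed = grouped.transform(lambda x: x - x.mean())   # one row per original row

print(aggregated.shape, filtered.shape, transformed.shape)
```

Aggregation collapses the 10 rows into 6 groups, filtering keeps the 8 rows belonging to multi-row groups with all original columns, and transformation returns all 10 rows with the original index.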
