# **GroupBy and Pivot Tables**

_Use `groupby()` to split-apply-combine data, and `pivot_table()` to reshape and summarize_

## What You'll Learn
- How to use `groupby()` to compute grouped statistics
- How to create pivot tables for aggregated summaries
- How `groupby()` and `pivot_table()` differ, and when to use each



## Import Libraries

```python
import pandas as pd
import numpy as np
```

## Sample Dataset

```python
data = {
    'Department': ['Sales', 'Sales', 'HR', 'HR', 'IT', 'IT', 'Sales'],
    'Employee': ['Alice', 'Bob', 'Carol', 'David', 'Eve', 'Frank', 'Grace'],
    'Salary': [50000, 55000, 48000, 52000, 60000, 61000, 53000],
    'Years_Experience': [2, 3, 4, 2, 5, 6, 3]
}

df = pd.DataFrame(data)
df
```

|   | Department | Employee | Salary | Years_Experience |
|:- |:---------- |:-------- |:------ |:---------------- |
| 0 | Sales      | Alice    | 50000  | 2                |
| 1 | Sales      | Bob      | 55000  | 3                |
| 2 | HR         | Carol    | 48000  | 4                |
| 3 | HR         | David    | 52000  | 2                |
| 4 | IT         | Eve      | 60000  | 5                |
| 5 | IT         | Frank    | 61000  | 6                |
| 6 | Sales      | Grace    | 53000  | 3                |


## 1. groupby() - Split, Apply, Combine

**Group by a single column:**

```python
df.groupby('Department')['Salary'].mean()
```

| Department | Salary       |
|:---------- |:------------ |
| HR         | 50000.0      |
| IT         | 60500.0      |
| Sales      | 52666.666667 |
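To make the three steps concrete, here is a minimal sketch that performs the split, apply, and combine stages by hand with a plain loop and then checks the result against the one-line `groupby()` call. The names `results` and `manual` are illustrative only; the loop is for explanation and is slower than the built-in.

```python
import pandas as pd

# Same sample data as above (only the columns needed here)
df = pd.DataFrame({
    'Department': ['Sales', 'Sales', 'HR', 'HR', 'IT', 'IT', 'Sales'],
    'Salary': [50000, 55000, 48000, 52000, 60000, 61000, 53000],
})

results = {}
# Split: iterating a GroupBy yields (group key, sub-DataFrame) pairs, keys sorted
for dept, group in df.groupby('Department'):
    # Apply: compute the statistic on each group independently
    results[dept] = group['Salary'].mean()

# Combine: collect the per-group results into one Series indexed by the key
manual = pd.Series(results, name='Salary')
manual.index.name = 'Department'

builtin = df.groupby('Department')['Salary'].mean()
print(manual)

# Raises an AssertionError if the hand-rolled version differs from groupby()
pd.testing.assert_series_equal(manual, builtin)
```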


**Group by multiple columns:**

```python
df.groupby(['Department', 'Years_Experience'])['Salary'].mean()
```

| Department | Years_Experience | Salary  |
|:---------- |:---------------- |:------- |
| HR         | 2                | 52000.0 |
| HR         | 4                | 48000.0 |
| IT         | 5                | 60000.0 |
| IT         | 6                | 61000.0 |
| Sales      | 2                | 50000.0 |
| Sales      | 3                | 54000.0 |
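Grouping by two columns gives a result with a two-level (Multi)Index. A short sketch of how to read values out of it and how to turn the index levels back into ordinary columns; the variable names `by_dept_exp` and `flat` are illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    'Department': ['Sales', 'Sales', 'HR', 'HR', 'IT', 'IT', 'Sales'],
    'Salary': [50000, 55000, 48000, 52000, 60000, 61000, 53000],
    'Years_Experience': [2, 3, 4, 2, 5, 6, 3],
})

by_dept_exp = df.groupby(['Department', 'Years_Experience'])['Salary'].mean()

# Each row is addressed by a (Department, Years_Experience) tuple
print(by_dept_exp.loc[('Sales', 3)])  # 54000.0, the mean of Bob and Grace

# Selecting on the outer level returns all rows for one department
print(by_dept_exp.loc['HR'])

# reset_index() moves the index levels back into regular columns
flat = by_dept_exp.reset_index()
print(flat.columns.tolist())  # ['Department', 'Years_Experience', 'Salary']
```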


**Apply multiple aggregation functions:**

```python
df.groupby('Department').agg({
    'Salary': ['mean', 'max', 'min'],
    'Years_Experience': 'median'
})
```

| Department | Salary (mean) | Salary (max) | Salary (min) | Years_Experience (median) |
|:---------- |:------------- |:------------ |:------------ |:------------------------- |
| HR         | 50000.0       | 52000        | 48000        | 3.0                       |
| IT         | 60500.0       | 61000        | 60000        | 5.5                       |
| Sales      | 52666.666667  | 55000        | 50000        | 3.0                       |
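The dictionary form above produces two-level column headers. pandas also supports "named aggregation", where each keyword argument is an `(input column, function)` pair and the keyword becomes a flat output column name. A sketch computing the same statistics; the output names such as `mean_salary` are illustrative choices, not required names.

```python
import pandas as pd

df = pd.DataFrame({
    'Department': ['Sales', 'Sales', 'HR', 'HR', 'IT', 'IT', 'Sales'],
    'Salary': [50000, 55000, 48000, 52000, 60000, 61000, 53000],
    'Years_Experience': [2, 3, 4, 2, 5, 6, 3],
})

# keyword = (column to aggregate, aggregation function)
summary = df.groupby('Department').agg(
    mean_salary=('Salary', 'mean'),
    max_salary=('Salary', 'max'),
    min_salary=('Salary', 'min'),
    median_experience=('Years_Experience', 'median'),
)

# Single-level column names, easier to reference than ('Salary', 'mean')
print(summary)
print(summary.columns.tolist())
```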


## 2. pivot_table() - Flexible Table Generator

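**Mean salary by Department (rows) and Years_Experience (columns):** `index` sets the row labels, `columns` the column labels, `values` the column being summarized, `aggfunc` the aggregation, and `fill_value=0` puts 0 in cells that have no matching rows.

```python
pd.pivot_table(df,
               values='Salary',
               index='Department',
               columns='Years_Experience',
               aggfunc='mean',
               fill_value=0)
```

| Department / Years_Experience | 2       | 3       | 4       | 5       | 6       |
|:----------------------------- |:------- |:------- |:------- |:------- |:------- |
| HR                            | 52000.0 | 0.0     | 48000.0 | 0.0     | 0.0     |
| IT                            | 0.0     | 0.0     | 0.0     | 60000.0 | 61000.0 |
| Sales                         | 50000.0 | 54000.0 | 0.0     | 0.0     | 0.0     |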
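This pivot table can be thought of as the multi-column `groupby()` from section 1 with its inner index level moved into the columns. The sketch below builds the same table with `groupby()` plus `unstack()` and compares labels and values; it compares values rather than dtypes because the integer/float dtype of the pivot result may vary between pandas versions.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'Department': ['Sales', 'Sales', 'HR', 'HR', 'IT', 'IT', 'Sales'],
    'Salary': [50000, 55000, 48000, 52000, 60000, 61000, 53000],
    'Years_Experience': [2, 3, 4, 2, 5, 6, 3],
})

pivot = pd.pivot_table(df, values='Salary', index='Department',
                       columns='Years_Experience', aggfunc='mean', fill_value=0)

# Group on both keys, then unstack the inner level (Years_Experience)
# into columns, filling missing combinations with 0
via_groupby = (df.groupby(['Department', 'Years_Experience'])['Salary']
                 .mean()
                 .unstack(fill_value=0))

same_labels = (pivot.index.equals(via_groupby.index)
               and pivot.columns.equals(via_groupby.columns))
same_values = np.array_equal(pivot.to_numpy(), via_groupby.to_numpy())
print(same_labels and same_values)  # True
```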


**Pivot table with multiple values:**

```python
pd.pivot_table(df,
               values=['Salary', 'Years_Experience'],
               index='Department',
               aggfunc={'Salary': 'mean', 'Years_Experience': 'max'})
```

| Department | Salary       | Years_Experience |
|:---------- |:------------ |:---------------- |
| HR         | 50000.0      | 4                |
| IT         | 60500.0      | 6                |
| Sales      | 52666.666667 | 3                |
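Without a `columns` argument, a pivot table has nothing to spread across the columns, so it reduces to a plain grouped aggregation. A sketch checking that `groupby().agg()` with the same per-column dictionary yields the same numbers; the names `pivot` and `grouped` are illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'Department': ['Sales', 'Sales', 'HR', 'HR', 'IT', 'IT', 'Sales'],
    'Salary': [50000, 55000, 48000, 52000, 60000, 61000, 53000],
    'Years_Experience': [2, 3, 4, 2, 5, 6, 3],
})

funcs = {'Salary': 'mean', 'Years_Experience': 'max'}

pivot = pd.pivot_table(df, values=['Salary', 'Years_Experience'],
                       index='Department', aggfunc=funcs)
grouped = df.groupby('Department').agg(funcs)

# Put the columns in the same order, then compare the numbers as floats
grouped = grouped[pivot.columns]
print(np.allclose(pivot.to_numpy(dtype=float), grouped.to_numpy(dtype=float)))  # True
```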


## Key Differences

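| Feature        | `groupby()`                                                              | `pivot_table()`                                                                 |
|:-------------- |:------------------------------------------------------------------------ |:------------------------------------------------------------------------------- |
| Use Case       | General grouping: aggregate, transform, or filter each group             | Tabular summaries of one or more value columns                                  |
| Output Format  | Series or DataFrame indexed by the group keys (MultiIndex for several keys) | DataFrame with one set of keys on the rows and another across the columns     |
| Flexibility    | Any aggregation, plus `transform()`, `filter()`, and `apply()`           | Aggregation only, but easy to lay out via `index`/`columns`, with optional `margins` totals |
| Missing Values | NaN group keys are dropped by default (`dropna=True`)                    | Key combinations with no rows become NaN unless `fill_value` is set             |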
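To see how each tool treats missing data, the sketch below adds one hypothetical extra row (Salary 58000, Years_Experience 1) whose Department is missing, then compares the default `groupby()`, `groupby(dropna=False)`, and `pivot_table()` with and without `fill_value`. The extra row is invented for illustration and is not part of the sample dataset.

```python
import numpy as np
import pandas as pd

# Sample data plus one HYPOTHETICAL row with a missing Department
df = pd.DataFrame({
    'Department': ['Sales', 'Sales', 'HR', 'HR', 'IT', 'IT', 'Sales', np.nan],
    'Salary': [50000, 55000, 48000, 52000, 60000, 61000, 53000, 58000],
    'Years_Experience': [2, 3, 4, 2, 5, 6, 3, 1],
})

# groupby() leaves out rows whose key is NaN by default (dropna=True)
default = df.groupby('Department')['Salary'].mean()
print(default.index.tolist())  # ['HR', 'IT', 'Sales']

# dropna=False keeps the NaN key as its own group
kept = df.groupby('Department', dropna=False)['Salary'].mean()
print(len(kept))  # 4

# pivot_table(): Department/Years_Experience pairs with no rows become NaN...
sparse = pd.pivot_table(df, values='Salary', index='Department',
                        columns='Years_Experience', aggfunc='mean')
print(int(sparse.isna().sum().sum()))  # 9

# ...unless fill_value supplies a replacement for those empty cells
filled = pd.pivot_table(df, values='Salary', index='Department',
                        columns='Years_Experience', aggfunc='mean', fill_value=0)
print(int(filled.isna().sum().sum()))  # 0
```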


## Practice Exercise
Try this on your own:

```
# Sample challenge:
# Create a pivot table showing mean salary by Department and Years_Experience
```

## Summary
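- `groupby()` is the general-purpose tool: split rows into groups, then aggregate, transform, or filter each group as one step of a larger analysis
- `pivot_table()` is ideal for presentation-ready summaries that lay one grouping out along the rows and another across the columns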
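As one illustration of using `groupby()` beyond summary tables, `transform()` returns a result aligned with the original rows, so group statistics can feed directly into row-wise calculations. A sketch computing each employee's salary relative to their department mean; the new column names `Dept_Mean_Salary` and `Diff_From_Dept_Mean` are hypothetical.

```python
import pandas as pd

df = pd.DataFrame({
    'Department': ['Sales', 'Sales', 'HR', 'HR', 'IT', 'IT', 'Sales'],
    'Employee': ['Alice', 'Bob', 'Carol', 'David', 'Eve', 'Frank', 'Grace'],
    'Salary': [50000, 55000, 48000, 52000, 60000, 61000, 53000],
})

# transform('mean') gives every row its own group's mean, so the result
# has the same length and index as df (unlike .mean(), which has one row per group)
df['Dept_Mean_Salary'] = df.groupby('Department')['Salary'].transform('mean')
df['Diff_From_Dept_Mean'] = df['Salary'] - df['Dept_Mean_Salary']

print(df[['Department', 'Employee', 'Salary', 'Diff_From_Dept_Mean']])
```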

