# **Pivot Table**

* Reshapes data from **long format** to **wide format**.
* Aggregates values (sum, mean, etc.) across multiple dimensions.
* Similar to Excel Pivot Table.

In [31]:
import pandas as pd
import numpy as np

In [2]:
df = pd.DataFrame({
    "Days": [1, 2, 3, 4, 5, 6],
    "Name": ['Haroon', 'Abbas', 'Ali', 'Sara', 'Junaid', 'Hurain'],
    "Eng": [12, 34, 23, 53, 64, 63],
    "Maths": [33, 52, 51, 20, 83, 24],
    "Science": [53, 45, 24, 64, 22, 74]
})
df

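|   | Days | Name | Eng | Maths | Science |
| --- | --- | --- | --- | --- | --- |
| 0 | 1 | Haroon | 12 | 33 | 53 |
| 1 | 2 | Abbas | 34 | 52 | 45 |
| 2 | 3 | Ali | 23 | 51 | 24 |
| 3 | 4 | Sara | 53 | 20 | 64 |
| 4 | 5 | Junaid | 64 | 83 | 22 |
| 5 | 6 | Hurain | 63 | 24 | 74 |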


## `pivot()`: **Simple reshaping**

**Basic Syntax:**
```python
df.pivot(index='row', columns='column', values='value')
```
**What it does:**
* Converts long → wide
* No aggregation
* No duplicates allowed
* Requires unique index/column pairs

#### **Pivot Function Parameters**

| Parameter    | Use                                    |
| ------------ | -------------------------------------- |
| `index`      | Row grouping                           |
| `columns`    | Column grouping                        |
| `values`     | Column(s) whose values fill the cells  |

In [3]:
df.pivot(index='Days',columns='Name',values=['Maths'])

| Days | (Maths, Abbas) | (Maths, Ali) | (Maths, Haroon) | (Maths, Hurain) | (Maths, Junaid) | (Maths, Sara) |
| --- | --- | --- | --- | --- | --- | --- |
| 1 | NaN | NaN | 33.0 | NaN | NaN | NaN |
| 2 | 52.0 | NaN | NaN | NaN | NaN | NaN |
| 3 | NaN | 51.0 | NaN | NaN | NaN | NaN |
| 4 | NaN | NaN | NaN | NaN | NaN | 20.0 |
| 5 | NaN | NaN | NaN | NaN | 83.0 | NaN |
| 6 | NaN | NaN | NaN | 24.0 | NaN | NaN |


### ❌ When `pivot()` FAILS

In [4]:
# df = pd.DataFrame({
#     'Name': ['Ali','Ali'],
#     'Subject': ['Math','Math'],
#     'Marks': [80,85]
# })

# df.pivot(index='Name', columns='Subject', values='Marks')

# ❌ ValueError: Index contains duplicate entries, cannot reshape
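
The commented-out cell above can be run safely by catching the exception. This is a minimal sketch, assuming a pandas version in which the duplicate check raises `ValueError`; the names `dup_df` and `error_message` are illustrative and not part of the notebook.

```python
import pandas as pd

# Two rows share the same (Name, Subject) pair, so pivot() cannot
# decide which mark belongs in the single Ali/Math cell.
dup_df = pd.DataFrame({
    'Name': ['Ali', 'Ali'],
    'Subject': ['Math', 'Math'],
    'Marks': [80, 85]
})

try:
    dup_df.pivot(index='Name', columns='Subject', values='Marks')
    error_message = None
except ValueError as exc:
    # Store the message instead of letting the notebook cell crash
    error_message = str(exc)

print(error_message)
```

Catching the error this way keeps the rest of the notebook running while still showing why the reshape was rejected.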

## `pivot_table`: **Smart & powerful**

**Basic Syntax:**
```python
pd.pivot_table(
    df,
    index='row',
    columns='column',
    values='value',
    aggfunc='mean'
)
```
**What it does:**
* Converts long → wide
* Handles duplicates
* Aggregates values (mean, sum, etc.)
* Excel-like pivot table

#### **Pivot Table Function Features**
| Parameter      | Type                   | Default  | What it Does                  | When You Use It         |
| -------------- | ---------------------- | -------- | ----------------------------- | ----------------------- |
| `data`         | DataFrame              | required | Source DataFrame              | Always                  |
| `values`       | str / list             | None     | Column(s) to aggregate        | Numeric columns         |
| `index`        | str / list             | None     | Rows grouping                 | Categories (rows)       |
| `columns`      | str / list             | None     | Columns grouping              | Categories (columns)    |
| `aggfunc`      | function / list / dict | `'mean'` | Aggregation function          | Sum, count, max, etc    |
| `fill_value`   | scalar                 | None     | Replace missing values        | Clean NaNs              |
| `margins`      | bool                   | False    | Adds totals (row & column)    | Grand totals            |
| `margins_name` | str                    | `'All'`  | Name of total row/column      | Rename totals           |
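| `dropna`       | bool                   | True     | Drop columns with all NaN     | Set `False` to keep empty groups |
| `observed`     | bool                   | False    | Show only observed categories | Categorical data        |
| `sort`         | bool                   | True     | Sort result                   | Disable for performance |

Note: the `observed` default of `False` applies to older pandas releases; pandas 2.2 deprecated it in favor of `True`.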
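
This notebook calls the function both as `pd.pivot_table(df, ...)` and as `df.pivot_table(...)`. The sketch below suggests the two spellings are interchangeable and also shows `fill_value` from the table above. The `sales` frame is made up for illustration.

```python
import pandas as pd

# Hypothetical data: HR has no February row
sales = pd.DataFrame({
    'Dept': ['IT', 'IT', 'HR'],
    'Month': ['Jan', 'Feb', 'Jan'],
    'Sales': [100, 200, 300]
})

# Function style: the DataFrame is the first argument
func_style = pd.pivot_table(sales, index='Dept', columns='Month',
                            values='Sales', aggfunc='sum', fill_value=0)

# Method style: called on the DataFrame itself
method_style = sales.pivot_table(index='Dept', columns='Month',
                                 values='Sales', aggfunc='sum', fill_value=0)

print(func_style)
print(func_style.equals(method_style))
```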

In [5]:
df = pd.DataFrame({
    'Name': ['Ali','Ali'],
    'Subject': ['Math','Math'],
    'Marks': [80,85]
})

pd.pivot_table(df, index='Name', columns='Subject', values='Marks')

# ✔ Output: average marks = 82.5


| Name \ Subject | Math |
| --- | --- |
| Ali | 82.5 |


In [6]:
import pandas as pd

df = pd.DataFrame({
    'Dept': ['IT','IT','HR','HR','Finance','Finance'],
    'Name': ['Ali','Ahmed','Sara','Usman','Zara','Nida'],
    'Salary': [50000,55000,60000,58000,70000,68000],
    'Experience': [2,3,5,4,7,8]
})

# Pivot table: average salary per Dept
pd.pivot_table(df, index='Dept', values='Salary', aggfunc='mean')


| Dept | Salary |
| --- | --- |
| Finance | 69000.0 |
| HR | 59000.0 |
| IT | 52500.0 |


---
# **Practice Problems**


## 🔹 PIVOT PROBLEMS

---

### 🧠 Problem 1

Using the melted DataFrame, pivot it back to:

> Product as rows
> Month as columns
> `Sales` as values


In [7]:
df = pd.DataFrame({
    'Product': ['Pen', 'Book'],
    'Sales_2023_Jan': [100, 200],
    'Sales_2023_Feb': [150, 250]
})
melted=pd.melt(df,id_vars=['Product'],var_name='Month',value_name='Sales')
melted['Month']=melted['Month'].str.split('_').str[-1]
melted

|   | Product | Month | Sales |
| --- | --- | --- | --- |
| 0 | Pen | Jan | 100 |
| 1 | Book | Jan | 200 |
| 2 | Pen | Feb | 150 |
| 3 | Book | Feb | 250 |


In [8]:
melted.pivot(index='Product', columns='Month', values='Sales')

| Product \ Month | Feb | Jan |
| --- | --- | --- |
| Book | 250 | 200 |
| Pen | 150 | 100 |


### 🧠 Problem 2

What error do you get if duplicate values exist when using `pivot()`?
Explain **why**.


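##### ❌ `ValueError: Index contains duplicate entries, cannot reshape`. `pivot()` places exactly one value in each cell and performs **NO** aggregation, so it cannot handle an index/column pair that appears more than once.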
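
Before calling `pivot()`, it can help to look for the rows that would trigger this error. A small sketch using `DataFrame.duplicated`; the `marks` frame and the variable names are hypothetical.

```python
import pandas as pd

marks = pd.DataFrame({
    'Name': ['Ali', 'Ali', 'Sara'],
    'Subject': ['Math', 'Math', 'Math'],
    'Marks': [80, 85, 90]
})

# keep=False flags every row that belongs to a repeated (Name, Subject) pair
dup_mask = marks.duplicated(subset=['Name', 'Subject'], keep=False)
print(marks[dup_mask])

# pivot() is only safe when no pair is repeated
can_pivot = not dup_mask.any()
print(can_pivot)
```

If `can_pivot` is `False`, either clean the duplicated rows or switch to `pivot_table` with an explicit `aggfunc`.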

---
## 🔹 PIVOT_TABLE PROBLEMS

---

In [9]:
transactions = pd.DataFrame({
    'Dept': ['IT', 'IT', 'HR', 'HR', 'HR'],
    'Employee': ['Ali', 'Ahmed', 'Sara', 'Usman', 'Zara'],
    'Month': ['Jan', 'Jan', 'Jan', 'Feb', 'Feb'],
    'Sales': [1000, 1500, 800, 1200, 900]
})

### 🧠 Problem 3

Create a pivot table that shows:

> Total Sales per Dept per Month


In [10]:
pd.pivot_table(transactions,index='Dept',columns='Month',values='Sales',aggfunc='sum')


| Dept \ Month | Feb | Jan |
| --- | --- | --- |
| HR | 2100.0 | 800.0 |
| IT | NaN | 2500.0 |


### 🧠 Problem 4

Modify the pivot table to show **average sales** instead of sum.


In [11]:
pd.pivot_table(transactions,index='Dept',columns='Month',values='Sales',aggfunc='mean')

| Dept \ Month | Feb | Jan |
| --- | --- | --- |
| HR | 1050.0 | 800.0 |
| IT | NaN | 1250.0 |


### 🧠 Problem 5

Add **Grand Total rows and columns**.


In [12]:
pivot=transactions.pivot_table(index='Dept',columns='Month',values='Sales',aggfunc='sum',margins=True,margins_name='Grand Total')
pivot

| Dept \ Month | Feb | Jan | Grand Total |
| --- | --- | --- | --- |
| HR | 2100.0 | 800.0 | 2900 |
| IT | NaN | 2500.0 | 2500 |
| Grand Total | 2100.0 | 3300.0 | 5400 |


### 🧠 Problem 6

Handle missing combinations using a fill value of `0`.


In [13]:
transactions.pivot_table(index='Dept',columns='Month',values='Sales',aggfunc='sum',margins=True,margins_name='Grand Total',fill_value=0)

| Dept \ Month | Feb | Jan | Grand Total |
| --- | --- | --- | --- |
| HR | 2100 | 800 | 2900 |
| IT | 0 | 2500 | 2500 |
| Grand Total | 2100 | 3300 | 5400 |


### 🧠 Problem 7 (Concept Check)

Explain **in your own words**:

> When should you use `melt` instead of `pivot_table`?
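
One way to frame the answer: `melt` goes from wide to long (columns become rows), while `pivot_table` goes from long to wide and aggregates on the way. The round trip below is a sketch on a hypothetical `wide` frame modeled on Problem 1.

```python
import pandas as pd

wide = pd.DataFrame({
    'Product': ['Pen', 'Book'],
    'Jan': [100, 200],
    'Feb': [150, 250]
})

# wide -> long: one row per (Product, Month) pair
long = wide.melt(id_vars='Product', var_name='Month', value_name='Sales')

# long -> wide again; every pair is unique here, so 'sum' changes nothing
back = long.pivot_table(index='Product', columns='Month',
                        values='Sales', aggfunc='sum')

print(long)
print(back)
```

So `melt` is the natural choice when the data arrives with one column per month (or per subject) and you need a tidy long table for grouping or plotting.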


---
# **HARD PRACTICE PROBLEMS**

## 🔹 Problem 1: Duplicate Handling (Silent Aggregation)

```python
df = pd.DataFrame({
    'Region': ['North', 'North', 'South', 'South'],
    'Product': ['Pen', 'Pen', 'Pen', 'Book'],
    'Sales': [100, 150, 200, 300]
})
```

👉 Create a pivot showing **total sales** per `Region × Product`.

⚠️ Why tricky: duplicates exist.


In [14]:
df = pd.DataFrame({
    'Region': ['North', 'North', 'South', 'South'],
    'Product': ['Pen', 'Pen', 'Pen', 'Book'],
    'Sales': [100, 150, 200, 300]
})
pd.pivot_table(df,index='Region',columns='Product',values='Sales',aggfunc='sum')

| Region \ Product | Book | Pen |
| --- | --- | --- |
| North | NaN | 250.0 |
| South | 300.0 | 200.0 |


## 🔹 Problem 2: Multiple Aggregations

```python
df = pd.DataFrame({
    'Dept': ['IT', 'IT', 'HR', 'HR'],
    'Employee': ['A', 'B', 'C', 'D'],
    'Salary': [50000, 60000, 55000, 58000]
})
```
👉 Pivot showing **mean and max salary** per department.

In [None]:
df = pd.DataFrame({
    'Dept': ['IT', 'IT', 'HR', 'HR'],
    'Employee': ['A', 'B', 'C', 'D'],
    'Salary': [50000, 60000, 55000, 58000]
})
pd.pivot_table(df,index='Dept',values='Salary',aggfunc=['mean','max'])

| Dept | (mean, Salary) | (max, Salary) |
| --- | --- | --- |
| HR | 56500.0 | 58000 |
| IT | 55000.0 | 60000 |
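
Passing a list to `aggfunc` produces two-level column labels such as `('mean', 'Salary')`. If single-level names are easier to work with, one option is to join the levels. A sketch under that assumption; `table` and `flat` are illustrative names.

```python
import pandas as pd

df = pd.DataFrame({
    'Dept': ['IT', 'IT', 'HR', 'HR'],
    'Employee': ['A', 'B', 'C', 'D'],
    'Salary': [50000, 60000, 55000, 58000]
})
table = pd.pivot_table(df, index='Dept', values='Salary', aggfunc=['mean', 'max'])

# Each column label is a tuple such as ('mean', 'Salary'); join its parts
flat = table.copy()
flat.columns = ['_'.join(col) for col in table.columns]

print(flat)
```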



## 🔹 Problem 3: Missing Category Combinations

```python
df = pd.DataFrame({
    'City': ['Karachi', 'Karachi', 'Lahore'],
    'Year': [2023, 2024, 2023],
    'Sales': [200, 250, 300]
})
```

👉 Pivot with **City as rows**, **Year as columns**,
**include missing combinations filled with 0**.


In [16]:
df = pd.DataFrame({
    'City': ['Karachi', 'Karachi', 'Lahore'],
    'Year': [2023, 2024, 2023],
    'Sales': [200, 250, 300]
})
pd.pivot_table(df,index='City',columns='Year',values='Sales',fill_value=0)

| City \ Year | 2023 | 2024 |
| --- | --- | --- |
| Karachi | 200.0 | 250.0 |
| Lahore | 300.0 | 0.0 |



## 🔹 Problem 4: Margins + Rename Totals

```python
df = pd.DataFrame({
    'Category': ['A', 'A', 'B', 'B'],
    'Value': [10, 20, 30, 40]
})
```

👉 Pivot with **category-wise sum** and **grand total named "Total"**.


In [17]:
df = pd.DataFrame({
    'Category': ['A', 'A', 'B', 'B'],
    'Value': [10, 20, 30, 40]
})
df.pivot_table(index='Category',values='Value',aggfunc='sum',margins=True,margins_name='Total')

| Category | Value |
| --- | --- |
| A | 30 |
| B | 70 |
| Total | 100 |


## 🔹 Problem 5: Multiple `values`

```python
df = pd.DataFrame({
    'Region': ['North', 'South'],
    'Sales': [200, 300],
    'Profit': [50, 80]
})
```

👉 Pivot showing **both Sales and Profit totals** by region.


In [26]:
df = pd.DataFrame({
    'Region': ['North', 'South'],
    'Sales': [200, 300],
    'Profit': [50, 80]
})
pd.pivot_table(df,index='Region',values=['Sales','Profit'],aggfunc='sum')

| Region | Profit | Sales |
| --- | --- | --- |
| North | 50 | 200 |
| South | 80 | 300 |


## 🔹 Problem 6: Count vs Size Trap

```python
df = pd.DataFrame({
    'Dept': ['IT', 'IT', 'HR'],
    'Bonus': [1000, None, 2000]
})
```

👉 Create a pivot showing:

* total rows per dept
* non-null bonuses per dept

⚠️ count ≠ size


In [36]:
df = pd.DataFrame({
    'Dept': ['IT', 'IT', 'HR'],
    'Bonus': [1000, None, 2000]
})
pd.pivot_table(df,index='Dept',values='Bonus',aggfunc={'Bonus':'count','Dept':'size'})

# count() counts only non-null values; size() counts every row, including rows with NaN. Neither one removes duplicates.

| Dept | Bonus | Dept |
| --- | --- | --- |
| HR | 1 | 1 |
| IT | 1 | 2 |
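
The same comparison can be cross-checked with a plain `groupby`, which makes the difference between the two counts explicit. This is an alternative sketch, not the notebook's solution; the column names `rows` and `non_null_bonus` are made up.

```python
import pandas as pd

df = pd.DataFrame({
    'Dept': ['IT', 'IT', 'HR'],
    'Bonus': [1000, None, 2000]
})

bonus_by_dept = df.groupby('Dept')['Bonus']

summary = pd.DataFrame({
    'rows': bonus_by_dept.size(),             # every row, NaN included
    'non_null_bonus': bonus_by_dept.count()   # NaN excluded
})

print(summary)
```

IT has two rows but only one recorded bonus, so `size` and `count` disagree for that department.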



## 🔹 Problem 7: Categorical + observed Parameter

```python
df = pd.DataFrame({
    'City': pd.Categorical(['Karachi', 'Lahore']),
    'Product': pd.Categorical(['Pen', 'Book']),
    'Sales': [100, 200]
})
```

👉 Pivot showing **only observed combinations**.


In [28]:
df = pd.DataFrame({
    'City': pd.Categorical(['Karachi', 'Lahore']),
    'Product': pd.Categorical(['Pen', 'Book']),
    'Sales': [100, 200]
})
df.pivot_table(index='City',columns='Product',values='Sales',aggfunc='sum',observed=True)

| City \ Product | Book | Pen |
| --- | --- | --- |
| Karachi | NaN | 100.0 |
| Lahore | 200.0 | NaN |
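
In the data above every category actually occurs, so `observed` has little to show. The parameter matters when a categorical column declares a category that never appears in the rows. A hedged sketch, with `'Islamabad'` added as a hypothetical unused category:

```python
import pandas as pd

df = pd.DataFrame({
    # 'Islamabad' is a declared category that never appears in the data
    'City': pd.Categorical(['Karachi', 'Lahore'],
                           categories=['Islamabad', 'Karachi', 'Lahore']),
    'Product': ['Pen', 'Book'],
    'Sales': [100, 200]
})

# observed=True keeps only the categories that occur in the rows
observed_only = df.pivot_table(index='City', columns='Product', values='Sales',
                               aggfunc='sum', observed=True)

print(observed_only)
```

With `observed=True` the unused category is left out. How `observed=False` displays it depends on the aggregation and on `dropna`, so it is worth checking on your own pandas version.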


## 🔹 Problem 8: Sorting Disabled (Performance Awareness)

```python
df = pd.DataFrame({
    'Team': ['A', 'B', 'A', 'B'],
    'Score': [80, 70, 90, 60]
})
```

👉 Pivot with **mean score per team**,
disable sorting.


In [21]:
df = pd.DataFrame({
    'Team': ['A', 'B', 'A', 'B'],
    'Score': [80, 70, 90, 60]
})
pd.pivot_table(df,index='Team',values='Score',aggfunc='mean',sort=False)

| Team | Score |
| --- | --- |
| A | 85.0 |
| B | 65.0 |
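
Because team `A` already comes first in the data above, `sort=False` produces the same row order as the default. The effect becomes visible when the first row belongs to `B`. A sketch with reordered data, assuming pandas 1.3 or later (where the `sort` parameter is available):

```python
import pandas as pd

# Same scores as above, but team B appears first
df = pd.DataFrame({
    'Team': ['B', 'A', 'B', 'A'],
    'Score': [70, 80, 60, 90]
})

sorted_table = pd.pivot_table(df, index='Team', values='Score', aggfunc='mean')
unsorted_table = pd.pivot_table(df, index='Team', values='Score',
                                aggfunc='mean', sort=False)

print(list(sorted_table.index))     # alphabetical order
print(list(unsorted_table.index))   # order of first appearance
```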


## 🔹 Problem 9: Nested Grouping (MultiIndex Result)

```python
df = pd.DataFrame({
    'Year': [2023, 2023, 2024],
    'Quarter': ['Q1', 'Q2', 'Q1'],
    'Revenue': [1000, 1500, 2000]
})
```

👉 Pivot with:

* rows → Year
* columns → Quarter
* values → sum of Revenue


In [37]:
df = pd.DataFrame({
    'Year': [2023, 2023, 2024],
    'Quarter': ['Q1', 'Q2', 'Q1'],
    'Revenue': [1000, 1500, 2000]
})
df.pivot_table(index='Year',columns='Quarter',values='Revenue',aggfunc='sum')

| Year \ Quarter | Q1 | Q2 |
| --- | --- | --- |
| 2023 | 1000.0 | 1500.0 |
| 2024 | 2000.0 | NaN |
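
The result above has a single-level row index. To get the nested (MultiIndex) rows the problem title refers to, one option is to pass both keys to `index`. A sketch of that variant; the name `nested` is illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    'Year': [2023, 2023, 2024],
    'Quarter': ['Q1', 'Q2', 'Q1'],
    'Revenue': [1000, 1500, 2000]
})

# Two keys in `index` give one row per (Year, Quarter) pair
nested = df.pivot_table(index=['Year', 'Quarter'], values='Revenue', aggfunc='sum')

print(nested)
# A MultiIndex row is selected with a tuple
print(nested.loc[(2023, 'Q2'), 'Revenue'])
```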


## 🔹 Problem 10: Advanced: Dictionary `aggfunc`

```python
df = pd.DataFrame({
    'Dept': ['IT', 'IT', 'HR'],
    'Salary': [50000, 60000, 55000],
    'Bonus': [5000, 6000, 4000]
})
```

👉 Pivot per department with:

* Salary → mean
* Bonus → sum

⚠️ Uses dict aggfunc.



In [40]:
df = pd.DataFrame({
    'Dept': ['IT', 'IT', 'HR'],
    'Salary': [50000, 60000, 55000],
    'Bonus': [5000, 6000, 4000]
})
pd.pivot_table(df,index='Dept',values=['Salary','Bonus'],aggfunc={'Salary':'mean','Bonus':'sum'})

| Dept | Bonus | Salary |
| --- | --- | --- |
| HR | 4000 | 55000.0 |
| IT | 11000 | 55000.0 |
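
For comparison, the same per-column aggregation can be written with `groupby` and named aggregation, which also lets you choose the output column names. This is an alternative sketch, not a replacement for the dict `aggfunc` above; `avg_salary` and `total_bonus` are made-up names.

```python
import pandas as pd

df = pd.DataFrame({
    'Dept': ['IT', 'IT', 'HR'],
    'Salary': [50000, 60000, 55000],
    'Bonus': [5000, 6000, 4000]
})

# Each keyword is an output column: (source column, aggregation)
named = df.groupby('Dept').agg(
    avg_salary=('Salary', 'mean'),
    total_bonus=('Bonus', 'sum')
)

print(named)
```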
