### What is a Pivot Table in Pandas?

A pivot table is used to:

• Summarize large datasets

• Perform aggregation (sum, mean, count, etc.)

• Convert row data into column-based summaries

• Produce Excel-like reports

It answers questions like:

• What are the average sales per region?

• What are the total marks per subject?

• How many employees are in each department?
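The last question maps directly onto a pivot table with `aggfunc='count'`. A minimal sketch; the `employees` DataFrame and its names are made-up illustration data, not part of the original examples:

```python
import pandas as pd

# Hypothetical illustration data: one row per employee
employees = pd.DataFrame({
    'Department': ['HR', 'IT', 'IT', 'Sales', 'IT', 'HR'],
    'Employee': ['Amna', 'Bilal', 'Chen', 'Dana', 'Eli', 'Fatima'],
})

# One row per department; 'count' counts the non-missing Employee entries in each group
headcount = pd.pivot_table(employees, index='Department', values='Employee', aggfunc='count')
print(headcount)
```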

### Syntax

```python
pd.pivot_table(
    data,
    values=None,
    index=None,
    columns=None,
    aggfunc='mean',
    fill_value=None,
    margins=False,
    dropna=True,
    margins_name='All',
    observed=False,
    sort=True
)
```

### Why use pivot_table()?

✔ Data summarization

✔ Data aggregation

✔ Multi-dimensional analysis (grouping by rows and columns at once)

✔ A more readable alternative to groupby() followed by unstack()

✔ Excel-style reporting
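To illustrate the groupby() point, here is a minimal sketch (with made-up data) showing that one `pivot_table()` call yields the same table as grouping on two keys, summing, and unstacking one key into the columns. `pd.testing.assert_frame_equal` raises an `AssertionError` if the two results differ.

```python
import pandas as pd

df = pd.DataFrame({
    'Region': ['East', 'East', 'West', 'West', 'East'],
    'Product': ['A', 'B', 'A', 'B', 'A'],
    'Sales': [200, 150, 300, 400, 100],
})

# pivot_table: rows = Region, columns = Product, cells = summed Sales
via_pivot = pd.pivot_table(df, index='Region', columns='Product', values='Sales', aggfunc='sum')

# The same summary built by hand: group on both keys, sum, then move Product into the columns
via_groupby = df.groupby(['Region', 'Product'])['Sales'].sum().unstack('Product')

pd.testing.assert_frame_equal(via_pivot, via_groupby)  # raises if the tables differ
print(via_pivot)
```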

### Parameters Explained

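### 1. data (Required)

The DataFrame to analyze. Pass it as `data=df` or positionally as the first argument, together with at least `index` or `columns` to group by.

e.g.

`pd.pivot_table(data=df, index='City', values='Sales')`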
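Because `data` is the first parameter, it can be passed by keyword or positionally, and the same function is also available as the `DataFrame.pivot_table()` method. A small sketch with made-up sales figures; all three calls use the default `aggfunc='mean'`:

```python
import pandas as pd

df = pd.DataFrame({
    'City': ['Lahore', 'Karachi', 'Lahore', 'Karachi'],
    'Sales': [200, 300, 150, 400],
})

# data passed by keyword ...
by_keyword = pd.pivot_table(data=df, index='City', values='Sales')
# ... or positionally ...
by_position = pd.pivot_table(df, index='City', values='Sales')
# ... or via the DataFrame method, where df itself is the data
as_method = df.pivot_table(index='City', values='Sales')

assert by_keyword.equals(by_position) and by_keyword.equals(as_method)
print(as_method)
```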

```python
# Example 1: Group by one column
import pandas as pd

df = pd.DataFrame({
    'Name': ['Ali', 'Sara', 'Ali', 'John'],
    'City': ['Lahore', 'Karachi', 'Lahore', 'Karachi'],
    'Sales': [200, 300, 150, 400]
})

# Pivot table: group by City
pivot = pd.pivot_table(data=df, index='City', values='Sales', aggfunc='mean')

print(pivot)
```

```
         Sales
City          
Karachi  350.0
Lahore   175.0
```


```python
# Example 2: City as rows, Name as columns
pivot = pd.pivot_table(data=df, index='City', columns='Name', values='Sales', aggfunc='sum')
print(pivot)
```

```
Name       Ali   John   Sara
City                        
Karachi    NaN  400.0  300.0
Lahore   350.0    NaN    NaN
```


### 2. values

Column(s) to aggregate.

e.g.

`values='Sales'`

Multiple values:

e.g.

`values=['Sales', 'Profit']`
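If `values` is omitted, pandas aggregates every column that is not used as `index` or `columns`, so all of those columns must support the chosen `aggfunc`. A minimal sketch with made-up City/Sales/Profit data:

```python
import pandas as pd

df = pd.DataFrame({
    'City': ['Lahore', 'Karachi', 'Lahore', 'Karachi'],
    'Sales': [200, 300, 150, 400],
    'Profit': [50, 80, 40, 100],
})

# No values argument: both Sales and Profit (all non-key columns) are summed
pivot = pd.pivot_table(df, index='City', aggfunc='sum')
print(pivot)
```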

### Case 1: Single column aggregation

```python
pd.pivot_table(data=df, index='City', values='Sales', aggfunc='sum')
```

• values='Sales' → tells pandas: only aggregate the Sales column.

• aggfunc='sum' → adds up all the sales for each city.

```python
# Example
import pandas as pd

df = pd.DataFrame({
    'City': ['Lahore', 'Karachi', 'Lahore', 'Karachi'],
    'Sales': [200, 300, 150, 400],
    'Profit': [50, 80, 40, 100]
})

pivot = pd.pivot_table(data=df, index='City', values='Sales', aggfunc='sum')
print(pivot)
# Only the Sales column is aggregated
```

```
         Sales
City          
Karachi    700
Lahore     350
```


### Case 2: Multiple column aggregation

```python
pd.pivot_table(data=df, index='City', values=['Sales', 'Profit'], aggfunc='sum')
```

• values=['Sales', 'Profit'] → tells pandas: aggregate both Sales and Profit.

• aggfunc='sum' → adds up both columns for each city.

```python
# Example
pivot = pd.pivot_table(data=df, index='City', values=['Sales', 'Profit'], aggfunc='sum')
print(pivot)
# Now we see two aggregated columns side by side; pandas orders them alphabetically (Profit, Sales)
```

```
         Profit  Sales
City                  
Karachi     180    700
Lahore       90    350
```


### 3. index

Column(s) whose unique values become the row labels (row grouping).

`index='Region'`

Multiple indexes:

`index=['Region', 'Year']`

### Case 1: Single index

```python
pd.pivot_table(data=df, index='Region', values='Sales', aggfunc='sum')
```

• index='Region' → tells pandas: group rows by the Region column.

• Each unique Region becomes a row label in the pivot table.

• values='Sales' → aggregate the Sales column.

• aggfunc='sum' → sum up the sales for each region.

```python
# Example
import pandas as pd

df = pd.DataFrame({
    'Region': ['East', 'West', 'East', 'West', 'North'],
    'Year': [2024, 2024, 2025, 2025, 2025],
    'Sales': [200, 300, 150, 400, 250]
})

pivot = pd.pivot_table(data=df, index='Region', values='Sales', aggfunc='sum')
print(pivot)
```

```
        Sales
Region       
East      350
North     250
West      700
```


### Case 2: Multiple indexes

```python
pd.pivot_table(data=df, index=['Region', 'Year'], values='Sales', aggfunc='sum')
```

• index=['Region','Year'] → tells pandas: group rows by both Region and Year.

• This creates a hierarchical index (multi-level rows).

• Each combination of Region + Year becomes a row label.

```python
# Example
pivot = pd.pivot_table(data=df, index=['Region','Year'], values='Sales', aggfunc='sum')
print(pivot)
# Now rows are grouped by Region first, then Year.
```

```
             Sales
Region Year       
East   2024    200
       2025    150
North  2025    250
West   2024    300
       2025    400
```
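Rows of a hierarchical index can be selected level by level with `.loc`, and `reset_index()` turns the index levels back into ordinary columns, which is handy when exporting a flat, Excel-style table. A short sketch that rebuilds the pivot above:

```python
import pandas as pd

df = pd.DataFrame({
    'Region': ['East', 'West', 'East', 'West', 'North'],
    'Year': [2024, 2024, 2025, 2025, 2025],
    'Sales': [200, 300, 150, 400, 250],
})
pivot = pd.pivot_table(df, index=['Region', 'Year'], values='Sales', aggfunc='sum')

# Selecting only the outer level returns a smaller table indexed by Year
east = pivot.loc['East']
# A (Region, Year) tuple selects a single row of the hierarchical index
west_2025 = pivot.loc[('West', 2025), 'Sales']
# reset_index() moves Region and Year back into regular columns
flat = pivot.reset_index()

print(east)
print(west_2025)
print(flat)
```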


### 4. columns

Column(s) whose unique values become the column labels (column grouping).

`columns='Product'`

```python
# Example
import pandas as pd

df = pd.DataFrame({
    'Region': ['East', 'East', 'West', 'West', 'North'],
    'Product': ['A', 'B', 'A', 'B', 'A'],
    'Sales': [200, 150, 300, 400, 250]
})

pivot = pd.pivot_table(data=df, index='Region', columns='Product', values='Sales', aggfunc='sum')
print(pivot)
```

```
Product      A      B
Region               
East     200.0  150.0
North    250.0    NaN
West     300.0  400.0
```
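Like `index`, `columns` also accepts a list, which produces hierarchical (MultiIndex) column labels. A sketch with made-up data, where `Year` is an added illustrative column:

```python
import pandas as pd

df = pd.DataFrame({
    'Region': ['East', 'East', 'West', 'West'],
    'Product': ['A', 'B', 'A', 'B'],
    'Year': [2024, 2025, 2024, 2025],
    'Sales': [200, 150, 300, 400],
})

# A list of keys creates two column levels: Product on top, Year underneath
pivot = pd.pivot_table(df, index='Region', columns=['Product', 'Year'], values='Sales', aggfunc='sum')
print(pivot)

# A single cell is addressed with a (Product, Year) tuple
print(pivot.loc['West', ('B', 2025)])
```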


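### 5. aggfunc (Default: 'mean')

The aggregation applied to `values`. It can be a function name string, a function, a list of these, or a dict mapping each value column to its own aggregation.

| `aggfunc` | Purpose |
|----------|---------|
| `'mean'` | Average |
| `'sum'` | Total |
| `'count'` | Count of non-missing values |
| `'min'` | Minimum |
| `'max'` | Maximum |
| `'median'` | Median |
| `'std'` | Standard deviation |

### Multiple functions

Pass a list to apply several functions at once; each function name becomes a top-level column label:

`aggfunc=['sum', 'mean']`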

```python
# Example
import pandas as pd

df = pd.DataFrame({
    'Region': ['East', 'West', 'East', 'West', 'North'],
    'Sales': [200, 300, 150, 400, 250],
    'Profit': [50, 80, 40, 100, 60]
})

# Pivot table with multiple aggfuncs
pivot = pd.pivot_table(data=df, index='Region', values='Sales', aggfunc=['sum', 'mean'])
print(pivot)
```

```
         sum   mean
       Sales  Sales
Region             
East     350  175.0
North    250  250.0
West     700  350.0
```
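`aggfunc` also accepts a dict that maps each value column to its own aggregation, or any function that reduces a group to a single value. A sketch using the same Region/Sales/Profit data; the max-minus-min "spread" lambda is only an illustrative choice:

```python
import pandas as pd

df = pd.DataFrame({
    'Region': ['East', 'West', 'East', 'West', 'North'],
    'Sales': [200, 300, 150, 400, 250],
    'Profit': [50, 80, 40, 100, 60],
})

# Dict form: total Sales but average Profit per region
pivot = pd.pivot_table(df, index='Region', values=['Sales', 'Profit'],
                       aggfunc={'Sales': 'sum', 'Profit': 'mean'})
print(pivot)

# Callable form: any function that reduces a group of values to one number
spread = pd.pivot_table(df, index='Region', values='Sales',
                        aggfunc=lambda s: s.max() - s.min())
print(spread)
```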


### 6. fill_value

Value used to replace missing (NaN) cells in the resulting table.

`fill_value=0`

```python
# Example
import pandas as pd

df = pd.DataFrame({
    'Region': ['East', 'East', 'West', 'North'],
    'Product': ['A', 'B', 'A', 'A'],
    'Sales': [200, 150, 300, 250]
})

pivot = pd.pivot_table(
    data=df,
    index='Region',
    columns='Product',
    values='Sales',
    aggfunc='sum',
    fill_value=0
)

print(pivot)
```

```
Product    A    B
Region           
East     200  150
North    250    0
West     300    0
```


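### 7. margins (Default: False)

Adds an extra row and column (labelled `All` by default) computed with the same `aggfunc` over all rows in that row or column; with `aggfunc='sum'` these are the row and column totals.

`margins=True`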

```python
# Example
import pandas as pd

df = pd.DataFrame({
    'Region': ['East', 'East', 'West', 'West', 'North'],
    'Product': ['A', 'B', 'A', 'B', 'A'],
    'Sales': [200, 150, 300, 400, 250]
})

pivot = pd.pivot_table(
    data=df,
    index='Region',
    columns='Product',
    values='Sales',
    aggfunc='sum',
    margins=True
)

print(pivot)
```

```
Product      A      B   All
Region                     
East     200.0  150.0   350
North    250.0    NaN   250
West     300.0  400.0   700
All      750.0  550.0  1300
```
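The margins use the same `aggfunc` as the table body, computed over all the underlying rows, so they are only totals when `aggfunc='sum'`. A small sketch with made-up numbers and `aggfunc='mean'`: the `All` row is the mean of all three sales (300), not their sum and not the mean of the two regional means.

```python
import pandas as pd

df = pd.DataFrame({
    'Region': ['East', 'East', 'West'],
    'Sales': [100, 200, 600],
})

# East mean = 150, West mean = 600, All = mean of 100, 200, 600
pivot = pd.pivot_table(df, index='Region', values='Sales', aggfunc='mean', margins=True)
print(pivot)
```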


### 8. margins_name (Default: 'All')

Custom label for the totals row and column added by `margins=True`.

`margins_name='Total'`

```python
# Example
import pandas as pd

df = pd.DataFrame({
    'Region': ['East', 'East', 'West', 'West', 'North'],
    'Product': ['A', 'B', 'A', 'B', 'A'],
    'Sales': [200, 150, 300, 400, 250]
})

pivot = pd.pivot_table(
    data=df,
    index='Region',
    columns='Product',
    values='Sales',
    aggfunc='sum',
    margins=True,
    margins_name='Total'
)

print(pivot)
```

```
Product      A      B  Total
Region                      
East     200.0  150.0    350
North    250.0    NaN    250
West     300.0  400.0    700
Total    750.0  550.0   1300
```


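### 9. dropna (Default: True)

• True → drops result columns whose entries are all NaN

• False → keeps them; in recent pandas versions it also keeps groups whose key is NaN, as in the example below

`dropna=False`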

```python
# Example
import pandas as pd

df = pd.DataFrame({
    'Region': ['East', 'West', None, 'West'],
    'Sales': [200, 300, 150, 400]
})

# Pivot table with dropna=False
pivot = pd.pivot_table(data=df, index='Region', values='Sales', aggfunc='sum', dropna=False)
print(pivot)
```

```
        Sales
Region       
East      200
West      700
NaN       150
```
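The other effect of `dropna`, removing result columns that are entirely NaN, can be seen with a made-up product `C` that has no recorded sales. `aggfunc='mean'` is used because the mean of an all-NaN group is NaN, whereas a sum would be 0:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'Region': ['East', 'East', 'West', 'West'],
    'Product': ['A', 'C', 'A', 'C'],
    'Sales': [200, np.nan, 300, np.nan],  # hypothetical: product C has no recorded sales
})

kept = pd.pivot_table(df, index='Region', columns='Product', values='Sales',
                      aggfunc='mean', dropna=False)
dropped = pd.pivot_table(df, index='Region', columns='Product', values='Sales',
                         aggfunc='mean', dropna=True)

print(kept)     # column C is present and entirely NaN
print(dropped)  # column C is removed
```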


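### 10. observed (Default: False)

Only matters when a grouping column is categorical:

• False → show every category, including ones that never occur in the data

• True → show only the categories that actually occur

Since pandas 2.2 the `False` default is deprecated in favor of `True`, so it is worth passing `observed` explicitly when grouping by categorical columns.

`observed=True`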

```python
# Example
import pandas as pd

# Define categorical column with all possible categories
df = pd.DataFrame({
    'Region': pd.Categorical(['East', 'West'], categories=['East','West','North']),
    'Product': ['A','B'],
    'Sales': [200,300]
})

# Pivot with observed=False (default)
pivot1 = pd.pivot_table(data=df, index='Region', columns='Product', values='Sales', aggfunc='sum', observed=False)

# Pivot with observed=True
pivot2 = pd.pivot_table(data=df, index='Region', columns='Product', values='Sales', aggfunc='sum', observed=True)

print("Observed=False:\n", pivot1, "\n")
print("Observed=True:\n", pivot2)
```

```
Observed=False:
 Product    A    B
Region           
East     200    0
West       0  300
North      0    0 

Observed=True:
 Product      A      B
Region               
East     200.0    NaN
West       NaN  300.0
```

With `aggfunc='sum'`, the unobserved combinations under `observed=False` show 0 (the sum of an empty group). Under `observed=True` those combinations are never formed, so the missing cells appear as NaN.


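### 11. sort (Default: True)

Whether to sort the result by the grouping keys (available since pandas 1.3). With `sort=False`, groups keep their order of first appearance in the data.

`sort=False`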

```python
# Example
import pandas as pd

df = pd.DataFrame({
    'Region': ['West', 'East', 'North', 'East', 'West'],
    'Sales': [300, 200, 250, 150, 400]
})

# Pivot with sort=True (default)
pivot1 = pd.pivot_table(data=df, index='Region', values='Sales', aggfunc='sum', sort=True)

# Pivot with sort=False
pivot2 = pd.pivot_table(data=df, index='Region', values='Sales', aggfunc='sum', sort=False)

print("Sort=True:\n", pivot1, "\n")
print("Sort=False:\n", pivot2)
```

```
Sort=True:
         Sales
Region       
East      350
North     250
West      700 

Sort=False:
         Sales
Region       
West      700
East      350
North     250
```


### Complete Example

```python
import pandas as pd

data = {
    'Region': ['East', 'West', 'East', 'West', 'East'],
    'Product': ['A', 'A', 'B', 'B', 'A'],
    'Sales': [200, 150, 300, 250, 400]
}

df = pd.DataFrame(data)

pivot = pd.pivot_table(
    df,
    values='Sales',
    index='Region',
    columns='Product',
    aggfunc='sum',
    fill_value=0,
    margins=True
)

print(pivot)
```

```
Product    A    B   All
Region                 
East     600  300   900
West     150  250   400
All      750  550  1300
```


### pivot() vs pivot_table()

| Feature | `pivot()` | `pivot_table()` |
|---------|-----------|-----------------|
| Aggregation | No | Yes (`aggfunc`) |
| Duplicate index/column pairs | Raises `ValueError` | Aggregated |
| Missing combinations | Left as NaN (no `fill_value` option) | Can be filled with `fill_value` |
| Best for | Reshaping data where every pair is unique | Real-world data with duplicates |
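The duplicates row can be checked directly. A minimal sketch with made-up data in which the pair (East, A) occurs twice: `DataFrame.pivot()` raises a `ValueError`, while `pivot_table()` sums the two rows.

```python
import pandas as pd

# Hypothetical data: the (Region, Product) pair ('East', 'A') appears twice
df = pd.DataFrame({
    'Region': ['East', 'East', 'West'],
    'Product': ['A', 'A', 'B'],
    'Sales': [200, 400, 300],
})

pivot_error = None
try:
    df.pivot(index='Region', columns='Product', values='Sales')
except ValueError as err:
    pivot_error = err  # pivot() cannot put two values into one cell
    print('pivot() failed:', err)

# pivot_table() resolves the duplicate pair by aggregating it (200 + 400)
summary = pd.pivot_table(df, index='Region', columns='Product', values='Sales', aggfunc='sum')
print(summary)
```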

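Prefer `pivot_table()` whenever the data may contain duplicate index/column pairs or needs aggregation; `pivot()` is a stricter reshape for data that is already unique.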