# Pivot Tables and Cross Tabs

### Pivot Tables

    pd.pivot_table(df, index="..", columns="..", values=["..", ".."], aggfunc=["mean", "median"])

A pivot table reorganizes and summarizes a DataFrame. It returns only the requested aggregates of the columns listed in `values`, with the `index` keys as rows and the `columns` keys as columns.

If `aggfunc` is not specified, the mean is used by default.
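As a quick check of that default, here is a minimal sketch on a small made-up DataFrame (`toy` and its values are illustrative assumptions, not the notebook's data). It builds the same pivot table with and without `aggfunc`:

```python
import pandas as pd

# Hypothetical toy data, used only for this illustration
toy = pd.DataFrame({
    "Region": ["East", "East", "West", "West"],
    "Product": ["A", "A", "B", "B"],
    "Sales": [100, 300, 50, 150],
})

# No aggfunc given: each cell holds the mean of the matching rows
default_pivot = pd.pivot_table(toy, index="Region", columns="Product", values="Sales")

# Passing aggfunc="mean" explicitly should give the same table
explicit_pivot = pd.pivot_table(
    toy, index="Region", columns="Product", values="Sales", aggfunc="mean"
)

print(default_pivot)
print(default_pivot.equals(explicit_pivot))
```

Combinations that never occur in the data (East/B and West/A here) come out as NaN.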

With a pivot table, we can:

- Group data
- Aggregate it (mean, sum, etc.)
- Reshape the table
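One way to see these three steps is to do them by hand with `groupby` and `unstack` and compare the result with a single `pivot_table` call. This is only a sketch on made-up data (`toy` is a hypothetical frame), not a description of how pandas implements `pivot_table` internally:

```python
import pandas as pd

# Hypothetical toy data, used only for this illustration
toy = pd.DataFrame({
    "Region": ["East", "East", "West", "West", "East"],
    "Product": ["A", "B", "A", "B", "A"],
    "Sales": [100, 200, 300, 400, 500],
})

# Step by step: group by both keys, aggregate, then move Product into the columns
by_hand = toy.groupby(["Region", "Product"])["Sales"].sum().unstack("Product")

# The same grouping, aggregation and reshaping in one call
pivoted = pd.pivot_table(
    toy, index="Region", columns="Product", values="Sales", aggfunc="sum"
)

print(by_hand)
print(pivoted)
```

Both tables have one row per region and one column per product, with the summed sales in each cell.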


Use a pivot table when you want to build a new DataFrame from a chosen index, a chosen set of columns, and chosen values.

In [28]:
import numpy as np
import pandas as pd

In [29]:
data = {
    'Date': pd.date_range('2023-01-01', periods=20),
    'Product': ['A', 'B', 'C', 'D'] * 5,
    'Region': ['East', 'West', 'North', 'South'] * 5,
    'Sales': np.random.randint(100, 1000, 20),
    'Units': np.random.randint(10, 100, 20),
    'Rep': ['John', 'Mary', 'Bob', 'Alice'] * 5
}

df = pd.DataFrame(data)
df['Month'] = df['Date'].dt.month_name()
df['Quarter'] = 'Q' + df['Date'].dt.quarter.astype(str)
df

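| | Date | Product | Region | Sales | Units | Rep | Month | Quarter |
|---|---|---|---|---|---|---|---|---|
| 0 | 2023-01-01 | A | East | 803 | 55 | John | January | Q1 |
| 1 | 2023-01-02 | B | West | 479 | 59 | Mary | January | Q1 |
| 2 | 2023-01-03 | C | North | 628 | 99 | Bob | January | Q1 |
| 3 | 2023-01-04 | D | South | 774 | 82 | Alice | January | Q1 |
| 4 | 2023-01-05 | A | East | 582 | 66 | John | January | Q1 |
| 5 | 2023-01-06 | B | West | 162 | 53 | Mary | January | Q1 |
| 6 | 2023-01-07 | C | North | 585 | 49 | Bob | January | Q1 |
| 7 | 2023-01-08 | D | South | 375 | 28 | Alice | January | Q1 |
| 8 | 2023-01-09 | A | East | 355 | 80 | John | January | Q1 |
| 9 | 2023-01-10 | B | West | 252 | 89 | Mary | January | Q1 |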


In [30]:
pivot1 = pd.pivot_table(df, values="Sales", index="Region", columns="Product", aggfunc=["median", "sum"])
pivot1

| Region | median, A | median, B | median, C | median, D | sum, A | sum, B | sum, C | sum, D |
|---|---|---|---|---|---|---|---|---|
| East | 426.0 | NaN | NaN | NaN | 2409.0 | NaN | NaN | NaN |
| North | NaN | NaN | 628.0 | NaN | NaN | NaN | 3378.0 | NaN |
| South | NaN | NaN | NaN | 619.0 | NaN | NaN | NaN | 3055.0 |
| West | NaN | 252.0 | NaN | NaN | NaN | 1420.0 | NaN | NaN |
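The empty (NaN) cells in the output above appear because, in this data, each region sells exactly one product, so most Region/Product combinations have no rows to aggregate. If zeros are an acceptable stand-in for "no data" (an assumption that depends on the analysis), the `fill_value` parameter of `pd.pivot_table` can replace them. A minimal sketch on made-up data (`toy` is hypothetical):

```python
import pandas as pd

# Hypothetical toy data: East never sells B, West never sells A
toy = pd.DataFrame({
    "Region": ["East", "East", "West"],
    "Product": ["A", "A", "B"],
    "Sales": [100, 300, 50],
})

# Missing combinations show up as NaN
with_nan = pd.pivot_table(
    toy, index="Region", columns="Product", values="Sales", aggfunc="sum"
)

# fill_value replaces those missing cells in the result
filled = pd.pivot_table(
    toy, index="Region", columns="Product", values="Sales", aggfunc="sum", fill_value=0
)

print(with_nan)
print(filled)
```

Filling with 0 is reasonable for sums and counts, but it can be misleading for a mean or median, where 0 would read as a real measurement.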


In [31]:
pivot2 = pd.pivot_table(df, values=["Sales", "Units"], index="Region", columns="Product")
pivot2

| Region | Sales, A | Sales, B | Sales, C | Sales, D | Units, A | Units, B | Units, C | Units, D |
|---|---|---|---|---|---|---|---|---|
| East | 481.8 | NaN | NaN | NaN | 72.6 | NaN | NaN | NaN |
| North | NaN | NaN | 675.6 | NaN | NaN | NaN | 48.8 | NaN |
| South | NaN | NaN | NaN | 611.0 | NaN | NaN | NaN | 51.2 |
| West | NaN | 284.0 | NaN | NaN | NaN | 53.6 | NaN | NaN |


In [32]:
data = {
    'Name': ['Amy', 'Bob', 'Cara', 'Amy', 'Bob', 'Cara'],
    'Subject': ['Math', 'Math', 'Math', 'Science', 'Science', 'Science'],
    'Score': [90, 80, 85, 95, 75, 88]
}
df2 = pd.DataFrame(data)
df2

| | Name | Subject | Score |
|---|---|---|---|
| 0 | Amy | Math | 90 |
| 1 | Bob | Math | 80 |
| 2 | Cara | Math | 85 |
| 3 | Amy | Science | 95 |
| 4 | Bob | Science | 75 |
| 5 | Cara | Science | 88 |


In [33]:
pivot3 = pd.pivot_table(df2, index="Name", columns="Subject", values="Score")
pivot3

| Name | Math | Science |
|---|---|---|
| Amy | 90.0 | 95.0 |
| Bob | 80.0 | 75.0 |
| Cara | 85.0 | 88.0 |


### Cross Tabs

A crosstab is similar to a pivot table, but by default it counts how often each combination of values occurs instead of aggregating a values column.

Use `pd.crosstab` for frequency tables and `pd.pivot_table` for aggregates such as the mean or sum.
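To make the relationship concrete, the sketch below builds the same counts with `pd.crosstab` and with `pd.pivot_table(..., aggfunc="count")`, and then uses the `values` and `aggfunc` parameters of `pd.crosstab` to aggregate instead of count. The `toy` frame is made up for illustration:

```python
import pandas as pd

# Hypothetical toy data, used only for this illustration
toy = pd.DataFrame({
    "Region": ["East", "East", "West", "West", "West"],
    "Product": ["A", "B", "A", "A", "B"],
    "Sales": [10, 20, 30, 40, 50],
})

# Default crosstab: number of rows for each (Region, Product) pair
counts = pd.crosstab(toy["Region"], toy["Product"])

# The same counts from pivot_table, by counting the Sales column
counts_via_pivot = pd.pivot_table(
    toy, index="Region", columns="Product", values="Sales", aggfunc="count"
)

# crosstab can also aggregate a column when given values and aggfunc
sales_sum = pd.crosstab(
    toy["Region"], toy["Product"], values=toy["Sales"], aggfunc="sum"
)

print(counts)
print(counts_via_pivot)
print(sales_sum)
```

Note that `pd.crosstab` takes the Series themselves (`toy["Region"]`), while `pd.pivot_table` takes the DataFrame plus column names.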

In [34]:
df

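| | Date | Product | Region | Sales | Units | Rep | Month | Quarter |
|---|---|---|---|---|---|---|---|---|
| 0 | 2023-01-01 | A | East | 803 | 55 | John | January | Q1 |
| 1 | 2023-01-02 | B | West | 479 | 59 | Mary | January | Q1 |
| 2 | 2023-01-03 | C | North | 628 | 99 | Bob | January | Q1 |
| 3 | 2023-01-04 | D | South | 774 | 82 | Alice | January | Q1 |
| 4 | 2023-01-05 | A | East | 582 | 66 | John | January | Q1 |
| 5 | 2023-01-06 | B | West | 162 | 53 | Mary | January | Q1 |
| 6 | 2023-01-07 | C | North | 585 | 49 | Bob | January | Q1 |
| 7 | 2023-01-08 | D | South | 375 | 28 | Alice | January | Q1 |
| 8 | 2023-01-09 | A | East | 355 | 80 | John | January | Q1 |
| 9 | 2023-01-10 | B | West | 252 | 89 | Mary | January | Q1 |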


In [35]:
pd.crosstab(df["Region"], df["Product"])

| Region | A | B | C | D |
|---|---|---|---|---|
| East | 5 | 0 | 0 | 0 |
| North | 0 | 0 | 5 | 0 |
| South | 0 | 0 | 0 | 5 |
| West | 0 | 5 | 0 | 0 |
