# Pandas Crosstab Tutorial
The `pd.crosstab()` function cross-tabulates two or more categorical variables. By default it returns a frequency table (also known as a contingency table) that counts how many rows fall into each combination of categories.

In [2]:
import pandas as pd

### 1. Load Data
We start by loading a demographics dataset containing information about individuals' Nationality, Sex, Age, and Handedness.

In [4]:
df = pd.read_excel("demographics.xlsx")
df.head()

Unnamed: 0,Name,Nationality,Sex,Age,Handedness
0,Kathy,USA,Female,23,Right
1,Linda,USA,Female,18,Right
2,Peter,USA,Male,19,Right
3,John,USA,Male,22,Left
4,Fatima,Bangadesh,Female,31,Left
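
If `demographics.xlsx` is not at hand, the examples can still be run on a stand-in. The sketch below rebuilds the three categorical columns from the counts shown in this tutorial's crosstab outputs. The `cells` list and the row order are assumptions made for illustration, and the `Name` and `Age` columns are omitted because only the first five rows of the file are visible.

```python
import pandas as pd

# Hypothetical stand-in for demographics.xlsx: one tuple per
# (Nationality, Sex, Handedness) combination, with the count taken
# from the tutorial's output tables. The spelling "Bangadesh" is
# kept exactly as it appears in the dataset.
cells = [
    ("Bangadesh", "Female", "Left", 1),
    ("Bangadesh", "Male", "Left", 1),
    ("China", "Female", "Left", 1),
    ("China", "Female", "Right", 1),
    ("China", "Male", "Left", 1),
    ("India", "Male", "Left", 2),
    ("India", "Male", "Right", 1),
    ("USA", "Female", "Right", 2),
    ("USA", "Male", "Left", 1),
    ("USA", "Male", "Right", 1),
]

# Repeat each combination `n` times to get one row per person.
rows = [(nat, sex, hand) for nat, sex, hand, n in cells for _ in range(n)]
df = pd.DataFrame(rows, columns=["Nationality", "Sex", "Handedness"])

print(df.shape)
print(df.head())
```

Because the stand-in has the same category counts, every `pd.crosstab` call in the following sections should give the same numbers as with the real file.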


### 2. Basic Crosstab
We want to see the frequency distribution of **Handedness** across different **Nationalities**.
* The first argument (`df.Nationality`) becomes the rows (index).
* The second argument (`df.Handedness`) becomes the columns.

In [None]:
pd.crosstab(df.Nationality, df.Handedness)

Handedness,Left,Right
Nationality,,
Bangadesh,2,0
China,2,1
India,2,1
USA,1,3
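
To make it concrete what the table contains, the counts can be reproduced by hand with `groupby`. This is a minimal sketch on stand-in data (the rows are hypothetical and only match the counts shown above); it illustrates the idea and is not a description of how pandas implements `crosstab` internally.

```python
import pandas as pd

# Hypothetical rows that reproduce the Nationality/Handedness counts above.
df = pd.DataFrame({
    "Nationality": ["Bangadesh"] * 2 + ["China"] * 3 + ["India"] * 3 + ["USA"] * 4,
    "Handedness": [
        "Left", "Left",                      # Bangadesh
        "Left", "Left", "Right",             # China
        "Left", "Left", "Right",             # India
        "Left", "Right", "Right", "Right",   # USA
    ],
})

ct = pd.crosstab(df.Nationality, df.Handedness)

# Count rows per (Nationality, Handedness) pair, then pivot Handedness
# into columns. fill_value=0 supplies the zero for pairs that never
# occur, such as Bangadesh / Right.
manual = df.groupby(["Nationality", "Handedness"]).size().unstack(fill_value=0)

print(ct)
print((ct.values == manual.values).all())
```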


### 3. Variable Comparison
Changing the row variable compares Handedness against a different factor. Here we count Handedness by **Sex** (Male vs. Female).

In [6]:
pd.crosstab(df.Sex, df.Handedness)

Handedness,Left,Right
Sex,,
Female,2,3
Male,5,2


### 4. Margins (Subtotals)
Passing `margins=True` adds a row and a column of totals, both labeled `All`. The bottom-right cell is the grand total (12 people here), so the overall sample size is visible alongside the breakdown.

In [8]:
pd.crosstab(df.Nationality, df.Handedness, margins=True)

Handedness,Left,Right,All
Nationality,,,
Bangadesh,2,0,2
China,2,1,3
India,2,1,3
USA,1,3,4
All,7,5,12
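
The totals can be relabeled with the `margins_name` parameter, and it is easy to check that they really are the sums of the inner cells. The sketch below uses hypothetical stand-in rows that match the counts above; the label `"Total"` is an arbitrary choice.

```python
import pandas as pd

# Hypothetical rows that reproduce the Nationality/Handedness counts above.
df = pd.DataFrame({
    "Nationality": ["Bangadesh"] * 2 + ["China"] * 3 + ["India"] * 3 + ["USA"] * 4,
    "Handedness": [
        "Left", "Left",
        "Left", "Left", "Right",
        "Left", "Left", "Right",
        "Left", "Right", "Right", "Right",
    ],
})

# Same table as above, but with the margins labeled "Total" instead of "All".
ct = pd.crosstab(df.Nationality, df.Handedness, margins=True, margins_name="Total")
print(ct)

# Strip the margins and recompute them from the inner cells.
inner = ct.drop(index="Total", columns="Total")
row_totals = inner.sum(axis=1).tolist()   # one total per Nationality
col_totals = inner.sum(axis=0).tolist()   # one total per Handedness

print(row_totals == ct.loc[inner.index, "Total"].tolist())
print(col_totals == ct.loc["Total", inner.columns].tolist())
```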


### 5. Multi-Level Columns
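You are not limited to one variable per axis. Here, we pass a **list** of variables `[df.Handedness, df.Nationality]` as the second (columns) argument.
* This creates a hierarchical (MultiIndex) column structure: Handedness on the outer level and Nationality on the inner level, counted for each Sex.
* Only combinations that occur in the data get a column, which is why there is no Right/Bangadesh column.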

In [10]:
pd.crosstab(df.Sex, [df.Handedness, df.Nationality], margins=True)

Handedness,Left,Left,Left,Left,Right,Right,Right,All
Nationality,Bangadesh,China,India,USA,China,India,USA,
Sex,,,,,,,,
Female,1,1,0,0,1,0,2,5
Male,1,1,2,1,0,1,1,7
All,2,2,2,1,1,1,3,12
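
A table with hierarchical columns is indexed with the outer label or with an `(outer, inner)` tuple. The following sketch shows both, plus one way to collapse the inner level again. It runs on hypothetical stand-in rows that match the counts above, and `margins` is left off to keep the selection simple.

```python
import pandas as pd

# Hypothetical stand-in: (Nationality, Sex, Handedness, count) per combination.
cells = [
    ("Bangadesh", "Female", "Left", 1), ("Bangadesh", "Male", "Left", 1),
    ("China", "Female", "Left", 1), ("China", "Female", "Right", 1),
    ("China", "Male", "Left", 1),
    ("India", "Male", "Left", 2), ("India", "Male", "Right", 1),
    ("USA", "Female", "Right", 2),
    ("USA", "Male", "Left", 1), ("USA", "Male", "Right", 1),
]
rows = [(nat, sex, hand) for nat, sex, hand, n in cells for _ in range(n)]
df = pd.DataFrame(rows, columns=["Nationality", "Sex", "Handedness"])

ct = pd.crosstab(df.Sex, [df.Handedness, df.Nationality])

# Selecting an outer label returns the sub-table for that Handedness,
# with Nationality as the remaining column level.
left = ct["Left"]
print(left)

# A single cell is addressed with an (outer, inner) tuple.
print(ct.loc["Male", ("Left", "India")])

# Summing over the inner level recovers the plain Sex x Handedness table.
collapsed = ct.T.groupby(level="Handedness").sum().T
print(collapsed)
```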


### 6. Multi-Level Rows
Similarly, we can pass a list as the first (index) argument to create hierarchical rows. This groups the data first by **Nationality**, and then by **Sex**. Combinations with no matching rows are left out, so India has only a Male row.

In [11]:
pd.crosstab([df.Nationality, df.Sex], df.Handedness, margins=True)

,Handedness,Left,Right,All
Nationality,Sex,,,
Bangadesh,Female,1,0,1
Bangadesh,Male,1,0,1
China,Female,1,1,2
China,Male,1,0,1
India,Male,2,1,3
USA,Female,0,2,2
USA,Male,1,1,2
All,,7,5,12
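
Hierarchical rows can be sliced by either level. As a minimal sketch on hypothetical stand-in rows that match the counts above, `.loc` picks one outer label and `.xs` picks one inner label across all outer labels.

```python
import pandas as pd

# Hypothetical stand-in: (Nationality, Sex, Handedness, count) per combination.
cells = [
    ("Bangadesh", "Female", "Left", 1), ("Bangadesh", "Male", "Left", 1),
    ("China", "Female", "Left", 1), ("China", "Female", "Right", 1),
    ("China", "Male", "Left", 1),
    ("India", "Male", "Left", 2), ("India", "Male", "Right", 1),
    ("USA", "Female", "Right", 2),
    ("USA", "Male", "Left", 1), ("USA", "Male", "Right", 1),
]
rows = [(nat, sex, hand) for nat, sex, hand, n in cells for _ in range(n)]
df = pd.DataFrame(rows, columns=["Nationality", "Sex", "Handedness"])

ct = pd.crosstab([df.Nationality, df.Sex], df.Handedness)

# All rows for one Nationality; the result is indexed by Sex.
usa = ct.loc["USA"]
print(usa)

# All Male rows across nationalities; the result is indexed by Nationality.
males = ct.xs("Male", level="Sex")
print(males)

# Combinations that never occur have no row at all.
print(("India", "Female") in ct.index)
```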


### 7. Normalization (Proportions)
Instead of raw counts, we often want to see proportions.
* `normalize='index'`: Divides each cell by its row total, so every row sums to 1. The values are fractions between 0 and 1, not percentages; multiply by 100 to get percent.
* In this example, each row shows the share of Left- and Right-handed people *within* a specific Sex (e.g., 5 of the 7 males, about 71.4%, are left-handed).

In [None]:
pd.crosstab(df.Sex, df.Handedness, normalize='index')

Handedness,Left,Right
Sex,,
Female,0.4,0.6
Male,0.714286,0.285714
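
`normalize` also accepts `'columns'` and `'all'`, which change the denominator. The sketch below compares the three options and converts the row-wise result to percentages. The rows are a hypothetical stand-in that matches the Sex/Handedness counts above.

```python
import pandas as pd

# Hypothetical rows that reproduce the Sex/Handedness counts above:
# Female: 2 Left, 3 Right; Male: 5 Left, 2 Right.
df = pd.DataFrame({
    "Sex": ["Female"] * 5 + ["Male"] * 7,
    "Handedness": ["Left"] * 2 + ["Right"] * 3 + ["Left"] * 5 + ["Right"] * 2,
})

# Each row sums to 1: share of each Handedness within a Sex.
by_row = pd.crosstab(df.Sex, df.Handedness, normalize="index")

# Each column sums to 1: share of each Sex within a Handedness.
by_col = pd.crosstab(df.Sex, df.Handedness, normalize="columns")

# The whole table sums to 1: share of each cell in the full sample.
overall = pd.crosstab(df.Sex, df.Handedness, normalize="all")

# Fractions to percentages, rounded to one decimal place.
pct = by_row.mul(100).round(1)

print(by_row)
print(by_col)
print(overall)
print(pct)
```

Which denominator is appropriate depends on the question: `'index'` answers "given the Sex, how is Handedness distributed?", while `'columns'` answers the reverse.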
