### Day 17: Pivot Tables with Pandas

**Summary:** This notebook analyzes survival patterns in the Titanic dataset using pivot tables, focusing on how survival varies by passenger class and gender. Several aggregations (mean survival rate, passenger count, and total survivors) are computed to reveal clear demographic survival patterns.


**Goal**: To summarize Titanic survival data by class and gender, using pivot tables to condense raw data into insights, much like Excel’s pivot tables but within Pandas.

In [8]:
import pandas as pd
df = pd.read_csv('../titanic-analysis/datasets/cleaned_titanic.csv')

### Inspecting basic columns

In [4]:
df.info()  # info is a method; without the parentheses nothing is printed
df.head()

Unnamed: 0,PassengerId,Survived,Pclass,Name,Sex,Age,SibSp,Parch,Ticket,Fare,Cabin,Embarked
0,1,0,3,"Braund, Mr. Owen Harris",male,22.0,1,0,A/5 21171,7.25,,S
1,2,1,1,"Cumings, Mrs. John Bradley (Florence Briggs Th...",female,38.0,1,0,PC 17599,71.2833,C85,C
2,3,1,3,"Heikkinen, Miss. Laina",female,26.0,0,0,STON/O2. 3101282,7.925,,S
3,4,1,1,"Futrelle, Mrs. Jacques Heath (Lily May Peel)",female,35.0,1,0,113803,53.1,C123,S
4,5,0,3,"Allen, Mr. William Henry",male,35.0,0,0,373450,8.05,,S


### Simple Pivot Table that shows Survival Rate by Class

In [5]:
pivot1 = df.pivot_table(
    index='Pclass',
    values='Survived',
    aggfunc='mean'
)
print('Survival rate by class')
print(pivot1)

Survival rate by class
        Survived
Pclass          
1       0.629630
2       0.472826
3       0.242363


### **Survival Rate by Passenger Class**

#### **What I Did**

I created a pivot table grouped by `Pclass` to compute the mean survival rate for each passenger class. This used the `pivot_table()` function with `'Survived'` as the value and `'mean'` as the aggregation method.

#### **What the Output Shows**

The output displays the average survival rate for passengers in each class (1st, 2nd, 3rd).
Because `Survived` is coded 0/1, each mean is the proportion of passengers in that class who survived (for example, 0.63 is about 63%).

#### **Insights**

Survival falls steadily with class: about 63% in 1st class, 47% in 2nd, and 24% in 3rd. This is consistent with socioeconomic disparities in access to lifeboats, although the table alone does not establish the cause.
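
As a cross-check on what `pivot_table()` does here, the same class-level rates can be obtained with `groupby()`. The sketch below does not read the CSV. Instead it rebuilds a stand-in frame (`df_demo`, a name used only for illustration) from the passenger and survivor counts reported in the later sections of this notebook, so it contains only the `Pclass`, `Sex`, and `Survived` columns.

```python
import pandas as pd

# (Pclass, Sex) -> (passengers, survivors), copied from the count and sum
# pivots shown later in this notebook. This is a stand-in for the real CSV.
cells = {
    (1, "female"): (94, 91), (1, "male"): (122, 45),
    (2, "female"): (76, 70), (2, "male"): (108, 17),
    (3, "female"): (144, 72), (3, "male"): (347, 47),
}

# One row per passenger: the first `survived` rows of each cell get Survived=1.
rows = [
    {"Pclass": pclass, "Sex": sex, "Survived": int(i < survived)}
    for (pclass, sex), (total, survived) in cells.items()
    for i in range(total)
]
df_demo = pd.DataFrame(rows)

# Pivot table and groupby should agree on the mean of Survived per class.
by_pivot = df_demo.pivot_table(index="Pclass", values="Survived", aggfunc="mean")
by_groupby = df_demo.groupby("Pclass")["Survived"].mean()

max_diff = (by_pivot["Survived"] - by_groupby).abs().max()
print(by_pivot.round(6))
print("largest difference between the two methods:", max_diff)
```

For a single grouping key the two approaches are interchangeable; `pivot_table()` becomes more convenient once a second key is spread across the columns.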

### Survival Rates (Proportion) by Passenger class and Sex
 
This shows the proportion of passengers in each class and sex group who survived.

In [6]:
pivot2 = df.pivot_table(
    index='Pclass',
    values='Survived',
    columns='Sex',
    aggfunc='mean'
)
print('Survival rate by Passenger class and Sex ')
print(pivot2)

Survival rate by Passenger class and Sex 
Sex       female      male
Pclass                    
1       0.968085  0.368852
2       0.921053  0.157407
3       0.500000  0.135447


### **Survival Rate by Passenger Class and Gender**

#### **What I Did**

I generated a second pivot table grouping passengers by both `Pclass` and `Sex`.
The table calculates mean survival rates to compare survival differences across class and gender.

#### **What the Output Shows**

- The table shows survival rates for males and females in 1st, 2nd, and 3rd class.
- It presents a clear breakdown of how gender interacts with class in determining survival chances.
- These numbers are proportions (0–1). Multiply by 100 to get %.
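
For reporting, the proportions can be converted to percentages. A minimal sketch, with the rates typed in from the output above rather than recomputed from the CSV (the names `rates` and `percent` are only for illustration):

```python
import pandas as pd

# Survival proportions copied from the pivot output above.
rates = pd.DataFrame(
    {
        "female": [0.968085, 0.921053, 0.500000],
        "male": [0.368852, 0.157407, 0.135447],
    },
    index=pd.Index([1, 2, 3], name="Pclass"),
)
rates.columns.name = "Sex"

# Scale to percent and keep one decimal place for readability.
percent = (rates * 100).round(1)
print(percent)
```

Rounding only at the display step keeps the underlying proportions intact for any later arithmetic.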

#### **Insights**

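Female passengers have much higher survival rates than males in every class (for example, about 97% versus 37% in 1st class), consistent with the “women and children first” evacuation pattern, though this table does not break out age. Class still matters within each sex: 3rd-class women survived at 50%, well below the 92% and 97% seen in 2nd and 1st class.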

### Pivot with counts: number of passengers by Class and Gender

In [14]:
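# 'count' tallies the non-null Survived values in each group,
# i.e. the number of passengers with a recorded outcome
pivot3 = df.pivot_table(
    index='Pclass',
    columns='Sex',
    values='Survived',
    aggfunc='count'
)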

print('Number of passengers in each class by gender')
print(pivot3)

Number of passengers in each class by gender
Sex     female  male
Pclass              
1           94   122
2           76   108
3          144   347


### **Passenger Counts by Class and Gender**

#### **What I Did**
- I created a pivot table that counts the number of passengers (`aggfunc='count'`) grouped by `Pclass` and `Sex`.
- This shows the distribution of passengers across gender and class categories.
  
#### **What the Output Shows**
- The output displays the total number of male and female passengers in each class.
- It helps understand dataset composition before interpreting survival rates.

#### **Insights**

3rd class is the largest group (491 of 891 passengers), and 347 of those passengers are male. Because men survived at much lower rates than women, this male-heavy composition partly explains the low overall survival rate for that class.
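
Group sizes matter when rates are combined. The sketch below uses the passenger counts and survival rates printed in this notebook (typed in by hand, not read from the CSV) to compare a simple average of the female and male rates with an average weighted by passenger counts. Only the weighted version reproduces the class-level survival rate.

```python
import pandas as pd

idx = pd.Index([1, 2, 3], name="Pclass")

# Passenger counts and survival rates copied from the pivot outputs.
counts = pd.DataFrame({"female": [94, 76, 144], "male": [122, 108, 347]}, index=idx)
rates = pd.DataFrame(
    {
        "female": [0.968085, 0.921053, 0.500000],
        "male": [0.368852, 0.157407, 0.135447],
    },
    index=idx,
)

# Simple mean treats both sexes as equally sized groups.
simple_mean = rates.mean(axis=1)

# Weighted mean gives each sex a weight equal to its passenger count.
weighted_mean = (rates * counts).sum(axis=1) / counts.sum(axis=1)

comparison = pd.DataFrame({"simple": simple_mean, "weighted": weighted_mean})
print(comparison.round(3))
```

For 3rd class the simple average is roughly 0.32, while the weighted average is roughly 0.24, which matches the class-level pivot. The gap comes from the 347 men outweighing the 144 women.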


### Total Survivors by Class and Gender

Now let’s count how many actually survived.

In [10]:
pivot_survivors = df.pivot_table(
    index='Pclass',
    columns='Sex',
    values='Survived',
    aggfunc='sum'
)
print("Total Survivors by Class and Gender:")
print(pivot_survivors)

Total Survivors by Class and Gender:
Sex     female  male
Pclass              
1           91    45
2           70    17
3           72    47


### **Total Survivors by Class and Gender**

#### **What I Did**
I created a pivot table using `aggfunc='sum'` to calculate the total number of survivors grouped by class and gender. This shows the absolute count of survivors instead of averages.

#### **What the Output Shows**
The output shows how many male and female passengers survived in each class. Comparing this with passenger counts provides insight into survival proportions.

#### **Insights**
Female survivors outnumber male survivors in every class. 1st class has the most survivors both in absolute terms (136, versus 87 in 2nd class and 119 in 3rd) and relative to its size (136 of 216 passengers, about 63%).
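
Since the mean of a 0/1 column is its sum divided by its count, dividing the survivor totals by the passenger counts should recover the survival rates. A small check, with both tables typed in from the outputs in this notebook (variable names are illustrative):

```python
import pandas as pd

idx = pd.Index([1, 2, 3], name="Pclass")

# Values copied from the 'count' and 'sum' pivot outputs.
counts = pd.DataFrame({"female": [94, 76, 144], "male": [122, 108, 347]}, index=idx)
survivors = pd.DataFrame({"female": [91, 70, 72], "male": [45, 17, 47]}, index=idx)

# Element-wise division aligns on the Pclass index and the Sex columns.
rates = survivors / counts
print(rates.round(6))

# Survivors relative to class size, pooling both sexes.
class_rates = survivors.sum(axis=1) / counts.sum(axis=1)
print(class_rates.round(6))
```

The first table should agree with the mean-based pivot, and the second with the class-level survival rates from the first pivot.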

### Side-by-Side Comparison

In [13]:
print("--- Passenger Counts ---", pivot3)
print("\n--- Total Survivors ---", pivot_survivors)
print("\n--- Survival Rates ---", pivot2)

--- Passenger Counts --- Sex     female  male
Pclass              
1           94   122
2           76   108
3          144   347

--- Total Survivors --- Sex     female  male
Pclass              
1           91    45
2           70    17
3           72    47

--- Survival Rates --- Sex       female      male
Pclass                    
1       0.968085  0.368852
2       0.921053  0.157407
3       0.500000  0.135447


## Reflections

- Pass a list such as `aggfunc=['mean','sum']`, or several `values` columns, to compute multiple summaries in one table.
- Missing combinations default to `NaN`; use `fill_value=0` to replace them.
- Pivot tables are powerful for dashboards or quick reports before heavier visualization.
- Always double-check the meaning of your aggregation: averages can mislead if groups are unbalanced.
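
The first two points can be tried on a small example. The frame below is hypothetical toy data, not the Titanic CSV; it deliberately has no 2nd-class female rows so that one cell of the pivot is missing.

```python
import pandas as pd

# Hypothetical toy data: no (Pclass=2, Sex='female') rows exist.
toy = pd.DataFrame({
    "Pclass":   [1, 1, 1, 2, 2, 3, 3, 3],
    "Sex":      ["female", "male", "male", "male", "male", "female", "female", "male"],
    "Survived": [1, 0, 1, 0, 0, 1, 0, 0],
})

# A list of aggregations yields one block of columns per function.
multi = toy.pivot_table(
    index="Pclass",
    columns="Sex",
    values="Survived",
    aggfunc=["mean", "sum", "count"],
)
print(multi)  # the missing combination shows up as NaN

# fill_value replaces the missing cells; 0 is a natural choice for counts.
filled = toy.pivot_table(
    index="Pclass",
    columns="Sex",
    values="Survived",
    aggfunc="count",
    fill_value=0,
)
print(filled)
```

Filling with 0 reads naturally for counts and sums. For a mean it would suggest that nobody survived rather than that nobody was there, so leaving `NaN` in place is often the safer choice for rates.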