# 2. Statistical Testing - Titanic

**Goal:** Use hypothesis tests to check whether survival is associated with sex and whether mean age differs between survivors and non-survivors.

In [1]:
import pandas as pd
import numpy as np
from scipy import stats
import seaborn as sns

df = sns.load_dataset('titanic')

## 1. Sex vs. Survival (Chi-Square)

**H₀:** Survival is independent of Sex.  
**H₁:** Survival is dependent on Sex.

In [2]:
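# Contingency table of counts: rows = sex, columns = survived (0/1)
table = pd.crosstab(df['sex'], df['survived'])
# For a 2x2 table, chi2_contingency applies Yates' continuity correction by default
chi2, p, dof, ex = stats.chi2_contingency(table)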

print(f"Chi-Square Statistic: {chi2:.4f}")
print(f"P-Value: {p:.5e}")

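# The p-value measures evidence against H₀, not the strength of the association
if p < 0.05:
    print("\n✅ REJECT H₀: Survival is associated with Sex.")
else:
    print("\n❌ FAIL TO REJECT H₀: No significant association between Survival and Sex.")

Chi-Square Statistic: 260.7170
P-Value: 1.19736e-58

✅ REJECT H₀: Survival is associated with Sex.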
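
A small p-value indicates evidence of an association but says nothing about how strong it is, and the chi-square approximation is usually considered reliable only when the expected counts are not too small. The sketch below is one way to check both. It reports the smallest expected count and Cramér's V as an effect size. The function name `association_summary` is hypothetical and not part of the original notebook, and basing Cramér's V on the uncorrected statistic is a common convention assumed here rather than something the notebook specifies.

```python
import numpy as np
from scipy import stats


def association_summary(table):
    """Return the smallest expected count and Cramér's V for a contingency table.

    Hypothetical helper for illustration; not part of the original notebook.
    """
    observed = np.asarray(table, dtype=float)
    # Uncorrected chi-square: Cramér's V is conventionally based on this statistic
    chi2, _, _, expected = stats.chi2_contingency(observed, correction=False)
    n = observed.sum()
    k = min(observed.shape) - 1  # smaller table dimension minus one
    return {
        "min_expected": expected.min(),
        "cramers_v": np.sqrt(chi2 / (n * k)),
    }
```

Calling `association_summary(table)` with the crosstab built in the cell above returns both values. A common rule of thumb is that every expected count should be at least 5, and Cramér's V ranges from 0 (no association) to 1 (perfect association).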


## 2. Age: Survivors vs. Non-Survivors (T-Test)

**H₀:** Mean age is the same for both groups.  
**H₁:** Mean age differs between the two groups.

In [3]:
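# 'age' contains missing values, so drop NaNs within each group
survived_age = df.loc[df['survived'] == 1, 'age'].dropna()
died_age = df.loc[df['survived'] == 0, 'age'].dropna()

# equal_var=False selects Welch's t-test, which does not assume equal variances
t_stat, p_val = stats.ttest_ind(survived_age, died_age, equal_var=False)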

print(f"T-Statistic: {t_stat:.4f}")
print(f"P-Value: {p_val:.4f}")

T-Statistic: -2.0460
P-Value: 0.0412
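
The cell above prints the test statistic and p-value but stops short of a decision. With the same 0.05 threshold used for the chi-square test, the reported p-value of 0.0412 leads to rejecting H₀. The negative t-statistic means the survivors' sample mean age is lower than the non-survivors', because `survived_age` is passed first. A minimal sketch of that decision step follows. The helper name `welch_decision` and its `alpha` parameter are illustrative assumptions, not part of the original notebook.

```python
import numpy as np
from scipy import stats


def welch_decision(sample_a, sample_b, alpha=0.05):
    """Run Welch's t-test and report group means alongside the decision.

    Hypothetical helper for illustration; not part of the original notebook.
    """
    a = np.asarray(sample_a, dtype=float)
    b = np.asarray(sample_b, dtype=float)
    # equal_var=False gives Welch's t-test (no equal-variance assumption)
    t_stat, p_val = stats.ttest_ind(a, b, equal_var=False)
    return {
        "mean_a": a.mean(),
        "mean_b": b.mean(),
        "t_stat": t_stat,       # sign follows mean_a - mean_b
        "p_value": p_val,
        "reject_h0": bool(p_val < alpha),
    }
```

For this notebook the call would be `welch_decision(survived_age, died_age)`. Reporting the two means next to the p-value shows how large the age gap is, which the p-value alone does not.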
