
# Titanic Conditional Probability Exercise
This notebook walks you through calculating conditional probabilities on the Titanic dataset, loaded through **Seaborn**'s `load_dataset`.

You'll compute:
1. $P(\text{Survived} \mid \text{Age} < 18)$  
2. $P(\text{Survived} \mid \text{Sex} = \text{male}, \text{Age} < 18)$  
3. $P(\text{Survived} \mid \text{Pclass} = 1)$  
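
For reference, each of these quantities is an empirical conditional probability, estimated by counting rows. Writing $A$ for the event of interest (survival) and $B$ for the condition, the standard definition and its count-based estimate are:

$$
P(A \mid B) = \frac{P(A \cap B)}{P(B)} \approx \frac{\#\{\text{rows where } A \text{ and } B \text{ hold}\}}{\#\{\text{rows where } B \text{ holds}\}}
$$

The estimate is undefined when no row satisfies $B$.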

Feel free to modify and explore further!

In [3]:
# Install seaborn if needed (Colab usually has it pre‑installed)
!pip install seaborn --quiet

In [4]:
import seaborn as sns
import pandas as pd
import numpy as np

## Load the Titanic dataset

In [5]:
df = sns.load_dataset('titanic')
df.head()

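|   | survived | pclass | sex | age | sibsp | parch | fare | embarked | class | who | adult_male | deck | embark_town | alive | alone |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0 | 3 | male | 22.0 | 1 | 0 | 7.25 | S | Third | man | True | NaN | Southampton | no | False |
| 1 | 1 | 1 | female | 38.0 | 1 | 0 | 71.2833 | C | First | woman | False | C | Cherbourg | yes | False |
| 2 | 1 | 3 | female | 26.0 | 0 | 0 | 7.925 | S | Third | woman | False | NaN | Southampton | yes | True |
| 3 | 1 | 1 | female | 35.0 | 1 | 0 | 53.1 | S | First | woman | False | C | Southampton | yes | False |
| 4 | 0 | 3 | male | 35.0 | 0 | 0 | 8.05 | S | Third | man | True | NaN | Southampton | no | True |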


## Clean the Data
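Remove rows with missing **age** or **survived** values. Every probability below is therefore computed over the 714 passengers with a known age, not the full 891-row dataset, including for conditions that do not involve age.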

In [6]:
df_filtered = df[df['age'].notnull() & df['survived'].notnull()]
df_filtered.shape

(714, 15)
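
The same filter can also be written with `DataFrame.dropna`. The sketch below uses a small made-up frame (`toy`, hypothetical values rather than real passengers) so that it runs without downloading anything, and checks that both forms keep the same rows:

```python
import numpy as np
import pandas as pd

# Toy stand-in for the Titanic frame (hypothetical values, not real passengers)
toy = pd.DataFrame({
    "survived": [1, 0, 1, 0],
    "age": [22.0, np.nan, 4.0, 35.0],
    "pclass": [3, 1, 2, 1],
})

# Boolean-mask version, mirroring the cell above
by_mask = toy[toy["age"].notnull() & toy["survived"].notnull()]

# Equivalent form: drop rows where either listed column is missing
by_dropna = toy.dropna(subset=["age", "survived"])

print(by_mask.equals(by_dropna))                  # both keep the same rows
print(len(toy) - len(by_dropna), "row(s) dropped")
```

Which form to use is mostly a matter of taste; `dropna(subset=...)` tends to read more clearly as the list of columns grows.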

## Helper Function: Conditional Probability

In [7]:
def conditional_probability(df, condition_mask, target_column, target_value):
    """Calculate P(target = target_value | condition_mask)"""
    joint_count = len(df[condition_mask & (df[target_column] == target_value)])
    condition_count = condition_mask.sum()
    return joint_count / condition_count if condition_count else np.nan
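
As a quick sanity check, the helper can be tried on a tiny made-up frame where the answer is easy to count by hand. The data in `toy` is hypothetical, and the helper is repeated so the cell runs on its own. The sketch also shows a shorter pandas expression that should give the same number, and the `NaN` returned when no row matches the condition:

```python
import numpy as np
import pandas as pd

def conditional_probability(df, condition_mask, target_column, target_value):
    """Same helper as above, repeated so this cell is self-contained."""
    joint_count = len(df[condition_mask & (df[target_column] == target_value)])
    condition_count = condition_mask.sum()
    return joint_count / condition_count if condition_count else np.nan

# Hypothetical data: five passengers, four of them under 18
toy = pd.DataFrame({
    "survived": [1, 0, 1, 1, 0],
    "age": [10.0, 40.0, 15.0, 8.0, 17.0],
})

mask = toy["age"] < 18                       # selects ages 10, 15, 8, 17
p = conditional_probability(toy, mask, "survived", 1)

# Alternative: the mean of a boolean Series is the fraction of True values
p_alt = toy.loc[mask, "survived"].eq(1).mean()

# A condition that matches nothing yields NaN instead of a division error
p_empty = conditional_probability(toy, toy["age"] > 100, "survived", 1)

print(p, p_alt, p_empty)  # 3 of the 4 minors survived, so p is 0.75
```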

## 1️⃣ Probability of Survival given Age < 18

In [8]:
mask_under_18 = df_filtered['age'] < 18
p_survived_under_18 = conditional_probability(df_filtered, mask_under_18, 'survived', 1)
print(f"P(Survived | Age < 18) = {p_survived_under_18:.2f}")

P(Survived | Age < 18) = 0.54


## 2️⃣ Probability of Survival for **Male** passengers under 18

In [9]:
mask_male_under_18 = (df_filtered['age'] < 18) & (df_filtered['sex'] == 'male')
p_survived_male_under_18 = conditional_probability(df_filtered, mask_male_under_18, 'survived', 1)
print(f"P(Survived | Male & Age < 18) = {p_survived_male_under_18:.2f}")

P(Survived | Male & Age < 18) = 0.40
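
To compare several groups at once instead of building one mask per group, one option is `groupby`. This is a sketch on made-up data (`toy` is hypothetical), not the notebook's actual result: it restricts to passengers under 18 and then takes the survival rate within each `sex` value.

```python
import pandas as pd

# Hypothetical data: four minors (two male, two female) and two adults
toy = pd.DataFrame({
    "survived": [1, 0, 1, 1, 0, 1],
    "sex": ["male", "male", "female", "female", "male", "female"],
    "age": [10.0, 16.0, 5.0, 12.0, 40.0, 30.0],
})

minors = toy[toy["age"] < 18]

# Mean of a 0/1 column within each group = P(survived | sex, age < 18)
rates = minors.groupby("sex")["survived"].mean()
print(rates)
```

In this toy frame one of the two male minors survived and both female minors did, so `rates["male"]` is 0.5 and `rates["female"]` is 1.0.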


## 3️⃣ Probability of Survival for **First‑Class** passengers

In [None]:
mask_first_class = df_filtered['pclass'] == 1
p_survived_first_class = conditional_probability(df_filtered, mask_first_class, 'survived', 1)
print(f"P(Survived | First‑Class) = {p_survived_first_class:.2f}")
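
Note that `df_filtered` excludes passengers with an unknown age even though this condition does not involve age, so the estimate can differ from one computed on the unfiltered `df`. The sketch below illustrates the effect on made-up data (`toy` is hypothetical and deliberately extreme); it says nothing about the size of the effect in the real dataset.

```python
import numpy as np
import pandas as pd

# Hypothetical first-class passengers; the two with unknown age did not survive
toy = pd.DataFrame({
    "survived": [1, 1, 0, 0],
    "pclass": [1, 1, 1, 1],
    "age": [30.0, 45.0, np.nan, np.nan],
})
toy_filtered = toy[toy["age"].notnull()]

# P(survived | pclass == 1) over all rows: 2 of 4
p_all = toy.loc[toy["pclass"] == 1, "survived"].mean()

# Same condition over rows with a known age only: 2 of 2
p_known_age = toy_filtered.loc[toy_filtered["pclass"] == 1, "survived"].mean()

print(p_all, p_known_age)
```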

## 🎯 Your Turn
Try changing the conditions to explore other questions, e.g.:
- Probability of survival given **embark_town == 'Southampton'**
- Probability of survival given **parch > 0** (travelling with parents/children)

Happy exploring!

In [10]:
mask_from_southampton = df_filtered['embark_town'] == 'Southampton'
p_survived_from_southampton = conditional_probability(df_filtered, mask_from_southampton, 'survived', 1)
print(f"P(Survived | From Southampton) = {p_survived_from_southampton:.2f}")

P(Survived | From Southampton) = 0.36
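
For a categorical column such as `embark_town`, `pd.crosstab` with `normalize="index"` gives the conditional distribution of the outcome for every category in one table. A minimal sketch on made-up data (`toy` is hypothetical):

```python
import pandas as pd

# Hypothetical data: three towns with different survival rates
toy = pd.DataFrame({
    "survived": [1, 0, 0, 1, 1, 0],
    "embark_town": ["Southampton", "Southampton", "Southampton",
                    "Cherbourg", "Cherbourg", "Queenstown"],
})

# Each row sums to 1; the column labelled 1 holds P(survived = 1 | embark_town)
table = pd.crosstab(toy["embark_town"], toy["survived"], normalize="index")
print(table)
```

Here one of three Southampton rows survived, so `table.loc["Southampton", 1]` is about 0.33.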


In [12]:
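# parch counts parents *and* children aboard, so parch > 0 means
# "travelling with at least one parent or child", not only "with parents"
mask_with_parents = df_filtered['parch'] > 0
p_survived_with_parents = conditional_probability(df_filtered, mask_with_parents, 'survived', 1)
print(f"P(Survived | parch > 0) = {p_survived_with_parents:.2f}")

P(Survived | parch > 0) = 0.54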
