# Lab Instructions

You have been hired by James Cameron to create profiles of two characters for a reboot of the Titanic Movie: one that is most likely to survive the sinking and one that is least likely to survive.  Mr. Cameron wants this reboot to be as historically accurate as possible, so your profile of each character should be backed up with data and visualizations.

Each character profile should include information on their:
* Age, fare
* Sex
* Passenger class
* Travel companions (including both parents/children and siblings/spouse)
* Port of departure (indicated by the Embarked feature in the dataset)

For quantitative features like `Age` and `Fare`, you will need to use the `.loc` indexer we learned in class (or something similar) to place individuals in categories. How you choose to do this is up to you, but make sure you explain your reasoning.
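
As a minimal sketch of that approach, the cell below uses `.loc` with boolean masks to sort ages into categories. The toy DataFrame, the `age_category` column name, and the cutoffs (12 and 60) are illustrative assumptions, not values prescribed by the lab.

```python
import pandas as pd

# Toy data standing in for the real dataset (values are made up for illustration)
toy = pd.DataFrame({"Age": [4, 17, 30, 61, None]})

# Start every row as "unknown", then overwrite the rows that match each mask.
# Comparisons against a missing Age are False, so that row stays "unknown".
toy["age_category"] = "unknown"
toy.loc[toy["Age"] <= 12, "age_category"] = "child"
toy.loc[(toy["Age"] > 12) & (toy["Age"] <= 60), "age_category"] = "adult"
toy.loc[toy["Age"] > 60, "age_category"] = "senior"

print(toy)
```

`pd.cut` produces the same kind of grouping in one call, which is why the notebook below uses it; either way, the choice of cutoffs is the part that needs explaining.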

You should include at least one visualization for each element of the character profile (age, sex, passenger class, etc.) as evidence.

After you have developed your two character profiles, use your Pandas data wrangling skills to identify at least one real passenger in the dataset who fits each profile. Print out the names of these individuals. Look them up in [Encyclopedia Titanica](https://www.encyclopedia-titanica.org/) (or a similar resource).

Tell Mr. Cameron at least one thing about the real passengers who fit your two character profiles that you learned from an external resource. You need one interesting fact about a person who fits the profile of "most likely to survive" and one interesting fact about a person who fits the profile of "least likely to survive".



In [2]:
import pandas as pd

df = pd.read_csv('titanic_passengers.csv')



In [None]:
# Quick dataset inspection

# print explicitly: a notebook only auto-displays the last expression in a cell
print(df.head())

# show shape and a brief summary
print('shape:', df.shape)
print('\ninfo:')
df.info()


## Approach and goals

This lab asks for two character profiles, one most likely to survive and one least likely to survive, each backed by data and visualizations.

Plan:
- Inspect the dataset and create useful categorical fields (age groups, fare groups, and travel companions).
- Compute survival rate by Sex, Pclass, Age group, travel companions, and Embarked.
- Produce at least one visualization for each feature used in the profile.
- Choose at least one real passenger that matches each profile and print their names.
- Add an external research note with a fact about each real passenger (use Encyclopedia Titanica or similar).


In [3]:
df['Fare'].describe()

count    891.000000
mean      32.204208
std       49.693429
min        0.000000
25%        7.910400
50%       14.454200
75%       31.000000
max      512.329200
Name: Fare, dtype: float64

**Interpretation: Fare**

The `describe()` output above shows the distribution of `Fare` values. Fares span a wide range, so I'll create a quartile-based fare category to better compare survival across fare levels.


In [None]:
# feature engineering: companions, age_group, fare_group
import matplotlib.pyplot as plt

# companions = siblings/spouses + parents/children
df['companions'] = df['SibSp'] + df['Parch']

# create simple age groups (NaNs will remain NaN)
age_bins = [0, 12, 18, 35, 60, 120]
age_labels = ['child (0-12)', 'teen (13-18)', 'young adult (19-35)', 'adult (36-60)', 'senior (61+)']
df['age_group'] = pd.cut(df['Age'], bins=age_bins, labels=age_labels, right=True)

# create quartile-based fare groups (fill missing Fare with 0 so they fall in 'Low')
df['fare_group'] = pd.qcut(df['Fare'].fillna(0), 4, labels=['Low', 'MedLow', 'MedHigh', 'High'])

# Calculate survival rates by category
survival_by_sex = df.groupby('Sex')['Survived'].mean()
survival_by_pclass = df.groupby('Pclass')['Survived'].mean()
# observed=False keeps empty categories and avoids the pandas FutureWarning for categorical groupers
survival_by_age_group = df.groupby('age_group', observed=False)['Survived'].mean()
survival_by_companions = df.groupby(pd.cut(df['companions'], bins=[-1,0,1,3,100], labels=['alone','1_comp','2-3','4+']), observed=False)['Survived'].mean()
survival_by_embarked = df.groupby('Embarked')['Survived'].mean()
survival_by_fare = df.groupby('fare_group', observed=False)['Survived'].mean()

# show computed values
print('Survival rate by sex:\n', survival_by_sex)
print('\nSurvival rate by passenger class (Pclass):\n', survival_by_pclass)
print('\nSurvival rate by age group:\n', survival_by_age_group)
print('\nSurvival rate by number of travel companions:\n', survival_by_companions)
print('\nSurvival rate by port of embarkation (Embarked):\n', survival_by_embarked)
print('\nSurvival rate by fare quartile:\n', survival_by_fare)

# small visualizations (these will render in notebook) - bar charts
plt.figure(figsize=(10,6))
survival_by_sex.plot(kind='bar', title='Survival rate by Sex')
plt.ylabel('Survival rate')
plt.show()

plt.figure(figsize=(10,6))
survival_by_pclass.plot(kind='bar', title='Survival rate by Passenger Class (Pclass)')
plt.ylabel('Survival rate')
plt.show()

plt.figure(figsize=(10,6))
survival_by_age_group.plot(kind='bar', title='Survival rate by Age Group')
plt.ylabel('Survival rate')
plt.show()
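
The three charts above cover sex, passenger class, and age. The remaining profile elements (travel companions, port of departure, and fare) can be charted the same way. The sketch below is one possible layout; the `plot_rate` helper and the toy DataFrame are hypothetical stand-ins, and in the notebook you would pass the real `df` after the feature-engineering cell has run.

```python
import pandas as pd
import matplotlib.pyplot as plt

def plot_rate(data, column, ax):
    """Bar chart of mean survival per category of `column` (hypothetical helper)."""
    rates = data.groupby(column, observed=False)["Survived"].mean()
    rates.plot(kind="bar", ax=ax, title=f"Survival rate by {column}")
    ax.set_ylabel("Survival rate")
    ax.set_ylim(0, 1)
    return rates

# Toy rows for illustration only; replace `toy` with the notebook's `df`
toy = pd.DataFrame({
    "Survived":   [1, 0, 1, 0, 0, 1],
    "Embarked":   ["C", "S", "C", "S", "Q", "Q"],
    "companions": [0, 0, 1, 2, 5, 1],
    "fare_group": ["High", "Low", "High", "Low", "MedLow", "MedHigh"],
})

# One subplot per remaining feature, drawn side by side
fig, axes = plt.subplots(1, 3, figsize=(15, 4))
results = {
    col: plot_rate(toy, col, ax)
    for col, ax in zip(["Embarked", "companions", "fare_group"], axes)
}
fig.tight_layout()
```

In a notebook the figure renders when the cell finishes; in a plain script, add `plt.show()` at the end. This sketch groups on the raw `companions` count, so for the real data you may prefer the binned version used for `survival_by_companions`.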


In [None]:
# Find combinations with the highest and lowest survival rates
# (print explicitly: only the last expression in a cell is displayed automatically)
combo = df.groupby(['Sex','Pclass','age_group'], observed=False)['Survived'].mean().sort_values(ascending=False)
print(combo.head(12))
print(combo.dropna().tail(5))

# Candidates for 'most likely to survive': first-class women who survived
most_likely = df[(df['Survived']==1) & (df['Sex']=='female') & (df['Pclass']==1)]
print(most_likely[['Name','Sex','Pclass','Age','Fare','Survived']].head(5))

# Candidates for 'least likely to survive': third-class men who did not survive
least_likely = df[(df['Survived']==0) & (df['Sex']=='male') & (df['Pclass']==3)]
print(least_likely[['Name','Sex','Pclass','Age','Fare','Survived']].head(5))
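
A mean computed over only a handful of passengers can look extreme by chance, so it may help to show each group's size next to its survival rate. The sketch below is one way to do that; the toy data and the `min_size` cutoff of 2 are assumptions for illustration (with the full dataset a larger cutoff, such as 10, is probably more sensible).

```python
import pandas as pd

# Toy rows for illustration only; replace `toy` with the notebook's `df`
toy = pd.DataFrame({
    "Sex":      ["female", "female", "female", "male", "male", "male", "male"],
    "Pclass":   [1, 1, 3, 3, 3, 3, 1],
    "Survived": [1, 1, 0, 0, 0, 1, 1],
})

# Survival rate and group size side by side, highest rate first
summary = (
    toy.groupby(["Sex", "Pclass"])["Survived"]
       .agg(rate="mean", n="count")
       .sort_values("rate", ascending=False)
)

# Keep only groups with at least `min_size` passengers (assumed threshold)
min_size = 2
reliable = summary[summary["n"] >= min_size]
print(reliable)
```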


## Character profiles and next steps

The survival rates above match the well-documented historical pattern: women and first-class passengers survived at higher rates, while men in third class were less likely to survive.

Suggested character profiles (examples you can use or adjust):

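- Most likely to survive: female, first-class, younger (child, teen, or young adult), traveling with a small number of companions, in a higher fare group, and departing from whichever port shows the highest survival rate in the `Embarked` output above.
- Least likely to survive: male, third-class, adult, traveling alone or with many companions, in a lower fare group, and departing from whichever port shows the lowest survival rate in the `Embarked` output above.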

From the outputs above you can pick specific names that match each profile (we printed candidate names). Please look up the selected passengers on Encyclopedia Titanica (https://www.encyclopedia-titanica.org/) and add at least one historical fact about each chosen passenger (for example: occupation, hometown, family details, or a short biography snippet). 
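
Picking names is easier when the whole profile is written as a single filter. The sketch below assumes the `fare_group` and `companions` columns created earlier; the toy rows, the placeholder names, and the specific profile values are hypothetical and should be replaced with whatever your own survival rates support.

```python
import pandas as pd

# Toy rows with placeholder names; replace `toy` with the notebook's `df`
toy = pd.DataFrame({
    "Name":       ["Passenger A", "Passenger B", "Passenger C"],
    "Sex":        ["female", "male", "female"],
    "Pclass":     [1, 3, 1],
    "Embarked":   ["C", "S", "S"],
    "fare_group": ["High", "Low", "High"],
    "companions": [1, 0, 6],
    "Survived":   [1, 0, 1],
})

# One entry per profile element; edit these to match your findings
profile = {"Sex": "female", "Pclass": 1, "Embarked": "C", "fare_group": "High"}

# Build a boolean mask that is True only where every condition holds
mask = pd.Series(True, index=toy.index)
for column, value in profile.items():
    mask &= toy[column] == value
mask &= toy["companions"].between(1, 3)  # "small number of companions" (assumed range)

matches = toy.loc[mask, "Name"].tolist()
print(matches)
```

Keeping the profile in a dictionary means the same loop can be reused for the "least likely to survive" character by swapping in a second set of values.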

When you're finished:
1. Run All Cells so the notebook shows outputs for the instructor.
2. Save and commit the notebook and push to your GitHub repository.

