# A look into who survived the sinking of the Titanic

This analysis explores a sample of Titanic passengers for insights into who survived. It answers one question: which factors made a passenger more likely to survive the sinking of the Titanic?

It is a tentative work in progress.

```python
import pandas as pd
import matplotlib.pyplot as plt

passenger_data = pd.read_csv('titanic-data.csv', index_col='PassengerId')

passenger_data.head(10)
```

| PassengerId | Survived | Pclass | Name | Sex | Age | SibSp | Parch | Ticket | Fare | Cabin | Embarked |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 0 | 3 | Braund, Mr. Owen Harris | male | 22.0 | 1 | 0 | A/5 21171 | 7.25 | NaN | S |
| 2 | 1 | 1 | Cumings, Mrs. John Bradley (Florence Briggs Th... | female | 38.0 | 1 | 0 | PC 17599 | 71.2833 | C85 | C |
| 3 | 1 | 3 | Heikkinen, Miss. Laina | female | 26.0 | 0 | 0 | STON/O2. 3101282 | 7.925 | NaN | S |
| 4 | 1 | 1 | Futrelle, Mrs. Jacques Heath (Lily May Peel) | female | 35.0 | 1 | 0 | 113803 | 53.1 | C123 | S |
| 5 | 0 | 3 | Allen, Mr. William Henry | male | 35.0 | 0 | 0 | 373450 | 8.05 | NaN | S |
| 6 | 0 | 3 | Moran, Mr. James | male | NaN | 0 | 0 | 330877 | 8.4583 | NaN | Q |
| 7 | 0 | 1 | McCarthy, Mr. Timothy J | male | 54.0 | 0 | 0 | 17463 | 51.8625 | E46 | S |
| 8 | 0 | 3 | Palsson, Master. Gosta Leonard | male | 2.0 | 3 | 1 | 349909 | 21.075 | NaN | S |
| 9 | 1 | 3 | Johnson, Mrs. Oscar W (Elisabeth Vilhelmina Berg) | female | 27.0 | 0 | 2 | 347742 | 11.1333 | NaN | S |
| 10 | 1 | 2 | Nasser, Mrs. Nicholas (Adele Achem) | female | 14.0 | 1 | 0 | 237736 | 30.0708 | NaN | C |
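
The preview already shows blank `Age` and `Cabin` entries, so it can help to count the missing values per column before computing any rates. A minimal sketch, assuming a handful of rows and columns copied from the preview into an in-memory CSV (the `sample_csv` string is only for illustration; on the real data the same call would be `passenger_data.isna().sum()`):

```python
import io

import pandas as pd

# A few rows and columns copied from the preview above, for illustration only.
sample_csv = """PassengerId,Survived,Sex,Age,Cabin
1,0,male,22.0,
2,1,female,38.0,C85
5,0,male,35.0,
6,0,male,,
7,0,male,54.0,E46
"""

sample = pd.read_csv(io.StringIO(sample_csv), index_col='PassengerId')

# Empty CSV fields are read as NaN; count them column by column.
missing_counts = sample.isna().sum()
print(missing_counts)
```

A column with many missing values, such as `Cabin` here, is a poor candidate for a survival breakdown without some extra handling.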


## Overall survival rate

As an introduction to the analysis, here is the overall survival rate for the sample.

### 38.4% of passengers survived

```python
print(passenger_data['Survived'].mean())
```

```
0.383838383838
```
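
Because `Survived` is coded as 0 (died) or 1 (survived), its mean is simply the share of passengers who survived. The sketch below shows this with the `Survived` values of the ten preview rows. It then uses Python's `:.1%` format specification to turn a rate into a percentage rounded to one decimal place, the precision the headings in this analysis use. The helper name `as_percent` is hypothetical and not part of pandas.

```python
def as_percent(rate):
    """Hypothetical helper: format a 0-1 rate as a percentage with one decimal."""
    # ':.1%' multiplies by 100 and rounds (rather than truncates) to one decimal.
    return f"{rate:.1%}"


# Survived values (0 = died, 1 = survived) for the ten rows of the preview.
survived_flags = [0, 1, 1, 1, 0, 0, 0, 0, 1, 1]

# The mean of a 0/1 column is the fraction of 1s, i.e. the survival rate.
rate = sum(survived_flags) / len(survived_flags)

print(as_percent(rate))
print(as_percent(0.383838383838))
```

Rounding rather than truncating matters at the first decimal: 0.3838... becomes 38.4%, not 38.3%.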


## Survival rate by sex

Both males and females boarded the Titanic. Here is the survival rate for each sex.

### 74.2% of females survived

### 18.9% of males survived

```python
print(passenger_data.groupby('Sex')['Survived'].mean())
```

```
Sex
female    0.742038
male      0.188908
Name: Survived, dtype: float64
```
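
A rate is easier to judge alongside the number of passengers behind it. A minimal sketch using `agg` to report both the mean and the count per group, run on the `Sex` and `Survived` values of the ten preview rows (a toy sample, so these particular rates say nothing about the full data):

```python
import pandas as pd

# Sex and Survived for the ten passengers in the preview above.
sample = pd.DataFrame({
    'Sex': ['male', 'female', 'female', 'female', 'male',
            'male', 'male', 'male', 'female', 'female'],
    'Survived': [0, 1, 1, 1, 0, 0, 0, 0, 1, 1],
})

# Selecting the column before aggregating avoids averaging text columns;
# 'count' reports how many passengers each rate is based on.
summary = sample.groupby('Sex')['Survived'].agg(['mean', 'count'])
print(summary)
```

On `passenger_data` the same expression, `passenger_data.groupby('Sex')['Survived'].agg(['mean', 'count'])`, would show the group sizes behind the 74.2% and 18.9% figures.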


## Survival rate by age

This analysis breaks passengers into three age groups: children (14 and under), adolescents (over 14, up to 20), and adults (over 20). Passengers with no recorded age are left out of this breakdown. The grouping is arbitrary and probably reflects a modern bias, but it may still yield interesting insights.

### 58.4% of children survived

### 36.3% of adolescents survived

### 38.9% of adults survived

```python
# Data wrangling: add an 'age_group' column to passenger_data so that
# the survival rate can be calculated per group.
#
# Note: Age is missing for a number of passengers. pd.cut() leaves their
# age_group empty (NaN) and groupby() skips NaN keys, so the rates below
# cover only passengers with a recorded age.

# Define the bin edges and group names. Each bin includes its upper edge,
# so an age of 14 counts as a child and 20 as an adolescent.
bins = [0, 14, 20, 100]
group_names = ['Children', 'Adolescents', 'Adults']

# Create the 'age_group' column and add it to the data
passenger_data['age_group'] = pd.cut(passenger_data['Age'], bins, labels=group_names)

# Get the survival rate by age group
print(passenger_data.groupby('age_group')['Survived'].mean())
```

```
age_group
Children       0.584416
Adolescents    0.362745
Adults         0.388785
Name: Survived, dtype: float64
```
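
`pd.cut` is easy to get subtly wrong at the bin edges. By default (`right=True`) each interval includes its upper edge and excludes its lower one, so the bins above are (0, 14], (14, 20] and (20, 100]. The sketch below applies the same bins and labels to a few hand-picked ages (illustrative values, not rows from the data) to show where the edges fall and what happens to a missing age:

```python
import numpy as np
import pandas as pd

bins = [0, 14, 20, 100]
group_names = ['Children', 'Adolescents', 'Adults']

# Ages chosen to sit on and just past the bin edges, plus a missing value.
ages = pd.Series([0.42, 14.0, 14.5, 20.0, 20.5, np.nan])

# right=True (the default): 14 lands in Children, 20 in Adolescents,
# and the missing age stays NaN.
groups = pd.cut(ages, bins, labels=group_names)
print(pd.DataFrame({'Age': ages, 'age_group': groups}))
```

An age of exactly 0, or anything above 100, would fall outside every bin and also come back as NaN. Passing `include_lowest=True` to `pd.cut` makes the first bin include 0.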


## Survival rate by ticket class

Passengers boarded the Titanic with one of three classes of ticket: first, second, or third class, recorded in the `Pclass` column as 1, 2, and 3. Here is the survival rate for each ticket class.

### 63.0% of first class ticket holders survived

### 47.3% of second class ticket holders survived

### 24.2% of third class ticket holders survived

```python
print(passenger_data.groupby('Pclass')['Survived'].mean())
```

```
Pclass
1    0.629630
2    0.472826
3    0.242363
Name: Survived, dtype: float64
```
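
The same per-class rates can also be read off a cross-tabulation. A minimal sketch using `pd.crosstab` with `normalize='index'`, run on the `Pclass` and `Survived` values of the ten preview rows (again a toy sample): each row then holds the share of that class who died (column 0) and who survived (column 1), so the column labelled 1 matches what `groupby('Pclass')['Survived'].mean()` returns.

```python
import pandas as pd

# Pclass and Survived for the ten passengers in the preview above.
sample = pd.DataFrame({
    'Pclass':   [3, 1, 3, 1, 3, 3, 1, 3, 3, 2],
    'Survived': [0, 1, 1, 1, 0, 0, 0, 0, 1, 1],
})

# normalize='index' divides each row by its total, giving the share of
# each class who died (0) and who survived (1).
shares = pd.crosstab(sample['Pclass'], sample['Survived'], normalize='index')
print(shares)

# For comparison: the groupby survival rate per class.
rates = sample.groupby('Pclass')['Survived'].mean()
print(rates)
```

The crosstab form shows both outcomes side by side, which may be handier if the table is later plotted as stacked bars with matplotlib.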
