# Investigate a Dataset: Titanic Data

## 1. Introduction

In this report, I will investigate the Titanic survivor data using exploratory data analysis.

In the data wrangling phase, I will determine the appropriate data type for each variable in the dataset and show how to handle missing values.

In the data exploration phase, I will first look at each variable and its distribution. After that, I will answer two questions:

1. What factors make people more likely to survive?

2. What can money buy? Here I explore the relationships among passenger class, cabin, and fare.

Last, I will conclude this report by summarizing the findings and stating the limitations of my analysis.

## 2. Data Wrangling

In [2]:
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
%matplotlib inline


### 2.1 Handling Data Types

After reviewing the original description of the dataset on [the Kaggle website](https://www.kaggle.com/c/titanic/data), I chose the data type of each variable as follows. Categorical variables stored as codes will also be converted to columns with more descriptive labels:

| Variable | Definition                                 | Key                                            | Type            |
|----------|--------------------------------------------|------------------------------------------------|-----------------|
| Survived | Survival                                   | 0 = No, 1 = Yes                                | int (Survival)* |
| Pclass   | Ticket class                               | 1 = 1st, 2 = 2nd, 3 = 3rd                      | int (Class)*    |
| Sex      | Sex                                        |                                                | str             |
| Age      | Age in years                               |                                                | float           |
| SibSp    | # of siblings / spouses aboard the Titanic |                                                | int             |
| ParCh    | # of parents / children aboard the Titanic |                                                | int             |
| Ticket   | Ticket number                              |                                                | str             |
| Fare     | Passenger fare                             |                                                | float           |
| Cabin    | Cabin number                               |                                                | str             |
| Embarked | Port of embarkation                        | C = Cherbourg, Q = Queenstown, S = Southampton | str (Port)*     |

\* The name in parentheses is the descriptive column created from the categorical variable. Ticket is treated as a string because some ticket numbers contain letters and punctuation (for example, `A/5 21171`).
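One way to check the chosen types against what pandas actually infers, and to see where values are missing, is sketched below. It is only a minimal illustration: `sample_csv` and `sample_df` are hypothetical names, and the three rows are made up in the layout of the Kaggle file rather than read from `titanic-data.csv`.

```python
import io
import pandas as pd

# Hypothetical three-row sample in the layout of the Kaggle file.
# The rows are made up for illustration and include deliberate gaps.
sample_csv = io.StringIO(
    "PassengerId,Survived,Pclass,Sex,Age,Ticket,Fare,Cabin,Embarked\n"
    "1,0,3,male,22.0,A/5 21171,7.25,,S\n"
    "2,1,1,female,,PC 17599,71.2833,C85,C\n"
    "3,1,3,female,26.0,113803,7.925,,\n"
)

# Read the identifier as a string, as in the report's own read_csv call
sample_df = pd.read_csv(sample_csv, dtype={"PassengerId": str})

# Data type that pandas inferred for each column
column_types = sample_df.dtypes

# Number of missing values in each column
missing_counts = sample_df.isnull().sum()

print(column_types)
print(missing_counts)
```

On the real file, the same two calls (`titanic_df.dtypes` and `titanic_df.isnull().sum()`) would show which columns need attention before the exploration phase.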

In [4]:
data_file = 'titanic-data.csv'
titanic_df = pd.read_csv(
    data_file, 
    dtype = {'PassengerId': str}
)

In [6]:
# Convert categorical variables to more descriptive labels.

# Create descriptive Survival column from Survived column
titanic_df['Survival'] = titanic_df['Survived'].map({0: 'Died', 
                                                     1: 'Survived'})

# Create descriptive Class column from Pclass column 
titanic_df['Class'] = titanic_df['Pclass'].map({1: 'First Class', 
                                                2: 'Second Class', 
                                                3: 'Third Class'})

# Create descriptive Port column from Embarked column
titanic_df['Port'] = titanic_df['Embarked'].map({'C': 'Cherbourg',
                                                 'Q': 'Queenstown',
                                                 'S': 'Southampton'})

# Preview the first five rows, including the new descriptive columns
titanic_df.head()

|   | PassengerId | Survived | Pclass | Name | Sex | Age | SibSp | ParCh | Ticket | Fare | Cabin | Embarked | Survival | Class | Port |
|---|-------------|----------|--------|------|-----|-----|-------|-------|--------|------|-------|----------|----------|-------|------|
| 0 | 1 | 0 | 3 | Braund, Mr. Owen Harris | male | 22.0 | 1 | 0 | A/5 21171 | 7.25 | | S | Died | Third Class | Southampton |
| 1 | 2 | 1 | 1 | Cumings, Mrs. John Bradley (Florence Briggs Th... | female | 38.0 | 1 | 0 | PC 17599 | 71.2833 | C85 | C | Survived | First Class | Cherbourg |
| 2 | 3 | 1 | 3 | Heikkinen, Miss. Laina | female | 26.0 | 0 | 0 | STON/O2. 3101282 | 7.925 | | S | Survived | Third Class | Southampton |
| 3 | 4 | 1 | 1 | Futrelle, Mrs. Jacques Heath (Lily May Peel) | female | 35.0 | 1 | 0 | 113803 | 53.1 | C123 | S | Survived | First Class | Southampton |
| 4 | 5 | 0 | 3 | Allen, Mr. William Henry | male | 35.0 | 0 | 0 | 373450 | 8.05 | | S | Died | Third Class | Southampton |
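One detail of the label conversion is worth keeping in mind: `Series.map` with a dictionary returns a missing value for any entry that has no matching key, which includes entries that were already missing. The sketch below illustrates this on a small hypothetical series (`embarked` and `port` are illustrative names, not columns read from the data file).

```python
import pandas as pd

# Hypothetical port codes, including one missing entry
embarked = pd.Series(["S", "C", None, "Q"])

# Same mapping as used for the Port column in the report
port = embarked.map({"C": "Cherbourg",
                     "Q": "Queenstown",
                     "S": "Southampton"})

# The missing code has no key in the dictionary, so it stays missing
unmapped = port.isnull().sum()

print(port.tolist())
print(unmapped)
```

As a result, any missing values in `Embarked` carry over to `Port` and still need to be handled explicitly.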
