# Building a Classifier for Survival on the Titanic

![Titanic sinking](https://upload.wikimedia.org/wikipedia/commons/4/4f/Titanic_the_sinking.jpg)
Source: [Wikimedia Commons](https://commons.wikimedia.org/wiki/File:Titanic_the_sinking.jpg?uselang=de)



> The captain can, by simply moving an electric switch, instantly close the doors throughout, practically making the vessel unsinkable.   

-- _Irish News and Belfast Morning News, June 1st, 1911, on the incomplete Titanic._

RMS Titanic was a British passenger liner that sank in the North Atlantic Ocean in the early morning hours of 15 April 1912, after it collided with an iceberg during its maiden voyage from Southampton to New York City. There were an estimated 2,224 passengers and crew aboard the ship, and more than 1,500 died. 

This data set is a partial list of the ship's passengers. It records several attributes of each person as well as whether they survived the disaster.


## Task

**Your task is to build a machine learning model that classifies passengers as survivors or victims.**

Apply your new knowledge of:
- **[📓 classification](../ml/ml-classification-intro.ipynb)**
- **[📓 feature engineering and selection](../ml/ml-feature-engineering.ipynb)**
- **[📓 algorithm selection and hyperparameter tuning](../ml/ml-algo-hyperparameter.ipynb)**

**Evaluate and present the model performance using:**
- confusion matrices
- the _precision_, _recall_ and _F1_ metrics
- cross-validation
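
To make the relationship between the confusion matrix and the three metrics concrete, here is a minimal sketch in plain Python. The helper names (`confusion_counts`, `precision_recall_f1`) and the labels are made up for illustration and are not part of the data set or of any library. In practice you would probably reach for `confusion_matrix`, `precision_score`, `recall_score` and `f1_score` from `sklearn.metrics` instead.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Count the four cells of a binary confusion matrix."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    return tp, fp, fn, tn


def precision_recall_f1(tp, fp, fn):
    """Derive precision, recall and F1 from the confusion counts."""
    # Precision: share of predicted survivors who really survived.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    # Recall: share of actual survivors the model found.
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # F1: harmonic mean of precision and recall.
    if precision + recall:
        f1 = 2 * precision * recall / (precision + recall)
    else:
        f1 = 0.0
    return precision, recall, f1


# Made-up labels for illustration: 1 = survived, 0 = did not survive.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]

tp, fp, fn, tn = confusion_counts(y_true, y_pred)
precision, recall, f1 = precision_recall_f1(tp, fp, fn)
print(f"TP={tp} FP={fp} FN={fn} TN={tn}")
print(f"precision={precision:.2f} recall={recall:.2f} F1={f1:.2f}")
```

Which class counts as "positive" is a choice: here survival is treated as the positive class, and the metrics change if you flip it.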

Here is the documentation of the dataset for reference:

    with open('../.assets/data/titanic/titanic-documentation.txt', 'r') as doc:
        print(doc.read())

Data Dictionary

| Variable | Definition | Key |
| --- | --- | --- |
| survival | Survival | 0 = No, 1 = Yes |
| pclass | Ticket class | 1 = 1st, 2 = 2nd, 3 = 3rd |
| sex | Sex | |
| Age | Age in years | |
| sibsp | # of siblings / spouses aboard the Titanic | |
| parch | # of parents / children aboard the Titanic | |
| ticket | Ticket number | |
| fare | Passenger fare | |
| cabin | Cabin number | |
| embarked | Port of Embarkation | C = Cherbourg, Q = Queenstown, S = Southampton |


Variable Notes

pclass: A proxy for socio-economic status (SES)
- 1st = Upper
- 2nd = Middle
- 3rd = Lower

age: Age is fractional if less than 1. If the age is estimated, it is in the form of xx.5.

sibsp: The dataset defines family relations in this way:
- Sibling = brother, sister, stepbrother, stepsister
- Spouse = husband, wife (mistresses and fiancés were ignored)

parch: The dataset defines family relations in this way:
- Parent = mother, father
- Child = daughter, son, stepdaughter, stepson

Some children travelled only with a nanny, therefore parch=0 for them.
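
The variable notes above suggest a few derived features. The following is only a sketch of how they could be computed with pandas. The rows are made up for illustration, the column names are taken from the data dictionary (the actual file may spell or capitalise them differently), and the new column names `family_size`, `is_alone` and `age_estimated` are hypothetical.

```python
import pandas as pd

# Made-up rows for illustration only; not taken from the real data set.
passengers = pd.DataFrame({
    "sibsp": [1, 0, 0, 3],
    "parch": [0, 0, 2, 1],
    "Age": [22.0, 0.75, 28.5, None],
})

# Family size: the passenger plus siblings/spouses plus parents/children.
passengers["family_size"] = passengers["sibsp"] + passengers["parch"] + 1

# Flag passengers with no recorded relatives aboard.
passengers["is_alone"] = passengers["family_size"] == 1

# Estimated ages have the form xx.5; ages below 1 are fractional for
# a different reason, so they are excluded. Missing ages yield False.
passengers["age_estimated"] = (
    (passengers["Age"] >= 1) & (passengers["Age"] % 1 == 0.5)
)

print(passengers)
```

Keep the nanny caveat in mind when interpreting `is_alone`: a child with `parch=0` was not necessarily travelling without a caretaker.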

## Workflow

Start your ML workflow here.
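
If you are unsure how to begin, one possible skeleton is sketched below. It is a starting point under stated assumptions, not a reference solution. The data is synthetic and random so that the cell runs on its own, which means the printed scores carry no meaning. The choice of features, the median imputation and the logistic regression are placeholders for your own decisions, and the column names follow the data dictionary and may need adjusting for the real file.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder

# Synthetic stand-in data; replace with the real passenger table.
rng = np.random.default_rng(0)
n = 100
data = pd.DataFrame({
    "pclass": rng.integers(1, 4, size=n),
    "sex": rng.choice(["male", "female"], size=n),
    "Age": rng.uniform(1, 70, size=n),
    "fare": rng.uniform(5, 100, size=n),
})
data.loc[::10, "Age"] = np.nan  # simulate missing ages
y = pd.Series(rng.integers(0, 2, size=n), name="survival")  # random labels

numeric = ["Age", "fare"]
categorical = ["pclass", "sex"]

# Impute missing numeric values and one-hot encode the categorical columns.
preprocess = ColumnTransformer([
    ("num", SimpleImputer(strategy="median"), numeric),
    ("cat", OneHotEncoder(handle_unknown="ignore"), categorical),
])

# Bundling preprocessing and classifier means each cross-validation fold
# fits the imputer and encoder on its own training part only.
model = Pipeline([
    ("preprocess", preprocess),
    ("classifier", LogisticRegression(max_iter=1000)),
])

# 5-fold cross-validation, scored with F1 for the positive class.
scores = cross_val_score(model, data, y, cv=5, scoring="f1")
print(f"F1 per fold: {scores.round(2)}")
print(f"mean={scores.mean():.2f} std={scores.std():.2f}")
```

Putting the preprocessing inside the `Pipeline` rather than applying it to the full table first avoids leaking information from the validation folds into training.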

---
_This notebook is licensed under a [Creative Commons Attribution 4.0 International License (CC BY 4.0)](https://creativecommons.org/licenses/by/4.0/). Copyright © 2018 [Point 8 GmbH](https://point-8.de)_