<a href="https://colab.research.google.com/github/arnaud22560/Titanic_Disaster/blob/main/Titanic_Notebook.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Titanic - Machine Learning from Disaster
**Arnaud Le Doeuff** 

*December 2020*

Predict survival on the Titanic and get familiar with ML basics

[Kaggle competition](https://www.kaggle.com/c/titanic/overview)

![Titanic image](https://www.maisonapart.com/images/auto/640-480-c/20130624_180353_titanic-visuel-communique.jpg)



# Part 1: Get started
## The challenge
The competition is simple: use the Titanic passenger data (name, age, ticket price, etc.) to predict who survived and who died.

## The data
There are three files in the data: (1) train.csv, (2) test.csv, and (3) gender_submission.csv.

In [1]:
import pandas as pd
import numpy as np

Import the data from GitHub

In [11]:
train_data = pd.read_csv("https://raw.githubusercontent.com/arnaud22560/Titanic_Disaster/main/data/train.csv")
test_data = pd.read_csv("https://raw.githubusercontent.com/arnaud22560/Titanic_Disaster/main/data/test.csv")
gender_submission = pd.read_csv("https://raw.githubusercontent.com/arnaud22560/Titanic_Disaster/main/data/gender_submission.csv")

### (1) train.csv
train.csv contains the details of a subset of the passengers on board (891 passengers, to be exact, each with their own row in the table).

In [6]:
train_data.head()

| | PassengerId | Survived | Pclass | Name | Sex | Age | SibSp | Parch | Ticket | Fare | Cabin | Embarked |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 0 | 3 | Braund, Mr. Owen Harris | male | 22.0 | 1 | 0 | A/5 21171 | 7.25 | NaN | S |
| 1 | 2 | 1 | 1 | Cumings, Mrs. John Bradley (Florence Briggs Th... | female | 38.0 | 1 | 0 | PC 17599 | 71.2833 | C85 | C |
| 2 | 3 | 1 | 3 | Heikkinen, Miss. Laina | female | 26.0 | 0 | 0 | STON/O2. 3101282 | 7.925 | NaN | S |
| 3 | 4 | 1 | 1 | Futrelle, Mrs. Jacques Heath (Lily May Peel) | female | 35.0 | 1 | 0 | 113803 | 53.1 | C123 | S |
| 4 | 5 | 0 | 3 | Allen, Mr. William Henry | male | 35.0 | 0 | 0 | 373450 | 8.05 | NaN | S |
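A quick way to check the claims above (one row per passenger, and which columns have gaps) is to look at the frame's shape, test that `PassengerId` is unique, and count missing values per column. The sketch below is only illustrative: it retypes a few columns of the five rows shown by `train_data.head()` into a small frame called `sample` (a hypothetical name, not part of the notebook) so that it runs without a download. The same calls on the full `train_data` should report 891 rows.

```python
import numpy as np
import pandas as pd

# A few columns of the five rows displayed by train_data.head(), retyped by hand
# so the example runs offline. `sample` is an illustrative name only.
sample = pd.DataFrame({
    "PassengerId": [1, 2, 3, 4, 5],
    "Survived": [0, 1, 1, 1, 0],
    "Pclass": [3, 1, 3, 1, 3],
    "Sex": ["male", "female", "female", "female", "male"],
    "Age": [22.0, 38.0, 26.0, 35.0, 35.0],
    "Cabin": [np.nan, "C85", np.nan, "C123", np.nan],
})

n_rows, n_cols = sample.shape                    # (number of rows, number of columns)
one_row_each = sample["PassengerId"].is_unique   # True when no passenger appears twice
missing = sample.isna().sum()                    # count of missing values per column

print(n_rows, n_cols)
print(one_row_each)
print(missing)
```

In these five rows only `Cabin` has missing values; the full training set may show gaps in other columns as well, which is worth knowing before choosing features.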


The values in the second column ("**Survived**") can be used to determine whether each passenger survived or not:

* if it's a "1", the passenger survived.
* if it's a "0", the passenger died.

For instance, the first passenger listed in train.csv is *Mr. Owen Harris Braund*. He was 22 years old when he died on the Titanic.
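Because **Survived** is coded as 0/1, simple arithmetic on the column is already meaningful: its sum is the number of survivors and its mean is the share of survivors. A minimal sketch, using only the five `Survived` values shown in the table above (the variable names are illustrative, not from the notebook):

```python
import pandas as pd

# Survived values of the five rows shown by train_data.head().
survived = pd.Series([0, 1, 1, 1, 0], name="Survived")

n_survived = int(survived.sum())           # each 1 counts one survivor
n_died = int((survived == 0).sum())        # count the rows coded 0
share_survived = survived.mean()           # fraction of survivors, between 0 and 1

# Optional: human-readable labels for display purposes.
labels = survived.map({0: "died", 1: "survived"})

print(n_survived, n_died, share_survived)
print(labels.tolist())
```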
___

### (2) test.csv


Using the patterns you find in train.csv, you have to predict whether the other 418 passengers on board (in test.csv) survived.

In [10]:
test_data.head()

| | PassengerId | Pclass | Name | Sex | Age | SibSp | Parch | Ticket | Fare | Cabin | Embarked |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 892 | 3 | Kelly, Mr. James | male | 34.5 | 0 | 0 | 330911 | 7.8292 | NaN | Q |
| 1 | 893 | 3 | Wilkes, Mrs. James (Ellen Needs) | female | 47.0 | 1 | 0 | 363272 | 7.0 | NaN | S |
| 2 | 894 | 2 | Myles, Mr. Thomas Francis | male | 62.0 | 0 | 0 | 240276 | 9.6875 | NaN | Q |
| 3 | 895 | 3 | Wirz, Mr. Albert | male | 27.0 | 0 | 0 | 315154 | 8.6625 | NaN | S |
| 4 | 896 | 3 | Hirvonen, Mrs. Alexander (Helga E Lindqvist) | female | 22.0 | 1 | 1 | 3101298 | 12.2875 | NaN | S |


### (3) gender_submission.csv
The gender_submission.csv file is provided as an **example** that shows how you should structure your predictions. It predicts that all female passengers survived, and all male passengers died.

In [12]:
gender_submission.head()

| | PassengerId | Survived |
|---|---|---|
| 0 | 892 | 0 |
| 1 | 893 | 1 |
| 2 | 894 | 0 |
| 3 | 895 | 0 |
| 4 | 896 | 1 |
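The rule behind gender_submission.csv (women survive, men die) is easy to reproduce, which helps show how a submission is structured. The sketch below is an assumption about how such a file could be built, not necessarily how Kaggle generated it. `gender_baseline` and `test_head` are hypothetical names, and `test_head` holds only the `PassengerId` and `Sex` values of the five rows shown by `test_data.head()`.

```python
import pandas as pd

def gender_baseline(passengers: pd.DataFrame) -> pd.DataFrame:
    """Predict 1 (survived) for women and 0 (died) for men."""
    return pd.DataFrame({
        "PassengerId": passengers["PassengerId"],
        # The boolean comparison becomes 1 for "female" and 0 otherwise.
        "Survived": (passengers["Sex"] == "female").astype(int),
    })

# PassengerId and Sex of the five rows displayed by test_data.head().
test_head = pd.DataFrame({
    "PassengerId": [892, 893, 894, 895, 896],
    "Sex": ["male", "female", "male", "male", "female"],
})

baseline = gender_baseline(test_head)
print(baseline)
```

The predictions for these five passengers match the first five rows of gender_submission.csv shown above. On the full data, `gender_baseline(test_data).to_csv("submission.csv", index=False)` would write the two-column file, where `submission.csv` is just an example file name.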


## Explore the dataset

Let's check the survival rate of women and men.

In [13]:
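# Survived values (0 or 1) of the female passengers in the training data
women = train_data.loc[train_data.Sex == 'female']["Survived"]
# Survivors divided by total: a fraction between 0 and 1, despite the "%" label below
rate_women = sum(women)/len(women)

print("% of women who survived:", rate_women)

# Same computation for the male passengers
men = train_data.loc[train_data.Sex == 'male']["Survived"]
rate_men = sum(men)/len(men)

print("% of men who survived:", rate_men)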

% of women who survived: 0.7420382165605095
% of men who survived: 0.18890814558058924


From this you can see that about 74% of the women in the training data survived, whereas only 19% of the men lived to tell about it. Since gender seems to be such a strong indicator of survival, the submission file in gender_submission.csv is not a bad first guess, and it makes sense that it performed reasonably well!
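The same rates can be computed in one step with `groupby`, and the gender rule itself can be scored against the known labels. This is a sketch under stated assumptions: `survival_rate_by`, `gender_rule_accuracy` and `sample` are hypothetical names, and `sample` contains only the five rows shown by `train_data.head()`, so its numbers are not representative. Passing the full `train_data` instead should reproduce the two rates printed above.

```python
import pandas as pd

def survival_rate_by(df: pd.DataFrame, column: str) -> pd.Series:
    """Fraction of survivors within each group of `column`."""
    # The mean of a 0/1 column is the share of rows equal to 1.
    return df.groupby(column)["Survived"].mean()

def gender_rule_accuracy(df: pd.DataFrame) -> float:
    """Share of rows where 'women survive, men die' matches the true label."""
    predicted = (df["Sex"] == "female").astype(int)
    return float((predicted == df["Survived"]).mean())

# Sex and Survived of the five rows displayed by train_data.head().
sample = pd.DataFrame({
    "Sex": ["male", "female", "female", "female", "male"],
    "Survived": [0, 1, 1, 1, 0],
})

rates = survival_rate_by(sample, "Sex")
accuracy = gender_rule_accuracy(sample)

print(rates)
print(accuracy)
```

The helper works for any categorical column, so `survival_rate_by(train_data, "Pclass")` is a natural next check when looking for other indicators of survival.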

___

## Machine learning model