# Titanic Survival Prediction

The RMS Titanic set sail on its maiden voyage in 1912, crossing the Atlantic from Southampton, England, to New York City. The ship never completed the voyage: it sank to the bottom of the Atlantic Ocean after hitting an iceberg, killing 1,502 of the 2,224 passengers and crew onboard.

In this project I've built a logistic regression model that predicts which passengers survived the sinking of the Titanic, based on their sex, age, and passenger class.

The data I use to train the model comes from the [Kaggle Titanic competition](https://www.kaggle.com/c/titanic).

```python
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
```

```python
passengers = pd.read_csv('passengers.csv')
passengers.head()
```

| Unnamed: 0 | PassengerId | Survived | Pclass | Name | Sex | Age | SibSp | Parch | Ticket | Fare | Cabin | Embarked |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 0 | 3 | Braund, Mr. Owen Harris | male | 22.0 | 1 | 0 | A/5 21171 | 7.25 |  | S |
| 1 | 2 | 1 | 1 | Cumings, Mrs. John Bradley (Florence Briggs Th... | female | 38.0 | 1 | 0 | PC 17599 | 71.2833 | C85 | C |
| 2 | 3 | 1 | 3 | Heikkinen, Miss. Laina | female | 26.0 | 0 | 0 | STON/O2. 3101282 | 7.925 |  | S |
| 3 | 4 | 1 | 1 | Futrelle, Mrs. Jacques Heath (Lily May Peel) | female | 35.0 | 1 | 0 | 113803 | 53.1 | C123 | S |
| 4 | 5 | 0 | 3 | Allen, Mr. William Henry | male | 35.0 | 0 | 0 | 373450 | 8.05 |  | S |


### Data Cleaning

Given the saying "women and children first," `Sex` and `Age` seem like good features for predicting survival. And given the strict class system onboard the Titanic, I've also used the `Pclass` column (the passenger class) as a feature. Before modeling, these columns need to be numeric and free of missing values.

```python
# Replace missing Age values with the mean age
passengers['Age'] = passengers['Age'].fillna(passengers['Age'].mean())
```
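A minimal sketch of the same mean imputation on a tiny, made-up DataFrame (the values are illustrative, not from the Kaggle data). It counts missing values with `isna().sum()` before and after, and assigns the filled column back instead of calling `fillna(..., inplace=True)` on a single column, which recent pandas versions warn about:

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: two of the four ages are missing
toy = pd.DataFrame({'Age': [22.0, np.nan, 38.0, np.nan]})

# Count missing values before imputation
missing_before = toy['Age'].isna().sum()

# Mean of the non-missing ages only: (22 + 38) / 2 = 30
age_mean = toy['Age'].mean()

# Assign the result back rather than modifying the column in place
toy['Age'] = toy['Age'].fillna(age_mean)

missing_after = toy['Age'].isna().sum()
print(missing_before, age_mean, missing_after)
print(toy['Age'].tolist())
```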

```python
# Encode Sex numerically: male = 0, female = 1
passengers['Sex'] = passengers['Sex'].map({'male': 0, 'female': 1})
```

```python
# One-hot encode Pclass; third class is the baseline (both new columns are 0)
passengers['FirstClass'] = passengers.Pclass.apply(lambda x: 1 if x == 1 else 0)
passengers['SecondClass'] = passengers.Pclass.apply(lambda x: 1 if x == 2 else 0)
```
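Only two indicator columns are needed for three classes: a third-class passenger is the row where both `FirstClass` and `SecondClass` are 0, so a `ThirdClass` column would be redundant. As a sketch of an alternative (not what this notebook uses), `pd.get_dummies` builds one column per class, after which the baseline column can be dropped. The toy `Pclass` values are made up:

```python
import pandas as pd

# Hypothetical toy data covering all three passenger classes
toy = pd.DataFrame({'Pclass': [1, 2, 3, 3, 1]})

# apply-based encoding, as in the notebook
toy['FirstClass'] = toy.Pclass.apply(lambda x: 1 if x == 1 else 0)
toy['SecondClass'] = toy.Pclass.apply(lambda x: 1 if x == 2 else 0)

# Alternative: one 0/1 column per class, then drop third class as the baseline
dummies = pd.get_dummies(toy['Pclass'], prefix='Class', dtype=int)
dummies = dummies.drop(columns='Class_3')

print(dummies)
```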

```python
print(passengers.dtypes[['Sex', 'Age', 'FirstClass', 'SecondClass']])
```

```
Sex              int64
Age            float64
FirstClass       int64
SecondClass      int64
dtype: object
```


All the features I'll use are now numeric.

### Train/test data splitting

```python
features = passengers[['Sex', 'Age', 'FirstClass', 'SecondClass']]
survival = passengers.Survived
```

```python
train_features, test_features, train_survival, test_survival = train_test_split(
    features, survival, train_size=0.8, test_size=0.2, random_state=6
)

# Check that the training features and labels hold 80% of the rows and the test set holds 20%
print(len(train_features))
print(len(train_survival))
print(len(test_features))
```

```
712
712
179
```
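The 712/179 split follows from how scikit-learn rounds fractional sizes: the test size is rounded up (`ceil(0.2 * n)`) and the training size down (`floor(0.8 * n)`). A quick check, assuming the dataset has 891 rows (712 + 179 from the output above) and using a dummy array of that size instead of the real data:

```python
import math

import numpy as np
from sklearn.model_selection import train_test_split

n_rows = 712 + 179  # 891 rows, implied by the split sizes printed above

# scikit-learn rounds the test size up and the training size down
expected_test = math.ceil(0.2 * n_rows)    # ceil(178.2) = 179
expected_train = math.floor(0.8 * n_rows)  # floor(712.8) = 712

# Dummy features and labels with the same number of rows
X = np.arange(n_rows).reshape(-1, 1)
y = np.arange(n_rows) % 2

X_train, X_test, y_train, y_test = train_test_split(
    X, y, train_size=0.8, test_size=0.2, random_state=6
)
print(len(X_train), len(X_test))
```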


### Normalization

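**Z-score normalization**: scale each feature so that it has a mean of 0 and a standard deviation of 1. The scaler learns the mean and standard deviation from the training set only and then applies the same transformation to the test set, so no test-set statistics leak into training.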
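Z-score normalization maps each value $x$ of a feature to $z = (x - \mu) / \sigma$, where $\mu$ and $\sigma$ are that feature's mean and standard deviation. A small sketch on made-up ages, checking the manual formula against `StandardScaler`, which uses the population standard deviation (`ddof=0`):

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Hypothetical toy ages: one feature, five passengers
ages = np.array([[2.0], [20.0], [30.0], [40.0], [58.0]])

# Manual z-score with the population standard deviation (NumPy's default ddof=0)
mu = ages.mean(axis=0)
sigma = ages.std(axis=0)
manual = (ages - mu) / sigma

# The same transformation via scikit-learn
scaled = StandardScaler().fit_transform(ages)

print(np.allclose(manual, scaled))
```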

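```python
scaler = StandardScaler()
# Learn the mean and standard deviation from the training data only
train_features = scaler.fit_transform(train_features)
# Reuse the training statistics to transform the test data
test_features = scaler.transform(test_features)
```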
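One caveat: the `Age` mean used for imputation earlier is computed on the full dataset before the split, so a little test-set information reaches training. A hedged sketch of one way to avoid that is scikit-learn's `Pipeline` (built here with `make_pipeline`), which fits the imputer, the scaler, and the model on the training data only. The toy rows below are made up and follow the feature order `[Sex, Age, FirstClass, SecondClass]`:

```python
import numpy as np
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Hypothetical toy training data; one Age value is missing
X_train = np.array([
    [0, 22.0, 0, 0],
    [1, 38.0, 1, 0],
    [1, 26.0, 0, 0],
    [1, np.nan, 1, 0],
    [0, 35.0, 0, 0],
    [0, 54.0, 1, 0],
    [1, 4.0, 0, 1],
    [0, 27.0, 0, 1],
])
y_train = np.array([0, 1, 1, 1, 0, 0, 1, 0])

# A new passenger with an unknown age
X_new = np.array([[1, np.nan, 0, 1]])

# Imputation mean and scaling statistics are learned from X_train only
pipe = make_pipeline(SimpleImputer(strategy='mean'), StandardScaler(), LogisticRegression())
pipe.fit(X_train, y_train)

print(pipe.predict(X_new))
```

At prediction time, the missing age in `X_new` is filled with the training-set mean age, not a mean that includes unseen rows.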

### Logistic Regression


```python
model = LogisticRegression()
model.fit(train_features, train_survival)
```
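Logistic regression computes a weighted sum of the (standardized) features plus an intercept, which gives the log-odds of survival, and passes it through the logistic (sigmoid) function: $p = 1 / (1 + e^{-(\mathbf{w} \cdot \mathbf{x} + b)})$, where $\mathbf{w}$ is `coef_` and $b$ is `intercept_`. A sketch on made-up two-feature data, checking this formula against `predict_proba`:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Hypothetical standardized toy features and 0/1 labels
X = np.array([[-1.0, 0.5], [0.2, -1.3], [1.1, 0.4], [-0.6, -0.2], [0.9, 1.5], [-1.4, 0.8]])
y = np.array([0, 0, 1, 0, 1, 1])

toy_model = LogisticRegression().fit(X, y)

# Linear score (log-odds) for each row: w . x + b
log_odds = X @ toy_model.coef_[0] + toy_model.intercept_[0]

# The sigmoid turns log-odds into the probability of class 1
p_class1 = 1.0 / (1.0 + np.exp(-log_odds))

print(np.allclose(p_class1, toy_model.predict_proba(X)[:, 1]))
```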

```python
# Mean accuracy on the training data
print(model.score(train_features, train_survival))

# Mean accuracy on the test data
print(model.score(test_features, test_survival))
```

```
0.8019662921348315
0.7653631284916201
```
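For a classifier such as `LogisticRegression`, `score` returns the mean accuracy (the fraction of correct predictions), not an $R^2$ value. For example, 0.8019662921348315 equals 571/712 (571 of the 712 training rows classified correctly) and 0.7653631284916201 equals 137/179. A sketch on made-up data showing that `score` matches `accuracy_score`:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

# Hypothetical toy features and 0/1 labels
X = np.array([[-1.0, 0.5], [0.2, -1.3], [1.1, 0.4], [-0.6, -0.2], [0.9, 1.5], [-1.4, 0.8]])
y = np.array([0, 0, 1, 0, 1, 1])

toy_model = LogisticRegression().fit(X, y)
predictions = toy_model.predict(X)

# score() for a classifier is the accuracy of predict()
print(toy_model.score(X, y), accuracy_score(y, predictions), np.mean(predictions == y))
```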


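### Feature Impact
`Sex` and `FirstClass` have the largest coefficients, so they have the strongest effect on the predicted survival: being female or traveling in first class raises the predicted chance of survival. `Age` has a negative coefficient, so older passengers are predicted to be somewhat less likely to survive. Because the features were standardized, the coefficient magnitudes are roughly comparable.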


```python
list(zip(['Sex', 'Age', 'FirstClass', 'SecondClass'], model.coef_[0]))
```

```
[('Sex', 1.2756717044394632),
 ('Age', -0.3946064736205097),
 ('FirstClass', 0.9443893659495298),
 ('SecondClass', 0.4054737167875692)]
```
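Exponentiating a coefficient gives an odds ratio: the factor by which the odds of survival are multiplied when that feature increases by one unit, holding the others fixed. Since the features were standardized, one unit here means one standard deviation of the original feature. A sketch using the coefficients printed above:

```python
import math

# Coefficients copied from the output above (model trained on standardized features)
coefficients = {
    'Sex': 1.2756717044394632,
    'Age': -0.3946064736205097,
    'FirstClass': 0.9443893659495298,
    'SecondClass': 0.4054737167875692,
}

# exp(coef): multiplicative change in the odds of survival per one standard deviation
odds_ratios = {name: math.exp(coef) for name, coef in coefficients.items()}

# Print from strongest to weakest effect (by absolute coefficient)
for name in sorted(coefficients, key=lambda n: abs(coefficients[n]), reverse=True):
    print(f'{name:12s} {odds_ratios[name]:.2f}')
```

An odds ratio above 1 (such as `Sex` and `FirstClass`) raises the odds of survival; one below 1 (`Age`) lowers them.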

### Predicting with my model

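The code below encodes three sample passengers, using the same feature order as the training data (`Sex`, `Age`, `FirstClass`, `SecondClass`):

- `Jack`, a 20-year-old third-class passenger,
- `Rose`, a 17-year-old first-class passenger, and
- `Millvina Dean`, the youngest passenger onboard, who traveled in third class.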

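```python
# Feature order must match training: [Sex, Age, FirstClass, SecondClass]
Jack = np.array([0.0, 20.0, 0.0, 0.0])     # male, 20, third class
Rose = np.array([1.0, 17.0, 1.0, 0.0])     # female, 17, first class
Millvina = np.array([1.0, 0.2, 0.0, 0.0])  # female, 0.2 years (a few months), third class

sample_passengers = np.array([Jack, Rose, Millvina])
```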

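```python
# Z-score transform with the scaler fitted on the training data.
# Wrapping the array in a DataFrame with the training column names avoids
# scikit-learn's feature-name mismatch warning.
sample_passengers = scaler.transform(pd.DataFrame(sample_passengers, columns=features.columns))
```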


```python
final = model.predict(sample_passengers)

# Report the predicted outcome (0 = died, 1 = survived) for each sample passenger
for name, survived in zip(['Jack', 'Rose', 'Millvina'], final):
    if survived == 0:
        print(f'Unfortunately {name} did not survive')
    else:
        print(f'{name} was one of the luckiest passengers who survived')
```

```
Unfortunately Jack did not survive
Rose was one of the luckiest passengers who survived
Millvina was one of the luckiest passengers who survived
```


### My model predicts that
- **Rose** and **Millvina Dean** would be among the fortunate survivors of the **Titanic** disaster.
- Unfortunately, **Jack** would lose his life.

### Survival Prediction Probability

```python
# Columns follow model.classes_ = [0, 1]: column 0 is P(died), column 1 is P(survived)
prediction = model.predict_proba(sample_passengers)
Dead_pred = prediction[:, 0]
Survived_pred = prediction[:, 1]

print("Probability of Jack's death {}".format(Dead_pred[0]))
print("Probability of Rose's survival {}".format(Survived_pred[1]))
print("Probability of Millvina's survival {}".format(Survived_pred[2]))
```

```
Probability of Jack's death 0.8750488803397127
Probability of Rose's survival 0.9548581109458395
Probability of Millvina's survival 0.7946230083248694
```
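For a binary model, `predict` effectively thresholds the survival probability at 0.5, so the probabilities and the earlier predictions agree: Jack's survival probability is about 1 - 0.875 = 0.125 (below 0.5), while Rose's (0.955) and Millvina's (0.795) are above it. A sketch on made-up data confirming that thresholding `predict_proba` reproduces `predict`:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Hypothetical toy features and 0/1 labels
X = np.array([[-1.0, 0.5], [0.2, -1.3], [1.1, 0.4], [-0.6, -0.2], [0.9, 1.5], [-1.4, 0.8]])
y = np.array([0, 0, 1, 0, 1, 1])

toy_model = LogisticRegression().fit(X, y)
proba = toy_model.predict_proba(X)

# Column order follows toy_model.classes_, which is [0, 1] here
p_class1 = proba[:, 1]

# Thresholding the class-1 probability at 0.5 gives the same labels as predict()
manual_pred = (p_class1 > 0.5).astype(int)
print(np.array_equal(manual_pred, toy_model.predict(X)))
```

Lowering or raising the 0.5 cut-off would trade false negatives for false positives; `predict` always uses 0.5.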
