## Predicting Titanic Survival

The RMS Titanic set sail on its maiden voyage in 1912, crossing the Atlantic from Southampton, England to New York City. The ship never completed the voyage: it sank after hitting an iceberg, killing 1,502 of the 2,224 passengers and crew on board.

The data used here to train our model is provided by Kaggle.

In this project, a logistic regression model will be used to predict which passengers survived the sinking of the Titanic, based on features such as sex, age and passenger class.

In [27]:
# Importing our modules
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

In [28]:
# Loading in our data and viewing it
data = pd.read_csv('train.csv')
data.head()

Unnamed: 0,PassengerId,Survived,Pclass,Name,Sex,Age,SibSp,Parch,Ticket,Fare,Cabin,Embarked
0,1,0,3,"Braund, Mr. Owen Harris",male,22.0,1,0,A/5 21171,7.25,,S
1,2,1,1,"Cumings, Mrs. John Bradley (Florence Briggs Th...",female,38.0,1,0,PC 17599,71.2833,C85,C
2,3,1,3,"Heikkinen, Miss. Laina",female,26.0,0,0,STON/O2. 3101282,7.925,,S
3,4,1,1,"Futrelle, Mrs. Jacques Heath (Lily May Peel)",female,35.0,1,0,113803,53.1,C123,S
4,5,0,3,"Allen, Mr. William Henry",male,35.0,0,0,373450,8.05,,S


In [29]:
# Starting with Sex and Age to predict survival 

# Updating sex so female = 1 and male = 0
data['Sex'] = data['Sex'].map({'female': 1, 'male': 0})

In [30]:
# Replacing NaNs in Age with the column mean
# (assigning back avoids the chained inplace fillna, which newer pandas versions warn about)
data['Age'] = data['Age'].fillna(data['Age'].mean())

print('Total number of NaNs in age after mean imputation:', data['Age'].isnull().sum())

Total number of NaNs in age after mean imputation: 0
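
To make the imputation step easier to check by eye, here is a minimal sketch on a toy column. The `toy` DataFrame and its values are made up for illustration and are not taken from `train.csv`.

```python
import numpy as np
import pandas as pd

# Toy stand-in for the Age column (hypothetical values, not from train.csv)
toy = pd.DataFrame({'Age': [22.0, np.nan, 26.0, 35.0, np.nan]})

n_missing_before = int(toy['Age'].isnull().sum())
age_mean = toy['Age'].mean()  # mean() skips NaNs by default

# Fill the gaps with the mean of the observed ages and assign the result back
toy['Age'] = toy['Age'].fillna(age_mean)

n_missing_after = int(toy['Age'].isnull().sum())
print('Missing before:', n_missing_before, '| after:', n_missing_after)
```

Mean imputation leaves the column mean unchanged but shrinks its variance, which is worth keeping in mind when many values are missing.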


In [31]:
# Creating new columns 'First_Class' and 'Second_Class' from the Pclass column
# (third class is the baseline: both columns are 0)

# 1 if Pclass == 1, otherwise 0
data['First_Class'] = (data['Pclass'] == 1).astype(int)

# 1 if Pclass == 2, otherwise 0
data['Second_Class'] = (data['Pclass'] == 2).astype(int)

data.head()

Unnamed: 0,PassengerId,Survived,Pclass,Name,Sex,Age,SibSp,Parch,Ticket,Fare,Cabin,Embarked,First_Class,Second_Class
0,1,0,3,"Braund, Mr. Owen Harris",0,22.0,1,0,A/5 21171,7.25,,S,0,0
1,2,1,1,"Cumings, Mrs. John Bradley (Florence Briggs Th...",1,38.0,1,0,PC 17599,71.2833,C85,C,1,0
2,3,1,3,"Heikkinen, Miss. Laina",1,26.0,0,0,STON/O2. 3101282,7.925,,S,0,0
3,4,1,1,"Futrelle, Mrs. Jacques Heath (Lily May Peel)",1,35.0,1,0,113803,53.1,C123,S,1,0
4,5,0,3,"Allen, Mr. William Henry",0,35.0,0,0,373450,8.05,,S,0,0
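
The two indicator columns above amount to one-hot encoding `Pclass` with third class left out as the baseline. As an alternative sketch, `pd.get_dummies` can build the same kind of columns in one call. The `toy` frame below is hypothetical, and the resulting column names (`Pclass_1`, `Pclass_2`) differ from the `First_Class` / `Second_Class` names used in this notebook.

```python
import pandas as pd

# Toy passenger classes (hypothetical rows, not read from train.csv)
toy = pd.DataFrame({'Pclass': [3, 1, 3, 1, 2]})

# One indicator column per class; dtype=int gives 0/1 rather than booleans
dummies = pd.get_dummies(toy['Pclass'], prefix='Pclass', dtype=int)

# Drop third class so it acts as the baseline category
encoded = dummies.drop(columns='Pclass_3')
print(encoded)
```

Dropping one category avoids perfectly collinear columns, since the third indicator is fully determined by the other two.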


In [44]:
# Creating our features and labels data set
features = data[['Sex', 'Age', 'First_Class', 'Second_Class']]
labels = data['Survived']

# Splitting the data into test and train
X_train, X_test, y_train, y_test = train_test_split(features, labels, test_size = 0.2)

# Standardising the features (zero mean, unit variance); the scaler is fit on the training set only
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)

# Creating our model
model = LogisticRegression()
model.fit(X_train, y_train)

# Scoring our model
train_score = model.score(X_train, y_train)
test_score = model.score(X_test, y_test)

print('Train Score:', train_score)
print('Test Score:', test_score)



Train Score: 0.7823033707865169
Test Score: 0.7877094972067039
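
Because `train_test_split` is called without a `random_state`, these scores will shift a little from run to run. One way to get a steadier estimate is k-fold cross-validation. The sketch below uses synthetic data as a stand-in (it is not the Titanic set), and the names `pipeline` and `scores` are illustrative only.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in data (NOT the Titanic set): 200 rows, 4 features
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 4))
y = (X[:, 0] + 0.5 * X[:, 1] + rng.normal(scale=0.5, size=200) > 0).astype(int)

# The pipeline refits the scaler inside each fold, so statistics from
# the held-out fold never leak into training
pipeline = make_pipeline(StandardScaler(), LogisticRegression())

# Five-fold cross-validation; the default score for a classifier is accuracy
scores = cross_val_score(pipeline, X, y, cv=5)
print('Fold accuracies:', scores)
print('Mean accuracy:', scores.mean())
```

On the real data, the same pattern would take `features` and `labels` in place of `X` and `y`.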


In [58]:
# Features with coefficients 
features_list = ['Sex', 'Age', 'First_Class', 'Second_Class']
dict(zip(features_list, model.coef_[0]))



{'Sex': 1.237136287526673,
 'Age': -0.44339134543475983,
 'First_Class': 1.1085874272507168,
 'Second_Class': 0.5154756263480155}
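
Logistic regression coefficients are on the log-odds scale, so exponentiating them gives odds ratios, which are often easier to read. The sketch below reuses the coefficient values printed above. Because the features were standardised, each ratio describes a one-standard-deviation increase in that feature, not a one-unit increase.

```python
import numpy as np

# Coefficients copied from the output above
coefficients = {
    'Sex': 1.237136287526673,
    'Age': -0.44339134543475983,
    'First_Class': 1.1085874272507168,
    'Second_Class': 0.5154756263480155,
}

# exp(coefficient) = multiplicative change in the odds of survival
# per one-standard-deviation increase in the (scaled) feature
odds_ratios = {name: float(np.exp(value)) for name, value in coefficients.items()}

for name, ratio in odds_ratios.items():
    print(f'{name}: {ratio:.3f}')
```

A ratio above 1 means the feature raises the odds of survival, and a ratio below 1 means it lowers them. Here only `Age` falls below 1.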

In [73]:
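# Predicting Jack's, Rose's and my own fate with the model
# Values follow the feature order: Sex, Age, First_Class, Second_Class
jack = [0, 20, 0, 0]
rose = [1, 17, 1, 0]
me = [0, 24, 0, 1]

# Reusing the training column names keeps the scaler's input consistent
combined = pd.DataFrame([jack, rose, me], columns=features.columns)
combined = scaler.transform(combined)

print(model.predict(combined))

print('0? not looking so great for me')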

[0 1 0]
0? not looking so great for me
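
`predict` only returns the hard 0/1 label. `predict_proba` shows how confident the model is in each prediction. The sketch below illustrates this on synthetic one-feature data (not the Titanic set). `toy_model` and `new_points` are hypothetical names introduced for the example.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Synthetic stand-in data (NOT the Titanic set): the label mostly follows the sign of x
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 1))
y = (X[:, 0] + rng.normal(scale=0.5, size=200) > 0).astype(int)

toy_model = LogisticRegression().fit(X, y)

new_points = np.array([[-2.0], [0.0], [2.0]])
probabilities = toy_model.predict_proba(new_points)  # columns follow toy_model.classes_
predictions = toy_model.predict(new_points)          # class with probability above 0.5

for point, proba, label in zip(new_points[:, 0], probabilities, predictions):
    print(f'x={point:+.1f}  P(class 1)={proba[1]:.2f}  predicted={label}')
```

Applied to this notebook, `model.predict_proba(combined)` would give a survival probability for each of the three passengers instead of just `[0 1 0]`.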


