Predict Titanic Survival
The RMS Titanic set sail on its maiden voyage in 1912, crossing the Atlantic from Southampton, England, to New York City. The ship never completed the voyage: it struck an iceberg and sank to the bottom of the Atlantic Ocean, killing 1,502 of the 2,224 passengers and crew on board.

In this project, you will create a logistic regression model that predicts which passengers survived the sinking of the Titanic, based on features such as age, sex, and passenger class.
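
To make the idea concrete, here is a minimal sketch of how a logistic regression model turns a passenger's features into a survival probability. The weights, bias, and example passenger below are hypothetical values chosen by hand for illustration; they are not fitted to the Kaggle data, and the helper names `sigmoid` and `survival_probability` are not part of the project.

```python
import numpy as np

def sigmoid(z):
    """Map any real number to a value between 0 and 1."""
    return 1.0 / (1.0 + np.exp(-z))

def survival_probability(features, weights, bias):
    """Weighted sum of the features, passed through the sigmoid."""
    return sigmoid(np.dot(features, weights) + bias)

# Hypothetical, hand-picked values for illustration only (not fitted to any data).
# Assumed feature order: [sex (1 = female), scaled age, first class, second class]
example_weights = np.array([1.2, -0.4, 1.0, 0.5])
example_bias = -0.5
example_passenger = np.array([1.0, 0.0, 1.0, 0.0])

probability = survival_probability(example_passenger, example_weights, example_bias)
print(probability)        # a value strictly between 0 and 1
print(probability > 0.5)  # predict "survived" when the probability is above 0.5
```

Training the model amounts to finding the weights and bias that make these probabilities match the known outcomes as closely as possible.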

The data we will be using for training our model is provided by Kaggle. Feel free to make the model better on your own and submit it to the Kaggle Titanic competition!

If you get stuck during this project or would like to see an experienced developer complete it, check out the project walkthrough video, which can be found in the “get help” menu in the bottom-right of this window.

In [None]:
import codecademylib3_seaborn
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Load the passenger data
passengers = pd.read_csv('passengers.csv')

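# Update sex column to numerical
passengers['Sex'] = passengers['Sex'].map({'female': 1, 'male': 0})
print(passengers['Sex'].values)

# Fill the NaN values in the age column with the mean age.
# Assigning the result back avoids the chained inplace call,
# which newer pandas versions warn about or silently ignore.
print(passengers['Age'].values)
passengers['Age'] = passengers['Age'].fillna(passengers['Age'].mean())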

# Create a first class column
passengers['FirstClass'] = passengers['Pclass'].apply(lambda x: 1 if x == 1 else 0)

# Create a second class column
passengers['SecondClass'] = passengers['Pclass'].apply(lambda x: 1 if x == 2 else 0)

# Select the desired features
features = passengers[['Sex', 'Age', 'FirstClass', 'SecondClass']]
survival = passengers['Survived']

# Perform train, test, split (hold out 20% of the rows for testing)
x_train, x_test, y_train, y_test = train_test_split(features, survival, test_size=0.2)

# Scale the feature data so it has mean = 0 and standard deviation = 1
scaler = StandardScaler()
x_train = scaler.fit_transform(x_train)
x_test = scaler.transform(x_test)

# Create and train the model
model = LogisticRegression()
model.fit(x_train, y_train)

# Score the model on the train data
print(model.score(x_train, y_train))

# Score the model on the test data
print(model.score(x_test, y_test))

# Analyze the coefficients
print(model.coef_)

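# Sample passenger features, in the same order as the training columns:
# Sex (1 = female, 0 = male), Age, FirstClass, SecondClass
Jack = np.array([0.0, 20.0, 0.0, 0.0])
Rose = np.array([1.0, 17.0, 1.0, 0.0])
You = np.array([0.0, 25.0, 0.0, 0.0])

# Combine passenger arrays
sample_passengers = np.array([Jack, Rose, You])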

# Scale the sample passenger features
sample_passengers = scaler.transform(sample_passengers)
print(sample_passengers)

# Make survival predictions (1 = survived, 0 = did not survive)
print(model.predict(sample_passengers))
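
Two natural follow-ups to the steps above are pairing each coefficient with its feature name and asking the model for probabilities instead of hard 0/1 predictions. The sketch below shows both using scikit-learn's `coef_` and `predict_proba`. It runs on synthetic stand-in data generated from a made-up rule, not on `passengers.csv`, so the printed numbers say nothing about the real Titanic; the names `toy`, `toy_scaler`, and `toy_model` are hypothetical.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in data (NOT the Kaggle passengers.csv), generated only so
# this sketch runs on its own. Column names mirror the features used above.
rng = np.random.default_rng(0)
n = 200
toy = pd.DataFrame({
    'Sex': rng.integers(0, 2, n),
    'Age': rng.uniform(1, 70, n),
    'FirstClass': rng.integers(0, 2, n),
})
# A passenger cannot be in first and second class at the same time
toy['SecondClass'] = (1 - toy['FirstClass']) * rng.integers(0, 2, n)

# Made-up survival rule, used purely to produce labels for the toy data
logit = 2.0 * toy['Sex'] + 1.0 * toy['FirstClass'] - 0.02 * toy['Age'] - 0.5
toy_survived = (rng.uniform(0, 1, n) < 1 / (1 + np.exp(-logit))).astype(int)

feature_names = ['Sex', 'Age', 'FirstClass', 'SecondClass']
toy_scaler = StandardScaler()
toy_features = toy_scaler.fit_transform(toy[feature_names].to_numpy())

toy_model = LogisticRegression()
toy_model.fit(toy_features, toy_survived)

# Pair each coefficient with its feature name, largest magnitude first
coefficients = sorted(
    zip(feature_names, toy_model.coef_[0]),
    key=lambda pair: abs(pair[1]),
    reverse=True,
)
for name, coef in coefficients:
    print(f'{name}: {coef:.3f}')

# predict_proba returns one row per passenger; the columns follow
# toy_model.classes_, here [0, 1], so column 1 is P(survived)
sample = toy_scaler.transform(np.array([
    [0.0, 20.0, 0.0, 0.0],
    [1.0, 17.0, 1.0, 0.0],
]))
probabilities = toy_model.predict_proba(sample)
print(probabilities[:, 1])
```

Because the features are standardized before fitting, the coefficient magnitudes can be compared with each other: a larger absolute value means the feature has more influence on the prediction.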
