# Titanic - Machine Learning from Disaster

This notebook demonstrates a neural network built from scratch using only NumPy. The example, located in the `examples` folder, applies our custom implementation to the classic Titanic dataset to predict which passengers survived.

We will walk through loading the dataset, preprocessing it (one-hot encoding, median imputation, and standardization), training the network, and scoring its predictions on a held-out test split.

Kaggle link: https://www.kaggle.com/competitions/titanic/overview

In [1]:
# Add the parent folder to the import path so the project modules can be found
import sys
sys.path.insert(0, '..')

# Import pandas for loading the dataset and NumPy for array handling
import pandas as pd
import numpy as np

# Import the project modules
from neural_network import NeuralNetwork
from utils import StandardScaler, train_test_split, accuracy_score

In [2]:
# Load the Titanic data (downloaded from Kaggle) from the local CSV file
df_titanic = pd.read_csv("./datasets/Titanic.csv")

df_titanic.head(10)

| | PassengerId | Survived | Pclass | Name | Sex | Age | SibSp | Parch | Ticket | Fare | Cabin | Embarked |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 0 | 3 | Braund, Mr. Owen Harris | male | 22.0 | 1 | 0 | A/5 21171 | 7.25 | NaN | S |
| 1 | 2 | 1 | 1 | Cumings, Mrs. John Bradley (Florence Briggs Th... | female | 38.0 | 1 | 0 | PC 17599 | 71.2833 | C85 | C |
| 2 | 3 | 1 | 3 | Heikkinen, Miss. Laina | female | 26.0 | 0 | 0 | STON/O2. 3101282 | 7.925 | NaN | S |
| 3 | 4 | 1 | 1 | Futrelle, Mrs. Jacques Heath (Lily May Peel) | female | 35.0 | 1 | 0 | 113803 | 53.1 | C123 | S |
| 4 | 5 | 0 | 3 | Allen, Mr. William Henry | male | 35.0 | 0 | 0 | 373450 | 8.05 | NaN | S |
| 5 | 6 | 0 | 3 | Moran, Mr. James | male | NaN | 0 | 0 | 330877 | 8.4583 | NaN | Q |
| 6 | 7 | 0 | 1 | McCarthy, Mr. Timothy J | male | 54.0 | 0 | 0 | 17463 | 51.8625 | E46 | S |
| 7 | 8 | 0 | 3 | Palsson, Master. Gosta Leonard | male | 2.0 | 3 | 1 | 349909 | 21.075 | NaN | S |
| 8 | 9 | 1 | 3 | Johnson, Mrs. Oscar W (Elisabeth Vilhelmina Berg) | female | 27.0 | 0 | 2 | 347742 | 11.1333 | NaN | S |
| 9 | 10 | 1 | 2 | Nasser, Mrs. Nicholas (Adele Achem) | female | 14.0 | 1 | 0 | 237736 | 30.0708 | NaN | C |


In [3]:
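# One-hot encode the string columns after dropping Name.
# Sex becomes Sex_female / Sex_male; Ticket, Cabin and Embarked are expanded too,
# while numeric columns such as Pclass are left unchanged.
df_titanic = pd.get_dummies(df_titanic.drop('Name', axis=1), dtype=int)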
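
To see which columns `pd.get_dummies` touches, here is a small sketch on a made-up frame (the `toy` data is hypothetical and not taken from the dataset). Only string-valued columns are expanded into indicator columns; numeric columns pass through unchanged and come first in the result.

```python
import pandas as pd

# Hypothetical toy frame that mirrors a few Titanic columns
toy = pd.DataFrame({
    "Pclass": [3, 1, 3],                  # numeric: left as-is
    "Sex": ["male", "female", "female"],  # strings: expanded
    "Embarked": ["S", "C", "S"],          # strings: expanded
})

# dtype=int yields 0/1 integers instead of booleans
encoded = pd.get_dummies(toy, dtype=int)

print(encoded.columns.tolist())
# ['Pclass', 'Sex_female', 'Sex_male', 'Embarked_C', 'Embarked_S']
```

Each distinct string value gets its own column, which is why high-cardinality columns such as `Ticket` and `Cabin` make the encoded frame very wide.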

In [4]:
# Define the features
features = ['Pclass', 'Age', 'SibSp', 'Parch', 'Fare', 'Sex_female', 'Sex_male']

In [5]:
df_titanic.isnull().sum()

PassengerId      0
Survived         0
Pclass           0
Age            177
SibSp            0
              ... 
Cabin_G6         0
Cabin_T          0
Embarked_C       0
Embarked_Q       0
Embarked_S       0
Length: 840, dtype: int64
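
With 840 columns the summary above is truncated, so a missing-value count could be hidden in the elided rows. One way to check, sketched here on a hypothetical `toy` frame, is to keep only the columns whose count is non-zero:

```python
import numpy as np
import pandas as pd

# Hypothetical toy frame with one missing Age value
toy = pd.DataFrame({
    "Age": [22.0, np.nan, 26.0],
    "Fare": [7.25, 71.2833, 7.925],
})

# Count missing values per column, then keep only columns that have any
missing = toy.isnull().sum()
missing = missing[missing > 0]

print(missing)
```

Applied to `df_titanic`, the same two lines would list every column that still needs imputation.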

In [6]:
# Impute missing ages with the median age
df_titanic['Age'] = pd.to_numeric(df_titanic['Age'], errors='coerce')
df_titanic['Age'] = df_titanic['Age'].fillna(df_titanic['Age'].median())

In [7]:
# Split the data into training and test sets (33% held out for testing)
X_train, X_test, y_train, y_test = train_test_split(df_titanic[features], df_titanic['Survived'], test_size=0.33, random_state=42)

# Reshape y_train into a column vector of shape (n_samples, 1)
y_train = np.array(y_train).reshape(-1,1)
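
The project's `train_test_split` lives in `utils` and its implementation is not shown here. As a rough idea of what such a helper might do, the sketch below shuffles row indices with a seeded generator and slices them. The name `split_sketch` and its internals are assumptions for illustration, not the project's actual code.

```python
import numpy as np

def split_sketch(X, y, test_size=0.33, random_state=None):
    """Hypothetical helper: shuffle row indices, then take the first
    n_test shuffled indices as the test set and the rest as training."""
    X, y = np.asarray(X), np.asarray(y)
    rng = np.random.default_rng(random_state)
    idx = rng.permutation(len(X))
    n_test = int(round(len(X) * test_size))
    test_idx, train_idx = idx[:n_test], idx[n_test:]
    return X[train_idx], X[test_idx], y[train_idx], y[test_idx]

# Toy data: row i of X is [2*i, 2*i + 1], and y[i] = i
X = np.arange(20).reshape(10, 2)
y = np.arange(10)

X_tr, X_te, y_tr, y_te = split_sketch(X, y, test_size=0.3, random_state=42)

# Same reshape as in the cell above: labels become a column vector
y_col = y_tr.reshape(-1, 1)
print(X_tr.shape, X_te.shape, y_col.shape)
# (7, 2) (3, 2) (7, 1)
```

Indexing `X` and `y` with the same index arrays keeps each feature row aligned with its label.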

In [8]:
# Scale the data
scaler = StandardScaler()
X_train = scaler.fit_transform(np.array(X_train))
X_test = scaler.transform(np.array(X_test))
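
The scaler is fitted on the training data only, and the test data is transformed with those same statistics so that no test information leaks into preprocessing. The project's `StandardScaler` is not shown in this notebook; a minimal NumPy sketch of the usual technique (subtract the per-column mean, divide by the per-column standard deviation) might look like the following. The class name `ScalerSketch` and the zero-variance guard are assumptions.

```python
import numpy as np

class ScalerSketch:
    """Hypothetical stand-in for a standard scaler: z = (x - mean) / std."""

    def fit(self, X):
        X = np.asarray(X, dtype=float)
        self.mean_ = X.mean(axis=0)
        std = X.std(axis=0)
        # Assumed guard: avoid division by zero for constant columns
        self.scale_ = np.where(std == 0, 1.0, std)
        return self

    def transform(self, X):
        return (np.asarray(X, dtype=float) - self.mean_) / self.scale_

    def fit_transform(self, X):
        return self.fit(X).transform(X)

# Toy data: the second column is constant
train = np.array([[1.0, 10.0], [3.0, 10.0], [5.0, 10.0]])
test = np.array([[3.0, 10.0]])

scaler_sketch = ScalerSketch()
train_scaled = scaler_sketch.fit_transform(train)  # statistics come from train
test_scaled = scaler_sketch.transform(test)        # reuse the train statistics

print(train_scaled)
print(test_scaled)
```

After scaling, each non-constant training column has mean 0 and standard deviation 1, which tends to make gradient descent better behaved when features such as `Fare` and `Pclass` live on very different scales.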

In [9]:
# Define the neural network
nn = NeuralNetwork(layers=[7,50,25,10,1], activation='relu', loss='cross_entropy', random_seed=42)

# Train the model
nn.train(X_train, y_train, epochs=5000, learning_rate=0.01)

# Store the results from the prediction
results = nn.predict(X_test)

# Print the accuracy score
print(f'\nThe accuracy score of the model is: {accuracy_score(y_test, results):.5f}')

Epoch: 0, loss: 1.5207545173988075
Epoch: 100, loss: 0.8145254577791565
Epoch: 200, loss: 0.5330092492126295
Epoch: 300, loss: 0.4446227519620024
Epoch: 400, loss: 0.3267715695932089
Epoch: 500, loss: 0.3150891536936057
Epoch: 600, loss: 0.30881118721100403
Epoch: 700, loss: 0.2539680853846389
Epoch: 800, loss: 0.24863359390900616
Epoch: 900, loss: 0.24491033716904625
Epoch: 1000, loss: 0.2419528897024141
Epoch: 1100, loss: 0.23938664014168623
Epoch: 1200, loss: 0.23765905763828069
Epoch: 1300, loss: 0.2359963647182899
Epoch: 1400, loss: 0.23463821047327868
Epoch: 1500, loss: 0.23335046748720306
Epoch: 1600, loss: 0.2321854999502134
Epoch: 1700, loss: 0.23092832345389516
Epoch: 1800, loss: 0.22981226282485573
Epoch: 1900, loss: 0.22875855782272025
Epoch: 2000, loss: 0.22771172459273845
Epoch: 2100, loss: 0.22675435553219211
Epoch: 2200, loss: 0.22571217754087203
Epoch: 2300, loss: 0.22483860291387237
Epoch: 2400, loss: 0.22402368886007087
Epoch: 2500, loss: 0.22325040263466162
Epoch: 2