# Predicting Titanic Survivors

In this notebook, we will be training and testing a neural network using TensorFlow.  
After that, we will attempt to build our own neural network from scratch and compare the results.

---

## Steps:

- Feature Engineering  
- Preprocessing  
  - Imputing  
  - Encoding  
  - Scaling  
- Training the model (Keras)  
- Tuning hyperparameters  
- Evaluate  
- Build and train custom model (from scratch)  
- Tune hyperparameters  
- Evaluate  
- Who won?  

After doing all of this, I will attempt to use model stacking with the following models:

- Neural Network (the winner)
- Logistic Regression
- XGBoost
- RandomForest

In [260]:
import tensorflow as tf
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import OneHotEncoder
from sklearn.impute import SimpleImputer
from sklearn.preprocessing import MinMaxScaler

In [261]:
# Load datasets
X_train = pd.read_csv("titanic_data/train.csv")
X_test = pd.read_csv("titanic_data/test.csv")

In [262]:
# Split datasets
X_train, X_val = train_test_split(X_train, test_size=0.25, random_state=42)

In [263]:
# Extract target
y_train = X_train.pop('Survived')
y_val = X_val.pop('Survived')

In [264]:
X_train

Unnamed: 0,PassengerId,Pclass,Name,Sex,Age,SibSp,Parch,Ticket,Fare,Cabin,Embarked
298,299,1,"Saalfeld, Mr. Adolphe",male,,0,0,19988,30.5000,C106,S
884,885,3,"Sutehall, Mr. Henry Jr",male,25.00,0,0,SOTON/OQ 392076,7.0500,,S
247,248,2,"Hamalainen, Mrs. William (Anna)",female,24.00,0,2,250649,14.5000,,S
478,479,3,"Karlsson, Mr. Nils August",male,22.00,0,0,350060,7.5208,,S
305,306,1,"Allison, Master. Hudson Trevor",male,0.92,1,2,113781,151.5500,C22 C26,S
...,...,...,...,...,...,...,...,...,...,...,...
106,107,3,"Salkjelsvik, Miss. Anna Kristine",female,21.00,0,0,343120,7.6500,,S
270,271,1,"Cairns, Mr. Alexander",male,,0,0,113798,31.0000,,S
860,861,3,"Hansen, Mr. Claus Peter",male,41.00,2,0,350026,14.1083,,S
435,436,1,"Carter, Miss. Lucile Polk",female,14.00,1,2,113760,120.0000,B96 B98,S


In [265]:
y_train

298    1
884    0
247    1
478    0
305    1
      ..
106    1
270    0
860    0
435    1
102    0
Name: Survived, Length: 668, dtype: int64

In [266]:
X_train.describe()

Unnamed: 0,PassengerId,Pclass,Age,SibSp,Parch,Fare
count,668.0,668.0,536.0,668.0,668.0,668.0
mean,447.450599,2.333832,29.421343,0.553892,0.372754,32.179397
std,258.038366,0.823707,14.52601,1.185279,0.795588,51.604012
min,1.0,1.0,0.42,0.0,0.0,0.0
25%,221.75,2.0,20.75,0.0,0.0,7.925
50%,452.5,3.0,28.0,0.0,0.0,14.4
75%,673.5,3.0,38.0,1.0,0.0,30.5
max,891.0,3.0,80.0,8.0,6.0,512.3292


In [267]:
X_test

Unnamed: 0,PassengerId,Pclass,Name,Sex,Age,SibSp,Parch,Ticket,Fare,Cabin,Embarked
0,892,3,"Kelly, Mr. James",male,34.5,0,0,330911,7.8292,,Q
1,893,3,"Wilkes, Mrs. James (Ellen Needs)",female,47.0,1,0,363272,7.0000,,S
2,894,2,"Myles, Mr. Thomas Francis",male,62.0,0,0,240276,9.6875,,Q
3,895,3,"Wirz, Mr. Albert",male,27.0,0,0,315154,8.6625,,S
4,896,3,"Hirvonen, Mrs. Alexander (Helga E Lindqvist)",female,22.0,1,1,3101298,12.2875,,S
...,...,...,...,...,...,...,...,...,...,...,...
413,1305,3,"Spector, Mr. Woolf",male,,0,0,A.5. 3236,8.0500,,S
414,1306,1,"Oliva y Ocana, Dona. Fermina",female,39.0,0,0,PC 17758,108.9000,C105,C
415,1307,3,"Saether, Mr. Simon Sivertsen",male,38.5,0,0,SOTON/O.Q. 3101262,7.2500,,S
416,1308,3,"Ware, Mr. Frederick",male,,0,0,359309,8.0500,,S


## Feature Engineering

The Titanic dataset is quite small, so engineering new features could be crucial to our model's success.

In [268]:
# Adding a new feature because cabin has lots of NaNs
X_train['Has_Cabin'] = X_train['Cabin'].notnull().astype(int)
X_val['Has_Cabin'] = X_val['Cabin'].notnull().astype(int)
X_test['Has_Cabin'] = X_test['Cabin'].notnull().astype(int)

In [269]:
# Getting ticket count (for families)
X_train['Ticket_Count'] = X_train.groupby('Ticket')['Ticket'].transform('count')
X_val['Ticket_Count'] = X_val.groupby('Ticket')['Ticket'].transform('count')
X_test['Ticket_Count'] = X_test.groupby('Ticket')['Ticket'].transform('count')
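
As a minimal sketch of what `groupby('Ticket')['Ticket'].transform('count')` computes, the toy frame below (the ticket strings are made up, not taken from the dataset) shows that every row receives the size of its ticket group. One caveat worth keeping in mind: because the count is taken within each split separately, a group only includes passengers who landed in that same split.

```python
import pandas as pd

# Toy data: the ticket strings are invented for illustration
toy = pd.DataFrame({"Ticket": ["A1", "B2", "A1", "C3", "A1"]})

# transform('count') returns one value per row, aligned with the original index
toy["Ticket_Count"] = toy.groupby("Ticket")["Ticket"].transform("count")

ticket_counts = toy["Ticket_Count"].tolist()
print(ticket_counts)
```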

In [270]:
# Adding an IsAlone column, people with families might have a higher chance to survive
X_train["IsAlone"] = ((X_train["Parch"] == 0) & (X_train["SibSp"] == 0)).astype(int)
X_val["IsAlone"]   = ((X_val["Parch"] == 0) & (X_val["SibSp"] == 0)).astype(int)
X_test["IsAlone"]  = ((X_test["Parch"] == 0) & (X_test["SibSp"] == 0)).astype(int)

In [271]:
# Adding an IsChild column as children were prioritized
X_train["IsChild"] = (X_train["Age"] < 18).astype(int)
X_val["IsChild"] = (X_val["Age"] < 18).astype(int)
X_test["IsChild"] = (X_test["Age"] < 18).astype(int)

In [272]:
# Adding IsMaleChild because children were prioritized but male adults weren't
X_train["IsMaleChild"] = ((X_train["Sex"] == "male") & (X_train["IsChild"] == 1)).astype(int)
X_val["IsMaleChild"] = ((X_val["Sex"] == "male") & (X_val["IsChild"] == 1)).astype(int)
X_test["IsMaleChild"] = ((X_test["Sex"] == "male") & (X_test["IsChild"] == 1)).astype(int)

In [273]:
# Adding Title column to see how high-class they are
# Extract title from the name
for df in [X_train, X_val, X_test]:
    df["Title"] = df["Name"].str.extract(r",\s*([^\.]+)\.")

# Group rare titles
rare_titles = ['Lady', 'Countess', 'Capt', 'Col', 'Don', 
               'Dr', 'Major', 'Rev', 'Sir', 'Jonkheer', 'Dona']

for df in [X_train, X_val, X_test]:
    df["Title"] = df["Title"].replace(rare_titles, 'Rare')
    df["Title"] = df["Title"].replace(['Mlle', 'Ms'], 'Miss')
    df["Title"] = df["Title"].replace('Mme', 'Mrs')
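
To see what the title regex actually captures, here is a small sketch on three names. The first two come from the training rows shown earlier; the third is an illustrative name in the "the Countess" style and is an assumption, not a row quoted from the data. It suggests why a `Title_the Countess` column can survive the rare-title grouping: the captured string includes the article, so it does not equal `'Countess'`.

```python
import pandas as pd

names = pd.Series([
    "Saalfeld, Mr. Adolphe",
    "Allison, Master. Hudson Trevor",
    "Rothes, the Countess. of (Lucy Noel Martha Dyer-Edwards)",  # illustrative
])

# Same pattern as above: text between the first comma and the next period.
# expand=False returns a Series instead of a one-column DataFrame.
titles = names.str.extract(r",\s*([^\.]+)\.", expand=False)
print(titles.tolist())

# 'Countess' alone does not match the captured value
rare_titles = ["Lady", "Countess", "Capt"]
grouped = titles.replace(rare_titles, "Rare")
print(grouped.tolist())
```

If that behavior is unwanted, one option would be to add `'the Countess'` to `rare_titles`, at the cost of changing the number of encoded columns.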

In [274]:
X_train["IsMaleChild"]

298    0
884    0
247    0
478    0
305    1
      ..
106    0
270    0
860    0
435    0
102    0
Name: IsMaleChild, Length: 668, dtype: int64

In [275]:
# Removing ticket column as it will cause issues with encoding
X_train.drop('Ticket', axis=1, inplace=True)
X_val.drop('Ticket', axis=1, inplace=True)
X_test.drop('Ticket', axis=1, inplace=True)

In [276]:
# Removing cabin column as it will cause issues with encoding
X_train.drop('Cabin', axis=1, inplace=True)
X_val.drop('Cabin', axis=1, inplace=True)
X_test.drop('Cabin', axis=1, inplace=True)

In [277]:
# Removing name column as it will confuse the model
X_train.drop('Name', inplace=True, axis=1)
X_val.drop('Name', inplace=True, axis=1)
X_test.drop('Name', inplace=True, axis=1)

In [278]:
# Removing ID column as it will confuse the model
X_train.drop('PassengerId', inplace=True, axis=1)
X_val.drop('PassengerId', inplace=True, axis=1)
X_test_id = X_test.pop("PassengerId")

In [279]:
# Define numeric and categorical columns 
numeric_cols = X_train.select_dtypes(include=np.number).columns
categorical_cols = X_train.select_dtypes(include='object').columns

In [280]:
X_train[numeric_cols]

Unnamed: 0,Pclass,Age,SibSp,Parch,Fare,Has_Cabin,Ticket_Count,IsAlone,IsChild,IsMaleChild
298,1,,0,0,30.5000,1,1,1,0,0
884,3,25.00,0,0,7.0500,0,1,1,0,0
247,2,24.00,0,2,14.5000,0,2,0,0,0
478,3,22.00,0,0,7.5208,0,1,1,0,0
305,1,0.92,1,2,151.5500,1,4,0,1,1
...,...,...,...,...,...,...,...,...,...,...
106,3,21.00,0,0,7.6500,0,1,1,0,0
270,1,,0,0,31.0000,0,1,1,0,0
860,3,41.00,2,0,14.1083,0,1,0,0,0
435,1,14.00,1,2,120.0000,1,2,0,1,0


In [281]:
X_train[categorical_cols]

Unnamed: 0,Sex,Embarked,Title
298,male,S,Mr
884,male,S,Mr
247,female,S,Mrs
478,male,S,Mr
305,male,S,Master
...,...,...,...
106,female,S,Miss
270,male,S,Mr
860,male,S,Mr
435,female,S,Miss


## Preprocessing

After adding and removing a few columns, it's time to start cleaning up the data. It'll be broken down into 3 main parts:

1. Imputing NaN values in numeric columns by using sklearn's `SimpleImputer()`
2. Encoding categorical columns into one hot vectors using sklearn's `OneHotEncoder()`
3. Normalizing data (neural networks love this) using sklearn's `MinMaxScaler()`

In [282]:
X_train[numeric_cols].isna().sum()

Pclass            0
Age             132
SibSp             0
Parch             0
Fare              0
Has_Cabin         0
Ticket_Count      0
IsAlone           0
IsChild           0
IsMaleChild       0
dtype: int64

In [283]:
X_val[numeric_cols].isna().sum()

Pclass           0
Age             45
SibSp            0
Parch            0
Fare             0
Has_Cabin        0
Ticket_Count     0
IsAlone          0
IsChild          0
IsMaleChild      0
dtype: int64

In [284]:
X_test[numeric_cols].isna().sum()

Pclass           0
Age             86
SibSp            0
Parch            0
Fare             1
Has_Cabin        0
Ticket_Count     0
IsAlone          0
IsChild          0
IsMaleChild      0
dtype: int64

In [285]:
# Create imputer
imputer = SimpleImputer()

In [286]:
# Fit imputer on training data to prevent data leakage
imputer.fit(X_train[numeric_cols])

In [287]:
# Transform columns
X_train[numeric_cols] = imputer.transform(X_train[numeric_cols])
X_val[numeric_cols] = imputer.transform(X_val[numeric_cols])
X_test[numeric_cols] = imputer.transform(X_test[numeric_cols])

In [288]:
X_train[numeric_cols].isna().sum()

Pclass          0
Age             0
SibSp           0
Parch           0
Fare            0
Has_Cabin       0
Ticket_Count    0
IsAlone         0
IsChild         0
IsMaleChild     0
dtype: int64

In [289]:
# Create encoder
enc = OneHotEncoder(sparse_output=False)

In [290]:
# Fit encoder to training data
enc.fit(X_train[categorical_cols])

In [291]:
# Redefine numeric and categorical columns 
numeric_cols = X_train.select_dtypes(include=np.number).columns
categorical_cols = X_train.select_dtypes(include='object').columns

In [292]:
# Encoding

# Create encoded arrays
train_encoded_array = enc.transform(X_train[categorical_cols])
val_encoded_array = enc.transform(X_val[categorical_cols])
test_encoded_array = enc.transform(X_test[categorical_cols])

# Create encoded dataframes with matching index
train_encoded_df = pd.DataFrame(train_encoded_array, columns=enc.get_feature_names_out(), index=X_train.index)
val_encoded_df = pd.DataFrame(val_encoded_array, columns=enc.get_feature_names_out(), index=X_val.index)
test_encoded_df = pd.DataFrame(test_encoded_array, columns=enc.get_feature_names_out(), index=X_test.index)

# Drop original categorical columns
X_train.drop(categorical_cols, axis=1, inplace=True)
X_val.drop(categorical_cols, axis=1, inplace=True)
X_test.drop(categorical_cols, axis=1, inplace=True)

# Concatenate encoded columns
X_train = pd.concat([X_train, train_encoded_df], axis=1)
X_val = pd.concat([X_val, val_encoded_df], axis=1)
X_test = pd.concat([X_test, test_encoded_df], axis=1)


In [293]:
X_train

Unnamed: 0,Pclass,Age,SibSp,Parch,Fare,Has_Cabin,Ticket_Count,IsAlone,IsChild,IsMaleChild,...,Embarked_C,Embarked_Q,Embarked_S,Embarked_nan,Title_Master,Title_Miss,Title_Mr,Title_Mrs,Title_Rare,Title_the Countess
298,1.0,29.421343,0.0,0.0,30.5000,1.0,1.0,1.0,0.0,0.0,...,0.0,0.0,1.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0
884,3.0,25.000000,0.0,0.0,7.0500,0.0,1.0,1.0,0.0,0.0,...,0.0,0.0,1.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0
247,2.0,24.000000,0.0,2.0,14.5000,0.0,2.0,0.0,0.0,0.0,...,0.0,0.0,1.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0
478,3.0,22.000000,0.0,0.0,7.5208,0.0,1.0,1.0,0.0,0.0,...,0.0,0.0,1.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0
305,1.0,0.920000,1.0,2.0,151.5500,1.0,4.0,0.0,1.0,1.0,...,0.0,0.0,1.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
106,3.0,21.000000,0.0,0.0,7.6500,0.0,1.0,1.0,0.0,0.0,...,0.0,0.0,1.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0
270,1.0,29.421343,0.0,0.0,31.0000,0.0,1.0,1.0,0.0,0.0,...,0.0,0.0,1.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0
860,3.0,41.000000,2.0,0.0,14.1083,0.0,1.0,0.0,0.0,0.0,...,0.0,0.0,1.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0
435,1.0,14.000000,1.0,2.0,120.0000,1.0,2.0,0.0,1.0,0.0,...,0.0,0.0,1.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0


In [294]:
X_test.isna().sum()

Pclass                0
Age                   0
SibSp                 0
Parch                 0
Fare                  0
Has_Cabin             0
Ticket_Count          0
IsAlone               0
IsChild               0
IsMaleChild           0
Sex_female            0
Sex_male              0
Embarked_C            0
Embarked_Q            0
Embarked_S            0
Embarked_nan          0
Title_Master          0
Title_Miss            0
Title_Mr              0
Title_Mrs             0
Title_Rare            0
Title_the Countess    0
dtype: int64

In [295]:
numeric_cols

Index(['Pclass', 'Age', 'SibSp', 'Parch', 'Fare', 'Has_Cabin', 'Ticket_Count',
       'IsAlone', 'IsChild', 'IsMaleChild'],
      dtype='object')

In [296]:
categorical_cols

Index(['Sex', 'Embarked', 'Title'], dtype='object')

In [297]:
# Create scaler
scaler = MinMaxScaler()

In [298]:
# Fit on training data
scaler.fit(X_train[numeric_cols])

In [299]:
# Transform numeric columns
X_train[numeric_cols] = scaler.transform(X_train[numeric_cols])
X_val[numeric_cols] = scaler.transform(X_val[numeric_cols])
X_test[numeric_cols] = scaler.transform(X_test[numeric_cols])
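
For reference, `MinMaxScaler` with its default range applies `(x - min) / (max - min)` per column, using the minimum and maximum seen during `fit`. The sketch below uses made-up fare values to show one consequence of fitting on the training split only: validation or test values can end up outside `[0, 1]`.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

train_fare = np.array([[0.0], [10.0], [50.0]])  # invented training values
test_fare = np.array([[25.0], [75.0]])          # 75 exceeds the training maximum

scaler = MinMaxScaler().fit(train_fare)

# Manual version of the same formula, using training statistics
manual = (test_fare - train_fare.min()) / (train_fare.max() - train_fare.min())
scaled = scaler.transform(test_fare)
print(scaled.ravel())
```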

In [300]:
X_train[numeric_cols].describe()

Unnamed: 0,Pclass,Age,SibSp,Parch,Fare,Has_Cabin,Ticket_Count,IsAlone,IsChild,IsMaleChild
count,668.0,668.0,668.0,668.0,668.0,668.0,668.0,668.0,668.0,668.0
mean,0.666916,0.36443,0.069237,0.062126,0.06281,0.223054,0.107285,0.607784,0.127246,0.068862
std,0.411854,0.163477,0.14816,0.132598,0.100724,0.416606,0.193945,0.48861,0.333498,0.25341
min,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
25%,0.5,0.271174,0.0,0.0,0.015469,0.0,0.0,0.0,0.0,0.0
50%,1.0,0.36443,0.0,0.0,0.028107,0.0,0.0,1.0,0.0,0.0
75%,1.0,0.434531,0.125,0.0,0.059532,0.0,0.166667,1.0,0.0,0.0
max,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0


In [301]:
len(X_train.columns)

22

## Training the Keras model

Now, we will train our neural network using the Keras `Sequential()` class. 
The structure will be something like this:

Input layer - 22 neurons  
Dense layer - 64 neurons, activation=relu  
Dense layer - 128 neurons, activation=relu  
Dense layer - 64 neurons, activation=relu  
Output layer - 1 neuron, activation=sigmoid
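
As a quick sanity check on the size of this architecture, the sketch below counts trainable parameters by hand, assuming plain fully connected layers with one bias per output neuron.

```python
layer_sizes = [22, 64, 128, 64, 1]  # input, three hidden layers, output

# Each Dense layer has fan_in * fan_out weights plus fan_out biases
params_per_layer = [
    fan_in * fan_out + fan_out
    for fan_in, fan_out in zip(layer_sizes[:-1], layer_sizes[1:])
]
total_params = sum(params_per_layer)
print(params_per_layer, total_params)
```

With 668 training rows, that is roughly 27 parameters per training example, so overfitting is a realistic concern for this network.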

In [302]:
# Convert to numpy array
X_train_tf = np.array(X_train)
y_train_tf = y_train.astype(int).to_numpy() # For labels

X_val_tf = np.array(X_val)
y_val_tf = y_val.astype(int).to_numpy() # For labels

X_test_tf = np.array(X_test)

In [303]:
n_cols = len(X_train.columns)

In [304]:
tf_model = tf.keras.Sequential([
    tf.keras.Input(shape=(n_cols,)), 
    tf.keras.layers.Dense(64, activation='relu'),
    tf.keras.layers.Dense(128, activation='relu'),
    tf.keras.layers.Dense(64, activation='relu'),
    tf.keras.layers.Dense(1, activation='sigmoid'),
])

In [305]:
from tensorflow.keras.optimizers import Adam

# Create an optimizer with a custom learning rate (the Keras default for Adam is 0.001)
optimizer = Adam(learning_rate=0.0005)

# Compile the model with this optimizer
tf_model.compile(
    optimizer=optimizer,
    loss='binary_crossentropy',
    metrics=['accuracy'],
)


In [306]:
tf_model.fit(X_train_tf, y_train_tf, epochs=10)

Epoch 1/10
21/21 ━━━━━━━━━━━━━━━━━━━━ 1s 11ms/step - accuracy: 0.5516 - loss: 0.6812
Epoch 2/10
21/21 ━━━━━━━━━━━━━━━━━━━━ 0s 9ms/step - accuracy: 0.7384 - loss: 0.5141
Epoch 3/10
21/21 ━━━━━━━━━━━━━━━━━━━━ 0s 7ms/step - accuracy: 0.7964 - loss: 0.4752
Epoch 4/10
21/21 ━━━━━━━━━━━━━━━━━━━━ 0s 7ms/step - accuracy: 0.8089 - loss: 0.4536
Epoch 5/10
21/21 ━━━━━━━━━━━━━━━━━━━━ 0s 7ms/step - accuracy: 0.7966 - loss: 0.4410
Epoch 6/10
21/21 ━━━━━━━━━━━━━━━━━━━━ 0s 7ms/step - accuracy: 0.8019 - loss: 0.4451
Epoch 7/10
21/21 ━━━━━━━━━━━━━━━━━━━━ 0s 7ms/step - accuracy: 0.8214 - loss: 0.4172
Epoch 8/10
21/21 ━━━━━━━━━━━━━━━━━━━━ 0s 7ms/step - accuracy: 0.8360 - loss: 0.4180
Epoch 9/10
21/21 ━━━━━━━━━━━━━━━━━━━━

<keras.src.callbacks.history.History at 0x3689f4110>

In [307]:
tf_model.evaluate(X_val_tf, y_val_tf)

7/7 ━━━━━━━━━━━━━━━━━━━━ 0s 9ms/step - accuracy: 0.8215 - loss: 0.4207


[0.4232493042945862, 0.8071748614311218]

## Tuning hyperparameters and model improvement

The model reaches roughly 80.7% accuracy on the validation set (loss 0.423), which leaves room for improvement, so the next step is to change some hyperparameters. If this doesn't work, I'll try to add some new features.

In [308]:
# Adding callbacks

# Saves best models in case performance drops
from tensorflow.keras.callbacks import ModelCheckpoint
checkpoint = ModelCheckpoint("best_model.keras", save_best_only=True)

# Changes learning rate if val loss is not improving
from tensorflow.keras.callbacks import ReduceLROnPlateau
lr_scheduler = ReduceLROnPlateau(monitor='val_loss', factor=0.5, patience=10)
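
To make the effect of `factor=0.5` and `patience=10` concrete, here is a simplified simulation of the plateau rule. It is a sketch rather than the Keras implementation: `simulate_reduce_lr` is a hypothetical helper, the loss sequence is invented, and cooldown and `min_lr` are ignored.

```python
def simulate_reduce_lr(val_losses, lr=0.01, factor=0.5, patience=10, min_delta=1e-4):
    """Return the learning rate in effect after each epoch (simplified rule)."""
    best = float("inf")
    wait = 0
    history = []
    for loss in val_losses:
        if loss < best - min_delta:
            # Improvement: remember it and reset the patience counter
            best = loss
            wait = 0
        else:
            wait += 1
            if wait >= patience:
                # Plateau: scale the learning rate and start counting again
                lr *= factor
                wait = 0
        history.append(lr)
    return history


# One improving epoch followed by 20 epochs stuck at a worse loss
losses = [0.5] + [0.6] * 20
lrs = simulate_reduce_lr(losses)
print(lrs[9], lrs[10], lrs[20])
```

Under these assumptions the rate is halved after every 10 consecutive epochs without improvement, so a 200-epoch run that plateaus early can end with a much smaller rate than the initial 0.01.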

In [309]:
tf_model = tf.keras.Sequential([
    tf.keras.Input(shape=(n_cols,)),
    tf.keras.layers.Dense(64, activation='relu'),
    tf.keras.layers.Dense(128, activation='relu'),
    tf.keras.layers.Dense(64, activation='relu'),
    tf.keras.layers.Dense(1, activation='sigmoid'),
])

# Create an optimizer with a higher learning rate than before (the Keras default for Adam is 0.001)
optimizer = Adam(learning_rate=0.01)

# Compile the model with this optimizer
tf_model.compile(
    optimizer=optimizer,
    loss='binary_crossentropy',
    metrics=['accuracy'],
)

tf_model.fit(
    X_train_tf, y_train_tf,
    epochs=200,
    batch_size=64,
    shuffle=True,
    validation_data=(X_val_tf, y_val_tf),
    callbacks=[checkpoint, lr_scheduler],
)

Epoch 1/200
11/11 ━━━━━━━━━━━━━━━━━━━━ 1s 31ms/step - accuracy: 0.7309 - loss: 0.5519 - val_accuracy: 0.7982 - val_loss: 0.4759 - learning_rate: 0.0100
Epoch 2/200
11/11 ━━━━━━━━━━━━━━━━━━━━ 0s 11ms/step - accuracy: 0.7882 - loss: 0.4598 - val_accuracy: 0.8027 - val_loss: 0.5371 - learning_rate: 0.0100
Epoch 3/200
11/11 ━━━━━━━━━━━━━━━━━━━━ 0s 11ms/step - accuracy: 0.8161 - loss: 0.4906 - val_accuracy: 0.7534 - val_loss: 0.5052 - learning_rate: 0.0100
Epoch 4/200
11/11 ━━━━━━━━━━━━━━━━━━━━ 0s 12ms/step - accuracy: 0.8210 - loss: 0.4621 - val_accuracy: 0.8161 - val_loss: 0.4288 - learning_rate: 0.0100
Epoch 5/200
11/11 ━━━━━━━━━━━━━━━━━━━━ 0s 11ms/step - accuracy: 0.8205 - loss: 0.4306 - val_accuracy: 0.7982 - val_loss: 0.4462 - learning_rate: 0.0100
Epoch 6/200
11/11 ━━━━━━━━━━━━━━━━━━━━ 0

<keras.src.callbacks.history.History at 0x3a0762b70>

In [310]:
tf_model.evaluate(X_val_tf, y_val_tf)

7/7 ━━━━━━━━━━━━━━━━━━━━ 0s 5ms/step - accuracy: 0.7684 - loss: 0.4493


[0.4550056755542755, 0.7668161392211914]

In [311]:
from tensorflow import keras

# Load the best saved model
best_model = keras.models.load_model("best_model.keras")

loss, accuracy = best_model.evaluate(X_val_tf, y_val_tf)
print(f"Best model accuracy: {accuracy:.4f}")

# You can also use it to make predictions:
preds = best_model.predict(X_test_tf)

7/7 ━━━━━━━━━━━━━━━━━━━━ 0s 9ms/step - accuracy: 0.8209 - loss: 0.4220
Best model accuracy: 0.8161
14/14 ━━━━━━━━━━━━━━━━━━━━ 0s 3ms/step


In [312]:
# Submit

# Step 1: Predict using your trained model
pred_probs = best_model.predict(X_test_tf)  # assuming X_test is preprocessed
preds = (pred_probs > 0.5).astype(int).flatten()  # binary classification

# Step 2: Create submission DataFrame
# Make sure you have the PassengerId from the original test.csv!
submission = pd.DataFrame({
    "PassengerId": X_test_id,  # This should be a pandas Series or np.array
    "Survived": preds
})

# Step 3: Save it to a CSV
submission.to_csv("submission.csv", index=False)
print("✅ Submission saved as 'submission.csv'")

14/14 ━━━━━━━━━━━━━━━━━━━━ 0s 2ms/step
✅ Submission saved as 'submission.csv'


## Build and Train Custom Neural Network

Our highest test score was 78.468% with the TensorFlow neural network. Let's see if we can do better.

In [486]:
class NeuralNetwork:

    def __init__(self):
        self.X_train = np.array(X_train)
        self.y_train = np.array(y_train)
        self.X_val = np.array(X_val)
        self.y_val = np.array(y_val)
        self.X_test = np.array(X_test)

        self.n_cols = n_cols

        def he_init(fan_in, fan_out):
            return np.random.randn(fan_out, fan_in) * np.sqrt(2 / fan_in)

        def xavier_init(fan_in, fan_out):
            return np.random.randn(fan_out, fan_in) * np.sqrt(1 / fan_in)

        self.W1 = he_init(self.n_cols, 64)
        self.b1 = np.zeros((64, 1))

        self.W2 = he_init(64, 128)
        self.b2 = np.zeros((128, 1))

        self.W3 = he_init(128, 64)
        self.b3 = np.zeros((64, 1))

        self.W4 = xavier_init(64, 1)
        self.b4 = np.zeros((1, 1)) 
        self.models = {}

    @staticmethod
    def relu(x):
        return np.maximum(0, x)

    @staticmethod
    def sigmoid(x):
        return 1 / (1 + np.exp(-x))
    
    @staticmethod
    def cross_entropy(y_true, y_pred):
        epsilon = 1e-15 
        y_pred = np.clip(y_pred, epsilon, 1 - epsilon)  
        loss = -(y_true * np.log(y_pred) + (1 - y_true) * np.log(1 - y_pred)).mean()
        return loss
        

    def fit(self, X, y, epochs=100, alpha=0.001, batch_size=167):
        self.epochs = epochs
        self.alpha = alpha
        self.batch_size = batch_size
        self.X = X
        self.y = y

        for self.epoch in range(self.epochs):
            
            self.n_samples = self.X.shape[0]

            permutation_indices = np.random.permutation(self.n_samples)
            self.X_train_shuffled = self.X[permutation_indices]
            self.y_train_shuffled = self.y[permutation_indices]

            for start in range(0, self.n_samples, self.batch_size):

                self.stop = start + self.batch_size

                self.X_batch = self.X_train_shuffled[start:self.stop] 
                self.y_batch = self.y_train_shuffled[start:self.stop]

                self.A0 = self.X_batch.T

                self.Z1 = self.W1 @ self.A0 + self.b1
                self.A1 = self.relu(self.Z1)

                self.Z2 = self.W2 @ self.A1 + self.b2
                self.A2 = self.relu(self.Z2)

                self.Z3 = self.W3 @ self.A2 + self.b3
                self.A3 = self.relu(self.Z3)

                self.Z4 = self.W4 @ self.A3 + self.b4
                self.A4 = self.sigmoid(self.Z4)

                self.y_batch = self.y_batch.T
                self.loss = self.cross_entropy(y_true=self.y_batch, y_pred=self.A4)

                self.m = self.A3.shape[1] 

                self.dZ4 = self.A4 - self.y_batch
                self.dW4 = (1 / self.m) * self.dZ4 @ self.A3.T
                self.db4 = (1 / self.m) * np.sum(self.dZ4, axis=1, keepdims=True)

                self.dA3 = self.W4.T @ self.dZ4
                self.dZ3 = self.dA3 * (self.Z3 > 0)  
                self.dW3 = (1 / self.m) * self.dZ3 @ self.A2.T
                self.db3 = (1 / self.m) * np.sum(self.dZ3, axis=1, keepdims=True)

                self.dA2 = self.W3.T @ self.dZ3
                self.dZ2 = self.dA2 * (self.Z2 > 0) 
                self.dW2 = (1 / self.m) * self.dZ2 @ self.A1.T  
                self.db2 = (1 / self.m) * np.sum(self.dZ2, axis=1, keepdims=True) 

                self.dA1 = self.W2.T @ self.dZ2
                self.dZ1 = self.dA1 * (self.Z1 > 0)
                self.dW1 = (1 / self.m) * self.dZ1 @ self.A0.T  
                self.db1 = (1 / self.m) * np.sum(self.dZ1, axis=1, keepdims=True)

                self.W4 -= alpha * self.dW4
                self.b4 -= alpha * self.db4

                self.W3 -= alpha * self.dW3
                self.b3 -= alpha * self.db3

                self.W2 -= alpha * self.dW2
                self.b2 -= alpha * self.db2

                self.W1 -= alpha * self.dW1
                self.b1 -= alpha * self.db1

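                # Keep only the weights with the lowest batch loss so far;
                # storing a copy for every batch grows memory with each step
                if not self.models or self.loss < min(self.models):
                    self.models = {self.loss: {
                        "W1": self.W1.copy(),
                        "W2": self.W2.copy(),
                        "W3": self.W3.copy(),
                        "W4": self.W4.copy(),
                        "b1": self.b1.copy(),
                        "b2": self.b2.copy(),
                        "b3": self.b3.copy(),
                        "b4": self.b4.copy(),
                    }}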

            print(
                f"Epoch: {self.epoch+1}\nLoss: {self.loss}"
            )
        
        best_loss = min(self.models.keys())
        self.best_model = self.models[best_loss]

        self.W1 = self.best_model["W1"]
        self.W2 = self.best_model["W2"]
        self.W3 = self.best_model["W3"]
        self.W4 = self.best_model["W4"]

        self.b1 = self.best_model["b1"]
        self.b2 = self.best_model["b2"]
        self.b3 = self.best_model["b3"]
        self.b4 = self.best_model["b4"]

    def predict(self, input):
        self.A0 = input.T
        self.Z1 = self.W1 @ self.A0 + self.b1
        self.A1 = self.relu(self.Z1)
        self.Z2 = self.W2 @ self.A1 + self.b2
        self.A2 = self.relu(self.Z2)
        self.Z3 = self.W3 @ self.A2 + self.b3
        self.A3 = self.relu(self.Z3)
        self.Z4 = self.W4 @ self.A3 + self.b4
        self.A4 = self.sigmoid(self.Z4)

        return (self.A4 > 0.5).astype(int).flatten()

    
    def evaluate(self):
        predictions = self.predict(self.X_val)
        true_labels = self.y_val
        accuracy = np.mean(predictions == true_labels)

        A0 = self.X_val.T  
        Z1 = self.W1 @ A0 + self.b1
        A1 = self.relu(Z1)
        Z2 = self.W2 @ A1 + self.b2
        A2 = self.relu(Z2)
        Z3 = self.W3 @ A2 + self.b3
        A3 = self.relu(Z3)
        Z4 = self.W4 @ A3 + self.b4
        A4 = self.sigmoid(Z4)

        loss = self.cross_entropy(y_true=self.y_val.T, y_pred=A4)

        print(f"Loss: {loss:.4f}, Accuracy: {accuracy * 100:.2f}%")
        return loss, accuracy
    
    def predict_proba(self, input):
        self.A0 = input.T

        self.Z1 = self.W1 @ self.A0 + self.b1
        self.A1 = self.relu(self.Z1)

        self.Z2 = self.W2 @ self.A1 + self.b2
        self.A2 = self.relu(self.Z2)

        self.Z3 = self.W3 @ self.A2 + self.b3
        self.A3 = self.relu(self.Z3)

        self.Z4 = self.W4 @ self.A3 + self.b4
        self.A4 = self.sigmoid(self.Z4)

        return self.A4
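
The backward pass above starts from `dZ4 = A4 - y_batch`, which is the gradient of the binary cross-entropy with respect to the pre-sigmoid output `Z4`, up to the `1/m` factor that the class applies when forming `dW4` and `db4`. The sketch below checks that identity numerically with central differences on random values; the helper names and the random inputs are assumptions made for this illustration.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def bce(z, y):
    """Mean binary cross-entropy as a function of the pre-sigmoid values z."""
    a = sigmoid(z)
    return -(y * np.log(a) + (1 - y) * np.log(1 - a)).mean()


rng = np.random.default_rng(0)
z = rng.normal(size=(1, 8))                        # stand-in for Z4 on a batch of 8
y = rng.integers(0, 2, size=(1, 8)).astype(float)  # random binary labels

# Analytic gradient of the mean loss: (A4 - y) / m
analytic = (sigmoid(z) - y) / z.shape[1]

# Central finite differences, one coordinate at a time
eps = 1e-6
numeric = np.zeros_like(z)
for j in range(z.shape[1]):
    z_plus, z_minus = z.copy(), z.copy()
    z_plus[0, j] += eps
    z_minus[0, j] -= eps
    numeric[0, j] = (bce(z_plus, y) - bce(z_minus, y)) / (2 * eps)

max_abs_diff = np.max(np.abs(analytic - numeric))
print(max_abs_diff)
```

The same finite-difference idea can be applied to any of the `dW` or `db` terms when debugging a hand-written backward pass.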


In [487]:
neural_network = NeuralNetwork()
neural_network.fit(neural_network.X_train, neural_network.y_train, epochs=10000)

Epoch: 1
Loss: 0.6953111204227926
Epoch: 2
Loss: 0.7109199806717506
Epoch: 3
Loss: 0.7303534248096607
Epoch: 4
Loss: 0.7124847741209376
Epoch: 5
Loss: 0.7373087316622833
Epoch: 6
Loss: 0.7244223633572361
Epoch: 7
Loss: 0.7187325116468444
Epoch: 8
Loss: 0.7086191382614392
Epoch: 9
Loss: 0.6979516227856011
Epoch: 10
Loss: 0.7075185428905747
Epoch: 11
Loss: 0.7059511698806209
Epoch: 12
Loss: 0.6949000024588156
Epoch: 13
Loss: 0.6580868373234935
Epoch: 14
Loss: 0.656609379333355
Epoch: 15
Loss: 0.663556082479009
Epoch: 16
Loss: 0.7179460103980595
Epoch: 17
Loss: 0.6896354313324129
Epoch: 18
Loss: 0.6978715667666802
Epoch: 19
Loss: 0.697414933257463
Epoch: 20
Loss: 0.6789986210887295
Epoch: 21
Loss: 0.7004625216787261
Epoch: 22
Loss: 0.6918213483910925
Epoch: 23
Loss: 0.6903312070397766
Epoch: 24
Loss: 0.6880954221012393
Epoch: 25
Loss: 0.6899080622062803
Epoch: 26
Loss: 0.7120710919560433
Epoch: 27
Loss: 0.6399147835751916
Epoch: 28
Loss: 0.6993082603371292
Epoch: 29
Loss: 0.68322284083688

In [488]:
preds = neural_network.predict(X_train_tf)
accuracy_score(y_pred=preds, y_true=neural_network.y_train)

0.8547904191616766

In [489]:
neural_network.evaluate()

Loss: 0.4386, Accuracy: 79.70%


(0.438640201102203, 0.7969924812030075)

In [490]:
# Submit

# Step 1: Predict survival probabilities with the trained model
pred_probs = neural_network.predict_proba(X_test_tf)  # X_test has already been preprocessed
preds = (pred_probs > 0.5).astype(int).flatten()  # threshold at 0.5 for binary labels

# Step 2: Create submission DataFrame
# Make sure you have the PassengerId from the original test.csv!
submission = pd.DataFrame({
    "PassengerId": X_test_id,  # This should be a pandas Series or np.array
    "Survived": preds
})

# Step 3: Save it to a CSV
submission.to_csv("submission.csv", index=False)
print("✅ Submission saved as 'submission.csv'")

✅ Submission saved as 'submission.csv'


## Who Wins?

The TensorFlow model ended up winning by about 0.48 percentage points.  

Final standings:

1. TensorFlow - 78.468%
2. Custom - 77.990%

## Model Stacking

Now we're going to try to leverage model stacking to improve our scores. The meta model will initially be a decision tree classifier. After that, we will move to a gradient-boosted tree ensemble, and finally a neural network.

The following models will be fed into our meta model:

- Neural Network (from tensorflow)
- My Neural Network
- Logistic Regression
- XGBoost
- RandomForest
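
As a rough sketch of the stacking idea, the example below builds out-of-fold predictions for two base models and trains a decision tree meta model on them. It runs on synthetic data from `make_classification` and uses only two scikit-learn base models, so it is an illustration under those assumptions rather than the pipeline used in this notebook.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import KFold, cross_val_predict
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in data; the notebook itself would use X_train / y_train
X, y = make_classification(n_samples=200, n_features=8, random_state=0)

kf = KFold(n_splits=5, shuffle=True, random_state=42)
base_models = [
    LogisticRegression(max_iter=1000),
    RandomForestClassifier(n_estimators=50, max_depth=5, random_state=42),
]

# Each column holds one base model's out-of-fold probability of class 1:
# every row is predicted by a model that did not see it during fitting
oof_features = np.column_stack([
    cross_val_predict(model, X, y, cv=kf, method="predict_proba")[:, 1]
    for model in base_models
])

# The meta model learns from the base models' out-of-fold predictions
meta_model = DecisionTreeClassifier(max_depth=3, random_state=42)
meta_model.fit(oof_features, y)
print(oof_features.shape)
```

Using out-of-fold predictions matters because a meta model trained on in-sample base predictions would learn to trust base models that have simply memorized the training rows.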

In [491]:
from sklearn.model_selection import KFold
from sklearn.linear_model import LogisticRegression
from xgboost import XGBClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.tree import DecisionTreeClassifier

In [None]:
# Initialize arrays to store test predictions from each fold
log_reg_test_preds = np.zeros((len(X_test), 5))
nn_test_preds = np.zeros((len(X_test), 5))
xg_test_preds = np.zeros((len(X_test), 5))
rf_test_preds = np.zeros((len(X_test), 5))
my_nn_test_preds = np.zeros((len(X_test), 5))

kf = KFold(n_splits=5, shuffle=True, random_state=42)

for fold, (train_idx, val_idx) in enumerate(kf.split(X_train)):
    # Split training fold
    X_tr, y_tr = X_train.iloc[train_idx], y_train.iloc[train_idx]
    
    # Logistic Regression
    log_model = LogisticRegression()
    log_model.fit(X_tr, y_tr)
    log_reg_test_preds[:, fold] = log_model.predict_proba(X_test)[:, 1]
    
    # TensorFlow NN
    tf_model = tf.keras.Sequential([
        tf.keras.Input(shape=(n_cols,)),
        tf.keras.layers.Dense(64, activation='relu'),
        tf.keras.layers.Dense(128, activation='relu'),
        tf.keras.layers.Dense(64, activation='relu'),
        tf.keras.layers.Dense(1, activation='sigmoid'),
    ])
    optimizer = Adam(learning_rate=0.01)
    tf_model.compile(optimizer=optimizer, loss='binary_crossentropy', metrics=['accuracy'])
    tf_model.fit(X_tr, y_tr, epochs=200, batch_size=64, verbose=0)
    nn_test_preds[:, fold] = tf_model.predict(X_test).flatten()
    
    # XGBoost
    xgb_model = XGBClassifier(n_jobs=-1, random_state=42, enable_categorical=True,
                              max_depth=5, n_estimators=335)
    xgb_model.fit(X_tr, y_tr)
    xg_test_preds[:, fold] = xgb_model.predict_proba(X_test)[:, 1]
    
    # Random Forest
    rf_model = RandomForestClassifier(n_jobs=-1, random_state=42, n_estimators=300, max_depth=5)
    rf_model.fit(X_tr, y_tr)
    rf_test_preds[:, fold] = rf_model.predict_proba(X_test)[:, 1]
    
    # Your from-scratch Neural Network
    my_nn_model = NeuralNetwork()
    my_nn_model.fit(X_tr.values, y_tr.values, epochs=1000)  # make sure your fit accepts inputs!
    my_nn_test_preds[:, fold] = my_nn_model.predict_proba(X_test.values).flatten()

# Average predictions over folds for each model
log_reg_test_mean = log_reg_test_preds.mean(axis=1)
nn_test_mean = nn_test_preds.mean(axis=1)
xg_test_mean = xg_test_preds.mean(axis=1)
rf_test_mean = rf_test_preds.mean(axis=1)
my_nn_test_mean = my_nn_test_preds.mean(axis=1)

14/14 ━━━━━━━━━━━━━━━━━━━━ 0s 3ms/step
Epoch: 1
Loss: 0.6214343297489929
Epoch: 2
Loss: 0.6494841861771196
Epoch: 3
Loss: 0.6926363176591283
Epoch: 4
Loss: 0.666616338045954
Epoch: 5
Loss: 0.6550056565443786
Epoch: 6
Loss: 0.6310318127149627
Epoch: 7
Loss: 0.6242983644406
Epoch: 8
Loss: 0.6421522555680308
Epoch: 9
Loss: 0.6669606642128769
Epoch: 10
Loss: 0.6345548870956809
Epoch: 11
Loss: 0.6714193343012351
Epoch: 12
Loss: 0.6564196830819857
Epoch: 13
Loss: 0.6308796263178664
Epoch: 14
Loss: 0.6701059073461039
Epoch: 15
Loss: 0.63110640884284
Epoch: 16
Loss: 0.6256810552878402
Epoch: 17
Loss: 0.6984597595686604
Epoch: 18
Loss: 0.647393841224826
Epoch: 19
Loss: 0.6470958935803518
Epoch: 20
Loss: 0.6740104346342155
Epoch: 21
Loss: 0.6825996651665315
Epoch: 22
Loss: 0.6684921816549868
Epoch: 23
Loss: 0.602539242553723
Epoch: 24
Loss: 0.6857408887442924
Epoch: 25
Loss: 0.6708657312019174
Epoch: 26
Loss: 0.6607098389965954
Epoch: 27
Loss: 0.650542520178225

In [515]:
import tensorflow as tf
tf.config.run_functions_eagerly(True)

In [521]:
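# Note: the *_oof_train, *_oof_mean and oof_val_mean_* arrays used below are not
# defined in the cells shown in this notebook; they are assumed to hold each base
# model's out-of-fold predictions on the training and validation sets.
X_meta_train = np.column_stack([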
    log_reg_oof_train,
    nn_oof_train,
    xg_oof_train,
    my_nn_oof_train,
    rf_oof_train,
])

X_meta_val = np.column_stack([
    oof_val_mean_log_reg,
    nn_oof_mean,
    oof_val_mean_xg,
    my_nn_oof_mean,
    oof_val_mean_rf,
])

X_meta_test = np.column_stack([
    log_reg_test_preds.mean(axis=1),
    nn_test_preds.mean(axis=1),
    xg_test_preds.mean(axis=1),
    my_nn_test_preds.mean(axis=1),
    rf_test_preds.mean(axis=1),
])


meta_train_labels = y_train.to_numpy()
meta_val_labels = y_val.to_numpy()

In [522]:
model = tf.keras.Sequential([
    tf.keras.Input(shape=(5,)),
    tf.keras.layers.Dense(64, activation="relu"),
    tf.keras.layers.Dense(128, activation="relu"),
    tf.keras.layers.Dense(64, activation="relu"),
    tf.keras.layers.Dense(1, activation="sigmoid"),
])

new_optimizer = Adam(learning_rate=0.01)

model.compile(
    optimizer=new_optimizer,
    loss="binary_crossentropy",
    metrics=["accuracy"]
)

model.fit(X_meta_train, meta_train_labels, epochs=100)

Epoch 1/100
21/21 ━━━━━━━━━━━━━━━━━━━━ 1s 33ms/step - accuracy: 0.6822 - loss: 0.6389
Epoch 2/100
21/21 ━━━━━━━━━━━━━━━━━━━━ 1s 31ms/step - accuracy: 0.8390 - loss: 0.4444
Epoch 3/100
21/21 ━━━━━━━━━━━━━━━━━━━━ 1s 30ms/step - accuracy: 0.8105 - loss: 0.4501
Epoch 4/100
21/21 ━━━━━━━━━━━━━━━━━━━━ 1s 30ms/step - accuracy: 0.8508 - loss: 0.4035
Epoch 5/100
21/21 ━━━━━━━━━━━━━━━━━━━━ 1s 33ms/step - accuracy: 0.8481 - loss: 0.3935
Epoch 6/100
21/21 ━━━━━━━━━━━━━━━━━━━━ 1s 30ms/step - accuracy: 0.8588 - loss: 0.3824
Epoch 7/100
21/21 ━━━━━━━━━━━━━━━━━━━━ 1s 30ms/step - accuracy: 0.8541 - loss: 0.3969
Epoch 8/100
21/21 ━━━━━━━━━━━━━━━━━━━━ 1s 30ms/step - accuracy: 0.8355 - loss: 0.4285
Epoch 9/100
21/21 ━━━━━━━━━

<keras.src.callbacks.history.History at 0x365ce7290>

In [523]:
test_preds_proba = model.predict(X_meta_test).flatten()

# Convert probabilities to binary predictions using 0.5 threshold
test_preds = (test_preds_proba >= 0.5).astype(int)

# X_test_id holds the PassengerId column popped from the test set earlier
submission_df = pd.DataFrame({
    'PassengerId': X_test_id,
    'Survived': test_preds
})

# Save to CSV without index column
submission_df.to_csv('submission.csv', index=False)


14/14 ━━━━━━━━━━━━━━━━━━━━ 0s 30ms/step
