# Deep Learning Challenge (optional)

![](../../public/titanic_intro.png)

In the early 20th century, the RMS Titanic was the pinnacle of luxury and innovation, a marvel of modern engineering. It was hailed as the "unsinkable" ship, carrying over 2,200 passengers and crew on its maiden voyage across the Atlantic. However, in the icy waters of the North Atlantic, disaster struck, and the unthinkable happened—the Titanic collided with an iceberg and sank, leading to one of the most tragic maritime disasters in history.

Now, over a century later, you are tasked with an important mission: to delve into the historical data and build a predictive model that could have foretold the fate of the passengers aboard the Titanic. This dataset contains detailed records of the passengers, including information such as age, gender, ticket class, family size, and more. **Your goal is to develop a neural network model that accurately predicts whether a passenger would have survived or perished on that fateful night.**

Your predictive model won't just be a technical achievement; it will serve as a lens through which we can better understand the human factors and decisions that played a critical role in survival. As you work through this challenge, you’ll follow the standard deep learning workflow, applying your skills to each stage:

- Data Collection: The data you need has already been gathered from historical records.
- Data Preprocessing: Clean and prepare the data for analysis (partially done for you).
- Exploratory Data Analysis (EDA): Investigate the data and uncover key patterns (partially done for you).
- Feature Engineering: Create or modify features to enhance your model’s performance (partially done for you).
- Model Architecture Design: Choose an appropriate structure for your neural network model.
- Training: Train your model using the provided dataset.
- Evaluation: Assess your model's accuracy using a validation set and other techniques.
- Hyperparameter Tuning: Fine-tune the model’s parameters to improve performance.
- Model Testing: Test your final model on a separate test set.



Please include ALL your work and thought process in this notebook.

In [None]:
import pandas as pd
import matplotlib.pyplot as plt
%matplotlib inline
import seaborn as sns

import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset

import tensorflow as tf

from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# feel free to add any additional packages as you see fit


In [None]:
# sibsp        # number of siblings / spouses aboard the Titanic
# parch        # number of parents / children aboard the Titanic
# ticket       # ticket number (not included in the seaborn version loaded below)
# fare         # passenger fare
# cabin        # cabin number (not included in the seaborn version loaded below; see `deck`)
# embark_town  # port of embarkation

# sibsp: the dataset defines family relations this way:
#   Sibling = brother, sister, stepbrother, stepsister
#   Spouse = husband, wife (mistresses and fiancés were ignored)

# parch: the dataset defines family relations this way:
#   Parent = mother, father
#   Child = daughter, son, stepdaughter, stepson
#   Some children travelled only with a nanny, therefore parch=0 for them.

# load titanic dataset (DO NOT MODIFY)
df = sns.load_dataset("titanic")

## Exploratory Data Analysis
The starter code below will help you familiarize yourself with the Titanic dataset. Further analysis is not required, but it is encouraged, since what you learn can inform the design of your model.

In [None]:
df.head()

####################################
# Uncomment to See Result
####################################

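# Jupyter only displays the last expression in a cell, so wrap these in print()
# print(df.shape)
# print(df.isna().sum())
# print(df.describe())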

##---> Countplot for Survived
#plt.figure(figsize=(8, 6))
#sns.countplot(x='survived', data=df, palette='Set2')
#plt.title('Survival Count')
#plt.xlabel('Survived (0 = No, 1 = Yes)')
#plt.ylabel('Count')
#plt.show()

##---> Countplot for Pclass
#plt.figure(figsize=(8, 6))
#sns.countplot(x='pclass', data=df, palette='Set3')
#plt.title('Passenger Class Distribution')
#plt.xlabel('Passenger Class')
#plt.ylabel('Count')
#plt.show()

##---> Distribution of Age
#plt.figure(figsize=(10, 6))
#sns.histplot(df['age'].dropna(), kde=True, bins=30, color='blue')
#plt.title('Age Distribution of Passengers')
#plt.xlabel('Age')
#plt.ylabel('Frequency')
#plt.show()

##---> Survival by Sex
#plt.figure(figsize=(8, 6))
#sns.countplot(x='sex', hue='survived', data=df, palette='Set1')
#plt.title('Survival by Sex')
#plt.xlabel('Sex')
#plt.ylabel('Count')
#plt.show()

##---> Survival by Passenger Class
#plt.figure(figsize=(8, 6))
#sns.countplot(x='pclass', hue='survived', data=df, palette='Set2')
#plt.title('Survival by Passenger Class')
#plt.xlabel('Passenger Class')
#plt.ylabel('Count')
#plt.show()

##---> Survival by Embark Town
#plt.figure(figsize=(8, 6))
#sns.countplot(x='embark_town', hue='survived', data=df, palette='Set1')
#plt.title('Survival by Embarkation Town')
#plt.xlabel('Embarkation Town')
#plt.ylabel('Count')
#plt.show()
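
The count plots above show raw counts; a survival *rate* per group is often easier to compare across groups of different sizes. A minimal sketch, assuming the raw `df` loaded above; the helper name `survival_rate_by` is hypothetical:

```python
import pandas as pd


def survival_rate_by(data: pd.DataFrame, column: str) -> pd.DataFrame:
    """Return the passenger count and mean survival rate for each value of `column`."""
    # `survived` is 0/1, so its mean within a group is that group's survival rate.
    # observed=True keeps only categories that actually occur (relevant for `class`).
    return (
        data.groupby(column, observed=True)["survived"]
        .agg(count="count", survival_rate="mean")
        .sort_values("survival_rate", ascending=False)
    )


# Usage on a tiny hand-made frame; in the notebook, call e.g. survival_rate_by(df, "sex")
toy = pd.DataFrame({"sex": ["male", "female", "female", "male"], "survived": [0, 1, 1, 1]})
print(survival_rate_by(toy, "sex"))
```

The same call works for `"class"`, `"who"`, or `"embark_town"` on the raw data.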

## Data Preprocessing

In [None]:
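# Drop columns that duplicate other columns or contain many NaN values:
#   pclass duplicates class, embarked duplicates embark_town, adult_male duplicates who,
#   alone is derivable from sibsp/parch, alive restates the target survived
#   (keeping it would leak the label), and deck and age have many missing values
df = df.drop(["pclass", "alive", "embarked", "alone", "adult_male", "deck", "age"], axis=1)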
df = df.dropna(subset=["embark_town"])

# Further data preprocessing (optional)


## Feature Engineering

In [None]:
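# Encode sex as 0/1, then one-hot encode the remaining categorical variables.
# dtype=float keeps the dummy columns numeric (pandas >= 2.0 otherwise returns bool).
df["sex"] = df["sex"].map({"male": 0, "female": 1})
for label in ["class", "who", "embark_town"]:
    df = df.join(pd.get_dummies(df[label], prefix=label, dtype=float))
    df = df.drop(label, axis=1)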

# Further feature engineering (optional)
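
The introduction lists family size as a passenger attribute, but the dataset only stores `sibsp` and `parch`. One possible engineered feature, sketched under the assumption that family size is `sibsp + parch + 1` (counting the passenger). The sketch also log-transforms `fare`, whose distribution has a long right tail. The helper `add_family_features` and the columns `family_size` and `log_fare` are hypothetical names; whether they help is something to check on the validation set:

```python
import numpy as np
import pandas as pd


def add_family_features(data: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of `data` with hypothetical engineered columns added."""
    out = data.copy()
    # Relatives aboard plus the passenger themself
    out["family_size"] = out["sibsp"] + out["parch"] + 1
    # log1p compresses large fares and is still defined when fare == 0
    out["log_fare"] = np.log1p(out["fare"])
    return out


toy = pd.DataFrame({"sibsp": [1, 0, 3], "parch": [0, 0, 2], "fare": [7.25, 0.0, 31.0]})
print(add_family_features(toy)[["family_size", "log_fare"]])
```

If you keep `log_fare`, consider dropping the original `fare` so the model does not see the same information twice.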

## Model Development

The starter code below loads the data into training, validation, and test splits. You are free to modify any of the provided code as you see fit.

We recommend PyTorch (preferred) or TensorFlow for developing your neural network model. Note that the quickstart guides below train on image data (Fashion-MNIST for PyTorch, MNIST for TensorFlow), so you will need to adapt them to accept tabular input: one row of `X_train.shape[1]` features per passenger.

### PyTorch Resources:
[PyTorch Quickstart](https://pytorch.org/tutorials/beginner/basics/quickstart_tutorial.html)

[torch.nn documentation](https://pytorch.org/docs/stable/nn.html)

[torch.nn.Sequential() docs](https://pytorch.org/docs/stable/generated/torch.nn.Sequential.html)

### TensorFlow Resources:
[TensorFlow Quickstart](https://www.tensorflow.org/tutorials/quickstart/beginner)

[Training \& evaluation with the built-in methods](https://www.tensorflow.org/guide/keras/training_with_built_in_methods)

[Making new layers and models via subclassing](https://www.tensorflow.org/guide/keras/making_new_layers_and_models_via_subclassing)

[tf.keras.Sequential() docs](https://www.tensorflow.org/api_docs/python/tf/keras/Sequential)


In [None]:
X = df.drop(columns=['survived'])
y = df['survived']

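# Split data into train (70%), validation (18%), and test (12%) sets:
# first hold out 30% of the rows, then split that holdout 60/40 into validation/test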
X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.3, random_state=42)
X_val, X_test, y_val, y_test = train_test_split(X_val, y_val, test_size=0.4, random_state=42)
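
`StandardScaler` is imported above but never used. `fare` spans a much wider range than the 0/1 columns, and neural networks usually train more smoothly on standardized inputs. A minimal sketch, assuming the scaler is fit on the training split only so that no statistics leak from the validation or test sets; the helper name `scale_splits` is hypothetical:

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import StandardScaler


def scale_splits(X_train, X_val, X_test, columns):
    """Standardize `columns` using the mean/std estimated on X_train only."""
    scaler = StandardScaler()
    # Work on copies so the caller's DataFrames are not modified in place
    X_train, X_val, X_test = X_train.copy(), X_val.copy(), X_test.copy()
    X_train[columns] = scaler.fit_transform(X_train[columns])
    # Reuse the training statistics; never fit on validation or test data
    X_val[columns] = scaler.transform(X_val[columns])
    X_test[columns] = scaler.transform(X_test[columns])
    return X_train, X_val, X_test, scaler
```

In the notebook this would run right after the split and before the tensor conversion, e.g. `X_train, X_val, X_test, scaler = scale_splits(X_train, X_val, X_test, ["fare", "sibsp", "parch"])`. The one-hot and binary columns are left unscaled here, which is a common but not mandatory choice.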


### PyTorch:

In [None]:
# PyTorch

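# Convert data to PyTorch tensors.
# torch.tensor() does not accept a DataFrame, so convert each split to a float32
# NumPy array first (re-run the split cell before re-running this one).
X_train = torch.tensor(X_train.to_numpy(dtype="float32"))
X_val = torch.tensor(X_val.to_numpy(dtype="float32"))
X_test = torch.tensor(X_test.to_numpy(dtype="float32"))
# unsqueeze(1) gives labels shape (n, 1) to match the model output
y_train = torch.tensor(y_train.to_numpy(dtype="float32")).unsqueeze(1)
y_val = torch.tensor(y_val.to_numpy(dtype="float32")).unsqueeze(1)
y_test = torch.tensor(y_test.to_numpy(dtype="float32")).unsqueeze(1)

train_dataset = TensorDataset(X_train, y_train)
train_loader = DataLoader(train_dataset, batch_size=32, shuffle=True)

class NeuralNetwork(nn.Module):
    def __init__(self):
        super().__init__()
        # nn.Linear takes in_features/out_features (in_channels/out_channels are for conv layers)
        self.fc1 = nn.Linear(in_features=X_train.shape[1], out_features=1)
        # ...

    def forward(self, x):
        # Returns one raw logit per passenger; pair it with nn.BCEWithLogitsLoss
        x = self.fc1(x)
        # ...
        return x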

# Alternative method of defining your model:
# model = nn.Sequential(
#     nn.Linear(X_train.shape[1], 64),
#     ...,
#     )
model = NeuralNetwork()

# TODO
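
One way to approach the TODO is a standard mini-batch training loop. This is a sketch, not the required solution: it assumes the model outputs one raw logit per passenger (as `NeuralNetwork` above does), so it uses `nn.BCEWithLogitsLoss`, which applies the sigmoid internally, together with the Adam optimizer. The helper names `train_one_epoch` and `accuracy`, the layer sizes, learning rate, and epoch count are illustrative choices to tune on the validation set:

```python
import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset


def train_one_epoch(model, loader, loss_fn, optimizer):
    """Run one pass over `loader` and return the mean training loss."""
    model.train()
    total, n = 0.0, 0
    for xb, yb in loader:
        optimizer.zero_grad()
        loss = loss_fn(model(xb), yb)  # model(xb) returns logits of shape (batch, 1)
        loss.backward()
        optimizer.step()
        total += loss.item() * len(xb)
        n += len(xb)
    return total / n


@torch.no_grad()
def accuracy(model, X, y):
    """Fraction of rows where the thresholded sigmoid(logit) matches the 0/1 label."""
    model.eval()
    preds = (torch.sigmoid(model(X)) > 0.5).float()
    return (preds == y).float().mean().item()


# Usage sketch on random stand-in data; in the notebook, use train_loader,
# X_val, y_val, and your own model instead.
torch.manual_seed(0)
X_demo = torch.randn(200, 10)
y_demo = (X_demo[:, 0] > 0).float().unsqueeze(1)  # a learnable toy label
demo_loader = DataLoader(TensorDataset(X_demo, y_demo), batch_size=32, shuffle=True)
demo_model = nn.Sequential(nn.Linear(10, 16), nn.ReLU(), nn.Linear(16, 1))
loss_fn = nn.BCEWithLogitsLoss()
optimizer = torch.optim.Adam(demo_model.parameters(), lr=1e-2)
for epoch in range(20):
    train_loss = train_one_epoch(demo_model, demo_loader, loss_fn, optimizer)
print(f"final train loss {train_loss:.3f}, accuracy {accuracy(demo_model, X_demo, y_demo):.3f}")
```

In the notebook, calling `accuracy(model, X_val, y_val)` after each epoch is a simple way to spot overfitting; `X_test` and `y_test` are best evaluated only once, after all tuning is finished.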

### TensorFlow:

In [None]:
# TensorFlow
class NeuralNetwork(tf.keras.Model):
    def __init__(self):
        super(NeuralNetwork, self).__init__()
        self.fc1 = tf.keras.layers.Dense(1)
        # ...
    
    def call(self, x):
        x = self.fc1(x)
        # ...
        return x

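# Alternative method of defining your model:
# (the second positional argument of Dense is `activation`, so declare the input shape with tf.keras.Input)
# model = tf.keras.models.Sequential([
#    tf.keras.Input(shape=(X_train.shape[1],)),
#    tf.keras.layers.Dense(64, activation="relu"),
#    ...,
#    ])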
model = NeuralNetwork()

# TODO
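
A comparable sketch for TensorFlow, assuming a model whose last layer is `Dense(1)` with no activation (as in `NeuralNetwork` above), so the loss is told to expect logits via `from_logits=True`. NumPy arrays are the simplest input to `fit`; if you ran the PyTorch cell first, `X_train` and the other splits now hold torch tensors, so re-run the split cell and convert with `.to_numpy(dtype="float32")`. The stand-in data, layer sizes, and training settings below are illustrative:

```python
import numpy as np
import tensorflow as tf

# Random stand-in data; replace with your own float32 feature/label arrays
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(200, 10)).astype("float32")
y_demo = (X_demo[:, 0] > 0).astype("float32")

demo_model = tf.keras.Sequential([
    tf.keras.Input(shape=(X_demo.shape[1],)),
    tf.keras.layers.Dense(16, activation="relu"),
    tf.keras.layers.Dense(1),  # one raw logit per passenger
])
demo_model.compile(
    optimizer=tf.keras.optimizers.Adam(learning_rate=1e-2),
    loss=tf.keras.losses.BinaryCrossentropy(from_logits=True),
    # threshold=0.0 because the model outputs logits, not probabilities
    metrics=[tf.keras.metrics.BinaryAccuracy(threshold=0.0)],
)
history = demo_model.fit(
    X_demo, y_demo,
    validation_split=0.2,  # in the notebook, pass validation_data=(X_val, y_val) instead
    epochs=20, batch_size=32, verbose=0,
)
print(history.history["val_binary_accuracy"][-1])
```

`history.history` records the loss and accuracy per epoch, which makes it easy to plot training against validation curves when tuning hyperparameters.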