# Deep Learning Challenge (optional)

![](./titanic_intro.png)

In the early 20th century, the RMS Titanic was the pinnacle of luxury and innovation, a marvel of modern engineering. It was hailed as the "unsinkable" ship, carrying over 2,200 passengers and crew on its maiden voyage across the Atlantic. However, in the icy waters of the North Atlantic, the unthinkable happened: the Titanic collided with an iceberg and sank, in one of the most tragic maritime disasters in history.

Now, over a century later, you are tasked with an important mission: to delve into the historical data and build a predictive model that could have foretold the fate of the passengers aboard the Titanic. This dataset contains detailed records of the passengers, including information such as age, gender, ticket class, family size, and more. **Your goal is to develop a neural network model that accurately predicts whether a passenger would have survived or perished on that fateful night.**

Your predictive model won't just be a technical achievement; it will serve as a lens through which we can better understand the human factors and decisions that played a critical role in survival. As you work through this challenge, you’ll follow the standard deep learning workflow, applying your skills to each stage:

- Data Collection: The data you need has already been gathered from historical records.
- Data Preprocessing: Clean and prepare the data for analysis (partially done for you).
- Exploratory Data Analysis (EDA): Investigate the data and uncover key patterns (partially done for you).
- Feature Engineering: Create or modify features to enhance your model’s performance (partially done for you).
- Model Architecture Design: Choose an appropriate structure for your neural network model.
- Training: Train your model using the provided dataset.
- Evaluation: Assess your model's accuracy using a validation set and other techniques.
- Hyperparameter Tuning: Fine-tune the model’s parameters to improve performance.
- Model Testing: Test your final model on a separate test set.



Please include ALL your work and thought process in this notebook. We recommend using PyTorch (preferred) or TensorFlow to develop your neural network model.

In [None]:
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
# any other packages you would like to use
# (scikit-learn, pytorch, tensorflow, etc.)

%matplotlib inline



In [None]:
# sibsp        # number of siblings / spouses aboard the Titanic
# parch        # number of parents / children aboard the Titanic
# ticket       # Ticket number (not included in the seaborn version of the dataset)
# fare         # Passenger fare
# cabin        # Cabin number (not included in the seaborn version; see the deck column)
# embark_town  # Port of Embarkation

# sibsp: The dataset defines family relations in this way:
# Sibling = brother, sister, stepbrother, stepsister
# Spouse = husband, wife (mistresses and fiancés were ignored)

# parch: The dataset defines family relations in this way:
# Parent = mother, father
# Child = daughter, son, stepdaughter, stepson
# Some children travelled only with a nanny, therefore parch=0 for them.

# load titanic dataset
df = sns.load_dataset("titanic")

### Exploratory Data Analysis
Provided below is some starter code to help familiarize yourself with the Titanic dataset. Further data analysis is not required, but is encouraged to obtain information for the development of your model.

In [None]:
df.head()

In [None]:
# df.shape
# df.isna().sum()
# df.describe()

# Countplot for Survived
#plt.figure(figsize=(8, 6))
#sns.countplot(x='survived', data=df, palette='Set2')
#plt.title('Survival Count')
#plt.xlabel('Survived (0 = No, 1 = Yes)')
#plt.ylabel('Count')
#plt.show()

# Countplot for Pclass
#plt.figure(figsize=(8, 6))
#sns.countplot(x='pclass', data=df, palette='Set3')
#plt.title('Passenger Class Distribution')
#plt.xlabel('Passenger Class')
#plt.ylabel('Count')
#plt.show()

# Distribution of Age
#plt.figure(figsize=(10, 6))
#sns.histplot(df['age'].dropna(), kde=True, bins=30, color='blue')
#plt.title('Age Distribution of Passengers')
#plt.xlabel('Age')
#plt.ylabel('Frequency')
#plt.show()

# Survival by Sex
#plt.figure(figsize=(8, 6))
#sns.countplot(x='sex', hue='survived', data=df, palette='Set1')
#plt.title('Survival by Sex')
#plt.xlabel('Sex')
#plt.ylabel('Count')
#plt.show()

# Survival by Passenger Class
#plt.figure(figsize=(8, 6))
#sns.countplot(x='pclass', hue='survived', data=df, palette='Set2')
#plt.title('Survival by Passenger Class')
#plt.xlabel('Passenger Class')
#plt.ylabel('Count')
#plt.show()

# Survival by Embark Town
#plt.figure(figsize=(8, 6))
#sns.countplot(x='embark_town', hue='survived', data=df, palette='Set1')
#plt.title('Survival by Embarkation Town')
#plt.xlabel('Embarkation Town')
#plt.ylabel('Count')
#plt.show()
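
The count plots above show raw counts per group. As a complement, survival rates can be tabulated directly, since the mean of a 0/1 column is the fraction of ones. The sketch below is one possible way to do this; the helper name `survival_rate` and the small `toy` frame are hypothetical and use made-up values, not real passenger records.

```python
import pandas as pd

# Hand-made frame with the same column names as the seaborn Titanic dataset.
# The values are illustrative only, not real passenger records.
toy = pd.DataFrame({
    "sex": ["male", "female", "female", "male", "female", "male"],
    "class": ["Third", "First", "Third", "First", "Second", "Third"],
    "survived": [0, 1, 1, 1, 0, 0],
})

def survival_rate(frame, by):
    """Mean of the 0/1 `survived` column per group, i.e. the survival rate."""
    # observed=True only matters for categorical columns such as `class`
    return frame.groupby(by, observed=True)["survived"].mean()

rate_by_sex = survival_rate(toy, "sex")
print(rate_by_sex)
```

On the real data, the same helper could be called as `survival_rate(df, "sex")` or `survival_rate(df, "class")`, provided it runs before the encoding step replaces those columns.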

### Data Preprocessing

In [None]:
# drop columns that are redundant or contain many NaN values
df = df.drop(["pclass", "alive", "embarked", "alone", "adult_male", "deck", "age"], axis = 1)
df = df.dropna(subset=["embark_town"])

# TODO: Further data preprocessing (optional)
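
One optional preprocessing step that often helps neural networks is putting numeric inputs such as `fare` on a comparable scale. The sketch below standardizes selected columns to z-scores. The helper names `fit_scaler` and `apply_scaler` are hypothetical, and the `toy` values are made up for illustration. It assumes the statistics are computed on the training rows only, so that no information from the validation or test rows leaks into training.

```python
import pandas as pd

def fit_scaler(train, columns):
    """Compute per-column mean and standard deviation from the training rows only."""
    return train[columns].mean(), train[columns].std(ddof=0)

def apply_scaler(frame, columns, mean, std):
    """Return a copy of `frame` with `columns` shifted and scaled to z-scores."""
    scaled = frame.copy()
    scaled[columns] = (frame[columns] - mean) / std
    return scaled

# Illustrative values only, not real passenger records.
toy = pd.DataFrame({"fare": [7.25, 71.28, 8.05, 53.10], "sibsp": [1, 1, 0, 1]})
numeric_cols = ["fare"]

mean, std = fit_scaler(toy, numeric_cols)
toy_scaled = apply_scaler(toy, numeric_cols, mean, std)
print(toy_scaled)
```

The same `mean` and `std` would then be reused when scaling the validation and test rows.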


### Feature Engineering

In [None]:
# Encode sex as 0/1, then one-hot encode the remaining categorical variables
df["sex"] = df["sex"].map({"male": 0, "female": 1})
for label in ["class", "who", "embark_town"]:
    df = df.join(pd.get_dummies(df[label], prefix=label))
    df = df.drop(label, axis=1)

# TODO: Further feature engineering (optional)
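
As one example of further feature engineering, `sibsp` and `parch` can be combined into a single family-size feature. This is a minimal sketch, not a required step: the helper `add_family_size` and the column name `family_size` are hypothetical, and the `toy` values are made up for illustration.

```python
import pandas as pd

def add_family_size(frame):
    """Return a copy with `family_size` = siblings/spouses + parents/children + the passenger."""
    out = frame.copy()
    out["family_size"] = out["sibsp"] + out["parch"] + 1
    return out

# Illustrative values only, not real passenger records.
toy = pd.DataFrame({"sibsp": [1, 0, 3], "parch": [0, 0, 2]})
toy = add_family_size(toy)
print(toy)
```

On the notebook's frame this would be `df = add_family_size(df)`. Whether the combined feature helps is something to check on the validation set.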

### Model Development

In [None]:
# TODO
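
As a starting point, the sketch below shows one possible shape for a PyTorch training loop for binary classification: a small fully connected network that outputs one logit, trained with `nn.BCEWithLogitsLoss` and evaluated on a held-out validation split. Everything here is an assumption rather than a reference solution. The data is a random stand-in, and the layer sizes, learning rate, epoch count, and split ratio are placeholders to tune.

```python
import torch
from torch import nn

torch.manual_seed(0)

# Stand-in data so the sketch runs on its own. In the notebook, X and y would
# come from the preprocessed DataFrame instead (see the note below).
n_rows, n_features = 200, 13  # placeholder sizes; use X.shape[1] for real data
X = torch.randn(n_rows, n_features)
y = (X[:, 0] + 0.5 * X[:, 1] > 0).float().unsqueeze(1)  # synthetic 0/1 labels

# Random 80/20 train/validation split (a separate test split is held out the same way).
perm = torch.randperm(n_rows)
train_idx, val_idx = perm[:160], perm[160:]

model = nn.Sequential(
    nn.Linear(n_features, 16),
    nn.ReLU(),
    nn.Linear(16, 1),  # single logit; the sigmoid is applied inside the loss
)
loss_fn = nn.BCEWithLogitsLoss()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-2)

losses = []
for epoch in range(100):  # full-batch training, since the data is small
    model.train()
    optimizer.zero_grad()
    loss = loss_fn(model(X[train_idx]), y[train_idx])
    loss.backward()
    optimizer.step()
    losses.append(loss.item())

# Validation accuracy: threshold the predicted probability at 0.5.
model.eval()
with torch.no_grad():
    val_pred = (torch.sigmoid(model(X[val_idx])) > 0.5).float()
    val_accuracy = (val_pred == y[val_idx]).float().mean().item()

print(f"final training loss {losses[-1]:.3f}, validation accuracy {val_accuracy:.2f}")
```

With the real data, the tensors could be built with something like `torch.tensor(df.drop("survived", axis=1).to_numpy(dtype="float32"))` for the features and `torch.tensor(df["survived"].to_numpy(dtype="float32")).unsqueeze(1)` for the labels.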