# Titanic Survival Prediction

**Name:** Elen  
**Date:** March 18, 2025

## Introduction
In this project, we will build machine learning models to predict whether a passenger on the Titanic survived. Using the Titanic dataset from Seaborn, we will train three classification models: a Decision Tree Classifier, a Support Vector Machine (SVM), and a Neural Network (NN). We will then evaluate their performance and interpret the results. Each model uses a set of input features to predict the target variable, `survived`.

The steps are data cleaning, feature engineering, model training, performance evaluation, and model comparison. We will try different feature combinations to see how they affect each model's accuracy.

## Importing Libraries

In this section, we will import the necessary Python libraries to perform data manipulation, model training, and evaluation. These libraries will help us load the Titanic dataset, handle missing values, perform machine learning tasks, and visualize results.


In [2]:
# Import necessary libraries
import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
from sklearn.model_selection import StratifiedShuffleSplit
from sklearn.tree import DecisionTreeClassifier, plot_tree
from sklearn.svm import SVC
from sklearn.neural_network import MLPClassifier
from sklearn.metrics import classification_report, confusion_matrix

## Section 1: Import and Inspect the Data

In this section, we will load the Titanic dataset using the `seaborn` library, which provides easy access to the dataset. We'll perform a quick inspection of the data to understand its structure, including the number of rows and columns, data types, and any missing values.

### Load Titanic Dataset

We will call `sns.load_dataset('titanic')`, which returns a pandas DataFrame with one row per passenger. The columns include features such as age, sex, and class, along with the `survived` target.

In [3]:
# Load Titanic dataset
titanic = sns.load_dataset('titanic')

# Display the first few rows of the dataset
titanic.head()

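|   | survived | pclass | sex | age | sibsp | parch | fare | embarked | class | who | adult_male | deck | embark_town | alive | alone |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0 | 3 | male | 22.0 | 1 | 0 | 7.25 | S | Third | man | True | NaN | Southampton | no | False |
| 1 | 1 | 1 | female | 38.0 | 1 | 0 | 71.2833 | C | First | woman | False | C | Cherbourg | yes | False |
| 2 | 1 | 3 | female | 26.0 | 0 | 0 | 7.925 | S | Third | woman | False | NaN | Southampton | yes | True |
| 3 | 1 | 1 | female | 35.0 | 1 | 0 | 53.1 | S | First | woman | False | C | Southampton | yes | False |
| 4 | 0 | 3 | male | 35.0 | 0 | 0 | 8.05 | S | Third | man | True | NaN | Southampton | no | True |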
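
The inspection described at the start of this section (row and column counts, data types, missing values) needs a few calls beyond `head()`. A minimal sketch of those checks follows. It uses a small hand-built DataFrame named `sample`, a hypothetical stand-in with four of the same columns, so that it runs without downloading anything. The same three calls should apply to `titanic` unchanged.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the Titanic data: four rows, four of its columns.
sample = pd.DataFrame({
    "survived": [0, 1, 1, 0],
    "sex": ["male", "female", "female", "male"],
    "age": [22.0, 38.0, np.nan, 35.0],
    "deck": [np.nan, "C", np.nan, np.nan],
})

n_rows, n_cols = sample.shape            # (rows, columns)
column_types = sample.dtypes             # one dtype per column
missing_counts = sample.isnull().sum()   # NaN count per column

print(f"{n_rows} rows x {n_cols} columns")
print(column_types)
print(missing_counts)
```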


## Section 2: Data Exploration and Preparation

### 2.1 Handle Missing Values

In this step, we will handle missing values in the dataset. We impute the `age` column with its median and the `embark_town` column with its mode (the most frequent value). The `embarked` and `deck` columns are not imputed here, so their missing counts are unchanged in the output below.

In [7]:
# Check for missing values before imputation
print("Missing values before imputation:")
print(titanic.isnull().sum())

# Fill missing values
titanic['age'] = titanic['age'].fillna(titanic['age'].median())
titanic['embark_town'] = titanic['embark_town'].fillna(titanic['embark_town'].mode()[0])

# Check for missing values after imputation
print("\nMissing values after imputation:")
print(titanic.isnull().sum())

Missing values before imputation:
survived         0
pclass           0
sex              0
age              0
sibsp            0
parch            0
fare             0
embarked         2
class            0
who              0
adult_male       0
deck           688
embark_town      0
alive            0
alone            0
dtype: int64

Missing values after imputation:
survived         0
pclass           0
sex              0
age              0
sibsp            0
parch            0
fare             0
embarked         2
class            0
who              0
adult_male       0
deck           688
embark_town      0
alive            0
alone            0
dtype: int64
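
In the listing above, the "before" counts already show zero missing values for `age` and `embark_town`. That is what a re-run of the cell on an already imputed DataFrame would print, although this reading is an assumption and not something the notebook states. The sketch below applies the same median and mode imputation to a small hypothetical DataFrame (`toy`), working on a copy so that the before and after counts stay distinct. It also fills `embarked` with its mode, a step the cell above does not perform.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data with gaps in the same three columns.
toy = pd.DataFrame({
    "age": [22.0, np.nan, 26.0, 35.0, np.nan],
    "embarked": ["S", "C", np.nan, "S", "S"],
    "embark_town": ["Southampton", "Cherbourg", np.nan,
                    "Southampton", "Southampton"],
})

before = toy.isnull().sum()

# Work on a copy so that re-running always starts from the raw data.
filled = toy.copy()
filled["age"] = filled["age"].fillna(filled["age"].median())
for col in ["embarked", "embark_town"]:
    # mode() returns a Series; its first entry is the most frequent value.
    filled[col] = filled[col].fillna(filled[col].mode()[0])

after = filled.isnull().sum()
print("Missing before:", before.to_dict())
print("Missing after: ", after.to_dict())
```

Because `toy` itself is never modified, running the block twice gives the same before and after counts.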


### 2.2 Feature Engineering

In this step, we will create new features and convert categorical features into numerical representations. The specific transformations we will apply are:

1. **Create a `family_size` feature**: 
   - This new feature will be the sum of the `sibsp` (siblings/spouses aboard) and `parch` (parents/children aboard), plus 1 to account for the individual themselves.

2. **Convert categorical features into numerical values**:
   - `sex`: Map the values 'male' to 0 and 'female' to 1.
   - `embarked`: Map the values 'C', 'Q', and 'S' to 0, 1, and 2, respectively.
   - `alone`: Convert the boolean values `True` and `False` into 1 and 0, respectively.

Here is the Python code to perform these transformations:

In [9]:
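# Create new features and transform the categorical ones.
# Note: Series.map returns NaN for any value that is not a key in the mapping,
# so run this cell once on a freshly loaded DataFrame. Re-running it on
# already-mapped columns turns `sex` and `embarked` into NaN.
titanic['family_size'] = titanic['sibsp'] + titanic['parch'] + 1  # +1 counts the passenger
titanic['sex'] = titanic['sex'].map({'male': 0, 'female': 1})
titanic['embarked'] = titanic['embarked'].map({'C': 0, 'Q': 1, 'S': 2})
titanic['alone'] = titanic['alone'].astype(int)  # True -> 1, False -> 0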

# Display the first few rows to check the changes
print(titanic[['family_size', 'sex', 'embarked', 'alone']].head())

   family_size  sex  embarked  alone
0            2  NaN       NaN      0
1            2  NaN       NaN      0
2            1  NaN       NaN      1
3            2  NaN       NaN      0
4            1  NaN       NaN      1
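
`Series.map` with a dictionary returns NaN for every value that is not a key in that dictionary. The all-NaN `sex` and `embarked` columns in the output above are consistent with the cell having been run a second time, after the strings had already been replaced by integers. That explanation is an assumption about how the notebook was executed. One way to make the step safe to re-run, sketched here on a hypothetical three-row DataFrame (`raw`) with a hypothetical helper (`encode`), is to build the encoded frame from an untouched copy each time:

```python
import pandas as pd

# Hypothetical raw rows using the same column names as the Titanic data.
raw = pd.DataFrame({
    "sibsp": [1, 1, 0],
    "parch": [0, 0, 0],
    "sex": ["male", "female", "female"],
    "embarked": ["S", "C", "S"],
    "alone": [False, False, True],
})

def encode(df):
    """Return an encoded copy; the input frame is left unchanged."""
    out = df.copy()
    out["family_size"] = out["sibsp"] + out["parch"] + 1
    out["sex"] = out["sex"].map({"male": 0, "female": 1})
    out["embarked"] = out["embarked"].map({"C": 0, "Q": 1, "S": 2})
    out["alone"] = out["alone"].astype(int)
    return out

encoded = encode(raw)
encoded_again = encode(raw)   # safe: always starts from the raw strings
remapped = encode(encoded)    # mapping already-encoded values gives NaN

print(encoded[["family_size", "sex", "embarked", "alone"]])
print(remapped[["sex", "embarked"]])
```

Keeping the raw frame intact costs one extra copy in memory, which is negligible for a dataset of this size.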
