# Titanic ML/DL Challenge
### Using a deep neural network (DNN) to solve the Titanic challenge
#### A description of the challenge is available in the README.md file
**Link to the challenge: https://www.kaggle.com/competitions/titanic**

## Prepare Data

#### First, we import the dependencies we are going to use

In [210]:
# We need the tensorflow, keras, pandas and numpy dependencies

import numpy as np
import pandas as pd
import tensorflow as tf
from tensorflow import keras

In [211]:
# Version of each dependency
# Uncomment if you want to check versions

# print("numpy version:")
# print(np.__version__)

# print() # Uncomment this one to display properly

# print("pandas version:")
# print(pd.__version__)

# print() # Uncomment this one to display properly

# print("tensorflow version:")
# print(tf.__version__)

# print() # Uncomment this one to display properly

# print("keras version:")
# print(keras.__version__)

#### Read data from the .csv files
**Use pandas to read data from the .csv file**

In [212]:
# Load 'train.csv' into a DataFrame and keep only the 'Survived', 'Pclass', 'Sex', 'Age', 'Parch', 'SibSp' and 'Fare' columns
# See README.md for why these columns are kept
data_train = pd.read_csv("train.csv", usecols=["Survived", "Pclass", "Sex", "Age", "Parch", "SibSp", "Fare"])

# Inspect data_train

# print(data_train.head())
# Uncomment the previous line if you want to check the result
# print() # Uncomment this one to display properly
print("Number of people in data_train before cleaning:")
print(len(data_train))

Number of people in data_train before cleaning:
891


In [213]:
# Clean the data to keep only people with every column filled

data_train = data_train.dropna()

# print(data_train.head())
# Uncomment the previous line if you want to check the result
# print() # Uncomment this one to display properly
print("Number of people in data_train after cleaning:")
print(len(data_train)) # Number of people in data_train after cleaning

Number of people in data_train after cleaning:
714
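
The drop from 891 to 714 rows comes from rows that have at least one empty cell. As a minimal sketch on a made-up toy DataFrame (not the real data), this is one way to see which columns hold the missing values before dropping them:

```python
import numpy as np
import pandas as pd

# Toy stand-in for data_train (values are made up for illustration)
toy = pd.DataFrame({
    "Survived": [0, 1, 1, 0],
    "Age": [22.0, np.nan, 26.0, np.nan],
    "Fare": [7.25, 71.28, 7.92, 8.05],
})

missing_per_column = toy.isna().sum()  # number of NaN values in each column
cleaned = toy.dropna()                 # keep only fully populated rows

print(missing_per_column)
print(len(toy), "->", len(cleaned))
```

The same two calls can be run on `data_train` to check where the missing values are.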


#### Encode the 'Sex' column
The 'male' and 'female' values in the 'Sex' column must be converted to numbers, because a DNN can only take numeric inputs.

**One possibility is to map 'male' to 1 and 'female' to 0 (the reverse mapping would work equally well)**


In [214]:
# Change 'male' to '1' and 'female' to '0'

data_train["Sex"] = data_train["Sex"].map({'male': 1, 'female': 0}).fillna(data_train["Sex"])

# print(data_train.head())
# Uncomment the previous line if you want to check the result
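
To make the encoding step concrete, here is a small sketch on toy values (not the real column). It also shows why a fallback can be useful: with a plain `.map()`, any value that is not a key of the dictionary becomes NaN.

```python
import pandas as pd

sex = pd.Series(["male", "female", "female", "male"])  # toy values
encoded = sex.map({"male": 1, "female": 0})
print(encoded.tolist())  # [1, 0, 0, 1]

# A value absent from the mapping becomes NaN with a plain .map()
unexpected = pd.Series(["male", "unknown"]).map({"male": 1, "female": 0})
print(unexpected.isna().tolist())  # [False, True]
```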

## Create DNN

#### To create the DNN, we use Keras from TensorFlow
**We must import the Sequential model and the Dense layer**

**We also need scikit-learn to normalize the data with MinMaxScaler**

In [215]:
# Import everything we need

from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense
from sklearn.preprocessing import MinMaxScaler

**Then, we need to transform data_train so it can be used by our model**

In [216]:
# Transform data_train

# Extract features and label

X_train = data_train.drop('Survived', axis=1) # Remove Survived column
Y_train = data_train['Survived'] # Extract label

# Convert to NumPy arrays

X_train = X_train.values
Y_train = Y_train.values

# print(X_train) # Uncomment to see the result


In [217]:
# Build DNN

# Get input shape
input_size = X_train.shape[1]

# Normalize data
scaler = MinMaxScaler()
X_train = scaler.fit_transform(X_train)


# Create Model
model = Sequential([
    Dense(8, activation='relu', input_shape=(input_size,)), # first hidden layer (8 units), fed by the 6 input features
    Dense(8, activation='relu'),
    Dense(1, activation='sigmoid'),
])

# Compile Model
model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])

# Model summary
model.summary()

Model: "sequential_10"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 dense_30 (Dense)            (None, 8)                 56        
                                                                 
 dense_31 (Dense)            (None, 8)                 72        
                                                                 
 dense_32 (Dense)            (None, 1)                 9         
                                                                 
Total params: 137
Trainable params: 137
Non-trainable params: 0
_________________________________________________________________
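
The parameter counts in the summary can be checked by hand: a Dense layer has one weight per (input, unit) pair plus one bias per unit. A short sketch of that arithmetic, where `dense_params` is a helper name introduced here for illustration:

```python
def dense_params(n_inputs, n_units):
    """Weights (n_inputs * n_units) plus one bias per unit."""
    return n_inputs * n_units + n_units

# (inputs, units) for the three Dense layers of the model above
layer_sizes = [(6, 8), (8, 8), (8, 1)]
counts = [dense_params(i, u) for i, u in layer_sizes]
print(counts, sum(counts))  # [56, 72, 9] 137
```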


**Train model**

In [218]:
# Training

model.fit(X_train, Y_train, epochs=100, batch_size=32, validation_split=0.2) 



Epoch 1/100
...
Epoch 78

<keras.callbacks.History at 0x7bd65042b3a0>
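
With `validation_split=0.2`, Keras holds out the last 20% of the rows (taken before any shuffling) and trains on the rest. The sketch below only reproduces the resulting sizes with plain arithmetic; it assumes the 714 rows reported after cleaning and is not a call into Keras itself.

```python
import math

n_samples = 714          # rows left in data_train after dropna()
validation_split = 0.2
batch_size = 32

n_train = math.floor(n_samples * (1.0 - validation_split))  # rows used for fitting
n_val = n_samples - n_train                                  # last rows, held out
steps_per_epoch = math.ceil(n_train / batch_size)            # batches per epoch

print(n_train, n_val, steps_per_epoch)  # 571 143 18
```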

#### Then we need to load the test data and clean it

**Import the test data into a DataFrame**

**Keep the 'PassengerId', 'Sex', 'Pclass', 'Age', 'Parch', 'Fare' and 'SibSp' columns**

In [219]:
data_test_with_passengerid = pd.read_csv("test.csv", usecols=["PassengerId", "Sex", "Pclass", "Age", "Parch", "Fare", "SibSp"])

# print(data_test_with_passengerid.head()) # Uncomment to see result
# print() # Uncomment this one to display properly
# print("Number of people:") # Uncomment to see result
# print(len(data_test_with_passengerid)) # Uncomment to see result




In [220]:
# Remove every passenger whose row is incomplete

data_test_with_passengerid = data_test_with_passengerid.dropna()

# print(data_test_with_passengerid.head()) # Uncomment to see result
# print() # Uncomment this one to display properly
# print("Number of people:") # Uncomment to see result
# print(len(data_test_with_passengerid)) # Uncomment to see result
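
Dropping incomplete rows from the test set means those passengers get no prediction, while the Kaggle submission expects one row per passenger in test.csv. One possible alternative, shown here only as a sketch on made-up data and not as the method used in this notebook, is to fill the gaps with medians computed on the training data:

```python
import numpy as np
import pandas as pd

# Toy stand-ins for the train and test frames (made-up values)
train = pd.DataFrame({"Age": [22.0, 38.0, 26.0], "Fare": [7.25, 71.28, 7.92]})
test = pd.DataFrame({
    "PassengerId": [892, 893, 894],
    "Age": [34.5, np.nan, 62.0],
    "Fare": [7.83, 7.0, np.nan],
})

# Fill gaps with medians computed on the training data only
medians = train[["Age", "Fare"]].median()
test_filled = test.fillna(medians)

print(test_filled)
```

Every test row is kept, so every `PassengerId` can receive a prediction.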




In [221]:
# Change 'male' to '1' and 'female' to '0'

data_test_with_passengerid["Sex"] = data_test_with_passengerid["Sex"].map({'male': 1, 'female': 0}).fillna(data_test_with_passengerid["Sex"])

# print(data_test_with_passengerid.head())
# Uncomment the previous line if you want to check the result

**Extract all columns except 'PassengerId'**

In [222]:
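# Extract the feature columns in the same order as the X_train columns
# (read_csv keeps the file's column order, so X_train is Pclass, Sex, Age, SibSp, Parch, Fare)

data_test = data_test_with_passengerid[["Pclass", "Sex", "Age", "SibSp", "Parch", "Fare"]]

# print(data_test.head()) # Uncomment to see the result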
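
Because the model only sees plain arrays, the feature columns of the test data must be in the same order as those used for training. The sketch below, on a tiny in-memory CSV invented for illustration, shows that `pd.read_csv` ignores the order given in `usecols` and returns columns in file order, while an explicit column list fixes the order:

```python
import io
import pandas as pd

csv_text = "PassengerId,Pclass,Sex,Age\n1,3,male,22\n2,1,female,38\n"

# The order given in usecols is ignored: columns come back in file order
df = pd.read_csv(io.StringIO(csv_text), usecols=["Sex", "Age", "Pclass"])
print(list(df.columns))  # ['Pclass', 'Sex', 'Age']

# Selecting with an explicit list fixes the order
ordered = df[["Sex", "Age", "Pclass"]]
print(list(ordered.columns))  # ['Sex', 'Age', 'Pclass']
```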

**Transform data_test into a NumPy array**

In [223]:
# Transform

X_test = data_test.values

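# Normalize data with the scaler already fitted on X_train
# (fitting a new scaler on the test set would scale it differently from the training data)
X_test = scaler.transform(X_test)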
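
Min-max scaling maps each column to the range [0, 1] using that column's minimum and maximum. For the test features to be on the same scale as the features the model was trained on, the minimum and maximum learned from the training data are the ones to apply. The sketch below spells out that formula with NumPy on toy arrays; with scikit-learn this corresponds to calling `fit_transform` on the training data and `transform` on the test data.

```python
import numpy as np

train = np.array([[0.0, 10.0], [5.0, 20.0], [10.0, 30.0]])  # toy training features
test = np.array([[5.0, 40.0]])                               # toy test features

col_min = train.min(axis=0)  # per-column minimum of the training data
col_max = train.max(axis=0)  # per-column maximum of the training data

def min_max(x):
    """Scale x with the training set's per-column min and max."""
    return (x - col_min) / (col_max - col_min)

print(min_max(train))
print(min_max(test))  # [[0.5 1.5]]
```

A test value above the training maximum ends up above 1, which is expected with this approach.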



## Use of the DNN
#### Prediction

**We're going to use our model to predict the label for each row of data_test**

In [224]:
# Prediction

predictions = model.predict(X_test)

# print(predictions) # Uncomment to see the result before classification

predictions = (predictions >= 0.5).astype(int).ravel() # 1-D array of 0/1 labels

# print(predictions) # Uncomment to see the result after classification
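
The sigmoid output is a probability between 0 and 1, returned as an array with one column. A small sketch of the thresholding step on toy values, which also flattens the result to one dimension so it can be used directly as a DataFrame column:

```python
import numpy as np

probabilities = np.array([[0.12], [0.50], [0.87]])   # toy sigmoid outputs, shape (3, 1)
labels = (probabilities >= 0.5).astype(int).ravel()  # 0/1 labels, shape (3,)
print(labels)  # [0 1 1]
```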




#### Convert data to CSV
**Now, we add the 'predictions' array as a column of the data_test_with_passengerid DataFrame**

In [225]:
# Add 'predictions' as the 'Survived' column of data_test_with_passengerid

data_test_with_passengerid['Survived'] = predictions

# print(data_test_with_passengerid.head(50)) # Uncomment to see the result

**Extract 'PassengerId' and 'Survived' columns**

In [226]:
# Extract

results = data_test_with_passengerid[['PassengerId', 'Survived']]

# print(results.head(50)) # Uncomment to see the result

**Convert 'results' to CSV**

In [227]:
# Convert to 'results.csv' file

results.to_csv("results.csv", index=False)
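
To preview what the file will look like, the same call can write to an in-memory buffer. This is a sketch on two made-up rows; `toy_results` is a name introduced here for illustration.

```python
import io
import pandas as pd

toy_results = pd.DataFrame({"PassengerId": [892, 893], "Survived": [0, 1]})

buffer = io.StringIO()
toy_results.to_csv(buffer, index=False)  # index=False: no extra index column
print(buffer.getvalue())
```

The output starts with the `PassengerId,Survived` header line, followed by one line per passenger.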


#### End of the challenge: the results are in the 'results.csv' file in the current directory