# Multilayer Perceptron 

In [1]:
import pandas as pd
import numpy as np
from sklearn import preprocessing

In [2]:
import tensorflow as tf
from tensorflow.keras import layers

print(tf.version.VERSION)
print(tf.keras.__version__)

1.14.0
2.2.4-tf


# Objective:

Implement an MLP neural network to predict customer churn on a telecom dataset.

#### Content

Each row represents a customer; each column contains one of the customer's attributes, as described in the column metadata below.

The data set includes information about:

* Customers who left within the last month – the column is called Churn
* Services that each customer has signed up for – phone, multiple lines, internet, online security, online backup, device protection, tech support, and streaming TV and movies
* Customer account information – how long they’ve been a customer, contract, payment method, paperless billing, monthly charges, and total charges
* Demographic info about customers – gender, age range, and if they have partners and dependents


### Columns 
* **customerID**: Customer ID
* **gender**: Whether the customer is a male or a female
* **SeniorCitizen**: Whether the customer is a senior citizen or not (1, 0)
* **Partner**: Whether the customer has a partner or not (Yes, No)
* **Dependents**: Whether the customer has dependents or not (Yes, No)
* **tenure**: Number of months the customer has stayed with the company
* **PhoneService**: Whether the customer has a phone service or not (Yes, No)
* **MultipleLines**: Whether the customer has multiple lines or not (Yes, No, No phone service)
* **InternetService**: Customer’s internet service provider (DSL, Fiber optic, No)
* **OnlineSecurity**: Whether the customer has online security or not (Yes, No, No internet service)
* **OnlineBackup**: Whether the customer has online backup or not (Yes, No, No internet service)
* **DeviceProtection**: Whether the customer has device protection or not (Yes, No, No internet service)
* **TechSupport**: Whether the customer has tech support or not (Yes, No, No internet service)
* **StreamingTV**: Whether the customer has streaming TV or not (Yes, No, No internet service)
* **StreamingMovies**: Whether the customer has streaming movies or not (Yes, No, No internet service)
* **Contract**: The contract term of the customer (Month-to-month, One year, Two year)
* **PaperlessBilling**: Whether the customer has paperless billing or not (Yes, No)
* **PaymentMethod**: The customer’s payment method (Electronic check, Mailed check, Bank transfer (automatic), Credit card (automatic))
* **MonthlyCharges**: The amount charged to the customer monthly
* **TotalCharges**: The total amount charged to the customer
* **Churn**: Whether the customer churned or not (Yes or No)

In [3]:
# Forward slashes avoid the invalid "\c" escape sequence and also work on Windows
df = pd.read_csv("telco-customer-churn/churn_df.csv")

In [6]:
df.head()

Unnamed: 0,customerID,gender,SeniorCitizen,Partner,Dependents,tenure,PhoneService,MultipleLines,InternetService,OnlineSecurity,...,DeviceProtection,TechSupport,StreamingTV,StreamingMovies,Contract,PaperlessBilling,PaymentMethod,MonthlyCharges,TotalCharges,Churn
0,7590-VHVEG,Female,0,Yes,No,1,No,No phone service,DSL,No,...,No,No,No,No,Month-to-month,Yes,Electronic check,29.85,29.85,No
1,5575-GNVDE,Male,0,No,No,34,Yes,No,DSL,Yes,...,Yes,No,No,No,One year,No,Mailed check,56.95,1889.5,No
2,3668-QPYBK,Male,0,No,No,2,Yes,No,DSL,Yes,...,No,No,No,No,Month-to-month,Yes,Mailed check,53.85,108.15,Yes
3,7795-CFOCW,Male,0,No,No,45,No,No phone service,DSL,Yes,...,Yes,Yes,No,No,One year,No,Bank transfer (automatic),42.3,1840.75,No
4,9237-HQITU,Female,0,No,No,2,Yes,No,Fiber optic,No,...,No,No,No,No,Month-to-month,Yes,Electronic check,70.7,151.65,Yes


## Quick exploration of variables 

In [7]:
df.describe()

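|       | SeniorCitizen | tenure    | MonthlyCharges |
|-------|---------------|-----------|----------------|
| count | 7043.0        | 7043.0    | 7043.0         |
| mean  | 0.162147      | 32.371149 | 64.761692      |
| std   | 0.368612      | 24.559481 | 30.090047      |
| min   | 0.0           | 0.0       | 18.25          |
| 25%   | 0.0           | 9.0       | 35.5           |
| 50%   | 0.0           | 29.0      | 70.35          |
| 75%   | 0.0           | 55.0      | 89.85          |
| max   | 1.0           | 72.0      | 118.75         |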


In [8]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 7043 entries, 0 to 7042
Data columns (total 21 columns):
customerID          7043 non-null object
gender              7043 non-null object
SeniorCitizen       7043 non-null int64
Partner             7043 non-null object
Dependents          7043 non-null object
tenure              7043 non-null int64
PhoneService        7043 non-null object
MultipleLines       7043 non-null object
InternetService     7043 non-null object
OnlineSecurity      7043 non-null object
OnlineBackup        7043 non-null object
DeviceProtection    7043 non-null object
TechSupport         7043 non-null object
StreamingTV         7043 non-null object
StreamingMovies     7043 non-null object
Contract            7043 non-null object
PaperlessBilling    7043 non-null object
PaymentMethod       7043 non-null object
MonthlyCharges      7043 non-null float64
TotalCharges        7043 non-null object
Churn               7043 non-null object
dtypes: float64(1), int64(2), obj

# Data Preprocessing 

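To do:

* Encode the text variables so they can be fed to a neural network.
* Drop `customerID`, since a unique identifier carries no predictive information.
* Handle missing data: `TotalCharges` is stored as text (dtype `object`), so entries that are not valid numbers become missing values once the column is converted to numeric.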
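The conversion and encoding steps listed above can be sketched on a few rows of toy data. The values below are made up; the calls (`pd.to_numeric(..., errors='coerce')`, `LabelEncoder`, `MinMaxScaler`) are the same ones the following cells apply to the real dataset.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder, MinMaxScaler

# Hypothetical toy rows mimicking the raw CSV: TotalCharges is text and one entry is blank
toy = pd.DataFrame({
    'Churn': ['No', 'Yes', 'No'],
    'MonthlyCharges': [20.0, 70.0, 120.0],
    'TotalCharges': ['20.0', ' ', '1200.0'],
})

# errors='coerce' turns unparseable strings such as ' ' into NaN, which are then filled with 0
toy['TotalCharges'] = pd.to_numeric(toy['TotalCharges'], errors='coerce').fillna(0.0)

# LabelEncoder sorts the classes, so 'No' -> 0 and 'Yes' -> 1
toy['Churn_encoded'] = LabelEncoder().fit_transform(toy['Churn'])

# MinMaxScaler maps each column's minimum to 0 and its maximum to 1
toy[['TotalCharges', 'MonthlyCharges']] = MinMaxScaler().fit_transform(
    toy[['TotalCharges', 'MonthlyCharges']])
print(toy)
```

Filling the blank entry with 0 before scaling means it becomes the column minimum, which is a reasonable choice for customers who have not yet been billed.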

In [4]:
from sklearn.preprocessing import LabelEncoder, MinMaxScaler

# LabelEncoder maps each binary text column (Yes/No, Female/Male) to 0/1.
# (The unused OneHotEncoder was dropped: its categorical_features argument
# was removed from recent scikit-learn versions.)
encoder = LabelEncoder()
# MinMaxScaler rescales numeric columns to the [0, 1] range
scaler = MinMaxScaler()

In [5]:
# TotalCharges is stored as text; non-numeric entries become NaN and are filled with 0
df['TotalCharges'] = pd.to_numeric(df['TotalCharges'], errors='coerce')
df['TotalCharges'] = df['TotalCharges'].fillna(0.0)
# Rescale both charge columns to [0, 1]
df[['TotalCharges', 'MonthlyCharges']] = scaler.fit_transform(df[['TotalCharges', 'MonthlyCharges']])
# Encode the binary text columns as 0/1 (LabelEncoder sorts labels: No -> 0, Yes -> 1; Female -> 0, Male -> 1)
for col in ['Churn', 'gender', 'Partner', 'Dependents', 'PhoneService', 'PaperlessBilling']:
    df[col + '_encoded'] = encoder.fit_transform(df[col])

In [6]:
def dummy_creator(col_name, dataset):
    """Return one-hot (dummy) columns for `col_name`, named '<col_name>_<value>'."""
    return pd.get_dummies(dataset[col_name], prefix=col_name)



In [7]:
# Append one-hot (dummy) columns for every multi-category feature.
# NOTE: 'TechSupport' is listed twice, so its three dummy columns appear twice in df.
# The duplicates are kept because the models below expect 43 input features;
# removing the second entry would leave 40.
for col in ['TechSupport', 'MultipleLines', 'InternetService', 'OnlineSecurity',
            'OnlineBackup', 'DeviceProtection', 'TechSupport', 'StreamingTV',
            'StreamingMovies', 'Contract', 'PaymentMethod']:
    df = pd.concat([df, dummy_creator(col, df)], axis=1)
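Because `TechSupport` is encoded twice in the cell above, `df` ends up with repeated column names. The sketch below uses toy data to show how such repeats can be detected with pandas' `Index.duplicated()` and then dropped. Removing them from the real notebook would reduce the feature count from 43 to 40, so every `input_dim=43` would have to change to match.

```python
import pandas as pd

def dummy_creator(col_name, dataset):
    """One-hot encode `col_name`, naming the new columns '<col_name>_<value>' (as in the notebook)."""
    return pd.get_dummies(dataset[col_name]).rename(columns=lambda x: col_name + '_' + str(x))

toy = pd.DataFrame({'TechSupport': ['Yes', 'No', 'No internet service']})
for col in ['TechSupport', 'TechSupport']:   # the same column encoded twice
    toy = pd.concat([toy, dummy_creator(col, toy)], axis=1)

# Index.duplicated() flags every repeat of a column name that appeared earlier
dup = toy.columns[toy.columns.duplicated()].tolist()
print(dup)  # ['TechSupport_No', 'TechSupport_No internet service', 'TechSupport_Yes']

# Keeping only the first occurrence of each name removes the repeats
deduped = toy.loc[:, ~toy.columns.duplicated()]
print(deduped.shape)  # (3, 4)
```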

### Dropping the original (unencoded) columns

In [8]:
to_drop= ['customerID','gender','Churn','Partner','Dependents','PhoneService','PaperlessBilling',
          'TechSupport','MultipleLines','InternetService','OnlineSecurity',
          'OnlineBackup','DeviceProtection','TechSupport',
          'StreamingTV','StreamingMovies','Contract','PaymentMethod']

df.drop(to_drop,axis=1).columns

Index(['SeniorCitizen', 'tenure', 'MonthlyCharges', 'TotalCharges',
       'Churn_encoded', 'gender_encoded', 'Partner_encoded',
       'Dependents_encoded', 'PhoneService_encoded',
       'PaperlessBilling_encoded', 'TechSupport_No',
       'TechSupport_No internet service', 'TechSupport_Yes',
       'MultipleLines_No', 'MultipleLines_No phone service',
       'MultipleLines_Yes', 'InternetService_DSL',
       'InternetService_Fiber optic', 'InternetService_No',
       'OnlineSecurity_No', 'OnlineSecurity_No internet service',
       'OnlineSecurity_Yes', 'OnlineBackup_No',
       'OnlineBackup_No internet service', 'OnlineBackup_Yes',
       'DeviceProtection_No', 'DeviceProtection_No internet service',
       'DeviceProtection_Yes', 'TechSupport_No',
       'TechSupport_No internet service', 'TechSupport_Yes', 'StreamingTV_No',
       'StreamingTV_No internet service', 'StreamingTV_Yes',
       'StreamingMovies_No', 'StreamingMovies_No internet service',
       'StreamingMovies_Yes'

In [9]:
df = df.drop(to_drop,axis= 1)

In [15]:
df.head()

Unnamed: 0,SeniorCitizen,tenure,MonthlyCharges,TotalCharges,Churn_encoded,gender_encoded,Partner_encoded,Dependents_encoded,PhoneService_encoded,PaperlessBilling_encoded,...,StreamingMovies_No,StreamingMovies_No internet service,StreamingMovies_Yes,Contract_Month-to-month,Contract_One year,Contract_Two year,PaymentMethod_Bank transfer (automatic),PaymentMethod_Credit card (automatic),PaymentMethod_Electronic check,PaymentMethod_Mailed check
0,0,1,0.115423,0.003437,0,0,1,0,0,1,...,1,0,0,1,0,0,0,0,1,0
1,0,34,0.385075,0.217564,0,1,0,0,1,0,...,1,0,0,0,1,0,0,0,0,1
2,0,2,0.354229,0.012453,1,1,0,0,1,1,...,1,0,0,1,0,0,0,0,0,1
3,0,45,0.239303,0.211951,0,1,0,0,0,0,...,1,0,0,0,1,0,1,0,0,0
4,0,2,0.521891,0.017462,1,0,0,0,1,1,...,1,0,0,1,0,0,0,0,1,0


## Model implementation

`labels` holds the variable to predict (`Churn_encoded`); `df_features` holds the independent variables.

In [10]:
labels = df['Churn_encoded']
labels.head()
df_features = df.drop(['Churn_encoded'],axis=1)

In [11]:
df_features.shape

(7043, 43)

## Partitioning the dataset
The data is split into three stratified sets: training (60%), validation (20%), and test (20%).
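The 60/20/20 arithmetic and the effect of `stratify` can be checked on stand-in data. The arrays below are placeholders with the dataset's 7,043 rows and a hypothetical positive (churn) rate of about 26.5%; `random_state=0` is an addition for reproducibility and is not set in the notebook's own calls.

```python
import numpy as np
from sklearn.model_selection import train_test_split

n = 7043
X = np.arange(n).reshape(-1, 1)   # stand-in features (just the row index)
y = np.zeros(n, dtype=int)
y[:1869] = 1                      # hypothetical positive labels (~26.5%)

# 60% train / 40% held out, then the held-out part split 50/50 into validation and test
X_tr, X_tmp, y_tr, y_tmp = train_test_split(X, y, stratify=y, test_size=0.4, random_state=0)
X_val, X_te, y_val, y_te = train_test_split(X_tmp, y_tmp, stratify=y_tmp, test_size=0.5, random_state=0)

print(len(X_tr), len(X_val), len(X_te))  # 4225 1409 1409
# Stratification keeps the positive rate almost identical in every split
print(round(y_tr.mean(), 3), round(y_val.mean(), 3), round(y_te.mean(), 3))
```

The sizes match the training logs (4225 training and 1409 validation samples): scikit-learn rounds the first held-out share up (ceil(0.4 × 7043) = 2818), which then splits evenly into 1409 + 1409.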

In [12]:
# Split off 40% of the rows as a held-out set (stratified on churn)

from sklearn.model_selection import train_test_split
X_train, X_test_validation, y_train, y_test_validation = train_test_split(df_features, labels,
                                                    stratify=labels, 
                                                    test_size=0.4)

In [13]:
# Split the held-out 40% evenly into validation (20%) and test (20%) sets
X_validation, X_test, y_validation, y_test = train_test_split(X_test_validation, y_test_validation,
                                                    stratify=y_test_validation, 
                                                    test_size=0.5)

In [14]:
X_test.shape
X_validation.shape

(1409, 43)

In [19]:
X_train

Unnamed: 0,SeniorCitizen,tenure,MonthlyCharges,TotalCharges,gender_encoded,Partner_encoded,Dependents_encoded,PhoneService_encoded,PaperlessBilling_encoded,TechSupport_No,...,StreamingMovies_No,StreamingMovies_No internet service,StreamingMovies_Yes,Contract_Month-to-month,Contract_One year,Contract_Two year,PaymentMethod_Bank transfer (automatic),PaymentMethod_Credit card (automatic),PaymentMethod_Electronic check,PaymentMethod_Mailed check
4923,0,52,0.313433,0.291953,0,0,0,1,1,1,...,1,0,0,1,0,0,1,0,0,0
5074,0,49,0.011940,0.106082,0,0,1,1,0,0,...,0,1,0,0,1,0,0,1,0,0
2574,1,39,0.770149,0.432831,1,0,0,1,1,1,...,0,0,1,1,0,0,1,0,0,0
5121,0,29,0.523881,0.226211,1,0,0,1,0,1,...,0,0,1,0,1,0,0,1,0,0
4663,1,4,0.563682,0.037047,0,0,0,1,1,1,...,1,0,0,1,0,0,0,0,1,0
1431,0,43,0.463682,0.334861,0,1,0,1,1,0,...,1,0,0,0,1,0,1,0,0,0
2730,0,49,0.879602,0.595074,1,0,0,1,1,0,...,0,0,1,0,1,0,1,0,0,0
2614,0,67,0.757214,0.730178,0,1,1,1,1,0,...,0,0,1,0,1,0,0,1,0,0
2190,0,71,0.656219,0.685894,1,1,0,1,0,0,...,0,0,1,0,0,1,0,1,0,0
1927,0,47,0.886567,0.589415,1,1,1,1,1,0,...,0,0,1,1,0,0,1,0,0,0


In [20]:
type(y_train)

pandas.core.series.Series

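## Experiment 1
* First hidden layer: 64 neurons with sigmoid activation on the 43 input features, followed by a dropout layer (rate 0.5).
* Second hidden layer: 64 neurons with sigmoid activation, followed by a dropout layer (rate 0.5).
* Output layer: a single neuron with sigmoid activation.
* RMSprop optimizer.
* Binary cross-entropy as the loss function.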
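To make the Experiment 1 architecture concrete, here is a NumPy sketch of its forward pass at inference time. The weights are random placeholders rather than trained values, and dropout is omitted because Keras applies it only during training.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)
# Hypothetical random weights with Experiment 1's shapes: 43 -> 64 -> 64 -> 1
W1, b1 = rng.normal(scale=0.1, size=(43, 64)), np.zeros(64)
W2, b2 = rng.normal(scale=0.1, size=(64, 64)), np.zeros(64)
W3, b3 = rng.normal(scale=0.1, size=(64, 1)), np.zeros(1)

def forward(X):
    """Inference-time forward pass; each Dense layer computes activation(X @ W + b)."""
    h1 = sigmoid(X @ W1 + b1)
    h2 = sigmoid(h1 @ W2 + b2)
    return sigmoid(h2 @ W3 + b3)   # predicted churn probability in (0, 1)

X = rng.random((5, 43))            # five fake customers with 43 features each
p = forward(X)
print(p.shape)                     # (5, 1)
print((p >= 0.5).astype(int).ravel())  # class predictions by thresholding at 0.5
```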


### Creating callbacks: ModelCheckpoint and EarlyStopping

Callbacks are used in every experiment to keep the best-performing model and to avoid losing progress if training is interrupted.

* ModelCheckpoint saves the model whenever the validation loss reaches a new best during training (`save_best_only=True`).
* EarlyStopping stops training once the monitored validation metric has not improved for `patience` consecutive epochs.
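The interplay of the two callbacks can be simulated in plain Python. This is a simplified sketch, not Keras's implementation: it assumes `min_delta=0` (the default) and minimisation of a loss, and the validation losses are made up. It also shows how to read a training log: with patience 5, a run that stops after epoch 19 had its best validation loss, and hence its saved checkpoint, at epoch 14.

```python
def early_stopping(val_losses, patience=5, min_delta=0.0):
    """Simulate EarlyStopping(monitor='val_loss') plus ModelCheckpoint(save_best_only=True).

    Returns (stop_epoch, best_epoch) as 1-based epochs; stop_epoch is None if
    training would run through every epoch in `val_losses`.
    """
    best, best_epoch, wait = float('inf'), None, 0
    for epoch, loss in enumerate(val_losses, start=1):
        if loss - min_delta < best:   # improvement: checkpoint overwritten, counter reset
            best, best_epoch, wait = loss, epoch, 0
        else:                         # no improvement: count towards the patience limit
            wait += 1
            if wait >= patience:
                return epoch, best_epoch
    return None, best_epoch

# Hypothetical validation losses: improving for 14 epochs, then stalling
losses = [0.70 - 0.01 * e for e in range(14)] + [0.65] * 10
print(early_stopping(losses, patience=5))  # (19, 14)
```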

In [16]:
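from tensorflow.keras.callbacks import EarlyStopping, ModelCheckpoint

callbacks_1 = [
    # Stop once val_loss has not improved for 5 consecutive epochs
    EarlyStopping(monitor='val_loss', patience=5),
    # Keep only the model with the lowest val_loss so far (val_loss is the default monitor);
    # the mlp/experiment_1/ folder must already exist
    ModelCheckpoint('mlp/experiment_1/model.h5', save_best_only=True, save_weights_only=False),
]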

In [22]:
model_1 = tf.keras.Sequential()
# Hidden layer 1: 64 sigmoid units on the 43 input features, then 50% dropout
model_1.add(layers.Dense(64, input_dim=43, activation='sigmoid'))
model_1.add(layers.Dropout(0.5))
# Hidden layer 2: 64 sigmoid units, then 50% dropout
model_1.add(layers.Dense(64, activation='sigmoid'))
model_1.add(layers.Dropout(0.5))
# Output layer: one sigmoid unit giving the predicted churn probability
model_1.add(layers.Dense(1, activation='sigmoid'))
model_1.compile(loss='binary_crossentropy',
                optimizer='rmsprop',
                metrics=['accuracy'])



In [29]:
epochs = 100
batch_size = 32
model_1.fit(X_train, y_train, epochs=epochs, batch_size=batch_size,
            callbacks=callbacks_1, validation_data=(X_validation, y_validation))
score_1 = model_1.evaluate(X_test, y_test, batch_size=batch_size)

Train on 4225 samples, validate on 1409 samples
Epoch 1/100
Epoch 2/100
Epoch 3/100
Epoch 4/100
Epoch 5/100
Epoch 6/100
Epoch 7/100
Epoch 8/100
Epoch 9/100
Epoch 10/100
Epoch 11/100
Epoch 12/100
Epoch 13/100
Epoch 14/100
Epoch 15/100
Epoch 16/100
Epoch 17/100
Epoch 18/100
Epoch 19/100


Even though 100 epochs were specified, training stopped after epoch 19 because the validation loss had not improved for 5 consecutive epochs; the saved checkpoint is therefore the model from epoch 14, where the validation loss was lowest.
Training and validation accuracy were very similar, which suggests the model is not overfitting and generalizes reasonably well.

## Experiment 2 
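* First hidden layer: 64 neurons with sigmoid activation on the 43 input features, followed by a dropout layer (rate 0.5).
* Second hidden layer: 64 neurons with sigmoid activation, followed by a dropout layer (rate 0.5).
* Output layer: a single neuron with sigmoid activation.
* Adam optimizer with a learning rate of 0.01 and a batch size of 256.
* Binary cross-entropy as the loss function.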
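Adam's learning rate sets roughly how far each parameter moves per update, largely independent of the gradient's scale. The sketch below implements the textbook Adam update for one scalar parameter on the toy objective f(x) = scale · (x − 3)², with the usual defaults beta1 = 0.9, beta2 = 0.999 and epsilon = 1e-8. TensorFlow's implementation differs in minor details such as where epsilon enters, so treat this as an illustration rather than a reproduction of `tf.train.AdamOptimizer`.

```python
import math

def adam_path(grad_fn, x0, lr=0.01, beta1=0.9, beta2=0.999, eps=1e-8, steps=200):
    """Run textbook Adam on one scalar parameter; return the trajectory [x0, x1, ..., x_steps]."""
    x, m, v = x0, 0.0, 0.0
    path = [x]
    for t in range(1, steps + 1):
        g = grad_fn(x)
        m = beta1 * m + (1 - beta1) * g        # running mean of gradients
        v = beta2 * v + (1 - beta2) * g * g    # running mean of squared gradients
        m_hat = m / (1 - beta1 ** t)           # bias corrections for the zero initialisation
        v_hat = v / (1 - beta2 ** t)
        x = x - lr * m_hat / (math.sqrt(v_hat) + eps)
        path.append(x)
    return path

# Gradient of scale * (x - 3)^2 is scale * 2 * (x - 3)
for scale in (1.0, 1000.0):
    path = adam_path(lambda x: scale * 2 * (x - 3), x0=0.0, lr=0.01)
    print(scale, round(path[1] - path[0], 6))  # first step is ~lr (0.01) for both scales
```

Because the step size is tied to `lr` rather than to the raw gradient, moving from 0.01 (this experiment) to 0.001 (Experiments 4 and 5) shrinks every update by roughly a factor of ten.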

In [35]:
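callbacks_2 = [
    # Stop once validation accuracy has not improved for 10 epochs ('val_acc' is the TF 1.x metric name)
    EarlyStopping(monitor='val_acc', patience=10),
    # ModelCheckpoint still monitors val_loss, its default
    ModelCheckpoint('mlp/experiment_2/model.h5', save_best_only=True, save_weights_only=False),
]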

In [44]:
model_2 = tf.keras.Sequential()
model_2.add(layers.Dense(64, input_dim=43, activation='sigmoid'))
model_2.add(layers.Dropout(0.5))
model_2.add(layers.Dense(64, activation='sigmoid'))
model_2.add(layers.Dropout(0.5))
model_2.add(layers.Dense(1, activation='sigmoid'))
model_2.compile(loss='binary_crossentropy',
                optimizer=tf.train.AdamOptimizer(0.01),  # Adam with learning rate 0.01
                metrics=['accuracy'])

In [45]:
epochs = 100
batch_size = 256
model_2.fit(X_train, y_train, epochs=epochs, batch_size=batch_size,
            callbacks=callbacks_2, validation_data=(X_validation, y_validation))
score_2 = model_2.evaluate(X_test, y_test, batch_size=batch_size)

Train on 4225 samples, validate on 1409 samples
Epoch 1/100
Epoch 2/100
Epoch 3/100
Epoch 4/100
Epoch 5/100
Epoch 6/100
Epoch 7/100
Epoch 8/100
Epoch 9/100
Epoch 10/100
Epoch 11/100
Epoch 12/100
Epoch 13/100
Epoch 14/100
Epoch 15/100
Epoch 16/100
Epoch 17/100
Epoch 18/100
Epoch 19/100
Epoch 20/100
Epoch 21/100
Epoch 22/100
Epoch 23/100
Epoch 24/100
Epoch 25/100
Epoch 26/100
Epoch 27/100
Epoch 28/100
Epoch 29/100
Epoch 30/100
Epoch 31/100
Epoch 32/100
Epoch 33/100
Epoch 34/100
Epoch 35/100
Epoch 36/100
Epoch 37/100


In [20]:
# Exploratory run, not one of the numbered experiments: Experiment 1's architecture
# trained without validation data or callbacks.
# NOTE: the output layer is linear, but binary_crossentropy expects probabilities
# in [0, 1]; activation='sigmoid' would match the loss.
model = tf.keras.Sequential()
model.add(layers.Dense(64, input_dim=43, activation='sigmoid'))
model.add(layers.Dropout(0.5))
model.add(layers.Dense(64, activation='sigmoid'))
model.add(layers.Dropout(0.5))
model.add(layers.Dense(1, activation='linear'))
model.compile(loss='binary_crossentropy',
              optimizer='rmsprop',
              metrics=['accuracy'])

model.fit(X_train, y_train,
          epochs=100,
          batch_size=32)
score = model.evaluate(X_test, y_test, batch_size=batch_size)

Epoch 1/100
Epoch 2/100
Epoch 3/100
Epoch 4/100
Epoch 5/100
Epoch 6/100
Epoch 7/100
Epoch 8/100
Epoch 9/100
Epoch 10/100
Epoch 11/100
Epoch 12/100
Epoch 13/100
Epoch 14/100
Epoch 15/100
Epoch 16/100
Epoch 17/100
Epoch 18/100
Epoch 19/100
Epoch 20/100
Epoch 21/100
Epoch 22/100
Epoch 23/100
Epoch 24/100
Epoch 25/100
Epoch 26/100
Epoch 27/100
Epoch 28/100
Epoch 29/100
Epoch 30/100
Epoch 31/100
Epoch 32/100
Epoch 33/100
Epoch 34/100
Epoch 35/100
Epoch 36/100
Epoch 37/100
Epoch 38/100
Epoch 39/100
Epoch 40/100
Epoch 41/100
Epoch 42/100
Epoch 43/100
Epoch 44/100
Epoch 45/100
Epoch 46/100
Epoch 47/100
Epoch 48/100
Epoch 49/100
Epoch 50/100
Epoch 51/100
Epoch 52/100
Epoch 53/100
Epoch 54/100
Epoch 55/100
Epoch 56/100
Epoch 57/100
Epoch 58/100
Epoch 59/100
Epoch 60/100
Epoch 61/100
Epoch 62/100
Epoch 63/100
Epoch 64/100
Epoch 65/100
Epoch 66/100
Epoch 67/100
Epoch 68/100
Epoch 69/100
Epoch 70/100
Epoch 71/100
Epoch 72/100
Epoch 73/100
Epoch 74/100
Epoch 75/100
Epoch 76/100
Epoch 77/100
Epoch 78

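For Experiment 2, the early-stopping patience was raised to 10 epochs and the callback monitored validation accuracy instead of loss. Training stopped after epoch 37, so validation accuracy last improved at epoch 27. This model performed almost as well as the Experiment 1 model.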

## Experiment 3
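* First hidden layer: 64 neurons with sigmoid activation on the 43 input features, followed by a dropout layer (rate 0.5).
* Second hidden layer: 64 neurons with sigmoid activation, followed by a dropout layer (rate 0.5).
* Output layer: a single neuron with sigmoid activation.
* Stochastic gradient descent (SGD) with Nesterov momentum (NAG) as the optimizer: learning rate 0.01, momentum 0.9.
* Binary cross-entropy as the loss function.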
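Nesterov accelerated gradient (NAG) adds a momentum term to SGD and applies a look-ahead correction when updating the parameters. Below is a minimal sketch of the velocity form used by Keras's SGD optimizer (`momentum=0.9`, `lr=0.01`, as in the cell below) on the toy objective f(x) = (x − 3)². The learning-rate `decay` schedule is omitted for brevity.

```python
def sgd_nesterov(grad_fn, x0, lr=0.01, momentum=0.9, steps=500):
    """Minimise a scalar function with SGD + Nesterov momentum; return the final x."""
    x, velocity = x0, 0.0
    for _ in range(steps):
        g = grad_fn(x)
        velocity = momentum * velocity - lr * g   # decaying accumulation of past steps
        x = x + momentum * velocity - lr * g      # Nesterov look-ahead update
    return x

x_final = sgd_nesterov(lambda x: 2 * (x - 3), x0=0.0)  # f(x) = (x - 3)^2, minimum at x = 3
print(round(x_final, 6))  # 3.0
```

On a smooth, noise-free toy problem like this one the method converges quickly; on real mini-batches each gradient is a noisy estimate, which is where the choice of learning rate matters.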

In [17]:
callbacks_3 = [EarlyStopping(monitor='val_acc', patience=10),
             ModelCheckpoint(('mlp/experiment_3/model.h5'), save_best_only=True, 
                             save_weights_only=False)]

In [20]:
model_3 = tf.keras.Sequential()
model_3.add(layers.Dense(64, input_dim=43, activation='sigmoid'))
model_3.add(layers.Dropout(0.5))
model_3.add(layers.Dense(64, activation='sigmoid'))
model_3.add(layers.Dropout(0.5))
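# NOTE: this output layer is linear, not sigmoid as listed above; binary_crossentropy
# expects probabilities in [0, 1], so this mismatch may have hurt training.
model_3.add(layers.Dense(1, activation='linear'))
# SGD with Nesterov momentum (NAG); decay shrinks the learning rate slightly after every update
sgd = tf.keras.optimizers.SGD(lr=0.01, decay=1e-6, momentum=0.9, nesterov=True)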
model_3.compile(loss='binary_crossentropy',
              optimizer = sgd,
              metrics=['accuracy'])

epochs = 100
batch_size = 32
model_3.fit(X_train, y_train,
          epochs=epochs,
          batch_size=batch_size, callbacks = callbacks_3, validation_data =(X_validation, y_validation))
score_3 = model_3.evaluate(X_test, y_test, batch_size=batch_size)

Train on 4225 samples, validate on 1409 samples
Epoch 1/100
Epoch 2/100
Epoch 3/100
Epoch 4/100
Epoch 5/100
Epoch 6/100
Epoch 7/100
Epoch 8/100
Epoch 9/100
Epoch 10/100
Epoch 11/100


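Using SGD as the optimizer gave poor results. Early stopping (patience 10 on validation accuracy) ended training after epoch 11, which means validation accuracy never improved after the first epoch. Plain SGD is sensitive to the learning rate and to noise in its gradient estimates, each of which is computed from a single mini-batch (here 32 observations) rather than the full training set. Training also took longer than in the previous experiments. In addition, this cell's output layer is linear rather than sigmoid, which may have contributed to the poor results.

This model was tested with several learning rates, and the results did not improve.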
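The linear output layer is one possible contributor worth illustrating. Binary cross-entropy is defined on probabilities, and the Keras backend clips predictions to [ε, 1 − ε] (ε = 1e-7 by default) before taking logarithms. The simplified NumPy re-implementation below, using made-up outputs, shows the consequence: any output clipped at either end has zero gradient, so the optimizer gets no signal to move it, even when it is badly wrong (for example, an output of −0.5 for a customer who churned). This is a hypothesis about Experiment 3, not a confirmed diagnosis.

```python
import numpy as np

EPS = 1e-7  # assumption: the Keras backend's default epsilon

def clipped_bce(y_true, y_pred):
    """Binary cross-entropy with predictions clipped to [EPS, 1 - EPS] before the logs."""
    p = np.clip(y_pred, EPS, 1 - EPS)
    return -(y_true * np.log(p) + (1 - y_true) * np.log(1 - p))

# Raw outputs a linear unit might produce for a customer who churned (y_true = 1)
raw = np.array([-2.0, -0.5, 0.3, 1.5])
# Nudge every output up by 0.1: only the value that stays inside (0, 1) changes the loss
delta = clipped_bce(1.0, raw + 0.1) - clipped_bce(1.0, raw)
print(delta)  # only the third entry is non-zero
```

A sigmoid output keeps every prediction strictly inside (0, 1), so the loss always provides a gradient.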

## Experiment 4
* First hidden layer: 64 neurons with sigmoid activation on the 43 input features, followed by a dropout layer (rate 0.5).
* Second hidden layer: 128 neurons with ReLU activation, followed by a dropout layer (rate 0.5).
* Output layer: a single neuron with sigmoid activation.
* Adam optimizer with a learning rate of 0.001.
* Binary cross-entropy as the loss function.

In [55]:
callbacks_4 = [EarlyStopping(monitor='val_loss', patience=5),
             ModelCheckpoint(('mlp/experiment_4/model.h5'), save_best_only=True, 
                             save_weights_only=False)]

In [72]:
model_4 = tf.keras.Sequential()
model_4.add(layers.Dense(64, input_dim=43, activation='sigmoid'))
model_4.add(layers.Dropout(0.5))
model_4.add(layers.Dense(128, activation='relu'))
model_4.add(layers.Dropout(0.5))
model_4.add(layers.Dense(1, activation='sigmoid'))
model_4.compile(loss='binary_crossentropy',
                optimizer=tf.train.AdamOptimizer(0.001),
                metrics=['accuracy'])

In [73]:
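# NOTE: validation_data is the test set here (Experiments 1-3 use X_validation), so early
# stopping and checkpointing are driven by test data and score_4 is optimistically biased.
# Passing (X_validation, y_validation) instead would keep the test set unseen.
model_4.fit(X_train, y_train,
            epochs=epochs,
            batch_size=batch_size, callbacks=callbacks_4, validation_data=(X_test, y_test))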
score_4 = model_4.evaluate(X_test, y_test, batch_size=batch_size)

Train on 4225 samples, validate on 1409 samples
Epoch 1/100
Epoch 2/100
Epoch 3/100
Epoch 4/100
Epoch 5/100
Epoch 6/100
Epoch 7/100
Epoch 8/100
Epoch 9/100
Epoch 10/100
Epoch 11/100
Epoch 12/100
Epoch 13/100
Epoch 14/100
Epoch 15/100
Epoch 16/100
Epoch 17/100
Epoch 18/100
Epoch 19/100
Epoch 20/100
Epoch 21/100
Epoch 22/100
Epoch 23/100
Epoch 24/100
Epoch 25/100
Epoch 26/100
Epoch 27/100
Epoch 28/100
Epoch 29/100


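This network is more complex than those in the previous experiments: its second hidden layer has 128 ReLU neurons, twice as many as before. Test accuracy improved by 0.01. However, this experiment passed the test set as its validation data, so early stopping and checkpointing were driven by test performance and the improvement may be optimistic.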

## Experiment 5 

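The objective of this experiment is to build a complex, high-variance network that overfits: one that performs exceptionally well on the training set but poorly on the test set. The previous architectures were not able to push accuracy above 0.82.

* First hidden layer: 64 neurons with sigmoid activation on the 43 input features, followed by a dropout layer (rate 0.5).
* Second hidden layer: 128 neurons with ReLU activation, followed by a dropout layer (rate 0.5).
* Third hidden layer: 128 neurons with sigmoid activation (no dropout).
* Fourth hidden layer: 256 neurons with tanh activation (no dropout).
* Fifth hidden layer: 128 neurons with sigmoid activation, followed by a dropout layer (rate 0.5).
* Output layer: a single neuron with sigmoid activation.
* Adam optimizer with a learning rate of 0.001.
* Binary cross-entropy as the loss function.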
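The relative complexity of the architectures can be quantified by counting trainable parameters. A Dense layer with n_in inputs and n_out units has n_in × n_out weights plus n_out biases, and Dropout layers add none. The layer widths below are read off the code cells (Experiments 2 and 3 share Experiment 1's widths); the helper function itself is illustrative.

```python
def dense_param_count(layer_sizes):
    """Trainable parameters of a stack of Dense layers: n_in * n_out weights + n_out biases each."""
    return sum(n_in * n_out + n_out for n_in, n_out in zip(layer_sizes, layer_sizes[1:]))

# Widths from input features to output unit; Dropout layers contribute no parameters
architectures = {
    'Experiment 1': [43, 64, 64, 1],
    'Experiment 4': [43, 64, 128, 1],
    'Experiment 5': [43, 64, 128, 128, 256, 128, 1],
}
for name, sizes in architectures.items():
    print(name, dense_param_count(sizes))
# Experiment 1 7041
# Experiment 4 11265
# Experiment 5 93697
```

By this measure, Experiment 5 has roughly thirteen times as many parameters as Experiment 1.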

In [74]:
callbacks_5 = [EarlyStopping(monitor='val_loss', patience=5),
             ModelCheckpoint(('mlp/experiment_5/model.h5'), save_best_only=True, 
                             save_weights_only=False)]

In [75]:
model_5 = tf.keras.Sequential()
model_5.add(layers.Dense(64, input_dim=43, activation='sigmoid'))
model_5.add(layers.Dropout(0.5))
model_5.add(layers.Dense(128, activation='relu'))
model_5.add(layers.Dropout(0.5))
model_5.add(layers.Dense(128, activation='sigmoid'))
model_5.add(layers.Dense(256, activation='tanh'))
model_5.add(layers.Dense(128, activation='sigmoid'))
model_5.add(layers.Dropout(0.5))
model_5.add(layers.Dense(1, activation='sigmoid'))
model_5.compile(loss='binary_crossentropy',
                optimizer=tf.train.AdamOptimizer(0.001),
                metrics=['accuracy'])

In [77]:
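# NOTE: as in Experiment 4, validation_data is the test set, so score_5 is optimistically
# biased; passing (X_validation, y_validation) instead would keep the test set unseen.
model_5.fit(X_train, y_train,
            epochs=epochs,
            batch_size=batch_size, callbacks=callbacks_5, validation_data=(X_test, y_test))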
score_5 = model_5.evaluate(X_test, y_test, batch_size=batch_size)

Train on 4225 samples, validate on 1409 samples
Epoch 1/100
Epoch 2/100
Epoch 3/100
Epoch 4/100
Epoch 5/100
Epoch 6/100
Epoch 7/100
Epoch 8/100
Epoch 9/100
Epoch 10/100
Epoch 11/100
Epoch 12/100
Epoch 13/100
Epoch 14/100
Epoch 15/100
Epoch 16/100
Epoch 17/100
Epoch 18/100
Epoch 19/100
Epoch 20/100
Epoch 21/100
Epoch 22/100
Epoch 23/100
Epoch 24/100
Epoch 25/100
Epoch 26/100
Epoch 27/100
Epoch 28/100
Epoch 29/100
Epoch 30/100
Epoch 31/100
Epoch 32/100
Epoch 33/100
Epoch 34/100
Epoch 35/100
Epoch 36/100
Epoch 37/100
Epoch 38/100
Epoch 39/100


Surprisingly, this model performed worse than the previous experiments, even though it was the most complex.

# Conclusions 

Several neural networks were implemented for this dataset. Experiments 1 and 2 performed almost equally. Experiment 3 performed poorly, which we attribute mainly to the SGD optimizer, although its output layer was also linear rather than sigmoid, which may have contributed. Experiment 4 achieved the best test accuracy and is, overall, the best model; note, however, that it used the test set as validation data during training, so its test score is optimistically biased. Experiment 5 performed poorly despite being the most complex. This shows the importance of choosing an appropriate architecture to avoid both underfitting and overfitting.