# Typeform: ML Case: Deep Neural Network

refs[0]: https://towardsdatascience.com/deep-neural-networks-for-regression-problems-81321897ca33


In [1]:
%matplotlib inline

In [2]:
import pandas as pd
import numpy as np

import matplotlib.pyplot as plt
import seaborn as sns

In [3]:
from keras.callbacks import ModelCheckpoint
from keras.models import Sequential
from keras.layers import Dense, Activation, Flatten

from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_absolute_error 

Using TensorFlow backend.


In [4]:
import warnings 
warnings.filterwarnings('ignore')
warnings.filterwarnings('ignore', category=DeprecationWarning)

## Typeform: ML Case: DNN: Read dataframe

We reuse the dataframe cleaned in the python-sklearn exercise, keeping all of its features.

In [5]:
df_typeform = pd.read_pickle('./data/df_typeform.pkl')

In [6]:
df_typeform.shape

(1031283, 48)

## Typeform: ML Case: DNN: Training set and Test set split

In [7]:
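# Target: completion_rate. Features: every other column
# (this slice assumes completion_rate is the last column of the dataframe).
y = df_typeform[['completion_rate']]
X = df_typeform[list(df_typeform.columns[:-1])]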

In [8]:
X_train, X_test, y_train, y_test = \
    train_test_split(X, y, test_size=0.33, random_state=42)

In [9]:
X_train.shape

(690959, 47)

In [10]:
X_test.shape

(340324, 47)
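The two shapes above follow from applying `test_size=0.33` to the 1,031,283 rows of the dataframe. The following is a minimal sketch of that arithmetic, assuming scikit-learn rounds the test count up and gives the remaining rows to the training set; the helper name `split_sizes` is hypothetical and is not part of scikit-learn.

```python
import math

def split_sizes(n_samples, test_size):
    """Return (n_train, n_test) for a fractional test_size.

    Assumption: the test count is rounded up and the training set
    receives whatever is left, which reproduces the shapes printed above.
    """
    n_test = math.ceil(test_size * n_samples)
    n_train = n_samples - n_test
    return n_train, n_test

n_train, n_test = split_sizes(1031283, 0.33)
print(n_train, n_test)
```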

## Typeform: DNN: Model creation and fitting
- Define a sequential model
- Add some dense layers
- Use ‘relu’ as the activation function for the hidden layers
- Use a ‘normal’ initializer as the kernel_initializer
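To make the list above concrete, here is a NumPy-only sketch of the forward pass such a stack of dense layers performs. It is an illustration, not the Keras implementation: the `dense` helper is hypothetical, the input rows are random, and the weight scale of 0.05 is an assumption about what the ‘normal’ initializer uses by default.

```python
import numpy as np

rng = np.random.default_rng(0)

def dense(x, units, activation):
    """One fully connected layer with freshly sampled weights (illustration only)."""
    # Assumption: weights are drawn from N(0, 0.05**2) and biases start at zero.
    w = rng.normal(loc=0.0, scale=0.05, size=(x.shape[1], units))
    b = np.zeros(units)
    z = x @ w + b
    # 'relu' clips negative values to zero; 'linear' returns z unchanged.
    return np.maximum(z, 0.0) if activation == 'relu' else z

x = rng.normal(size=(4, 47))      # 4 made-up rows with 47 features
h = dense(x, 128, 'relu')         # first dense layer
for _ in range(3):                # three hidden layers of 256 units
    h = dense(h, 256, 'relu')
out = dense(h, 1, 'linear')       # single linear output for regression
print(out.shape)
```

The linear output unit leaves the prediction unbounded, which is the usual choice for a regression target.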

**Creating the DNN model**

In [11]:
DNN_model = Sequential()

# First dense layer: receives the input features (input_dim) and has 128 units
DNN_model.add(Dense(128, kernel_initializer='normal', input_dim=X_train.shape[1], activation='relu'))

# Hidden layers: three dense layers of 256 units each
DNN_model.add(Dense(256, kernel_initializer='normal', activation='relu'))
DNN_model.add(Dense(256, kernel_initializer='normal', activation='relu'))
DNN_model.add(Dense(256, kernel_initializer='normal', activation='relu'))

# Output layer: a single linear unit for the regression target
DNN_model.add(Dense(1, kernel_initializer='normal', activation='linear'))

# Compile the network :
DNN_model.compile(loss='mean_absolute_error', optimizer='adam', metrics=['mean_absolute_error'])
DNN_model.summary()

_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
dense_1 (Dense)              (None, 128)               6144      
_________________________________________________________________
dense_2 (Dense)              (None, 256)               33024     
_________________________________________________________________
dense_3 (Dense)              (None, 256)               65792     
_________________________________________________________________
dense_4 (Dense)              (None, 256)               65792     
_________________________________________________________________
dense_5 (Dense)              (None, 1)                 257       
=================================================================
Total params: 171,009
Trainable params: 171,009
Non-trainable params: 0
_________________________________________________________________
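The parameter counts in the summary can be checked by hand: a dense layer has one weight per (input, unit) pair plus one bias per unit. The sketch below recomputes them in plain Python; `dense_params` is a hypothetical helper, and the 47 input features come from the shape of `X_train`.

```python
def dense_params(n_inputs, n_units):
    # One weight per (input, unit) pair plus one bias per unit.
    return (n_inputs + 1) * n_units

# 47 input features, followed by the number of units in each dense layer.
layer_sizes = [47, 128, 256, 256, 256, 1]
counts = [dense_params(n_in, n_out)
          for n_in, n_out in zip(layer_sizes, layer_sizes[1:])]
total = sum(counts)
print(counts, total)
```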


**Model fitting and checkpoints definition**

In [13]:
checkpoint_name = './checkpoints/Weights-{epoch:03d}--{val_loss:.5f}.hdf5' 
checkpoint = ModelCheckpoint(checkpoint_name, monitor='val_loss', verbose = 1, save_best_only = True, mode ='auto')
callbacks_list = [checkpoint]
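The placeholders in `checkpoint_name` are standard Python format fields that the callback fills in when it saves a file. A small sketch of the resulting file name, using epoch 1 and a validation loss of 2.14035 as example values:

```python
checkpoint_name = './checkpoints/Weights-{epoch:03d}--{val_loss:.5f}.hdf5'

# {epoch:03d} zero-pads the epoch to three digits;
# {val_loss:.5f} keeps five decimals of the validation loss.
example_path = checkpoint_name.format(epoch=1, val_loss=2.14035)
print(example_path)
```

Because `save_best_only = True`, a file is written only for epochs in which `val_loss` improves, so the highest-numbered file in the directory is also the best one.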

Training is limited to 10 epochs so that it finishes in a reasonable time.

In [14]:
DNN_model.fit(X_train, y_train, epochs=10, batch_size=64, validation_split = 0.2, callbacks=callbacks_list)

Train on 552767 samples, validate on 138192 samples
Epoch 1/10

Epoch 00001: val_loss improved from inf to 2.14035, saving model to ./checkpoints/Weights-001--2.14035.hdf5
Epoch 2/10

Epoch 00002: val_loss improved from 2.14035 to 2.13722, saving model to ./checkpoints/Weights-002--2.13722.hdf5
Epoch 3/10

Epoch 00003: val_loss improved from 2.13722 to 2.13704, saving model to ./checkpoints/Weights-003--2.13704.hdf5
Epoch 4/10

Epoch 00004: val_loss improved from 2.13704 to 2.13658, saving model to ./checkpoints/Weights-004--2.13658.hdf5
Epoch 5/10

Epoch 00005: val_loss improved from 2.13658 to 2.13493, saving model to ./checkpoints/Weights-005--2.13493.hdf5
Epoch 6/10

Epoch 00006: val_loss improved from 2.13493 to 2.13423, saving model to ./checkpoints/Weights-006--2.13423.hdf5
Epoch 7/10

Epoch 00007: val_loss improved from 2.13423 to 2.13413, saving model to ./checkpoints/Weights-007--2.13413.hdf5
Epoch 8/10

Epoch 00008: val_loss did not improve from 2.13413
Epoch 9/10

Epoch 000

<keras.callbacks.History at 0x1a477ffef0>
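The sample counts reported at the start of the training log follow from `validation_split = 0.2` applied to the 690,959 training rows. The sketch below reproduces them, assuming Keras keeps the first 80% of the rows for fitting and holds out the remaining tail for validation; the number of batches per epoch is derived from `batch_size=64` and does not appear in the log.

```python
import math

n_train_rows = 690959      # rows in X_train
validation_split = 0.2
batch_size = 64

# Assumption: the split point is truncated to an integer and the
# rows after it form the validation set.
n_fit = int(n_train_rows * (1 - validation_split))
n_val = n_train_rows - n_fit
steps_per_epoch = math.ceil(n_fit / batch_size)
print(n_fit, n_val, steps_per_epoch)
```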

**Selecting the best model based on validation MAE**

In [15]:
# Checkpoint with the lowest validation loss
weights_file = './checkpoints/Weights-009--2.13214.hdf5'

DNN_model.load_weights(weights_file)
DNN_model.compile(loss='mean_absolute_error', optimizer='adam', metrics=['mean_absolute_error'])
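Rather than copying the file name by hand, the best checkpoint could be picked programmatically, since the validation loss is embedded in each name. This is only a sketch: `val_loss_from_name` is a hypothetical helper, and the list holds three names in the format used above instead of a real directory listing.

```python
import re

# File names in the style written by the checkpoint callback.
checkpoint_files = [
    './checkpoints/Weights-001--2.14035.hdf5',
    './checkpoints/Weights-007--2.13413.hdf5',
    './checkpoints/Weights-009--2.13214.hdf5',
]

def val_loss_from_name(path):
    """Parse the validation loss that follows the double dash."""
    match = re.search(r'--(\d+\.\d+)\.hdf5$', path)
    if match is None:
        raise ValueError(f'unexpected checkpoint name: {path}')
    return float(match.group(1))

best_file = min(checkpoint_files, key=val_loss_from_name)
print(best_file)
```

In a real run the list would more likely come from something like `glob.glob('./checkpoints/*.hdf5')`.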

## Evaluate the model with the Test set

In [16]:
y_pred = DNN_model.predict(X_test)

In [17]:
y_test['y_pred'] = y_pred

In [18]:
y_test.columns = ['y_test','y_pred']

In [19]:
# Per-row absolute error; its mean is the MAE on the test set
y_test['MAE'] = (y_test.y_test - y_test.y_pred).abs()
y_test.MAE.mean()

2.1487201872250994
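For reference, the mean absolute error is the average of the absolute differences between true and predicted values. A minimal NumPy sketch on made-up numbers (not taken from the dataset):

```python
import numpy as np

# Toy values for illustration only.
y_true = np.array([10.0, 20.0, 30.0, 40.0])
y_hat = np.array([12.0, 18.0, 33.0, 40.0])

abs_errors = np.abs(y_true - y_hat)   # per-row absolute error
mae = abs_errors.mean()               # mean absolute error
print(mae)
```

`mean_absolute_error` from `sklearn.metrics`, imported in cell 3, computes the same quantity, so `mean_absolute_error(y_test.y_test, y_test.y_pred)` should agree with the value above.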