# In-depth analysis: predictive modeling using Deep Learning

### 3.2  Deep Neural Networks

Importing relevant packages:

In [1]:
import pandas as pd
import seaborn as sns
import numpy as np
import matplotlib.pyplot as plt

In [2]:
from sklearn.preprocessing import LabelEncoder
from sklearn.preprocessing import MinMaxScaler
from sklearn.model_selection import train_test_split
from sklearn.model_selection import GridSearchCV
from sklearn.metrics import mean_absolute_error 

In [3]:
import warnings 
warnings.filterwarnings('ignore')
warnings.filterwarnings('ignore', category=DeprecationWarning)

In [4]:
from keras.callbacks import ModelCheckpoint
from keras.models import Sequential
from keras.layers import Dense, Activation, Flatten
from sklearn.ensemble import RandomForestRegressor

Using TensorFlow backend.


Loading the preprocessed data used to define the model and its input layer:

In [5]:
X_train = pd.read_csv('../Data/processed/x_train.csv')
y_train = pd.read_csv('../Data/processed/y_train.csv')
X_test = pd.read_csv('../Data/processed/x_test.csv')
y_test = pd.read_csv('../Data/processed/y_test.csv')

In [6]:
X = pd.concat([X_train, X_test])
X.head()

| Unnamed: 0 | # accommodates | bathrooms | bedrooms | host_response_rate_float | extra_people_float | availability_30 | availability_60 | availability_90 | availability_365 | calculated_host_listings_count_entire_homes | ... | property_type_encode_26 | property_type_encode_27 | property_type_encode_28 | property_type_encode_29 | property_type_encode_30 | bed_type_encode_0 | bed_type_encode_1 | bed_type_encode_2 | bed_type_encode_3 | bed_type_encode_4 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0.117647 | 0.1875 | 0.083333 | 0.9 | 0.107143 | 0.4 | 0.7 | 0.8 | 0.19726 | 0.013889 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 |
| 1 | 0.0 | 0.125 | 0.083333 | 1.0 | 0.110714 | 0.6 | 0.8 | 0.866667 | 0.967123 | 0.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 |
| 2 | 0.176471 | 0.125 | 0.166667 | 1.0 | 0.053571 | 0.933333 | 0.966667 | 0.977778 | 0.994521 | 0.013889 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 |
| 3 | 0.058824 | 0.1875 | 0.083333 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.013889 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 |
| 4 | 0.176471 | 0.25 | 0.166667 | 0.88 | 0.053571 | 0.0 | 0.0 | 0.0 | 0.008219 | 0.013889 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 |


In [7]:
X.fillna(value=0, inplace=True)

In [8]:
y = pd.concat([y_train, y_test])

See the Keras documentation [here](https://keras.io/getting-started/sequential-model-guide/) for more details about the classes imported in this section and the methods mentioned below.

With **Sequential**, **Dense** and **Activation** imported, we create a Sequential model and then add layers to it one at a time via the `.add()` method.

The first layer must know the dimensionality of the input feature vectors. We pass it as `input_dim`, taken from the number of columns of the feature matrix `X`; the layer itself has 128 nodes.

In [9]:
model = Sequential()

model.add(Dense(128, kernel_initializer='normal', input_dim = X.shape[1], activation='relu'))



Configuration of the **hidden layers** with a **normal** initializer and a **ReLU** activation function:

In [10]:
model.add(Dense(256, kernel_initializer='normal',activation='relu'))
model.add(Dense(256, kernel_initializer='normal',activation='relu'))
model.add(Dense(256, kernel_initializer='normal',activation='relu'))

Configuration of the **output layer** with one node and a **linear** activation function:

In [11]:
model.add(Dense(1, kernel_initializer='normal',activation='linear'))
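
To make the roles of the two activations concrete, here is a minimal pure-Python sketch of what a fully connected layer computes: a weighted sum plus a bias per unit, passed through ReLU in the hidden layers and left unchanged by the linear output layer. The helper names `relu` and `dense` and the toy weights are assumptions made for illustration; they are not Keras functions or values from this model.

```python
def relu(values):
    """ReLU: keep positive values, replace negatives with zero."""
    return [max(0.0, v) for v in values]


def dense(inputs, weights, biases, activation=None):
    """One fully connected layer.

    outputs[j] = sum_i inputs[i] * weights[i][j] + biases[j],
    optionally followed by an activation function.
    """
    outputs = [
        sum(x * w_row[j] for x, w_row in zip(inputs, weights)) + biases[j]
        for j in range(len(biases))
    ]
    return activation(outputs) if activation else outputs


# Toy network (hypothetical weights): 2 inputs -> 2 hidden units (ReLU) -> 1 output (linear).
x = [1.0, 2.0]
hidden = dense(x, weights=[[0.5, -1.0], [0.25, 0.5]], biases=[0.0, -0.5], activation=relu)
prediction = dense(hidden, weights=[[2.0], [3.0]], biases=[1.0])

print(hidden)      # [1.0, 0.0]  (the second unit was negative before ReLU)
print(prediction)  # [3.0]
```

The linear output is what lets the network predict an unbounded real value, which is what a regression target needs.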

Next, we configure the learning process, which receives an optimizer, a loss function and a list of metrics. 

In [12]:
model.compile(loss='mean_absolute_error', optimizer='rmsprop', metrics=['mae'])
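
For reference, the loss selected above is the mean absolute error: the average of the absolute differences between targets and predictions. The sketch below computes it by hand in plain Python; the function name and the sample values are hypothetical and only illustrate the formula.

```python
def mean_absolute_error(y_true, y_pred):
    """Average of |y_true - y_pred| over all samples."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


# Hypothetical targets and predictions, for illustration only.
targets = [100.0, 150.0, 80.0, 200.0]
predicted = [110.0, 140.0, 100.0, 170.0]

mae = mean_absolute_error(targets, predicted)
print(mae)  # 17.5
```

Because the loss is expressed in the same units as the target, the `val_loss` values reported during training can be read directly as an average absolute error.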

In [13]:
model.summary()

Model: "sequential_1"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
dense_1 (Dense)              (None, 128)               7168      
_________________________________________________________________
dense_2 (Dense)              (None, 256)               33024     
_________________________________________________________________
dense_3 (Dense)              (None, 256)               65792     
_________________________________________________________________
dense_4 (Dense)              (None, 256)               65792     
_________________________________________________________________
dense_5 (Dense)              (None, 1)                 257       
Total params: 172,033
Trainable params: 172,033
Non-trainable params: 0
_________________________________________________________________
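
The parameter counts in the summary can be checked by hand: a dense layer has one weight per input-unit pair plus one bias per unit. The sketch below reproduces the numbers; the input width of 55 is not printed anywhere above and is inferred here from the 7168 parameters of `dense_1`, since (55 + 1) * 128 = 7168. The helper name `dense_params` is an assumption for illustration.

```python
def dense_params(n_inputs, n_units):
    """Weights (n_inputs * n_units) plus one bias per unit."""
    return n_inputs * n_units + n_units


# Layer widths from the summary; 55 input features is inferred, not stated.
layer_sizes = [55, 128, 256, 256, 256, 1]
counts = [dense_params(a, b) for a, b in zip(layer_sizes, layer_sizes[1:])]

print(counts)       # [7168, 33024, 65792, 65792, 257]
print(sum(counts))  # 172033
```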


Checkpoint callback, which saves the weights to disk whenever the validation loss improves:

In [14]:
checkpoint_name = 'Weights-{epoch:03d}--{val_loss:.5f}.hdf5' 
checkpoint = ModelCheckpoint(checkpoint_name, monitor='val_loss', verbose = 1, save_best_only = True, mode ='auto')
callbacks_list = [checkpoint]
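
The filename template is an ordinary Python format string, and with `save_best_only=True` a file is written only when the monitored value beats the best one seen so far (for `val_loss`, lower is better). The following is a rough plain-Python sketch of that policy, not the Keras implementation; the function `saved_checkpoints` is hypothetical, and the loss for epoch 3 is a made-up value standing in for an epoch that did not improve.

```python
checkpoint_name = 'Weights-{epoch:03d}--{val_loss:.5f}.hdf5'


def saved_checkpoints(val_losses):
    """Filenames a save-best-only policy would write for per-epoch validation losses."""
    best = float('inf')
    saved = []
    for epoch, val_loss in enumerate(val_losses, start=1):
        if val_loss < best:  # improvement: remember it and save a file
            best = val_loss
            saved.append(checkpoint_name.format(epoch=epoch, val_loss=val_loss))
    return saved


# Epoch 3 (54.0) is an invented placeholder for a non-improving epoch.
files = saved_checkpoints([57.80685, 53.80597, 54.0, 53.16834])
for name in files:
    print(name)
# Weights-001--57.80685.hdf5
# Weights-002--53.80597.hdf5
# Weights-004--53.16834.hdf5
```

Since the loss is embedded in the filename, the best checkpoint of a run is simply the last file written.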

In [15]:
model.fit(X, y, epochs=100, batch_size=32, validation_split = 0.2, callbacks=callbacks_list)

Train on 8534 samples, validate on 2134 samples
Epoch 1/100

Epoch 00001: val_loss improved from inf to 57.80685, saving model to Weights-001--57.80685.hdf5
Epoch 2/100

Epoch 00002: val_loss improved from 57.80685 to 53.80597, saving model to Weights-002--53.80597.hdf5
Epoch 3/100

Epoch 00003: val_loss did not improve from 53.80597
Epoch 4/100

Epoch 00004: val_loss improved from 53.80597 to 53.16834, saving model to Weights-004--53.16834.hdf5
Epoch 5/100

Epoch 00005: val_loss did not improve from 53.16834
Epoch 6/100

Epoch 00006: val_loss did not improve from 53.16834
Epoch 7/100

Epoch 00007: val_loss did not improve from 53.16834
Epoch 8/100

Epoch 00008: val_loss did not improve from 53.16834
Epoch 9/100

Epoch 00009: val_loss did not improve from 53.16834
Epoch 10/100

Epoch 00010: val_loss did not improve from 53.16834
Epoch 11/100

Epoch 00011: val_loss did not improve from 53.16834
Epoch 12/100

Epoch 00012: val_loss improved 


Epoch 00029: val_loss did not improve from 50.43066
Epoch 30/100

Epoch 00030: val_loss did not improve from 50.43066
Epoch 31/100

Epoch 00031: val_loss improved from 50.43066 to 50.39913, saving model to Weights-031--50.39913.hdf5
Epoch 32/100

Epoch 00032: val_loss did not improve from 50.39913
Epoch 33/100

Epoch 00033: val_loss did not improve from 50.39913
Epoch 34/100

Epoch 00034: val_loss did not improve from 50.39913
Epoch 35/100

Epoch 00035: val_loss did not improve from 50.39913
Epoch 36/100

Epoch 00036: val_loss did not improve from 50.39913
Epoch 37/100

Epoch 00037: val_loss did not improve from 50.39913
Epoch 38/100

Epoch 00038: val_loss did not improve from 50.39913
Epoch 39/100

Epoch 00039: val_loss did not improve from 50.39913
Epoch 40/100

Epoch 00040: val_loss did not improve from 50.39913
Epoch 41/100

Epoch 00041: val_loss did not improve from 50.39913
Epoch 42/100

Epoch 00042: val_loss did not improve from 50.39913
Epoch 43/100

Epoch 00043: val_loss did 


Epoch 00060: val_loss improved from 49.88119 to 49.75503, saving model to Weights-060--49.75503.hdf5
Epoch 61/100

Epoch 00061: val_loss did not improve from 49.75503
Epoch 62/100

Epoch 00062: val_loss did not improve from 49.75503
Epoch 63/100

Epoch 00063: val_loss did not improve from 49.75503
Epoch 64/100

Epoch 00064: val_loss did not improve from 49.75503
Epoch 65/100

Epoch 00065: val_loss did not improve from 49.75503
Epoch 66/100

Epoch 00066: val_loss did not improve from 49.75503
Epoch 67/100

Epoch 00067: val_loss did not improve from 49.75503
Epoch 68/100

Epoch 00068: val_loss did not improve from 49.75503
Epoch 69/100

Epoch 00069: val_loss did not improve from 49.75503
Epoch 70/100

Epoch 00070: val_loss improved from 49.75503 to 49.38082, saving model to Weights-070--49.38082.hdf5
Epoch 71/100

Epoch 00071: val_loss did not improve from 49.38082
Epoch 72/100

Epoch 00072: val_loss did not improve from 49.38082
Epoch 73/100

Epoch 00073: val_loss did not improve from 


Epoch 00091: val_loss did not improve from 49.38082
Epoch 92/100

Epoch 00092: val_loss did not improve from 49.38082
Epoch 93/100

Epoch 00093: val_loss did not improve from 49.38082
Epoch 94/100

Epoch 00094: val_loss did not improve from 49.38082
Epoch 95/100

Epoch 00095: val_loss did not improve from 49.38082
Epoch 96/100

Epoch 00096: val_loss did not improve from 49.38082
Epoch 97/100

Epoch 00097: val_loss did not improve from 49.38082
Epoch 98/100

Epoch 00098: val_loss did not improve from 49.38082
Epoch 99/100

Epoch 00099: val_loss did not improve from 49.38082
Epoch 100/100

Epoch 00100: val_loss did not improve from 49.38082


<keras.callbacks.callbacks.History at 0x1a18513c18>
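
The sample counts in the training log follow from `validation_split=0.2`. Keras holds out the last fraction of the rows it is given, before any shuffling, so here the validation rows come from the tail of the concatenated `X` and `y`. The sketch below reproduces the arithmetic under that assumption; the helper name is hypothetical, and the total of 10668 rows is the sum of the two counts printed in the log.

```python
def validation_split_sizes(n_samples, validation_split):
    """Sizes of the (train, validation) parts when the last fraction of rows is held out."""
    split_at = int(n_samples * (1.0 - validation_split))
    return split_at, n_samples - split_at


n_train, n_val = validation_split_sizes(10668, 0.2)
print(n_train, n_val)  # 8534 2134
```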

Let's wrap the model definition in a function that builds and compiles the network:

In [18]:
def deep_learning_function():
    model = Sequential()

    model.add(Dense(128, kernel_initializer='normal', input_dim = X.shape[1], activation='relu'))
    model.add(Dense(256, kernel_initializer='normal',activation='relu'))
    model.add(Dense(256, kernel_initializer='normal',activation='relu'))
    model.add(Dense(256, kernel_initializer='normal',activation='relu'))
    model.add(Dense(1, kernel_initializer='normal',activation='linear'))
    model.compile(loss='mean_absolute_error', optimizer='rmsprop', metrics=['mae'])
    
    return model

Evaluation of the baseline model using 5-fold cross validation:

In [None]:
# Evaluate the baseline network with 5-fold cross-validation
from keras.wrappers.scikit_learn import KerasRegressor
from sklearn.model_selection import cross_val_score
from sklearn.model_selection import KFold

# Wrap the model-building function so scikit-learn can treat it as an estimator
estimator = KerasRegressor(build_fn=deep_learning_function, epochs=100, batch_size=5, verbose=0)
kfold = KFold(n_splits=5)
results = cross_val_score(estimator, X, y, cv=kfold)
# KerasRegressor.score returns the negated loss, so flip the sign to report MAE
print("Baseline: %.2f (%.2f) MAE" % (-results.mean(), results.std()))
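
With its default settings, `KFold(n_splits=5)` does not shuffle: it cuts the rows into five consecutive blocks and uses each block once as the test fold. The plain-Python sketch below mimics that splitting to show the fold sizes for the 10668 rows used here; `kfold_indices` is a hypothetical helper written for illustration, not the scikit-learn implementation.

```python
def kfold_indices(n_samples, n_splits):
    """Yield (train_idx, test_idx) for contiguous, unshuffled folds.

    The first n_samples % n_splits folds get one extra sample.
    """
    base, extra = divmod(n_samples, n_splits)
    start = 0
    for fold in range(n_splits):
        size = base + (1 if fold < extra else 0)
        test_idx = list(range(start, start + size))
        train_idx = list(range(0, start)) + list(range(start + size, n_samples))
        yield train_idx, test_idx
        start += size


fold_sizes = [len(test) for _, test in kfold_indices(10668, 5)]
print(fold_sizes)  # [2134, 2134, 2134, 2133, 2133]
```

Because the folds are contiguous, any ordering in the rows (for example, training rows followed by test rows after `pd.concat`) carries over into the folds; passing `shuffle=True` to `KFold` is one way to avoid that.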