# DL models in Keras

Import pandas, NumPy,<br />
and the Dense layer and Sequential model from Keras

In [1]:
# Import the data libraries and the Keras building blocks used below
import numpy as np
import pandas as pd
import keras
from keras.layers import Dense
from keras.models import Sequential

Using TensorFlow backend.


Read the dataset

In [0]:
dataset = pd.read_csv('hourly_wages.csv')

Explore the dataset's first five rows

In [7]:
dataset.head()

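| | wage_per_hour | union | education_yrs | experience_yrs | age | female | marr | south | manufacturing | construction |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 5.1 | 0 | 8 | 21 | 35 | 1 | 1 | 0 | 1 | 0 |
| 1 | 4.95 | 0 | 9 | 42 | 57 | 1 | 1 | 0 | 1 | 0 |
| 2 | 6.67 | 0 | 12 | 1 | 19 | 0 | 0 | 0 | 1 | 0 |
| 3 | 4.0 | 0 | 12 | 4 | 22 | 0 | 0 | 0 | 0 | 0 |
| 4 | 7.5 | 0 | 12 | 17 | 35 | 0 | 1 | 0 | 0 | 0 |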


Check the dataset's summary statistics by calling the <i>describe</i> method

In [42]:
dataset.describe()

| | wage_per_hour | union | education_yrs | experience_yrs | age | female | marr | south | manufacturing | construction |
|---|---|---|---|---|---|---|---|---|---|---|
| count | 534.0 | 534.0 | 534.0 | 534.0 | 534.0 | 534.0 | 534.0 | 534.0 | 534.0 | 534.0 |
| mean | 9.024064 | 0.179775 | 13.018727 | 17.822097 | 36.833333 | 0.458801 | 0.655431 | 0.292135 | 0.185393 | 0.044944 |
| std | 5.139097 | 0.38436 | 2.615373 | 12.37971 | 11.726573 | 0.498767 | 0.475673 | 0.45517 | 0.388981 | 0.207375 |
| min | 1.0 | 0.0 | 2.0 | 0.0 | 18.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 25% | 5.25 | 0.0 | 12.0 | 8.0 | 28.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 50% | 7.78 | 0.0 | 12.0 | 15.0 | 35.0 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 |
| 75% | 11.25 | 0.0 | 15.0 | 26.0 | 44.0 | 1.0 | 1.0 | 1.0 | 0.0 | 0.0 |
| max | 44.5 | 1.0 | 18.0 | 55.0 | 64.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 |


Split the dataset into inputs and outputs. Here, you want to predict the hourly wage, so <i>wage_per_hour</i> is the target and the remaining columns are the predictors.

In [10]:
predictors = dataset.drop(columns=['wage_per_hour'])
target = dataset['wage_per_hour']
target.head()

0    5.10
1    4.95
2    6.67
3    4.00
4    7.50
Name: wage_per_hour, dtype: float64

You have to define the input shape for the first layer of the neural network. The shape of <i>predictors</i> (the number of columns in particular) should help you with this.

In [28]:
predictors.shape[1]

9

In [0]:
n_cols = predictors.shape[1]

Build a simple sequential neural network model

In [0]:
model = Sequential()

<b>Add</b> the first layer with 50 units (neurons). Give it the <b>relu</b> activation and provide it with the <b>input shape</b>.

In [0]:
model.add(Dense(50, activation = 'relu', input_shape = (n_cols,)))

<b>Add</b> the second layer with 32 units. Give it the <b>relu</b> activation. Is it necessary to define the input shape? What are your thoughts?

In [0]:
model.add(Dense(32, activation = 'relu'))

<b>Add</b> the output layer. How many units should be here? Recall that you have a regression problem here.

In [0]:
model.add(Dense(1))

Check the model's summary. The model should contain only dense layers, the last of which is the output layer.

In [34]:
model.summary()

_________________________________________________________________
Layer (type)                 Output Shape              Param #   
dense_5 (Dense)              (None, 50)                500       
_________________________________________________________________
dense_6 (Dense)              (None, 32)                1632      
_________________________________________________________________
dense_7 (Dense)              (None, 1)                 33        
Total params: 2,165
Trainable params: 2,165
Non-trainable params: 0
_________________________________________________________________
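The parameter counts in the summary can be checked by hand: a dense layer has one weight per input-unit pair plus one bias per unit. A minimal sketch, where `dense_params` is a hypothetical helper (not part of Keras) and the layer sizes are the ones used above:

```python
def dense_params(n_inputs, n_units):
    """Weights (n_inputs * n_units) plus one bias per unit."""
    return n_inputs * n_units + n_units

# Layer sizes from this notebook: 9 predictor columns -> 50 -> 32 -> 1
layer_sizes = [9, 50, 32, 1]
counts = [dense_params(n_in, n_out)
          for n_in, n_out in zip(layer_sizes, layer_sizes[1:])]

print(counts)       # [500, 1632, 33]
print(sum(counts))  # 2165
```

The three counts match the `Param #` column, and their sum matches `Total params`.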


You're now going to compile the model you specified earlier. To compile the model, you need to specify the optimizer and loss function to use.

The Adam optimizer is an excellent choice. You can read more about it as well as other keras optimizers <a href="https://keras.io/optimizers/#adam">here</a>, and if you are really curious to learn more, you can read the original <a href="https://arxiv.org/abs/1412.6980v8">paper</a> that introduced the Adam optimizer.

In this exercise, you'll use the Adam optimizer and the mean squared error loss function.

In [0]:
# accuracy is a classification metric; for this regression, track mean absolute error instead
model.compile(optimizer='adam', loss='mean_squared_error', metrics=['mae'])
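For reference, the mean squared error averages the squared differences between the true values and the predictions. The notation is introduced here and is not taken from the notebook: $y_i$ is the true hourly wage of sample $i$, $\hat{y}_i$ is the model's prediction for it, and $n$ is the number of samples.

```latex
\mathrm{MSE} = \frac{1}{n} \sum_{i=1}^{n} \left( y_i - \hat{y}_i \right)^2
```

Because the errors are squared, a few large misses dominate the loss, and its unit is the square of the target's unit.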

You'll now fit the model. Recall that the predictive features are stored in the pandas DataFrame <i>predictors</i> and the values to be predicted are stored in the pandas Series <i>target</i>.

You can specify the number of epochs. By default it is one.

In [37]:
model.fit(predictors, target, epochs=20)

Epoch 1/20
Epoch 2/20
Epoch 3/20
Epoch 4/20
Epoch 5/20
Epoch 6/20
Epoch 7/20
Epoch 8/20
Epoch 9/20
Epoch 10/20
Epoch 11/20
Epoch 12/20
Epoch 13/20
Epoch 14/20
Epoch 15/20
Epoch 16/20
Epoch 17/20
Epoch 18/20
Epoch 19/20
Epoch 20/20


<keras.callbacks.History at 0x7f85619e6550>

Now you can either predict values by passing your <i>predictors</i> array to the <i>predict</i> method, or go back to the beginning, split the dataset into training and test sets, and evaluate the model on the test set. Predictions on the data the model was trained on give an optimistic picture of its performance.
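The train/test option can be sketched without extra libraries by shuffling the row positions and holding out a fraction of them. This is only an illustration: `split_indices`, the 20% test fraction, and the seed are assumptions, not part of the notebook.

```python
import random

def split_indices(n_rows, test_fraction=0.2, seed=0):
    """Return (train_idx, test_idx): shuffled, non-overlapping row positions."""
    indices = list(range(n_rows))
    random.Random(seed).shuffle(indices)   # reproducible shuffle
    n_test = int(n_rows * test_fraction)
    return indices[n_test:], indices[:n_test]

# 534 is the row count reported by describe() above
train_idx, test_idx = split_indices(534)
print(len(train_idx), len(test_idx))

# With the notebook's pandas objects, the positions could then be applied as:
# X_train, y_train = predictors.iloc[train_idx], target.iloc[train_idx]
# X_test, y_test = predictors.iloc[test_idx], target.iloc[test_idx]
```

The model would then be fitted on the training rows only, and its predictions compared with the held-out targets.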

In [0]:
y_preds = model.predict(predictors)

You can compare the predicted values with the actual ones and check the difference between them

In [39]:
for i,j in zip(target,y_preds):
    print('true ', i, ' pred ', j, ' diff ', i-j)

true  5.1  pred  [6.650491]  diff  [-1.5504913]
true  4.95  pred  [7.568009]  diff  [-2.618009]
true  6.67  pred  [7.2951193]  diff  [-0.6251192]
true  4.0  pred  [7.8989186]  diff  [-3.8989186]
true  7.5  pred  [10.101671]  diff  [-2.6016712]
true  13.07  pred  [10.764593]  diff  [2.3054066]
true  4.45  pred  [8.236111]  diff  [-3.7861109]
true  19.47  pred  [9.115704]  diff  [10.354296]
true  13.28  pred  [12.104703]  diff  [1.1752968]
true  8.75  pred  [9.115704]  diff  [-0.36570358]
true  11.35  pred  [11.283351]  diff  [0.06664944]
true  11.5  pred  [11.784107]  diff  [-0.2841072]
true  6.5  pred  [6.8639336]  diff  [-0.36393356]
true  6.25  pred  [8.667067]  diff  [-2.4170666]
true  19.98  pred  [7.62041]  diff  [12.35959]
true  7.3  pred  [10.407075]  diff  [-3.1070747]
true  8.0  pred  [6.5009418]  diff  [1.4990582]
true  22.2  pred  [11.950312]  diff  [10.249689]
true  3.65  pred  [9.29683]  diff  [-5.64683]
true  20.55  pred  [10.354056]  diff  [10.195943]
true  5.71  pred  [
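Scanning the rows one by one gets tedious, so the differences are usually summarized in one or two numbers. A minimal sketch using only the first five pairs printed above (in practice the full `target` and `y_preds` would be used); the variable names are chosen here for illustration:

```python
import math

# First five (true, predicted) pairs copied from the output above
true_vals = [5.1, 4.95, 6.67, 4.0, 7.5]
pred_vals = [6.650491, 7.568009, 7.2951193, 7.8989186, 10.101671]

errors = [t - p for t, p in zip(true_vals, pred_vals)]

mae = sum(abs(e) for e in errors) / len(errors)   # mean absolute error
mse = sum(e * e for e in errors) / len(errors)    # mean squared error
rmse = math.sqrt(mse)                             # same unit as the wage

print(f"MAE {mae:.3f}  MSE {mse:.3f}  RMSE {rmse:.3f}")
```

All five errors are negative, so for these rows the model predicts a higher wage than the true one.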

Save the model with its <i>save</i> method; <i>load_model</i> restores it later (check the lecture slides)

Import <i>load_model</i> first

In [0]:
from keras.models import load_model

Save the model to an .h5 file

In [0]:
model.save('model.h5')