# Deep Learning models using Keras

Keras is a user-friendly neural network library written in Python. We will build a regression model to predict an employee's wage per hour.

Note: The datasets we will be using are relatively clean, so we will not do any data preprocessing before modeling. Datasets that you use in future projects may not be so clean (for example, they may have missing values), so you may need data preprocessing techniques to get accurate results.

# Regression Model using Keras

For our regression deep learning model, the first step is to read in the data we will use as input. For this example, we are using the 'hourly wages' dataset. To start, we will use Pandas to read in the data. I will not go into detail on Pandas, but it is a library you should become familiar with if you're looking to dive further into data science and machine learning.

'df' stands for dataframe. Pandas reads the CSV file into a dataframe. The 'head()' function shows the first 5 rows of the dataframe so you can check that the data has been read in properly and take an initial look at how the data is structured.

In [15]:
import pandas as pd
from keras.models import Sequential
from keras.layers import Dense
from keras.callbacks import EarlyStopping
from keras.utils import to_categorical

In [16]:
#read in training data
#train_df = pd.read_html("https://github.com/eijaz1/Deep-Learning-in-Keras-Tutorial/blob/master/data/hourly_wages_data.csv")
train_df = pd.read_csv('Datasets/hourly_wages_data.csv')

#view data structure
train_df.head()

#print (train_df)

Unnamed: 0,wage_per_hour,union,education_yrs,experience_yrs,age,female,marr,south,manufacturing,construction
0,5.1,0,8,21,35,1,1,0,1,0
1,4.95,0,9,42,57,1,1,0,1,0
2,6.67,0,12,1,19,0,0,0,1,0
3,4.0,0,12,4,22,0,0,0,0,0
4,7.5,0,12,17,35,0,1,0,0,0


Next, we need to split up our dataset into inputs (train_X) and our target (train_y). Our input will be every column except 'wage_per_hour' because 'wage_per_hour' is what we will be attempting to predict. Therefore, 'wage_per_hour' will be our target.

We will use the pandas 'drop' function to drop the column 'wage_per_hour' from our dataframe and store it in the variable 'train_X'. This will be our input.

In [17]:
#create a dataframe with all training data except the target column
train_X = train_df.drop(columns=['wage_per_hour'])

#check that the target variable has been removed
train_X.head()

Unnamed: 0,union,education_yrs,experience_yrs,age,female,marr,south,manufacturing,construction
0,0,8,21,35,1,1,0,1,0
1,0,9,42,57,1,1,0,1,0
2,0,12,1,19,0,0,0,1,0
3,0,12,4,22,0,0,0,0,0
4,0,12,17,35,0,1,0,0,0


In [18]:
#create a dataframe with only the target column
train_y = train_df[['wage_per_hour']]

#view dataframe
train_y.head()

Unnamed: 0,wage_per_hour
0,5.1
1,4.95
2,6.67
3,4.0
4,7.5


The model type that we will be using is Sequential. Sequential is the easiest way to build a model in Keras. It allows you to build a model layer by layer. Each layer has weights that correspond to the layer that follows it.

We use the 'add()' function to add layers to our model. We will add two layers and an output layer.

'Dense' is the layer type. Dense is a standard layer type that works for most cases. In a dense layer, all nodes in the previous layer connect to the nodes in the current layer.
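To make "all nodes connect to all nodes" concrete, here is a minimal sketch of what a single dense layer computes, written in plain Python rather than Keras. The function name 'dense_forward' and the weights are made up for illustration; a real layer learns its weights during training.

```python
def dense_forward(inputs, weights, biases):
    """Compute one dense layer: every input feeds every output node."""
    outputs = []
    for node_weights, bias in zip(weights, biases):
        # Weighted sum over ALL inputs, plus this node's bias.
        total = sum(w * x for w, x in zip(node_weights, inputs)) + bias
        outputs.append(total)
    return outputs


# Hypothetical layer with 3 inputs and 2 nodes (values chosen by hand).
inputs = [1.0, 2.0, 3.0]
weights = [
    [0.5, -1.0, 0.25],  # weights feeding node 0
    [1.0, 0.0, -0.5],   # weights feeding node 1
]
biases = [0.5, -0.25]

layer_output = dense_forward(inputs, weights, biases)
print(layer_output)
```

Each node has one weight per input, which is why the number of weights grows quickly as layers get wider.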

We have 10 nodes in each of our first two layers. This number can also be in the hundreds or thousands. Increasing the number of nodes in each layer increases model capacity. I will go into further detail about the effects of increasing model capacity shortly.

'Activation' is the activation function for the layer. An activation function allows models to take into account nonlinear relationships. For example, if you are predicting diabetes in patients, going from age 10 to 11 is different than going from age 60 to 61.

The activation function we will be using is ReLU (Rectified Linear Unit). Although it consists of just two linear pieces, it has been shown to work well in neural networks.
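The two linear pieces are easy to see in code. This is a small plain-Python sketch of the ReLU function itself, not the Keras implementation: negative inputs become 0 and positive inputs pass through unchanged.

```python
def relu(x):
    """Return x for positive inputs and 0.0 otherwise."""
    return max(0.0, x)


# A few sample inputs on both sides of zero.
values = [-2.0, -0.5, 0.0, 1.5, 3.0]
activated = [relu(v) for v in values]
print(activated)
```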

The first layer needs an input shape. The input shape describes a single row of the input, so here it is just the number of columns, which is stored in 'n_cols'. It is written as the one-element tuple '(n_cols,)'. There is nothing after the comma because the number of rows is not part of the shape, which means the model accepts any number of rows.

The last layer is the output layer. It only has one node, which is for our prediction.

Next, we need to compile our model. Compiling the model takes two parameters: optimizer and loss.

The optimizer controls the learning rate. We will be using 'Adam' as our optimizer. Adam is generally a good optimizer to use for many cases. The Adam optimizer adjusts the learning rate throughout training.

The learning rate determines how fast the optimal weights for the model are calculated. A smaller learning rate may lead to more accurate weights (up to a certain point), but the time it takes to compute the weights will be longer.
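The time cost of a smaller learning rate can be seen on a toy problem. The sketch below uses plain gradient descent (not Adam) to minimise a made-up one-parameter function, (w - 3)^2, and counts the steps needed to get close to the minimum. The function name and all numbers are assumptions chosen for illustration.

```python
def steps_to_converge(learning_rate, start=0.0, target=3.0,
                      tolerance=1e-3, max_steps=10_000):
    """Count gradient-descent steps needed to minimise (w - target)**2."""
    w = start
    for step in range(max_steps):
        if abs(w - target) < tolerance:
            return step
        gradient = 2 * (w - target)      # derivative of (w - target)**2
        w -= learning_rate * gradient    # move against the gradient
    return max_steps


fast = steps_to_converge(learning_rate=0.1)
slow = steps_to_converge(learning_rate=0.01)
print(fast, slow)
```

Both runs reach the same minimum, but the smaller learning rate needs many more steps to get there.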

For our loss function, we will use 'mean_squared_error'. It is calculated by taking the average squared difference between the predicted and actual values. It is a popular loss function for regression problems. The closer to 0 this is, the better the model performed.
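As a worked example of the calculation, here is mean squared error computed by hand in plain Python. The wage values are invented for illustration and are not taken from the dataset.

```python
def mean_squared_error(actual, predicted):
    """Average of the squared differences between actual and predicted."""
    squared_errors = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return sum(squared_errors) / len(squared_errors)


# Hypothetical hourly wages and model predictions.
actual = [5.0, 4.0, 7.5, 6.0]
predicted = [6.0, 4.0, 5.5, 7.0]

mse = mean_squared_error(actual, predicted)
print(mse)
```

The squared errors here are 1, 0, 4 and 1, so the mean is 6 / 4 = 1.5. Note how the single prediction that is off by 2 contributes more than the other three combined.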



In [19]:
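import tensorflow as tf

#list the GPUs TensorFlow can see (public API, needs TensorFlow 2.1 or later)
gpus = tf.config.list_physical_devices('GPU')
#use the last available GPU if there is one, otherwise fall back to the CPU
processor = ('/gpu:' + str(len(gpus)-1)) if len(gpus) > 0 else '/cpu:0'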

with tf.device(processor):
    #create model
    model = Sequential()

    #get number of columns in training data
    n_cols = train_X.shape[1]

    #add model layers
    model.add(Dense(10, activation='relu', input_shape=(n_cols,)))
    model.add(Dense(10, activation='relu'))
    model.add(Dense(1))

    #compile model using mse as a measure of model performance
    model.compile(optimizer='adam', loss='mean_squared_error')

    #set early stopping monitor so the model stops training when it won't improve anymore
    early_stopping_monitor = EarlyStopping(patience=3)

Now we will train our model. To train, we will use the 'fit()' function on our model with the following five parameters: training data (train_X), target data (train_y), validation split, the number of epochs and callbacks.

The validation split sets aside part of the data we pass to 'fit()' for validation instead of training. Keras takes these rows from the end of the data, before any shuffling, rather than sampling them at random. During training, we will be able to see the validation loss, which gives the mean squared error of our model on the validation set. We will set the validation split at 0.2, which means that the last 20% of the data we provide will be held out to measure model performance.
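The sizes of the two parts follow from simple arithmetic. The sketch below is my reading of how the split point is chosen (truncate the training share to a whole number of rows and hold out the rest); the helper name 'split_sizes' is made up, and the row count of 534 is inferred from the training log in this notebook.

```python
def split_sizes(n_rows, validation_split):
    """Return (train, validation) row counts for a held-out tail."""
    # Assumed rule: the training share is truncated to a whole row count.
    split_at = int(n_rows * (1.0 - validation_split))
    return split_at, n_rows - split_at


n_rows = 534  # 427 training + 107 validation rows reported by fit()
train_size, val_size = split_sizes(n_rows, 0.2)
print(train_size, val_size)
```

With 534 rows this gives 427 training rows and 107 validation rows, matching what 'fit()' reports.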

The number of epochs is the number of times the model will cycle through the data. The more epochs we run, the more the model will improve, up to a certain point. After that point, the model will stop improving during each epoch. In addition, the more epochs, the longer the model will take to run. To monitor this, we will use 'early stopping'.

Early stopping will stop the model from training before the number of epochs is reached if the model stops improving. By default, the early stopping monitor watches the validation loss. We will set its patience to 3. This means that after 3 epochs in a row in which the validation loss doesn't improve, training will stop. Sometimes, the validation loss can stop improving then improve in the next epoch, but after 3 epochs in which the validation loss doesn't improve, it usually won't improve again.
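The patience rule can be sketched in a few lines of plain Python. This is a simplified illustration of the idea, not the Keras callback itself, and the validation losses are invented.

```python
def stopping_epoch(val_losses, patience=3):
    """Return the 1-based epoch at which training would stop, or None."""
    best_loss = float("inf")
    epochs_without_improvement = 0
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best_loss:
            # New best: remember it and reset the counter.
            best_loss = loss
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                return epoch
    return None  # never ran out of patience


# Hypothetical validation losses, one per epoch.
val_losses = [40.0, 35.0, 33.0, 34.0, 32.5, 32.6, 32.7, 32.9, 31.0]
stopped_at = stopping_epoch(val_losses)
print(stopped_at)
```

The loss gets worse at epoch 4 but recovers at epoch 5, so the counter resets. Epochs 6, 7 and 8 then fail to beat 32.5, and training stops at epoch 8, before the improvement at epoch 9 is ever seen. That is the trade-off a small patience value makes.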

In [20]:
#train model
model.fit(train_X, train_y, validation_split=0.2, epochs=30, callbacks=[early_stopping_monitor])

Train on 427 samples, validate on 107 samples
Epoch 1/30
Epoch 2/30
Epoch 3/30
Epoch 4/30
Epoch 5/30
Epoch 6/30
Epoch 7/30
Epoch 8/30
Epoch 9/30
Epoch 10/30
Epoch 11/30
Epoch 12/30
Epoch 13/30
Epoch 14/30
Epoch 15/30
Epoch 16/30
Epoch 17/30
Epoch 18/30
Epoch 19/30
Epoch 20/30
Epoch 21/30
Epoch 22/30


<keras.callbacks.History at 0x1d4b0907940>

Making predictions on new data

If you want to use this model to make predictions on new data, you would use the 'predict()' function, passing in the new data.
The output would be 'wage_per_hour' predictions.

In [24]:
#example of how to use our newly trained model to make predictions on unseen data
#(we will pretend our new data is saved in a dataframe called 'test_X')

test_df = pd.read_csv('Datasets/hourly_wages_data.csv')
test_X = test_df.drop(columns=['wage_per_hour'])

test_y_predictions = model.predict(test_X)

print (test_y_predictions)

[[ 6.6394253]
 [ 8.429611 ]
 [ 7.4064507]
 [ 7.497562 ]
 [ 8.316547 ]
 [ 8.6579075]
 [ 8.355645 ]
 [ 7.8252325]
 [ 9.649518 ]
 [ 7.8252325]
 [ 8.735253 ]
 [ 8.969587 ]
 [ 7.4947495]
 [ 8.556985 ]
 [ 8.039786 ]
 [ 9.733948 ]
 [ 8.201249 ]
 [ 9.395367 ]
 [ 7.870005 ]
 [ 9.365094 ]
 [ 8.3863325]
 [ 8.374075 ]
 [ 7.8252325]
 [ 7.884708 ]
 [ 8.709753 ]
 [ 8.027948 ]
 [ 7.7970443]
 [ 8.340557 ]
 [ 7.4433513]
 [ 7.7267385]
 [ 8.915307 ]
 [10.075039 ]
 [ 6.7917438]
 [ 9.267153 ]
 [ 8.550881 ]
 [ 9.594659 ]
 [ 8.798596 ]
 [10.0215845]
 [ 7.432028 ]
 [ 8.93478  ]
 [ 7.285078 ]
 [ 9.6156225]
 [ 8.190249 ]
 [ 6.725722 ]
 [ 8.459114 ]
 [ 6.010983 ]
 [ 9.715052 ]
 [ 7.4381657]
 [ 8.12793  ]
 [ 7.8625784]
 [ 7.6612043]
 [10.199292 ]
 [ 7.3778176]
 [ 7.9676247]
 [ 7.7088604]
 [ 7.672914 ]
 [ 8.83378  ]
 [ 7.296778 ]
 [ 9.166189 ]
 [ 8.754036 ]
 [ 7.9355392]
 [ 9.300709 ]
 [ 8.698705 ]
 [ 7.7174745]
 [ 8.117642 ]
 [ 8.2506695]
 [ 9.115358 ]
 [ 8.973037 ]
 [ 9.409379 ]
 [ 9.729176 ]
 [ 7.2670965]
 [ 9.7

As you increase the number of nodes and layers in a model, the model capacity increases. Increasing model capacity can lead to a more accurate model, up to a certain point, at which the model will stop improving. Generally, the more training data you provide, the larger the model should be. We are only using a tiny amount of data, so our model is pretty small. The larger the model, the more computational capacity it requires and the longer it takes to train.
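One rough way to put a number on model capacity is to count trainable parameters (weights plus biases). The sketch below does this for stacks of dense layers in plain Python. The helper name is made up, and the input width of 9 is an assumption based on the feature columns shown earlier, so substitute your own 'n_cols'.

```python
def count_parameters(n_inputs, layer_sizes):
    """Count weights and biases in a stack of dense layers."""
    total = 0
    previous = n_inputs
    for size in layer_sizes:
        # One weight per (input, node) pair, plus one bias per node.
        total += previous * size + size
        previous = size
    return total


n_inputs = 9  # assumed number of feature columns
small = count_parameters(n_inputs, [10, 10, 1])
large = count_parameters(n_inputs, [200, 200, 200, 1])
print(small, large)
```

Under that assumption, two layers of 10 nodes plus an output node come to 221 parameters, while three layers of 200 nodes plus an output node come to 82,601.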

Let's create a new model using the same training data as our previous model. This time, we will add a layer and increase the nodes in each layer to 200. We will train the model to see if increasing the model capacity will improve our validation score.

In [25]:
#training a new model on the same data to show the effect of increasing model capacity

#create model
model_mc = Sequential()

#add model layers
model_mc.add(Dense(200, activation='relu', input_shape=(n_cols,)))
model_mc.add(Dense(200, activation='relu'))
model_mc.add(Dense(200, activation='relu'))
model_mc.add(Dense(1))

#compile model using mse as a measure of model performance
model_mc.compile(optimizer='adam', loss='mean_squared_error')

In [33]:
#train model
model_mc.fit(train_X, train_y, validation_split=0.2, epochs=30, callbacks=[early_stopping_monitor])

Train on 427 samples, validate on 107 samples
Epoch 1/30
Epoch 2/30
Epoch 3/30
Epoch 4/30
Epoch 5/30


<keras.callbacks.History at 0x1d4b723bf60>

We can see that by increasing our model capacity, we have improved our validation loss from 29.65 in our old model to 28.39 in our new model.

# Classification Model using Keras

Now let's move on to building our model for classification. Since many steps will be a repeat from the previous model, I will only go over new concepts.

For this next model, we are going to predict if patients have diabetes or not.

In [10]:
#read in training data
train_df_2 = pd.read_csv('Datasets/diabetes_data.csv')

#view data structure
train_df_2.head()

Unnamed: 0,pregnancies,glucose,diastolic,triceps,insulin,bmi,dpf,age,diabetes
0,6,148,72,35,0,33.6,0.627,50,1
1,1,85,66,29,0,26.6,0.351,31,0
2,8,183,64,0,0,23.3,0.672,32,1
3,1,89,66,23,94,28.1,0.167,21,0
4,0,137,40,35,168,43.1,2.288,33,1


In [11]:
#create a dataframe with all training data except the target column
train_X_2 = train_df_2.drop(columns=['diabetes'])

#check that the target variable has been removed
train_X_2.head()

Unnamed: 0,pregnancies,glucose,diastolic,triceps,insulin,bmi,dpf,age
0,6,148,72,35,0,33.6,0.627,50
1,1,85,66,29,0,26.6,0.351,31
2,8,183,64,0,0,23.3,0.672,32
3,1,89,66,23,94,28.1,0.167,21
4,0,137,40,35,168,43.1,2.288,33


When separating the target column, we need to call the 'to_categorical()' function so that the column will be 'one-hot encoded'. Currently, a patient with no diabetes is represented with a 0 in the diabetes column and a patient with diabetes is represented with a 1.

With one-hot encoding, the integer is replaced by one binary variable per category. In our case, we have two categories: no diabetes and diabetes.

A patient with no diabetes will be represented by [1 0] and a patient with diabetes will be represented by [0 1].
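The encoding itself is simple enough to write by hand. The following plain-Python sketch illustrates the idea and is not how 'to_categorical()' is implemented. The labels are the first five values of the diabetes column shown above.

```python
def one_hot(labels, num_classes):
    """Turn integer class labels into one-hot rows."""
    encoded = []
    for label in labels:
        row = [0.0] * num_classes
        row[label] = 1.0  # switch on the position for this class
        encoded.append(row)
    return encoded


labels = [1, 0, 1, 0, 1]  # first five rows of the diabetes column
encoded = one_hot(labels, num_classes=2)
for row in encoded:
    print(row)
```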

In [12]:
#one-hot encode target column
train_y_2 = to_categorical(train_df_2.diabetes)

#check that target column has been converted
train_y_2[0:5]

array([[0., 1.],
       [1., 0.],
       [0., 1.],
       [1., 0.],
       [0., 1.]], dtype=float32)

The last layer of our model has 2 nodes - one for each option: the patient has diabetes or they don't.

The activation is 'softmax'. Softmax makes the output sum up to 1 so the output can be interpreted as probabilities. The model will then make its prediction based on which option has a higher probability.
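Here is a small plain-Python sketch of softmax, showing that the outputs sum to 1 and that the prediction is the option with the highest probability. The raw scores are invented for illustration.

```python
import math


def softmax(scores):
    """Convert raw scores into probabilities that sum to 1."""
    highest = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - highest) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical raw outputs for [no diabetes, diabetes].
probabilities = softmax([1.0, 2.0])
predicted_class = probabilities.index(max(probabilities))
print(probabilities, predicted_class)
```

The second score is larger, so the second option gets the higher probability and becomes the prediction.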

We will use 'categorical_crossentropy' for our loss function. This is a common choice for classification when the targets are one-hot encoded. A lower score indicates that the model is performing better.
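To see why a lower score is better, here is a simplified plain-Python version of the loss for a single patient. It leaves out details a real implementation handles, such as guarding against a probability of exactly 0, and the probabilities are made up.

```python
import math


def categorical_crossentropy(one_hot_target, probabilities):
    """Negative log of the probability given to the true class."""
    return -sum(t * math.log(p)
                for t, p in zip(one_hot_target, probabilities))


target = [0.0, 1.0]  # this patient has diabetes

# The model gives the correct class 90% vs only 40%.
confident = categorical_crossentropy(target, [0.1, 0.9])
unsure = categorical_crossentropy(target, [0.6, 0.4])
print(confident, unsure)
```

The more probability the model puts on the correct class, the closer the loss gets to 0.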

To make things even easier to interpret, we will use the 'accuracy' metric to see the accuracy score on the validation set at the end of each epoch.
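Accuracy is the fraction of rows where the most probable class matches the true class. A minimal plain-Python sketch, using invented targets and predictions:

```python
def accuracy(one_hot_targets, predicted_probabilities):
    """Fraction of rows whose most probable class is the true class."""
    correct = 0
    for target, probs in zip(one_hot_targets, predicted_probabilities):
        true_class = target.index(max(target))
        predicted_class = probs.index(max(probs))
        if true_class == predicted_class:
            correct += 1
    return correct / len(one_hot_targets)


# Hypothetical one-hot targets and softmax outputs for four patients.
targets = [[0.0, 1.0], [1.0, 0.0], [0.0, 1.0], [1.0, 0.0]]
predictions = [[0.2, 0.8], [0.7, 0.3], [0.6, 0.4], [0.9, 0.1]]

score = accuracy(targets, predictions)
print(score)
```

Three of the four predictions pick the right class, so the accuracy is 0.75. Unlike the loss, accuracy ignores how confident each prediction was.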

In [13]:
with tf.device(processor):
    #create model
    model_2 = Sequential()

    #get number of columns in training data
    n_cols_2 = train_X_2.shape[1]

    #add layers to model
    model_2.add(Dense(250, activation='relu', input_shape=(n_cols_2,)))
    model_2.add(Dense(250, activation='relu'))
    model_2.add(Dense(250, activation='relu'))
    model_2.add(Dense(2, activation='softmax'))

    #compile model using accuracy to measure model performance
    model_2.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])

In [14]:
#train model
model_2.fit(train_X_2, train_y_2, epochs=30, validation_split=0.2, callbacks=[early_stopping_monitor])

Train on 614 samples, validate on 154 samples
Epoch 1/30
Epoch 2/30
Epoch 3/30
Epoch 4/30


<keras.callbacks.History at 0x1d4a1accd30>