#### Predicting wages with a Keras neural network

In this analysis I'll build a neural network model using the Keras interface to the TensorFlow deep learning library. One goal of this analysis is to be able to quickly run more complex neural network models on larger datasets. Speaking of datasets, I'm using one that should allow me to predict an individual's hourly wage from characteristics like their industry, education, and level of experience. Unfortunately, this dataset is (seemingly) no longer hosted anywhere notable, so I'll make it available on my [personal github page](https://github.com/brukeg/notebooks/tree/master/datasets/predicting-wages.csv).

```python
# Import the necessary modules
import pandas as pd
import numpy as np

from keras.layers import Dense
from keras.models import Sequential
from sklearn.model_selection import train_test_split
```

```python
# Import the dataset
df = pd.read_csv('datasets/hourly_wages.csv')

# Create arrays for the features (predictors) and target variable
target = df.wage_per_hour.values
predictors = df.drop('wage_per_hour', axis=1).values

# Create training and test datasets
X_train, X_test, y_train, y_test = train_test_split(predictors,
                                                    target,
                                                    test_size=0.3,
                                                    random_state=42)

# Explore the data
df.head()
```

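|   | wage_per_hour | union | education_yrs | experience_yrs | age | female | marr | south | manufacturing | construction |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 5.1 | 0 | 8 | 21 | 35 | 1 | 1 | 0 | 1 | 0 |
| 1 | 4.95 | 0 | 9 | 42 | 57 | 1 | 1 | 0 | 1 | 0 |
| 2 | 6.67 | 0 | 12 | 1 | 19 | 0 | 0 | 0 | 1 | 0 |
| 3 | 4.0 | 0 | 12 | 4 | 22 | 0 | 0 | 0 | 0 | 0 |
| 4 | 7.5 | 0 | 12 | 17 | 35 | 0 | 1 | 0 | 0 | 0 |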
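
To make the 70/30 split above a bit more concrete, here's a minimal NumPy sketch of the idea behind `train_test_split`: shuffle the row indices with a fixed seed, round the test count up, and keep features and targets aligned. The `shuffled_split` helper and the toy arrays are hypothetical, and this sketch won't pick the same rows as scikit-learn for the same `random_state`.

```python
import math
import numpy as np

def shuffled_split(X, y, test_size=0.3, seed=42):
    """Hypothetical helper: hold out a random fraction of rows as a test set.

    Shuffles the row indices, reserves ceil(test_size * n) of them for
    testing, and keeps the rest for training. Illustrative only; it does not
    reproduce scikit-learn's exact row selection.
    """
    n = len(X)
    n_test = math.ceil(test_size * n)
    rng = np.random.default_rng(seed)
    idx = rng.permutation(n)
    test_idx, train_idx = idx[:n_test], idx[n_test:]
    # Index X and y with the same positions so each row keeps its target
    return X[train_idx], X[test_idx], y[train_idx], y[test_idx]

# Toy data: 10 rows, 3 feature columns, one target value per row
X = np.arange(30).reshape(10, 3)
y = np.arange(10, dtype=float)
X_tr, X_te, y_tr, y_te = shuffled_split(X, y)
print(X_tr.shape, X_te.shape)  # (7, 3) (3, 3)
```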


#### Getting Started 
<p>To start off, I'll take a skeleton of a neural network and add hidden layers and an output layer. As a refresher, a neural network contains an input layer, at least one hidden layer, and an output layer. I'll then fit the model and let Keras run the optimization, which repeatedly adjusts the network's weights to reduce the prediction error.</p>

<p><img src="datasets/Hidden_Layer_print.png" width="50%" align="center"></p>
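
Under the hood, each `Dense` layer computes `activation(inputs @ weights + bias)`, and the layers are chained so one layer's outputs become the next layer's inputs. Below is a NumPy sketch of a forward pass through a network shaped like the one I specify next (50 and 32 ReLU units, then a single linear output). The nine input features are an assumption based on the columns shown in `df.head()`, and the random weights are placeholders for trained ones.

```python
import numpy as np

def relu(z):
    # ReLU zeroes out negative pre-activations
    return np.maximum(z, 0.0)

def dense(x, W, b, activation=None):
    # A fully connected layer: every input feeds every output unit
    z = x @ W + b
    return activation(z) if activation is not None else z

rng = np.random.default_rng(0)
n_features = 9  # assumption: the nine feature columns shown in df.head()
layer_sizes = [n_features, 50, 32, 1]

# Random placeholder weights; each W has shape (inputs, outputs)
params = [(rng.normal(scale=0.1, size=(m, n)), np.zeros(n))
          for m, n in zip(layer_sizes[:-1], layer_sizes[1:])]

def forward(x):
    h = x
    for W, b in params[:-1]:
        h = dense(h, W, b, activation=relu)  # hidden layers use ReLU
    W_out, b_out = params[-1]
    return dense(h, W_out, b_out)  # linear output: one predicted wage per row

x_batch = rng.normal(size=(4, n_features))  # 4 made-up rows
print(forward(x_batch).shape)  # (4, 1)
```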

A Keras workflow has four steps. First, you specify the architecture: how many layers you want, how many nodes in each layer, which activation function to use, and so on. Next, you compile the model, which specifies the loss function and some details about how optimization should work. Third, you fit the model, which runs the cycle of backpropagation that optimizes the model's weights using the data. Finally, you use the model to make predictions on new data. I'll demonstrate this four-step process below. In each cell I'll often reuse code from the previous cell, just to show how simple this can be.

```python
# Specify the architecture:

# Save the number of columns in predictors as the number of input nodes in the model
n_cols = predictors.shape[1]

# Instantiate a sequential NN model
model = Sequential()

# Add the first hidden layer, specifying the input shape (the number of input features)
model.add(Dense(50, activation='relu', input_shape=(n_cols,)))

# Add the second hidden layer
model.add(Dense(32, activation='relu'))

# Add the output layer: a single linear unit for the predicted wage
model.add(Dense(1))
```
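
Each `Dense` layer with `u` units fed by `m` inputs holds `m*u` weights plus `u` biases. The hypothetical helper below tallies that for a stack of layers; assuming `n_cols` is 9 (the feature columns shown in `df.head()`), the total should match the parameter count that `model.summary()` reports for this architecture.

```python
def count_dense_params(n_inputs, units):
    """Count weights and biases in a stack of Dense layers (hypothetical helper).

    A Dense layer with u units fed by m inputs has m*u weights plus u biases.
    """
    total = 0
    m = n_inputs
    for u in units:
        total += m * u + u
        m = u  # this layer's outputs feed the next layer
    return total

print(count_dense_params(9, [50, 32, 1]))  # 2165
```

Under the same assumption, the deeper model in the optimization section (an extra 4-unit layer) works out to `count_dense_params(9, [50, 32, 4, 1])`, or 2269 parameters.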

#### Compiling and Fitting the Model
I'm now going to compile the model I specified above; this sets the loss the model will minimize and the optimizer that will use backpropagated gradients to update its weights. To compile the model, I only need to specify two arguments: the optimizer and the loss function. The Adam optimizer is an excellent first choice, and mean squared error (MSE) works fine as the loss for a regression problem like this one. You can read more about Adam and the other Keras optimizers [here](https://keras.io/optimizers/#adam), and if you are really curious, you can read the [original paper](https://arxiv.org/abs/1412.6980v8) that introduced the Adam optimizer.
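
For intuition about what `compile` is choosing, here's a NumPy sketch of the MSE loss and of a single Adam update as described in Algorithm 1 of the paper linked above. The `mse` function and `AdamState` class are my own illustrations rather than Keras internals, and the defaults follow the paper, so Keras's own defaults (for example its epsilon) may differ slightly.

```python
import numpy as np

def mse(y_true, y_pred):
    """Mean squared error: the average of the squared residuals."""
    residuals = np.asarray(y_true, dtype=float) - np.asarray(y_pred, dtype=float)
    return np.mean(residuals ** 2)

class AdamState:
    """Moment estimates for one parameter array, updated Adam-style (a sketch).

    Defaults follow Algorithm 1 of the Adam paper; Keras's defaults may differ
    slightly (for example, its epsilon).
    """

    def __init__(self, shape, lr=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
        self.lr, self.beta1, self.beta2, self.eps = lr, beta1, beta2, eps
        self.m = np.zeros(shape)  # running average of gradients
        self.v = np.zeros(shape)  # running average of squared gradients
        self.t = 0                # step count, used for bias correction

    def step(self, theta, grad):
        """Return the parameters after one Adam update."""
        self.t += 1
        self.m = self.beta1 * self.m + (1 - self.beta1) * grad
        self.v = self.beta2 * self.v + (1 - self.beta2) * grad ** 2
        # Undo the bias toward zero caused by initializing m and v at zero
        m_hat = self.m / (1 - self.beta1 ** self.t)
        v_hat = self.v / (1 - self.beta2 ** self.t)
        return theta - self.lr * m_hat / (np.sqrt(v_hat) + self.eps)

print(mse([2.0, 4.0], [1.0, 6.0]))  # 2.5

state = AdamState(shape=(1,))
print(np.round(state.step(np.array([1.0]), np.array([4.0])), 6))  # [0.999]
```

Because Adam divides the averaged gradient by the root of its averaged square, the first step moves each parameter by roughly the learning rate no matter how large the gradient is.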

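After compiling, I'll fit the model using the `.fit()` method. The fit step is where backpropagation computes the gradient of the loss with respect to each weight, and the optimizer (a variant of gradient descent) uses those gradients to update the weights connecting the nodes.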
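
Keras computes the gradients automatically, and with `optimizer='adam'` each update uses Adam's rescaled step rather than a plain gradient step. To show what backpropagation and a gradient-descent update actually do, here is a NumPy sketch on made-up data with a tiny network (one hidden layer of 16 ReLU units, full-batch gradient descent, learning rate 0.01; all arbitrary choices, not the model from this notebook).

```python
import numpy as np

rng = np.random.default_rng(42)

# Made-up regression data: y is linear in two features plus a little noise
X = rng.normal(size=(200, 2))
y = (3.0 * X[:, 0] - 2.0 * X[:, 1] + 5.0
     + rng.normal(scale=0.1, size=200)).reshape(-1, 1)

# A tiny network: 2 inputs -> 16 ReLU hidden units -> 1 linear output
W1 = rng.normal(scale=0.5, size=(2, 16))
b1 = np.zeros(16)
W2 = rng.normal(scale=0.5, size=(16, 1))
b2 = np.zeros(1)
lr = 0.01  # fixed gradient-descent step size

def loss_and_grads(X, y):
    """Return the MSE loss and its gradients w.r.t. (W1, b1, W2, b2)."""
    # Forward pass, keeping the intermediates backpropagation needs
    z1 = X @ W1 + b1
    h1 = np.maximum(z1, 0.0)
    y_hat = h1 @ W2 + b2
    loss = np.mean((y_hat - y) ** 2)

    # Backward pass: chain rule from the loss back to every weight
    d_yhat = 2.0 * (y_hat - y) / len(X)   # dLoss/dy_hat
    dW2 = h1.T @ d_yhat
    db2 = d_yhat.sum(axis=0)
    d_z1 = (d_yhat @ W2.T) * (z1 > 0)     # ReLU passes gradient only where z1 > 0
    dW1 = X.T @ d_z1
    db1 = d_z1.sum(axis=0)
    return loss, (dW1, db1, dW2, db2)

losses = []
for epoch in range(500):
    loss, (dW1, db1, dW2, db2) = loss_and_grads(X, y)
    losses.append(loss)
    # Gradient-descent update: move each parameter against its gradient
    W1 -= lr * dW1
    b1 -= lr * db1
    W2 -= lr * dW2
    b2 -= lr * db2

print(f"initial MSE: {losses[0]:.3f}, final MSE: {losses[-1]:.3f}")
```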

```python
# Specify the model just as before
n_cols = predictors.shape[1]
model = Sequential()
model.add(Dense(50, activation='relu', input_shape=(n_cols,)))
model.add(Dense(32, activation='relu'))
model.add(Dense(1))

# Compile the model
model.compile(optimizer='adam', loss='mean_squared_error')

# Verify that the model contains information from compiling
print("Loss function: " + model.loss)
```

```
Loss function: mean_squared_error
```


```python
# Specify the model just as before
n_cols = predictors.shape[1]
model = Sequential()
model.add(Dense(50, activation='relu', input_shape=(n_cols,)))
model.add(Dense(32, activation='relu'))
model.add(Dense(1))

# Compile the model just as before
model.compile(optimizer='adam', loss='mean_squared_error')

# Fit the model, specifying the number of epochs (full passes over the training data)
model.fit(X_train, y_train, epochs=10)
```

```
Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10

<keras.callbacks.History at 0x1a3c295250>
```
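
Each epoch is one full pass over `X_train`, split into mini-batches with one weight update per batch. When `batch_size` isn't given, Keras's `fit` defaults to 32, so an epoch performs `ceil(n_samples / 32)` updates. The hypothetical generator below sketches that loop with NumPy, reshuffling every epoch; the row count is made up.

```python
import math
import numpy as np

def iterate_minibatches(n_samples, batch_size=32, epochs=1, seed=0):
    """Hypothetical generator yielding (epoch, batch_indices) pairs.

    Each epoch visits every row exactly once, in a freshly shuffled order,
    split into consecutive mini-batches (the last one may be smaller).
    """
    rng = np.random.default_rng(seed)
    for epoch in range(epochs):
        order = rng.permutation(n_samples)
        for start in range(0, n_samples, batch_size):
            yield epoch, order[start:start + batch_size]

n_rows = 100  # hypothetical training-set size
batches = list(iterate_minibatches(n_rows, batch_size=32, epochs=10))
print(len(batches))              # 40 (4 updates per epoch, times 10 epochs)
print(math.ceil(n_rows / 32))    # 4
```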

```python
# Calculate predictions
predictions = model.predict(X_test)

# Flatten the (n_samples, 1) output into a 1-D array of predicted wages
predicted_wages = predictions[:, 0]

# Print the predicted wages
print(predicted_wages)

# Print a summary of the model's layers and parameter counts
# (summary() prints directly, so it doesn't need to be wrapped in print())
model.summary()
```

```
[ 9.745859   7.433459   8.60953    8.851855   8.920377   8.124661
  8.034364   7.4847617 10.203319   8.024747   9.201648  10.27475
  8.332324   8.541685   7.6393     8.155673  10.479332   9.415298
 11.145835   6.7195163  8.82191    9.374919   8.291409   8.046743
  6.9928975 10.752646  11.401799  11.09357    7.798319   9.605726
  9.470518  10.133234   9.4612055 11.071573  12.550696   7.165855
 10.214699   8.345549   8.485641  10.776885   9.597451   8.694691
  9.450538  10.017768   7.443108   8.281192   8.161546   8.344739
  8.952694   6.2367454  9.017238   9.313125   8.7930975  8.5409155
  8.403498   6.9806027  8.778039   7.5068192  9.972711   7.6570725
  8.29796    8.568919   7.583703  11.379802   7.841994   8.596922
  8.568919   7.5881543  8.417796   8.3093405  8.334565   8.146658
  9.6117735  7.26537    7.523249  10.596723  11.230952   7.027748
  7.8493376  8.798048   9.019937   7.340445   6.529385   9.118409
  8.074105   8.161546   8.003276  10.497347   8.332324   8.828879
  7.36682
```
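
Printing the raw predictions doesn't show how close they are to the known wages in `y_test`. A minimal sketch of that comparison follows: residuals, MSE, and RMSE (in the same units as `wage_per_hour`), plus the MSE of a naive baseline that always predicts the training-set mean, a common sanity check that I'm adding here. `regression_report` is a hypothetical helper, and the numbers in the example are toy values, not results from this model.

```python
import numpy as np

def regression_report(y_true, y_pred, y_train_mean):
    """Hypothetical helper comparing predictions with known targets.

    Returns the MSE, the RMSE (same units as the target), and the MSE of a
    baseline that always predicts the training-set mean.
    """
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float).ravel()  # (n, 1) -> (n,)
    residuals = y_true - y_pred
    mse = np.mean(residuals ** 2)
    baseline_mse = np.mean((y_true - y_train_mean) ** 2)
    return {"mse": mse, "rmse": np.sqrt(mse), "baseline_mse": baseline_mse}

# Toy numbers only
report = regression_report([5.0, 7.0, 9.0], [6.0, 7.0, 7.0], y_train_mean=7.0)
print(report)
```

On the real data it would be called as `regression_report(y_test, predictions, y_train.mean())`; a model worth keeping should have a clearly lower MSE than the baseline.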

#### Model Optimization 
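In this section I'll make the network deeper by adding a third hidden layer of 4 units, standardize the features with scikit-learn's `StandardScaler`, and wrap both steps in a `Pipeline` so the model can be scored with k-fold cross-validation.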

```python
n_cols = predictors.shape[1]

def nn_model():
    """Build and compile a deeper NN: three ReLU hidden layers and a linear output.

    The number of epochs is passed to KerasRegressor below, so this
    build function doesn't need to take it as an argument.
    """
    model = Sequential()
    model.add(Dense(50, activation='relu', input_shape=(n_cols,)))
    model.add(Dense(32, activation='relu'))
    model.add(Dense(4, activation='relu'))
    model.add(Dense(1))
    # Compile, then return the model
    model.compile(optimizer='adam', loss='mean_squared_error')
    return model
```


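```python
# Import the necessary modules
# (keras.wrappers.scikit_learn exists only in older Keras releases;
#  newer releases point to the separate SciKeras package instead)
from keras.wrappers.scikit_learn import KerasRegressor
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import Pipeline
from sklearn.model_selection import cross_val_score
from sklearn.model_selection import KFold

# Standardize the features, then fit the Keras model on the scaled data
steps = [('standardize', StandardScaler()),
         ('model', KerasRegressor(build_fn=nn_model,
                                  epochs=10,
                                  batch_size=5,
                                  verbose=0))]

pipeline = Pipeline(steps)

pipeline.fit(X_train, y_train)

scaled_predictions = pipeline.predict(X_test)

# Recent scikit-learn versions require shuffle=True whenever random_state is set
kfold = KFold(n_splits=2, shuffle=True, random_state=7)
results = cross_val_score(pipeline, X_test, y_test, cv=kfold)

# KerasRegressor scores are the negated loss, so these values are -MSE
print("Wider: %.2f (%.2f) MSE" % (results.mean(), results.std()))
```

```
Wider: -26.20 (5.34) MSE
```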
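
Two details of the pipeline above are easy to miss: `StandardScaler` learns each feature's mean and standard deviation from the training folds only, and `KerasRegressor` scores a model by its negated loss, which is why the printed MSE is negative. The sketch below mimics both in plain NumPy. A least-squares linear model stands in for the neural network, and `kfold_neg_mse` plus the demo data are hypothetical.

```python
import numpy as np

def kfold_neg_mse(X, y, n_splits=2, seed=7):
    """Hypothetical k-fold CV that standardizes inside each fold.

    The scaler's statistics come from the training folds only, and each
    fold's score is the negated MSE, so higher (closer to zero) is better.
    A least-squares linear model stands in for the neural network.
    """
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(X)), n_splits)
    scores = []
    for k in range(n_splits):
        val_idx = folds[k]
        train_idx = np.concatenate([folds[j] for j in range(n_splits) if j != k])
        X_tr, X_val = X[train_idx], X[val_idx]
        y_tr, y_val = y[train_idx], y[val_idx]

        # Standardize with statistics from the training folds only
        mu, sigma = X_tr.mean(axis=0), X_tr.std(axis=0)
        sigma[sigma == 0] = 1.0  # guard against constant columns
        Z_tr, Z_val = (X_tr - mu) / sigma, (X_val - mu) / sigma

        # Least-squares fit with an intercept (a column of ones)
        A_tr = np.column_stack([Z_tr, np.ones(len(Z_tr))])
        coef, *_ = np.linalg.lstsq(A_tr, y_tr, rcond=None)
        y_hat = np.column_stack([Z_val, np.ones(len(Z_val))]) @ coef

        scores.append(-np.mean((y_val - y_hat) ** 2))  # negated MSE
    return np.array(scores)

rng = np.random.default_rng(0)
X_demo = rng.normal(size=(60, 3)) * [1.0, 10.0, 100.0]  # very different scales
y_demo = X_demo @ np.array([2.0, 0.3, 0.01]) + rng.normal(scale=0.5, size=60)
scores = kfold_neg_mse(X_demo, y_demo)
print("%.2f (%.2f) MSE" % (scores.mean(), scores.std()))
```

Scoring with the negated error keeps scikit-learn's "higher is better" convention, so the -26.20 printed above corresponds to an average MSE of about 26.20 across the two folds.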
