Keras a warpper API that runs on top of Tensorflow or theano is very popular and easy to use. Scikit learn also very popular libraries for machine learning.In this post I will show how to use keras and scikit learn to build neural network architecture in python and develop a regression linear model.

### Load dependancies

In [1]:
import numpy as np
import pandas as pd
import keras
import sklearn
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error
from sklearn.preprocessing import LabelEncoder
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import StratifiedKFold
from sklearn.model_selection import cross_val_score
from sklearn.model_selection import GridSearchCV

from sklearn.utils import shuffle
import math
import warnings
warnings.filterwarnings("ignore")

print(sklearn.__version__)
print(keras.__version__)

Using TensorFlow backend.


0.19.0
2.1.1


### Load Keras APIs

In [2]:
%matplotlib notebook
import matplotlib.pyplot as plt
import numpy
import pandas
from keras.models import Sequential
from keras.layers import Dense, Dropout, BatchNormalization
from keras.constraints import maxnorm
from keras.wrappers.scikit_learn import KerasRegressor

The problem that we will solve here is to predict the peer rank of mutual funds based on the funs characteristics.
We will use train.csv file that has fund characteristics and peerrank for 3 separate periods and 24 different rank categories for each fund.
Our objective is to train a regression model in keras to be able to predict peer rank for new funds.

In [3]:
def columnNameBuilder(columnName, numberOfItems):
    columns = [ columnName + ' ' + str(i) for i in range(numberOfItems)]
    return columns

scikit learn works with numerical data, so we need to transform our categorical non-numeric data into numerical representation. we will create a function to engineer certain features and drop columns that are not required.

In [4]:
def feature_engineer(dataset):
    
    le = LabelEncoder()
    prospectus_objective = dataset["Prospectus Objective"].values
    prospectus_objective = le.fit_transform(prospectus_objective)
    dataset["Prospectus Objective"] = prospectus_objective
    
    col_names = columnNameBuilder("Average Credit Quality", 36)
    for i in range(36):
        col_name = col_names[i]
        dataset[col_name].fillna('0', inplace=True)
        col_val = dataset[col_name].values
        dataset[col_name] = le.fit_transform(col_val)
            
    dataset.loc[dataset['Available In Insurance Product'] == 'Yes', 'Available In Insurance Product'] = 1
    dataset.loc[dataset['Available In Insurance Product'] == 'No', 'Available In Insurance Product'] = 0
    dataset.loc[dataset['Available For Retirement Plan'] == 'Yes', 'Available For Retirement Plan'] = 1
    dataset.loc[dataset['Available For Retirement Plan'] == 'No', 'Available For Retirement Plan'] = 0
    dataset["Available In Insurance Product"] = dataset["Available In Insurance Product"].astype(float)
    dataset["Available For Retirement Plan"] = dataset["Available For Retirement Plan"].astype(float)

    dataset.drop("Primary Prospectus Benchmark", inplace = True, axis=1)
    dataset.drop("Firm Name", inplace = True, axis=1)
    dataset.drop('.id', inplace = True, axis=1)
    dataset.drop("Name", inplace = True, axis=1)
    dataset.fillna(dataset.mean(), inplace=True)

    return dataset

In [5]:
df_train = pd.read_csv("./data/train.csv")
df_test = pd.read_csv("./data/test.csv")

df_test.insert(0, 'PeerRank', 0)
dataset = pd.concat(objs=[df_train, df_test], axis=0)

dataset = feature_engineer(dataset)
#log transform skewed numeric features:
numeric_feats = dataset.dtypes[dataset.dtypes != "object"].index

skewed_feats = df_train[numeric_feats].apply(lambda x: skew(x.dropna())) #compute skewness
skewed_feats = skewed_feats[skewed_feats > 0.75]
skewed_feats = skewed_feats.index

dataset[skewed_feats] = np.log1p(dataset[skewed_feats])
dataset = pd.get_dummies(dataset)

train_num = len(df_train)
df_train = dataset[:train_num]
df_test = dataset[train_num:]


ytrain = df_train['PeerRank']
ytrain = np.log1p(ytrain)

df_train.drop("PeerRank", inplace=True, axis=1) 
df_train.drop("period", inplace = True, axis=1)
df_train.drop("RankCategory", axis=1, inplace=True)
    
df_test.drop("period", inplace = True, axis=1)
df_test.drop("RankCategory", axis=1, inplace=True)       
df_test.drop('PeerRank', inplace=True, axis=1)

The KerasClassifier and KerasRegressor classes in Keras take an argument build_fn which is the name of the function to call to get your model.

You must define a function called whatever you like that defines your model, compiles it and returns it.

In the example, below we define a function baseline_model() that create a simple multi-layer neural network for the problem.

We pass this function name to the KerasRegressor class by the build_fn argument. We also pass in additional arguments of nb_epoch=150 and batch_size=10. These are automatically bundled up and passed on to the fit() function which is called internally by the KerasClassifier class.

In this example, we use the scikit-learn StratifiedKFold to perform 10-fold stratified cross-validation. This is a resampling technique that can provide a robust estimate of the performance of a machine learning model on unseen data.

We use the scikit-learn function cross_val_score() to evaluate our model using the cross-validation scheme and print the results.

In [7]:
def baseline_model(dropout_rate=0.2, optimizer='rmsprop', init='glorot_uniform', weight_constraint=0):
    # create model
    model = Sequential()
    model.add(Dense(4096, activation='relu', kernel_initializer='uniform', input_shape=(891,), kernel_constraint=maxnorm(weight_constraint)))
    model.add(BatchNormalization())
    model.add(Dropout(dropout_rate))
    model.add(Dense(1024, activation='relu', kernel_initializer='uniform'))
    model.add(Dropout(dropout_rate))
    model.add(BatchNormalization())
    model.add(Dense(512, activation='relu', kernel_initializer='uniform'))
    model.add(Dropout(dropout_rate))
    model.add(BatchNormalization())
    model.add(Dense(256, activation='relu', kernel_initializer='uniform'))
    model.add(Dropout(dropout_rate))
    model.add(BatchNormalization())
    model.add(Dense(128, activation='relu', kernel_initializer='uniform'))
    model.add(Dropout(dropout_rate))
    model.add(BatchNormalization())
    model.add(Dense(64, activation='relu', kernel_initializer='uniform'))
    model.add(Dropout(dropout_rate))
    model.add(BatchNormalization())
    model.add(Dense(1, activation='sigmoid'))
    model.compile(loss='mean_squared_error', optimizer=optimizer, metrics=['accuracy'])
    return model

In [8]:
model = baseline_model()
model.summary()

_________________________________________________________________
Layer (type)                 Output Shape              Param #   
dense_1 (Dense)              (None, 4096)              3653632   
_________________________________________________________________
batch_normalization_1 (Batch (None, 4096)              16384     
_________________________________________________________________
dropout_1 (Dropout)          (None, 4096)              0         
_________________________________________________________________
dense_2 (Dense)              (None, 1024)              4195328   
_________________________________________________________________
dropout_2 (Dropout)          (None, 1024)              0         
_________________________________________________________________
batch_normalization_2 (Batch (None, 1024)              4096      
_________________________________________________________________
dense_3 (Dense)              (None, 512)               524800    
__________

In [12]:
def train_neural_network(df_train, ytrain):
    # fix random seed for reproducibility
    seed = 7
    numpy.random.seed(seed)
    X_train, X_val, y_train, y_val = train_test_split(df_train, ytrain, test_size=0.1, random_state=42)
    
    print("Train Data:", X_train.shape)
    # evaluate model with standardized dataset
    estimator = KerasRegressor(build_fn=baseline_model, nb_epoch=100, batch_size=5, verbose=0)
    
    estimator.fit(X_train.values, y_train.values)
    rmse = math.sqrt(mean_squared_error(y_val.values, estimator.predict(X_val.values)))
    print(rmse)
    
    # evaluate using 10-fold cross validation
    kfold = StratifiedKFold(n_splits=10, shuffle=True, random_state=seed)
    results = cross_val_score(estimator, df_train.values, ytrain.values, cv=kfold)
    print("mean score:", results.mean())

Grid Search Deep Learning Model Parameters

The previous example showed how easy it is to wrap your deep learning model from Keras and use it in functions from the scikit-learn library.

In this example, we go a step further. The function that we specify to the build_fn argument when creating the KerasRegressor wrapper can take arguments. We can use these arguments to further customize the construction of the model. In addition, we know we can provide arguments to the fit() function.

In this example, we use a grid search to evaluate different configurations for our neural network model and report on the combination that provides the best-estimated performance.

The create_model() function is defined to take two arguments optimizer and init, both of which must have default values. This will allow us to evaluate the effect of using different optimization algorithms and weight initialization schemes for our network.

After creating our model, we define arrays of values for the parameter we wish to search, specifically:

Optimizers for searching different weight values.
Initializers for preparing the network weights using different schemes.
Epochs for training the model for a different number of exposures to the training dataset.
Batches for varying the number of samples before a weight update.
The options are specified into a dictionary and passed to the configuration of the GridSearchCV scikit-learn class. This class will evaluate a version of our neural network model for each combination of parameters (2 x 3 x 3 x 3 for the combinations of optimizers, initializations, epochs and batches). Each combination is then evaluated using the default of 3-fold stratified cross validation.

That is a lot of models and a lot of computation. This is not a scheme that you want to use lightly because of the time it will take. It may be useful for you to design small experiments with a smaller subset of your data that will complete in a reasonable time. This is reasonable in this case because of the small network and the small dataset (less than 1000 instances and 9 attributes).

Finally, the performance and combination of configurations for the best model are displayed, followed by the performance of all combinations of parameters.

This might take about 5 minutes to complete on your workstation executed on the CPU (rather than CPU). running the example shows the results below.

We can see that the grid search discovered that using a uniform initialization scheme, rmsprop optimizer, 150 epochs and a batch size of 5 achieved the best cross-validation score of approximately 75% on this problem.

In [13]:
train_neural_network(df_train, ytrain)

Train Data: (4071, 891)
58.04536393479158
mean score: -3431.64293268


In [22]:
def gridSearch_neural_network(df_train, ytrain):
    # fix random seed for reproducibility
    seed = 7
    numpy.random.seed(seed)
    X_train, X_val, y_train, y_val = train_test_split(df_train, ytrain, test_size=0.1, random_state=42)
    
    print("Train Data:", X_train.shape)
    print("Train label:", y_train.shape)
    #print("Test Data:", Xtest.shape)
    # evaluate model with standardized dataset
    estimator = KerasRegressor(build_fn=baseline_model, nb_epoch=100, batch_size=5, verbose=0)
    
    #estimator.fit(X_train.values, y_train.values)
    #rmse = math.sqrt(mean_squared_error(y_val.values, estimator.predict(X_val.values)))
    #print(rmse)
 
    # grid search epochs, batch size and optimizer
    optimizers = ['rmsprop', 'adam']
    dropout_rate = [0.0, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9]
    init = ['glorot_uniform', 'normal', 'uniform']
    epochs = [50, 100, 150]
    batches = [5, 10, 20]
    weight_constraint = [1, 2, 3, 4, 5]
    param_grid = dict(optimizer=optimizers, 
                      dropout_rate=dropout_rate, 
                      epochs=epochs, 
                      batch_size=batches, 
                      weight_constraint=weight_constraint, 
                      init=init)
    
    grid = GridSearchCV(estimator=estimator, param_grid=param_grid)
    grid_result = grid.fit(X_train.values, y_train.values)
    # summarize results
    print("Best: %f using %s" % (grid_result.best_score_, grid_result.best_params_))
    means = grid_result.cv_results_['mean_test_score']
    stds = grid_result.cv_results_['std_test_score']
    params = grid_result.cv_results_['params']
    for mean, stdev, param in zip(means, stds, params):
        print("%f (%f) with: %r" % (mean, stdev, param))

Finally, the performance and combination of configurations for the best model are displayed, followed by the performance of all combinations of parameters.

In [None]:
gridSearch_neural_network(df_train, ytrain)

Train Data: (4071, 891)
Train label: (4071,)


### Summary

In this post, you discovered how you can wrap your Keras deep learning models and use them in the scikit-learn general machine learning library.

You can see that using scikit-learn for standard machine learning operations such as model evaluation and model hyperparameter optimization can save a lot of time over implementing these schemes yourself.

Wrapping your model allowed you to leverage powerful tools from scikit-learn to fit your deep learning models into your general machine learning process.

Do you have any questions about using Keras models in scikit-learn or about this post? Ask your question in the comments or send me email at GTK@capgroup.com and I will do my best to answer.