<a href="https://colab.research.google.com/github/Deep-Learning-Challenge/challenge-notebooks/blob/master/1.Multilayer%20Perceptrons/1.Lessons/3.%20Use%20Models%20With%20Scikit-Learn%20For%20Fine%20Tunning.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" /></a>

# Use Models With Scikit-Learn For Fine Tuning

The scikit-learn library is the most popular library for general machine learning in Python. This lesson will explore how you can use deep learning models from Keras with the scikit-learn library in Python. After completing this lesson, you will know:

* How to wrap a Keras model for use with the scikit-learn machine learning library.
* How to easily evaluate Keras models using cross-validation in scikit-learn.
* How to tune Keras model hyperparameters using grid search in scikit-learn.

Let's get started.

## Overview

Keras is a popular library for deep learning in Python, but it focuses on deep learning rather than general machine learning. It strives for minimalism, providing only what you need to define and build deep learning models quickly. The scikit-learn library is built upon the SciPy stack for efficient numerical computation. It is a fully featured library for general-purpose machine learning and provides many utilities that are useful when developing deep learning models, not least:

* Evaluation of models using resampling methods like k-fold cross-validation.
* Efficient search and evaluation of model hyperparameters.

The Keras library provides a convenient wrapper that lets deep learning models be used as classification or regression estimators in scikit-learn. In the next sections, we will work through examples of using the `KerasClassifier` wrapper for a classification neural network created in Keras and used in the scikit-learn library. The test problem is the Pima Indians onset of diabetes classification dataset.

## Runtime Setup

In [1]:
import sys
#!{sys.executable} -m pip install numpy keras tensorflow scikit-learn
#!conda install --yes --prefix {sys.prefix} numpy keras tensorflow scikit-learn

dataset_name = "pima-indians-diabetes.data.csv"
if 'google.colab' in sys.modules:
    DATASET = f"https://github.com/Deep-Learning-Challenge/challenge-notebooks/raw/master/datasets/{dataset_name}"
else:
    DATASET = f"../../datasets/{dataset_name}"
    
DATASET

'../../datasets/pima-indians-diabetes.data.csv'

## Evaluate Models with Cross-Validation

The `KerasClassifier` and `KerasRegressor` classes in Keras take an argument `build_fn`, which is the name of the function to call to create your model. You must define a function called whatever you like that defines your model, compiles it, and returns it. In the example below, we define a function `create_model()` that creates a simple multilayer neural network for the problem.

We pass this function name to the `KerasClassifier` class by the `build_fn` argument. We also pass in additional arguments of `epochs=150` and `batch_size=10`. These are automatically bundled up and passed on to the `fit()` function, which is called internally by the `KerasClassifier` class. In this example, we use the scikit-learn `StratifiedKFold` to perform 10-fold stratified cross-validation. This is a resampling technique that can provide a robust estimate of a machine learning model's performance on unseen data. We use the scikit-learn function `cross_val_score()` to evaluate our model using the cross-validation scheme and print the results.
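The example described here is not reproduced in this copy of the lesson, so the following is a minimal sketch of it under stated assumptions. It assumes the same TensorFlow release as the grid-search cell later in the lesson (one that still ships `tensorflow.keras.wrappers.scikit_learn`) and reuses the network layout from that cell. The helper name `cross_validate_keras`, the `adam` optimizer, and the `shuffle=True` setting are illustrative choices rather than details taken from the original.

```python
# Sketch: 10-fold stratified cross-validation of a wrapped Keras model.
# Heavy imports live inside the functions so that defining them stays cheap.

def create_model():
    """Build and compile the network; the wrapper calls this for every fold."""
    from tensorflow.keras.models import Sequential
    from tensorflow.keras.layers import Dense

    model = Sequential()
    model.add(Dense(12, input_dim=8, activation='relu'))
    model.add(Dense(8, activation='relu'))
    model.add(Dense(1, activation='sigmoid'))
    # Assumption: adam is used here purely for illustration
    model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
    return model


def cross_validate_keras(dataset_path, seed=7, n_splits=10):
    """Return the per-fold accuracies of the wrapped model (hypothetical helper)."""
    import numpy
    from sklearn.model_selection import StratifiedKFold, cross_val_score
    from tensorflow.keras.wrappers.scikit_learn import KerasClassifier

    # fix random seed for reproducibility
    numpy.random.seed(seed)

    # columns 0-7 are the inputs, column 8 is the class label
    dataset = numpy.loadtxt(dataset_path, delimiter=",")
    X = dataset[:, 0:8]
    Y = dataset[:, 8]

    # epochs and batch_size are forwarded to fit() by the wrapper
    model = KerasClassifier(build_fn=create_model, epochs=150, batch_size=10)

    # shuffle=True is an assumption; random_state only applies when shuffling
    kfold = StratifiedKFold(n_splits=n_splits, shuffle=True, random_state=seed)
    return cross_val_score(model, X, Y, cv=kfold)
```

In the notebook, `results = cross_validate_keras(DATASET)` followed by `print(results.mean())` would report the average accuracy across the ten folds.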

Running the example displays the skill of the model for each epoch. A total of 10 models are created and evaluated and the final average accuracy is displayed.

You can see that wrapping the Keras model greatly streamlines estimating model accuracy, compared to the manual enumeration of cross-validation folds performed in the previous lesson.
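For contrast, the fold bookkeeping that `StratifiedKFold` takes off your hands can be sketched in plain Python. This is a simplified illustration of the idea (no shuffling, samples dealt round-robin per class), not scikit-learn's actual implementation; the function names and the toy labels are hypothetical.

```python
from collections import defaultdict


def stratified_folds(labels, n_splits):
    """Assign each sample index to a fold so class proportions stay similar."""
    by_class = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)

    folds = [[] for _ in range(n_splits)]
    # Deal each class's samples round-robin across the folds
    for indices in by_class.values():
        for position, index in enumerate(indices):
            folds[position % n_splits].append(index)
    return folds


def train_test_splits(labels, n_splits):
    """Yield (train_indices, test_indices) pairs, one per held-out fold."""
    folds = stratified_folds(labels, n_splits)
    for held_out in range(n_splits):
        test = sorted(folds[held_out])
        train = sorted(i for f, fold in enumerate(folds) if f != held_out for i in fold)
        yield train, test


# Toy labels: 6 negatives and 3 positives, split into 3 folds
labels = [0, 0, 0, 0, 0, 0, 1, 1, 1]
splits = list(train_test_splits(labels, n_splits=3))
for train, test in splits:
    print("train:", train, "test:", test)
```

Each held-out fold in this toy run contains two negatives and one positive, mirroring the 2:1 class ratio of the full label list.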

## Grid Search Deep Learning Model Parameters

The previous example showed how easy it is to wrap your deep learning model from Keras and use it in functions from the scikit-learn library. In this example, we go a step further. We already know we can provide arguments to the `fit()` function. The function that we specify to the `build_fn` argument when creating the `KerasClassifier` wrapper can also take arguments. We can use these arguments to customize the construction of the model further.
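One way to picture how a wrapper can decide which keyword arguments belong to the model-building function and which belong to `fit()` is to inspect the function signatures. The sketch below is a simplified illustration of that idea and not the Keras implementation; `route_params`, `fake_fit`, and the stand-in `create_model` are hypothetical names.

```python
import inspect


def create_model(optimizer='rmsprop', init='glorot_uniform'):
    """Stand-in for the real model builder; it only records its arguments."""
    return {'optimizer': optimizer, 'init': init}


def fake_fit(x, y, epochs=1, batch_size=32):
    """Stand-in for a model's fit() method."""
    return {'epochs': epochs, 'batch_size': batch_size}


def route_params(func, params):
    """Keep only the entries of params that func accepts as named arguments."""
    accepted = inspect.signature(func).parameters
    return {name: value for name, value in params.items() if name in accepted}


# One combination of settings, as a grid search might supply it
params = {'optimizer': 'adam', 'init': 'uniform', 'epochs': 150, 'batch_size': 5}
build_kwargs = route_params(create_model, params)
fit_kwargs = route_params(fake_fit, params)
print(build_kwargs)
print(fit_kwargs)
```

Here `optimizer` and `init` end up with the builder, while `epochs` and `batch_size` end up with the fitting step.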

In this example, we use a grid search to evaluate different configurations for our neural network model and report on the combination that provides the best estimated performance. The `create_model()` function is defined to take two arguments, `optimizer` and `init`, both of which must have default values. This will allow us to evaluate different optimization algorithms and weight initialization schemes for our network. After creating our model, we define arrays of values for the parameters we wish to search, specifically:

* Optimizers for exploring different weight values.
* Initializers for preparing the network weights using different schemes.
* The number of epochs for training the model with a different number of exposures to the training dataset.
* Batches for varying the number of samples before weight updates.
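To make the last two bullets concrete, the number of weight updates in one training run is the number of epochs times the number of batches per epoch. The short calculation below assumes 614 training rows, which is roughly what remains of the 768-row Pima dataset when one fold of a 5-fold split is held out; the helper name `weight_updates` is illustrative.

```python
import math


def weight_updates(n_samples, batch_size, epochs):
    """Total updates: batches per epoch (the last may be partial) times epochs."""
    batches_per_epoch = math.ceil(n_samples / batch_size)
    return batches_per_epoch * epochs


n_train = 614  # assumption: training rows left after holding out one of 5 folds
for batch_size in [5, 10, 20]:
    for epochs in [50, 100, 150]:
        print(batch_size, epochs, weight_updates(n_train, batch_size, epochs))
```

Smaller batches mean more updates per epoch, which is one reason the small-batch, many-epoch corners of the grid take the longest to train.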

The options are collected into a dictionary and passed to the scikit-learn `GridSearchCV` class. This class will evaluate a version of our neural network model for each combination of parameters (2 x 3 x 3 x 3 = 54 combinations of optimizers, initializations, epochs, and batches). Each combination is then evaluated using the default stratified cross-validation, which is 5-fold in scikit-learn 0.22 and later and 3-fold in earlier releases.
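The size of the search can be checked before launching it. The sketch below enumerates the grid with `itertools.product`, using the same value lists as the grid-search cell in this lesson; the fold count `n_folds = 5` is an assumption that depends on the installed scikit-learn version and on the `cv` argument.

```python
from itertools import product

param_grid = {
    'optimizer': ['rmsprop', 'adam'],
    'init': ['glorot_uniform', 'normal', 'uniform'],
    'epochs': [50, 100, 150],
    'batch_size': [5, 10, 20],
}

# Every combination of one value per hyperparameter
names = list(param_grid)
combinations = [dict(zip(names, values)) for values in product(*param_grid.values())]

n_folds = 5  # assumption: cross-validation folds used per combination
n_fits = len(combinations) * n_folds

print(len(combinations), "combinations,", n_fits, "model fits")
```

With these lists the grid holds 54 combinations, so a 5-fold search trains 270 models, plus one final refit on the full data when `refit=True` (the `GridSearchCV` default).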

That is a lot of models and a lot of computation. This is not a scheme that you want to use lightly because of the time it takes to compute. It may be useful to design small experiments on a subset of your data that will complete in a reasonable time. The full experiment is reasonable in this case because of the small network and the small dataset (fewer than 1,000 instances and nine attributes). Finally, the performance and combination of configurations for the best model are displayed, followed by the performance of all combinations of parameters.

This might take about 5 minutes to complete on a workstation using only the CPU. In the run reported here, the grid search found that a uniform initialization scheme, the adam optimizer, 150 epochs, and a batch size of 5 achieved the best cross-validation score of approximately 75% on this problem.

In [2]:
# MLP for Pima Indians Dataset with grid search via sklearn
import os
# Hide GPUs so the run uses the CPU; set before TensorFlow is imported
os.environ['CUDA_VISIBLE_DEVICES'] = '-1'

import tensorflow as tf

from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense
from tensorflow.keras.wrappers.scikit_learn import KerasClassifier
from sklearn.model_selection import GridSearchCV
import numpy

# Function to create model, required for KerasClassifier
def create_model(optimizer='rmsprop', init='glorot_uniform'):
    # create model
    model = Sequential()
    model.add(Dense(12, input_dim=8, kernel_initializer=init, activation='relu'))
    model.add(Dense(8, kernel_initializer=init, activation='relu'))
    model.add(Dense(1, kernel_initializer=init, activation='sigmoid'))
    
    # Compile model
    model.compile(loss='binary_crossentropy', optimizer=optimizer, metrics=['accuracy'])

    return model

# fix random seed for reproducibility
seed = 7
numpy.random.seed(seed)

# load pima indians dataset
dataset = numpy.loadtxt(DATASET, delimiter=",")

# split into input (X) and output (Y) variables
X = dataset[:,0:8]
Y = dataset[:,8]

# create model
model = KerasClassifier(build_fn=create_model, verbose=1)

# grid search optimizer, weight initializer, epochs and batch size
optimizers = ['rmsprop', 'adam']
inits = ['glorot_uniform', 'normal', 'uniform']
epochs = [50, 100, 150]
batches = [5, 10, 20]
param_grid = dict(optimizer=optimizers, epochs=epochs, batch_size=batches, init=inits)
grid = GridSearchCV(estimator=model, param_grid=param_grid, verbose = 0, n_jobs=-1)
grid_result = grid.fit(X, Y)

# summarize results
print("Best: %f using %s" % (grid_result.best_score_, grid_result.best_params_))
means = grid_result.cv_results_['mean_test_score']
stds = grid_result.cv_results_['std_test_score']
params = grid_result.cv_results_['params']
for mean, stdev, param in zip(means, stds, params):
    print("%f (%f) with: %r" % (mean, stdev, param))

2021-10-06 22:44:31.932874: I tensorflow/core/platform/cpu_feature_guard.cc:142] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations:  AVX2 FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.

## Summary

In this lesson, you discovered how you could wrap your Keras deep learning models and use them in the scikit-learn general machine learning library. You learned:

* How to wrap Keras models so that they can be used with the scikit-learn machine learning library.
* How to use a wrapped Keras model as part of evaluating model performance in scikit-learn.
* How to perform hyperparameter tuning in scikit-learn using a wrapped Keras model.

You can see that using scikit-learn for standard machine learning operations such as model evaluation and hyperparameter optimization can save a lot of time compared with implementing them from scratch.