
# Hyperparameter Optimization for Keras with scikit-learn

We already know how to use `RandomizedSearchCV` and `GridSearchCV` to tune the hyperparameters of machine learning models such as linear regression and decision trees. The same functionality works for neural networks: Keras offers a scikit-learn wrapper that lets us run randomized or grid search on its models with the same syntax (for example, `fit()` and `.best_score_`). In this lab, we will see how to do this for a Sequential model.

As a reminder, both search types may take a long time to run for this lab.

## **Table of Contents**

<ol>
    <li><a href="https://#Objectives">Objectives</a></li>
    <li>
        <a href="https://#Setup">Setup</a>
        <ol>
            <li><a href="https://#Installing-Required-Libraries">Installing Required Libraries</a></li>
            <li><a href="https://#Importing-Required-Libraries">Importing Required Libraries</a></li>
            <li><a href="https://#Defining-Helper-Functions">Defining Helper Functions</a></li>
        </ol>
    </li>
    <li>
        <a href="https://#Create-the-Model">Create the Model</a>
        <ol>
            <li><a href="https://#Load-the-Data">Load the Data</a></li>
            <li><a href="https://#Data-Wrangling">Data Wrangling</a></li>
            <li><a href="https://#Build-the-Base-Model">Build the Base Model</a></li>
        </ol>
    </li>  
    <li>
        <a href="https://#Randomized-Search">Randomized Search</a>
        <ol>
            <li><a href="https://#Parameters">Parameters</a></li>
            <li><a href="https://#Define-and-Fit-RandomizedSearchCV">Define and Fit RandomizedSearchCV</a></li>
            <li><a href="https://#Performance-Evaluation">Performance Evaluation</a></li>
        </ol>
    </li>
    <li>
        <a href="https://#Exercises">Exercises</a>
        <ol>
            <li><a href="https://#Exercise-1:-Build-the-Base-Model">Exercise 1: Build the Base Model</a></li>
            <li><a href="https://#Exercise-2:-Define-Search-Parameters">Exercise 2: Define Search Parameters</a></li>
            <li><a href="https://#Exercise-3:-Fit-RandomizedSearchCV">Exercise 3: Fit RandomizedSearchCV</a></li>
        </ol>
    </li>    
</ol>


In [None]:
import os
os.environ['TF_CPP_MIN_LOG_LEVEL'] = '2' # tensorflow INFO and WARNING messages are not printed
# You can also use this section to suppress warnings generated by your code:
def warn(*args, **kwargs):
    pass
import warnings
warnings.warn = warn
warnings.filterwarnings('ignore')

from tqdm import tqdm
import numpy as np
%matplotlib inline

import tensorflow as tf
import keras
from sklearn.model_selection import GridSearchCV, RandomizedSearchCV, train_test_split
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Dropout
# from keras.wrappers.scikit_learn import KerasClassifier
import skillsnetwork

### Defining Helper Functions


In [None]:
# Vectorize integer sequence
def vectorize_sequence(sequence, dimensions):
    results = np.zeros((len(sequence), dimensions))
    for index,value in enumerate(sequence):
        if max(value) < dimensions:
            results[index, value] = 1
    return results

# Convert label into one-hot format
def one_hot_label(labels, dimensions):
    results = np.zeros((len(labels), dimensions))
    for index,value in enumerate(labels):
        if value < dimensions:
            results[index, value] = 1
    return results
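
To see what the two helpers produce, here is a minimal sketch on made-up toy inputs (the sequences, labels, and the `toy_*` names are assumptions for illustration, not part of the Reuters data). The helper definitions are repeated so the cell runs on its own.

```python
import numpy as np

# Same logic as the helpers defined above, repeated so this cell is self-contained.
def vectorize_sequence(sequence, dimensions):
    results = np.zeros((len(sequence), dimensions))
    for index, value in enumerate(sequence):
        if max(value) < dimensions:
            results[index, value] = 1
    return results

def one_hot_label(labels, dimensions):
    results = np.zeros((len(labels), dimensions))
    for index, value in enumerate(labels):
        if value < dimensions:
            results[index, value] = 1
    return results

# Toy inputs (made up): three "documents" over a 5-word vocabulary, three classes.
toy_sequences = [[0, 2], [1, 1, 4], [3, 7]]  # index 7 lies outside the vocabulary
toy_labels = [2, 0, 1]

toy_x = vectorize_sequence(toy_sequences, dimensions=5)
toy_y = one_hot_label(toy_labels, dimensions=3)

print(toy_x)
print(toy_y)
```

Two details are worth noticing. The encoding records presence rather than counts, so the repeated index `1` in the second sequence sets a single entry. Also, a sequence that contains any out-of-range index is skipped entirely and stays an all-zero row, which can happen to test sequences when the dimension is computed from the training set only.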

In [None]:
await skillsnetwork.prepare("https://cf-courses-data.s3.us.cloud-object-storage.appdomain.cloud/IBMDeveloperSkillsNetwork-ML311-Coursera/labs/Module2/L1/reuters.npz", overwrite=True)




In [None]:
X = np.load("x.npy", allow_pickle=True)
y = np.load("y.npy", allow_pickle=True)

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.2, random_state = 0)

In [None]:
word_to_ind = tf.keras.datasets.reuters.get_word_index(path="reuters_word_index.json")



# Data Wrangling

Since each observation is a list of word indices for the words that appear in a newswire, its length varies. Hence, we vectorize the dataset with `vectorize_sequence()` so that all inputs to our model have the same dimension. Labels are one-hot encoded with `one_hot_label()` because the classes (news topics) are not ordinal.


In [None]:
dim_x = max([max(sequence) for sequence in X_train])+1
dim_y = max(y_train)+1

X_train_vec = vectorize_sequence(X_train, dim_x)
X_test_vec = vectorize_sequence(X_test, dim_x)
y_train_hot = one_hot_label(y_train, dim_y)
y_test_hot = one_hot_label(y_test, dim_y)

# Build the Base Model
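In order to apply `RandomizedSearchCV` to Keras models, we use `KerasClassifier`, a wrapper that exposes a Keras model through the scikit-learn estimator interface. It was originally provided by the `keras.wrappers.scikit_learn` module; that module is not available in recent TensorFlow releases, where the separate SciKeras package (`scikeras.wrappers`) offers a wrapper of the same name. The cells that depend on `KerasClassifier` are left commented out below; uncomment them in an environment where the wrapper can be imported.

We define `create_model()` below to specify which layers we want to include in the model. The final Dense layer has 46 units, one per class, with a softmax activation; together with the one-hot labels, this is why we use categorical cross-entropy as the loss function. Here, `neurons` is included as a parameter with a default value because we want to tune it later.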

In [None]:
# Create Keras Sequential Model as base model
def create_model(neurons = 10):
    model = Sequential()
    model.add(Dense(neurons, activation='linear'))
    model.add(Dense(64, activation='relu'))
    model.add(Dense(46, activation='softmax'))
    model.compile(optimizer='RMSprop', loss='categorical_crossentropy', metrics=['accuracy'])
    return model
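
To make the link between the softmax output and the categorical cross-entropy loss concrete, here is a minimal NumPy sketch. It uses 4 classes instead of 46, and the raw scores are hypothetical values chosen for illustration; the `softmax` and `categorical_crossentropy` functions are hand-written stand-ins, not the Keras implementations.

```python
import numpy as np

def softmax(logits):
    # Subtract the max for numerical stability; the outputs sum to 1.
    shifted = logits - np.max(logits)
    exp = np.exp(shifted)
    return exp / exp.sum()

def categorical_crossentropy(one_hot, probs, eps=1e-7):
    # With a one-hot label, only the probability of the true class contributes.
    probs = np.clip(probs, eps, 1.0)
    return float(-np.sum(one_hot * np.log(probs)))

logits = np.array([2.0, 0.5, -1.0, 0.0])  # hypothetical raw scores for 4 classes
probs = softmax(logits)

label_class_0 = np.array([1.0, 0.0, 0.0, 0.0])  # true class has the highest score
label_class_2 = np.array([0.0, 0.0, 1.0, 0.0])  # true class has the lowest score

loss_when_right = categorical_crossentropy(label_class_0, probs)
loss_when_wrong = categorical_crossentropy(label_class_2, probs)

print("probabilities:", probs.round(3))
print("loss, true class 0:", round(loss_when_right, 3))
print("loss, true class 2:", round(loss_when_wrong, 3))
```

The loss is small when the network puts most of its probability on the true class and grows as that probability shrinks, which is the behaviour we want to minimize during training.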

For the base model, we won't change any parameters so that we can compare its results with those obtained after hyperparameter tuning. We also give default values to hyperparameters that don't appear in `create_model()` (for example, `batch_size` and `epochs`) so that they are defined when we apply randomized search.


In [None]:
# from keras.wrappers.scikit_learn import KerasClassifier
# !pip install KerasClassifier
# np.random.seed(0)
# base_model = KerasClassifier(build_fn=create_model, verbose=0, batch_size=10, epochs=1)

In [None]:
# Get pre-tuned results
# base_model.fit(X_train_vec, y_train_hot)
# base_score = base_model.score(X_test_vec, y_test_hot)
# print("The baseline accuracy is: %.3f" % base_score)

# Randomized Search

## Parameters

As you might already know from performing randomized search on machine learning models, we have to create a dictionary of hyperparameter values. Let's start by defining the values we want to experiment with! Note that if you would like to tune other parameters, they must be defined in the base model as well, in the same way that `neurons` is an argument of `create_model()`.

In [None]:
batch_size = [10,20,60,80]
epochs = [1,3,5]
neurons = [1,10,20,30]

params = dict(batch_size=batch_size,epochs=epochs,neurons=neurons)
params

{'batch_size': [10, 20, 60, 80],
 'epochs': [1, 3, 5],
 'neurons': [1, 10, 20, 30]}
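
It helps to know how large this search space is before fitting anything. The sketch below uses scikit-learn's `ParameterGrid` and `ParameterSampler` to count the combinations; `n_iter=10` matches the default of `RandomizedSearchCV`, while `random_state=0` and `cv = 3` are values assumed here for the illustration.

```python
from sklearn.model_selection import ParameterGrid, ParameterSampler

params = dict(batch_size=[10, 20, 60, 80], epochs=[1, 3, 5], neurons=[1, 10, 20, 30])

# Grid search would visit every combination: 4 * 3 * 4 = 48.
all_combinations = list(ParameterGrid(params))

# Randomized search draws only n_iter of them (without replacement for list-valued grids).
sampled = list(ParameterSampler(params, n_iter=10, random_state=0))

cv = 3  # number of cross-validation folds assumed for this count
print("combinations in the grid:", len(all_combinations))
print("model fits for grid search:", len(all_combinations) * cv)
print("model fits for randomized search:", len(sampled) * cv)
print("one sampled combination:", sampled[0])
```

Both counts exclude the final refit on the full training set that the search performs with the default `refit=True`.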

## Define and Fit RandomizedSearchCV

In [None]:
# search = RandomizedSearchCV(estimator=base_model,param_distributions=params,cv=3)

Now, fit the randomized search on `X_train_vec` and `y_train_hot` as you would for any other model. **Note that this may take a while to run (10+ minutes)**, especially if there are many parameter combinations or the number of epochs is large. If you have the resources, you could also swap `RandomizedSearchCV` for `GridSearchCV` to search over every combination of hyperparameters, which takes even more time to run.

In [None]:
# search_result = search.fit(X_train_vec, y_train_hot)
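
The search API does not depend on the kind of estimator, so the swap between the two search types can be sketched quickly with a cheap stand-in. Everything in this example is an assumption made for illustration: the synthetic data from `make_classification`, the `LogisticRegression` estimator, and the `demo_params` grid take the place of the Keras wrapper, the Reuters data, and `params`.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, RandomizedSearchCV

# Synthetic stand-in data; the lab itself uses X_train_vec and y_train_hot.
X_demo, y_demo = make_classification(n_samples=300, n_features=10, random_state=0)

# Stand-in estimator and hypothetical grid, used only to show the search API.
demo_estimator = LogisticRegression(max_iter=500)
demo_params = {"C": [0.01, 0.1, 1, 10], "fit_intercept": [True, False]}

# Randomized search: evaluates n_iter sampled combinations out of the 8 possible.
random_search = RandomizedSearchCV(demo_estimator, param_distributions=demo_params,
                                   n_iter=4, cv=3, random_state=0)
random_result = random_search.fit(X_demo, y_demo)

# Grid search: same interface, but param_grid replaces param_distributions
# and every combination is evaluated.
grid_search = GridSearchCV(demo_estimator, param_grid=demo_params, cv=3)
grid_result = grid_search.fit(X_demo, y_demo)

print(len(random_result.cv_results_["params"]), random_result.best_params_)
print(len(grid_result.cv_results_["params"]), grid_result.best_params_)
```

Because the grid search evaluates a superset of the sampled combinations on the same folds, its best score is at least as high as the randomized one, at the price of twice as many fits in this small example.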


## Performance Evaluation

Let's take a look at the results from this search! In particular, we will examine the mean and standard deviation of the cross-validation score under different hyperparameter combinations.


In [None]:
# means = search_result.cv_results_['mean_test_score']
# stds = search_result.cv_results_['std_test_score']
# params = search_result.cv_results_['params']

In [None]:
# RandomizedSearchCV also has attributes for us to access the best score and parameters directly

# print("Best mean cross-validated score: {} using {}".format(round(search_result.best_score_,3), search_result.best_params_))



In [None]:
# for mean, stdev, param in zip(means, stds, params):
    # print("Mean cross-validated score: {} ({}) using: {}".format(round(mean,3), round(stdev,3), param))

From this, we can see how the scores of the other combinations compare with the best one. Some come close to the best score, whereas others score much lower. Thank goodness we didn't pick those! With randomized search on neural networks, we can find the best of the sampled hyperparameter values in an automated way.
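
When there are many combinations, the comparison is easier to read if the rows are sorted by rank. The following is a sketch on a stand-in search: the iris data, the decision tree, and its small grid are assumptions chosen so the cell runs in seconds, and the same `cv_results_` keys are available on the search result of the Keras wrapper.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV
from sklearn.tree import DecisionTreeClassifier

# Stand-in data, estimator, and grid (not the lab's model), to get a fitted search quickly.
X_demo, y_demo = load_iris(return_X_y=True)
demo_search = GridSearchCV(
    DecisionTreeClassifier(random_state=0),
    param_grid={"max_depth": [1, 2, 3], "min_samples_leaf": [1, 5]},
    cv=3,
).fit(X_demo, y_demo)

results = demo_search.cv_results_

# rank_test_score is 1 for the best mean cross-validated score; sort on it only,
# so that ties never fall through to comparing the parameter dictionaries.
rows = sorted(
    zip(results["rank_test_score"], results["mean_test_score"],
        results["std_test_score"], results["params"]),
    key=lambda row: row[0],
)

for rank, mean, stdev, param in rows:
    print("#{} mean={:.3f} (std={:.3f}) using {}".format(rank, mean, stdev, param))
```

The first row printed corresponds to `best_score_` and `best_params_`, and the standard deviation column shows how much each score varies across folds.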


In [None]:
# Using the best estimator, let's get the test score

# print("Best test score: %.3f" % search_result.best_estimator_.score(X_test_vec, y_test_hot))

# Exercises

## Exercise 1: Build the Base Model

In [None]:
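def create_model(optimizer = 'RMSprop', optimizer__learning_rate = 0.1, dropout_rate = 0.2):
    model = Sequential()
    model.add(Dense(64, activation='linear'))
    model.add(Dropout(dropout_rate))
    model.add(Dense(64, activation='relu'))
    model.add(Dense(46, activation='softmax'))
    # Look up the optimizer class by name so that the learning rate being tuned is actually applied
    optimizer_instance = getattr(tf.keras.optimizers, optimizer)(learning_rate=optimizer__learning_rate)
    model.compile(optimizer=optimizer_instance, loss='categorical_crossentropy', metrics=['accuracy'])
    return model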

In [None]:
# np.random.seed(0)
# base_model = KerasClassifier(build_fn=create_model, verbose=0, batch_size=100, epochs=1)
# base_model.fit(X_train_vec, y_train_hot)
# base_score = base_model.score(X_test_vec, y_test_hot)
# print("The baseline accuracy is: {}".format(base_score))


## Exercise 2: Define Search Parameters

In [None]:
optimizer = ['SGD','RMSprop','Adam']
learning_rate = [0.01, 0.1, 1]
dropout_rate = [0.1, 0.3, 0.6, 0.9]
params = dict(optimizer=optimizer, optimizer__learning_rate=learning_rate, dropout_rate = dropout_rate)

## Exercise 3: Fit RandomizedSearchCV

In [None]:
# search = RandomizedSearchCV(estimator=base_model, param_distributions=params, cv=3)
# search_result = search.fit(X_train_vec, y_train_hot)

In [None]:
# print("Best mean cross-validated score: {} using {}".format(round(search_result.best_score_,3), search_result.best_params_))
# print("Best test score: %.3f" % search_result.best_estimator_.score(X_test_vec, y_test_hot))