

# Tune Practice

## *Data Science Unit 4 Sprint 2 Assignment 3*

# Gridsearch Hyperparameters

In the guided project, you learned how to use sklearn's GridSearchCV and the keras-tuner library to tune the hyperparameters of a neural network model. For your module project, you'll continue using these two libraries; however, we will make things a little more interesting for you. 

Continue to use TensorFlow Keras and a sample of the [Quickdraw dataset](https://github.com/googlecreativelab/quickdraw-dataset) to build a sketch classification model. The dataset has been sampled down to 10 classes with 10,000 observations per class. 



**Don't forget to switch to GPU on Colab!**

In [None]:
# native python libraries imports 
import math
from time import time

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt


# sklearn imports 
import sklearn
from sklearn.model_selection import train_test_split
from sklearn.model_selection import GridSearchCV

# keras imports 
import keras
from keras import Sequential
from keras.layers import Dense
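# keras-tuner is imported as keras_tuner (the old kerastuner name is deprecated);
# install it first with `pip install keras-tuner` if it is missing
from keras_tuner.tuners import RandomSearch, BayesianOptimization
from keras_tuner import HyperParameters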
from keras.activations import relu, sigmoid
from keras.optimizers import Adam, SGD
from keras.utils import get_file

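# required for compatibility between sklearn and keras
# NOTE: this wrapper has been removed from recent TensorFlow releases;
# the scikeras package provides a maintained replacement
from tensorflow.keras.wrappers.scikit_learn import KerasClassifier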


In [None]:
def load_quickdraw10():
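    """
    Download the Quickdraw-10 sample, scale the pixel values to [0, 1],
    and split it into training and test sets.

    Returns
    -------
    X_train, X_test, y_train, y_test: numpy arrays
        The output of sklearn's train_test_split, which holds out 25% of the
        observations for the test set by default
    """
    
    # location of the compressed numpy archive on GitHub
    URL_ = "https://github.com/LambdaSchool/DS-Unit-4-Sprint-2-Neural-Networks/blob/main/quickdraw10.npz?raw=true"
    
    # download the archive (keras caches it locally, so later calls skip the download)
    path_to_zip = get_file('quickdraw10.npz', origin=URL_, extract=False)

    # np.savez stores unnamed arrays under the keys arr_0, arr_1, ...
    data = np.load(path_to_zip)
    
    # normalize your image data: scale pixel values from [0, 255] to [0, 1]
    max_pixel_value = 255
    X = data['arr_0']/max_pixel_value  # images
    Y = data['arr_1']                  # class labels
        
    # shuffle and split into train and test sets
    return train_test_split(X, Y, shuffle=True)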

In [None]:
X_train, X_test, y_train, y_test = load_quickdraw10()


In [None]:
X_train.shape


In [None]:
y_train.shape

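As a sanity check for the two shape cells above: the introduction says the sample contains 10 classes with 10,000 observations each, and `train_test_split` holds out 25% of the data for testing when neither `test_size` nor `train_size` is given. A minimal sketch of the expected row counts, assuming the archive contains exactly those 100,000 observations (the number of pixel columns is not stated here, so it is left out):

```python
import math

# Assumption: 10 classes x 10,000 observations, as stated in the introduction
n_classes = 10
n_per_class = 10_000
n_total = n_classes * n_per_class

# train_test_split uses test_size=0.25 by default; the test count is rounded up
n_test = math.ceil(0.25 * n_total)
n_train = n_total - n_test

print(n_train, n_test)  # 75000 25000
```

If that assumption holds, `X_train.shape[0]` and `y_train.shape[0]` should both be 75,000.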

_____

# Experiment 1

## Tune Hyperparameters Using Enhanced GridSearchCV 

We will use GridSearchCV again to tune a deep learning model; however, we will add some additional functionality to our gridsearch. Specifically, we will automate choosing how many nodes to use in each layer and how many layers to use in the model!

By the way, yes, there is a function within a function. Try not to let that bother you. An alternative to this would be to create a class. If you're up for the challenge, give it a shot. However, consider this a stretch goal that you come back to after going through this assignment. 


### Objective 

This experiment aims to show you how to automate the generation of layers and layer nodes for gridsearch. Up until now, we've been manually selecting the number of layers and layer nodes.

In [None]:
# Function to create model, required for KerasClassifier
def create_model(n_layers, first_layer_nodes, last_layer_nodes, act_funct="relu", negative_node_incrementation=True):
    """
    Returns a compiled keras model 
    
    Parameters
    ----------
    n_layers: int 
        number of hidden layers in the model (must be 2 or greater)
        To be clear, this excludes the input and output layers.
        
    first_layer_nodes: int
        Number of nodes in the first hidden layer 

    last_layer_nodes: int
        Number of nodes in the last hidden layer (this is the layer before the output layer)
        when negative_node_incrementation is True; otherwise it only sets the step size
        
    act_funct: string 
        Name of activation function to use in hidden layers (this excludes the output layer)

    negative_node_incrementation: bool
        If True, the number of nodes decreases linearly from layer to layer;
        if False, it increases linearly by the same step
        
    Returns
    -------
    model: keras object 
    """
    
    def gen_layer_nodes(n_layers, first_layer_nodes, last_layer_nodes, negative_node_incrementation=True):
        """
        Generates and returns the number of nodes in each hidden layer. 
        To be clear, this excludes the input and output layers. 

        Note
        ----
        The number of nodes in each layer is linearly incremented. 
        For example, gen_layer_nodes(5, 500, 100) will generate [500, 400, 300, 200, 100]

        Parameters
        ----------
        n_layers: int
            The number of hidden layers
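            This value must be 2 or greater 

        first_layer_nodes: int
            Number of nodes in the first hidden layer

        last_layer_nodes: int
            Number of nodes in the last hidden layer when negative_node_incrementation is True

        negative_node_incrementation: bool
            If True, decrease the number of nodes layer by layer; if False, increase it
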
        Returns
        -------
        layers: list of ints
            Contains the number of nodes for each layer 
        """

        # throws an error if n_layers is less than 2 
        assert n_layers >= 2, "n_layers must be 2 or greater"

        layers = []

        # PROTIP: IF YOU WANT THE NODE INCREMENTATION TO BE SPACED DIFFERENTLY
        # THEN YOU'LL NEED TO CHANGE THE WAY THAT IT'S CALCULATED - HAVE FUN!
        # when set to True, the number of nodes decreases for subsequent layers 
        if negative_node_incrementation:
            # negative step (assuming first_layer_nodes > last_layer_nodes) that moves towards last_layer_nodes
            nodes_increment = (last_layer_nodes - first_layer_nodes)/ (n_layers-1)
            
        # when set to False, the number of nodes increases for subsequent layers
        else:
            # positive step of the same size, added to the previous layer's nodes
            nodes_increment = (first_layer_nodes - last_layer_nodes)/ (n_layers-1)

        nodes = first_layer_nodes

        for i in range(1, n_layers+1):

            layers.append(math.ceil(nodes))

            # increment nodes for next layer 
            nodes = nodes + nodes_increment

        return layers
    
    # create model
    model = Sequential()
    
    n_nodes = gen_layer_nodes(n_layers, first_layer_nodes, last_layer_nodes, negative_node_incrementation)
    
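    # add one Dense layer per entry in n_nodes, i.e. n_layers hidden layers in total
    # (the previous range(1, n_layers) loop built only n_layers - 1 hidden layers)
    for i, nodes in enumerate(n_nodes):
        if i == 0:
            # the first hidden layer also declares the input dimension
            model.add(Dense(nodes, input_dim=X_train.shape[1], activation=act_funct))
        else:
            model.add(Dense(nodes, activation=act_funct))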
            
            
    # output layer 
    model.add(Dense(10, # 10 units/neurons in the output layer because we have 10 possible labels to predict  
                    activation='softmax')) # softmax gives a probability for each class when there are more than 2 labels            
    
    # Compile model
    model.compile(loss='sparse_categorical_crossentropy', 
                  optimizer='adam', # adam is a good default optimizer 
                  metrics=['accuracy'])
    
    # do not include model.fit() inside the create_model function
    # KerasClassifier is expecting a compiled model 
    return model
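Because `gen_layer_nodes` is nested inside `create_model`, you can't call it on its own to check the layer sizes it produces. The sketch below is a standalone copy of the same logic (`layer_node_schedule` is a hypothetical helper name, not part of the assignment) that reproduces the docstring example without building a Keras model:

```python
import math


def layer_node_schedule(n_layers, first_layer_nodes, last_layer_nodes,
                        negative_node_incrementation=True):
    """Hypothetical standalone copy of the nested gen_layer_nodes logic."""
    assert n_layers >= 2, "n_layers must be 2 or greater"

    # same step calculation as gen_layer_nodes
    if negative_node_incrementation:
        nodes_increment = (last_layer_nodes - first_layer_nodes) / (n_layers - 1)
    else:
        nodes_increment = (first_layer_nodes - last_layer_nodes) / (n_layers - 1)

    # start at first_layer_nodes and add the step once per layer
    layers = []
    nodes = first_layer_nodes
    for _ in range(n_layers):
        layers.append(math.ceil(nodes))
        nodes = nodes + nodes_increment
    return layers


print(layer_node_schedule(5, 500, 100))         # [500, 400, 300, 200, 100]
print(layer_node_schedule(5, 500, 100, False))  # [500, 600, 700, 800, 900]
```

Note that with `negative_node_incrementation=False`, `last_layer_nodes` only sets the step size: the network grows away from `first_layer_nodes` instead of ending at `last_layer_nodes`.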


## Explore `create_model`

Let's build a few different models to understand how the above code works in practice. 

### Build Model 

Use `create_model` to build a model. 

- Set `n_layers = 10` 
- Set `first_layer_nodes = 500`
- Set `last_layer_nodes = 100`
- Set `act_funct = "relu"`
- Make sure that `negative_node_incrementation = True`

In [None]:
# use create_model to create a model 

# YOUR CODE HERE

model = create_model(n_layers=10,
                     first_layer_nodes=500,
                     last_layer_nodes=100,
                     act_funct="relu",
                     negative_node_incrementation=True)


In [None]:
# run model.summary() and make sure that you understand the model architecture that you just built 
# Notice in the model summary how the number of nodes decreases linearly from one hidden layer to the next. 
model.summary()

### Build Model 

Use `create_model` to build a model. 

- Set `n_layers = 10` 
- Set `first_layer_nodes = 500`
- Set `last_layer_nodes = 100`
- Set `act_funct = "relu"`
- Make sure that `negative_node_incrementation = False`

In [None]:
# use create_model to create a model 

# YOUR CODE HERE

model = create_model(n_layers=10,
                     first_layer_nodes=500,
                     last_layer_nodes=100,
                     act_funct="relu",
                     negative_node_incrementation=False)

In [None]:
# run model.summary() and make sure that you understand the model architecture that you just built 
# Notice in the model summary how the number of nodes increases linearly from one hidden layer to the next.
# The output layer must have 10 nodes because there are 10 labels to predict 
model.summary()

In [None]:
# feel free to play around with parameters to gain additional insight as to how the create_model function works 



Ok, now that we've played around a bit with `create_model` to understand how it works, let's build a much simpler model that we'll use for the gridsearches. 

### Build Model 

Use `create_model` to build a model. 

- Set `n_layers = 2` 
- Set `first_layer_nodes = 500`
- Set `last_layer_nodes = 100`
- Set `act_funct = "relu"`
- Make sure that `negative_node_incrementation = True`

In [None]:
# use create_model to create a model 

# YOUR CODE HERE
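model = create_model(n_layers=2,
                     first_layer_nodes=500,
                     last_layer_nodes=100,
                     act_funct="relu",
                     negative_node_incrementation=True)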

In [None]:
# run model.summary() and make sure that you understand the model architecture that you just built 
model.summary()

In [None]:
# define the grid search parameters
param_grid = {'n_layers': [2, 3],
              'epochs': [3], 
              "first_layer_nodes": [500, 300],
              "last_layer_nodes": [100, 50]
             }

In [None]:
model = KerasClassifier(create_model)

In [None]:
# Create Grid Search
grid = GridSearchCV(estimator=model, 
                    param_grid=param_grid, 
                    n_jobs=-2, 
                    verbose=1, 
                    cv=3)

grid_result = grid.fit(X_train, y_train)

# Report Results
print(f"Best: {grid_result.best_score_} using {grid_result.best_params_}")

means = grid_result.cv_results_['mean_test_score']
stds = grid_result.cv_results_['std_test_score']
params = grid_result.cv_results_['params']

for mean, stdev, param in zip(means, stds, params):
    print(f"Means: {mean}, Stdev: {stdev} with: {param}")

In [None]:
best_model = grid_result.best_estimator_

In [None]:
best_model.get_params()

-----

# Experiment 2

## Benchmark Different Optimization Algorithms 

In this section, we are going to use the same model and dataset to benchmark 3 different gridsearch approaches: 

- Random Search
- Bayesian Optimization
- Brute Force Gridsearch

Our goal in this experiment is two-fold. We want to see which approach: 

- Scores the highest accuracy
- Has the shortest run time 

Comparing the 3 approaches on these two criteria should give you a sense of the trade-offs each one makes.

### Trade-Offs

`Brute Force Gridsearch` will train a model on every unique hyperparameter combination; this guarantees that you'll get the highest possible accuracy from your parameter set, but your gridsearch might have a very long run-time.

`Random Search` will randomly sample combinations from your parameter set. Depending on how many combinations you sample, this can cut the run time significantly; however, there is no guarantee that the sample includes the combination with the highest accuracy.

`Bayesian Optimization` uses the results of previous trials to choose which combination to try next, but you must manually set some parameters of the search itself (such as `num_initial_points` and `beta`) that strongly influence which combinations it ends up evaluating.
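To put rough numbers on the Random Search trade-off, the sketch below uses the same search space that this experiment defines further down (16 unit sizes, 3 learning rates, 2 activations) and asks how likely a 25% random sample is to include the best combination. It treats the ranking of combinations as fixed and ignores training noise, which is a simplifying assumption.

```python
import itertools
import math
import random

# Same search space as this experiment: 16 unit sizes x 3 learning rates x 2 activations
units = list(range(32, 513, 32))
learning_rates = [1e-1, 1e-2, 1e-3]
activations = ["relu", "sigmoid"]

# Brute force: every combination is trained
all_combos = list(itertools.product(units, learning_rates, activations))
n_total = len(all_combos)

# Random search: train only a 25% sample of the combinations
n_sampled = n_total // 4
rng = random.Random(1234)
sampled = rng.sample(all_combos, n_sampled)

# chance that the sample contains the single best combination
p_best = n_sampled / n_total

# chance that the sample contains at least one of the top-5 combinations
top_k = 5
p_top_k = 1 - math.comb(n_total - top_k, n_sampled) / math.comb(n_total, n_sampled)

print(n_total, n_sampled, p_best, round(p_top_k, 2))  # 96 24 0.25 0.77
```

So under these assumptions, sampling a quarter of the grid has only a 1-in-4 chance of hitting the single best combination, but roughly a 77% chance of hitting at least one of the five best, which may be good enough when run time matters.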

-------
### Build our model

In [None]:
# because gridsearching can take a lot of time, and we are benchmarking 3 different approaches
# let's build a simple model to minimize run time 

def build_model(hp):
    
    """
    Returns a compiled keras model ready for keras-tuner gridsearch algorithms 
    """
    
    model = Sequential()
    
    # hidden layer
    model.add(Dense(units=hp.get('units'),activation=hp.get("activation")))
    
    # output layer
    model.add(Dense(10, activation='softmax'))
    
    model.compile(
        optimizer=keras.optimizers.Adam(hp.get('learning_rate')),
        loss='sparse_categorical_crossentropy',
        metrics=['accuracy'])
    
    return model

In [None]:
# build out our hyperparameter dictionary 
hp = HyperParameters()
hp.Int('units', min_value=32, max_value=512, step=32)
hp.Choice('learning_rate',values=[1e-1, 1e-2, 1e-3])
hp.Choice('activation',values=["relu", "sigmoid"])

------
# Run the Gridsearch Algorithms 

### Random Search

Be sure to check out the [**docs for Keras-Tuner**](https://keras-team.github.io/keras-tuner/documentation/tuners/). Here you can read about the input parameters for the `RandomSearch` tuner.

In [None]:
# how many unique hyperparameter combinations do we have? 
# HINT: take the product of the number of possible values for each hyperparameter 
# save your answer to n_unique_hparam_combos

# YOUR CODE HERE
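n_units = len(range(32, 512 + 1, 32))  # 32, 64, ..., 512 -> 16 values
n_learning_rates = 3                    # 1e-1, 1e-2, 1e-3
n_activations = 2                       # relu, sigmoid
n_unique_hparam_combos = n_units * n_learning_rates * n_activations
n_unique_hparam_combos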

In [None]:
# how many of these do we want to randomly sample?
# let's pick 25% of n_unique_hparam_combos param combos to sample
# save this number to n_param_combos_to_sample

# YOUR CODE HERE
n_param_combos_to_sample = int(0.25 * n_unique_hparam_combos)
n_param_combos_to_sample

In [None]:
random_tuner = RandomSearch(
            build_model,
            objective='val_accuracy',
            max_trials=n_param_combos_to_sample, # number of times to sample the parameter set and build a model 
            seed=1234,
            hyperparameters=hp, # pass in our hyperparameter dictionary
            directory='./keras-tuner-trial',
            project_name='random_search')

In [None]:
# take note of Total elapsed time in print out
random_tuner.search(X_train, y_train,
                    epochs=3,
                    validation_data=(X_test, y_test))

In [None]:
# identify the best score and hyperparameter (should be at the top since scores are ranked)
random_tuner.results_summary()

 ### Results
 
Identify and write the best performing hyperparameter combination and model score. Note that because this is a random search, multiple runs might have slightly different outcomes.
 
 

YOUR ANSWER HERE

------
### Bayesian Optimization

![](https://upload.wikimedia.org/wikipedia/commons/0/02/GpParBayesAnimationSmall.gif)

Be sure to check out the [**docs for Keras-Tuner**](https://keras-team.github.io/keras-tuner/documentation/tuners/). Here you can read about the input parameters for the `BayesianOptimization` tuner.

Pay special attention to these `BayesianOptimization` parameters: `num_initial_points` and `beta`. 

`num_initial_points`: 

Number of randomly selected hyperparameter combinations to try before the tuner fits a probabilistic model (a Gaussian process) to the results so far and uses it to decide which combination to try next.


`beta`: 

Balances exploration against exploitation. Larger values make the tuner more willing to try hyperparameter combinations in regions it knows little about (exploration), loosely analogous to escaping a local minimum in gradient descent. Smaller values make it favor combinations close to those that have already scored well (exploitation), which risks settling on a local optimum, analogous to getting stuck in a local minimum in gradient descent. 

As a start, err on the side of larger values. What defines a small or large value, you ask? That question would pull us into the mathematical intricacies of Bayesian Optimization and Gaussian Processes. For simplicity, notice that the default value is 2.6 and work from there. 
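To make `beta` less abstract: to our understanding, keras-tuner's `BayesianOptimization` scores untried combinations with an upper-confidence-bound rule, the Gaussian process's predicted mean plus `beta` times its predicted standard deviation (for an objective that is maximized, such as `val_accuracy`). The sketch below applies that rule to three made-up predictions; the numbers and the `ucb_pick` helper are hypothetical, not keras-tuner code.

```python
import numpy as np


def ucb_pick(means, stds, beta):
    """Return the index of the candidate with the highest upper confidence bound."""
    scores = np.asarray(means) + beta * np.asarray(stds)
    return int(np.argmax(scores))


# hypothetical Gaussian-process predictions of val_accuracy for 3 untried combos
means = [0.80, 0.74, 0.60]  # predicted accuracy
stds = [0.01, 0.05, 0.10]   # uncertainty about each prediction

print(ucb_pick(means, stds, beta=0.5))  # 0: a small beta exploits the best known region
print(ucb_pick(means, stds, beta=2.6))  # 1: the default value balances the two
print(ucb_pick(means, stds, beta=5.0))  # 2: a large beta explores the most uncertain combo
```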

In [None]:
# 24 samples is exactly 25% of the 96 possible hyperparameter combos
# because BO stops sampling at random after the first num_initial_points trials, let's see if 15 max trials gives good results
# feel free to play with any of these numbers
max_trials=15
num_initial_points=5
beta=5.0

In [None]:
bayesian_tuner = BayesianOptimization(
                    build_model,
                    objective='val_accuracy',
                    max_trials=max_trials,
                    hyperparameters=hp, # pass in our hyperparameter dictionary
                    num_initial_points=num_initial_points, 
                    beta=beta, 
                    seed=1234,
                    directory='./keras-tuner-trial',
                    project_name='bayesian_optimization_4')

In [None]:
bayesian_tuner.search(X_train, y_train,
               epochs=3,
               validation_data=(X_test, y_test))

In [None]:
bayesian_tuner.results_summary()

 ### Results
 
Identify and write the best performing hyperparameter combination and model score. Note that because this is Bayesian Optimization, multiple runs might have slightly different outcomes.
 
 

YOUR ANSWER HERE

---------
## Brute Force Gridsearch Optimization


### Populate a Sklearn Compatible Parameter Dictionary

In [None]:
# build out our hyperparameter dictionary 
hyper_parameters = {
    # BUG Fix: cast array as list otherwise GridSearchCV will throw error
    "units": np.arange(32, 544, 32).tolist(),
    "learning_rate": [1e-1, 1e-2, 1e-3],
    "activation":["relu", "sigmoid"]
}

In [None]:
hyper_parameters
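Before launching the brute-force search, it helps to count how many models it will train. `GridSearchCV` fits one model per combination per CV fold, plus one final refit of the best combination on all of `X_train` (its default is `refit=True`). A quick sketch of that arithmetic for the grid above, using a copy of the dictionary named `grid_space` so the notebook's `hyper_parameters` is left untouched:

```python
import math

# copy of the notebook's grid: units 32, 64, ..., 512 (16 values)
grid_space = {
    "units": list(range(32, 544, 32)),
    "learning_rate": [1e-1, 1e-2, 1e-3],
    "activation": ["relu", "sigmoid"],
}
cv = 3

# every combination of values is a candidate
n_candidates = math.prod(len(values) for values in grid_space.values())

# one fit per (candidate, fold), plus the final refit of the best candidate
n_fits = n_candidates * cv + 1

print(n_candidates, n_fits)  # 96 289
```

Compare that with the 24 Random Search trials and 15 Bayesian Optimization trials discussed above, each of which trains one model by default.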

### Build a Sklearn Compatible Model Function

In [None]:
def build_model(units, learning_rate, activation):
    
    """
    Returns a compiled keras model ready for sklearn's GridSearchCV (via KerasClassifier)
    """
    
    model = Sequential()
    
    # hidden layer
    model.add(Dense(units, activation=activation))
    
    # output layer
    model.add(Dense(10, activation='softmax'))
    
    model.compile(
        optimizer=Adam(learning_rate),
        loss='sparse_categorical_crossentropy',
        metrics=['accuracy'])
    
    return model

In [None]:
# epochs=3 matches the keras-tuner searches; without it, fit() trains for only 1 epoch
model = KerasClassifier(build_fn=build_model, epochs=3)

In [None]:
# save start time 
start = time()

# Create Grid Search
grid = GridSearchCV(estimator=model, 
                    param_grid=hyper_parameters, 
                    n_jobs=-2, 
                    verbose=1, 
                    cv=3)

grid_result = grid.fit(X_train, y_train)

# save end time 
end = time()

# Report Results
print(f"Best: {grid_result.best_score_} using {grid_result.best_params_}")

means = grid_result.cv_results_['mean_test_score']
stds = grid_result.cv_results_['std_test_score']
params = grid_result.cv_results_['params']

for mean, stdev, param in zip(means, stds, params):
    print(f"Means: {mean}, Stdev: {stdev} with: {param}")

In [None]:
# total run time 
total_run_time_in_minutes = (end - start)/60
total_run_time_in_minutes

In [None]:
grid_result.best_params_

In [None]:
# because all other optimization approaches are reporting test set score
# let's calculate the test set score in this case 
best_model = grid_result.best_estimator_
test_acc = best_model.score(X_test, y_test)

In [None]:
test_acc

 ### Results
 
Identify and write the best performing hyperparameter combination and model score.
 
 

YOUR ANSWER HERE

_______

# Conclusion

The spirit of this experiment is to expose you to the idea of benchmarking and comparing the trade-offs of various gridsearch approaches. 

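Keep in mind that the comparison is not perfectly apples-to-apples: GridSearchCV scores each combination with cross-validation on the training set, while the keras-tuner searches score on the test set. Even if we found a way to pass the original test set into GridSearchCV, Random Search and Bayesian Optimization would arguably remain better alternatives to a brute-force grid search, because they train only a fraction of the models. 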
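For the curious: one way to pass the original test set into `GridSearchCV` is scikit-learn's `PredefinedSplit`, which takes a `test_fold` array where `-1` marks rows that are only ever used for training and `0` marks rows in the single validation fold. A minimal sketch, where the helper name `make_holdout_split` is hypothetical:

```python
import numpy as np
from sklearn.model_selection import PredefinedSplit


def make_holdout_split(X_train, y_train, X_val, y_val):
    """Hypothetical helper: stack train and validation data behind one predefined split."""
    X = np.concatenate([X_train, X_val])
    y = np.concatenate([y_train, y_val])

    # -1: always in the training set; 0: belongs to validation fold 0
    test_fold = np.concatenate([
        np.full(len(X_train), -1),
        np.zeros(len(X_val), dtype=int),
    ])
    return X, y, PredefinedSplit(test_fold)


# tiny stand-in arrays instead of the Quickdraw data
X, y, split = make_holdout_split(np.ones((6, 2)), np.zeros(6),
                                 np.ones((2, 2)), np.ones(2))
train_idx, val_idx = next(split.split())
print(split.get_n_splits(), train_idx, val_idx)  # 1 [0 1 2 3 4 5] [6 7]
```

You would then pass `cv=split` to `GridSearchCV` and call `grid.fit(X, y)`. With the default `refit=True`, the best model is retrained on the stacked training and validation rows, so those rows are no longer unseen afterwards.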

----

# Stretch Goals

- Feel free to run whatever gridsearch experiments on whatever models you like!

In [None]:
# this is your open playground - feel free to explore as you wish 