## Hyperparameter Tuning

In this exercise you will build a neural network and tune its **model parameters** to find the combination with which the model performs best.

You will use:

1. `Grid Search`
2. `Random Search`


### 1. Import the Packages

In [1]:
import os
import pandas as pd
import wrangle as wr
from numpy import nan

from keras.utils import to_categorical
from keras.wrappers.scikit_learn import KerasClassifier
from keras.models import Sequential
from keras.layers import Dense, Dropout, Flatten

from sklearn.model_selection import GridSearchCV
from sklearn.model_selection import RandomizedSearchCV


Using TensorFlow backend.


In [2]:
#Read the dataset with pandas
df = pd.read_csv('data2.csv')

### 2. Basic Data Cleaning


1.   Drop the `Unnamed: 32` and `id` columns.
2.   Use the `diagnosis` column as the labels (y) and the remaining columns as the features (x).

**Note:**

Convert the labels to 0 and 1, where 1 corresponds to M (malignant) and 0 corresponds to B (benign).
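`y.replace(['M','B'],[1,0])` in the cell below leaves any value it does not list unchanged, so an unexpected label (say, a lowercase `'m'`) would pass through silently. A minimal stdlib sketch of a stricter encoding, assuming the labels arrive as a plain list of strings; `encode_diagnosis` and `LABEL_MAP` are hypothetical names, not part of the exercise:

```python
# Hypothetical helper: maps diagnosis strings to integer labels,
# malignant "M" -> 1 and benign "B" -> 0, and rejects anything else.
LABEL_MAP = {"M": 1, "B": 0}

def encode_diagnosis(labels):
    """Encode diagnosis strings as 0/1, raising on unexpected values."""
    unknown = sorted(set(labels) - set(LABEL_MAP))
    if unknown:
        raise ValueError(f"unexpected diagnosis labels: {unknown}")
    return [LABEL_MAP[label] for label in labels]

print(encode_diagnosis(["M", "B", "B", "M"]))  # [1, 0, 0, 1]
```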



In [3]:
def breast_cancer(df):
    '''Load and preprocess(cleaning) the dataset
    Input: Dataframe
    Output: x,y
    x:Features
    y:Labels in form of 0 and 1
    '''
    df=df.drop(['id','Unnamed: 32'],axis=1)
    x=df.drop('diagnosis',axis=1)
    y=df['diagnosis']
    y=y.replace(['M','B'],[1,0])
    return x, y

In [4]:
# Call the data-cleaning function
x, y = breast_cancer(df)

# Normalize every feature in x to mean 0, std 1 with wrangle's df_rescale_meanzero function
x = wr.df_rescale_meanzero(x)

# Initialise the input feature dimension
input_dim = x.shape[1]
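`wr.df_rescale_meanzero` comes from the local `wrangle` module, whose source is not shown here. Assuming it performs ordinary z-score standardization, each column is transformed as (value − mean) / std. The sketch below is a pure-Python stand-in for a single column; `rescale_meanzero` is a hypothetical name, and whether `wrangle` uses the population or the sample standard deviation is an assumption.

```python
import statistics

def rescale_meanzero(column):
    """Standardize one feature column to mean 0 and standard deviation 1."""
    mean = statistics.mean(column)
    std = statistics.pstdev(column)  # population std; the wrangle helper may differ
    if std == 0:
        return [0.0 for _ in column]  # constant column: nothing to scale
    return [(value - mean) / std for value in column]

scaled = rescale_meanzero([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0])
print(scaled)  # [-1.5, -0.5, -0.5, -0.5, 0.0, 0.0, 1.0, 2.0]
```

Because the scaling is computed on all of `x` before the cross-validation below, each test fold has slightly influenced the mean and std used to scale it; for a teaching exercise this is usually tolerated, but it is a mild form of leakage.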

### 3. Decide on the Parameters to Be Tuned and Create the Model
We will create a two-layer neural network.

In this example we tune only the model parameters; the hyperparameters stay fixed at a single value each and can be tuned in a later exercise.


**Model Parameters to be tuned**
1. `first_neurons`: number of neurons in the first layer
2. `activation`: activation function of the first layer
3. `kernel_initializer`: weight initializer used in both layers
4. `optimizer`: optimizer used when compiling the model


 
**Hyperparameters**
1. `epochs`
2. `batch_size`
3. `dropout_rate`

 
----------------------------------------------------------------
**Create a list of candidate values for each parameter**
1. `first_neurons` with values 8 and 9
2. `activation` with values `relu` and `tanh`
3. `kernel_initializer` with values `uniform` and `he_uniform`
4. `optimizer` with values `Adam` and `SGD`



**Note: Make sure to initialize the values in the same order**

In [5]:
# Model Design Components
first_neurons = [8,9]
activation =  ['relu','tanh']
kernel_initializer = ['uniform','he_uniform']
optimizer = ['Adam','SGD']


# Hyperparameters
epochs = [10]
batch_size = [1024]
dropout_rate = [0.0]
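An exhaustive grid search tries every combination of these lists, so its size is the product of the list lengths: 2 × 2 × 2 × 2 × 1 × 1 × 1 = 16 candidates, or 48 fits with 3-fold cross-validation, which matches the "16 candidates, totalling 48 fits" line in the grid-search log. A stdlib sketch of that count (`search_space` is a hypothetical copy of the lists above, keyed by the argument names used later):

```python
from itertools import product
from math import prod

# Candidate values copied from the cell above.
search_space = {
    "first_neuron": [8, 9],
    "activation": ["relu", "tanh"],
    "kernel_initializer": ["uniform", "he_uniform"],
    "optimizer": ["Adam", "SGD"],
    "epochs": [10],
    "batch_size": [1024],
    "dropout_rate": [0.0],
}

n_candidates = prod(len(values) for values in search_space.values())
cv = 3
print(n_candidates, n_candidates * cv)  # 16 48

# Enumerate every combination, as an exhaustive grid search would.
keys = list(search_space)
combos = [dict(zip(keys, values)) for values in product(*search_space.values())]
```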

### 4. Create the Model

In [6]:


# Function to create model, required for KerasClassifier
def create_model(first_neuron=9, activation='relu', kernel_initializer='uniform',
                 dropout_rate=0, optimizer='Adam'):
    '''
    Input: model parameters and hyperparameters to be tuned
    Output: compiled model
    '''
    # 1. Create a sequential model
    model = Sequential()

    # 2. Add the first Dense layer; first_neuron, kernel_initializer and
    #    activation come from the function arguments, input_dim from the
    #    global computed earlier
    model.add(Dense(first_neuron, input_dim=input_dim,
                    kernel_initializer=kernel_initializer, activation=activation))

    # 3. Add a Dropout layer with rate dropout_rate from the function arguments
    model.add(Dropout(dropout_rate))

    # 4. Add the output Dense layer: 1 neuron, kernel_initializer from the
    #    function arguments, sigmoid activation for binary classification
    model.add(Dense(1, kernel_initializer=kernel_initializer, activation='sigmoid'))

    # 5. Compile with binary cross-entropy loss, the optimizer from the
    #    function arguments, and accuracy as the metric
    model.compile(loss='binary_crossentropy', optimizer=optimizer, metrics=['accuracy'])

    return model
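The network built by `create_model` is small. A hand count of its trainable weights, assuming `input_dim` is 30 (the usual number of feature columns in this breast cancer dataset once `id`, `Unnamed: 32` and `diagnosis` are removed; the notebook computes the real value from `x.shape[1]`). `count_params` is a hypothetical helper; in Keras, `model.summary()` or `model.count_params()` should report the same totals for the real model.

```python
def count_params(input_dim, first_neuron):
    """Trainable weights of Dense(first_neuron) -> Dropout -> Dense(1)."""
    hidden = input_dim * first_neuron + first_neuron  # weights + biases
    output = first_neuron * 1 + 1                     # one sigmoid unit + bias
    return hidden + output                            # Dropout adds no weights

for neurons in (8, 9):
    print(neurons, count_params(30, neurons))
# 8 257
# 9 289
```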

### 5. Create a Keras Classifier

In [7]:
model = KerasClassifier(build_fn=create_model)
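`KerasClassifier` accepts tuning parameters by name and, roughly, passes those matching `create_model`'s arguments to `build_fn` while forwarding training arguments such as `epochs` and `batch_size` to `fit`. This is why a single `param_grid` can mix both kinds, and why the grid key must be `first_neuron` (the argument name) rather than `first_neurons` (the list name). A simplified stdlib sketch of that routing with `inspect.signature`; `split_params` and `create_model_stub` are hypothetical, and the real wrapper also validates each name:

```python
import inspect

def split_params(build_fn, params):
    """Separate tunable params into build_fn arguments and everything else."""
    build_args = set(inspect.signature(build_fn).parameters)
    to_build = {k: v for k, v in params.items() if k in build_args}
    to_fit = {k: v for k, v in params.items() if k not in build_args}
    return to_build, to_fit

# Stand-in with the same signature as create_model (body omitted).
def create_model_stub(first_neuron=9, activation="relu", kernel_initializer="uniform",
                      dropout_rate=0, optimizer="Adam"):
    return None

candidate = {"first_neuron": 8, "activation": "relu", "kernel_initializer": "uniform",
             "optimizer": "Adam", "dropout_rate": 0.0, "epochs": 10, "batch_size": 1024}
build_kwargs, fit_kwargs = split_params(create_model_stub, candidate)
print(sorted(fit_kwargs))  # ['batch_size', 'epochs']
```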


### 6. Hyperparameter Tuning 1 - Grid Search
1. Create a GridSearchCV model with the parameters
    - `estimator=model`
    - `param_grid=param_grid`
    - `n_jobs=1`
    - `cv=3`
    - `verbose=2`

2. Fit the model with `x` and `y`
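Conceptually, `GridSearchCV` loops over every combination in `param_grid`, splits the data into `cv` folds, trains on all but one fold, scores on the held-out fold, and reports the mean and standard deviation of those scores; with `refit=True` (the default) it then retrains the best combination on all of `x` and `y`. The stdlib sketch below shows only the loop, with a hypothetical `toy_score` in place of training a network. It is simplified: scikit-learn may use stratified folds for classifiers, and older versions with `iid=True` weight the mean by test-fold size.

```python
from itertools import product
from statistics import mean, pstdev

def kfold_indices(n_samples, k):
    """Yield (train, test) index lists for k contiguous, unshuffled folds."""
    # The first n_samples % k folds get one extra sample, as in scikit-learn's KFold.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test = list(range(start, start + size))
        train = [i for i in range(n_samples) if not start <= i < start + size]
        yield train, test
        start += size

def grid_search(param_grid, score_fn, n_samples, k=3):
    """Cross-validate every combination; return (best_params, results)."""
    keys = list(param_grid)
    results = []
    for values in product(*param_grid.values()):
        params = dict(zip(keys, values))
        scores = [score_fn(params, train, test)
                  for train, test in kfold_indices(n_samples, k)]
        # Unweighted mean and population std of the fold scores.
        results.append((mean(scores), pstdev(scores), params))
    best = max(results, key=lambda row: row[0])
    return best[2], results

# Hypothetical scorer standing in for "build, fit on train, score on test".
def toy_score(params, train_idx, test_idx):
    good = params["optimizer"] == "Adam" and params["first_neuron"] == 9
    return 0.9 if good else 0.6

best, results = grid_search({"first_neuron": [8, 9], "optimizer": ["Adam", "SGD"]},
                            toy_score, n_samples=10, k=3)
print(best)  # {'first_neuron': 9, 'optimizer': 'Adam'}
```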
  

In [8]:
#parameter grid
param_grid = dict(epochs=epochs, 
                  batch_size=batch_size, 
                  optimizer=optimizer,
                  dropout_rate=dropout_rate,
                  activation=activation,
                  kernel_initializer=kernel_initializer,
                  first_neuron=first_neurons)

In [9]:
# Create the GridSearchCV model
grid = GridSearchCV(estimator=model, param_grid=param_grid, n_jobs=1, cv=3, verbose=2)

#Fit the model and return the result
grid_result = grid.fit(x, y)

Fitting 3 folds for each of 16 candidates, totalling 48 fits
[CV] activation=relu, kernel_initializer=uniform, optimizer=Adam, epochs=10, batch_size=1024, first_neuron=8, dropout_rate=0.0 
Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10
[CV]  activation=relu, kernel_initializer=uniform, optimizer=Adam, epochs=10, batch_size=1024, first_neuron=8, dropout_rate=0.0, total=   0.8s
[CV] activation=relu, kernel_initializer=uniform, optimizer=Adam, epochs=10, batch_size=1024, first_neuron=8, dropout_rate=0.0 


[Parallel(n_jobs=1)]: Done   1 out of   1 | elapsed:    0.8s remaining:    0.0s


Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10
[CV]  activation=relu, kernel_initializer=uniform, optimizer=Adam, epochs=10, batch_size=1024, first_neuron=8, dropout_rate=0.0, total=   0.7s
[CV] activation=relu, kernel_initializer=uniform, optimizer=Adam, epochs=10, batch_size=1024, first_neuron=8, dropout_rate=0.0 
Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10
[CV]  activation=relu, kernel_initializer=uniform, optimizer=Adam, epochs=10, batch_size=1024, first_neuron=8, dropout_rate=0.0, total=   0.9s
[CV] activation=relu, kernel_initializer=uniform, optimizer=SGD, epochs=10, batch_size=1024, first_neuron=8, dropout_rate=0.0 
Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10
[CV]  activation=relu, kernel_initializer=uniform, optimizer=SGD, epochs=10, batch_size=1024, first_neuron=8, dropout_rate=0.0

[Parallel(n_jobs=1)]: Done  48 out of  48 | elapsed:   32.7s finished


Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10


In [10]:
#Print the Best Params
print("Best: %f using %s" % (grid_result.best_score_, grid_result.best_params_))

#Explore the others
means = grid_result.cv_results_['mean_test_score']
stds = grid_result.cv_results_['std_test_score']
params = grid_result.cv_results_['params']
for mean, stdev, param in zip(means, stds, params):
    print("%f (%f) with: %r" % (mean, stdev, param))

Best: 0.929701 using {'activation': 'tanh', 'kernel_initializer': 'uniform', 'optimizer': 'Adam', 'epochs': 10, 'batch_size': 1024, 'first_neuron': 9, 'dropout_rate': 0.0}
0.906854 (0.053674) with: {'activation': 'relu', 'kernel_initializer': 'uniform', 'optimizer': 'Adam', 'epochs': 10, 'batch_size': 1024, 'first_neuron': 8, 'dropout_rate': 0.0}
0.641476 (0.126326) with: {'activation': 'relu', 'kernel_initializer': 'uniform', 'optimizer': 'SGD', 'epochs': 10, 'batch_size': 1024, 'first_neuron': 8, 'dropout_rate': 0.0}
0.699473 (0.132056) with: {'activation': 'relu', 'kernel_initializer': 'he_uniform', 'optimizer': 'Adam', 'epochs': 10, 'batch_size': 1024, 'first_neuron': 8, 'dropout_rate': 0.0}
0.527241 (0.100949) with: {'activation': 'relu', 'kernel_initializer': 'he_uniform', 'optimizer': 'SGD', 'epochs': 10, 'batch_size': 1024, 'first_neuron': 8, 'dropout_rate': 0.0}
0.868190 (0.079504) with: {'activation': 'relu', 'kernel_initializer': 'uniform', 'optimizer': 'Adam', 'epochs': 10,
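The printed list follows the grid's enumeration order, not the score order. A small helper that sorts the rows by mean score makes the ranking easier to read; `top_k` is hypothetical and works on any dict shaped like `cv_results_` (for example `top_k(grid_result.cv_results_)`). With only three folds the standard deviations are large (0.05 to 0.13 in the rows shown), so the gap between the best score of 0.929701 and a runner-up like 0.906854 is smaller than one standard deviation and may not be meaningful.

```python
def top_k(cv_results, k=3):
    """Return the k best (mean, std, params) rows, highest mean score first."""
    rows = zip(cv_results["mean_test_score"],
               cv_results["std_test_score"],
               cv_results["params"])
    return sorted(rows, key=lambda row: row[0], reverse=True)[:k]

# The first four rows printed above (params shortened for readability).
results = {
    "mean_test_score": [0.906854, 0.641476, 0.699473, 0.527241],
    "std_test_score": [0.053674, 0.126326, 0.132056, 0.100949],
    "params": [
        {"activation": "relu", "kernel_initializer": "uniform", "optimizer": "Adam"},
        {"activation": "relu", "kernel_initializer": "uniform", "optimizer": "SGD"},
        {"activation": "relu", "kernel_initializer": "he_uniform", "optimizer": "Adam"},
        {"activation": "relu", "kernel_initializer": "he_uniform", "optimizer": "SGD"},
    ],
}
for mean_score, std_score, params in top_k(results, k=2):
    print(f"{mean_score:.6f} ({std_score:.6f}) with: {params}")
```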

### 7. Hyperparameter Tuning 2 - Randomized Search
1. Create a RandomizedSearchCV model with the parameters
    - `estimator=model`
    - `param_distributions=param_dist`
    - `n_iter=8`
    - `n_jobs=1`
    - `cv=3`
    - `verbose=2`

2. Fit the model with `x` and `y`
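Because every entry in `param_dist` is a list, scikit-learn documents that `RandomizedSearchCV` samples candidates without replacement from the implied grid, so `n_iter=8` tests 8 of the 16 distinct combinations, half the fits of the grid search. The stdlib sketch below mimics that behaviour; `sample_candidates` is hypothetical and does not reproduce scikit-learn's exact draws. The `random_state=None` visible in the fitted object's repr means each run can pick a different subset; passing an integer `random_state` makes the selection reproducible.

```python
import random
from itertools import product

def sample_candidates(param_lists, n_iter, seed=None):
    """Draw n_iter distinct combinations from a grid of lists, without replacement."""
    keys = list(param_lists)
    grid = [dict(zip(keys, values)) for values in product(*param_lists.values())]
    if n_iter > len(grid):
        raise ValueError(f"n_iter={n_iter} exceeds the {len(grid)} available combinations")
    return random.Random(seed).sample(grid, n_iter)

# Same candidate lists as the grid search above.
param_lists = {
    "first_neuron": [8, 9],
    "activation": ["relu", "tanh"],
    "kernel_initializer": ["uniform", "he_uniform"],
    "optimizer": ["Adam", "SGD"],
    "epochs": [10],
    "batch_size": [1024],
    "dropout_rate": [0.0],
}
candidates = sample_candidates(param_lists, n_iter=8, seed=0)
print(len(candidates))  # 8 of the 16 combinations
```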
  

In [11]:
param_dist = dict(epochs=epochs, 
                  batch_size=batch_size, 
                  optimizer=optimizer,
                  dropout_rate=dropout_rate,
                  activation=activation,
                  kernel_initializer=kernel_initializer,
                  first_neuron=first_neurons)

In [12]:
# Create the RandomizedSearchCV model
random_search = RandomizedSearchCV(estimator=model, param_distributions=param_dist,
                                   n_iter=8, n_jobs=1, cv=3, verbose=2)

#Fit the model
random_search.fit(x,y)

Fitting 3 folds for each of 8 candidates, totalling 24 fits
[CV] activation=tanh, kernel_initializer=uniform, optimizer=SGD, epochs=10, batch_size=1024, first_neuron=9, dropout_rate=0.0 
Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10
[CV]  activation=tanh, kernel_initializer=uniform, optimizer=SGD, epochs=10, batch_size=1024, first_neuron=9, dropout_rate=0.0, total=   0.5s
[CV] activation=tanh, kernel_initializer=uniform, optimizer=SGD, epochs=10, batch_size=1024, first_neuron=9, dropout_rate=0.0 


[Parallel(n_jobs=1)]: Done   1 out of   1 | elapsed:    0.5s remaining:    0.0s


Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10
[CV]  activation=tanh, kernel_initializer=uniform, optimizer=SGD, epochs=10, batch_size=1024, first_neuron=9, dropout_rate=0.0, total=   0.6s
[CV] activation=tanh, kernel_initializer=uniform, optimizer=SGD, epochs=10, batch_size=1024, first_neuron=9, dropout_rate=0.0 
Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10
[CV]  activation=tanh, kernel_initializer=uniform, optimizer=SGD, epochs=10, batch_size=1024, first_neuron=9, dropout_rate=0.0, total=   0.5s
[CV] activation=relu, kernel_initializer=he_uniform, optimizer=SGD, epochs=10, batch_size=1024, first_neuron=8, dropout_rate=0.0 
Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10
[CV]  activation=relu, kernel_initializer=he_uniform, optimizer=SGD, epochs=10, batch_size=1024, first_neuron=8, dropout_rate=

[Parallel(n_jobs=1)]: Done  24 out of  24 | elapsed:   15.9s finished


Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10


RandomizedSearchCV(cv=3, error_score='raise',
          estimator=<keras.wrappers.scikit_learn.KerasClassifier object at 0x7f2b6a6b01d0>,
          fit_params=None, iid=True, n_iter=8, n_jobs=1,
          param_distributions={'activation': ['relu', 'tanh'], 'kernel_initializer': ['uniform', 'he_uniform'], 'optimizer': ['Adam', 'SGD'], 'epochs': [10], 'batch_size': [1024], 'first_neuron': [8, 9], 'dropout_rate': [0.0]},
          pre_dispatch='2*n_jobs', random_state=None, refit=True,
          return_train_score='warn', scoring=None, verbose=2)

In [13]:
# Print the best params 
print("Best: %f using %s" % (random_search.best_score_, random_search.best_params_))


#Explore the others
means = random_search.cv_results_['mean_test_score']
stds = random_search.cv_results_['std_test_score']
params = random_search.cv_results_['params']
for mean, stdev, param in zip(means, stds, params):
    print("%f (%f) with: %r" % (mean, stdev, param))

Best: 0.898067 using {'activation': 'tanh', 'kernel_initializer': 'uniform', 'optimizer': 'Adam', 'epochs': 10, 'batch_size': 1024, 'first_neuron': 8, 'dropout_rate': 0.0}
0.662566 (0.168912) with: {'activation': 'tanh', 'kernel_initializer': 'uniform', 'optimizer': 'SGD', 'epochs': 10, 'batch_size': 1024, 'first_neuron': 9, 'dropout_rate': 0.0}
0.514938 (0.221873) with: {'activation': 'relu', 'kernel_initializer': 'he_uniform', 'optimizer': 'SGD', 'epochs': 10, 'batch_size': 1024, 'first_neuron': 8, 'dropout_rate': 0.0}
0.766257 (0.098833) with: {'activation': 'tanh', 'kernel_initializer': 'he_uniform', 'optimizer': 'Adam', 'epochs': 10, 'batch_size': 1024, 'first_neuron': 8, 'dropout_rate': 0.0}
0.898067 (0.027806) with: {'activation': 'tanh', 'kernel_initializer': 'uniform', 'optimizer': 'Adam', 'epochs': 10, 'batch_size': 1024, 'first_neuron': 8, 'dropout_rate': 0.0}
0.692443 (0.027378) with: {'activation': 'relu', 'kernel_initializer': 'he_uniform', 'optimizer': 'Adam', 'epochs': 

### Save your answers by running the cell below

In [14]:
import pickle
with open('grid1.pkl', 'wb') as handle:
  pickle.dump(grid.param_grid, handle, protocol=pickle.HIGHEST_PROTOCOL)
with open('grid2.pkl', 'wb') as handle:
  pickle.dump(grid.n_jobs, handle, protocol=pickle.HIGHEST_PROTOCOL)
with open('grid3.pkl', 'wb') as handle:
  pickle.dump((grid.classes_).tolist(), handle, protocol=pickle.HIGHEST_PROTOCOL)

with open('ran1.pkl', 'wb') as handle:
  pickle.dump(random_search.param_distributions, handle, protocol=pickle.HIGHEST_PROTOCOL)
with open('ran2.pkl', 'wb') as handle:
  pickle.dump(random_search.n_iter, handle, protocol=pickle.HIGHEST_PROTOCOL)
with open('ran3.pkl', 'wb') as handle:
  pickle.dump(random_search.n_splits_, handle, protocol=pickle.HIGHEST_PROTOCOL)

save_model=create_model()
save_model.save('model.h5')
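The cell above writes its answers with `pickle.dump`; reading one back uses `pickle.load` on a file opened in `'rb'` mode. A stdlib round-trip sketch, using a temporary directory so nothing in the working folder is overwritten (`save_and_reload` is a hypothetical helper). Note also that `create_model()` called without arguments builds a fresh, untrained network with the default parameters; if the tuned network is wanted instead, the refitted best estimator from the grid search (presumably reachable as `grid.best_estimator_.model` in the Keras wrapper) is the one to save, and `keras.models.load_model('model.h5')` reloads an `.h5` file.

```python
import os
import pickle
import tempfile

def save_and_reload(obj, filename):
    """Pickle obj into a temporary directory and load it back."""
    with tempfile.TemporaryDirectory() as tmpdir:
        path = os.path.join(tmpdir, filename)
        with open(path, "wb") as handle:
            pickle.dump(obj, handle, protocol=pickle.HIGHEST_PROTOCOL)
        with open(path, "rb") as handle:
            return pickle.load(handle)

param_grid = {"first_neuron": [8, 9], "optimizer": ["Adam", "SGD"]}
print(save_and_reload(param_grid, "grid1.pkl") == param_grid)  # True
```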


Don't stop your learning! Tune more to explore more.

1. Tune the activations with other values such as 'sigmoid', 'hard_sigmoid', 'linear', etc.
2. Tune the kernel initializers with values such as 'normal' and 'zero'.
3. Tune the optimizers with 'RMSprop', 'Adamax', etc.
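These suggestions grow the grid quickly. Assuming `first_neurons` stays at [8, 9] and the suggested values are added to the current lists, the exhaustive grid jumps from 16 to 160 candidates; since 48 fits took 32.7 s above, 480 fits would plausibly take around five minutes. The sketch counts the grid and uses the standard hypergeometric formula, P = 1 − C(N − m, n) / C(N, n), for the chance that a random search of `n` candidates includes at least one of the `m` best. `extended` and `p_hit` are hypothetical names.

```python
from math import comb, prod

# Hypothetical extended search space built from the suggestions above.
extended = {
    "first_neuron": [8, 9],
    "activation": ["relu", "tanh", "sigmoid", "hard_sigmoid", "linear"],
    "kernel_initializer": ["uniform", "he_uniform", "normal", "zero"],
    "optimizer": ["Adam", "SGD", "RMSprop", "Adamax"],
}
n_total = prod(len(values) for values in extended.values())
print(n_total, n_total * 3)  # 160 480

def p_hit(n_total, n_good, n_iter):
    """Chance that n_iter draws without replacement include >= 1 of n_good candidates."""
    return 1 - comb(n_total - n_good, n_iter) / comb(n_total, n_iter)

# If the best 5% (8 of 160) were all acceptable, 20 random candidates
# would include at least one of them roughly two times out of three.
print(round(p_hit(n_total, 8, 20), 3))
```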

