# GridSearch and RandomSearch for Optimizing Model HyperParameters
Two of the easiest ways to look for better hyperparameters are grid search and random search. Both take a grid (a dictionary) of candidate values and evaluate combinations of them with cross-validation. In regression we do not need to worry much about how the splits are made; for a classifier, the splits should usually take the classes into account (stratified splits).
Grid search tries every combination in the grid, whereas random search tries only a specified number (n_iter) of random combinations.
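
As a rough illustration of the difference, the sketch below enumerates a full grid and then draws a fixed number of random combinations using only the standard library. The helper names (`grid_combinations`, `random_combinations`) are made up for this example; scikit-learn's search classes do the equivalent bookkeeping internally and also run the cross-validation.

```python
import itertools
import random

def grid_combinations(param_grid):
    """Return every combination of the listed values (what a grid search tries)."""
    names = sorted(param_grid)
    value_lists = [param_grid[name] for name in names]
    return [dict(zip(names, values)) for values in itertools.product(*value_lists)]

def random_combinations(param_ranges, n_iter, seed=0):
    """Draw n_iter random combinations; each range is (low, high) with high excluded."""
    rng = random.Random(seed)
    return [
        {name: rng.randrange(low, high) for name, (low, high) in sorted(param_ranges.items())}
        for _ in range(n_iter)
    ]

grid = grid_combinations({"batch_size": [6, 64], "epochs": [10, 50]})
samples = random_combinations({"batch_size": (2, 16), "epochs": (10, 100)}, n_iter=12)

print(len(grid))     # 2 * 2 = 4 combinations, all of them tried
print(len(samples))  # exactly n_iter = 12 combinations, however wide the ranges are
```

The cost of the grid grows with every value added to any list, while the cost of the random draw is fixed by `n_iter`.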

The default hyperparameter values work well enough for certain well-tested academic problems. However, they are not ideal for every problem; in fact, it is unlikely that the defaults are the best values for a given problem.

Artificial neural networks have many hyperparameters. To name a few: dense layer sizes, convolution kernel sizes and channel counts, activation functions, the number of layers or models, skip connections, dropout layers, normalization, optimizers, learning rates, epochs, batch sizes, and early stopping. These alone make for a number of combinations that grows exponentially with the number of hyperparameters. We could easily have 10^10 combinations, each requiring cross-validation and millions of computations.
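
To make that figure concrete, here is a back-of-the-envelope count. The numbers are assumptions chosen only for illustration (ten hyperparameters, ten candidate values each, 5-fold cross-validation), not measurements from this notebook.

```python
import math

# Assumed for illustration: ten hyperparameters with ten candidate values each.
options_per_hyperparameter = [10] * 10
cv_folds = 5

combinations = math.prod(options_per_hyperparameter)
model_fits = combinations * cv_folds

print(combinations)  # 10000000000, i.e. 10^10
print(model_fits)    # 50000000000 fits with 5-fold cross-validation
```

Even at one second per fit, that many fits would take more than a thousand years, which is why an exhaustive grid is only practical for a handful of values.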

In [1]:
import pandas as pd
from sklearn.preprocessing import StandardScaler
from sklearn.compose import ColumnTransformer
from sklearn.model_selection import train_test_split
import numpy as np
np.random.seed(42)

dataset = pd.read_csv("insurance.csv") #read the dataset

features = dataset.iloc[:,0:6] #choose the first 6 columns as features
print(features.columns)
labels = dataset.iloc[:,-1] #choose the final column for prediction

features = pd.get_dummies(features) #one hot encoding for categorical variables
# Expands to 11 columns; dropping one redundant dummy for each binary variable (sex, smoker) would cut this to 9.
features_train, features_test, labels_train, labels_test = train_test_split(features, labels, test_size=0.33, random_state=42)

#standardize
ct = ColumnTransformer([('standardize', StandardScaler(), ['age', 'bmi', 'children'])], remainder='passthrough')
scaled_features_train = ct.fit_transform(features_train) #gives numpy arrays
scaled_features_test = ct.transform(features_test) #gives numpy arrays

Index(['age', 'sex', 'bmi', 'children', 'smoker', 'region'], dtype='object')
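
The standardization step fits the scaler on the training split only and then reuses those statistics for the test split. A minimal standard-library sketch of that idea follows; the function names and the age values are invented for illustration, and the population standard deviation is used because that is what `StandardScaler` computes.

```python
import statistics

def fit_standardizer(values):
    """Learn the mean and population standard deviation from the training values."""
    return statistics.fmean(values), statistics.pstdev(values)

def standardize(values, mean, std):
    """Shift and scale values using statistics learned elsewhere."""
    return [(v - mean) / std for v in values]

train_age = [19, 33, 45, 60]  # made-up training values
test_age = [25, 52]           # made-up test values

mean, std = fit_standardizer(train_age)
scaled_train = standardize(train_age, mean, std)
scaled_test = standardize(test_age, mean, std)  # reuses the training statistics

print(scaled_train)
print(scaled_test)
```

Fitting on the training split alone keeps information about the test split from leaking into the preprocessing.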


In [2]:
#print the number of features in the dataset
print("Number of features:", features.shape[1])
#print the number of samples in the dataset
print("Number of samples: ", features.shape[0])

Number of features: 11
Number of samples:  1338


In [4]:
import warnings
warnings.filterwarnings('ignore')
import tensorflow as tf
from tensorflow.keras import layers
import matplotlib.pyplot as plt
from tensorflow.keras.models import Sequential
from sklearn.preprocessing import Normalizer
from sklearn.model_selection import GridSearchCV
from sklearn.model_selection import RandomizedSearchCV
from scipy.stats import randint as sp_randint
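# Legacy wrapper: it was removed from later TensorFlow/Keras releases, where the separate SciKeras package provides a KerasRegressor.
from keras.wrappers.scikit_learn import KerasRegressor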
from sklearn.metrics import mean_squared_error
from sklearn.metrics import make_scorer

In [5]:
def design_model():
    model = Sequential(name="my_model")
    inputs = tf.keras.Input(shape=(11,)) # 11 features after one-hot encoding; cannot use input_shape with cross-validation
    model.add(inputs)
    model.add(layers.Dense(11, activation = 'relu'))
    model.add(layers.Dense(1))
    opt = tf.keras.optimizers.Adam(learning_rate = 0.01)
    model.compile(loss='mse', metrics=['mae'], optimizer=opt)
    return model

In [6]:
def do_grid_search():
  batch_size = [6, 64]
  epochs = [10, 50]
  model = KerasRegressor(build_fn=design_model)
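  param_grid = dict(batch_size=batch_size, epochs=epochs) # dict of lists: 2 x 2 = 4 combinations
  # greater_is_better=False negates the MSE, so the reported scores are negative and higher is better.
  grid = GridSearchCV(estimator=model, param_grid=param_grid,
                      scoring=make_scorer(mean_squared_error, greater_is_better=False),
                      return_train_score=True, cv=5)
  # The search object that fits and scores one model per combination and fold.

  # Fitted on the unscaled features_train; scaled_features_train from In [1] is not used here.
  grid_result = grid.fit(features_train, labels_train, verbose = 0)

  return grid_result # fit() returns the same (now fitted) search object.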
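
To see what `cv=5` costs, the sketch below splits sample indices into consecutive folds and counts the model fits for the 2 x 2 grid above. `k_fold_indices` is a hypothetical helper written for this example; it mimics plain, unshuffled K-fold splitting, and the extra fit assumes the search's default behaviour of refitting the best combination on all training data.

```python
def k_fold_indices(n_samples, n_splits):
    """Yield (train_indices, validation_indices) for consecutive, unshuffled folds."""
    fold_sizes = [n_samples // n_splits] * n_splits
    for i in range(n_samples % n_splits):
        fold_sizes[i] += 1  # spread the remainder over the first folds
    start = 0
    for size in fold_sizes:
        stop = start + size
        validation = list(range(start, stop))
        train = list(range(start)) + list(range(stop, n_samples))
        yield train, validation
        start = stop

folds = list(k_fold_indices(n_samples=11, n_splits=5))
print([len(validation) for _, validation in folds])  # [3, 2, 2, 2, 2]

n_combinations = 2 * 2                           # batch_size values x epochs values
total_fits = n_combinations * len(folds) + 1     # +1 for the final refit (assumed default)
print(total_fits)                                # 21
```

Every sample is used for validation exactly once, and each combination is trained once per fold.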

In [7]:
def do_randomized_search():
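  # sp_randint(low, high) samples integers from low up to, but not including, high.
  # Keras 2 names the fit argument 'epochs'; the key 'nb_epoch' seen in the recorded output below is the old Keras 1 name.
  param_grid = {'batch_size': sp_randint(2, 16), 'epochs': sp_randint(10, 100)}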
  model = KerasRegressor(build_fn=design_model)
  grid = RandomizedSearchCV(estimator = model, param_distributions=param_grid,
                            scoring = make_scorer(mean_squared_error, greater_is_better=False),
                            n_iter = 12, n_jobs=-1) # n_jobs=-1 uses all available processors.
  # In contrast to GridSearchCV, not all parameter values are tried out; instead, a fixed number
  # of parameter settings is sampled from the specified distributions.
  # The number of parameter settings that are tried is given by n_iter.
  random_search = grid.fit(features_train, labels_train, verbose = 0)

  return random_search

In [8]:
# The best model and its parameters are stored on the fitted search object (best_estimator_, best_params_, best_score_).

In [9]:
grid_search = do_grid_search()
print("-------------- GRID SEARCH COMPLETED--------------------")

Instructions for updating:
Call initializer instance with the dtype argument instead of passing it to the constructor
-------------- GRID SEARCH COMPLETED--------------------


In [10]:
random_search = do_randomized_search()
print("-------------- RANDOMIZED SEARCH COMPLETED--------------------")

-------------- RANDOMIZED SEARCH COMPLETED--------------------


In [11]:
#df approach
#In scikit-learn, a trailing underscore (as in cv_results_) marks an attribute that is set when fit() is called.
grid = pd.DataFrame({'Means':grid_search.cv_results_['mean_test_score'],
            'Standard Dev': grid_search.cv_results_['std_test_score'],
            'Params': grid_search.cv_results_['params']})

grid.head()

| | Means | Standard Dev | Params |
|---|---|---|---|
| 0 | -128252500.0 | 14932410.0 | {'batch_size': 6, 'epochs': 10} |
| 1 | -89816670.0 | 17224150.0 | {'batch_size': 6, 'epochs': 50} |
| 2 | -299847200.0 | 31791050.0 | {'batch_size': 64, 'epochs': 10} |
| 3 | -142857000.0 | 12060570.0 | {'batch_size': 64, 'epochs': 50} |
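
Because the scorer negates the mean squared error, the best row is the one with the largest (least negative) mean. The sketch below copies the four grid-search rows from the table above into plain Python and picks the winner. It is only a reading aid, and the variable names are made up for this example.

```python
import math

# Scores copied from the grid-search table above (negated MSE per combination).
results = [
    (-128252500.0, {"batch_size": 6, "epochs": 10}),
    (-89816670.0, {"batch_size": 6, "epochs": 50}),
    (-299847200.0, {"batch_size": 64, "epochs": 10}),
    (-142857000.0, {"batch_size": 64, "epochs": 50}),
]

# Higher is better, because the scores are negated errors.
best_score, best_params = max(results, key=lambda row: row[0])
best_rmse = math.sqrt(-best_score)  # undo the sign flip, then take the root

print(best_params)
print(best_rmse)
```

The winning combination is batch size 6 with 50 epochs, with a root mean squared error of roughly 9,477 in the units of the label column.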


In [12]:
# Named random_results so that it does not shadow Python's standard random module.
random_results = pd.DataFrame({'Mean':random_search.cv_results_['mean_test_score'],
                      'Standard Dev': random_search.cv_results_['std_test_score'],
                       'Parameters': random_search.cv_results_['params']
                      })
random_results.head()

| | Mean | Standard Dev | Parameters |
|---|---|---|---|
| 0 | -302343300.0 | 35232820.0 | {'batch_size': 6, 'nb_epoch': 39} |
| 1 | -294847400.0 | 25172440.0 | {'batch_size': 6, 'nb_epoch': 91} |
| 2 | -244748300.0 | 28438430.0 | {'batch_size': 3, 'nb_epoch': 45} |
| 3 | -303551000.0 | 34373260.0 | {'batch_size': 6, 'nb_epoch': 38} |
| 4 | -318546800.0 | 31743410.0 | {'batch_size': 13, 'nb_epoch': 85} |

Note that these scores are considerably worse than the grid search's, even for similar batch sizes. The likely reason is the 'nb_epoch' key: it is the Keras 1 name for the number of epochs, and the Keras 2 wrapper appears to ignore it, so each of these models was probably trained for the default single epoch.


# Conclusions
There is an almost unlimited number of configurations we could try when optimizing our models.
I have developed, or am in the process of developing, three methods for approaching this problem.
The first method is described in the computational reducibility tensorflow_3 project.

Another method involves replacing the continuous calculations, that is, the linear transformations applied to the inputs at each layer, with look-ups. With enough data, we do not need to compute 1 + 1 = 2; we could instead access the key '1+1' and it would return 2. I think this will be key for neural networks and robotics in the future.
Let's say we have an input (x, y, z) and a ReLU activation. This non-linear setup cannot be expressed as a single linear transformation.
So we always have to do the calculation, right? Not necessarily.
Let's say we have already done this calculation. We can then replace Input1*Weight1.... + bias with a 'Key': Value look-up of the form 'matrix + activation': Output.
This is harder when the values are continuous, so we would have to approximate: the keys would be (x, y, z)-esque, and the more computations that are made and stored in the hash table over time, the better the approximation would become.
If we can accept some error in the calculations, we can move from computation to look-up.
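
One possible reading of this idea, sketched for a single ReLU neuron: snap the continuous inputs to a coarse grid, use the snapped tuple as a dictionary key, and return the stored output when the key has been seen before. Everything here (the class name `ApproximateLookup`, the `step` parameter, the weights) is a hypothetical illustration and not an implementation from the project mentioned above.

```python
def relu_neuron(inputs, weights, bias):
    """Compute ReLU(w . x + b) directly."""
    return max(0.0, sum(w * x for w, x in zip(weights, inputs)) + bias)

class ApproximateLookup:
    """Cache neuron outputs under quantized input keys (a hypothetical design)."""

    def __init__(self, weights, bias, step=0.1):
        self.weights = weights
        self.bias = bias
        self.step = step  # grid spacing: larger means more table hits and more error
        self.table = {}   # quantized (x, y, z) key -> stored output
        self.hits = 0

    def _key(self, inputs):
        # Snap each continuous input to the nearest grid point.
        return tuple(round(x / self.step) for x in inputs)

    def __call__(self, inputs):
        key = self._key(inputs)
        if key in self.table:
            self.hits += 1
            return self.table[key]
        value = relu_neuron(inputs, self.weights, self.bias)
        self.table[key] = value
        return value

neuron = ApproximateLookup(weights=[0.5, -1.0, 2.0], bias=0.1, step=0.1)
first = neuron([1.00, 2.00, 3.00])   # not in the table yet: computed, then stored
second = neuron([1.01, 2.02, 2.98])  # falls on the same key: returned from the table
true_second = relu_neuron([1.01, 2.02, 2.98], neuron.weights, neuron.bias)

print(neuron.hits)                # look-ups answered without computing
print(abs(second - true_second))  # the error accepted in exchange for the look-up
```

The `step` value controls the trade-off: a coarser grid fills the table faster and answers more queries by look-up, at the price of a larger approximation error.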

The third method is still undefined.

Optimization is a tough nut to crack. At some point, one almost has to fall back on simpler heuristics. Luckily, perhaps, there is some generalization error that cannot be reduced further, and getting close enough to that error will do.