# Applied Machine Learning
## Grid search to determine the best training parameters
- Author: Lorien Pratt
- Copyright: Quantellia LLC 2019.  All Rights Reserved

Grid search trains many models, one per combination of settings, and compares their results after a few epochs to find the best one. This assumes that early performance is a good proxy for final learning performance, which may or may not be true.

Grid search explores multiple network architectures (number of layers, number of hidden units per layer) and other learning parameters. 
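To make the idea concrete, here is a minimal base-R sketch of exhaustive grid search that does not use H2O. The settings `a` and `b` and the `score` function are hypothetical stand-ins for real hyperparameters and for "train briefly, then measure validation error".

```r
# Toy illustration of exhaustive grid search (not H2O).
# Enumerate every combination of two hypothetical settings.
grid <- expand.grid(a = c(1, 2, 3), b = c(0.1, 0.5))

# Hypothetical scoring function; lower is better.
score <- function(a, b) (a - 2)^2 + (b - 0.5)^2

# Score every combination and keep the one with the lowest score.
grid$score <- mapply(score, grid$a, grid$b)
best <- grid[which.min(grid$score), ]
print(best)
```

The H2O grid search later in this notebook follows the same pattern, with each "score" coming from an actual model run.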

## Setup

In [1]:
# Set up to be able to invoke R from inside this Python 2 notebook
#%load_ext rpy2.ipython
#import rpy2.rinterface

##### Load and initialize the H2O library, which we will use to do the grid search
Note that this generates a lot of messages. These are expected; they are notifications, not errors.

In [2]:
require(h2o)
h2o.init()
h2o.no_progress() # Turns off progress bars, which don't display well in Jupyter

Loading required package: h2o

----------------------------------------------------------------------

Your next step is to start H2O:
    > h2o.init()

For H2O package documentation, ask for help:
    > ??h2o

After starting H2O, you can use the Web UI at http://localhost:54321
For more information visit http://docs.h2o.ai

----------------------------------------------------------------------


Attaching package: ‘h2o’

The following objects are masked from ‘package:stats’:

    cor, sd, var

The following objects are masked from ‘package:base’:

    &&, %*%, %in%, ||, apply, as.factor, as.numeric, colnames,
    colnames<-, ifelse, is.character, is.factor, is.numeric, log,
    log10, log1p, log2, round, signif, trunc



 Connection successful!

R is connected to the H2O cluster: 
    H2O cluster uptime:         22 hours 21 minutes 
    H2O cluster timezone:       Etc/UTC 
    H2O data parsing timezone:  UTC 
    H2O cluster version:        3.26.0.10 
    H2O cluster version age:    6 days  
    H2O cluster name:           H2O_started_from_R_jupyter_mcy252 
    H2O cluster total nodes:    1 
    H2O cluster total memory:   0.36 GB 
    H2O cluster total cores:    1 
    H2O cluster allowed cores:  1 
    H2O cluster healthy:        TRUE 
    H2O Connection ip:          localhost 
    H2O Connection port:        54321 
    H2O Connection proxy:       NA 
    H2O Internal Security:      FALSE 
    H2O API Extensions:         Amazon S3, XGBoost, Algos, AutoML, Core V3, TargetEncoder, Core V4 
    R Version:                  R version 3.6.1 (2019-07-05) 



Set up my initials for file names

In [3]:
my_initials<-"nm"

Build the names of the training, test, and backtest files that we created in the Prepare Data lesson.

In [4]:
train_filename<-paste0("data/",my_initials,"_train_auto.csv"); print( train_filename )
test_filename<-paste0("data/",my_initials,"_test_auto.csv"); print( test_filename )
backtest_filename<-paste0("data/",my_initials,"_backtest_auto.csv"); print( backtest_filename )

[1] "data/nm_train_auto.csv"
[1] "data/nm_test_auto.csv"
[1] "data/nm_backtest_auto.csv"
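Before importing, it can help to confirm that these files actually exist. The following is a small sketch under the assumption that the notebook's working directory contains the `data/` folder; the variable names `data_files` and `missing_files` are illustrative and not part of the original lesson.

```r
my_initials <- "nm"

# Build the same three file names as above in one step.
data_files <- file.path(
    "data",
    paste0(my_initials, c("_train_auto.csv", "_test_auto.csv", "_backtest_auto.csv"))
)

# Report any file that is not present on disk.
missing_files <- data_files[!file.exists(data_files)]
if (length(missing_files) > 0) {
    message("Missing data files: ", paste(missing_files, collapse = ", "))
}
```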


Read in the training and test files, converting them to h2o's internal "hex" format along the way.

In [5]:
train_hex <- h2o.importFile(train_filename, parse = TRUE, header = TRUE, 
                            sep = "", col.names = NULL, col.types = NULL, na.strings = NULL)
test_hex <- h2o.importFile(test_filename, parse = TRUE, header = TRUE, 
                           sep = "", col.names = NULL, col.types = NULL, na.strings = NULL)

Tell the grid search which of the columns are predictors.  First, let's look at the top of the dataset again to remind us of the structure...

In [6]:
head(train_hex)

car.name,cylinders,displacement,horsepower,weight,acceleration,model.year,origin,mpg
<fct>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>
ford country squire (sw),8,351,142,4054,14.3,79,1,15.5
plymouth duster,6,198,95,3102,16.5,74,1,20.0
saab	99le,4,121,115,2671,13.5,75,2,25.0
amc matador (sw),8,304,150,3892,12.5,72,1,15.0
pontiac phoenix,4,151,90,2735,18.0,82,1,27.0
ford fairmont,4,140,88,2870,18.1,80,1,26.4


Set the predictor columns and check that they're the right ones

In [7]:
predictors <- c(2,3,4,5,6,7,8)
names(train_hex)[predictors]
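Numeric column indices break silently if the column order changes. As an alternative sketch, the predictors can be selected by name; to my understanding the `x` argument of `h2o.grid` accepts either names or indices. The column names below are copied from the `head(train_hex)` output, and `predictor_names` is an illustrative variable name.

```r
# Column names as shown in the head(train_hex) output.
col_names <- c("car.name", "cylinders", "displacement", "horsepower",
               "weight", "acceleration", "model.year", "origin", "mpg")

# Everything except the car name (an identifier) and mpg (the target).
predictor_names <- setdiff(col_names, c("car.name", "mpg"))
print(predictor_names)

# Same columns as the index-based selection c(2,3,4,5,6,7,8).
print(identical(predictor_names, col_names[c(2, 3, 4, 5, 6, 7, 8)]))
```

With a live frame, `names(train_hex)` would take the place of the hard-coded `col_names` vector.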

Tell the model training which of the columns is the target column (in this case, the very last column, mpg)

In [8]:
targetcol<-ncol(train_hex)

Create a set of grid search *hyperparameters*. These are the alternative network structures and learning settings we'll try, to see which one
produces the best results after running the specified number of epochs.

In [9]:
hyper_params <- list(
    hidden=list(1, 5, 10, c(5,5), c(10,10,10)),
    l1=c(0, .01, .00001),
    l2=c(0, .01, 0.001, .00001),
    input_dropout_ratio=c(0, .01, .0001),
    epochs=c(100)
)
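It is worth estimating how many models this grid implies before running it. The sketch below assumes the search tries every combination (a Cartesian search, which I believe is H2O's default strategy); `n_values` and `n_models` are illustrative names.

```r
hyper_params <- list(
    hidden=list(1, 5, 10, c(5,5), c(10,10,10)),
    l1=c(0, .01, .00001),
    l2=c(0, .01, 0.001, .00001),
    input_dropout_ratio=c(0, .01, .0001),
    epochs=c(100)
)

# Number of alternatives for each hyperparameter.
n_values <- lengths(hyper_params)
print(n_values)

# Total combinations in an exhaustive search: 5 * 3 * 4 * 3 * 1.
n_models <- prod(n_values)
print(n_models)  # 180
```

At 100 epochs each, 180 models is a substantial amount of training for a one-core cluster, which is why the run can take a while.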

Run the grid search with these parameters.  This can take a little while, during which there will be no feedback.

In [10]:
grid_result <- h2o.grid(
    algorithm="deeplearning",
    x=predictors,
    y=targetcol,
    grid_id="grid_1", # Can't be reused; consider incrementing on subsequent runs. TBD: try kernel restart for this instead
    training_frame=train_hex,
    validation_frame=test_hex,
    quiet_mode=FALSE,
    export_weights_and_biases=TRUE,
    activation="Tanh",
    autoencoder=FALSE,
    ignore_const_cols=FALSE,
    standardize=FALSE,
    train_samples_per_iteration=0,
    adaptive_rate=FALSE, # Manually tuned learning rate
    classification_stop = -1, # Disable automatic stopping
    regression_stop = -1, # Disable automatic stopping
    stopping_rounds = 0, # Don't stop automatically
    hyper_params = hyper_params
)


ERROR: Unexpected HTTP Status code: 412 Precondition Failed (url = http://localhost:54321/99/Grid/deeplearning)

water.exceptions.H2OIllegalArgumentException
 [1] "water.exceptions.H2OIllegalArgumentException: Illegal argument: training_frame of function: grid: Cannot append new models to a grid with different training input"
 [2] "    hex.grid.GridSearch.start(GridSearch.java:106)"                                                                                                                
 [3] "    hex.grid.GridSearch.startGridSearch(GridSearch.java:447)"                                                                                                      
 [4] "    hex.grid.GridSearch.startGridSearch(GridSearch.java:389)"                                                                                                      
 [5] "    water.api.GridSearchHandler.handle(GridSearchHandler.java:103)"                                                                                        

ERROR: Error in .h2o.doSafeREST(h2oRestApiVersion = h2oRestApiVersion, urlSuffix = page, : 

ERROR MESSAGE:

Illegal argument: training_frame of function: grid: Cannot append new models to a grid with different training input
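The error above suggests that a grid named `grid_1` already exists on the H2O cluster from an earlier run that used a different training frame, which matches the "can't be reused" comment in the call. One possible workaround, sketched here, is to generate a fresh grid ID for each run. The `grid_` prefix and the timestamp format are illustrative choices, not something the lesson prescribes.

```r
# Build a grid ID that is unlikely to collide with an earlier run.
grid_id <- paste0("grid_", format(Sys.time(), "%Y%m%d_%H%M%S"))
print(grid_id)

# To use it, pass grid_id=grid_id in the h2o.grid() call instead of "grid_1",
# and pass the same variable to h2o.getGrid() when displaying the results.
```

Storing the ID in a variable also keeps the training call and the results lookup in sync.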




Display the grid search results

In [None]:
h2o.getGrid("grid_1", sort_by="mse", decreasing=FALSE)