<center><h2>Grid Search Redux</h2></center>

By The End Of This Session You Should Be Able To:
----

- Define the "double loop" optimization problem in Machine Learning
- List parameter search methods
- Conduct human-assisted Grid Search

Training Machine Learning
-------

$$Features + Algorithm + Hyperparameters = Model Parameters$$

ML is "double loop" optimization
------

- Outer loop is Hyperparameter search
- Inner loop is Model Parameter search
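The two loops can be sketched in plain Python. This is a minimal illustration, not a production recipe: the toy data, the single-weight model `y = w * x`, the helper names (`fit_weight`, `mean_squared_error`), and the candidate penalty values are all assumptions made up for this example.

```python
# Toy data, roughly y = 2 * x (values invented for illustration).
train = [(1.0, 2.1), (2.0, 3.9), (3.0, 6.2)]
valid = [(4.0, 8.1), (5.0, 9.8)]


def fit_weight(data, l2_penalty, learning_rate=0.01, n_steps=500):
    """Inner loop: gradient descent on the single model parameter w."""
    w = 0.0
    for _ in range(n_steps):
        # Gradient of mean squared error plus the L2 penalty term.
        grad = sum(2 * (w * x - y) * x for x, y in data) / len(data)
        grad += 2 * l2_penalty * w
        w -= learning_rate * grad
    return w


def mean_squared_error(data, w):
    return sum((w * x - y) ** 2 for x, y in data) / len(data)


# Outer loop: try each hyperparameter value, refit, score on held-out data.
results = {}
for l2_penalty in [0.0, 0.1, 1.0, 10.0]:
    w = fit_weight(train, l2_penalty)
    results[l2_penalty] = mean_squared_error(valid, w)

best_penalty = min(results, key=results.get)
print(best_penalty, results[best_penalty])
```

Every pass of the outer loop pays for a complete run of the inner loop, which is why hyperparameter search is so much more expensive than fitting a single model.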

Model Parameter & Hyperparameter search
-------

You can search for algorithm hyperparameters the same way you search for model parameters. It is just much slower, because every candidate hyperparameter setting requires a full model fit.

Methods for Parameter Search
-----

- Closed form (e.g., OLS)
- 1<sup>st</sup> order methods (e.g., SGD)
- 2<sup>nd</sup> order methods (e.g., Newton's Method)

- Manual Search
- Grid Search


- Random
- Bayesian Optimization

Learn more:

- https://ipvs.informatik.uni-stuttgart.de/mlr/marc/teaching/13-Optimization/04-secondOrderOpt.pdf
- https://blog.floydhub.com/guide-to-hyperparameters-search-for-deep-learning-models/
- Random Search for Hyper-Parameter Optimization: http://www.jmlr.org/papers/volume13/bergstra12a/bergstra12a.pdf
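To contrast two of the methods listed above, here is a rough sketch of grid search versus random search over the same candidates. The grid values and the budget of 5 trials are hypothetical, and drawing a random subset of grid points is a simplification: random search is often described as sampling from distributions rather than from a fixed grid.

```python
import random
from itertools import product

# Hypothetical candidate values, chosen only for illustration.
grid = {"max_depth": [2, 4, 6, 8], "min_samples_split": [2, 5, 10]}

# Grid search would evaluate every combination.
all_combinations = [dict(zip(grid, values)) for values in product(*grid.values())]

# Random search (simplified) evaluates only a fixed budget of them.
rng = random.Random(42)  # seeded so the subset is reproducible
random_subset = rng.sample(all_combinations, k=5)

print(len(all_combinations), "combinations in the full grid")
print(len(random_subset), "combinations in the random subset")
```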

Manual Search
-----

aka, GSD: Graduate Student Descent

- 100% manual, Trial & (mostly) Error

- The most common approach by researchers, students, and hobbyists.

Grid Search, aka Just try everything!
------

1. Define a grid with n dimensions, one per hyperparameter.
1. For each dimension, define the set of candidate values.
1. Step through every combination, fitting and scoring a model for each.
1. At the end, choose the combination of parameters with the best score on the cross-validation data.
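The steps above can be sketched by hand in a few lines. The grid values and `toy_cv_score` are hypothetical stand-ins: a real search would fit a model and compute a cross-validation score at that point.

```python
from itertools import product

# Steps 1 and 2: one dimension per hyperparameter, each with candidate values.
grid = {
    "max_depth": [2, 4, 6],
    "min_samples_split": [2, 5],
}


def toy_cv_score(max_depth, min_samples_split):
    """Hypothetical stand-in for a cross-validation score (higher is better)."""
    return -((max_depth - 4) ** 2) - (min_samples_split - 5) ** 2


# Step 3: step through every combination.
names = list(grid)
best_score, best_params = float("-inf"), None
for values in product(*grid.values()):
    params = dict(zip(names, values))
    score = toy_cv_score(**params)
    # Step 4: keep the best combination seen so far.
    if score > best_score:
        best_score, best_params = score, params

print(best_params, best_score)
```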

Cartesian Product
------

<center><img src="https://upload.wikimedia.org/wikipedia/commons/thumb/4/4e/Cartesian_Product_qtl1.svg/1200px-Cartesian_Product_qtl1.svg.png" width="45%"/></center>

The Cartesian product of two sets X and Y is the set of all ordered pairs (x, y) for which x belongs to X and y belongs to Y.
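In Python, the standard library's `itertools.product` builds exactly this set of ordered pairs. The element names below are placeholders chosen for illustration.

```python
from itertools import product

X = ["x1", "x2", "x3"]
Y = ["y1", "y2"]

# Every ordered pair (x, y) with x from X and y from Y.
pairs = list(product(X, Y))
print(pairs)

# The product has len(X) * len(Y) elements.
print(len(pairs) == len(X) * len(Y))
```

A hyperparameter grid is the same construction, with one set of candidate values per hyperparameter.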

What is the "Curse of dimensionality"?
-------

The more values and the more dimensions we want to explore, the longer the search takes: the number of grid points grows exponentially with the number of dimensions.

Typically, it is computationally intractable to search all relevant values and dimensions.

Source: https://blog.floydhub.com/guide-to-hyperparameters-search-for-deep-learning-models/ 
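Stated as a formula (the symbols $k_i$, $n$, and $N$ are introduced here for illustration): if dimension $i$ has $k_i$ candidate values, a full grid over $n$ dimensions contains

$$N = \prod_{i=1}^{n} k_i$$

points, which reduces to $k^n$ when every dimension has the same number of values $k$. For example, 3 values in each of 5 dimensions gives $3^5 = 243$ combinations.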

"Curse of dimensionality" example
-------

In [2]:
%reset -fs

In [3]:
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns
import sklearn

import warnings
warnings.filterwarnings('ignore')

palette = "Dark2"
%matplotlib inline

In [4]:
values_for_each_dim = 2 # 2 or 3 or 4

print(f"{'# of Dims':^10} {'# of Points':>10}")
for n_dimensions in range(1, 10):
    total_values = values_for_each_dim**n_dimensions
    print(f"{n_dimensions:^10} {total_values:>10,}")

# of Dims  # of Points
    1               2
    2               4
    3               8
    4              16
    5              32
    6              64
    7             128
    8             256
    9             512


Enumerating Grid Search
-----

1. Groups
1. Numeric

Groups
-------
    
Random Forest:

- max_features: ['auto', 'sqrt', 'log2']
- criterion: ['gini', 'entropy']

Numeric
-------
    
Random Forest:

- max_depth
- min_samples_split
- max_leaf_nodes

Select a finite set of "reasonable" values 

In [35]:
# For integers
python_values = range(2, 10, 2)
numpy_values = np.arange(start=2, stop=10, step=2)  # avoid naming a variable `numpy`
print(list(python_values))
print(numpy_values)

[2, 4, 6, 8]
[2 4 6 8]


In [6]:
# For non-integers
np.linspace(start=3.5, stop=7, num=3)

array([3.5 , 5.25, 7.  ])

Check for understanding
-----

What happens if you select an "unreasonable" set of values?

<center><img src="https://i.imgflip.com/2ni2bl.jpg" width="45%"/></center>

Grid Search Example: Combination of Group & Numeric
----

In [36]:
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)  # load the dataset once
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

In [37]:
from sklearn.ensemble import RandomForestClassifier

rf = RandomForestClassifier(n_jobs=-1, random_state=42) 

In [38]:
from sklearn.model_selection import GridSearchCV

param_grid = { 
    'n_estimators': np.arange(start=10, stop=20, step=2),
    'max_features': ['auto', 'sqrt', 'log2']  # 'auto' was removed in scikit-learn 1.3; omit it on newer versions
}

rf_cv = GridSearchCV(estimator=rf, 
                      param_grid=param_grid, 
                      cv=5)
rf_cv.fit(X_train, y_train)
print(rf_cv.best_params_)

{'max_features': 'auto', 'n_estimators': 18}
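It helps to count how many model fits a search like this costs. The sketch below redoes the arithmetic for the `param_grid` and `cv=5` used above; the variable names are made up for illustration, and the extra fit assumes `GridSearchCV`'s default `refit=True`.

```python
from math import prod

# Number of candidate values per hyperparameter, copied from param_grid.
param_grid_sizes = {
    "n_estimators": len(range(10, 20, 2)),  # 10, 12, 14, 16, 18
    "max_features": len(["auto", "sqrt", "log2"]),
}
n_folds = 5  # cv=5

n_combinations = prod(param_grid_sizes.values())
n_cv_fits = n_combinations * n_folds
n_total_fits = n_cv_fits + 1  # one final refit on the full training set

print(n_combinations, n_cv_fits, n_total_fits)
```

Even this small grid needs dozens of fits, and adding one more hyperparameter with a handful of values multiplies that count again.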


<center><h2>Model fitting is an iterative process. <br> Best done with a human working with a computer. </h2></center>

Check for understanding
-----

Why is it difficult to tune hyperparameters?

Summary
-----

- Manual search is a human guessing the best parameters
- Grid search is a computer exhaustively trying every combination on a predefined grid
- The best strategy is HBCB: Human Best, Computer Best

<br>
