# Grid Search
Code largely sourced from [https://github.com/rasbt/python-machine-learning-book-3rd-edition/blob/master/ch06/ch06.ipynb](https://github.com/rasbt/python-machine-learning-book-3rd-edition/blob/master/ch06/ch06.ipynb) and [https://scikit-learn.org/stable/modules/generated/sklearn.model_selection.GridSearchCV.html](https://scikit-learn.org/stable/modules/generated/sklearn.model_selection.GridSearchCV.html)

GridSearchCV is a brute-force way to test every combination of values drawn from several hyperparameter ranges. For example, if you choose the values [1, 2, 3] for some hyperparameter A and ["x", "y", "z"] for some hyperparameter B, GridSearchCV will create every combination of those values (e.g. (1, "x"), (1, "y"), ... , (3, "y"), (3, "z")), for a total of 9 different parameter pairs, and then train and score a model for each pair.
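
The combinations described above are a Cartesian product. As a minimal sketch (this uses `itertools.product` from the standard library to illustrate the idea, not GridSearchCV's own internals, and the names `values_a`, `values_b`, and `pairs` are made up for this example):

```python
from itertools import product

# Hypothetical value ranges for hyperparameters A and B
values_a = [1, 2, 3]
values_b = ["x", "y", "z"]

# Every (A, B) combination, in the same order as the example in the text
pairs = list(product(values_a, values_b))

print(len(pairs))           # number of combinations
print(pairs[0], pairs[-1])  # first and last pair
```

Each added hyperparameter multiplies the number of combinations, so the grid grows quickly.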

In [1]:
from sklearn.model_selection import GridSearchCV

## Get/Create Some Data

In [2]:
from sklearn.model_selection import train_test_split
import ck_helpers.example_data as ckdata

(X, Y) = ckdata.AND(20, random_state=42)

(X_train, X_test, Y_train, Y_test) = train_test_split(X, Y, train_size=0.75, random_state=42)

## Create Ranges Of Hyperparameters And An Estimator

In [3]:
from sklearn.tree import DecisionTreeClassifier

parameter_ranges = {
    'criterion': ['entropy', 'gini'],
    'max_depth': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
    'ccp_alpha': [0.0, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0]
}

estimator = DecisionTreeClassifier(random_state=42)
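
It can help to estimate how much work a grid implies before fitting it. The sketch below counts the candidates in the grid above and the resulting number of model fits, assuming the default 5-fold cross-validation that GridSearchCV uses when `cv` is not passed (scikit-learn 0.22 and later). The names `n_candidates`, `cv_folds`, and `n_fits` are illustrative only.

```python
from math import prod

parameter_ranges = {
    'criterion': ['entropy', 'gini'],
    'max_depth': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
    'ccp_alpha': [0.0, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0]
}

# Candidates = product of the lengths of each value list (2 * 10 * 11)
n_candidates = prod(len(values) for values in parameter_ranges.values())

# Assumption: cv is left at its default of 5 folds
cv_folds = 5
n_fits = n_candidates * cv_folds

print(n_candidates, n_fits)
```

With `refit=True` there is one additional fit at the end, on the whole training set, using the best combination.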

## Construct The GridSearchCV

NOTE: GridSearchCV has many more parameters than the ones used here. They are documented at [https://scikit-learn.org/stable/modules/generated/sklearn.model_selection.GridSearchCV.html](https://scikit-learn.org/stable/modules/generated/sklearn.model_selection.GridSearchCV.html)

In [4]:
gs = GridSearchCV(param_grid=parameter_ranges,
                  estimator=estimator,
                  scoring='accuracy', # Metric used to rank the candidates. It can be one metric, several metrics, or your own scorer (ex: 'f1' for imbalanced data). SEE the documentation for the available options.
                  refit=True,         # Once the best combination of hyperparameters is found, refit the estimator on the whole training set with those hyperparameters. This enables calling predict/score on the GridSearchCV directly after fitting.
                  n_jobs=-1)          # Number of jobs to run in parallel. -1 means use all available processors.
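
As a sketch of the "many" scoring option mentioned in the comment above: `scoring` can be a list of metric names, in which case `refit` must name the metric used to pick the best candidate. The data here is a hand-built AND truth table standing in for `ckdata.AND` (whose exact output is not shown in this notebook), and `X_demo`, `y_demo`, and `gs_multi` are names made up for this example.

```python
import numpy as np
from sklearn.model_selection import GridSearchCV
from sklearn.tree import DecisionTreeClassifier

# Assumption: a small AND dataset built by hand (4 input rows repeated 10 times)
X_demo = np.array([[0, 0], [0, 1], [1, 0], [1, 1]] * 10)
y_demo = X_demo[:, 0] & X_demo[:, 1]

gs_multi = GridSearchCV(
    estimator=DecisionTreeClassifier(random_state=42),
    param_grid={'max_depth': [1, 2, 3]},
    scoring=['accuracy', 'f1'],  # evaluate every candidate with both metrics
    refit='f1',                  # with several metrics, name the one that picks the winner
    cv=5,
)
gs_multi.fit(X_demo, y_demo)

# cv_results_ holds one set of columns per metric
print(gs_multi.cv_results_['mean_test_accuracy'])
print(gs_multi.cv_results_['mean_test_f1'])
print(gs_multi.best_params_)
```

scikit-learn may emit an undefined-metric warning for candidates that never predict the positive class. That does not stop the search.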

## Train The GridSearchCV (call "fit")

In [5]:
gs.fit(X_train, Y_train)

GridSearchCV(estimator=DecisionTreeClassifier(random_state=42), n_jobs=-1,
             param_grid={'ccp_alpha': [0.0, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7,
                                       0.8, 0.9, 1.0],
                         'criterion': ['entropy', 'gini'],
                         'max_depth': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]},
             scoring='accuracy')

## Score/Use The Best Estimator Or Hyperparameters

In [6]:
display( gs.best_score_ )
display( gs.best_estimator_ )
display( gs.best_params_ )

1.0

DecisionTreeClassifier(criterion='entropy', max_depth=2, random_state=42)

{'ccp_alpha': 0.0, 'criterion': 'entropy', 'max_depth': 2}
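
Besides the single best result, a fitted GridSearchCV keeps the scores of every candidate in its `cv_results_` attribute. The sketch below lists candidates by rank. It refits a small search on a hand-built AND dataset so that it runs on its own; `X_demo`, `y_demo`, `gs_demo`, and `ranked` are illustrative names, not part of the notebook above.

```python
import numpy as np
from sklearn.model_selection import GridSearchCV
from sklearn.tree import DecisionTreeClassifier

# Assumption: a small AND dataset built by hand (4 input rows repeated 10 times)
X_demo = np.array([[0, 0], [0, 1], [1, 0], [1, 1]] * 10)
y_demo = X_demo[:, 0] & X_demo[:, 1]

gs_demo = GridSearchCV(
    estimator=DecisionTreeClassifier(random_state=42),
    param_grid={'criterion': ['entropy', 'gini'], 'max_depth': [1, 2]},
    scoring='accuracy',
    cv=5,
)
gs_demo.fit(X_demo, y_demo)

results = gs_demo.cv_results_

# Pair each candidate's rank with its mean cross-validated score and parameters,
# then sort so the best-ranked candidates come first
ranked = sorted(
    zip(results['rank_test_score'], results['mean_test_score'], results['params']),
    key=lambda row: row[0],
)

for rank, mean_score, params in ranked:
    print(rank, round(float(mean_score), 3), params)
```

Note that `best_score_` is the mean cross-validated score of the top-ranked candidate, not a score on held-out test data.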

## Or Score/Use The GridSearchCV (call "score" and "predict" respectively)

In [7]:
print( "Score: ", gs.score(X_test, Y_test) )

print( "Predict (1 && 1) = ", gs.predict([[1, 1]]) )
print( "Predict (1 && 0) = ", gs.predict([[1, 0]]) )
print( "Predict (0 && 1) = ", gs.predict([[0, 1]]) )
print( "Predict (0 && 0) = ", gs.predict([[0, 0]]) )

# NOTE: gs.predict takes an array of inputs, so it can be used like so
#       gs.predict([[0,0], [0, 1], [1, 0], [1, 1]])

Score:  1.0
Predict (1 && 1) =  [1]
Predict (1 && 0) =  [0]
Predict (0 && 1) =  [0]
Predict (0 && 0) =  [0]
