# Grid search
  
This chapter introduces you to a popular automated hyperparameter tuning methodology called Grid Search. You will learn what it is, how it works, and how to undertake a Grid Search using Scikit-Learn. You will then learn how to analyze the output of a Grid Search and gain practical experience doing this.

## Resources
  
**Notebook Syntax**
  
<span style='color:#7393B3'>NOTE:</span>  
- Denotes additional information deemed to be *contextually* important
- Colored in blue, HEX #7393B3
  
<span style='color:#E74C3C'>WARNING:</span>  
- Significant information that is *functionally* critical  
- Colored in red, HEX #E74C3C
  
---
  
**Links**
  
[NumPy Documentation](https://numpy.org/doc/stable/user/index.html#user)  
[Pandas Documentation](https://pandas.pydata.org/docs/user_guide/index.html#user-guide)  
[Matplotlib Documentation](https://matplotlib.org/stable/index.html)  
[Seaborn Documentation](https://seaborn.pydata.org)  
[Scikit Learn Documentation](https://scikit-learn.org/stable/)  
  
---
  
**Notable Functions**
  
<table>
  <tr>
    <th>Index</th>
    <th>Operator</th>
    <th>Use</th>
  </tr>
  <tr>
    <td>1</td>
    <td>pandas.get_dummies</td>
    <td>A function from the Pandas library used to perform one-hot encoding on categorical data, converting categorical variables into a binary matrix representation for machine learning.</td>
  </tr>
  <tr>
    <td>2</td>
    <td>sklearn.model_selection.train_test_split</td>
    <td>A function from scikit-learn used to split a dataset into training and testing subsets, enabling the assessment of machine learning models' performance on unseen data.</td>
  </tr>
  <tr>
    <td>3</td>
    <td>sklearn.linear_model.LogisticRegression</td>
    <td>A class from scikit-learn representing a logistic regression classifier used for binary classification tasks, modeling the probability of a binary outcome.</td>
  </tr>
  <tr>
    <td>4</td>
    <td>sklearn.ensemble.RandomForestClassifier</td>
    <td>A class from scikit-learn representing a random forest classifier, an ensemble learning method that combines multiple decision trees for improved predictive accuracy and generalization.</td>
  </tr>
  <tr>
    <td>5</td>
    <td>sklearn.metrics.confusion_matrix</td>
    <td>A function from scikit-learn used to compute a confusion matrix, providing insights into the performance of a classification model by detailing true positive, true negative, false positive, and false negative predictions.</td>
  </tr>
  <tr>
    <td>6</td>
    <td>sklearn.metrics.accuracy_score</td>
    <td>A function from scikit-learn used to calculate the accuracy score, a common metric to evaluate the correctness of classification predictions by comparing them to the actual labels.</td>
  </tr>
  <tr>
    <td>7</td>
    <td>sklearn.neighbors.KNeighborsClassifier</td>
    <td>A class from scikit-learn representing a k-nearest neighbors classifier, a machine learning algorithm that assigns a label to a data point based on the majority class among its k-nearest neighbors.</td>
  </tr>
  <tr>
    <td>8</td>
    <td>sklearn.ensemble.GradientBoostingClassifier</td>
    <td>A class from scikit-learn representing a gradient boosting classifier, an ensemble method that builds a predictive model through the iterative combination of weak learners, usually decision trees.</td>
  </tr>
  <tr>
    <td>9</td>
    <td>numpy.linspace</td>
    <td>A function from NumPy used to create an array of evenly spaced values over a specified range, facilitating the generation of input data for mathematical operations and visualization.</td>
  </tr>
  <tr>
    <td>10</td>
    <td>plt.gca()</td>
    <td>A function from Matplotlib used to get the current Axes instance within a plot, allowing for customization and fine-tuning of plot elements.</td>
  </tr>
  <tr>
    <td>11</td>
    <td>sklearn.model_selection.GridSearchCV</td>
    <td>A class from scikit-learn used for hyperparameter tuning through exhaustive search over a specified parameter grid, optimizing model performance.</td>
  </tr>
  <tr>
    <td>12</td>
    <td>sklearn.metrics.roc_auc_score</td>
    <td>A function from scikit-learn used to compute the area under the Receiver Operating Characteristic (ROC) curve, providing a measure of a classification model's ability to distinguish between classes.</td>
  </tr>
</table>

  
---
  
**Language and Library Information**  
  
Python 3.11.0  
  
Name: numpy  
Version: 1.24.3  
Summary: Fundamental package for array computing in Python  
  
Name: pandas  
Version: 2.0.3  
Summary: Powerful data structures for data analysis, time series, and statistics  
  
Name: matplotlib  
Version: 3.7.2  
Summary: Python plotting package  
  
Name: seaborn  
Version: 0.12.2  
Summary: Statistical data visualization  
  
Name: scikit-learn  
Version: 1.3.0  
Summary: A set of python modules for machine learning and data mining  
  
---
  
**Miscellaneous Notes**
  
<span style='color:#7393B3'>NOTE:</span>  
  
`python3.11 -m IPython` : Runs an interactive IPython shell with Python 3.11 in the terminal.
  
`nohup ./relo_csv_D2S.sh > ./output/relo_csv_D2S.log &` : Runs csv data pipeline in headless log.  
  
`print(inspect.getsourcelines(test))` : Prints the source lines (and starting line number) of a user-defined function, here `test`  
  
<span style='color:#7393B3'>NOTE:</span>  
  
Snippet to plot all built-in matplotlib styles :
  
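```python
import math

import numpy as np
import matplotlib.pyplot as plt

x = np.arange(-2, 8, .1)
y = 0.1 * x ** 3 - x ** 2 + 3 * x + 2
available = ['default'] + plt.style.available
# Three subplots per row, enough rows for every style
n_rows = math.ceil(len(available) / 3)
fig = plt.figure(dpi=100, figsize=(10, 2 * n_rows), tight_layout=True)
for i, style in enumerate(available):
    with plt.style.context(style):
        ax = fig.add_subplot(n_rows, 3, i + 1)
        ax.plot(x, y)
    ax.set_title(style)
plt.show()
```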
  

In [4]:
import numpy as np                  # Numerical Python:         Arrays and linear algebra
import pandas as pd                 # Panel Datasets:           Dataset manipulation
import matplotlib.pyplot as plt     # MATLAB Plotting Library:  Visualizations
import seaborn as sns               # Seaborn:                  Visualizations
from pprint import pprint           # Pretty print:             Advanced print options


# Setting a standard figure size
plt.rcParams['figure.figsize'] = (8, 8)

# Set the maximum number of columns to be displayed
pd.set_option('display.max_columns', 50)

## Introducing Grid Search
  
In this section we will look at extending our work on automatic hyperparameter tuning and learn what a Grid Search is. Let's get started!
  
**Automating 2 Hyperparameters**
  
Let's remind ourselves of our previous work using a for loop to test different values of the number of neighbors in a KNN algorithm. We then collated those results into a DataFrame to analyze. For this section we are working with a reduced dataset, so you may see slightly different results.
  
But what if we want to test different values of 2 hyperparameters? Let us take the example of a GBM (gradient boosting machine) algorithm, which has a few more hyperparameters to tune than the KNN or Random Forest algorithms. Let's say we want to tune the two hyperparameters and values shown below. How would you do that? One suggestion could be a nested loop.
  
<center><img src='../_images/introducing-grid-search1.png' alt='img' width='740'></center>
  
We can first write nicer code by putting the model creation component in a function. We feed in the two hyperparameter values as arguments and use these to create a model. Then we fit it to our data and generate predictions. Finally, we return the hyperparameter values used and the score in a list for analysis.
  
<center><img src='../_images/introducing-grid-search2.png' alt='img' width='740'></center>
  
Now we can loop through and call our function, appending our results to a list as we go. We have a nested loop so we test all values of our first hyperparameter for all values of our second hyperparameter.
  
<center><img src='../_images/introducing-grid-search3.png' alt='img' width='740'></center>
  
We can save these results into a DataFrame as well, and then print it out to view.
  
<center><img src='../_images/introducing-grid-search4.png' alt='img' width='740'></center>
  
**How many models?**
  
You will notice that many more models are built when adding more hyperparameters and values to test. Importantly, the number of models does not grow linearly with the number of hyperparameters or values to test; it grows multiplicatively. For each of the values tested for the first hyperparameter, you test every value of the second hyperparameter. This means that to test 5 values for the first hyperparameter and 10 values for the second hyperparameter, we have 5 × 10 = 50 models to run. And what if we used 10-fold cross-validation for each model? That would be 50 × 10 = 500 models to run!
  
<center><img src='../_images/introducing-grid-search5.png' alt='img' width='740'></center>
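As a quick check of the arithmetic above, the sketch below counts the combinations with `itertools.product`. The hyperparameter values are placeholders (only the list lengths, 5 and 10, come from the text), and the 10-fold factor assumes every combination is cross-validated.

```python
from itertools import product

# Placeholder values: only the lengths (5 and 10) matter for the count
first_values = [1, 2, 3, 4, 5]
second_values = [0.01 * i for i in range(1, 11)]

# Every value of the first hyperparameter is paired with every value of the second
combinations = list(product(first_values, second_values))
n_models = len(combinations)

# With 10-fold cross-validation, each combination is fitted 10 times
n_folds = 10
n_fits = n_models * n_folds

print(n_models, n_fits)  # 50 500
```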
  
**From 2 to N hyperparameters**
  
That was just for 2 hyperparameters. What if we wanted to test a third or fourth hyperparameter? We could nest again (and again). We first list the extra things to test.
  
<center><img src='../_images/introducing-grid-search6.png' alt='img' width='740'></center>
  
Then we adjust our function to take in more inputs. Notice how our function has a more complex model build but is very similar to what we did before?
  
<center><img src='../_images/introducing-grid-search7.png' alt='img' width='740'></center>
  
Finally, we can adjust our for loop to add the extra level of nesting. This code will also look familiar: we are just adding more levels of nesting but still saving our results for analysis.
  
<center><img src='../_images/introducing-grid-search8.png' alt='img' width='740'></center>
  
So how many models did we just create? Testing 7 values for our first hyperparameter, and the listed number of values for the other hyperparameters, we can see this number has greatly increased. It is safe to say we cannot keep nesting forever, as our code becomes complex and inefficient. Plus, what if we also wanted some extra information on training and testing times and scores? Our code would get quite complex.
  
**Introducing Grid Search**
  
Let's review our work in an alternate way. Imagine a grid with each value of `max_depth=` that we want to test down the left and each value of `learning_rate=` across the top. Each cell where a row and a column intersect is a model that we need to run.
  
Running a model for every cell in the grid with the hyperparameters specified is known as a Grid Search. For example, a single cell in the grid below is equivalent to creating a gradient boosting estimator with that row's `max_depth=` and that column's `learning_rate=`.
  
<center><img src='../_images/introducing-grid-search9.png' alt='img' width='740'></center>
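Scikit Learn's `ParameterGrid` (in `sklearn.model_selection`) enumerates exactly these cells. A minimal sketch, using illustrative `max_depth` and `learning_rate` values rather than the ones in the figure, that turns each cell into an unfitted gradient boosting estimator:

```python
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import ParameterGrid

# Illustrative values: rows of the grid (max_depth) and columns (learning_rate)
grid = {"max_depth": [2, 4, 6], "learning_rate": [0.01, 0.1, 0.5]}

# Each element of ParameterGrid is one cell of the grid, as a dict of keyword arguments
cells = list(ParameterGrid(grid))

# One (unfitted) estimator per cell
estimators = [GradientBoostingClassifier(**cell) for cell in cells]

for cell in cells[:3]:
    print(cell)
```

The number of cells is the number of rows times the number of columns, here 3 × 3 = 9.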
  
**Grid Search Pros & Cons**
  
Grid search has a number of advantages. It's programmatic and saves many lines of code. It is guaranteed to find the best model within the grid you specify, but if you specify a poor grid with silly or conflicting values, you won't get a good score! Finally, it is an easy methodology to explain compared to some of the more complex ones we will cover later in the course.
  
**Advantages of Grid Search**
  
- You don't have to write thousands of lines of code
- Finds the best model within the grid (only within the defined grid)
- Easy to explain

However, there are some disadvantages to this approach. It is computationally expensive. It is also 'uninformed': it doesn't learn as it creates models, so the next model it creates could be better or worse. There are 'informed' methods that get better as they build more and more models, and we will see those later in the course.
  
**Disadvantages of Grid Search**
  
- Computationally expensive
- It is 'uninformed'. Results of one model don't help create the next model.
- User defined: results are only as good as the grid you specify
  


### Build Grid Search functions
  
In data science it is a great idea to try building algorithms, models and processes 'from scratch' so you can really understand what is happening at a deeper level. Of course there are great packages and libraries for this work (and we will get to that very soon!) but building from scratch will give you a great edge in your data science work.
  
In this exercise, you will create a function to take in 2 hyperparameters, build models and return results. You will use this function in a future exercise.
  
You will have the `X_train`, `X_test`, `y_train` and `y_test` datasets available.
  
1. Build a function that takes two parameters called `learn_rate` and `max_depth` for the learning rate and maximum depth.
2. Add capability in the function to build a GBM model and fit it to the data with the input hyperparameters.
3. Have the function return the results of that model and the chosen hyperparameters (`learn_rate` and `max_depth`).

In [5]:
from sklearn.model_selection import train_test_split

# Import the dataset
credit_card = pd.read_csv('../_datasets/credit-card-full.csv')
print(credit_card.shape)

# To change categorical variable with dummy variables
credit_card = pd.get_dummies(credit_card, columns=['SEX', 'EDUCATION', 'MARRIAGE'], drop_first=True, dtype=int)
print(credit_card.shape)

# X/y split
X = credit_card.drop(['ID', 'default payment next month'], axis=1)
y = credit_card['default payment next month']

# Train/test split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, shuffle=True)

credit_card.head()

(30000, 25)
(30000, 32)


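| | ID | LIMIT_BAL | AGE | PAY_0 | PAY_2 | PAY_3 | PAY_4 | PAY_5 | PAY_6 | BILL_AMT1 | BILL_AMT2 | BILL_AMT3 | BILL_AMT4 | BILL_AMT5 | BILL_AMT6 | PAY_AMT1 | PAY_AMT2 | PAY_AMT3 | PAY_AMT4 | PAY_AMT5 | PAY_AMT6 | default payment next month | SEX_2 | EDUCATION_1 | EDUCATION_2 | EDUCATION_3 | EDUCATION_4 | EDUCATION_5 | EDUCATION_6 | MARRIAGE_1 | MARRIAGE_2 | MARRIAGE_3 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 20000 | 24 | 2 | 2 | -1 | -1 | -2 | -2 | 3913 | 3102 | 689 | 0 | 0 | 0 | 0 | 689 | 0 | 0 | 0 | 0 | 1 | 1 | 0 | 1 | 0 | 0 | 0 | 0 | 1 | 0 | 0 |
| 1 | 2 | 120000 | 26 | -1 | 2 | 0 | 0 | 0 | 2 | 2682 | 1725 | 2682 | 3272 | 3455 | 3261 | 0 | 1000 | 1000 | 1000 | 0 | 2000 | 1 | 1 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 1 | 0 |
| 2 | 3 | 90000 | 34 | 0 | 0 | 0 | 0 | 0 | 0 | 29239 | 14027 | 13559 | 14331 | 14948 | 15549 | 1518 | 1500 | 1000 | 1000 | 1000 | 5000 | 0 | 1 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 1 | 0 |
| 3 | 4 | 50000 | 37 | 0 | 0 | 0 | 0 | 0 | 0 | 46990 | 48233 | 49291 | 28314 | 28959 | 29547 | 2000 | 2019 | 1200 | 1100 | 1069 | 1000 | 0 | 1 | 0 | 1 | 0 | 0 | 0 | 0 | 1 | 0 | 0 |
| 4 | 5 | 50000 | 57 | -1 | 0 | -1 | 0 | 0 | 0 | 8617 | 5670 | 35835 | 20940 | 19146 | 19131 | 2000 | 36681 | 10000 | 9000 | 689 | 679 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 1 | 0 | 0 |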


In [6]:
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import accuracy_score

# Create the function
def gbm_grid_search(learn_rate, max_depth):
    # Create the model
    model = GradientBoostingClassifier(learning_rate=learn_rate, max_depth=max_depth)
    
    # Use the model to make predictions
    predictions = model.fit(X_train, y_train).predict(X_test)
    
    # Return the hyperparameters and score
    return ([learn_rate, max_depth, accuracy_score(y_test, predictions)])


Nice! You now have a function you can call to test different combinations of two hyperparameters for the GBM algorithm. In the next exercise we will use it to test some values and analyze the results.

### Iteratively tune multiple hyperparameters
  
In this exercise, you will build on the function you previously created to take in 2 hyperparameters, build a model and return the results. You will now use that to loop through some values and then extend this function and loop with another hyperparameter.
  
The function `gbm_grid_search(learn_rate, max_depth)` is available in this exercise.
  
If you need to remind yourself of the function, you can run the function `print_func()` that has been created for you.

```python
import inspect

def print_func():
    # Print the source code of gbm_grid_search
    lines = inspect.getsource(gbm_grid_search)
    print(lines)
```
  
1. Write a for-loop to test the values (0.01, 0.1, 0.5) for the `learning_rate=` and (2, 4, 6) for the `max_depth=` using the function you created `gbm_grid_search` and print the results.
2. Extend the `gbm_grid_search` function to include the hyperparameter `subsample`. Name this new function `gbm_grid_search_extended`.
3. Extend your loop to call `gbm_grid_search_extended` (available in your console), then test the values [0.4, 0.6] for the `subsample` hyperparameter and print the results. `max_depth_list` & `learn_rate_list` are available in your environment.

In [7]:
# Create the relevant lists
results_list = []
learn_rate_list = [0.01, 0.1, 0.5]
max_depth_list = [2, 4, 6]

# Create the for loop
for learn_rate in learn_rate_list:
    for max_depth in max_depth_list:
        results_list.append(gbm_grid_search(learn_rate, max_depth))
        
# Print the results
pprint(results_list)

[[0.01, 2, 0.8203333333333334],
 [0.01, 4, 0.8206666666666667],
 [0.01, 6, 0.8166666666666667],
 [0.1, 2, 0.8237777777777778],
 [0.1, 4, 0.822],
 [0.1, 6, 0.8205555555555556],
 [0.5, 2, 0.82],
 [0.5, 4, 0.8072222222222222],
 [0.5, 6, 0.7882222222222223]]


In [8]:
# Extend the function input
def gbm_grid_search_extended(learn_rate, max_depth, subsample):
    # Extend the model creation section
    model = GradientBoostingClassifier(
        learning_rate=learn_rate, 
        max_depth=max_depth,
        subsample=subsample
        )
    
    predictions = model.fit(X_train, y_train).predict(X_test)
    
    # Extend the return part
    return([learn_rate, max_depth, subsample, accuracy_score(y_test, predictions)])


In [9]:
# Create the new list to test
subsample_list = [0.4, 0.6]

for learn_rate in learn_rate_list:
    for max_depth in max_depth_list:
        # Extend the for loop
        for subsample in subsample_list:
            # Extend the results to include the new hyperparameter
            results_list.append(gbm_grid_search_extended(learn_rate, max_depth, subsample))


# Print the results
pprint(results_list)

[[0.01, 2, 0.8203333333333334],
 [0.01, 4, 0.8206666666666667],
 [0.01, 6, 0.8166666666666667],
 [0.1, 2, 0.8237777777777778],
 [0.1, 4, 0.822],
 [0.1, 6, 0.8205555555555556],
 [0.5, 2, 0.82],
 [0.5, 4, 0.8072222222222222],
 [0.5, 6, 0.7882222222222223],
 [0.01, 2, 0.4, 0.8111111111111111],
 [0.01, 2, 0.6, 0.8206666666666667],
 [0.01, 4, 0.4, 0.8176666666666667],
 [0.01, 4, 0.6, 0.818],
 [0.01, 6, 0.4, 0.8172222222222222],
 [0.01, 6, 0.6, 0.8174444444444444],
 [0.1, 2, 0.4, 0.8242222222222222],
 [0.1, 2, 0.6, 0.8236666666666667],
 [0.1, 4, 0.4, 0.8231111111111111],
 [0.1, 4, 0.6, 0.8213333333333334],
 [0.1, 6, 0.4, 0.8206666666666667],
 [0.1, 6, 0.6, 0.8193333333333334],
 [0.5, 2, 0.4, 0.8172222222222222],
 [0.5, 2, 0.6, 0.8208888888888889],
 [0.5, 4, 0.4, 0.7942222222222223],
 [0.5, 4, 0.6, 0.7992222222222222],
 [0.5, 6, 0.4, 0.7868888888888889],
 [0.5, 6, 0.6, 0.7842222222222223]]


Congratulations. You have effectively built your own grid search! You went from 2 to 3 hyperparameters and can see how you could extend that to even more values and hyperparameters. That was a lot of effort though. Be warned - we are now entering a world that can get very computationally expensive very fast!
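The nested loops generalize to any number of hyperparameters if the grid is stored as a dictionary of lists and the combinations are generated with `itertools.product`. This is a sketch, not the exercise's code: it assumes a synthetic dataset from `make_classification`, a small grid, and `n_estimators=20` so it runs quickly, and the helper name `manual_grid_search` is hypothetical.

```python
from itertools import product

from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the credit card data
X, y = make_classification(n_samples=300, n_features=8, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

def manual_grid_search(param_lists):
    """Hypothetical helper: fit one GBM per combination and return the scores."""
    names = list(param_lists)
    results = []
    # product(*lists) yields every combination, replacing one nested loop per hyperparameter
    for values in product(*param_lists.values()):
        params = dict(zip(names, values))
        model = GradientBoostingClassifier(n_estimators=20, random_state=0, **params)
        predictions = model.fit(X_train, y_train).predict(X_test)
        results.append({**params, "accuracy": accuracy_score(y_test, predictions)})
    return results

results = manual_grid_search({
    "learning_rate": [0.1, 0.5],
    "max_depth": [2, 4],
    "subsample": [0.6, 1.0],
})
print(len(results))  # 8
```

Adding a hyperparameter now means adding a key to the dictionary rather than another level of nesting.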

### How Many Models?
  
Adding more hyperparameters or values increases the number of models created, but the increase is not linear: the total is the product of the number of values tested for each hyperparameter, so every addition multiplies the count you already have.
  
How many models would be created when running a grid search over the following hyperparameters and values for a GBM algorithm?
  
```python
learning_rate = [0.001, 0.01, 0.05, 0.1, 0.2, 0.3, 0.5, 1, 2]
max_depth = [4,6,8,10,12,14,16,18, 20]
subsample = [0.4, 0.6, 0.7, 0.8, 0.9]
max_features = ['auto', 'sqrt', 'log2']
```
  
These lists are in your console so you can utilize properties of them to help you!
  
Possible answers
  
- [ ] 26
- [ ] 9 of one model, 9 of another
- [ ] 1 large model
- [x] 1215
  
Excellent! For every value of one hyperparameter, we test EVERY value of EVERY other hyperparameter. So you correctly multiplied the number of values (the lengths of the lists).
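The same calculation in code, using the lists from the question. Only the list lengths matter here, so nothing is fitted and the values are never validated.

```python
from math import prod

learning_rate = [0.001, 0.01, 0.05, 0.1, 0.2, 0.3, 0.5, 1, 2]
max_depth = [4, 6, 8, 10, 12, 14, 16, 18, 20]
subsample = [0.4, 0.6, 0.7, 0.8, 0.9]
max_features = ['auto', 'sqrt', 'log2']

# The grid size is the product of the list lengths: 9 * 9 * 5 * 3
n_models = prod(len(values) for values in [learning_rate, max_depth, subsample, max_features])
print(n_models)  # 1215
```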

## Grid Search with Scikit Learn
  
In this lesson we will move beyond our manual code and leverage Scikit Learn to assist our grid search.
  
**GridSearchCV Object**
  
In this lesson we will be introduced to Scikit Learn's `GridSearchCV`. It will help us create a grid search more efficiently and get some performance analytics. This is an example of a `GridSearchCV` object. Don't worry, we will break it down!
  
<center><img src='../_images/grid-search-with-scikit-learn.png' alt='img' width='740'></center>
  
**Steps in a Grid Search**
  
Firstly, let us conceptualize the steps needed to do a proper grid search. Some of these will be familiar from our manual work before. The only one of these we did not do much work with previously is step 4 (the cross-validation scheme), but we will cover each now.
  
1. An algorithm (or estimator) whose hyperparameters we tune
2. Defining which hyperparameters to tune
3. Defining a range of values for each hyperparameter
4. Setting a cross-validation scheme
5. Defining a score function so we can decide which square on our grid was 'the best'
6. Including extra useful information or functions
  
**GridSearchCV Object Inputs**
  
A `GridSearchCV` object takes several important arguments: `estimator=`, `param_grid=`, `cv=`, `scoring=`, `refit=`, `n_jobs=` and `return_train_score=`.
  
**GridSearchCV 'estimator'**
  
The estimator is our algorithm. Examples include KNN, Random Forest, GBM or Logistic Regression. We only pick one algorithm for each grid search.
  
**GridSearchCV 'param_grid'**
  
`param_grid=` is how we tell `GridSearchCV` which hyperparameters and which values to test. We were previously using lists, but `param_grid=` needs a dictionary. The dictionary keys must be the hyperparameter names, the values a list of values to test.
  
**GridSearchCV 'param_grid'**
  
The keys in the `param_grid=` dictionary must be valid hyperparameters of the estimator, otherwise the Grid Search will fail. For example, `best_choice` is not a hyperparameter of Scikit Learn's Logistic Regression estimator, so a grid containing it will fail.
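Before running a search, the keys can be checked against the estimator's `get_params()`, which returns a dictionary keyed by every valid hyperparameter name. A minimal sketch; the grid values are illustrative and `best_choice` is the invalid key from the text.

```python
from sklearn.linear_model import LogisticRegression

param_grid = {"C": [0.1, 1, 10], "best_choice": [True, False]}

# get_params() returns a dict whose keys are the estimator's valid hyperparameters
valid_names = LogisticRegression().get_params().keys()

invalid = [name for name in param_grid if name not in valid_names]
print(invalid)  # ['best_choice']
```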
  
**GridSearchCV 'cv'**
  
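The `cv=` input allows you to undertake cross-validation. You could specify different cross-validation schemes here, but simply providing an integer will create a k-fold split with that many folds (a stratified k-fold when the estimator is a classifier and the target is binary or multiclass). You are likely familiar with standard 5-fold and 10-fold cross-validation.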
  
<center><img src='../_images/grid-search-with-scikit-learn1.png' alt='img' width='740'></center>
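To see what an integer `cv=` turns into, `check_cv` from `sklearn.model_selection` resolves it the same way for a classifier. A rough illustration, assuming a toy binary target; an explicit splitter object such as `KFold` can be passed to `GridSearchCV(cv=...)` instead when you need control over shuffling.

```python
import numpy as np
from sklearn.model_selection import KFold, StratifiedKFold, check_cv

y = np.array([0, 1] * 50)  # toy binary target

# An integer cv with a classifier resolves to stratified k-fold
splitter = check_cv(cv=5, y=y, classifier=True)
print(type(splitter).__name__, splitter.get_n_splits())

# An explicit splitter object can also be passed to GridSearchCV(cv=...)
custom_cv = KFold(n_splits=10, shuffle=True, random_state=42)
print(custom_cv.get_n_splits())
```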
  
**GridSearchCV 'scoring'**
  
`scoring=` is a scoring function used to evaluate your models' performance. You did this manually previously using accuracy. You can use your own custom metric, or one of the metrics available in Scikit Learn's metrics module. You can check all available metrics using the command shown below.
  
<center><img src='../_images/grid-search-with-scikit-learn2.png' alt='img' width='740'></center>
  
**GridSearchCV 'refit'**
  
With `refit=True`, the best hyperparameter combination is used to refit the estimator on the whole training set. The `GridSearchCV` object can then be used as an estimator directly. This is very handy, as you don't need to save out the best hyperparameters and train another model.
  
**GridSearchCV 'n_jobs'**
  
`n_jobs=` assists with parallel execution. You can effectively 'split up' your work and have many models being created at the same time. This is possible because the results of one model do not affect the next one. You can check how many cores you have available, which determines how many models you can run in parallel, using the code shown below. Setting `n_jobs=-1` uses all available cores. Be careful using all cores for a task, though, as this may mean you can't do other work on your computer while your models run.
  
<center><img src='../_images/grid-search-with-scikit-learn3.png' alt='img' width='740'></center>
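One way to check the core count is Python's standard `os.cpu_count()`. Keeping one core free, as below, is a judgment call rather than a Scikit Learn rule.

```python
import os

# Number of logical CPUs; os.cpu_count() can return None if it cannot be determined
n_cores = os.cpu_count() or 1
print(n_cores)

# Leave one core free for other work
n_jobs = max(1, n_cores - 1)
print(n_jobs)
```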
  
**GridSearchCV 'return_train_score'**
  
Finally, `return_train_score=` logs statistics about the training runs that were undertaken. This can be useful for plotting and understanding test vs. training set performance (and hence the bias-variance tradeoff). While informative, it is computationally expensive and will not assist in finding the best model.
  
**Building a GridSearchCV object**
  
Now we have all the components to build a grid search object. First, we create our parameter grid for the hyperparameters and values we want to test. Then we create the base classifier, setting some default values at the time of creation.
  
<center><img src='../_images/grid-search-with-scikit-learn4.png' alt='img' width='740'></center>
  
**Building a GridSearchCV Object**
  
We can now put the pieces together to create the `GridSearchCV` object. You can see all the elements you learned about previously, including the estimator and parameter grid we just created. If this seems like a lot of code, review the previous sections to see what each element means.
  
<center><img src='../_images/grid-search-with-scikit-learn5.png' alt='img' width='740'></center>
  
**Using a GridSearchCV Object**
  
With '`refit=`' set to `True`, we can directly use the `GridSearchCV` object as an estimator. That means we can fit onto our data and make predictions, just like any other Scikit Learn estimator!
  
<center><img src='../_images/grid-search-with-scikit-learn6.png' alt='img' width='740'></center>
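A self-contained sketch of building, fitting and predicting with a `GridSearchCV` object. It assumes a synthetic dataset from `make_classification` and a small random forest grid instead of the credit card data and the grid in the figures.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, train_test_split

X, y = make_classification(n_samples=300, n_features=8, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

# Hyperparameters and values to test, plus the base estimator
param_grid = {"max_depth": [2, 4, 8], "min_samples_leaf": [1, 5]}
rf = RandomForestClassifier(n_estimators=20, random_state=0)

# Cross-validation scheme, scoring function, refit and extra information
grid = GridSearchCV(
    estimator=rf,
    param_grid=param_grid,
    cv=5,
    scoring="accuracy",
    refit=True,
    return_train_score=True,
)

# Fit and predict like any other estimator; predict uses the refitted best model
grid.fit(X_train, y_train)
predictions = grid.predict(X_test)
print(grid.best_params_, grid.score(X_test, y_test))
```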
  
**Let's practice!**
  
Let's undertake our own Grid Search with Scikit Learn's `GridSearchCV` class!

### GridSearchCV inputs
  
Let's test your knowledge of `GridSearchCV` inputs by answering the question below.

Three `GridSearchCV` objects are available in the console, named `model_1`, `model_2` and `model_3`. Note that there is no data available to fit these models. Instead, you must answer by looking at how they are constructed.

Which of these `GridSearchCV` objects would not work when we try to fit it?
  
```python
Model #1:
 GridSearchCV(
    estimator = RandomForestClassifier(),
    param_grid = {'max_depth': [2, 4, 8, 15], 'max_features': ['auto', 'sqrt']},
    scoring='roc_auc',
    n_jobs=4,
    cv=5,
    refit=True, return_train_score=True) 


Model #2:
 GridSearchCV(
    estimator = KNeighborsClassifier(),
    param_grid = {'n_neighbors': [5, 10, 20], 'algorithm': ['ball_tree', 'brute']},
    scoring='accuracy',
    n_jobs=8,
    cv=10,
    refit=False) 


Model #3:
 GridSearchCV(
    estimator = GradientBoostingClassifier(),
    param_grid = {'number_attempts': [2, 4, 6], 'max_depth': [3, 6, 9, 12]},
    scoring='accuracy',
    n_jobs=2,
    cv=7,
    refit=True)
```
  
Possible answers
  
- [ ] model_1 would not work when we try to fit it.
- [ ] model_2 would not work when we try to fit it.
- [x] model_3 would not work when we try to fit it.
- [ ] None - they will all work when we try to fit them.
  
Correct! By looking at the Scikit Learn documentation (or your excellent memory!) you know that `number_attempts` is not a valid hyperparameter of `GradientBoostingClassifier`. This `GridSearchCV` will not fit to our data.

### GridSearchCV with Scikit Learn
  
The `GridSearchCV` class from Scikit Learn provides many useful features to assist with efficiently undertaking a grid search. You will now put your learning into practice by creating a `GridSearchCV` object with certain parameters.
  
The desired options are:
  
- A Random Forest Estimator, with the split `criterion=` as `'entropy'`
- 5-fold cross validation
- The hyperparameters `max_depth=` (2, 4, 8, 15) and `max_features=` (`'auto'` vs `'sqrt'`)
- Use `roc_auc` to score the models
- Use 4 cores for processing in parallel
- Ensure you refit the best model and return training scores
  
The `X_train`, `X_test`, `y_train` & `y_test` datasets are available.
  
1. Create a Random Forest estimator as specified in the context above.
2. Create a parameter grid as specified in the context above.
3. Create a `GridSearchCV` object as outlined in the context above, using the two elements created in the previous two instructions.

In [10]:
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

# Create a Random Forest Classifier with specified criterion
rf_class = RandomForestClassifier(criterion='entropy')

# Create the parameter grid
param_grid = {
    'max_depth' : [2, 4, 8, 15],
    'max_features' : ['auto', 'sqrt']
}

# Create a GridSearchCV object
grid_rf_class = GridSearchCV(
    estimator=rf_class,
    param_grid=param_grid,
    scoring='roc_auc',
    n_jobs=-1,  # -1 uses all available cores; the spec above asks for 4 (n_jobs=4)
    cv=5,
    refit=True, 
    return_train_score=True
)

print(grid_rf_class)

GridSearchCV(cv=5, estimator=RandomForestClassifier(criterion='entropy'),
             n_jobs=-1,
             param_grid={'max_depth': [2, 4, 8, 15],
                         'max_features': ['auto', 'sqrt']},
             return_train_score=True, scoring='roc_auc')


Excellent work! You now understand all the inputs to a `GridSearchCV` object and can tune many different hyperparameters and many different values for each on a chosen algorithm!

## Understanding a grid search output
  
Now that you know how to run a grid search, let's focus on its output.
  
**Analyzing the output**
  
Let us now analyze each of the properties of the `GridSearchCV` output and learn how to access and use them. The properties of the object can be categorized into three groups: a results log, the best results, and 'extra information'.
  
<center><img src='../_images/understanding-a-grid-search-output.png' alt='img' width='740'></center>
  
**Accessing object properties**
  
Properties are accessed using dot notation, that is `grid_search_object.property`, where `property` is the actual property you want to retrieve. Let's review each of the key properties now.
  
**The .cv_results_ property**
  
First, there is the `.cv_results_` property. This is a dictionary that we can read into a `pandas` DataFrame to explore. Notice there are 12 rows because there are 12 squares in our grid. Each row tells you what happened when testing that square.
  
<center><img src='../_images/understanding-a-grid-search-output1.png' alt='img' width='740'></center>
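A minimal, self-contained sketch of loading `.cv_results_` into a DataFrame. It is not the code behind the figure: it assumes a synthetic dataset from `make_classification` and a `DecisionTreeClassifier` with a 3 × 3 grid (in place of the random forest) so it runs quickly, which gives 9 rows rather than 12.

```python
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=300, n_features=8, random_state=0)

param_grid = {"max_depth": [2, 4, 8], "min_samples_leaf": [1, 5, 10]}
grid = GridSearchCV(DecisionTreeClassifier(random_state=0), param_grid, cv=5,
                    return_train_score=True)
grid.fit(X, y)

# cv_results_ is a dict of arrays; as a DataFrame there is one row per grid cell
cv_results_df = pd.DataFrame(grid.cv_results_)
print(cv_results_df.shape[0])  # 9
print(cv_results_df.columns.tolist())
```

The column names include the time, `param_`, `params`, per-split test and train score, mean/std and `rank_test_score` columns discussed below.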
  
**The .cv_results_ 'time' columns**
  
The `'time'` columns refer to the time it took to fit and score the model. We used 5-fold cross-validation, so each model was fitted and scored 5 times; these columns store the mean and standard deviation of those times, in seconds.
  
<center><img src='../_images/understanding-a-grid-search-output2.png' alt='img' width='740'></center>
  
**The .cv_results_ 'param_' columns**
  
The `param_` columns contain information on the different parameters that were used in the model. Remember, each row in this DataFrame is about one model. For example, row 3 tested the hyperparameter combination `max_depth=10`, `min_samples_leaf=2` and `n_estimators=100` for our random forest estimator.
  
<center><img src='../_images/understanding-a-grid-search-output3.png' alt='img' width='740'></center>
  
**The .cv_results_ 'params' column**
  
Each entry in the `params` column is a dictionary of all the parameters from the previous `param_` columns. We need to use `pd.set_option` here to ensure we don't truncate the results we are printing.
  
<center><img src='../_images/understanding-a-grid-search-output4.png' alt='img' width='740'></center>
  
**The .cv_results_ 'test_score' columns**
  
The next 5 columns are the test scores for each of the 5 cross-validation folds (or splits) we made, followed by the mean and standard deviation across those folds.
  
<center><img src='../_images/understanding-a-grid-search-output5.png' alt='img' width='740'></center>
  
**The .cv_results_ 'rank_test_score' column**
  
The rank column conveniently ranks the rows by the `mean_test_score`. We can see that the model in our third row had the best `mean_test_score`.
  
<center><img src='../_images/understanding-a-grid-search-output6.png' alt='img' width='740'></center>
  
**Extracting the best row**
  
Using the `rank_test_score` column, we can easily select the best grid square for analysis. This table is the row from `.cv_results_` for the best model created.
  
<center><img src='../_images/understanding-a-grid-search-output7.png' alt='img' width='740'></center>
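A sketch of extracting the best row, under the same assumptions as a small standalone example (synthetic `make_classification` data and a `DecisionTreeClassifier` grid instead of the random forest). Tied models share a rank, so the filter can return more than one row.

```python
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=300, n_features=8, random_state=0)
grid = GridSearchCV(DecisionTreeClassifier(random_state=0),
                    {"max_depth": [2, 4, 8], "min_samples_leaf": [1, 5, 10]}, cv=5)
grid.fit(X, y)
cv_results_df = pd.DataFrame(grid.cv_results_)

# Rank 1 marks the highest mean_test_score; ties share the same rank
best_row = cv_results_df[cv_results_df["rank_test_score"] == 1]
print(best_row[["params", "mean_test_score", "std_test_score"]])
```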
  
**The .cv_results_ 'train_score' columns**
  
The `test_score` columns are then repeated for the training scores. Note that if we had not set `return_train_score=True`, these columns would not be included. There is also no ranking column for the training scores, as we only care about performance on the held-out fold in each split.
  
<center><img src='../_images/understanding-a-grid-search-output8.png' alt='img' width='740'></center>
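The effect of `return_train_score` can be checked directly by fitting the same search twice. This sketch assumes synthetic `make_classification` data and a hypothetical small grid; it lists every `cv_results_` key containing `"train"` for each setting.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

# Synthetic stand-in data (assumption)
X, y = make_classification(n_samples=200, n_features=8, random_state=0)

train_keys = {}
for flag in (False, True):
    search = GridSearchCV(
        RandomForestClassifier(n_estimators=10, random_state=0),
        param_grid={"max_depth": [2, 4], "max_features": ["sqrt", None]},
        scoring="roc_auc",
        cv=5,
        return_train_score=flag,
    ).fit(X, y)
    # Collect any training-score columns the search produced
    train_keys[flag] = sorted(k for k in search.cv_results_ if "train" in k)

print("return_train_score=False:", train_keys[False])
print("return_train_score=True:", train_keys[True])
```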
  
**The best grid square**
  
Information on the best grid square is found in three different properties: `.best_params_`, the dictionary of the parameters that gave the best score; `.best_score_`, the actual best score; and `.best_index_`, the row in `.cv_results_` that was the best. This is the same as the index of the row with rank 1 in `.cv_results_` that we extracted just before.
  
<center><img src='../_images/understanding-a-grid-search-output9.png' alt='img' width='740'></center>
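The three properties all point at the same row of `.cv_results_`. The sketch below checks this on a hypothetical small grid fitted to synthetic `make_classification` data (an assumption, not the course data).

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

# Synthetic stand-in data and a hypothetical 2 x 2 grid (assumptions)
X, y = make_classification(n_samples=200, n_features=8, random_state=0)
grid = GridSearchCV(
    RandomForestClassifier(n_estimators=10, random_state=0),
    param_grid={"max_depth": [2, 4], "max_features": ["sqrt", None]},
    scoring="roc_auc",
    cv=5,
)
grid.fit(X, y)

best_idx = grid.best_index_
print("best_params_:", grid.best_params_)
print("best_score_:", grid.best_score_)
print("best_index_:", best_idx)

# Cross-check each property against the matching entry of cv_results_
params_match = grid.best_params_ == grid.cv_results_["params"][best_idx]
score_match = grid.best_score_ == grid.cv_results_["mean_test_score"][best_idx]
rank_is_one = grid.cv_results_["rank_test_score"][best_idx] == 1
print(params_match, score_match, rank_is_one)
```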
  
**The best_estimator_ property**
  
`GridSearchCV` stores an estimator built with the best hyperparameters in the `.best_estimator_` property. Since it is an estimator, we can use it to predict on our test set. Using Python's `type()` function, we can confirm that it is a `RandomForestClassifier`. We can also use the `GridSearchCV` object itself directly as an estimator, since methods such as `predict` are passed through to the best estimator.
  
<center><img src='../_images/understanding-a-grid-search-output10.png' alt='img' width='740'></center>
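A minimal sketch showing that predicting through `.best_estimator_` and through the fitted `GridSearchCV` object gives the same result. It assumes synthetic `make_classification` data, a hypothetical small grid, and the default `refit=True`.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, train_test_split

# Synthetic stand-in data split into train/test sets (assumption)
X, y = make_classification(n_samples=200, n_features=8, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)

grid = GridSearchCV(
    RandomForestClassifier(n_estimators=10, random_state=0),
    param_grid={"max_depth": [2, 4], "max_features": ["sqrt", None]},
    scoring="roc_auc",
    cv=5,
    refit=True,  # the default; needed for best_estimator_ and grid.predict
)
grid.fit(X_train, y_train)

print(type(grid.best_estimator_))

# Both calls use the same refitted model
preds_best = grid.best_estimator_.predict(X_test)
preds_grid = grid.predict(X_test)
print((preds_best == preds_grid).all())
```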
  
We can also print the estimator itself. This is why we set `refit=True` when creating the grid search; otherwise we would need to refit a model with the best parameters ourselves before using it.
  
<center><img src='../_images/understanding-a-grid-search-output11.png' alt='img' width='740'></center>
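For contrast, this sketch runs the search with `refit=False` and performs the refit by hand from `.best_params_`. It assumes synthetic `make_classification` data and a hypothetical small grid; the manual model should match the automatically refitted one because the parameters and `random_state` are identical.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, train_test_split

# Synthetic stand-in data split into train/test sets (assumption)
X, y = make_classification(n_samples=200, n_features=8, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)

base = RandomForestClassifier(n_estimators=10, random_state=0)
param_grid = {"max_depth": [2, 4], "max_features": ["sqrt", None]}

# refit=False: candidates are scored, but no final model is trained
search = GridSearchCV(base, param_grid, scoring="roc_auc", cv=5, refit=False)
search.fit(X_train, y_train)
print(hasattr(search, "best_estimator_"))  # False

# Refit the best combination ourselves on the whole training set
manual_model = RandomForestClassifier(n_estimators=10, random_state=0, **search.best_params_)
manual_model.fit(X_train, y_train)

# Compare with letting GridSearchCV refit automatically
auto_search = GridSearchCV(base, param_grid, scoring="roc_auc", cv=5, refit=True).fit(X_train, y_train)
same = (manual_model.predict(X_test) == auto_search.best_estimator_.predict(X_test)).all()
print(same)
```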
  
**Extra information**
  
Some extra information can be obtained from the following properties. They are rarely needed, but may be important if you construct your grid search differently. They include the `.scorer_` function that was used and the number of cross-validation splits, `.n_splits_` (both of which we set ourselves), as well as `.refit_time_`, the number of seconds used to refit the best model on the whole training set. This may be of interest when analyzing the efficiency of your work, but not for our use case here.
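A short sketch of reading these extra properties, assuming synthetic `make_classification` data and a hypothetical small grid. The stored scorer can be called as `scorer(estimator, X, y)`, which here re-scores the refitted model on the data it was trained on.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

# Synthetic stand-in data and a hypothetical 2 x 2 grid (assumptions)
X, y = make_classification(n_samples=200, n_features=8, random_state=0)
grid = GridSearchCV(
    RandomForestClassifier(n_estimators=10, random_state=0),
    param_grid={"max_depth": [2, 4], "max_features": ["sqrt", None]},
    scoring="roc_auc",
    cv=5,
).fit(X, y)

print(grid.scorer_)      # scorer object built from scoring="roc_auc"
print(grid.n_splits_)    # number of cross-validation folds used
print(grid.refit_time_)  # seconds spent refitting the best model

# The scorer is callable as scorer(estimator, X, y)
refit_score = grid.scorer_(grid.best_estimator_, X, y)
print(refit_score)
```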
  
**Let's practice!**
  
Let's practice analyzing the output of a Scikit Learn `GridSearchCV` object!

### Using the best outputs
  
Which of the following parameters must be set in order to be able to directly use the `.best_estimator_` property for predictions?
  
Possible Answers
  
- [ ] `return_train_score = True`
- [x] `refit = True`
- [ ] `refit = False`
- [ ] `verbose = 1`
  
Correct! When `refit=True`, fitting the grid search object automatically refits the best hyperparameter combination on the whole training set and stores the resulting model in the `.best_estimator_` property.

### Exploring the grid search results
  
You will now explore the `.cv_results_` property of the `GridSearchCV` object defined in the video. This is a dictionary that we can read into a `pandas` DataFrame and contains a lot of useful information about the grid search we just undertook.
  
A reminder of the different column types in this property:
  
- `time` columns (`mean_fit_time`, `std_fit_time`, `mean_score_time`, `std_score_time`)
- `param_` columns (one for each hyperparameter) and the singular `params` column (with all hyperparameter settings)
- a `train_score` column for each CV fold, plus the `mean_train_score` and `std_train_score` columns
- a `test_score` column for each CV fold, plus the `mean_test_score` and `std_test_score` columns
- a `rank_test_score` column with a number from 1 to n (the number of hyperparameter combinations) ranking the rows by their `mean_test_score`
  
1. Read the `.cv_results_` property of the `grid_rf_class` `GridSearchCV` object into a DataFrame and print the whole thing out to inspect it.
2. Extract & print the singular column containing a dictionary of all hyperparameters used in each iteration of the grid search.
3. Extract & print the row that had the best mean test score by indexing using the `rank_test_score` column.

```python
# Fit the grid search (grid_rf_class is the GridSearchCV object defined earlier)
grid_rf_class.fit(X_train, y_train)

# Read the cv_results_ property into a DataFrame & print it out
cv_results_df = pd.DataFrame(grid_rf_class.cv_results_)
print(cv_results_df)

# Extract and print the column with a dictionary of hyperparameters used
column = cv_results_df.loc[:, ["params"]]
print(column)

# Extract and print the row that had the best mean test score
best_row = cv_results_df[cv_results_df['rank_test_score'] == 1]
print(best_row)
```

```text
20 fits failed out of a total of 40.
The score on these train-test partitions for these parameters will be set to nan.
If these failures are not expected, you can try to debug them by setting error_score='raise'.

Below are more details about the failures:
--------------------------------------------------------------------------------
11 fits failed with the following error:
Traceback (most recent call last):
  File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/site-packages/sklearn/model_selection/_validation.py", line 732, in _fit_and_score
    estimator.fit(X_train, y_train, **fit_params)
  File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/site-packages/sklearn/base.py", line 1144, in wrapper
    estimator._validate_params()
  File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/site-packages/sklearn/base.py", line 637, in _validate_params
    validate_parameter_constraints(
  File "/Library/Frameworks/Python.framework/V
```

```text
   mean_fit_time  std_fit_time  mean_score_time  std_score_time  \
0       0.069938      0.029345         0.000000        0.000000   
1       8.035159      1.229820         0.320498        0.077827   
2       0.039107      0.019978         0.000000        0.000000   
3      12.897208      2.507468         0.188786        0.040932   
4       0.016495      0.003481         0.000000        0.000000   
5      14.115396      0.688304         0.235631        0.048647   
6       0.014250      0.001280         0.000000        0.000000   
7      20.595176      2.292307         0.267598        0.061484   

  param_max_depth param_max_features  \
0               2               auto   
1               2               sqrt   
2               4               auto   
3               4               sqrt   
4               8               auto   
5               8               sqrt   
6              15               auto   
7              15               sqrt   
```

The 20 failed fits are the four `max_features='auto'` combinations (4 combinations × 5 folds). Recent scikit-learn releases no longer accept `'auto'` for `RandomForestClassifier`, so those rows have `NaN` scores and all share the last rank. For classifiers, `'auto'` meant the same as `'sqrt'`, so `'sqrt'` is the drop-in replacement.

```python
cv_results_df.head(10)
```

| | mean_fit_time | std_fit_time | mean_score_time | std_score_time | param_max_depth | param_max_features | params | split0_test_score | split1_test_score | split2_test_score | split3_test_score | split4_test_score | mean_test_score | std_test_score | rank_test_score | split0_train_score | split1_train_score | split2_train_score | split3_train_score | split4_train_score | mean_train_score | std_train_score |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0.069938 | 0.029345 | 0.0 | 0.0 | 2 | auto | {'max_depth': 2, 'max_features': 'auto'} | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 5 | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| 1 | 8.035159 | 1.22982 | 0.320498 | 0.077827 | 2 | sqrt | {'max_depth': 2, 'max_features': 'sqrt'} | 0.770253 | 0.764266 | 0.753591 | 0.776706 | 0.759091 | 0.764781 | 0.008124 | 4 | 0.765254 | 0.767306 | 0.771989 | 0.763691 | 0.769325 | 0.767513 | 0.002935 |
| 2 | 0.039107 | 0.019978 | 0.0 | 0.0 | 4 | auto | {'max_depth': 4, 'max_features': 'auto'} | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 5 | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| 3 | 12.897208 | 2.507468 | 0.188786 | 0.040932 | 4 | sqrt | {'max_depth': 4, 'max_features': 'sqrt'} | 0.777489 | 0.769799 | 0.758693 | 0.781245 | 0.765108 | 0.770467 | 0.008164 | 3 | 0.776025 | 0.776943 | 0.781207 | 0.774924 | 0.777913 | 0.777403 | 0.002144 |
| 4 | 0.016495 | 0.003481 | 0.0 | 0.0 | 8 | auto | {'max_depth': 8, 'max_features': 'auto'} | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 5 | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| 5 | 14.115396 | 0.688304 | 0.235631 | 0.048647 | 8 | sqrt | {'max_depth': 8, 'max_features': 'sqrt'} | 0.782502 | 0.77636 | 0.760826 | 0.785573 | 0.773743 | 0.775801 | 0.008593 | 1 | 0.826753 | 0.827581 | 0.830219 | 0.826636 | 0.827497 | 0.827737 | 0.001298 |
| 6 | 0.01425 | 0.00128 | 0.0 | 0.0 | 15 | auto | {'max_depth': 15, 'max_features': 'auto'} | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 5 | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| 7 | 20.595176 | 2.292307 | 0.267598 | 0.061484 | 15 | sqrt | {'max_depth': 15, 'max_features': 'sqrt'} | 0.778443 | 0.777201 | 0.757441 | 0.780455 | 0.770928 | 0.772894 | 0.008357 | 2 | 0.973976 | 0.97531 | 0.973336 | 0.971862 | 0.97031 | 0.972959 | 0.001728 |


Great work! You have built invaluable skills in looking 'under the hood' at what your grid search is doing by extracting and analyzing the `.cv_results_` property.

### Analyzing the best results
  
At the end of the day, we primarily care about the best-performing 'square' in a grid search. Luckily, Scikit Learn's `GridSearchCV` objects have a number of properties that provide key information on just the best square (or row in `.cv_results_`).
  
Three properties you will explore are:
  
- `.best_score_` – The score (here `ROC_AUC`) from the best-performing square.
- `.best_index_` – The index of the row in `.cv_results_` containing information on the best-performing square.
- `.best_params_` – A dictionary of the parameters that gave the best score, for example `{'max_depth': 10}`.

The grid search object `grid_rf_class` is available.
  
A dataframe (`cv_results_df`) has been created from the `.cv_results_` for you on line 6. This will help you index into the results.
  
1. Extract and print out the `ROC_AUC` score from the best performing square in `grid_rf_class`.
2. Create a variable from the best-performing row by indexing into `cv_results_df`.
3. Create a variable, `best_max_depth`, by extracting the `max_depth` parameter from the best-performing square in `grid_rf_class` and print it out.

```python
# Print out the ROC_AUC score from the best-performing square
best_score = grid_rf_class.best_score_
print(best_score)

# Create a variable from the row related to the best-performing square
cv_results_df = pd.DataFrame(grid_rf_class.cv_results_)
best_row = cv_results_df.loc[[grid_rf_class.best_index_]]
print(best_row)

# Get the max_depth parameter from the best-performing square and print
best_max_depth = grid_rf_class.best_params_["max_depth"]
print(best_max_depth)
```

```text
0.7758006241929944
   mean_fit_time  std_fit_time  mean_score_time  std_score_time  \
5      14.115396      0.688304         0.235631        0.048647   

  param_max_depth param_max_features  \
5               8               sqrt   

                                     params  split0_test_score  \
5  {'max_depth': 8, 'max_features': 'sqrt'}           0.782502   

   split1_test_score  split2_test_score  split3_test_score  split4_test_score  \
5            0.77636           0.760826           0.785573           0.773743   

   mean_test_score  std_test_score  rank_test_score  split0_train_score  \
5         0.775801        0.008593                1            0.826753   

   split1_train_score  split2_train_score  split3_train_score  \
5            0.827581            0.830219            0.826636   

   split4_train_score  mean_train_score  std_train_score  
5            0.827497          0.827737         0.001298  
8
```


Nice stuff! Being able to quickly find and prioritize the key information in the huge volume of output from machine learning modeling is a great skill. Here you had great practice doing that with `.cv_results_` by quickly isolating the key information on the best-performing square. This will be very important when your grids grow from the 8 squares here to many more!

### Using the best results
  
While it is interesting to analyze the results of our grid search, our final goal is practical in nature; we want to make predictions on our test set using our estimator object.
  
We can access this object through the `.best_estimator_` property of our grid search object.
  
Let's take a look inside the `.best_estimator_` property, make predictions, and generate evaluation scores. We will first use the default `predict` method (giving class predictions), but then we will need to use `predict_proba` rather than `predict` to generate the ROC-AUC score, because ROC-AUC needs probability scores for its calculation. We use the slice `[:, 1]` to get the probabilities of the positive class.
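Why `[:, 1]` picks the positive class: the columns of `predict_proba` follow the order of the estimator's `classes_` attribute. This sketch, assuming synthetic `make_classification` data and a hypothetical small grid, prints `classes_`, extracts the positive-class column, and contrasts the ROC-AUC from probabilities with the one from hard 0/1 labels, which collapse the ROC curve to a single threshold.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import GridSearchCV, train_test_split

# Synthetic stand-in data split into train/test sets (assumption)
X, y = make_classification(n_samples=200, n_features=8, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)

grid = GridSearchCV(
    RandomForestClassifier(n_estimators=10, random_state=0),
    param_grid={"max_depth": [2, 4], "max_features": ["sqrt", None]},
    scoring="roc_auc",
    cv=5,
).fit(X_train, y_train)
model = grid.best_estimator_

# predict_proba columns are ordered like model.classes_
print(model.classes_)
proba = model.predict_proba(X_test)
positive_proba = proba[:, 1]  # column for classes_[1], the positive class 1

auc_proba = roc_auc_score(y_test, positive_proba)
auc_labels = roc_auc_score(y_test, model.predict(X_test))
print("ROC-AUC from probabilities:", auc_proba)
print("ROC-AUC from hard 0/1 labels:", auc_labels)
```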
  
You have available the `X_test` and `y_test` datasets to use and the `grid_rf_class` object from previous exercises.
  
1. Check the type of the `.best_estimator_` property.
2. Use the `.best_estimator_` property to make predictions on our test set.
3. Generate a confusion matrix and `ROC_AUC` score from our predictions.

```python
from sklearn.metrics import confusion_matrix, roc_auc_score

# See what type of object the best_estimator_ property is
print(type(grid_rf_class.best_estimator_))

# Create an array of predictions directly using the best_estimator_ property
predictions = grid_rf_class.best_estimator_.predict(X_test)

# Take a look to confirm it worked; this should be an array of 1's and 0's
print(predictions[0:10])

# Now create a confusion matrix
print("Confusion Matrix \n", confusion_matrix(y_test, predictions))

# Get the ROC-AUC score from the positive-class probabilities
predictions_proba = grid_rf_class.best_estimator_.predict_proba(X_test)[:, 1]
print("ROC-AUC Score \n", roc_auc_score(y_test, predictions_proba))
```

```text
<class 'sklearn.ensemble._forest.RandomForestClassifier'>
[0 0 0 0 0 0 0 1 0 1]
Confusion Matrix 
 [[6687  334]
 [1275  704]]
ROC-AUC Score 
 0.7831412641451952
```


Nice stuff! The `.best_estimator_` property is a really powerful property to understand for streamlining your machine learning model building process. You now can run a grid search and seamlessly use the best model from that search to make predictions. Piece of cake!