
# Homework: Grid search for hyperparameter tuning



* Name:
* Net ID:

## Introduction

For models with a single hyperparameter controlling the bias-variance tradeoff (for example, $k$ in $k$ nearest neighbors), we used K-fold cross validation (for example, with Scikit-learn's `KFold`) to test a range of values for the hyperparameter and to select the best one.



When we have *multiple* hyperparameters to tune, we can use `GridSearchCV` to select the best *combination* of them.

For example, in this week's lesson (in the notebook on bias and variance of SVM), we saw three ways to tune the bias-variance of an SVM classifier:

* Changing the kernel
* Changing $C$, the inverse of the regularization penalty weight
* For an RBF kernel, changing $\gamma$, which is inversely related to the kernel bandwidth (a larger $\gamma$ gives a narrower kernel)
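Scikit-learn's RBF kernel is $K(x, x') = \exp(-\gamma \|x - x'\|^2)$, so a larger $\gamma$ makes the similarity between two points decay faster with distance. The short sketch below uses two made-up points (not MNIST data) to check `sklearn.metrics.pairwise.rbf_kernel` against that formula for a few illustrative values of $\gamma$.

```python
import numpy as np
from sklearn.metrics.pairwise import rbf_kernel

# Two toy points (illustrative values only)
x = np.array([[0.0, 0.0]])
z = np.array([[1.0, 2.0]])

sq_dist = float(np.sum((x - z) ** 2))  # squared distance ||x - z||^2

for gamma in [0.01, 1.0, 10.0]:
    # rbf_kernel computes exp(-gamma * ||x - z||^2) for each pair of rows
    k = rbf_kernel(x, z, gamma=gamma)[0, 0]
    manual = np.exp(-gamma * sq_dist)
    print(f"gamma={gamma:>5}: kernel value {k:.6f} (manual {manual:.6f})")
```

As $\gamma$ grows, the kernel value for the same pair of points shrinks toward zero, so each training point influences only a smaller neighborhood.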


To get the best performance from an SVM classifier, we need to find the best *combination* of these hyperparameters.

This notebook shows how to use `GridSearchCV` to tune an SVM classifier.

```python
from sklearn.model_selection import train_test_split
from sklearn.model_selection import GridSearchCV
from sklearn.metrics import accuracy_score
from sklearn.svm import SVC

import numpy as np
import pandas as pd
```

## Get the data

We will work with a subset of the MNIST handwritten digits data. First, we will get the data, and assign a small subset of samples to training and test sets.

```python
from sklearn.datasets import fetch_openml
```

```python
X, y = fetch_openml('mnist_784', version=1, return_X_y=True)
```

```python
X_train, X_test, y_train, y_test = train_test_split(X, y,
                                                    train_size=10000, test_size=3000)
```

## Run grid search

Then, we will define a *parameter grid* with all the combinations of hyperparameters that we want to test.

```python
param_grid = [
    {'C': [0.1, 1000], 'kernel': ['linear']},
    {'C': [0.1, 1000], 'gamma': [0.01, 0.0001], 'kernel': ['rbf']},
]
param_grid
```
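To see exactly which combinations this grid describes, Scikit-learn's `ParameterGrid` expands a list of dicts the same way `GridSearchCV` does: each dict is expanded into its own Cartesian product, and the results are concatenated. A quick sketch with the same grid as above:

```python
from sklearn.model_selection import ParameterGrid

param_grid = [
    {'C': [0.1, 1000], 'kernel': ['linear']},
    {'C': [0.1, 1000], 'gamma': [0.01, 0.0001], 'kernel': ['rbf']},
]

combos = list(ParameterGrid(param_grid))
for params in combos:
    print(params)

n_folds = 3
# 2 linear combinations + 2 * 2 RBF combinations = 6 candidates;
# with 3-fold CV that is 6 * 3 = 18 fits (plus one final refit).
print(len(combos), "candidates,", len(combos) * n_folds, "CV fits")
```

Note that `gamma` never appears in the linear-kernel combinations, because it is only listed in the RBF dict.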

We will pass the parameter grid to a `GridSearchCV`, along with the number of CV folds to use.

Also, we set:

* `verbose` to a large positive number, so that we get plenty of logging output, and
* `refit` to `True` (this is also the default), so that after testing all of the hyperparameter combinations, it will re-fit an SVM classifier on the entire training set, using the hyperparameters that had the best mean validation score.


```python
clf = GridSearchCV(SVC(), param_grid, cv=3, refit=True, verbose=100)
clf.fit(X_train, y_train)
```
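Conceptually, `GridSearchCV` loops over every candidate in the grid, computes a cross-validated score for each one, picks the candidate with the best mean validation score, and (with `refit=True`) fits one final model on the whole training set. The sketch below reproduces that loop by hand with `ParameterGrid` and `cross_val_score`. To keep it fast, it assumes Scikit-learn's small built-in `load_digits` dataset and a hypothetical small grid instead of MNIST; it is a simplified illustration, not how `GridSearchCV` is implemented internally (which also supports parallelism, records fit times, and more).

```python
from sklearn.datasets import load_digits
from sklearn.model_selection import ParameterGrid, cross_val_score
from sklearn.svm import SVC

X_small, y_small = load_digits(return_X_y=True)

# Hypothetical small grid, chosen only so this example runs quickly
small_grid = [
    {'C': [0.1, 10], 'kernel': ['linear']},
    {'C': [0.1, 10], 'gamma': [0.001, 0.01], 'kernel': ['rbf']},
]

best_score, best_params = -1.0, None
for params in ParameterGrid(small_grid):
    # Mean accuracy over 3 validation folds for this combination
    score = cross_val_score(SVC(**params), X_small, y_small, cv=3).mean()
    if score > best_score:
        best_score, best_params = score, params

# Equivalent of refit=True: train once more on all of the training data
best_model = SVC(**best_params).fit(X_small, y_small)
print(best_params, round(best_score, 4))
```

With the same `cv=3` splits and default accuracy scoring, the best mean score found this way should match `GridSearchCV(SVC(), small_grid, cv=3).fit(X_small, y_small).best_score_`.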

## Review results

Finally, we'll print the results of the cross validation. For each combination of parameters, we can see:

* the validation score for each fold
* the mean validation score
* the standard deviation of the validation score
* the rank, by mean validation score

(In the `cv_results_` table, the columns with "test" in their names, such as `split0_test_score` and `mean_test_score`, actually hold validation scores.)

```python
pd.DataFrame(clf.cv_results_)
```
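The full `cv_results_` table has many columns (fit and score times, one column per hyperparameter, per-fold scores). A sketch for narrowing it down to the columns listed above and sorting by rank; to stay self-contained, it assumes a tiny search on Scikit-learn's built-in iris dataset with an illustrative grid, rather than the MNIST search:

```python
import pandas as pd
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

X_iris, y_iris = load_iris(return_X_y=True)
search = GridSearchCV(SVC(), {'C': [0.1, 1, 10], 'kernel': ['linear', 'rbf']}, cv=3)
search.fit(X_iris, y_iris)

results = pd.DataFrame(search.cv_results_)
# Per-fold validation scores are stored in split0_test_score, split1_test_score, ...
cols = ['params', 'split0_test_score', 'split1_test_score', 'split2_test_score',
        'mean_test_score', 'std_test_score', 'rank_test_score']
summary = results[cols].sort_values('rank_test_score')
print(summary.to_string(index=False))
```

The same column selection works on the MNIST results, for example `pd.DataFrame(clf.cv_results_)[cols]`, since it was also run with `cv=3`.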

## Evaluate performance of the re-fitted model

We can see the "best" parameters, with which the model was re-fitted:

```python
print(clf.best_params_)
```

And we can evaluate the re-fitted model on the test set. (Note that `GridSearchCV` used only the training set; the test set has not been used at all, either for fitting the model or for choosing the hyperparameters.)

```python
y_pred = clf.predict(X_test)
```

```python
# accuracy_score expects the true labels first, then the predictions
accuracy_score(y_test, y_pred)
```
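`GridSearchCV` also exposes the re-fitted model directly as `best_estimator_`, and its own `score` method evaluates that model with the search's scoring metric (accuracy, for `SVC` with the default scoring). A small self-contained check on the iris dataset, with an illustrative grid (not the MNIST setup), that these routes agree:

```python
from sklearn.datasets import load_iris
from sklearn.metrics import accuracy_score
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.svm import SVC

X_iris, y_iris = load_iris(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(
    X_iris, y_iris, test_size=0.3, random_state=0, stratify=y_iris)

search = GridSearchCV(SVC(), {'C': [0.1, 1, 10], 'gamma': [0.01, 0.1, 1]}, cv=3)
search.fit(X_tr, y_tr)

print("best params:", search.best_params_)
print("best mean validation accuracy:", search.best_score_)

# Three equivalent ways to get the test accuracy of the re-fitted model
acc_1 = accuracy_score(y_te, search.predict(X_te))
acc_2 = accuracy_score(y_te, search.best_estimator_.predict(X_te))
acc_3 = search.score(X_te, y_te)
print(acc_1, acc_2, acc_3)
```

Comparing `best_score_` (a validation score) with the test accuracy gives a rough sense of how optimistic the selected validation score was.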

## Assignment

The results of a `GridSearchCV` are only as good as the combinations of hyperparameters we test in the grid.

* If the range of hyperparameter values is too narrow (it excludes good values), the model accuracy will be lower than it would be with a better choice of hyperparameters.
* If the search space is large with a fine resolution, the grid search will take a very long time.
* If the search space is large with a coarse resolution, we may not find a good combination of hyperparameters.
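The tradeoff in the last two bullets is easy to quantify: the number of model fits is the number of grid combinations times the number of CV folds (plus one refit), and the number of combinations grows multiplicatively with each hyperparameter's resolution. A rough sketch, assuming `np.logspace` to build log-spaced value lists of increasing resolution for an RBF-only grid; the ranges are illustrative placeholders, not a recommended grid:

```python
import numpy as np
from sklearn.model_selection import ParameterGrid

n_folds = 3
for n_values in [3, 5, 10]:
    # n_values log-spaced values per hyperparameter (illustrative ranges only)
    grid = {
        'C': np.logspace(-2, 3, n_values),
        'gamma': np.logspace(-5, -1, n_values),
        'kernel': ['rbf'],
    }
    n_candidates = len(ParameterGrid(grid))
    n_fits = n_candidates * n_folds + 1  # +1 for the final refit
    print(f"{n_values} values each -> {n_candidates} candidates, {n_fits} fits")
```

Going from 3 to 10 values per hyperparameter turns 9 candidates into 100, so each extra hyperparameter or finer resolution multiplies the run time.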

In the demo above, I did not use a good parameter grid. For your assignment, try to improve the parameter grid, and re-run the notebook with your modified parameter grid.

Explain the results. In particular, explain: if *I* were to run your notebook with exactly the parameter grid you defined, would I be confident that the SVM performance is about as good as it can be? Why?

Also answer the following question: suppose that instead of using `GridSearchCV`, I were to separately run one K-fold cross validation over a range of values of $C$, one over a range of values of $\gamma$, and one over the two values of `kernel`. In other words, I would independently select a best value for each hyperparameter. Would this be a good strategy? Why or why not?

Submit the PDF version of the notebook, including your explanation.

