In [None]:
import warnings
warnings.filterwarnings("ignore")

# Hyperparameter Tuning

When we call `fit()` on an estimator, it learns the parameters of the algorithm that make it fit the data best. However, some parameters are not learned directly within an estimator. These are often referred to as hyperparameters, and include things like:

- depth of a decision tree
- alpha for regularization
- kernel for support vector machines
- number of clusters for centroid-based clustering

In this notebook we'll investigate some techniques for exploring and optimizing hyperparameters to improve the performance of our machine learning models.


**Note: If you haven't already downloaded the data, check out the instructions in the notebook called get_the_data.ipynb first!**

## Learning the Sensible Defaults

### Smoothing

How do we prevent overfitting in our machine learning models? One strategy is to use regularization, which has a *smoothing* effect on the fitted model.

Regularization penalizes model complexity: the higher the alpha, the less complex the model, which decreases the error due to variance (overfit). Alphas that are too high, on the other hand, increase the error due to bias (underfit). It is therefore important to choose an alpha that balances these two sources of error.
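To make the trade-off concrete, here is a minimal sketch that fits `Ridge` at a few alpha values and compares training and test scores. The synthetic dataset from `make_regression`, the alpha values, and the `X_demo`/`y_demo` names are assumptions chosen for illustration, not part of this notebook's data.

```python
from sklearn.datasets import make_regression
from sklearn.linear_model import Ridge
from sklearn.model_selection import train_test_split

# Synthetic data (an assumption for illustration only)
X_demo, y_demo = make_regression(
    n_samples=100, n_features=50, n_informative=10, noise=20.0, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X_demo, y_demo, test_size=0.3, random_state=0
)

train_scores = {}
test_scores = {}
for alpha in [0.001, 1, 100, 10000]:
    model = Ridge(alpha=alpha).fit(X_train, y_train)
    # score() returns R^2 for regressors
    train_scores[alpha] = model.score(X_train, y_train)
    test_scores[alpha] = model.score(X_test, y_test)
    print(
        f"alpha={alpha:>8}: "
        f"train R2={train_scores[alpha]:.3f}, test R2={test_scores[alpha]:.3f}"
    )
```

The training score can only go down as alpha grows, since the penalty pulls the model away from the best training fit. The test score is the one to watch when picking alpha.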

The scikit-learn library offers a few popular techniques for regularization, including `Lasso`, `Ridge`, and `ElasticNet`:

- [Lasso](https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.Lasso.html) (L1 Regularization) is a linear model trained with an L1 prior as the regularizer. Lasso forces the coefficients of weak features to zero, effectively dropping the least predictive features. Technically the Lasso model is optimizing the same objective function as the Elastic Net with `l1_ratio=1.0` (no L2 penalty).
- [Ridge](https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.Ridge.html#sklearn.linear_model.Ridge) Regression (L2 Regularization) solves a regression model where the loss function is the linear least squares function and regularization is given by the l2-norm. Ridge assigns every feature a weight, but spreads the coefficient values out more equally, shrinking but still maintaining less predictive features.
- [ElasticNet](https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.ElasticNet.html#sklearn.linear_model.ElasticNet) performs a linear regression with combined L1 and L2 priors as a regularizer.
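The difference between the L1 and L2 penalties described above can be seen by counting coefficients that are exactly zero. This is a rough sketch on synthetic data. The dataset, the choice of `alpha=5.0`, and the variable names are assumptions made for the example.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Lasso, Ridge

# Synthetic data where only 3 of the 10 features carry signal (illustrative assumption)
X_demo, y_demo = make_regression(
    n_samples=200, n_features=10, n_informative=3, noise=10.0, random_state=0
)

# Note: the two estimators scale their penalties differently, so the same
# alpha does not mean "the same amount" of regularization in both models.
lasso = Lasso(alpha=5.0).fit(X_demo, y_demo)
ridge = Ridge(alpha=5.0).fit(X_demo, y_demo)

n_zero_lasso = int(np.sum(lasso.coef_ == 0))
n_zero_ridge = int(np.sum(ridge.coef_ == 0))

print("Lasso coefficients:", np.round(lasso.coef_, 2))
print("Ridge coefficients:", np.round(ridge.coef_, 2))
print("exact zeros -> Lasso:", n_zero_lasso, "Ridge:", n_zero_ridge)
```

Lasso should drop at least some of the uninformative features outright, while Ridge keeps a small nonzero weight on every feature.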


Each regularizer has an `alpha` hyperparameter that helps to determine how much smoothing to do.

What are the default alpha values for each of the regularizers? What happens when you change them?
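One way to look up the defaults without leaving the notebook is to instantiate each estimator with no arguments and inspect `get_params()`, which every scikit-learn estimator provides. The loop below is a small sketch of that approach.

```python
from sklearn.linear_model import Ridge, Lasso, ElasticNet

for estimator in (Lasso(), Ridge(), ElasticNet()):
    # get_params() returns a dict of constructor arguments and their current values
    params = estimator.get_params()
    print(type(estimator).__name__, "alpha =", params["alpha"])
```

The same dictionary also shows the other constructor arguments, such as `l1_ratio` for `ElasticNet`.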

In [None]:
from sklearn.linear_model import Ridge, Lasso, ElasticNet
from yellowbrick.datasets import load_occupancy

X, y = load_occupancy()

In [None]:
model = Lasso() # what does alpha default to? What happens when you change it?
model.fit(X, y)
print(list(zip(X, model.coef_.tolist())))

In [None]:
model = Ridge() # what does alpha default to? What happens when you change it?
model.fit(X, y)
print(list(zip(X, model.coef_.tolist())))

In [None]:
model = ElasticNet() # what does alpha default to? What happens when you change it?
model.fit(X, y)
print(list(zip(X, model.coef_.tolist())))

## Gridsearch

Gridsearch is a method for finding the best combination of hyperparameters via an exhaustive search over specified parameter values for an estimator.

When you do a gridsearch, scikit-learn creates a new model for each possible combination of hyperparameters. Each of these combinations is a point on the search grid. Gridsearch trains each of these models, evaluates them using cross-validation, and then reports the combination that performed best (for example via the `best_params_` and `best_estimator_` attributes).
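To see what that procedure amounts to, here is a hand-rolled sketch of the same loop using `ParameterGrid` and `cross_val_score`. It is a simplified illustration rather than how `GridSearchCV` is implemented internally. The synthetic data and the particular grid are assumptions for the example.

```python
from sklearn.datasets import make_regression
from sklearn.linear_model import Ridge
from sklearn.model_selection import ParameterGrid, cross_val_score

# Synthetic data (illustrative assumption)
X_demo, y_demo = make_regression(
    n_samples=200, n_features=8, noise=15.0, random_state=0
)

# 3 alphas x 2 intercept settings = 6 points on the grid
param_grid = {"alpha": [0.01, 1, 100], "fit_intercept": [True, False]}

results = []
for params in ParameterGrid(param_grid):
    # One fresh model per grid point, scored with 5-fold cross-validation
    scores = cross_val_score(Ridge(**params), X_demo, y_demo, cv=5)
    results.append((scores.mean(), params))

best_score, best_params = max(results, key=lambda r: r[0])
print(len(results), "combinations tried")
print("best:", best_params, f"(mean CV R2 = {best_score:.3f})")
```

`GridSearchCV` wraps this loop and, by default, also refits the best combination on the full dataset so the result can be used directly for prediction.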

In [None]:
from sklearn.model_selection import GridSearchCV

from yellowbrick.datasets import load_energy


X, y = load_energy()

ridge = Ridge(random_state=0)

alphas = [0.001, 0.01, 0.1, 1, 10, 100, 1000]
tuned_params = [{'alpha': alphas}]
n_folds = 5

grid = GridSearchCV(
    ridge, tuned_params, cv=n_folds
)

grid.fit(X, y)
print(grid.best_estimator_)

See also:

- [GridSearchCV](https://scikit-learn.org/stable/modules/generated/sklearn.model_selection.GridSearchCV.html#sklearn.model_selection.GridSearchCV)
- [RandomizedSearchCV](https://scikit-learn.org/stable/modules/generated/sklearn.model_selection.RandomizedSearchCV.html#sklearn.model_selection.RandomizedSearchCV)
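As a brief sketch of the second option, `RandomizedSearchCV` samples a fixed number of candidates from a distribution instead of trying every point on a grid. The example below draws alphas log-uniformly with `scipy.stats.loguniform`. The synthetic data, the alpha range, and `n_iter=20` are assumptions chosen for illustration.

```python
from scipy.stats import loguniform
from sklearn.datasets import make_regression
from sklearn.linear_model import Ridge
from sklearn.model_selection import RandomizedSearchCV

# Synthetic data (illustrative assumption)
X_demo, y_demo = make_regression(
    n_samples=200, n_features=8, noise=15.0, random_state=0
)

search = RandomizedSearchCV(
    Ridge(),
    param_distributions={"alpha": loguniform(1e-3, 1e3)},
    n_iter=20,        # number of sampled candidates
    cv=5,
    random_state=0,   # makes the sampled alphas reproducible
)
search.fit(X_demo, y_demo)

print("best alpha:", search.best_params_["alpha"])
print("best mean CV score:", search.best_score_)
```

This can be a reasonable choice when the grid would otherwise be large, since the cost is set by `n_iter` rather than by the number of grid points.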


## Visual Gridsearch

### AlphaSelection
The `AlphaSelection` Visualizer demonstrates how different values of alpha influence model selection during the regularization of linear models. Generally speaking, alpha increases the effect of regularization: if alpha is zero there is no regularization, and the higher the alpha, the more the regularization parameter influences the final model.
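The claim about alpha can be checked numerically. In this sketch, `Ridge` with `alpha=0` is compared against plain `LinearRegression`, and the size of the coefficient vector is tracked as alpha grows. The synthetic data and the alpha values are assumptions for the example.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression, Ridge

# Small, well-conditioned synthetic problem (illustrative assumption)
X_demo, y_demo = make_regression(
    n_samples=100, n_features=5, noise=10.0, random_state=0
)

ols = LinearRegression().fit(X_demo, y_demo)

coef_norms = {}
for alpha in [0, 1, 100, 10000]:
    ridge = Ridge(alpha=alpha).fit(X_demo, y_demo)
    # L2 norm of the coefficient vector: smaller means more shrinkage
    coef_norms[alpha] = float(np.linalg.norm(ridge.coef_))
    print(f"alpha={alpha:>6}: ||coef|| = {coef_norms[alpha]:.3f}")

# With alpha=0 the penalty vanishes and Ridge reduces to ordinary least squares
matches_ols = np.allclose(Ridge(alpha=0).fit(X_demo, y_demo).coef_, ols.coef_)
print("alpha=0 matches LinearRegression:", matches_ols)
```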

In [None]:
import numpy as np
import matplotlib.pyplot as plt

from sklearn.linear_model import RidgeCV, LassoCV, ElasticNetCV

from yellowbrick.regressor import AlphaSelection
from yellowbrick.datasets import load_concrete

# Load the regression dataset
X, y = load_concrete()

# Create some lists of alphas to cross-validate against
small_range = np.logspace(-10, 1, 400)
medium_range = np.logspace(-10, 2, 400)
large_range = np.logspace(-10, 4, 400)

smoothers = {
    "Lasso": LassoCV(cv=5, alphas=small_range),
    "Ridge": RidgeCV(store_cv_values=True, alphas=large_range),
    "ElasticNet": ElasticNetCV(cv=5, alphas=medium_range)
}

for smoother in smoothers.values():
    _, ax = plt.subplots() # Create a new figure
    visualizer = AlphaSelection(smoother, size=(1080, 720))
    visualizer.fit(X, y)
    visualizer.show() # poof() is deprecated in favor of show() as of Yellowbrick 1.0

The `AlphaSelection` class expects an estimator whose name ends with "CV". If you wish to use some other estimator, please see the `ManualAlphaSelection` Visualizer for manually iterating through all alphas and selecting the best one.

### Other Hyperparameter Tuning Resources

For more about hyperparameter tuning with Yellowbrick, check out:

- [Validation Curve Visualizer](https://www.scikit-yb.org/en/develop/api/model_selection/validation_curve.html)
- [Silhouette Visualizer](https://www.scikit-yb.org/en/develop/api/cluster/silhouette.html)
- [Elbow Curve Visualizer](https://www.scikit-yb.org/en/develop/api/cluster/elbow.html)
