# Hyperparameter optimization with Dask


Every machine learning model has some values that are specified before training begins. These values help adapt the model to the data but must be chosen before any training data is seen; they are called "hyperparameters". For example, `penalty` and `C` are hyperparameters of Scikit-learn's [LogisticRegression](https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.LogisticRegression.html). Typical usage looks something like:

``` python
from sklearn.linear_model import LogisticRegression
from sklearn.datasets import make_classification

X, y = make_classification()
# C and penalty are hyperparameters: they are fixed before fit() sees any data
est = LogisticRegression(C=10, penalty="l2")
est.fit(X, y)
```

These hyperparameters influence the quality of the prediction. For example, `C` is the inverse of the regularization strength, so if `C` is too small above, the estimator will be too heavily regularized to produce meaningful predictions.
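As a rough illustration of that effect, the sketch below fits the same estimator with several values of `C` and prints the norm of the learned coefficients. The dataset size, the list of `C` values, and `max_iter=1000` are arbitrary choices made for this example, not recommendations.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# Synthetic data; the size and seed are arbitrary choices for this sketch
X_demo, y_demo = make_classification(n_samples=500, random_state=0)

coef_norms = {}
for C in [1e-4, 1e-2, 1.0, 10.0]:
    # The default penalty is L2; a smaller C means stronger regularization
    est = LogisticRegression(C=C, max_iter=1000).fit(X_demo, y_demo)
    coef_norms[C] = float(np.linalg.norm(est.coef_))
    print(f"C={C:g}  coefficient norm={coef_norms[C]:.4f}  "
          f"train accuracy={est.score(X_demo, y_demo):.3f}")
```

With very small `C` the coefficients are pushed toward zero, which is what "too regularized" means in practice.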

Determining the values of these hyperparameters is difficult. In fact, Scikit-learn has an entire documentation page on finding the best values: https://scikit-learn.org/stable/modules/grid_search.html
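For reference, a minimal sketch of the exhaustive approach described on that page, using Scikit-learn's `GridSearchCV`. The grid of `C` values, `cv=3`, and the synthetic dataset are assumptions made for this example.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV

X_demo, y_demo = make_classification(n_samples=500, random_state=0)

# Every value in the grid is trained to completion and cross-validated
grid = {"C": [1e-3, 1e-2, 1e-1, 1, 10]}
grid_search = GridSearchCV(LogisticRegression(max_iter=1000), grid, cv=3)
grid_search.fit(X_demo, y_demo)

print(grid_search.best_params_, round(grid_search.best_score_, 3))
```

Grid search trains every candidate fully, which becomes expensive as the number of hyperparameters grows.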

[LogisiticRegression]:https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.LogisticRegression.html


Dask enables some new techniques and opportunities for hyperparameter optimization.

**This example will walk through**

* how to determine the input parameters
* how to use the Hyperband algorithm, implemented in `HyperbandSearchCV`

## Setup Dask

In [None]:
from distributed import Client
client = Client("localhost:8786")
client

## Create Data

In [None]:
from sklearn.datasets import make_circles
import numpy as np
import pandas as pd

X, y = make_circles(n_samples=30_000, random_state=0, noise=0.09)

pd.DataFrame({0: X[:, 0], 1: X[:, 1], "class": y}).sample(4_000).plot.scatter(x=0, y=1, alpha=0.2, c="class", cmap="bwr")


### Add random dimensions

In [None]:
from sklearn.utils import check_random_state

rng = check_random_state(42)
random_feats = rng.uniform(-1, 1, size=(X.shape[0], 4))
X = np.hstack((X, random_feats))
X.shape

### Split and scale data

In [None]:
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=5_000, random_state=42)

In [None]:
from sklearn.preprocessing import StandardScaler

# Fit the scaler on the training data only, then reuse it for the test data
scaler = StandardScaler().fit(X_train)

X_train = scaler.transform(X_train)
X_test = scaler.transform(X_test)

## Create model and search space

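Let's use Scikit-learn's `MLPClassifier` as our model. Every candidate architecture below has 24 neurons in total across its hidden layers; the search tunes how those neurons are arranged, along with some of the other basic hyperparameters.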


In [None]:
import numpy as np
from sklearn.neural_network import MLPClassifier

model = MLPClassifier()

params = {
    "hidden_layer_sizes": [
        (24, ),
        (12, 12),
        (6, 6, 6, 6),
        (4, 4, 4, 4, 4, 4),
        (12, 6, 3, 3),
    ],
    "activation": ["relu", "logistic", "tanh"],
    "alpha": np.logspace(-6, -3, num=1000),  # 1000 log-spaced values, effectively continuous
    "batch_size": [16, 32, 64, 128, 256, 512],
}
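To get a feel for the size of this search space, the sketch below repeats the dictionary (under the hypothetical name `space`, so that it runs on its own) and inspects it with Scikit-learn's `ParameterGrid` and `ParameterSampler`. It is only an illustration of the space, not part of the search itself.

```python
import numpy as np
from sklearn.model_selection import ParameterGrid, ParameterSampler

# Same search space as above, repeated so this block is self-contained
space = {
    "hidden_layer_sizes": [
        (24, ),
        (12, 12),
        (6, 6, 6, 6),
        (4, 4, 4, 4, 4, 4),
        (12, 6, 3, 3),
    ],
    "activation": ["relu", "logistic", "tanh"],
    "alpha": np.logspace(-6, -3, num=1000),
    "batch_size": [16, 32, 64, 128, 256, 512],
}

# Total neurons per architecture: each option adds up to the same number
neurons = {sum(layers) for layers in space["hidden_layer_sizes"]}
print("total neurons per architecture:", neurons)

# Number of distinct combinations an exhaustive grid search would visit
n_combinations = len(ParameterGrid(space))
print("combinations:", n_combinations)

# A few random draws, similar in spirit to what a randomized search evaluates
samples = list(ParameterSampler(space, n_iter=3, random_state=0))
for sample in samples:
    print(sample)
```

There are 5 × 3 × 1000 × 6 = 90,000 combinations, far too many to train exhaustively, which is why a sampling-based search is used.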

# Hyperparameter optimization

`HyperbandSearchCV` is Dask-ML's meta-estimator for finding the best hyperparameters. It can be used as an alternative to `RandomizedSearchCV` to find similar hyperparameters in less time, because it stops training models that do not look promising. The underlying Hyperband algorithm comes with theoretical guarantees that, for a given amount of training, it will find models that are close to the best possible.
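The idea of dropping unpromising candidates early can be sketched with a simplified successive-halving loop. This is not Dask-ML's implementation: it uses plain Scikit-learn, an `SGDClassifier` stand-in model, and an arbitrary schedule (8 candidates, keep the better half each round, double the budget), all of which are assumptions made for illustration.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import SGDClassifier
from sklearn.model_selection import train_test_split

X_demo, y_demo = make_classification(n_samples=4000, random_state=0)
X_fit, X_val, y_fit, y_val = train_test_split(X_demo, y_demo, random_state=0)

# Eight candidates that differ only in a log-uniformly sampled `alpha`
rng = np.random.RandomState(0)
alphas = 10 ** rng.uniform(-6, 0, size=8)
candidates = [SGDClassifier(alpha=a, random_state=0) for a in alphas]

calls_per_round = 1
total_calls = 0
while len(candidates) > 1:
    for est in candidates:
        for _ in range(calls_per_round):
            est.partial_fit(X_fit, y_fit, classes=[0, 1])
            total_calls += 1
    # Keep the better-scoring half and give the survivors twice the budget
    candidates.sort(key=lambda est: est.score(X_val, y_val), reverse=True)
    candidates = candidates[: len(candidates) // 2]
    calls_per_round *= 2

best = candidates[0]
print(f"best alpha={best.alpha:.2e}, partial_fit calls used={total_calls}")
```

In this sketch the search uses 24 `partial_fit` calls, whereas training all 8 candidates for the full 7 calls each would take 56. Hyperband runs several such loops with different trade-offs between the number of candidates and the training budget per candidate.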

This section will focus on

1. Determining the input parameters to `HyperbandSearchCV`
2. Using `HyperbandSearchCV` to find the best hyperparameters
3. Seeing other use cases of `HyperbandSearchCV`

In [None]:
from dask_ml.model_selection import HyperbandSearchCV

## Determining input parameters

A rule-of-thumb to determine `HyperbandSearchCV`'s input parameters requires knowing:

1. the number of examples the longest trained model will see
2. the number of hyperparameters to evaluate

In [None]:
n_examples = 27 * len(X_train)
n_params = 27

With this, the inputs to `HyperbandSearchCV` fall out pretty naturally:

In [None]:
max_iter = n_params
chunks = n_examples // n_params
max_iter, chunks

This means that the longest-trained estimator will see about `n_examples` examples (specifically `n_params * (n_examples // n_params)`).
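The arithmetic can be checked by hand. The sketch below plugs in the numbers used in this notebook (30,000 samples with 5,000 held out for testing) and then a hypothetical budget of 100 examples to show why the text says "about".

```python
# Values taken from the cells above: 30,000 samples minus 5,000 test samples
n_train = 30_000 - 5_000
n_params = 27
n_examples = 27 * n_train

max_iter = n_params
chunks = n_examples // n_params
seen_by_longest = max_iter * chunks
print(max_iter, chunks, seen_by_longest)  # 27 25000 675000

# Hypothetical budget that n_params does not divide evenly
odd_examples = 100
odd_chunks = odd_examples // n_params
print(odd_chunks, n_params * odd_chunks)  # 3 81
```

Here the division is exact, so the longest-trained model sees exactly `n_examples` examples. With the hypothetical budget of 100, integer division rounds the chunk size down and the model sees 81 examples instead.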

`HyperbandSearchCV` works by calling `partial_fit` on each chunk of a Dask array. The `max_iter` parameter determines the maximum number of times `partial_fit` is called on any one model:

In [None]:
search = HyperbandSearchCV(
    model,
    params,
    max_iter=max_iter,
    patience=True,
)

The Dask array should then be chunked so that each `partial_fit` call sees `chunks` examples:

In [None]:
import dask.array as da
X_train = da.from_array(X_train, chunks=chunks)
y_train = da.from_array(y_train, chunks=chunks)
X_train.chunks
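To make the relationship between chunks and `partial_fit` calls concrete, here is a rough sketch that uses NumPy slices in place of Dask chunks. The stand-in `SGDClassifier`, the chunk size of 250, and the choice to cycle through the blocks in order are all assumptions of this example, not a description of how Dask-ML schedules its work.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import SGDClassifier

X_demo, y_demo = make_classification(n_samples=1000, random_state=0)

chunk_size = 250  # plays the role of `chunks`
n_calls = 6       # plays the role of `max_iter`
n_chunks = len(X_demo) // chunk_size

clf = SGDClassifier(random_state=0)
examples_seen = 0
for call in range(n_calls):
    # Cycle through the blocks: 0, 1, 2, 3, 0, 1
    start = (call % n_chunks) * chunk_size
    block = slice(start, start + chunk_size)
    clf.partial_fit(X_demo[block], y_demo[block], classes=[0, 1])
    examples_seen += chunk_size

print(f"{n_calls} partial_fit calls saw {examples_seen} examples")
```

The number of examples a model sees is the number of `partial_fit` calls multiplied by the chunk size, which is why `max_iter` and `chunks` together fix the training budget.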

It isn't obvious from `max_iter` and `chunks` alone how much computation the search will perform. Luckily, `HyperbandSearchCV` has a `metadata` attribute that reports this beforehand:

In [None]:
search.metadata["partial_fit_calls"]

This shows how many `partial_fit` calls will be performed in the computation. `metadata` also includes information on the number of models created.
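For intuition about where such numbers come from, the sketch below computes the bracket schedule given in the Hyperband paper for `max_iter=27` and a reduction factor of 3. The helper `hyperband_brackets` is hypothetical and written for this example; it is not Dask-ML code, and its output should be checked against `search.metadata` rather than assumed to match it.

```python
def hyperband_brackets(max_iter, eta=3):
    """Bracket schedule from the Hyperband paper (illustrative helper)."""
    # Largest s such that eta**s <= max_iter, found without floating point logs
    s_max = 0
    while eta ** (s_max + 1) <= max_iter:
        s_max += 1

    brackets = []
    for s in range(s_max, -1, -1):
        # n = ceil((s_max + 1) * eta**s / (s + 1)), using integer ceiling division
        n_models = -(-((s_max + 1) * eta**s) // (s + 1))
        # r = max_iter * eta**(-s): training budget each model starts with
        initial_calls = max_iter // eta**s
        brackets.append(
            {"bracket": s, "n_models": n_models, "initial_calls": initial_calls}
        )
    return brackets


brackets = hyperband_brackets(27)
for b in brackets:
    print(b)
print("total models:", sum(b["n_models"] for b in brackets))
```

Brackets with many models give each one very little training before pruning, while the last bracket trains only a few models for the full `max_iter` calls.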

In [None]:
%%time
# make_circles produces two classes, labeled 0 and 1
search.fit(X_train, y_train, classes=[0, 1])

The dashboard will be active while this is running. It will show which workers are running `partial_fit` and `score` calls.

# Integration

`HyperbandSearchCV` follows the Scikit-learn API and mirrors Scikit-learn's `RandomizedSearchCV`. This means that it "just works". All the Scikit-learn attributes and methods are available:

In [None]:
search.best_score_

In [None]:
search.best_estimator_

In [None]:
cv_results = pd.DataFrame(search.cv_results_)
cv_results.head()

In [None]:
search.score(X_test, y_test)

In [None]:
search.predict(X_test)

In [None]:
search.predict(X_test).compute()

`HyperbandSearchCV` also has some attributes of its own.

In [None]:
hist = pd.DataFrame(search.history_)
hist.head()

This shows the history after every `partial_fit` call. There's also an attribute `model_history_` that records the history for each model (it's a reorganization of `history_`).
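The kind of reorganization involved can be sketched in plain Python. The records below are made up for illustration, and the key names (`model_id`, `partial_fit_calls`, `score`) are assumptions about the record layout; inspect `search.history_` to see the actual fields.

```python
from collections import defaultdict

# Hypothetical history records, one per scoring step (values are invented)
history = [
    {"model_id": "model-a", "partial_fit_calls": 1, "score": 0.61},
    {"model_id": "model-b", "partial_fit_calls": 1, "score": 0.55},
    {"model_id": "model-a", "partial_fit_calls": 3, "score": 0.72},
    {"model_id": "model-a", "partial_fit_calls": 9, "score": 0.80},
]

# Group the flat list by model, keeping the records in their original order
model_history = defaultdict(list)
for record in history:
    model_history[record["model_id"]].append(record)

for model_id, records in model_history.items():
    scores = [r["score"] for r in records]
    print(model_id, scores)
```

Grouping by model makes it easy to plot one learning curve per model, for example score against `partial_fit_calls`.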

# Learn more

This notebook covered basic usage of `HyperbandSearchCV`. The following documentation and resources might be useful to learn more about `HyperbandSearchCV`, including some of the finer use cases:

* [A talk introducing `HyperbandSearchCV` to the SciPy 2019 audience](https://www.youtube.com/watch?v=x67K9FiPFBQ)
* [HyperbandSearchCV's documentation](https://ml.dask.org/modules/generated/dask_ml.model_selection.HyperbandSearchCV.html)