# Ray Tune - Hyperparameter Tuning with Scikit-Learn

© 2019-2022, Anyscale. All Rights Reserved

![Anyscale Academy](../images/AnyscaleAcademyLogo.png)


# Tune's Scikit-Learn Drop-in Replacements

<img src="https://docs.ray.io/en/latest/_images/tune-sklearn1.png" align="center" width="50%">

Scikit-learn is one of the most widely used tools in the ML community for working with data, offering dozens of easy-to-use machine learning algorithms. However, to achieve high performance with these algorithms, you often need to perform **model selection**: choosing the best-performing model after tuning over a set of hyperparameters.

`tune-sklearn` is a module that integrates Ray Tune's hyperparameter tuning with scikit-learn's Classifier API. `tune-sklearn` has two APIs: [TuneSearchCV](https://docs.ray.io/en/latest/tune/api_docs/sklearn.html#tunesearchcv-docs) and [TuneGridSearchCV](https://docs.ray.io/en/latest/tune/api_docs/sklearn.html#tunesearchcv-docs). They are drop-in replacements for scikit-learn's [RandomizedSearchCV](https://scikit-learn.org/stable/modules/generated/sklearn.model_selection.RandomizedSearchCV.html) and [GridSearchCV](https://scikit-learn.org/stable/modules/generated/sklearn.model_selection.GridSearchCV.html?highlight=gridsearchcv#sklearn.model_selection.GridSearchCV), so you need to change fewer than five lines in a standard scikit-learn script to use Tune's replacement API.

Let's compare Tune's scikit-learn APIs to the standard scikit-learn `GridSearchCV`. For this example, we'll be using `TuneGridSearchCV` with a stochastic gradient descent (SGD) [SGDClassifier](https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.SGDClassifier.html).

First, install `tune-sklearn`. Then import its grid search cross-validation interface along with the scikit-learn pieces used in this example.

In [1]:
!pip install tune-sklearn



In [2]:
from sklearn.model_selection import GridSearchCV
# Import Tune's replacements
from ray.tune.sklearn import TuneGridSearchCV
from ray.tune.sklearn import TuneSearchCV

# Other relevant imports
from sklearn.model_selection import train_test_split

# Use the stochastic gradient descent (SGD) classifier
from sklearn.linear_model import SGDClassifier

# import the classification dataset
from sklearn import datasets
from sklearn.datasets import make_classification
import numpy as np

Create classification data using `sklearn.datasets`. To start with, we use a small dataset of 11K rows and 1K columns. As an exercise, you can increase these numbers and compare regular scikit-learn with tune-sklearn.

In [3]:
from typing import Tuple

def create_classification_data() -> Tuple[np.ndarray, np.ndarray]:
    """Return a synthetic 10-class dataset as (features, labels)."""
    X, y = make_classification(
        n_samples=11000,
        n_features=1000,
        n_informative=50,
        n_redundant=0,
        n_classes=10,
        class_sep=2.5)
    return X, y

Create the classification data, split it into training and test sets, and define the hyperparameter grid.

In [4]:
X, y = create_classification_data()
# Split the dataset into train and test sets
x_train, x_test, y_train, y_test = train_test_split(X, y, test_size=1000)

# Example parameters grid to tune from SGDClassifier
parameter_grid = {"alpha": [1e-4, 1e-1, 1], "epsilon": [0.01, 0.1]}
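A grid search evaluates every combination of the listed values. As a minimal sketch of that expansion, the snippet below builds the candidate list with `itertools.product`. The 5-fold count assumes scikit-learn's default cross-validation setting, and the variable names are illustrative.

```python
from itertools import product

# Same grid as in the cell above
parameter_grid = {"alpha": [1e-4, 1e-1, 1], "epsilon": [0.01, 0.1]}

# Every combination of one value per hyperparameter is one candidate
names = sorted(parameter_grid)
candidates = [
    dict(zip(names, values))
    for values in product(*(parameter_grid[name] for name in names))
]

n_folds = 5  # assumption: scikit-learn's default when cv is not specified
print(len(candidates), "candidates,", len(candidates) * n_folds, "fits")
for candidate in candidates:
    print(candidate)
```

scikit-learn offers `sklearn.model_selection.ParameterGrid` for the same expansion, so there is no need to write this by hand in practice.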

### Use Sklearn to train the model

Run the standard scikit-learn grid search on a single node.

In [7]:
%%time
# n_jobs=-1 enables use of all cores
sklearn_search = GridSearchCV(SGDClassifier(), parameter_grid, n_jobs=-1, verbose=True)
sklearn_search.fit(x_train, y_train)

Fitting 5 folds for each of 6 candidates, totalling 30 fits
CPU times: user 2.44 s, sys: 34.8 ms, total: 2.48 s
Wall time: 37.1 s


GridSearchCV(estimator=SGDClassifier(), n_jobs=-1,
             param_grid={'alpha': [0.0001, 0.1, 1], 'epsilon': [0.01, 0.1]},
             verbose=True)

In [8]:
print("Best hyperparameters found were: ", sklearn_search.best_params_)

Best hyperparameters found were:  {'alpha': 0.1, 'epsilon': 0.01}
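The test split created earlier can be used to check how the selected model generalizes. The following is a self-contained sketch on a much smaller synthetic dataset (the sizes and `random_state` values are arbitrary choices for a quick run, not the notebook's settings). It relies on `GridSearchCV` refitting the best configuration on the full training set, which is its default behavior.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import SGDClassifier
from sklearn.model_selection import GridSearchCV, train_test_split

# Small synthetic dataset (hypothetical sizes, chosen to run quickly)
X, y = make_classification(
    n_samples=600, n_features=20, n_informative=10, n_classes=3, random_state=0
)
x_train, x_test, y_train, y_test = train_test_split(
    X, y, test_size=100, random_state=0
)

grid = {"alpha": [1e-4, 1e-1, 1], "epsilon": [0.01, 0.1]}
search = GridSearchCV(SGDClassifier(random_state=0), grid)
search.fit(x_train, y_train)

# With refit=True (the default), score() uses the best estimator
# refit on the whole training set
print("Best hyperparameters:", search.best_params_)
print("Mean cross-validated accuracy:", round(search.best_score_, 3))
print("Held-out test accuracy:", round(search.score(x_test, y_test), 3))
```

In the notebook itself, the equivalent call would be `sklearn_search.score(x_test, y_test)`.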


### Use Ray Tune's drop-in replacement

From here, we proceed just as we would with scikit-learn's interface.

`SGDClassifier` has a `partial_fit` API, which trains the model incrementally. This lets Tune stop fitting a given hyperparameter configuration early. If the estimator did not support early stopping, Tune would fall back to a parallel grid search.

The setup is the same as in scikit-learn, except that `TuneGridSearchCV` replaces `GridSearchCV`.
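Before running the search, it may help to see what `partial_fit` makes possible. The sketch below trains an `SGDClassifier` one pass at a time and stops once the validation score stops improving. The dataset sizes and the stopping rule are hypothetical and only illustrate the idea; this is not how Tune decides when to stop.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import SGDClassifier
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=500, n_features=20, random_state=0)
x_train, x_val, y_train, y_val = train_test_split(
    X, y, test_size=100, random_state=0
)

clf = SGDClassifier(alpha=1e-4, random_state=0)
classes = np.unique(y_train)  # partial_fit needs the full label set on the first call

max_iters = 10
scores = []
for iteration in range(max_iters):
    # One pass over the training data; the model keeps its weights between calls
    clf.partial_fit(x_train, y_train, classes=classes)
    scores.append(clf.score(x_val, y_val))
    # Hypothetical stopping rule: quit if the last 3 scores did not beat the earlier best
    if len(scores) > 3 and max(scores[-3:]) <= max(scores[:-3]):
        break

print(f"Stopped after {len(scores)} of {max_iters} iterations, "
      f"last score {scores[-1]:.3f}")
```

Because training can be paused after any pass, a scheduler can compare configurations partway through and drop the weak ones.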



#### Start Ray on the local host

This starts Ray on the local host. If you have a cluster, you can pass the appropriate arguments to `ray.init(...)`.
Check the [documentation](https://docs.ray.io/en/latest/package-ref.html?highlight=ray.init#ray-init) for the specific arguments. Some examples:
 * `ray.init()`: start Ray locally, along with all the relevant processes
 * `ray.init(address="localhost:6379")`: connect to a cluster on the local host at the specified port (the head node's address)
 * `ray.init(address="ray://123.45.67.89:10001")`: connect to an existing remote cluster, using its URI

In [9]:
import ray
ray.init(ignore_reinit_error=True)

2022-02-21 11:50:25,249	INFO services.py:1376 -- View the Ray dashboard at http://127.0.0.1:8265


{'node_ip_address': '127.0.0.1',
 'raylet_ip_address': '127.0.0.1',
 'redis_address': '127.0.0.1:6379',
 'object_store_address': '/tmp/ray/session_2022-02-21_11-50-22_666292_13984/sockets/plasma_store',
 'raylet_socket_name': '/tmp/ray/session_2022-02-21_11-50-22_666292_13984/sockets/raylet',
 'webui_url': '127.0.0.1:8265',
 'session_dir': '/tmp/ray/session_2022-02-21_11-50-22_666292_13984',
 'metrics_export_port': 61671,
 'gcs_address': '127.0.0.1:55466',
 'node_id': 'cb1588aed40f21144a00ee93fbf8bcd233f02bca3be664132a57e399'}

Note the slight differences we introduce below:

 * an `early_stopping` argument, and
 * a `max_iters` argument

The ``early_stopping`` parameter allows us to terminate unpromising configurations. If ``early_stopping=True``, ``TuneGridSearchCV`` defaults to Tune's [ASHAScheduler](https://docs.ray.io/en/latest/tune/api_docs/schedulers.html#tune-scheduler-hyperband), which appears in the output below under the name ``AsyncHyperBandScheduler``. You can also pass in a custom scheduler; see Tune's documentation on [schedulers](https://docs.ray.io/en/latest/tune/api_docs/schedulers.html#tune-schedulers) for a full list to choose from.

``max_iters`` is the maximum number of iterations a given hyperparameter set could run for; it may run for fewer iterations if it is early stopped.
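To give a feel for how early stopping and `max_iters` interact, here is a rough sketch of synchronous successive halving, a simplified relative of ASHA. It is not Tune's implementation (ASHA is asynchronous and more elaborate), and `toy_score` is a made-up learning curve standing in for real training.

```python
import math

def toy_score(alpha: float, iteration: int) -> float:
    """Hypothetical learning curve: improves with iterations, peaks near alpha=0.1."""
    quality = 1.0 - abs(math.log10(alpha) + 1.0) / 4.0
    return quality * (1.0 - 0.5 ** iteration)

def successive_halving(alphas, max_iters=8, reduction_factor=2):
    """Train survivors in rungs; after each rung keep the top 1/reduction_factor."""
    survivors = list(alphas)
    iters_used = {alpha: 0 for alpha in alphas}
    rung = 1
    while rung <= max_iters:
        scores = {}
        for alpha in survivors:
            iters_used[alpha] = rung  # survivors are trained up to this rung
            scores[alpha] = toy_score(alpha, rung)
        if len(survivors) > 1:
            keep = max(1, len(survivors) // reduction_factor)
            survivors = sorted(survivors, key=scores.get, reverse=True)[:keep]
        rung *= reduction_factor
    return survivors[0], iters_used

best, iters_used = successive_halving([1e-4, 1e-3, 1e-2, 1e-1, 1.0])
print("Best alpha:", best)
print("Iterations per configuration:", iters_used)
```

Only the most promising configuration reaches `max_iters`; the others stop after one or two iterations, which is where the time savings come from.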

In [10]:
%%time
tune_search = TuneGridSearchCV(
    SGDClassifier(), parameter_grid, early_stopping=True, 
    max_iters=10, name="AcademyTraining", verbose=1)
tune_search.fit(x_train, y_train)

2022-02-21 11:51:04,510	INFO tune.py:636 -- Total run time: 15.30 seconds (14.47 seconds for the tuning loop).


CPU times: user 3.44 s, sys: 528 ms, total: 3.97 s
Wall time: 17.7 s


TuneGridSearchCV(early_stopping=<ray.tune.schedulers.async_hyperband.AsyncHyperBandScheduler object at 0x7fe4129ab410>,
                 estimator=SGDClassifier(),
                 loggers=[<class 'ray.tune.logger.JsonLogger'>,
                          <class 'ray.tune.logger.CSVLogger'>],
                 max_iters=10, mode='max', n_jobs=-1, name='AcademyTraining',
                 param_grid={'alpha': [0.0001, 0.1, 1], 'epsilon': [0.01, 0.1]},
                 scoring={'score': <function _passthrough_scorer at 0x7fe4105d2050>},
                 sk_n_jobs=1, verbose=1)

In [11]:
print("Best hyperparameters found were: ", tune_search.best_params)

Best hyperparameters found were:  {'alpha': 0.1, 'epsilon': 0.1}


## Using Bayesian Optimization

In addition to the grid search interface, tune-sklearn also provides `TuneSearchCV`, an interface for sampling from **distributions of hyperparameters**.
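As a minimal sketch of what sampling from distributions means, the snippet below draws a few configurations from `(lower, upper)` ranges using only the standard library. Uniform sampling is an assumption made for illustration, and `sample_configuration` is a hypothetical helper; `TuneSearchCV` handles the sampling internally.

```python
import random

# Tuples are read here as (lower, upper) bounds for each hyperparameter
param_ranges = {"alpha": (1e-4, 1), "epsilon": (0.01, 0.1)}

def sample_configuration(ranges, rng):
    """Draw one value per hyperparameter, uniformly within its bounds (assumption)."""
    return {name: rng.uniform(low, high) for name, (low, high) in ranges.items()}

rng = random.Random(0)  # seeded so repeated runs draw the same values
n_trials = 3
trials = [sample_configuration(param_ranges, rng) for _ in range(n_trials)]
for trial in trials:
    print(trial)
```

Unlike a grid, the number of trials is chosen independently of how many hyperparameters are being searched.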

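The `search_optimization="bayesian"` option is, to the best of our knowledge, backed by the `scikit-optimize` package, which may need to be installed separately. With it, you can enable Bayesian optimization over the distributions in only two lines of code: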



In [12]:
%%time
digits = datasets.load_digits()
x = digits.data
y = digits.target
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=.2)

clf = SGDClassifier()
parameter_grid = {"alpha": (1e-4, 1), "epsilon": (0.01, 0.1)}

bayopt_tune_search = TuneSearchCV(
    clf,
    parameter_grid,
    search_optimization="bayesian",
    n_trials=3,
    early_stopping=True,
    max_iters=10,
    verbose=1,
)
bayopt_tune_search.fit(x_train, y_train)

2022-02-21 11:51:51,516	INFO tune.py:636 -- Total run time: 3.01 seconds (2.89 seconds for the tuning loop).


CPU times: user 327 ms, sys: 67.6 ms, total: 394 ms
Wall time: 3.16 s


TuneSearchCV(early_stopping=<ray.tune.schedulers.async_hyperband.AsyncHyperBandScheduler object at 0x7fe425428e10>,
             estimator=SGDClassifier(),
             loggers=[<class 'ray.tune.logger.JsonLogger'>,
                      <class 'ray.tune.logger.CSVLogger'>],
             max_iters=10, mode='max', n_jobs=-1, n_trials=3,
             param_distributions={'alpha': (0.0001, 1), 'epsilon': (0.01, 0.1)},
             scoring={'score': <function _passthrough_scorer at 0x7fe4105d2050>},
             search_optimization='bayesian', sk_n_jobs=1, verbose=1)

In [13]:
print("Best hyperparameters found were: ", bayopt_tune_search.best_params)

Best hyperparameters found were:  {'alpha': 0.31824533288476503, 'epsilon': 0.010387697631192022}


In [14]:
ray.shutdown()

### Exercise

 * Try increasing `n_samples` to 110K and setting `test_size=10000`.
 
 Run the notebook end-to-end. If standard scikit-learn takes too long, stop it and continue with Ray's version.
 Do you see a difference in execution time?