# Distributed HPO with Ray Tune's TuneGridSearchCV and Scikit-Learn

This demo introduces **Ray Tune's** key concepts using a classification example. Newcomers can get started with Ray Tune by following a basic three-step pattern. Here we swap scikit-learn's `GridSearchCV` for Ray Tune's distributed drop-in replacement, `TuneGridSearchCV`.

See also the [Understanding Hyperparameter Tuning](https://github.com/anyscale/academy/blob/main/ray-tune/02-Understanding-Hyperparameter-Tuning.ipynb) notebook and the [Tune documentation](http://tune.io), in particular, the [API reference](https://docs.ray.io/en/latest/tune/api_docs/overview.html). 


In [1]:
from sklearn.model_selection import GridSearchCV
# Import Tune's replacement
from ray.tune.sklearn import TuneGridSearchCV

# Other relevant imports
from sklearn.model_selection import train_test_split

# Use the stochastic gradient descent (SGD) classifier
from sklearn.linear_model import SGDClassifier

# Import the synthetic classification dataset generator
from sklearn.datasets import make_classification
import numpy as np
import time
import ray

In [2]:
CONNECT_TO_ANYSCALE = False

# Shut down any running Ray instance so this cell can be re-run safely
if ray.is_initialized():
    ray.shutdown()

if CONNECT_TO_ANYSCALE:
    ray.init("anyscale://jsd-weekly-demo")
else:
    ray.init()

2022-02-02 14:09:53,688	INFO worker.py:842 -- Connecting to existing Ray cluster at address: 172.31.111.127:6379


### Create Feature Set

 * 210K rows
 * 1K features
 * 5 classes

In [5]:
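from typing import Tuple

def create_classification_data() -> Tuple[np.ndarray, np.ndarray]:
    """Generate a synthetic 5-class dataset (210,000 rows, 1,000 features)."""
    X, y = make_classification(
        n_samples=210000,
        n_features=1000,
        n_informative=50,
        n_redundant=0,
        n_classes=5,
        class_sep=2.5)
    return X, y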

### Create classification data and define parameter search space

In [6]:
X, y = create_classification_data()
# Split the dataset into train and test sets
x_train, x_test, y_train, y_test = train_test_split(X, y, test_size=10000)

# Example parameter grid for SGDClassifier
# Note: epsilon only affects the 'huber', 'epsilon_insensitive',
# and 'squared_epsilon_insensitive' losses
parameter_grid = {"alpha": [1e-4, 1e-1, 1], "epsilon": [0.01, 0.1]}
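
A grid search tries every combination of the listed values. The sketch below expands the grid above with the standard library only, to show where the candidate count comes from. The helper name `expand_grid` is made up for this illustration and is not part of scikit-learn or Ray Tune.

```python
from itertools import product

# Same grid as in the cell above
parameter_grid = {"alpha": [1e-4, 1e-1, 1], "epsilon": [0.01, 0.1]}

def expand_grid(grid):
    """Return every combination of values in the grid as a list of dicts."""
    names = sorted(grid)  # fixed key order keeps the result deterministic
    value_lists = [grid[name] for name in names]
    return [dict(zip(names, values)) for values in product(*value_lists)]

candidates = expand_grid(parameter_grid)
for candidate in candidates:
    print(candidate)
print(f"{len(candidates)} candidates")
```

With 3 values for `alpha` and 2 for `epsilon`, the grid holds 3 x 2 = 6 candidates.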

## Use Regular Scikit-learn GridSearch
This runs on a single node using all of its cores. Here, the number of cores is 8.

In [9]:
# n_jobs=-1 uses all available cores
sklearn_search = GridSearchCV(SGDClassifier(),
                    parameter_grid,
                    n_jobs=-1,
                    verbose=True)
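
Scikit-learn passes `n_jobs` to joblib, where negative values count back from the number of CPUs: `-1` means all of them, `-2` all but one, and so on. The helper below is a hypothetical illustration of that documented convention, not joblib's own code. The name `effective_workers` and its `n_cpus` argument are assumptions made for this sketch.

```python
import os

def effective_workers(n_jobs, n_cpus=None):
    """Resolve a joblib-style n_jobs value to a worker count (illustrative helper)."""
    if n_cpus is None:
        n_cpus = os.cpu_count() or 1
    if n_jobs == 0:
        raise ValueError("n_jobs == 0 has no meaning")
    if n_jobs < 0:
        # Negative values count back from the CPU total: -1 is all CPUs, -2 is all but one
        return max(1, n_cpus + 1 + n_jobs)
    return n_jobs

print(effective_workers(-1, n_cpus=8))  # all 8 cores of the node used in this demo
print(effective_workers(-2, n_cpus=8))  # leave one core free
```

On a single machine this caps the search at the node's core count, which is the limit the Ray cluster run is meant to lift.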

In [10]:
%%time
sklearn_search.fit(x_train, y_train)

Fitting 5 folds for each of 6 candidates, totalling 30 fits
CPU times: user 39.8 s, sys: 4.49 s, total: 44.3 s
Wall time: 20min 20s


GridSearchCV(estimator=SGDClassifier(), n_jobs=16,
             param_grid={'alpha': [0.0001, 0.1, 1], 'epsilon': [0.01, 0.1]},
             verbose=True)
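
The "30 fits" in the log come from cross-validation: each of the 6 candidates is trained and validated once per fold, and `GridSearchCV` uses 5 folds by default. The sketch below shows how k-fold splitting partitions row indices, using plain contiguous folds for simplicity. For classifiers scikit-learn actually uses stratified folds, so treat the function `k_fold_indices` as an illustration only.

```python
def k_fold_indices(n_samples, n_splits=5):
    """Yield (train_indices, validation_indices) for contiguous, unshuffled folds."""
    base, extra = divmod(n_samples, n_splits)
    start = 0
    for fold in range(n_splits):
        # The first `extra` folds get one additional sample
        size = base + (1 if fold < extra else 0)
        stop = start + size
        validation = list(range(start, stop))
        train = list(range(0, start)) + list(range(stop, n_samples))
        yield train, validation
        start = stop

n_candidates = 6  # 3 alpha values x 2 epsilon values
n_splits = 5      # GridSearchCV's default number of folds

folds = list(k_fold_indices(10, n_splits))  # tiny example with 10 rows
for train, validation in folds:
    print("train:", train, "validation:", validation)
print("total fits:", n_candidates * len(folds))
```

With the 200,000 training rows left after the split above, each fit trains on 160,000 rows and validates on 40,000.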

In [11]:
print(f"Standard Scikit-learn GridSearchCV Best params: {sklearn_search.best_params_}")

Standard Scikit-learn GridSearchCV Best params: {'alpha': 0.1, 'epsilon': 0.01}


## Use Ray's Scikit-learn drop-in replacement TuneGridSearchCV
Use 40 cores on a Ray cluster for the tuning run.

In [15]:
# Now let's run the same search with Tune's drop-in replacement
# Note: If early_stopping=True, TuneGridSearchCV will default to using Tune’s ASHAScheduler.
tune_sklearn = TuneGridSearchCV(SGDClassifier(), 
                    parameter_grid,
                    early_stopping=True,
                    max_iters=30,
                    n_jobs=40,    # Use 40 cores 
                    mode="max",   # the default score (accuracy) should be maximized
                    verbose=True)
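
To give a feel for what early stopping buys, here is a minimal sketch of synchronous successive halving, the idea that ASHA builds on. It is not Ray Tune's implementation: ASHA promotes trials asynchronously, and the names `successive_halving`, `toy_score`, and the value `reduction_factor=3` are assumptions made for this illustration. The scoring function is made up and does not reflect real model accuracy.

```python
from itertools import product

def successive_halving(configs, evaluate, min_iters=1, max_iters=30, reduction_factor=3):
    """Train survivors on a growing budget, keeping the top 1/reduction_factor each round."""
    survivors = list(configs)
    budget = min_iters
    while len(survivors) > 1 and budget < max_iters:
        # Rank the surviving configs at the current budget (higher score is better)
        ranked = sorted(survivors, key=lambda c: evaluate(c, budget), reverse=True)
        survivors = ranked[:max(1, len(ranked) // reduction_factor)]
        budget = min(max_iters, budget * reduction_factor)
    # Evaluate whatever is left at the full budget and return the best config
    return max(survivors, key=lambda c: evaluate(c, max_iters))

def toy_score(config, iters):
    """Made-up score that improves with more iterations (not a real accuracy)."""
    quality = 1.0 - abs(config["alpha"] - 0.1) - config["epsilon"]
    return quality * (1.0 - 0.5 ** iters)

grid = {"alpha": [1e-4, 1e-1, 1], "epsilon": [0.01, 0.1]}
configs = [dict(zip(grid, values)) for values in product(*grid.values())]

best = successive_halving(configs, toy_score, max_iters=30)
print(best)
```

Weak configurations are dropped after a few iterations instead of being trained for the full `max_iters`, so most of the compute goes to the promising ones.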

In [None]:
%%time
tune_sklearn.fit(x_train, y_train)

In [14]:
print(f"Ray Tune Scikit-learn TuneGridSearchCV Best params: {tune_sklearn.best_params}")

Ray Tune Scikit-learn TuneGridSearchCV Best params: {'alpha': 0.1, 'epsilon': 0.01}
