# Ray crash course - Distributed HPO with Ray Tune and XGBoost-Ray

© 2019-2022, Anyscale. All Rights Reserved

This demo introduces **Ray Tune's** key concepts using a classification example. The example is derived from [Hyperparameter Tuning with Ray Tune and XGBoost-Ray](https://github.com/ray-project/xgboost_ray#hyperparameter-tuning). Newcomers can get started with Ray Tune by following the three-step pattern below.

Three simple steps:

 1. Set up your config space and define your trainable and objective function
 2. Use Tune to execute your hyperparameter sweep, supplying the appropriate arguments, including the search space and, optionally, a [search algorithm](https://docs.ray.io/en/latest/tune/api_docs/suggestion.html#summary) or [trial scheduler](https://docs.ray.io/en/latest/tune/api_docs/schedulers.html#tune-schedulers)
 3. Examine or analyse the results returned
 
 <img src="https://docs.ray.io/en/latest/_images/tune_flow.png" height="50%" width="60%">


See also the [Understanding Hyperparameter Tuning](https://github.com/anyscale/academy/blob/main/ray-tune/02-Understanding-Hyperparameter-Tuning.ipynb) notebook and the [Tune documentation](http://tune.io), in particular, the [API reference](https://docs.ray.io/en/latest/tune/api_docs/overview.html). 


In [38]:
import os

from xgboost_ray import RayDMatrix, RayParams, train
from sklearn.datasets import load_breast_cancer

import ray
from ray import tune
CONNECT_TO_ANYSCALE=False

In [39]:
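# Shut down any running Ray instance first, then (re)connect.
# Note: ray.is_initialized is a function and must be called.
if ray.is_initialized():
    ray.shutdown()
if CONNECT_TO_ANYSCALE:
    ray.init("anyscale://jsd-ray-core-tutorial")
else:
    ray.init()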

2022-04-19 13:10:26,474	INFO services.py:1460 -- View the Ray dashboard at http://127.0.0.1:8265


## Step 1: Define a 'Trainable' training function to use with Ray Tune `tune.run(...)`

In [40]:
NUM_OF_ACTORS = 4           # number of XGBoost-Ray training actors used by each trial; the training data is sharded across them
NUM_OF_CPUS_PER_ACTOR = 1   # number of CPUs per actor

ray_params = RayParams(num_actors=NUM_OF_ACTORS, cpus_per_actor=NUM_OF_CPUS_PER_ACTOR)
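
As a rough sizing aid, the CPU demand implied by these settings can be worked out by hand. The sketch below assumes that `num_actors` is the number of training actors started by each trial (as XGBoost-Ray documents it) and ignores any extra resources Tune may reserve for the trainable itself. `cpus_for_sweep` is a hypothetical helper, not part of either library.

```python
def cpus_for_sweep(num_actors: int, cpus_per_actor: int, concurrent_trials: int) -> int:
    """CPUs needed if `concurrent_trials` trials run at the same time.

    Assumption: each trial starts `num_actors` training actors and each
    actor reserves `cpus_per_actor` CPUs. Overhead for the trial driver
    is deliberately ignored.
    """
    if min(num_actors, cpus_per_actor, concurrent_trials) < 1:
        raise ValueError("all arguments must be positive integers")
    return num_actors * cpus_per_actor * concurrent_trials


# One trial with the settings above: 4 actors x 1 CPU each.
per_trial = cpus_for_sweep(4, 1, concurrent_trials=1)
# Four trials running side by side.
all_at_once = cpus_for_sweep(4, 1, concurrent_trials=4)
print(per_trial, all_at_once)
```

On a machine with fewer CPUs than the second figure, Tune would be expected to queue some trials rather than run them all concurrently.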

In [41]:
def train_func_model(config:dict, checkpoint_dir=None):
    # create the dataset
    train_X, train_y = load_breast_cancer(return_X_y=True)
    # Convert to RayDMatrix data structure
    train_set = RayDMatrix(train_X, train_y)

    # Empty dictionary for the evaluation results reported back
    # to tune
    evals_result = {}

    # Train the model with XGBoost train
    bst = train(
        params=config,                       # our hyperparameter search space
        dtrain=train_set,                    # our RayDMatrix data structure
        evals_result=evals_result,           # place holder for results
        evals=[(train_set, "train")],
        verbose_eval=False,
        ray_params=ray_params)                # distributed training settings for XGBoost-Ray
    
    # save the model in the checkpoint dir for each trial run
    with tune.checkpoint_dir(step=0) as checkpoint_dir:
        bst.save_model(os.path.join(checkpoint_dir, "model.xgb"))

## Step 2: Define a hyperparameter search space

In [42]:
# Specify the typical hyperparameter search space
config = {
    "tree_method": "approx",
    "objective": "binary:logistic",
    "eval_metric": ["logloss", "error"],
    "eta": tune.loguniform(1e-4, 1e-1),
    "subsample": tune.uniform(0.5, 1.0),
    "max_depth": tune.randint(1, 9)
}
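
To make the three sampled ranges concrete, here is a small standard-library sketch of what drawing one configuration from such a space could look like. It is not Tune's implementation: `sample_config` is a hypothetical helper, and the only behavior borrowed from Tune is that `loguniform` samples uniformly in log space and that the upper bound of `randint` is exclusive.

```python
import math
import random


def sample_config(rng: random.Random) -> dict:
    """Draw one hyperparameter set from a space shaped like `config` above."""
    # Log-uniform: sample the exponent uniformly, then exponentiate, so each
    # decade between 1e-4 and 1e-1 is equally likely.
    eta = math.exp(rng.uniform(math.log(1e-4), math.log(1e-1)))
    return {
        "tree_method": "approx",            # fixed values are passed through
        "objective": "binary:logistic",
        "eval_metric": ["logloss", "error"],
        "eta": eta,
        "subsample": rng.uniform(0.5, 1.0),
        "max_depth": rng.randrange(1, 9),   # integers 1..8, upper bound excluded
    }


rng = random.Random(0)  # seeded so repeated runs draw the same samples
samples = [sample_config(rng) for _ in range(4)]
for s in samples:
    print(round(s["eta"], 6), round(s["subsample"], 4), s["max_depth"])
```

Each element of `samples` plays the role of the `config` dictionary that one trial receives; the fixed entries are identical across trials and only the three sampled values differ.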

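## Step 3: Run the Ray Tune main trainer and examine the results

Ray Tune launches the distributed HPO run: each of the `num_samples=4` trials gets its own instance of the trainable function and its own configuration sampled from the search space, and each trial trains XGBoost on four remote actors.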

<img src="images/ray_tune_dist_hpo.png" height="60%" width="70%"> 

In [43]:
# Run tune
analysis = tune.run(
    train_func_model,
    config=config,
    metric="train-error",
    mode="min",
    num_samples=4,
    verbose=1,
    resources_per_trial=ray_params.get_tune_resources()
)

2022-04-19 13:10:59,588	INFO tune.py:702 -- Total run time: 22.75 seconds (22.63 seconds for the tuning loop).


In [44]:
print("Best hyperparameters", analysis.best_config)

Best hyperparameters {'tree_method': 'approx', 'objective': 'binary:logistic', 'eval_metric': ['logloss', 'error'], 'eta': 0.00014427164112291156, 'subsample': 0.9017039270833161, 'max_depth': 7}


In [45]:
analysis.results_df.head(5)

| trial_id | train-logloss | train-error | time_this_iter_s | done | timesteps_total | episodes_total | training_iteration | experiment_id | date | timestamp | ... | warmup_time | experiment_tag | config.tree_method | config.objective | config.eval_metric | config.eta | config.subsample | config.max_depth | config.nthread | config.n_jobs |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| c6cab_00000 | 0.53444 | 0.02812 | 0.007696 | True | | | 10 | a32c559970074412bbe511c236704d23 | 2022-04-19_13-10-45 | 1650399045 | ... | 0.003457 | 0_eta=0.0229,max_depth=3,subsample=0.6016 | approx | binary:logistic | [logloss, error] | 0.022904 | 0.601579 | 3 | 1 | 1 |
| c6cab_00001 | 0.644833 | 0.038664 | 0.004796 | True | | | 10 | ee042876a8c3491086531bd510b155fd | 2022-04-19_13-10-48 | 1650399048 | ... | 0.00294 | 1_eta=0.0063,max_depth=2,subsample=0.7052 | approx | binary:logistic | [logloss, error] | 0.006308 | 0.70518 | 2 | 1 | 1 |
| c6cab_00002 | 0.658369 | 0.073814 | 0.005228 | True | | | 10 | 1f3fd18913284667b705feb095ff2740 | 2022-04-19_13-10-54 | 1650399054 | ... | 0.002776 | 2_eta=0.0051,max_depth=1,subsample=0.7892 | approx | binary:logistic | [logloss, error] | 0.005146 | 0.78917 | 1 | 1 | 1 |
| c6cab_00003 | 0.691881 | 0.01406 | 0.004248 | True | | | 10 | e1e89dd6d50c467a8d8431acb1085010 | 2022-04-19_13-10-58 | 1650399058 | ... | 0.003222 | 3_eta=0.0001,max_depth=7,subsample=0.9017 | approx | binary:logistic | [logloss, error] | 0.000144 | 0.901704 | 7 | 1 | 1 |
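
The selection that `metric="train-error"` and `mode="min"` imply can be reproduced by hand from the rows above. The following is an illustrative sketch, not how `analysis.best_config` is implemented; `pick_best` is a hypothetical helper and the numbers are copied from the results shown.

```python
# (trial_id, train-error, eta, max_depth) copied from the results above.
trials = [
    ("c6cab_00000", 0.028120, 0.022904, 3),
    ("c6cab_00001", 0.038664, 0.006308, 2),
    ("c6cab_00002", 0.073814, 0.005146, 1),
    ("c6cab_00003", 0.014060, 0.000144, 7),
]


def pick_best(rows, mode="min"):
    """Return the row whose metric (column 1) is smallest or largest."""
    if mode not in ("min", "max"):
        raise ValueError("mode must be 'min' or 'max'")
    chooser = min if mode == "min" else max
    return chooser(rows, key=lambda row: row[1])


best = pick_best(trials, mode="min")
print(best[0], best[1])
```

The winner, trial `c6cab_00003`, matches the `eta` and `max_depth` reported as the best hyperparameters. Because the metric is measured on the training data itself, a deeper tree winning here does not by itself indicate better generalization.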


---

In [46]:
analysis.best_logdir

'/Users/jules/ray_results/train_func_model_2022-04-19_13-10-36/train_func_model_c6cab_00003_3_eta=0.0001,max_depth=7,subsample=0.9017_2022-04-19_13-10-50'
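
The trainable saved `model.xgb` through `tune.checkpoint_dir`, so the file should sit somewhere beneath the best trial's log directory. The name of the checkpoint subdirectory can differ between Ray versions, so this sketch searches for the file rather than hard-coding a path. `find_saved_model` is a hypothetical helper, and the temporary directory only stands in for a real trial directory.

```python
import tempfile
from pathlib import Path
from typing import Optional


def find_saved_model(logdir: str, filename: str = "model.xgb") -> Optional[Path]:
    """Return the first file called `filename` found under `logdir`, or None."""
    matches = sorted(Path(logdir).rglob(filename))
    return matches[0] if matches else None


# In the notebook the call would be: find_saved_model(analysis.best_logdir)
# Here a throwaway directory is used so the sketch runs on its own.
with tempfile.TemporaryDirectory() as tmp:
    ckpt = Path(tmp) / "some_checkpoint_dir"   # made-up name for the demo
    ckpt.mkdir()
    (ckpt / "model.xgb").write_bytes(b"")       # empty placeholder file
    found = find_saved_model(tmp)
    found_name = found.name if found else None
    missing = find_saved_model(tmp, "other.bin")
    print(found_name, missing)
```

A path found this way could then be passed to `xgboost.Booster.load_model` to restore the trained booster.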

In [49]:
ray.shutdown()

### Homework

1. Read the references below
2. Try some of the examples in the references

## References

 * [Tune: Scalable Hyperparameter Tuning](https://docs.ray.io/en/master/tune/index.html)
 * [Introducing Distributed XGBoost Training with Ray](https://www.anyscale.com/blog/distributed-xgboost-training-with-ray)
 * [How to Speed Up XGBoost Model Training](https://www.anyscale.com/blog/how-to-speed-up-xgboost-model-training)
 * [XGBoost-Ray Project](https://github.com/ray-project/xgboost_ray)
 * [Distributed XGBoost on Ray](https://docs.ray.io/en/latest/xgboost-ray.html)