# Distributed HPO with Ray Tune and XGBoost-Ray

This demo introduces **Ray Tune's** key concepts using a classification example. It is derived from [Hyperparameter Tuning with Ray Tune and XGBoost-Ray](https://github.com/ray-project/xgboost_ray#hyperparameter-tuning). As a newcomer, you can get started with Ray Tune by following a three-step pattern:

 1. Set up your config space and define your trainable and objective function
 2. Use Tune to execute your training, supplying the appropriate arguments, including the search space and, optionally, [search algorithms](https://docs.ray.io/en/latest/tune/api_docs/suggestion.html#blendsearch) or [trial schedulers](https://docs.ray.io/en/latest/tune/api_docs/schedulers.html#tune-schedulers)
 3. Examine and analyze the results
 
 <img src="https://docs.ray.io/en/latest/_images/tune-workflow.png" height="50%" width="60%">
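
The three steps can be sketched without any Ray dependency. The example below is a plain-Python random search over a hypothetical toy objective; `search_space`, `objective`, and `run_trials` are illustrative names, not Ray Tune APIs, and Tune adds distributed execution, search algorithms, and schedulers on top of this basic loop.

```python
import random

# Step 1: define a search space and an objective.
# The objective is a hypothetical toy function, not an XGBoost model.
search_space = {"x": (-5.0, 5.0), "y": (-5.0, 5.0)}

def objective(config):
    # Smallest possible value is 0.0, reached at x=1, y=-2.
    return (config["x"] - 1.0) ** 2 + (config["y"] + 2.0) ** 2

# Step 2: run a number of trials, each with a freshly sampled config.
def run_trials(space, num_samples, seed=0):
    rng = random.Random(seed)
    results = []
    for trial_id in range(num_samples):
        config = {name: rng.uniform(lo, hi) for name, (lo, hi) in space.items()}
        results.append(
            {"trial_id": trial_id, "config": config, "loss": objective(config)}
        )
    return results

# Step 3: examine the results and pick the trial with the lowest loss.
results = run_trials(search_space, num_samples=20)
best = min(results, key=lambda r: r["loss"])
print("Best config:", best["config"], "loss:", best["loss"])
```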


See also the [Hyperparameter Tuning References](References-Hyperparameter-Tuning.ipynb) notebook and the [Tune documentation](http://tune.io), in particular, the [API reference](https://docs.ray.io/en/latest/tune/api_docs/overview.html). 


In [1]:
from xgboost_ray import RayDMatrix, RayParams, train
from sklearn.datasets import load_breast_cancer

from ray import tune

## Step 1: Define a 'Trainable' training function to use with Ray Tune's `tune.run(...)`

In [2]:
NUM_OF_ACTORS = 4           # number of XGBoost-Ray training actors per trial; each trial distributes its training across them
NUM_OF_CPUS_PER_ACTOR = 1   # number of CPUs per actor

In [3]:
ray_params = RayParams(num_actors=NUM_OF_ACTORS, cpus_per_actor=NUM_OF_CPUS_PER_ACTOR)

In [4]:
def train_func_model(config: dict):
    # create the dataset
    train_X, train_y = load_breast_cancer(return_X_y=True)
    # Convert to RayDMatrix data structure
    train_set = RayDMatrix(train_X, train_y)

    # Empty dictionary for the evaluation results reported back
    # to tune
    evals_result = {}

    # Train the model with XGBoost train
    bst = train(
        params=config,                       # hyperparameters sampled by Tune for this trial
        dtrain=train_set,                    # our RayDMatrix data structure
        evals_result=evals_result,           # placeholder for results
        evals=[(train_set, "train")],        # evaluate on the training set, named "train"
        verbose_eval=False,
        ray_params=ray_params)               # distributed training settings for XGBoost-Ray

    bst.save_model("model.xgb")
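
The training function never reports metrics explicitly, yet the results further down contain `train-error` and `train-logloss`. Those names combine the evaluation-set name passed in `evals` (`"train"`) with each entry of `eval_metric`, joined by a hyphen. The sketch below illustrates that naming with a hypothetical helper, `flatten_metrics`; it is not XGBoost-Ray's internal reporting code. The numbers are the first- and last-iteration values logged for trial `45e5e_00000`; the real `evals_result` holds one value per boosting iteration.

```python
# Shape of the dictionary XGBoost fills in for evals=[(train_set, "train")]
# with eval_metric=["logloss", "error"]. Only two of the ten iterations are
# shown: the first and last values logged for trial 45e5e_00000.
evals_result = {
    "train": {
        "logloss": [0.692697, 0.688701],
        "error": [0.061511, 0.040422],
    }
}

def flatten_metrics(evals_result, iteration):
    """Return {"<eval set>-<metric>": value} for one boosting iteration.

    Hypothetical helper for illustration only.
    """
    return {
        f"{eval_name}-{metric_name}": values[iteration]
        for eval_name, metrics in evals_result.items()
        for metric_name, values in metrics.items()
    }

print(flatten_metrics(evals_result, 0))
# {'train-logloss': 0.692697, 'train-error': 0.061511}
print(flatten_metrics(evals_result, -1))
# {'train-logloss': 0.688701, 'train-error': 0.040422}
```

This is why `metric="train-error"` is a valid target when the experiment is launched: it must match one of these composed names.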

## Step 2: Define a hyperparameter search space

In [5]:
# Specify the hyperparameter search space
config = {
    "tree_method": "approx",
    "objective": "binary:logistic",
    "eval_metric": ["logloss", "error"],
    "eta": tune.loguniform(1e-4, 1e-1),
    "subsample": tune.uniform(0.5, 1.0),
    "max_depth": tune.randint(1, 9)
}
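
To make the search space concrete, here is a minimal standard-library sketch of what each distribution draws. The helpers `loguniform`, `uniform`, and `randint` are stand-ins written for this illustration, assuming the documented semantics of the Tune functions of the same name (log-space uniform sampling, and an exclusive upper bound for `randint`); they are not Ray Tune's implementation.

```python
import math
import random

rng = random.Random(42)

def loguniform(low, high):
    # Uniform in log space: every decade between low and high is equally likely,
    # so 1e-4..1e-3 is sampled as often as 1e-2..1e-1.
    return math.exp(rng.uniform(math.log(low), math.log(high)))

def uniform(low, high):
    # Uniform in linear space.
    return rng.uniform(low, high)

def randint(low, high):
    # Integer in [low, high); the upper bound is exclusive.
    return rng.randrange(low, high)

def sample_config():
    # Constant entries are passed through unchanged; only the three
    # distribution entries vary from trial to trial.
    return {
        "tree_method": "approx",
        "objective": "binary:logistic",
        "eval_metric": ["logloss", "error"],
        "eta": loguniform(1e-4, 1e-1),
        "subsample": uniform(0.5, 1.0),
        "max_depth": randint(1, 9),
    }

for _ in range(4):
    print(sample_config())
```

With these bounds, `max_depth` ranges over 1 to 8, which is consistent with the sampled values 2, 5, and 8 that appear in the trial table of the run.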

## Step 3: Run Ray Tune and examine the results

Ray Tune launches four trials (`num_samples=4`), each running its own instance of the trainable function. Within each trial, XGBoost-Ray distributes training across four remote actors.

<img src="images/ray_tune_dist_hpo.png" height="50%" width="60%"> 
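
A rough resource estimate helps explain how many trials can run at once. Each trial starts `NUM_OF_ACTORS` training actors (the log line `Created 4 new actors` appears once per trial), so the actors alone need `NUM_OF_ACTORS * NUM_OF_CPUS_PER_ACTOR` CPUs per trial. The sketch below is back-of-the-envelope arithmetic with hypothetical helper names and a hypothetical 8-CPU machine; it ignores any CPU that may be reserved for the trainable function itself, so treat the result as an upper bound.

```python
NUM_OF_ACTORS = 4
NUM_OF_CPUS_PER_ACTOR = 1
NUM_SAMPLES = 4  # number of trials requested via num_samples

def actor_cpus_per_trial(num_actors, cpus_per_actor):
    # CPUs consumed by the training actors of a single trial.
    return num_actors * cpus_per_actor

def max_concurrent_trials(cluster_cpus, num_actors, cpus_per_actor):
    # Upper bound: counts only actor CPUs, not the trainable's own overhead.
    return cluster_cpus // actor_cpus_per_trial(num_actors, cpus_per_actor)

per_trial = actor_cpus_per_trial(NUM_OF_ACTORS, NUM_OF_CPUS_PER_ACTOR)
print("Actor CPUs per trial:", per_trial)                       # 4
print("Actor CPUs if all trials ran at once:", per_trial * NUM_SAMPLES)  # 16
# cluster_cpus=8 is an assumed value for illustration.
print("Concurrent trials on 8 CPUs (upper bound):",
      max_concurrent_trials(8, NUM_OF_ACTORS, NUM_OF_CPUS_PER_ACTOR))    # 2
```

When fewer CPUs are available than all trials would need together, Tune keeps the remaining trials in `PENDING` until resources free up, which is the pattern visible in the status tables of the run.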

In [6]:
# Run tune
analysis = tune.run(
    train_func_model,
    config=config,
    metric="train-error",
    mode="min",
    num_samples=4,
    resources_per_trial=ray_params.get_tune_resources()
)

2022-01-04 18:07:44,192 INFO services.py:1338 -- View the Ray dashboard at http://127.0.0.1:8265


Trial name,status,loc,eta,max_depth,subsample
train_func_model_45e5e_00000,RUNNING,127.0.0.1:6054,0.000547199,2,0.963656
train_func_model_45e5e_00001,PENDING,,0.00137726,8,0.847471
train_func_model_45e5e_00002,PENDING,,0.000147236,5,0.778445
train_func_model_45e5e_00003,PENDING,,0.00256393,8,0.912371


[2m[36m(ImplicitFunc pid=6054)[0m 2022-01-04 18:07:47,730	INFO main.py:976 -- [RayXGBoost] Created 4 new actors (4 total actors). Waiting until actors are ready for training.
[2m[36m(ImplicitFunc pid=6054)[0m 2022-01-04 18:07:49,195	INFO main.py:1021 -- [RayXGBoost] Starting XGBoost training.
[2m[36m(_RemoteRayXGBoostActor pid=6047)[0m [18:07:49] task [xgboost.ray]:140564624014928 got new rank 3
[2m[36m(_RemoteRayXGBoostActor pid=6044)[0m [18:07:49] task [xgboost.ray]:140708772576848 got new rank 1
[2m[36m(_RemoteRayXGBoostActor pid=6045)[0m [18:07:49] task [xgboost.ray]:140411480768080 got new rank 2
[2m[36m(_RemoteRayXGBoostActor pid=6046)[0m [18:07:49] task [xgboost.ray]:140264781053520 got new rank 0
[2m[36m(ImplicitFunc pid=6051)[0m 2022-01-04 18:07:49,368	INFO main.py:976 -- [RayXGBoost] Created 4 new actors (4 total actors). Waiting until actors are ready for training.


Result for train_func_model_45e5e_00000:
  date: 2022-01-04_18-07-50
  done: false
  experiment_id: e510848140a8433a89afbdd1e9f83d04
  hostname: Juless-MacBook-Pro-16-inch-2019
  iterations_since_restore: 1
  node_ip: 127.0.0.1
  pid: 6054
  time_since_restore: 2.9446771144866943
  time_this_iter_s: 2.9446771144866943
  time_total_s: 2.9446771144866943
  timestamp: 1641348470
  timesteps_since_restore: 0
  train-error: 0.061511
  train-logloss: 0.692697
  training_iteration: 1
  trial_id: 45e5e_00000
  
Result for train_func_model_45e5e_00000:
  date: 2022-01-04_18-07-50
  done: true
  experiment_id: e510848140a8433a89afbdd1e9f83d04
  experiment_tag: 0_eta=0.0005472,max_depth=2,subsample=0.96366
  hostname: Juless-MacBook-Pro-16-inch-2019
  iterations_since_restore: 10
  node_ip: 127.0.0.1
  pid: 6054
  time_since_restore: 3.0082030296325684
  time_this_iter_s: 0.0066030025482177734
  time_total_s: 3.0082030296325684
  timestamp: 1641348470
  timesteps_since_restore: 0
  train-error: 0

[2m[36m(ImplicitFunc pid=6054)[0m 2022-01-04 18:07:50,696	INFO main.py:1500 -- [RayXGBoost] Finished XGBoost training on training data with total N=569 in 3.01 seconds (1.50 pure XGBoost training time).
[2m[36m(ImplicitFunc pid=6051)[0m 2022-01-04 18:07:51,468	INFO main.py:1021 -- [RayXGBoost] Starting XGBoost training.
[2m[36m(_RemoteRayXGBoostActor pid=6043)[0m [18:07:51] task [xgboost.ray]:140698036272768 got new rank 0
[2m[36m(_RemoteRayXGBoostActor pid=6042)[0m [18:07:51] task [xgboost.ray]:140245987753600 got new rank 1
[2m[36m(_RemoteRayXGBoostActor pid=6080)[0m [18:07:51] task [xgboost.ray]:140550670876240 got new rank 3
[2m[36m(_RemoteRayXGBoostActor pid=6079)[0m [18:07:51] task [xgboost.ray]:140615091617360 got new rank 2


Trial name,status,loc,eta,max_depth,subsample,iter,total time (s),train-logloss,train-error
train_func_model_45e5e_00001,RUNNING,127.0.0.1:6051,0.00137726,8,0.847471,,,,
train_func_model_45e5e_00002,RUNNING,127.0.0.1:6087,0.000147236,5,0.778445,,,,
train_func_model_45e5e_00003,PENDING,,0.00256393,8,0.912371,,,,
train_func_model_45e5e_00000,TERMINATED,127.0.0.1:6054,0.000547199,2,0.963656,10.0,3.0082,0.688701,0.040422


Result for train_func_model_45e5e_00001:
  date: 2022-01-04_18-07-52
  done: false
  experiment_id: ed8e630878724539b57f17b80ab4467c
  hostname: Juless-MacBook-Pro-16-inch-2019
  iterations_since_restore: 1
  node_ip: 127.0.0.1
  pid: 6051
  time_since_restore: 3.2746198177337646
  time_this_iter_s: 3.2746198177337646
  time_total_s: 3.2746198177337646
  timestamp: 1641348472
  timesteps_since_restore: 0
  train-error: 0.026362
  train-logloss: 0.691921
  training_iteration: 1
  trial_id: 45e5e_00001
  
Result for train_func_model_45e5e_00001:
  date: 2022-01-04_18-07-52
  done: true
  experiment_id: ed8e630878724539b57f17b80ab4467c
  experiment_tag: 1_eta=0.0013773,max_depth=8,subsample=0.84747
  hostname: Juless-MacBook-Pro-16-inch-2019
  iterations_since_restore: 10
  node_ip: 127.0.0.1
  pid: 6051
  time_since_restore: 3.5455288887023926
  time_this_iter_s: 0.007091045379638672
  time_total_s: 3.5455288887023926
  timestamp: 1641348472
  timesteps_since_restore: 0
  train-error: 0.

[2m[36m(ImplicitFunc pid=6051)[0m 2022-01-04 18:07:52,872	INFO main.py:1500 -- [RayXGBoost] Finished XGBoost training on training data with total N=569 in 3.55 seconds (1.40 pure XGBoost training time).
[2m[36m(ImplicitFunc pid=6087)[0m 2022-01-04 18:07:52,839	INFO main.py:976 -- [RayXGBoost] Created 4 new actors (4 total actors). Waiting until actors are ready for training.
[2m[36m(ImplicitFunc pid=6087)[0m 2022-01-04 18:07:55,128	INFO main.py:1021 -- [RayXGBoost] Starting XGBoost training.
[2m[36m(_RemoteRayXGBoostActor pid=6097)[0m [18:07:55] task [xgboost.ray]:140345440075344 got new rank 2
[2m[36m(_RemoteRayXGBoostActor pid=6098)[0m [18:07:55] task [xgboost.ray]:140378599358032 got new rank 3
[2m[36m(_RemoteRayXGBoostActor pid=6096)[0m [18:07:55] task [xgboost.ray]:140514694037072 got new rank 1
[2m[36m(_RemoteRayXGBoostActor pid=6095)[0m [18:07:55] task [xgboost.ray]:140399002287696 got new rank 0
[2m[36m(ImplicitFunc pid=6099)[0m 2022-01-04 18:07:55,417	I

Result for train_func_model_45e5e_00002:
  date: 2022-01-04_18-07-56
  done: false
  experiment_id: a32d6f2a7e2741d994c233379adb3749
  hostname: Juless-MacBook-Pro-16-inch-2019
  iterations_since_restore: 1
  node_ip: 127.0.0.1
  pid: 6087
  time_since_restore: 3.7764010429382324
  time_this_iter_s: 3.7764010429382324
  time_total_s: 3.7764010429382324
  timestamp: 1641348476
  timesteps_since_restore: 0
  train-error: 0.035149
  train-logloss: 0.693017
  training_iteration: 1
  trial_id: 45e5e_00002
  


Trial name,status,loc,eta,max_depth,subsample,iter,total time (s),train-logloss,train-error
train_func_model_45e5e_00002,RUNNING,127.0.0.1:6087,0.000147236,5,0.778445,1.0,3.7764,0.693017,0.035149
train_func_model_45e5e_00003,RUNNING,127.0.0.1:6099,0.00256393,8,0.912371,,,,
train_func_model_45e5e_00000,TERMINATED,127.0.0.1:6054,0.000547199,2,0.963656,10.0,3.0082,0.688701,0.040422
train_func_model_45e5e_00001,TERMINATED,127.0.0.1:6051,0.00137726,8,0.847471,10.0,3.54553,0.681226,0.010545


Result for train_func_model_45e5e_00002:
  date: 2022-01-04_18-07-56
  done: true
  experiment_id: a32d6f2a7e2741d994c233379adb3749
  experiment_tag: 2_eta=0.00014724,max_depth=5,subsample=0.77845
  hostname: Juless-MacBook-Pro-16-inch-2019
  iterations_since_restore: 10
  node_ip: 127.0.0.1
  pid: 6087
  time_since_restore: 3.8543009757995605
  time_this_iter_s: 0.006836891174316406
  time_total_s: 3.8543009757995605
  timestamp: 1641348476
  timesteps_since_restore: 0
  train-error: 0.015817
  train-logloss: 0.691873
  training_iteration: 10
  trial_id: 45e5e_00002
  


[2m[36m(ImplicitFunc pid=6087)[0m 2022-01-04 18:07:56,653	INFO main.py:1500 -- [RayXGBoost] Finished XGBoost training on training data with total N=569 in 3.86 seconds (1.52 pure XGBoost training time).
[2m[36m(ImplicitFunc pid=6099)[0m 2022-01-04 18:07:57,621	INFO main.py:1021 -- [RayXGBoost] Starting XGBoost training.
[2m[36m(_RemoteRayXGBoostActor pid=6115)[0m [18:07:57] task [xgboost.ray]:140385987624528 got new rank 1
[2m[36m(_RemoteRayXGBoostActor pid=6116)[0m [18:07:57] task [xgboost.ray]:140240224456272 got new rank 2
[2m[36m(_RemoteRayXGBoostActor pid=6114)[0m [18:07:57] task [xgboost.ray]:140569188630096 got new rank 0
[2m[36m(_RemoteRayXGBoostActor pid=6117)[0m [18:07:57] task [xgboost.ray]:140679649656400 got new rank 3


Result for train_func_model_45e5e_00003:
  date: 2022-01-04_18-07-58
  done: false
  experiment_id: bafc652b8c084873b62336b8f3debbfa
  hostname: Juless-MacBook-Pro-16-inch-2019
  iterations_since_restore: 1
  node_ip: 127.0.0.1
  pid: 6099
  time_since_restore: 3.468424081802368
  time_this_iter_s: 3.468424081802368
  time_total_s: 3.468424081802368
  timestamp: 1641348478
  timesteps_since_restore: 0
  train-error: 0.02812
  train-logloss: 0.690879
  training_iteration: 1
  trial_id: 45e5e_00003
  
Result for train_func_model_45e5e_00003:
  date: 2022-01-04_18-07-58
  done: true
  experiment_id: bafc652b8c084873b62336b8f3debbfa
  experiment_tag: 3_eta=0.0025639,max_depth=8,subsample=0.91237
  hostname: Juless-MacBook-Pro-16-inch-2019
  iterations_since_restore: 10
  node_ip: 127.0.0.1
  pid: 6099
  time_since_restore: 3.513869047164917
  time_this_iter_s: 0.0038971900939941406
  time_total_s: 3.513869047164917
  timestamp: 1641348478
  timesteps_since_restore: 0
  train-error: 0.01054

Trial name,status,loc,eta,max_depth,subsample,iter,total time (s),train-logloss,train-error
train_func_model_45e5e_00000,TERMINATED,127.0.0.1:6054,0.000547199,2,0.963656,10,3.0082,0.688701,0.040422
train_func_model_45e5e_00001,TERMINATED,127.0.0.1:6051,0.00137726,8,0.847471,10,3.54553,0.681226,0.010545
train_func_model_45e5e_00002,TERMINATED,127.0.0.1:6087,0.000147236,5,0.778445,10,3.8543,0.691873,0.015817
train_func_model_45e5e_00003,TERMINATED,127.0.0.1:6099,0.00256393,8,0.912371,10,3.51387,0.670945,0.010545


[2m[36m(ImplicitFunc pid=6099)[0m 2022-01-04 18:07:58,889	INFO main.py:1500 -- [RayXGBoost] Finished XGBoost training on training data with total N=569 in 3.51 seconds (1.26 pure XGBoost training time).
2022-01-04 18:07:59,019	INFO tune.py:626 -- Total run time: 13.51 seconds (12.80 seconds for the tuning loop).


In [7]:
print("Best hyperparameters", analysis.best_config)

Best hyperparameters {'tree_method': 'approx', 'objective': 'binary:logistic', 'eval_metric': ['logloss', 'error'], 'eta': 0.0013772574632048907, 'subsample': 0.847471377082333, 'max_depth': 8}
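
To see where this result comes from, the sketch below re-derives the best trial from the final status table of the run using only the standard library. It is an illustration of the `metric="train-error"`, `mode="min"` selection, not Tune's own code. Note that two trials tie on `train-error`; Python's `min` returns the first of them, which happens to match the reported best config, but how Tune itself breaks ties is not something this sketch establishes.

```python
import csv
import io

# Final status table copied from the run output.
FINAL_TABLE = """\
Trial name,status,loc,eta,max_depth,subsample,iter,total time (s),train-logloss,train-error
train_func_model_45e5e_00000,TERMINATED,127.0.0.1:6054,0.000547199,2,0.963656,10,3.0082,0.688701,0.040422
train_func_model_45e5e_00001,TERMINATED,127.0.0.1:6051,0.00137726,8,0.847471,10,3.54553,0.681226,0.010545
train_func_model_45e5e_00002,TERMINATED,127.0.0.1:6087,0.000147236,5,0.778445,10,3.8543,0.691873,0.015817
train_func_model_45e5e_00003,TERMINATED,127.0.0.1:6099,0.00256393,8,0.912371,10,3.51387,0.670945,0.010545
"""

rows = list(csv.DictReader(io.StringIO(FINAL_TABLE)))

# mode="min" on metric="train-error": keep the row with the smallest value.
best = min(rows, key=lambda row: float(row["train-error"]))
print(best["Trial name"], best["eta"], best["max_depth"], best["subsample"])
# train_func_model_45e5e_00001 0.00137726 8 0.847471

# Trials that share the minimum train-error.
tied = [
    row["Trial name"]
    for row in rows
    if float(row["train-error"]) == float(best["train-error"])
]
print("Tied on train-error:", tied)

# A different metric can rank the tied trials differently.
by_logloss = min(rows, key=lambda row: float(row["train-logloss"]))
print("Lowest train-logloss:", by_logloss["Trial name"])
```

Trials `45e5e_00001` and `45e5e_00003` both reach a `train-error` of 0.010545, while `45e5e_00003` has the lower `train-logloss`. On a dataset this small, evaluated on its own training data, such ties are unsurprising.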


In [8]:
analysis.results_df.head(5)



trial_id,train-logloss,train-error,time_this_iter_s,done,timesteps_total,episodes_total,training_iteration,experiment_id,date,timestamp,...,iterations_since_restore,experiment_tag,config.tree_method,config.objective,config.eval_metric,config.eta,config.subsample,config.max_depth,config.nthread,config.n_jobs
45e5e_00000,0.688701,0.040422,0.006603,True,,,10,e510848140a8433a89afbdd1e9f83d04,2022-01-04_18-07-50,1641348470,...,10,"0_eta=0.0005472,max_depth=2,subsample=0.96366",approx,binary:logistic,"[logloss, error]",0.000547,0.963656,2,1,1
45e5e_00001,0.681226,0.010545,0.007091,True,,,10,ed8e630878724539b57f17b80ab4467c,2022-01-04_18-07-52,1641348472,...,10,"1_eta=0.0013773,max_depth=8,subsample=0.84747",approx,binary:logistic,"[logloss, error]",0.001377,0.847471,8,1,1
45e5e_00002,0.691873,0.015817,0.006837,True,,,10,a32d6f2a7e2741d994c233379adb3749,2022-01-04_18-07-56,1641348476,...,10,"2_eta=0.00014724,max_depth=5,subsample=0.77845",approx,binary:logistic,"[logloss, error]",0.000147,0.778445,5,1,1
45e5e_00003,0.670945,0.010545,0.003897,True,,,10,bafc652b8c084873b62336b8f3debbfa,2022-01-04_18-07-58,1641348478,...,10,"3_eta=0.0025639,max_depth=8,subsample=0.91237",approx,binary:logistic,"[logloss, error]",0.002564,0.912371,8,1,1


---

## References

 * [Tune: Scalable Hyperparameter Tuning](https://docs.ray.io/en/master/tune/index.html)
 * [Introducing Distributed XGBoost Training with Ray](https://www.anyscale.com/blog/distributed-xgboost-training-with-ray)
 * [How to Speed Up XGBoost Model Training](https://www.anyscale.com/blog/how-to-speed-up-xgboost-model-training)
 * [XGBoost-Ray Project](https://github.com/ray-project/xgboost_ray)
 * [Distributed XGBoost on Ray](Distributed XGBoost on Ray)