## A gentle 10-minute primer on Ray AI Runtime (Ray AIR)

© 2019-2022, Anyscale. All Rights Reserved

📖 [Back to Table of Contents](./ex_00_tutorial_overview.ipynb)<br>
⬅️ [Previous notebook](./ex_07_ray_data.ipynb) <br>

### Overview

As part of Ray 2.0, Ray AI Runtime (AIR) is an open-source, unified toolkit for building simple, scalable, end-to-end ML applications.

Ray AI Runtime focuses on two functional aspects:
 * It provides scalability by leveraging Ray’s distributed compute layer for ML workloads.
 * It is designed to interoperate with other systems for storage and metadata needs.

Ray AIR consists of five key components:

 * Data processing ([Ray Data](https://docs.ray.io/en/latest/data/dataset.html))
 * Model Training ([Ray Train](https://docs.ray.io/en/latest/train/train.html))
 * Hyperparameter Tuning ([Ray Tune](https://docs.ray.io/en/latest/tune/index.html))
 * Model Serving ([Ray Serve](https://docs.ray.io/en/latest/serve/index.html))
 * Reinforcement Learning ([Ray RLlib](https://docs.ray.io/en/latest/rllib/index.html))
 
 <img src = "images/ray-air.svg" width="60%" height="30%">
 
 
### Learning objectives:
  * Use Ray AIR as a unified toolkit to write an end-to-end ML application in a single Python script
  * Use out-of-the-box Preprocessors
  * Load the model from the best checkpoint and use it for batch inference
  * Deploy the model from the best checkpoint and use it for online inference

In [1]:
import logging, os, random, warnings
import ray
import pandas as pd

In [2]:
warnings.filterwarnings("ignore")
os.environ["PYTHONWARNINGS"] = "ignore"

In [3]:
if ray.is_initialized():
    ray.shutdown()
ray.init(logging_level=logging.ERROR)

Python version: 3.8.13
Ray version: 2.0.0rc1
Dashboard: http://127.0.0.1:8265


### End-to-end ML stages for a Ray AIR ML application

<img src="images/ray_air_pipeline.png" width="50%" height="25%">

### 1. Create Ray data from an S3 CSV datasource

In [4]:
dataset = ray.data.read_csv("s3://anonymous@air-example-data/breast_cancer.csv")

# Split data into train and validation.
train_dataset, valid_dataset = dataset.train_test_split(test_size=0.3)
test_dataset = valid_dataset.drop_columns(["target"])

Map_Batches: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 1/1 [00:00<00:00, 16.24it/s]
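To make the split above concrete, here is a minimal pure-Python sketch of the same idea: hold out the last 30% of rows for validation, then drop the label column to build a test set. The rows and the helper `split_rows` are hypothetical, and the sketch assumes an unshuffled split; it is not how Ray Data implements `train_test_split`.

```python
# Hypothetical rows standing in for the CSV records (values are made up).
rows = [
    {"mean radius": 10.0 + i, "mean texture": 15.0 + i, "target": i % 2}
    for i in range(10)
]


def split_rows(rows, test_size):
    """Return (train, valid), keeping the last `test_size` fraction for validation."""
    n_valid = round(len(rows) * test_size)
    split_at = len(rows) - n_valid
    return rows[:split_at], rows[split_at:]


train_rows, valid_rows = split_rows(rows, test_size=0.3)

# Drop the label column, mirroring `drop_columns(["target"])`.
test_rows = [{k: v for k, v in row.items() if k != "target"} for row in valid_rows]

print(len(train_rows), len(valid_rows), sorted(test_rows[0]))
```

The test set keeps the same rows as the validation set, only without the label, which is why it can be used later for inference.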


### 2. Use out-of-the-box Preprocessors
This preprocessor is automatically used in the training function to `fit` and `transform` your datasets for training and validation. You don't have to call the preprocessor explicitly before training or inference; the Ray AIR toolkit does that for you.

We are going to scale a few features, such as `mean radius` and `mean texture`.

In [5]:
from ray.data.preprocessors import StandardScaler

# Create a preprocessor to scale some columns
columns_to_scale = ["mean radius", "mean texture"]
preprocessor = StandardScaler(columns=columns_to_scale)
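As a rough illustration of what standard scaling does to a column, the sketch below shifts values to zero mean and rescales them to unit standard deviation. The helper `standard_scale` and the sample values are hypothetical, and the use of the population standard deviation is an assumption; the library's exact convention may differ.

```python
from statistics import fmean, pstdev


def standard_scale(values):
    """Scale values to zero mean and unit (population) standard deviation."""
    mean = fmean(values)
    std = pstdev(values)  # assumption: population standard deviation
    if std == 0:
        # A constant column carries no spread; map it to zeros.
        return [0.0 for _ in values]
    return [(v - mean) / std for v in values]


# Hypothetical "mean radius" values, not taken from the dataset.
mean_radius = [10.0, 12.0, 14.0, 16.0, 18.0]
scaled = standard_scale(mean_radius)
print(scaled)
```

The `fit` step corresponds to computing `mean` and `std` on the training data; the `transform` step applies them to any dataset, including the validation and test data.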

### 3a. Use AIR Trainers for supported ML frameworks
Use the Ray AIR trainer `XGBoostTrainer` with simple steps:
 1. define the parallelism for Ray compute
 2. define the XGBoost parameters for training
 3. supply the preprocessor for fitting and transforming dataset during training and validation
 4. provide the datasets for training and validation
 5. invoke `trainer.fit()` 
 
 Simple API that does a lot behind the scenes for you!

In [6]:
from ray.air.config import ScalingConfig
from ray.train.xgboost import XGBoostTrainer

trainer = XGBoostTrainer(
    scaling_config=ScalingConfig(
        # Number of workers to use for data parallelism.
        num_workers=2,
        # Whether to use GPU acceleration.
        use_gpu=False),
    label_column="target",
    num_boost_round=20,
    params={
        # XGBoost specific params
        "objective": "binary:logistic",
        "eval_metric": ["logloss", "error"],
    },
    # our train and validation dataset and preprocessor
    datasets={"train": train_dataset, "valid": valid_dataset},
    preprocessor=preprocessor,
)
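The two entries in `eval_metric` are what show up later as `train-logloss`, `train-error`, `valid-logloss`, and `valid-error`. As a sketch of what they measure, the functions below compute binary log loss and the error rate at a 0.5 threshold. The function names and sample values are hypothetical; this is an illustration of the metrics, not XGBoost's implementation.

```python
import math


def binary_logloss(labels, probs, eps=1e-15):
    """Mean negative log-likelihood of the true labels under the predicted probabilities."""
    total = 0.0
    for y, p in zip(labels, probs):
        p = min(max(p, eps), 1 - eps)  # clip to avoid log(0)
        total += -(y * math.log(p) + (1 - y) * math.log(1 - p))
    return total / len(labels)


def error_rate(labels, probs, threshold=0.5):
    """Fraction of rows whose thresholded prediction disagrees with the label."""
    wrong = sum(1 for y, p in zip(labels, probs) if (p > threshold) != bool(y))
    return wrong / len(labels)


# Hypothetical labels and predicted probabilities.
labels = [1, 0, 1, 1]
probs = [0.9, 0.2, 0.4, 0.8]
print(binary_logloss(labels, probs), error_rate(labels, probs))
```

Log loss penalizes confident wrong predictions heavily, while the error rate only counts misclassifications, so the two can move differently across boosting rounds.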

##### Fit the trainer

In [7]:
result = trainer.fit()
# print(result.metrics)

| Trial name | status | loc | iter | total time (s) | train-logloss | train-error | valid-logloss |
|---|---|---|---|---|---|---|---|
| XGBoostTrainer_0fee7_00000 | TERMINATED | 127.0.0.1:10310 | 21 | 4.73005 | 0.0184957 | 0 | 0.0893879 |


(_RemoteRayXGBoostActor pid=10324) [08:29:37] task [xgboost.ray]:5646204256 got new rank 0
(_RemoteRayXGBoostActor pid=10325) [08:29:37] task [xgboost.ray]:4942888432 got new rank 1


Result for XGBoostTrainer_0fee7_00000:
  date: 2022-08-15_08-29-38
  done: false
  experiment_id: 868e1dcb2c944665a36be725ef74103e
  hostname: Juless-MacBook-Pro-16
  iterations_since_restore: 1
  node_ip: 127.0.0.1
  pid: 10310
  time_since_restore: 4.098128080368042
  time_this_iter_s: 4.098128080368042
  time_total_s: 4.098128080368042
  timestamp: 1660577378
  timesteps_since_restore: 0
  train-error: 0.02261306532663317
  train-logloss: 0.464117960489575
  training_iteration: 1
  trial_id: 0fee7_00000
  valid-error: 0.11695906432748537
  valid-logloss: 0.5025240946234318
  warmup_time: 0.002640247344970703
  
Result for XGBoostTrainer_0fee7_00000:
  date: 2022-08-15_08-29-39
  done: true
  experiment_id: 868e1dcb2c944665a36be725ef74103e
  experiment_tag: '0'
  hostname: Juless-MacBook-Pro-16
  iterations_since_restore: 21
  node_ip: 127.0.0.1
  pid: 10310
  time_since_restore: 4.730054140090942
  time_this_iter_s: 0.44510507583618164
  time_total_s: 4.730054140090942
  timestamp: 

### 3b. Use AIR Tuner for hyperparameter search

What if you want to run hyperparameter optimization during training and use the best configuration for the model? You can use a `Tuner`, passing it your `Trainer` as an argument along with the other Tuner configuration.

Again, simple steps:
 1. define your hyperparameter space
 2. define `TuneConfig` for number of trials and parallelism 
 3. invoke `tuner.fit()`

In [8]:
from ray import tune

param_space = {"params": {"max_depth": tune.randint(1, 9)}}
metric = "train-logloss"
our_mode="min"
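Conceptually, a search space like the one above is sampled once per trial. The sketch below mimics that with the standard library: it draws `max_depth` from the integers 1 through 8 (treating the upper bound as exclusive) for five trials, scores each with a made-up objective, and keeps the minimum. The names `sample_config` and `toy_objective` are hypothetical, and the objective is a stand-in for an actual training run.

```python
import random


def sample_config(rng):
    """Draw one configuration; the upper bound 9 is exclusive."""
    return {"params": {"max_depth": rng.randrange(1, 9)}}


def toy_objective(config):
    """Hypothetical stand-in for a training run that returns a loss."""
    depth = config["params"]["max_depth"]
    return 1.0 / depth


rng = random.Random(0)  # fixed seed so the sketch is repeatable
trials = [sample_config(rng) for _ in range(5)]
scores = [toy_objective(config) for config in trials]

# mode="min": keep the configuration with the smallest score.
best = trials[scores.index(min(scores))]
print(best)
```

In the real workflow each trial is a full distributed training run, and the Tuner schedules the trials in parallel where resources allow.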

In [9]:
from ray.tune.tuner import Tuner, TuneConfig
from ray.air.config import RunConfig

tuner = Tuner(
    trainer,
    param_space=param_space,
    tune_config=TuneConfig(num_samples=5, metric=metric, mode=our_mode),
)
# Execute tuning.
result_grid = tuner.fit()

| Trial name | status | loc | params/max_depth | iter | total time (s) | train-logloss | train-error | valid-logloss |
|---|---|---|---|---|---|---|---|---|
| XGBoostTrainer_1ab5c_00000 | TERMINATED | 127.0.0.1:10397 | 5 | 21 | 3.86268 | 0.0184163 | 0.0 | 0.105782 |
| XGBoostTrainer_1ab5c_00001 | TERMINATED | 127.0.0.1:10404 | 1 | 21 | 4.93523 | 0.0955215 | 0.0175879 | 0.112007 |
| XGBoostTrainer_1ab5c_00002 | TERMINATED | 127.0.0.1:10405 | 5 | 21 | 4.95963 | 0.0184163 | 0.0 | 0.105782 |
| XGBoostTrainer_1ab5c_00003 | TERMINATED | 127.0.0.1:10478 | 2 | 21 | 4.7909 | 0.0405455 | 0.00502513 | 0.0916641 |
| XGBoostTrainer_1ab5c_00004 | TERMINATED | 127.0.0.1:10489 | 1 | 21 | 4.09457 | 0.0955215 | 0.0175879 | 0.112007 |


(_RemoteRayXGBoostActor pid=10411) [08:29:53] task [xgboost.ray]:5042617888 got new rank 1
(_RemoteRayXGBoostActor pid=10410) [08:29:53] task [xgboost.ray]:4959403600 got new rank 0


Result for XGBoostTrainer_1ab5c_00000:
  date: 2022-08-15_08-29-55
  done: false
  experiment_id: 9647be2f364445fd842768848508e290
  hostname: Juless-MacBook-Pro-16
  iterations_since_restore: 1
  node_ip: 127.0.0.1
  pid: 10397
  time_since_restore: 3.4272069931030273
  time_this_iter_s: 3.4272069931030273
  time_total_s: 3.4272069931030273
  timestamp: 1660577395
  timesteps_since_restore: 0
  train-error: 0.02261306532663317
  train-logloss: 0.465611254524945
  training_iteration: 1
  trial_id: 1ab5c_00000
  valid-error: 0.0935672514619883
  valid-logloss: 0.5058815336366843
  warmup_time: 0.002613067626953125
  


(_RemoteRayXGBoostActor pid=10426) [08:29:56] task [xgboost.ray]:5014191648 got new rank 0
(_RemoteRayXGBoostActor pid=10427) [08:29:56] task [xgboost.ray]:5126897328 got new rank 1
(_RemoteRayXGBoostActor pid=10436) [08:29:56] task [xgboost.ray]:4831145696 got new rank 0
(_RemoteRayXGBoostActor pid=10437) [08:29:56] task [xgboost.ray]:5008211392 got new rank 1


Result for XGBoostTrainer_1ab5c_00000:
  date: 2022-08-15_08-29-56
  done: true
  experiment_id: 9647be2f364445fd842768848508e290
  experiment_tag: 0_max_depth=5
  hostname: Juless-MacBook-Pro-16
  iterations_since_restore: 21
  node_ip: 127.0.0.1
  pid: 10397
  time_since_restore: 3.8626770973205566
  time_this_iter_s: 0.37877702713012695
  time_total_s: 3.8626770973205566
  timestamp: 1660577396
  timesteps_since_restore: 0
  train-error: 0.0
  train-logloss: 0.01841634292981527
  training_iteration: 21
  trial_id: 1ab5c_00000
  valid-error: 0.05263157894736842
  valid-logloss: 0.10578184703239703
  warmup_time: 0.002613067626953125
  
Result for XGBoostTrainer_1ab5c_00001:
  date: 2022-08-15_08-29-57
  done: false
  experiment_id: af2616905a9f46ff800c7ddf3a92af48
  hostname: Juless-MacBook-Pro-16
  iterations_since_restore: 1
  node_ip: 127.0.0.1
  pid: 10404
  time_since_restore: 3.972064971923828
  time_this_iter_s: 3.972064971923828
  time_total_s: 3.972064971923828
  timestamp: 

(_RemoteRayXGBoostActor pid=10496) [08:30:00] task [xgboost.ray]:4992777808 got new rank 0
(_RemoteRayXGBoostActor pid=10497) [08:30:00] task [xgboost.ray]:4879056368 got new rank 1


Result for XGBoostTrainer_1ab5c_00003:
  date: 2022-08-15_08-30-02
  done: false
  experiment_id: 745cf592f8954b32843a29e2b68703b9
  hostname: Juless-MacBook-Pro-16
  iterations_since_restore: 1
  node_ip: 127.0.0.1
  pid: 10478
  time_since_restore: 4.089522838592529
  time_this_iter_s: 4.089522838592529
  time_total_s: 4.089522838592529
  timestamp: 1660577402
  timesteps_since_restore: 0
  train-error: 0.04773869346733668
  train-logloss: 0.4862994935794092
  training_iteration: 1
  trial_id: 1ab5c_00003
  valid-error: 0.09941520467836257
  valid-logloss: 0.5120853461020174
  warmup_time: 0.002424001693725586
  


(_RemoteRayXGBoostActor pid=10516) [08:30:02] task [xgboost.ray]:5790416224 got new rank 0
(_RemoteRayXGBoostActor pid=10517) [08:30:02] task [xgboost.ray]:4939218464 got new rank 1


Result for XGBoostTrainer_1ab5c_00003:
  date: 2022-08-15_08-30-03
  done: true
  experiment_id: 745cf592f8954b32843a29e2b68703b9
  experiment_tag: 3_max_depth=2
  hostname: Juless-MacBook-Pro-16
  iterations_since_restore: 21
  node_ip: 127.0.0.1
  pid: 10478
  time_since_restore: 4.790897846221924
  time_this_iter_s: 0.6244070529937744
  time_total_s: 4.790897846221924
  timestamp: 1660577403
  timesteps_since_restore: 0
  train-error: 0.0050251256281407
  train-logloss: 0.04054545671047278
  training_iteration: 21
  trial_id: 1ab5c_00003
  valid-error: 0.02923976608187134
  valid-logloss: 0.09166410522894901
  warmup_time: 0.002424001693725586
  
Result for XGBoostTrainer_1ab5c_00004:
  date: 2022-08-15_08-30-04
  done: false
  experiment_id: 48ce752284124b9c868fdbdb0fd997a1
  hostname: Juless-MacBook-Pro-16
  iterations_since_restore: 1
  node_ip: 127.0.0.1
  pid: 10489
  time_since_restore: 3.5095250606536865
  time_this_iter_s: 3.5095250606536865
  time_total_s: 3.509525060653686

In [10]:
# Fetch the best result with its best hyperparameter config 
best_result = result_grid.get_best_result()
print("Best Result:", best_result)

Best Result: Result(metrics={'train-logloss': 0.01841634292981527, 'train-error': 0.0, 'valid-logloss': 0.10578184703239703, 'valid-error': 0.05263157894736842, 'done': True, 'trial_id': '1ab5c_00000', 'experiment_tag': '0_max_depth=5'}, error=None, log_dir=PosixPath('/Users/jules/ray_results/XGBoostTrainer_2022-08-15_08-29-51/XGBoostTrainer_1ab5c_00000_0_max_depth=5_2022-08-15_08-29-51'))
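The selection itself is simple to reason about. The sketch below replays it over the `train-logloss` values reported in the trial table above, using the `metric` and `mode` set earlier. The helper `pick_best` is hypothetical and only illustrates the idea of choosing by metric and mode; it is not the `ResultGrid` implementation.

```python
# Final train-logloss per trial, copied from the trial table above.
trials = [
    {"trial_id": "1ab5c_00000", "max_depth": 5, "train-logloss": 0.0184163},
    {"trial_id": "1ab5c_00001", "max_depth": 1, "train-logloss": 0.0955215},
    {"trial_id": "1ab5c_00002", "max_depth": 5, "train-logloss": 0.0184163},
    {"trial_id": "1ab5c_00003", "max_depth": 2, "train-logloss": 0.0405455},
    {"trial_id": "1ab5c_00004", "max_depth": 1, "train-logloss": 0.0955215},
]


def pick_best(trials, metric, mode):
    """Return the trial with the smallest (mode='min') or largest (mode='max') metric."""
    chooser = min if mode == "min" else max
    return chooser(trials, key=lambda trial: trial[metric])


best = pick_best(trials, metric="train-logloss", mode="min")
print(best["trial_id"], best["max_depth"])  # prints: 1ab5c_00000 5
```

Note that two trials tie on the metric here because both sampled `max_depth=5`; the sketch keeps the first one it encounters.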


### Ray AIR Checkpoints

AIR trainers, tuners, and custom pretrained models generate Checkpoints. An AIR Checkpoint is a common format for models used across the different components of the Ray AI Runtime. This common format allows easy interoperability among AIR components and seamless integration with supported external machine learning frameworks. Read more
about [Checkpoints]().

<img src="images/checkpoints.jpeg" height="25%" width="50%"> 

### 4. Use AIR `BatchPredictor` for batch prediction
Once you have trained and tuned your model, create a batch predictor from the best model using `best_result.checkpoint` and run batch inference.

In [11]:
from ray.train.batch_predictor import BatchPredictor
from ray.train.xgboost import XGBoostPredictor

batch_predictor = BatchPredictor.from_checkpoint(best_result.checkpoint, XGBoostPredictor)

predicted_probabilities = batch_predictor.predict(test_dataset)
print("PREDICTED PROBABILITIES")
predicted_probabilities.show()

Map Progress (1 actors 0 pending): 100%|███████████████████████████████████████████████████████████████████████████████████████████| 1/1 [00:00<00:00,  1.33it/s]

PREDICTED PROBABILITIES
{'predictions': 0.9960426092147827}
{'predictions': 0.9957077503204346}
{'predictions': 0.0034389763604849577}
{'predictions': 0.9962536096572876}
{'predictions': 0.9968380928039551}
{'predictions': 0.9957551956176758}
{'predictions': 0.9920042157173157}
{'predictions': 0.994161069393158}
{'predictions': 0.2891101539134979}
{'predictions': 0.974367082118988}
{'predictions': 0.0034389763604849577}
{'predictions': 0.9959942102432251}
{'predictions': 0.9474029541015625}
{'predictions': 0.9923243522644043}
{'predictions': 0.9941523671150208}
{'predictions': 0.1239369809627533}
{'predictions': 0.5043733716011047}
{'predictions': 0.9935414791107178}
{'predictions': 0.9832899570465088}
{'predictions': 0.0034389763604849577}
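With a `binary:logistic` objective, each prediction is a probability for the positive class. If you need hard class labels, one common approach is to threshold the probabilities. The sketch below does this for a few of the values printed above; the 0.5 cutoff and the helper `to_labels` are assumptions for illustration, not part of the notebook's pipeline.

```python
# A few predicted probabilities copied from the output above.
predictions = [
    {"predictions": 0.9960426092147827},
    {"predictions": 0.9957077503204346},
    {"predictions": 0.0034389763604849577},
    {"predictions": 0.9962536096572876},
    {"predictions": 0.2891101539134979},
    {"predictions": 0.5043733716011047},
]


def to_labels(rows, threshold=0.5):
    """Map each probability to 1 if it exceeds the threshold, else 0."""
    return [1 if row["predictions"] > threshold else 0 for row in rows]


labels = to_labels(predictions)
print(labels)  # prints: [1, 1, 0, 1, 0, 1]
```

Values close to the cutoff, such as 0.504, are the ones most sensitive to the choice of threshold.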





### 5. Use `PredictorDeployment` for online inference

Deploy the best model as an inference service by using Ray Serve and the `PredictorDeployment` class.

In [12]:
from ray import serve
from fastapi import Request
from ray.serve import PredictorDeployment
from ray.serve.http_adapters import pandas_read_json

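# NOTE: `result` comes from the single Trainer run in step 3a;
# pass `best_result.checkpoint` instead to serve the tuned model from step 3b.
serve.run(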
    PredictorDeployment.options(name="XGBoostService", num_replicas=2, route_prefix="/rayair").bind(
        XGBoostPredictor, result.checkpoint, http_adapter=pandas_read_json
    )
)

(ServeController pid=10541) INFO 2022-08-15 08:30:16,853 controller 10541 http_state.py:129 - Starting HTTP proxy with name 'SERVE_CONTROLLER_ACTOR:SERVE_PROXY_ACTOR-1a32092f2f3c58d6d028787fa69e0fe7901b1a4644dc21ea62022c71' on node '1a32092f2f3c58d6d028787fa69e0fe7901b1a4644dc21ea62022c71' listening on '127.0.0.1:8000'
(ServeController pid=10541) INFO 2022-08-15 08:30:17,471 controller 10541 deployment_state.py:1232 - Adding 2 replicas to deployment 'XGBoostService'.
(HTTPProxyActor pid=10543) INFO:     Started server process [10543]


RayServeSyncHandle(deployment='XGBoostService')

After deploying the service, you can send requests to it.

In [13]:
import requests

sample_input = test_dataset.take(1)
sample_input = dict(sample_input[0])

output = requests.post("http://localhost:8000/rayair", json=[sample_input]).json()
print(output)

[{'predictions': 0.9964648485183716}]


(HTTPProxyActor pid=10543) INFO 2022-08-15 08:30:31,375 http_proxy 127.0.0.1 http_proxy.py:315 - POST /rayair 307 3.8ms
(ServeReplica:XGBoostService pid=10547) INFO 2022-08-15 08:30:31,374 XGBoostService XGBoostService#iTcplL replica.py:482 - HANDLE __call__ OK 0.4ms
(HTTPProxyActor pid=10543) INFO 2022-08-15 08:30:31,396 http_proxy 127.0.0.1 http_proxy.py:315 - POST /rayair 200 18.4ms
(ServeReplica:XGBoostService pid=10546) INFO 2022-08-15 08:30:31,395 XGBoostService XGBoostService#omrqNS replica.py:482 - HANDLE __call__ OK 15.3ms


In [14]:
ray.shutdown()
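Looking back at the request in step 5, the body sent to the service is a JSON list of records (one dict of feature values per row), and the reply is a JSON list with one prediction per record. The sketch below shows that round trip using only the standard library, without contacting a server. The feature values are hypothetical; the response string is copied from the output in step 5.

```python
import json

# Hypothetical feature values; a real record would carry every feature column.
sample_input = {"mean radius": 13.5, "mean texture": 18.2}

# The request body: a JSON list of records, as in `json=[sample_input]`.
payload = json.dumps([sample_input])
decoded = json.loads(payload)

# The reply from step 5, parsed the same way `.json()` would parse it.
response_text = '[{"predictions": 0.9964648485183716}]'
prediction = json.loads(response_text)[0]["predictions"]

print(payload)
print(prediction)
```

Sending several records in one list is a natural way to request predictions for multiple rows in a single call.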

### Homework

1. Have a go at the Ray AIR examples in the documentation.


Done! 🍻
 