## A gentle 10-minute introduction to Ray AI Runtime (Ray AIR)

As part of Ray 2.0, Ray AI Runtime (AIR) is an open-source toolkit for building end-to-end ML applications. By leveraging Ray, its distributed compute capabilities, and its library ecosystem, Ray AIR brings scalability and programmability to ML platforms.

Ray AIR has two main focuses:
 * Scalability: it leverages Ray’s distributed compute layer for ML workloads.
 * Interoperability: it is designed to work with other systems for storage and metadata needs.

Ray AIR consists of 5 key components:

 * Data processing (Ray Data)
 * Model Training (Ray Train)
 * Reinforcement Learning (Ray RLlib)
 * Hyperparameter Tuning (Ray Tune)
 * Model Serving (Ray Serve)
 
 <img src = "images/ai_runtime.jpeg" width="60%" height="30%">
 
### Learning objectives:
  * Get a quick introductory feel for Ray AIR as a unified toolkit for writing an end-to-end ML application in a single Python script or notebook
  * Get exposure to Ray Data for data ingestion
  * Learn about the familiar out-of-the-box Preprocessors
  * Load a model from a checkpoint and use it for batch inference
  * Use the `PredictorDeployment` class to deploy a model for online inference

In [2]:
import logging, os, random, warnings
from pprint import pprint
import ray
import pandas as pd
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split

from ray.data.preprocessors import StandardScaler

In [3]:
warnings.filterwarnings("ignore")
os.environ["PYTHONWARNINGS"] = "ignore"

In [4]:
# is_initialized is a function; without the call the condition is always true.
if ray.is_initialized():
    ray.shutdown()
context = ray.init(logging_level=logging.ERROR)
pprint(context)

RayContext(dashboard_url='127.0.0.1:8268', python_version='3.8.13', ray_version='3.0.0.dev0', ray_commit='{{RAY_COMMIT_SHA}}', address_info={'node_ip_address': '127.0.0.1', 'raylet_ip_address': '127.0.0.1', 'redis_address': None, 'object_store_address': '/tmp/ray/session_2022-07-18_21-02-48_546719_70340/sockets/plasma_store', 'raylet_socket_name': '/tmp/ray/session_2022-07-18_21-02-48_546719_70340/sockets/raylet', 'webui_url': '127.0.0.1:8268', 'session_dir': '/tmp/ray/session_2022-07-18_21-02-48_546719_70340', 'metrics_export_port': 65530, 'gcs_address': '127.0.0.1:65503', 'address': '127.0.0.1:65503', 'dashboard_agent_listen_port': 52365, 'node_id': '4a225d65b457a9b0566e99c1b6d70ca6e0f6f369fb83c01166d68deb'})
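
One detail worth checking in the cell above: `ray.is_initialized` is a function, so it has to be called. A bare function object is always truthy in Python, so a condition written without the parentheses would be true on every run. The sketch below shows the difference with a stand-in function (`is_initialized` here is a hypothetical local function, not Ray's).

```python
# Stand-in for a library status check; hypothetical, not Ray's implementation.
def is_initialized() -> bool:
    return False

# A function object is truthy regardless of what it would return.
without_call = bool(is_initialized)
# Calling it yields the actual status.
with_call = bool(is_initialized())

print(without_call, with_call)
```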


In [5]:
print(f"Dashboard url: http://{context.address_info['webui_url']}")

Dashboard url: http://127.0.0.1:8268


### Create a Ray Dataset from a CSV file

In [6]:
import ray
import pandas as pd
from ray.air import train_test_split

# Split data into train and validation.
dataset = ray.data.read_csv("s3://anonymous@air-example-data/breast_cancer.csv")
train_dataset, valid_dataset = train_test_split(dataset, test_size=0.3)
test_dataset = valid_dataset.drop_columns(["target"])

Map_Batches: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████| 1/1 [00:00<00:00, 74.72it/s]
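
As a rough illustration of what `test_size=0.3` does to row counts, here is a plain-Python sketch. It assumes the dataset has 569 rows (the size of the scikit-learn breast cancer dataset, which is not stated above) and that the split point is the floor of `n * (1 - test_size)`. The helper `simple_split` is hypothetical and does not reproduce Ray's implementation or any shuffling it may perform.

```python
def simple_split(rows, test_size):
    """Split a list into (train, valid) at floor(n * (1 - test_size))."""
    cut = int(len(rows) * (1 - test_size))
    return rows[:cut], rows[cut:]

rows = list(range(569))  # assumed row count, see the note above
train, valid = simple_split(rows, test_size=0.3)
print(len(train), len(valid))
```

Under these assumptions the training split holds 398 rows, which agrees with the `N=398` reported by the XGBoost training logs later in this notebook.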


### Create Preprocessors

In [7]:
# Create a preprocessor to scale some columns
from ray.data.preprocessors import StandardScaler

columns_to_scale = ["mean radius", "mean texture"]
preprocessor = StandardScaler(columns=columns_to_scale)
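
To make the idea of standard scaling concrete, the following is a minimal plain-Python sketch of the transformation: subtract the column mean and divide by the column standard deviation. It uses the population standard deviation, which is an assumption about the convention; the function `standard_scale` and the sample values are hypothetical and are not Ray's implementation.

```python
import statistics

def standard_scale(values):
    """Return (x - mean) / std for each x, using the population std."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)
    return [(x - mean) / std for x in values]

# Hypothetical "mean radius" values, for illustration only.
mean_radius = [10.0, 12.0, 14.0, 16.0, 18.0]
scaled = standard_scale(mean_radius)
print(scaled)
```

After scaling, the column has mean 0 and unit standard deviation, which puts features measured on different scales on a comparable footing.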

### Create Trainers

In [8]:
from ray.train.xgboost import XGBoostTrainer

trainer = XGBoostTrainer(
    scaling_config={
        # Number of workers to use for data parallelism.
        "num_workers": 2,
        # Whether to use GPU acceleration.
        "use_gpu": False,
    },
    label_column="target",
    num_boost_round=20,
    params={
        # XGBoost specific params
        "objective": "binary:logistic",
        "eval_metric": ["logloss", "error"],
    },
    datasets={"train": train_dataset, "valid": valid_dataset},
    preprocessor=preprocessor,
)
result = trainer.fit()
print(result.metrics)

| Trial name | status | loc | iter | total time (s) | train-logloss | train-error | valid-logloss |
|---|---|---|---|---|---|---|---|
| XGBoostTrainer_de37e_00000 | TERMINATED | 127.0.0.1:70815 | 20 | 4.78056 | 0.0184957 | 0 | 0.0893879 |


[2m[36m(XGBoostTrainer pid=70815)[0m 2022-07-18 21:04:24,434	INFO main.py:980 -- [RayXGBoost] Created 2 new actors (2 total actors). Waiting until actors are ready for training.
[2m[36m(_RemoteRayXGBoostActor pid=70829)[0m   File "/Users/jules/git-repos/ray/python/ray/_private/workers/default_worker.py", line 237, in <module>
[2m[36m(_RemoteRayXGBoostActor pid=70829)[0m     ray._private.worker.global_worker.main_loop()
[2m[36m(_RemoteRayXGBoostActor pid=70829)[0m   File "/Users/jules/git-repos/ray/python/ray/_private/worker.py", line 754, in main_loop
[2m[36m(_RemoteRayXGBoostActor pid=70829)[0m     self.core_worker.run_task_loop()
[2m[36m(_RemoteRayXGBoostActor pid=70829)[0m   File "/Users/jules/git-repos/ray/python/ray/_private/function_manager.py", line 674, in actor_method_executor
[2m[36m(_RemoteRayXGBoostActor pid=70829)[0m     return method(__ray_actor, *args, **kwargs)
[2m[36m(_RemoteRayXGBoostActor pid=70829)[0m   File "/Users/jules/git-repos/ray/python

Result for XGBoostTrainer_de37e_00000:
  date: 2022-07-18_21-04-27
  done: false
  experiment_id: fa960636b34145a689be5de2568fa53a
  hostname: Juless-MacBook-Pro-16
  iterations_since_restore: 1
  node_ip: 127.0.0.1
  pid: 70815
  should_checkpoint: true
  time_since_restore: 4.643670082092285
  time_this_iter_s: 4.643670082092285
  time_total_s: 4.643670082092285
  timestamp: 1658203467
  timesteps_since_restore: 0
  train-error: 0.02261306532663317
  train-logloss: 0.464117960489575
  training_iteration: 1
  trial_id: de37e_00000
  valid-error: 0.11695906432748537
  valid-logloss: 0.5025240946234318
  warmup_time: 0.0028710365295410156
  


[2m[36m(XGBoostTrainer pid=70815)[0m 2022-07-18 21:04:27,576	INFO main.py:1516 -- [RayXGBoost] Finished XGBoost training on training data with total N=398 in 3.15 seconds (1.91 pure XGBoost training time).


Result for XGBoostTrainer_de37e_00000:
  date: 2022-07-18_21-04-27
  done: true
  experiment_id: fa960636b34145a689be5de2568fa53a
  experiment_tag: '0'
  hostname: Juless-MacBook-Pro-16
  iterations_since_restore: 20
  node_ip: 127.0.0.1
  pid: 70815
  should_checkpoint: true
  time_since_restore: 4.780561208724976
  time_this_iter_s: 0.006635904312133789
  time_total_s: 4.780561208724976
  timestamp: 1658203467
  timesteps_since_restore: 0
  train-error: 0.0
  train-logloss: 0.01849572773292735
  training_iteration: 20
  trial_id: de37e_00000
  valid-error: 0.04093567251461988
  valid-logloss: 0.08938791319913073
  warmup_time: 0.0028710365295410156
  
{'train-logloss': 0.01849572773292735, 'train-error': 0.0, 'valid-logloss': 0.08938791319913073, 'valid-error': 0.04093567251461988, 'time_this_iter_s': 0.006635904312133789, 'should_checkpoint': True, 'done': True, 'timesteps_total': None, 'episodes_total': None, 'training_iteration': 20, 'trial_id': 'de37e_00000', 'experiment_id': 'fa
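
The `logloss` and `error` values reported above are the two metrics requested through `eval_metric`. As a hedged sketch of what they measure, the plain-Python functions below compute binary log loss (mean negative log-likelihood) and the classification error rate at a 0.5 threshold. The function names and sample data are hypothetical; XGBoost computes these internally.

```python
import math

def binary_logloss(labels, probs):
    """Mean negative log-likelihood of the true labels."""
    total = 0.0
    for y, p in zip(labels, probs):
        total += -math.log(p) if y == 1 else -math.log(1.0 - p)
    return total / len(labels)

def error_rate(labels, probs, threshold=0.5):
    """Fraction of rows whose thresholded prediction differs from the label."""
    wrong = sum(1 for y, p in zip(labels, probs) if (p > threshold) != (y == 1))
    return wrong / len(labels)

# Hypothetical labels and predicted probabilities.
labels = [1, 0, 1, 1]
probs = [0.9, 0.2, 0.4, 0.8]
print(binary_logloss(labels, probs), error_rate(labels, probs))
```

Log loss penalizes confident wrong predictions heavily, while the error rate only counts mistakes, which is why the two can move differently between iterations.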

### Create Tuner for hyperparameter search

In [9]:
from ray import tune

param_space = {"params": {"max_depth": tune.randint(1, 9)}}
metric = "train-logloss"
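
For intuition about the search space, this plain-Python sketch draws a few candidate configurations the way a random search would. It assumes that the upper bound of `tune.randint(1, 9)` is exclusive, so depths range from 1 to 8; the seed and the use of the standard `random` module are illustrative choices, not how Ray Tune samples internally.

```python
import random

rng = random.Random(0)  # fixed seed so the sketch is repeatable

# Draw five candidate depths from the integers 1..8 (upper bound 9 excluded).
candidates = [{"params": {"max_depth": rng.randrange(1, 9)}} for _ in range(5)]
for config in candidates:
    print(config)
```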

In [10]:
from ray.tune.tuner import Tuner, TuneConfig
from ray.air.config import RunConfig

tuner = Tuner(
    trainer,
    param_space=param_space,
    tune_config=TuneConfig(num_samples=5, metric=metric, mode="min"),
)
# Execute tuning.
result_grid = tuner.fit()

# Fetch the best result.
best_result = result_grid.get_best_result()
print("Best Result:", best_result)

| Trial name | status | loc | params/max_depth | iter | total time (s) | train-logloss | train-error | valid-logloss |
|---|---|---|---|---|---|---|---|---|
| XGBoostTrainer_ea7f4_00000 | TERMINATED | 127.0.0.1:70856 | 3 | 20 | 2.70495 | 0.0215151 | 0.0 | 0.0765915 |
| XGBoostTrainer_ea7f4_00001 | TERMINATED | 127.0.0.1:70865 | 5 | 20 | 3.58326 | 0.0184163 | 0.0 | 0.105782 |
| XGBoostTrainer_ea7f4_00002 | TERMINATED | 127.0.0.1:70866 | 3 | 20 | 3.5276 | 0.0215151 | 0.0 | 0.0765915 |
| XGBoostTrainer_ea7f4_00003 | TERMINATED | 127.0.0.1:70942 | 1 | 20 | 5.27101 | 0.0955215 | 0.0175879 | 0.112007 |
| XGBoostTrainer_ea7f4_00004 | TERMINATED | 127.0.0.1:70953 | 6 | 20 | 4.48254 | 0.0184957 | 0.0 | 0.0893879 |


[2m[36m(XGBoostTrainer pid=70856)[0m 2022-07-18 21:04:43,152	INFO main.py:980 -- [RayXGBoost] Created 2 new actors (2 total actors). Waiting until actors are ready for training.
[2m[36m(_RemoteRayXGBoostActor pid=70871)[0m   File "/Users/jules/git-repos/ray/python/ray/_private/workers/default_worker.py", line 237, in <module>
[2m[36m(_RemoteRayXGBoostActor pid=70871)[0m     ray._private.worker.global_worker.main_loop()
[2m[36m(_RemoteRayXGBoostActor pid=70871)[0m   File "/Users/jules/git-repos/ray/python/ray/_private/worker.py", line 754, in main_loop
[2m[36m(_RemoteRayXGBoostActor pid=70871)[0m     self.core_worker.run_task_loop()
[2m[36m(_RemoteRayXGBoostActor pid=70871)[0m   File "/Users/jules/git-repos/ray/python/ray/_private/function_manager.py", line 674, in actor_method_executor
[2m[36m(_RemoteRayXGBoostActor pid=70871)[0m     return method(__ray_actor, *args, **kwargs)
[2m[36m(_RemoteRayXGBoostActor pid=70871)[0m   File "/Users/jules/git-repos/ray/python

Result for XGBoostTrainer_ea7f4_00000:
  date: 2022-07-18_21-04-45
  done: false
  experiment_id: db2f8c822d7441789c3d9b68ec2b0d2f
  hostname: Juless-MacBook-Pro-16
  iterations_since_restore: 1
  node_ip: 127.0.0.1
  pid: 70856
  should_checkpoint: true
  time_since_restore: 2.301074981689453
  time_this_iter_s: 2.301074981689453
  time_total_s: 2.301074981689453
  timestamp: 1658203485
  timesteps_since_restore: 0
  train-error: 0.03517587939698492
  train-logloss: 0.47431553248784053
  training_iteration: 1
  trial_id: ea7f4_00000
  valid-error: 0.09941520467836257
  valid-logloss: 0.5004687657830311
  warmup_time: 0.0028541088104248047
  


[2m[36m(_RemoteRayXGBoostActor pid=70910)[0m   File "/Users/jules/git-repos/ray/python/ray/_private/workers/default_worker.py", line 237, in <module>
[2m[36m(_RemoteRayXGBoostActor pid=70910)[0m     ray._private.worker.global_worker.main_loop()
[2m[36m(_RemoteRayXGBoostActor pid=70910)[0m   File "/Users/jules/git-repos/ray/python/ray/_private/worker.py", line 754, in main_loop
[2m[36m(_RemoteRayXGBoostActor pid=70910)[0m     self.core_worker.run_task_loop()
[2m[36m(_RemoteRayXGBoostActor pid=70910)[0m   File "/Users/jules/git-repos/ray/python/ray/_private/function_manager.py", line 674, in actor_method_executor
[2m[36m(_RemoteRayXGBoostActor pid=70910)[0m     return method(__ray_actor, *args, **kwargs)
[2m[36m(_RemoteRayXGBoostActor pid=70910)[0m   File "/Users/jules/git-repos/ray/python/ray/util/tracing/tracing_helper.py", line 466, in _resume_span
[2m[36m(_RemoteRayXGBoostActor pid=70910)[0m     return method(self, *_args, **_kwargs)
[2m[36m(_RemoteRayXGBoos

Result for XGBoostTrainer_ea7f4_00000:
  date: 2022-07-18_21-04-45
  done: true
  experiment_id: db2f8c822d7441789c3d9b68ec2b0d2f
  experiment_tag: 0_max_depth=3
  hostname: Juless-MacBook-Pro-16
  iterations_since_restore: 20
  node_ip: 127.0.0.1
  pid: 70856
  should_checkpoint: true
  time_since_restore: 2.7049450874328613
  time_this_iter_s: 0.008007049560546875
  time_total_s: 2.7049450874328613
  timestamp: 1658203485
  timesteps_since_restore: 0
  train-error: 0.0
  train-logloss: 0.02151511543566108
  training_iteration: 20
  trial_id: ea7f4_00000
  valid-error: 0.03508771929824561
  valid-logloss: 0.07659151291540056
  warmup_time: 0.0028541088104248047
  
Result for XGBoostTrainer_ea7f4_00001:
  date: 2022-07-18_21-04-47
  done: false
  experiment_id: 5f8025068bcb4ed4ac66cbc01b1c4265
  hostname: Juless-MacBook-Pro-16
  iterations_since_restore: 1
  node_ip: 127.0.0.1
  pid: 70865
  should_checkpoint: true
  time_since_restore: 3.318230152130127
  time_this_iter_s: 3.318230152

[2m[36m(XGBoostTrainer pid=70865)[0m 2022-07-18 21:04:47,906	INFO main.py:1516 -- [RayXGBoost] Finished XGBoost training on training data with total N=398 in 3.52 seconds (1.29 pure XGBoost training time).
[2m[36m(XGBoostTrainer pid=70866)[0m 2022-07-18 21:04:47,851	INFO main.py:1516 -- [RayXGBoost] Finished XGBoost training on training data with total N=398 in 3.46 seconds (1.24 pure XGBoost training time).


Result for XGBoostTrainer_ea7f4_00001:
  date: 2022-07-18_21-04-47
  done: true
  experiment_id: 5f8025068bcb4ed4ac66cbc01b1c4265
  experiment_tag: 1_max_depth=5
  hostname: Juless-MacBook-Pro-16
  iterations_since_restore: 20
  node_ip: 127.0.0.1
  pid: 70865
  should_checkpoint: true
  time_since_restore: 3.583259105682373
  time_this_iter_s: 0.06814098358154297
  time_total_s: 3.583259105682373
  timestamp: 1658203487
  timesteps_since_restore: 0
  train-error: 0.0
  train-logloss: 0.01841634292981527
  training_iteration: 20
  trial_id: ea7f4_00001
  valid-error: 0.05263157894736842
  valid-logloss: 0.10578184703239703
  warmup_time: 0.0031299591064453125
  
Result for XGBoostTrainer_ea7f4_00002:
  date: 2022-07-18_21-04-47
  done: true
  experiment_id: 45c11517546e48fba5953b3460d0119a
  experiment_tag: 2_max_depth=3
  hostname: Juless-MacBook-Pro-16
  iterations_since_restore: 20
  node_ip: 127.0.0.1
  pid: 70866
  should_checkpoint: true
  time_since_restore: 3.527599811553955
  

[2m[36m(XGBoostTrainer pid=70942)[0m 2022-07-18 21:04:50,605	INFO main.py:980 -- [RayXGBoost] Created 2 new actors (2 total actors). Waiting until actors are ready for training.
[2m[36m(XGBoostTrainer pid=70953)[0m 2022-07-18 21:04:50,758	INFO main.py:980 -- [RayXGBoost] Created 2 new actors (2 total actors). Waiting until actors are ready for training.
[2m[36m(_RemoteRayXGBoostActor pid=70976)[0m   File "/Users/jules/git-repos/ray/python/ray/_private/workers/default_worker.py", line 237, in <module>
[2m[36m(_RemoteRayXGBoostActor pid=70976)[0m     ray._private.worker.global_worker.main_loop()
[2m[36m(_RemoteRayXGBoostActor pid=70976)[0m   File "/Users/jules/git-repos/ray/python/ray/_private/worker.py", line 754, in main_loop
[2m[36m(_RemoteRayXGBoostActor pid=70976)[0m     self.core_worker.run_task_loop()
[2m[36m(_RemoteRayXGBoostActor pid=70976)[0m   File "/Users/jules/git-repos/ray/python/ray/_private/function_manager.py", line 674, in actor_method_executor
[2m

Result for XGBoostTrainer_ea7f4_00003:
  date: 2022-07-18_21-04-53
  done: false
  experiment_id: 19226b5824c84cd39c0ee7b613d47917
  hostname: Juless-MacBook-Pro-16
  iterations_since_restore: 1
  node_ip: 127.0.0.1
  pid: 70942
  should_checkpoint: true
  time_since_restore: 4.885594844818115
  time_this_iter_s: 4.885594844818115
  time_total_s: 4.885594844818115
  timestamp: 1658203493
  timesteps_since_restore: 0
  train-error: 0.07537688442211055
  train-logloss: 0.5118698591562971
  training_iteration: 1
  trial_id: ea7f4_00003
  valid-error: 0.0935672514619883
  valid-logloss: 0.5195214661241275
  warmup_time: 0.0029630661010742188
  
Result for XGBoostTrainer_ea7f4_00004:
  date: 2022-07-18_21-04-53
  done: false
  experiment_id: 3026cf2a23684e0183bbced308281d66
  hostname: Juless-MacBook-Pro-16
  iterations_since_restore: 1
  node_ip: 127.0.0.1
  pid: 70953
  should_checkpoint: true
  time_since_restore: 4.043035984039307
  time_this_iter_s: 4.043035984039307
  time_total_s: 4.

[2m[36m(XGBoostTrainer pid=70942)[0m 2022-07-18 21:04:54,161	INFO main.py:1516 -- [RayXGBoost] Finished XGBoost training on training data with total N=398 in 3.57 seconds (1.41 pure XGBoost training time).


Result for XGBoostTrainer_ea7f4_00003:
  date: 2022-07-18_21-04-54
  done: true
  experiment_id: 19226b5824c84cd39c0ee7b613d47917
  experiment_tag: 3_max_depth=1
  hostname: Juless-MacBook-Pro-16
  iterations_since_restore: 20
  node_ip: 127.0.0.1
  pid: 70942
  should_checkpoint: true
  time_since_restore: 5.2710089683532715
  time_this_iter_s: 0.007917165756225586
  time_total_s: 5.2710089683532715
  timestamp: 1658203494
  timesteps_since_restore: 0
  train-error: 0.01758793969849246
  train-logloss: 0.09552153718105551
  training_iteration: 20
  trial_id: ea7f4_00003
  valid-error: 0.02923976608187134
  valid-logloss: 0.11200698223564098
  warmup_time: 0.0029630661010742188
  
Result for XGBoostTrainer_ea7f4_00004:
  date: 2022-07-18_21-04-54
  done: true
  experiment_id: 3026cf2a23684e0183bbced308281d66
  experiment_tag: 4_max_depth=6
  hostname: Juless-MacBook-Pro-16
  iterations_since_restore: 20
  node_ip: 127.0.0.1
  pid: 70953
  should_checkpoint: true
  time_since_restore: 4

[2m[36m(XGBoostTrainer pid=70953)[0m 2022-07-18 21:04:54,365	INFO main.py:1516 -- [RayXGBoost] Finished XGBoost training on training data with total N=398 in 3.62 seconds (1.46 pure XGBoost training time).


Best Result: Result(metrics={'train-logloss': 0.01841634292981527, 'train-error': 0.0, 'valid-logloss': 0.10578184703239703, 'valid-error': 0.05263157894736842, 'time_this_iter_s': 0.06814098358154297, 'should_checkpoint': True, 'done': True, 'timesteps_total': None, 'episodes_total': None, 'training_iteration': 20, 'trial_id': 'ea7f4_00001', 'experiment_id': '5f8025068bcb4ed4ac66cbc01b1c4265', 'date': '2022-07-18_21-04-47', 'timestamp': 1658203487, 'time_total_s': 3.583259105682373, 'pid': 70865, 'hostname': 'Juless-MacBook-Pro-16', 'node_ip': '127.0.0.1', 'config': {'params': {'max_depth': 5}}, 'time_since_restore': 3.583259105682373, 'timesteps_since_restore': 0, 'iterations_since_restore': 20, 'warmup_time': 0.0031299591064453125, 'experiment_tag': '1_max_depth=5'}, checkpoint=<ray.air.checkpoint.Checkpoint object at 0x15eb14bb0>, error=None, log_dir=PosixPath('/Users/jules/ray_results/XGBoostTrainer_2022-07-18_21-04-41/XGBoostTrainer_ea7f4_00001_1_max_depth=5_2022-07-18_21-04-43')
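
To see why trial `ea7f4_00001` comes out as the best result, the sketch below replays the selection in plain Python, using the final values from the trial table. With `metric="train-logloss"` and `mode="min"`, the best trial is simply the one with the smallest training log loss. The tuple layout is an illustrative choice, not a Ray data structure.

```python
# (trial name, max_depth, train-logloss, valid-logloss), copied from the trial table.
trials = [
    ("XGBoostTrainer_ea7f4_00000", 3, 0.0215151, 0.0765915),
    ("XGBoostTrainer_ea7f4_00001", 5, 0.0184163, 0.105782),
    ("XGBoostTrainer_ea7f4_00002", 3, 0.0215151, 0.0765915),
    ("XGBoostTrainer_ea7f4_00003", 1, 0.0955215, 0.112007),
    ("XGBoostTrainer_ea7f4_00004", 6, 0.0184957, 0.0893879),
]

# mode="min" on the chosen metric: pick the smallest value.
best_by_train = min(trials, key=lambda t: t[2])
best_by_valid = min(trials, key=lambda t: t[3])
print(best_by_train[0], best_by_valid[0])
```

Note that ranking by `valid-logloss` instead would favor a `max_depth=3` trial, so the choice of metric changes which configuration is reported as best.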

### Create a `BatchPredictor` for batch prediction
Load the model from the checkpoint.

In [11]:
from ray.train.batch_predictor import BatchPredictor
from ray.train.xgboost import XGBoostPredictor

batch_predictor = BatchPredictor.from_checkpoint(result.checkpoint, XGBoostPredictor)

predicted_probabilities = batch_predictor.predict(test_dataset)
print("PREDICTED PROBABILITIES")
predicted_probabilities.show()

Map Progress (1 actors 1 pending): 100%|█████████████████████████████████████████████████████████████████████████████████████| 1/1 [00:00<00:00,  1.29it/s]

PREDICTED PROBABILITIES
{'predictions': 0.9964648485183716}
{'predictions': 0.9951295852661133}
{'predictions': 0.0037899704184383154}
{'predictions': 0.9964648485183716}
{'predictions': 0.9969868063926697}
{'predictions': 0.9947494864463806}
{'predictions': 0.9899886250495911}
{'predictions': 0.9952162504196167}
{'predictions': 0.3375702202320099}
{'predictions': 0.9766711592674255}
{'predictions': 0.0037899704184383154}
{'predictions': 0.9948934316635132}
{'predictions': 0.9472665786743164}
{'predictions': 0.989780068397522}
{'predictions': 0.9952002763748169}
{'predictions': 0.18953870236873627}
{'predictions': 0.2149435132741928}
{'predictions': 0.99428790807724}
{'predictions': 0.9890844225883484}
{'predictions': 0.0037899704184383154}
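
The predictions are probabilities, not class labels. If hard labels are needed, one common approach is to apply a cutoff. The sketch below does this for the first five values printed above, assuming a threshold of 0.5, which is an illustrative choice rather than something the notebook specifies.

```python
# First five predicted probabilities from the output above.
probabilities = [
    0.9964648485183716,
    0.9951295852661133,
    0.0037899704184383154,
    0.9964648485183716,
    0.9969868063926697,
]
threshold = 0.5  # assumed cutoff
labels = [int(p > threshold) for p in probabilities]
print(labels)
```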





### Create `PredictorDeployment` for Online Inference
Deploy the model as an inference service by using Ray Serve and the `PredictorDeployment` class.

In [12]:
from ray import serve
from fastapi import Request
from ray.serve import PredictorDeployment
from ray.serve.http_adapters import json_request


async def adapter(request: Request):
    content = await request.json()
    print(content)
    return pd.DataFrame.from_dict(content)


serve.start(detached=True)
deployment = PredictorDeployment.options(name="XGBoostService")

deployment.deploy(
    XGBoostPredictor, result.checkpoint, batching_params=False, http_adapter=adapter
)

print(deployment.url)

[2m[36m(ServeController pid=71210)[0m INFO 2022-07-18 21:06:20,059 controller 71210 checkpoint_path.py:17 - Using RayInternalKVStore for controller checkpoint and recovery.
[2m[36m(ServeController pid=71210)[0m INFO 2022-07-18 21:06:20,081 controller 71210 http_state.py:123 - Starting HTTP proxy with name 'SERVE_CONTROLLER_ACTOR:SERVE_PROXY_ACTOR-4a225d65b457a9b0566e99c1b6d70ca6e0f6f369fb83c01166d68deb' on node '4a225d65b457a9b0566e99c1b6d70ca6e0f6f369fb83c01166d68deb' listening on '127.0.0.1:8000'
[2m[36m(HTTPProxyActor pid=71212)[0m INFO:     Started server process [71212]
[2m[36m(ServeController pid=71210)[0m INFO 2022-07-18 21:06:20,804 controller 71210 deployment_state.py:1280 - Adding 1 replicas to deployment 'XGBoostService'.


http://127.0.0.1:8000/XGBoostService


After deploying the service, you can send requests to it.

In [None]:
import requests

sample_input = test_dataset.take(1)
sample_input = dict(sample_input[0])

output = requests.post(deployment.url, json=[sample_input]).json()
print(output)
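
The request body is a JSON list containing one record, which is the shape the `adapter` function expects before it builds a DataFrame from it. The sketch below shows that round trip with the standard `json` module only. The feature values are hypothetical (a real row would carry every feature column), and no HTTP request is made.

```python
import json

# Hypothetical feature values; the real row comes from test_dataset.take(1).
sample_input = {"mean radius": 14.2, "mean texture": 19.1}

# Passing json=[sample_input] to requests serializes the payload roughly like this.
body = json.dumps([sample_input])

# The server-side adapter then decodes the body back into a list of records.
content = json.loads(body)
print(content)
```

Each dictionary in the list becomes one row, so sending several records in the same list would request several predictions at once.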

In [None]:
ray.shutdown()

### Homework

1. Have a go at Ray AIR examples in the documentation.