# Ray AIR Demonstration: End to End ML from Training to Serving using PyTorch

Adapted from https://docs.ray.io/en/master/ray-air/examples/torch_image_example.html

Ray Serve is part of Ray AI Runtime (AIR). Now you have learned Ray Serve, we can't wait to show you the bigger picture of where Serve fits in the end to end ML lifecycle.

Ray AI Runtime (AIR) is a scalable and unified toolkit for ML applications. AIR enables simple scaling of individual workloads, end-to-end workflows, and popular ecosystem frameworks, all in just Python.

![air-layering](https://docs.ray.io/en/master/_images/ray-air.svg)

AIR builds on Ray’s best-in-class libraries for Preprocessing, Training, Tuning, Scoring, Serving, and Reinforcement Learning to bring together an ecosystem of integrations.


Ray AIR aims to simplify the ecosystem of machine learning frameworks, platforms, and tools. It does this by leveraging Ray to provide a seamless, unified, and open experience for scalable ML:

![air-integration](https://docs.ray.io/en/master/_images/why-air-2.svg)


1. **Seamless Dev to Prod**: AIR reduces friction going from development to production. With Ray and AIR, the same Python code scales seamlessly from a laptop to a large cluster.

2. **Unified ML API**: AIR’s unified ML API enables swapping between popular frameworks, such as XGBoost, PyTorch, and HuggingFace, with just a single class change in your code.

3. **Open and Extensible**: AIR and Ray are fully open-source and can run on any cluster, cloud, or Kubernetes. Build custom components and integrations on top of scalable developer APIs.


---

This tutorial demonstrates how to train an image classifier using the [Ray AI Runtime](air) (AIR), then perform batch scoring as well as online serving. 

You should be familiar with [PyTorch](https://pytorch.org/) before starting the tutorial. If you need a refresher, read PyTorch's [training a classifier](https://pytorch.org/tutorials/beginner/blitz/cifar10_tutorial.html) tutorial.

## Before you begin

* Install the [Ray AI Runtime](air). You'll need Ray 1.13 later to run this example.

In [1]:
!pip install -q 'ray[air]'

* Install `requests`, `torch`, `torchvision`, `tqdm`

In [2]:
!pip install -q requests torch torchvision tqdm

## Load and normalize CIFAR-10

We'll train our classifier on a popular image dataset called [CIFAR-10](https://www.cs.toronto.edu/~kriz/cifar.html).

First, let's load CIFAR-10 into a Ray Dataset.

In [3]:
import ray
from ray.data.datasource import SimpleTorchDatasource
import torchvision
import torchvision.transforms as transforms

transform = transforms.Compose(
    [transforms.ToTensor(), transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5))]
)

def train_dataset_factory():
    return torchvision.datasets.CIFAR10(root="./data", download=True, train=True, transform=transform)

def test_dataset_factory():
    return torchvision.datasets.CIFAR10(root="./data", download=True, train=False, transform=transform)

train_dataset: ray.data.Dataset = ray.data.read_datasource(SimpleTorchDatasource(), dataset_factory=train_dataset_factory)
test_dataset: ray.data.Dataset = ray.data.read_datasource(SimpleTorchDatasource(), dataset_factory=test_dataset_factory)

  from .autonotebook import tqdm as notebook_tqdm
2022-08-16 01:00:13,565	INFO worker.py:1481 -- Started a local Ray instance. View the dashboard at [1m[32mhttp://127.0.0.1:8265[39m[22m.


[2m[36m(_execute_read_task pid=28284)[0m Files already downloaded and verified




[2m[36m(_execute_read_task pid=28284)[0m Files already downloaded and verified


In [4]:
train_dataset

Dataset(num_blocks=1, num_rows=50000, schema=<class 'tuple'>)

Note that {py:class}`SimpleTorchDatasource <ray.data.datasource.SimpleTorchDatasource>` loads all data into memory, so you shouldn't use it with larger datasets.

Next, let's represent our data using pandas dataframes instead of tuples. This lets us call methods like {py:meth}`Dataset.iter_torch_batches <ray.data.Dataset.iter_torch_batches>` later in the tutorial.

In [5]:
from typing import Tuple
import pandas as pd
from ray.data.extensions import TensorArray
import torch


def convert_batch_to_pandas(batch: Tuple[torch.Tensor, int]) -> pd.DataFrame:
    images = TensorArray([image.numpy() for image, _ in batch])
    labels = [label for _, label in batch]

    df = pd.DataFrame({"image": images, "label": labels})

    return df


train_dataset = train_dataset.map_batches(convert_batch_to_pandas)
test_dataset = test_dataset.map_batches(convert_batch_to_pandas)

Read->Map_Batches:   0%|                                                | 0/1 [00:00<?, ?it/s]

[2m[36m(_map_block_nosplit pid=28284)[0m Files already downloaded and verified


Read->Map_Batches: 100%|████████████████████████████████████████| 1/1 [00:04<00:00,  4.22s/it]
Read->Map_Batches:   0%|                                                | 0/1 [00:00<?, ?it/s]

[2m[36m(_map_block_nosplit pid=28284)[0m Files already downloaded and verified


Read->Map_Batches: 100%|████████████████████████████████████████| 1/1 [00:01<00:00,  1.38s/it]


In [6]:
train_dataset

Dataset(num_blocks=1, num_rows=50000, schema={image: TensorDtype(shape=(3, 32, 32), dtype=float32), label: int64})

## Train a convolutional neural network

Now that we've created our datasets, let's define the training logic.

In [7]:
import torch
import torch.nn as nn
import torch.nn.functional as F


class Net(nn.Module):
    def __init__(self):
        super().__init__()
        self.conv1 = nn.Conv2d(3, 6, 5)
        self.pool = nn.MaxPool2d(2, 2)
        self.conv2 = nn.Conv2d(6, 16, 5)
        self.fc1 = nn.Linear(16 * 5 * 5, 120)
        self.fc2 = nn.Linear(120, 84)
        self.fc3 = nn.Linear(84, 10)

    def forward(self, x):
        x = self.pool(F.relu(self.conv1(x)))
        x = self.pool(F.relu(self.conv2(x)))
        x = torch.flatten(x, 1)  # flatten all dimensions except batch
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        x = self.fc3(x)
        return x

We define our training logic in a function called `train_loop_per_worker`.

`train_loop_per_worker` contains regular PyTorch code with a few notable exceptions:
* We wrap our model with {py:func}`train.torch.prepare_model <ray.train.torch.prepare_model>`.
* We call {py:func}`session.get_dataset_shard <ray.air.session.get_dataset_shard>` and {py:meth}`Dataset.iter_torch_batches <ray.data.Dataset.iter_torch_batches>` to convert a subset of our training data to a Torch dataset.
* We save model state using {py:func}`session.report <ray.air.session.report>`.

In [8]:
from ray import train
from ray.air import session, Checkpoint
import torch.optim as optim


def train_loop_per_worker(config):
    model = train.torch.prepare_model(Net())

    criterion = nn.CrossEntropyLoss()
    optimizer = optim.SGD(model.parameters(), lr=0.001, momentum=0.9)

    train_dataset_shard = session.get_dataset_shard("train").iter_torch_batches(
        batch_size=config["batch_size"],
    )

    for epoch in range(2):
        running_loss = 0.0
        for i, data in enumerate(train_dataset_shard):
            # get the inputs and labels
            inputs, labels = data["image"], data["label"]

            # zero the parameter gradients
            optimizer.zero_grad()

            # forward + backward + optimize
            outputs = model(inputs)
            loss = criterion(outputs, labels)
            loss.backward()
            optimizer.step()

            # print statistics
            running_loss += loss.item()
            if i % 2000 == 1999:  # print every 2000 mini-batches
                print(f"[{epoch + 1}, {i + 1:5d}] loss: {running_loss / 2000:.3f}")
                running_loss = 0.0

        session.report(
            dict(running_loss=running_loss),
            checkpoint=Checkpoint.from_dict(dict(model=model.module.state_dict())),
        )

Finally, we can train our model. This should take a few minutes to run.

In [9]:
from ray.train.torch import TorchTrainer
from ray.air.config import ScalingConfig

trainer = TorchTrainer(
    train_loop_per_worker=train_loop_per_worker,
    train_loop_config={"batch_size": 2},
    datasets={"train": train_dataset},
    scaling_config=ScalingConfig(num_workers=2)
)
result = trainer.fit()
latest_checkpoint = result.checkpoint

Trial name,status,loc,iter,total time (s),running_loss,_timestamp,_time_this_iter_s
TorchTrainer_86ba4_00000,TERMINATED,127.0.0.1:28383,2,21.3122,0,1660636865,0.0527029


[2m[36m(RayTrainWorker pid=28387)[0m 2022-08-16 01:00:45,483	INFO config.py:71 -- Setting up process group for: env:// [rank=0, world_size=2]
[2m[36m(RayTrainWorker pid=28387)[0m 2022-08-16 01:00:46,550	INFO train_loop_utils.py:300 -- Moving model to device: cpu
[2m[36m(RayTrainWorker pid=28387)[0m 2022-08-16 01:00:46,550	INFO train_loop_utils.py:347 -- Wrapping provided model in DDP.
[2m[36m(RayTrainWorker pid=28388)[0m   return torch.as_tensor(ndarray, dtype=dtype, device=device)
[2m[36m(RayTrainWorker pid=28387)[0m   return torch.as_tensor(ndarray, dtype=dtype, device=device)


[2m[36m(RayTrainWorker pid=28387)[0m [1,  2000] loss: 2.206
[2m[36m(RayTrainWorker pid=28388)[0m [1,  2000] loss: 2.211
[2m[36m(RayTrainWorker pid=28387)[0m [1,  4000] loss: 1.857
[2m[36m(RayTrainWorker pid=28388)[0m [1,  4000] loss: 1.889
[2m[36m(RayTrainWorker pid=28387)[0m [1,  6000] loss: 1.663
[2m[36m(RayTrainWorker pid=28388)[0m [1,  6000] loss: 1.652
[2m[36m(RayTrainWorker pid=28387)[0m [1,  8000] loss: 1.576
[2m[36m(RayTrainWorker pid=28388)[0m [1,  8000] loss: 1.557
[2m[36m(RayTrainWorker pid=28387)[0m [1, 10000] loss: 1.486
[2m[36m(RayTrainWorker pid=28388)[0m [1, 10000] loss: 1.531
[2m[36m(RayTrainWorker pid=28387)[0m [1, 12000] loss: 1.435
[2m[36m(RayTrainWorker pid=28388)[0m [1, 12000] loss: 1.442
Result for TorchTrainer_86ba4_00000:
  _time_this_iter_s: 18.94872808456421
  _timestamp: 1660636865
  _training_iteration: 1
  date: 2022-08-16_01-01-05
  done: false
  experiment_id: de85b3b22d89446ea767befc5abf6b21
  hostname: Simons-MacBook

2022-08-16 01:01:06,164	INFO tune.py:758 -- Total run time: 23.27 seconds (23.13 seconds for the tuning loop).


To scale your training script, create a [Ray Cluster](cluster-index) and increase the number of workers. If your cluster contains GPUs, add `"use_gpu": True` to your scaling config.

```{code-block} python
scaling_config=ScalingConfig(num_workers=8, "use_gpu=True)
```

## Test the network on the test data

Let's see how our model performs.

To classify images in the test dataset, we'll need to create a {py:class}`Predictor <ray.train.predictor.Predictor>`.

{py:class}`Predictors <ray.train.predictor.Predictor>` load data from checkpoints and efficiently perform inference. In contrast to {py:class}`TorchPredictor <ray.train.torch.TorchPredictor>`, which performs inference on a single batch, {py:class}`BatchPredictor <ray.train.batch_predictor.BatchPredictor>` performs inference on an entire dataset. Because we want to classify all of the images in the test dataset, we'll use a {py:class}`BatchPredictor <ray.train.batch_predictor.BatchPredictor>`.

In [10]:
from ray.train.torch import TorchPredictor
from ray.train.batch_predictor import BatchPredictor

predict_dataset = test_dataset.drop_columns(cols=["label"])
batch_predictor = BatchPredictor.from_checkpoint(
    checkpoint=latest_checkpoint,
    predictor_cls=TorchPredictor,
    model=Net(),
)

outputs: ray.data.Dataset = batch_predictor.predict(
    data=test_dataset, dtype=torch.float, feature_columns=["image"], keep_columns=["label"]
)

Map_Batches: 100%|██████████████████████████████████████████████| 1/1 [00:00<00:00, 12.47it/s]
[2m[36m(BlockWorker pid=28406)[0m A value is trying to be set on a copy of a slice from a DataFrame.
[2m[36m(BlockWorker pid=28406)[0m Try using .loc[row_indexer,col_indexer] = value instead
[2m[36m(BlockWorker pid=28406)[0m 
[2m[36m(BlockWorker pid=28406)[0m See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
[2m[36m(BlockWorker pid=28406)[0m   df.loc[:, col_name] = TensorArray(col)
[2m[36m(BlockWorker pid=28406)[0m A value is trying to be set on a copy of a slice from a DataFrame.
[2m[36m(BlockWorker pid=28406)[0m Try using .loc[row_indexer,col_indexer] = value instead
[2m[36m(BlockWorker pid=28406)[0m 
[2m[36m(BlockWorker pid=28406)[0m See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
[2m

Our model outputs a list of energies for each class. To classify an image, we
choose the class that has the highest energy.

In [11]:
import numpy as np

def convert_logits_to_classes(df):
    best_class = df["predictions"].map(lambda x: x.argmax())
    df["prediction"] = best_class
    return df

predictions = outputs.map_batches(
    convert_logits_to_classes, batch_format="pandas"
)

predictions.show(1)

Map_Batches: 100%|██████████████████████████████████████████████| 1/1 [00:00<00:00, 35.27it/s]

{'predictions': array([-1.5615891 , -1.8779886 ,  0.80665046,  2.2954726 ,  0.05051345,
        1.0009389 ,  1.3282954 , -1.025135  , -1.2868532 , -1.8736922 ],
      dtype=float32), 'label': 3, 'prediction': 3}





Now that we've classified all of the images, let's figure out which images were
classified correctly. The ``predictions`` dataset contains predicted labels and 
the ``test_dataset`` contains the true labels. To determine whether an image 
was classified correctly, we join the two datasets and check if the predicted 
labels are the same as the actual labels.

In [12]:
def calculate_prediction_scores(df):
    df["correct"] = df["prediction"] == df["label"]
    return df[["prediction", "label", "correct"]]

scores = predictions.map_batches(calculate_prediction_scores)

scores.show(1)

Map_Batches: 100%|██████████████████████████████████████████████| 1/1 [00:00<00:00, 52.73it/s]

{'prediction': 3, 'label': 3, 'correct': True}





To compute our test accuracy, we'll count how many images the model classified 
correctly and divide that number by the total number of test images.

In [13]:
scores.sum(on="correct") / scores.count()

Shuffle Map: 100%|█████████████████████████████████████████████| 1/1 [00:00<00:00, 108.38it/s]
Shuffle Reduce: 100%|██████████████████████████████████████████| 1/1 [00:00<00:00, 174.07it/s]


0.4944

## Deploy the network and make a prediction

Our model seems to perform decently, so let's deploy the model to an 
endpoint. This'll allow us to make predictions over the Internet.

In [14]:
from ray import serve
from ray.serve import PredictorDeployment
from ray.serve.http_adapters import NdArray


def json_to_numpy(payload: NdArray) -> pd.DataFrame:
    """Accepts an NdArray JSON from an HTTP body and converts it to a Numpy Array."""
    # Have to explicitly convert to float since np.array reads as a double.
    arr = np.array(payload.array, dtype=np.float32)
    return arr


serve.run(
    PredictorDeployment.bind(
        TorchPredictor,
        latest_checkpoint,
        batching_params=False,
        model=Net(),
        http_adapter=json_to_numpy,
    )
)

[2m[36m(ServeController pid=28454)[0m INFO 2022-08-16 01:01:20,420 controller 28454 http_state.py:129 - Starting HTTP proxy with name 'SERVE_CONTROLLER_ACTOR:SERVE_PROXY_ACTOR-149bc79c679e815f8878f39ebb27086bc31a30b1b3796b7cef49280c' on node '149bc79c679e815f8878f39ebb27086bc31a30b1b3796b7cef49280c' listening on '127.0.0.1:8000'
[2m[36m(ServeController pid=28454)[0m INFO 2022-08-16 01:01:21,039 controller 28454 deployment_state.py:1232 - Adding 1 replicas to deployment 'PredictorDeployment'.
[2m[36m(HTTPProxyActor pid=28469)[0m INFO:     Started server process [28469]


RayServeSyncHandle(deployment='PredictorDeployment')

Let's classify a test image.

In [15]:
batch = test_dataset.take(1)
array = np.expand_dims(np.array(batch[0]["image"]), axis=0)

In [16]:
array.shape

(1, 3, 32, 32)

You can perform inference against a deployed model by posting a dictionary with an `"array"` key. To learn more about the default input schema, read the {py:class}`NdArray <ray.serve.http_adapters.NdArray>` documentation.

In [17]:
import requests

payload = {"array": array.tolist()}
response = requests.post("http://localhost:8000/", json=payload)
response.json()

[[-1.5615893602371216,
  -1.8779891729354858,
  0.8066505193710327,
  2.295473337173462,
  0.05051347613334656,
  1.0009385347366333,
  1.3282963037490845,
  -1.0251355171203613,
  -1.286853551864624,
  -1.8736921548843384]]

[2m[36m(HTTPProxyActor pid=28469)[0m INFO 2022-08-16 01:01:26,301 http_proxy 127.0.0.1 http_proxy.py:315 - POST / 200 13.9ms
[2m[36m(ServeReplica:PredictorDeployment pid=28474)[0m INFO 2022-08-16 01:01:26,300 PredictorDeployment PredictorDeployment#GtFTdH replica.py:482 - HANDLE __call__ OK 10.1ms
