# Ray Serve - Model Serving

© 2019-2022, Anyscale. All Rights Reserved

![Anyscale Academy](../images/AnyscaleAcademyLogo.png)

Now we'll explore a nontrivial example for Ray Serve.

We'll work through an example that covers training a model, deploying it, and updating it later, based on this [documentation example](https://docs.ray.io/en/latest/serve/deployment.html). That page also has a section on [deployment to Kubernetes](https://docs.ray.io/en/latest/serve/deployment.html#deploying-as-a-kubernetes-service).

This example is adapted from the Ray Serve [scikit-learn example](https://docs.ray.io/en/latest/serve/tutorials/sklearn.html).
See also the Serve documentation's [mini-tutorials](https://docs.ray.io/en/latest/serve/tutorials/index.html) for using Serve with various frameworks.

In [1]:
import ray
from ray import serve
import os
import requests  # for making web requests
import tempfile

In [2]:
serve.start()

2022-03-16 16:34:31,024	INFO services.py:1412 -- View the Ray dashboard at http://127.0.0.1:8270
(ServeController pid=63313) 2022-03-16 16:34:33,999	INFO checkpoint_path.py:16 -- Using RayInternalKVStore for controller checkpoint and recovery.
(ServeController pid=63313) 2022-03-16 16:34:34,105	INFO http_state.py:98 -- Starting HTTP proxy with name 'SERVE_CONTROLLER_ACTOR:RamarF:SERVE_PROXY_ACTOR-node:127.0.0.1-0' on node 'node:127.0.0.1-0' listening on '127.0.0.1:8000'
2022-03-16 16:34:34,477	INFO api.py:521 -- Started Serve instance in namespace 'serve'.


<ray.serve.api.Client at 0x7f86f8bf1d90>

## Get a Model to Serve 

We'll begin by training a classifier on the Iris data we used before, this time with [scikit-learn](https://scikit-learn.org/stable/). The details aren't important for our purposes, except that we save the trained model to disk so Serve can load it later.

In [3]:
import pickle
import json
import numpy as np

In [4]:
import sklearn
from sklearn.datasets import load_iris
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import mean_squared_error

(HTTPProxyActor pid=63308) INFO:     Started server process [63308]


In [5]:
# Load data
iris_dataset = load_iris()
data = iris_dataset["data"]
target = iris_dataset["target"]
target_names = iris_dataset["target_names"]

In [6]:
# Instantiate model
model = GradientBoostingClassifier()

In [7]:
# Training and validation split
data, target = sklearn.utils.shuffle(data, target)
train_x, train_y = data[:100], target[:100]
val_x, val_y = data[100:], target[100:]
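`sklearn.utils.shuffle` permutes `data` and `target` with the same random order, so every row keeps its label; the slices then take 100 rows for training and the remaining 50 for validation. A minimal stdlib sketch of that idea, assuming plain Python lists and a hypothetical helper named `shuffle_and_split` (toy variable names avoid clobbering the notebook's `train_x` and friends):

```python
import random


def shuffle_and_split(features, labels, n_train, seed=None):
    """Shuffle features and labels in unison, then split off the first n_train rows.

    Hypothetical helper for illustration; the notebook uses sklearn.utils.shuffle.
    """
    if len(features) != len(labels):
        raise ValueError("features and labels must have the same length")
    # A single permutation of row indices, applied to both sequences, keeps each row with its label.
    order = list(range(len(features)))
    random.Random(seed).shuffle(order)
    xs = [features[i] for i in order]
    ys = [labels[i] for i in order]
    return (xs[:n_train], ys[:n_train]), (xs[n_train:], ys[n_train:])


# Toy stand-in for Iris: 150 rows whose single feature encodes the row's label.
toy_x = [[float(i)] for i in range(150)]
toy_y = [i % 3 for i in range(150)]
(toy_train_x, toy_train_y), (toy_val_x, toy_val_y) = shuffle_and_split(
    toy_x, toy_y, n_train=100, seed=0)
print(len(toy_train_x), len(toy_val_x))  # 100 50
```

Passing `random_state=` to `sklearn.utils.shuffle` would likewise make the notebook's split reproducible across runs.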

In [8]:
# Train and evaluate models
model.fit(train_x, train_y)
print("MSE:", mean_squared_error(val_y, model.predict(val_x)))  # signature is (y_true, y_pred)

MSE: 0.04
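Mean squared error treats the class indices 0, 1, 2 as numbers, which makes it an unusual metric for a classifier; accuracy is the more common choice. The two are linked here: with 50 validation rows, an MSE of 0.04 is a total squared error of 0.04 × 50 = 2. Each wrong prediction contributes either 1 (off by one class) or 4 (setosa mistaken for virginica or vice versa), so the run above had exactly two off-by-one mistakes, i.e. 48/50 = 96% accuracy. A small stdlib sketch of both metrics on toy labels (the helper names `mse` and `accuracy` are illustrative; in the notebook, `sklearn.metrics.accuracy_score(val_y, model.predict(val_x))` would report accuracy directly):

```python
def mse(y_true, y_pred):
    """Mean squared error, treating class indices as numbers."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def accuracy(y_true, y_pred):
    """Fraction of predictions equal to the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


# 50 toy labels (0=setosa, 1=versicolor, 2=virginica) with two off-by-one errors.
toy_true = [0, 1, 2] * 16 + [0, 1]
toy_pred = list(toy_true)
toy_pred[0] = 1  # setosa predicted as versicolor: squared error 1
toy_pred[1] = 2  # versicolor predicted as virginica: squared error 1
print(mse(toy_true, toy_pred), accuracy(toy_true, toy_pred))  # 0.04 0.96
```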


In [9]:
# Define where the model and labels will be saved
MODEL_PATH = os.path.join(tempfile.gettempdir(),
                          "iris_model_gradient_boosting.pkl")
LABEL_PATH = os.path.join(tempfile.gettempdir(), "iris_labels.json")

In [10]:
# Save the model and labels to files. (This could also be S3 or another shared "global" location.)

with open(MODEL_PATH, "wb") as f:
    pickle.dump(model, f)
with open(LABEL_PATH, "w") as f:
    json.dump(target_names.tolist(), f)
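The deployment below reloads these two files in its constructor, so it can be worth confirming they round-trip before deploying. A minimal stdlib sketch, assuming a plain dict as a stand-in for the trained classifier and hypothetical helpers `save_artifacts` and `load_artifacts`. Two caveats: `pickle.load` should only be used on files you trust, and on a multi-node cluster a replica may start on a machine that cannot see this local temp directory, which is why the comment above suggests S3 or another shared location.

```python
import json
import os
import pickle
import tempfile


def save_artifacts(model, labels, model_path, label_path):
    """Hypothetical helper: write the model with pickle (binary) and the labels as JSON (text)."""
    with open(model_path, "wb") as f:
        pickle.dump(model, f)
    with open(label_path, "w") as f:
        json.dump(labels, f)


def load_artifacts(model_path, label_path):
    """Hypothetical helper: the same reads BoostingModel.__init__ performs."""
    with open(model_path, "rb") as f:
        model = pickle.load(f)
    with open(label_path) as f:
        labels = json.load(f)
    return model, labels


# A plain dict stands in for the trained classifier so the sketch needs no scikit-learn.
stand_in_model = {"kind": "stand-in", "constant_prediction": 1}
iris_labels = ["setosa", "versicolor", "virginica"]

with tempfile.TemporaryDirectory() as tmp:
    model_file = os.path.join(tmp, "model.pkl")
    label_file = os.path.join(tmp, "labels.json")
    save_artifacts(stand_in_model, iris_labels, model_file, label_file)
    loaded_model, loaded_labels = load_artifacts(model_file, label_file)

print(loaded_model == stand_in_model, loaded_labels)  # True ['setosa', 'versicolor', 'virginica']
```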

## Create a Model and Serve It

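Next, we define a servable model: a class decorated with `@serve.deployment` whose `__call__` method Ray Serve invokes for each incoming request. The constructor loads the pickled model and the label list from disk once, when each replica starts.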

In [11]:
@serve.deployment(route_prefix="/regressor")
class BoostingModel:
    def __init__(self):
        with open(MODEL_PATH, "rb") as f:
            self.model = pickle.load(f)
        with open(LABEL_PATH) as f:
            self.label_list = json.load(f)

    # Making __call__ async lets a replica handle multiple requests concurrently
    async def __call__(self, starlette_request):
        payload = await starlette_request.json()
        print("Worker: received starlette request with data", payload)

        input_vector = [
            payload["sepal length"],
            payload["sepal width"],
            payload["petal length"],
            payload["petal width"],
        ]
        prediction = self.model.predict([input_vector])[0]
        human_name = self.label_list[prediction]
        return {"result": human_name}
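The body of `__call__` is ordinary Python: read four fields from the JSON payload in a fixed order, predict, and map the class index to a name. One way to unit-test that logic without starting Serve is to move it into a plain function. This is a sketch only; `FEATURE_KEYS`, `payload_to_prediction`, and `StubModel` are hypothetical, and the up-front missing-key check is an addition the notebook's class does not have (it would raise `KeyError` on the first missing field).

```python
# Must match the column order of load_iris()["data"], which the model was trained on.
FEATURE_KEYS = ("sepal length", "sepal width", "petal length", "petal width")


def payload_to_prediction(payload, model, label_list):
    """Mirror BoostingModel.__call__: payload dict -> {"result": label name}."""
    missing = [k for k in FEATURE_KEYS if k not in payload]
    if missing:
        # Hypothetical addition: report every missing field at once.
        raise KeyError(f"missing features: {missing}")
    input_vector = [payload[k] for k in FEATURE_KEYS]
    prediction = model.predict([input_vector])[0]
    return {"result": label_list[prediction]}


class StubModel:
    """Hypothetical stand-in: predicts class 2 if petal length > 4.5, else class 0."""

    def predict(self, rows):
        return [2 if row[2] > 4.5 else 0 for row in rows]


labels = ["setosa", "versicolor", "virginica"]
print(payload_to_prediction(
    {"sepal length": 6.3, "sepal width": 3.3, "petal length": 6.0, "petal width": 2.5},
    StubModel(), labels))  # {'result': 'virginica'}
```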

## Deploy the model

In [12]:
BoostingModel.deploy()

2022-03-16 16:34:46,505	INFO api.py:262 -- Updating deployment 'BoostingModel'. component=serve deployment=BoostingModel
(ServeController pid=63313) 2022-03-16 16:34:46,532	INFO deployment_state.py:920 -- Adding 1 replicas to deployment 'BoostingModel'. component=serve deployment=BoostingModel
2022-03-16 16:34:47,284	INFO api.py:274 -- Deployment 'BoostingModel' is ready at `http://127.0.0.1:8000/regressor`. component=serve deployment=BoostingModel


## Score the model
Internally, Serve runs each replica of a deployment as a Ray actor and routes requests to it as the endpoint is queried, in this case over HTTP.

Now let’s query the endpoint to see results.

In [13]:
sample_request_input = {
    "sepal length": 1.2,
    "sepal width": 1.0,
    "petal length": 1.1,
    "petal width": 0.9,
}

We can now send HTTP requests to the `/regressor` route (set by `route_prefix`) on Serve's default port, 8000.
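The request in the next cell sends the features as a JSON body on a `GET`; Serve's HTTP proxy forwards it to `__call__`, where `await starlette_request.json()` decodes it. To see the same request/response shape without Ray, here is a stand-in built on the standard library's `http.server`. It is not Ray Serve: the class `RegressorStandIn` is hypothetical, it listens on an ephemeral port rather than 8000, its reply is a placeholder rather than a prediction, and `urllib.request` plays the role of `requests` so the block has no third-party dependencies.

```python
import json
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer


class RegressorStandIn(BaseHTTPRequestHandler):
    """Hypothetical stand-in for the /regressor route; this is NOT Ray Serve."""

    def do_GET(self):
        if self.path != "/regressor":
            self.send_error(404)
            return
        # Read the JSON body, much as `await starlette_request.json()` does in the deployment.
        length = int(self.headers.get("Content-Length", 0))
        payload = json.loads(self.rfile.read(length) or b"{}")
        # Placeholder instead of a model: report how many feature fields arrived.
        body = json.dumps({"result": "placeholder", "n_features": len(payload)}).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # silence the default per-request logging


# Port 0 asks the OS for any free port, so this does not collide with Serve on 8000.
server = ThreadingHTTPServer(("127.0.0.1", 0), RegressorStandIn)
threading.Thread(target=server.serve_forever, daemon=True).start()

url = f"http://127.0.0.1:{server.server_address[1]}/regressor"
req = urllib.request.Request(
    url,
    data=json.dumps({"sepal length": 1.2, "sepal width": 1.0,
                     "petal length": 1.1, "petal width": 0.9}).encode(),
    headers={"Content-Type": "application/json"},
    method="GET",  # a GET that carries a JSON body, like requests.get(..., json=...)
)
# Bypass any proxy configured in the environment; this demo only talks to localhost.
opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
with opener.open(req, timeout=10) as resp:
    reply = json.loads(resp.read())
print(reply)  # {'result': 'placeholder', 'n_features': 4}

server.shutdown()
server.server_close()
```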

In [14]:
response = requests.get(
    "http://localhost:8000/regressor", json=sample_request_input)
print(response.text)

{
  "result": "versicolor"
}
(BoostingModel pid=63311) Worker: received starlette request with data {'sepal length': 1.2, 'sepal width': 1.0, 'petal length': 1.1, 'petal width': 0.9}


In [15]:
for _ in range(10):
    response = requests.get(
        "http://localhost:8000/regressor", json=sample_request_input).json()
    print(response)

{'result': 'versicolor'}
{'result': 'versicolor'}
{'result': 'versicolor'}
{'result': 'versicolor'}
{'result': 'versicolor'}
{'result': 'versicolor'}
{'result': 'versicolor'}
{'result': 'versicolor'}
{'result': 'versicolor'}
{'result': 'versicolor'}
(BoostingModel pid=63311) Worker: received starlette request with data {'sepal length': 1.2, 'sepal width': 1.0, 'petal length': 1.1, 'petal width': 0.9}
(BoostingModel pid=63311) Worker: received starlette request with data {'sepal length': 1.2, 'sepal width': 1.0, 'petal length': 1.1, 'petal width': 0.9}
(BoostingModel pid=63311) Worker: received starlette request with data {'sepal length': 1.2, 'sepal width': 1.0, 'petal length': 1.1, 'petal width': 0.9}
(BoostingModel pid=63311) Worker: received starlette request with data {'sepal length': 1.2, 'sepal width': 1.0, 'petal length': 1.1, 'petal width': 0.9}
(BoostingModel pid=63311) Worker: received starlette request with dat

## Cleanup

In [16]:
deployments = serve.list_deployments()
print(f'deployments: {deployments}')

deployments: {'BoostingModel': Deployment(name=BoostingModel,version=None,route_prefix=/regressor)}


In [17]:
serve.shutdown()

(ServeController pid=63313) 2022-03-16 16:35:06,116	INFO deployment_state.py:940 -- Removing 1 replicas from deployment 'BoostingModel'. component=serve deployment=BoostingModel


## Exercise: Try Adding More Examples

Here are some things you can try:

1. Send more input requests.
2. Add a small model of your own.
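For the first exercise, one way to produce varied inputs is to sample each feature uniformly within the range it spans in the Iris dataset (sepal length 4.3 to 7.9 cm, sepal width 2.0 to 4.4 cm, petal length 1.0 to 6.9 cm, petal width 0.1 to 2.5 cm). By contrast, the sepal measurements in `sample_request_input` fall well below anything in the data, so the model is extrapolating when it answers that request. `random_iris_payload` is a hypothetical helper, and uniform sampling is a simplification that ignores correlations between features.

```python
import random

# Observed min/max of each feature in the Iris dataset, in centimetres.
FEATURE_RANGES = {
    "sepal length": (4.3, 7.9),
    "sepal width": (2.0, 4.4),
    "petal length": (1.0, 6.9),
    "petal width": (0.1, 2.5),
}


def random_iris_payload(rng):
    """Hypothetical helper: one request body with each feature drawn uniformly from its range."""
    return {name: round(rng.uniform(lo, hi), 1) for name, (lo, hi) in FEATURE_RANGES.items()}


rng = random.Random(42)  # fixed seed so the payloads are reproducible
payloads = [random_iris_payload(rng) for _ in range(5)]
for p in payloads:
    print(p)
```

With the deployment running, each payload could be sent with `requests.get("http://localhost:8000/regressor", json=p)`.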