# Ray Serve - Model Serving

© 2019-2022, Anyscale. All Rights Reserved


Now we'll walk through a short Ray Serve example, adapted from the Ray Serve [scikit-learn example](https://docs.ray.io/en/latest/serve/tutorials/sklearn.html).

See also the Serve documentation's [mini-tutorials](https://docs.ray.io/en/latest/serve/tutorials/index.html) for using Serve with various frameworks.

```python
import ray
from ray import serve
import requests  # for making HTTP requests to the Serve endpoint
import tempfile

import os
import pickle
import json
import numpy as np
```

```python
# Start Serve (the log below shows it also brings up a local Ray instance)
serve.start()
```

```
2022-04-06 19:36:18,937	INFO services.py:1460 -- View the Ray dashboard at http://127.0.0.1:8267
(ServeController pid=23735) 2022-04-06 19:36:22,666	INFO checkpoint_path.py:15 -- Using RayInternalKVStore for controller checkpoint and recovery.
(ServeController pid=23735) 2022-04-06 19:36:22,773	INFO http_state.py:106 -- Starting HTTP proxy with name 'SERVE_CONTROLLER_ACTOR:ICfrTN:SERVE_PROXY_ACTOR-node:127.0.0.1-0' on node 'node:127.0.0.1-0' listening on '127.0.0.1:8000'
2022-04-06 19:36:24,034	INFO api.py:797 -- Started Serve instance in namespace 'serve'.

<ray.serve.api.Client at 0x7fc6714e0b50>

(HTTPProxyActor pid=23737) INFO:     Started server process [23737]
```


## Get a Model to Serve 

We'll begin by training a classifier on the Iris data we used before, this time with [scikit-learn](https://scikit-learn.org/stable/). The training details aren't important for our purposes; what matters is that we save the trained model to disk so that Serve can load it later.

```python
import sklearn
from sklearn.datasets import load_iris
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import mean_squared_error
```

```python
# Load data
iris_dataset = load_iris()
data = iris_dataset["data"]
target = iris_dataset["target"]
target_names = iris_dataset["target_names"]
```

```python
# Instantiate model
model = GradientBoostingClassifier()
```

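```python
# Training and validation split: shuffle the 150 Iris samples,
# then use the first 100 for training and the remaining 50 for validation
data, target = sklearn.utils.shuffle(data, target)
train_x, train_y = data[:100], target[:100]
val_x, val_y = data[100:], target[100:]
```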

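```python
# Train the model and evaluate it on the validation split.
# mean_squared_error takes its arguments in (y_true, y_pred) order.
model.fit(train_x, train_y)
print("MSE:", mean_squared_error(val_y, model.predict(val_x)))
```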

```
MSE: 0.08
```
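The metric above is mean squared error computed on integer class labels, which is unusual for a classifier: it treats confusing class 0 with class 2 as four times worse than confusing 0 with 1, even though the Iris labels have no numeric order. A minimal sketch comparing it with plain accuracy, using made-up labels rather than the notebook's data:

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that exactly match the true label."""
    matches = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return matches / len(y_true)


def mse(y_true, y_pred):
    """Mean squared difference between labels, treating them as numbers."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


# Hypothetical labels: two correct predictions, one 1->2 mistake, one 2->0 mistake.
y_true = [0, 1, 2, 2]
y_pred = [0, 2, 2, 0]

print("accuracy:", accuracy(y_true, y_pred))  # accuracy: 0.5
print("MSE:", mse(y_true, y_pred))            # MSE: 1.25
```

Both mistakes are equally wrong for a classifier, yet the 2->0 error contributes four times as much to the MSE. scikit-learn offers the same accuracy metric as `sklearn.metrics.accuracy_score(y_true, y_pred)`.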


```python
# Paths for the pickled model and its label names
MODEL_PATH = os.path.join(tempfile.gettempdir(),
                          "iris_model_gradient_boosting.pkl")
LABEL_PATH = os.path.join(tempfile.gettempdir(), "iris_labels.json")
```

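```python
# Save the model and labels to file. Every replica reads these paths when it starts,
# so on a multi-node cluster they should point to shared storage (e.g. S3)
# rather than a node-local temp directory.
with open(MODEL_PATH, "wb") as f:
    pickle.dump(model, f)
with open(LABEL_PATH, "w") as f:
    json.dump(target_names.tolist(), f)
```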
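Before deploying, it can help to confirm that the files load back the way the deployment's `__init__` will read them. A minimal sketch of that round trip using only the standard library; `save_and_reload` is a hypothetical helper, the file names are illustrative, and a plain dictionary stands in for the trained classifier so the snippet runs without scikit-learn:

```python
import json
import os
import pickle
import tempfile


def save_and_reload(model, labels, directory):
    """Save model/labels as the notebook does, then load them back as a replica would."""
    model_path = os.path.join(directory, "model.pkl")    # illustrative file names
    label_path = os.path.join(directory, "labels.json")

    # Save: pickle for the model, JSON for the label names.
    with open(model_path, "wb") as f:
        pickle.dump(model, f)
    with open(label_path, "w") as f:
        json.dump(labels, f)

    # Reload with the same calls the deployment's __init__ uses.
    with open(model_path, "rb") as f:
        loaded_model = pickle.load(f)
    with open(label_path) as f:
        loaded_labels = json.load(f)
    return loaded_model, loaded_labels


# Hypothetical stand-in for the classifier; any picklable object round-trips the same way.
stand_in_model = {"kind": "stand-in", "n_classes": 3}
labels = ["setosa", "versicolor", "virginica"]  # the Iris target names

with tempfile.TemporaryDirectory() as tmp:
    model_copy, label_copy = save_and_reload(stand_in_model, labels, tmp)

print(model_copy == stand_in_model, label_copy == labels)  # True True
```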

## Create a Model and Serve It

Next, we define a servable model as a class decorated with `@serve.deployment`. Serve instantiates this class in each replica, and the class's `__call__` method handles every incoming HTTP request.

```python
@serve.deployment(route_prefix="/regressor", num_replicas=2)
class BoostingModel:
    def __init__(self):
        # Each replica loads its own copy of the model and labels at startup
        with open(MODEL_PATH, "rb") as f:
            self.model = pickle.load(f)
        with open(LABEL_PATH) as f:
            self.label_list = json.load(f)

    # Declaring the handler async lets a replica handle requests concurrently
    async def __call__(self, starlette_request):
        payload = await starlette_request.json()
        print("Worker: received starlette request with data", payload)

        # Feature order must match the training data
        input_vector = [
            payload["sepal length"],
            payload["sepal width"],
            payload["petal length"],
            payload["petal width"],
        ]
        prediction = self.model.predict([input_vector])[0]
        human_name = self.label_list[prediction]
        return {"result": human_name}
```
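The handler relies on the JSON keys being arranged in the same order as the training features; `load_iris()` orders its four features as sepal length, sepal width, petal length, petal width. A minimal sketch of that conversion pulled out into standalone functions with a clearer error for missing keys; `FEATURE_KEYS`, `payload_to_vector`, and `label_for` are hypothetical names, not part of Ray Serve:

```python
# Feature order used by load_iris(); the request keys follow the notebook's payload.
FEATURE_KEYS = ("sepal length", "sepal width", "petal length", "petal width")


def payload_to_vector(payload):
    """Turn a request payload dict into a feature row in training order."""
    missing = [key for key in FEATURE_KEYS if key not in payload]
    if missing:
        raise KeyError(f"missing feature(s): {', '.join(missing)}")
    return [float(payload[key]) for key in FEATURE_KEYS]


def label_for(prediction, label_list):
    """Map a predicted class index to its human-readable name."""
    return label_list[int(prediction)]


sample = {"sepal length": 1.2, "sepal width": 1.0, "petal length": 1.1, "petal width": 0.9}
print(payload_to_vector(sample))  # [1.2, 1.0, 1.1, 0.9]
print(label_for(1, ["setosa", "versicolor", "virginica"]))  # versicolor
```

Keeping the key order in a single tuple means the training code and the handler can share one definition instead of repeating the list.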

## Deploy the model

```python
BoostingModel.deploy()
```

```
2022-04-06 19:37:39,945	INFO api.py:618 -- Updating deployment 'BoostingModel'. component=serve deployment=BoostingModel
(ServeController pid=23735) 2022-04-06 19:37:40,008	INFO deployment_state.py:1210 -- Adding 2 replicas to deployment 'BoostingModel'. component=serve deployment=BoostingModel
2022-04-06 19:37:41,959	INFO api.py:633 -- Deployment 'BoostingModel' is ready at `http://127.0.0.1:8000/regressor`. component=serve deployment=BoostingModel
```


## Score the model
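Internally, Serve runs each replica of the deployment as a Ray actor (two here, since `num_replicas=2`), each holding its own copy of the model, and routes requests arriving at the endpoint, in this case over HTTP, to those replicas.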
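To make the replica idea concrete, here is a toy sketch in plain Python: each replica object handles requests independently, and a dispatcher hands each request to the next replica in turn. The round-robin policy and the `Replica` and `RoundRobinRouter` classes are assumptions for illustration only; this is not how Serve's router is implemented.

```python
import itertools


class Replica:
    """Toy stand-in for one replica; in Serve, a replica is a Ray actor holding the model."""

    def __init__(self, replica_id):
        self.replica_id = replica_id
        self.handled = 0

    def __call__(self, payload):
        self.handled += 1
        return {"replica": self.replica_id, "echo": payload}


class RoundRobinRouter:
    """Hands each request to the next replica in a fixed cycle (illustrative policy)."""

    def __init__(self, replicas):
        self.replicas = list(replicas)
        self._cycle = itertools.cycle(self.replicas)

    def route(self, payload):
        return next(self._cycle)(payload)


router = RoundRobinRouter(Replica(i) for i in range(2))  # mirrors num_replicas=2
for n in range(5):
    print(router.route({"request": n})["replica"])  # prints 0, 1, 0, 1, 0 on separate lines
print([r.handled for r in router.replicas])  # [3, 2]
```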

Now let’s query the endpoint to see results.

```python
sample_request_input = {
    "sepal length": 1.2,
    "sepal width": 1.0,
    "petal length": 1.1,
    "petal width": 0.9,
}
```

We can now send HTTP requests to the `/regressor` route (set by `route_prefix`) on Serve's default HTTP port, 8000.

```python
response = requests.get(
    "http://localhost:8000/regressor", json=sample_request_input)
print(response.text)
```

```
{
  "result": "versicolor"
}
(BoostingModel pid=23828) Worker: received starlette request with data {'sepal length': 1.2, 'sepal width': 1.0, 'petal length': 1.1, 'petal width': 0.9}
```


```python
response = requests.get("http://localhost:8000/regressor", json=sample_request_input).json()
print(response)
```

```
{'result': 'versicolor'}
(BoostingModel pid=23827) Worker: received starlette request with data {'sepal length': 1.2, 'sepal width': 1.0, 'petal length': 1.1, 'petal width': 0.9}
```


```python
deployments = serve.list_deployments()
print(f'deployments: {deployments}')
```

```
deployments: {'BoostingModel': Deployment(name=BoostingModel,version=None,route_prefix=/regressor)}
```


```python
serve.shutdown()
```

[2m[36m(ServeController pid=23735)[0m 2022-04-06 19:37:53,144	INFO deployment_state.py:1236 -- Removing 2 replicas from deployment 'BoostingModel'. component=serve deployment=BoostingModel


### Exercise

Here are some things you can try:

1. Send more requests with different input values, and check which replica (the `pid` in the log lines) handles each one.
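For the exercise above, it helps to generate varied inputs instead of re-sending the same one. Note that the sample input used earlier lies outside the Iris data (its sepal length is 1.2 cm, while the dataset's sepal lengths range from 4.3 to 7.9 cm), so its prediction says little about model quality. A minimal sketch that draws random payloads within each feature's range in the Iris dataset; `IRIS_RANGES` and `random_payloads` are hypothetical helpers, and each payload can be sent with the same `requests.get` call used above.

```python
import random

# Min/max of each feature in the Iris dataset, in cm.
IRIS_RANGES = {
    "sepal length": (4.3, 7.9),
    "sepal width": (2.0, 4.4),
    "petal length": (1.0, 6.9),
    "petal width": (0.1, 2.5),
}


def random_payloads(n, seed=0):
    """Return n request payloads with each feature drawn uniformly from its range."""
    rng = random.Random(seed)  # seeded so the same inputs can be replayed
    return [
        {name: round(rng.uniform(lo, hi), 1) for name, (lo, hi) in IRIS_RANGES.items()}
        for _ in range(n)
    ]


for payload in random_payloads(3):
    print(payload)
    # With Serve running, send each one as in the cells above:
    # print(requests.get("http://localhost:8000/regressor", json=payload).json())
```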

## Homework
1. Add a small model of your own and deploy it on Ray Serve.
2. Send some requests to it.
3. Try some of the [mini-tutorials](https://docs.ray.io/en/latest/serve/tutorials/index.html) for other frameworks.