# Ray Serve - Creating, Deploying and Accessing Deployments

© 2019-2022, Anyscale. All Rights Reserved

This brief tutorial shows how to create and deploy models using the simple Ray Serve
deployment APIs, and how to expose those deployments for access.
Once deployed, you can send requests to a deployment in two ways:

 1. ServeHandle API
 2. HTTP

In [1]:
import os
from random import random

import requests
import starlette
from starlette.requests import Request
import ray
from ray import serve

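A simple stand-in for a model. In a real application, the model would be stored in pickled
format at an accessible path and deserialized into a model instance when loaded; this class
only records the path. Once deployed in Ray Serve, we use it for prediction. The prediction is
faked: if the input `data` is greater than 0.5, the model returns `data` plus a random value
in [0, 1); otherwise it returns `data` unchanged.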
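The `Model` class in the next cell only records `path` and never opens a file. Below is a minimal sketch of what loading a pickled model could look like, assuming the model's parameters were saved with Python's standard `pickle` module. `PickledModel` and its `threshold` parameter are hypothetical, and a temporary directory stands in for paths such as `/model/rep-1.pkl`.

```python
import os
import pickle
import tempfile
from random import random


class PickledModel:
    """Hypothetical model whose parameters are loaded from a pickle file."""

    def __init__(self, path: str) -> None:
        self.path = path
        # Only unpickle files you trust: pickle can execute arbitrary code.
        with open(path, "rb") as f:
            params = pickle.load(f)
        self.threshold = params["threshold"]

    def predict(self, data: float) -> float:
        # Same fake rule as the tutorial's Model, with a configurable threshold.
        return random() + data if data > self.threshold else data


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "rep-1.pkl")
    # Write hypothetical parameters where a real model file would live.
    with open(path, "wb") as f:
        pickle.dump({"threshold": 0.5}, f)

    model = PickledModel(path)
    print(model.threshold)      # 0.5
    print(model.predict(0.2))   # 0.2: below the threshold, returned unchanged
```

Pickling a plain parameter dictionary, rather than a class instance, keeps the file independent of where the class is defined.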

In [2]:
class Model:
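    def __init__(self, path):
        # A real model would load and unpickle the file at `path`; this stub only stores it.
        self.path = path

    def predict(self, data):
        # Fake prediction: add random noise in [0, 1) when data exceeds 0.5.
        return random() + data if data > 0.5 else data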
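Because every reply prints both the input and the prediction, the fake rule in `Model.predict` can be checked against the outputs of both access methods later in this notebook. A minimal sketch; `consistent_with_rule` is a hypothetical helper, and its tolerance accounts for values being printed to three decimals.

```python
def consistent_with_rule(data: float, pred: float, tol: float = 1e-3) -> bool:
    """Check a printed (data, prediction) pair against Model.predict's rule."""
    if data > 0.5:
        # Above the threshold: prediction = data + noise, with noise in [0, 1).
        noise = pred - data
        return -tol <= noise < 1 + tol
    # At or below the threshold: the input is returned unchanged.
    return abs(pred - data) <= tol


# (data, prediction) pairs copied from the outputs printed by both methods.
samples = [(0.517, 1.339), (0.470, 0.470), (0.827, 1.650), (0.605, 1.428),
           (0.320, 0.320), (0.974, 1.513), (0.853, 1.393), (0.133, 0.133)]
print(all(consistent_with_rule(d, p) for d, p in samples))  # True
```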

In [3]:
@serve.deployment
class Deployment:
    # Take in a path to load your desired model
    def __init__(self, path: str) -> None:
        self.path = path
        self.model = Model(path)
        # Record the PID of the replica process this instance runs in
        self.pid = os.getpid()

    # Deployments are callable. Here we simply return a prediction for
    # the incoming request.
    def __call__(self, starlette_request) -> str:
        if isinstance(starlette_request, Request):
            # Request came via HTTP: read the 'data' query parameter (a string).
            data = starlette_request.query_params['data']
        else:
            # Request came via a ServeHandle method call: the argument is the data itself.
            data = starlette_request
        pred = self.model.predict(float(data))
        return f"(pid: {self.pid}); path: {self.path}; data: {float(data):.3f}; prediction: {pred:.3f}"
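`__call__` distinguishes the two access methods by type: an HTTP request arrives as a Starlette `Request`, while a ServeHandle call passes the value directly. The branching can be tried without Ray or Starlette in a minimal sketch, where `FakeRequest` is a hypothetical stand-in for `starlette.requests.Request` that only provides `query_params`, and `extract_data` is a hypothetical helper mirroring the first lines of the method.

```python
from typing import Any, Dict


class FakeRequest:
    """Hypothetical stand-in for starlette.requests.Request."""

    def __init__(self, query_params: Dict[str, str]) -> None:
        # Query-string values arrive as strings, as in Starlette.
        self.query_params = query_params


def extract_data(request: Any) -> float:
    """Mirror the branching in Deployment.__call__."""
    if isinstance(request, FakeRequest):
        # HTTP path: read the 'data' query parameter (a string).
        raw = request.query_params["data"]
    else:
        # ServeHandle path: the argument is the value itself.
        raw = request
    # float() accepts both the string and the numeric form.
    return float(raw)


print(extract_data(FakeRequest({"data": "0.25"})))  # 0.25
print(extract_data(0.75))                           # 0.75
```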

Start a Ray Serve instance. This automatically starts a local Ray cluster, or connects to an existing one.

In [4]:
serve.start()

2022-03-16 16:32:58,840	INFO services.py:1412 -- View the Ray dashboard at http://127.0.0.1:8269
(ServeController pid=63161) 2022-03-16 16:33:01,745	INFO checkpoint_path.py:16 -- Using RayInternalKVStore for controller checkpoint and recovery.
(ServeController pid=63161) 2022-03-16 16:33:01,853	INFO http_state.py:98 -- Starting HTTP proxy with name 'SERVE_CONTROLLER_ACTOR:fNdmTn:SERVE_PROXY_ACTOR-node:127.0.0.1-0' on node 'node:127.0.0.1-0' listening on '127.0.0.1:8000'
2022-03-16 16:33:02,196	INFO api.py:521 -- Started Serve instance in namespace 'serve'.


<ray.serve.api.Client at 0x7ff9208ceb80>

Create two distinct deployments of the same class, each with two replicas.
Give each deployment a unique `name`; this name is used to fetch the deployment's ServeHandle,
as shown in Method 1 below.

In [5]:
Deployment.options(name="rep-1", num_replicas=2).deploy("/model/rep-1.pkl")
Deployment.options(name="rep-2", num_replicas=2).deploy("/model/rep-2.pkl")

2022-03-16 16:33:02,215	INFO api.py:262 -- Updating deployment 'rep-1'. component=serve deployment=rep-1
(HTTPProxyActor pid=63157) INFO:     Started server process [63157]
(ServeController pid=63161) 2022-03-16 16:33:02,287	INFO deployment_state.py:920 -- Adding 2 replicas to deployment 'rep-1'. component=serve deployment=rep-1
2022-03-16 16:33:02,729	INFO api.py:274 -- Deployment 'rep-1' is ready at `http://127.0.0.1:8000/rep-1`. component=serve deployment=rep-1
2022-03-16 16:33:02,735	INFO api.py:262 -- Updating deployment 'rep-2'. component=serve deployment=rep-2
(ServeController pid=63161) 2022-03-16 16:33:02,837	INFO deployment_state.py:920 -- Adding 2 replicas to deployment 'rep-2'. component=serve deployment=rep-2
2022-03-16 16:33:03,282	INFO api.py:274 -- Deployment 'rep-2' is ready at `http://127.0.0.1:8000/rep-2`. component=serve deployment=rep-2


### Get the current list of deployments


In [6]:
print(serve.list_deployments())

{'rep-1': Deployment(name=rep-1,version=None,route_prefix=/rep-1), 'rep-2': Deployment(name=rep-2,version=None,route_prefix=/rep-2)}


### Method 1: Access each deployment using the ServeHandle API

In [7]:
for _ in range(2):
    for d_name in ["rep-1", "rep-2"]:
        # Get a handle to each deployment and invoke it.
        # The handle's router decides which replica
        # receives each request.
        handle = serve.get_deployment(d_name).get_handle()
        print(f"handle name : {d_name}")
        print(f"prediction  : {ray.get(handle.remote(random()))}")
        print("-" * 2)



handle name : rep-1
prediction  : (pid: 63160); path: /model/rep-1.pkl; data: 0.517; prediction: 1.339
--
handle name : rep-2
prediction  : (pid: 63150); path: /model/rep-2.pkl; data: 0.470; prediction: 0.470
--
handle name : rep-1
prediction  : (pid: 63156); path: /model/rep-1.pkl; data: 0.827; prediction: 1.650
--
handle name : rep-2
prediction  : (pid: 63159); path: /model/rep-2.pkl; data: 0.605; prediction: 1.428
--
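Exercise 3 asks whether requests reach different replicas. A minimal sketch that groups the ServeHandle replies above by replica PID, assuming the reply format produced by `Deployment.__call__` and model paths of the form `/model/<deployment-name>.pkl`, as in this notebook; `pids_per_deployment` is a hypothetical helper.

```python
import re
from collections import defaultdict
from typing import Dict, Iterable, Set

# Matches the "(pid: 63160); path: /model/rep-1.pkl" prefix of each reply.
REPLY_RE = re.compile(r"\(pid: (\d+)\); path: /model/([\w-]+)\.pkl")


def pids_per_deployment(replies: Iterable[str]) -> Dict[str, Set[int]]:
    """Group the replica PIDs seen in replies by deployment name."""
    seen: Dict[str, Set[int]] = defaultdict(set)
    for reply in replies:
        match = REPLY_RE.search(reply)
        if match:
            pid, name = int(match.group(1)), match.group(2)
            seen[name].add(pid)
    return dict(seen)


# Replies copied from the ServeHandle output above.
replies = [
    "(pid: 63160); path: /model/rep-1.pkl; data: 0.517; prediction: 1.339",
    "(pid: 63150); path: /model/rep-2.pkl; data: 0.470; prediction: 0.470",
    "(pid: 63156); path: /model/rep-1.pkl; data: 0.827; prediction: 1.650",
    "(pid: 63159); path: /model/rep-2.pkl; data: 0.605; prediction: 1.428",
]
print({name: sorted(pids) for name, pids in pids_per_deployment(replies).items()})
# {'rep-1': [63156, 63160], 'rep-2': [63150, 63159]}
```

With these four replies, each deployment shows two distinct PIDs, consistent with `num_replicas=2`.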


### Method 2: Access each deployment via HTTP requests

In [8]:
for _ in range(2):
    for d_name in ["rep-1", "rep-2"]:
        # Send an HTTP GET request with `data` as a query parameter
        url = f"http://127.0.0.1:8000/{d_name}"
        print(f"handle name : {d_name}")
        print(f"prediction  : {requests.get(url, params={'data': random()}).text}")

handle name : rep-1
prediction  : (pid: 63156); path: /model/rep-1.pkl; data: 0.320; prediction: 0.320
handle name : rep-2
prediction  : (pid: 63159); path: /model/rep-2.pkl; data: 0.974; prediction: 1.513
handle name : rep-1
prediction  : (pid: 63160); path: /model/rep-1.pkl; data: 0.853; prediction: 1.393
handle name : rep-2
prediction  : (pid: 63150); path: /model/rep-2.pkl; data: 0.133; prediction: 0.133
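`requests.get(url, params={'data': ...})` appends the value to the URL as a query string, and Starlette's `query_params` hands it back as a string, which is why `__call__` converts it with `float()`. A minimal sketch of that round trip using only `urllib.parse`, assuming the proxy address `127.0.0.1:8000` from the startup log and the `/rep-1` route prefix reported by `serve.list_deployments()`; `build_url` and `query_value` are hypothetical helpers, and `parse_qs` only approximates Starlette's parsing.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

# Proxy address taken from the HTTP proxy log line when Serve started.
BASE_URL = "http://127.0.0.1:8000"


def build_url(route_prefix: str, data: float) -> str:
    """Build a URL like the one requests.get(url, params={'data': data}) sends."""
    return f"{BASE_URL}{route_prefix}?{urlencode({'data': data})}"


def query_value(url: str) -> str:
    """Return the 'data' query parameter as the server would see it."""
    return parse_qs(urlsplit(url).query)["data"][0]


url = build_url("/rep-1", 0.5)
print(url)               # http://127.0.0.1:8000/rep-1?data=0.5
print(query_value(url))  # 0.5 (a str, hence float(data) in __call__)
```

Building the URL from a deployment's route prefix, rather than from its name, would keep working if a deployment were given a custom `route_prefix`.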


### Exercises

Here are some things you can try:

1. For each method, send ten requests.
2. Increase the number of replicas.
3. Do requests get sent to different replicas? (Check the PIDs.)
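For exercises 1 and 3, a small harness can send ten requests per deployment and count how many land on each replica. A minimal sketch; `run_requests` and `fake_send` are hypothetical, and `fake_send` only imitates two replicas so the harness runs without a Ray cluster.

```python
import re
from collections import Counter
from itertools import cycle
from random import random
from typing import Callable, Dict, Iterable

# Matches the "(pid: <pid>)" prefix that Deployment.__call__ puts on each reply.
PID_RE = re.compile(r"\(pid: (\d+)\)")


def run_requests(
    send: Callable[[str, float], str],
    names: Iterable[str] = ("rep-1", "rep-2"),
    n: int = 10,
) -> Dict[str, Counter]:
    """Send n requests to each deployment; count replies per replica PID.

    `send(name, data)` must return a reply string that starts with
    "(pid: <pid>);", as Deployment.__call__ does.
    """
    counts: Dict[str, Counter] = {}
    for name in names:
        counter: Counter = Counter()
        for _ in range(n):
            match = PID_RE.match(send(name, random()))
            if match:
                counter[int(match.group(1))] += 1
        counts[name] = counter
    return counts


# Hypothetical sender that fakes two replicas by cycling through two PIDs.
_fake_pids = cycle([1001, 1002])


def fake_send(name: str, data: float) -> str:
    return f"(pid: {next(_fake_pids)}); path: /model/{name}.pkl; data: {data:.3f}"


print(run_requests(fake_send, n=10))
```

In the notebook, `send` could wrap either access method, for example `lambda name, x: ray.get(serve.get_deployment(name).get_handle().remote(x))` for Method 1 or `lambda name, x: requests.get(f"http://127.0.0.1:8000/{name}", params={"data": x}).text` for Method 2; both mirror calls already made above.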