# Ray Serve - Inference Graphs APIs

© 2019-2022, Anyscale. All Rights Reserved

### Learning Objective:
In this introductory tutorial, you will:

* construct a simple model composition inference graph pipeline
* utilize inference graph APIs to create a single deployment
* and score an inference graph end-to-end

This tutorial takes a simple model composition example built with the ServeHandle APIs and converts it into
an equivalent composition built with the inference graph APIs. The original ServeHandle example is [here](https://docs.ray.io/en/latest/serve/ml-models.html#id3).

<img src="../images/model_composition_inference_graph.png" width="50%" height="25%">

import time
import asyncio
import requests
import starlette

from random import random

import ray
from ray import serve
from ray.experimental.dag.input_node import InputNode
from ray.serve.drivers import DAGDriver
from ray.serve.http_adapters import json_request

### Step 1: Build processor nodes.

In [66]:
@serve.deployment
async def preprocess(input_data):
    """Simple feature processing: scale the input by a random float in [0, 1)."""
    await asyncio.sleep(0.15)  # Simulated delay standing in for real computation
    return random() * input_data
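Because `preprocess` awaits `asyncio.sleep` instead of blocking the thread, several in-flight calls can overlap on one event loop. The sketch below illustrates this with plain `asyncio` and no Ray Serve involved; `fake_preprocess` and the `events` list are hypothetical names used only for illustration.

```python
import asyncio
from random import random

events = []  # records the order in which calls start and finish


async def fake_preprocess(call_id, input_data, delay=0.05):
    """Stand-in for the deployment body: sleep, then scale the input."""
    events.append(("start", call_id))
    await asyncio.sleep(delay)  # yields control so other calls can start
    events.append(("end", call_id))
    return random() * input_data


async def main():
    # Launch three calls at once; all of them start before any finishes.
    return await asyncio.gather(*(fake_preprocess(i, i) for i in range(3)))


results = asyncio.run(main())
print(events)
```

All three `start` events are recorded before the first `end` event, which is the behavior a blocking `time.sleep` would not give.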

### Step 2: Model nodes

In [67]:
@serve.deployment
class ModelOne:
    def __init__(self, input):
        # The score is fixed at construction: a random weight times the bound value.
        self.weight = random()
        self.result = self.weight * input
        
    async def forward(self, input: int):
        await asyncio.sleep(0.3) # Manual delay for blocking computation
        print(f"Model 1 called with data:{input}: result: {self.result}")
        return self.result

In [68]:
@serve.deployment
class ModelTwo:
    def __init__(self, input):
        # The score is fixed at construction: a random weight times the bound value.
        self.weight = random()
        self.result = self.weight * input
        
    async def forward(self, input: int):
        await asyncio.sleep(0.3) # Manual delay for blocking computation
        print(f"Model 2 called with data:{input}: result: {self.result}")
        return self.result
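Note that both models compute `self.result` once in `__init__` and return it unchanged from `forward`, so the score does not depend on the request data. The following plain-Python sketch (no `@serve.deployment`; `FixedScoreModel` is a hypothetical stand-in, not part of the tutorial) shows the consequence.

```python
from random import random


class FixedScoreModel:
    """Plain-Python stand-in for ModelOne/ModelTwo without Ray Serve."""

    def __init__(self, init_value):
        self.weight = random()
        # Computed once; later requests do not change it.
        self.result = self.weight * init_value

    def forward(self, request_value):
        # request_value is only logged by the tutorial's models, not used in the score.
        return self.result


model = FixedScoreModel(1)
scores = [model.forward(x) for x in (0.0, 0.6, 2.4)]
print(scores)
```

This is why every request in the logged output later in the tutorial reports the same score.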

### Step 3: Build our Combiner, which aggregates model results based on user input

In [69]:
@serve.deployment
class Combiner:
    def __init__(self, m1:ModelOne, m2:ModelTwo):
        self.m1 = m1
        self.m2 = m2
        
    async def run(self, req_part):
        # Submit to model-1 for inference
        rst = self.m1.forward.remote(req_part)

        # Async gathering of model forward results for request data
        score = await asyncio.gather(rst)
        if score[0] >= 0.5:
            rst = self.m2.forward.remote(req_part)
            await asyncio.gather(rst)
            result = {"model_used: 1 & 2;  score": score}
        else:
            result = {"model_used: 1 ; score": score}
            
        return result 
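The routing logic in `Combiner.run` can be exercised without Ray by swapping the model handles for async stubs. This is a minimal sketch under that assumption: `run_combiner`, `make_stub`, and the simplified result keys are hypothetical and differ from the tutorial's dictionary keys.

```python
import asyncio


async def run_combiner(model_one, model_two, req_part, threshold=0.5):
    """Call model_one; call model_two only when the first score passes the threshold."""
    score = await model_one(req_part)
    if score >= threshold:
        # The second result is awaited but not returned, as in Combiner.run.
        await model_two(req_part)
        return {"models_used": [1, 2], "score": score}
    return {"models_used": [1], "score": score}


def make_stub(fixed_score, calls, name):
    """Build a hypothetical async model that records each call and returns a fixed score."""
    async def stub(req_part):
        calls.append(name)
        await asyncio.sleep(0)
        return fixed_score
    return stub


calls = []
high = asyncio.run(
    run_combiner(make_stub(0.58, calls, "m1"), make_stub(1.15, calls, "m2"), 0.6)
)
low = asyncio.run(
    run_combiner(make_stub(0.2, calls, "m1"), make_stub(1.15, calls, "m2"), 0.6)
)
print(high, low, calls)
```

The second model is reached only on the high-score path, so the low-score request costs one model call instead of two.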

### Step 4: Build our InputNode and driver deployment to handle HTTP ingress

In [70]:
with InputNode() as dag_input:
    
    # create a preprocessor
    pre_prop = preprocess.bind(dag_input[0])
    
    # create two models nodes
    model_1 = ModelOne.bind(1)
    model_2 = ModelTwo.bind(2)
    
    # create the combiner
    combiner = Combiner.bind(model_1, model_2)
    
    # Use output of function DeploymentNode in bind()
    dag = combiner.run.bind(pre_prop)
    
    # Each serve dag has a driver deployment as ingress that can be user provided.
    serve_dag = DAGDriver.options(route_prefix="/my-dag", num_replicas=2).bind(
        dag, http_adapter=json_request)
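As a rough sanity check on the graph above, the simulated delays in the nodes can be added up along the request path. This is back-of-the-envelope arithmetic using only the sleep durations from the code (150 ms, 300 ms, 300 ms); it ignores HTTP, scheduling, and serialization overhead, and the names are illustrative.

```python
# Simulated delays taken from the asyncio.sleep calls in the nodes (milliseconds).
PREPROCESS_MS = 150
MODEL_ONE_MS = 300
MODEL_TWO_MS = 300


def estimated_latency_ms(uses_model_two):
    """Sum the sleeps along the path; the Combiner awaits model one before model two."""
    total = PREPROCESS_MS + MODEL_ONE_MS
    if uses_model_two:
        total += MODEL_TWO_MS
    return total


both = estimated_latency_ms(True)
only_one = estimated_latency_ms(False)
five_sequential = 5 * both
print(both, only_one, five_sequential)
```

The logged run in this tutorial reports roughly 780 to 830 ms per request and a wall time of 4.07 s for five sequential requests, which is consistent with this estimate plus overhead.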

### Step 5: Test the full DAG in both Python and HTTP

In [71]:
dag_handle = serve.run(serve_dag)

(ServeController pid=97331) INFO 2022-07-06 14:06:28,040 controller 97331 checkpoint_path.py:17 - Using RayInternalKVStore for controller checkpoint and recovery.
(ServeController pid=97331) INFO 2022-07-06 14:06:28,145 controller 97331 http_state.py:112 - Starting HTTP proxy with name 'SERVE_CONTROLLER_ACTOR:SERVE_PROXY_ACTOR-node:127.0.0.1-0' on node 'node:127.0.0.1-0' listening on '127.0.0.1:8000'
(ServeController pid=97331) INFO 2022-07-06 14:06:28,669 controller 97331 deployment_state.py:1216 - Adding 1 replicas to deployment 'preprocess'.
(ServeController pid=97331) INFO 2022-07-06 14:06:28,675 controller 97331 deployment_state.py:1216 - Adding 1 replicas to deployment 'ModelOne'.
(ServeController pid=97331) INFO 2022-07-06 14:06:28,680 controller 97331 deployment_state.py:1216 - Adding 1 replicas to deployment 'ModelTwo'.
(ServeController pid=97331) INFO 2022-07-06 14:06:28,686 controller 97331 deploym

### Use HTTP endpoint

Send five HTTP POST requests, each carrying a single-element JSON list.
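Each response body is a small JSON object whose single key names the models used and whose value is a one-element list holding the score. The helper below is a hypothetical sketch for pulling those two pieces apart; `parse_response` is not part of Ray Serve, and the sample body is copied from this tutorial's logged output.

```python
import json

# Sample body copied from the tutorial's output; the key embeds the models used.
sample_body = '{"model_used: 1 & 2;  score":[0.5773534453760967]}'


def parse_response(body):
    """Split the single key into the model list and return it with the score."""
    payload = json.loads(body)
    (key, scores), = payload.items()
    models_part = key.split(";")[0].replace("model_used:", "")
    models = [int(m) for m in models_part.split("&")]
    return models, scores[0]


models, score = parse_response(sample_body)
print(models, score)
```

Returning a structured payload from the Combiner (for example, separate keys for the models and the score) would make this kind of string parsing unnecessary.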

In [72]:
%%time
for i in range(5):
    print(requests.post("http://127.0.0.1:8000/my-dag", json=[i]).text)

(HTTPProxyActor pid=97333) INFO 2022-07-06 14:06:47,593 http_proxy 127.0.0.1 http_proxy.py:310 - POST /my-dag 307 6.1ms
(DAGDriver pid=97339) INFO 2022-07-06 14:06:47,592 DAGDriver DAGDriver#Haucbp replica.py:478 - HANDLE __call__ OK 0.3ms
(DAGDriver pid=97340) You are retrieving a sync handle inside an asyncio loop. Try getting client.get_handle(.., sync=False) to get better performance. Learn more at https://docs.ray.io/en/master/serve/http-servehandle.html#sync-and-async-handles
(DAGDriver pid=97340) You are retrieving a sync handle inside an asyncio loop. Try getting client.get_handle(.., sync=False) to get better performance. Learn more at https://docs.ray.io/en/master/serve/http-servehandle.html#sync-and-async-handles
(preprocess pid=97335) INFO 2022-07-06 14:06:47,761 preprocess preprocess#iGqPMH replica.py:478 - HANDLE __call__ OK 151.7ms
(Combiner pid=97338) You are retrieving a sync handle inside an

(ModelOne pid=97336) Model 1 called with data:0.0: result: 0.5773534453760967
{"model_used: 1 & 2;  score":[0.5773534453760967]}
(ModelTwo pid=97337) Model 2 called with data:0.0: result: 1.1547068907521933


(HTTPProxyActor pid=97333) INFO 2022-07-06 14:06:48,425 http_proxy 127.0.0.1 http_proxy.py:310 - POST /my-dag 200 830.7ms
(HTTPProxyActor pid=97333) INFO 2022-07-06 14:06:48,441 http_proxy 127.0.0.1 http_proxy.py:310 - POST /my-dag 307 6.2ms
(ModelTwo pid=97337) INFO 2022-07-06 14:06:48,411 ModelTwo ModelTwo#zCWcFH replica.py:478 - HANDLE forward OK 302.2ms
(DAGDriver pid=97339) INFO 2022-07-06 14:06:48,439 DAGDriver DAGDriver#Haucbp replica.py:478 - HANDLE __call__ OK 0.5ms
(Combiner pid=97338) INFO 2022-07-06 14:06:48,416 Combiner Combiner#EVWrVn replica.py:478 - HANDLE run OK 645.6ms
(DAGDriver pid=97340) INFO 2022-07-06 14:06:48,420 DAGDriver DAGDriver#YQhOiE replica.py:478 - HANDLE __call__ OK 823.9ms
(preprocess pid=97335) INFO 2022-07-06 14:06:48,605 preprocess preprocess#iGqPMH replica.py:478 - HANDLE __call__ OK 151.5ms
(ModelOne pid=97336) INFO 2022-07-06 14:06:48,930 Model

(ModelOne pid=97336) Model 1 called with data:0.6082014745761107: result: 0.5773534453760967
{"model_used: 1 & 2;  score":[0.5773534453760967]}
(ModelTwo pid=97337) Model 2 called with data:0.6082014745761107: result: 1.1547068907521933


(HTTPProxyActor pid=97333) INFO 2022-07-06 14:06:49,248 http_proxy 127.0.0.1 http_proxy.py:310 - POST /my-dag 200 804.2ms
(HTTPProxyActor pid=97333) INFO 2022-07-06 14:06:49,254 http_proxy 127.0.0.1 http_proxy.py:310 - POST /my-dag 307 2.6ms
(ModelTwo pid=97337) INFO 2022-07-06 14:06:49,244 ModelTwo ModelTwo#zCWcFH replica.py:478 - HANDLE forward OK 300.7ms
(DAGDriver pid=97339) INFO 2022-07-06 14:06:49,253 DAGDriver DAGDriver#Haucbp replica.py:478 - HANDLE __call__ OK 0.2ms
(Combiner pid=97338) INFO 2022-07-06 14:06:49,246 Combiner Combiner#EVWrVn replica.py:478 - HANDLE run OK 627.3ms
(DAGDriver pid=97340) INFO 2022-07-06 14:06:49,247 DAGDriver DAGDriver#YQhOiE replica.py:478 - HANDLE __call__ OK 801.3ms
(preprocess pid=97335) INFO 2022-07-06 14:06:49,412 preprocess preprocess#iGqPMH replica.py:478 - HANDLE __call__ OK 151.2ms
(ModelOne pid=97336) INFO 2022-07-06 14:06:49,717 Model

(ModelOne pid=97336) Model 1 called with data:1.1953268734127345: result: 0.5773534453760967
{"model_used: 1 & 2;  score":[0.5773534453760967]}
(ModelTwo pid=97337) Model 2 called with data:1.1953268734127345: result: 1.1547068907521933


(HTTPProxyActor pid=97333) INFO 2022-07-06 14:06:50,040 http_proxy 127.0.0.1 http_proxy.py:310 - POST /my-dag 200 784.9ms
(ModelTwo pid=97337) INFO 2022-07-06 14:06:50,028 ModelTwo ModelTwo#zCWcFH replica.py:478 - HANDLE forward OK 303.3ms
(Combiner pid=97338) INFO 2022-07-06 14:06:50,034 Combiner Combiner#EVWrVn replica.py:478 - HANDLE run OK 620.7ms
(DAGDriver pid=97340) INFO 2022-07-06 14:06:50,036 DAGDriver DAGDriver#YQhOiE replica.py:478 - HANDLE __call__ OK 780.0ms
(HTTPProxyActor pid=97333) INFO 2022-07-06 14:06:50,050 http_proxy 127.0.0.1 http_proxy.py:310 - POST /my-dag 307 4.2ms
(DAGDriver pid=97339) INFO 2022-07-06 14:06:50,048 DAGDriver DAGDriver#Haucbp replica.py:478 - HANDLE __call__ OK 0.4ms
(preprocess pid=97335) INFO 2022-07-06 14:06:50,210 preprocess preprocess#iGqPMH replica.py:478 - HANDLE __call__ OK 151.3ms
(ModelOne pid=97336) INFO 2022-07-06 14:06:50,518 Model

(ModelOne pid=97336) Model 1 called with data:1.1364933874902587: result: 0.5773534453760967
{"model_used: 1 & 2;  score":[0.5773534453760967]}
(ModelTwo pid=97337) Model 2 called with data:1.1364933874902587: result: 1.1547068907521933


(HTTPProxyActor pid=97333) INFO 2022-07-06 14:06:50,831 http_proxy 127.0.0.1 http_proxy.py:310 - POST /my-dag 200 778.8ms
(HTTPProxyActor pid=97333) INFO 2022-07-06 14:06:50,838 http_proxy 127.0.0.1 http_proxy.py:310 - POST /my-dag 307 2.3ms
(ModelTwo pid=97337) INFO 2022-07-06 14:06:50,827 ModelTwo ModelTwo#zCWcFH replica.py:478 - HANDLE forward OK 301.4ms
(DAGDriver pid=97339) INFO 2022-07-06 14:06:50,837 DAGDriver DAGDriver#Haucbp replica.py:478 - HANDLE __call__ OK 0.2ms
(Combiner pid=97338) INFO 2022-07-06 14:06:50,828 Combiner Combiner#EVWrVn replica.py:478 - HANDLE run OK 616.1ms
(DAGDriver pid=97340) INFO 2022-07-06 14:06:50,830 DAGDriver DAGDriver#YQhOiE replica.py:478 - HANDLE __call__ OK 775.7ms
(preprocess pid=97335) INFO 2022-07-06 14:06:50,997 preprocess preprocess#iGqPMH replica.py:478 - HANDLE __call__ OK 151.6ms
(ModelOne pid=97336) INFO 2022-07-06 14:06:51,321 Model

(ModelOne pid=97336) Model 1 called with data:2.4201230515254943: result: 0.5773534453760967
{"model_used: 1 & 2;  score":[0.5773534453760967]}
CPU times: user 92.9 ms, sys: 35.8 ms, total: 129 ms
Wall time: 4.07 s
(ModelTwo pid=97337) Model 2 called with data:2.4201230515254943: result: 1.1547068907521933


(HTTPProxyActor pid=97333) INFO 2022-07-06 14:06:51,641 http_proxy 127.0.0.1 http_proxy.py:310 - POST /my-dag 200 801.2ms
(ModelTwo pid=97337) INFO 2022-07-06 14:06:51,635 ModelTwo ModelTwo#zCWcFH replica.py:478 - HANDLE forward OK 300.9ms
(Combiner pid=97338) INFO 2022-07-06 14:06:51,637 Combiner Combiner#EVWrVn replica.py:478 - HANDLE run OK 632.0ms
(DAGDriver pid=97340) INFO 2022-07-06 14:06:51,638 DAGDriver DAGDriver#YQhOiE replica.py:478 - HANDLE __call__ OK 797.3ms


In [73]:
serve.shutdown()

(ServeController pid=97331) INFO 2022-07-06 14:07:01,050 controller 97331 deployment_state.py:1240 - Removing 1 replicas from deployment 'preprocess'.
(ServeController pid=97331) INFO 2022-07-06 14:07:01,052 controller 97331 deployment_state.py:1240 - Removing 1 replicas from deployment 'ModelOne'.
(ServeController pid=97331) INFO 2022-07-06 14:07:01,053 controller 97331 deployment_state.py:1240 - Removing 1 replicas from deployment 'ModelTwo'.
(ServeController pid=97331) INFO 2022-07-06 14:07:01,056 controller 97331 deployment_state.py:1240 - Removing 1 replicas from deployment 'Combiner'.
(ServeController pid=97331) INFO 2022-07-06 14:07:01,057 controller 97331 deployment_state.py:1240 - Removing 2 replicas from deployment 'DAGDriver'.
