# Ray Serve - Inference Graphs APIs

© 2019-2022, Anyscale. All Rights Reserved

### Learning Objectives:
In this introductory tutorial, you will:

* construct a simple inference graph pipeline
* use the inference graph APIs to deploy the whole graph with a single `serve.run` call
* score the inference graph end-to-end, from Python and over HTTP

<img src="../images/simple_inference_graph.png" width="50%" height="25%">

```python
import time
import asyncio
import requests
import starlette

import ray
from ray import serve
from ray.experimental.dag.input_node import InputNode
from ray.serve.drivers import DAGDriver
from ray.serve.http_adapters import json_request
```

### Step 1: Build the preprocessor node

```python
@serve.deployment
async def avg_preprocessor(input_data):
    """Simple feature processing: return the average of the input list as a float."""
    await asyncio.sleep(0.15)  # Simulated delay standing in for real preprocessing work
    return sum(input_data) / len(input_data)
```
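Because `@serve.deployment` wraps the function, it is easy to forget that its body is ordinary async Python. A minimal sketch, assuming you copy the body into an undecorated coroutine (the name `avg_preprocessor_local` is hypothetical), lets you check the preprocessing logic without starting Ray:

```python
import asyncio

# Hypothetical undecorated copy of avg_preprocessor, runnable without Ray Serve
async def avg_preprocessor_local(input_data):
    await asyncio.sleep(0.15)  # same simulated delay as the deployment
    return sum(input_data) / len(input_data)

avg_result = asyncio.run(avg_preprocessor_local([1, 2]))
print(avg_result)  # 1.5
```

Note that an empty list would raise `ZeroDivisionError`; the deployment above does not guard against that.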

### Step 2: Build the model node

```python
@serve.deployment
class Model:
    def __init__(self, weight: int):
        self.weight = weight

    async def forward(self, input_data: str) -> str:
        await asyncio.sleep(0.3)  # Simulated delay standing in for model inference
        # Return a symbolic expression instead of a real prediction
        return f"({self.weight} * {input_data})"
```
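In Step 4, `Model.bind(1)` supplies the constructor argument, so `weight` is fixed when the deployment starts and every `forward` call reuses it. The sketch below uses a hypothetical undecorated class, `ModelLocal`, to show what `forward` returns for a given weight and input string:

```python
import asyncio

class ModelLocal:
    """Hypothetical undecorated copy of Model, for local testing without Ray Serve."""

    def __init__(self, weight: int):
        self.weight = weight  # plays the role of the argument passed to Model.bind()

    async def forward(self, input_data: str) -> str:
        await asyncio.sleep(0.3)  # same simulated delay as the deployment
        return f"({self.weight} * {input_data})"

out_w1 = asyncio.run(ModelLocal(1).forward("(1.5)"))
out_w3 = asyncio.run(ModelLocal(3).forward("x"))
print(out_w1)  # (1 * (1.5))
print(out_w3)  # (3 * x)
```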

### Step 3: Build the Combiner, which aggregates model output based on the user-specified operation

```python
@serve.deployment
class Combiner:
    def __init__(self, m):
        # `m` is a handle to the Model deployment, injected by Combiner.bind(m1) in Step 4
        self.m = m

    async def run(self, req_part, operation):
        # Format the preprocessor output as the model input
        req = f"({req_part}"

        # Submit the request to the model; .remote() returns a reference to the pending result
        r1_ref = self.m.forward.remote(req)

        # Await the model result asynchronously; gather() returns a list of results
        rst = await asyncio.gather(r1_ref)

        # Control flow that determines the output based on user input
        if operation == "sum":
            return f"sum({rst})"
        else:
            return f"max({rst})"
```
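The Combiner's control flow can be exercised locally by swapping the Serve handle for a plain object. A minimal sketch, assuming a hypothetical `StubModel` in place of the Model handle and a hypothetical `CombinerLocal` mirroring `run`; it keeps the tutorial's string formatting exactly (including the single opening parenthesis), so it reproduces the output shown in Step 5:

```python
import asyncio

class StubModel:
    """Hypothetical in-process stand-in for the Model deployment handle."""

    def __init__(self, weight: int):
        self.weight = weight

    async def forward(self, input_data: str) -> str:
        return f"({self.weight} * {input_data})"

class CombinerLocal:
    """Hypothetical undecorated mirror of Combiner.run's logic."""

    def __init__(self, m: StubModel):
        self.m = m

    async def run(self, req_part, operation):
        req = f"({req_part}"  # same formatting as the tutorial: a single "("
        # gather() over one awaitable still returns a one-element list
        rst = await asyncio.gather(self.m.forward(req))
        if operation == "sum":
            return f"sum({rst})"
        return f"max({rst})"  # any operation other than "sum" takes this branch

combiner_local = CombinerLocal(StubModel(1))
out_sum = asyncio.run(combiner_local.run(1.5, "sum"))
out_max = asyncio.run(combiner_local.run(1.5, "max"))
print(out_sum)  # sum(['(1 * (1.5)'])
print(out_max)  # max(['(1 * (1.5)'])
```

Because the result is a list rendered with `str()`, the output contains Python list syntax rather than a computed number; this is a symbolic toy, not real aggregation.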

### Step 4: Build the InputNode and the driver deployment that handles HTTP ingress

```python
# Build the DAG
with InputNode() as dag_input:
    # Access part of the user input by index: dag_input[0] is the list of numbers
    preprocessed_2 = avg_preprocessor.bind(dag_input[0])

    # Create a Model node with weight 1
    m1 = Model.bind(1)

    # Pass another DeploymentNode (the Model) into Combiner's constructor
    combiner = Combiner.bind(m1)

    # Pass the preprocessor's output and dag_input[1] (the operation) to Combiner.run
    dag = combiner.run.bind(preprocessed_2, dag_input[1])

    # Every Serve DAG has a driver deployment as its ingress; it can be user-provided.
    # Here DAGDriver serves the DAG at /my-dag, and json_request parses the HTTP body as JSON.
    serve_dag = DAGDriver.options(route_prefix="/my-dag", num_replicas=2).bind(
        dag, http_adapter=json_request
    )
```
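The bound graph wires the request `[[1, 2], "sum"]` as follows: `dag_input[0]` feeds the preprocessor, and its output plus `dag_input[1]` feed `Combiner.run`, which calls the model. The plain-asyncio sketch below mirrors that wiring without Ray; all function names are hypothetical stand-ins for the deployments. Because the Combiner consumes the preprocessor's output, the two simulated delays add up to about 0.45 s, which is consistent with the roughly 500 ms wall time reported in Step 5:

```python
import asyncio
import time

# Hypothetical plain-asyncio stand-ins for the three deployments (no Ray involved)
async def preprocess(input_data):
    await asyncio.sleep(0.15)
    return sum(input_data) / len(input_data)

async def model_forward(weight, input_data):
    await asyncio.sleep(0.3)
    return f"({weight} * {input_data})"

async def combiner_run(req_part, operation, weight=1):
    req = f"({req_part}"
    rst = await asyncio.gather(model_forward(weight, req))
    return f"sum({rst})" if operation == "sum" else f"max({rst})"

async def run_graph_locally(request):
    # request plays the role of InputNode: request[0] goes to the preprocessor,
    # request[1] becomes the operation argument of Combiner.run
    preprocessed = await preprocess(request[0])
    return await combiner_run(preprocessed, request[1])

start = time.perf_counter()
graph_out = asyncio.run(run_graph_locally([[1, 2], "sum"]))
elapsed = time.perf_counter() - start
print(graph_out)           # sum(['(1 * (1.5)'])
print(f"{elapsed:.2f} s")  # about 0.45 s, since the two delays run back to back
```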

### Step 5: Test the full DAG from Python and over HTTP

```python
dag_handle = serve.run(serve_dag)
```

```text
(ServeController pid=99734) INFO 2022-07-06 14:31:44,653 controller 99734 checkpoint_path.py:17 - Using RayInternalKVStore for controller checkpoint and recovery.
(ServeController pid=99734) INFO 2022-07-06 14:31:44,758 controller 99734 http_state.py:112 - Starting HTTP proxy with name 'SERVE_CONTROLLER_ACTOR:SERVE_PROXY_ACTOR-node:127.0.0.1-0' on node 'node:127.0.0.1-0' listening on '127.0.0.1:8000'
(ServeController pid=99734) INFO 2022-07-06 14:31:45,282 controller 99734 deployment_state.py:1216 - Adding 1 replicas to deployment 'avg_preprocessor'.
(ServeController pid=99734) INFO 2022-07-06 14:31:45,288 controller 99734 deployment_state.py:1216 - Adding 1 replicas to deployment 'Model'.
(ServeController pid=99734) INFO 2022-07-06 14:31:45,293 controller 99734 deployment_state.py:1216 - Adding 1 replicas to deployment 'Combiner'.
(HTTPProxyActor pid=99736) INFO:     Started server process [99736]
```

### Use Python API

```python
%%time
ray.get(dag_handle.predict.remote([[1, 2], "sum"]))
```

```text
(DAGDriver pid=99742) You are retrieving a sync handle inside an asyncio loop. Try getting client.get_handle(.., sync=False) to get better performance. Learn more at https://docs.ray.io/en/master/serve/http-servehandle.html#sync-and-async-handles
(DAGDriver pid=99742) You are retrieving a sync handle inside an asyncio loop. Try getting client.get_handle(.., sync=False) to get better performance. Learn more at https://docs.ray.io/en/master/serve/http-servehandle.html#sync-and-async-handles
(Combiner pid=99740) You are retrieving a sync handle inside an asyncio loop. Try getting client.get_handle(.., sync=False) to get better performance. Learn more at https://docs.ray.io/en/master/serve/http-servehandle.html#sync-and-async-handles
(avg_preprocessor pid=99738) INFO 2022-07-06 14:31:49,202 avg_preprocessor avg_preprocessor#FZuVUH replica.py:478 - HANDLE __call__ OK 151.6ms

CPU times: user 7.75 ms, sys: 5.09 ms, total: 12.8 ms
Wall time: 512 ms

"sum(['(1 * (1.5)'])"

(Combiner pid=99740) INFO 2022-07-06 14:31:49,533 Combiner Combiner#lYkSfJ replica.py:478 - HANDLE run OK 324.7ms
(Model pid=99739) INFO 2022-07-06 14:31:49,529 Model Model#IgYGEr replica.py:478 - HANDLE forward OK 301.8ms
(DAGDriver pid=99742) INFO 2022-07-06 14:31:49,536 DAGDriver DAGDriver#wqpEFr replica.py:478 - HANDLE predict OK 505.5ms
```


### Use HTTP endpoint

```python
%%time
print(requests.post("http://127.0.0.1:8000/my-dag", json=[[1, 2], "sum"]).text)
```

```text
(HTTPProxyActor pid=99736) INFO 2022-07-06 14:31:51,742 http_proxy 127.0.0.1 http_proxy.py:310 - POST /my-dag 307 4.7ms
(DAGDriver pid=99742) INFO 2022-07-06 14:31:51,741 DAGDriver DAGDriver#wqpEFr replica.py:478 - HANDLE __call__ OK 0.4ms
(DAGDriver pid=99741) You are retrieving a sync handle inside an asyncio loop. Try getting client.get_handle(.., sync=False) to get better performance. Learn more at https://docs.ray.io/en/master/serve/http-servehandle.html#sync-and-async-handles
(DAGDriver pid=99741) You are retrieving a sync handle inside an asyncio loop. Try getting client.get_handle(.., sync=False) to get better performance. Learn more at https://docs.ray.io/en/master/serve/http-servehandle.html#sync-and-async-handles
(avg_preprocessor pid=99738) INFO 2022-07-06 14:31:51,914 avg_preprocessor avg_preprocessor#FZuVUH replica.py:478 - HANDLE __call__ OK 151.1ms

"sum(['(1 * (1.5)'])"
CPU times: user 11.7 ms, sys: 5.36 ms, total: 17 ms
Wall time: 495 ms

(HTTPProxyActor pid=99736) INFO 2022-07-06 14:31:52,225 http_proxy 127.0.0.1 http_proxy.py:310 - POST /my-dag 200 480.9ms
(Combiner pid=99740) INFO 2022-07-06 14:31:52,222 Combiner Combiner#lYkSfJ replica.py:478 - HANDLE run OK 305.6ms
(Model pid=99739) INFO 2022-07-06 14:31:52,221 Model Model#IgYGEr replica.py:478 - HANDLE forward OK 301.3ms
(DAGDriver pid=99741) INFO 2022-07-06 14:31:52,223 DAGDriver DAGDriver#IvglWN replica.py:478 - HANDLE __call__ OK 476.8ms
```


```python
serve.shutdown()
```

```text
(ServeController pid=99734) INFO 2022-07-06 14:31:53,678 controller 99734 deployment_state.py:1240 - Removing 1 replicas from deployment 'avg_preprocessor'.
(ServeController pid=99734) INFO 2022-07-06 14:31:53,680 controller 99734 deployment_state.py:1240 - Removing 1 replicas from deployment 'Model'.
(ServeController pid=99734) INFO 2022-07-06 14:31:53,682 controller 99734 deployment_state.py:1240 - Removing 1 replicas from deployment 'Combiner'.
(ServeController pid=99734) INFO 2022-07-06 14:31:53,683 controller 99734 deployment_state.py:1240 - Removing 2 replicas from deployment 'DAGDriver'.
```
