# Ray Serve - Model Composition

© 2019-2022, Anyscale. All Rights Reserved

![Anyscale Academy](../images/AnyscaleAcademyLogo.png)

Ray Serve supports composing individually scalable models into a single model out of the box. For instance, you can combine multiple models to perform stacking or ensembles.

To define a higher-level composed model you need to do three things:

 1. Define your underlying models (the ones that you will compose together) as Ray Serve deployments.

 2. Define your composed model, using the handles of the underlying models.

 3. Define a deployment representing this composed model and query it!

To avoid synchronous execution in the composed model, which makes calls to it very slow, make its request handler asynchronous by defining it with `async def`.
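To see why asynchronous execution matters, here is a minimal sketch using only the standard `asyncio` module, with no Ray involved. The names `slow_model`, `handle_sequentially`, `handle_concurrently`, and `timed` are hypothetical and exist only for this illustration; the 0.1 second delay is an assumed stand-in for a model call.

```python
import asyncio
import time

async def slow_model(name, delay=0.1):
    # Hypothetical stand-in for a model call; asyncio.sleep yields to the event loop.
    await asyncio.sleep(delay)
    return f"{name} done"

async def handle_sequentially(n):
    # Each call starts only after the previous one has finished.
    return [await slow_model(f"request-{i}") for i in range(n)]

async def handle_concurrently(n):
    # All calls are in flight at once, so the waits overlap.
    return await asyncio.gather(*(slow_model(f"request-{i}") for i in range(n)))

def timed(coro):
    # Run a coroutine to completion and report how long it took.
    start = time.perf_counter()
    result = asyncio.run(coro)
    return result, time.perf_counter() - start

sequential_results, sequential_time = timed(handle_sequentially(5))
concurrent_results, concurrent_time = timed(handle_concurrently(5))
print(f"sequential: {sequential_time:.2f}s, concurrent: {concurrent_time:.2f}s")
```

Both versions return the same results, but the concurrent one should finish in roughly the time of a single call. Inside a notebook that already has a running event loop, you would `await` the coroutines directly instead of calling `asyncio.run`.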

Our pipeline will be structured as follows:
 * Input comes in, and the composed model sends it to `model_one`.
 * `model_one` outputs a random number between 0 and 1.
 * If the value is greater than 0.5, the data is also sent to `model_two`.
 * Otherwise, `model_two` is skipped and the result is returned to the user.
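The routing logic above can be sketched in plain Python before any Ray Serve deployments are involved. This is only an illustration: `fake_model_one`, `fake_model_two`, and `route` are hypothetical names, and the shape of the returned dictionary is an assumption made for readability.

```python
from random import random

def fake_model_one(data):
    # Hypothetical scorer: returns a random number between 0 and 1.
    return random()

def fake_model_two(data):
    # Hypothetical second stage: simply passes the data through.
    return data

def route(data, score_fn=fake_model_one, threshold=0.5):
    # Always call the first model, then decide whether the second is needed.
    score = score_fn(data)
    if score > threshold:
        fake_model_two(data)
        return {"models_used": [1, 2], "score": score}
    return {"models_used": [1], "score": score}

print(route("Hey!"))
```

Passing a fixed `score_fn`, for example `lambda data: 0.9`, makes the branch deterministic, which is handy when checking the routing by hand.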

Let's define two models that just print out the data they receive.

In [1]:
from random import random
import requests
import ray
from ray import serve

We are using stateless functions as our deployments. 

In [2]:
@serve.deployment
def model_one(data):
    print(f"Model 1 called with data: {data}")
    return random()

@serve.deployment
def model_two(data):
    print(f"Model 2 called with data: {data}")
    # Use this data sent from model_one
    return data

`max_concurrent_queries` is optional. By default, if you pass in an async
function, Ray Serve sets the limit to a high number.
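As a rough mental model of what a concurrency cap does, the sketch below limits in-flight requests with an `asyncio.Semaphore`. This is an analogy only and is not how Ray Serve implements `max_concurrent_queries`; the names `handle_request` and `run_with_limit` are hypothetical.

```python
import asyncio

async def handle_request(i, limiter, stats):
    # Only `limit` requests may be inside this block at the same time.
    async with limiter:
        stats["in_flight"] += 1
        stats["peak"] = max(stats["peak"], stats["in_flight"])
        await asyncio.sleep(0.01)  # simulated model work
        stats["in_flight"] -= 1
    return i

async def run_with_limit(num_requests, limit):
    limiter = asyncio.Semaphore(limit)
    stats = {"in_flight": 0, "peak": 0}
    results = await asyncio.gather(
        *(handle_request(i, limiter, stats) for i in range(num_requests))
    )
    return results, stats["peak"]

results, peak = asyncio.run(run_with_limit(num_requests=25, limit=10))
print(f"handled {len(results)} requests, peak concurrency {peak}")
```

All 25 requests complete, but the observed peak never exceeds the limit of 10; the remaining requests wait their turn.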

In [3]:
@serve.deployment(max_concurrent_queries=10, route_prefix='/composed')
class ComposedModel:
    def __init__(self):
        # Use the Python ServeHandle APIs.
        # Set sync=False to override the default, which is a synchronous handle.
        # We want calls to these deployments to run within an asynchronous event loop for concurrency.
        # See the documentation on sync and async ServeHandle APIs for details:
        # https://docs.ray.io/en/master/serve/http-servehandle.html#sync-and-async-handles
        
        self.model_one = model_one.get_handle(sync=False)
        self.model_two = model_two.get_handle(sync=False)

    # This method can be called concurrently
    async def __call__(self, starlette_request):
        # At this point you yield to the event loop so it can take in another request.
        data = await starlette_request.body()

        # Use await twice here for two reasons:
        # 1. We are inside an async def callable and want the model_one deployment
        #    to run asynchronously; this is the standard async-await pattern.
        #    This first await returns an ObjectRef.
        # 2. The second await waits on the ObjectRef, doing an implicit ray.get(ObjectRef)
        #    to fetch the actual value returned.
        # Hence two awaits.
        score = await (await self.model_one.remote(data=data))
        if score > 0.5:
            await (await self.model_two.remote(data=data))
            result = {"model_used: 1 & 2;  score": score}
        else:
            result = {"model_used: 1 ; score": score}

        return result
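The double `await` in the cell above can be mimicked with plain `asyncio` to make the two steps visible. In this sketch, `FakeHandle` is a hypothetical stand-in that is not part of Ray: its `remote()` coroutine returns an `asyncio.Task`, which plays the role of the ObjectRef.

```python
import asyncio

class FakeHandle:
    """Hypothetical stand-in for an async handle; not a Ray API."""

    def __init__(self, fn):
        self._fn = fn

    async def remote(self, data):
        # Submitting the request is itself awaitable and returns a reference
        # (here an asyncio.Task) rather than the final value.
        async def run():
            await asyncio.sleep(0.01)  # simulated remote work
            return self._fn(data)
        return asyncio.create_task(run())

async def main():
    handle = FakeHandle(lambda data: data.upper())
    ref = await handle.remote("hey!")           # first await: get the reference
    value = await ref                           # second await: get the result
    same = await (await handle.remote("hey!"))  # both steps on one line
    return isinstance(ref, asyncio.Task), value, same

is_ref, value, same = asyncio.run(main())
print(is_ref, value, same)
```

Splitting the two awaits, as in the first two lines of `main`, is also useful when you want to submit several requests first and collect their results later.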

In [5]:
# Start Ray with 8 CPUs, shutting down any running instance first
if ray.is_initialized():
    ray.shutdown()
ray.init(num_cpus=8)

2022-02-19 02:06:41,915	INFO services.py:1338 -- View the Ray dashboard at http://127.0.0.1:8267


{'node_ip_address': '127.0.0.1',
 'raylet_ip_address': '127.0.0.1',
 'redis_address': '127.0.0.1:46607',
 'object_store_address': '/tmp/ray/session_2022-02-19_02-06-38_862887_19079/sockets/plasma_store',
 'raylet_socket_name': '/tmp/ray/session_2022-02-19_02-06-38_862887_19079/sockets/raylet',
 'webui_url': '127.0.0.1:8267',
 'session_dir': '/tmp/ray/session_2022-02-19_02-06-38_862887_19079',
 'metrics_export_port': 65470,
 'node_id': '5cae9aed97eba29607439fa4144cc2f61d902adab9f2758c3f6e151e'}

In [6]:
serve.start()    # will start a serve instance 

[2m[36m(ServeController pid=19980)[0m 2022-02-19 02:07:14,107	INFO checkpoint_path.py:16 -- Using RayInternalKVStore for controller checkpoint and recovery.
[2m[36m(ServeController pid=19980)[0m 2022-02-19 02:07:14,111	INFO http_state.py:98 -- Starting HTTP proxy with name 'SERVE_CONTROLLER_ACTOR:JfJOso:SERVE_PROXY_ACTOR-node:127.0.0.1-0' on node 'node:127.0.0.1-0' listening on '127.0.0.1:8000'
2022-02-19 02:07:14,505	INFO api.py:463 -- Started Serve instance in namespace 'bb3eb743-f34c-4f7a-8a06-e6ceb1a3471d'.


<ray.serve.api.Client at 0x7fa210182370>

[2m[36m(HTTPProxyActor pid=19977)[0m INFO:     Started server process [19977]


### Start deployment instances

In [7]:
model_one.deploy()
model_two.deploy()
ComposedModel.deploy()

2022-02-19 02:08:06,395	INFO api.py:242 -- Updating deployment 'model_one'. component=serve deployment=model_one
[2m[36m(ServeController pid=19980)[0m 2022-02-19 02:08:06,433	INFO deployment_state.py:912 -- Adding 1 replicas to deployment 'model_one'. component=serve deployment=model_one
2022-02-19 02:08:06,881	INFO api.py:249 -- Deployment 'model_one' is ready at `http://127.0.0.1:8000/model_one`. component=serve deployment=model_one
2022-02-19 02:08:06,886	INFO api.py:242 -- Updating deployment 'model_two'. component=serve deployment=model_two
[2m[36m(ServeController pid=19980)[0m 2022-02-19 02:08:06,989	INFO deployment_state.py:912 -- Adding 1 replicas to deployment 'model_two'. component=serve deployment=model_two
2022-02-19 02:08:07,433	INFO api.py:249 -- Deployment 'model_two' is ready at `http://127.0.0.1:8000/model_two`. component=serve deployment=model_two
2022-02-19 02:08:07,439	INFO api.py:242 -- Updating deployment 'ComposedModel'. component=serve deployment=ComposedM

In [8]:
serve.list_deployments()

{'model_one': Deployment(name=model_one,version=None,route_prefix=/model_one),
 'model_two': Deployment(name=model_two,version=None,route_prefix=/model_two),
 'ComposedModel': Deployment(name=ComposedModel,version=None,route_prefix=/composed)}

#### Send requests

In [9]:
for _ in range(8):
    resp = requests.get("http://127.0.0.1:8000/composed", data="Hey!")
    print(resp.json())

{'model_used: 1 ; score': 0.4446714648125224}
{'model_used: 1 ; score': 0.3584252167752343}
{'model_used: 1 & 2;  score': 0.9944325362625361}
{'model_used: 1 ; score': 0.17709078567333958}
{'model_used: 1 ; score': 0.43465235796229407}
{'model_used: 1 & 2;  score': 0.8347432503521646}
{'model_used: 1 ; score': 0.2922063099356801}
{'model_used: 1 & 2;  score': 0.8178159182653494}
[2m[36m(model_one pid=19976)[0m Model 1 called with data:b'Hey!'
[2m[36m(model_one pid=19976)[0m Model 1 called with data:b'Hey!'
[2m[36m(model_one pid=19976)[0m Model 1 called with data:b'Hey!'
[2m[36m(model_one pid=19976)[0m Model 1 called with data:b'Hey!'
[2m[36m(model_one pid=19976)[0m Model 1 called with data:b'Hey!'
[2m[36m(model_one pid=19976)[0m Model 1 called with data:b'Hey!'
[2m[36m(model_one pid=19976)[0m Model 1 called with data:b'Hey!'
[2m[36m(model_one pid=19976)[0m Model 1 called with data:b'Hey!'
[2m[36m(model_two pid=19979)[0m Model 2 called with data:b'Hey!'
[2m[

In [10]:
ray.shutdown()