# Multimodel serving with over-commit example


Note: this notebook requires access to internal services, so either run `make start-all-host` (in the `scheduler` subdirectory) or expose the relevant ports.

## `iris` model on `MLServer`

```
%env INFER_ENDPOINT=0.0.0.0:9000
%env SCHEDULER_ENDPOINT=0.0.0.0:9004
%env MLSERVER_DEBUG=0.0.0.0:7777
%env TRITON_DEBUG=0.0.0.0:7778
```

```
env: INFER_ENDPOINT=0.0.0.0:9000
env: SCHEDULER_ENDPOINT=0.0.0.0:9004
env: MLSERVER_DEBUG=0.0.0.0:7777
env: TRITON_DEBUG=0.0.0.0:7778
```


By default, when running locally, there is one replica of `mlserver` with up to 10MB of memory slots and a 20% over-commit budget. As an example, we load 11 `iris` models, each requiring 1MB of memory slots. With these numbers, 10 models can be active in memory at the same time and 1 model has to be evicted to disk. The over-commit budget is what lets the eleventh model be loaded at all, since 11MB exceeds the 10MB of memory slots.
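The numbers above can be checked with a little shell arithmetic. This is a minimal sketch that assumes the over-commit budget is a plain percentage added on top of the replica's memory slots. The variable names are hypothetical and are not Seldon configuration keys.

```bash
# Hypothetical capacity check (not a Seldon API).
# Assumes the over-commit budget is a percentage of the replica's memory slots.
replica_memory_bytes=10000000   # 10MB of memory slots on the mlserver replica
overcommit_percent=20           # 20% over-commit budget
model_memory_bytes=1000000      # memoryBytes requested by each iris model
num_models=11

# Models that fit in main memory at the same time
in_memory=$(( replica_memory_bytes / model_memory_bytes ))
# Total memory the replica may accept once over-commit is included
schedulable_bytes=$(( replica_memory_bytes * (100 + overcommit_percent) / 100 ))
# Memory requested by all models together
requested_bytes=$(( num_models * model_memory_bytes ))
# Models that must be evicted to disk (never negative)
evicted=$(( num_models > in_memory ? num_models - in_memory : 0 ))

if (( requested_bytes <= schedulable_bytes )); then
  echo "all ${num_models} models can be scheduled"
else
  echo "over-commit budget exceeded"
fi
echo "in memory: ${in_memory}, evicted: ${evicted}"
```

Setting `model_memory_bytes=100000` and `num_models=110` reproduces the `tfsimple` figures used later (100 in memory, 10 evicted). This assumes the `triton` replica has the same 10MB of memory slots.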

```bash
%%bash
for i in {1..11}
do
  echo "loading model iris$i"

  data='{
          "model": {
              "meta": {"name": "iris'"$i"'"},
              "modelSpec": {
                  "uri": "gs://seldon-models/mlserver/iris",
                  "requirements": ["sklearn"],
                  "memoryBytes": 1000000},
              "deploymentSpec": {"replicas": 1}
          }
        }'

  grpcurl -d "$data" \
    -plaintext \
    -import-path ../apis \
    -proto ../apis/mlops/scheduler/scheduler.proto "$SCHEDULER_ENDPOINT" seldon.mlops.scheduler.Scheduler/LoadModel

  sleep 0.05
done
```

```
loading model iris1
{
  
}
loading model iris2
{
  
}
loading model iris3
{
  
}
loading model iris4
{
  
}
loading model iris5
{
  
}
loading model iris6
{
  
}
loading model iris7
{
  
}
loading model iris8
{
  
}
loading model iris9
{
  
}
loading model iris10
{
  
}
loading model iris11
{
  
}
```


Get the list of models on this `mlserver` replica and whether each one is loaded in main memory.

```bash
%%bash
grpcurl -d '{}' \
  -plaintext \
  -import-path ../apis/ \
  -proto ../apis/mlops/agent_debug/agent_debug.proto "${MLSERVER_DEBUG}" seldon.mlops.agent_debug.AgentDebugService/ReplicaStatus
```

```
{
  "models": [
    {
      "state": "InMemory",
      "lastAccessed": "2023-01-20T18:45:40Z",
      "name": "iris10_1"
    },
    {
      "state": "InMemory",
      "lastAccessed": "2023-01-20T18:44:48Z",
      "name": "iris3_1"
    },
    {
      "state": "InMemory",
      "lastAccessed": "2023-01-20T18:44:48Z",
      "name": "iris6_1"
    },
    {
      "state": "InMemory",
      "lastAccessed": "2023-01-20T18:44:48Z",
      "name": "iris1_1"
    },
    {
      "lastAccessed": "0001-01-01T00:00:00Z",
      "name": "iris11_1"
    },
    {
      "state": "InMemory",
      "lastAccessed": "2023-01-20T18:44:48Z",
      "name": "iris9_1"
    },
    {
      "state": "InMemory",
      "lastAccessed": "2023-01-20T18:44:48Z",
      "name": "iris7_1"
    },
    {
      "state": "InMemory",
      "lastAccessed": "2023-01-20T18:44:48Z",
      "name": "iris8_1"
    },
    {
      "state": "InMemory",
      "lastAccessed": "2023-01-20T18:44:48Z",
      "name": "iris5_1"
    },
    {
      "state"
```

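With many models loaded, it helps to filter the `ReplicaStatus` listing for the models that are not currently in memory. In the output above, the evicted model `iris11_1` has no `state` field and a zero `lastAccessed` time. The sketch below relies on that observation and on the pretty-printed layout that `grpcurl` produced here, with one field per line. `list_evicted` is a hypothetical helper, not part of the Seldon tooling.

```bash
# Hypothetical helper: read pretty-printed ReplicaStatus JSON on stdin and
# print the names of models that have no "state" field, i.e. models that
# are not currently held in main memory.
# Assumes one JSON field per line, as grpcurl prints it in this notebook.
list_evicted() {
  awk '
    /^ *[{] *$/ { state = "" }     # a new JSON object starts: reset state
    /"state":/  { state = $2 }     # e.g. "InMemory",
    /"name":/ {
      if (state == "") {
        name = $2
        gsub(/[",]/, "", name)     # strip the quotes (and any trailing comma)
        print name
      }
    }
  '
}

# Example usage against the mlserver replica:
#   grpcurl -d '{}' -plaintext -import-path ../apis/ \
#     -proto ../apis/mlops/agent_debug/agent_debug.proto "${MLSERVER_DEBUG}" \
#     seldon.mlops.agent_debug.AgentDebugService/ReplicaStatus | list_evicted
```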
```bash
%%bash
for i in {1..11}
do
  url="http://${INFER_ENDPOINT}/v2/models/iris${i}/infer"
  # Print only the HTTP status code; the response body is discarded
  ret=$(curl -s -o /dev/null -w "%{http_code}" "${url}" -H "Content-Type: application/json" \
    -d '{"inputs": [{"name": "predict", "shape": [1, 4], "datatype": "FP32", "data": [[1, 2, 3, 4]]}]}')
  if [ "$ret" -ne 200 ]; then
    echo "Failed with code ${ret}"
    exit 1
  fi
done
echo "All succeeded"
```

```
All succeeded
```


Inference succeeds for all models because models are swapped in and out of memory automatically.

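```bash
%%bash
# Send 10 rounds of concurrent requests to all 11 models in the background
for i in {1..10}
do
  for j in {1..11}
  do
    curl -s -o /dev/null "http://${INFER_ENDPOINT}/v2/models/iris${j}/infer" -H "Content-Type: application/json" \
      -d '{"inputs": [{"name": "predict", "shape": [1, 4], "datatype": "FP32", "data": [[1, 2, 3, 4]]}]}' &
  done
done
# Wait for every background request to finish before the cell exits
wait
```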
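The concurrent cell above discards every response, so it only shows that the requests were sent. The following minimal sketch also checks them. It assumes the same `INFER_ENDPOINT` and payload, prints one HTTP status code per background request, and counts the codes that are not `200` once all jobs have finished. `fire_requests` and `count_failures` are hypothetical helper names.

```bash
# Hypothetical helper: count the lines on stdin that are not the HTTP code 200.
count_failures() {
  grep -vc '^200$' || true   # grep -c exits with status 1 when the count is 0
}

# Hypothetical helper: send `rounds` rounds of concurrent requests to
# iris1..iris11 and print one HTTP status code per request.
fire_requests() {
  local rounds=$1 payload r j
  payload='{"inputs": [{"name": "predict", "shape": [1, 4], "datatype": "FP32", "data": [[1, 2, 3, 4]]}]}'
  for ((r = 1; r <= rounds; r++)); do
    for j in {1..11}; do
      curl -s -o /dev/null -w '%{http_code}\n' \
        "http://${INFER_ENDPOINT}/v2/models/iris${j}/infer" \
        -H "Content-Type: application/json" -d "$payload" &
    done
  done
  wait   # block until every background curl has exited
}

# Example usage:
#   failures=$(fire_requests 10 | count_failures)
#   echo "non-200 responses: ${failures}"
```

Each status line is only a few bytes, so the concurrent writes into the pipe should not interleave.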


Unload the models.

```bash
%%bash
for i in {1..11}
do
  grpcurl -d '{"model": {"name": "iris'"$i"'"}}' \
    -plaintext \
    -import-path ../apis/ \
    -proto ../apis/mlops/scheduler/scheduler.proto "$SCHEDULER_ENDPOINT" seldon.mlops.scheduler.Scheduler/UnloadModel
done
```

```
{
  
}
{
  
}
{
  
}
{
  
}
{
  
}
{
  
}
{
  
}
{
  
}
{
  
}
{
  
}
{
  
}
```


## `tfsimple` model on `triton`

With `tfsimple` on `triton`, we reduce the memory slots required per model to 100KB, which allows at least 100 models to be held in memory on the server at once. We load 110 models, so the remaining 10 will have to be evicted.
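The `iris` and `tfsimple` load loops differ only in the model name, artifact URI, runtime requirement and `memoryBytes`. A small helper can build the `LoadModel` request body from those four values instead of splicing `$i` into a single-quoted string. This is a sketch: `model_payload` is a hypothetical name, and it assumes the same field layout as the requests sent in this notebook.

```bash
# Hypothetical helper: print a LoadModel request body for one model.
# Arguments: model name, storage URI, runtime requirement, memory in bytes.
model_payload() {
  local name=$1 uri=$2 requirement=$3 memory_bytes=$4
  printf '{"model": {"meta": {"name": "%s"}, "modelSpec": {"uri": "%s", "requirements": ["%s"], "memoryBytes": %d}, "deploymentSpec": {"replicas": 1}}}\n' \
    "$name" "$uri" "$requirement" "$memory_bytes"
}

# Example usage, equivalent to the tfsimple load loop:
#   for i in {1..110}; do
#     grpcurl -d "$(model_payload "tfsimple$i" gs://seldon-models/triton/simple tensorflow 100000)" \
#       -plaintext -import-path ../apis \
#       -proto ../apis/mlops/scheduler/scheduler.proto "$SCHEDULER_ENDPOINT" \
#       seldon.mlops.scheduler.Scheduler/LoadModel
#   done
```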

```bash
%%bash
for i in {1..110}
do
  echo "loading model tfsimple$i"

  data='{
          "model": {
              "meta": {"name": "tfsimple'"$i"'"},
              "modelSpec": {
                  "uri": "gs://seldon-models/triton/simple",
                  "requirements": ["tensorflow"],
                  "memoryBytes": 100000},
              "deploymentSpec": {"replicas": 1}
          }
        }'

  grpcurl -d "$data" \
    -plaintext \
    -import-path ../apis \
    -proto ../apis/mlops/scheduler/scheduler.proto "$SCHEDULER_ENDPOINT" seldon.mlops.scheduler.Scheduler/LoadModel

  sleep 0.05
done
```

```
loading model tfsimple1
{
  
}
loading model tfsimple2
{
  
}
loading model tfsimple3
{
  
}
loading model tfsimple4
{
  
}
loading model tfsimple5
{
  
}
loading model tfsimple6
{
  
}
loading model tfsimple7
{
  
}
loading model tfsimple8
{
  
}
loading model tfsimple9
{
  
}
loading model tfsimple10
{
  
}
loading model tfsimple11
{
  
}
loading model tfsimple12
{
  
}
loading model tfsimple13
{
  
}
loading model tfsimple14
{
  
}
loading model tfsimple15
{
  
}
loading model tfsimple16
{
  
}
loading model tfsimple17
{
  
}
loading model tfsimple18
{
  
}
loading model tfsimple19
{
  
}
loading model tfsimple20
{
  
}
loading model tfsimple21
{
  
}
loading model tfsimple22
{
  
}
loading model tfsimple23
{
  
}
loading model tfsimple24
{
  
}
loading model tfsimple25
{
  
}
loading model tfsimple26
{
  
}
loading model tfsimple27
{
  
}
loading model tfsimple28
{
  
}
loading model tfsimple29
{
  
}
loading model tfsimple30
{
  
}
loading model tfsimple31
{
  
}
loading model tfs
```

```bash
%%bash
for i in {1..110}
do
  url="http://${INFER_ENDPOINT}/v2/models/tfsimple${i}/infer"
  # Print only the HTTP status code; the response body is discarded
  ret=$(curl -s -o /dev/null -w "%{http_code}" "${url}" -H "Content-Type: application/json" \
    -d '{"inputs":[{"name":"INPUT0","data":[1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16],"datatype":"INT32","shape":[1,16]},{"name":"INPUT1","data":[1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16],"datatype":"INT32","shape":[1,16]}]}')
  if [ "$ret" -ne 200 ]; then
    echo "Failed with code ${ret}"
    exit 1
  fi
done
echo "All succeeded"
```

```
All succeeded
```


```bash
%%bash
grpcurl -d '{}' \
  -plaintext \
  -import-path ../apis/ \
  -proto ../apis/mlops/agent_debug/agent_debug.proto "${TRITON_DEBUG}" seldon.mlops.agent_debug.AgentDebugService/ReplicaStatus
```

```
{
  "models": [
    {
      "state": "InMemory",
      "lastAccessed": "2023-01-20T18:48:28Z",
      "name": "tfsimple89_1"
    },
    {
      "state": "InMemory",
      "lastAccessed": "2023-01-20T18:48:26Z",
      "name": "tfsimple34_1"
    },
    {
      "state": "InMemory",
      "lastAccessed": "2023-01-20T18:48:27Z",
      "name": "tfsimple59_1"
    },
    {
      "state": "InMemory",
      "lastAccessed": "2023-01-20T18:48:28Z",
      "name": "tfsimple76_1"
    },
    {
      "state": "InMemory",
      "lastAccessed": "2023-01-20T18:48:27Z",
      "name": "tfsimple45_1"
    },
    {
      "state": "InMemory",
      "lastAccessed": "2023-01-20T18:48:27Z",
      "name": "tfsimple60_1"
    },
    {
      "lastAccessed": "0001-01-01T00:00:00Z",
      "name": "tfsimple8_1"
    },
    {
      "state": "InMemory",
      "lastAccessed": "2023-01-20T18:48:26Z",
      "name": "tfsimple19_1"
    },
    {
      "state": "InMemory",
      "lastAccessed": "2023-01-20T18:48:26Z",
      "name":
```

```bash
%%bash
for i in {1..110}
do
  grpcurl -d '{"model": {"name": "tfsimple'"$i"'"}}' \
    -plaintext \
    -import-path ../apis/ \
    -proto ../apis/mlops/scheduler/scheduler.proto "$SCHEDULER_ENDPOINT" seldon.mlops.scheduler.Scheduler/UnloadModel
done
```

{
  
}
{
  
}
{
  
}
{
  
}
{
  
}
{
  
}
{
  
}
{
  
}
{
  
}
{
  
}
{
  
}
{
  
}
{
  
}
{
  
}
{
  
}
{
  
}
{
  
}
{
  
}
{
  
}
{
  
}
{
  
}
{
  
}
{
  
}
{
  
}
{
  
}
{
  
}
{
  
}
{
  
}
{
  
}
{
  
}
{
  
}
{
  
}
{
  
}
{
  
}
{
  
}
{
  
}
{
  
}
{
  
}
{
  
}
{
  
}
{
  
}
{
  
}
{
  
}
{
  
}
{
  
}
{
  
}
{
  
}
{
  
}
{
  
}
{
  
}
{
  
}
{
  
}
{
  
}
{
  
}
{
  
}
{
  
}
{
  
}
{
  
}
{
  
}
{
  
}
{
  
}
{
  
}
{
  
}
{
  
}
{
  
}
{
  
}
{
  
}
{
  
}
{
  
}
{
  
}
{
  
}
{
  
}
{
  
}
{
  
}
{
  
}
{
  
}
{
  
}
{
  
}
{
  
}
{
  
}
{
  
}
{
  
}
{
  
}
{
  
}
{
  
}
{
  
}
{
  
}
{
  
}
{
  
}
{
  
}
{
  
}
{
  
}
{
  
}
{
  
}
{
  
}
{
  
}
{
  
}
{
  
}
{
  
}
{
  
}
{
  
}
{
  
}
{
  
}
{
  
}
{
  
}
{
  
}
{
  
}
{
  
}
{
  
}
{
  
}
