# K8s Envoy Test with Changing Model Replica

In this test we change the number of replicas to a given model and then look at whether inference requests are being served still.

So far we can expect 503 due to the way we do envoy updates.

## Prerequisites

- (KinD) cluster with 3 triton replica servers. 
    - One way to do so is to increase triton server `Replicas` to 3 in `k8s/yaml/seldon-v2-servers.yaml` and then apply the manifest 
    - via `kubectl apply -f k8s/yaml/seldon-v2-servers.yaml -n seldon-mesh`.

In [1]:
SCHEDULER_IP=!kubectl get svc seldon-scheduler -n seldon-mesh -o jsonpath='{.status.loadBalancer.ingress[0].ip}'
SCHEDULER_IP=SCHEDULER_IP[0]

In [2]:
MESH_IP=!kubectl get svc seldon-mesh -n seldon-mesh -o jsonpath='{.status.loadBalancer.ingress[0].ip}'
MESH_IP=MESH_IP[0]

## Deploy single replica `tfsimple` model

In [110]:
!grpcurl -d '{"model":{ \
              "meta":{"name":"tfsimple"},\
              "modelSpec":{"uri":"gs://seldon-models/triton/simple",\
                           "requirements":["tensorflow"],\
                           "memoryBytes":500},\
              "deploymentSpec":{"replicas":1}}}' \
         -plaintext \
         -import-path ../../apis \
         -proto ../../apis/mlops/scheduler/scheduler.proto  ${SCHEDULER_IP}:9004 seldon.mlops.scheduler.Scheduler/LoadModel

{
  
}


In [111]:
!seldon model status tfsimple -w ModelAvailable --scheduler-host "$SCHEDULER_IP:9004" | jq -M .

{}


## Scaling up

In [118]:
%%bash
for i in {1..1000}; 
do

url=http://${MESH_IP}/v2/models/tfsimple/infer 
ret=`curl -s -o /dev/null -w "%{http_code}" curl -s -o /dev/null -w "%{http_code}" "${url}" -H "Content-Type: application/json" \
        -d '{"inputs":[{"name":"INPUT0","data":[1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16],"datatype":"INT32","shape":[1,16]},{"name":"INPUT1","data":[1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16],"datatype":"INT32","shape":[1,16]}]}'` \
&& if [ $ret -ne 200 ]; then echo "Failed with code ${ret}"; fi &

if [[ $i -eq 500 ]]; then
    echo "Increasing replica count"
    grpcurl -d '{"model":{ 
              "meta":{"name":"tfsimple"},
              "modelSpec":{"uri":"gs://seldon-models/triton/simple",
                           "requirements":["tensorflow"],
                           "memoryBytes":500},
              "deploymentSpec":{"replicas":3}}}' \
         -plaintext \
         -import-path ../../apis \
         -proto ../../apis/mlops/scheduler/scheduler.proto  $SCHEDULER_IP:9004 seldon.mlops.scheduler.Scheduler/LoadModel &
fi

done
echo "Done"

Increasing replica count
Done
{
  
}


## Scaling down

In [119]:
%%bash
for i in {1..1000}; 
do

url=http://${MESH_IP}/v2/models/tfsimple/infer 
ret=`curl -s -o /dev/null -w "%{http_code}" curl -s -o /dev/null -w "%{http_code}" "${url}" -H "Content-Type: application/json" \
        -d '{"inputs":[{"name":"INPUT0","data":[1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16],"datatype":"INT32","shape":[1,16]},{"name":"INPUT1","data":[1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16],"datatype":"INT32","shape":[1,16]}]}'` \
&& if [ $ret -ne 200 ]; then echo "Failed with code ${ret}"; fi &

if [[ $i -eq 500 ]]; then
    echo "Decrease replica count"
    grpcurl -d '{"model":{ 
              "meta":{"name":"tfsimple"},
              "modelSpec":{"uri":"gs://seldon-models/triton/simple",
                           "requirements":["tensorflow"],
                           "memoryBytes":500},
              "deploymentSpec":{"replicas":1}}}' \
         -plaintext \
         -import-path ../../apis \
         -proto ../../apis/mlops/scheduler/scheduler.proto  $SCHEDULER_IP:9004 seldon.mlops.scheduler.Scheduler/LoadModel &
fi

done
echo "Done"

Decrease replica count
{
  
}
Failed with code 000503
Failed with code 000503
Failed with code 000503
Failed with code 000503
Failed with code 000503
Failed with code 000503
Failed with code 000503
Failed with code 000503
Failed with code 000503
Failed with code 000503
Failed with code 000503
Failed with code 000503
Failed with code 000503
Failed with code 000503
Failed with code 000503
Failed with code 000503
Failed with code 000404
Failed with code 000404
Failed with code 000404
Failed with code 000404
Failed with code 000404
Done
Failed with code 000404
Failed with code 000503
Failed with code 000404
Failed with code 000404
Failed with code 000404


In [109]:
!grpcurl -d '{"model": {"name" : "tfsimple"}}' \
         -plaintext \
         -import-path ../../apis/ \
         -proto ../../apis/mlops/scheduler/scheduler.proto  ${SCHEDULER_IP}:9004 seldon.mlops.scheduler.Scheduler/UnloadModel

{
  
}
