# Many Models

This example shows how to run inference against multiple models served by Triton. The models can run on different frameworks.

![multi](multimodel.png)

## Download models

Triton Inference Server can load a model only if the model follows the directory structure that Triton expects. [Read more about the directory structure that Triton expects](https://docs.nvidia.com/deeplearning/triton-inference-server/user-guide/docs/model_repository.html).

In [None]:
!pip install azure-storage-blob
!pip install nvidia-pyindex
!pip install tritonclient[http]
!pip install --upgrade nltk geventhttpclient python-rapidjson

In [None]:
import os
import sys
from pathlib import Path
from src.model_utils import download_triton_models, delete_triton_models

prefix = Path(".")
download_triton_models(prefix)

## Register models

The previous step downloaded multiple models into the `models` folder. Register that folder as a single model. The registered folder must follow the model folder structure that Triton specifies, or Triton Inference Server cannot load the models.
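
As a rough illustration of the folder structure requirement, the sketch below checks that every model directory contains at least one non-empty, numerically named version directory. The helper name `check_model_repository` and the demo model names are hypothetical and not part of this sample; the check covers only this one aspect of the layout, not everything Triton validates.

```python
import tempfile
from pathlib import Path


def check_model_repository(repo):
    """Map each model directory under `repo` to a list of layout problems."""
    report = {}
    for model_dir in sorted(p for p in Path(repo).iterdir() if p.is_dir()):
        problems = []
        # Triton only considers numerically named version directories
        # whose names do not start with zero.
        versions = [
            d for d in model_dir.iterdir()
            if d.is_dir() and d.name.isdigit() and not d.name.startswith("0")
        ]
        if not versions:
            problems.append("no numeric version directory")
        for version_dir in versions:
            if not any(version_dir.iterdir()):
                problems.append(f"version directory {version_dir.name} is empty")
        report[model_dir.name] = problems
    return report


# Demo on a throwaway layout; the model and file names are illustrative only.
with tempfile.TemporaryDirectory() as tmp:
    repo = Path(tmp)
    (repo / "good_model" / "1").mkdir(parents=True)
    (repo / "good_model" / "1" / "model.onnx").write_bytes(b"")
    (repo / "bad_model").mkdir()
    demo_report = check_model_repository(repo)

print(demo_report)
```

Assuming the download step placed the models under `./models`, calling `check_model_repository("models")` before registration would list any model directory that lacks a usable version folder.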

In [None]:
subscription = "subscription_id"
resource_group = "resource_group"
workspace = "workspace"
model_name = "multi-models"
endpoint_name = "multi1"

In [None]:
!az account set --subscription $subscription
!az configure --defaults workspace=$workspace group=$resource_group

In [None]:
!az ml model create -n $model_name -v 2 -l models -g $resource_group -w $workspace --subscription $subscription

In [None]:
!az ml model show -n $model_name -v 2 -g $resource_group -w $workspace --subscription $subscription

## Create endpoint

Deploy to a pre-created [AKS compute](https://docs.microsoft.com/en-us/python/api/azureml-core/azureml.core.compute.aks.akscompute?view=azure-ml-py#provisioning-configuration-agent-count-none--vm-size-none--ssl-cname-none--ssl-cert-pem-file-none--ssl-key-pem-file-none--location-none--vnet-resourcegroup-name-none--vnet-name-none--subnet-name-none--service-cidr-none--dns-service-ip-none--docker-bridge-cidr-none--cluster-purpose-none--load-balancer-type-none-) target named `aks-gpu-deploy`. For other options, see [our documentation](https://docs.microsoft.com/en-us/azure/machine-learning/how-to-deploy-and-where?tabs=azcli).

Note: for an AKS deployment, set the YAML file name in the command below to `multi-endpoint-aks.yml`; for a managed deployment, set it to `multi-endpoint-mir-gpu.yml`.

In [None]:
!az ml endpoint create -g $resource_group -w $workspace --name $endpoint_name -f multi-endpoint-aks.yml

In [None]:
!az ml endpoint show -g $resource_group -w $workspace --name $endpoint_name

## Test Webservice

#### Get scoring URI 

In [None]:
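url = !az ml endpoint show -g $resource_group -w $workspace -n $endpoint_name --query "scoring_uri"
scoring_uri = url[1].strip('!"')
# Remove the trailing "/score" as a suffix; str.rstrip("/score") would strip
# any run of those characters instead of the exact suffix.
service_url = scoring_uri[: -len("/score")] if scoring_uri.endswith("/score") else scoring_uri
print(service_url)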

#### Get Auth token

In [None]:
import re
key = !az ml endpoint get-credentials -n $endpoint_name -g $resource_group -w $workspace
Service_key = re.split(": |,",key[2])[1]
print(Service_key)
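
The cell above depends on the token appearing on a fixed output line. As a less position-dependent alternative, the sketch below parses the JSON object in the CLI output and returns its first string field. It assumes the command prints a single JSON object, possibly preceded by warning lines. The helper name `first_string_field` and the field names in the sample are hypothetical; the real field names depend on the CLI version.

```python
import json


def first_string_field(cli_lines):
    """Return the first non-empty string value of the JSON object in the CLI output."""
    text = "\n".join(cli_lines)
    start = text.find("{")
    if start == -1:
        raise ValueError("no JSON object found in CLI output")
    # raw_decode parses one JSON value and ignores anything after it.
    payload, _ = json.JSONDecoder().raw_decode(text, start)
    for value in payload.values():
        if isinstance(value, str) and value:
            return value
    raise ValueError("JSON object has no string field")


# Hypothetical output; the real field names depend on the CLI version.
sample = [
    "WARNING: this command group is experimental",
    "{",
    '  "exampleToken": "abc123",',
    '  "exampleExpiry": 1700000000',
    "}",
]
print(first_string_field(sample))
```

In the notebook, `first_string_field(key)` could stand in for the regular expression, provided the token really is the first string field of the output.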

#### Check the status of server and models

In [None]:
import tritonclient.http as tritonhttpclient

service_key = Service_key.strip('!"')
headers = {}
headers["Authorization"] = f"Bearer {service_key}"

triton_client = tritonhttpclient.InferenceServerClient(service_url)

# Check the state of server.
health_ctx = triton_client.is_server_ready(headers=headers)
print("Is server ready - {}".format(health_ctx))

# Check the status of the densenet model.
densenet_model = "densenet_onnx"
status_ctx = triton_client.is_model_ready(densenet_model, "1", headers)
print("Is model ready - {}".format(status_ctx))

# Check the status of the bidaf-9 model.
bidaf_model = "bidaf-9"
status_ctx = triton_client.is_model_ready(bidaf_model, "1", headers)
print("Is model ready - {}".format(status_ctx))
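
A freshly created endpoint may need some time before the server and models report ready, so a single check can return `False`. The sketch below polls a readiness check until it succeeds or a timeout expires. The helper `wait_until_ready` and its default timings are assumptions for illustration, not part of `tritonclient`, and the demo uses a stand-in check rather than a live endpoint.

```python
import time


def wait_until_ready(check, timeout_s=300.0, interval_s=5.0, sleep=time.sleep):
    """Call `check` until it returns True or `timeout_s` seconds have passed."""
    deadline = time.monotonic() + timeout_s
    while True:
        try:
            if check():
                return True
        except Exception:
            # Treat any client error as "not ready yet" and retry.
            pass
        if time.monotonic() >= deadline:
            return False
        sleep(interval_s)


# Stand-in check that becomes ready on the third call.
calls = {"count": 0}


def fake_check():
    calls["count"] += 1
    return calls["count"] >= 3


print(wait_until_ready(fake_check, timeout_s=5.0, interval_s=0.01))
```

Against the live endpoint, the check could be `lambda: triton_client.is_model_ready(densenet_model, "1", headers)`, using the names defined in the cell above.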

#### Score against the densenet model

In [None]:
import numpy as np
import requests
from pathlib import Path
from src.densenet_utils import preprocess, postprocess

img_content = requests.get("https://aka.ms/peacock-pic").content

img_data = preprocess(img_content, scaling="INCEPTION")

model_metadata = triton_client.get_model_metadata(model_name=densenet_model, headers=headers)

input_meta = model_metadata["inputs"]
output_meta = model_metadata["outputs"]

# Populate the inputs array (named infer_input to avoid shadowing the built-in input)
inputs = []
infer_input = tritonhttpclient.InferInput(input_meta[0]["name"], img_data.shape, input_meta[0]["datatype"])
infer_input.set_data_from_numpy(img_data, binary_data=False)
inputs.append(infer_input)

outputs = []
# Populate the outputs array
for out_meta in output_meta:
    output_name = out_meta["name"]
    output = tritonhttpclient.InferRequestedOutput(output_name, binary_data=False)
    outputs.append(output)

# Run inference
res = triton_client.infer(densenet_model, inputs, request_id="0", outputs=outputs, model_version="1", headers=headers)

out_data = res.as_numpy('fc6_1')
max_label = np.argmax(out_data[0])
label_path = Path(".").joinpath("src","densenet_labels.txt")
result = postprocess(max_label, label_path)

result

#### Score against the bidaf model

In [None]:
import json
from src.bidaf_utils import preprocess, postprocess
from tritonclient.utils import triton_to_np_dtype

context = "A quick brown fox jumped over the lazy dog."
query = "Which animal was lower?"

model_metadata = triton_client.get_model_metadata(model_name=bidaf_model, headers=headers)

input_meta = model_metadata["inputs"]
output_meta = model_metadata["outputs"]

# String data uses the NumPy object dtype (the np.object alias was removed in NumPy 1.24)
np_dtype = triton_to_np_dtype(input_meta[0]["datatype"])
cw, cc = preprocess(context, np_dtype)
qw, qc = preprocess(query, np_dtype)

input_mapping = {
    "query_word": qw,
    "query_char": qc,
    "context_word": cw,
    "context_char": cc,
}

inputs = []
outputs = []

# Populate the inputs array (named infer_input to avoid shadowing the built-in input)
for in_meta in input_meta:
    input_name = in_meta["name"]
    data = input_mapping[input_name]
    infer_input = tritonhttpclient.InferInput(input_name, data.shape, in_meta["datatype"])
    infer_input.set_data_from_numpy(data, binary_data=False)
    inputs.append(infer_input)

# Populate the outputs array
for out_meta in output_meta:
    output_name = out_meta["name"]
    output = tritonhttpclient.InferRequestedOutput(output_name, binary_data=False)
    outputs.append(output)

# Run inference
res = triton_client.infer(bidaf_model, inputs, request_id="0", outputs=outputs, model_version="1", headers=headers)

result = postprocess(context_words=cw, answer=res)

result

## Delete the webservice and the model

In [None]:
!az ml endpoint delete -n $endpoint_name -g $resource_group -w $workspace
!az ml model delete -n $model_name -v 2

## Next steps

Read [our documentation](https://docs.microsoft.com/en-us/azure/machine-learning/how-to-deploy-with-triton?tabs=python) to use Triton with your own models, or check out the other notebooks in this folder for ways to do pre- and post-processing on the server.