# Many Models

This example shows how to serve and query multiple models with Triton Inference Server. The models do not have to share a framework.

![multi](multimodel.png)

## Download models

Your models must follow the directory structure that Triton Inference Server expects, otherwise the server cannot load them. [Read more about the directory structure that Triton expects](https://docs.nvidia.com/deeplearning/triton-inference-server/user-guide/docs/model_repository.html).

In [None]:
!pip install azure-storage-blob

In [None]:
import os
import sys
from pathlib import Path
from src.model_utils import download_triton_models, delete_triton_models

prefix = Path(".")
download_triton_models(prefix)
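
Before registering anything, it can help to sanity-check the downloaded folder against the layout Triton expects (one directory per model, each with at least one numeric version subdirectory). The following is a minimal sketch, not part of the original sample: `list_model_versions` is a hypothetical helper, and it assumes the models were downloaded into a `models` folder next to the notebook.

```python
from pathlib import Path


def list_model_versions(repo: Path) -> dict:
    """Map each model directory in `repo` to its numeric version subdirectories."""
    layout = {}
    for model_dir in sorted(p for p in repo.iterdir() if p.is_dir()):
        # Triton treats numerically named subdirectories as model versions.
        versions = sorted(
            (v.name for v in model_dir.iterdir() if v.is_dir() and v.name.isdigit()),
            key=int,
        )
        layout[model_dir.name] = versions
    return layout


# Assumption: the download step wrote its output to ./models.
repo = Path("models")
if repo.is_dir():
    for name, versions in list_model_versions(repo).items():
        print(name, versions or "no version directory found")
```

A model that shows no version directory here is unlikely to load, so it is worth fixing the layout before moving on.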

## Register models

Register the downloaded `models` folder, which contains multiple models, as a single model. The registered folder must follow the Triton model folder structure so that Triton Inference Server can load it.

In [None]:
subscription = "subscription_id"
resource_group = "resource_group"
workspace = "workspace"
model_name = "multi-models"
service_name = "multi1"

In [None]:
!az ml model create -n $model_name -v 2 -l models -g $resource_group -w $workspace --subscription $subscription

In [None]:
!az ml model show -n $model_name -v 2 -g $resource_group -w $workspace --subscription $subscription

## Create endpoint

Deploy to a pre-created AKS compute target.

In [None]:
!az ml endpoint create -g $resource_group -w $workspace --name $service_name -f deployment.yml

In [None]:
!az ml endpoint show -g $resource_group -w $workspace --name $service_name

## Test Webservice

Get the scoring URI and auth token.

In [None]:
!az ml endpoint list-keys -g $resource_group -w $workspace --name $service_name

In [None]:
import requests

service_key = "service_key"
headers = {}
headers["Authorization"] = f"Bearer {service_key}"

# Check whether the server is ready.
service_url = "service_url"
requests.get(f"{service_url}/v2/health/ready", headers=headers)
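
A freshly created endpoint may take a while to report ready, so a single health check can fail even when the deployment is fine. The sketch below polls until the check returns HTTP 200. `wait_until_ready` and its `probe` argument are hypothetical names introduced for illustration; the probe is any callable that returns a status code.

```python
import time
from typing import Callable


def wait_until_ready(probe: Callable[[], int], attempts: int = 10, delay: float = 3.0) -> bool:
    """Call `probe` until it returns 200 or the attempts run out."""
    for attempt in range(attempts):
        if probe() == 200:
            return True
        # Sleep between attempts, but not after the last one.
        if attempt < attempts - 1:
            time.sleep(delay)
    return False


# Example wiring with the variables from the cell above (not executed here):
# ready = wait_until_ready(
#     lambda: requests.get(f"{service_url}/v2/health/ready", headers=headers).status_code
# )
```

Passing the probe in as a callable keeps the retry logic independent of `requests`, which also makes it easy to try out without a live endpoint.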

In [None]:
# List the models in the repository and their status.
resp = requests.post(f"{service_url}/v2/repository/index", headers=headers)
print(resp.text)
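
The repository index comes back as a JSON list with one entry per model. As a rough sketch, the entries can be grouped by state to spot models that failed to load. The sample body below is hypothetical and only mimics the expected shape (`name`, `version`, `state`); `group_models_by_state` is an illustrative helper, not part of the sample.

```python
import json


def group_models_by_state(index_json: str) -> dict:
    """Group model names from a repository index response by reported state."""
    by_state = {}
    for entry in json.loads(index_json):
        # Entries for models that are not loaded may omit the state field.
        state = entry.get("state", "UNKNOWN")
        by_state.setdefault(state, []).append(entry["name"])
    return by_state


# Hypothetical response body; with a live endpoint, pass resp.text instead.
sample_index = json.dumps([
    {"name": "bidaf-9", "version": "1", "state": "READY"},
    {"name": "densenet_onnx", "version": "1", "state": "READY"},
])
print(group_models_by_state(sample_index))
```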

In [None]:
# Check the metadata of a model before running inference.
resp = requests.get(f"{service_url}/v2/models/bidaf-9", headers=headers)
print(resp.text)

In [None]:
resp = requests.get(f"{service_url}/v2/models/densenet_onnx", headers=headers)
print(resp.text)
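
The metadata response lists each model's input names, datatypes, and shapes, which is what an inference request needs. The sketch below builds a placeholder request body in the v2 inference protocol format from such metadata. It is illustrative only: `build_infer_request` is a hypothetical helper, the sample metadata (input name `data_0`, shape `[3, 224, 224]`) is an assumption rather than output from this deployment, and constant filling only makes sense for numeric inputs, not for string inputs.

```python
import math


def build_infer_request(metadata: dict, fill: float = 0.0) -> dict:
    """Build a v2 inference request body with constant-filled numeric inputs."""
    inputs = []
    for spec in metadata["inputs"]:
        # Dynamic dimensions are reported as -1; use 1 as a placeholder size.
        shape = [1 if int(d) < 0 else int(d) for d in spec["shape"]]
        inputs.append({
            "name": spec["name"],
            "shape": shape,
            "datatype": spec["datatype"],
            # Tensor data is sent flattened, one value per element.
            "data": [fill] * math.prod(shape),
        })
    return {"inputs": inputs}


# Assumed metadata for illustration; use resp.json() from the call above instead.
sample_metadata = {
    "name": "densenet_onnx",
    "inputs": [{"name": "data_0", "datatype": "FP32", "shape": [3, 224, 224]}],
}
payload = build_infer_request(sample_metadata)

# With a live endpoint, the request could then be sent like this (not executed here):
# requests.post(f"{service_url}/v2/models/densenet_onnx/infer", headers=headers, json=payload)
```

Real requests would replace the constant fill with preprocessed input data matching the shapes and datatypes the metadata reports.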