# Deploy a model served with Triton using a custom container in an online endpoint
Learn how to deploy a model using Triton as an online endpoint in Azure Machine Learning.

Triton is multi-framework, open-source software that is optimized for inference. It supports popular machine learning frameworks like TensorFlow, ONNX Runtime, PyTorch, NVIDIA TensorRT, and more. It can be used for your CPU or GPU workloads.

## Prerequisites

* To use Azure Machine Learning, you must have an Azure subscription. If you don't have an Azure subscription, create a free account before you begin. Try the [free or paid version of Azure Machine Learning](https://azure.microsoft.com/free/).

* Install and configure the [Python SDK v2](sdk/setup.sh).

* You must have an Azure resource group, and you (or the service principal you use) must have Contributor access to it.

* You must have an Azure Machine Learning workspace. 

* You must have a container registry associated with your workspace. 

* You must have the additional Python packages used for scoring. Install them with this code:

In [None]:
%pip install --pre azure-mgmt-containerregistry

In [None]:
%pip install numpy tritonclient[http] pillow gevent
%pip install --pre azure-containerregistry

Note: For Triton no-code deployment, testing via local endpoints is currently not supported, so this tutorial only shows how to set up an online endpoint.

## 1. Connect to Azure Machine Learning Workspace

The [workspace](https://docs.microsoft.com/en-us/azure/machine-learning/concept-workspace) is the top-level resource for Azure Machine Learning, providing a centralized place to work with all the artifacts you create when you use Azure Machine Learning. In this section we will connect to the workspace in which the endpoint will be deployed.

### 1.1 Configure workspace details
To connect to a workspace, we need three identifiers: a subscription ID, a resource group name, and a workspace name.

In [None]:
subscription_id = "<SUBSCRIPTION_ID>"
resource_group = "<RESOURCE_GROUP>"
workspace_name = "<AML_WORKSPACE_NAME>"
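
Before connecting, it can help to confirm that the placeholders above were actually replaced. The following is a minimal sketch, not part of the Azure SDK: `check_workspace_config` is a hypothetical helper, and it assumes that subscription IDs are GUIDs.

```python
import uuid


def check_workspace_config(subscription_id, resource_group, workspace_name):
    """Return a list of problems; an empty list means the values look usable."""
    problems = []
    values = {
        "subscription_id": subscription_id,
        "resource_group": resource_group,
        "workspace_name": workspace_name,
    }
    for name, value in values.items():
        # Placeholders in this tutorial look like "<SOMETHING>"
        if not value or (value.startswith("<") and value.endswith(">")):
            problems.append(f"{name} still holds a placeholder: {value!r}")
    try:
        # Assumption: a subscription ID is a GUID
        uuid.UUID(subscription_id)
    except ValueError:
        problems.append("subscription_id is not a valid GUID")
    return problems


# Example with made-up values
print(check_workspace_config("00000000-0000-0000-0000-000000000000", "my-rg", "my-ws"))
```

An empty list means none of the checks failed. It does not prove that the workspace exists or that you have access to it.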

### 1.2 Generate an endpoint name

In [None]:
import random 
endpoint_name = f"endpoint-{random.randint(0, 10000)}" 
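
Because endpoint names must be unique within the Azure region and `random.randint(0, 10000)` yields only 10,001 possible names, collisions are possible. A hedged alternative is sketched below. `make_endpoint_name` is a hypothetical helper, and the 8-character suffix length is an arbitrary choice for this sketch.

```python
import uuid


def make_endpoint_name(prefix="endpoint"):
    """Build a name with a random hexadecimal suffix to reduce collisions."""
    suffix = uuid.uuid4().hex[:8]  # 8 hex characters, about 4 billion possibilities
    return f"{prefix}-{suffix}"


candidate_name = make_endpoint_name()
print(candidate_name)
```

The result uses only lowercase letters, digits, and a hyphen. Check it against the naming rules for managed online endpoints before relying on it.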

### 1.3 Get a handle to the workspace

We pass the details above to `MLClient` from `azure.ai.ml` to get a handle to the required Azure Machine Learning workspace. This tutorial uses the [default Azure authentication](https://docs.microsoft.com/en-us/python/api/azure-identity/azure.identity.defaultazurecredential?view=azure-python). Check the [configuration notebook](../../jobs/configuration.ipynb) for more details on how to configure credentials and connect to a workspace.

In [3]:
from azure.ai.ml import MLClient
from azure.identity import DefaultAzureCredential

ml_client = MLClient(
    DefaultAzureCredential(),
    subscription_id,
    resource_group,
    workspace_name,
)

### 1.4 Get the container registry associated with the workspace

In [4]:
workspace = ml_client.workspaces.get(workspace_name)
acr_uri = workspace.container_registry
acr_name = acr_uri.split("/")[-1]
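
The cell above relies on `workspace.container_registry` being an Azure resource ID whose last segment is the registry name. As an illustration only, the sketch below splits such an ID into its parts. `parse_resource_id` is a hypothetical helper, and the ID is made up.

```python
def parse_resource_id(resource_id):
    """Split an Azure resource ID into a dict of its key/value segments."""
    parts = resource_id.strip("/").split("/")
    # Segments alternate between a key and its value
    return dict(zip(parts[0::2], parts[1::2]))


example_id = (
    "/subscriptions/00000000-0000-0000-0000-000000000000"
    "/resourceGroups/my-rg"
    "/providers/Microsoft.ContainerRegistry"
    "/registries/myregistry"
)
parsed = parse_resource_id(example_id)
registry_name = parsed["registries"]
print(registry_name, parsed["resourceGroups"])
```

Looking up the `registries` key gives the same value as `split("/")[-1]`, but it fails loudly with a `KeyError` if the ID does not point at a container registry.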

## 2. Create a Custom Container

In [5]:
from azure.containerregistry import ContainerRegistryClient

endpoint = f"https://{acr_name}.azurecr.io"
audience = "https://management.azure.com"
acr_client = ContainerRegistryClient(endpoint, DefaultAzureCredential(), audience=audience)

In [7]:
from azure.mgmt.containerregistry import ContainerRegistryManagementClient
acrm_client = ContainerRegistryManagementClient(
    credential=DefaultAzureCredential(),
    subscription_id=subscription_id,
    base_url="https://management.azure.com"
)

In [8]:
acr = acrm_client.registries.get(
    resource_group_name=resource_group,
    registry_name=acr_name
)

In [12]:
# Inspect the registry's properties as a plain dict
acr.as_dict()

{'id': '/subscriptions/ed2cab61-14cc-4fb3-ac23-d72609214cfd/resourceGroups/inference_turing/providers/Microsoft.ContainerRegistry/registries/tritonexcr',
 'name': 'tritonexcr',
 'type': 'Microsoft.ContainerRegistry/registries',
 'location': 'australiaeast',
 'tags': {},
 'system_data': {'created_by': 'v-alwallace@microsoft.com',
  'created_by_type': 'User',
  'created_at': '2022-09-13T20:06:06.569682Z',
  'last_modified_by': 'v-alwallace@microsoft.com',
  'last_modified_by_type': 'User',
  'last_modified_at': '2022-09-13T20:06:06.569682Z'},
 'sku': {'name': 'Standard', 'tier': 'Standard'},
 'login_server': 'tritonexcr.azurecr.io',
 'creation_date': '2022-09-13T20:06:06.569682Z',
 'provisioning_state': 'Succeeded',
 'admin_user_enabled': True,
 'policies': {'quarantine_policy': {'status': 'disabled'},
  'trust_policy': {'type': 'Notary', 'status': 'disabled'},
  'retention_policy': {'days': 7,
   'last_updated_time': '2022-09-13T20:06:11.620909Z',
   'status': 'disabled'},
  'export_pol

In [10]:
l = acrm_client.builds.list(
    resource_group,
    acr_name
)

In [11]:
list(l)

[]

In [None]:
# Sign in first, then select the subscription and set the CLI defaults
!az login --use-device-code

In [15]:
!az account set --subscription "{subscription_id}"
!az configure --defaults group={resource_group}
!az configure --defaults workspace={workspace_name}

In [20]:
img_tag = f"{acr_name}.azurecr.io/azureml-examples/triton-cc:10"

!az acr build -r {acr_name} -t {img_tag} --resource-group {resource_group} -f triton-cc.dockerfile .

Packing source code into tar to upload...
Uploading archived source code from '/tmp/build_archive_6975d0d8dd0a45a2a625333b0431d29a.tar.gz'...
Sending context (6.405 KiB) to registry: tritonexcr...
Queued a build with ID: cr1
Waiting for an agent...
2022/09/13 23:12:22 Downloading source code...
2022/09/13 23:12:23 Finished downloading source code
2022/09/13 23:12:24 Using acb_vol_ff63a62f-1ea2-4d86-a8ab-4dab4fb9890e as the home volume
2022/09/13 23:12:24 Setting up Docker configuration...
2022/09/13 23:12:24 Successfully set up Docker configuration
2022/09/13 23:12:24 Logging in to registry: tritonexcr.azurecr.io
2022/09/13 23:12:25 Successfully logged into tritonexcr.azurecr.io
2022/09/13 23:12:25 Executing step ID: build. Timeout(sec): 28800, Working directory: '', Network: ''
2022/09/13 23:12:25 Scanning for dependencies...
2022/09/13 23:12:26 Successfully scanned dependencies
2022/09/13 23:12:26 Launching container with 
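
Outside a notebook, the `!az acr build` line can be assembled as an argument list instead. The sketch below only builds and prints the command, using the same flags as the cell above. `acr_build_command` is a hypothetical helper, and the registry and resource group names are made up.

```python
import shlex


def acr_build_command(acr_name, image_tag, resource_group, dockerfile, context="."):
    """Return the `az acr build` invocation as a list of arguments."""
    return [
        "az", "acr", "build",
        "-r", acr_name,
        "-t", image_tag,
        "--resource-group", resource_group,
        "-f", dockerfile,
        context,
    ]


cmd = acr_build_command(
    acr_name="myregistry",
    image_tag="myregistry.azurecr.io/azureml-examples/triton-cc:10",
    resource_group="my-rg",
    dockerfile="triton-cc.dockerfile",
)
# Print a shell-safe rendering of the command for inspection
print(shlex.join(cmd))
```

To run the build, pass `cmd` to `subprocess.run(cmd, check=True)`. Passing a list avoids shell quoting issues with the arguments.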

## 3. Configure deployment and associated resources

A deployment is a set of resources required for hosting the model that does the actual inferencing. We will create a deployment for our endpoint using the `ManagedOnlineDeployment` class.

### Key aspects of deployment 
- `name` - Name of the deployment.
- `endpoint_name` - Name of the endpoint to create the deployment under.
- `model` - The model to use for the deployment. This value can be either a reference to an existing versioned model in the workspace or an inline model specification.
- `environment` - The environment to use for the deployment. This value can be either a reference to an existing versioned environment in the workspace or an inline environment specification.
- `code_configuration` - The configuration for the source code and scoring script.
    - `path` - Path to the source code directory for scoring the model.
    - `scoring_script` - Relative path to the scoring file in the source code directory.
- `instance_type` - The VM size to use for the deployment. For the list of supported sizes, see [Managed online endpoints SKU list](https://docs.microsoft.com/en-us/azure/machine-learning/reference-managed-online-endpoints-vm-sku-list).
- `instance_count` - The number of instances to use for the deployment.

### 3.1 Configure online endpoint
`endpoint_name`: The name of the endpoint. It must be unique in the Azure region. Naming rules are defined under [managed online endpoint limits](https://docs.microsoft.com/azure/machine-learning/how-to-manage-quotas#azure-machine-learning-managed-online-endpoints-preview).

`auth_mode`: Use `key` for key-based authentication. Use `aml_token` for Azure Machine Learning token-based authentication. A `key` does not expire, but an `aml_token` does.

Optionally, you can add a description and tags to your endpoint.

In [None]:
from azure.ai.ml.entities import ManagedOnlineEndpoint

endpoint = ManagedOnlineEndpoint(
    name=endpoint_name,
    auth_mode="key"
)

### 3.2 Configure a model

In [None]:
from azure.ai.ml.entities import Model
model = Model(path="./models/model_1", type="triton_model")

### 3.3 Configure an environment

#### Readiness route vs. liveness route
An HTTP server defines paths for both liveness and readiness. A liveness route is used to check whether the server is running. A readiness route is used to check whether the server is ready to do work. In machine learning inference, a server could respond 200 OK to a liveness request before loading a model. The server could respond 200 OK to a readiness request only after the model has been loaded into memory.

Review the [Kubernetes documentation](https://kubernetes.io/docs/tasks/configure-pod-container/configure-liveness-readiness-startup-probes/) for more information about liveness and readiness probes.

Notice that this deployment uses separate paths for liveness and readiness, since Triton defines both: `/v2/health/live` and `/v2/health/ready`.
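
To make the liveness and readiness distinction concrete, here is a toy sketch, not Triton itself. `ToyInferenceServer` is a hypothetical class that answers the two health paths used in the environment definition in this section. The 503 status for "not ready" is this sketch's choice and is not necessarily what a real server returns.

```python
class ToyInferenceServer:
    """Simulates a server that is alive immediately but ready only after loading a model."""

    def __init__(self):
        self.model_loaded = False

    def load_model(self):
        # A real server would read model files into memory here
        self.model_loaded = True

    def handle(self, path):
        """Return an HTTP status code for the given request path."""
        if path == "/v2/health/live":
            return 200  # the process is up
        if path == "/v2/health/ready":
            return 200 if self.model_loaded else 503
        return 404


server = ToyInferenceServer()
before = (server.handle("/v2/health/live"), server.handle("/v2/health/ready"))
server.load_model()
after = (server.handle("/v2/health/live"), server.handle("/v2/health/ready"))
print(before, after)
```

Traffic should be routed to an instance only once its readiness check succeeds. The liveness check tells the platform whether the process needs to be restarted.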

In [None]:
from azure.ai.ml.entities import Environment

environment = Environment(
    name="triton-cc-env",
    inference_config={
        "liveness_route": {
            "path": "/v2/health/live",
            "port": 8000,
        },
        "readiness_route": {
            "path": "/v2/health/ready",
            "port": 8000,
        },
        "scoring_route": {
            "path": "/",
            "port": 8000,
        },
    },
    image=img_tag,
)

### 3.4 Configure the deployment

In [None]:
from azure.ai.ml.entities import ManagedOnlineDeployment

deployment = ManagedOnlineDeployment(
    name="blue",
    endpoint_name=endpoint_name,
    environment=environment,
    model=model,
    instance_type="Standard_NC6s_v3",
    instance_count=1,
    model_mount_path="/models"
)

## 4. Deploy to Azure

### 4.1 Create the endpoint
Using the `MLClient` created earlier, we will now create the endpoint in the workspace. This command starts the endpoint creation as a long-running operation. Calling `.result()` on the returned poller blocks until the endpoint is created and returns the endpoint object that the later steps use.

In [None]:
# In current azure-ai-ml releases, begin_create_or_update returns a poller;
# .result() waits for completion and returns the created endpoint.
endpoint = ml_client.begin_create_or_update(endpoint).result()

### 4.2 Create the deployment

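Using the `MLClient` created earlier, we will now create the deployment in the workspace. This command starts the deployment creation as a long-running operation. Calling `.result()` on the returned poller blocks until the deployment is ready, which is required before traffic can be routed to it.

In [None]:
# In current azure-ai-ml releases, begin_create_or_update returns a poller;
# .result() waits for completion and returns the created deployment.
deployment = ml_client.begin_create_or_update(deployment).result()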

### 4.3 Set traffic to 100% for deployment

In [None]:
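endpoint.traffic = {"blue": 100}
# Wait for the traffic update to finish before testing the endpoint
# (in current azure-ai-ml releases the call returns a poller)
endpoint = ml_client.begin_create_or_update(endpoint).result()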
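
With more than one deployment, the traffic dictionary splits requests by percentage. The sketch below checks such a dictionary before it is sent. `check_traffic` is a hypothetical helper and the `green` deployment is made up. The sketch also assumes that the percentages must sum to either 0 or 100, so verify that rule against the current service documentation.

```python
def check_traffic(traffic, deployment_names):
    """Raise ValueError if the traffic split looks invalid; return None otherwise."""
    unknown = set(traffic) - set(deployment_names)
    if unknown:
        raise ValueError(f"traffic refers to unknown deployments: {sorted(unknown)}")
    for name, percent in traffic.items():
        if not 0 <= percent <= 100:
            raise ValueError(f"traffic for {name!r} must be between 0 and 100")
    total = sum(traffic.values())
    # Assumption: the service accepts a total of 0 (no traffic) or 100
    if total not in (0, 100):
        raise ValueError(f"traffic percentages sum to {total}, expected 0 or 100")


check_traffic({"blue": 100}, ["blue"])
check_traffic({"blue": 90, "green": 10}, ["blue", "green"])
print("traffic splits look valid")
```

Checking locally gives a faster and clearer error than waiting for the service to reject the update.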

## 5. Test the endpoint with sample data
This version of the Triton server requires image pre- and post-processing. Below we show how to invoke the endpoint with this processing.

### 5.1 Retrieve the scoring URI

In [None]:
scoring_uri = endpoint.scoring_uri

### 5.2 Retrieve the endpoint auth key

In [None]:
# Current azure-ai-ml releases expose get_keys; older previews named it list_keys
keys = ml_client.online_endpoints.get_keys(endpoint_name)
auth_key = keys.primary_key

### 5.3 Test the endpoint
The script below imports pre- and post-processing functions from `sdk/endpoints/online/triton/scoring_utils/prepost.py`. We first check that the server and the model are ready, then use those functions to convert the image into a Triton-readable format and issue the scoring request.

In [None]:
# Test the blue deployment with some sample data

import sys

import gevent.ssl
import numpy as np
import requests
import tritonclient.http as tritonhttpclient
from PIL import Image

# Make the pre- and post-processing helpers importable
sys.path.append("../../triton/scoring_utils")
import prepost

img_url = "../../triton/scoring_utils/peacock.jpg"

# The Triton client expects the URL without the scheme ("https://" is 8 characters).
# A separate variable keeps scoring_uri intact if this cell is run again.
triton_url = scoring_uri[8:]

# Initialize client handler
triton_client = tritonhttpclient.InferenceServerClient(
    url=triton_url,
    ssl=True,
    ssl_context_factory=gevent.ssl._create_default_https_context,
)

# Create headers
headers = {"Authorization": f"Bearer {auth_key}"}

# Check status of Triton server
health_ctx = triton_client.is_server_ready(headers=headers)
print("Is server ready - {}".format(health_ctx))

# Check status of model
model_name = "model_1"
status_ctx = triton_client.is_model_ready(model_name, "1", headers)
print("Is model ready - {}".format(status_ctx))

# img_content = requests.get(img_url).content
img_data = prepost.preprocess("peacock-image.png")

# Populate inputs and outputs (infer_input avoids shadowing the built-in input)
infer_input = tritonhttpclient.InferInput("data_0", img_data.shape, "FP32")
infer_input.set_data_from_numpy(img_data)
inputs = [infer_input]
output = tritonhttpclient.InferRequestedOutput("fc6_1")
outputs = [output]

# Send the request and map the highest-scoring class index to a label
result = triton_client.infer(model_name, inputs, outputs=outputs, headers=headers)
max_label = np.argmax(result.as_numpy("fc6_1"))
label_name = prepost.postprocess(max_label)
print(label_name)
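
Slicing the first eight characters off the scoring URI works only when the URI starts with `https://`. A more defensive variant is sketched below using the standard library. `strip_scheme` is a hypothetical helper, and the host name is made up.

```python
from urllib.parse import urlparse


def strip_scheme(uri):
    """Return host (plus any path) without the scheme or a trailing slash."""
    parsed = urlparse(uri)
    if not parsed.netloc:
        # No scheme was present, so only trim a trailing slash
        return uri.rstrip("/")
    return (parsed.netloc + parsed.path).rstrip("/")


example_uri = "https://my-endpoint.westus.inference.ml.azure.com/"
stripped = strip_scheme(example_uri)
print(stripped)
```

Unlike a fixed slice, this also handles `http://` URIs and values that have already been stripped, so calling it more than once is harmless.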

## 6. Managing endpoints and deployments

### 6.1 Get the logs for the new deployment

In [None]:
ml_client.online_deployments.get_logs(
    name="blue", endpoint_name=endpoint_name, lines=50
)

## 7. Delete the endpoint

In [None]:
ml_client.online_endpoints.begin_delete(name=endpoint_name)