# Deploy to Triton Inference Server locally

description: (preview) deploy a DenseNet image classification model locally via Triton

Please note that this Public Preview release is subject to the [Supplemental Terms of Use for Microsoft Azure Previews](https://azure.microsoft.com/support/legal/preview-supplemental-terms/).

In [15]:
!pip install nvidia-pyindex
!pip install --upgrade tritonclient
!pip install azureml_core-1.19.0a1-py3-none-any.whl

Looking in indexes: https://pypi.org/simple, https://pypi.ngc.nvidia.com
Looking in indexes: https://pypi.org/simple, https://pypi.ngc.nvidia.com
Requirement already up-to-date: tritonclient in /home/gopalv/miniconda3/envs/azureml/lib/python3.7/site-packages (2.4.0)
Looking in indexes: https://pypi.org/simple, https://pypi.ngc.nvidia.com


In [16]:
from azureml.core import Workspace

ws = Workspace.from_config()
ws

Workspace.create(name='Inference-PM-AML-Workspace', subscription_id='92c76a2f-0e1c-4216-b65e-abf7a3f34c1e', resource_group='Inference-PM')

## Download model

Triton Inference Server can load a model only if its files are arranged in the model repository layout that Triton expects, so keep the downloaded directory structure intact. [Read more about the directory structure that Triton expects](https://docs.nvidia.com/deeplearning/triton-inference-server/user-guide/docs/model_repository.html).
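As a rough illustration of that layout, the sketch below checks that every model directory in a repository contains at least one numeric version directory holding a model file, which is the general shape described in the linked Triton documentation (for example `densenet_onnx/1/model.onnx`). The function name `find_layout_problems` and the temporary repository are hypothetical and exist only for this example; the check is deliberately minimal, does not inspect `config.pbtxt`, and makes no claim about where `download_triton_models` places its files.

```python
import tempfile
from pathlib import Path
from typing import List


def find_layout_problems(repo: Path) -> List[str]:
    """Return human-readable layout problems; an empty list means none were found."""
    problems = []
    model_dirs = sorted(p for p in repo.iterdir() if p.is_dir())
    if not model_dirs:
        problems.append(f"{repo.name}: no model directories found")
    for model_dir in model_dirs:
        # Version directories are named with an integer, such as "1".
        versions = [p for p in model_dir.iterdir() if p.is_dir() and p.name.isdigit()]
        if not versions:
            problems.append(f"{model_dir.name}: no numeric version directory")
        for version in versions:
            # Each version directory should hold the model file itself.
            if not any(f.is_file() for f in version.iterdir()):
                problems.append(f"{model_dir.name}/{version.name}: no model file")
    return problems


# Build a throwaway repository: one well-formed model and one broken one.
with tempfile.TemporaryDirectory() as tmp:
    repo = Path(tmp)
    good_version = repo / "densenet_onnx" / "1"
    good_version.mkdir(parents=True)
    (good_version / "model.onnx").write_bytes(b"")  # empty placeholder file
    (repo / "broken_model").mkdir()  # no version directory inside
    problems = find_layout_problems(repo)

print(problems)
```

Running a check like this against your own model directory before registering it can surface a misplaced file earlier than a failed Triton startup would.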

In [17]:
import os
import sys
from pathlib import Path
from src.model_utils import download_triton_models, delete_triton_models

prefix = Path(".")
download_triton_models(prefix)

successfully downloaded model: densenet_onnx
successfully downloaded model: bidaf-9


## Register model

A registered model is a logical container stored in the cloud. It holds all files located at `model_path` and is associated with a version number and other metadata.

In [18]:
from azureml.core.model import Model

model_path = "models"

model = Model.register(
    model_path=model_path,
    model_name="densenet-onnx-example",
    tags={"area": "Image classification", "type": "classification"},
    description="Image classification model trained on the ImageNet dataset",
    workspace=ws,
)

print(model)

Registering model densenet-onnx-example
Model(workspace=Workspace.create(name='Inference-PM-AML-Workspace', subscription_id='92c76a2f-0e1c-4216-b65e-abf7a3f34c1e', resource_group='Inference-PM'), name=densenet-onnx-example, id=densenet-onnx-example:7, version=7, tags={'area': 'Image classification', 'type': 'classification'}, properties={})


## Deploy webservice

Here we deploy to local compute. For other deployment targets, see [our documentation](https://docs.microsoft.com/azure/machine-learning/how-to-deploy-and-where?tabs=azcli).
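A local deployment publishes the service on a fixed host port (6789 in this notebook), so it can help to confirm that nothing is already listening there before you deploy. The sketch below is one way to probe a port using only the standard library. The helper name `port_is_free` is hypothetical, and the check only tells you whether something currently accepts TCP connections on the port, which is a reasonable proxy but not a guarantee that Docker can bind it.

```python
import socket


def port_is_free(port: int, host: str = "127.0.0.1") -> bool:
    """Return True if nothing accepts TCP connections on host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as probe:
        probe.settimeout(1.0)
        # connect_ex returns 0 when the connection succeeds, meaning the port is in use.
        return probe.connect_ex((host, port)) != 0


# Demonstration: occupy an ephemeral port, then probe it while it is held.
with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as server:
    server.bind(("127.0.0.1", 0))  # port 0 lets the OS pick an unused port
    server.listen(1)
    busy_port = server.getsockname()[1]
    busy_result = port_is_free(busy_port)

print(busy_result)  # False, because the demo server is listening on that port
```

In the notebook you would call `port_is_free(6789)` and pick a different `port` value for the deployment configuration if it returns `False`.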


In [19]:
from azureml.core.webservice import LocalWebservice
from azureml.core import Environment
from azureml.core.conda_dependencies import CondaDependencies
from azureml.core.model import InferenceConfig
from random import randint

service_name = "triton-densenet-onnx-local" + str(randint(10000, 99999))


env = Environment("triton-tutorial")
env.docker.base_image = None
env.docker.base_dockerfile = "Dockerfile"
env.python.user_managed_dependencies = True
env.python.interpreter_path = "/opt/miniconda/bin/python"

env.environment_variables["WORKER_COUNT"] = "1"

inference_config = InferenceConfig(
    # this entry script is where we dispatch a call to the Triton server
    source_directory="src",
    entry_script="score_densenet.py",
    environment=env,
)

config = LocalWebservice.deploy_configuration(port=6789)

service = Model.deploy(
    workspace=ws,
    name=service_name,
    models=[model],
    inference_config=inference_config,
    deployment_config=config,
    overwrite=True,
)

service.wait_for_deployment(show_output=True)

Downloading model densenet-onnx-example:7 to /tmp/azureml_tsae_34e/densenet-onnx-example/7
Generating Docker build context.
Package creation Succeeded
Logging into Docker registry inferencepmaef584480.azurecr.io
Logging into Docker registry inferencepmaef584480.azurecr.io
Building Docker image from Dockerfile...
Step 1/5 : FROM inferencepmaef584480.azurecr.io/azureml/azureml_930c481407be9ccaddfb266acde89404
 ---> 3aece98eaff6
Step 2/5 : COPY azureml-app /var/azureml-app
 ---> 5d041ba38b0b
Step 3/5 : RUN mkdir -p '/var/azureml-app' && echo eyJhY2NvdW50Q29udGV4dCI6eyJzdWJzY3JpcHRpb25JZCI6IjkyYzc2YTJmLTBlMWMtNDIxNi1iNjVlLWFiZjdhM2YzNGMxZSIsInJlc291cmNlR3JvdXBOYW1lIjoiaW5mZXJlbmNlLXBtIiwiYWNjb3VudE5hbWUiOiJpbmZlcmVuY2UtcG0tYW1sLXdvcmtzcGFjZSIsIndvcmtzcGFjZUlkIjoiODI0NDI5MGEtMzUwNi00MDEyLTgxNjAtOGQ4NjcwMzMzZTE5In0sIm1vZGVscyI6e30sIm1vZGVsc0luZm8iOnt9fQ== | base64 --decode > /var/azureml-app/model_config_map.json
 ---> Running in ef9d6e1a2e45
 ---> c69319802a5a
Step 4/5 : RUN mv '/var/azurem

## Test the webservice

In [21]:
import requests

headers = {"Content-Type": "application/octet-stream"}

test_sample = requests.get("https://aka.ms/peacock-pic", allow_redirects=True).content
resp = requests.post(service.scoring_uri, data=test_sample, headers=headers)
print(resp.text)

84 : PEACOCK
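If you want to use the prediction programmatically rather than just print it, you can split the response text into a class index and a label. The sketch below assumes the response always has the `<index> : <LABEL>` shape seen in the output above; that is inferred from this single example, not from a documented contract of the scoring script, and `parse_prediction` is a hypothetical helper.

```python
from typing import Tuple


def parse_prediction(text: str) -> Tuple[int, str]:
    """Split a response such as '84 : PEACOCK' into (class_index, label)."""
    index_part, separator, label_part = text.partition(":")
    if not separator or not label_part.strip():
        raise ValueError(f"unexpected response format: {text!r}")
    # int() raises ValueError if the index part is not a number.
    return int(index_part.strip()), label_part.strip()


class_index, label = parse_prediction("84 : PEACOCK")
print(class_index, label)
```

In the cell above you would pass `resp.text` to the helper, ideally after checking `resp.status_code` so that an error body is not mistaken for a prediction.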


## Delete the webservice and the downloaded models

In [22]:
service.delete()
delete_triton_models(prefix)

Container has been successfully cleaned up.
successfully deleted model: densenet_onnx
successfully deleted model: bidaf-9


## Next steps

Try changing the deployment configuration to [deploy to Azure Kubernetes Service](https://docs.microsoft.com/azure/machine-learning/how-to-deploy-azure-kubernetes-service?tabs=python) for higher availability and better scalability.