# Deploy to Triton Inference Server locally

description: (preview) deploy a DenseNet image classification model locally via Triton Inference Server

Please note that this Public Preview release is subject to the [Supplemental Terms of Use for Microsoft Azure Previews](https://azure.microsoft.com/support/legal/preview-supplemental-terms/).

In [None]:
from azureml.core import Workspace

ws = Workspace.from_config()
ws

## Download model

Triton Inference Server can only load a model that is stored in the directory structure it expects: a model repository with one directory per model, each containing numbered version subdirectories that hold the model files. [Read more about the directory structure that Triton expects](https://docs.nvidia.com/deeplearning/triton-inference-server/user-guide/docs/model_repository.html).
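As a rough illustration of that layout, the sketch below checks a directory against the general shape `<repository>/<model-name>/<version>/<model-files>` described in the Triton documentation. The function name `find_layout_problems` is hypothetical and not part of this repository's helpers, and the check is deliberately shallow: it does not inspect `config.pbtxt` or the model file names.

```python
from pathlib import Path
from typing import List


def find_layout_problems(repository: Path) -> List[str]:
    """Return human-readable problems found in a Triton-style model repository.

    Assumed layout: <repository>/<model-name>/<version>/<model-files>, where
    <version> is a numeric directory name that does not start with zero.
    """
    repository = Path(repository)
    if not repository.is_dir():
        return [f"{repository}: not a directory"]

    problems = []
    model_dirs = sorted(p for p in repository.iterdir() if p.is_dir())
    if not model_dirs:
        problems.append(f"{repository}: no model directories found")

    for model_dir in model_dirs:
        # Keep only directories whose names look like usable version numbers.
        version_dirs = sorted(
            p
            for p in model_dir.iterdir()
            if p.is_dir() and p.name.isdigit() and not p.name.startswith("0")
        )
        if not version_dirs:
            problems.append(f"{model_dir.name}: no usable numeric version directory")
        for version_dir in version_dirs:
            # A version directory must contain at least one model file or folder.
            if not any(version_dir.iterdir()):
                problems.append(
                    f"{model_dir.name}/{version_dir.name}: version directory is empty"
                )
    return problems
```

Point the function at whichever directory directly contains the model-name folders. An empty list means no problems were detected by this simple check, not that Triton is guaranteed to load the model.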

In [None]:
import git
import os
import sys

from pathlib import Path

# get the root of the repo
prefix = Path(git.Repo(".", search_parent_directories=True).working_tree_dir)

# Enables us to import helper functions as Python modules
path_to_insert = str(prefix.joinpath("code", "deployment", "triton"))
if path_to_insert not in sys.path:
    sys.path.insert(1, path_to_insert)

from model_utils import download_triton_models, delete_triton_models


download_triton_models(prefix)

## Register model

A registered model is a logical container stored in the cloud. It holds all files located at `model_path` and is associated with a version number and other metadata.

In [None]:
from azureml.core.model import Model

# Model.register documents model_path as a string, so convert the Path explicitly
model_path = str(prefix.joinpath("models"))

model = Model.register(
    model_path=model_path,
    model_name="densenet-onnx-example",
    tags={"area": "Image classification", "type": "classification"},
    description="Image classification model trained on the ImageNet dataset",
    workspace=ws,
)

print(model)

## Deploy webservice

Here we deploy to local compute. For other deployment targets, see [our documentation](https://docs.microsoft.com/azure/machine-learning/how-to-deploy-and-where?tabs=azcli).


In [None]:
from azureml.core.webservice import LocalWebservice
from azureml.core import Environment
from azureml.core.model import InferenceConfig
from random import randint

service_name = "triton-densenet-onnx-local" + str(randint(10000, 99999))
env = Environment.get(ws, "AzureML-Triton").clone("triton-example")

for pip_package in ["pillow"]:
    env.python.conda_dependencies.add_pip_package(pip_package)

inference_config = InferenceConfig(
    # this entry script is where we dispatch a call to the Triton server
    entry_script="score_densenet.py",
    source_directory=prefix.joinpath("code", "deployment", "triton"),
    environment=env,
)

config = LocalWebservice.deploy_configuration(port=6789)

service = Model.deploy(
    workspace=ws,
    name=service_name,
    models=[model],
    inference_config=inference_config,
    deployment_config=config,
    overwrite=True,
)

service.wait_for_deployment(show_output=True)
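If the deployment fails, the container logs are usually the quickest way to see why. The sketch below wraps the wait step so that the logs are printed before the error propagates. The helper name `wait_or_show_logs` is hypothetical. It assumes only that the service object exposes `wait_for_deployment` and `get_logs`, as the `azureml.core` webservice classes do.

```python
def wait_or_show_logs(service):
    """Wait for a deployment and print the service logs if it fails.

    `service` is assumed to expose `wait_for_deployment(show_output=...)`
    and `get_logs()`, like the webservice objects in azureml.core.
    """
    try:
        service.wait_for_deployment(show_output=True)
    except Exception:
        # Print whatever the container logged, then re-raise the original error.
        print(service.get_logs())
        raise
```

Calling `wait_or_show_logs(service)` in place of `service.wait_for_deployment(show_output=True)` keeps the same behavior on success.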

## Test the webservice

In [None]:
import requests

headers = {"Content-Type": "application/octet-stream"}

data_file = prefix.joinpath("data", "raw", "images", "peacock.jpg")
test_sample = data_file.read_bytes()  # read the image as raw bytes without leaving the file open
resp = requests.post(service.scoring_uri, data=test_sample, headers=headers)
print(resp.text)
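The request above prints whatever the endpoint returns, including an error body. A slightly more defensive variant, sketched below, adds a timeout and raises on HTTP error status codes. The function name `score_image` and the 30-second default are assumptions for illustration. The `post` argument is any callable with the same calling convention as `requests.post`, which also makes the helper easy to exercise without a running service.

```python
from pathlib import Path


def score_image(scoring_uri, image_path, post, timeout=30):
    """POST raw image bytes to a scoring endpoint and return the response text.

    `post` is a callable compatible with `requests.post`; `timeout` is in seconds.
    """
    headers = {"Content-Type": "application/octet-stream"}
    payload = Path(image_path).read_bytes()
    response = post(scoring_uri, data=payload, headers=headers, timeout=timeout)
    # Raise on HTTP 4xx/5xx instead of silently returning an error body.
    response.raise_for_status()
    return response.text
```

With the objects defined in this notebook, a call would look like `score_image(service.scoring_uri, data_file, requests.post)`.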

## Delete the webservice and the downloaded model

In [None]:
service.delete()
delete_triton_models(prefix)

## Next steps

Try changing the deployment configuration to [deploy to Azure Kubernetes Service](https://docs.microsoft.com/azure/machine-learning/how-to-deploy-azure-kubernetes-service?tabs=python) for higher availability and better scalability.