Copyright (c) Microsoft Corporation. All rights reserved.

Licensed under the MIT License.

Please note that this Triton Private Preview is subject to the [Supplemental Terms of Use for Microsoft Azure Previews](https://azure.microsoft.com/en-us/support/legal/preview-supplemental-terms/).

# (PREVIEW) Deploying a web service hosted on NVIDIA Triton to Azure Kubernetes Service (AKS)
This notebook shows the steps for deploying a service with [NVIDIA Triton Inference Server](https://developer.nvidia.com/nvidia-triton-inference-server): registering a model, creating an image, provisioning a cluster (one-time action), and deploying a service to it. 

In this case, we use a Densenet image classification model running with ONNX Runtime, but Triton also supports TensorFlow, PyTorch, and Caffe models.
 
We then test and delete the service, image and model.

In [None]:
import azureml.core
print(azureml.core.VERSION)

# Get workspace
Load existing workspace from the config file info. If you are running this notebook in a Compute Instance, a configuration file has already been created for you. If you are running this notebook somewhere else, please follow the steps to [create a configuration file](https://docs.microsoft.com/azure/machine-learning/how-to-configure-environment#workspace).
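For reference, the configuration file that `Workspace.from_config()` reads is a small `config.json` containing the keys `subscription_id`, `resource_group` and `workspace_name`. The sketch below writes one with placeholder values. The helper name `write_workspace_config` and the placeholder strings are hypothetical; in practice you would download `config.json` from your workspace in the Azure portal or fill in your own values.

```python
import json
import tempfile
from pathlib import Path


def write_workspace_config(directory, subscription_id, resource_group, workspace_name):
    """Hypothetical helper: write a config.json with the keys from_config() expects."""
    config = {
        "subscription_id": subscription_id,
        "resource_group": resource_group,
        "workspace_name": workspace_name,
    }
    path = Path(directory) / "config.json"
    path.write_text(json.dumps(config, indent=2))
    return path


# Written to a temporary directory for illustration; from_config() searches the
# current directory (and its parents), so in practice place it next to the notebook.
config_path = write_workspace_config(
    tempfile.mkdtemp(),
    subscription_id="<your-subscription-id>",    # placeholder
    resource_group="<your-resource-group>",      # placeholder
    workspace_name="<your-workspace-name>",      # placeholder
)
print(config_path.read_text())
```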

In [None]:
from azureml.core.workspace import Workspace

ws = Workspace.from_config()

# Create compute cluster

The `setup.py` script creates a compute cluster. Uncomment the `%cat` line in the next code cell to see what the script does.

Note that the setup script assumes you have quota for [NCv3-series machines](https://docs.microsoft.com/azure/virtual-machines/ncv3-series) in the South Central US region. The cell below overrides the region and VM size with the `--compute_loc` and `--vm_size` parameters, so make sure you have GPU quota for the chosen VM size in the chosen region. If you need additional quota, create a support request.

Note that this cell can take 10-15 minutes to run.

In [None]:
# Uncomment the below line to see the contents of setup.py
# %cat ../scripts/setup.py

In [None]:
%%time

!python setup.py --compute_loc='westus2' --vm_size='Standard_NC6'

# Register the model
Register an existing trained model and add a description and tags.

**Note:** Under `model_path` there must be a subdirectory named `triton` that follows the layout of a Triton [Model Repository](https://docs.nvidia.com/deeplearning/triton-inference-server/user-guide/docs/model_repository.html#repository-layout).
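A Triton model repository places each model in its own directory, with numerically named version subdirectories holding the model file. The sketch below checks that basic layout before you register the model. The function name `check_triton_repository` is hypothetical, and requiring a `config.pbtxt` and the ONNX default file name `model.onnx` are simplifying assumptions of this sketch rather than the complete Triton rules.

```python
import tempfile
from pathlib import Path


def check_triton_repository(repo_root):
    """Return a list of layout problems in <repo>/<model>/<version>/model.onnx.

    Assumes ONNX models and a config.pbtxt per model; Triton itself accepts
    other backends and can sometimes auto-generate the configuration.
    """
    problems = []
    root = Path(repo_root)
    model_dirs = [p for p in sorted(root.iterdir()) if p.is_dir()]
    if not model_dirs:
        problems.append(f"{root}: no model directories found")
    for model_dir in model_dirs:
        if not (model_dir / "config.pbtxt").is_file():
            problems.append(f"{model_dir.name}: missing config.pbtxt")
        # Triton only treats numerically named subdirectories as model versions.
        versions = [p for p in sorted(model_dir.iterdir()) if p.is_dir() and p.name.isdigit()]
        if not versions:
            problems.append(f"{model_dir.name}: no numeric version directory")
        for version_dir in versions:
            if not (version_dir / "model.onnx").is_file():
                problems.append(f"{model_dir.name}/{version_dir.name}: missing model.onnx")
    return problems


# Build a throwaway repository that mirrors the expected layout.
demo_root = Path(tempfile.mkdtemp()) / "triton"
model_dir = demo_root / "densenet_onnx"
(model_dir / "1").mkdir(parents=True)
(model_dir / "config.pbtxt").write_text('name: "densenet_onnx"\nplatform: "onnxruntime_onnx"\n')
(model_dir / "1" / "model.onnx").write_bytes(b"")  # empty placeholder, not a real model
print(check_triton_repository(demo_root))  # → []
```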

In [None]:
from azureml.core.model import Model
import os

model = Model.register(
    model_path=os.path.join("..", "..", "models", "triton"),
    model_name="densenet_onnx",
    tags={'area': "Image classification", 'type': "classification"},
    description="Image classification trained on Imagenet Dataset",
    workspace=ws
)

print(model.name, model.description, model.version)

# Deploy the model as a web service
First, create a scoring (entry) script; the one we created for you is `score_densenet.py` in the `source_dir` directory. Then create an `InferenceConfig` and a deployment configuration, and call `Model.deploy()`.

Note that this step may take 10-15 minutes to run. 
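An Azure Machine Learning entry script defines `init()`, called once when the service starts, and `run()`, called for each request. The sketch below shows that contract with softmax and top-k post-processing of classifier scores. It does not reproduce `score_densenet.py`: the Triton call is replaced by the hypothetical stub `fake_triton_infer`, and the three labels are a made-up stand-in for the ImageNet label list.

```python
import math

# Hypothetical module-level state, filled in by init().
LABELS = []


def fake_triton_infer(payload):
    """Hypothetical stub standing in for the request to the Triton server.

    Returns one logit per label; a real entry script would send the
    preprocessed image to Triton and read back the model's output tensor.
    """
    return [0.5, 2.0, -1.0]


def top_k(logits, labels, k):
    """Softmax the logits and return the k most likely (label, probability) pairs."""
    peak = max(logits)  # subtract the max before exp() for numerical stability
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    scored = zip(labels, (e / total for e in exps))
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:k]


def init():
    """Called once when the service starts: load anything reused across requests."""
    global LABELS
    LABELS = ["tabby cat", "golden retriever", "toaster"]  # hypothetical label subset


def run(payload):
    """Called once per request: run inference and return JSON-serializable output."""
    logits = fake_triton_infer(payload)
    return [{"label": label, "probability": round(prob, 4)}
            for label, prob in top_k(logits, LABELS, k=2)]


init()
print(run(b"<image bytes>"))
```

Loading labels in `init()` rather than in `run()` keeps per-request work to inference and post-processing only.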

In [None]:
%%time

from azureml.core.webservice import AksWebservice
from azureml.core import Environment
from azureml.core.model import Model, InferenceConfig


aks_service_name = "triton-densenet-onnx"
env = Environment.get(ws, "AzureML-Triton").clone("My-Triton")

for pip_package in ["pillow"]:
    env.python.conda_dependencies.add_pip_package(pip_package)

inference_config = InferenceConfig(
    # This entry script is where we dispatch a call to the Triton server
    entry_script="score_densenet.py",
    source_directory="source_dir",
    environment=env
)

aks_config = AksWebservice.deploy_configuration(
    cpu_cores=1,
    memory_gb=4,
    gpu_cores=1,
    compute_target_name='aks-gpu-deploy'
)

aks_service = Model.deploy(
    workspace=ws,
    name=aks_service_name,
    models=[model],
    inference_config=inference_config,
    deployment_config=aks_config)

aks_service.wait_for_deployment(show_output=True)
print(aks_service.state)

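In [None]:
# Deleting the service here would break the test below; the "Delete resources"
# section cleans up at the end. Uncomment only to delete the service early.
# aks_service.delete()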

# Test the web service
Test the web service by sending it the contents of the test images.
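`test_service.py` handles the request for you. For illustration, the sketch below builds (but does not send) a comparable HTTP request using only the standard library. It assumes key-based authentication, which sends the service key as a `Bearer` token, and an entry script that accepts raw image bytes; check `test_service.py` for the payload format it actually uses. The helper name and the URL and key values are hypothetical placeholders; in this notebook the real values would come from `aks_service.scoring_uri` and `aks_service.get_keys()[0]`.

```python
import urllib.request


def build_scoring_request(scoring_uri, api_key, image_bytes):
    """Hypothetical helper: build a POST request that carries raw image bytes.

    Assumes key-based auth and a raw binary payload; adjust both to match
    your entry script.
    """
    return urllib.request.Request(
        scoring_uri,
        data=image_bytes,
        method="POST",
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/octet-stream",
        },
    )


# Placeholder values; replace them with the service's scoring URI and key.
request = build_scoring_request(
    "http://<cluster-ip>/api/v1/service/triton-densenet-onnx/score",
    "<primary-key>",
    b"<image bytes>",
)
print(request.get_method(), request.full_url)
# To send it against a live service: urllib.request.urlopen(request, timeout=60).read()
```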

In [None]:
%%time
!python test_service.py --endpoint_name=$aks_service_name

# Delete resources
Delete the web service and the compute target.

In [None]:
%%time
!python delete_resources.py
