title | titleSuffix | description | services | ms.service | ms.subservice | author | ms.author | ms.reviewer | ms.date | ms.topic | ms.custom | ms.devlang |
---|---|---|---|---|---|---|---|---|---|---|---|---|
Deploy a model in a custom container to an online endpoint | Azure Machine Learning | Learn how to use a custom container with an open-source server to deploy a model in Azure Machine Learning. | machine-learning | machine-learning | inferencing | dem108 | sehan | mopeakande | 03/26/2024 | how-to | deploy, devplatv2, devx-track-azurecli, cliv2, sdkv2 | azurecli |
[!INCLUDE dev v2]
Learn how to use a custom container to deploy a model to an online endpoint in Azure Machine Learning.
Custom container deployments can use web servers other than the default Python Flask server used by Azure Machine Learning. Users of these deployments can still take advantage of Azure Machine Learning's built-in monitoring, scaling, alerting, and authentication.
The following table lists various deployment examples that use custom containers such as TensorFlow Serving, TorchServe, Triton Inference Server, Plumber R package, and Azure Machine Learning Inference Minimal image.
Example | Script (CLI) | Description |
---|---|---|
minimal/multimodel | deploy-custom-container-minimal-multimodel | Deploy multiple models to a single deployment by extending the Azure Machine Learning Inference Minimal image. |
minimal/single-model | deploy-custom-container-minimal-single-model | Deploy a single model by extending the Azure Machine Learning Inference Minimal image. |
mlflow/multideployment-scikit | deploy-custom-container-mlflow-multideployment-scikit | Deploy two MLflow models with different Python requirements to two separate deployments behind a single endpoint using the Azure Machine Learning Inference Minimal image. |
r/multimodel-plumber | deploy-custom-container-r-multimodel-plumber | Deploy three regression models to one endpoint using the Plumber R package. |
tfserving/half-plus-two | deploy-custom-container-tfserving-half-plus-two | Deploy a Half Plus Two model in a TensorFlow Serving custom container using the standard model registration process. |
tfserving/half-plus-two-integrated | deploy-custom-container-tfserving-half-plus-two-integrated | Deploy a Half Plus Two model using a TensorFlow Serving custom container with the model integrated into the image. |
torchserve/densenet | deploy-custom-container-torchserve-densenet | Deploy a single model using a TorchServe custom container. |
torchserve/huggingface-textgen | deploy-custom-container-torchserve-huggingface-textgen | Deploy Hugging Face models to an online endpoint and follow along with the Hugging Face Transformers TorchServe example. |
triton/single-model | deploy-custom-container-triton-single-model | Deploy a Triton model using a custom container. |
This article focuses on serving a TensorFlow model with TensorFlow (TF) Serving.
Warning
Microsoft might not be able to help troubleshoot problems caused by a custom image. If you encounter problems, you might be asked to use the default image or one of the images Microsoft provides to see if the problem is specific to your image.
[!INCLUDE cli & sdk]
- You, or the service principal you use, must have Contributor access to the Azure resource group that contains your workspace. You have such a resource group if you configured your workspace using the quickstart article.
- To deploy locally, you must have Docker engine running locally. This step is highly recommended. It helps you debug issues.
To follow along with this tutorial, clone the source code from GitHub.
git clone https://github.com/Azure/azureml-examples --depth 1
cd azureml-examples/cli
See also the example notebook, but note that the 3. Test locally section in the notebook assumes that it runs under the azureml-examples/sdk directory.
Define environment variables:
:::code language="azurecli" source="~/azureml-examples-main/cli/deploy-custom-container-tfserving-half-plus-two.sh" id="initialize_variables":::
Download and unzip a model that divides an input by two and adds 2 to the result:
:::code language="azurecli" source="~/azureml-examples-main/cli/deploy-custom-container-tfserving-half-plus-two.sh" id="download_and_unzip_model":::
Use docker to run your image locally for testing:
:::code language="azurecli" source="~/azureml-examples-main/cli/deploy-custom-container-tfserving-half-plus-two.sh" id="run_image_locally_for_testing":::
First, check that the container is alive, meaning that the process inside the container is still running. You should get a 200 (OK) response.
:::code language="azurecli" source="~/azureml-examples-main/cli/deploy-custom-container-tfserving-half-plus-two.sh" id="check_liveness_locally":::
Then, check that you can get predictions about unlabeled data:
:::code language="azurecli" source="~/azureml-examples-main/cli/deploy-custom-container-tfserving-half-plus-two.sh" id="check_scoring_locally":::
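To sanity-check the scoring response, the model's arithmetic can be sketched in plain Python. This is an illustrative sketch, not part of the sample repository: the helper names `half_plus_two`, `build_predict_request`, and `expected_predict_response` are hypothetical, and the payload shape assumes the TF Serving REST predict format, with an `instances` list in the request and a `predictions` list in the response.

```python
import json


def half_plus_two(x: float) -> float:
    """Mirror the model's arithmetic: divide the input by two, then add 2."""
    return x / 2 + 2


def build_predict_request(values: list) -> str:
    """Serialize inputs in the row-oriented TF Serving REST predict format."""
    return json.dumps({"instances": values})


def expected_predict_response(values: list) -> dict:
    """Build the response body the model is expected to return for these inputs."""
    return {"predictions": [half_plus_two(v) for v in values]}


inputs = [1.0, 2.0, 5.0]
print(build_predict_request(inputs))
print(expected_predict_response(inputs))
```

With the inputs 1.0, 2.0, and 5.0, the expected predictions are 2.5, 3.0, and 4.5. Comparing these values with the response from the local container is a quick way to confirm that the model loaded correctly.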
Now that you've tested locally, stop the image:
:::code language="azurecli" source="~/azureml-examples-main/cli/deploy-custom-container-tfserving-half-plus-two.sh" id="stop_image":::
Next, deploy your online endpoint to Azure.
You can configure your cloud deployment using YAML. Take a look at the sample YAML for this example:
tfserving-endpoint.yml
:::code language="yaml" source="~/azureml-examples-main/cli/endpoints/online/custom-container/tfserving/half-plus-two/tfserving-endpoint.yml":::
tfserving-deployment.yml
:::code language="yaml" source="~/azureml-examples-main/cli/endpoints/online/custom-container/tfserving/half-plus-two/tfserving-deployment.yml":::
Connect to your Azure Machine Learning workspace, configure workspace details, and get a handle to the workspace as follows:
- Import the required libraries:
# import required libraries
from azure.ai.ml import MLClient
from azure.ai.ml.entities import (
    ManagedOnlineEndpoint,
    ManagedOnlineDeployment,
    Model,
    Environment,
    CodeConfiguration,
)
from azure.identity import DefaultAzureCredential
- Configure workspace details and get a handle to the workspace:
# enter details of your Azure Machine Learning workspace
subscription_id = "<SUBSCRIPTION_ID>"
resource_group = "<RESOURCE_GROUP>"
workspace = "<AZUREML_WORKSPACE_NAME>"
# get a handle to the workspace
ml_client = MLClient(
    DefaultAzureCredential(), subscription_id, resource_group, workspace
)
For more information, see Deploy machine learning models to managed online endpoint using Python SDK v2.
Tip
- `name`: The name of the endpoint. It must be unique in the Azure region. The name for an endpoint must start with an upper- or lowercase letter and only consist of '-'s and alphanumeric characters. For more information on the naming rules, see endpoint limits.
- `auth_mode`: Use `key` for key-based authentication. Use `aml_token` for Azure Machine Learning token-based authentication. A `key` doesn't expire, but `aml_token` does expire. For more information on authenticating, see Authenticate to an online endpoint.
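The character rules for endpoint names can be checked before you call the service. The following is a minimal sketch, assuming only the rules stated above: the helper `is_valid_endpoint_name` is hypothetical, and it doesn't check length limits or regional uniqueness (see endpoint limits for those).

```python
import re

# Encodes only the character rules stated above: start with a letter, then
# letters, digits, or '-'. Length limits and uniqueness are NOT checked here.
_ENDPOINT_NAME_PATTERN = re.compile(r"[A-Za-z][A-Za-z0-9-]*")


def is_valid_endpoint_name(name: str) -> bool:
    """Return True if the name satisfies the character rules for endpoint names."""
    return _ENDPOINT_NAME_PATTERN.fullmatch(name) is not None


for candidate in ["tfserving-endpoint", "1-endpoint", "my_endpoint"]:
    print(candidate, is_valid_endpoint_name(candidate))
```

A name that passes this check can still be rejected by the service, for example when another endpoint in the region already uses it.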
Optionally, you can add a description and tags to your endpoint.
# Creating a unique endpoint name with current datetime to avoid conflicts
import datetime
online_endpoint_name = "endpoint-" + datetime.datetime.now().strftime("%m%d%H%M%f")
# create an online endpoint
endpoint = ManagedOnlineEndpoint(
    name=online_endpoint_name,
    description="this is a sample online endpoint",
    auth_mode="key",
    tags={"foo": "bar"},
)
A deployment is a set of resources required for hosting the model that does the actual inferencing. Create a deployment for our endpoint using the `ManagedOnlineDeployment` class.
Tip
- `name` - Name of the deployment.
- `endpoint_name` - Name of the endpoint to create the deployment under.
- `model` - The model to use for the deployment. This value can be either a reference to an existing versioned model in the workspace or an inline model specification.
- `environment` - The environment to use for the deployment. This value can be either a reference to an existing versioned environment in the workspace or an inline environment specification.
- `code_configuration` - The configuration for the source code and scoring script.
    - `path` - Path to the source code directory for scoring the model.
    - `scoring_script` - Relative path to the scoring file in the source code directory.
- `instance_type` - The VM size to use for the deployment. For the list of supported sizes, see endpoints SKU list.
- `instance_count` - The number of instances to use for the deployment.
# create a blue deployment
model = Model(name="tfserving-mounted", version="1", path="half_plus_two")
env = Environment(
    image="docker.io/tensorflow/serving:latest",
    inference_config={
        "liveness_route": {"port": 8501, "path": "/v1/models/half_plus_two"},
        "readiness_route": {"port": 8501, "path": "/v1/models/half_plus_two"},
        "scoring_route": {"port": 8501, "path": "/v1/models/half_plus_two:predict"},
    },
)
blue_deployment = ManagedOnlineDeployment(
    name="blue",
    endpoint_name=online_endpoint_name,
    model=model,
    environment=env,
    environment_variables={
        "MODEL_BASE_PATH": "/var/azureml-app/azureml-models/tfserving-mounted/1",
        "MODEL_NAME": "half_plus_two",
    },
    instance_type="Standard_DS2_v2",
    instance_count=1,
)
There are a few important concepts to note in the YAML and Python parameters:
An HTTP server defines paths for both liveness and readiness. A liveness route is used to check whether the server is running. A readiness route is used to check whether the server is ready to do work. In machine learning inference, a server could respond 200 OK to a liveness request before loading a model. The server could respond 200 OK to a readiness request only after the model is loaded into memory.
For more information about liveness and readiness probes, see the Kubernetes documentation.
Notice that this deployment uses the same path for both liveness and readiness, since TF Serving only defines a liveness route.
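The difference between the two probes can be sketched for a server that, unlike TF Serving, exposes separate routes. This is a hypothetical illustration: the `/live` and `/ready` paths, the `ServerState` class, and the `probe_status` function are assumptions made for this example and aren't part of TF Serving or Azure Machine Learning.

```python
from dataclasses import dataclass
from http import HTTPStatus


@dataclass
class ServerState:
    """Tracks whether the (hypothetical) model has finished loading."""

    model_loaded: bool = False


def probe_status(path: str, state: ServerState) -> HTTPStatus:
    """Return the status a server with separate probe routes might send."""
    if path == "/live":
        # Liveness: the process is up, even if the model isn't loaded yet.
        return HTTPStatus.OK
    if path == "/ready":
        # Readiness: succeed only once the model is in memory.
        if state.model_loaded:
            return HTTPStatus.OK
        return HTTPStatus.SERVICE_UNAVAILABLE
    return HTTPStatus.NOT_FOUND


state = ServerState()
print(int(probe_status("/live", state)), int(probe_status("/ready", state)))
state.model_loaded = True
print(int(probe_status("/ready", state)))
```

Before the model loads, the liveness probe succeeds while the readiness probe fails. After loading, both succeed, which is the point at which traffic can safely be routed to the server.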
When you deploy a model as an online endpoint, Azure Machine Learning mounts your model to your endpoint. Model mounting allows you to deploy new versions of the model without having to create a new Docker image. By default, a model registered with the name foo and version 1 would be located at the following path inside of your deployed container: /var/azureml-app/azureml-models/foo/1
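The default location follows directly from the registered model name and version. The following sketch shows one way to derive it and to build the matching TF Serving environment variables. The helper names `default_model_path` and `tfserving_env` are hypothetical, and the sketch assumes no custom mount path is configured.

```python
from pathlib import PurePosixPath

# Default root stated above. The container runs Linux, so use POSIX paths.
DEFAULT_MODEL_ROOT = PurePosixPath("/var/azureml-app/azureml-models")


def default_model_path(name: str, version: str) -> str:
    """Return the default mount location for a registered model."""
    return str(DEFAULT_MODEL_ROOT / name / str(version))


def tfserving_env(model_name: str, version: str, served_name: str) -> dict:
    """Build the environment variables that point TF Serving at the mounted model."""
    return {
        "MODEL_BASE_PATH": default_model_path(model_name, version),
        "MODEL_NAME": served_name,
    }


print(default_model_path("foo", "1"))
print(tfserving_env("tfserving-mounted", "1", "half_plus_two"))
```

Deriving `MODEL_BASE_PATH` from the model name and version, rather than hard-coding it, keeps the environment variables in step when you register a new model version.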
For example, if you have a directory structure of /azureml-examples/cli/endpoints/online/custom-container on your local machine, where the model is named half_plus_two:
:::image type="content" source="./media/how-to-deploy-custom-container/local-directory-structure.png" alt-text="Diagram showing a tree view of the local directory structure.":::
And tfserving-deployment.yml contains:
model:
  name: tfserving-mounted
  version: 1
  path: ./half_plus_two
And the `Model` class contains:
model = Model(name="tfserving-mounted", version="1", path="half_plus_two")
Then your model will be located under /var/azureml-app/azureml-models/tfserving-mounted/1 in your deployment:
:::image type="content" source="./media/how-to-deploy-custom-container/deployment-location.png" alt-text="Diagram showing a tree view of the deployment directory structure.":::
You can optionally configure your `model_mount_path`. It lets you change the path where the model is mounted.
Important
The `model_mount_path` must be a valid absolute path in Linux (the OS of the container image).
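A mount path can be checked locally before you submit the deployment. The following is a minimal sketch with a hypothetical helper, `validate_model_mount_path`. It only checks that the value is an absolute POSIX path and doesn't reproduce any additional validation the service might perform.

```python
from pathlib import PurePosixPath


def validate_model_mount_path(path: str) -> str:
    """Return the path unchanged if it is an absolute Linux path, else raise."""
    if not path or "\0" in path:
        raise ValueError("model_mount_path must be a non-empty path without NUL bytes")
    # PurePosixPath applies Linux rules even when this code runs on Windows.
    if not PurePosixPath(path).is_absolute():
        raise ValueError(f"model_mount_path must be absolute on Linux: {path!r}")
    return path


print(validate_model_mount_path("/var/tfserving-model-mount"))
```

Relative paths such as `models/tf` and Windows-style paths such as `C:\models` are rejected, because neither starts at the Linux root directory.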
For example, you can have the `model_mount_path` parameter in your tfserving-deployment.yml:
name: tfserving-deployment
endpoint_name: tfserving-endpoint
model:
  name: tfserving-mounted
  version: 1
  path: ./half_plus_two
model_mount_path: /var/tfserving-model-mount
.....
For example, you can have the `model_mount_path` parameter in your `ManagedOnlineDeployment` class:
blue_deployment = ManagedOnlineDeployment(
    name="blue",
    endpoint_name=online_endpoint_name,
    model=model,
    environment=env,
    model_mount_path="/var/tfserving-model-mount",
    ...
)
Then your model is located at /var/tfserving-model-mount/tfserving-mounted/1 in your deployment. Note that it's no longer under azureml-app/azureml-models, but under the mount path you specified:
:::image type="content" source="./media/how-to-deploy-custom-container/mount-path-deployment-location.png" alt-text="Diagram showing a tree view of the deployment directory structure when using model_mount_path.":::
Now that you understand how the YAML was constructed, create your endpoint.
az ml online-endpoint create --name tfserving-endpoint -f endpoints/online/custom-container/tfserving/half-plus-two/tfserving-endpoint.yml
Creating a deployment might take a few minutes.
az ml online-deployment create --name tfserving-deployment -f endpoints/online/custom-container/tfserving/half-plus-two/tfserving-deployment.yml --all-traffic
Using the `MLClient` created earlier, create the endpoint in the workspace. This command starts the endpoint creation and returns a confirmation response while the endpoint creation continues.
ml_client.begin_create_or_update(endpoint)
Create the deployment by running:
ml_client.begin_create_or_update(blue_deployment)
Once your deployment completes, see if you can make a scoring request to the deployed endpoint.
:::code language="azurecli" source="~/azureml-examples-main/cli/deploy-custom-container-tfserving-half-plus-two.sh" id="invoke_endpoint":::
Using the `MLClient` created earlier, you get a handle to the endpoint. The endpoint can be invoked using the `invoke` method with the following parameters:
- `endpoint_name` - Name of the endpoint
- `request_file` - File with request data
- `deployment_name` - Name of the specific deployment to test in an endpoint
Send a sample request using a JSON file. The sample JSON is in the example repository.
# test the blue deployment with some sample data
ml_client.online_endpoints.invoke(
    endpoint_name=online_endpoint_name,
    deployment_name="blue",
    request_file="sample-request.json",
)
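If you prefer to call the endpoint over plain HTTPS rather than through the SDK, the request might be built as in the following sketch. It relies on several assumptions: the helper `build_scoring_request` is hypothetical, the URI and key are placeholders, the key is assumed to be sent as a bearer token in the `Authorization` header, and the optional `azureml-model-deployment` header is assumed to route the call to a named deployment. The sketch only builds the request and doesn't send it.

```python
import json
import urllib.request
from typing import Optional


def build_scoring_request(
    scoring_uri: str,
    key: str,
    instances: list,
    deployment_name: Optional[str] = None,
) -> urllib.request.Request:
    """Build (but do not send) a POST request for a key-authenticated endpoint."""
    headers = {
        "Content-Type": "application/json",
        # Assumption: the endpoint key is sent as a bearer token.
        "Authorization": f"Bearer {key}",
    }
    if deployment_name:
        # Assumption: this header targets one deployment instead of the traffic split.
        headers["azureml-model-deployment"] = deployment_name
    body = json.dumps({"instances": instances}).encode("utf-8")
    return urllib.request.Request(scoring_uri, data=body, headers=headers, method="POST")


# Placeholder values; replace them with your endpoint's scoring URI and key.
request = build_scoring_request(
    "https://example.invalid/score", "<ENDPOINT_KEY>", [1.0, 2.0, 5.0], "blue"
)
print(request.get_method(), request.full_url)
```

After you substitute real values, passing the request to `urllib.request.urlopen` sends it. Keep the key out of source control, for example by reading it from an environment variable.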
Now that you successfully scored with your endpoint, you can delete it:
az ml online-endpoint delete --name tfserving-endpoint
ml_client.online_endpoints.begin_delete(name=online_endpoint_name)