This repository has been archived by the owner on Apr 22, 2024. It is now read-only.

Managed Inference

sethjuarez edited this page May 10, 2022 · 6 revisions

During the "Scaling responsible MLOps with Azure Machine Learning" breakout you saw an endpoint that generated heroic marketing copy in the style of Homer (not Simpsons). If you want to skip the model training and get straight to deploying this new model, then this is the right place!

Setting up AzureML CLI

There are comprehensive docs on setting up the Azure CLI. Once it is installed, we need to add the az ml CLI extension (see docs). Most of the commands assume you have pre-configured the defaults as outlined in the docs:

az configure --defaults group=$GROUP workspace=$WORKSPACE location=$LOCATION

In my case:

Setup CLI Defaults

Creating AzureML Workspace

This is described in depth over in the Setup section of the docs. In this exercise, the workspace was set up with the following:

./provision.ps1 -name grippity -location westus2

Environment Creation

The first step is to have the right environment. To do this, run the following command:

az ml environment create --name hf-transformers-inference --version 1 --conda-file conda_inference.yml --image

You will see something like this:

Deployment Environment

This will kick start the build of the actual environment. You can view the progress of that in the AML Portal (this can sometimes take a bit of time):

Deployment Environment

If you click into the environment you can view the build log as the environment is created.

Download and Register Model

Since this part of the exercise is to deploy our fine-tuned model (and not train), we need to download and register the model ourselves.

To download the model run the following:


This will download a zip file and extract the model files to the ./model directory.

The next step is to register the model:

az ml model create --name model-gpt2 --version 1 --local-path ./model

You should see something like this:

Model Registered

This model should now also be available in your registry:

Model in Portal

Endpoint Creation

The first step is to create the endpoint. This can be done using the following command:

az ml online-endpoint create --name <YOUR_UNIQUE_NAME> --auth-mode key

Here's what I used:

az ml online-endpoint create --name grippityep --auth-mode key

Here's what my output looks like:

Endpoint Creation


Endpoints in AzureML may contain many deployments, and each deployment can receive a different percentage of the traffic. To match what you have in your workspace, three changes need to be made to the deployment.yml file. The first is the model: azureml:XXXX:1 entry, which should be changed to the registered model name and version produced above (likely model: azureml:model-gpt2:1). The second is the environment: azureml:XXXXXX:1 entry, which must match the environment created earlier (likely environment: azureml:hf-transformers-inference:1). Finally, change endpoint_name to match the name you chose above (I named mine grippityep).
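If you would rather script these edits than make them by hand, something like the following Python sketch could work. It is only an illustration: the helper name fill_deployment_placeholders and the three-line sample text are hypothetical, and the real deployment.yml contains more fields than shown. It also assumes each of the three keys appears once, at the start of a line.

```python
import re


def fill_deployment_placeholders(yaml_text, model, environment, endpoint_name):
    """Return yaml_text with the model, environment and endpoint_name values replaced.

    Only lines that start with one of the three keys are touched;
    every other line is passed through unchanged.
    """
    replacements = {
        "model": f"azureml:{model}",
        "environment": f"azureml:{environment}",
        "endpoint_name": endpoint_name,
    }
    lines = []
    for line in yaml_text.splitlines():
        match = re.match(r"^(model|environment|endpoint_name):\s*", line)
        if match:
            key = match.group(1)
            line = f"{key}: {replacements[key]}"
        lines.append(line)
    return "\n".join(lines)


# Hypothetical excerpt; the real file has more fields than these three.
sample = """endpoint_name: XXXX
model: azureml:XXXX:1
environment: azureml:XXXXXX:1"""

updated = fill_deployment_placeholders(
    sample,
    model="model-gpt2:1",
    environment="hf-transformers-inference:1",
    endpoint_name="grippityep",
)
print(updated)
```

Substitute your own model, environment, and endpoint names if they differ from the ones used in this walkthrough.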

After making those changes, the following command creates a deployment in that endpoint and routes all traffic to it:

az ml online-deployment create -f cloud/deployment.yml --all-traffic

You will see something like:

Deployment Start

This process takes about 10-12 minutes in total and should produce the following output:

Deployment Complete

Testing Your Endpoint

If you navigate over to your new endpoint you should see something like this:

Deployment Portal

If you click on the Test tab, you can test out the endpoint. (NOTE: If you increase the length and/or increase the count, you could exceed the capacity of the compute type. If this is something you want to do, feel free to up the compute in the deployment.yml file.)

You can copy and paste the following into the input data box:

    {
        "prompt": "This heroic conference will",
        "length": 15,
        "count": 4
    }

When complete, you should see something like this:

Deployment Start

It works!!!!
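The same payload can also be sent from code instead of the Test tab. The sketch below uses only the Python standard library and builds the request without sending it. The URL and key are placeholders (the real values are shown on the endpoint's Consume tab), the helper name build_scoring_request is hypothetical, and the shape of the response depends on the scoring script, so it is not shown here.

```python
import json
import urllib.request


def build_scoring_request(scoring_uri, key, prompt, length=15, count=4):
    """Build (but do not send) a POST request for the text generation endpoint."""
    body = json.dumps(
        {"prompt": prompt, "length": length, "count": count}
    ).encode("utf-8")
    headers = {
        "Content-Type": "application/json",
        # With --auth-mode key, the key is passed as a bearer token.
        "Authorization": f"Bearer {key}",
    }
    return urllib.request.Request(
        scoring_uri, data=body, headers=headers, method="POST"
    )


# Placeholder values: replace with the endpoint URL and key from your workspace.
request = build_scoring_request(
    "https://example.invalid/score",
    "<YOUR_KEY>",
    "This heroic conference will",
)
payload = json.loads(request.data)

# To actually call the endpoint, uncomment the following lines:
# with urllib.request.urlopen(request) as response:
#     print(response.read().decode("utf-8"))
```

As with the Test tab, larger length or count values may exceed the capacity of the compute type.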

Conference App Integration

If you've arrived seeking to add this new endpoint to the conference app, there are just a few more steps needed to hook this brand new endpoint into the last tab.

Conference App

The first step is to retrieve this new endpoint and its associated key. These can be found by navigating to the newly created endpoint and clicking on the Consume tab:

Consume Endpoint

These two values need to be added to the Application Settings of the web app deployed in the other example. Set TEXT_GENERATION_ENDPOINT to the endpoint value and TEXT_GENERATION_KEY to the corresponding key.

Conference App Settings

Make sure to save the settings (the application should restart). After a few minutes, your application will be connected to this custom heroic marketing text generator:

Conference App

It works!!
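For context, Azure App Service exposes Application Settings to the running app as environment variables, which is why saving TEXT_GENERATION_ENDPOINT and TEXT_GENERATION_KEY is enough to connect the app. The Python sketch below shows one way an app could read and check the two values. It is an assumption-laden illustration: the conference app's own code is not shown on this page and may be written in another language, and the helper name load_text_generation_settings is hypothetical.

```python
import os


def load_text_generation_settings(env=None):
    """Return (endpoint, key) read from the two settings named in this guide."""
    env = os.environ if env is None else env
    names = ("TEXT_GENERATION_ENDPOINT", "TEXT_GENERATION_KEY")
    # Collect every setting that is absent or empty so the error names them all.
    missing = [name for name in names if not env.get(name)]
    if missing:
        raise RuntimeError("Missing application settings: " + ", ".join(missing))
    return env["TEXT_GENERATION_ENDPOINT"], env["TEXT_GENERATION_KEY"]


# Example with placeholder values instead of the real process environment.
endpoint, key = load_text_generation_settings({
    "TEXT_GENERATION_ENDPOINT": "https://example.invalid/score",
    "TEXT_GENERATION_KEY": "<YOUR_KEY>",
})
```

Failing fast with the names of the missing settings makes a misconfigured deployment easier to diagnose than a failed request later on.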