# Deploy to an online endpoint

To consume a model from an application, you can deploy the model to an online endpoint. You'll create an MLflow model from local files and test the endpoint.

## Before you start

You'll need the latest version of the **azure-ai-ml** package to run the code in this notebook. Run the cell below to verify that it is installed.

> **Note**:
> If the **azure-ai-ml** package is not installed, run `pip install azure-ai-ml` to install it.

In [None]:
pip show azure-ai-ml
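The same check can also be done from Python. The following is a minimal sketch using the standard library's `importlib.metadata`; the helper name `installed_version` is illustrative and not part of the SDK.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package_name):
    """Return the installed version string of a package, or None if it is missing."""
    try:
        return version(package_name)
    except PackageNotFoundError:
        return None


found = installed_version("azure-ai-ml")
if found is None:
    print("azure-ai-ml is not installed; run: pip install azure-ai-ml")
else:
    print(f"azure-ai-ml {found} is installed")
```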

## Connect to your workspace

With the required SDK packages installed, you're now ready to connect to your workspace.

To connect to a workspace, you need three identifiers: a subscription ID, a resource group name, and a workspace name. Since you're working with a compute instance managed by Azure Machine Learning, you can use the default values to connect to the workspace.

In [1]:
from azure.identity import DefaultAzureCredential, InteractiveBrowserCredential
from azure.ai.ml import MLClient

try:
    credential = DefaultAzureCredential()
    # Check if given credential can get token successfully.
    credential.get_token("https://management.azure.com/.default")
except Exception as ex:
    # Fall back to InteractiveBrowserCredential if DefaultAzureCredential does not work
    credential = InteractiveBrowserCredential()

In [2]:
# Get a handle to workspace
ml_client = MLClient.from_config(credential=credential)

Found the config file in: /config.json
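The config file reported above is where the default values come from. As a rough sketch, assuming the file is JSON with the keys `subscription_id`, `resource_group`, and `workspace_name` (an assumption about the file layout, since its contents are not shown here), the three identifiers could be read like this. The helper `read_workspace_config` and the placeholder values are illustrative only.

```python
import json
import tempfile
from pathlib import Path


def read_workspace_config(path):
    """Read the three workspace identifiers from a config.json-style file."""
    config = json.loads(Path(path).read_text())
    return (
        config["subscription_id"],
        config["resource_group"],
        config["workspace_name"],
    )


# Placeholder values for illustration only; a real file holds your own identifiers.
example = {
    "subscription_id": "<subscription-id>",
    "resource_group": "<resource-group>",
    "workspace_name": "<workspace-name>",
}

# Write the example to a temporary file so the sketch does not touch a real config.
with tempfile.TemporaryDirectory() as tmp:
    config_path = Path(tmp) / "config.json"
    config_path.write_text(json.dumps(example, indent=2))
    subscription_id, resource_group, workspace_name = read_workspace_config(config_path)

print(subscription_id, resource_group, workspace_name)
```

Outside a compute instance, the same three values could be passed explicitly to `MLClient(credential, subscription_id, resource_group_name, workspace_name)` instead of relying on `from_config`.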


In [17]:
import mlflow.sklearn

model_name = "diabetes-mlflow"
model_uri = f"models:/{model_name}/latest"
model = mlflow.sklearn.load_model(model_uri)
model



In [10]:
import mltable

data_name = "diabetes_train_mltable"
registered_data_asset = ml_client.data.get(name=data_name, version=1)
tbl = mltable.load(f"azureml:/{registered_data_asset.id}")
df = tbl.to_pandas_dataframe()
df.head(5)

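| Unnamed: 0 | Pregnancies | PlasmaGlucose | DiastolicBloodPressure | TricepsThickness | SerumInsulin | BMI | DiabetesPedigree | Age | Diabetic |
|---|---|---|---|---|---|---|---|---|---|
| 0 | 0 | 171 | 80 | 34 | 23 | 43.509726 | 1.213191 | 21 | 0 |
| 1 | 8 | 92 | 93 | 47 | 36 | 21.240576 | 0.158365 | 23 | 0 |
| 2 | 7 | 115 | 47 | 52 | 35 | 41.511523 | 0.079019 | 23 | 0 |
| 3 | 9 | 103 | 78 | 25 | 304 | 29.582192 | 1.28287 | 43 | 1 |
| 4 | 1 | 85 | 59 | 27 | 35 | 42.604536 | 0.549542 | 22 | 0 |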


In [7]:
X, y = df.drop(['Diabetic'], axis=1).values, df.Diabetic.values

In [39]:
import mlflow
from mlflow.models.signature import infer_signature
model_info = mlflow.models.get_model_info(model_uri)
signature = model_info.signature
# Infer the signature from your input data and model predictions
inferred_signature = infer_signature(df, model.predict(X[:5]).astype(bool))

print(signature, inferred_signature)
# Compare the inferred signature with the model's signature
if signature == inferred_signature:
    print("The input data matches the model's signature.")
else:
    print("The input data does not match the model's signature.")

inputs: 
  ['PatientID': integer, 'Pregnancies': integer, 'PlasmaGlucose': integer, 'DiastolicBloodPressure': integer, 'TricepsThickness': integer, 'DiastolicBloodPressure': integer, 'SerumInsulin': integer, 'BMI': double, 'DiabetesPedigree': double, 'Age': integer]
outputs: 
  [boolean]
 inputs: 
  ['Pregnancies': long, 'PlasmaGlucose': long, 'DiastolicBloodPressure': long, 'TricepsThickness': long, 'SerumInsulin': long, 'BMI': double, 'DiabetesPedigree': double, 'Age': long, 'Diabetic': long]
outputs: 
  [Tensor('bool', (-1,))]

The input data does not match the model's signature.
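To see why the two signatures differ, it can help to compare the column names directly. The sketch below copies the input names from the printed signatures (the model's list is de-duplicated, since `DiastolicBloodPressure` appears twice in that output) and reports the differences. It only compares names; the printed types also differ (`integer` versus `long`, and `boolean` versus a `bool` tensor), which would need separate handling.

```python
# Input column names copied from the model's registered signature printed above.
model_inputs = [
    "PatientID", "Pregnancies", "PlasmaGlucose", "DiastolicBloodPressure",
    "TricepsThickness", "SerumInsulin", "BMI", "DiabetesPedigree", "Age",
]

# Input column names copied from the signature inferred from the dataframe.
data_inputs = [
    "Pregnancies", "PlasmaGlucose", "DiastolicBloodPressure", "TricepsThickness",
    "SerumInsulin", "BMI", "DiabetesPedigree", "Age", "Diabetic",
]

missing = [name for name in model_inputs if name not in data_inputs]
extra = [name for name in data_inputs if name not in model_inputs]

print("Expected by the model but absent from the data:", missing)
print("Present in the data but not expected by the model:", extra)
```

The extra `Diabetic` entry suggests the label column was included when the signature was inferred; inferring from the feature columns only would likely remove that particular difference.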




In [53]:
model.predict(X[:100]), y[:100]

(array([1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
        1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
        1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
        1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
        1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1]),
 array([0, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 0, 1, 0, 0, 1, 1, 1, 0, 1, 0, 0,
        0, 0, 1, 1, 1, 0, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 1, 0, 1, 0, 0, 0,
        1, 0, 1, 1, 0, 1, 1, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 1, 1, 0, 1, 0,
        0, 1, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 1,
        0, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 1]))

## Define and create an endpoint

Ultimately, the goal is to deploy a model to an endpoint. Therefore, you first need to create an endpoint. The endpoint will be an HTTPS endpoint that an application can call to receive predictions from the model. An application can consume an endpoint by using its URI and authenticating with a key or token.

Run the following cell to define the endpoint. Note that the name of the endpoint has to be unique. You'll use the `datetime` module to generate a unique name from the current timestamp.

In [3]:
from azure.ai.ml.entities import ManagedOnlineEndpoint
import datetime

online_endpoint_name = "endpoint-" + datetime.datetime.now().strftime("%m%d%H%M%f")

# create an online endpoint
endpoint = ManagedOnlineEndpoint(
    name=online_endpoint_name,
    description="Online endpoint for MLflow diabetes model",
    auth_mode="key",
)
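The naming scheme above can be checked in isolation. The sketch below rebuilds the name from a fixed timestamp and tests it against an assumed naming rule (starts with a letter, then letters, digits, or hyphens, 3 to 32 characters in total); treat that rule as an assumption and confirm it against the service documentation. The helper `make_endpoint_name` is illustrative, and the year and seconds in the example timestamp are arbitrary because the format string omits them.

```python
import datetime
import re

# Assumed naming rule, not taken from this notebook: a leading letter followed by
# letters, digits, or hyphens, with a total length between 3 and 32 characters.
NAME_PATTERN = re.compile(r"^[A-Za-z][A-Za-z0-9-]{2,31}$")


def make_endpoint_name(prefix="endpoint-", now=None):
    """Build a name from month, day, hour, minute, and microseconds."""
    now = now or datetime.datetime.now()
    return prefix + now.strftime("%m%d%H%M%f")


# Fixed timestamp: 1 April, 14:04, 660362 microseconds (year and seconds are arbitrary).
name = make_endpoint_name(now=datetime.datetime(2024, 4, 1, 14, 4, 0, 660362))
print(name, len(name), bool(NAME_PATTERN.match(name)))
```

Because the format leaves out the seconds, uniqueness within a given minute relies on the microsecond component.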

Next, you'll create the endpoint by running the following cell. This may take several minutes. While your endpoint is being created, you can read about [what are Azure Machine Learning endpoints](https://learn.microsoft.com/azure/machine-learning/concept-endpoints).

In [4]:
ml_client.begin_create_or_update(endpoint).result()

ManagedOnlineEndpoint({'public_network_access': 'Enabled', 'provisioning_state': 'Succeeded', 'scoring_uri': 'https://endpoint-04011404660362.eastus2.inference.ml.azure.com/score', 'openapi_uri': 'https://endpoint-04011404660362.eastus2.inference.ml.azure.com/swagger.json', 'name': 'endpoint-04011404660362', 'description': 'Online endpoint for MLflow diabetes model', 'tags': {}, 'properties': {'azureml.onlineendpointid': '/subscriptions/31b2efe9-cc1f-447c-b8be-13e3e95feb59/resourcegroups/ar5g15-rg/providers/microsoft.machinelearningservices/workspaces/dp_prep/onlineendpoints/endpoint-04011404660362', 'AzureAsyncOperationUri': 'https://management.azure.com/subscriptions/31b2efe9-cc1f-447c-b8be-13e3e95feb59/providers/Microsoft.MachineLearningServices/locations/eastus2/mfeOperationsStatus/oe:936c0e44-fa5a-4331-8858-6a6246667406:badb6d48-9383-4704-9e38-4018fb06b108?api-version=2022-02-01-preview'}, 'print_as_yaml': True, 'id': '/subscriptions/31b2efe9-cc1f-447c-b8be-13e3e95feb59/resourceGr

<p style="color:red;font-size:120%;background-color:yellow;font-weight:bold"> IMPORTANT! Wait until the endpoint is created successfully before continuing! A green notification should appear in the studio. </p>

## Configure the deployment

You can deploy multiple models to an endpoint. This is mostly useful when you want to update the deployed model while keeping the current model in production. You'll need to configure the deployment to specify which model needs to be deployed to an endpoint. In the following cell, you'll refer to the model trained and stored in the local `model` folder (stored in the same folder as this notebook). Note that since you're working with an MLflow model, you don't need to specify the environment or scoring script.

You'll also specify the infrastructure needed for the model to be deployed.

In [5]:
from azure.ai.ml.entities import Model, ManagedOnlineDeployment
from azure.ai.ml.constants import AssetTypes

# create a blue deployment
model = Model(
    path="./model",
    type=AssetTypes.MLFLOW_MODEL,
    description="my sample mlflow model",
)

blue_deployment = ManagedOnlineDeployment(
    name="blue",
    endpoint_name=online_endpoint_name,
    model=model,
    instance_type="Standard_F4s_v2",
    instance_count=1,
)

## Create the deployment

Finally, you can actually deploy the model to the endpoint by running the following cell:

In [6]:
ml_client.online_deployments.begin_create_or_update(blue_deployment).result()

Check: endpoint endpoint-04011404660362 exists
Uploading model (0.0 MBs): 100%|██████████| 2106/2106 [00:00<00:00, 19054.98it/s]



................................................................................................................

ManagedOnlineDeployment({'private_network_connection': None, 'provisioning_state': 'Succeeded', 'endpoint_name': 'endpoint-04011404660362', 'type': 'Managed', 'name': 'blue', 'description': None, 'tags': {}, 'properties': {'AzureAsyncOperationUri': 'https://management.azure.com/subscriptions/31b2efe9-cc1f-447c-b8be-13e3e95feb59/providers/Microsoft.MachineLearningServices/locations/eastus2/mfeOperationsStatus/od:936c0e44-fa5a-4331-8858-6a6246667406:71308596-046f-4500-8379-758c0c96e433?api-version=2023-04-01-preview'}, 'print_as_yaml': True, 'id': '/subscriptions/31b2efe9-cc1f-447c-b8be-13e3e95feb59/resourceGroups/ar5g15-rg/providers/Microsoft.MachineLearningServices/workspaces/dp_prep/onlineEndpoints/endpoint-04011404660362/deployments/blue', 'Resource__source_path': None, 'base_path': '/mnt/batch/tasks/shared/LS_root/mounts/clusters/cpu-compute7/code/Users/ar5g15/azure-ml-labs/Labs/11', 'creation_context': None, 'serialize': <msrest.serialization.Serializer object at 0x7f59f52ea710>, '

The deployment of the model may take 10-15 minutes. While waiting for the model to be deployed, you can learn more about [managed endpoints in this video](https://www.youtube.com/watch?v=SxFGw_OBxNM&ab_channel=MicrosoftDeveloper).

<p style="color:red;font-size:120%;background-color:yellow;font-weight:bold"> IMPORTANT! Wait until the deployment is completed before continuing! A green notification should appear in the studio.</p>

Since you only have one model deployed to the endpoint, you want this deployment to take 100% of the traffic. If you deploy multiple models to the endpoint, you could use the same approach to distribute traffic across the deployed models.

In [7]:
# The blue deployment takes 100% of the traffic
endpoint.traffic = {"blue": 100}
ml_client.begin_create_or_update(endpoint).result()

ManagedOnlineEndpoint({'public_network_access': 'Enabled', 'provisioning_state': 'Succeeded', 'scoring_uri': 'https://endpoint-04011404660362.eastus2.inference.ml.azure.com/score', 'openapi_uri': 'https://endpoint-04011404660362.eastus2.inference.ml.azure.com/swagger.json', 'name': 'endpoint-04011404660362', 'description': 'Online endpoint for MLflow diabetes model', 'tags': {}, 'properties': {'azureml.onlineendpointid': '/subscriptions/31b2efe9-cc1f-447c-b8be-13e3e95feb59/resourcegroups/ar5g15-rg/providers/microsoft.machinelearningservices/workspaces/dp_prep/onlineendpoints/endpoint-04011404660362', 'AzureAsyncOperationUri': 'https://management.azure.com/subscriptions/31b2efe9-cc1f-447c-b8be-13e3e95feb59/providers/Microsoft.MachineLearningServices/locations/eastus2/mfeOperationsStatus/oe:936c0e44-fa5a-4331-8858-6a6246667406:b6272d78-ed14-4d79-be57-7faf8244c0d4?api-version=2022-02-01-preview'}, 'print_as_yaml': True, 'id': '/subscriptions/31b2efe9-cc1f-447c-b8be-13e3e95feb59/resourceGr

<p style="color:red;font-size:120%;background-color:yellow;font-weight:bold"> IMPORTANT! Wait until the blue deployment is configured before continuing! A green notification should appear in the studio. </p> 
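When more than one deployment sits behind the endpoint, the traffic dictionary holds one percentage per deployment. The sketch below is a small local sanity check, assuming the percentages should be non-negative integers that add up to 100; the `green` deployment and the helper `validate_traffic` are hypothetical and exist only for illustration.

```python
def validate_traffic(traffic):
    """Return the traffic split if every share is a non-negative int and the total is 100."""
    for deployment, share in traffic.items():
        if not isinstance(share, int) or share < 0:
            raise ValueError(f"Invalid share for {deployment!r}: {share!r}")
    total = sum(traffic.values())
    if total != 100:
        raise ValueError(f"Traffic shares add up to {total}, expected 100")
    return traffic


# The split used in this notebook.
single = validate_traffic({"blue": 100})

# A hypothetical split that sends a small share to a second deployment.
split = validate_traffic({"blue": 90, "green": 10})

print(single, split)
```

A dictionary that passes this check could then be assigned to `endpoint.traffic` in the same way as the single-deployment case.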

## Test the deployment

Let's test the deployed model by invoking the endpoint. A JSON file with sample data is used as input. The trained model predicts whether a patient has diabetes or not, based on medical data like age, BMI, and the number of pregnancies. A `[0]` indicates a patient doesn't have diabetes. A `[1]` means a patient does have diabetes.

In [24]:
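import json

# Test the blue deployment with some sample data
response = ml_client.online_endpoints.invoke(
    endpoint_name=online_endpoint_name,
    deployment_name="blue",
    request_file="sample-data.json",
)

# The endpoint returns a JSON string such as '[1]'; parse it instead of indexing characters
prediction = json.loads(response)[0]
if prediction == 1:
    print("Diabetic")
else:
    print("Not diabetic")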

Diabetic


In [12]:
response

'[1]'

Optionally, you can change the values in the `sample-data.json` file to try and get a different prediction.
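The contents of `sample-data.json` are not shown in this notebook. As a sketch, assuming the request uses an `input_data` object with `columns`, `index`, and `data` fields (an assumption about the request layout, so compare it with your own file), an alternative request file could be generated as follows. The feature values are taken from the row with `Unnamed: 0` equal to 3 in the dataframe preview earlier, the file name `sample-data-alt.json` is hypothetical, and the column list has to match whatever the deployed model's signature expects.

```python
import json
import tempfile
from pathlib import Path

# Feature columns as they appear in the dataframe preview (label column excluded).
columns = [
    "Pregnancies", "PlasmaGlucose", "DiastolicBloodPressure", "TricepsThickness",
    "SerumInsulin", "BMI", "DiabetesPedigree", "Age",
]

# One patient record, copied from the dataframe preview.
row = [9, 103, 78, 25, 304, 29.582192, 1.28287, 43]

# Assumed request layout; verify against the existing sample-data.json.
payload = {"input_data": {"columns": columns, "index": [0], "data": [row]}}

# Written to a temporary directory so the sketch leaves no files behind.
with tempfile.TemporaryDirectory() as tmp:
    request_path = Path(tmp) / "sample-data-alt.json"
    request_path.write_text(json.dumps(payload, indent=2))
    reloaded = json.loads(request_path.read_text())

print(json.dumps(reloaded, indent=2))
```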

## List endpoints

Although you can view all endpoints in the Studio, you can also list all endpoints using the SDK:

In [9]:
endpoints = ml_client.online_endpoints.list()
for endp in endpoints:
    print(endp.name)

endpoint-04011404660362


## Get endpoint details

If you want more information about a specific endpoint, you can explore the details using the SDK too.

In [10]:
# Get the details for online endpoint
endpoint = ml_client.online_endpoints.get(name=online_endpoint_name)

# existing traffic details
print(endpoint.traffic)

# Get the scoring URI
print(endpoint.scoring_uri)

{'blue': 100}
https://endpoint-04011404660362.eastus2.inference.ml.azure.com/score
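An application outside the SDK would call the scoring URI over HTTPS and authenticate with the endpoint key. The sketch below only builds the request with the standard library and does not send it. The key is a placeholder, and the request body layout (`input_data` with `columns`, `index`, and `data`) is an assumption that should be checked against your own request file.

```python
import json
import urllib.request

# Scoring URI as printed above; the key is a placeholder, not a real credential.
scoring_uri = "https://endpoint-04011404660362.eastus2.inference.ml.azure.com/score"
api_key = "<endpoint-key>"

# Assumed request layout with one record taken from the dataframe preview.
payload = {
    "input_data": {
        "columns": [
            "Pregnancies", "PlasmaGlucose", "DiastolicBloodPressure", "TricepsThickness",
            "SerumInsulin", "BMI", "DiabetesPedigree", "Age",
        ],
        "index": [0],
        "data": [[9, 103, 78, 25, 304, 29.582192, 1.28287, 43]],
    }
}

request = urllib.request.Request(
    scoring_uri,
    data=json.dumps(payload).encode("utf-8"),
    headers={
        "Content-Type": "application/json",
        "Authorization": f"Bearer {api_key}",
    },
    method="POST",
)

print(request.get_method(), request.full_url)

# Sending is left out on purpose; with a real key it would look like:
# with urllib.request.urlopen(request) as reply:
#     print(reply.read().decode("utf-8"))
```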


## Delete the endpoint and deployment

An online endpoint is always available and can't be paused to save costs. To avoid unnecessary costs, delete the endpoint when you no longer need it.

In [26]:
ml_client.online_endpoints.begin_delete(name=online_endpoint_name)

.

ResourceExistsError: (Conflict) Conflict
Code: Conflict
Message: Conflict
Exception Details:	(OperationDuplicationConflict) Conflict of operation, another operation on same entity is already running in workspace dp_prep.
	Code: OperationDuplicationConflict
	Message: Conflict of operation, another operation on same entity is already running in workspace dp_prep.
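The conflict above indicates that another operation on the endpoint was still running when the delete was requested. One way to handle this is to wait and try again. The following is a generic retry sketch, not an SDK feature: the helper `retry`, the stand-in `fake_delete`, and the string check for "Conflict" are all illustrative assumptions.

```python
import time


def retry(operation, should_retry, attempts=5, delay_seconds=30.0, sleep=time.sleep):
    """Call operation(); on a retryable error, wait and try again up to `attempts` times."""
    for attempt in range(1, attempts + 1):
        try:
            return operation()
        except Exception as error:
            # Give up on the last attempt or when the error is not retryable.
            if attempt == attempts or not should_retry(error):
                raise
            sleep(delay_seconds)


# Stand-in for the delete call: fails twice with a conflict, then succeeds.
calls = {"count": 0}


def fake_delete():
    calls["count"] += 1
    if calls["count"] < 3:
        raise RuntimeError("Conflict of operation")
    return "deleted"


# The sleep function is replaced so the demonstration finishes immediately.
result = retry(fake_delete, lambda error: "Conflict" in str(error), sleep=lambda seconds: None)
print(result, calls["count"])
```

With the SDK, the operation could be something like `lambda: ml_client.online_endpoints.begin_delete(name=online_endpoint_name).result()`, where `.result()` waits for the deletion to finish.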