In [1]:
# import required libraries
from azure.ai.ml import MLClient, command, Input, Output, load_component
from azure.identity import DefaultAzureCredential
from azure.ai.ml.entities import Data, Environment, ManagedOnlineEndpoint
from azure.ai.ml.constants import AssetTypes, InputOutputModes
from azure.ai.ml.dsl import pipeline

In [None]:
# Enter details of your AML workspace
subscription_id = ""
resource_group = ""
workspace = ""

In [3]:
# get a handle to the workspace
ml_client = MLClient(
    DefaultAzureCredential(), subscription_id, resource_group, workspace
)

# Online Endpoint

Online endpoints serve online (real-time) inference: they receive data from clients and send responses back in real time.

An **endpoint** is an HTTPS path that clients call to receive the inference (scoring) output of a trained model. It provides:
* Key-based and token-based authentication
* SSL termination
* A stable scoring URI (endpoint-name.region.inference.ml.azure.com)

A **deployment** is a set of resources required for hosting the model that does the actual inferencing.
A single endpoint can contain multiple deployments.

Features of managed online endpoints:

* **Local testing and deployment** for faster debugging
* **Traffic mirroring**: traffic to one deployment can also be copied to another deployment
* **Application Insights integration**
* **Security**
  * Authentication: keys and Azure ML tokens
* **Autoscaling**
* **Visual Studio Code debugging**

**Blue-green deployment**: an approach where a new version of a web service is introduced to production by rolling it out to a small subset of users or requests before rolling it out completely.
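To make the traffic-split idea behind blue-green deployment concrete, here is a minimal, self-contained simulation that routes requests between a `blue` and a `green` deployment by percentage. It is a conceptual sketch only, not how Azure ML routes traffic internally; the `route_request` helper and its hash-based bucketing are assumptions for illustration.

```python
import hashlib


def route_request(traffic: dict[str, int], request_id: str) -> str:
    """Pick a deployment for a request according to percentage weights.

    Hypothetical helper: hashes the request id into one of 100 buckets,
    then walks the cumulative traffic percentages until the bucket fits.
    """
    if sum(traffic.values()) != 100:
        raise ValueError("traffic percentages must sum to 100")
    # Stable bucket in [0, 100) derived from the request id.
    digest = hashlib.sha256(request_id.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big") % 100
    cumulative = 0
    for deployment, percent in traffic.items():
        cumulative += percent
        if bucket < cumulative:
            return deployment
    raise AssertionError("unreachable when percentages sum to 100")


# Early blue-green stage: the new "green" version receives 10% of requests.
traffic = {"blue": 90, "green": 10}
counts = {"blue": 0, "green": 0}
for i in range(10_000):
    counts[route_request(traffic, f"request-{i}")] += 1
print(counts)
```

Because the bucket is derived from the request id, the same id always lands on the same deployment, which keeps the simulation reproducible.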

<center>
<img src="../../imgs/endpoint_concept.png" width = "500px" alt="Online Endpoint Concept cli vs sdk">
</center>

## 1. Create Online Endpoint

We can create an **online endpoint** with the CLI v2 or the SDK v2 using the following syntax:

<center>
<img src="../../imgs/create_online_endpoint.png" width = "700px" alt="Create Online Endpoint cli vs sdk">
</center>

In [None]:
from azure.ai.ml.entities import ManagedOnlineEndpoint
import random

rand = random.randint(0, 10000)

endpoint_name = f"taxi-online-endpoint-{rand}"
# create an online endpoint
online_endpoint = ManagedOnlineEndpoint(
    name=endpoint_name, 
    description="Taxi online endpoint",
    auth_mode="aml_token",
)
# create the online endpoint (long-running operation that returns a poller)
poller = ml_client.online_endpoints.begin_create_or_update(
    online_endpoint,
)

poller.wait()

In [5]:
from azure.ai.ml.exceptions import DeploymentException

status = poller.status()
if status != "Succeeded":
    raise DeploymentException(status)
else:
    print("Endpoint creation succeeded")
    endpoint = poller.result()
    print(endpoint)

Endpoint creation succeeded
auth_mode: aml_token
description: Taxi online endpoint
id: /subscriptions/5bef918d-59f1-49d6-897b-919e5d5c05a0/resourceGroups/rg-gasunimlopsv2-prod/providers/Microsoft.MachineLearningServices/workspaces/mlw-gasunimlopsv2-prod/onlineEndpoints/taxi-online-endpoint-4849
identity:
  principal_id: fedd2b65-e88f-40fb-8b07-3967d185eb35
  tenant_id: 1a640b76-52c0-4a5c-b26b-3148406b593f
  type: system_assigned
kind: Managed
location: westeurope
mirror_traffic: {}
name: taxi-online-endpoint-4849
openapi_uri: https://taxi-online-endpoint-4849.westeurope.inference.ml.azure.com/swagger.json
properties:
  AzureAsyncOperationUri: https://management.azure.com/subscriptions/5bef918d-59f1-49d6-897b-919e5d5c05a0/providers/Microsoft.MachineLearningServices/locations/westeurope/mfeOperationsStatus/oeidp:9d59d97a-fd9d-4edf-962b-3bc29e6a57e5:2253dc8f-64f9-48ba-a548-fe7af84f266a?api-version=2022-02-01-preview
  azureml.onlineendpointid: /subscriptions/5bef918d-59f1-49d6-897b-919e5d

## 2. Create Online Deployment

To create a deployment on an online endpoint, you need to specify the following elements:

* Model files (or a registered model in your workspace)
* Scoring script - code needed to do scoring/inferencing
* Environment - a Docker image with Conda dependencies, or a Dockerfile
* Instance type and scale settings

Note that if you're deploying **MLflow models**, there's no need to provide **a scoring script** or an execution **environment**, because both are autogenerated.
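For a non-MLflow model you supply the scoring script yourself. Azure ML scoring scripts define an `init()` function, called once when the deployment starts, and a `run()` function, called for each scoring request. The sketch below follows that contract, but the model is a hypothetical stand-in (a fixed linear formula) so the example runs without a registered model; a real `init()` would load the model files, typically from the directory in the `AZUREML_MODEL_DIR` environment variable. The `{"data": [...]}` request shape is also an assumption, since a custom script defines its own input format.

```python
import json

model = None  # populated once by init()


class _PlaceholderModel:
    """Hypothetical stand-in for a trained model, used only for illustration."""

    def predict(self, rows):
        # Pretend prediction: a fixed linear function of each row's feature sum.
        return [2.5 + 1.5 * sum(row) for row in rows]


def init():
    """Called once when the deployment container starts."""
    global model
    # A real script would load the model from os.environ["AZUREML_MODEL_DIR"].
    model = _PlaceholderModel()


def run(raw_data):
    """Called per request with the raw JSON request body."""
    rows = json.loads(raw_data)["data"]
    # Return a JSON-serializable list of predictions.
    return model.predict(rows)


init()
print(run(json.dumps({"data": [[1.0, 2.0], [3.0, 4.0]]})))  # prints [7.0, 13.0]
```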

We can create an **online deployment** with the CLI v2 or the SDK v2 using the following syntax:

<center>
<img src="../../imgs/create_online_deployment.png" width = "700px" alt="Create Online Deployment cli vs sdk">
</center>

In [None]:
# create online deployment
from azure.ai.ml.entities import ManagedOnlineDeployment, Model, Environment

blue_deployment = ManagedOnlineDeployment(
    name="blue",
    endpoint_name=endpoint_name,
    model="taxi-model@latest",
    instance_type="Standard_DS2_v2",
    instance_count=1,
)

# create the deployment on the endpoint (long-running operation)
poller = ml_client.online_deployments.begin_create_or_update(
    deployment=blue_deployment,
)
poller.wait()

Instance type Standard_DS2_v2 may be too small for compute resources. Minimum recommended compute SKU is Standard_DS3_v2 for general purpose endpoints. Learn more about SKUs here: https://learn.microsoft.com/azure/machine-learning/referencemanaged-online-endpoints-vm-sku-list
Check: endpoint taxi-online-endpoint-4849 exists


....................................................................................................................................

## 3. Allocate Traffic

In [7]:
# allocate traffic: the blue deployment receives 100% of requests
online_endpoint.traffic = {"blue": 100}
# push the updated traffic split to the endpoint
poller = ml_client.online_endpoints.begin_create_or_update(online_endpoint)
poller.wait()
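Once a second deployment (say, `green`) exists on the same endpoint, a blue-green rollout shifts traffic in stages: assign a new dictionary to `online_endpoint.traffic` and update the endpoint again, as in the cell above. The helper below is a hypothetical sketch that only computes the sequence of traffic dictionaries; the `green` deployment name and the 25% step size are assumptions.

```python
def rollout_steps(old: str, new: str, step: int) -> list[dict[str, int]]:
    """Return traffic splits that move `new` up to 100% in `step`-percent increments."""
    if not 0 < step <= 100:
        raise ValueError("step must be between 1 and 100")
    splits = []
    new_share = 0
    while new_share < 100:
        # Cap the final stage at 100% when step does not divide 100 evenly.
        new_share = min(new_share + step, 100)
        splits.append({old: 100 - new_share, new: new_share})
    return splits


for split in rollout_steps("blue", "green", 25):
    # In the notebook, each split would be applied with:
    #   online_endpoint.traffic = split
    #   ml_client.online_endpoints.begin_create_or_update(online_endpoint).wait()
    print(split)
```

Every stage sums to 100%, and the last one sends all traffic to the new deployment, after which the old one can be deleted.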

## 4. Invoke and Test Endpoint

We can invoke the **online endpoint** with the CLI v2 or the SDK v2 using the following syntax:

<center>
<img src="../../imgs/invoke_online_endpoint.png" width = "700px" alt="Invoke online endpoint cli vs sdk">
</center>

In [8]:
# invoke and test endpoint
ml_client.online_endpoints.invoke(
    endpoint_name=endpoint_name,
    request_file="../../data/taxi-request.json",
)


'[11.821297944525352, 15.327631675652293]'
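`ml_client.online_endpoints.invoke` essentially sends an HTTPS POST to the endpoint's scoring URI, and clients outside the SDK can send the same request themselves with a bearer credential. The sketch below only builds the request with the standard library and does not send it. The `/score` path, the `azureml-model-deployment` header (which targets one deployment instead of following the traffic split), and the placeholder token and payload are assumptions to check against `endpoint.scoring_uri` and your request file. For this notebook's `aml_token` endpoint, a token can be fetched with `ml_client.online_endpoints.get_keys(name=endpoint_name)`.

```python
import json
import urllib.request


def build_scoring_request(endpoint_name, region, token, payload, deployment=None):
    """Build (but do not send) a POST request to a managed online endpoint.

    Hypothetical helper: the host follows the
    <endpoint-name>.<region>.inference.ml.azure.com pattern.
    """
    url = f"https://{endpoint_name}.{region}.inference.ml.azure.com/score"
    headers = {
        "Content-Type": "application/json",
        "Authorization": f"Bearer {token}",
    }
    if deployment is not None:
        # Route this request to a single deployment, bypassing the traffic split.
        headers["azureml-model-deployment"] = deployment
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(url, data=body, headers=headers, method="POST")


request = build_scoring_request(
    endpoint_name="taxi-online-endpoint-4849",
    region="westeurope",
    token="<aml-token>",  # placeholder; never hard-code real credentials
    payload={},  # placeholder; load ../../data/taxi-request.json with json.load instead
    deployment="blue",
)
print(request.full_url)
# prints https://taxi-online-endpoint-4849.westeurope.inference.ml.azure.com/score
# urllib.request.urlopen(request) would send it and return the predictions.
```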