# Create and manage an online endpoint for inferencing

**Requirements** - To follow this tutorial, you will need:
- A basic understanding of Machine Learning
- An Azure account with an active subscription. [Create an account for free](https://azure.microsoft.com/free/?WT.mc_id=A261C142F)
- An Azure ML workspace. [Check this notebook for creating a workspace](/sdk/resources/workspace/workspace.ipynb) 
- A Compute Cluster. [Check this notebook to create a compute cluster](/sdk/resources/compute/compute.ipynb)
- A Python environment
- The Azure Machine Learning Python SDK v2, installed - [install instructions](/sdk/README.md#getting-started)

**Learning Objectives** - By the end of this tutorial, you should be able to:
- Connect to your AML workspace from the Python SDK
- Create a managed online endpoint from the Python SDK
- Create deployments on that endpoint from the Python SDK
- Test a deployment with a sample request
- Scale a deployment
- Update traffic flow to a deployment
- Delete a deployment

**Motivations** - This notebook explains how to create an online endpoint and manage deployments on that endpoint. An endpoint is an HTTPS endpoint that clients can call to receive the inferencing (scoring) output of a trained model. Online endpoints serve online (real-time) inferencing.

# 1. Connect to Azure Machine Learning Workspace

The [workspace](https://docs.microsoft.com/en-us/azure/machine-learning/concept-workspace) is the top-level resource for Azure Machine Learning, providing a centralized place to work with all the artifacts you create when you use Azure Machine Learning. In this section we will connect to the workspace in which the endpoint will be created.

## 1.1. Import the required libraries

In [None]:
#import required libraries
from azure.ml import MLClient
from azure.ml.entities import ManagedOnlineEndpoint, ManagedOnlineDeployment, Model, Environment, CodeConfiguration
from azure.identity import InteractiveBrowserCredential

## 1.2. Configure workspace details and get a handle to the workspace

To connect to a workspace, we need three identifiers: a subscription ID, a resource group, and a workspace name. We pass these details to the `MLClient` from `azure.ml` to get a handle to the required Azure Machine Learning workspace. This tutorial uses [interactive browser authentication](https://docs.microsoft.com/en-us/python/api/azure-identity/azure.identity.interactivebrowsercredential?view=azure-python). Other credential types are described in the [azure.identity reference](https://docs.microsoft.com/en-us/python/api/azure-identity/azure.identity?view=azure-python).

In [None]:
#enter details of your AML workspace
subscription_id = '<SUBSCRIPTION_ID>'
resource_group = '<RESOURCE_GROUP>'
workspace = '<AML_WORKSPACE_NAME>'
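
Hard-coding identifiers in a notebook makes it easy to commit them by accident. Below is a minimal sketch of one alternative, assuming the three values are exported as environment variables. The variable names `AZURE_SUBSCRIPTION_ID`, `AZURE_RESOURCE_GROUP`, and `AZUREML_WORKSPACE_NAME` and the helper `load_workspace_config` are illustrative choices for this sketch, not something the SDK requires.

```python
import os

# Hypothetical environment variable names, chosen for this sketch only.
REQUIRED_VARS = {
    "subscription_id": "AZURE_SUBSCRIPTION_ID",
    "resource_group": "AZURE_RESOURCE_GROUP",
    "workspace": "AZUREML_WORKSPACE_NAME",
}


def load_workspace_config(environ=None):
    """Return the workspace identifiers, failing early if any is missing."""
    environ = os.environ if environ is None else environ
    missing = [var for var in REQUIRED_VARS.values() if not environ.get(var)]
    if missing:
        raise KeyError("Missing environment variables: " + ", ".join(missing))
    return {key: environ[var] for key, var in REQUIRED_VARS.items()}
```

The returned dictionary can then stand in for the hard-coded values, for example `subscription_id = load_workspace_config()["subscription_id"]`.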

In [None]:
#get a handle to the workspace
ml_client = MLClient(InteractiveBrowserCredential(), subscription_id, resource_group, workspace)

# 2. Create Online Endpoint
Online endpoints are used for online (real-time) inferencing. They contain deployments that are ready to receive data from clients and send responses back in real time.

To create an online endpoint we will use `ManagedOnlineEndpoint`. This class lets you configure the following key aspects:
- `name` - Name of the endpoint. It must be unique within the Azure region.
- `auth_mode` - The authentication method for the endpoint. Key-based authentication and Azure ML token-based authentication are supported. Keys do not expire, but Azure ML tokens do. Possible values are `key` or `aml_token`.
- `identity` - The managed identity configuration for accessing Azure resources for endpoint provisioning and inference.
   - `type` - The type of managed identity. Azure Machine Learning supports `system_assigned` or `user_assigned` identity.
   - `user_assigned_identities` - List (array) of fully qualified resource IDs of the user-assigned identities. This property is required if `identity.type` is `user_assigned`.
- `description` - Description of the endpoint.

## 2.1 Configure the endpoint

In [None]:
# Creating a unique endpoint name with current datetime to avoid conflicts
import datetime
online_endpoint_name = "my-online-endpoint-" + datetime.datetime.now().strftime("%Y%m%d%H%M")

#create an online endpoint
endpoint = ManagedOnlineEndpoint(name=online_endpoint_name,
            description='this is a sample online endpoint',
            auth_mode='key',
            tags={'foo': 'bar'})
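
A timestamp with minute resolution can still collide if the notebook is run twice in the same minute. The sketch below shows one way to reduce that risk by appending a short random suffix. The helper `make_endpoint_name` and the prefix `endpoint-` are hypothetical, and the 32-character limit it enforces is an assumption about the service's naming rules that should be checked against the current documentation.

```python
import datetime
import uuid


def make_endpoint_name(prefix="endpoint-", max_length=32):
    """Build a name from a prefix, a UTC timestamp, and a random suffix.

    max_length=32 is an assumed service limit; adjust it if the
    documented limit differs.
    """
    stamp = datetime.datetime.now(datetime.timezone.utc).strftime("%Y%m%d%H%M")
    suffix = uuid.uuid4().hex[:4]  # 4 random hexadecimal characters
    name = f"{prefix}{stamp}-{suffix}"
    if len(name) > max_length:
        raise ValueError(f"'{name}' is longer than {max_length} characters")
    return name


print(make_endpoint_name())
```

With the default prefix the result is 26 characters long, which leaves some room for a more descriptive prefix.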
                     


## 2.2 Create the endpoint
Using the `MLClient` created earlier, we will now create the endpoint in the workspace. This command starts the endpoint creation and returns a confirmation response while creation continues in the background.

In [None]:
ml_client.begin_create_or_update(endpoint)
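
Because creation continues after the call returns, later cells can fail if they run before the endpoint is ready. The sketch below is a generic polling loop that uses only the standard library. `wait_for_state` and its parameters are hypothetical, and the state names `Succeeded`, `Failed`, and `Canceled` are assumptions about what the service reports. The caller supplies a function that returns the current state, for example one that fetches the endpoint with `ml_client.online_endpoints.get(...)` and reads a provisioning state from it, if your SDK version exposes one.

```python
import time


def wait_for_state(get_state, done=("Succeeded",), failed=("Failed", "Canceled"),
                   timeout=1800.0, interval=15.0, sleep=time.sleep):
    """Poll get_state() until it returns a terminal state or time runs out."""
    deadline = time.monotonic() + timeout
    while True:
        state = get_state()
        if state in done:
            return state
        if state in failed:
            raise RuntimeError(f"Operation ended in state {state!r}")
        if time.monotonic() >= deadline:
            raise TimeoutError(f"Still in state {state!r} after {timeout} seconds")
        sleep(interval)
```

Passing `sleep` as a parameter keeps the loop easy to test without real delays.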

# 3. Create a deployment
A deployment is a set of resources required for hosting the model that does the actual inferencing. We will create a deployment for our endpoint using the `ManagedOnlineDeployment` class. This class lets you configure the following key aspects:
- `name` - Name of the deployment.
- `endpoint_name` - Name of the endpoint to create the deployment under.
- `model` - The model to use for the deployment. This value can be either a reference to an existing versioned model in the workspace or an inline model specification.
- `environment` - The environment to use for the deployment. This value can be either a reference to an existing versioned environment in the workspace or an inline environment specification.
- `code_configuration` - The configuration for the source code and scoring script.
    - `code` - Path to the source code directory for scoring the model.
    - `scoring_script` - Relative path to the scoring file in the source code directory.
- `instance_type` - The VM size to use for the deployment. For the list of supported sizes, see [Managed online endpoints SKU list](https://docs.microsoft.com/en-us/azure/machine-learning/reference-managed-online-endpoints-vm-sku-list).
- `instance_count` - The number of instances to use for the deployment
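
The scoring script referenced by `scoring_script` is not shown in this notebook. As a rough sketch of its usual shape, Azure ML scoring scripts define an `init()` function that runs once when the container starts and a `run()` function that handles each request. The version below is a stand-in that uses only the standard library: the `{"data": [...]}` request shape is an assumption, and the placeholder model sums each input row instead of loading `sklearn_regression_model.pkl`.

```python
import json

model = None  # populated once by init()


def init():
    """Called once at startup; a real script would load the model file here."""
    global model
    # Placeholder model for illustration: predicts the sum of each input row.
    model = lambda rows: [sum(row) for row in rows]


def run(raw_data):
    """Called per request with the raw JSON body; returns a JSON-serializable result."""
    try:
        rows = json.loads(raw_data)["data"]
        return model(rows)
    except (KeyError, TypeError, ValueError) as exc:
        # Report the problem to the caller instead of crashing the worker.
        return {"error": str(exc)}


init()
print(run('{"data": [[1, 2, 3], [4, 5, 6]]}'))  # prints [6, 15]
```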

## 3.1 Configure the deployment

In [None]:
model = Model(
    path='./model-1/model/sklearn_regression_model.pkl')
env = Environment(
    conda_file='./model-1/environment/conda.yml', 
    image='mcr.microsoft.com/azureml/openmpi3.1.2-ubuntu18.04:20210727.v1')

blue_deployment = ManagedOnlineDeployment(
    name='blue',
    endpoint_name=online_endpoint_name,
    model=model,
    environment=env,
    code_configuration=CodeConfiguration(
        code="./model-1/onlinescoring",
        scoring_script="score.py"),
    instance_type='Standard_F2s_v2',
    instance_count=1)

## 3.2 Create the deployment
Using the `MLClient` created earlier, we will now create the deployment in the workspace. This command will start the deployment creation and return a confirmation response while the deployment creation continues.

In [None]:
ml_client.begin_create_or_update(blue_deployment)

## 3.3 Test the endpoint with sample data
Using the `MLClient` created earlier, we will send a test request to the endpoint. The endpoint can be invoked using the `invoke` method with the following parameters:
- `endpoint_name` - Name of the endpoint
- `request_file` - File with request data
- `deployment_name` - Name of the specific deployment to test in an endpoint

We will send a sample request using a [JSON](./model-1/sample-request.json) file.

In [None]:
#test the blue deployment with some sample data
ml_client.online_endpoints.invoke(
    endpoint_name=online_endpoint_name,
    deployment_name='blue',
    request_file='./model-1/sample-request.json')

## 3.4 Scale the deployment
Using the `MLClient` created earlier, we will get a handle to the deployment. The deployment can be scaled by increasing or decreasing its `instance_count`.

In [None]:
# Scale the deployment
# Fetching the deployment again is redundant here; it is included to show the `get` method
blue_deployment = ml_client.online_deployments.get(
    name='blue',
    endpoint_name=online_endpoint_name)
blue_deployment.instance_count = 2
ml_client.online_deployments.begin_create_or_update(blue_deployment)

# 4. Create a second deployment
We will now create a second deployment, `green`, on the same endpoint, following the same steps as for the `blue` deployment. It uses the model, environment, and scoring script from the `model-2` folder.

## 4.1 Configure the deployment

In [None]:
#create a second deployment
model2 = Model(path='./model-2/model/sklearn_regression_model.pkl')
env2 = Environment(conda_file='./model-2/environment/conda.yml', image='mcr.microsoft.com/azureml/openmpi3.1.2-ubuntu18.04:20210727.v1')

green_deployment = ManagedOnlineDeployment(
    name='green',
    endpoint_name=online_endpoint_name,
    model=model2,
    environment=env2,
    code_configuration=CodeConfiguration(
        code="./model-2/onlinescoring",
        scoring_script="score.py"),
    instance_type='Standard_F2s_v2',
    instance_count=1)



## 4.2 Create the deployment

In [None]:
ml_client.begin_create_or_update(green_deployment)

## 4.3 Test the new deployment with sample data
We will send a sample request to this specific deployment using a [JSON](./model-2/sample-request.json) file. Setting `deployment_name` sends the request to `green` directly.

In [None]:
ml_client.online_endpoints.invoke(
    endpoint_name=online_endpoint_name, 
    deployment_name='green', 
    request_file='./model-2/sample-request.json')

# 5. Managing endpoints and deployments

## 5.1 Get details of the endpoint

In [None]:
# Get the details for online endpoint
endpoint = ml_client.online_endpoints.get(name=online_endpoint_name)
#existing traffic details
print(endpoint.traffic)
#Get the scoring URI
print(endpoint.scoring_uri)
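
Clients outside the SDK call the endpoint over HTTPS at the scoring URI. The sketch below builds such a request with the standard library. It assumes key-based authentication in which the key is sent as a bearer token, and it uses the optional `azureml-model-deployment` header to pin the call to one deployment. `build_scoring_request`, the example URL, the key, and the payload shape are placeholders.

```python
import json
import urllib.request


def build_scoring_request(scoring_uri, api_key, payload, deployment=None):
    """Build (but do not send) a POST request for an online endpoint."""
    headers = {
        "Content-Type": "application/json",
        "Authorization": f"Bearer {api_key}",
    }
    if deployment is not None:
        # Route to a specific deployment instead of following the traffic split.
        headers["azureml-model-deployment"] = deployment
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(scoring_uri, data=body, headers=headers, method="POST")


# Placeholder values; substitute endpoint.scoring_uri and a real key.
request = build_scoring_request(
    "https://example.invalid/score", "<ENDPOINT_KEY>", {"data": [[1, 2, 3]]}, deployment="green"
)
# To send it: urllib.request.urlopen(request, timeout=30).read()
```

Building the request separately from sending it makes the headers easy to inspect before any network call is made.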

## 5.2 Update some traffic on the endpoint to the new deployment
Update the traffic so that 10% goes to the green deployment and 90% stays on blue.

In [None]:
endpoint.traffic = {'blue': 90, 'green': 10}
ml_client.begin_create_or_update(endpoint)
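
A traffic split is a mapping from deployment name to a percentage, so it can be checked locally before it is sent. The sketch below assumes the percentages must be integers from 0 to 100 that sum to exactly 100. `validate_traffic` and `shift_traffic` are hypothetical helpers, not SDK functions.

```python
def validate_traffic(traffic):
    """Raise ValueError unless the split is made of valid percentages summing to 100."""
    for name, percent in traffic.items():
        if not isinstance(percent, int) or not 0 <= percent <= 100:
            raise ValueError(f"{name}: {percent!r} is not an integer from 0 to 100")
    total = sum(traffic.values())
    if total != 100:
        raise ValueError(f"Traffic sums to {total}, expected 100")
    return traffic


def shift_traffic(old, new, percent_to_new):
    """Return a two-way split that sends percent_to_new percent to `new`."""
    return validate_traffic({old: 100 - percent_to_new, new: percent_to_new})


print(shift_traffic("blue", "green", 10))  # prints {'blue': 90, 'green': 10}
```

Calling `shift_traffic` with increasing percentages gives a simple staged rollout, with each step validated before it is assigned to `endpoint.traffic`.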

## 5.3 Get the logs for the new deployment
Get the last 50 lines of logs for the green deployment and check them for errors before routing more traffic to it.

In [None]:
ml_client.online_deployments.get_logs(name='green', endpoint_name=online_endpoint_name, lines=50)
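
Log output is easier to scan when it is filtered. The sketch below assumes the logs come back as a single newline-separated string. `find_problem_lines`, its keyword list, and the sample text are made up for illustration.

```python
def find_problem_lines(log_text, keywords=("error", "exception", "traceback")):
    """Return (line_number, line) pairs whose text contains any keyword."""
    hits = []
    for number, line in enumerate(log_text.splitlines(), start=1):
        lowered = line.lower()
        if any(keyword in lowered for keyword in keywords):
            hits.append((number, line))
    return hits


# Invented sample text, not real endpoint output.
sample = "Starting worker\nERROR: model file not found\nWorker ready"
print(find_problem_lines(sample))  # prints [(2, 'ERROR: model file not found')]
```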

## 5.4 Update all traffic on the endpoint to the new deployment
Update the traffic to divert 100% of traffic to the green deployment.

In [None]:
endpoint.traffic = {'blue': 0, 'green': 100}
ml_client.begin_create_or_update(endpoint)

## 5.5 Delete the old deployment
Delete the blue deployment.

In [None]:
# The endpoint is deleted as part of the clean up below, so this call is commented out.
# ml_client.online_deployments.delete(name='blue', endpoint_name=online_endpoint_name)

# 6. Clean up

Delete the endpoint to free the resources it uses.

In [None]:
ml_client.online_endpoints.begin_delete(name=online_endpoint_name)