## Create handle to workspace

Before we dive into the code, you need a way to reference your workspace. You'll create `ml_client` as a handle to the workspace, and then use `ml_client` to manage resources and jobs.

The next cell reads your subscription ID, resource group name, and workspace name from the `SUBSCRIPTION_ID`, `RESOURCE_GROUP_NAME`, and `WORKSPACE_NAME` environment variables, which it loads from a `.env` file. To find these values:

1. In the upper-right toolbar of Azure Machine Learning studio, select your workspace name.
1. Copy the workspace, resource group, and subscription ID values into your `.env` file.
1. Copy one value at a time: copy it, close the panel, paste it, then come back for the next one.

In [34]:
import os

from azure.ai.ml import MLClient
from azure.identity import DefaultAzureCredential
from dotenv import load_dotenv

# load SUBSCRIPTION_ID, RESOURCE_GROUP_NAME and WORKSPACE_NAME from .env
load_dotenv()

# authenticate
credential = DefaultAzureCredential()

# get a handle to the workspace
ml_client = MLClient(
    credential=credential,
    subscription_id=os.environ.get("SUBSCRIPTION_ID"),
    resource_group_name=os.environ.get("RESOURCE_GROUP_NAME"),
    workspace_name=os.environ.get("WORKSPACE_NAME"),
)
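`os.environ.get` returns `None` for a missing variable, so a typo in `.env` only surfaces later as a confusing error from `MLClient`. A minimal fail-fast sketch, assuming the same three variable names used in the cell above (`require_env` is a hypothetical helper, not part of the Azure SDK):

```python
import os


def require_env(*names: str) -> dict:
    """Return the requested environment variables, raising if any are missing or empty."""
    values = {name: os.environ.get(name, "").strip() for name in names}
    missing = [name for name, value in values.items() if not value]
    if missing:
        raise RuntimeError(f"Missing environment variables: {', '.join(missing)}")
    return values


# Hypothetical usage: call after load_dotenv() and before constructing MLClient
# settings = require_env("SUBSCRIPTION_ID", "RESOURCE_GROUP_NAME", "WORKSPACE_NAME")
```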

In [35]:
import os
print(os.environ.get('WORKSPACE_NAME'))
print(os.environ.get('RESOURCE_GROUP_NAME'))


aigbb-aml-bootcamp
aigbb-aml-bootcamp


First, we'll define the endpoint using the `ManagedOnlineEndpoint` class.



> [!TIP]
> * `auth_mode` : Use `key` for key-based authentication. Use `aml_token` for Azure Machine Learning token-based authentication. Keys don't expire, but Azure Machine Learning tokens do. For more information on authenticating, see [Authenticate to an online endpoint](https://learn.microsoft.com/azure/machine-learning/how-to-authenticate-online-endpoint).
> * Optionally, you can add a description and tags to your endpoint.

In [36]:
online_endpoint_name = "deployment-test"
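Endpoint names must be unique within an Azure region, which is why a fixed name like `deployment-test` can collide with someone else's endpoint; the `credit-endpoint-f7daf59c` name visible in the scoring URI printed later suggests a random suffix was used at some point. A minimal sketch of generating such a name, assuming the documented naming rule (starts with a letter, only letters, digits, and hyphens, ends with a letter or digit, 3 to 32 characters); check the current Azure Machine Learning endpoint limits before relying on the regex. `unique_endpoint_name` is a hypothetical helper, and if you adopt it, the hardcoded names in the `az` cells below need to change too.

```python
import re
import uuid

# Assumed naming rule: letter first, letters/digits/hyphens, alphanumeric last, 3-32 chars.
ENDPOINT_NAME_PATTERN = re.compile(r"^[A-Za-z][A-Za-z0-9-]{1,30}[A-Za-z0-9]$")


def unique_endpoint_name(prefix: str) -> str:
    """Append an 8-character random hex suffix so the name is unlikely to collide in a region."""
    name = f"{prefix}-{uuid.uuid4().hex[:8]}"
    if not ENDPOINT_NAME_PATTERN.match(name):
        raise ValueError(f"invalid endpoint name: {name!r}")
    return name


# Hypothetical usage: candidate_name = unique_endpoint_name("deployment-test")
```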

In [37]:
from azure.ai.ml.entities import ManagedOnlineEndpoint

online_endpoint_name = "deployment-test"

# define an online endpoint
endpoint = ManagedOnlineEndpoint(
    name=online_endpoint_name,
    description="this is an online endpoint",
    auth_mode="key",
)

In [38]:
# create the online endpoint
# expect endpoint creation to take approximately 2 minutes

endpoint = ml_client.online_endpoints.begin_create_or_update(endpoint).result()

In [39]:
endpoint = ml_client.online_endpoints.get(name=online_endpoint_name)

print(
    f'Endpoint "{endpoint.name}" with provisioning state "{endpoint.provisioning_state}" is retrieved'
)

Endpoint "geico-deployment-test" with provisioning state "Succeeded" is retrieved


In [40]:
from azure.ai.ml.entities import ManagedOnlineDeployment

# Choose the latest version of our registered model for deployment
model = ml_client.models.get(name="credit_defaults_model", label="latest")

# define an online deployment
blue_deployment = ManagedOnlineDeployment(
    name="blue",
    endpoint_name = online_endpoint_name,
    model=model,
    instance_type="Standard_DS3_v2",
    instance_count=1
)

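Using the `MLClient` created earlier, we'll now create the deployment in the workspace. `begin_create_or_update` starts a long-running operation and returns a poller; calling `.result()` on the poller blocks until the deployment finishes, so the cell below runs for several minutes. The same cell then routes all of the endpoint's traffic to the new deployment.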

In [41]:
# create the online deployment
# expect the deployment to take approximately 8 to 10 minutes
blue_deployment = ml_client.online_deployments.begin_create_or_update(
    blue_deployment
).result()

# route 100% of the endpoint's traffic to the blue deployment
endpoint.traffic = {"blue": 100}
ml_client.online_endpoints.begin_create_or_update(endpoint).result()

Check: endpoint geico-deployment-test exists
data_collector is not a known attribute of class <class 'azure.ai.ml._restclient.v2022_02_01_preview.models._models_py3.ManagedOnlineDeployment'> and will be ignored


...............................................................................

Readonly attribute principal_id will be ignored in class <class 'azure.ai.ml._restclient.v2022_05_01.models._models_py3.ManagedServiceIdentity'>
Readonly attribute tenant_id will be ignored in class <class 'azure.ai.ml._restclient.v2022_05_01.models._models_py3.ManagedServiceIdentity'>


ManagedOnlineEndpoint({'public_network_access': 'Enabled', 'provisioning_state': 'Succeeded', 'scoring_uri': 'https://geico-deployment-test.eastus.inference.ml.azure.com/score', 'openapi_uri': 'https://geico-deployment-test.eastus.inference.ml.azure.com/swagger.json', 'name': 'geico-deployment-test', 'description': 'this is an online endpoint', 'tags': {'training_dataset': 'geico'}, 'properties': {'azureml.onlineendpointid': '/subscriptions/781b03e7-6eb7-4506-bab8-cf3a0d89b1d4/resourcegroups/aigbb-aml-bootcamp/providers/microsoft.machinelearningservices/workspaces/aigbb-aml-bootcamp/onlineendpoints/geico-deployment-test', 'AzureAsyncOperationUri': 'https://management.azure.com/subscriptions/781b03e7-6eb7-4506-bab8-cf3a0d89b1d4/providers/Microsoft.MachineLearningServices/locations/eastus/mfeOperationsStatus/oe:79884ed9-220e-45cf-b7fe-bc488567ee26:cae7a5ce-901b-4744-8ca1-db03739f8aa6?api-version=2022-02-01-preview'}, 'print_as_yaml': True, 'id': '/subscriptions/781b03e7-6eb7-4506-bab8-cf
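Calling `.result()` blocks with no feedback for the whole 8 to 10 minutes. The poller returned by `begin_create_or_update` (an `azure.core` `LROPoller`) also exposes `done()` and `status()`, so you can report progress while waiting. A minimal sketch; `wait_for` and `FakePoller` are hypothetical, and the fake only stands in for a real poller so the helper can be run without Azure.

```python
import time
from typing import Any, Callable


def wait_for(poller: Any, interval: float = 30.0, report: Callable[[str], None] = print) -> Any:
    """Poll a long-running operation until it finishes, reporting its status each interval.

    Works with any object exposing done(), status() and result(), such as an LROPoller.
    """
    while not poller.done():
        report(f"status: {poller.status()}")
        time.sleep(interval)
    return poller.result()


class FakePoller:
    """Hypothetical stand-in that reports done after a fixed number of done() checks."""

    def __init__(self, checks_until_done: int, value: Any):
        self._remaining = checks_until_done
        self._value = value

    def done(self) -> bool:
        if self._remaining == 0:
            return True
        self._remaining -= 1
        return False

    def status(self) -> str:
        return "InProgress"

    def result(self) -> Any:
        return self._value


# Hypothetical usage against the real SDK:
# poller = ml_client.online_deployments.begin_create_or_update(blue_deployment)
# blue_deployment = wait_for(poller)
```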

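Next, configure autoscaling for the `blue` deployment with the Azure CLI. Autoscale settings target the deployment's Azure resource ID, so first retrieve it. The `az` commands in this section don't pass `--resource-group` or `--workspace-name`, so they assume CLI defaults are already configured (for example, with `az configure --defaults group=<resource-group> workspace=<workspace>`).

In [44]: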
!az ml online-deployment show -e "deployment-test" -n "blue" -o tsv --query "id"

/subscriptions/781b03e7-6eb7-4506-bab8-cf3a0d89b1d4/resourceGroups/aigbb-aml-bootcamp/providers/Microsoft.MachineLearningServices/workspaces/aigbb-aml-bootcamp/onlineEndpoints/geico-deployment-test/deployments/blue
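The autoscale cell below hardcodes this resource ID, which is easy to get out of sync with the endpoint name. A minimal sketch that composes the ID from its parts, following the segment layout in the output above, plus a parser for reading one back; both helpers are hypothetical.

```python
def deployment_resource_id(subscription_id: str, resource_group: str, workspace: str,
                           endpoint: str, deployment: str) -> str:
    """Compose a managed online deployment's resource ID (segment layout as in the output above)."""
    return (
        f"/subscriptions/{subscription_id}"
        f"/resourceGroups/{resource_group}"
        "/providers/Microsoft.MachineLearningServices"
        f"/workspaces/{workspace}"
        f"/onlineEndpoints/{endpoint}"
        f"/deployments/{deployment}"
    )


def parse_resource_id(resource_id: str) -> dict:
    """Split '/key/value/key/value/...' into a dict keyed by segment name."""
    parts = resource_id.strip("/").split("/")
    return dict(zip(parts[0::2], parts[1::2]))
```

In the notebook, the result could then be passed as `--resource "{resource_id}"`, since IPython expands `{...}` expressions in `!` shell commands.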



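Create an autoscale setting for the deployment that keeps between 1 and 5 instances, starting at 1:

In [45]: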
!az monitor autoscale create \
  --name autoscale-deployment-test \
  --resource "/subscriptions/781b03e7-6eb7-4506-bab8-cf3a0d89b1d4/resourceGroups/aigbb-aml-bootcamp/providers/Microsoft.MachineLearningServices/workspaces/aigbb-aml-bootcamp/onlineEndpoints/deployment-test/deployments/blue" \
  --min-count 1 --max-count 5 --count 1

{
  "enabled": true,
  "id": "/subscriptions/781b03e7-6eb7-4506-bab8-cf3a0d89b1d4/resourceGroups/aigbb-aml-bootcamp/providers/microsoft.insights/autoscalesettings/autoscale-geico-deployment-test",
  "location": "eastus",
  "name": "autoscale-geico-deployment-test",
  "notifications": [
    {
      "email": {
        "customEmails": [],
        "sendToSubscriptionAdministrator": false,
        "sendToSubscriptionCoAdministrators": false
      },
      "operation": "Scale",
      "webhooks": []
    }
  ],
  "predictiveAutoscalePolicy": {
    "scaleMode": "Disabled"
  },
  "profiles": [
    {
      "capacity": {
        "default": "1",
        "maximum": "5",
        "minimum": "1"
      },
      "name": "default",
      "rules": []
    }
  ],
  "resourceGroup": "aigbb-aml-bootcamp",
  "tags": {},
  "targetResourceUri": "/subscriptions/781b03e7-6eb7-4506-bab8-cf3a0d89b1d4/resourceGroups/aigbb-aml-bootcamp/providers/Microsoft.MachineLearningServices/workspaces/aigbb-aml-bootcamp/onlineEndp



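Add a rule that scales out by two instances when average CPU utilization exceeds 50% over a five-minute window:

In [46]: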
!az monitor autoscale rule create \
  --autoscale-name autoscale-deployment-test \
  --condition "CpuUtilizationPercentage > 50 avg 5m" \
  --scale out 2

{
  "metricTrigger": {
    "dividePerInstance": false,
    "metricName": "CpuUtilizationPercentage",
    "metricNamespace": "",
    "metricResourceUri": "/subscriptions/781b03e7-6eb7-4506-bab8-cf3a0d89b1d4/resourceGroups/aigbb-aml-bootcamp/providers/Microsoft.MachineLearningServices/workspaces/aigbb-aml-bootcamp/onlineEndpoints/geico-deployment-test/deployments/blue",
    "operator": "GreaterThan",
    "statistic": "Average",
    "threshold": 50.0,
    "timeAggregation": "Average",
    "timeGrain": "PT1M",
    "timeWindow": "PT5M"
  },
  "scaleAction": {
    "cooldown": "PT5M",
    "direction": "Increase",
    "type": "ChangeCount",
    "value": "2"
  }
}
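The `--condition` string follows the pattern `<metric> <operator> <threshold> <aggregation> <window>`, and each piece shows up in the `metricTrigger` JSON above. The sketch below is illustrative only: it maps just the operators and aggregation used in this notebook and is not the CLI's actual parser. The `timeGrain` (`PT1M`) and `statistic` fields don't come from the condition; they appear to come from the rule's `--timegrain` default.

```python
import re

# Illustrative subset: only the operators and aggregation used in this notebook.
_OPERATORS = {">": "GreaterThan", "<": "LessThan"}
_AGGREGATIONS = {"avg": "Average"}
_CONDITION = re.compile(r"^(\w+)\s*([<>])\s*([\d.]+)\s+(\w+)\s+(\d+)m$")


def describe_condition(condition: str) -> dict:
    """Map a condition like 'CpuUtilizationPercentage > 50 avg 5m' onto metricTrigger fields."""
    match = _CONDITION.match(condition.strip())
    if match is None:
        raise ValueError(f"unsupported condition: {condition!r}")
    metric, op, threshold, aggregation, minutes = match.groups()
    if aggregation not in _AGGREGATIONS:
        raise ValueError(f"unsupported aggregation: {aggregation!r}")
    return {
        "metricName": metric,
        "operator": _OPERATORS[op],
        "threshold": float(threshold),
        "timeAggregation": _AGGREGATIONS[aggregation],
        "timeWindow": f"PT{int(minutes)}M",  # ISO 8601 duration, e.g. PT5M
    }
```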


Add a second rule that scales in by one instance when average CPU utilization falls below 25%:

In [47]:
!az monitor autoscale rule create \
  --autoscale-name autoscale-deployment-test \
  --condition "CpuUtilizationPercentage < 25 avg 5m" \
  --scale in 1

{
  "metricTrigger": {
    "dividePerInstance": false,
    "metricName": "CpuUtilizationPercentage",
    "metricNamespace": "",
    "metricResourceUri": "/subscriptions/781b03e7-6eb7-4506-bab8-cf3a0d89b1d4/resourceGroups/aigbb-aml-bootcamp/providers/Microsoft.MachineLearningServices/workspaces/aigbb-aml-bootcamp/onlineEndpoints/geico-deployment-test/deployments/blue",
    "operator": "LessThan",
    "statistic": "Average",
    "threshold": 25.0,
    "timeAggregation": "Average",
    "timeGrain": "PT1M",
    "timeWindow": "PT5M"
  },
  "scaleAction": {
    "cooldown": "PT5M",
    "direction": "Decrease",
    "type": "ChangeCount",
    "value": "1"
  }
}
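Together, the two rules and the profile's 1-to-5 capacity determine the instance count. The following is a simplified model of that interaction, useful for reasoning about thresholds. It ignores the five-minute cooldown, the evaluation window, and Azure Monitor's flapping protection, so treat it as an illustration only; `next_instance_count` is a hypothetical helper.

```python
def next_instance_count(current: int, avg_cpu: float, *, minimum: int = 1, maximum: int = 5,
                        scale_out_above: float = 50.0, scale_out_by: int = 2,
                        scale_in_below: float = 25.0, scale_in_by: int = 1) -> int:
    """Apply the scale-out/scale-in rules above, then clamp to the profile's capacity."""
    if avg_cpu > scale_out_above:
        target = current + scale_out_by
    elif avg_cpu < scale_in_below:
        target = current - scale_in_by
    else:
        target = current
    return max(minimum, min(maximum, target))


# Trace a hypothetical sequence of 5-minute average CPU readings, starting from 1 instance
count = 1
history = []
for cpu in [70, 80, 90, 40, 10, 10]:
    count = next_instance_count(count, cpu)
    history.append(count)
print(history)  # [3, 5, 5, 5, 4, 3]
```

The third reading would call for 7 instances, but the profile's maximum of 5 caps it.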


List the rules attached to the autoscale setting:

In [48]:
!az monitor autoscale rule list --autoscale-name autoscale-deployment-test

[
  {
    "index": 0,
    "metricTrigger": {
      "dividePerInstance": false,
      "metricName": "CpuUtilizationPercentage",
      "metricNamespace": "",
      "metricResourceUri": "/subscriptions/781b03e7-6eb7-4506-bab8-cf3a0d89b1d4/resourceGroups/aigbb-aml-bootcamp/providers/Microsoft.MachineLearningServices/workspaces/aigbb-aml-bootcamp/onlineEndpoints/geico-deployment-test/deployments/blue",
      "operator": "GreaterThan",
      "statistic": "Average",
      "threshold": 50.0,
      "timeAggregation": "Average",
      "timeGrain": "PT1M",
      "timeWindow": "PT5M"
    },
    "scaleAction": {
      "cooldown": "PT5M",
      "direction": "Increase",
      "type": "ChangeCount",
      "value": "2"
    }
  },
  {
    "index": 1,
    "metricTrigger": {
      "dividePerInstance": false,
      "metricName": "CpuUtilizationPercentage",
      "metricNamespace": "",
      "metricResourceUri": "/subscriptions/781b03e7-6eb7-4506-bab8-cf3a0d89b1d4/resourceGroups/aigbb-aml-bootcamp/providers

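To change the thresholds, delete all existing rules and recreate them with lower values: scale out above 30% CPU and scale in below 15%. Quoting `"*"` keeps the shell from treating it as a filename pattern.

In [49]:
!az monitor autoscale rule delete --autoscale-name autoscale-deployment-test --index "*"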

In [50]:
!az monitor autoscale rule create \
  --autoscale-name autoscale-deployment-test \
  --condition "CpuUtilizationPercentage > 30 avg 5m" \
  --scale out 2

{
  "metricTrigger": {
    "dividePerInstance": false,
    "metricName": "CpuUtilizationPercentage",
    "metricNamespace": "",
    "metricResourceUri": "/subscriptions/781b03e7-6eb7-4506-bab8-cf3a0d89b1d4/resourceGroups/aigbb-aml-bootcamp/providers/Microsoft.MachineLearningServices/workspaces/aigbb-aml-bootcamp/onlineEndpoints/geico-deployment-test/deployments/blue",
    "operator": "GreaterThan",
    "statistic": "Average",
    "threshold": 30.0,
    "timeAggregation": "Average",
    "timeGrain": "PT1M",
    "timeWindow": "PT5M"
  },
  "scaleAction": {
    "cooldown": "PT5M",
    "direction": "Increase",
    "type": "ChangeCount",
    "value": "2"
  }
}




In [51]:
!az monitor autoscale rule create \
  --autoscale-name autoscale-deployment-test \
  --condition "CpuUtilizationPercentage < 15 avg 5m" \
  --scale in 1

{
  "metricTrigger": {
    "dividePerInstance": false,
    "metricName": "CpuUtilizationPercentage",
    "metricNamespace": "",
    "metricResourceUri": "/subscriptions/781b03e7-6eb7-4506-bab8-cf3a0d89b1d4/resourceGroups/aigbb-aml-bootcamp/providers/Microsoft.MachineLearningServices/workspaces/aigbb-aml-bootcamp/onlineEndpoints/geico-deployment-test/deployments/blue",
    "operator": "LessThan",
    "statistic": "Average",
    "threshold": 15.0,
    "timeAggregation": "Average",
    "timeGrain": "PT1M",
    "timeWindow": "PT5M"
  },
  "scaleAction": {
    "cooldown": "PT5M",
    "direction": "Decrease",
    "type": "ChangeCount",
    "value": "1"
  }
}
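Azure Monitor autoscale protects against flapping. Before scaling in, it estimates the metric after the removal (roughly current value times current instances divided by remaining instances) and skips the scale-in if that estimate would trigger a scale-out rule. The sketch below is a simplified version of that check, not the service's exact algorithm, and `scale_in_allowed` is hypothetical. Both rule sets in this notebook put the scale-in threshold at exactly half the scale-out threshold. As a result, a two-instance deployment that triggers scale-in is never blocked from dropping to one instance.

```python
def scale_in_allowed(avg_cpu: float, instances: int, scale_in_by: int,
                     scale_out_above: float, minimum: int = 1) -> bool:
    """Simplified flapping check: would removing instances push average CPU past scale-out?"""
    remaining = instances - scale_in_by
    if remaining < minimum:
        return False
    # Assume the same total load is spread over fewer instances.
    projected = avg_cpu * instances / remaining
    return projected <= scale_out_above


# With the rules above (scale out > 30, scale in < 15): 14% on 2 instances projects to 28%.
print(scale_in_allowed(14, 2, 1, scale_out_above=30))  # True
```

Narrowing the gap, for example to a hypothetical scale-out threshold of 20%, would block that same scale-in because the projected 28% exceeds 20%.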


## Check the status of the endpoint
You can check the status of the endpoint to see whether the model was deployed without error:

In [17]:
# return an object that contains metadata for the endpoint
endpoint = ml_client.online_endpoints.get(name=online_endpoint_name)

# print a selection of the endpoint's metadata
print(
    f"Name: {endpoint.name}\nStatus: {endpoint.provisioning_state}\nDescription: {endpoint.description}"
)

Name: geico-deployment-test
Status: Succeeded
Description: this is an online endpoint


In [10]:
# existing traffic details
print(endpoint.traffic)

# Get the scoring URI
print(endpoint.scoring_uri)

{'blue': 100}
https://credit-endpoint-f7daf59c.eastus.inference.ml.azure.com/score


## Test the endpoint with sample data

Now that the model is deployed to the endpoint, you can run inference with it. Create a sample request file in the format expected by the `run` function in the scoring script.

In [None]:
import os

# Create a directory to store the sample request file.
deploy_dir = "./deploy"
os.makedirs(deploy_dir, exist_ok=True)

Now, create the file in the deploy directory. The cell below uses the `%%writefile` cell magic to write the file into the directory you just created.

In [None]:
%%writefile {deploy_dir}/sample-request.json
{
  "input_data": {
    "columns": [0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22],
    "index": [0, 1],
    "data": [
      [20000,2,2,1,24,2,2,-1,-1,-2,-2,3913,3102,689,0,0,0,0,689,0,0,0,0],
      [10, 9, 8, 7, 6, 5, 4, 3, 2, 1, 10, 9, 8, 7, 6, 5, 4, 3, 2, 1, 10, 9, 8]
    ]
  }
}
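Writing the JSON by hand makes it easy to miscount columns. Each row must have one value per entry in `columns` (23 here), and `index` needs one entry per row. A minimal sketch that builds the same payload from Python and checks the row lengths; `build_request` is a hypothetical helper.

```python
import json
from pathlib import Path

# Same two rows as sample-request.json: 23 feature values each, columns identified by position.
rows = [
    [20000, 2, 2, 1, 24, 2, 2, -1, -1, -2, -2, 3913, 3102, 689, 0, 0, 0, 0, 689, 0, 0, 0, 0],
    [10, 9, 8, 7, 6, 5, 4, 3, 2, 1, 10, 9, 8, 7, 6, 5, 4, 3, 2, 1, 10, 9, 8],
]


def build_request(rows: list, path: str) -> dict:
    """Write an {"input_data": {columns, index, data}} payload to path and return it."""
    n_cols = len(rows[0])
    if any(len(row) != n_cols for row in rows):
        raise ValueError("all rows must have the same number of columns")
    payload = {
        "input_data": {
            "columns": list(range(n_cols)),
            "index": list(range(len(rows))),
            "data": rows,
        }
    }
    Path(path).parent.mkdir(parents=True, exist_ok=True)
    Path(path).write_text(json.dumps(payload, indent=2))
    return payload


# Hypothetical usage: build_request(rows, "./deploy/sample-request.json")
```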

Using the `MLClient` created earlier, we'll get a handle to the endpoint. The endpoint can be invoked using the `invoke` method with the following parameters:

* `endpoint_name` - Name of the endpoint
* `request_file` - File with request data
* `deployment_name` - Name of the specific deployment to test in an endpoint
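Clients that don't use the SDK call the endpoint's scoring URI over HTTP. With `auth_mode="key"`, the key goes in an `Authorization: Bearer <key>` header. The `azureml-model-deployment` header sends the request to one named deployment instead of following the traffic split, which plays the same role as `deployment_name` above. A minimal standard-library sketch; `build_scoring_request` is hypothetical, and the request isn't sent until you pass it to `urllib.request.urlopen`.

```python
import json
import urllib.request
from typing import Optional


def build_scoring_request(scoring_uri: str, key: str, payload: dict,
                          deployment_name: Optional[str] = None) -> urllib.request.Request:
    """Build (but don't send) a POST to an online endpoint's scoring URI using key auth."""
    headers = {"Content-Type": "application/json", "Authorization": f"Bearer {key}"}
    if deployment_name:
        # Target a specific deployment instead of the endpoint's traffic allocation.
        headers["azureml-model-deployment"] = deployment_name
    return urllib.request.Request(
        scoring_uri,
        data=json.dumps(payload).encode("utf-8"),
        headers=headers,
        method="POST",
    )


# Hypothetical usage:
# key = ml_client.online_endpoints.get_keys(name=online_endpoint_name).primary_key
# req = build_scoring_request(endpoint.scoring_uri, key, payload, deployment_name="blue")
# with urllib.request.urlopen(req) as resp:
#     print(resp.read().decode("utf-8"))
```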

We'll test the blue deployment with the sample data.

In [11]:
# test the blue deployment with the sample data
ml_client.online_endpoints.invoke(
    endpoint_name=online_endpoint_name,
    deployment_name="blue",
    request_file="./deploy/sample-request.json",
)

'[1, 0]'
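`invoke` returns the response body as a string. Here it is `'[1, 0]'`, one prediction per row of the request's `index`. A small sketch that parses the string and pairs each prediction with its row, assuming the body is a JSON list in row order as in this output (`predictions_by_row` is hypothetical):

```python
import json


def predictions_by_row(response_body: str, index: list) -> dict:
    """Parse a JSON list of predictions and key each one by its request row index."""
    predictions = json.loads(response_body)
    if len(predictions) != len(index):
        raise ValueError(f"expected {len(index)} predictions, got {len(predictions)}")
    return dict(zip(index, predictions))


print(predictions_by_row('[1, 0]', [0, 1]))  # {0: 1, 1: 0}
```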

## Get logs of the deployment
Check the logs to see whether the endpoint and deployment were invoked successfully. If you face errors, see [Troubleshooting online endpoints deployment](https://learn.microsoft.com/en-us/azure/machine-learning/how-to-troubleshoot-online-endpoints?tabs=cli).

In [12]:
logs = ml_client.online_deployments.get_logs(
    name="blue", endpoint_name=online_endpoint_name, lines=50
)
print(logs)

Instance status:
SystemSetup: Succeeded
UserContainerImagePull: Succeeded
ModelDownload: Succeeded
UserContainerStart: Succeeded

Container events:
Kind: Pod, Name: ContainerReady, Type: Normal, Time: 2023-09-06T21:20:43.95521617Z, Message: Container is ready

Container logs:
Health Port: 31311
Application Insights Enabled: false
Application Insights Key: None
Inferencing HTTP server version: azmlinfsrv/0.8.4.1
CORS for the specified origins: None
Create dedicated endpoint for health: None


Server Routes
---------------
Liveness Probe: GET   127.0.0.1:31311/
Score:          POST  127.0.0.1:31311/score

2023-09-06 21:20:39,890 I [699] azmlinfsrv - AML_FLASK_ONE_COMPATIBILITY is set. Patched Flask to ensure compatibility with Flask 1.
Initializing logger
2023-09-06 21:20:39,892 I [699] azmlinfsrv - Starting up app insights client
2023-09-06 21:20:41,199 I [699] azmlinfsrv.user_script - Found user script at /var/mlflow_resources/mlflow_score_script.py
2023-09-06 21:20:41,199 I [699] azml
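To check deployment health in a script rather than by reading the logs, you can pull out the `Instance status:` block. A minimal sketch that assumes the log layout shown above (a heading line, `Step: State` lines, then a blank line). That layout isn't a documented contract and may change. `instance_status` is hypothetical.

```python
def instance_status(logs: str) -> dict:
    """Return {step: state} from the 'Instance status:' block of deployment logs."""
    status = {}
    in_section = False
    for line in logs.splitlines():
        stripped = line.strip()
        if stripped == "Instance status:":
            in_section = True
            continue
        if in_section:
            if not stripped:
                break  # a blank line ends the block
            step, _, state = stripped.partition(":")
            status[step.strip()] = state.strip()
    return status


# Hypothetical usage with the logs retrieved above:
# assert all(state == "Succeeded" for state in instance_status(logs).values())
```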

## Create a second deployment 
Deploy the model as a second deployment called `green`. In practice, you can create several deployments and compare their performance. These deployments could use a different version of the same model, a completely different model, or a more powerful compute instance. In our example, you'll deploy the same model version using a more powerful compute instance that could potentially improve performance.

In [None]:
# pick the model to deploy: the latest version of our registered model
model = ml_client.models.get(name="credit_defaults_model", label="latest")

# define an online deployment using a more powerful instance type
green_deployment = ManagedOnlineDeployment(
    name="green",
    endpoint_name=online_endpoint_name,
    model=model,
    instance_type="Standard_F4s_v2",
    instance_count=1,
)

# create the online deployment
# expect the deployment to take approximately 8 to 10 minutes
green_deployment = ml_client.online_deployments.begin_create_or_update(
    green_deployment
).result()

## Scale deployment to handle more traffic

Using the `MLClient` created earlier, we'll get a handle to the `green` deployment. The deployment can be scaled by increasing or decreasing the `instance_count`.

In the following code, you'll increase the number of VM instances manually. Online endpoints can also autoscale: autoscale automatically runs the right amount of resources to handle the load on your application. Managed online endpoints support autoscaling through integration with the Azure Monitor autoscale feature. To configure autoscaling, see [autoscale online endpoints](https://learn.microsoft.com/en-us/azure/machine-learning/how-to-autoscale-endpoints?tabs=python).

In [None]:
# update definition of the deployment
green_deployment.instance_count = 2

# update the deployment
# expect the deployment to take approximately 8 to 10 minutes
ml_client.online_deployments.begin_create_or_update(green_deployment).result()

## Update traffic allocation for deployments
You can split production traffic between deployments. You may first want to test the `green` deployment with sample data, just like you did for the `blue` deployment. Once you've tested your green deployment, allocate a small percentage of traffic to it.

In [None]:
endpoint.traffic = {"blue": 80, "green": 20}
ml_client.online_endpoints.begin_create_or_update(endpoint).result()
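A typo in a deployment name or a split that doesn't add up only fails once the update reaches the service. A minimal client-side check, assuming integer percentages that must total 100, as every allocation in this notebook does (`validate_traffic` is a hypothetical helper):

```python
def validate_traffic(traffic: dict, deployments: set) -> None:
    """Raise ValueError if the allocation names unknown deployments or doesn't total 100."""
    unknown = set(traffic) - set(deployments)
    if unknown:
        raise ValueError(f"unknown deployments: {sorted(unknown)}")
    for name, percent in traffic.items():
        if not isinstance(percent, int) or not 0 <= percent <= 100:
            raise ValueError(f"{name}: traffic must be an integer from 0 to 100, got {percent!r}")
    total = sum(traffic.values())
    if total != 100:
        raise ValueError(f"traffic must total 100, got {total}")


validate_traffic({"blue": 80, "green": 20}, {"blue", "green"})  # passes silently
```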

You can test traffic allocation by invoking the endpoint several times:

In [None]:
# You can invoke the endpoint several times
for _ in range(30):
    ml_client.online_endpoints.invoke(
        endpoint_name=online_endpoint_name,
        request_file="./deploy/sample-request.json",
    )
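With an 80/20 split, 30 requests send about 6 to `green` on average. At this sample size the actual count can easily land well above or below that. The sketch below models routing as independent weighted random choices, which is an assumption about the endpoint's behavior rather than its documented algorithm. It only shows the expected spread; `simulate_routing` is hypothetical.

```python
import random
from collections import Counter


def simulate_routing(traffic: dict, n_requests: int, seed: int = 0) -> Counter:
    """Count how many of n_requests land on each deployment under weighted random routing."""
    rng = random.Random(seed)
    names = list(traffic)
    weights = [traffic[name] for name in names]
    return Counter(rng.choices(names, weights=weights, k=n_requests))


# Re-run with different seeds to see how much the green count varies across batches of 30.
for seed in range(3):
    print(simulate_routing({"blue": 80, "green": 20}, 30, seed=seed))
```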

Show logs from the `green` deployment to check that there were incoming requests and the model was scored successfully. 

In [None]:
logs = ml_client.online_deployments.get_logs(
    name="green", endpoint_name=online_endpoint_name, lines=50
)
print(logs)

## View metrics using Azure Monitor
You can view various metrics (request numbers, request latency, network bytes, CPU/GPU/Disk/Memory utilization, and more) for an online endpoint and its deployments by following links from the endpoint's **Details** page in the studio. Following these links will take you to the exact metrics page in the Azure portal for the endpoint or deployment.

![metrics page 1](./media/deployment-metrics-from-endpoint-details-page.png)


If you open the metrics for the online endpoint, you can set up the page to see metrics such as the average request latency as shown in the following figure.

![metrics page 2](./media/view-endpoint-metrics-in-azure-portal.png)

For more information on how to view online endpoint metrics, see [Monitor online endpoints](https://learn.microsoft.com/en-us/azure/machine-learning/how-to-monitor-online-endpoints#metrics).

## Send all traffic to the new deployment
Once you're fully satisfied with your `green` deployment, switch all traffic to it.

In [None]:
endpoint.traffic = {"blue": 0, "green": 100}
ml_client.online_endpoints.begin_create_or_update(endpoint).result()

## Delete the old deployment
Remove the old (blue) deployment:

In [None]:
ml_client.online_deployments.begin_delete(
    name="blue", endpoint_name=online_endpoint_name
).result()

## Clean up resources

If you aren't going to use the endpoint and deployment after completing this tutorial, you should delete them.

> [!NOTE]
> Expect the complete deletion to take approximately 20 minutes.

In [None]:
ml_client.online_endpoints.begin_delete(name=online_endpoint_name).result()