# Deploy and score a machine learning model by using an online endpoint 

Learn how to use an online endpoint to deploy your model, so you don't have to create and manage the underlying infrastructure. You'll begin by deploying a model on your local machine to debug any errors, and then you'll deploy and test it in Azure.

For more information, see [What are Azure Machine Learning endpoints?](https://docs.microsoft.com/azure/machine-learning/concept-endpoints).

## Prerequisites

* To use Azure Machine Learning, you must have an Azure subscription. If you don't have an Azure subscription, create a free account before you begin. Try the [free or paid version of Azure Machine Learning](https://azure.microsoft.com/free/).

* Install and configure the [Python SDK v2](sdk/setup.sh).

* You must have an Azure resource group, and you (or the service principal you use) must have Contributor access to it.

* You must have an Azure Machine Learning workspace. 

* To deploy locally, you must install Docker Engine on your local computer. We highly recommend this option, so it's easier to debug issues.

# 1. Connect to Azure Machine Learning Workspace

The [workspace](https://docs.microsoft.com/en-us/azure/machine-learning/concept-workspace) is the top-level resource for Azure Machine Learning, providing a centralized place to work with all the artifacts you create when you use Azure Machine Learning. In this section we will connect to the workspace in which the job will be run.

## 1.1. Import the required libraries

In [1]:
# import required libraries
from azure.ai.ml import MLClient
from azure.ai.ml.entities import (
    KubernetesOnlineEndpoint,
    KubernetesOnlineDeployment,
    Model,
    Environment,
    CodeConfiguration,
)
from azure.identity import DefaultAzureCredential

## 1.2. Configure workspace details and get a handle to the workspace

To connect to a workspace, we need identifier parameters - a subscription, resource group and workspace name. We will use these details in the `MLClient` from `azure.ai.ml` to get a handle to the required Azure Machine Learning workspace. We use the default [interactive authentication](https://docs.microsoft.com/python/api/azure-identity/azure.identity.interactivebrowsercredential?view=azure-python) for this tutorial. More advanced connection methods can be found [here](https://docs.microsoft.com/python/api/azure-identity/azure.identity?view=azure-python).

In [2]:
# enter details of your AML workspace
subscription_id = "4a571c1c-a483-4a43-9930-490479d70db0"
resource_group = "Learn_MLOps"
workspace = "MLOs_WS"

In [3]:
# get a handle to the workspace
ml_client = MLClient(
    DefaultAzureCredential(), subscription_id, resource_group, workspace
)

## Deploy and debug locally by using local endpoints

### Note
* To deploy locally, [Docker Engine](https://docs.docker.com/engine/install/) must be installed.
* Docker Engine must be running. Docker Engine typically starts when the computer starts. If it doesn't, you can [troubleshoot Docker Engine](https://docs.docker.com/config/daemon/#start-the-daemon-manually).

# 2. Create local endpoint and deployment

## 2.1 Create local endpoint

The goal of a local endpoint deployment is to validate and debug your code and configuration before you deploy to Azure. Local deployment has the following limitations:
* Local endpoints *do not support* traffic rules, authentication, or probe settings.
* Local endpoints support only one deployment per endpoint

In [4]:
# Creating a local endpoint
import datetime

local_endpoint_name = "local-" + datetime.datetime.now().strftime("%m%d%H%M%f")

# create an online endpoint
endpoint = KubernetesOnlineEndpoint(
    name=local_endpoint_name, description="this is a sample local endpoint"
)

In [None]:
# Install docker package in the current Jupyter kernel
import sys

!{sys.executable} -m pip install docker

In [5]:
ml_client.online_endpoints.begin_create_or_update(endpoint, local=True)

Creating local endpoint (local-01041658046757) Done (0m 0s)
Field 'mirror_traffic': This is an experimental field, and may change at any time. Please see https://aka.ms/azuremlexperimental for more information.


ManagedOnlineEndpoint({'public_network_access': None, 'provisioning_state': None, 'scoring_uri': None, 'openapi_uri': None, 'name': 'local-01041658046757', 'description': 'this is a sample local endpoint', 'tags': {}, 'properties': {}, 'id': None, 'Resource__source_path': None, 'base_path': PosixPath('/home/azureuser/.azureml/inferencing/local-01041658046757'), 'creation_context': None, 'serialize': <msrest.serialization.Serializer object at 0x7fe67d8412d0>, 'auth_mode': 'key', 'location': None, 'identity': None, 'traffic': {}, 'mirror_traffic': {}, 'kind': None})

## 2.2 Create local deployment

The example contains all the files needed to deploy a model on an online endpoint. To deploy a model, you must have:

* Model files (or the name and version of a model that's already registered in your workspace). In the example, we have a scikit-learn model that does regression.
* The code that's required to score the model. In this case, we have a score.py file.
* An environment in which your model runs. As you'll see, the environment might be a Docker image with Conda dependencies, or it might be a Dockerfile.
* Settings to specify the instance type and scaling capacity.

### Key aspects of deployment 

- `name` - Name of the deployment.
- `endpoint_name` - Name of the endpoint to create the deployment under.
- `model` - The model to use for the deployment. This value can be either a reference to an existing versioned model in the workspace or an inline model specification.
- `environment` - The environment to use for the deployment. This value can be either a reference to an existing versioned environment in the workspace or an inline environment specification.
- `code_configuration` - the configuration for the source code and scoring script
    - `path`- Path to the source code directory for scoring the model
    - `scoring_script` - Relative path to the scoring file in the source code directory
- `instance_type` - The VM size to use for the deployment.
- `instance_count` - The number of instances to use for the deployment

In [6]:
model1 = Model(name='weather-aci-prediction',path='./model')
env = Environment(
    conda_file="./model/conda.yml",
    image="mcr.microsoft.com/azureml/openmpi3.1.2-ubuntu18.04:latest",
)

blue_deployment = KubernetesOnlineDeployment(
    name="blue",
    endpoint_name=local_endpoint_name,
    model=model1,
    environment=env,
    code_configuration=CodeConfiguration(
        code="./", scoring_script="score.py"
    ),
    instance_count=1,
)

In [8]:
ml_client.online_deployments.begin_create_or_update(
    deployment=blue_deployment, local=True
)

Updating local deployment (local-01041658046757 / blue) .
Building Docker image from Dockerfile
Step 1/6 : FROM mcr.microsoft.com/azureml/openmpi3.1.2-ubuntu18.04:latest
 ---> b48ef4ddb5f8
Step 2/6 : RUN mkdir -p /var/azureml-app/
 ---> Using cache
 ---> 75e5d409a221
Step 3/6 : WORKDIR /var/azureml-app/
 ---> Using cache
 ---> b2ea8302779c
Step 4/6 : COPY conda.yml /var/azureml-app/
 ---> Using cache
 ---> c58d000501ac
Step 5/6 : RUN conda env create -n inf-conda-env --file conda.yml
 ---> Using cache
 ---> e3812d4ee387
Step 6/6 : CMD ["conda", "run", "--no-capture-output", "-n", "inf-conda-env", "runsvdir", "/var/runit"]
 ---> Using cache
 ---> aa3198d2834c
Successfully built aa3198d2834c
Successfully tagged local-01041658046757:blue

Starting up endpoint.....Done (0m 30s)


KubernetesOnlineDeployment({'provisioning_state': 'Succeeded', 'endpoint_name': 'local-01041658046757', 'type': 'Kubernetes', 'name': 'blue', 'description': None, 'tags': {}, 'properties': {}, 'id': None, 'Resource__source_path': None, 'base_path': PosixPath('/mnt/batch/tasks/shared/LS_root/mounts/clusters/eddyhakz1/code/Learn_Mlops/06_Model_Deployment'), 'creation_context': None, 'serialize': <msrest.serialization.Serializer object at 0x7fe67c70cd00>, 'model': Model({'job_name': None, 'is_anonymous': False, 'auto_increment_version': True, 'name': 'weather-aci-prediction', 'description': None, 'tags': {}, 'properties': {}, 'id': None, 'Resource__source_path': None, 'base_path': PosixPath('/mnt/batch/tasks/shared/LS_root/mounts/clusters/eddyhakz1/code/Learn_Mlops/06_Model_Deployment'), 'creation_context': None, 'serialize': <msrest.serialization.Serializer object at 0x7fe67c70d330>, 'version': None, 'latest_version': None, 'path': '/mnt/batch/tasks/shared/LS_root/mounts/clusters/eddyhak

# 3. Verify the local deployment succeeded

## 3.1 Check the status to see whether the model was deployed without error

In [9]:
ml_client.online_endpoints.get(name=local_endpoint_name, local=True)

ManagedOnlineEndpoint({'public_network_access': None, 'provisioning_state': 'Succeeded', 'scoring_uri': 'http://localhost:49158/score', 'openapi_uri': None, 'name': 'local-01041658046757', 'description': 'this is a sample local endpoint', 'tags': {}, 'properties': {}, 'id': None, 'Resource__source_path': None, 'base_path': PosixPath('/mnt/batch/tasks/shared/LS_root/mounts/clusters/eddyhakz1/code/Learn_Mlops/06_Model_Deployment'), 'creation_context': None, 'serialize': <msrest.serialization.Serializer object at 0x7fe67c70efe0>, 'auth_mode': 'key', 'location': 'local', 'identity': None, 'traffic': {}, 'mirror_traffic': {}, 'kind': None})

## 3.2 Get logs

In [11]:
print(ml_client.online_deployments.get_logs(
    name="blue", endpoint_name=local_endpoint_name, local=True, lines=50
))

urllib3 @ file:///home/conda/feedstock_root/build_artifacts/urllib3_1669259737463/work
websocket-client==1.4.2
Werkzeug==2.2.2
wrapt==1.12.1
zipp==3.11.0

2023-01-04T17:04:31,432800076+00:00 | gunicorn/run | 
2023-01-04T17:04:31,434796587+00:00 | gunicorn/run | ###############################################
2023-01-04T17:04:31,436643589+00:00 | gunicorn/run | AzureML Inference Server
2023-01-04T17:04:31,438515993+00:00 | gunicorn/run | ###############################################
2023-01-04T17:04:31,440172885+00:00 | gunicorn/run | 
2023-01-04T17:04:32,306136121+00:00 | gunicorn/run | Starting AzureML Inference Server HTTP.

Azure ML Inferencing HTTP server v0.7.7


Server Settings
---------------
Entry Script Name: /var/azureml-app/06_Model_Deployment/score.py
Model Directory: /var/azureml-app/azureml-models//weather-aci-prediction/None
Worker Count: 1
Worker Timeout (seconds): 300
Server Port: 31311
Application Insights Enabled: false
Application Insights Key: None
Inferencing HT

## 3.3 Invoke the local endpoint
Invoke the endpoint to score the model by using the convenience command invoke and passing query parameters that are stored in a JSON file

In [12]:
ml_client.online_endpoints.invoke(
    endpoint_name=local_endpoint_name,
    request_file="./model/sample-request.json",
    local=True,
)

'[1]'

# 4. Configure Kubernetes cluster for machine learning
Next, configure Azure Kubernetes Service (AKS) and Azure Arc-enabled Kubernetes clusters for inferencing machine learning workloads.
There're some prerequisites for below steps, you can check them [here](https://docs.microsoft.com/en-us/azure/machine-learning/how-to-attach-arc-kubernetes).

## 4.1 Connect an existing Kubernetes cluster to Azure Arc
This step is optional for [AKS cluster](https://docs.microsoft.com/en-us/azure/aks/kubernetes-walkthrough).
Follow this [guidance](https://docs.microsoft.com/en-us/azure/azure-arc/kubernetes/quickstart-connect-cluster) to connect Kubernetes clusters.

## 4.2 Deploy Azure Machine Learning extension
Depending on your network setup, Kubernetes distribution variant, and where your Kubernetes cluster is hosted (on-premises or the cloud), choose one of options to deploy the Azure Machine Learning extension and enable inferencing workloads on your Kubernetes cluster.
Follow this [guidance](https://docs.microsoft.com/en-us/azure/machine-learning/how-to-attach-arc-kubernetes?tabs=studio#inferencing).

## 4.3 Attach Arc Cluster
You can use Studio, Python SDK and CLI to attach Arc cluster to Machine Learning workspace.
Below code shows the attachment of AKS that the compute type is managedClusters. For Arc connected cluster, it should be connectedClusters.
Follow this [guidance](https://docs.microsoft.com/en-us/azure/machine-learning/how-to-attach-arc-kubernetes?tabs=studio#attach-arc-cluster) for more details.

In [None]:
from azure.ai.ml import load_compute

# for arc connected cluster, the resource_id should be something like '/subscriptions/<SUBSCRIPTION_ID>/resourceGroups/<RESOURCE_GROUP>/providers/Microsoft.ContainerService/connectedClusters/<CLUSTER_NAME>''
compute_params = [
    {"name": "<COMPUTE_NAME>"},
    {"type": "kubernetes"},
    {
        "resource_id": "/subscriptions/<SUBSCRIPTION_ID>/resourceGroups/<RESOURCE_GROUP>/providers/Microsoft.ContainerService/managedClusters/<CLUSTER_NAME>"
    },
]
k8s_compute = load_compute(source=None, params_override=compute_params)

# !!!bug https://msdata.visualstudio.com/Vienna/_workitems/edit/1740311
ml_client.begin_create_or_update(k8s_compute).result()

# 5. Deploy your online endpoint to Azure
Next, deploy your online endpoint to Azure.

## 5.1 Configure online endpoint
`endpoint_name`: The name of the endpoint.

`auth_mode` : Use `key` for key-based authentication. Use `aml_token` for Azure Machine Learning token-based authentication. A `key` does not expire, but `aml_token` does expire. 

Optionally, you can add description, tags to your endpoint.

In [15]:
# Creating a unique endpoint name with current datetime to avoid conflicts
import datetime

online_endpoint_name = "k8s-endpoint-" + datetime.datetime.now().strftime("%m%d%H%M%f")

# create an online endpoint
endpoint = KubernetesOnlineEndpoint(
    name=online_endpoint_name,
    compute="",
    description="this is a sample online endpoint",
    auth_mode="key",
    tags={"foo": "bar"},
)

## 5.2 Create the endpoint

Using the `MLClient` created earlier, we will now create the Endpoint in the workspace. This command will start the endpoint creation and return a confirmation response while the endpoint creation continues.

In [16]:
ml_client.begin_create_or_update(endpoint).result()

HttpResponseError: (UserError) The given resource scope /subscriptions/4a571c1c-a483-4a43-9930-490479d70db0/resourceGroups/Learn_MLOps/providers/Microsoft.MachineLearningServices/workspaces/MLOs_WS/computes/ is not valid; scope should start like /subscriptions/<subscriptionId>/resourceGroups/<resourceGroup>/providers/Microsoft.MachineLearningServices/workspaces/<workspaceName>.
Code: UserError
Message: The given resource scope /subscriptions/4a571c1c-a483-4a43-9930-490479d70db0/resourceGroups/Learn_MLOps/providers/Microsoft.MachineLearningServices/workspaces/MLOs_WS/computes/ is not valid; scope should start like /subscriptions/<subscriptionId>/resourceGroups/<resourceGroup>/providers/Microsoft.MachineLearningServices/workspaces/<workspaceName>.
Additional Information:Type: ComponentName
Info: {
    "value": "managementfrontend"
}Type: Correlation
Info: {
    "value": {
        "operation": "06db3472507330bb525c0b07cde769c8",
        "request": "b725c861e50b062e"
    }
}Type: Environment
Info: {
    "value": "eastus"
}Type: Location
Info: {
    "value": "eastus"
}Type: Time
Info: {
    "value": "2023-01-06T13:44:26.0742842+00:00"
}Type: InnerError
Info: {
    "value": {
        "code": "BadArgument",
        "innerError": {
            "code": "ArgumentInvalid",
            "innerError": {
                "code": "ArmScopeStructureInvalid",
                "innerError": null
            }
        }
    }
}Type: MessageFormat
Info: {
    "value": "The given resource scope {scope} is not valid; scope should start like {expectedStructure}."
}Type: MessageParameters
Info: {
    "value": {
        "scope": "/subscriptions/4a571c1c-a483-4a43-9930-490479d70db0/resourceGroups/Learn_MLOps/providers/Microsoft.MachineLearningServices/workspaces/MLOs_WS/computes/",
        "expectedStructure": "/subscriptions/<subscriptionId>/resourceGroups/<resourceGroup>/providers/Microsoft.MachineLearningServices/workspaces/<workspaceName>"
    }
}

## 5.3 Configure online deployment

A deployment is a set of resources required for hosting the model that does the actual inferencing. We will create a deployment for our endpoint using the `KubernetesOnlineDeployment` class.

In [10]:
model1 = Model(name='weather-aci-prediction',path='./model')
env = Environment(
    conda_file="./model/conda.yml",
    image="mcr.microsoft.com/azureml/openmpi3.1.2-ubuntu18.04:latest",
)

blue_deployment = KubernetesOnlineDeployment(
    name="blue",
    endpoint_name=online_endpoint_name,
    model=model1,
    environment=env,
    code_configuration=CodeConfiguration(
        code="./", scoring_script="score.py"
    ),
    instance_count=1,
)

## 5.4 Create the deployment

Using the `MLClient` created earlier, we will now create the deployment in the workspace. This command will start the deployment creation and return a confirmation response while the deployment creation continues.

In [11]:
ml_client.begin_create_or_update(blue_deployment).result()

Check: endpoint k8s-endpoint-01061319757821 exists
[32mUploading 06_Model_Deployment (0.45 MBs): 100%|██████████| 454199/454199 [00:00<00:00, 3508189.59it/s]
[39m



HttpResponseError: (UserError) Cannot create a K8S deployment in endpoint k8s-endpoint-01061319757821 because it is a Managed endpoint.
Code: UserError
Message: Cannot create a K8S deployment in endpoint k8s-endpoint-01061319757821 because it is a Managed endpoint.
Additional Information:Type: ComponentName
Info: {
    "value": "managementfrontend"
}Type: Correlation
Info: {
    "value": {
        "operation": "141b9b21ad16411d6d0d974be7e72d0b",
        "request": "69cc770323340bbd"
    }
}Type: Environment
Info: {
    "value": "eastus"
}Type: Location
Info: {
    "value": "eastus"
}Type: Time
Info: {
    "value": "2023-01-06T13:35:37.9820008+00:00"
}Type: InnerError
Info: {
    "value": {
        "code": "BadArgument",
        "innerError": {
            "code": "InvalidComputeTypeCombination",
            "innerError": null
        }
    }
}Type: MessageFormat
Info: {
    "value": "Cannot create a {deploymentComputeType} deployment in endpoint {endpointName} because it is a {endpointComputeType} endpoint."
}Type: MessageParameters
Info: {
    "value": {
        "deploymentComputeType": "K8S",
        "endpointName": "k8s-endpoint-01061319757821",
        "endpointComputeType": "Managed"
    }
}

Bad pipe message: %s [b'', b'7%\x82r\x97\x1c\xbbS\x1b\xe2\xe5\xe1<4\x00@ \x0c\x8b~u_ML\x7f\xdb\x87`\xbe$C\xaaJ\xf3\x17[s\xd8\xa4g\x0f\xbeA\x17\xa9H\xd0\xf9']
Bad pipe message: %s [b'\x08\x13\x02\x13\x03\x13\x01\x00\xff\x01\x00\x00\x8f\x00\x00\x00\x0e\x00\x0c\x00\x00\t127.0.0.1\x00\x0b\x00\x04\x03\x00\x01\x02\x00\n\x00\x0c\x00\n\x00\x1d\x00\x17\x00\x1e\x00\x19\x00\x18\x00#\x00\x00\x00\x16\x00\x00\x00\x17\x00\x00\x00\r\x00\x1e\x00\x1c\x04\x03\x05\x03\x06\x03\x08\x07\x08\x08\x08\t\x08\n\x08\x0b\x08\x04\x08\x05\x08\x06\x04\x01\x05\x01\x06\x01\x00+\x00\x03\x02\x03\x04\x00-\x00\x02\x01\x01\x003\x00&\x00$\x00\x1d\x00 \xa6=\xd9\xaa\xc0\x05']
Bad pipe message: %s [b'H', b'sr\x06\x9b\xe9\x1d{\\S\xa549\xb5\x06\xdb\x00\x00\xa6\xc0,\xc00\x00\xa3\x00\x9f\xcc\xa9\xcc\xa8\xcc\xaa\xc0\xaf\xc0\xad\xc0\xa3\xc0\x9f\xc0]\xc0a\xc0W\xc0S\xc0+\xc0/\x00\xa2\x00\x9e\xc0\xae\xc0\xac\xc0\xa2\xc0\x9e\xc0\\\xc0`\xc0V\xc0R\xc0']
Bad pipe message: %s [b"(\x00k\x00j\xc0s\xc0w\x00\xc4\x00\xc3\xc0#\xc0'\x00g\x00@\xc0r\x

In [None]:
# blue deployment takes 100 traffic
endpoint.traffic = {"blue": 100}
ml_client.begin_create_or_update(endpoint).result()

# 6. Test the endpoint with sample data
Using the `MLClient` created earlier, we will get a handle to the endpoint. The endpoint can be invoked using the `invoke` command with the following parameters:
- `endpoint_name` - Name of the endpoint
- `request_file` - File with request data
- `deployment_name` - Name of the specific deployment to test in an endpoint

We will send a sample request using a [json](./model-1/sample-request.json) file. 

In [None]:
# test the blue deployment with some sample data
# comment this out as cluster under dev subscription can't be accessed from public internet.
# ml_client.online_endpoints.invoke(
#    endpoint_name=online_endpoint_name,
#    deployment_name='blue',
#    request_file='../model-1/sample-request.json')

# 7. Managing endpoints and deployments

## 7.1 Get details of the endpoint

In [None]:
# Get the details for online endpoint
endpoint = ml_client.online_endpoints.get(name=online_endpoint_name)

# existing traffic details
print(endpoint.traffic)

# Get the scoring URI
print(endpoint.scoring_uri)

## 7.2 Get the logs for the new deployment
Get the logs for the green deployment and verify as needed

In [None]:
ml_client.online_deployments.get_logs(
    name="blue", endpoint_name=online_endpoint_name, lines=50
)

# 8. Delete the endpoint


In [12]:
ml_client.online_endpoints.begin_delete(name=online_endpoint_name)

<azure.core.polling._poller.LROPoller at 0x7fd600784c40>

...