# Deploying a Model as an Endpoint 

### What is an Endpoint? 🤔

Azure ML lets you deploy a model as an endpoint: you send data to the endpoint and get a prediction back. To do this, you need a scoring script that makes the predictions and an endpoint configuration that defines the compute resources used to serve the endpoint.
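For orientation, here is a minimal sketch of what a scoring script can look like. Azure ML calls `init()` once when the container starts and `run()` for every request. The "model" below is a placeholder that returns the length of its input, not the model deployed in this notebook. `AZUREML_MODEL_DIR` is the environment variable Azure ML sets to the folder that contains the deployed model.

```python
import json
import logging
import os

logger = logging.getLogger(__name__)
model = None  # populated by init()


def init():
    """Called once when the serving container starts."""
    global model
    # Azure ML sets AZUREML_MODEL_DIR inside the container; fall back to the
    # current directory so this sketch also runs outside Azure ML.
    model_dir = os.environ.get("AZUREML_MODEL_DIR", ".")
    logger.info("Loading model from %s", model_dir)

    # Placeholder model (assumption): replace with real loading code.
    def placeholder_model(data):
        return len(data)

    model = placeholder_model


def run(raw_data):
    """Called for every request; raw_data is the request body as a string."""
    try:
        payload = json.loads(raw_data)
        return {"prediction": model(payload["data"])}
    except Exception as exc:
        # Log the full traceback and return the message to the caller.
        logger.exception("Scoring failed")
        return {"error": str(exc)}
```

Returning the error message in the response is a debugging convenience for this sketch; a production script may prefer to log it only.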

We will use the `KubernetesOnlineEndpoint` class, which deploys the model to the already running K8s cluster and creates a service that exposes the model as an endpoint. We use it instead of `ManagedOnlineEndpoint` because that class creates new Azure compute instances to serve the endpoint, which in our setup is more expensive and less performant than using the existing cluster.

### Local Deployment 🏠

We will begin by deploying the model locally. This lets you test and debug the endpoint with a small amount of data before deploying it to the cloud, and it makes each iteration of the deployment process faster and easier.

First, we will create the endpoint configuration. This will define the compute resources that will be used to serve the endpoint.

**_Note: The local deployment will only work if you have Docker installed on your machine, and the Docker daemon is running._**

- Import the necessary libraries and modules.

In [1]:
# import required libraries
from azure.ai.ml import MLClient
from azure.ai.ml.entities import (
    KubernetesOnlineEndpoint,
    KubernetesOnlineDeployment,
    Model,
    Environment,
    CodeConfiguration,
)
from azure.identity import DefaultAzureCredential, ClientSecretCredential
from azure.ai.ml.entities._deployment.resource_requirements_settings import (
    ResourceRequirementsSettings,
)
from azure.ai.ml.entities._deployment.container_resource_settings import (
    ResourceSettings,
)



- Connect to the MLClient 

In [2]:
subscription_id = "0a94de80-6d3b-49f2-b3e9-ec5818862801"
resource_group = "buas-y2"
workspace_name = "Staff-Test"
tenant_id = "0a33589b-0036-4fe8-a829-3ed0926af886"
client_id = "a2230f31-0fda-428d-8c5c-ec79e91a49f5"
# Never commit secrets to a notebook: read the client secret from an environment variable
import os

client_secret = os.environ["AZURE_CLIENT_SECRET"]

credential = ClientSecretCredential(tenant_id, client_id, client_secret)
# get a handle to the workspace
ml_client = MLClient(
    credential, subscription_id, resource_group, workspace_name
)

- Create the endpoint configuration for the local deployment 

In [3]:
# Creating a local endpoint
import datetime

local_endpoint_name = "local-" + datetime.datetime.now().strftime("%m%d%H%M%f")

# create an online endpoint
endpoint = KubernetesOnlineEndpoint(
    name=local_endpoint_name, description="this is a sample local endpoint"
)

In [4]:
print(f"Creating local endpoint: {local_endpoint_name}")

Creating local endpoint: local-06201921868040
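The timestamp suffix keeps endpoint names unique between runs. The sketch below wraps that idea in a helper and adds a sanity check. The rule it checks (3 to 32 characters, starting with a letter, only letters, digits and hyphens) is an assumption about the Azure ML naming constraints, and `make_endpoint_name` is a hypothetical helper, not part of the SDK.

```python
import datetime
import re

# Assumed naming rule: starts with a letter, 3-32 characters in total,
# only letters, digits and hyphens.
_NAME_PATTERN = re.compile(r"^[A-Za-z][A-Za-z0-9-]{2,31}$")


def make_endpoint_name(prefix, now=None):
    """Return prefix + month, day, hour, minute and microsecond digits."""
    now = now or datetime.datetime.now()
    name = prefix + now.strftime("%m%d%H%M%f")
    if not _NAME_PATTERN.match(name):
        raise ValueError(f"Endpoint name {name!r} violates the assumed naming rule")
    return name


local_endpoint_name = make_endpoint_name("local-")
print(f"Creating local endpoint: {local_endpoint_name}")
```

The suffix is always 14 digits long, so the prefix can be at most 18 characters under the assumed 32-character limit.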


- Install the docker package

In [5]:
# Install docker package in the current Jupyter kernel
import sys

!{sys.executable} -m pip install docker



- Create a local endpoint, using the `local` deployment target. This will create a local endpoint that will run on your machine using Docker.

In [6]:
ml_client.online_endpoints.begin_create_or_update(endpoint, local=True)

Creating local endpoint (local-06201921868040) .Done (0m 5s)


ManagedOnlineEndpoint({'public_network_access': None, 'provisioning_state': None, 'scoring_uri': None, 'openapi_uri': None, 'name': 'local-06201921868040', 'description': 'this is a sample local endpoint', 'tags': {}, 'properties': {}, 'print_as_yaml': False, 'id': None, 'Resource__source_path': '', 'base_path': WindowsPath('C:/Users/Soheil/.azureml/inferencing/local-06201921868040'), 'creation_context': None, 'serialize': <msrest.serialization.Serializer object at 0x0000018C4CA57610>, 'auth_mode': 'key', 'location': None, 'identity': None, 'traffic': {}, 'mirror_traffic': {}, 'kind': None})

- Now create the deployment config for the local endpoint. This will be used to create a service that exposes the model as an endpoint. You need to specify the model, the environment, the scoring script, and the endpoint name. The model needs to be a local file for this to work locally.

**_Note: The environment and the scoring script are the main sources of error at this stage. You will need to make sure the environment has all the pre-requisites and that all of your path definitions for scoring are correct. Extensive logging is recommended for debugging._**

In [7]:
from azure.ai.ml.entities import Environment
import os

print(os.listdir('./models/weights/'))
model = Model(
    # the model must be a local file for a local deployment
    path="./models/weights/baseline_weights.pt"
)
env = Environment(
    conda_file="./environment/environment.yml",
    image="mcr.microsoft.com/azureml/openmpi4.1.0-ubuntu20.04:latest",
)

print(os.getcwd())
print(os.listdir('./src/emotion_clf_pipeline'))

blue_deployment = KubernetesOnlineDeployment(
    name="blue",
    endpoint_name=local_endpoint_name,
    model=model,
    environment=env,
    code_configuration=CodeConfiguration(
        code="./src/emotion_clf_pipeline",
        # must be a file inside `code`; the directory listing contains azure_score.py
        scoring_script="azure_score.py"
    ),
    instance_count=1,
    resources=ResourceRequirementsSettings(
        requests=ResourceSettings(
            cpu="100m",
            memory="0.5Gi",
            gpu="1",
        ),
    ),
)

['baseline_weights.pt', 'current_run_temp_weights', 'emotion-clf-dynamic', 'model_config.json', 'sync_status.json']
x:\University\2024-25d-fai2-adsai-group-nlp6
['api.py', 'azure_endpoint.py', 'azure_pipeline.py', 'azure_score.py', 'azure_sync.py', 'cli.py', 'data.py', 'features.py', 'hyperparameter_tuning.py', 'model.py', 'predict.py', 'stt.py', 'train.py', 'transcript.py', 'transcript_translator.py', 'translator.py', '__init__.py', '__pycache__']
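Because wrong paths are a common source of errors at this stage, it can help to check every path before building the deployment. The following is a small sketch with a hypothetical helper, `find_missing_paths`. Pass it the same values you give to `Model`, `Environment` and `CodeConfiguration`.

```python
from pathlib import Path


def find_missing_paths(model_path, conda_file, code_dir, scoring_script):
    """Return the deployment inputs that do not exist on disk."""
    candidates = {
        "model": Path(model_path),
        "conda_file": Path(conda_file),
        "code": Path(code_dir),
        # The scoring script is resolved relative to the code directory.
        "scoring_script": Path(code_dir) / scoring_script,
    }
    return {name: str(path) for name, path in candidates.items() if not path.exists()}


missing = find_missing_paths(
    "./models/weights/baseline_weights.pt",
    "./environment/environment.yml",
    "./src/emotion_clf_pipeline",
    "azure_score.py",  # must match CodeConfiguration.scoring_script
)
print(missing or "All deployment paths exist")
```

An empty result means all four paths exist; it does not guarantee that the scoring script itself loads the model from the right location inside the container.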


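- Before deploying, run a few pre-deployment checks: that the Docker daemon is running, that the Microsoft Container Registry (MCR) is reachable, and that the scoring script exists. Then deploy the model as an endpoint using the `local` flag. This will deploy the model to the local endpoint that was created earlier.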

In [8]:
# Pre-deployment troubleshooting
import docker

# Check Docker daemon status
try:
    client = docker.from_env()
    print("Docker daemon is running")
    print("Docker version:", client.version())
except Exception as e:
    print(f"Docker issue: {e}")

# Test network connectivity to MCR
import urllib.request
try:
    response = urllib.request.urlopen('https://mcr.microsoft.com', timeout=10)
    print("MCR connectivity: OK")
except Exception as e:
    print(f"MCR connectivity issue: {e}")

# Verify scoring script exists
scoring_script_path = "./src/emotion_clf_pipeline/azure_score.py"
if os.path.exists(scoring_script_path):
    print(f"✓ Scoring script found: {scoring_script_path}")
else:
    print(f"✗ Scoring script missing: {scoring_script_path}")

Docker daemon is running
Docker version: {'Platform': {'Name': 'Docker Desktop 4.42.1 (196648)'}, 'Components': [{'Name': 'Engine', 'Version': '28.2.2', 'Details': {'ApiVersion': '1.50', 'Arch': 'amd64', 'BuildTime': '2025-05-30T12:07:26.000000000+00:00', 'Experimental': 'false', 'GitCommit': '45873be', 'GoVersion': 'go1.24.3', 'KernelVersion': '6.6.87.1-microsoft-standard-WSL2', 'MinAPIVersion': '1.24', 'Os': 'linux'}}, {'Name': 'containerd', 'Version': '1.7.27', 'Details': {'GitCommit': '05044ec0a9a75232cad458027ca83437aae3f4da'}}, {'Name': 'runc', 'Version': '1.2.5', 'Details': {'GitCommit': 'v1.2.5-0-g59923ef'}}, {'Name': 'docker-init', 'Version': '0.19.0', 'Details': {'GitCommit': 'de40ad0'}}], 'Version': '28.2.2', 'ApiVersion': '1.50', 'MinAPIVersion': '1.24', 'GitCommit': '45873be', 'GoVersion': 'go1.24.3', 'Os': 'linux', 'Arch': 'amd64', 'KernelVersion': '6.6.87.1-microsoft-standard-WSL2', 'BuildTime': '2025-05-30T12:07:26.000000000+00:00'}
MCR connectivity issue: <urlopen erro

In [9]:
ml_client.online_deployments.begin_create_or_update(
    deployment=blue_deployment, local=True
)

Creating local deployment (local-06201921868040 / blue) .
Building Docker image from Dockerfile


Step 1/6 : FROM mcr.microsoft.com/azureml/openmpi4.1.0-ubuntu20.04:latest
failed to resolve reference "mcr.microsoft.com/azureml/openmpi4.1.0-ubuntu20.04:latest": failed to do request: Head "https://mcr.microsoft.com/v2/azureml/openmpi4.1.0-ubuntu20.04/manifests/latest": EOF
Done (0m 5s)


LocalEndpointImageBuildError: Building the local endpoint image failed with error: failed to resolve reference "mcr.microsoft.com/azureml/openmpi4.1.0-ubuntu20.04:latest": failed to do request: Head "https://mcr.microsoft.com/v2/azureml/openmpi4.1.0-ubuntu20.04/manifests/latest": EOF
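The `EOF` error above comes from a failed HTTP request to `mcr.microsoft.com`, which matches the connectivity warning from the pre-deployment check. If the failure is transient, retrying after a short wait can be enough. The following is a generic retry sketch, not an Azure ML feature; `retry` and its parameters are hypothetical names.

```python
import time


def retry(action, attempts=3, delay_seconds=5.0, backoff=2.0):
    """Call action() until it succeeds, waiting longer after each failure."""
    for attempt in range(1, attempts + 1):
        try:
            return action()
        except Exception as exc:
            if attempt == attempts:
                raise  # out of attempts: surface the last error
            print(f"Attempt {attempt} failed ({exc}); retrying in {delay_seconds:.0f}s")
            time.sleep(delay_seconds)
            delay_seconds *= backoff
```

With the objects from this notebook, the call could look like `retry(lambda: ml_client.online_deployments.begin_create_or_update(deployment=blue_deployment, local=True))`. If the registry stays unreachable, retrying will not help; check proxy, VPN and Docker network settings instead.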

In [None]:
ml_client.online_deployments.begin_create_or_update(
    deployment=blue_deployment, local=True
)

Creating local deployment (local-06190730925377 / blue) .
Building Docker image from Dockerfile
Step 1/6 : FROM mcr.microsoft.com/azureml/curated/tensorflow-2.16-cuda11:4
 ---> 6d0ec349c317
Step 2/6 : RUN mkdir -p /var/azureml-app/
 ---> Using cache
 ---> a6fb8a24a661
Step 3/6 : WORKDIR /var/azureml-app/
 ---> Using cache
 ---> 351579f60e7e
Step 4/6 : COPY conda.yml /var/azureml-app/
 ---> Using cache
 ---> ea5d11993718
Step 5/6 : RUN conda env create -n inf-conda-env --file conda.yml
 ---> Using cache
 ---> be1157b1d1bc
Step 6/6 : CMD ["conda", "run", "--no-capture-output", "-n", "inf-conda-env", "runsvdir", "/var/runit"]
 ---> Using cache
 ---> f80d2ed2858f
Successfully built f80d2ed2858f
Successfully tagged local-06190730925377:blue

Starting up endpoint...Done (0m 20s)


KubernetesOnlineDeployment({'provisioning_state': 'Succeeded', 'endpoint_name': 'local-06190730925377', 'type': 'Kubernetes', 'name': 'blue', 'description': None, 'tags': {}, 'properties': {}, 'print_as_yaml': False, 'id': None, 'Resource__source_path': '', 'base_path': WindowsPath('c:/Users/deanv/Dropbox/0_Buas/2023-2024/y2D/Azure Content Testing/Example-App-master/Example-App-master/Notebooks'), 'creation_context': None, 'serialize': <msrest.serialization.Serializer object at 0x000001A8E7579BA0>, 'model': Model({'job_name': None, 'intellectual_property': None, 'is_anonymous': False, 'auto_increment_version': False, 'auto_delete_setting': None, 'name': '6cb2a977a20c09f3db4caf0af16a5008', 'description': None, 'tags': {}, 'properties': {}, 'print_as_yaml': False, 'id': None, 'Resource__source_path': '', 'base_path': WindowsPath('c:/Users/deanv/Dropbox/0_Buas/2023-2024/y2D/Azure Content Testing/Example-App-master/Example-App-master/Notebooks'), 'creation_context': None, 'serialize': <msr

- You can check the status of the deployment by looking at the logs of the Docker container that was created. In Docker Desktop, click on the container and then on the Logs tab. The container will be named something like `local-06160804621045.blue`.

- You can also check the deployment with the `ml_client`: `online_endpoints.get` returns the status of the endpoint, and `online_deployments.get_logs` returns the logs of the deployment.

In [35]:
status = ml_client.online_endpoints.get(name=local_endpoint_name, local=True)
print(status)

auth_mode: key
description: this is a sample local endpoint
location: local
mirror_traffic: {}
name: local-06190730925377
properties: {}
provisioning_state: Succeeded
scoring_uri: http://localhost:32776/score
tags: {}
traffic: {}



In [37]:
logs = ml_client.online_deployments.get_logs(
    name="blue", endpoint_name=local_endpoint_name, local=True, lines=500
)

print(logs)


== CUDA ==

CUDA Version 11.8.0

Container image Copyright (c) 2016-2023, NVIDIA CORPORATION & AFFILIATES. All rights reserved.

This container image and its contents are governed by the NVIDIA Deep Learning Container License.
By pulling and using the container, you accept the terms and conditions of this license:
https://developer.nvidia.com/ngc/nvidia-deep-learning-container-license

A copy of this license is made available in this container at /NGC-DL-CONTAINER-LICENSE for your convenience.

   Use the NVIDIA Container Toolkit to start this container with GPU support; see
   https://docs.nvidia.com/datacenter/cloud-native/ .

2024-06-19T05:30:37,166462030+00:00 - rsyslog/run 
2024-06-19T05:30:37,178807446+00:00 - gunicorn/run 
2024-06-19T05:30:37,183517686+00:00 | gunicorn/run | 
2024-06-19T05:30:37,184568390+00:00 - nginx/run 
2024-06-19T05:30:37,187054749+00:00 | gunicorn/run | ###############################################
2024-06-19T05:30:37,189716981+00:00 | gunicorn/run | Az
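Container logs are long, so it can help to filter them for lines that look like problems before reading them in full. A small sketch, assuming `logs` is the string returned by `get_logs`; the keyword list is an arbitrary starting point and `find_problem_lines` is a hypothetical helper.

```python
def find_problem_lines(logs, keywords=("error", "exception", "traceback", "failed")):
    """Return (line_number, line) pairs whose text contains one of the keywords."""
    hits = []
    for number, line in enumerate(logs.splitlines(), start=1):
        lowered = line.lower()
        if any(keyword in lowered for keyword in keywords):
            hits.append((number, line))
    return hits
```

For example, `for number, line in find_problem_lines(logs): print(number, line)` prints only the suspicious lines together with their position in the full log.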

### Local Testing 🧪

- If the deployment is successful, you can test the endpoint by sending data to it. We do this with the `invoke` method of `ml_client.online_endpoints`, which sends the contents of a request file to the endpoint and returns the response. The format of the request and response depends on the scoring script. In this case the scoring script expects a JSON object with a key `data` that contains the data to be scored, here a base64-encoded image.

In [38]:
ml_client.online_endpoints.invoke(
    endpoint_name=local_endpoint_name,
    request_file="sample-request4.json",
    local=True,
)

'"4"'
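The request file passed to `invoke` has to match what the scoring script expects. As a sketch, the helper below writes a JSON file with a single `data` key that holds a base64-encoded image, as described above. `write_request_file` and the file names in the usage comment are hypothetical.

```python
import base64
import json
from pathlib import Path


def write_request_file(image_path, request_path):
    """Encode an image as base64 and store it under the "data" key of a JSON file."""
    encoded = base64.b64encode(Path(image_path).read_bytes()).decode("utf-8")
    Path(request_path).write_text(json.dumps({"data": encoded}), encoding="utf-8")
    return request_path


# Example (hypothetical file names):
# write_request_file("digit.png", "sample-request.json")
```

The resulting file can then be passed to `invoke` through the `request_file` argument.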

### Cloud Deployment 🌐

Once the local deployment is successful, we can deploy the model to the cloud. This is useful for serving the model to a large number of users, and for making the model accessible from anywhere. It is also useful for deploying the model to a production environment.

- First we need to create the endpoint configuration for the cloud deployment. This defines the name, authorisation method, and compute target used to serve the endpoint. As before, we use the `KubernetesOnlineEndpoint` class so that the model is deployed to the already running K8s cluster (here the compute target `adsai1`) rather than to newly created Azure compute instances, as `ManagedOnlineEndpoint` would do.

In [51]:
# Creating a unique endpoint name with current datetime to avoid conflicts
import datetime

online_endpoint_name = "k8s-endpoint-" + datetime.datetime.now().strftime("%m%d%H%M%f")

# create an online endpoint
endpoint = KubernetesOnlineEndpoint(
    name=online_endpoint_name,
    compute="adsai1",
    description="this is a sample online endpoint",
    auth_mode="key",
    tags={"Type": "Review Session"},
)

In [52]:
ml_client.begin_create_or_update(endpoint).result()

KubernetesOnlineEndpoint({'provisioning_state': 'Succeeded', 'scoring_uri': 'http://194.171.191.227:30397/api/v1/endpoint/k8s-endpoint-06190829258926/score', 'openapi_uri': 'http://194.171.191.227:30397/api/v1/endpoint/k8s-endpoint-06190829258926/swagger.json', 'name': 'k8s-endpoint-06190829258926', 'description': 'this is a sample online endpoint', 'tags': {'Type': 'Review Session'}, 'properties': {'createdBy': 'a2230f31-0fda-428d-8c5c-ec79e91a49f5', 'createdAt': '2024-06-19T06:29:52.864258+0000', 'lastModifiedAt': '2024-06-19T06:29:52.864258+0000', 'azureml.onlineendpointid': '/subscriptions/0a94de80-6d3b-49f2-b3e9-ec5818862801/resourcegroups/buas-y2/providers/microsoft.machinelearningservices/workspaces/staff-test/onlineendpoints/k8s-endpoint-06190829258926', 'AzureAsyncOperationUri': 'https://management.azure.com/subscriptions/0a94de80-6d3b-49f2-b3e9-ec5818862801/providers/Microsoft.MachineLearningServices/locations/westeurope/mfeOperationsStatus/oeidp:798f953a-277e-4ed7-90e2-0e1cd

- Next we need to create the deployment config for the cloud deployment. This will be used to create a service that exposes the model as an endpoint. You need to specify the model, the environment, the scoring script, and the endpoint name. We can now use registered models (and environments) for this deployment.

In [53]:
from azure.ai.ml.entities import Environment
import os

env = Environment(
    # conda_file="../../../example_conda.yml",
    image="mcr.microsoft.com/azureml/curated/tensorflow-2.16-cuda11:4",
)

registered_model_name = "example_2"
latest_model_version = 2
registered_environment_name = "endpoint_env_inference"
latest_environment_version = 2

model = ml_client.models.get(name=registered_model_name, version=latest_model_version)
# env = ml_client.environments.get(name=registered_environment_name, version=latest_environment_version)

print(os.getcwd())
print(os.listdir('../src/number_predictor'))

blue_deployment = KubernetesOnlineDeployment(
    name="blue",
    endpoint_name=online_endpoint_name,
    model=model,
    environment=env,
    code_configuration=CodeConfiguration(
        code="../src/number_predictor", scoring_script="scoring.py"
    ),
    instance_count=1,
    resources=ResourceRequirementsSettings(
        requests=ResourceSettings(
            cpu="100m",
            memory="0.5Gi",
        ),
        # limits=ResourceSettings(
        #     cpu="1",
        #     memory="2Gi",
        #     gpu=1,
        # ),
    ),
)

c:\Users\deanv\Dropbox\0_Buas\2023-2024\y2D\Azure Content Testing\Example-App-master\Example-App-master\Notebooks
['app.py', 'azure_utils', 'evaluate.py', 'load_data.py', 'model.py', 'models', 'predict.py', 'register.py', 'scoring.py', 'train.py', '__init__.py', '__pycache__']
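The `cpu` and `memory` values use Kubernetes quantity notation: `100m` means 100 millicores (a tenth of a CPU core) and `0.5Gi` means half a gibibyte. The sketch below converts the notations used in this notebook into plain numbers. It handles only a subset of the suffixes Kubernetes accepts, and the function names are hypothetical.

```python
_BINARY_SUFFIXES = {"Ki": 2**10, "Mi": 2**20, "Gi": 2**30, "Ti": 2**40}


def cpu_to_cores(quantity):
    """Convert a CPU quantity such as "100m" or "1" to a number of cores."""
    if quantity.endswith("m"):
        return float(quantity[:-1]) / 1000
    return float(quantity)


def memory_to_bytes(quantity):
    """Convert a memory quantity such as "0.5Gi" or "512Mi" to bytes."""
    for suffix, factor in _BINARY_SUFFIXES.items():
        if quantity.endswith(suffix):
            return int(float(quantity[: -len(suffix)]) * factor)
    return int(float(quantity))  # no suffix: already in bytes


print(cpu_to_cores("100m"))      # 0.1
print(memory_to_bytes("0.5Gi"))  # 536870912
```

These requests are small; a model that does not fit into half a gibibyte of memory needs a larger `memory` request or explicit `limits`.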


- We can now deploy the model as an endpoint, **without** the `local` flag. This will deploy the model to the cloud endpoint that was created earlier.

In [54]:
ml_client.begin_create_or_update(blue_deployment).result()

Check: endpoint k8s-endpoint-06190829258926 exists
Uploading number_predictor (4.34 MBs): 100%|##########| 4340323/4340323 [00:01<00:00, 2659462.35it/s]



.........................

KubernetesOnlineDeployment({'provisioning_state': 'Succeeded', 'endpoint_name': 'k8s-endpoint-06190829258926', 'type': 'Kubernetes', 'name': 'blue', 'description': None, 'tags': {}, 'properties': {'AzureAsyncOperationUri': 'https://management.azure.com/subscriptions/0a94de80-6d3b-49f2-b3e9-ec5818862801/providers/Microsoft.MachineLearningServices/locations/westeurope/mfeOperationsStatus/odidp:798f953a-277e-4ed7-90e2-0e1cd6d888eb:919dfcea-9109-4ad1-a385-b3434693594b?api-version=2023-04-01-preview'}, 'print_as_yaml': False, 'id': '/subscriptions/0a94de80-6d3b-49f2-b3e9-ec5818862801/resourceGroups/buas-y2/providers/Microsoft.MachineLearningServices/workspaces/Staff-Test/onlineEndpoints/k8s-endpoint-06190829258926/deployments/blue', 'Resource__source_path': '', 'base_path': 'c:\\Users\\deanv\\Dropbox\\0_Buas\\2023-2024\\y2D\\Azure Content Testing\\Example-App-master\\Example-App-master\\Notebooks', 'creation_context': None, 'serialize': <msrest.serialization.Serializer object at 0x000001A8E

- The last step is to route traffic to the deployment. We only have one deployment (blue), so we can route all traffic to it. We do this by setting the `traffic` attribute of the endpoint object to a dictionary that maps each deployment name to its percentage of traffic, and then updating the endpoint. In this case we route 100% of the traffic to the blue deployment.

In [55]:
# route 100% of the traffic to the blue deployment
endpoint.traffic = {"blue": 100}
ml_client.begin_create_or_update(endpoint).result()

KubernetesOnlineEndpoint({'provisioning_state': 'Succeeded', 'scoring_uri': 'http://194.171.191.227:30397/api/v1/endpoint/k8s-endpoint-06190829258926/score', 'openapi_uri': 'http://194.171.191.227:30397/api/v1/endpoint/k8s-endpoint-06190829258926/swagger.json', 'name': 'k8s-endpoint-06190829258926', 'description': 'this is a sample online endpoint', 'tags': {'Type': 'Review Session'}, 'properties': {'createdBy': 'a2230f31-0fda-428d-8c5c-ec79e91a49f5', 'createdAt': '2024-06-19T06:29:52.864258+0000', 'lastModifiedAt': '2024-06-19T06:29:52.864258+0000', 'azureml.onlineendpointid': '/subscriptions/0a94de80-6d3b-49f2-b3e9-ec5818862801/resourcegroups/buas-y2/providers/microsoft.machinelearningservices/workspaces/staff-test/onlineendpoints/k8s-endpoint-06190829258926', 'AzureAsyncOperationUri': 'https://management.azure.com/subscriptions/0a94de80-6d3b-49f2-b3e9-ec5818862801/providers/Microsoft.MachineLearningServices/locations/westeurope/mfeOperationsStatus/oeidp:798f953a-277e-4ed7-90e2-0e1cd
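With more than one deployment (for example a `green` deployment next to `blue`), the same dictionary splits traffic between them. The sketch below checks a split before it is sent. It assumes that the percentages must add up to 100, or to 0 when no traffic is routed at all; `validate_traffic` is a hypothetical helper.

```python
def validate_traffic(traffic):
    """Raise ValueError unless every share is 0-100 and the shares sum to 0 or 100."""
    for name, percent in traffic.items():
        if not 0 <= percent <= 100:
            raise ValueError(f"Traffic for {name!r} must be between 0 and 100, got {percent}")
    total = sum(traffic.values())
    if total not in (0, 100):
        raise ValueError(f"Traffic must sum to 0 or 100, got {total}")
    return traffic


# Hypothetical split between two deployments
split = validate_traffic({"blue": 90, "green": 10})
print(split)
```

The validated dictionary can then be assigned to `endpoint.traffic` in the same way as in the cell above.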

- You can check the status of the endpoint with `ml_client.online_endpoints.get`, and read the logs of the deployment with `ml_client.online_deployments.get_logs`.

In [56]:
status = ml_client.online_endpoints.get(name=online_endpoint_name)

print(status)

auth_mode: key
compute: azureml:/subscriptions/0a94de80-6d3b-49f2-b3e9-ec5818862801/resourceGroups/buas-y2/providers/Microsoft.MachineLearningServices/workspaces/Staff-Test/computes/adsai1
description: this is a sample online endpoint
id: /subscriptions/0a94de80-6d3b-49f2-b3e9-ec5818862801/resourceGroups/buas-y2/providers/Microsoft.MachineLearningServices/workspaces/Staff-Test/onlineEndpoints/k8s-endpoint-06190829258926
identity:
  principal_id: 9dd64db7-c70c-422b-8214-db43bca9407e
  tenant_id: 0a33589b-0036-4fe8-a829-3ed0926af886
  type: system_assigned
kind: K8S
location: westeurope
mirror_traffic: {}
name: k8s-endpoint-06190829258926
openapi_uri: http://194.171.191.227:30397/api/v1/endpoint/k8s-endpoint-06190829258926/swagger.json
properties:
  AzureAsyncOperationUri: https://management.azure.com/subscriptions/0a94de80-6d3b-49f2-b3e9-ec5818862801/providers/Microsoft.MachineLearningServices/locations/westeurope/mfeOperationsStatus/oeidp:798f953a-277e-4ed7-90e2-0e1cd6d888eb:2ae2b47d-d


In [85]:
logs = ml_client.online_deployments.get_logs(
    name="blue", endpoint_name=online_endpoint_name, lines=50
)

print(logs)

'2024-06-16 09:50:37,076 I [326] gunicorn.access - 127.0.0.1 - - [16/Jun/2024:09:50:37 +0000] "GET / HTTP/1.0" 200 7 "-" "kube-probe/1.29"\n2024-06-16 09:50:37,077 I [326] gunicorn.access - 127.0.0.1 - - [16/Jun/2024:09:50:37 +0000] "GET / HTTP/1.0" 200 7 "-" "kube-probe/1.29"\n2024-06-16 09:50:47,075 I [326] gunicorn.access - 127.0.0.1 - - [16/Jun/2024:09:50:47 +0000] "GET / HTTP/1.0" 200 7 "-" "kube-probe/1.29"\n2024-06-16 09:50:47,076 I [326] gunicorn.access - 127.0.0.1 - - [16/Jun/2024:09:50:47 +0000] "GET / HTTP/1.0" 200 7 "-" "kube-probe/1.29"\n2024-06-16 09:50:57,077 I [326] gunicorn.access - 127.0.0.1 - - [16/Jun/2024:09:50:57 +0000] "GET / HTTP/1.0" 200 7 "-" "kube-probe/1.29"\n2024-06-16 09:50:57,083 I [326] gunicorn.access - 127.0.0.1 - - [16/Jun/2024:09:50:57 +0000] "GET / HTTP/1.0" 200 7 "-" "kube-probe/1.29"\n2024-06-16 09:51:07,075 I [326] gunicorn.access - 127.0.0.1 - - [16/Jun/2024:09:51:07 +0000] "GET / HTTP/1.0" 200 7 "-" "kube-probe/1.29"\n2024-06-16 09:51:07,076 I 

In [88]:
# Get the details for online endpoint
endpoint = ml_client.online_endpoints.get(name=online_endpoint_name)

# existing traffic details
print(endpoint.traffic)

# Get the scoring URI
print(endpoint.scoring_uri)

{'blue': 100}
http://194.171.191.227:30397/api/v1/endpoint/k8s-endpoint-06161121150181/score


### Testing the Cloud Endpoint and Blue Deployment 🧪

The scoring script for the cloud deployment is the same as the scoring script for the local deployment. It expects a JSON object with a key `data` that contains the data to be scored. The data is a base64 encoded image. 

In [63]:
# load image and encode it in base64
import base64
import json

image_path = "../data/MNIST_44_0.png"
# image_path = "../data/test_image5.png"
with open(image_path, "rb") as image_file:
    base64_image = base64.b64encode(image_file.read()).decode('utf-8')

print(base64_image)

iVBORw0KGgoAAAANSUhEUgAAAO4AAADuCAYAAAA+7jsiAAAABHNCSVQICAgIfAhkiAAAAAlwSFlzAAALEgAACxIB0t1+/AAAADl0RVh0U29mdHdhcmUAbWF0cGxvdGxpYiB2ZXJzaW9uIDIuMi4zLCBodHRwOi8vbWF0cGxvdGxpYi5vcmcvIxREBQAABvhJREFUeJzt3Uuojfsfx/G1jktSoqTYaSsKuaSUqZCBy4CiGDEzYSK3qTBQRiIlJSmXlMtEJBlgIreJZGCAMhPawkCsM/uP/uu7tn07+7PX6zX9rGft5xy9z3M6v7PWbrZarQaQ5Z//+gaAvydcCCRcCCRcCCRcCCRcCCRcCCRcCCRcCDT+b17cbDb9b1YwzFqtVrPTazxxIZBwIZBwIZBwIZBwIZBwIZBwIZBwIZBwIZBwIZBwIZBwIZBwIZBwIZBwIZBwIZBwIZBwIZBwIZBwIZBwIZBwIZBwIZBwIZBwIZBwIZBwIZBwIZBwIZBwIZBwIZBwIZBwIZBwIZBwIZBwIZBwIdD4//oGusHkyZPL/cqVK223uXPnltc+fvy43G/cuFHumzdvLvd//mn/z/YnT54M6mf39fWVO+154kIg4UIg4UIg4UIg4UIg4UIg4UKgZqvV6v+Lm83+v5j/mTZtWrl//vx5hO5kZN29e7fct27dWu4/fvwYytuJ0Wq1mp1e44kLgYQLgYQLgYQLgYQLgYQLgRwHjYBms/6v+4sXL267PXz4sLy201HTaHb48OFyP3LkyMjcyCjjOAjGKOFCIOFCIOFCIOFCIOFCIOFCIF/POgKWLl1a7ufPn2+7Dfac9sGDB+U+Z86ccp83b96gfn5l3Lhxw/beY50nLgQSLgQSLgQSLgQSLgQSLgQSLgRyjjsEJk2aVO6dPle6fPnyAf/skydPlvuBAwfKfdasWeVefR640xkww8cTFwIJFwIJFwIJFwIJFwIJFwIJFwI5xx0Cq1evLvdNmzaVe/Xd

If you open the deployment in Azure ML studio, you can see its status, its logs, and the traffic that is being routed to it. The `Consume` tab shows the code you need to send data to the endpoint, as well as the response you get back.

In [2]:
import urllib.error
import urllib.request
import json
import os
import ssl

def allowSelfSignedHttps(allowed):
    # bypass the server certificate verification on client side
    if allowed and not os.environ.get('PYTHONHTTPSVERIFY', '') and getattr(ssl, '_create_unverified_context', None):
        ssl._create_default_https_context = ssl._create_unverified_context

allowSelfSignedHttps(True) # this line is needed if you use self-signed certificate in your scoring service.

# Request data goes here
# The example below assumes JSON formatting which may be updated
# depending on the format your endpoint expects.
# More information can be found here:
# https://docs.microsoft.com/azure/machine-learning/how-to-deploy-advanced-entry-script
data = {
    "text": "hello there"
}

body = str.encode(json.dumps(data))

# Scoring URI of the endpoint you want to call (see endpoint.scoring_uri)
url = "http://194.171.191.227:30526/api/v1/endpoint/deberta-endpoint/score"

# Replace this with the primary/secondary key, AMLToken, or Microsoft Entra ID token for the endpoint.
# Read it from an environment variable instead of hard-coding it; the variable name is an assumption.
api_key = os.environ.get("AZUREML_ENDPOINT_KEY", "")
if not api_key:
    raise Exception("A key should be provided to invoke the endpoint")

# The azureml-model-deployment header will force the request to go to a specific deployment.
# Remove this header to have the request observe the endpoint traffic rules
headers = {'Content-Type':'application/json', 'Authorization':('Bearer '+ api_key), 'azureml-model-deployment': 'blue' }

req = urllib.request.Request(url, body, headers)

try:
    response = urllib.request.urlopen(req)

    result = response.read()
    print(result)
except urllib.error.HTTPError as error:
    print("The request failed with status code: " + str(error.code))

    # Print the headers - they include the request ID and the timestamp, which are useful for debugging the failure
    print(error.info())
    print(error.read().decode("utf8", 'ignore'))

URLError: <urlopen error [WinError 10060] A connection attempt failed because the connected party did not properly respond after a period of time, or established connection failed because connected host has failed to respond>
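The `URLError` above is not caught by the `except urllib.error.HTTPError` clause, because a connection timeout happens before any HTTP response exists. Below is a sketch of a variant that sets an explicit timeout and reports both kinds of failure. `build_request` and `send` are hypothetical helpers, and the URL and key must come from your own endpoint.

```python
import json
import urllib.error
import urllib.request


def build_request(url, payload, api_key, deployment=None):
    """Create a POST request with the headers used in the cell above."""
    headers = {"Content-Type": "application/json", "Authorization": "Bearer " + api_key}
    if deployment:
        # Optional: pin the request to one deployment instead of the traffic rules.
        headers["azureml-model-deployment"] = deployment
    return urllib.request.Request(url, json.dumps(payload).encode("utf-8"), headers)


def send(request, timeout=30):
    """Return the response body, or None after printing why the call failed."""
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.read()
    except urllib.error.HTTPError as error:  # the server answered with an error status
        print("The request failed with status code:", error.code)
        print(error.read().decode("utf8", "ignore"))
    except urllib.error.URLError as error:  # no answer at all: DNS, firewall, timeout
        print("Could not reach the endpoint:", error.reason)
    return None
```

`HTTPError` is a subclass of `URLError`, so it has to be caught first. A connection timeout usually points to a network problem (firewall, VPN, wrong host or port) rather than to the scoring script.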

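**_Note: If you use a custom base image, you need to install `runit` in it, or use an Azure image instead. The generated Dockerfile starts the inference server with `runsvdir`, which is part of `runit`._**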