# Creating a Real-Time Inferencing Service with AKS

This is very similar to deploying to ACI, but here we use an AKS inferencing cluster, which gives us more control and scale.

I'll point out the specific differences; otherwise, the code can be considered the same.

In [1]:
import azureml.core
from azureml.core import Workspace
from azureml.core import Model

# Load the workspace from the saved config file
ws = Workspace.from_config()
print('Ready to use Azure ML {} to work with {}'.format(azureml.core.VERSION, ws.name))

Ready to use Azure ML 1.27.0 to work with mlops


In [2]:
# Set the folder for the experiment files used
training_folder = 'driver-training'

In [3]:
# Get the latest registered version of the model
model = ws.models['driver_model']
print(model.name, 'version', model.version)

driver_model version 3


In [4]:
# you need to figure out where you want to put the service folder
# ideally this should be at the same level as the driver-training folder
!pwd

/mnt/batch/tasks/shared/LS_root/mounts/clusters/davew202105/code/git/MLOps-E2E-sdkv2/Lab12


In [7]:
# we will recycle the same inferencing code, so service_path should already exist and work
# we won't rebuild those files.  
import os

# changeme
service_path = 'inf-svc'

In [8]:
# we should have score.py and driver_env.yml in this folder
# we will use those next
!ls $service_path

driver_env.yml	score.py


## Create/Attach to AKS AMLS Compute Target

We want to create or attach to the AMLS inferencing cluster.  Mine is called `infer-cluster`, but you can call it whatever you want.  Your team can share an inferencing cluster, but each team member's service name must be unique.
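
Since a shared cluster needs a distinct service name per team member, one option is to derive the name from a base name plus a user alias. The sketch below is only an illustration: `make_service_name` is a hypothetical helper, not part of the Azure ML SDK, and it assumes the naming rule of lowercase letters, digits and dashes, starting with a letter, with at most 32 characters. Check that rule against the current Azure ML documentation before relying on it.

```python
import re

MAX_LEN = 32  # assumed upper bound on Azure ML web service names


def make_service_name(base: str, alias: str) -> str:
    """Combine a base name and a user alias into a lowercase, dash-separated name."""
    raw = f"{base}-{alias}".lower()
    # Replace every run of characters other than a-z and 0-9 with a single dash
    cleaned = re.sub(r"[^a-z0-9]+", "-", raw).strip("-")
    # Prefix names that are empty or do not start with a letter
    if not cleaned or not cleaned[0].isalpha():
        cleaned = "svc-" + cleaned
    # Truncate, then drop any dash left dangling at the end
    return cleaned[:MAX_LEN].rstrip("-")


print(make_service_name("driver-service-aks", "Dave.W"))
```

The result could then be assigned to `service_name` before deploying.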


In [9]:
from azureml.core.compute import AksCompute, ComputeTarget
from azureml.core.compute_target import ComputeTargetException

aml_compute_target = 'infer-cluster'

try:
    # Attach to the AKS inferencing cluster if it already exists in the workspace
    aml_compute = AksCompute(ws, aml_compute_target)
    print("found existing compute target.")
except ComputeTargetException:
    print("creating new compute target")

    # A small dev/test AKS cluster; adjust the VM size and node count for production use
    provisioning_config = AksCompute.provisioning_configuration(vm_size = "STANDARD_D2_V2",
                                                                agent_count = 1,
                                                                cluster_purpose = AksCompute.ClusterPurpose.DEV_TEST)
    aml_compute = ComputeTarget.create(ws, aml_compute_target, provisioning_config)
    aml_compute.wait_for_completion(show_output=True)

print("Aml Compute attached")


found existing compute target.
Aml Compute attached


## Scoring/Inferencing Configuration

* Now we need to configure the scoring environment.  
* You can view the status of the deployment in the AMLS portal under `Endpoints`.  
* The first deployment always takes a little while.  

We are going to deploy to AKS so that an existing web service with the same name is dropped and recreated.  This may not always be what you want; you have other options, such as keeping the existing service or stopping with an error.

Deployment will take some time as it first runs a process to create a container image, and then runs a process to create a web service based on the image. When deployment has completed successfully, you'll see a status of **Healthy**.
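
To make the "drop and recreate" choice and its alternatives concrete, here is a minimal sketch of a policy wrapper. `deploy_with_policy` and its `if_exists` values are hypothetical names introduced for illustration; the function only assumes a mapping of service names to objects with a `delete()` method (such as `ws.webservices`) and a callable that performs the actual deployment.

```python
def deploy_with_policy(existing, name, deploy_fn, if_exists="replace"):
    """Deploy `name`, applying a policy when a service with that name exists.

    existing  : mapping of service name -> service object (e.g. ws.webservices)
    deploy_fn : zero-argument callable that performs the deployment
    if_exists : "replace" (delete, then deploy), "reuse" (return the existing
                service untouched) or "fail" (raise an error)
    """
    if if_exists not in ("replace", "reuse", "fail"):
        raise ValueError(f"unknown policy: {if_exists!r}")
    current = existing.get(name)
    if current is None:
        return deploy_fn()
    if if_exists == "reuse":
        return current
    if if_exists == "fail":
        raise RuntimeError(f"service {name!r} already exists")
    current.delete()    # drop ...
    return deploy_fn()  # ... and recreate


# In this notebook it could be called as follows (not executed here):
# service = deploy_with_policy(
#     ws.webservices, service_name,
#     lambda: Model.deploy(ws, service_name, [model], inference_config,
#                          deployment_config, aml_compute))
```

Passing the deployment as a callable keeps the policy logic independent of the SDK, so it can be tried out with stand-in objects before touching a real cluster.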

In [11]:
from azureml.core.webservice import AksWebservice
from azureml.core.model import Model, InferenceConfig

service_name = "driver-service-sdk2-aks"

# Configure the scoring environment
inference_config = InferenceConfig(runtime="python",
                                   source_directory=service_path,
                                   entry_script="score.py",
                                   conda_file="driver_env.yml")

# Resources reserved for each replica of the service on the AKS cluster
deployment_config = AksWebservice.deploy_configuration(cpu_cores=1, memory_gb=1)

# overwrite=True replaces an existing service with the same name ("drop and recreate")
service = Model.deploy(ws, service_name, [model], inference_config, deployment_config,
                       aml_compute, overwrite=True)

service.wait_for_deployment(show_output = True)
print(service.state)
print(service.get_logs())


Tips: You can try get_logs(): https://aka.ms/debugimage#dockerlog or local deployment: https://aka.ms/debugimage#debug-locally to debug if deployment takes longer than 10 minutes.
Running
2021-06-01 16:15:36+00:00 Creating Container Registry if not exists.
2021-06-01 16:15:37+00:00 Registering the environment.
2021-06-01 16:15:38+00:00 Use the existing image.
2021-06-01 16:15:41+00:00 Creating resources in AKS..
2021-06-01 16:15:41+00:00 Submitting deployment to compute.
2021-06-01 16:15:42+00:00 Checking the status of deployment driver-service-sdk2-aks..
2021-06-01 16:17:25+00:00 Checking the status of inference endpoint driver-service-sdk2-aks.
Succeeded
AKS service creation operation finished, operation "Succeeded"
Healthy
2021-06-01T16:17:14,437054043+00:00 - iot-server/run 
2021-06-01T16:17:14,438284041+00:00 - rsyslog/run 
2021-06-01T16:17:14,439061304+00:00 - gunicorn/run 
2021-06-01T16:17:14,448339048+00:00 - nginx/run 
/usr/sbin/nginx: /azureml-envs/azureml_deade6e796a908fdec1

Hopefully, the deployment has been successful and you can see a status of **Healthy**. If not, check `service.state` and the output of `service.get_logs()`, as printed by the cell above, to help you troubleshoot.

Take a look at your workspace in [Azure ML Studio](https://ml.azure.com) and view the **Endpoints** page, which shows the deployed services in your workspace.
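
Service logs can be long, so when troubleshooting it may help to pull out only the suspicious lines. The following is a rough sketch: `find_problem_lines` is a hypothetical helper, the keyword list is an assumption about what a failure tends to look like, and the sample text is made up for the demonstration rather than taken from a real deployment.

```python
def find_problem_lines(log_text, keywords=("error", "exception", "traceback")):
    """Return (line_number, line) pairs whose text contains any keyword."""
    hits = []
    for number, line in enumerate(log_text.splitlines(), start=1):
        lowered = line.lower()
        if any(word in lowered for word in keywords):
            hits.append((number, line))
    return hits


# Made-up log text, used only to exercise the helper
sample = "gunicorn/run\nModuleNotFoundError: No module named 'joblib'\nnginx/run"
for number, line in find_problem_lines(sample):
    print(number, line)
```

In the notebook, the same helper could be applied to the real logs with `find_problem_lines(service.get_logs())`.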

In [21]:
# If you need to make a change and redeploy, you may need to delete the old, unhealthy service first.  Use the following code:
# service.delete()
# Don't delete the service once you have a good deployment, though.  We will test the service next.  

In [12]:
# print the webservice endpoints
for webservice_name in ws.webservices:
    print(webservice_name)

driver-service-sdk2-aks
driver-service-sdk2
driver-service
bikebuyeraciws
ar-factoring-2class
predict-attrition-svc1
support-ticket-duration
bikebuyer2-aks-service
bikebuyer-aci
compliance-classifier-service
bb-aks-service
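
On a shared workspace, the bare list of names does not show which services run on AKS or whether they are healthy. As a sketch, the hypothetical helper below summarizes a name-to-service mapping such as `ws.webservices`. It assumes each service object exposes `compute_type` and `state` attributes and falls back to `"unknown"` when one is missing; the `SimpleNamespace` objects are stand-ins so the example runs without a workspace.

```python
from types import SimpleNamespace


def summarize_services(services):
    """Return sorted (name, compute_type, state) tuples for a name -> service mapping."""
    rows = []
    for name, svc in services.items():
        rows.append((name,
                     getattr(svc, "compute_type", "unknown"),
                     getattr(svc, "state", "unknown")))
    return sorted(rows)


# Stand-in objects instead of real web services
demo = {
    "driver-service-sdk2-aks": SimpleNamespace(compute_type="AKS", state="Healthy"),
    "driver-service": SimpleNamespace(compute_type="ACI", state="Healthy"),
}
for row in summarize_services(demo):
    print(*row)
```

With a real workspace, `summarize_services(ws.webservices)` would be the equivalent call.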


## Use the Web Service

With the service deployed, you can now consume it from a client application.  
By default, a web service on AKS requires each REST request to include an **Authorization** header carrying one of the service keys.  Let's do all of that.

In [14]:
endpoint = service.scoring_uri
print(endpoint)

http://104.41.151.19:80/api/v1/service/driver-service-sdk2-aks/score


In [17]:
import urllib.request
import urllib.error
import json
import os
import ssl

def allowSelfSignedHttps(allowed):
    # bypass the server certificate verification on client side
    if allowed and not os.environ.get('PYTHONHTTPSVERIFY', '') and getattr(ssl, '_create_unverified_context', None):
        ssl._create_default_https_context = ssl._create_unverified_context

allowSelfSignedHttps(True) # this line is needed if your scoring service uses a self-signed certificate.

# Request data goes here
data = {"data":
    [
        [0,1,8,1,0,0,1,0,0,0,0,0,0,0,12,1,0,0,0.5,0.3,0.610327781,7,1,-1,0,-1,1,1,1,2,1,65,1,0.316227766,0.669556409,0.352136337,3.464101615,0.1,0.8,0.6,1,1,6,3,6,2,9,1,1,1,12,0,1,1,0,0,1],
        [4,2,5,1,0,0,0,0,1,0,0,0,0,0,5,1,0,0,0.9,0.5,0.771362431,4,1,-1,0,0,11,1,1,0,1,103,1,0.316227766,0.60632002,0.358329457,2.828427125,0.4,0.5,0.4,3,3,8,4,10,2,7,2,0,3,10,0,0,1,1,0,1]
    ]
}

body = str.encode(json.dumps(data))

api_key = service.get_keys()[0] # primary key of the deployed service; avoid hardcoding keys in a notebook
headers = {'Content-Type':'application/json', 'Authorization':('Bearer '+ api_key)}

req = urllib.request.Request(endpoint, body, headers)

try:
    response = urllib.request.urlopen(req)

    result = response.read()
    print(result)
except urllib.error.HTTPError as error:
    print("The request failed with status code: " + str(error.code))

    # Print the headers - they include the request ID and the timestamp, which are useful for debugging the failure
    print(error.info())
    print(json.loads(error.read().decode("utf8", 'ignore')))

b'{"result": [0.027311045841034335, 0.0261231327307869]}'
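
The request and response handling above can also be factored into two small functions, which makes the AKS-specific part (the `Authorization` header) easy to see. This is a sketch under stated assumptions: `build_scoring_request` and `parse_scores` are hypothetical names, the endpoint and key in the demo are placeholders, and the response shape `{"result": [...]}` is the one printed by this notebook's `score.py`.

```python
import json
import urllib.request


def build_scoring_request(endpoint, rows, api_key=None):
    """Build a POST request for the scoring endpoint; add the key header if given."""
    body = json.dumps({"data": rows}).encode("utf-8")
    headers = {"Content-Type": "application/json"}
    if api_key:
        # Key-based authentication: the key is sent as a bearer token
        headers["Authorization"] = "Bearer " + api_key
    return urllib.request.Request(endpoint, data=body, headers=headers, method="POST")


def parse_scores(raw):
    """Decode the raw response bytes and return the list under "result"."""
    return json.loads(raw.decode("utf-8"))["result"]


# Placeholder endpoint and key; nothing is sent over the network here
req = build_scoring_request("http://localhost/score", [[0, 1]], api_key="not-a-real-key")
print(req.get_method(), req.get_header("Authorization"))
print(parse_scores(b'{"result": [0.027311045841034335, 0.0261231327307869]}'))
```

Sending the request is then a matter of `urllib.request.urlopen(req)` with the real `service.scoring_uri` and key, followed by `parse_scores(response.read())`.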


## Delete the Service

When you no longer need your service, you should delete it to avoid incurring unnecessary charges.

In [None]:
service.delete()
print('Service deleted.')

For more information about publishing a model as a service, see the [documentation](https://docs.microsoft.com/azure/machine-learning/how-to-deploy-and-where).