Copyright (c) Microsoft Corporation. All rights reserved.

Licensed under the MIT License.

# Deploying a web service to Azure Kubernetes Service (AKS)
In this notebook, we show the following steps for deploying a web service using AzureML:
- Provision an AKS cluster (one-time action)
- Deploy the service
- Test the web service
- Scale up the service

In [None]:
import json
import os
import subprocess

import numpy as np
import pandas as pd
import requests
from MetricsUtils.hpStatisticsCollection import statisticsCollector, CollectionEntry
from azure_utils.machine_learning.utils import load_configuration
from azure_utils.utilities import text_to_json
from azureml.core.compute import AksCompute, ComputeTarget
from azureml.core.webservice import Webservice, AksWebservice
from azure_utils.machine_learning.utils import get_workspace_from_config


AML will use the following information to create an image, provision a cluster and deploy a service. Replace the 
values in the following cell with your information.

In [None]:
cfg = load_configuration("../workspace_conf.yml")

In [None]:
image_name = cfg['image_name']
aks_service_name = cfg['aks_service_name']
aks_name = cfg['aks_name']
aks_location = cfg['workspace_region']
storageConnString = cfg['storageConnString']
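Every later cell depends on these keys, so it may help to check the loaded configuration before going further. The following is a minimal sketch, assuming `cfg` behaves like a plain dictionary. The helper name `missing_config_keys` is hypothetical and not part of `azure_utils`.

```python
# Keys read from the configuration in the cell above
REQUIRED_KEYS = (
    "image_name",
    "aks_service_name",
    "aks_name",
    "workspace_region",
    "storageConnString",
)


def missing_config_keys(config, required=REQUIRED_KEYS):
    """Return the required keys that are absent or empty in `config`."""
    return [key for key in required if not config.get(key)]


# Example with a stand-in dictionary (placeholder values, not real settings)
example_cfg = {"image_name": "my-image", "aks_name": ""}
print(missing_config_keys(example_cfg))
```

An empty list means all expected settings are present. Anything else names the entries to fill in before continuing.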


## Get workspace
Load existing workspace from the config file.

In [None]:
ws = get_workspace_from_config()
print(ws.name, ws.resource_group, ws.location, sep="\n")

In [None]:
image = ws.images[image_name]

Restore the statistics data.

In [None]:
statisticsCollector.hydrateFromStorage(storageConnString)

## Provision the AKS Cluster
This is a one-time setup. You can reuse this cluster for multiple deployments after it has been created. If you delete 
the cluster or the resource group that contains it, you will have to recreate it. Let's first check whether there are 
enough cores in the subscription for the cluster.

In [None]:
vm_family = "Dv2"
vm_size = "Standard_D4_v2"
vm_cores = 8
node_count = 4

In [None]:
vm_dict = {
    vm_family: {
        "size": vm_size,
        "cores": vm_cores
    }
}

In [None]:
requested_cores = node_count * vm_dict[vm_family]["cores"]

In [None]:
results = subprocess.run([
    "az", "vm", "list-usage", 
    "--location", aks_location, 
    "--query", "[?contains(localName, '%s')].{max:limit, current:currentValue}" % (vm_family)
], stdout=subprocess.PIPE)
quota = json.loads(''.join(results.stdout.decode('utf-8')))
diff = int(quota[0]['max']) - int(quota[0]['current'])
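The cells above compute `requested_cores` and the remaining quota `diff`, but nothing compares the two. A minimal sketch of that comparison follows, assuming the CLI output has the `max`/`current` shape produced by the `--query` expression above. The helper names and the sample JSON are hypothetical.

```python
import json


def available_cores(usage_json):
    """Parse the `az vm list-usage` query output and return the unused cores."""
    quota = json.loads(usage_json)
    if not quota:
        raise ValueError("No usage entry matched the VM family")
    return int(quota[0]["max"]) - int(quota[0]["current"])


def check_core_quota(usage_json, requested):
    """Raise if the requested cores exceed what the subscription has left."""
    free = available_cores(usage_json)
    if requested > free:
        raise RuntimeError(
            "Requested {} cores but only {} are available".format(requested, free)
        )
    return free


# Stand-in output in the shape of the query result (not real subscription data)
sample = '[{"max": "100", "current": "20"}]'
print(check_core_quota(sample, requested=4 * 8))
```

In the notebook this would be called with `results.stdout.decode('utf-8')` and `requested_cores`, so the run stops before provisioning a cluster that the quota cannot accommodate.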

In [None]:
prov_config = AksCompute.provisioning_configuration(
    agent_count=node_count, vm_size=vm_size, location=aks_location
)

# Create the cluster
statisticsCollector.startTask(CollectionEntry.AML_COMPUTE_CREATION)
aks_target = ComputeTarget.create(
    workspace=ws, name=aks_name, provisioning_configuration=prov_config
)
statisticsCollector.endTask(CollectionEntry.AML_COMPUTE_CREATION)
print(statisticsCollector.getEntry(CollectionEntry.AML_COMPUTE_CREATION))

In [None]:
%%time
aks_target.wait_for_completion(show_output = True)
print(aks_target.provisioning_state)
print(aks_target.provisioning_errors)

Let's check that the cluster was created successfully.

In [None]:
aks_status = aks_target.get_status()

assert aks_status == 'Succeeded', 'AKS failed to create'

## Deploy web service to AKS

Next, we deploy the web service. We deploy two pods with 1 CPU core each.

In [None]:
num_replicas = 2
cpu_cores = 1

In [None]:
# Set the web service configuration
aks_config = AksWebservice.deploy_configuration(num_replicas=num_replicas, cpu_cores=cpu_cores)

In [None]:
aks_service = Webservice.deploy_from_image(
    workspace=ws,
    name=aks_service_name,
    image=image,
    deployment_config=aks_config,
    deployment_target=aks_target,
)

In [None]:
%%time
aks_service.wait_for_deployment(show_output=True)
print(aks_service.state)

You can check the logs of the web service with the command below.

In [None]:
aks_service.get_logs()

## Test the web service
We now test the web service.

In [None]:
num_dupes_to_score = 4

In [None]:
dupes_test_path = './data_folder/dupes_test.tsv'
dupes_test = pd.read_csv(dupes_test_path, sep='\t', encoding='latin1')
text_to_score = dupes_test.iloc[0, num_dupes_to_score]
text_to_score

In [None]:
json_text = text_to_json(text_to_score)

In [None]:
%%time
prediction = aks_service.run(input_data = json_text)
print(prediction)

Let's try a few more duplicate questions and display their top 3 original matches. Let's first get the scoring URL 
and API key for the web service.

In [None]:
scoring_url = aks_service.scoring_uri
api_key = aks_service.get_keys()[0]

Write the URI and key to the statistics tracker.

In [None]:
statisticsCollector.addEntry(CollectionEntry.AKS_REALTIME_ENDPOINT, scoring_url)
statisticsCollector.addEntry(CollectionEntry.AKS_REALTIME_KEY, api_key)

In [None]:
headers = {'content-type': 'application/json', 'Authorization': 'Bearer ' + api_key}
# Run the request twice, since the first call takes a little longer while the model loads
r = requests.post(scoring_url, data=json_text, headers=headers)
%time r = requests.post(scoring_url, data=json_text, headers=headers)
print(r)
r.json()
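The warm-up pattern used above (one untimed request, then a timed one) can be wrapped in a small helper. This is only a sketch: `timed_post_after_warmup` is a hypothetical name, and `fake_post` stands in for `requests.post` so the example runs without a deployed service.

```python
import time


def timed_post_after_warmup(post, url, payload, headers, warmup=1):
    """Send `warmup` untimed requests, then time one more request.

    `post` is any callable with the call signature of `requests.post`.
    Returns the final response and the elapsed wall-clock seconds.
    """
    for _ in range(warmup):
        post(url, data=payload, headers=headers)
    start = time.perf_counter()
    response = post(url, data=payload, headers=headers)
    return response, time.perf_counter() - start


# Stand-in for requests.post that records each call
calls = []


def fake_post(url, data=None, headers=None):
    calls.append(url)
    return {"status": 200}


response, seconds = timed_post_after_warmup(
    fake_post, "http://example.invalid/score", "{}", {"content-type": "application/json"}
)
print(response, len(calls))
```

Against the real service, the call would be `timed_post_after_warmup(requests.post, scoring_url, json_text, headers)`.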

In [None]:
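# Assumption: score the first five questions from the same column used above
dupes_to_score = dupes_test.iloc[:5, num_dupes_to_score]
results = [
    requests.post(scoring_url, data=text_to_json(text), headers=headers)
    for text in dupes_to_score
]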

Let's print top 3 matches for each duplicate question.

In [None]:
import ast  # literal_eval parses the response text safely, unlike eval()
[ast.literal_eval(res.json())[0:3] for res in results]

Next, let's quickly check the request-response performance of the deployed model on the AKS cluster.

In [None]:
text_data = list(map(text_to_json, dupes_to_score))  # Retrieve the text data

In [None]:
timer_results = list()
for text in text_data:
    res=%timeit -r 1 -o -q requests.post(scoring_url, data=text, headers=headers)
    timer_results.append(res.best)

In [None]:
timer_results

In [None]:
print("Average time taken: {0:4.2f} ms".format(10 ** 3 * np.mean(timer_results)))
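The mean alone can hide slow outliers, so it may be worth reporting the median and the worst case alongside it. A minimal sketch using only the standard library follows. The function name `summarize_latencies` is hypothetical, and the sample timings are placeholders rather than measurements from the service.

```python
import statistics


def summarize_latencies(seconds):
    """Return mean, median and worst-case latency in milliseconds."""
    return {
        "mean_ms": 1000 * statistics.mean(seconds),
        "median_ms": 1000 * statistics.median(seconds),
        "max_ms": 1000 * max(seconds),
    }


# Placeholder timings in seconds
summary = summarize_latencies([0.25, 0.5, 0.75])
print(summary)
```

In the notebook, `summarize_latencies(timer_results)` would give the same figures for the measured requests.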

## Scaling

In this part, we scale the number of pods to make sure we fully utilize the AKS cluster. To connect to the Kubernetes 
cluster, we will use kubectl, the Kubernetes command-line client. To install, run the following:

In [None]:
!sudo az aks install-cli

Next, we will get the credentials to connect to the cluster.

In [None]:
os.makedirs(os.path.join(os.path.expanduser('~'),'.kube'), exist_ok=True) 

In [None]:
config_path = os.path.join(os.path.expanduser('~'),'.kube/config')

In [None]:
with open(config_path, 'a') as f:
    f.write(aks_target.get_credentials()['userKubeConfig'])
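The cell above opens the file in append mode. Running it more than once, or on a machine that already has a `~/.kube/config`, therefore appends another copy of the configuration to the same file. One alternative is to keep the AKS credentials in a separate file and point `kubectl` at it through the `KUBECONFIG` environment variable. A minimal sketch, where `write_kubeconfig` and the file name `aks-config` are hypothetical:

```python
import os
import tempfile


def write_kubeconfig(content, directory, filename="aks-config"):
    """Write `content` to its own kubeconfig file and return the path.

    The file is overwritten rather than appended to, so repeated runs
    do not accumulate duplicate entries.
    """
    os.makedirs(directory, exist_ok=True)
    path = os.path.join(directory, filename)
    with open(path, "w") as f:
        f.write(content)
    os.chmod(path, 0o600)  # credentials should be readable by the owner only
    return path


# Demonstration with placeholder content in a temporary directory
demo_dir = tempfile.mkdtemp()
demo_content = "apiVersion: v1\nkind: Config\n"
demo_path = write_kubeconfig(demo_content, demo_dir)
write_kubeconfig(demo_content, demo_dir)  # a second run overwrites the file
print(demo_path)
```

In the notebook, the content would be `aks_target.get_credentials()['userKubeConfig']`, and setting `os.environ["KUBECONFIG"]` to the returned path would make later `!kubectl` cells use that file.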

Let's check the nodes and pods of the cluster.

In [None]:
!kubectl get nodes

In [None]:
!kubectl get pods --all-namespaces

In [None]:
!kubectl get events

We can now scale up the number of pods.

In [None]:
new_num_replicas = 10
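Before scaling, it can be useful to estimate how many replicas the cluster could hold by CPU alone. The sketch below uses the node count, cores per node, and cores per replica set earlier in this notebook. The `reserved_per_node` allowance for system pods is an assumption, and `max_replicas` is a hypothetical helper.

```python
def max_replicas(node_count, cores_per_node, cores_per_replica, reserved_per_node=0):
    """Upper bound on replicas that fit on the cluster by CPU alone.

    `reserved_per_node` is an assumed allowance for system pods; the real
    overhead depends on the cluster and is not taken from this notebook.
    """
    usable = cores_per_node - reserved_per_node
    if usable <= 0 or cores_per_replica <= 0:
        return 0
    # A replica cannot span nodes, so count whole replicas per node first
    return node_count * int(usable // cores_per_replica)


# Values from the cells above: 4 nodes, 8 cores each, 1 core per replica
print(max_replicas(4, 8, 1))
print(max_replicas(4, 8, 1, reserved_per_node=1))
```

With these settings, 10 replicas sits well below either bound, so the scale-up should not be limited by CPU requests.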

In [None]:
!kubectl get namespaces

In [None]:
!kubectl scale --current-replicas=$num_replicas \
    --replicas=$new_num_replicas {"deployments/" + aks_service_name} \
    --namespace azureml-workspace

In [None]:
!kubectl get pods --all-namespaces

In [None]:
!kubectl get deployment

Save the statistics collected so far.

In [None]:
statisticsCollector.uploadContent(storageConnString)

Next, we will test the [throughput of the web service](06_SpeedTestWebApp.ipynb).