##  Deploy real-time machine learning services with Azure Machine Learning

### Pipeline

_Inferencing_ refers to using a trained model to predict labels for new data on which the model has not been trained. The model is deployed as part of a service that lets applications request predictions in real time for a small number of data observations.

Azure Kubernetes Service (AKS) can be used to deploy the model as a service.

AzureML uses containers as the deployment mechanism: the model and the code that uses it are packaged as an image that can be deployed to a container on your chosen compute target. For testing and development it is fine to use your local machine, a compute instance or Azure Container Instances (ACI), but for production you need a compute target that meets your requirements for scalability, performance and security.

To deploy a model as a real-time service, complete the following steps:


1. After successfully training a model, it should be registered because the real-time inference service will load it when required.

<code>
from azureml.core import Model

classification_model = Model.register(workspace = ws,
                                      model_name = 'classification_model',
                                      model_path = 'model.pkl', # local path
                                      description = 'A classification model')
</code>

Or, if you have a reference to the Run used to train the model, you can use its register_model method:
<code>
run.register_model( model_name='classification_model',
                    model_path='outputs/model.pkl', # run outputs path
                    description='A classification model')
</code>


2. The model is deployed as a service that consists of a script, which loads the model and returns predictions, together with the environment in which the script runs. Both must be defined. For the script, create an _entry script_ (also called the scoring script) as a Python file that includes two functions: _init()_, called when the service is initialized, and _run(raw_data)_, called when new data is submitted to the service. Typically, _init_ loads the model from the model registry, while _run_ generates predictions from the input data.

<code>
import json
import joblib
import numpy as np
import os


def init():
    global model
    model_path = os.path.join(os.getenv('AZUREML_MODEL_DIR'), 'model.pkl')
    model = joblib.load(model_path)

def run(raw_data):
    data = np.array(json.loads(raw_data)['data'])
    predictions = model.predict(data)
    return predictions.tolist()

</code>
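Before deploying, it can help to check the _init_/_run_ contract locally. The following is a minimal sketch that uses only the standard library; `StubModel` and its threshold rule are hypothetical stand-ins for the registered model, not part of AzureML. It mimics what the service does: call _init_ once, then call _run_ with a JSON payload.

```python
import json


class StubModel:
    """Hypothetical stand-in for the registered model (not part of AzureML)."""

    def predict(self, rows):
        # Label a row 1 when its first feature exceeds 0.15, else 0.
        return [1 if row[0] > 0.15 else 0 for row in rows]


def init():
    # In the real entry script this would load model.pkl from AZUREML_MODEL_DIR.
    global model
    model = StubModel()


def run(raw_data):
    # Same contract as the entry script: JSON text in, JSON-serializable list out.
    data = json.loads(raw_data)['data']
    return list(model.predict(data))


# Simulate the service lifecycle: init() once, then run() per request.
init()
payload = json.dumps({'data': [[0.1, 2.3, 4.1, 2.0], [0.2, 1.8, 3.9, 2.1]]})
result = run(payload)
print(result)
```

The same payload shape, a JSON object with a `data` key holding a list of feature rows, is what clients send to the deployed service.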

Next, create an environment in which to run the script.

<code>
from azureml.core import Environment

service_env = Environment(name='service-env')
python_packages = ['scikit-learn', 'numpy'] # whatever packages your entry script uses
for package in python_packages:
    service_env.python.conda_dependencies.add_pip_package(package)
</code>

Finally, combine the script and the environment in an InferenceConfig object.

<code>
from azureml.core.model import InferenceConfig

classifier_inference_config = InferenceConfig(source_directory = 'service_files',
                                              entry_script="score.py",
                                              environment=service_env)
</code>


3. Define the compute to which the service will be deployed. If an AKS cluster is used, it must be created before the deployment.

<code>
from azureml.core.compute import ComputeTarget, AksCompute

cluster_name = 'aks-cluster'
compute_config = AksCompute.provisioning_configuration(location='eastus')
production_cluster = ComputeTarget.create(ws, cluster_name, compute_config)
production_cluster.wait_for_completion(show_output=True)
</code>

Then define the deployment configuration:
<code>
from azureml.core.webservice import AksWebservice
classifier_deploy_config = AksWebservice.deploy_configuration(cpu_cores = 1, memory_gb = 1)
</code>

A similar configuration can be defined for Azure Container Instances (ACI) and for local services.

4. _Deployment_. Deploy the model with the following code:

<code>
from azureml.core.model import Model

model = ws.models['classification_model']
service = Model.deploy(workspace=ws,
                       name = 'classifier-service',
                       models = [model],
                       inference_config = classifier_inference_config,
                       deployment_config = classifier_deploy_config,
                       deployment_target = production_cluster)
service.wait_for_deployment(show_output = True)
</code>

### Consume the service

After deploying the service, it can be consumed in the following way:

<code>
import json

x_new = [[0.1,2.3,4.1,2.0],
         [0.2,1.8,3.9,2.1]]

json_data = json.dumps({'data': x_new})

response = service.run(input_data = json_data)

predictions = json.loads(response)

for i in range(len(x_new)):
    print (x_new[i], predictions[i])
</code>

If the client cannot use the AzureML SDK, a simple REST request can be made in the following way:

<code>
import requests
import json

x_new = [[0.1,2.3,4.1,2.0],
         [0.2,1.8,3.9,2.1]]

json_data = json.dumps({"data": x_new})

request_headers = { 'Content-Type':'application/json' }

# Scoring URI of the deployed service; a client without the SDK would store this string.
endpoint = service.scoring_uri

response = requests.post(url = endpoint,
                         data = json_data,
                         headers = request_headers)

predictions = json.loads(response.json())

for i in range(len(x_new)):
    print(x_new[i], predictions[i])
</code>
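Depending on whether the entry script returns a Python list or an already JSON-encoded string, the decoded response body may be either a list or a string that still needs decoding. The helper below is a small sketch that accepts both; the name `parse_predictions` is hypothetical and not part of any SDK.

```python
import json


def parse_predictions(payload):
    """Return predictions as a Python list.

    Accepts either an already-decoded list or a JSON-encoded string.
    """
    if isinstance(payload, str):
        payload = json.loads(payload)
    if not isinstance(payload, list):
        raise ValueError(f'Unexpected response payload: {payload!r}')
    return payload


print(parse_predictions([0, 1]))
print(parse_predictions('[0, 1]'))
```

With the REST call above, this could be used as `predictions = parse_predictions(response.json())`.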

### Authentication

In production you will likely want to restrict access to your services by requiring _key_-based or _token_-based authentication.

By default, authentication is disabled for ACI services and is key-based for AKS services. You can optionally configure an AKS service to use token-based authentication (which is not supported for ACI services).

To authenticate with keys, retrieve them as follows:

<code>
primary_key, secondary_key = service.get_keys()
</code>

Alternatively, if you prefer token-based authentication, call the service's _get_token_ method to retrieve a time-limited token.

Finally, to make an authenticated call to the service's REST endpoint, run the following code:

<code>
import requests
import json

x_new = [[0.1,2.3,4.1,2.0],
         [0.2,1.8,3.9,2.1]]

json_data = json.dumps({"data": x_new})

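# key_or_token holds a key from service.get_keys() or a token from service.get_token();
# endpoint is the service's scoring URI (service.scoring_uri).
request_headers = { "Content-Type":"application/json",
                    "Authorization":"Bearer " + key_or_token }

response = requests.post(url = endpoint,
                         data = json_data,
                         headers = request_headers)

predictions = json.loads(response.json())

for i in range(len(x_new)):
    print(x_new[i], predictions[i])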
</code>
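The authenticated and unauthenticated requests differ only in the Authorization header, so the request construction can be factored into one function. This is a minimal sketch; `build_scoring_request` is a hypothetical helper name and `'<your-key>'` is a placeholder, not a real credential.

```python
import json


def build_scoring_request(rows, key_or_token=None):
    """Return (body, headers) for a POST to the scoring endpoint."""
    body = json.dumps({'data': rows})
    headers = {'Content-Type': 'application/json'}
    if key_or_token:
        # Keys and tokens are both sent as a Bearer credential.
        headers['Authorization'] = 'Bearer ' + key_or_token
    return body, headers


body, headers = build_scoring_request([[0.1, 2.3, 4.1, 2.0]],
                                      key_or_token='<your-key>')
print(headers)
```

The result can then be passed to `requests.post(url=endpoint, data=body, headers=headers)`.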

### Troubleshooting

Examine the state of the service:

<code>
from azureml.core.webservice import AksWebservice

service = AksWebservice(name = 'classifier-service', workspace = ws)
print(service.state)
</code>

If everything is OK, the state should be Healthy.

You can also print some logs:
<code>
print(service.get_logs())
</code>

Moreover, sometimes debugging from a local container can be simpler:
<code>
from azureml.core.webservice import LocalWebservice

deployment_config = LocalWebservice.deploy_configuration(port=8890)
service = Model.deploy(ws, 'test-svc', [model], inference_config, deployment_config)
print(service.run(input_data = json_data))
</code>

During the debugging phase, your code will change several times. After each change, reload the service so that it picks up the changes, then test it again:
<code>
service.reload()
print(service.run(input_data = json_data))
</code>

### Tutorial

In [47]:
from azureml.core import Workspace

ws = Workspace.from_config()

In [49]:
from azureml.core import Environment

env = Environment.get(workspace=ws, name='experiment_env')

1. Train and Register a model.

In [50]:
from azureml.core import Experiment, ScriptRunConfig, Environment
from azureml.core.runconfig import DockerConfiguration

# Create a Python environment for the experiment
env = Environment.from_conda_specification("experiment_env", "environment.yml")

# Create a script config
script_config = ScriptRunConfig(source_directory='Script',
                                script='2_Train.py',
                                environment=env,
                                compute_target = 'ravazzil-compute',
                                docker_runtime_config=DockerConfiguration(use_docker=True))

# Submit the experiment
experiment = Experiment(workspace=ws, name='real-time-service')
run = experiment.submit(config=script_config)
run.wait_for_completion(show_output = True)

RunId: real-time-service_1672761877_d2595d80
Web View: https://ml.azure.com/runs/real-time-service_1672761877_d2595d80?wsid=/subscriptions/d12c1b85-0a70-4232-b483-12d1ffcfc148/resourcegroups/ResourceGroupRavazzi/workspaces/ravazzil-workspace&tid=b00367e2-193a-4f48-94de-7245d45c0947

Streaming user_logs/std_log.txt

  from cryptography.hazmat.backends import default_backend
  import mlflow
Loading Data...
Training a logistic regression model with regularization rate of 0.01
Accuracy: 0.774
AUC: 0.8483441962286681
Cleaning up all outstanding Run operations, waiting 300.0 seconds
1 items cleaning up...
Cleanup took 0.04692721366882324 seconds

Execution Summary
RunId: real-time-service_1672761877_d2595d80
Web View: https://ml.azure.com/runs/real-time-service_1672761877_d2595d80?wsid=/subscriptions/d12c1b85-0a70-4232-b483-12d1ffcfc148/resourcegroups/ResourceGroupRavazzi/workspaces/ravazzil-workspace&tid=b00367e2-193a-4f48-94de-7245d45c0947



{'runId': 'real-time-service_1672761877_d2595d80',
 'target': 'ravazzil-compute',
 'status': 'Completed',
 'startTimeUtc': '2023-01-03T16:04:52.737694Z',
 'endTimeUtc': '2023-01-03T16:05:09.802275Z',
 'services': {},
 'properties': {'_azureml.ComputeTargetType': 'amlctrain',
  'ContentSnapshotId': '9d82149f-4de2-4943-9bba-db47c0c06b42',
  'azureml.git.repository_uri': 'https://github.com/LuciaRavazzi/AzureML.git',
  'mlflow.source.git.repoURL': 'https://github.com/LuciaRavazzi/AzureML.git',
  'azureml.git.branch': 'main',
  'mlflow.source.git.branch': 'main',
  'azureml.git.commit': '9068858ad851f85119c972ff6d638e028207fa25',
  'mlflow.source.git.commit': '9068858ad851f85119c972ff6d638e028207fa25',
  'azureml.git.dirty': 'True',
  'ProcessInfoFile': 'azureml-logs/process_info.json',
  'ProcessStatusFile': 'azureml-logs/process_status.json'},
 'inputDatasets': [],
 'outputDatasets': [],
 'runDefinition': {'script': '2_Train.py',
  'command': '',
  'useAbsolutePath': False,
  'arguments'

In [39]:
run.register_model(model_path = 'outputs/my_diabetes_model.pkl',
                   model_name = 'diabetes_model',
                   tags = {'real-time-services': '1'},
                   properties={'AUC': run.get_metrics()['AUC'], 'Accuracy': run.get_metrics()['Accuracy']})

Model(workspace=Workspace.create(name='ravazzil-workspace', subscription_id='d12c1b85-0a70-4232-b483-12d1ffcfc148', resource_group='ResourceGroupRavazzi'), name=diabetes_model, id=diabetes_model:6, version=6, tags={'real-time-services': '1'}, properties={'AUC': '0.8483441962286681', 'Accuracy': '0.774'})

In [40]:
from azureml.core import Model

for model in Model.list(ws):
    for tag_name in model.tags:
        print(f'Tag: {tag_name}')
    for key, value in model.properties.items():
        print(f'Key: {key}, Value: {value}')

Tag: real-time-services
Key: AUC, Value: 0.8483441962286681
Key: Accuracy, Value: 0.774
Tag: Training context
Key: AUC, Value: 0.8483441962286681
Key: Accuracy, Value: 0.774
Tag: Training context
Key: AUC, Value: 0.8731578685506741
Key: Accuracy, Value: 0.888
Tag: Training context
Key: AUC, Value: 0.8756256530175631
Key: Accuracy, Value: 0.8893333333333333
Tag: Training context
Key: AUC, Value: 0.8726180406985422
Key: Accuracy, Value: 0.8856666666666667
Tag: Training context
Key: AUC, Value: 0.8709040250758743
Key: Accuracy, Value: 0.8853333333333333


In [42]:
# Take the last model.
model = ws.models['diabetes_model']

In [52]:
import os

deployment_folder = './Script/diabetes_service'
os.makedirs(deployment_folder, exist_ok=True)

script_file = 'entry_script.py'
script_path = os.path.join(deployment_folder, script_file)

2. Generate the scoring script for inferencing.
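The scoring script itself is not shown in this tutorial. The sketch below writes one possible version to disk, following the _init_/_run_ pattern described earlier. Several details are assumptions: the model file name `my_diabetes_model.pkl` is inferred from the path used when the model was registered, the `'diabetic'` label and the label order are guesses (only `'not-diabetic'` appears in the outputs), and `write_entry_script` is a hypothetical helper. The demo call writes to a temporary folder; in the notebook you would pass `deployment_folder` instead.

```python
import os
import tempfile

# Sketch of an entry script; the file name and class labels are assumptions.
ENTRY_SCRIPT = '''import json
import os

import joblib
import numpy as np


def init():
    # Called once when the service starts: load the registered model.
    global model
    model_path = os.path.join(os.getenv('AZUREML_MODEL_DIR'), 'my_diabetes_model.pkl')
    model = joblib.load(model_path)


def run(raw_data):
    # Called per request: parse the JSON payload and predict class names.
    data = np.array(json.loads(raw_data)['data'])
    predictions = model.predict(data)
    classnames = ['not-diabetic', 'diabetic']  # assumed label order
    return json.dumps([classnames[int(p)] for p in predictions])
'''


def write_entry_script(folder, filename='entry_script.py'):
    """Write the entry script into the given folder and return its path."""
    os.makedirs(folder, exist_ok=True)
    path = os.path.join(folder, filename)
    with open(path, 'w') as f:
        f.write(ENTRY_SCRIPT)
    return path


# Demo: write to a temporary folder (use deployment_folder in the notebook).
script_path = write_entry_script(tempfile.mkdtemp())
print(script_path)
```

Because this version of _run_ returns a JSON-encoded string, the client has to decode the response with `json.loads`, which matches how the predictions are read later in this tutorial.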

3. Create
The Docker container hosts the web service together with all of its Python dependencies, which must be specified. Deployment can take some time because the image must first be built and then the web service must be started.

In [53]:
from azureml.core import Environment
from azureml.core.model import InferenceConfig
from azureml.core.webservice import AciWebservice

service_env = Environment.get(workspace=ws, name="AzureML-sklearn-0.24.1-ubuntu18.04-py37-cpu-inference")
service_env.inferencing_stack_version="latest"

inference_config = InferenceConfig(source_directory=deployment_folder,
                                   entry_script=script_file,
                                   environment=service_env)
# Configure the web service container.
deployment_config = AciWebservice.deploy_configuration(cpu_cores=1, memory_gb=1)

# Deploy the model as a service.
print('Deploying model...')
service_name = 'diabetes-service'
service = Model.deploy(ws, service_name, [model], inference_config, deployment_config, overwrite=True)
service.wait_for_deployment(True)
print(service.state)

Deploying model...


To leverage new model deployment capabilities, AzureML recommends using CLI/SDK v2 to deploy models as online endpoint, 
please refer to respective documentations 
https://docs.microsoft.com/azure/machine-learning/how-to-deploy-managed-online-endpoints /
https://docs.microsoft.com/azure/machine-learning/how-to-attach-kubernetes-anywhere 
For more information on migration, see https://aka.ms/acimoemigration. 
  app.launch_new_instance()


Tips: You can try get_logs(): https://aka.ms/debugimage#dockerlog or local deployment: https://aka.ms/debugimage#debug-locally to debug if deployment takes longer than 10 minutes.
Running
2023-01-03 17:10:42+01:00 Creating Container Registry if not exists.
2023-01-03 17:10:42+01:00 Registering the environment.
2023-01-03 17:10:43+01:00 Use the existing image.
2023-01-03 17:10:43+01:00 Generating deployment configuration.
2023-01-03 17:10:44+01:00 Submitting deployment to compute.
2023-01-03 17:10:51+01:00 Checking the status of deployment diabetes-service..
2023-01-03 17:12:13+01:00 Checking the status of inference endpoint diabetes-service.
Succeeded
ACI service creation operation finished, operation "Succeeded"
Healthy


In [54]:
# For troubleshooting.
print(service.get_logs())

2023-01-03T16:12:04,327037900+00:00 - gunicorn/run 
2023-01-03T16:12:04,329259400+00:00 | gunicorn/run | 
2023-01-03T16:12:04,333536400+00:00 | gunicorn/run | ###############################################
2023-01-03T16:12:04,333697400+00:00 - nginx/run 
2023-01-03T16:12:04,335010200+00:00 | gunicorn/run | AzureML Container Runtime Information
2023-01-03T16:12:04,336521300+00:00 | gunicorn/run | ###############################################
2023-01-03T16:12:04,338344600+00:00 | gunicorn/run | 
2023-01-03T16:12:04,346233900+00:00 | gunicorn/run | 
2023-01-03T16:12:04,352890800+00:00 | gunicorn/run | AzureML image information: sklearn-0.24.1-ubuntu18.04-py37-cpu-inference:20221024.v1
2023-01-03T16:12:04,354272500+00:00 | gunicorn/run | 
2023-01-03T16:12:04,359602500+00:00 | gunicorn/run | 
2023-01-03T16:12:04,360987400+00:00 | gunicorn/run | PATH environment variable: /opt/miniconda/envs/amlenv/bin:/opt/miniconda/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin
2023-01

In [55]:
for webservices in ws.webservices:
    print(webservices)

diabetes-service


In [64]:
# Use the web service.

import json

x_new = [[2,180,74,24,21,23.9091702,1.488172308,22]]

input_json = json.dumps({'data': x_new})

predictions = service.run(input_data = input_json)

predicted_classes = json.loads(predictions)
print(predicted_classes)

['not-diabetic']


In [67]:
x_new = [[2,180,74,24,21,23.9091702,1.488172308,22],
         [2,180,74,24,21,23.9091702,1.488172308,22],
         [2,180,74,24,21,23.9091702,1.488172308,22]]

input_json = json.dumps({'data': x_new})

predictions = service.run(input_data = input_json)

predicted_classes = json.loads(predictions)

for i in range(len(x_new)):
    print ("Patient {}".format(x_new[i]), predicted_classes[i] )

Patient [2, 180, 74, 24, 21, 23.9091702, 1.488172308, 22] not-diabetic
Patient [2, 180, 74, 24, 21, 23.9091702, 1.488172308, 22] not-diabetic
Patient [2, 180, 74, 24, 21, 23.9091702, 1.488172308, 22] not-diabetic


In [72]:
import requests

# In production, sometimes an HTTP request is made.
x_new = [[2,180,74,24,21,23.9091702,1.488172308,22],
         [2,180,74,24,21,23.9091702,1.488172308,22],
         [2,180,74,24,21,23.9091702,1.488172308,22]]

input_json = json.dumps({'data': x_new})

headers = { 'Content-Type':'application/json' }
endpoint = service.scoring_uri

predictions = requests.post(endpoint, input_json, headers = headers)

predicted_classes = json.loads(predictions.json())

for i in range(len(x_new)):
    print ("Patient {}".format(x_new[i]), predicted_classes[i] )

Patient [2, 180, 74, 24, 21, 23.9091702, 1.488172308, 22] not-diabetic
Patient [2, 180, 74, 24, 21, 23.9091702, 1.488172308, 22] not-diabetic
Patient [2, 180, 74, 24, 21, 23.9091702, 1.488172308, 22] not-diabetic


In production, requests must be authenticated with a valid key or token.

In [73]:
service.delete()
print ('Service deleted.')

Service deleted.
