##  Deploy real-time machine learning services with Azure Machine Learning

### Pipeline

_Inferencing_ refers to the use of a trained model to predict labels for new data on which the model has not been trained. The model is deployed as part of a service that lets applications request immediate predictions for a small number of data observations.

Azure Kubernetes Service (AKS) can be used to deploy the model as a service.

AzureML uses containers as its deployment mechanism: the model and the code that uses it are packaged as an image that can be deployed to a container on your chosen compute target. For testing and development, it is fine to use your local machine, a compute instance, or Azure Container Instances (ACI). For production, use a compute target that meets your scalability, performance, and security requirements.

To deploy a model as a real-time service, complete the following tasks:


1. After successfully training a model, it should be registered because the real-time inference service will load it when required.

<code>
from azureml.core import Model

classification_model = Model.register(workspace = ws,
                                      model_name = 'classification_model',
                                      model_path = 'model.pkl', # local path
                                      description = 'A classification model')
</code>

Or, if you have a reference to the Run used to train the model, you can use its register_model method:
<code>
run.register_model( model_name='classification_model',
                    model_path='outputs/model.pkl', # run outputs path
                    description='A classification model')
</code>


2. The model will be deployed as a service that consists of a script, which loads the model and returns the predictions, and the environment in which the script runs. Both must be defined. For the script, create an _entry script_ as a Python file that includes two functions: _init()_, called when the service is initialized, and _run(raw_data)_, called whenever new data is submitted to the service. Typically, _init_ loads the model from the model registry, while _run_ uses the model to generate predictions.

<code>
import json
import joblib
import numpy as np
import os


def init():
    global model
    model_path = os.path.join(os.getenv('AZUREML_MODEL_DIR'), 'model.pkl')
    model = joblib.load(model_path)

def run(raw_data):
    data = np.array(json.loads(raw_data)['data'])
    predictions = model.predict(data)
    return predictions.tolist()

</code>
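Before deploying, it can help to exercise the _init_/_run_ contract locally. The following is a minimal sketch, not the real entry script: `ThresholdModel` is a hypothetical stand-in for the trained model (so neither `model.pkl` nor the Azure ML SDK is needed), and the threshold value is arbitrary.

```python
import json


class ThresholdModel:
    """Hypothetical stand-in for a trained model (not part of Azure ML)."""

    def __init__(self, threshold):
        self.threshold = threshold

    def predict(self, rows):
        # Label a row 1 when the sum of its features exceeds the threshold.
        return [1 if sum(row) > self.threshold else 0 for row in rows]


model = None


def init():
    # The real entry script loads model.pkl here; the stub avoids that dependency.
    global model
    model = ThresholdModel(threshold=8.2)


def run(raw_data):
    # Same contract as the entry script: JSON string in, list of predictions out.
    data = json.loads(raw_data)['data']
    return model.predict(data)


# Simulate what the service does: initialize once, then score a request.
init()
payload = json.dumps({'data': [[0.1, 2.3, 4.1, 2.0], [0.2, 1.8, 3.9, 2.1]]})
result = run(payload)
print(result)
```

If the stub passes, the same payload format can be reused later against the deployed service.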

Let's create an environment in order to run the script.

<code>
from azureml.core import Environment

service_env = Environment(name='service-env')
python_packages = ['scikit-learn', 'numpy'] # whatever packages your entry script uses
for package in python_packages:
    service_env.python.conda_dependencies.add_pip_package(package)
</code>

Finally, combine the entry script and the environment in an InferenceConfig object.

<code>
from azureml.core.model import InferenceConfig

classifier_inference_config = InferenceConfig(source_directory = 'service_files',
                                              entry_script="score.py",
                                              environment=service_env)
</code>


3. Define the compute to which the service will be deployed. If an AKS cluster is used, it must be created before the deployment.

<code>
from azureml.core.compute import ComputeTarget, AksCompute

cluster_name = 'aks-cluster'
compute_config = AksCompute.provisioning_configuration(location='eastus')
production_cluster = ComputeTarget.create(ws, cluster_name, compute_config)
production_cluster.wait_for_completion(show_output=True)
</code>

Then, the configuration should be set in the following way:
<code>
from azureml.core.webservice import AksWebservice
classifier_deploy_config = AksWebservice.deploy_configuration(cpu_cores = 1, memory_gb = 1)
</code>

The deployment configuration can be defined in a similar way for Azure Container Instances (ACI) and for local services.

4. _Deployment_. Deploy the model with the following code:

<code>
from azureml.core.model import Model

model = ws.models['classification_model']
service = Model.deploy(workspace=ws,
                       name = 'classifier-service',
                       models = [model],
                       inference_config = classifier_inference_config,
                       deployment_config = classifier_deploy_config,
                       deployment_target = production_cluster)
service.wait_for_deployment(show_output = True)
</code>

### Consume the service

After deploying the service, it can be consumed in the following way:

<code>
import json

x_new = [[0.1,2.3,4.1,2.0],
         [0.2,1.8,3.9,2.1]]

json_data = json.dumps({'data': x_new})

response = service.run(input_data = json_data)

predictions = json.loads(response)

for i in range(len(x_new)):
    print (x_new[i], predictions[i])
</code>

If the client cannot use the SDK, it can send a plain REST request to the service's endpoint (available in the SDK as service.scoring_uri) in the following way:

<code>
import requests
import json

x_new = [[0.1,2.3,4.1,2.0],
         [0.2,1.8,3.9,2.1]]

json_data = json.dumps({"data": x_new})

request_headers = { 'Content-Type':'application/json' }

response = requests.post(url = endpoint,
                         data = json_data,
                         headers = request_headers)

predictions = json.loads(response.json())

for i in range(len(x_new)):
    print(x_new[i], predictions[i])
</code>

### Authentication

In production, you will want to restrict access to your services by requiring a _key_ or a _token_.

By default, authentication is disabled for ACI services and key-based for AKS services. You can optionally configure an AKS service to use token-based authentication (which is not supported for ACI services).

If you use key-based authentication, retrieve the keys with:

<code>
primary_key, secondary_key = service.get_keys()
</code>

Otherwise, if you prefer tokens, use _get_token_.
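Tokens expire, so a long-running client needs to refresh them. The sketch below caches a token and fetches a new one once its refresh time has passed. `TokenCache` and `fetch_token` are hypothetical names: `fetch_token` is any callable returning a `(token, refresh_after)` pair, which is the shape the SDK documents for `service.get_token()`. Time zone handling is left out and would need checking against the values the service actually returns.

```python
from datetime import datetime, timedelta


class TokenCache:
    """Caches a token and fetches a new one once the refresh time has passed."""

    def __init__(self, fetch_token, now=datetime.now):
        self._fetch_token = fetch_token  # callable returning (token, refresh_after)
        self._now = now                  # injectable clock, handy for testing
        self._token = None
        self._refresh_after = None

    def get(self):
        # Fetch on first use, or when the current token is due for refresh.
        if self._token is None or self._now() >= self._refresh_after:
            self._token, self._refresh_after = self._fetch_token()
        return self._token


# Demonstration with a fake fetcher and a controllable clock (no Azure calls).
calls = []
clock = {'now': datetime(2024, 1, 1, 12, 0)}


def fake_fetch():
    calls.append(clock['now'])
    return 'token-%d' % len(calls), clock['now'] + timedelta(hours=1)


cache = TokenCache(fake_fetch, now=lambda: clock['now'])
first = cache.get()                 # triggers the first fetch
second = cache.get()                # still valid, served from the cache
clock['now'] += timedelta(hours=2)
third = cache.get()                 # past the refresh time, fetched again
print(first, second, third)
```

With the SDK, the cache could be created as `TokenCache(service.get_token)`, assuming the returned refresh time is comparable with the clock in use.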

Finally, to make an authenticated call to the service's REST endpoint, run the following code:

<code>
import requests
import json

x_new = [[0.1,2.3,4.1,2.0],
         [0.2,1.8,3.9,2.1]]

json_data = json.dumps({"data": x_new})

request_headers = { "Content-Type":"application/json",
                    "Authorization":"Bearer " + key_or_token }

response = requests.post(url = endpoint,
                         data = json_data,
                         headers = request_headers)

predictions = json.loads(response.json())

for i in range(len(x_new)):
    print(x_new[i], predictions[i])
</code>
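The authenticated and unauthenticated requests differ only in the Authorization header, so the request construction can be shared. This is a small sketch; `build_scoring_request` is a hypothetical helper name, and the key shown is a placeholder rather than a real credential.

```python
import json


def build_scoring_request(rows, key_or_token=None):
    """Return the JSON body and headers for a scoring request."""
    body = json.dumps({'data': rows})
    headers = {'Content-Type': 'application/json'}
    if key_or_token:
        # Keys and tokens are both sent as a Bearer credential.
        headers['Authorization'] = 'Bearer ' + key_or_token
    return body, headers


x_new = [[0.1, 2.3, 4.1, 2.0],
         [0.2, 1.8, 3.9, 2.1]]

open_body, open_headers = build_scoring_request(x_new)
auth_body, auth_headers = build_scoring_request(x_new, key_or_token='my-placeholder-key')
print(auth_headers)
```

The returned values can be passed directly as the `data` and `headers` arguments of `requests.post`.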

### Troubleshooting

Examine the state of the service:

<code>
from azureml.core.webservice import AksWebservice

service = AksWebservice(name = 'classifier-service', workspace = ws)
print(service.state)
</code>

If everything is OK, the state should be Healthy.
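A service that has just been deployed or updated may take a while to reach its final state. The sketch below polls until that happens. `wait_until_healthy` and `get_state` are hypothetical names: `get_state` is any callable that returns the current state string, and the set of states treated as final is an assumption to adapt to your service.

```python
import time


def wait_until_healthy(get_state, timeout_s=300.0, interval_s=5.0,
                       stop_states=('Healthy', 'Unhealthy', 'Failed'),
                       sleep=time.sleep):
    """Poll get_state() until it returns a stop state or the timeout expires."""
    deadline = time.monotonic() + timeout_s
    while True:
        state = get_state()
        if state in stop_states:
            return state
        if time.monotonic() >= deadline:
            raise TimeoutError('service still in state %r after %s s' % (state, timeout_s))
        sleep(interval_s)


# Demonstration with a fake state source and a no-op sleep (no Azure calls).
states = iter(['Transitioning', 'Transitioning', 'Healthy'])
final_state = wait_until_healthy(lambda: next(states), sleep=lambda seconds: None)
print(final_state)
```

With the SDK, `get_state` would need to refresh the service object before reading `service.state`, since the value held in memory may be stale.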

You can also print some logs:
<code>
print(service.get_logs())
</code>

Moreover, sometimes debugging from a local container can be simpler:
<code>
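from azureml.core.model import Model
from azureml.core.webservice import LocalWebservice

deployment_config = LocalWebservice.deploy_configuration(port=8890)
# Reuse the inference configuration defined for the production deployment
service = Model.deploy(ws, 'test-svc', [model], classifier_inference_config, deployment_config)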
print(service.run(input_data = json_data))
</code>

During the debugging phase, you will probably change the entry script several times. After each change, reload the local service to pick up the new code without redeploying:
<code>
service.reload()
print(service.run(input_data = json_data))
</code>