**<center><h1>Introduction</h1></center>**

In machine learning, inferencing refers to the use of a trained model to predict labels for new data on which the model has not been trained. Often, the model is deployed as part of a service that enables applications to request immediate (real-time) predictions for individual observations or small batches of observations.

<img src = "images/07-01-real-time.jpeg" />

In Azure Machine Learning, you can create real-time inferencing solutions by deploying a model as a service, hosted in a containerized platform such as Azure Kubernetes Service (AKS).

**<h2>Learning objectives</h2>**

In this module, you will learn how to:

- Deploy a model as a real-time inferencing service.
- Consume a real-time inferencing service.
- Troubleshoot service deployment.
<hr>

**<center><h1>Deploy a model as a real-time service</h1></center>**

You can deploy a model as a real-time web service to several kinds of compute target, including local compute, an Azure Machine Learning compute instance, an Azure Container Instance (ACI), an Azure Kubernetes Service (AKS) cluster, an Azure Function, or an Internet of Things (IoT) module. Azure Machine Learning uses containers as a deployment mechanism, packaging the model and the code to use it as an image that can be deployed to a container in your chosen compute target.

<mark>Note: Deployment to a local service, a compute instance, or an ACI is a good choice for testing and development. For production, you should deploy to a target that meets the specific performance, scalability, and security needs of your application architecture.</mark>

To deploy a model as a real-time inferencing service, you must perform the following tasks:

**<h2>1. Register a trained model</h2>**

After successfully training a model, you must register it in your Azure Machine Learning workspace. Your real-time service will then be able to load the model when required.

To register a model from a local file, you can use the **register** method of the **Model** object as shown here:
```
# Python
from azureml.core import Model

classification_model = Model.register(workspace=ws,
                       model_name='classification_model',
                       model_path='model.pkl', # local path
                       description='A classification model')
```
Alternatively, if you have a reference to the **Run** used to train the model, you can use its **register_model** method as shown here:
```
# Python
run.register_model( model_name='classification_model',
                    model_path='outputs/model.pkl', # run outputs path
                    description='A classification model')
```
**<h2>2. Define an inference configuration</h2>**

The model will be deployed as a service that consists of:

- A script to load the model and return predictions for submitted data.
- An environment in which the script will be run.

You must therefore define the script and environment for the service.

**<h3>Create an entry script</h3>**

Create the entry script (sometimes referred to as the scoring script) for the service as a Python (.py) file. It must include two functions:

- **init():** Called when the service is initialized.
- **run(raw_data):** Called when new data is submitted to the service.

Typically, you use the init function to load the model from the model registry, and use the run function to generate predictions from the input data. The following example script shows this pattern:
```
# Python
import json
import joblib
import numpy as np
import os

# Called when the service is loaded
def init():
    global model
    # Get the path to the registered model file and load it
    model_path = os.path.join(os.getenv('AZUREML_MODEL_DIR'), 'model.pkl')
    model = joblib.load(model_path)

# Called when a request is received
def run(raw_data):
    # Get the input data as a numpy array
    data = np.array(json.loads(raw_data)['data'])
    # Get a prediction from the model
    predictions = model.predict(data)
    # Return the predictions as any JSON serializable format
    return predictions.tolist()
```
Save the script in a folder so you can easily identify it later. For example, you might save the script above as score.py in a folder named service_files.
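
Before packaging the entry script, it can help to exercise the `init`/`run` contract locally. The following is a minimal sketch of that idea, not part of the Azure Machine Learning SDK: `StubModel` is a hypothetical stand-in for the registered model (it predicts 1 when the feature values in a case sum to more than 5), and plain Python lists are used instead of NumPy so that the sketch has no dependencies.

```python
# Python
import json

class StubModel:
    """Hypothetical stand-in for the registered model (assumption, not a real model)."""
    def predict(self, rows):
        # Predict class 1 when the features of a case sum to more than 5
        return [1 if sum(row) > 5 else 0 for row in rows]

model = None

# Called when the service is loaded
def init():
    global model
    # The real entry script would load model.pkl from AZUREML_MODEL_DIR here
    model = StubModel()

# Called when a request is received
def run(raw_data):
    # Parse the JSON document and extract the list of cases
    data = json.loads(raw_data)['data']
    # Return a plain list so the result is JSON serializable
    return list(model.predict(data))

# Simulate what the service does: initialize once, then score a request
init()
sample_request = json.dumps({"data": [[0.1, 2.3, 4.1, 2.0],
                                      [0.2, 0.8, 0.9, 0.1]]})
sample_result = run(sample_request)
print(sample_result)
```

Calling `init()` once and then `run()` with a JSON string mirrors how the service invokes the script, so mistakes in request parsing tend to surface before any container is built.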

**<h3>Create an environment</h3>**

Your service requires a Python environment in which to run the entry script, which you can define by creating an **Environment** that contains the required packages:
```
# Python
from azureml.core import Environment

service_env = Environment(name='service-env')
python_packages = ['scikit-learn', 'numpy'] # whatever packages your entry script uses
for package in python_packages:
    service_env.python.conda_dependencies.add_pip_package(package)
```

**<h3>Combine the script and environment in an InferenceConfig</h3>**

After creating the entry script and environment, you can combine them in an **InferenceConfig** for the service like this:
```
# Python
from azureml.core.model import InferenceConfig

classifier_inference_config = InferenceConfig(source_directory = 'service_files',
                                              entry_script="score.py",
                                              environment=service_env)
```

**<h2>3. Define a deployment configuration</h2>**

Now that you have the entry script and environment, you need to configure the compute to which the service will be deployed. If you are deploying to an AKS cluster, you must create the cluster and a compute target for it before deploying:
```
# Python
from azureml.core.compute import ComputeTarget, AksCompute

cluster_name = 'aks-cluster'
compute_config = AksCompute.provisioning_configuration(location='eastus')
production_cluster = ComputeTarget.create(ws, cluster_name, compute_config)
production_cluster.wait_for_completion(show_output=True)
```
With the compute target created, you can now define the deployment configuration, which sets the **target-specific** compute specification for the containerized deployment:
```
# Python
from azureml.core.webservice import AksWebservice

classifier_deploy_config = AksWebservice.deploy_configuration(cpu_cores = 1,
                                                              memory_gb = 1)
```
The code to configure an ACI deployment is similar, except that you do not need to explicitly create an ACI compute target, and you must use the **deploy_configuration** method of the **azureml.core.webservice.AciWebservice** class. Similarly, you can use the **azureml.core.webservice.LocalWebservice** class to configure a local Docker-based service.

<mark>Note: To deploy a model to an Azure Function, you do not need to create a deployment configuration. Instead, you need to package the model based on the type of function trigger you want to use. This functionality is in preview at the time of writing. For more details, see Deploy a machine learning model to Azure Functions in the Azure Machine Learning documentation.</mark>

**<h2>4. Deploy the model</h2>**

After all of the configuration is prepared, you can deploy the model. The easiest way to do this is to call the **deploy** method of the **Model** class, like this:
```
# Python
from azureml.core.model import Model

model = ws.models['classification_model']
service = Model.deploy(workspace=ws,
                       name = 'classifier-service',
                       models = [model],
                       inference_config = classifier_inference_config,
                       deployment_config = classifier_deploy_config,
                       deployment_target = production_cluster)
service.wait_for_deployment(show_output = True)
```

For ACI or local services, you can omit the **deployment_target** parameter (or set it to **None**).

<hr>



**<center><h1>Consume a real-time inferencing service</h1></center>**

After deploying a real-time service, you can consume it from client applications to predict labels for new data cases.

**<h2>Use the Azure Machine Learning SDK</h2>**

For testing, you can use the Azure Machine Learning SDK to call a web service through the **run** method of a **Webservice** object that references the deployed service. Typically, you send data to the **run** method in JSON format with the following structure:


```
# JSON
{
  "data":[
      [0.1,2.3,4.1,2.0], // 1st case
      [0.2,1.8,3.9,2.1], // 2nd case
      ...
  ]
}
```
The response from the **run** method is a JSON collection with a prediction for each case that was submitted in the data. The following code sample calls a service and displays the response:

```
# Python
import json

# An array of new data cases
x_new = [[0.1,2.3,4.1,2.0],
         [0.2,1.8,3.9,2.1]]

# Convert the array to a serializable list in a JSON document
json_data = json.dumps({"data": x_new})

# Call the web service, passing the input data
response = service.run(input_data = json_data)

# Get the predictions (the SDK returns the response already deserialized from JSON)
predictions = response

# Print the predicted class for each case.
for i in range(len(x_new)):
    print (x_new[i], predictions[i])
```
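
Because the service expects every case under a single `data` key, a small helper can catch malformed input before the request is sent. The sketch below is illustrative only: `make_payload` and `pair_predictions` are hypothetical helper names, not part of the SDK, and the check that all cases have the same number of features is an assumption about what the model expects.

```python
# Python
import json

def make_payload(cases):
    """Build the {"data": [...]} JSON document, checking that the cases are consistent."""
    if not cases:
        raise ValueError("at least one data case is required")
    width = len(cases[0])
    for index, case in enumerate(cases):
        if len(case) != width:
            raise ValueError(f"case {index} has {len(case)} features, expected {width}")
    return json.dumps({"data": [list(case) for case in cases]})

def pair_predictions(cases, predictions):
    """Match each submitted case with its prediction, failing on a length mismatch."""
    if len(cases) != len(predictions):
        raise ValueError("the number of predictions does not match the number of cases")
    return list(zip(cases, predictions))

x_new = [[0.1, 2.3, 4.1, 2.0],
         [0.2, 1.8, 3.9, 2.1]]
payload = make_payload(x_new)
print(payload)
```

The string returned by `make_payload` can be passed as the `input_data` argument of the **run** method, and `pair_predictions` can replace the index-based loop when printing results.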

**<h2>Use a REST endpoint</h2>**

In production, most client applications will not include the Azure Machine Learning SDK, and will consume the service through its REST interface. You can determine the endpoint of a deployed service in Azure Machine Learning studio, or by retrieving the **scoring_uri** property of the **Webservice** object in the SDK, like this:
```
# Python
endpoint = service.scoring_uri
print(endpoint)
```
With the endpoint known, you can use an HTTP POST request with JSON data to call the service. The following example shows how to do this using Python:
```
# Python
import requests
import json

# An array of new data cases
x_new = [[0.1,2.3,4.1,2.0],
         [0.2,1.8,3.9,2.1]]

# Convert the array to a serializable list in a JSON document
json_data = json.dumps({"data": x_new})

# Set the content type in the request headers
request_headers = { 'Content-Type':'application/json' }

# Call the service
response = requests.post(url = endpoint,
                         data = json_data,
                         headers = request_headers)

# Get the predictions from the JSON response
predictions = response.json()

# Print the predicted class for each case.
for i in range(len(x_new)):
    print(x_new[i], predictions[i])
```
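
To try the request and response flow without a deployed service, you can point a client at a local stand-in endpoint. The following sketch uses only the Python standard library (`urllib` rather than `requests`). `FakeScoringHandler` is a hypothetical stub that returns class 0 for every case; it is not how Azure Machine Learning hosts a service, and the `/score` path is an arbitrary choice for this example.

```python
# Python
import json
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

class FakeScoringHandler(BaseHTTPRequestHandler):
    """Hypothetical stand-in for a deployed service: predicts 0 for every case."""
    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        cases = json.loads(self.rfile.read(length))["data"]
        body = json.dumps([0 for _ in cases]).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the example output quiet

def score(endpoint, cases, timeout=10):
    """POST the cases to the endpoint and return the decoded JSON response."""
    request = urllib.request.Request(
        endpoint,
        data=json.dumps({"data": cases}).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    # Connect directly, ignoring proxy settings (suitable for this local test only)
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    # open() raises urllib.error.HTTPError for 4xx and 5xx responses
    with opener.open(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))

# Start the stand-in server on a free local port in a background thread
server = HTTPServer(("127.0.0.1", 0), FakeScoringHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()
local_endpoint = f"http://127.0.0.1:{server.server_port}/score"

predictions = score(local_endpoint, [[0.1, 2.3, 4.1, 2.0],
                                     [0.2, 1.8, 3.9, 2.1]])
print(predictions)

server.shutdown()
server.server_close()
```

Setting a timeout and letting HTTP errors raise keeps a client from hanging or silently treating an error page as predictions; for a real service you would pass its scoring URI instead of `local_endpoint`.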

**<h2>Authentication</h2>**

In production, you will likely want to restrict access to your services by applying authentication. There are two kinds of authentication you can use:

- **Key:** Requests are authenticated by specifying the key associated with the service.
- **Token:** Requests are authenticated by providing a JSON Web Token (JWT).

By default, authentication is disabled for ACI services, and set to key-based authentication for AKS services (for which primary and secondary keys are automatically generated). You can optionally configure an AKS service to use token-based authentication (which is not supported for ACI services).

Assuming you have an authenticated session established with the workspace, you can retrieve the keys for a service by using the **get_keys** method of the **Webservice** object associated with the service:
```
# Python
primary_key, secondary_key = service.get_keys()
```
For token-based authentication, your client application needs to use service-principal authentication to verify its identity through Azure Active Directory (Azure AD) and call the **get_token** method of the service to retrieve a time-limited token.

To make an authenticated call to the service's REST endpoint, you must include the key or token in the request header like this:

```
# Python
import requests
import json

# An array of new data cases
x_new = [[0.1,2.3,4.1,2.0],
         [0.2,1.8,3.9,2.1]]

# Convert the array to a serializable list in a JSON document
json_data = json.dumps({"data": x_new})

# Set the content type in the request headers
request_headers = { "Content-Type":"application/json",
                    "Authorization":"Bearer " + key_or_token }

# Call the service
response = requests.post(url = endpoint,
                         data = json_data,
                         headers = request_headers)

# Get the predictions from the JSON response
predictions = response.json()

# Print the predicted class for each case.
for i in range(len(x_new)):
    print(x_new[i], predictions[i])
```
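
Since some services have authentication disabled and others require a key or token, one option is to build the headers in a single place. This is a small sketch under that assumption; `build_headers` is a hypothetical helper name, and `example-key` is a made-up placeholder rather than a real credential.

```python
# Python
def build_headers(key_or_token=None):
    """Return request headers, adding a Bearer credential only when one is supplied."""
    headers = {"Content-Type": "application/json"}
    if key_or_token:
        headers["Authorization"] = "Bearer " + key_or_token
    return headers

# No credential: suitable for a service with authentication disabled
open_headers = build_headers()

# Key or token supplied: suitable for a service that requires authentication
secured_headers = build_headers("example-key")

print(open_headers)
print(secured_headers)
```

The resulting dictionary can be passed as the `headers` argument of `requests.post`, so the same calling code works whether or not the service requires authentication.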

**<center><h1>Troubleshoot service deployment</h1></center>**

A real-time service deployment involves many elements, including the trained model, the runtime environment configuration, the scoring script, the container image, and the container host. As a result, troubleshooting a failed deployment or an error when consuming a deployed service can be complex.

**<h2>Check the service state</h2>**

As an initial troubleshooting step, you can check the status of a service by examining its **state**:

```
# Python
from azureml.core.webservice import AksWebservice

# Get the deployed service
service = AksWebservice(name='classifier-service', workspace=ws)

# Check its state
print(service.state)
```

<mark>Note: To view the state of a service, you must use the compute-specific service type (for example AksWebservice) and not a generic Webservice object.</mark>

For an operational service, the state should be Healthy.
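
A service may not report Healthy immediately, so a client-side check might poll the state until it does or a time limit passes. The sketch below shows that pattern in isolation: `wait_until_healthy` is a hypothetical helper, `get_state` is any function you supply that returns the current state string, and the `Transitioning` value in the simulation is an assumed intermediate state used only for illustration.

```python
# Python
import time

def wait_until_healthy(get_state, timeout_seconds=300, poll_seconds=5):
    """Poll get_state() until it returns 'Healthy'; return the number of checks made."""
    deadline = time.monotonic() + timeout_seconds
    checks = 0
    while True:
        checks += 1
        if get_state() == "Healthy":
            return checks
        if time.monotonic() >= deadline:
            raise TimeoutError(f"service not healthy after {checks} checks")
        time.sleep(poll_seconds)

# Simulated state sequence (assumption); a real check would read the service state
simulated_states = iter(["Transitioning", "Transitioning", "Healthy"])
checks_needed = wait_until_healthy(lambda: next(simulated_states), poll_seconds=0)
print(checks_needed)
```

Raising `TimeoutError` instead of looping forever gives the caller a clear point at which to move on to reviewing the service logs.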

**<h2>Review service logs</h2>**

If a service is not healthy, or you are experiencing errors when using it, you can review its logs:
```
# Python
print(service.get_logs())
```

The logs include detailed information about the provisioning of the service, and the requests it has processed. They can often provide an insight into the cause of unexpected errors.

**<h2>Deploy to a local container</h2>**

Deployment and runtime errors can be easier to diagnose by deploying the service as a container in a local Docker instance, like this:
```
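# Python
from azureml.core.model import Model
from azureml.core.webservice import LocalWebservice

# Reuse the model and inference configuration defined earlier
deployment_config = LocalWebservice.deploy_configuration(port=8890)
service = Model.deploy(ws, 'test-svc', [model], classifier_inference_config, deployment_config)
service.wait_for_deployment(show_output=True)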
```
You can then test the locally deployed service using the SDK:
```
# Python
print(service.run(input_data = json_data))
```
You can then troubleshoot runtime issues by making changes to the scoring file that is referenced in the inference configuration, and reloading the service without redeploying it (something you can only do with a local service):
```
# Python
service.reload()
print(service.run(input_data = json_data))
```

**<center><h1>Exercise - Deploy a model as a real-time service</h1></center>**

Now it's your chance to use Azure Machine Learning to deploy a machine learning model as a real-time service.

In this exercise, you will:

- Train and register a model.
- Deploy the model as a real-time service.
- Consume the deployed service.


**<h2>Instructions</h2>**

Follow these instructions to complete the exercise.

1. If you do not already have an Azure subscription, sign up for a free trial at https://azure.microsoft.com.
2. View the exercise repo at https://aka.ms/mslearn-dp100.
3. If you have not already done so, complete the **Create an Azure Machine Learning workspace** exercise to provision an Azure Machine Learning workspace, create a compute instance, and clone the required files.
4. Complete the **Create a real-time inference** exercise.
