# Creating a Real-Time Inferencing Service

You've spent a lot of time in this course training and registering the flight delays machine learning model. Now it's time to deploy the model as a real-time service that clients can use to get predictions from new data.

## Connect to Your Workspace

The first thing you need to do is to connect to your workspace using the Azure ML SDK.

> **Note**: If you do not have a current authenticated session with your Azure subscription, you'll be prompted to authenticate. Follow the instructions to authenticate using the code provided.

In [None]:

import azureml.core
from azureml.core import Workspace

# Load the workspace from the saved config file
ws = Workspace.from_config()
print('Ready to use Azure ML {} to work with {}'.format(azureml.core.VERSION, ws.name))

## Deploy a Model as a Web Service

You have now trained and registered a machine learning model that classifies flights by whether they are likely to be delayed. 
This model could be used in a production environment, such as an airline agency that notifies passengers and scheduled pick-ups about arrival delays. 
To support this scenario, you will deploy the model as a web service.

First, let's determine what models you have registered in the workspace.

In [None]:
from azureml.core import Model

for model in Model.list(ws):
    print(model.name, 'version:', model.version)
    for tag_name in model.tags:
        tag = model.tags[tag_name]
        print ('\t',tag_name, ':', tag)
    for prop_name in model.properties:
        prop = model.properties[prop_name]
        print ('\t',prop_name, ':', prop)
    print('\n')

Right, now let's get the model that we want to deploy. By default, if we specify a model name, the latest version will be returned.

In [None]:
model = ws.models['flight_delays_model']
print(model.name, 'version', model.version)

We're going to create a web service to host this model, and this will require some code and configuration files; so let's create a folder for those.

In [None]:
import os

folder_name = 'flight_delays_service'

# Create a folder for the web service files (no error if it already exists)
os.makedirs(folder_name, exist_ok=True)

print(folder_name, 'folder created.')

### Creating the Entry Script

The web service where we deploy the model will need some Python code to load the input data, get the model from the workspace, and generate and return predictions. We'll save this code in an *entry script* that will be deployed to the web service:

In [None]:
%%writefile $folder_name/scoring_script.py
import json
import joblib
import numpy as np
from azureml.core.model import Model

# Called when the service is loaded
def init():
    global model
    # Get the path to the deployed model file and load it
    model_path = Model.get_model_path('flight_delays_model')
    model = joblib.load(model_path)


# Called when a request is received
def run(raw_data):
    # Get the input data as a numpy array
    data = np.array(json.loads(raw_data)['data'])
    # Get a prediction from the model
    predictions = model.predict(data)
    # Get the corresponding classname for each prediction (0 or 1)
    classnames = ['no-delay', 'delay']
    predicted_classes = []
    for prediction in predictions:
        predicted_classes.append(classnames[prediction])
    # Return the predictions as JSON
    return json.dumps(predicted_classes)
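
Before deploying, it can help to exercise the request/response contract of the entry script locally. The following is a minimal sketch, not the deployed script: `StubModel` is a hypothetical stand-in for the registered model, and its rule (predict a delay when the eighth feature value exceeds 15) is an arbitrary assumption made purely for illustration. The model is passed in as an argument here rather than loaded in `init()`.

```python
import json

CLASSNAMES = ['no-delay', 'delay']


class StubModel:
    """Hypothetical stand-in for the real model (illustration only)."""

    def predict(self, rows):
        # Arbitrary rule: class 1 when the eighth value in a row exceeds 15
        return [1 if row[7] > 15 else 0 for row in rows]


def run(raw_data, model):
    # Parse the JSON document and pull out the list of feature rows
    data = json.loads(raw_data)['data']
    predictions = model.predict(data)
    # Map each numeric prediction (0 or 1) to its class name
    return json.dumps([CLASSNAMES[int(p)] for p in predictions])


payload = json.dumps({"data": [[4, 19, 5, 4, 18, 36, 837, -3.0, 1138],
                               [3, 76, 6, 4, 19, 36, 837, 98.0, 1234]]})
result = run(payload, StubModel())
print(result)  # ["no-delay", "delay"]
```

The payload has the same `{"data": [...]}` shape that the deployed `run` function expects, so the same JSON document can be reused when calling the real service.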

The web service will be hosted in a container, and the container will need to install any required Python dependencies when it is initialized. In this case, the environment needs **scikit-learn, matplotlib and seaborn**, so we'll create a .yml file that tells the container host to install them.

In [None]:
from azureml.core.conda_dependencies import CondaDependencies 

# Add the dependencies for our model (the AzureML defaults are already included)
myenv = CondaDependencies()
myenv.add_conda_package("scikit-learn")
myenv.add_pip_package("matplotlib")
myenv.add_pip_package('seaborn')

# Save the environment config as a .yml file
env_file = folder_name + "/flight_delays_env.yml"
with open(env_file,"w") as f:
    f.write(myenv.serialize_to_string())
print("Saved dependency info in", env_file)

# Print the .yml file
with open(env_file,"r") as f:
    print(f.read())

### Provision the AKS Cluster

This is a one-time setup. Once the cluster has been created, you can reuse it for multiple deployments. If you delete the cluster or the resource group that contains it, you will have to recreate it.


In [None]:
from azureml.core.compute import ComputeTarget, AksCompute

prov_config = AksCompute.provisioning_configuration()

aks_name = 'my-aks-9' 
# Create the cluster
aks_target = ComputeTarget.create(workspace = ws, 
                                  name = aks_name, 
                                  provisioning_configuration = prov_config)

aks_target.wait_for_completion(show_output = True)
print(aks_target.provisioning_state)

Now we're ready to deploy. We'll deploy the container as a service named **flight-service**. The deployment process includes the following steps:

1. Define an inference configuration, which includes the scoring and environment files required to load and use the model.
2. Define a deployment configuration that defines the execution environment in which the service will be hosted. In this case, that environment is an Azure Kubernetes Service (AKS) cluster.
3. Deploy the model as a web service.
4. Verify the status of the deployed service.

> **More Information**: For more details about model deployment, and options for target execution environments, see the [documentation](https://docs.microsoft.com/en-gb/azure/machine-learning/service/how-to-deploy-and-where).


Deployment will take some time as it first runs a process to create a container image, and then runs a process to create a web service based on the image. When deployment has completed successfully, you'll see a status of **Healthy**.

In [None]:
from azureml.core.webservice import AksWebservice
from azureml.core.model import InferenceConfig
from azureml.core import Model


# Configure the scoring environment
inference_config = InferenceConfig(runtime= "python",
                                   source_directory = folder_name,
                                   entry_script="scoring_script.py",
                                   conda_file="flight_delays_env.yml")

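# Configure the AKS deployment; the compute target name must match the cluster created earlier
deployment_config = AksWebservice.deploy_configuration(cpu_cores = 1, 
                                                        memory_gb = 1, 
                                                        compute_target_name=aks_name,
                                                        auth_enabled=True,
                                                        autoscale_enabled=True)

service_name = "flight-service"

# Deploy to the AKS cluster provisioned above
service = Model.deploy(ws, service_name, [model], inference_config, deployment_config,
                       deployment_target=aks_target)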
service.wait_for_deployment(True)
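
The last step in the list above, verifying the status of the deployed service, can be sketched as follows. This is a minimal example that assumes `service` is the object returned by `Model.deploy`; the helper name `report_service_status` is hypothetical. It relies only on the `state` property and the `get_logs()` method of the web service object.

```python
def report_service_status(service):
    """Print the service state and, if it is not Healthy, the container logs."""
    state = service.state
    print("Service state:", state)
    if state != "Healthy":
        # The logs usually show why the image build or the entry script failed
        print(service.get_logs())
    return state

# Usage (after deployment has finished):
# report_service_status(service)
```

If the state is anything other than **Healthy**, the logs are typically the quickest way to find a missing dependency or an error raised in `init()`.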

## Use the Web Service

With the service deployed, now you can consume it from a client application.

In [None]:
import json

# This time our input is an array of two feature arrays
x_new = [[4, 19, 5, 4, 18, 36, 837, -3.0, 1138],
         [3, 76, 6, 4, 19, 36, 837, 98.0, 1234]]

# Convert the array of arrays to a serializable list in a JSON document
input_json = json.dumps({"data": x_new})

# Call the web service, passing the input data
predictions = service.run(input_data = input_json)

# Get the predicted classes.
predicted_classes = json.loads(predictions)
   
for i in range(len(x_new)):
    print ("Flight {}".format(x_new[i]), predicted_classes[i] )

The code above uses the Azure ML SDK to connect to the containerized web service and use it to generate predictions from your flight delays classification model. In production, a model is likely to be consumed by business applications that do not use the Azure ML SDK, but simply make HTTP requests to the web service.

Let's determine the URL to which these applications must submit their requests as well as the keys:

In [None]:
endpoint = service.scoring_uri
primary, secondary = service.get_keys()
print('PrimaryKey: {} \nSecondaryKey: {} \nEndpointUrl: {}'.format(primary,secondary,endpoint))

Now that you know the endpoint URI and the keys, an application can simply make an HTTP request, sending the flight data in JSON (or binary) format, and receive back the predicted class(es).

In [None]:
import requests
import json

endpoint = 'put-your-endpoint-url-here'
primary_key = 'put-primary-key-here'

x_new = [[4, 19, 5, 4, 18, 36, 837, -3.0, 1138],
         [3, 76, 6, 4, 19, 36, 837, 98.0, 1234]]

# Convert the array to a serializable list in a JSON document
input_json = json.dumps({"data": x_new})

# Set the content type and pass the key in the authorization header
request_headers = {"Content-Type": "application/json",
                   "Authorization": "Bearer " + primary_key}

predictions = requests.post(endpoint, input_json, headers=request_headers)

predicted_classes = json.loads(predictions.json())

for i in range(len(x_new)):
    print("Flight {}".format(x_new[i]), predicted_classes[i])
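
The call `json.loads(predictions.json())` decodes the response twice. That appears to be needed because the entry script already returns a JSON string from `json.dumps`, and the service then serializes that string again in the HTTP response. The sketch below isolates the request and response handling so it can be checked without a live endpoint; the helper names `build_scoring_request` and `parse_scoring_response` are hypothetical, and the double-encoding behaviour is an assumption based on the cell above.

```python
import json


def build_scoring_request(rows, key):
    """Return the JSON body and headers for a key-authenticated scoring call."""
    body = json.dumps({"data": rows})
    headers = {"Content-Type": "application/json",
               "Authorization": "Bearer " + key}
    return body, headers


def parse_scoring_response(response_text):
    """Decode the response body, unwrapping a second layer of JSON if present."""
    decoded = json.loads(response_text)
    if isinstance(decoded, str):
        # The entry script returned a JSON string, so decode once more
        decoded = json.loads(decoded)
    return decoded


body, headers = build_scoring_request([[4, 19, 5, 4, 18, 36, 837, -3.0, 1138]],
                                      'put-primary-key-here')
# Simulate a double-encoded response body instead of calling a real endpoint
simulated_response = json.dumps(json.dumps(['no-delay']))
print(parse_scoring_response(simulated_response))  # ['no-delay']
```

With a real endpoint, `body` and `headers` would be passed to `requests.post(endpoint, body, headers=headers)`, and the `text` attribute of the response would be passed to `parse_scoring_response`.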
