Copyright (c) Microsoft Corporation. All rights reserved.

Licensed under the MIT License.

# Inferencing with TensorFlow 2.0 on Azure Machine Learning Service

## Overview of Workshop

This notebook is Part 2 (Inferencing and Deploying a Model) of a four-part workshop that demonstrates an end-to-end workflow for implementing a BERT model using TensorFlow 2.0 on Azure Machine Learning Service. The workshop consists of the following parts:

- Part 1: [Working With Data and Training](1_AzureServiceClassifier_Training.ipynb)
- Part 2: [Inferencing and Deploying a Model](2_AzureServiceClassifier_Inferencing.ipynb)
- Part 3: [Setting Up a Pipeline Using MLOps](3_MLOps.md)
- Part 4: [Explaining Your Model Interpretability](4_IBMEmployeeAttritionClassifier_Interpretability.ipynb)

This notebook shows, step by step, how to convert a TF 2.0 BERT model and deploy it as a Webservice:

 * Initialize your workspace
 * Download a previously saved model (saved on Azure Machine Learning)
 * Test the downloaded model
 * Display the scoring script
 * Define an Azure Environment
 * Deploy the model as a Webservice (Local, ACI and AKS)
 * Test the deployment (Azure ML Service Call, Raw HTTP Request)
 * Clean up the Webservice

## What is Azure Machine Learning Service?
Azure Machine Learning service is a cloud service that you can use to develop and deploy machine learning models. Using Azure Machine Learning service, you can track your models as you build, train, deploy, and manage them, all at the broad scale that the cloud provides.
![](./images/aml-overview.png)


#### How can we use the Azure Machine Learning SDK for deployment and inferencing of machine learning models?
Deploying a machine learning model and running inference against it is often a cumbersome process. Once you have a trained model and a scoring script working on your local machine, you will want to deploy this model as a web service.

To facilitate deployment and inferencing, the Azure Machine Learning Python SDK provides a high-level abstraction for deploying a model as a web service running on your [local](https://docs.microsoft.com/en-us/azure/machine-learning/service/how-to-deploy-and-where#local) machine, in Azure Container Instance ([ACI](https://azure.microsoft.com/en-us/services/container-instances/)) or in Azure Kubernetes Service ([AKS](https://azure.microsoft.com/en-us/services/kubernetes-service/)). This allows users to easily deploy their models in the Azure ecosystem.

## Prerequisites
* Understand the [architecture and terms](https://docs.microsoft.com/en-us/azure/machine-learning/service/concept-azure-machine-learning-architecture) introduced by Azure Machine Learning
* If you are using an Azure Machine Learning Notebook VM, you are all set. Otherwise, go through the [configuration notebook](https://docs.microsoft.com/en-us/azure/machine-learning/service/tutorial-1st-experiment-sdk-setup) to:
    * Install the AML SDK
    * Create a workspace and its configuration file (config.json)
* For the local scoring test, you will also need TensorFlow and Keras installed in the current Jupyter kernel.
* Please run through Part 1: [Working With Data and Training](1_AzureServiceClassifier_Training.ipynb) Notebook first to register your model
* Make sure you enable [Docker for non-root users](https://docs.docker.com/install/linux/linux-postinstall/) (this is needed to run the local deployment). Run the following commands in your terminal, then go to your [Jupyter dashboard](/tree) and click `Quit` in the top right corner. After the shutdown, the notebook will be automatically refreshed with the new permissions.
```bash
    sudo usermod -a -G docker $USER
    newgrp docker
```

#### Enable Docker for non-root users

In [None]:
!sudo usermod -a -G docker $USER
!newgrp docker

>**Note:** Make sure you shut down your Jupyter notebook to enable this access. Go to your [Jupyter dashboard](/tree) and click `Quit` in the top right corner. After the shutdown, the notebook will be automatically refreshed with the new permissions.

## Azure Service Classification Problem 
One of the key tasks in ensuring the long-term success of any Azure service is actively responding to related posts in online forums such as Stack Overflow. In order to keep track of these posts, Microsoft relies on the associated tags to direct questions to the appropriate support team. While Stack Overflow has different tags for each Azure service (azure-web-app-service, azure-virtual-machine-service, etc.), people often use the generic **azure** tag. This makes it hard for specific teams to track down issues related to their product and, as a result, many questions are left unanswered.

**In order to solve this problem, we will be building a model to classify posts on Stack Overflow with the appropriate Azure service tag.**

We will be using a BERT (Bidirectional Encoder Representations from Transformers) model which was published by researchers at Google AI Language. Unlike recent language representation models, BERT is designed to pre-train deep bidirectional representations from unlabeled text by jointly conditioning on both left and right context in all layers. As a result, the pre-trained BERT model can be fine-tuned with just one additional output layer to create state-of-the-art models for a wide range of natural language processing (NLP) tasks without substantial architecture modifications.

For more information about BERT, please read this [paper](https://arxiv.org/pdf/1810.04805.pdf).

## Checking Azure Machine Learning Python SDK Version

If you are running this on a Notebook VM, the Azure Machine Learning Python SDK is installed by default. If you are running this locally, you can follow these [instructions](https://docs.microsoft.com/en-us/python/api/overview/azure/ml/install?view=azure-ml-py) to install it using pip.

This tutorial requires version 1.0.69 or higher. We can import the Python SDK to ensure it has been properly installed:

In [1]:
# Check core SDK version number
import azureml.core

print("SDK version:", azureml.core.VERSION)

SDK version: 1.0.69
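Version strings are easy to misread by eye (for example, `1.0.9` is older than `1.0.69`), so the minimum-version requirement can be checked explicitly. The following is a minimal sketch and not part of the SDK: `meets_minimum` is a hypothetical helper that assumes purely numeric, dot-separated versions such as `1.0.69`.

```python
def meets_minimum(version, minimum="1.0.69"):
    """Return True if a dotted numeric version is at least `minimum`.

    Hypothetical helper; assumes versions contain only digits and dots.
    """
    def as_tuple(text):
        return tuple(int(part) for part in text.split("."))

    return as_tuple(version) >= as_tuple(minimum)


try:
    import azureml.core
    print("SDK new enough:", meets_minimum(azureml.core.VERSION))
except ImportError:
    print("azureml-core is not installed in this kernel")
except ValueError:
    # Raised if the version string has non-numeric parts (outside this sketch's assumption).
    print("Could not parse the SDK version string")
```

Comparing tuples of integers avoids the pitfall of comparing the raw strings, where `"1.0.9" > "1.0.69"` would be true.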


## Connect To Workspace

Initialize a [Workspace](https://docs.microsoft.com/en-us/azure/machine-learning/service/concept-azure-machine-learning-architecture#workspace) object from the existing workspace you created in the prerequisites step. Workspace.from_config() creates a workspace object from the details stored in config.json.

In [2]:
from azureml.core import Workspace

ws = Workspace.from_config()
print('Workspace name: ' + ws.name, 
      'Azure region: ' + ws.location, 
      'Subscription id: ' + ws.subscription_id, 
      'Resource group: ' + ws.resource_group, sep = '\n')

Workspace name: 107151-aml-ws
Azure region: westus2
Subscription id: edb336ca-b85f-4204-8057-7fdb7d65322c
Resource group: aml-rg-107151


## Register Datastore
A [Datastore](https://docs.microsoft.com/en-us/python/api/azureml-core/azureml.core.datastore.datastore?view=azure-ml-py) is used to store connection information to a central data storage. This allows you to access your storage without having to hard code this (potentially confidential) information into your scripts. 

In this tutorial, the model has been previously prepped and uploaded into a central [Blob Storage](https://azure.microsoft.com/en-us/services/storage/blobs/) container. We will register this container into our workspace as a datastore using a [shared access signature (SAS) token](https://docs.microsoft.com/en-us/azure/storage/common/storage-sas-overview). 



We need to define the following parameters to register a datastore:

- `ws`: The workspace object
- `datastore_name`: The name of the datastore, case insensitive, can only contain alphanumeric characters and _.
- `container_name`: The name of the azure blob container.
- `account_name`: The storage account name.
- `sas_token`: An account SAS token, defaults to None.


In [3]:
from azureml.core.datastore import Datastore

datastore_name = 'tfworld'
container_name = 'azureml-blobstore-7c6bdd88-21fa-453a-9c80-16998f02935f'
account_name = 'tfworld6818510241'
sas_token = '?sv=2019-02-02&ss=bfqt&srt=sco&sp=rl&se=2019-11-08T05:12:15Z&st=2019-10-23T20:12:15Z&spr=https&sig=eDqnc51TkqiIklpQfloT5vcU70pgzDuKb5PAGTvCdx4%3D'

datastore = Datastore.register_azure_blob_container(workspace=ws, 
                                                    datastore_name=datastore_name, 
                                                    container_name=container_name,
                                                    account_name=account_name, 
                                                    sas_token=sas_token)
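A SAS token is a URL query string, and its `se` field carries the expiry time, so an expired token is a common reason for a download to fail later on. The sketch below shows one way to read that field with the standard library. The helper names are hypothetical, and the code assumes the `se` value uses the second-resolution UTC format seen in the token above; other valid timestamp formats are not handled.

```python
from datetime import datetime, timezone
from urllib.parse import parse_qs


def sas_expiry(sas_token):
    """Return the expiry time (the `se` field) of a SAS token as an aware UTC datetime."""
    fields = parse_qs(sas_token.lstrip("?"))
    expiry = datetime.strptime(fields["se"][0], "%Y-%m-%dT%H:%M:%SZ")
    return expiry.replace(tzinfo=timezone.utc)


def sas_is_expired(sas_token, now=None):
    """Return True if the token's expiry is at or before `now` (default: current time)."""
    now = now or datetime.now(timezone.utc)
    return sas_expiry(sas_token) <= now


# Placeholder token: same field layout as the one above, with a dummy signature.
example_token = "?sv=2019-02-02&sp=rl&se=2019-11-08T05:12:15Z&sig=PLACEHOLDER"
print("Expires:", sas_expiry(example_token).isoformat())
print("Expired:", sas_is_expired(example_token))
```

In the notebook, the same check could be run on `sas_token` before registering the datastore.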

#### If the datastore has already been registered, then you (and other users in your workspace) can directly run this cell.

In [4]:
datastore = ws.datastores['tfworld']
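The two cells above (register, or look up an existing registration) can be folded into a single get-or-register step. This is a sketch under assumptions: `get_or_register_datastore` is a hypothetical helper, and it relies only on the workspace exposing a dict-like `datastores` attribute, as used in the cell above.

```python
def get_or_register_datastore(ws, name, register):
    """Return the datastore called `name`, registering it only if it is missing.

    `ws` must expose a dict-like `datastores` attribute.
    `register` is a zero-argument callable that performs the registration
    and returns the new datastore.
    """
    existing = ws.datastores
    if name in existing:
        return existing[name]
    return register()


# Usage in this notebook (commented out because it needs a live workspace):
# datastore = get_or_register_datastore(
#     ws, datastore_name,
#     lambda: Datastore.register_azure_blob_container(
#         workspace=ws, datastore_name=datastore_name,
#         container_name=container_name, account_name=account_name,
#         sas_token=sas_token))
```

Passing the registration as a callable keeps the SAS token out of the lookup path, so users who only read an existing datastore never need the token.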

### Download Model from Datastore
Get the trained model from an Azure Blob container. The model is saved into two files, ``config.json`` and ``model.h5``.

In [5]:
from azureml.core.model import Model

datastore.download('./',prefix="azure-service-classifier/model")



0

### Registering the Model with the Workspace
Register the model to use in your workspace. 

In [6]:
model = Model.register(model_path = "./azure-service-classifier/model",
                       model_name = "azure-service-classifier", # this is the name the model is registered as
                       tags = {'pretrained': "BERT"},
                       workspace = ws)
model_dir = './azure-service-classifier/model'

Registering model azure-service-classifier


### Downloading and Using Registered Models
> If you have already completed the Part 1: [Working With Data and Training](1_AzureServiceClassifier_Training.ipynb) notebook, you can download your registered BERT model and use that instead of the model saved in blob storage.

```python
model = ws.models['azure-service-classifier']
model_dir = model.download(target_dir='.', exist_ok=True)
```

## Inferencing on the test set
Let's check the locally installed Keras and TensorFlow versions. Make sure they match the version numbers printed by the training script; otherwise you might not be able to load the model properly.

In [7]:
import keras
import tensorflow as tf

print("Keras version:", keras.__version__)
print("Tensorflow version:", tf.__version__)

Using TensorFlow backend.


Keras version: 2.3.1
Tensorflow version: 2.1.0-dev20191025


#### Install Transformers Library
We trained the BERT model using TensorFlow 2.0 and the open source [huggingface/transformers](https://github.com/huggingface/transformers) library. So before we can load the model, we need to make sure the Transformers library is installed as well.

In [8]:
%pip install transformers

Note: you may need to restart the kernel to use updated packages.


#### Load the Tensorflow 2.0 BERT model.
Load the downloaded Tensorflow 2.0 BERT model

In [9]:
import keras2onnx
import onnx
from transformers import BertTokenizer, TFBertPreTrainedModel, TFBertMainLayer
from transformers.modeling_tf_utils import get_initializer
class TFBertForMultiClassification(TFBertPreTrainedModel):
    def __init__(self, config, *inputs, **kwargs):
        super(TFBertForMultiClassification, self).__init__(config, *inputs, **kwargs)
        self.num_labels = config.num_labels
        self.bert = TFBertMainLayer(config, name='bert')
        self.dropout = tf.keras.layers.Dropout(config.hidden_dropout_prob)
        self.classifier = tf.keras.layers.Dense(config.num_labels,
                                                kernel_initializer=get_initializer(config.initializer_range),
                                                name='classifier',
                                                activation='softmax')
    def call(self, inputs, **kwargs):
        outputs = self.bert(inputs, **kwargs)
        pooled_output = outputs[1]
        pooled_output = self.dropout(pooled_output, training=kwargs.get('training', False))
        logits = self.classifier(pooled_output)
        outputs = (logits,) + outputs[2:]  # add hidden states and attention if they are here
        return outputs  # logits, (hidden_states), (attentions)
    
max_seq_length = 128
labels = ['azure-web-app-service', 'azure-storage', 'azure-devops', 'azure-virtual-machine', 'azure-functions']
loaded_model = TFBertForMultiClassification.from_pretrained(model_dir, num_labels=len(labels))
tokenizer = BertTokenizer.from_pretrained('bert-base-cased')
print("Model loaded from disk.")

Model loaded from disk.


Feed a test sentence to the BERT model and time the prediction.

In [10]:
%%time
import json 

# Input test sentences
raw_data = json.dumps({
    'text': 'My VM is not working'
})

# Encode inputs using tokenizer
inputs = tokenizer.encode_plus(
    json.loads(raw_data)['text'],
    add_special_tokens=True,
    max_length=max_seq_length
)
input_ids, token_type_ids = inputs["input_ids"], inputs["token_type_ids"]

# The mask has 1 for real tokens and 0 for padding tokens. Only real tokens are attended to.
attention_mask = [1] * len(input_ids)

# Zero-pad up to the sequence length.
padding_length = max_seq_length - len(input_ids)
input_ids = input_ids + ([0] * padding_length)
attention_mask = attention_mask + ([0] * padding_length)
token_type_ids = token_type_ids + ([0] * padding_length)

# Make prediction
predictions = loaded_model.predict({
    'input_ids': tf.convert_to_tensor([input_ids], dtype=tf.int32),
    'attention_mask': tf.convert_to_tensor([attention_mask], dtype=tf.int32),
    'token_type_ids': tf.convert_to_tensor([token_type_ids], dtype=tf.int32)
})

result = {
    'prediction': str(labels[predictions[0].argmax().item()]),
    'probability': str(predictions[0].max())
}

print(result)

{'prediction': 'azure-virtual-machine', 'probability': '0.98652285'}
CPU times: user 8.53 s, sys: 78.9 ms, total: 8.61 s
Wall time: 8.18 s


As you can see, given the sample sentence, the model predicts the most likely Stack Overflow tag and its probability.
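The masking and zero-padding steps in the cell above are repeated in later cells and in the scoring script, so they can be factored into one function. The sketch below is a hypothetical helper, not code from the workshop. It also slices inputs that are too long as a defensive guard, which is an assumption on top of the original logic: in the notebook, the tokenizer's `max_length` argument already handles truncation.

```python
def pad_features(input_ids, token_type_ids, max_seq_length=128, pad_id=0):
    """Pad (or slice) token features to a fixed length and build the attention mask.

    Returns (input_ids, attention_mask, token_type_ids), each of length max_seq_length.
    """
    input_ids = list(input_ids)[:max_seq_length]
    token_type_ids = list(token_type_ids)[:max_seq_length]

    # 1 marks a real token, 0 marks padding; only real tokens are attended to.
    attention_mask = [1] * len(input_ids)

    padding_length = max_seq_length - len(input_ids)
    input_ids = input_ids + [pad_id] * padding_length
    attention_mask = attention_mask + [0] * padding_length
    token_type_ids = token_type_ids + [0] * padding_length
    return input_ids, attention_mask, token_type_ids


# Made-up token ids, for illustration only (not real tokenizer output).
ids, mask, types = pad_features([101, 7, 8, 9, 102], [0, 0, 0, 0, 0], max_seq_length=8)
print(ids)
print(mask)
```

Each returned list can then be wrapped in an outer list to form a batch of one, as the prediction call above does.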

## Inferencing with ONNX

### ONNX and ONNX Runtime
**ONNX (Open Neural Network Exchange)** is an interoperable standard format for ML models, with support for both DNN and traditional ML. Models can be converted from a variety of frameworks, such as TensorFlow, Keras, PyTorch, scikit-learn, and more (see [ONNX Conversion tutorials](https://github.com/onnx/tutorials#converting-to-onnx-format)). This provides data teams with the flexibility to use their framework of choice for their training needs, while streamlining the process to operationalize these models for production usage in a consistent way.

In this section, we will demonstrate how to use ONNX Runtime, a high-performance inference engine for ONNX format models, for inferencing our model. Along with interoperability, ONNX Runtime's performance-focused architecture can also accelerate inferencing for many models through graph optimizations, utilization of custom accelerators, and more. You can find more about performance tuning [here](https://github.com/microsoft/onnxruntime/blob/master/docs/ONNX_Runtime_Perf_Tuning.md).

#### Convert to ONNX

Before we can convert to ONNX, we need to install the latest (nightly) version of TensorFlow 2, due to a [bug in the TF 2.0 conversion](https://github.com/tensorflow/tensorflow/issues/32849) process.

In [15]:
%pip install tf-nightly==2.1.0-dev20191025

Collecting tf-nightly==2.1.0-dev20191025
  Using cached https://files.pythonhosted.org/packages/cc/74/09da6454f37a5d4e4829419ed026cfe4e4199606ec8e2dffafcdf77cbf5c/tf_nightly-2.1.0.dev20191025-cp36-cp36m-manylinux2010_x86_64.whl
Installing collected packages: tf-nightly
Successfully installed tf-nightly-2.1.0.dev20191025
Note: you may need to restart the kernel to use updated packages.


In [11]:
onnx_model = keras2onnx.convert_keras(loaded_model, target_opset=10)
onnx.save_model(onnx_model, 'bert_tf2_convert.onnx')

#### Download ONNX Model

In [12]:
datastore.download('.',prefix="azure-service-classifier/model/bert_tf2.onnx")



0

#### Install ONNX Runtime

In [13]:
%pip install onnxruntime

Note: you may need to restart the kernel to use updated packages.


#### Loading ONNX Model
Load the downloaded ONNX BERT model.

In [15]:
import numpy as np
import onnxruntime as rt
from transformers import BertTokenizer, TFBertPreTrainedModel, TFBertMainLayer
max_seq_length = 128
labels = ['azure-web-app-service', 'azure-storage', 'azure-devops', 'azure-virtual-machine', 'azure-functions']
tokenizer = BertTokenizer.from_pretrained('bert-base-cased')

sess = rt.InferenceSession("bert_tf2_convert.onnx")
print("ONNX Model loaded from disk.")

ONNX Model loaded from disk.


#### View the inputs and outputs of converted ONNX model

In [16]:
for i in range(len(sess.get_inputs())):
    input_name = sess.get_inputs()[i].name
    print("Input name  :", input_name)
    input_shape = sess.get_inputs()[i].shape
    print("Input shape :", input_shape)
    input_type = sess.get_inputs()[i].type
    print("Input type  :", input_type)

Input name  : token_type_ids:0
Input shape : ['unk__847', 128]
Input type  : tensor(int32)
Input name  : input_ids:0
Input shape : ['unk__848', 128]
Input type  : tensor(int32)
Input name  : attention_mask:0
Input shape : ['unk__849', 128]
Input type  : tensor(int32)


In [17]:
for i in range(len(sess.get_outputs())):
    output_name = sess.get_outputs()[i].name
    print("Output name  :", output_name)  
    output_shape = sess.get_outputs()[i].shape
    print("Output shape :", output_shape)
    output_type = sess.get_outputs()[i].type
    print("Output type  :", output_type)

Output name  : Identity:0
Output shape : ['unk__850', 5]
Output type  : tensor(float)
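Note that the ONNX inputs listed above are ordered as `token_type_ids`, `input_ids`, `attention_mask`, which differs from the order used when calling the Keras model. Building the feed dictionary by name rather than by position avoids depending on that order. The sketch below uses a hypothetical helper, `build_feed`, and assumes the exported names are the plain feature names followed by a `:0` suffix, as in the listing above.

```python
def build_feed(input_names, features):
    """Map feature values to ONNX input names, ignoring the ':0' tensor suffix.

    `input_names` are names as reported by the session (e.g. 'input_ids:0').
    `features` is a dict keyed by plain names (e.g. 'input_ids').
    """
    feed = {}
    for name in input_names:
        key = name.split(":")[0]
        if key not in features:
            raise KeyError("No feature provided for ONNX input '%s'" % name)
        feed[name] = features[key]
    return feed


# Names copied from the output above; the values are placeholders.
names = ["token_type_ids:0", "input_ids:0", "attention_mask:0"]
feed = build_feed(names, {
    "input_ids": [[101, 102]],
    "attention_mask": [[1, 1]],
    "token_type_ids": [[0, 0]],
})
print(sorted(feed))
```

In the notebook, `names` would come from `[i.name for i in sess.get_inputs()]` and the values would be `int32` NumPy arrays of shape `(1, 128)`.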


#### Inferencing with ONNX Runtime

In [18]:
%%time
import json 

# Input test sentences
raw_data = json.dumps({
    'text': 'My VM is not working'
})

labels = ['azure-web-app-service', 'azure-storage', 'azure-devops', 'azure-virtual-machine', 'azure-functions']

# Encode inputs using tokenizer
inputs = tokenizer.encode_plus(
    json.loads(raw_data)['text'],
    add_special_tokens=True,
    max_length=max_seq_length
)
input_ids, token_type_ids = inputs["input_ids"], inputs["token_type_ids"]

# The mask has 1 for real tokens and 0 for padding tokens. Only real tokens are attended to.
attention_mask = [1] * len(input_ids)

# Zero-pad up to the sequence length.
padding_length = max_seq_length - len(input_ids)
input_ids = input_ids + ([0] * padding_length)
attention_mask = attention_mask + ([0] * padding_length)
token_type_ids = token_type_ids + ([0] * padding_length)

# Make prediction. The positional order below follows the input listing printed above:
# token_type_ids, input_ids, attention_mask.
convert_input = {
    sess.get_inputs()[0].name: np.array(tf.convert_to_tensor([token_type_ids], dtype=tf.int32)),
    sess.get_inputs()[1].name: np.array(tf.convert_to_tensor([input_ids], dtype=tf.int32)),
    sess.get_inputs()[2].name: np.array(tf.convert_to_tensor([attention_mask], dtype=tf.int32))
}

# output_name was set by the output-inspection loop above (the model's single output).
predictions = sess.run([output_name], convert_input)

result = {
    'prediction': str(labels[predictions[0].argmax().item()]),
    'probability': str(predictions[0].max())
}

print(result)

{'prediction': 'azure-virtual-machine', 'probability': '0.98652273'}
CPU times: user 770 ms, sys: 0 ns, total: 770 ms
Wall time: 199 ms
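The two runs can be compared directly. The sketch below uses only the numbers printed in the TensorFlow and ONNX Runtime cell outputs above; the tolerance is an arbitrary choice for illustration. Keep in mind that a single timed call is not a benchmark, and the first call to a model may include one-time setup cost.

```python
import math

# Values copied from the two cell outputs above.
tf_probability = 0.98652285
onnx_probability = 0.98652273
tf_wall_seconds = 8.18
onnx_wall_seconds = 0.199

# A relative tolerance of 1e-5 is an arbitrary choice for this illustration.
outputs_match = math.isclose(tf_probability, onnx_probability, rel_tol=1e-5)
speedup = tf_wall_seconds / onnx_wall_seconds

print("Probabilities agree within tolerance:", outputs_match)
print("Wall-time ratio (TF / ONNX Runtime): %.1f" % speedup)
```

A small difference in the last digits of the probability is expected after conversion, which is why the comparison uses a tolerance rather than exact equality.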


## Deploy models on Azure ML

Now we are ready to deploy the model as a web service running on your [local](https://docs.microsoft.com/en-us/azure/machine-learning/service/how-to-deploy-and-where#local) machine, in Azure Container Instance [ACI](https://azure.microsoft.com/en-us/services/container-instances/) or Azure Kubernetes Service [AKS](https://azure.microsoft.com/en-us/services/kubernetes-service/). Azure Machine Learning accomplishes this by constructing a Docker image with the scoring logic and model baked in. 
> **Note:** For this Notebook, we'll use the original model format for deployment, but the ONNX model can be deployed in the same way by using ONNX Runtime in the scoring script.

![](./images/aml-deploy.png)


### Deploying a web service
Once you've tested the model and are satisfied with the results, deploy the model as a web service.

To build the correct environment, provide the following:
* A scoring script to show how to use the model
* An environment file to show what packages need to be installed
* A configuration file to build the web service
* The model you trained before

Read more about deployment [here](https://docs.microsoft.com/en-us/azure/machine-learning/service/how-to-deploy-and-where)

### Create score.py

First, we will create a scoring script that will be invoked by the web service call. We have prepared a [score.py script](code/scoring/score.py) in advance that scores your BERT model.

* Note that the scoring script must define two functions, ``init()`` and ``run(input_data)``.
    * In the ``init()`` function, you typically load the model into a global object. This function is executed only once, when the Docker container is started.
    * In the ``run(input_data)`` function, the model is used to predict a value based on the input data. The input and output of ``run`` typically use JSON as the serialization and de-serialization format, but you are not limited to that.
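The `init()` / `run(input_data)` contract described above can be sketched without any model at all. In the example below, `_stub_model` is a hypothetical stand-in for the real BERT model, and returning an error dictionary for malformed input is an illustrative design choice rather than something the workshop's `score.py` does.

```python
import json

model = None


def _stub_model(text):
    """Hypothetical stand-in for a real model: a trivial keyword rule."""
    label = "azure-virtual-machine" if "VM" in text else "unknown"
    return {"prediction": label}


def init():
    """Called once when the container starts: load the model into a global."""
    global model
    model = _stub_model


def run(input_data):
    """Called per request: parse the JSON payload, predict, return a serializable result."""
    try:
        text = json.loads(input_data)["text"]
        return model(text)
    except (ValueError, KeyError, TypeError) as error:
        # Returning the error keeps a malformed request from raising inside the service.
        return {"error": str(error)}


init()
print(run(json.dumps({"text": "My VM is not working"})))
```

Loading the model in `init()` rather than in `run()` means the expensive step happens once per container instead of once per request.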

In [25]:
%pycat score.py

### Create Environment

You can create and/or use a Conda environment using the [Conda Dependencies object](https://docs.microsoft.com/en-us/python/api/azureml-core/azureml.core.conda_dependencies.condadependencies?view=azure-ml-py) when deploying a Webservice.

In [26]:
from azureml.core import Environment
from azureml.core.conda_dependencies import CondaDependencies 

myenv = CondaDependencies.create(conda_packages=['numpy','pandas'],
                                 pip_packages=['numpy','pandas','inference-schema[numpy-support]','azureml-defaults','tensorflow==2.0.0','transformers==2.0.0'])

with open("myenv.yml","w") as f:
    f.write(myenv.serialize_to_string())

Review the content of the `myenv.yml` file.

In [27]:
%pycat myenv.yml

## Create Inference Configuration

We need to define the [Inference Configuration](https://docs.microsoft.com/en-us/python/api/azureml-core/azureml.core.model.inferenceconfig?view=azure-ml-py) for the web service. A source directory is also supported: you can upload an entire folder from your local machine as dependencies for the Webservice.
Note: in that case, your entry_script and conda_file paths are relative to the source_directory path.

Sample code for using a source directory:

```python
inference_config = InferenceConfig(source_directory="C:/abc",
                                   runtime= "python", 
                                   entry_script="x/y/score.py",
                                   conda_file="env/myenv.yml")
```

 - source_directory = the source path as a string. This entire folder gets added to the image, so it's really easy to access any files within this folder or its subfolders.
 - runtime = which runtime to use for the image. Currently supported runtimes are 'spark-py' and 'python'.
 - entry_script = contains logic specific to initializing your model and running predictions.
 - conda_file = manages conda and python package dependencies.
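To make the relative-path rule concrete, the sketch below joins the sample values from the `InferenceConfig` above. It only illustrates which files the settings refer to; `resolve_in_source_directory` is a hypothetical helper and says nothing about how the SDK resolves paths internally.

```python
from pathlib import PurePosixPath


def resolve_in_source_directory(source_directory, relative_path):
    """Return the path a config entry refers to, relative to source_directory."""
    return str(PurePosixPath(source_directory) / relative_path)


# Values taken from the sample InferenceConfig above.
print(resolve_in_source_directory("C:/abc", "x/y/score.py"))
print(resolve_in_source_directory("C:/abc", "env/myenv.yml"))
```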
 
 
 > **Note:** Deployment uses the inference configuration and the deployment configuration to deploy the models. The deployment process is similar regardless of the compute target. Deploying to AKS is slightly different because you must provide a reference to the AKS cluster.

In [28]:
from azureml.core.model import InferenceConfig

inference_config = InferenceConfig(source_directory="./",
                                   runtime= "python", 
                                   entry_script="score.py",
                                   conda_file="myenv.yml"
                                  )

## Deploy as a Local Service

Estimated time to complete: **about 3-7 minutes**

Configure the image and deploy it locally. The following code goes through these steps:

* Build an image on local machine (or VM, if you are using a VM) using:
   * The scoring file (`score.py`)
   * The environment file (`myenv.yml`)
   * The model file 
* Define [Local Deployment Configuration](https://docs.microsoft.com/en-us/python/api/azureml-core/azureml.core.webservice.localwebservice?view=azure-ml-py#deploy-configuration-port-none-)
* Send the image to local docker instance. 
* Start up a container using the image.
* Get the web service HTTP endpoint.
* This has a very quick turnaround time and is great for testing service before it is deployed to production

> **Note:** Make sure you enable [Docker for non-root users](https://docs.docker.com/install/linux/linux-postinstall/) (this is needed to run the local deployment). Run the following commands in your terminal, then go to your [Jupyter dashboard](/tree) and click `Quit` in the top right corner. After the shutdown, the notebook will be automatically refreshed with the new permissions.
```bash
    sudo usermod -a -G docker $USER
    newgrp docker
```

#### Deploy Local Service

In [29]:
from azureml.core.model import InferenceConfig, Model
from azureml.core.webservice import LocalWebservice

# Create a local deployment for the web service endpoint
deployment_config = LocalWebservice.deploy_configuration()
# Deploy the service
local_service = Model.deploy(
    ws, "mymodel", [model], inference_config, deployment_config)
# Wait for the deployment to complete
local_service.wait_for_deployment(True)
# Display the port that the web service is available on
print(local_service.port)

Downloading model azure-service-classifier:7 to /tmp/azureml_h22qxwwv/azure-service-classifier/7
Generating Docker build context.
2019/10/25 19:18:31 Downloading source code...
2019/10/25 19:18:38 Finished downloading source code
2019/10/25 19:18:38 Creating Docker network: acb_default_network, driver: 'bridge'
2019/10/25 19:18:38 Successfully set up Docker network: acb_default_network
2019/10/25 19:18:38 Setting up Docker configuration...
2019/10/25 19:18:39 Successfully set up Docker configuration
2019/10/25 19:18:39 Logging in to registry: 107151amlws55918bf0.azurecr.io
2019/10/25 19:18:41 Successfully logged into 107151amlws55918bf0.azurecr.io
2019/10/25 19:18:41 Executing step ID: acb_step_0. Timeout(sec): 5400, Working directory: '', Network: 'acb_default_network'
2019/10/25 19:18:41 Scanning for dependencies...
2019/10/25 19:18:41 Successfully scanned dependencies
2019/10/25 19:18:41 Launching container with name: acb_step_0
Sending build context to Docker daemon  59.39kB
Step 1

  Downloading https://files.pythonhosted.org/packages/30/54/c9810421e41ec0bca2228c6f06b1b1189b196b69533cbcac9f71b44727f8/grpcio-1.24.3-cp36-cp36m-manylinux2010_x86_64.whl (2.2MB)
Collecting keras-preprocessing>=1.0.5
  Downloading https://files.pythonhosted.org/packages/28/6a/8c1f62c37212d9fc441a7e26736df51ce6f0e38455816445471f10da4f0a/Keras_Preprocessing-1.1.0-py2.py3-none-any.whl (41kB)
Collecting requests
  Downloading https://files.pythonhosted.org/packages/51/bd/23c926cd341ea6b7dd0b2a00aba99ae0f828be89d72b2190f27c11d4b7fb/requests-2.22.0-py2.py3-none-any.whl (57kB)
Collecting sacremoses
  Downloading https://files.pythonhosted.org/packages/1f/8e/ed5364a06a9ba720fddd9820155cc57300d28f5f43a6fd7b7e817177e642/sacremoses-0.0.35.tar.gz (859kB)
Collecting boto3
  Downloading https://files.pythonhosted.org/packages/0e/41/27fb3969a76240d4c42a8f64b9d5ae78c676bab38e980e03b1bbaef279bd/boto3-1.10.2-py2.py3-none-any.whl (128kB)
Collecting regex
  Downloading https://files.pythonhosted.org/packa

Successfully installed Jinja2-2.10.3 MarkupSafe-1.1.1 PyJWT-1.7.1 SecretStorage-3.1.1 Werkzeug-0.16.0 absl-py-0.8.1 adal-1.2.2 applicationinsights-0.11.9 astor-0.8.0 azure-common-1.1.23 azure-graphrbac-0.61.1 azure-mgmt-authorization-0.60.0 azure-mgmt-containerregistry-2.8.0 azure-mgmt-keyvault-2.0.0 azure-mgmt-resource-5.1.0 azure-mgmt-storage-6.0.0 azureml-core-1.0.69 azureml-defaults-1.0.69 azureml-model-management-sdk-1.0.1b6.post1 backports.tempfile-1.0 backports.weakref-1.0.post1 boto3-1.10.2 botocore-1.13.2 cffi-1.13.1 chardet-3.0.4 click-7.0 configparser-3.7.4 contextlib2-0.6.0.post1 cryptography-2.8 dill-0.3.1.1 docker-4.1.0 docutils-0.15.2 flask-1.0.3 gast-0.2.2 google-pasta-0.1.7 grpcio-1.24.3 gunicorn-19.9.0 h5py-2.10.0 idna-2.8 inference-schema-1.0.0 isodate-0.6.0 itsdangerous-1.1.0 jeepney-0.4.1 jmespath-0.9.4 joblib-0.14.0 json-logging-py-0.2 jsonpickle-1.2 keras-applications-1.0.8 keras-preprocessing-1.1.0 liac-arff-2.4.0 markdown-3.1.1 msrest-0.6.10 msrestazure-0.6.2 n

This is the scoring web service endpoint:

In [30]:
print(local_service.scoring_uri)

http://localhost:32770/score


### Test Local Service

Let's test the deployed model. Pick a sample sentence describing an issue and send it to the web service. Note that here we are using the run API in the SDK to invoke the service. You can also make raw HTTP calls using any HTTP tool such as curl.

After the invocation, we print the returned predictions.

In [31]:
%%time
import json
raw_data = json.dumps({
    'text': 'My VM is not working'
})

prediction = local_service.run(input_data=raw_data)

Making a scoring call...
Scoring result:
{'prediction': 'azure-virtual-machine', 'probability': '0.98652285'}
CPU times: user 8.76 ms, sys: 80 µs, total: 8.84 ms
Wall time: 9.26 s
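As mentioned above, the service can also be called over raw HTTP. The sketch below builds the equivalent POST request with the standard library. `build_scoring_request` is a hypothetical helper, and the URI is the one printed earlier in this notebook; your port will likely differ, so use `local_service.scoring_uri` in practice.

```python
import json
import urllib.request


def build_scoring_request(scoring_uri, text):
    """Build a POST request carrying the same JSON payload as the SDK call above."""
    body = json.dumps({"text": text}).encode("utf-8")
    return urllib.request.Request(
        scoring_uri,
        data=body,
        headers={"Content-Type": "application/json"},
    )


request = build_scoring_request("http://localhost:32770/score", "My VM is not working")
print(request.get_method(), request.full_url)

# Sending requires the local service to be running, so it is left commented out:
# with urllib.request.urlopen(request) as response:
#     print(response.read().decode("utf-8"))
```

The request body is the same JSON string that `local_service.run` sends, which the scoring script parses with `json.loads`.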


### Reloading Webservice
You can update your score.py file and then call reload() to quickly restart the service. This only reloads your execution script and dependency files; it does not rebuild the underlying Docker image. As a result, reload() is fast.

In [32]:
%%writefile score.py
import os
import json
import tensorflow as tf
from transformers import TFBertPreTrainedModel, TFBertMainLayer, BertTokenizer
from transformers.modeling_tf_utils import get_initializer
import logging
logging.getLogger("transformers.tokenization_utils").setLevel(logging.ERROR)


class TFBertForMultiClassification(TFBertPreTrainedModel):

    def __init__(self, config, *inputs, **kwargs):
        super(TFBertForMultiClassification, self) \
            .__init__(config, *inputs, **kwargs)
        self.num_labels = config.num_labels

        self.bert = TFBertMainLayer(config, name='bert')
        self.dropout = tf.keras.layers.Dropout(config.hidden_dropout_prob)
        self.classifier = tf.keras.layers.Dense(
            config.num_labels,
            kernel_initializer=get_initializer(config.initializer_range),
            name='classifier',
            activation='softmax')

    def call(self, inputs, **kwargs):
        outputs = self.bert(inputs, **kwargs)

        pooled_output = outputs[1]

        pooled_output = self.dropout(
            pooled_output,
            training=kwargs.get('training', False))
        logits = self.classifier(pooled_output)

        # add hidden states and attention if they are here
        outputs = (logits,) + outputs[2:]

        return outputs  # logits, (hidden_states), (attentions)


max_seq_length = 128
labels = ['azure-web-app-service', 'azure-storage',
    'azure-devops', 'azure-virtual-machine', 'azure-functions']


def init():
    global tokenizer, model
    # os.path.join(os.getenv('AZUREML_MODEL_DIR'), 'azure-service-classifier')
    tokenizer = BertTokenizer.from_pretrained('bert-base-cased')
    model_dir = os.path.join(os.getenv('AZUREML_MODEL_DIR'), 'model')
    model = TFBertForMultiClassification \
        .from_pretrained(model_dir, num_labels=len(labels))
    print("hello from the reloaded script")

def run(raw_data):

    # Encode inputs using tokenizer
    inputs = tokenizer.encode_plus(
        json.loads(raw_data)['text'],
        add_special_tokens=True,
        max_length=max_seq_length
    )
    input_ids, token_type_ids = inputs["input_ids"], inputs["token_type_ids"]

    # The mask has 1 for real tokens and 0 for padding tokens.
    # Only real tokens are attended to.
    attention_mask = [1] * len(input_ids)

    # Zero-pad up to the sequence length.
    padding_length = max_seq_length - len(input_ids)
    input_ids = input_ids + ([0] * padding_length)
    attention_mask = attention_mask + ([0] * padding_length)
    token_type_ids = token_type_ids + ([0] * padding_length)

    # Make prediction
    predictions = model.predict({
        'input_ids': tf.convert_to_tensor([input_ids], dtype=tf.int32),
        'attention_mask': tf.convert_to_tensor(
            [attention_mask],
            dtype=tf.int32),
        'token_type_ids': tf.convert_to_tensor(
            [token_type_ids], 
            dtype=tf.int32)
    })

    result = {
        'prediction': str(labels[predictions[0].argmax().item()]),
        'probability': str(predictions[0].max())
    }

    print(result)
    return result


init()
run(json.dumps({
    'text': 'My VM is not working'
}))


Overwriting score.py
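
The padding step inside `run()` can be tried on its own, without loading the tokenizer or the model. The following is a minimal sketch of that logic only: `pad_encoded_inputs` is a hypothetical helper name, and the token IDs in the example are made-up values, not real tokenizer output.

```python
from typing import Dict, List


def pad_encoded_inputs(input_ids: List[int],
                       token_type_ids: List[int],
                       max_seq_length: int) -> Dict[str, List[int]]:
    """Zero-pad encoded inputs and build the matching attention mask."""
    if len(input_ids) > max_seq_length:
        # score.py relies on the tokenizer's max_length to avoid this case.
        raise ValueError("input is longer than max_seq_length")

    # 1 marks a real token, 0 marks padding.
    attention_mask = [1] * len(input_ids)
    padding_length = max_seq_length - len(input_ids)

    return {
        'input_ids': input_ids + [0] * padding_length,
        'attention_mask': attention_mask + [0] * padding_length,
        'token_type_ids': token_type_ids + [0] * padding_length,
    }


# Made-up token IDs, for illustration only.
padded = pad_encoded_inputs([101, 1422, 102], [0, 0, 0], max_seq_length=8)
print(padded['attention_mask'])  # [1, 1, 1, 0, 0, 0, 0, 0]
```

All three lists end up with the same length (`max_seq_length`), which is what lets `run()` wrap each of them in a batch of one before calling `model.predict`.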


In [33]:
local_service.reload()

Container has been successfully cleaned up.
Starting Docker container...
Docker container running.


### Updating Webservice
If you do need to rebuild the image, for instance to add a new Conda or pip package, you will have to call `update()` instead of `reload()` (see below).

```python
local_service.update(models=[loaded_model], 
                     image_config=None, 
                     deployment_config=None, 
                     wait=False, inference_config=None)
```

### View service Logs (Debug, when something goes wrong)
>**Tip: If something goes wrong with the deployment, the first thing to look at is the logs from the service. Run the following cell to print them.**

You should see the phrase **"hello from the reloaded script"** in the logs, because we added it to the script when we did a service reload.

In [35]:
import pprint
pp = pprint.PrettyPrinter(indent=4)
pp.pprint(local_service.get_logs())

('/bin/bash: '
 '/azureml-envs/azureml_acb89c589d9589e8a1eaf57a0759fafe/lib/libtinfo.so.5: no '
 'version information available (required by /bin/bash)\n'
 '/bin/bash: '
 '/azureml-envs/azureml_acb89c589d9589e8a1eaf57a0759fafe/lib/libtinfo.so.5: no '
 'version information available (required by /bin/bash)\n'
 '/bin/bash: '
 '/azureml-envs/azureml_acb89c589d9589e8a1eaf57a0759fafe/lib/libtinfo.so.5: no '
 'version information available (required by /bin/bash)\n'
 '/bin/bash: '
 '/azureml-envs/azureml_acb89c589d9589e8a1eaf57a0759fafe/lib/libtinfo.so.5: no '
 'version information available (required by /bin/bash)\n'
 '2019-10-25T19:40:01,073342513+00:00 - rsyslog/run \n'
 '2019-10-25T19:40:01,073431413+00:00 - gunicorn/run \n'
 '2019-10-25T19:40:01,073570613+00:00 - iot-server/run \n'
 'bash: '
 '/azureml-envs/azureml_acb89c589d9589e8a1eaf57a0759fafe/lib/libtinfo.so.5: no '
 'version information available (required by bash)\n'
 '2019-10-25T19:40:01,079037511+00:00 - nginx/run \n'
 '/usr/sb
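
Scanning a long log by eye for one phrase is tedious. As a rough sketch, the lines that contain it could be filtered out instead. `find_log_lines` is a hypothetical helper, the sample log text below is an illustrative stand-in, and the sketch assumes `get_logs()` returns the log as a single string, as the output above suggests.

```python
from typing import List


def find_log_lines(logs: str, phrase: str) -> List[str]:
    """Return the log lines that contain the given phrase."""
    return [line for line in logs.splitlines() if phrase in line]


# Illustrative stand-in for the string returned by get_logs().
sample_logs = (
    "2019-10-25T19:40:01,073342513+00:00 - rsyslog/run \n"
    "2019-10-25T19:40:01,073431413+00:00 - gunicorn/run \n"
    "hello from the reloaded script\n"
)

matches = find_log_lines(sample_logs, "hello from the reloaded script")
print(len(matches), "matching line(s)")
```

In the notebook, the first argument would be `local_service.get_logs()` rather than the sample string.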

## Deploy in ACI
Estimated time to complete: **about 3-7 minutes**

Configure the image and deploy. The following code goes through these steps:

* Build an image using:
   * The scoring file (`score.py`)
   * The environment file (`myenv.yml`)
   * The model file
* Define [ACI Deployment Configuration](https://docs.microsoft.com/en-us/python/api/azureml-core/azureml.core.webservice.aciwebservice?view=azure-ml-py#deploy-configuration-cpu-cores-none--memory-gb-none--tags-none--properties-none--description-none--location-none--auth-enabled-none--ssl-enabled-none--enable-app-insights-none--ssl-cert-pem-file-none--ssl-key-pem-file-none--ssl-cname-none-)
* Send the image to the ACI container.
* Start up a container in ACI using the image.
* Get the web service HTTP endpoint.

In [None]:
%%time
from azureml.core.webservice import AciWebservice, Webservice
from azureml.exceptions import WebserviceException

# Create a deployment configuration that specifies the number of CPU cores and
# gigabytes of RAM needed for your ACI container.
# If you need more later, you will have to recreate the image and redeploy the service.
aciconfig = AciWebservice.deploy_configuration(cpu_cores=2,
                                               memory_gb=4,
                                               tags={"model": "BERT", "method": "tensorflow"},
                                               description='Predict Stack Overflow tags with BERT')

aci_service_name = 'asc-aciservice'

try:
    # The ACI service name needs to be unique in the subscription, so look up
    # any existing service with this name and delete it before redeploying.
    aci_service = Webservice(ws, name=aci_service_name)
    if aci_service:
        aci_service.delete()
except WebserviceException as e:
    # Raised, for example, when no service with this name exists yet.
    print("No existing service was deleted:", e)

aci_service = Model.deploy(ws, aci_service_name, [model], inference_config, aciconfig)

aci_service.wait_for_deployment(True)
print(aci_service.state)

This is the scoring web service endpoint:

In [37]:
print(aci_service.scoring_uri)

http://78bc80a2-7f3e-4fa4-879f-acaec24f3296.westus2.azurecontainer.io/score


### Test the deployed model

Let's test the deployed model. Pick a sample question about an Azure issue and send it to the web service. Note that here we are using the `run` API in the SDK to invoke the service. You can also make raw HTTP calls using any HTTP tool, such as curl.

After the invocation, we print the returned predictions.

In [38]:
%%time
import json
raw_data = json.dumps({
    'text': 'My VM is not working'
})

prediction = aci_service.run(input_data=raw_data)
print(prediction)

{'prediction': 'azure-virtual-machine', 'probability': '0.98652285'}
CPU times: user 24.2 ms, sys: 1.08 ms, total: 25.3 ms
Wall time: 44.8 s
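
Note that the scoring script returns the probability as a string, so a caller that wants to compare it against a threshold has to convert it first. A small sketch of that step follows; `parse_prediction` and the 0.5 threshold are illustrative assumptions, not part of the workshop code.

```python
from typing import Dict, Tuple


def parse_prediction(prediction: Dict[str, str],
                     min_probability: float = 0.5) -> Tuple[str, float, bool]:
    """Return the label, the probability as a float, and a confidence flag."""
    label = prediction['prediction']
    probability = float(prediction['probability'])
    return label, probability, probability >= min_probability


# The response printed above.
response = {'prediction': 'azure-virtual-machine', 'probability': '0.98652285'}
label, probability, confident = parse_prediction(response)
print(label, probability, confident)  # azure-virtual-machine 0.98652285 True
```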


### View service Logs (Debug, when something goes wrong)
>**Tip: If something goes wrong with the deployment, the first thing to look at is the logs from the service. Run the following cell to print them.**

In [39]:
import pprint
pp = pprint.PrettyPrinter(indent=4)
pp.pprint(aci_service.get_logs())

('2019-10-25T20:11:57,227785474+00:00 - rsyslog/run \n'
 '/bin/bash: '
 '/azureml-envs/azureml_acb89c589d9589e8a1eaf57a0759fafe/lib/libtinfo.so.5: no '
 'version information available (required by /bin/bash)\n'
 '/bin/bash: '
 '/azureml-envs/azureml_acb89c589d9589e8a1eaf57a0759fafe/lib/libtinfo.so.5: no '
 'version information available (required by /bin/bash)\n'
 '/bin/bash: '
 '/azureml-envs/azureml_acb89c589d9589e8a1eaf57a0759fafe/lib/libtinfo.so.5: no '
 'version information available (required by /bin/bash)\n'
 '/bin/bash: '
 '/azureml-envs/azureml_acb89c589d9589e8a1eaf57a0759fafe/lib/libtinfo.so.5: no '
 'version information available (required by /bin/bash)\n'
 'bash: '
 '/azureml-envs/azureml_acb89c589d9589e8a1eaf57a0759fafe/lib/libtinfo.so.5: no '
 'version information available (required by bash)\n'
 '2019-10-25T20:11:57,229030771+00:00 - iot-server/run \n'
 '2019-10-25T20:11:57,230760367+00:00 - gunicorn/run \n'
 '2019-10-25T20:11:57,246894727+00:00 - nginx/run \n'
 '/usr/sb

## Deploy in AKS (Single Node)

Estimated time to complete: **about 15-25 minutes** (10-15 minutes for AKS provisioning and 5-10 minutes to deploy the service)

Configure the image and deploy. The following code goes through these steps:

* Provision a Dev Test AKS Cluster
* Build an image using:
   * The scoring file (`score.py`)
   * The environment file (`myenv.yml`)
   * The model file
* Define [AKS Provisioning Configuration](https://docs.microsoft.com/en-us/python/api/azureml-core/azureml.core.compute.akscompute?view=azure-ml-py#provisioning-configuration-agent-count-none--vm-size-none--ssl-cname-none--ssl-cert-pem-file-none--ssl-key-pem-file-none--location-none--vnet-resourcegroup-name-none--vnet-name-none--subnet-name-none--service-cidr-none--dns-service-ip-none--docker-bridge-cidr-none--cluster-purpose-none-)
* Provision an AKS Cluster
* Define [AKS Deployment Configuration](https://docs.microsoft.com/en-us/python/api/azureml-core/azureml.core.webservice.akswebservice?view=azure-ml-py#deploy-configuration-autoscale-enabled-none--autoscale-min-replicas-none--autoscale-max-replicas-none--autoscale-refresh-seconds-none--autoscale-target-utilization-none--collect-model-data-none--auth-enabled-none--cpu-cores-none--memory-gb-none--enable-app-insights-none--scoring-timeout-ms-none--replica-max-concurrent-requests-none--max-request-wait-time-none--num-replicas-none--primary-key-none--secondary-key-none--tags-none--properties-none--description-none--gpu-cores-none--period-seconds-none--initial-delay-seconds-none--timeout-seconds-none--success-threshold-none--failure-threshold-none--namespace-none--token-auth-enabled-none-)
* Send the image to the AKS cluster.
* Start up a container in AKS using the image.
* Get the web service HTTP endpoint.

#### Provisioning Cluster

In [None]:
from azureml.core.compute import AksCompute, ComputeTarget

# Provision a dev/test cluster by setting cluster_purpose.
# To use the default configuration instead, call
# AksCompute.provisioning_configuration() with no arguments
# (you can also provide other parameters to customize it).
prov_config = AksCompute.provisioning_configuration(cluster_purpose = AksCompute.ClusterPurpose.DEV_TEST)

aks_name = 'myaks'
# Create the cluster
aks_target = ComputeTarget.create(workspace = ws,
                                    name = aks_name,
                                    provisioning_configuration = prov_config)

# Wait for the create process to complete
aks_target.wait_for_completion(show_output = True)

#### Deploying the model

In [None]:
from azureml.core.webservice import AksWebservice, Webservice
from azureml.core.model import Model

aks_target = AksCompute(ws,"myaks")

# Create a deployment configuration that specifies the number of CPU cores and
# gigabytes of RAM needed for the service on your cluster.
# If you need more later, you will have to recreate the image and redeploy the service.
deployment_config = AksWebservice.deploy_configuration(cpu_cores = 2, memory_gb = 4)

aks_service = Model.deploy(ws, "myservice", [model], inference_config, deployment_config, aks_target)
aks_service.wait_for_deployment(show_output = True)
print(aks_service.state)

### Test the deployed model

#### Using the Azure SDK service call

We can use the Azure ML SDK to call the service with a single function call.

In [None]:
%%time
import json
raw_data = json.dumps({
    'text': 'My VM is not working'
})

prediction = aks_service.run(input_data=raw_data)
print(prediction)

This is the scoring web service endpoint:

In [None]:
print(aks_service.scoring_uri)

#### Using HTTP call

We will build a Jupyter widget that constructs a raw HTTP request and sends it to the service.

#### Test Web Service with HTTP call

In [None]:
import ipywidgets as widgets
from ipywidgets import Layout, Button, Box, FloatText, Textarea, Dropdown, Label, IntSlider, VBox

from IPython.display import display


import json
import requests

text = widgets.Text(
    value='',
    placeholder='Type a query',
    description='Question:',
    disabled=False
)

button = widgets.Button(description="Get Tag!")
output = widgets.Output()

items = [text, button]

box_layout = Layout(display='flex',
                    flex_flow='row',
                    align_items='stretch',
                    width='70%')

box_auto = Box(children=items, layout=box_layout)


def on_button_clicked(b):
    with output:
        # Serialize with json.dumps so that quotes and other special
        # characters in the question are escaped correctly.
        input_data = json.dumps({'text': text.value})
        headers = {'Content-Type': 'application/json'}
        # Note: this posts to the local service deployed earlier.
        resp = requests.post(local_service.scoring_uri, input_data, headers=headers)

        print("="*10)
        print("Question:", text.value)
        print("POST to url", local_service.scoring_uri)
        print("Prediction:", resp.text)
        print("="*10)

button.on_click(on_button_clicked)

#Display the GUI
VBox([box_auto, output])

You can also construct a raw HTTP request and send it to the service without a widget.

In [None]:
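import json
import requests

query = 'My VM is not working'
# Serialize with json.dumps so that special characters in the query are escaped correctly.
input_data = json.dumps({'text': query})
headers = {'Content-Type': 'application/json'}
resp = requests.post(local_service.scoring_uri, input_data, headers=headers)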

print("="*10)
print("Question:", query)
print("POST to url", local_service.scoring_uri)
print("Prediction:", resp.text)
print("="*10)
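
The cells above post to `local_service`. When calling the AKS endpoint (`aks_service.scoring_uri`) directly, the request will usually also need an `Authorization` header. The following is a minimal sketch under two assumptions: that key-based authentication is enabled on the AKS service, and that the key is obtained from `aks_service.get_keys()`. `build_scoring_request` is a hypothetical helper that uses only the standard library and builds the request without sending it; the URI and key are placeholders.

```python
import json
import urllib.request
from typing import Optional


def build_scoring_request(scoring_uri: str,
                          text: str,
                          key: Optional[str] = None) -> urllib.request.Request:
    """Build (but do not send) a POST request for a scoring endpoint."""
    headers = {'Content-Type': 'application/json'}
    if key is not None:
        # Assumption: the service expects the key as a Bearer token.
        headers['Authorization'] = 'Bearer ' + key
    body = json.dumps({'text': text}).encode('utf-8')
    return urllib.request.Request(scoring_uri, data=body,
                                  headers=headers, method='POST')


# Placeholder URI and key, for illustration only.
request = build_scoring_request('http://localhost:8890/score',
                                'My VM is not working',
                                key='<primary-key>')
print(request.get_method(), request.full_url)
print(json.loads(request.data))
```

To actually send the request, pass it to `urllib.request.urlopen(request)` and read the response body.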

### View service Logs (Debug, when something goes wrong)
>**Tip: If something goes wrong with the deployment, the first thing to look at is the logs from the service. Run the following cell to print them.**

In [None]:
import pprint
pp = pprint.PrettyPrinter(indent=4)
pp.pprint(aks_service.get_logs())

## Summary of workspace
Let's look at the workspace after the web services were deployed. You should see:

* the registered model, listed with its name and ID
* the ACI and AKS web services, each listed with its scoring URI

In [None]:
models = ws.models
for name, model in models.items():
    print("Model: {}, ID: {}".format(name, model.id))
    
webservices = ws.webservices
for name, webservice in webservices.items():
    print("Webservice: {}, scoring URI: {}".format(name, webservice.scoring_uri))

## Delete the web services to clean up
You can delete the local, ACI, and AKS deployments with a simple delete API call on each service.

In [None]:
local_service.delete()
aci_service.delete()
aks_service.delete()
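
If one of the `delete()` calls fails, for example because that service was never deployed in this session, the calls after it are skipped and the remaining services keep running. One possible way around this is sketched below. `delete_all` is a hypothetical helper that works with any objects exposing a `delete()` method; the `_FakeService` class exists only so that the sketch runs outside the notebook.

```python
from typing import Iterable, List, Tuple


def delete_all(services: Iterable[Tuple[str, object]]) -> List[str]:
    """Call delete() on every service; return the names that failed."""
    failed = []
    for name, service in services:
        try:
            service.delete()
        except Exception as exc:  # keep going so later services still get deleted
            print("Could not delete {}: {}".format(name, exc))
            failed.append(name)
    return failed


class _FakeService:
    """Stand-in for a deployed web service, for illustration only."""

    def __init__(self, fail: bool = False):
        self.fail = fail
        self.deleted = False

    def delete(self):
        if self.fail:
            raise RuntimeError("service not found")
        self.deleted = True


fake_local, fake_aci = _FakeService(fail=True), _FakeService()
failed = delete_all([("local", fake_local), ("aci", fake_aci)])
print("Failed:", failed)
```

In the notebook, the call would be `delete_all([("local", local_service), ("aci", aci_service), ("aks", aks_service)])`.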