# Using batch deployments for NLP processing

Importing the required libraries. This notebook requires:

- `azure-ai-ml`
- `mlflow`
- `azureml-mlflow`
- `numpy`
- `pandas`
- `transformers`
- `datasets`
- `tensorflow`

In [None]:
from azure.ai.ml import MLClient, Input
from azure.ai.ml.entities import (
    BatchEndpoint,
    BatchDeployment,
    Model,
    AmlCompute,
    Data,
    BatchRetrySettings,
    CodeConfiguration,
    Environment,
)
from azure.ai.ml.constants import AssetTypes, BatchDeploymentOutputAction
from azure.identity import DefaultAzureCredential

## Accessing the Azure Machine Learning workspace

In [None]:
subscription_id = "<subscription>"
resource_group = "<resource-group>"
workspace = "<workspace>"

ml_client = MLClient(
    DefaultAzureCredential(), subscription_id, resource_group, workspace
)

## About the model

Let's review how the model is built. The model uses the `transformers` library from Hugging Face with `facebook/bart-large-cnn`, a pretrained model based on the BART architecture ([BART: Denoising Sequence-to-Sequence Pre-training for Natural Language Generation, Translation, and Comprehension](https://arxiv.org/abs/1910.13461)), and runs on a TensorFlow backend. This model has the following constraint that is important to keep in mind for deployment:

* It works with sequences of up to 1024 tokens, so longer texts need to be truncated.

In [None]:
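from transformers import pipeline, AutoTokenizer, TFBartForConditionalGeneration

# Load the tokenizer and the TensorFlow weights explicitly so the pipeline uses both
pretrained_name = "facebook/bart-large-cnn"
tokenizer = AutoTokenizer.from_pretrained(pretrained_name)
bart_model = TFBartForConditionalGeneration.from_pretrained(pretrained_name)

summarizer = pipeline("summarization", model=bart_model, tokenizer=tokenizer)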

Testing if the model works:

Let's save this model locally:

In [None]:
model_local_path = "bart-text-summarization/model"
summarizer.save_pretrained(model_local_path)

## Registering the model

We need to register the model in order to use it with Azure Machine Learning:

In [None]:
model_name = "bart-text-summarization"

In [None]:
if not any(filter(lambda m: m.name == model_name, ml_client.models.list())):
    print(f"Model {model_name} is not registered. Creating...")
    model = ml_client.models.create_or_update(
        Model(name=model_name, path=model_local_path, type=AssetTypes.CUSTOM_MODEL)
    )

Let's get a reference to the model:

In [None]:
model = ml_client.models.get(name=model_name, label="latest")

## Creating a scoring script to work with the model

In [None]:
%%writefile bart-text-summarization/code/transformer_scorer.py

import os
import numpy as np
from transformers import pipeline, AutoTokenizer, TFBartForConditionalGeneration
from datasets import load_dataset

def init():
    global model
    global tokenizer

    # AZUREML_MODEL_DIR is an environment variable created during deployment
    model_path = os.path.join(os.environ["AZUREML_MODEL_DIR"], "model")

    # load the model
    tokenizer = AutoTokenizer.from_pretrained(model_path, truncation=True, max_length=1024)
    model = TFBartForConditionalGeneration.from_pretrained(model_path)

def run(mini_batch):
    resultList = []

    ds = load_dataset('csv', data_files={ 'score': mini_batch})
    for text in ds['score']['text']:
        # perform inference
        input_ids = tokenizer.batch_encode_plus([text], truncation=True, padding=True, max_length=1024, return_tensors="tf")['input_ids']
        summary_ids = model.generate(input_ids, max_length=130, min_length=30, do_sample=False)
        summaries = [tokenizer.decode(s, skip_special_tokens=True, clean_up_tokenization_spaces=False) for s in summary_ids]

        # Get results:
        resultList.append(summaries[0])

    return resultList

> The tokenizer is configured to truncate the text because the model has a limit of 1024 tokens.
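
To make the scoring contract concrete, the sketch below mimics how a batch deployment drives a scoring script: `init()` runs once per worker, then `run(mini_batch)` receives a list of file paths and returns one result per input row. The summarizer is a hypothetical stand-in (`fake_summarize`, which keeps only the first sentence) so the sketch runs without the BART model. The file name and sample rows are also assumptions.

```python
import csv
import os
import tempfile

summarize = None  # set by init(), mirroring the global model in the real script


def fake_summarize(text):
    # Hypothetical stand-in for BART: keep only the first sentence.
    return text.split(". ")[0].rstrip(".") + "."


def init():
    # Called once per worker before any mini-batch is processed.
    global summarize
    summarize = fake_summarize


def run(mini_batch):
    # mini_batch is a list of file paths; return one result per input row.
    results = []
    for file_path in mini_batch:
        with open(file_path, newline="", encoding="utf-8") as f:
            for row in csv.DictReader(f):
                results.append(summarize(row["text"]))
    return results


# Simulate one mini-batch containing a single CSV file with a `text` column.
with tempfile.TemporaryDirectory() as tmp:
    sample = os.path.join(tmp, "sample.csv")
    with open(sample, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(["text"])
        writer.writerow(["The bill amends the tax code. It also adds a credit."])
        writer.writerow(["This act funds rural broadband. Grants are capped."])
    init()
    summaries = run([sample])

print(summaries)
```

Running the script logic locally like this, with the real model swapped back in, is a cheap way to catch column-name or path mistakes before the deployment is created.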

## Creating the deployment

First, let's create the endpoint that is going to host the batch deployments. Remember that each endpoint can host multiple deployments at any time, but only one of them is the default:

In [None]:
endpoint_name = "text-summarization-batch"
endpoint = BatchEndpoint(
    name=endpoint_name,
    description="A batch service to perform text summarization of content in CSV files",
)

In [None]:
# Wait for the endpoint to be created before adding deployments to it
ml_client.batch_endpoints.begin_create_or_update(endpoint).result()

Batch endpoints can run on any Azure ML compute that already exists in the workspace. That means that multiple batch deployments can share the same compute infrastructure. In this example, we are going to work on an AzureML compute cluster called `cpu-cluster`. Let's verify the compute exists on the workspace or create it otherwise.

In [None]:
compute_name = "cpu-cluster"
if not any(filter(lambda m: m.name == compute_name, ml_client.compute.list())):
    print(f"Compute {compute_name} is not created. Creating...")
    compute_cluster = AmlCompute(
        name=compute_name, description="amlcompute", min_instances=0, max_instances=5
    )
    ml_client.begin_create_or_update(compute_cluster)

Compute may take time to be created. Let's wait for it:

In [None]:
from time import sleep

print("Waiting for compute", end="")
while ml_client.compute.get(name=compute_name).provisioning_state == "Creating":
    sleep(1)
    print(".", end="")

print(" [DONE]")
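
The loop above waits indefinitely and only checks for the `"Creating"` state. A minimal sketch of a bounded wait follows. The helper `wait_for_state` and the simulated `fake_states` sequence are hypothetical names introduced here so the block runs without a workspace.

```python
import time


def wait_for_state(get_state, pending=("Creating",), timeout_s=600.0, poll_s=5.0):
    """Poll get_state() until it leaves the pending states or the timeout expires."""
    deadline = time.monotonic() + timeout_s
    while True:
        state = get_state()
        if state not in pending:
            return state
        if time.monotonic() >= deadline:
            raise TimeoutError(f"Still {state!r} after {timeout_s} seconds")
        time.sleep(poll_s)


# Simulated provisioning states (an assumption for illustration only).
fake_states = iter(["Creating", "Creating", "Succeeded"])
final_state = wait_for_state(lambda: next(fake_states), poll_s=0.01)
print(final_state)
```

In the notebook, the callable would presumably be `lambda: ml_client.compute.get(name=compute_name).provisioning_state`, with a longer polling interval than one second to avoid unnecessary requests.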

Let's create the environment. Our model runs on `TensorFlow`, so we start from an Azure Machine Learning base image that already has TensorFlow installed and add the remaining dependencies of the scoring script through a conda file.

In [None]:
environment = Environment(
    conda_file="./bart-text-summarization/environment/conda.yml",
    image="mcr.microsoft.com/azureml/tensorflow-2.4-ubuntu18.04-py37-cpu-inference:latest",
)

Let's create a deployment under the given endpoint.

In [None]:
deployment = BatchDeployment(
    name="text-summarization-hfbart",
    description="A text summarization deployment implemented with HuggingFace and BART architecture",
    endpoint_name=endpoint.name,
    model=model,
    environment=environment,
    code_configuration=CodeConfiguration(
        code="./bart-text-summarization/code/",
        scoring_script="transformer_scorer.py",
    ),
    compute=compute_name,
    instance_count=2,
    max_concurrency_per_instance=1,
    mini_batch_size=1,
    output_action=BatchDeploymentOutputAction.APPEND_ROW,
    output_file_name="predictions.csv",
    retry_settings=BatchRetrySettings(max_retries=3, timeout=3000),
    logging_level="info",
)
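
How `instance_count`, `max_concurrency_per_instance`, and `mini_batch_size` interact can be sketched with simple arithmetic. For file inputs, `mini_batch_size` is the number of files passed to each `run()` call, and the number of parallel workers is `instance_count * max_concurrency_per_instance`. The helper `plan_batches` and the file count of 10 are hypothetical. This is a rough model that ignores scheduling overhead and retries.

```python
import math


def plan_batches(n_files, mini_batch_size, instance_count, max_concurrency_per_instance):
    # Each run() call receives up to mini_batch_size files.
    n_mini_batches = math.ceil(n_files / mini_batch_size)
    # Every instance runs max_concurrency_per_instance scoring processes.
    workers = instance_count * max_concurrency_per_instance
    # Lower bound on sequential "waves" if all mini-batches took equal time.
    waves = math.ceil(n_mini_batches / workers)
    return n_mini_batches, workers, waves


# Values from the deployment above, with a hypothetical folder of 10 CSV files.
plan = plan_batches(
    n_files=10, mini_batch_size=1, instance_count=2, max_concurrency_per_instance=1
)
print(plan)  # (10, 2, 5)
```

With one file per mini-batch and a single process per instance, each worker holds one copy of the model in memory, which is a conservative choice for a large model such as BART.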

In [None]:
# Wait for the deployment to finish before setting it as the default
ml_client.batch_deployments.begin_create_or_update(deployment).result()

Let's update the default deployment name in the endpoint:

In [None]:
endpoint = ml_client.batch_endpoints.get(endpoint_name)
endpoint.defaults.deployment_name = deployment.name
ml_client.batch_endpoints.begin_create_or_update(endpoint).result()

We can see the endpoint URL as follows:

In [None]:
endpoint.scoring_uri

## Testing the endpoint

Once the deployment is created, it is ready to receive jobs. Let's first register a data asset so we can run the job against it. This data asset is a folder of CSV files containing a sample of the billsum dataset, stored locally in `bart-text-summarization/data/`:

In [None]:
data_path = "bart-text-summarization/data/"
dataset_name = "billsummary-small"

billsummary_data = Data(
    path=data_path,
    type=AssetTypes.URI_FOLDER,
    description="A sample of the billsum dataset for text summarization, in CSV file format",
    name=dataset_name,
)

ml_client.data.create_or_update(billsummary_data)

In [None]:
billsummary_data = ml_client.data.get(name=dataset_name, label="latest")
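
Because the scoring script reads the `text` column of every CSV file, it can help to check the local data folder before submitting a job. The sketch below is one way to do that with the standard library. The helper `csvs_missing_column` and the two demo files are hypothetical and are created in a temporary folder rather than in the real data path.

```python
import csv
import os
import tempfile


def csvs_missing_column(folder, column="text"):
    """Return the CSV files in folder whose header lacks the given column."""
    missing = []
    for name in sorted(os.listdir(folder)):
        if not name.lower().endswith(".csv"):
            continue
        with open(os.path.join(folder, name), newline="", encoding="utf-8") as f:
            header = next(csv.reader(f), [])
        if column not in header:
            missing.append(name)
    return missing


# Demo on a temporary folder with one valid and one invalid file (both made up).
with tempfile.TemporaryDirectory() as folder:
    with open(os.path.join(folder, "good.csv"), "w", newline="", encoding="utf-8") as f:
        csv.writer(f).writerows([["text"], ["A bill to amend the tax code."]])
    with open(os.path.join(folder, "bad.csv"), "w", newline="", encoding="utf-8") as f:
        csv.writer(f).writerows([["body"], ["No text column here."]])
    problems = csvs_missing_column(folder)

print(problems)  # ['bad.csv']
```

In the notebook, the same check could be pointed at `data_path`; an empty list would suggest every file has the column the scoring script expects.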

Let's use this data as an input for the job:

In [None]:
input = Input(type=AssetTypes.URI_FOLDER, path=billsummary_data.id)

In [None]:
job = ml_client.batch_endpoints.invoke(
    endpoint_name=endpoint.name, input=input
)

You can use the returned job object to check the status of the job:

In [None]:
ml_client.jobs.get(job.name)