## Text Classification Inference using Batch Endpoints

This sample shows how to deploy `text-classification` type models to a batch endpoint for inference.

### Task
`text-classification` is a generic task type that can be used for scenarios such as sentiment analysis, emotion detection, grammar checking, and spam filtering. In this example, we test for entailment vs. contradiction: given a premise sentence and a hypothesis sentence, the task is to predict whether the premise entails the hypothesis (entailment), contradicts the hypothesis (contradiction), or neither (neutral).

### Inference data
The Multi-Genre Natural Language Inference (MNLI) Corpus is a crowdsourced collection of sentence pairs with textual entailment annotations. The [MNLI](https://huggingface.co/datasets/glue) dataset is a subset of the larger [General Language Understanding Evaluation](https://gluebenchmark.com/) (GLUE) dataset. A copy of this dataset is available in the [glue-mnli-dataset](./glue-mnli-dataset/) folder.

### Model
Look for models tagged with `text-classification` in the system registry. The tag alone is not sufficient: check that the model is specifically fine-tuned for entailment vs. contradiction by studying the model card and looking at the model's input/output samples or signature. In this notebook, we use the `microsoft-deberta-base-mnli` model.

  
### Outline
* Set up pre-requisites.
* Pick a model to deploy.
* Prepare data for inference. 
* Deploy the model for batch inference.
* Run a batch inference job.
* Review inference predictions.
* Clean up resources.


### 1. Set up pre-requisites
* Install dependencies.
* Connect to an AzureML workspace. Learn more at [set up SDK authentication](https://learn.microsoft.com/en-us/azure/machine-learning/how-to-setup-authentication?tabs=sdk). Replace `<WORKSPACE_NAME>`, `<RESOURCE_GROUP>`, and `<SUBSCRIPTION_ID>` below.
* Connect to `azureml` system registry.
* Create or update compute.

In [None]:
# Import packages used by the following code snippets
import csv
import json
import os
import time

import pandas as pd

from azure.ai.ml import Input, MLClient
from azure.ai.ml.constants import AssetTypes
from azure.identity import DefaultAzureCredential, InteractiveBrowserCredential
from azure.ai.ml.entities import (
    AmlCompute,
    BatchDeployment,
    BatchEndpoint,
    BatchRetrySettings,
    Model,
)

In [None]:
subscription_id = "<SUBSCRIPTION_ID>"
resource_group_name = "<RESOURCE_GROUP>"
workspace_name = "<WORKSPACE_NAME>"

#### Connect to workspace and registry using ML clients.

In [None]:
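try:
    credential = DefaultAzureCredential()
    # Check that the default credential can actually obtain a token
    credential.get_token("https://management.azure.com/.default")
except Exception:
    # Fall back to interactive browser login if the default credential fails
    credential = InteractiveBrowserCredential()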

workspace_ml_client = MLClient(
    credential,
    subscription_id=subscription_id,
    resource_group_name=resource_group_name,
    workspace_name=workspace_name,
)
# The models, fine tuning pipelines, and environments are available in the AzureML system registry, "azureml"
registry_ml_client = MLClient(credential, registry_name="azureml")

#### Create a compute cluster.
Use the model card from the AzureML system registry to check the minimum required inferencing SKU, referenced as `size` below. If you already have a sufficient compute cluster, you can simply define the name in `compute_name` in the following code block. 

In [None]:
compute_name = "cpu-cluster"

In [None]:
compute_cluster = AmlCompute(
    name=compute_name,
    description="An AML compute cluster",
    size="Standard_DS3_V2",
    min_instances=0,
    max_instances=3,
    idle_time_before_scale_down=120,  # seconds
)

# Wait for the cluster to be provisioned before deploying to it
workspace_ml_client.begin_create_or_update(compute_cluster).result()

### 2. Pick a model to deploy

Browse models in the Model Catalog in AzureML Studio, filtering by the `text-classification` task. In this example, we use the `microsoft-deberta-base-mnli` model. If you have opened this notebook for a different model, replace the model name and version accordingly.

In [None]:
model_name = "microsoft-deberta-base-mnli"
model_version = "1"
foundation_model = registry_ml_client.models.get(model_name, model_version)
print(
    f"Using model name: {foundation_model.name}, version: {foundation_model.version}, id: {foundation_model.id} for inferencing."
)

### 3. Prepare data for inference

A copy of the MNLI dataset is available in the [glue-mnli-dataset](./glue-mnli-dataset/) folder. The next few cells show basic data preparation:
* Visualize some data rows.
* Replace numerical categories in the data with the actual string labels. This mapping is available in [./glue-mnli-dataset/label.json](./glue-mnli-dataset/label.json). This step is needed because the selected model returns labels such as `CONTRADICTION` and `ENTAILMENT` when running prediction. If the labels in your ground truth data are left as `0`, `1`, `2`, etc., they will not match the prediction labels returned by the model.
* The dataset contains `premise` and `hypothesis` as two separate columns. However, the model expects a single string for prediction in the format `[CLS] <premise text> [SEP] <hypothesis text> [SEP]`, so we merge the columns and drop the originals.
* We want this sample to run quickly, so we save a smaller dataset containing a fraction of the original.
* Since we are using an `mlflow` model, we don't need to write any inference code. However, the inference data must be in a shape that can be used for inference. Specifically, batch inference does not support JSON Lines (`jsonl`) files, but it does support `csv` and `parquet`, so we dump a csv version from the pandas dataframe. In addition, the batch inference csv file must contain only the columns that will be passed to the model as input, and the column header must match the model signature. In our case, the model signature, which can be found in the `MLmodel` file in the model artifacts, expects `input_string` as input.
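
The formatting and CSV requirements above can be sketched with the standard library alone. This is a minimal illustration under stated assumptions, not the notebook's own pipeline: `to_input_string` and `write_batch_csv` are hypothetical helper names, and the sentence pair is made up for the example. Unlike the pandas cells in this notebook, the sketch writes no index column.

```python
import csv
import io


def to_input_string(premise: str, hypothesis: str) -> str:
    """Merge a premise/hypothesis pair into the single-string format described above."""
    return f"[CLS] {premise} [SEP] {hypothesis} [SEP]"


def write_batch_csv(rows, stream, column="input_string"):
    """Write a one-column CSV whose header matches the model signature's input name."""
    writer = csv.writer(stream, quoting=csv.QUOTE_ALL)
    writer.writerow([column])
    for premise, hypothesis in rows:
        writer.writerow([to_input_string(premise, hypothesis)])


# Hypothetical sample pair, used only to show the resulting layout.
pairs = [("A man is playing a guitar.", "A person is making music.")]
buffer = io.StringIO()
write_batch_csv(pairs, buffer)
csv_text = buffer.getvalue()
print(csv_text)
```

Writing to an in-memory buffer keeps the sketch side-effect free; passing an open file handle instead would produce a file on disk.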

In [None]:
# Define directories and filenames as variables
dataset_dir = "glue-mnli-dataset"
training_datafile = "train.jsonl"
label_datafile = "label.json"

batch_dir = "batch"
batch_inputs_dir = os.path.join(batch_dir, "inputs")
batch_input_file = "batch_input.csv"
os.makedirs(batch_dir, exist_ok=True)
os.makedirs(batch_inputs_dir, exist_ok=True)

In the cell below, we load the input file and look at some sample data.

In [None]:
# Load the ./glue-mnli-dataset/train.jsonl file into a pandas dataframe and show the first 5 rows
pd.set_option(
    "display.max_colwidth", 0
)  # Set the max column width to 0 to display the full text
train_df = pd.read_json(os.path.join(".", dataset_dir, training_datafile), lines=True)
train_df.head()

Replace numerical labels with string labels and drop the columns not needed.

In [None]:
# Load the id2label json element of the label.json file into pandas table with keys as 'label' column of int64 type and values as 'label_string' column as string type
with open(os.path.join(dataset_dir, label_datafile)) as f:
    id2label = json.load(f)
    id2label = id2label["id2label"]
    label_df = pd.DataFrame.from_dict(
        id2label, orient="index", columns=["label_string"]
    )
    label_df["label"] = label_df.index.astype("int64")
    label_df = label_df[["label", "label_string"]]

# Join the train dataframe with the id2label dataframe to get the label_string column
train_df = train_df.merge(label_df, on="label", how="left")
# Concatenate the premise and hypothesis columns, with "[CLS]" at the beginning and "[SEP]" in the middle and at the end, to get the text column
train_df["text"] = train_df.apply(
    lambda row: "[CLS] " + row.premise + " [SEP] " + row.hypothesis + " [SEP]", axis=1
)
# Drop the idx, premise, hypothesis and numeric label columns as they are no longer needed
train_df.drop(columns=["idx", "premise", "hypothesis", "label"], inplace=True)
# Rename the label_string column to ground_truth_label
train_df.rename(columns={"label_string": "ground_truth_label"}, inplace=True)

# Save the train_df dataframe to a jsonl file in the ./batch folder with the `cls_sep_` prefix
cls_sep_datafile = os.path.join(batch_dir, "cls_sep_" + training_datafile)
train_df.to_json(cls_sep_datafile, orient="records", lines=True)
train_df.head()

Save a fraction of the input data to files of smaller batches for testing. The MLflow model's signature specifies the input should be a column named `"input_string"`, so rename the transformed `"text"` column.  

In [None]:
batch_df = train_df[["text"]].rename(columns={"text": "input_string"}).sample(frac=0.05)

# Divide this into files of 100 rows each
batch_size_per_predict = 100
for i in range(0, len(batch_df), batch_size_per_predict):
    j = i + batch_size_per_predict
    batch_df[i:j].to_csv(
        os.path.join(batch_inputs_dir, str(i) + batch_input_file), quoting=csv.QUOTE_ALL
    )

# List the input files; os.listdir returns them in arbitrary order,
# so report the first and last file names from the loop indices instead
input_files = os.listdir(batch_inputs_dir)
print(f"0{batch_input_file} to {i}{batch_input_file}.")

### 4. Deploy the model to a batch endpoint
Batch endpoints run batch inference on large volumes of data over a period of time. They receive pointers to data and run jobs asynchronously to process the data in parallel on compute clusters. Batch endpoints store outputs in a data store for further analysis. For more information on batch endpoints and deployments, see [What are batch endpoints?](https://learn.microsoft.com/en-us/azure/machine-learning/concept-endpoints?view=azureml-api-2#what-are-batch-endpoints).

* Create a batch endpoint.
* Create a batch deployment.
* Set the deployment as default; doing so allows invoking the endpoint without specifying the deployment's name.

#### Create the endpoint.

In [None]:
# Endpoint names need to be unique in a region, hence using timestamp to create unique endpoint name
timestamp = int(time.time())
endpoint_name = "text-classification-" + str(timestamp)

endpoint = BatchEndpoint(
    name=endpoint_name,
    description="Batch endpoint for "
    + foundation_model.name
    + ", for text-classification task",
)
workspace_ml_client.begin_create_or_update(endpoint).result()

#### Create the deployment.

In [None]:
deployment_name = "demo"

deployment = BatchDeployment(
    name=deployment_name,
    endpoint_name=endpoint_name,
    model=foundation_model.id,
    compute=compute_name,
    error_threshold=0,
    instance_count=1,
    logging_level="info",
    max_concurrency_per_instance=1,
    mini_batch_size=10,
    output_file_name="predictions.csv",
    retry_settings=BatchRetrySettings(max_retries=3, timeout=300),
)
workspace_ml_client.begin_create_or_update(deployment).result()
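
To reason about how these deployment settings interact with the input files prepared in step 3, a rough sketch follows. It assumes that, for file-based inputs, `mini_batch_size` counts files per mini-batch, and that mini-batches are spread over `instance_count * max_concurrency_per_instance` parallel workers. `plan_mini_batches` is a hypothetical helper, not an SDK call, and the file count is made up.

```python
import math


def plan_mini_batches(n_files, mini_batch_size, instance_count, max_concurrency_per_instance):
    """Estimate how many mini-batches a job creates and how many sequential waves they need."""
    n_mini_batches = math.ceil(n_files / mini_batch_size)
    workers = instance_count * max_concurrency_per_instance
    waves = math.ceil(n_mini_batches / workers)
    return {"mini_batches": n_mini_batches, "workers": workers, "waves": waves}


# Hypothetical file count; the other values mirror the deployment above.
plan = plan_mini_batches(
    n_files=25, mini_batch_size=10, instance_count=1, max_concurrency_per_instance=1
)
print(plan)
```

With a single worker every mini-batch runs one after another, so raising `instance_count` or `max_concurrency_per_instance` is the lever for shortening the job, subject to the limits of the cluster.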

#### Set the deployment as default.

In [None]:
endpoint = workspace_ml_client.batch_endpoints.get(endpoint_name)
endpoint.defaults.deployment_name = deployment_name
workspace_ml_client.begin_create_or_update(endpoint).wait()

endpoint = workspace_ml_client.batch_endpoints.get(endpoint_name)
print(f"The default deployment is {endpoint.defaults.deployment_name}")

### 5. Run a batch inference job
Invoke the batch endpoint with the input parameter pointing to the folder containing the batch inference input. This creates a pipeline job using the default deployment in the endpoint. Wait for the job to complete.

In [None]:
# Named batch_input to avoid shadowing the built-in input()
batch_input = Input(path=batch_inputs_dir, type=AssetTypes.URI_FOLDER)

job = workspace_ml_client.batch_endpoints.invoke(
    endpoint_name=endpoint.name, input=batch_input
)

workspace_ml_client.jobs.stream(job.name)
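
`jobs.stream` blocks while it prints the job logs. If you would rather wait quietly, one alternative is to poll the job status yourself. The sketch below is a generic helper under stated assumptions: `wait_for_job` and `TERMINAL_STATUSES` are hypothetical names, and the set of terminal status strings is an assumption that should be checked against your SDK version.

```python
import time

# Assumed terminal job states; verify against the statuses your SDK reports.
TERMINAL_STATUSES = {"Completed", "Failed", "Canceled"}


def wait_for_job(get_status, poll_seconds=30.0, timeout_seconds=3600.0, sleep=time.sleep):
    """Call get_status() repeatedly until it returns a terminal status or the timeout elapses."""
    waited = 0.0
    while True:
        status = get_status()
        if status in TERMINAL_STATUSES:
            return status
        if waited >= timeout_seconds:
            raise TimeoutError(f"Job still in state {status!r} after {waited:.0f}s")
        sleep(poll_seconds)
        waited += poll_seconds
```

In this notebook, `get_status` could be `lambda: workspace_ml_client.jobs.get(job.name).status`. The `sleep` parameter exists only so the helper can be exercised without real delays.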

### 6. Review inference predictions
Download the predictions from the job output and review the predictions using a dataframe.

In [None]:
scoring_job = list(workspace_ml_client.jobs.list(parent_job_name=job.name))[0]

workspace_ml_client.jobs.download(
    name=scoring_job.name, download_path=batch_dir, output_name="score"
)

predictions_file = os.path.join(batch_dir, "named-outputs", "score", "predictions.csv")

# Load the batch predictions file with no headers into a dataframe and set your column names
score_df = pd.read_csv(
    predictions_file,
    header=None,
    names=["row_number_per_file", "prediction", "batch_input_file_name"],
)
score_df.head()

For each input file, record the file name and restore the original index value in the `'index'` column. Then join `train_df`, which holds the ground truth, into the input dataframe.

In [None]:
input_dfs = []
for file in input_files:
    # The first CSV column holds the original train_df index
    file_df = pd.read_csv(os.path.join(batch_inputs_dir, file), index_col=0)
    # Move the original index into an "index" column
    file_df.reset_index(inplace=True)
    file_df["batch_input_file_name"] = file
    # The new positional index becomes the row number within the file
    file_df.reset_index(names=["row_number_per_file"], inplace=True)
    input_dfs.append(file_df)
input_df = pd.concat(input_dfs)
input_df.set_index("index", inplace=True)
input_df = input_df.join(train_df).drop(columns=["input_string"])

input_df.head()

Join the predictions with input data to compare them to ground truth.

In [None]:
df = pd.merge(
    input_df, score_df, how="inner", on=["row_number_per_file", "batch_input_file_name"]
)

# Show the first few rows of the results
df.head(20)
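
Once predictions and ground truth sit side by side, a quick way to compare them is to compute accuracy and count which label pairs get confused. The sketch below uses only the standard library. `accuracy` and `confusion_counts` are hypothetical helpers, and the labels are made-up sample values, not results from this notebook.

```python
from collections import Counter


def accuracy(predictions, ground_truth):
    """Return the fraction of rows where the prediction equals the ground truth label."""
    pairs = list(zip(predictions, ground_truth))
    if not pairs:
        raise ValueError("no rows to score")
    return sum(pred == truth for pred, truth in pairs) / len(pairs)


def confusion_counts(predictions, ground_truth):
    """Count occurrences of each (ground truth, prediction) pair."""
    return Counter(zip(ground_truth, predictions))


# Made-up sample labels, for illustration only.
preds = ["ENTAILMENT", "NEUTRAL", "CONTRADICTION", "NEUTRAL"]
truth = ["ENTAILMENT", "CONTRADICTION", "CONTRADICTION", "NEUTRAL"]
print(accuracy(preds, truth))
print(confusion_counts(preds, truth))
```

Both helpers accept any iterables, so with the dataframe above they could be called as `accuracy(df["prediction"], df["ground_truth_label"])`, assuming the two columns use the same label spelling and case.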

### 7. Clean up resources
Batch endpoints use compute resources only when jobs are submitted. You can keep the batch endpoint for your reference without worrying about compute bills, or choose to delete the endpoint. If you created your compute cluster to have zero minimum instances and to scale down soon after being idle, you won't be charged for unused compute. The cell below deletes both the endpoint and the compute cluster; skip the second call if you reused an existing cluster.

In [None]:
workspace_ml_client.batch_endpoints.begin_delete(name=endpoint_name).result()
workspace_ml_client.compute.begin_delete(name=compute_name).result()