## Text Classification Inference using Batch Endpoints

This sample shows how to run inference in batch mode for the `text-classification` task.

### Task
`text-classification` is a generic task type that can be used for scenarios such as sentiment analysis, emotion detection, grammar checking, and spam filtering. In this example, we test for entailment vs. contradiction: given a premise sentence and a hypothesis sentence, the task is to predict whether the premise entails the hypothesis (entailment), contradicts the hypothesis (contradiction), or neither (neutral).

### Inference data
The Multi-Genre Natural Language Inference Corpus, or MNLI, is a crowd-sourced collection of sentence pairs with textual entailment annotations. The [MNLI](https://huggingface.co/datasets/glue) dataset is a subset of the larger [General Language Understanding Evaluation](https://gluebenchmark.com/) dataset. A copy of this dataset is available in the [glue-mnli](./glue-mnli/) folder.

### Model
Look for models tagged with `text-classification` in the system registry. The tag alone is not sufficient: you need to check whether the model is specifically finetuned for entailment vs. contradiction by studying the model card and looking at the input/output samples or signatures of the model. In this notebook, we use the `microsoft-deberta-base-mnli` model.

  
### Outline
* Setup pre-requisites.
* Pick a model to deploy.
* Prepare data for inference. 
* Deploy the model for batch inference.
* Run a batch inference job.
* Review inference predictions.
* Clean up resources.


### 1. Setup pre-requisites
* Install dependencies
* Connect to AzureML Workspace. Learn more at [set up SDK authentication](https://learn.microsoft.com/en-us/azure/machine-learning/how-to-setup-authentication?tabs=sdk). Replace `<WORKSPACE_NAME>`, `<RESOURCE_GROUP>` and `<SUBSCRIPTION_ID>` below.
* Check or create compute.
* Connect to `azureml` system registry

In [None]:
from azure.ai.ml import MLClient
from azure.identity import DefaultAzureCredential, InteractiveBrowserCredential, ClientSecretCredential
from azure.ai.ml.entities import AmlCompute
import time

try:
    credential = DefaultAzureCredential()
    credential.get_token("https://management.azure.com/.default")
except Exception as ex:
    credential = InteractiveBrowserCredential()

workspace_ml_client = MLClient(
    credential,
    subscription_id="<SUBSCRIPTION_ID>",
    resource_group_name="<RESOURCE_GROUP>",
    workspace_name="<WORKSPACE_NAME>",
)
# the models, fine tuning pipelines and environments are available in the AzureML system registry named below
registry_ml_client = MLClient(credential, registry_name="azureml-preview-test1")

compute_cluster = "gpu-cluster-big"
try:
    compute = workspace_ml_client.compute.get(compute_cluster)
except Exception as ex:
    compute = AmlCompute(
        name=compute_cluster,
        size="Standard_ND40rs_v2",
        max_instances=2,  # For multi node training set this to an integer value more than 1
    )
    workspace_ml_client.compute.begin_create_or_update(compute).wait()

# generating a unique timestamp that can be used for names and versions that need to be unique
timestamp = str(int(time.time()))


### 2. Pick a model to deploy

Browse models in the Model Catalog in the AzureML Studio, filtering by the `text-classification` task. In this example, we use the `microsoft-deberta-base-mnli` model. If you have opened this notebook for a different model, replace the model name and version accordingly. 

In [None]:
model_name = "microsoft-deberta-base-mnli"
model_version = "6"
foundation_model = registry_ml_client.models.get(model_name, model_version)
print("\n\nUsing model name: {0}, version: {1}, id: {2} for batch inference".format(foundation_model.name, foundation_model.version, foundation_model.id))

### 3. Prepare data for inference.

A copy of the MNLI dataset is available in the [glue-mnli](./glue-mnli/) folder. The next few cells show basic data preparation:
* Visualize some data rows.
* Replace numerical categories in the data with the actual string labels. This mapping is available in [./glue-mnli/label.json](./glue-mnli/label.json). This step is needed because the selected model returns labels such as `ENTAILMENT`, `CONTRADICTION`, etc. when running prediction. If the labels in your ground truth data are left as `0`, `1`, `2`, etc., they will not match the prediction labels returned by the model.
* The dataset contains `premise` and `hypothesis` as two different columns. However, the model expects a single string for prediction in the format `[CLS] <premise text> [SEP] <hypothesis text> [SEP]`. Hence we merge the two columns into one and drop the originals.
* We want this sample to run quickly, so we save a smaller dataset containing 10% of the original.
* Since we are using an `mlflow` model, we don't need to write any inference code. However, the inference data must be in a shape that can be used for inference. Specifically, batch inference does not support JSON Lines files, but it supports `csv` and `parquet`, so we dump a csv version of the pandas dataframe. The batch inference csv file must contain only the columns that are passed to the model as input, and the column header must match the model signature. In our case, the model signature, which can be found in the `MLmodel` file in the model artifacts, expects `input_string` as input.
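
The label replacement and column merge described in the list above can be sketched without pandas as follows. This is only an illustration under stated assumptions: the `id2label` values and the sample row are made up (the real mapping is read from `label.json` and may differ), and `to_model_input` and `prepare_row` are hypothetical helper names that do not appear in the notebook.

```python
# Hypothetical mapping for illustration; the real one comes from label.json.
id2label = {0: "ENTAILMENT", 1: "NEUTRAL", 2: "CONTRADICTION"}


def to_model_input(premise: str, hypothesis: str) -> str:
    """Merge a premise/hypothesis pair into the single-string input format."""
    return f"[CLS] {premise} [SEP] {hypothesis} [SEP]"


def prepare_row(row: dict) -> dict:
    """Replace the numeric label with its string label and merge the text columns."""
    return {
        "ground_truth_label": id2label[row["label"]],
        "text": to_model_input(row["premise"], row["hypothesis"]),
    }


# Made-up row in the shape of the MNLI data (premise, hypothesis, numeric label).
sample = {
    "premise": "A man is playing a guitar.",
    "hypothesis": "A person makes music.",
    "label": 0,
}
prepared = prepare_row(sample)
print(prepared["ground_truth_label"])
print(prepared["text"])
```

The pandas cells below perform the same two steps on whole columns at once.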

In the cell below, we load the input file and look at some sample data.

In [None]:

import os
dataset_dir = "./glue-mnli-dataset"
data_file="train.jsonl"
bath_sample_data_file = "small_batch_train.jsonl"
batch_dir = os.path.join(dataset_dir,"batch")
os.makedirs(batch_dir, exist_ok=True)
batch_input_file = "batch_input.csv"

# load the train.jsonl file into a pandas dataframe and show the first 5 rows
import pandas as pd
pd.set_option('display.max_colwidth', None) # remove the column width limit to display the full text
df = pd.read_json(os.path.join(dataset_dir,data_file), lines=True)
df.head()

Replace numerical labels with string labels, drop the columns that are not needed, and take a smaller sample.

In [None]:
# load the id2label json element of the label.json file into pandas table with keys as 'label' column of int64 type and values as 'label_string' column as string type
import json
import csv
label_file="label.json"
with open(os.path.join(dataset_dir,label_file)) as f:
    id2label = json.load(f)
    id2label = id2label['id2label']
    label_df = pd.DataFrame.from_dict(id2label, orient='index', columns=['label_string'])
    label_df['label'] = label_df.index.astype('int64')
    label_df = label_df[['label', 'label_string']]

# join the dataframe with the id2label dataframe to get the label_string column
df = df.merge(label_df, on='label', how='left')
# concatenate the premise and hypothesis columns, with "[CLS]" at the beginning and "[SEP]" in the middle and at the end, to get the text column
df['text'] = "[CLS] " + df['premise'] + " [SEP] " + df['hypothesis'] + " [SEP]"
# drop the idx, premise, hypothesis and numeric label columns as they are no longer needed
df = df.drop(columns=['idx', 'premise', 'hypothesis', 'label'])
# rename the label_string column to ground_truth_label
df = df.rename(columns={'label_string': 'ground_truth_label'})
# get 10% of the rows so that the sample runs faster
df = df.sample(frac=0.1)
# reset index of the dataframe
df = df.reset_index(drop=True)
# save dataframe to a json lines file - we will use this file to compare output of batch inference with ground truth
df.to_json(os.path.join(dataset_dir, bath_sample_data_file), orient='records', lines=True)
df.head()

Generate the batch inference input csv file.

In [None]:
# keep only the text column in batch_df, because batch inference input must not contain any columns that are not passed to the model
batch_df = df[['text']]
# rename the text column to input_string to match the model signature
batch_df = batch_df.rename(columns={'text': 'input_string'})
# save the rows in batch_df to a csv file named batch_input_file in the batch_dir folder
batch_df.to_csv(os.path.join(batch_dir, batch_input_file), index=False, quoting=csv.QUOTE_ALL)
batch_df.head()
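
Because the header of the batch input file must match the model signature exactly, a quick check before submitting the job can catch a stray column early. The sketch below uses only the standard library and an in-memory CSV; `check_batch_csv` is a hypothetical helper, and the sample row is made up.

```python
import csv
import io


def check_batch_csv(csv_text: str, expected_columns: list) -> bool:
    """Return True if the CSV header equals the expected column list exactly."""
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader, [])
    return header == expected_columns


# Build a small in-memory CSV with every field quoted, as in the cell above.
buffer = io.StringIO()
writer = csv.writer(buffer, quoting=csv.QUOTE_ALL)
writer.writerow(["input_string"])
# The comma inside the text stays within one field because the field is quoted.
writer.writerow(["[CLS] It rained, heavily. [SEP] The ground is wet. [SEP]"])
csv_text = buffer.getvalue()

print(check_batch_csv(csv_text, ["input_string"]))
```

To run the same check against the real file, read its contents with `open(...).read()` and pass the column names listed in the model's `MLmodel` signature.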

### 4. Deploy the model for batch inference.

Batch endpoints are used to score large datasets as batch jobs. They receive pointers to data and run jobs asynchronously to process the data in parallel on compute clusters. Batch endpoints store outputs in a data store for further analysis.


Create a batch endpoint.

In [None]:
from azure.ai.ml.entities import BatchEndpoint, BatchDeployment, Model, AmlCompute, Data
batch_endpoint_name = "entail-contra-" + timestamp
endpoint = BatchEndpoint(
    name=batch_endpoint_name,
    description="Batch endpoint for " + foundation_model.name + ", to detect entailment v/s contradiction",
)
workspace_ml_client.batch_endpoints.begin_create_or_update(endpoint).result()

Create a batch deployment.

In [None]:
from azure.ai.ml.constants import BatchDeploymentOutputAction
from azure.ai.ml.entities import BatchRetrySettings

deployment = BatchDeployment(
    name="demo",
    description="Batch deployment for " + foundation_model.name + ", to detect entailment v/s contradiction",
    endpoint_name=endpoint.name,
    model=foundation_model,
    compute=compute_cluster,
    instance_count=1,
    max_concurrency_per_instance=2,
    mini_batch_size=10,
    output_action=BatchDeploymentOutputAction.APPEND_ROW,
    output_file_name="predictions.csv",
    retry_settings=BatchRetrySettings(max_retries=3, timeout=300),
    logging_level="info",
)
workspace_ml_client.batch_deployments.begin_create_or_update(deployment).result()
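
To get a feel for how the deployment settings above interact, the following sketch does the arithmetic. It rests on simplifying assumptions that are not stated in this notebook: that for file inputs `mini_batch_size` counts input files per mini-batch, that each instance works on up to `max_concurrency_per_instance` mini-batches at a time, and that all mini-batches take about the same time. `plan_batches` is a hypothetical helper, not part of the SDK.

```python
import math


def plan_batches(n_files, mini_batch_size, instance_count, max_concurrency_per_instance):
    """Estimate mini-batch count, parallel slots, and processing rounds."""
    # Number of mini-batches needed to cover all input files.
    n_mini_batches = math.ceil(n_files / mini_batch_size)
    # Number of mini-batches that can be processed at the same time.
    parallel_slots = instance_count * max_concurrency_per_instance
    # Rough number of sequential rounds, assuming equal-duration mini-batches.
    rounds = math.ceil(n_mini_batches / parallel_slots)
    return n_mini_batches, parallel_slots, rounds


# This sample: one input csv file with the settings used in the deployment above.
print(plan_batches(1, 10, 1, 2))
# A made-up larger job with 45 input files.
print(plan_batches(45, 10, 1, 2))
```

Under these assumptions, the single csv file in this sample fits in one mini-batch, so raising `instance_count` would not speed it up; splitting the input into more files would be needed before extra parallelism helps.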

Set the `demo` deployment as the default. The default deployment is used when no deployment name is specified while invoking the batch endpoint.

In [None]:

endpoint = workspace_ml_client.batch_endpoints.get(batch_endpoint_name)
endpoint.defaults.deployment_name = "demo"
workspace_ml_client.batch_endpoints.begin_create_or_update(endpoint).result()
endpoint = workspace_ml_client.batch_endpoints.get(batch_endpoint_name)
print(f"The default deployment is {endpoint.defaults.deployment_name}")

### 5. Run a batch inference job.

Invoke the batch endpoint with the input parameter pointing to the folder containing the batch inference input. This creates a pipeline job using the default deployment in the endpoint. Wait for the job to complete.

In [None]:
from azure.ai.ml import Input
from azure.ai.ml.constants import AssetTypes
# use a name other than `input` to avoid shadowing the Python builtin
job_input = Input(type=AssetTypes.URI_FOLDER, path=batch_dir)
job = workspace_ml_client.batch_endpoints.invoke(endpoint_name=endpoint.name, input=job_input)
workspace_ml_client.jobs.stream(job.name)

### 6. Review inference predictions.

Download the predictions from the job output and review the predictions using a dataframe.

In [None]:
scoring_job = list(workspace_ml_client.jobs.list(parent_job_name=job.name))[0]
workspace_ml_client.jobs.download(name=scoring_job.name, download_path=dataset_dir, output_name="score")
predictions_file = os.path.join(dataset_dir, "named-outputs", "score", "predictions.csv")
# load the batch predictions file that has no headers into a score_df dataframe
score_df = pd.read_csv(predictions_file, header=None)
# rename column 0 as row_number, column 1 to prediction and column 2 to batch_input_file_name 
score_df.columns = ['row_number', 'prediction', 'batch_input_file_name']
score_df.head()


Join the predictions with input data to compare ground truth with predictions. 

In [None]:
# drop the row_number and batch_input_file_name columns as they are not needed
score_df = score_df.drop(columns=['row_number', 'batch_input_file_name'])
# join the df dataframe with the score_df dataframe on the index row
df = df.join(score_df)
# show the first 5 rows of the dataframe
df.head()
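
Once ground truth and predictions sit side by side, a single accuracy figure is a natural summary. The sketch below aligns predictions by row number instead of relying on row order, then computes the share of matching labels. It assumes that row numbers are zero-based positions in the input file and that predictions are plain label strings, neither of which this notebook confirms; `accuracy_by_row` is a hypothetical helper and the data is made up.

```python
def accuracy_by_row(ground_truth, predictions):
    """Compute accuracy after aligning predictions to inputs by row number.

    ground_truth: list of labels in input order.
    predictions: iterable of (row_number, predicted_label) pairs, in any order.
    """
    if not ground_truth:
        raise ValueError("ground_truth is empty")
    predicted = dict(predictions)
    # A row with no prediction counts as wrong.
    correct = sum(
        1 for i, label in enumerate(ground_truth) if predicted.get(i) == label
    )
    return correct / len(ground_truth)


# Made-up labels; predictions are deliberately out of order.
truth = ["ENTAILMENT", "NEUTRAL", "CONTRADICTION", "NEUTRAL"]
preds = [(2, "CONTRADICTION"), (0, "ENTAILMENT"), (3, "ENTAILMENT"), (1, "NEUTRAL")]
accuracy = accuracy_by_row(truth, preds)
print(accuracy)
```

With the dataframes in this notebook, the inputs would come from the `ground_truth_label` column and from the `row_number` and `prediction` columns of the predictions file, read before `row_number` is dropped.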

### 7. Clean up resources

Batch endpoints use compute resources only when jobs are submitted. You can keep the batch endpoint for your reference without worrying about compute bills, or choose to delete it.

In [None]:
# begin_delete starts a long-running operation; result() waits for it to finish
workspace_ml_client.batch_endpoints.begin_delete(batch_endpoint_name).result()