## Question Answering Inference using Online Endpoints

This sample shows how to deploy `question-answering` type models to an online endpoint for inference.

### Task
`question-answering` tasks return an answer given a question. There are two common types of `question-answering` tasks:

* Extractive: extract the answer from the given context.
* Abstractive: generate an answer from the context that correctly answers the question.
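
To make the distinction concrete, the following minimal sketch checks whether an answer could have come from an extractive model, that is, whether it appears verbatim in the context. The function name `is_extractive` and the example strings are illustrative assumptions and not part of any library.

```python
def is_extractive(answer: str, context: str) -> bool:
    """Return True if the answer is a non-empty verbatim span of the context."""
    return bool(answer) and answer in context


# Illustrative strings only
context = "The Amazon rainforest covers much of the Amazon basin of South America."
extractive_answer = "the Amazon basin of South America"
abstractive_answer = "It spans most of South America's Amazon basin."

print(is_extractive(extractive_answer, context))   # True
print(is_extractive(abstractive_answer, context))  # False
```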
 
### Model
Models that can perform the `question-answering` task are tagged with `task: question-answering`. We will use the `deepset-minilm-uncased-squad2` model in this notebook. If you opened this notebook from a specific model card, remember to replace the specific model name. If you don't find a model that suits your scenario or domain, you can discover and [import models from HuggingFace hub](../../import/import_model_into_registry.ipynb) and then use them for inference. 

### Inference data
We will use the [SQUAD](https://huggingface.co/datasets/squad) dataset. The [original source](https://rajpurkar.github.io/SQuAD-explorer/) of the dataset describes it as follows: _"Stanford Question Answering Dataset (SQuAD) is a reading comprehension dataset, consisting of questions posed by crowdworkers on a set of Wikipedia articles, where the answer to every question is a segment of text, or span, from the corresponding reading passage, or the question might be unanswerable."_
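
As a rough illustration of what "a segment of text, or span" means in practice, the sketch below uses a made-up record in the field layout of the Hugging Face `squad` dataset (`context`, `question`, and `answers` with parallel `text` and `answer_start` lists). The record contents and the helper names `answer_spans` and `is_answerable` are hypothetical. Treating an empty answer list as "unanswerable" is an assumption that follows SQuAD 2.0 style data.

```python
# Hypothetical record; only the field layout mirrors the dataset
record = {
    "id": "example-0001",
    "title": "Example_Article",
    "context": "Marie Curie was born in Warsaw in 1867.",
    "question": "Where was Marie Curie born?",
    "answers": {"text": ["Warsaw"], "answer_start": [24]},
}


def answer_spans(rec: dict) -> list:
    """Slice each answer out of the context using its character offset."""
    ctx = rec["context"]
    answers = rec["answers"]
    return [
        ctx[start : start + len(text)]
        for text, start in zip(answers["text"], answers["answer_start"])
    ]


def is_answerable(rec: dict) -> bool:
    """Assumption: a record with no answer texts is treated as unanswerable."""
    return len(rec["answers"]["text"]) > 0


print(answer_spans(record))   # ['Warsaw']
print(is_answerable(record))  # True
```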


### Outline
* Set up pre-requisites.
* Pick a model to deploy.
* Download and prepare data for inference. 
* Deploy the model for real time inference.
* Test the endpoint.
* Clean up resources.

### 1. Set up pre-requisites
* Install dependencies
* Connect to the AzureML workspace. Learn more at [set up SDK authentication](https://learn.microsoft.com/en-us/azure/machine-learning/how-to-setup-authentication?tabs=sdk). Replace `<WORKSPACE_NAME>`, `<RESOURCE_GROUP>` and `<SUBSCRIPTION_ID>` below.
* Connect to `azureml` system registry
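
Before connecting, it can help to confirm that the dependencies are importable. The sketch below is one possible check using only the standard library. The helper name `missing_modules` is hypothetical, and the module list assumes that this notebook needs the `azure-ai-ml`, `azure-identity` and `pandas` packages, which is inferred from the imports used in the cells that follow.

```python
import importlib.util


def missing_modules(module_names):
    """Return the subset of module names that cannot be found for import."""
    missing = []
    for name in module_names:
        try:
            found = importlib.util.find_spec(name) is not None
        except (ImportError, ValueError):
            # Raised when a parent package (for example `azure`) is itself absent
            found = False
        if not found:
            missing.append(name)
    return missing


# Assumed module list, based on the imports used later in this notebook
required = ["azure.ai.ml", "azure.identity", "pandas"]
missing = missing_modules(required)
if missing:
    print("Missing modules:", ", ".join(missing))
else:
    print("All required modules are importable")
```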

In [None]:
from azure.ai.ml import MLClient
from azure.identity import DefaultAzureCredential, InteractiveBrowserCredential

try:
    # Prefer the default credential chain
    credential = DefaultAzureCredential()
    # Check that the credential can actually obtain a token
    credential.get_token("https://management.azure.com/.default")
except Exception:
    # Fall back to interactive browser login if the default chain fails
    credential = InteractiveBrowserCredential()

workspace_ml_client = MLClient(
    credential,
    subscription_id="<SUBSCRIPTION_ID>",
    resource_group_name="<RESOURCE_GROUP>",
    workspace_name="<WORKSPACE_NAME>",
)
# The models, fine tuning pipelines and environments are available in the AzureML system registry, "azureml"
registry_ml_client = MLClient(credential, registry_name="azureml")

### 2. Pick a model to deploy

Browse models in the Model Catalog in the AzureML Studio, filtering by the `question-answering` task. In this example, we use the `deepset-minilm-uncased-squad2` model. If you have opened this notebook for a different model, replace the model name and version accordingly. 
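
The next cell uses the first entry returned by the registry listing. If you would rather not rely on the order of that listing, one option is to select the highest version explicitly. The sketch below assumes that versions are integer-like strings and uses stand-in objects instead of real registry entries. `pick_latest` is a hypothetical helper, not an SDK function.

```python
from types import SimpleNamespace


def pick_latest(models):
    """Return the entry with the highest numeric version, or None if empty."""
    models = list(models)
    if not models:
        return None
    # Assumption: every version is an integer-like string such as "12"
    return max(models, key=lambda m: int(m.version))


# Stand-in objects; real entries would come from registry_ml_client.models.list(...)
listed = [
    SimpleNamespace(version="3"),
    SimpleNamespace(version="12"),
    SimpleNamespace(version="7"),
]
print(pick_latest(listed).version)  # 12
```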

In [None]:
model_name = "deepset-minilm-uncased-squad2"
version_list = list(registry_ml_client.models.list(model_name))
if len(version_list) == 0:
    # Stop here: the following cells need `foundation_model` to be defined
    raise ValueError(f"Model {model_name} not found in registry")
# Use the first version returned by the registry listing
model_version = version_list[0].version
foundation_model = registry_ml_client.models.get(model_name, model_version)
print(
    "\n\nUsing model name: {0}, version: {1}, id: {2} for inferencing".format(
        foundation_model.name, foundation_model.version, foundation_model.id
    )
)

### 3. Download and prepare data for inference

The next few cells show basic data preparation:
* Visualize some data rows
* Save a few samples in a format that can be passed as input to the online inference endpoint.
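
The dataset file used below, `train.jsonl`, is in JSON Lines format: one JSON object per line. As a minimal sketch of that format, the following writes two made-up rows to a temporary file and reads them back with the standard library. The helper names `write_jsonl` and `read_jsonl` and the row contents are illustrative only.

```python
import json
import os
import tempfile


def write_jsonl(path, records):
    """Write one JSON object per line."""
    with open(path, "w", encoding="utf-8") as f:
        for rec in records:
            f.write(json.dumps(rec) + "\n")


def read_jsonl(path):
    """Read a JSON Lines file into a list of dicts, skipping blank lines."""
    with open(path, encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]


# Made-up rows in a question/context/answers layout
rows = [
    {"question": "Q1?", "context": "C1", "answers": {"text": ["A1"], "answer_start": [0]}},
    {"question": "Q2?", "context": "C2", "answers": {"text": ["A2"], "answer_start": [0]}},
]

with tempfile.TemporaryDirectory() as tmp_dir:
    sample_path = os.path.join(tmp_dir, "sample.jsonl")
    write_jsonl(sample_path, rows)
    loaded = read_jsonl(sample_path)

print(len(loaded), loaded[0]["question"])  # 2 Q1?
```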

In [None]:
# Download a small sample of the dataset into the ./squad-dataset directory
%run ./squad-dataset/download-dataset.py --download_dir ./squad-dataset

In [None]:
# Load the ./squad-dataset/train.jsonl file into a pandas dataframe and show the first 5 rows
import pandas as pd

pd.set_option(
    "display.max_colwidth", 0
)  # set the max column width to 0 to display the full text
train_df = pd.read_json("./squad-dataset/train.jsonl", lines=True)
train_df.head()

### 4. Deploy the model to an online endpoint
Online endpoints provide a durable REST API that applications can call to use the model.
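
To illustrate what calling such a REST API from an application might look like, the sketch below builds (but does not send) an HTTP request with the standard library. It assumes key-based authentication in which the key is passed as a bearer token, and it assumes that an optional `azureml-model-deployment` header selects a specific deployment. The URL, key, and the helper name `build_scoring_request` are placeholders, so check the endpoint's details in your workspace before relying on any of them.

```python
import json
import urllib.request


def build_scoring_request(scoring_uri, api_key, payload, deployment_name=None):
    """Build a POST request carrying a JSON payload for a scoring endpoint."""
    headers = {
        "Content-Type": "application/json",
        # Assumption: key auth sends the key as a bearer token
        "Authorization": "Bearer " + api_key,
    }
    if deployment_name:
        # Assumption: this header routes the call to one deployment
        headers["azureml-model-deployment"] = deployment_name
    return urllib.request.Request(
        scoring_uri,
        data=json.dumps(payload).encode("utf-8"),
        headers=headers,
        method="POST",
    )


request = build_scoring_request(
    "https://example.invalid/score",  # placeholder scoring URI
    "<API_KEY>",  # placeholder key
    {"input_data": {"question": ["Who?"], "context": ["Someone."]}, "params": {}},
)
print(request.get_method(), request.full_url)
# Sending would be urllib.request.urlopen(request); it is not executed here.
```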

In [None]:
import time
from azure.ai.ml.entities import (
    ManagedOnlineEndpoint,
    ManagedOnlineDeployment,
    ProbeSettings,
)

# Endpoint names must be unique within a region, so append a timestamp to the name
timestamp = int(time.time())
online_endpoint_name = "question-answering-" + str(timestamp)
# Create the online endpoint
endpoint = ManagedOnlineEndpoint(
    name=online_endpoint_name,
    description="Online endpoint for "
    + foundation_model.name
    + ", for question-answering task",
    auth_mode="key",
)
workspace_ml_client.begin_create_or_update(endpoint).wait()

In [None]:
# create a deployment
demo_deployment = ManagedOnlineDeployment(
    name="demo",
    endpoint_name=online_endpoint_name,
    model=foundation_model.id,
    instance_type="Standard_DS3_v2",
    instance_count=2,
    liveness_probe=ProbeSettings(
        failure_threshold=30,
        success_threshold=1,
        timeout=2,
        period=10,
        initial_delay=1000,
    ),
    readiness_probe=ProbeSettings(
        failure_threshold=10,
        success_threshold=1,
        timeout=10,
        period=10,
        initial_delay=1000,
    ),
)
workspace_ml_client.online_deployments.begin_create_or_update(demo_deployment).wait()
# Route all endpoint traffic to the "demo" deployment
endpoint.traffic = {"demo": 100}
workspace_ml_client.begin_create_or_update(endpoint).result()
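
As a rough guide to what the probe values above imply, the following worked arithmetic assumes Kubernetes-style probe semantics: probing starts after `initial_delay` seconds, runs every `period` seconds, and the container is marked as failed after `failure_threshold` consecutive failures. Under that assumption, and ignoring per-probe timeouts, the approximate time before a never-healthy container is given up on is `initial_delay + failure_threshold * period`. The `Probe` dataclass is a stand-in for illustration, not the SDK's `ProbeSettings`.

```python
from dataclasses import dataclass


@dataclass
class Probe:
    """Stand-in holding the same values passed to ProbeSettings above."""
    failure_threshold: int
    period: int
    initial_delay: int


def seconds_until_failure(probe: Probe) -> int:
    """Approximate seconds before a never-healthy container is marked failed."""
    return probe.initial_delay + probe.failure_threshold * probe.period


liveness = Probe(failure_threshold=30, period=10, initial_delay=1000)
readiness = Probe(failure_threshold=10, period=10, initial_delay=1000)

print(seconds_until_failure(liveness))   # 1300
print(seconds_until_failure(readiness))  # 1100
```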

### 5. Test the endpoint with sample data

We will pick a sample from the downloaded dataset and submit it to the online endpoint for inference. We will then display the predicted answer alongside the ground truth answer.
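
The request body used in this section has an `input_data` object with parallel `question` and `context` lists, plus a `params` object. A minimal sketch of a helper that assembles this shape is shown below. `build_qa_payload` is a hypothetical name, and the length check is an added safeguard rather than something the endpoint is known to require. The round trip at the end shows that `json.dumps` escapes quotes in the text on its own.

```python
import json


def build_qa_payload(questions, contexts, params=None):
    """Assemble a request body with parallel question and context lists."""
    if len(questions) != len(contexts):
        raise ValueError("questions and contexts must have the same length")
    return {
        "input_data": {"question": list(questions), "context": list(contexts)},
        "params": params or {},
    }


payload = build_qa_payload(
    ['Who wrote "Hamlet"?'],
    ["'Hamlet' was written by William Shakespeare."],
)
serialized = json.dumps(payload)
# Quotes are escaped during serialization and restored when parsed back
print(json.loads(serialized) == payload)  # True
```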

In [None]:
import json
import os

# read the ./squad-dataset/train.jsonl file into a pandas dataframe
df = pd.read_json("./squad-dataset/train.jsonl", lines=True)
# escape single and double quotes in the text column
df["question"] = df["question"].str.replace("'", "\\'").str.replace('"', '\\"')
df["context"] = df["context"].str.replace("'", "\\'").str.replace('"', '\\"')
# pick 1 random row
sample_df = df.sample(1)
# create a json object with the key "input_data" whose value holds the question and context lists from the sample_df dataframe
test_json = {
    "input_data": {
        "question": sample_df["question"].to_list(),
        "context": sample_df["context"].to_list(),
    },
    "params": {},
}
# save the json object to a file named sample_score.json in the ./squad-dataset folder
with open(os.path.join(".", "squad-dataset", "sample_score.json"), "w") as f:
    json.dump(test_json, f)
sample_df.head()

In [None]:
# score the sample_score.json file using the online endpoint with the azureml endpoint invoke method
response = workspace_ml_client.online_endpoints.invoke(
    endpoint_name=online_endpoint_name,
    deployment_name="demo",
    request_file="./squad-dataset/sample_score.json",
)
print("raw response: \n", response, "\n")
# convert the json response to a pandas dataframe
response_df = pd.read_json(response)
response_df.head()

In [None]:
# compare the predicted answer with the actual answer
response_df = pd.DataFrame({"predicted_answer": [response_df[0][0]]})
response_df["ground_truth_answer"] = sample_df["answers"].to_list()[0]["text"]
response_df.head()
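
Comparing the two columns by eye works for a single row. For more rows, a common approach is an exact-match check after light normalization, in the spirit of the SQuAD evaluation: lowercase, strip punctuation, drop English articles, and collapse whitespace. The sketch below is an illustrative reimplementation of that idea, not the official evaluation script, and the function names are assumptions.

```python
import re
import string


def normalize_answer(text: str) -> str:
    """Lowercase, remove punctuation and articles, and collapse whitespace."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())


def exact_match(prediction: str, ground_truths) -> bool:
    """Return True if the prediction matches any ground truth after normalization."""
    pred = normalize_answer(prediction)
    return any(pred == normalize_answer(gt) for gt in ground_truths)


print(exact_match("The Eiffel Tower.", ["eiffel tower"]))  # True
print(exact_match("Paris", ["London", "Rome"]))            # False
```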

### 6. Delete the online endpoint
Don't forget to delete the online endpoint. Otherwise, the compute used by the endpoint keeps running and you continue to be billed for it.
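
One way to make cleanup harder to forget in a script is to tie deletion to a `try`/`finally` block, so that it runs even when scoring fails. The sketch below shows the pattern with stand-in functions. `temporary_resource`, `fake_create` and `fake_delete` are hypothetical; in a real script the two callables would wrap the endpoint creation and deletion calls used in this notebook.

```python
from contextlib import contextmanager


@contextmanager
def temporary_resource(create, delete):
    """Create a resource, hand it to the caller, and always delete it afterwards."""
    resource = create()
    try:
        yield resource
    finally:
        delete(resource)


events = []


def fake_create():
    # Stand-in for creating an endpoint
    events.append("created")
    return "demo-endpoint"


def fake_delete(name):
    # Stand-in for deleting an endpoint
    events.append("deleted " + name)


try:
    with temporary_resource(fake_create, fake_delete) as name:
        events.append("scoring with " + name)
        raise RuntimeError("simulated failure during scoring")
except RuntimeError:
    pass

print(events)  # ['created', 'scoring with demo-endpoint', 'deleted demo-endpoint']
```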

In [None]:
workspace_ml_client.online_endpoints.begin_delete(name=online_endpoint_name).wait()