## Question Answering Inference using Online Endpoints

This sample shows how to deploy `question-answering` type models to an online endpoint for inference.

### Task
`question-answering` tasks return an answer given a question. There are two common types of `question-answering` tasks:

* Extractive: extract the answer from the given context.
* Abstractive: generate an answer from the context that correctly answers the question.
 
### Model
Models that can perform the `question-answering` task are tagged with `task: question-answering`. We will use the `deepset-minilm-uncased-squad2` model in this notebook. If you opened this notebook from a specific model card, remember to replace the model name with the name of that model. If you don't find a model that suits your scenario or domain, you can discover and [import models from HuggingFace hub](../../import/import_model_into_registry.ipynb) and then use them for inference.

### Inference data
We will use the [SQuAD](https://huggingface.co/datasets/squad) dataset. The [original source](https://rajpurkar.github.io/SQuAD-explorer/) of the dataset describes it as follows: _"Stanford Question Answering Dataset (SQuAD) is a reading comprehension dataset, consisting of questions posed by crowdworkers on a set of Wikipedia articles, where the answer to every question is a segment of text, or span, from the corresponding reading passage, or the question might be unanswerable."_
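To make the "span" idea concrete, here is a minimal sketch of how an answer maps back to its context through a character offset. The record mirrors the `answers` layout (`text`, `answer_start`) of the dataset; the context is shortened for illustration, and `extract_span` is a hypothetical helper, not part of any library.

```python
# A record in the SQuAD layout; the context is shortened here for illustration.
record = {
    "context": (
        "In August, the couple attended the 2011 MTV Video Music Awards, "
        'at which Beyoncé performed "Love on Top".'
    ),
    "question": "Where did she announce her pregnancy?",
    "answers": {"text": ["2011 MTV Video Music Awards"], "answer_start": [35]},
}


def extract_span(context, answers, index=0):
    """Return the slice of `context` that the answer offsets point at."""
    start = answers["answer_start"][index]
    end = start + len(answers["text"][index])
    return context[start:end]


span = extract_span(record["context"], record["answers"])
print(span)
```

An extractive model is expected to return text that can be located in the context this way, whereas an abstractive model may produce wording that does not appear in it.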


### Outline
* Set up pre-requisites.
* Pick a model to deploy.
* Download and prepare data for inference. 
* Deploy the model for real time inference.
* Test the endpoint.
* Clean up resources.

### 1. Set up pre-requisites
* Install dependencies
* Connect to AzureML Workspace. Learn more at [set up SDK authentication](https://learn.microsoft.com/en-us/azure/machine-learning/how-to-setup-authentication?tabs=sdk). Replace  `<WORKSPACE_NAME>`, `<RESOURCE_GROUP>` and `<SUBSCRIPTION_ID>` below.
* Connect to `azureml` system registry

In [6]:
from azure.ai.ml import MLClient
from azure.identity import (
    DefaultAzureCredential,
    InteractiveBrowserCredential,
    ClientSecretCredential,
)
from azure.ai.ml.entities import AmlCompute
import time

try:
    credential = DefaultAzureCredential()
    credential.get_token("https://management.azure.com/.default")
except Exception as ex:
    credential = InteractiveBrowserCredential()

# replace the placeholders with the details of your own workspace
workspace_ml_client = MLClient(
    credential,
    subscription_id="<SUBSCRIPTION_ID>",
    resource_group_name="<RESOURCE_GROUP>",
    workspace_name="<WORKSPACE_NAME>",
)
# the models, fine tuning pipelines and environments are available in the AzureML system registry, "azureml"
registry_ml_client = MLClient(credential, registry_name="azureml")
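As an alternative to editing the workspace details in the cell, they could be read from environment variables. The sketch below is one possible approach; the variable names (`AZURE_SUBSCRIPTION_ID`, `AZURE_RESOURCE_GROUP`, `AZUREML_WORKSPACE_NAME`) and the `workspace_settings` helper are assumptions made for illustration, not names the SDK requires.

```python
import os

# Hypothetical variable names; use whatever your environment provides.
REQUIRED_VARS = {
    "subscription_id": "AZURE_SUBSCRIPTION_ID",
    "resource_group_name": "AZURE_RESOURCE_GROUP",
    "workspace_name": "AZUREML_WORKSPACE_NAME",
}


def workspace_settings(env=os.environ):
    """Collect the MLClient keyword arguments from environment variables."""
    missing = [var for var in REQUIRED_VARS.values() if not env.get(var)]
    if missing:
        raise KeyError("Missing environment variables: " + ", ".join(missing))
    return {kwarg: env[var] for kwarg, var in REQUIRED_VARS.items()}


# Usage (requires the variables to be set):
# workspace_ml_client = MLClient(credential, **workspace_settings())
```

Keeping identifiers out of the notebook also avoids committing them by accident when the notebook is shared.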

### 2. Pick a model to deploy

Browse models in the Model Catalog in the AzureML Studio, filtering by the `question-answering` task. In this example, we use the `deepset-minilm-uncased-squad2` model. If you have opened this notebook for a different model, replace the model name and version accordingly. 

In [2]:
model_name = "deepset-minilm-uncased-squad2"
version_list = list(registry_ml_client.models.list(model_name))
if len(version_list) == 0:
    print("Model not found in registry")
else:
    model_version = version_list[0].version
    foundation_model = registry_ml_client.models.get(model_name, model_version)
    print(
        "\n\nUsing model name: {0}, version: {1}, id: {2} for inferencing".format(
            foundation_model.name, foundation_model.version, foundation_model.id
        )
    )



Using model name: deepset-minilm-uncased-squad2, version: 9, id: azureml://registries/azureml/models/deepset-minilm-uncased-squad2/versions/9 for inferencing
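The cell above takes the first entry returned by the listing. If you would rather not depend on the listing order, the highest version can be selected explicitly. This is a sketch that assumes versions are integer-like strings (as the `9` in the output suggests); `latest_version` is a hypothetical helper, and stand-in objects replace the registry results so the example runs without Azure access.

```python
from types import SimpleNamespace


def latest_version(models):
    """Return the highest version string, comparing versions as integers."""
    versions = [m.version for m in models]
    if not versions:
        raise ValueError("Model not found in registry")
    return max(versions, key=int)


# Stand-in objects with a `version` attribute, in arbitrary order.
listed = [SimpleNamespace(version=v) for v in ["2", "9", "10"]]
print(latest_version(listed))  # 10
```

Comparing as integers matters here: a plain string comparison would rank `"9"` above `"10"`.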


### 3. Download and prepare data for inference.

The next few cells show basic data preparation:
* Visualize some data rows
* Save a few samples in a format that can be passed as input to the online inference endpoint.

In [3]:
# Download a small sample of the dataset into the ./squad-dataset directory
%run ./squad-dataset/download-dataset.py --download_dir ./squad-dataset

Downloading builder script: 100%|██████████| 5.27k/5.27k [00:00<00:00, 1.34MB/s]
Downloading metadata: 100%|██████████| 2.36k/2.36k [00:00<00:00, 590kB/s]
Downloading readme: 100%|██████████| 7.67k/7.67k [00:00<00:00, 2.46MB/s]


In [4]:
# Load the ./squad-dataset/train.jsonl file into a pandas dataframe and show the first 5 rows
import pandas as pd

pd.set_option(
    "display.max_colwidth", 0
)  # set the max column width to 0 to display the full text
train_df = pd.read_json("./squad-dataset/train.jsonl", lines=True)
train_df.head()

Unnamed: 0,id,title,context,question,answers
0,56d4cde92ccc5a1400d83239,Beyoncé,"On December 13, 2013, Beyoncé unexpectedly released her eponymous fifth studio album on the iTunes Store without any prior announcement or promotion. The album debuted atop the Billboard 200 chart, giving Beyoncé her fifth consecutive number-one album in the US. This made her the first woman in the chart's history to have her first five studio albums debut at number one. Beyoncé received critical acclaim and commercial success, selling one million digital copies worldwide in six days; The New York Times noted the album's unconventional, unexpected release as significant. Musically an electro-R&B album, it concerns darker themes previously unexplored in her work, such as ""bulimia, postnatal depression [and] the fears and insecurities of marriage and motherhood"". The single ""Drunk in Love"", featuring Jay Z, peaked at number two on the Billboard Hot 100 chart. In April 2014, after much speculation in the weeks before, Beyoncé and Jay Z officially announced their On the Run Tour. It served as the couple's first co-headlining stadium tour together. On August 24, 2014, she received the Video Vanguard Award at the 2014 MTV Video Music Awards. Knowles also took home three competitive awards: Best Video with a Social Message and Best Cinematography for ""Pretty Hurts"", as well as best collaboration for ""Drunk in Love"". In November, Forbes reported that Beyoncé was the top-earning woman in music for the second year in a row—earning $115 million in the year, more than double her earnings in 2013. Beyoncé was reissued with new material in three forms: as an extended play, a box set, as well as a full platinum edition.",What was the name of the tour featuring both Beyoncé and Jay Z?,"{'text': ['On the Run Tour.'], 'answer_start': [974]}"
1,5733bed24776f4190066118c,University_of_Notre_Dame,"The university is the major seat of the Congregation of Holy Cross (albeit not its official headquarters, which are in Rome). Its main seminary, Moreau Seminary, is located on the campus across St. Joseph lake from the Main Building. Old College, the oldest building on campus and located near the shore of St. Mary lake, houses undergraduate seminarians. Retired priests and brothers reside in Fatima House (a former retreat center), Holy Cross House, as well as Columba Hall near the Grotto. The university through the Moreau Seminary has ties to theologian Frederick Buechner. While not Catholic, Buechner has praised writers from Notre Dame and Moreau Seminary created a Buechner Prize for Preaching.",Which prize did Frederick Buechner create?,"{'text': ['Buechner Prize for Preaching'], 'answer_start': [675]}"
2,56bfab98a10cfb140055121f,Beyoncé,"In August, the couple attended the 2011 MTV Video Music Awards, at which Beyoncé performed ""Love on Top"" and started the performance saying ""Tonight I want you to stand up on your feet, I want you to feel the love that's growing inside of me"". At the end of the performance, she dropped her microphone, unbuttoned her blazer and rubbed her stomach, confirming her pregnancy she had alluded to earlier in the evening. Her appearance helped that year's MTV Video Music Awards become the most-watched broadcast in MTV history, pulling in 12.4 million viewers; the announcement was listed in Guinness World Records for ""most tweets per second recorded for a single event"" on Twitter, receiving 8,868 tweets per second and ""Beyonce pregnant"" was the most Googled term the week of August 29, 2011.",Where did she announce her pregnancy?,"{'text': ['2011 MTV Video Music Awards'], 'answer_start': [35]}"
3,5733b496d058e614000b60d2,University_of_Notre_Dame,"The rise of Hitler and other dictators in the 1930s forced numerous Catholic intellectuals to flee Europe; president John O'Hara brought many to Notre Dame. From Germany came Anton-Hermann Chroust (1907–1982) in classics and law, and Waldemar Gurian a German Catholic intellectual of Jewish descent. Positivism dominated American intellectual life in the 1920s onward but in marked contrast, Gurian received a German Catholic education and wrote his doctoral dissertation under Max Scheler. Ivan Meštrović (1883–1962), a renowned sculptor, brought Croatian culture to campus, 1955–62. Yves Simon (1903–61), brought to ND in the 1940s the insights of French studies in the Aristotelian-Thomistic tradition of philosophy; his own teacher Jacques Maritain (1882–73) was a frequent visitor to campus.",What was Ivan Meštrović known for being?,"{'text': ['a renowned sculptor'], 'answer_start': [519]}"
4,56bec6de3aeaaa14008c9408,Beyoncé,"The Bey Hive is the name given to Beyoncé's fan base. Fans were previously titled ""The Beyontourage"", (a portmanteau of Beyoncé and entourage). The name Bey Hive derives from the word beehive, purposely misspelled to resemble her first name, and was penned by fans after petitions on the online social networking service Twitter and online news reports during competitions.","Before the Bey Hive, fans of Beyonce were called what?","{'text': ['The Beyontourage'], 'answer_start': [83]}"


### 4. Deploy the model to an online endpoint
Online endpoints provide a durable REST API that applications can call to use the model.

In [7]:
import time, sys
from azure.ai.ml.entities import (
    ManagedOnlineEndpoint,
    ManagedOnlineDeployment,
    ProbeSettings,
)

# Create online endpoint - endpoint names need to be unique in a region, hence using timestamp to create unique endpoint name
timestamp = int(time.time())
online_endpoint_name = "question-answering-" + str(timestamp)
# create an online endpoint
endpoint = ManagedOnlineEndpoint(
    name=online_endpoint_name,
    description="Online endpoint for "
    + foundation_model.name
    + ", for question-answering task",
    auth_mode="key",
)
workspace_ml_client.begin_create_or_update(endpoint).wait()

In [8]:
# create a deployment
demo_deployment = ManagedOnlineDeployment(
    name="demo",
    endpoint_name=online_endpoint_name,
    model=foundation_model.id,
    instance_type="Standard_DS3_v2",
    instance_count=2,
    liveness_probe=ProbeSettings(
        failure_threshold=30,
        success_threshold=1,
        timeout=2,
        period=10,
        initial_delay=1000,
    ),
    readiness_probe=ProbeSettings(
        failure_threshold=10,
        success_threshold=1,
        timeout=10,
        period=10,
        initial_delay=1000,
    ),
)
workspace_ml_client.online_deployments.begin_create_or_update(demo_deployment).wait()
endpoint.traffic = {"demo": 100}
workspace_ml_client.begin_create_or_update(endpoint).result()

Check: endpoint question-answering-1702482139 exists


................................................................................................................................................................................................................................................

ManagedOnlineEndpoint({'public_network_access': 'Enabled', 'provisioning_state': 'Succeeded', 'scoring_uri': 'https://question-answering-1702482139.eastus.inference.ml.azure.com/score', 'openapi_uri': 'https://question-answering-1702482139.eastus.inference.ml.azure.com/swagger.json', 'name': 'question-answering-1702482139', 'description': 'Online endpoint for deepset-minilm-uncased-squad2, for question-answering task', 'tags': {}, 'properties': {'azureml.onlineendpointid': '/subscriptions/72c03bf3-4e69-41af-9532-dfcdc3eefef4/resourcegroups/nvidia/providers/microsoft.machinelearningservices/workspaces/nvidia-eus/onlineendpoints/question-answering-1702482139', 'AzureAsyncOperationUri': 'https://management.azure.com/subscriptions/72c03bf3-4e69-41af-9532-dfcdc3eefef4/providers/Microsoft.MachineLearningServices/locations/eastus/mfeOperationsStatus/oe:c43ea705-5b7a-4258-a729-932042af3d33:40485746-993c-4832-8202-cc668c5b5afa?api-version=2022-02-01-preview'}, 'print_as_yaml': True, 'id': '/sub
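The probe settings in the deployment determine roughly how long a container is given before it is considered unhealthy. The arithmetic below is an approximation that assumes Kubernetes-style semantics (wait `initial_delay` seconds, then probe every `period` seconds until `failure_threshold` consecutive failures) and ignores the per-probe `timeout`. The `Probe` class is a local stand-in, not the SDK's `ProbeSettings`.

```python
from dataclasses import dataclass


@dataclass
class Probe:
    """Local stand-in for the probe fields used in the deployment above."""

    initial_delay: int      # seconds before the first probe
    period: int             # seconds between probes
    failure_threshold: int  # consecutive failures tolerated


def failure_budget_seconds(probe):
    """Approximate time before a never-passing probe is declared failed."""
    return probe.initial_delay + probe.period * probe.failure_threshold


liveness = Probe(initial_delay=1000, period=10, failure_threshold=30)
readiness = Probe(initial_delay=1000, period=10, failure_threshold=10)
print(failure_budget_seconds(liveness))   # 1300
print(failure_budget_seconds(readiness))  # 1100
```

Under these assumptions, the generous `initial_delay` is what gives the model time to download and load before health checks begin.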

### 5. Test the endpoint with sample data

We will fetch a sample from the downloaded dataset and submit it to the online endpoint for inference. We will then display the predicted answer alongside the ground truth answer.

In [9]:
import json
import os

# read the ./squad-dataset/train.jsonl file into a pandas dataframe
df = pd.read_json("./squad-dataset/train.jsonl", lines=True)
# escape single and double quotes in the question and context columns
df["question"] = df["question"].str.replace("'", "\\'").str.replace('"', '\\"')
df["context"] = df["context"].str.replace("'", "\\'").str.replace('"', '\\"')
# pick 1 random row
sample_df = df.sample(1)
# create a json object with the key "input_data" whose value holds the question and context lists from the sample_df dataframe
test_json = {
    "input_data": {
        "question": sample_df["question"].to_list(),
        "context": sample_df["context"].to_list(),
    },
    "params": {} 
}
# save the json object to a file named sample_score.json in the ./squad-dataset folder
with open(os.path.join(".", "squad-dataset", "sample_score.json"), "w") as f:
    json.dump(test_json, f)
sample_df.head()

Unnamed: 0,id,title,context,question,answers
2,56bfab98a10cfb140055121f,Beyoncé,"In August, the couple attended the 2011 MTV Video Music Awards, at which Beyoncé performed \""Love on Top\"" and started the performance saying \""Tonight I want you to stand up on your feet, I want you to feel the love that\'s growing inside of me\"". At the end of the performance, she dropped her microphone, unbuttoned her blazer and rubbed her stomach, confirming her pregnancy she had alluded to earlier in the evening. Her appearance helped that year\'s MTV Video Music Awards become the most-watched broadcast in MTV history, pulling in 12.4 million viewers; the announcement was listed in Guinness World Records for \""most tweets per second recorded for a single event\"" on Twitter, receiving 8,868 tweets per second and \""Beyonce pregnant\"" was the most Googled term the week of August 29, 2011.",Where did she announce her pregnancy?,"{'text': ['2011 MTV Video Music Awards'], 'answer_start': [35]}"
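The request body has a simple shape: parallel lists of questions and contexts under `input_data`, plus a `params` object. The sketch below builds that shape with a hypothetical `build_payload` helper and round-trips it through the standard `json` module, which escapes quotes during serialization and restores them on parsing. The sample text is made up for illustration.

```python
import json


def build_payload(questions, contexts):
    """Build the request body in the shape used by the cell above."""
    if len(questions) != len(contexts):
        raise ValueError("Each question needs exactly one context")
    return {
        "input_data": {"question": list(questions), "context": list(contexts)},
        "params": {},
    }


payload = build_payload(
    ["What did she perform?"],
    ['Beyoncé performed "Love on Top" and said it\'s growing.'],
)
encoded = json.dumps(payload)  # quotes are escaped by the serializer
decoded = json.loads(encoded)  # and restored on parsing
print(decoded["input_data"]["context"][0])
```

The length check guards against a mismatch between the two lists, which would otherwise pair questions with the wrong contexts.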


In [10]:
# score the sample_score.json file using the online endpoint with the azureml endpoint invoke method
response = workspace_ml_client.online_endpoints.invoke(
    endpoint_name=online_endpoint_name,
    deployment_name="demo",
    request_file="./squad-dataset/sample_score.json",
)
print("raw response: \n", response, "\n")
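# convert the json response to a pandas dataframe
# (wrap the string in StringIO: recent pandas versions deprecate passing literal JSON)
from io import StringIO

response_df = pd.read_json(StringIO(response))
response_df.head()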

raw response: 
 [{"0": "MTV Video Music Awards"}] 



Unnamed: 0,0
0,MTV Video Music Awards
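Because the endpoint is a REST API, an application that does not use the SDK could call it over HTTP. The following is a sketch under stated assumptions: the endpoint uses key authentication (as configured above with `auth_mode="key"`), the key is sent as a bearer token, and the optional `azureml-model-deployment` header targets a specific deployment. The URI and key are placeholders, and both helper functions are hypothetical. The request is only constructed, not sent.

```python
import json
import urllib.request


def build_scoring_request(scoring_uri, api_key, payload, deployment=None):
    """Prepare (but do not send) a POST request for a key-authenticated endpoint."""
    request = urllib.request.Request(
        scoring_uri,
        data=json.dumps(payload).encode("utf-8"),
        method="POST",
    )
    request.add_header("Content-Type", "application/json")
    request.add_header("Authorization", "Bearer " + api_key)
    if deployment is not None:
        # Routes the call to one named deployment instead of the traffic split.
        request.add_header("azureml-model-deployment", deployment)
    return request


def parse_answers(raw_response):
    """Extract answer strings from a response shaped like the raw output above."""
    return [value for row in json.loads(raw_response) for value in row.values()]


# Placeholder URI and key, for illustration only.
request = build_scoring_request(
    "https://example-endpoint.eastus.inference.ml.azure.com/score",
    "<API_KEY>",
    {"input_data": {"question": ["..."], "context": ["..."]}, "params": {}},
    deployment="demo",
)
print(request.get_method(), request.full_url)
print(parse_answers('[{"0": "MTV Video Music Awards"}]'))
```

In practice the scoring URI is available as `scoring_uri` on the endpoint object, and the key can typically be retrieved with `workspace_ml_client.online_endpoints.get_keys`; sending the request would then be a call to `urllib.request.urlopen(request)`.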


In [11]:
# compare the predicted answer with the actual answer
response_df = pd.DataFrame({"predicted_answer": [response_df[0][0]]})
response_df["ground_truth_answer"] = sample_df["answers"].to_list()[0]["text"]
response_df.head()

Unnamed: 0,predicted_answer,ground_truth_answer
0,MTV Video Music Awards,2011 MTV Video Music Awards
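The prediction above is close to the ground truth but not identical. Question answering results are commonly scored with exact match and token-level F1. The sketch below is a simplified version of that idea, not the official SQuAD evaluation script; the normalization rules (lowercasing, removing punctuation and English articles) are assumptions modeled on common practice.

```python
import re
import string
from collections import Counter


def normalize(text):
    """Lowercase, drop punctuation and English articles, collapse whitespace."""
    text = "".join(ch for ch in text.lower() if ch not in string.punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())


def exact_match(prediction, truth):
    """True when both strings are identical after normalization."""
    return normalize(prediction) == normalize(truth)


def token_f1(prediction, truth):
    """Harmonic mean of token precision and recall after normalization."""
    pred_tokens = normalize(prediction).split()
    truth_tokens = normalize(truth).split()
    overlap = sum((Counter(pred_tokens) & Counter(truth_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(truth_tokens)
    return 2 * precision * recall / (precision + recall)


prediction = "MTV Video Music Awards"
truth = "2011 MTV Video Music Awards"
print(exact_match(prediction, truth))         # False
print(round(token_f1(prediction, truth), 3))  # 0.889
```

Here all four predicted tokens appear in the five-token ground truth, so precision is 1.0 and recall is 0.8, which gives an F1 of about 0.889 even though the exact match fails.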


### 6. Delete the online endpoint
Don't forget to delete the online endpoint; otherwise, you will continue to be billed for the compute used by the endpoint.

In [12]:
workspace_ml_client.online_endpoints.begin_delete(name=online_endpoint_name).wait()

......................................................................
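When the notebook steps are turned into a script, one way to make sure the endpoint is always deleted, even if scoring fails, is to wrap the work in a context manager. This is a generic sketch; `cleanup_on_exit` is a hypothetical helper, and a stand-in function replaces the real delete call so the example runs without Azure access.

```python
from contextlib import contextmanager


@contextmanager
def cleanup_on_exit(delete_fn):
    """Run `delete_fn` when the block exits, even if the block raised."""
    try:
        yield
    finally:
        delete_fn()


# Stand-in for the real delete call, so the sketch runs without Azure access.
deleted = []
try:
    with cleanup_on_exit(lambda: deleted.append("endpoint")):
        raise RuntimeError("scoring failed")
except RuntimeError:
    pass
print(deleted)  # ['endpoint']
```

With the real client, the function passed in would wrap the `begin_delete(name=online_endpoint_name).wait()` call shown above.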