## Question Answering Inference using Online Endpoints

This sample shows how to deploy `question-answering` type models to an online endpoint for inference.

### Task
`question-answering` tasks return an answer given a question. There are two common types of `question-answering` tasks:

* Extractive: extract the answer from the given context.
* Abstractive: generate an answer from the context that correctly answers the question.
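
For the extractive case, the answer is a substring of the context, so it can be recovered from a start offset and the answer length. The following is a minimal sketch of that idea; the helper name `extract_span` and the context string are illustrative assumptions, not part of this sample.

```python
def extract_span(context: str, answer_start: int, answer_text: str) -> str:
    """Return the slice of `context` that an extractive answer points to."""
    return context[answer_start : answer_start + len(answer_text)]


# Hypothetical context and answer, used only for illustration.
context = "The endpoint was created in the East US region."
answer_text = "East US"
answer_start = context.find(answer_text)

span = extract_span(context, answer_start, answer_text)
print(span)
```

An abstractive model has no such constraint, so its output cannot be checked by slicing the context.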
 
### Model
Models that can perform the `question-answering` task are tagged with `task: question-answering`. We will use the `deepset-minilm-uncased-squad2` model in this notebook. If you opened this notebook from a specific model card, remember to replace the specific model name. If you don't find a model that suits your scenario or domain, you can discover and [import models from HuggingFace hub](../../import/import-model-from-huggingface.ipynb) and then use them for inference. 

### Inference data
We will use the [SQuAD](https://huggingface.co/datasets/squad) dataset. A copy of this dataset is available in the [squad-dataset](./squad-dataset/) folder. The [original source](https://rajpurkar.github.io/SQuAD-explorer/) of the dataset describes it as follows: _"Stanford Question Answering Dataset (SQuAD) is a reading comprehension dataset, consisting of questions posed by crowdworkers on a set of Wikipedia articles, where the answer to every question is a segment of text, or span, from the corresponding reading passage, or the question might be unanswerable."_
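
The description above mentions unanswerable questions. As a rough sketch, assuming records carry an `answers` dictionary with `text` and `answer_start` lists and that an unanswerable question is represented by empty lists (a common convention, not verified against the copy shipped with this sample), answerable rows could be filtered like this. The records below are made up for illustration.

```python
def is_answerable(record: dict) -> bool:
    """True when the record carries at least one answer text."""
    answers = record.get("answers") or {}
    return len(answers.get("text", [])) > 0


# Hypothetical records, shaped like the rows shown later in this notebook.
records = [
    {"question": "Who signed the pact?", "answers": {"text": ["the Soviets"], "answer_start": [82]}},
    {"question": "Who vetoed the pact?", "answers": {"text": [], "answer_start": []}},
]

answerable = [r["question"] for r in records if is_answerable(r)]
print(answerable)
```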


### Outline
* Set up pre-requisites.
* Pick a model to deploy.
* Prepare data for inference. 
* Deploy the model for real-time inference.
* Test the endpoint.
* Clean up resources.

### 1. Set up pre-requisites
* Install dependencies
* Connect to AzureML Workspace. Learn more at [set up SDK authentication](https://learn.microsoft.com/en-us/azure/machine-learning/how-to-setup-authentication?tabs=sdk). Replace `<WORKSPACE_NAME>`, `<RESOURCE_GROUP>` and `<SUBSCRIPTION_ID>` below.
* Connect to `azureml` system registry
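
Before running the cells, it can help to confirm that the modules this notebook imports are available. The sketch below is one way to do that with the standard library; the helper name `missing_packages` is an assumption for illustration. The modules are typically provided by the `azure-ai-ml`, `azure-identity` and `pandas` packages.

```python
import importlib.util


def missing_packages(module_names):
    """Return the module names that cannot be found in the current environment."""
    missing = []
    for name in module_names:
        try:
            spec = importlib.util.find_spec(name)
        except ModuleNotFoundError:
            # Raised when a parent package of a dotted name is not installed.
            spec = None
        if spec is None:
            missing.append(name)
    return missing


# Modules imported by the cells in this notebook.
required = ["azure.ai.ml", "azure.identity", "pandas"]
print("Missing modules:", missing_packages(required))
```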

In [1]:
from azure.ai.ml import MLClient
from azure.identity import (
    DefaultAzureCredential,
    InteractiveBrowserCredential,
)

try:
    credential = DefaultAzureCredential()
    # check that the default credential can actually obtain a token
    credential.get_token("https://management.azure.com/.default")
except Exception:
    # fall back to an interactive browser login if the default credential fails
    credential = InteractiveBrowserCredential()

workspace_ml_client = MLClient(
    credential,
    subscription_id="<SUBSCRIPTION_ID>",
    resource_group_name="<RESOURCE_GROUP>",
    workspace_name="<WORKSPACE_NAME>",
)
# the models, fine tuning pipelines and environments are available in the AzureML system registry, "azureml-preview"
registry_ml_client = MLClient(credential, registry_name="azureml-preview")



### 2. Pick a model to deploy

Browse models in the Model Catalog in the AzureML Studio, filtering by the `question-answering` task. In this example, we use the `deepset-minilm-uncased-squad2` model. If you have opened this notebook for a different model, replace the model name and version accordingly. 
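
When replacing the model version, note that version strings sort differently as text than as numbers (`"10"` sorts before `"3"`). The sketch below picks the highest numeric version from a list of strings; the helper name and the version values are assumptions for illustration, and the commented line shows one way the list might be obtained from the registry client.

```python
def latest_version(versions):
    """Pick the highest numeric version from a list of version strings."""
    numeric = [v for v in versions if str(v).isdigit()]
    if not numeric:
        raise ValueError("no numeric versions found")
    return max(numeric, key=int)


# With a live registry client, the strings could come from something like:
#   versions = [m.version for m in registry_ml_client.models.list(name=model_name)]
# The values below are hypothetical.
versions = ["1", "2", "10", "3"]
print(latest_version(versions))
```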

In [2]:
model_name = "deepset-minilm-uncased-squad2"
model_version = "3"
foundation_model = registry_ml_client.models.get(model_name, model_version)
print(
    "\n\nUsing model name: {0}, version: {1}, id: {2} for inferencing".format(
        foundation_model.name, foundation_model.version, foundation_model.id
    )
)



Using model name: deepset-minilm-uncased-squad2, version: 3, id: azureml://registries/azureml-preview/models/deepset-minilm-uncased-squad2/versions/3 for inferencing


### 3. Prepare data for inference.

A subset of the SQuAD dataset is available in the [squad-dataset](./squad-dataset/) folder. The next few cells show basic data preparation:
* Visualize some data rows
* Save a few samples in a format that can be passed as input to the online-inference endpoint.
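
As a sketch of the second step, the request body used in this notebook is a JSON object with an `inputs` key holding parallel lists of questions and contexts. The helper name `build_payload` and the sample strings are assumptions for illustration. Note that `json.dumps` takes care of quoting, so text containing single or double quotes survives a round trip unchanged.

```python
import json


def build_payload(questions, contexts):
    """Build the request body: parallel lists of questions and contexts."""
    if len(questions) != len(contexts):
        raise ValueError("questions and contexts must have the same length")
    return {"inputs": {"question": list(questions), "context": list(contexts)}}


# Hypothetical question/context pair containing both kinds of quotes.
payload = build_payload(
    ['What is a "time count"?'],
    ["It's the Canadian football term for \"delay of game\"."],
)
serialized = json.dumps(payload)
round_tripped = json.loads(serialized)
print(serialized)
```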

In [3]:
# Load the ./squad-dataset/train_100.jsonl file into a pandas dataframe and show the first 5 rows
import pandas as pd

pd.set_option(
    "display.max_colwidth", 0
)  # set the max column width to 0 to display the full text
train_df = pd.read_json("./squad-dataset/train_100.jsonl", lines=True)
train_df.head()

Unnamed: 0,id,title,context,question,answers
0,572edf54dfa6aa1500f8d489,The_Blitz,"Within the Luftwaffe, there was a more muted view of strategic bombing. The OKL did not oppose the strategic bombardment of enemy industries and or cities, and believed it could greatly affect the balance of power on the battlefield in Germany's favour by disrupting production and damaging civilian morale, but they did not believe that air power alone could be decisive. Contrary to popular belief, the Luftwaffe did not have a systematic policy of what became known as ""terror bombing"". Evidence suggests that the Luftwaffe did not adopt an official bombing policy in which civilians became the primary target until 1942.",Who believe air power alone would not be decisive?,"{'text': ['Luftwaffe'], 'answer_start': [11]}"
1,56eab22e0030b61400a35044,Political_corruption,"More recently, articles in various financial periodicals, most notably Forbes magazine, have pointed to Fidel Castro, General Secretary of the Republic of Cuba since 1959, of likely being the beneficiary of up to $900 million, based on ""his control"" of state-owned companies. Opponents of his regime claim that he has used money amassed through weapons sales, narcotics, international loans, and confiscation of private property to enrich himself and his political cronies who hold his dictatorship together, and that the $900 million published by Forbes is merely a portion of his assets, although that needs to be proven.",The $900 million Forbes said Castro took may only be what of his total assets?,"{'text': ['a portion'], 'answer_start': [565]}"
2,5726ddd95951b619008f8093,Molotov%E2%80%93Ribbentrop_Pact,"In an effort to demonstrate peaceful intentions toward Germany, on 13 April 1941, the Soviets signed a neutrality pact with Axis power Japan. While Stalin had little faith in Japan's commitment to neutrality, he felt that the pact was important for its political symbolism, to reinforce a public affection for Germany. Stalin felt that there was a growing split in German circles about whether Germany should initiate a war with the Soviet Union. Stalin did not know that Hitler had been secretly discussing an invasion of the Soviet Union since summer 1940, and that Hitler had ordered his military in late 1940 to prepare for war in the east regardless of the parties' talks of a potential Soviet entry as a fourth Axis Power.",Who was planning an invasion of the Soviet Union?,"{'text': ['Hitler'], 'answer_start': [568]}"
3,572e8fbbdfa6aa1500f8d141,Canadian_football,"During the last three minutes of a half, the penalty for failure to place the ball in play within the 20-second play clock, known as ""time count"" (this foul is known as ""delay of game"" in American football), is dramatically different from during the first 27 minutes. Instead of the penalty being 5 yards with the down repeated, the base penalty (except during convert attempts) becomes loss of down on first or second down, and 10 yards on third down with the down repeated. In addition, as noted previously, the referee can give possession to the defence for repeated deliberate time count violations on third down.",How many yards does the offense lose for a time count on third down?,"{'text': ['10'], 'answer_start': [429]}"
4,572fa22c04bcaa1900d76b1f,Printed_circuit_board,"Multi-layer printed circuit boards have trace layers inside the board. This is achieved by laminating a stack of materials in a press by applying pressure and heat for a period of time. This results in an inseparable one piece product. For example, a four-layer PCB can be fabricated by starting from a two-sided copper-clad laminate, etch the circuitry on both sides, then laminate to the top and bottom pre-preg and copper foil. It is then drilled, plated, and etched again to get traces on top and bottom layers.",Pressure is one thing you need to apply to make a multi-layer PCB; what's the other thing?,"{'text': ['heat'], 'answer_start': [159]}"


### 4. Deploy the model to an online endpoint
Online endpoints give a durable REST API that can be used to integrate with applications that need to use the model.
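
To make the REST integration concrete, here is a minimal sketch of how an application might prepare a scoring request using only the standard library. It assumes key-based authentication, where the key is sent as a bearer token in the `Authorization` header. The scoring URI and key below are placeholders, and the helper name `build_scoring_request` is an assumption for illustration. The request is only constructed here, not sent.

```python
import json
import urllib.request


def build_scoring_request(scoring_uri, api_key, payload):
    """Prepare a POST request carrying a JSON payload and a bearer key."""
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        scoring_uri,
        data=body,
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
        method="POST",
    )


# Placeholder values: substitute the scoring URI and key of your own endpoint.
payload = {"inputs": {"question": ["Who?"], "context": ["Someone did it."]}}
request = build_scoring_request(
    "https://example-endpoint.example-region.inference.ml.azure.com/score",
    "<ENDPOINT_KEY>",
    payload,
)
print(request.get_method(), request.full_url)
# To send it: urllib.request.urlopen(request).read()
```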

In [4]:
import time
from azure.ai.ml.entities import (
    ManagedOnlineEndpoint,
    ManagedOnlineDeployment,
    OnlineRequestSettings,
)

# Create online endpoint - endpoint names need to be unique in a region, hence using timestamp to create unique endpoint name
timestamp = int(time.time())
online_endpoint_name = "question-answering-" + str(timestamp)
# create an online endpoint
endpoint = ManagedOnlineEndpoint(
    name=online_endpoint_name,
    description="Online endpoint for "
    + foundation_model.name
    + ", for question-answering task",
    auth_mode="key",
)
workspace_ml_client.begin_create_or_update(endpoint).wait()

In [5]:
# create a deployment
demo_deployment = ManagedOnlineDeployment(
    name="demo",
    endpoint_name=online_endpoint_name,
    model=foundation_model.id,
    instance_type="Standard_DS2_v2",
    instance_count=1,
    request_settings=OnlineRequestSettings(
        request_timeout_ms=60000,
    ),
)
workspace_ml_client.online_deployments.begin_create_or_update(demo_deployment).wait()
endpoint.traffic = {"demo": 100}
workspace_ml_client.begin_create_or_update(endpoint).result()

Instance type Standard_DS2_v2 may be too small for compute resources. Minimum recommended compute SKU is Standard_DS3_v2 for general purpose endpoints. Learn more about SKUs here: https://learn.microsoft.com/en-us/azure/machine-learning/referencemanaged-online-endpoints-vm-sku-list
Check: endpoint question-answering-1684194665 exists
data_collector is not a known attribute of class <class 'azure.ai.ml._restclient.v2022_02_01_preview.models._models_py3.ManagedOnlineDeployment'> and will be ignored



ManagedOnlineEndpoint({'public_network_access': 'Enabled', 'provisioning_state': 'Succeeded', 'scoring_uri': 'https://question-answering-1684194665.eastus2euap.inference.ml.azure.com/score', 'openapi_uri': 'https://question-answering-1684194665.eastus2euap.inference.ml.azure.com/swagger.json', 'name': 'question-answering-1684194665', 'description': 'Online endpoint for deepset-minilm-uncased-squad2, for question-answering task', 'tags': {}, 'properties': {'azureml.onlineendpointid': '/subscriptions/ea4faa5b-5e44-4236-91f6-5483d5b17d14/resourcegroups/amyharrispersonal/providers/microsoft.machinelearningservices/workspaces/amyharris-canary/onlineendpoints/question-answering-1684194665', 'AzureAsyncOperationUri': 'https://management.azure.com/subscriptions/ea4faa5b-5e44-4236-91f6-5483d5b17d14/providers/Microsoft.MachineLearningServices/locations/eastus2euap/mfeOperationsStatus/oe:c76e6446-545b-4141-80f9-e8ad59c471f2:3b65a675-9d42-4f30-a220-bb6173f18200?api-version=2022-02-01-preview'}, 'p

### 5. Test the endpoint with sample data

We will fetch a sample from the dataset and submit it to the online endpoint for inference. We will then display the predicted answer alongside the ground-truth answer.

In [6]:
import json
import os

# read the ./squad-dataset/train_100.jsonl file into a pandas dataframe
df = pd.read_json("./squad-dataset/train_100.jsonl", lines=True)
# escape single and double quotes in the text column
df["question"] = df["question"].str.replace("'", "\\'").str.replace('"', '\\"')
df["context"] = df["context"].str.replace("'", "\\'").str.replace('"', '\\"')
# pick 1 random row
sample_df = df.sample(1)
# create a json object with the key as "inputs" and value as a list of question-context pairs from columns of the sample_df dataframe
test_json = {
    "inputs": {
        "question": sample_df["question"].to_list(),
        "context": sample_df["context"].to_list(),
    }
}
# save the json object to a file named sample_score.json in the ./squad-dataset folder
with open(os.path.join(".", "squad-dataset", "sample_score.json"), "w") as f:
    json.dump(test_json, f)
sample_df.head()

Unnamed: 0,id,title,context,question,answers
75,5706931c52bb891400689a8e,Black_people,"In South Africa, the period of colonization resulted in many unions and marriages between European men and African women from various tribes, resulting in mixed-race children. As the Europeans acquired territory and imposed rule over the Africans, they generally pushed mixed-race and Africans into second-class status. During the first half of the 20th century, the Afrikaaner-dominated government classified the population according to four main racial groups: Black, White, Asian (mostly Indian), and Coloured. The Coloured group included people of mixed Bantu, Khoisan, and European descent (with some Malay ancestry, especially in the Western Cape). The Coloured definition occupied an intermediary political position between the Black and White definitions in South Africa. It imposed a system of legal racial segregation, a complex of laws known as apartheid.",What does apartheid mean?,"{'text': ['a system of legal racial segregation'], 'answer_start': [791]}"


In [7]:
# score the sample_score.json file using the online endpoint with the azureml endpoint invoke method
response = workspace_ml_client.online_endpoints.invoke(
    endpoint_name=online_endpoint_name,
    deployment_name="demo",
    request_file="./squad-dataset/sample_score.json",
)
print("raw response: \n", response, "\n")
# convert the json response to a pandas dataframe
response_df = pd.read_json(response, typ="series")
response_df.head()

raw response: 
 [{"0": "legal racial segregation"}] 



0    {'0': 'legal racial segregation'}
dtype: object

In [8]:
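# compare the predicted answer with the actual answer
# the raw response is a JSON list like [{"0": "<answer>"}], so read the answer from key "0" of the first item
predicted_answer = json.loads(response)[0]["0"]
response_df = pd.DataFrame({"predicted_answer": [predicted_answer]})
response_df["ground_truth_answer"] = sample_df["answers"].to_list()[0]["text"]
response_df.head()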
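
A predicted span often overlaps the ground truth without matching it exactly, as with the sample above. One common way to quantify this is a normalized exact match plus a token-level F1 score. The sketch below follows that general idea; it is not an official evaluation script, and the function names are assumptions for illustration.

```python
import re
import string
from collections import Counter


def normalize(text):
    """Lowercase, drop punctuation and articles, and collapse whitespace."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())


def exact_match(prediction, truth):
    return normalize(prediction) == normalize(truth)


def token_f1(prediction, truth):
    """Harmonic mean of token precision and recall after normalization."""
    pred_tokens = normalize(prediction).split()
    truth_tokens = normalize(truth).split()
    overlap = sum((Counter(pred_tokens) & Counter(truth_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(truth_tokens)
    return 2 * precision * recall / (precision + recall)


# Values taken from the sample row and raw response shown in this notebook.
prediction = "legal racial segregation"
truth = "a system of legal racial segregation"
print(exact_match(prediction, truth))
print(round(token_f1(prediction, truth), 2))
```

Here the prediction covers three of the five normalized ground-truth tokens, so it is not an exact match but still gets partial credit.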

### 6. Delete the online endpoint
Don't forget to delete the online endpoint; otherwise the billing meter keeps running for the compute used by the endpoint.

In [None]:
workspace_ml_client.online_endpoints.begin_delete(name=online_endpoint_name).wait()
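
If the testing steps fail partway through, the delete cell may never run. One way to guard against that in a script is to wrap the work in a context manager that always calls the delete function on exit. The sketch below uses a stand-in delete function so it runs without Azure; the name `temporary_resource` and the endpoint name are assumptions for illustration.

```python
from contextlib import contextmanager


@contextmanager
def temporary_resource(name, delete):
    """Yield `name`, then call `delete(name)` even if the body raises."""
    try:
        yield name
    finally:
        delete(name)


deleted = []


def fake_delete(name):
    # Stand-in for a real call such as:
    #   workspace_ml_client.online_endpoints.begin_delete(name=name).wait()
    deleted.append(name)


try:
    with temporary_resource("question-answering-demo", fake_delete) as endpoint_name:
        raise RuntimeError("simulated failure while testing the endpoint")
except RuntimeError:
    pass

print(deleted)
```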