# Using AI21 Contextual Answers on SageMaker through Model Packages

This sample notebook shows you how to deploy **AI21 Contextual Answers** using Amazon SageMaker.

**AI21 Contextual Answers** lets you give your users relevant, immediate answers in natural language to questions they ask about documents from your knowledge base or website. The model answers based entirely on the context of a specific document or article. Because it is grounded in the context you provide, it is designed to avoid factual inaccuracies such as hallucinations and distortions: if the answer to a question is not specified in the document, the model indicates this rather than returning a potentially inaccurate answer.

As a task-specific model, it can be integrated into existing systems **without requiring any prompt engineering**: simply provide the context and the question, and the model provides the answer.


## Pre-requisites:
1. Before running this notebook, make sure you obtained it from the model catalog in the SageMaker section of the AWS Management Console.
1. **Note**: This notebook contains elements that render correctly in the Jupyter interface. Open it from an Amazon SageMaker Notebook Instance or Amazon SageMaker Studio.
1. Ensure that the IAM role you use has the **AmazonSageMakerFullAccess** policy attached.
1. This notebook is intended to work with **boto3 v1.25.4** or higher.

## Contents:
1. [Select model package](#1.-Select-model-package)
1. [Create an endpoint and perform real-time inference](#2.-Create-an-endpoint-and-perform-real-time-inference)
   1. [Create an endpoint](#A.-Create-an-endpoint)
   1. [Interact with the model](#B.-Interact-with-the-model)
   1. [Ask about financial reports](#C.-Ask-about-financial-reports)
1. [Clean-up](#3.-Clean-up)
   1. [Delete the endpoint](#A.-Delete-the-endpoint)
   1. [Delete the model](#B.-Delete-the-model)


## Usage instructions
You can run this notebook one cell at a time (press Shift+Enter to run a cell).

## 1. Select model package
Confirm that you received this notebook from the model catalog in SageMaker AWS Management Console.

In [1]:
model_package_map = {
    "us-east-1": "arn:aws:sagemaker:us-east-1:865070037744:model-package/contextual-answers-1-1-017-c6d7d7dfcc8033d3b4b9533daa1b6a99",
    "us-east-2": "arn:aws:sagemaker:us-east-2:057799348421:model-package/contextual-answers-1-1-017-c6d7d7dfcc8033d3b4b9533daa1b6a99",
    "us-west-1": "arn:aws:sagemaker:us-west-1:382657785993:model-package/contextual-answers-1-1-017-c6d7d7dfcc8033d3b4b9533daa1b6a99",
    "us-west-2": "arn:aws:sagemaker:us-west-2:594846645681:model-package/contextual-answers-1-1-017-c6d7d7dfcc8033d3b4b9533daa1b6a99",
    "ca-central-1": "arn:aws:sagemaker:ca-central-1:470592106596:model-package/contextual-answers-1-1-017-c6d7d7dfcc8033d3b4b9533daa1b6a99",
    "eu-central-1": "arn:aws:sagemaker:eu-central-1:446921602837:model-package/contextual-answers-1-1-017-c6d7d7dfcc8033d3b4b9533daa1b6a99",
    "eu-west-1": "arn:aws:sagemaker:eu-west-1:985815980388:model-package/contextual-answers-1-1-017-c6d7d7dfcc8033d3b4b9533daa1b6a99",
    "eu-west-2": "arn:aws:sagemaker:eu-west-2:856760150666:model-package/contextual-answers-1-1-017-c6d7d7dfcc8033d3b4b9533daa1b6a99",
    "eu-west-3": "arn:aws:sagemaker:eu-west-3:843114510376:model-package/contextual-answers-1-1-017-c6d7d7dfcc8033d3b4b9533daa1b6a99",
    "eu-north-1": "arn:aws:sagemaker:eu-north-1:136758871317:model-package/contextual-answers-1-1-017-c6d7d7dfcc8033d3b4b9533daa1b6a99",
    "ap-southeast-1": "arn:aws:sagemaker:ap-southeast-1:192199979996:model-package/contextual-answers-1-1-017-c6d7d7dfcc8033d3b4b9533daa1b6a99",
    "ap-southeast-2": "arn:aws:sagemaker:ap-southeast-2:666831318237:model-package/contextual-answers-1-1-017-c6d7d7dfcc8033d3b4b9533daa1b6a99",
    "ap-northeast-2": "arn:aws:sagemaker:ap-northeast-2:745090734665:model-package/contextual-answers-1-1-017-c6d7d7dfcc8033d3b4b9533daa1b6a99",
    "ap-northeast-1": "arn:aws:sagemaker:ap-northeast-1:977537786026:model-package/contextual-answers-1-1-017-c6d7d7dfcc8033d3b4b9533daa1b6a99",
    "ap-south-1": "arn:aws:sagemaker:ap-south-1:077584701553:model-package/contextual-answers-1-1-017-c6d7d7dfcc8033d3b4b9533daa1b6a99",
    "sa-east-1": "arn:aws:sagemaker:sa-east-1:270155090741:model-package/contextual-answers-1-1-017-c6d7d7dfcc8033d3b4b9533daa1b6a99"
}

In [2]:
import json
from sagemaker import ModelPackage
from sagemaker import get_execution_role
import sagemaker as sage
import boto3

### Check the version of boto3 - must be v1.25.4 or higher
If you see a lower version number, pick another kernel (Python 3.8 or above) to run the notebook.

In [3]:
boto3.__version__

'1.26.74'
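Reading the version off the cell output works, but comparing version strings directly can mislead (as plain strings, `"1.9.0"` sorts after `"1.25.4"`). The following is a minimal sketch of a numeric comparison; the helper names `parse_version` and `meets_minimum` are hypothetical and not part of boto3, and the parsing assumes a simple `major.minor.patch` layout.

```python
import re

MINIMUM_BOTO3 = "1.25.4"  # minimum version stated in the prerequisites


def parse_version(version):
    """Turn 'major.minor.patch' into a tuple of ints (hypothetical helper)."""
    numbers = []
    for piece in version.split(".")[:3]:
        match = re.match(r"\d+", piece)  # keep only the leading digits of each part
        numbers.append(int(match.group()) if match else 0)
    return tuple(numbers)


def meets_minimum(installed, minimum=MINIMUM_BOTO3):
    """Return True when `installed` is numerically at least `minimum`."""
    return parse_version(installed) >= parse_version(minimum)


# "1.26.74" is the version shown in this notebook's output
print(meets_minimum("1.26.74"))
```

In the notebook itself you would pass `boto3.__version__` instead of a literal string.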

### Install ai21 python SDK

In [4]:
! pip install -U "ai21[AWS]"
import ai21

Looking in indexes: https://pypi.org/simple, https://pip.repos.neuron.amazonaws.com


In [5]:
region = boto3.Session().region_name
if region not in model_package_map:
    # Raising a plain string is a TypeError in Python 3; raise a real exception instead
    raise ValueError(f"Unsupported region: {region}")

model_package_arn = model_package_map[region]

In [6]:
role = get_execution_role()
sagemaker_session = sage.Session()

runtime_sm_client = boto3.client("runtime.sagemaker")

## 2. Create an endpoint and perform real-time inference

### <span style='color:Blue'> How to choose the best instance for my use case?</span>
<span style='color:#0057FF'> When you create your endpoint, you need to choose the instance type to run the model on. Choosing the right instance is mainly a matter of economics. Depending on your use case, you probably want the most cost-effective instance possible. In this notebook we use one of the supported instances.</span>

<span style='color:#0057FF'>Looking for the list of all supported instances? See</span> [here](https://docs.ai21.com/docs/choosing-the-right-instance-type-for-amazon-sagemaker-models#ai21-contextual-answers).

If you want to understand how real-time inference with Amazon SageMaker works, see [Documentation](https://docs.aws.amazon.com/sagemaker/latest/dg/deploy-model.html).

In [7]:
endpoint_name = "contextual-answers"

content_type = "application/json"

real_time_inference_instance_type = (
    "ml.p4d.24xlarge"    # Recommended instance
#     "ml.g5.48xlarge"   # Cheaper and faster - recommended for relatively short inputs/outputs
#     "ml.g5.12xlarge"   # Even cheaper and faster - up to 10K characters
)

### A. Create an endpoint

In [8]:
# create a deployable model from the model package.
model = ModelPackage(
    role=role, model_package_arn=model_package_arn, sagemaker_session=sagemaker_session
)

# Deploy the model
predictor = model.deploy(1, real_time_inference_instance_type, endpoint_name=endpoint_name, 
                         model_data_download_timeout=3600,
                         container_startup_health_check_timeout=600,
                        )

-----------------!

Once the endpoint has been created, you can perform real-time inference.

### B. Interact with the model

**AI21 Studio Contextual Answers model** allows you to access our high-quality question answering technology. It was designed to answer questions based on a specific document context provided by the customer. This avoids any factual issues that language models may have and makes sure the answers it provides are grounded in that context document.

This model receives document text, serving as a context, and a question and returns an answer based entirely on this context. This means that if the answer to your question is not in the document, the model will indicate it (instead of providing a false answer).
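To make the input shape concrete, here is a minimal sketch of a request body built from a context and a question. The JSON field names `context` and `question` are an assumption that mirrors the SDK keyword arguments used in this notebook; the exact wire format is not confirmed here. `build_answer_payload` is a hypothetical helper.

```python
import json


def build_answer_payload(context, question):
    """Serialize a context/question pair as JSON (field names are assumed)."""
    return json.dumps({"context": context, "question": question})


payload = build_answer_payload(
    context="Its base is square, measuring 125 metres (410 ft) on each side.",
    question="How wide is the base?",
)
print(payload)

# With a live endpoint, the body could be sent through the runtime client
# created earlier (not executed in this sketch):
#
# raw = runtime_sm_client.invoke_endpoint(
#     EndpointName=endpoint_name,
#     ContentType=content_type,
#     Accept="application/json",
#     Body=payload,
# )
# result = json.loads(raw["Body"].read())
```

The rest of this notebook uses the `ai21` SDK, which handles serialization for you.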

To get a sense of the model's behavior, let's use a toy example: asking for the height of the Eiffel Tower. Most language models will simply answer according to their training data.

This model, however, bases its answer solely on the context you provide. Let's use the following [Wikipedia paragraph](https://en.wikipedia.org/wiki/Eiffel_Tower#:~:text=The%20Eiffel%20Tower%20(%2F%CB%88a%C9%AA,from%20the%20Champ%20de%20Mars) as context, with small modifications:

In [9]:
# Actual paragraph
context = "The tower is 330 metres (1,083 ft) tall,[6] about the same height as an 81-storey building, and the tallest structure in Paris. Its base is square, measuring 125 metres (410 ft) on each side. During its construction, the Eiffel Tower surpassed the Washington Monument to become the tallest human-made structure in the world, a title it held for 41 years until the Chrysler Building in New York City was finished in 1930. It was the first structure in the world to surpass both the 200-metre and 300-metre mark in height. Due to the addition of a broadcasting aerial at the top of the tower in 1957, it is now taller than the Chrysler Building by 5.2 metres (17 ft). Excluding transmitters, the Eiffel Tower is the second tallest free-standing structure in France after the Millau Viaduct."

# The paragraph with manual changes of the height
false_context = "The tower is 3 metres (10 ft) tall,[6] about the same height as an 81-storey building, and the tallest structure in Paris. Its base is square, measuring 125 metres (410 ft) on each side. During its construction, the Eiffel Tower surpassed the Washington Monument to become the tallest human-made structure in the world, a title it held for 41 years until the Chrysler Building in New York City was finished in 1930. It was the first structure in the world to surpass both the 200-metre and 300-metre mark in height. Due to the addition of a broadcasting aerial at the top of the tower in 1957, it is now taller than the Chrysler Building by 5.2 metres (17 ft). Excluding transmitters, the Eiffel Tower is the second tallest free-standing structure in France after the Millau Viaduct."

# The paragraph with the height omitted
partial_context = "Its base is square, measuring 125 metres (410 ft) on each side. During its construction, the Eiffel Tower surpassed the Washington Monument to become the tallest human-made structure in the world, a title it held for 41 years until the Chrysler Building in New York City was finished in 1930. It was the first structure in the world to surpass both the 200-metre and 300-metre mark in height. Due to the addition of a broadcasting aerial at the top of the tower in 1957, it is now taller than the Chrysler Building by 5.2 metres (17 ft). Excluding transmitters, the Eiffel Tower is the second tallest free-standing structure in France after the Millau Viaduct."

Here is what the model will say when asked the same question in each context.

With the true context, the model gives the correct answer (which is indeed stated in the provided context):

In [10]:
# True context
response = ai21.Answer.execute(
    context=context,
    question="What is the height of the Eiffel tower?",
    destination=ai21.SageMakerDestination(endpoint_name),
)

print(response.answer)

The Eiffel Tower is 330 metres (1,083 ft) tall.


If we feed the model a context that is false in real life, it still answers according to the provided context:

In [15]:
# False context
response = ai21.Answer.execute(
    context=false_context,
    question="What is the height of the Eiffel tower?",
    destination=ai21.SageMakerDestination(endpoint_name),
)

print(response.answer)

The Eiffel Tower is 3 metres (10 ft) tall.


Here we omitted the first sentence from the context, so the model receives a context without any information about the height. Instead of making up an answer, the model simply says that the answer is not in the provided document:

In [12]:
# Partial context (height omitted)
response = ai21.Answer.execute(
    context=partial_context,
    question="What is the height of the Eiffel tower?",
    destination=ai21.SageMakerDestination(endpoint_name),
)

print(response.answer)

Answer not in document
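In application code you will usually want to branch on the "no answer" case rather than display it verbatim. The sketch below assumes the no-answer signal is the exact text printed above, `Answer not in document`; that sentinel is taken from this notebook's output, and other versions of the model or SDK may signal it differently. `extract_answer` is a hypothetical helper.

```python
# Sentinel text copied from the notebook output above (an assumption, not a documented constant)
NO_ANSWER_SENTINEL = "Answer not in document"


def extract_answer(answer_text):
    """Return the cleaned answer, or None when the model reports no answer."""
    if answer_text is None:
        return None
    cleaned = answer_text.strip()
    if not cleaned or cleaned.lower() == NO_ANSWER_SENTINEL.lower():
        return None
    return cleaned


for text in ["The Eiffel Tower is 330 metres (1,083 ft) tall.", "Answer not in document"]:
    answer = extract_answer(text)
    print(answer if answer is not None else "No answer found in the provided context.")
```

In the notebook you would pass `response.answer` to the helper.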


### Important - Model Usage
The input to the model should be simple and straightforward; no prompting is needed.

**Good question:** *What is the height of the Eiffel tower?*

**Bad question:** *Please answer the following question: What is the height of the Eiffel tower?*

**Another bad question:** *You are an expert model, answer the question according to the context. What is the height of the Eiffel tower?*

### C. Ask about financial reports

The document context should be **no more than 10,000 characters**, and the question can be up to 160 characters.
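Checking these limits before calling the endpoint saves a round trip. A minimal sketch, using the two limits stated above; `validate_inputs` is a hypothetical helper, and how the endpoint itself reacts to oversized input is not covered here.

```python
MAX_CONTEXT_CHARS = 10_000  # context limit stated above
MAX_QUESTION_CHARS = 160    # question limit stated above


def validate_inputs(context, question):
    """Raise ValueError if either input is empty or exceeds its character limit."""
    if not context.strip():
        raise ValueError("The context is empty.")
    if not question.strip():
        raise ValueError("The question is empty.")
    if len(context) > MAX_CONTEXT_CHARS:
        raise ValueError(
            f"Context has {len(context)} characters; the limit is {MAX_CONTEXT_CHARS}."
        )
    if len(question) > MAX_QUESTION_CHARS:
        raise ValueError(
            f"Question has {len(question)} characters; the limit is {MAX_QUESTION_CHARS}."
        )


# Passes silently for inputs within both limits
validate_inputs("Its base is square.", "What shape is the base?")
print("inputs are within the limits")
```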

Imagine you are performing research and rely on financial reports as the basis for your findings. Let's take the following excerpt from the [JPMorgan Chase & Co. 2021 annual report](https://www.jpmorganchase.com/content/dam/jpmc/jpmorgan-chase-and-co/investor-relations/documents/annualreport-2021.pdf):

In [16]:
financial_context = """In 2020 and 2021, enormous QE — approximately $4.4 trillion, or 18%, of 2021 gross domestic product (GDP) — and enormous fiscal stimulus (which has been and always will be inflationary) — approximately $5 trillion, or 21%, of 2021 GDP — stabilized markets and allowed companies to raise enormous amounts of capital. In addition, this infusion of capital saved many small businesses and put more than $2.5 trillion in the hands of consumers and almost $1 trillion into state and local coffers. These actions led to a rapid decline in unemployment, dropping from 15% to under 4% in 20 months — the magnitude and speed of which were both unprecedented. Additionally, the economy grew 7% in 2021 despite the arrival of the Delta and Omicron variants and the global supply chain shortages, which were largely fueled by the dramatic upswing in consumer spending and the shift in that spend from services to goods. Fortunately, during these two years, vaccines for COVID-19 were also rapidly developed and distributed.
In today's economy, the consumer is in excellent financial shape (on average), with leverage among the lowest on record, excellent mortgage underwriting (even though we've had home price appreciation), plentiful jobs with wage increases and more than $2 trillion in excess savings, mostly due to government stimulus. Most consumers and companies (and states) are still flush with the money generated in 2020 and 2021, with consumer spending over the last several months 12% above pre-COVID-19 levels. (But we must recognize that the account balances in lower-income households, smaller to begin with, are going down faster and that income for those households is not keeping pace with rising inflation.)
Today's economic landscape is completely different from the 2008 financial crisis when the consumer was extraordinarily overleveraged, as was the financial system as a whole — from banks and investment banks to shadow banks, hedge funds, private equity, Fannie Mae and many other entities. In addition, home price appreciation, fed by bad underwriting and leverage in the mortgage system, led to excessive speculation, which was missed by virtually everyone — eventually leading to nearly $1 trillion in actual losses.
"""
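The excerpt above fits within the 10,000-character context limit, but a longer section of a report would not. One possible approach, sketched below, is to split the text on paragraph boundaries into chunks that each fit the limit and then ask the question against each chunk. This is an illustration rather than a documented AI21 workflow; `split_into_chunks` is a hypothetical helper.

```python
def split_into_chunks(text, max_chars=10_000):
    """Greedily pack paragraphs (split on newlines) into chunks of at most max_chars."""
    chunks = []
    current = ""
    for paragraph in text.split("\n"):
        paragraph = paragraph.strip()
        if not paragraph:
            continue
        # A single paragraph longer than the limit is cut at the limit
        while len(paragraph) > max_chars:
            if current:
                chunks.append(current)
                current = ""
            chunks.append(paragraph[:max_chars])
            paragraph = paragraph[max_chars:]
        candidate = paragraph if not current else current + "\n" + paragraph
        if len(candidate) <= max_chars:
            current = candidate
        else:
            chunks.append(current)
            current = paragraph
    if current:
        chunks.append(current)
    return chunks


# Tiny limit so the behavior is easy to see
sample = "aaaaaa\nbbbbbb\nccc"
print(split_into_chunks(sample, max_chars=10))
```

Note that an answer spanning two chunks can be missed with this approach, so keep related paragraphs together where possible.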

Rather than reading the entire report, just ask what you want to know:

In [17]:
question = "Did the economy shrink after the Omicron variant arrived?"

The model will answer based on the provided report:

In [27]:
response = ai21.Answer.execute(
    context=financial_context,
    question=question,
    destination=ai21.SageMakerDestination(endpoint_name),
)

print(response.answer)

No, the economy grew 7% in 2021 despite the arrival of the Delta and Omicron variants and the global supply chain shortages, which were largely fueled by the dramatic upswing in consumer spending and the shift in that spend from services to goods.


In addition, you can ask more complex questions, where the answer requires deductions rather than just extracting the correct sentence from the document context. This will result in abstractive, rather than extractive, answers that draw on several different parts of the document. For example, look at the following question:

In [56]:
harder_question = "Did COVID-19 eventually help the economy?"

response = ai21.Answer.execute(
    context=financial_context,
    question=harder_question,
    destination=ai21.SageMakerDestination(endpoint_name),
)

print(response.answer)

The rapid development of vaccines for COVID-19 helped the economy recover quickly from the pandemic.


We now present the model with the following question. At a glance, the last paragraph may tempt you to give an answer without delving into the text. However, if you read the provided document context carefully, you will discover that the answer does not appear there. The model handles this as expected:

In [57]:
irrelevant_question = "How did COVID-19 affect the financial crisis of 2008?"

response = ai21.Answer.execute(
    context=financial_context,
    question=irrelevant_question,
    destination=ai21.SageMakerDestination(endpoint_name),
)

print(response.answer)

Answer not in document


### Interested in learning more?
Take a look at our [guide](https://docs.ai21.com/docs/contextual-answers-api) to understand all the capabilities of the AI21 Contextual Answers model.

## 3. Clean-up

### A. Delete the endpoint

Now that you have successfully performed a real-time inference, you do not need the endpoint any more. You can terminate the endpoint to avoid being charged.

In [58]:
model.sagemaker_session.delete_endpoint(endpoint_name)
model.sagemaker_session.delete_endpoint_config(endpoint_name)

### B. Delete the model

In [59]:
model.delete_model()