# Question Answering using Embeddings

Many use cases require text (text2text) generation models like **BloomZ** and **Flan T5** to respond to user questions with insightful answers. For example, a customer support chatbot may need to provide answers to common questions. The **BloomZ** and **Flan T5** models have picked up a lot of general knowledge in training, but we often need to ingest and use a large library of more specific information.

In this notebook we demonstrate a method for enabling **BloomZ** and **Flan T5** to answer questions using a library of text as a reference, by using document embeddings and retrieval. The document library in this example is built from the Amazon SageMaker FAQs.
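
To make the idea of retrieval by embeddings concrete, here is a minimal local sketch, not the notebook's actual pipeline. The three-dimensional vectors, the section texts, and the helper names `l2_distance` and `top_k_sections` are made up for illustration; in the notebook the vectors come from an embedding endpoint and the nearest-neighbour search is done by a SageMaker kNN endpoint.

```python
import math


def l2_distance(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def top_k_sections(query_vec, section_vecs, k=2):
    """Return the indices of the k sections closest to the query, nearest first."""
    ranked = sorted(
        range(len(section_vecs)),
        key=lambda i: l2_distance(query_vec, section_vecs[i]),
    )
    return ranked[:k]


# Toy data (hypothetical): one short text and one tiny "embedding" per section.
sections = ["Text about spot training", "Text about pricing", "Text about notebooks"]
section_vecs = [[0.9, 0.1, 0.0], [0.1, 0.9, 0.0], [0.0, 0.2, 0.9]]

# Toy embedding of the user's question.
query_vec = [0.8, 0.2, 0.1]

# Retrieve the two nearest sections and join them into a context string.
nearest = top_k_sections(query_vec, section_vecs, k=2)
context = "\n".join(sections[i] for i in nearest)
print(nearest)
print(context)
```

The retrieved text is then placed in the prompt ahead of the question, so the model answers from the reference material rather than from its general training knowledge.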

### 1. Deploy Flan T5 XL

In [3]:
import sagemaker, boto3, json

# Use a single SageMaker session throughout the notebook
sess = sagemaker.Session()
aws_role = sess.get_caller_identity_arn()
aws_region = boto3.Session().region_name

In [2]:
# model_version="*" fetches the latest version of the model
model_id, model_version = "huggingface-text2text-flan-t5-xl", "*"

from sagemaker import image_uris, model_uris, script_uris, hyperparameters
from sagemaker.model import Model
from sagemaker.predictor import Predictor
from sagemaker.utils import name_from_base


endpoint_name_flan_t5 = name_from_base(f"jumpstart-example-{model_id}")

inference_instance_type = "ml.p3.2xlarge"

# Retrieve the inference docker container uri. This is the base HuggingFace container image for the default model above.
deploy_image_uri = image_uris.retrieve(
    region=None,
    framework=None,  # automatically inferred from model_id
    image_scope="inference",
    model_id=model_id,
    model_version=model_version,
    instance_type=inference_instance_type,
)

# Retrieve the inference script uri. This includes all dependencies and scripts for model loading, inference handling etc.
deploy_source_uri = script_uris.retrieve(
    model_id=model_id, model_version=model_version, script_scope="inference"
)


# Retrieve the model uri.
model_uri = model_uris.retrieve(
    model_id=model_id, model_version=model_version, model_scope="inference"
)

model_inference = Model(
    image_uri=deploy_image_uri,
    model_data=model_uri,
    role=aws_role,
    predictor_cls=Predictor,
    name=endpoint_name_flan_t5,
)
# deploy the Model. Note that we need to pass Predictor class when we deploy model through Model class,
# for being able to run inference through the sagemaker API.
model_predictor_inference = model_inference.deploy(
    initial_instance_count=1,
    instance_type=inference_instance_type,
    predictor_cls=Predictor,
    endpoint_name=endpoint_name_flan_t5,
    volume_size=30,
)

-----------!

### 2. Responses from Flan T5 XL without providing the context

In [5]:
question = "Which instances can I use with Managed Spot Training?"

In [12]:
def query_endpoint_with_json_payload(encoded_json, endpoint_name, content_type='application/json'):
    client = boto3.client('runtime.sagemaker')
    response = client.invoke_endpoint(EndpointName=endpoint_name, ContentType=content_type, Body=encoded_json)
    return response

def parse_response_model_flan_t5(query_response):
    model_predictions = json.loads(query_response["Body"].read())
    generated_text = model_predictions["generated_texts"]
    return generated_text

In [12]:
payload = {
    "text_inputs": question, 
    "max_length":50, 
    "num_return_sequences":1, 
    "top_k":50, 
    "top_p":0.95, 
    "do_sample":True
}


query_response = query_endpoint_with_json_payload(json.dumps(payload).encode('utf-8'), endpoint_name=endpoint_name_flan_t5)

generated_texts = parse_response_model_flan_t5(query_response)
print(generated_texts)

['Spot Training allows you to create courses in .NET Core 3.0 and up, and for Azure SQL Server.']


### 3. Retrieving the most relevant context from the database for the question

#### 3.1 Deploying the model endpoint for getting the embeddings

In [8]:
model_id, model_version = "huggingface-textembedding-gpt-j-6b", "*"

endpoint_name_embed = name_from_base(f"jumpstart-example-{model_id}")


embed_model = Model(
    image_uri=deploy_image_uri,
    model_data="s3://sagemaker-jumpstart-cache-contributor-staging/jumpstart-1p/textembedding/infer-huggingface-textembedding-huggingface-textembedding-gpt-j-6b-20230320-2050-repack.tar.gz",
    role=aws_role,
    predictor_cls=Predictor,
    name=endpoint_name_embed,
)

model_predictor_embed = embed_model.deploy(
    initial_instance_count=1,
    instance_type=inference_instance_type,
    predictor_cls=Predictor,
    endpoint_name=endpoint_name_embed
)

------------------!

In [9]:
def parse_response_text_embed(query_response):
    generated_text = []
    model_predictions = json.loads(query_response['Body'].read())
    generated_text.append(model_predictions['embedding'])
    return generated_text

#### 3.2 Preprocess the document library
We plan to use document embeddings to fetch the most relevant parts of our document library and insert them into the prompt that we provide to **Flan T5 XL**.

Sections should be large enough to contain the information needed to answer a question, but small enough that one or several fit into the **Flan T5 XL** prompt. We find that approximately a paragraph of text is usually a good length, but you should experiment for your particular use case. In this example, we prepared the database from the answers at https://aws.amazon.com/sagemaker/faqs/
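
The FAQ data used here already has one answer per row, so the notebook does not need to split anything. For a library of longer documents, the sizing advice above could be applied with something like the following sketch. The function name `split_into_sections` and the 500-character default are assumptions chosen for illustration, not part of the original notebook.

```python
def split_into_sections(text, max_chars=500):
    """Split text on blank lines, then pack consecutive paragraphs
    into sections of at most max_chars characters."""
    paragraphs = [p.strip() for p in text.split("\n\n") if p.strip()]
    sections = []
    current = ""
    for paragraph in paragraphs:
        candidate = f"{current}\n\n{paragraph}" if current else paragraph
        if len(candidate) <= max_chars:
            # The paragraph still fits into the section being built.
            current = candidate
        else:
            if current:
                sections.append(current)
            # Start a new section; a single over-long paragraph is kept whole.
            current = paragraph
    if current:
        sections.append(current)
    return sections


# Three paragraphs of 300, 300 and 100 characters.
sample_text = "a" * 300 + "\n\n" + "b" * 300 + "\n\n" + "c" * 100
sample_sections = split_into_sections(sample_text, max_chars=500)
print([len(s) for s in sample_sections])
```

Packing whole paragraphs keeps each section self-contained, at the cost of uneven section lengths. Whether that trade-off suits a given library is something to check experimentally.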

In [10]:
# Downloading the Database
!aws s3 cp s3://hemamsin-jump-test-pdx/data/Amazon_SageMaker_FAQs/Amazon_SageMaker_FAQs.csv Amazon_SageMaker_FAQs.csv

download: s3://hemamsin-jump-test-pdx/data/Amazon_SageMaker_FAQs/Amazon_SageMaker_FAQs.csv to ./Amazon_SageMaker_FAQs.csv


In [13]:
import pandas as pd

# Keep the default integer index: the kNN labels below are row positions in this DataFrame
df_answers = pd.read_csv('Amazon_SageMaker_FAQs.csv', names=["Question", "Answer"])

res_embed = []
for idx, row in df_answers.iterrows():
    query_response = query_endpoint_with_json_payload(row["Answer"], endpoint_name_embed, content_type="application/x-text")
    generated_embed = parse_response_text_embed(query_response)[0][0]
    res_embed.append(generated_embed)
res_embed_df = pd.DataFrame(res_embed)

In [18]:
import numpy as np
import os
import io
import sagemaker.amazon.common as smac


train_features = np.array(res_embed)

# Label each answer embedding with its row index so a kNN match can be mapped back to the answer text
train_labels = np.arange(len(train_features))

print("train_features shape = ", train_features.shape)
print("train_labels shape = ", train_labels.shape)

buf = io.BytesIO()
smac.write_numpy_to_dense_tensor(buf, train_features, train_labels)
buf.seek(0)


bucket = sess.default_bucket()  # modify to your bucket name
prefix = "Database"
key = "Amazon-SageMaker-FAQs"

boto3.resource("s3").Bucket(bucket).Object(os.path.join(prefix, "train", key)).upload_fileobj(buf)
s3_train_data = f"s3://{bucket}/{prefix}/train/{key}"
print(f"uploaded training data location: {s3_train_data}")

train_features shape =  (154, 4096)
train_labels shape =  (154,)
uploaded training data location: s3://sagemaker-us-west-2-802376408542/Database/train/Amazon-SageMaker-FAQs


In [19]:
from sagemaker import image_uris


def trained_estimator_from_hyperparams(s3_train_data, hyperparams, output_path):
    """
    Create an Estimator from the given hyperparams, fit it to the training data,
    and return the trained estimator.
    """
    # set up the estimator; image_uris.retrieve replaces the deprecated get_image_uri
    knn = sagemaker.estimator.Estimator(
        image_uris.retrieve(framework="knn", region=boto3.Session().region_name),
        aws_role,
        instance_count=1,
        instance_type="ml.m5.2xlarge",
        output_path=output_path,
        sagemaker_session=sess,
    )
    knn.set_hyperparameters(**hyperparams)

    # train a model. fit_input contains the locations of the train data
    fit_input = {"train": s3_train_data}
    knn.fit(fit_input)
    return knn


hyperparams = {
    "feature_dim": train_features.shape[1],
    "k": 5,
    "sample_size": train_features.shape[0],
    "predictor_type": "classifier",
}
output_path = f"s3://{bucket}/{prefix}/default_example/output"
knn_estimator = trained_estimator_from_hyperparams(
    s3_train_data, hyperparams, output_path
)

INFO:sagemaker:Creating training-job with name: knn-2023-03-24-14-18-29-762


2023-03-24 14:18:30 Starting - Starting the training job...
2023-03-24 14:18:44 Starting - Preparing the instances for training...
2023-03-24 14:19:30 Downloading - Downloading input data...
2023-03-24 14:19:49 Training - Downloading the training image...............
2023-03-24 14:22:30 Training - Training image download completed. Training in progress...Docker entrypoint called with argument(s): train
Running default environment configuration script
[03/24/2023 14:22:54 INFO 139628795332416] Reading default configuration from /opt/amazon/lib/python3.7/site-packages/algorithm/resources/default-conf.json: {'_kvstore': 'dist_async', '_log_level': 'info', '_num_gpus': 'auto', '_num_kv_servers': '1', '_tuning_objective_metric': '', '_faiss_index_nprobe': '5', 'epochs': '1', 'feature_dim': 'auto', 'faiss_index_ivf_nlists': 'auto', 'index_metric': 'L2', 'index_type': 'faiss.Flat', 'mini_batch_size': '5000', '_enable_profiler': 'false'}
[03/24/2023 14:22:54 INFO 139628795332416] Merging with 

In [20]:
from sagemaker.serializers import CSVSerializer
from sagemaker.deserializers import JSONDeserializer

def predictor_from_estimator(knn_estimator, instance_type, endpoint_name=None):
    knn_predictor = knn_estimator.deploy(
        initial_instance_count=1, instance_type=instance_type, endpoint_name=endpoint_name
    )
    knn_predictor.serializer = CSVSerializer()
    knn_predictor.deserializer = JSONDeserializer()
    return knn_predictor


instance_type = "ml.m4.xlarge"
endpoint_name = name_from_base("jumpstart-example-knn")

knn_predictor = predictor_from_estimator(
    knn_estimator, instance_type, endpoint_name=endpoint_name
)

INFO:sagemaker:Creating model with name: knn-2023-03-24-14-23-35-012
INFO:sagemaker:Creating endpoint-config with name jumpstart-example-knn-2023-03-24-14-23-35-012
INFO:sagemaker:Creating endpoint with name jumpstart-example-knn-2023-03-24-14-23-35-012


-----

### 4. Doing the inference with context

#### 4.1 Getting the Context For the Question

In [None]:
MAX_SECTION_LEN = 500
SEPARATOR = "\n* "

def construct_context(contexts) -> str:
    chosen_sections = []
    chosen_sections_len = 0
     
    for document_section in contexts:
        # Add contexts until we run out of space.        
        
        chosen_sections_len += len(document_section) + 2 
        if chosen_sections_len > MAX_SECTION_LEN:
            break
            
        chosen_sections.append(SEPARATOR + document_section.replace("\n", " "))
    # Useful diagnostic information
    print(f"Selected {len(chosen_sections)} document sections:")    
        
    return "".join(chosen_sections)

In [None]:
query_response = query_endpoint_with_json_payload(question, endpoint_name_embed, content_type="application/x-text")
question_embedding = parse_response_text_embed(query_response)[0][0]

# Get the labels (row indices) of the nearest answer embeddings from the kNN endpoint
context_predictions = knn_predictor.predict(np.array(question_embedding), initial_args={"ContentType": "text/csv","Accept": "application/json; verbose=true"})['predictions'][0]["labels"]

# Map the labels back to the answer text; construct_context expects strings, not labels
context_sections = [df_answers["Answer"].iloc[int(label)] for label in context_predictions]
context_embed_retrieve = construct_context(context_sections)

In [None]:
query_response = query_endpoint_with_json_payload(json.dumps(payload).encode('utf-8'), endpoint_name=endpoint_name_flan_t5)

generated_texts = parse_response_model_flan_t5(query_response)
print(generated_texts)

In [None]:
prompts_flan_t5 = """Answer based on context:\n\n{context}\n\n{question}"""

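# Apply both substitutions to the same string; replacing on the template twice would drop the context
input_flan_t5 = prompts_flan_t5.replace("{context}", context_embed_retrieve)
input_flan_t5 = input_flan_t5.replace("{question}", question)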

payload = {
    "text_inputs": input_flan_t5, 
    "max_length":500, 
    "num_return_sequences":1, 
    "top_k":50, 
    "top_p":0.95, 
    "do_sample":True
}

query_response = query_endpoint_with_json_payload(json.dumps(payload).encode('utf-8'), endpoint_name=endpoint_name_flan_t5)

generated_texts = parse_response_model_flan_t5(query_response)
print(generated_texts)
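
The prompt assembly in the last cell can also be factored into a small helper, which makes it easy to check that both placeholders are filled before the payload is sent. This is a sketch only: `build_prompt` is a hypothetical name, and the context string below is placeholder text rather than real retrieved output.

```python
PROMPT_TEMPLATE = "Answer based on context:\n\n{context}\n\n{question}"


def build_prompt(context, question, template=PROMPT_TEMPLATE):
    """Fill the context and question placeholders of the template."""
    # Chain both replacements on the same string so neither substitution is lost.
    return template.replace("{context}", context).replace("{question}", question)


example_question = "Which instances can I use with Managed Spot Training?"
example_context = "\n* Placeholder text standing in for a retrieved FAQ answer."

example_prompt = build_prompt(example_context, example_question)
print(example_prompt)
```

The returned string can be used as the `text_inputs` value of the payload, in place of `input_flan_t5`.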