## Multilingual chatbot using the Multilingual E5 embeddings model and the Meta Llama 2 7B Chat LLM in SageMaker JumpStart


---

#### This notebook has been tested in us-east-1. Use the Data Science 3.0 image.


---

In this notebook we demonstrate how to use [**Llama-2-7b Chat**](https://ai.meta.com/llama/) to answer questions using a library of documents in 3 different languages as a reference, by using document embeddings and retrieval. Unlike other RAG solutions, the document embeddings are generated by and stored with the embedding model, [Multilingual E5 Large](https://huggingface.co/intfloat/multilingual-e5-large), so the nearest neighbors are identified from a single endpoint in this solution. The source documents are in 3 languages: English, Spanish and Italian. The documents are Amazon SageMaker FAQs sourced from [Amazon SageMaker FAQ](https://aws.amazon.com/sagemaker/faqs/?nc1=h_ls).


To perform inference on the [Llama models](https://ai.meta.com/llama/), you need to pass custom_attributes='accept_eula=true' as part of the header. This means you have read and accept the end-user license agreement (EULA) of the model. The EULA can be found in the model card description or on this [webpage](https://ai.meta.com/resources/models-and-libraries/llama-downloads/).

Note: Custom_attributes used to pass EULA are key/value pairs. The key and value are separated by '=' and pairs are separated by ';'. If the user passes the same key more than once, the last value is kept and passed to the script handler (i.e., in this case, used for conditional logic). For example, if 'accept_eula=false; accept_eula=true' is passed to the server, then 'accept_eula=true' is kept and passed to the script handler.
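
The "last value wins" rule described in the note can be illustrated with a small parser. This is only a sketch of the stated behavior, not the endpoint's actual implementation; the function name `parse_custom_attributes` is hypothetical.

```python
def parse_custom_attributes(raw: str) -> dict:
    """Parse 'k1=v1; k2=v2' into a dict; a repeated key keeps its last value."""
    attributes = {}
    for pair in raw.split(";"):
        pair = pair.strip()
        if not pair:
            continue  # skip empty segments such as a trailing ';'
        key, _, value = pair.partition("=")
        # Later assignments overwrite earlier ones, so the last value is kept.
        attributes[key.strip()] = value.strip()
    return attributes


print(parse_custom_attributes("accept_eula=false; accept_eula=true"))
# {'accept_eula': 'true'}
```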

Other Retrieval Augmented Generation solutions:
- [Question Answering using LangChain and Cohere's Generate and Embedding Models from SageMaker JumpStart](https://github.com/aws/amazon-sagemaker-examples/blob/main/introduction_to_amazon_algorithms/jumpstart-foundation-models/question_answering_retrieval_augmented_generation/question_answering_Cohere%2Blangchain_jumpstart.ipynb)
- [Question Answering based on Custom Dataset](https://github.com/aws/amazon-sagemaker-examples/blob/main/introduction_to_amazon_algorithms/jumpstart-foundation-models/question_answering_retrieval_augmented_generation/question_answering_jumpstart_knn.ipynb)
- [Question Answering based on Custom Dataset with Open-sourced LangChain Library](https://github.com/aws/amazon-sagemaker-examples/blob/main/introduction_to_amazon_algorithms/jumpstart-foundation-models/question_answering_retrieval_augmented_generation/question_answering_langchain_jumpstart.ipynb)
- [Question Answering using LLama-2, Pinecone & Custom Dataset](https://github.com/aws/amazon-sagemaker-examples/blob/main/introduction_to_amazon_algorithms/jumpstart-foundation-models/question_answering_retrieval_augmented_generation/question_answering_pinecone_llama-2_jumpstart.ipynb)


## Step 1. Deploy Llama-2 7 Billion Chat Model in SageMaker JumpStart

In [None]:
#Run this cell if you are running the code in your local IDE
!pip install -qU \
    sagemaker \
    pinecone-client==2.2.1 \
    ipywidgets==7.0.0

To begin, we will initialize all of the SageMaker session variables we'll need to use throughout the walkthrough.

In [3]:
import sagemaker
from sagemaker.jumpstart.model import JumpStartModel

role = sagemaker.get_execution_role()

my_model = JumpStartModel(model_id="meta-textgeneration-llama-2-7b-f")

sagemaker.config INFO - Not applying SDK defaults from location: /etc/xdg/sagemaker/config.yaml
sagemaker.config INFO - Not applying SDK defaults from location: /root/.config/sagemaker/config.yaml


We will use an `ml.g5.4xlarge` instance to deploy our Llama-2 7 billion parameter model. We can find pricing for all instances [here](https://aws.amazon.com/sagemaker/pricing/).

In [4]:
predictor_llm = my_model.deploy(initial_instance_count=1, instance_type="ml.g5.4xlarge")

----------------------!

#### To understand why a retrieval-augmented generation (RAG) approach is needed for question answering, please refer to [question_answering_pinecone_llama-2_jumpstart.ipynb](https://github.com/aws/amazon-sagemaker-examples/blob/main/introduction_to_amazon_algorithms/jumpstart-foundation-models/question_answering_retrieval_augmented_generation/question_answering_pinecone_llama-2_jumpstart.ipynb)

## Step 2. Use text embeddings to retrieve the relevant context from the documents, in the language of the question, and use it along with the prompt and question to query the LLM


We use document embeddings to fetch the most relevant documents in our document knowledge library and combine them with the prompt that we provide to the LLM. The main difference here is that the documents and their corresponding embeddings are in English, Spanish and Italian.


To achieve this, we will do the following:

* Run a text embedding model training job. The training job generates embeddings for the provided dataset and saves them along with the model. These embeddings are used during inference to find the nearest neighbors of an input sentence, ranked by the cosine similarity between the input sentence embedding and the sentence embeddings computed during the training job. For more information, please refer to [text-embedding-sentence-similarity.ipynb](https://github.com/aws/amazon-sagemaker-examples/blob/main/introduction_to_amazon_algorithms/jumpstart-foundation-models/question_answering_retrieval_augmented_generation/text-embedding-sentence-similarity.ipynb)
* Query the text embedding model endpoint created above to identify the top K most relevant documents for the user query.
* Combine the retrieved documents with the prompt and question and send them to the LLM.
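
The retrieval step in the list above ranks stored sentence embeddings by cosine similarity to the query embedding. The following is a minimal sketch of that ranking using hand-written 3-dimensional toy vectors; the endpoint performs the real computation on the E5 embeddings, and the names `cosine_similarity`, `top_k_neighbors` and `toy_corpus` are hypothetical.

```python
import math
from typing import Dict, List, Tuple


def cosine_similarity(a: List[float], b: List[float]) -> float:
    """Cosine of the angle between two non-zero vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_k_neighbors(
    query: List[float], corpus: Dict[str, List[float]], k: int
) -> List[Tuple[str, float]]:
    """Return the k corpus entries most similar to the query, best first."""
    scored = [(doc_id, cosine_similarity(query, vec)) for doc_id, vec in corpus.items()]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:k]


# Toy embeddings (assumed values, for illustration only).
toy_corpus = {
    "doc_en": [1.0, 0.0, 0.0],
    "doc_es": [0.9, 0.1, 0.0],
    "doc_it": [0.0, 1.0, 0.0],
}
neighbors = top_k_neighbors([1.0, 0.0, 0.0], toy_corpus, k=2)
print([doc_id for doc_id, _ in neighbors])
```

Because the similarity depends only on the direction of the vectors, documents in different languages with similar meaning can rank close together when the embedding model maps them to nearby directions.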

Note: Unlike the other RAG solutions, we save the dataset with the model here, so that the endpoint itself can return the most similar documents.


Note: The retrieved document/text should be large enough to contain enough information to answer a question; but small enough to fit into the LLM prompt -- maximum sequence length of 1024 tokens. 

### To train and host on Amazon SageMaker, we need to set up and authenticate the use of AWS services. Here, we use the execution role associated with the current notebook instance as the AWS account role with SageMaker access. It has the necessary permissions, including access to your data in S3.

In [5]:
import sagemaker, boto3, json
from sagemaker.session import Session

sagemaker_session = Session()
aws_role = sagemaker_session.get_caller_identity_arn()
aws_region = boto3.Session().region_name
sess = sagemaker.Session()

sagemaker.config INFO - Not applying SDK defaults from location: /etc/xdg/sagemaker/config.yaml
sagemaker.config INFO - Not applying SDK defaults from location: /root/.config/sagemaker/config.yaml


#### We are using the **huggingface-sentencesimilarity-multilingual-e5-large** model to get embeddings. Multilingual-E5-large model is initialized from xlm-roberta-large and continually trained on a mixture of multilingual datasets. It supports 100 languages from xlm-roberta, but low-resource languages may see performance degradation.

In [6]:
model_id = "huggingface-sentencesimilarity-multilingual-e5-large"

### 2.1. Preparing Dataset

In [7]:
# In this section, we'll be fetching and prepping the Amazon_SageMaker_FAQs dataset to utilize it in finding the nearest neighbour to an input question. 
# We will input FAQs in English, Spanish and Italian and create a single data file which will be used for creating embeddings

import pandas as pd


#!aws s3 cp s3://jumpstart-cache-prod-us-west-2/training-datasets/Amazon_SageMaker_FAQs/Amazon_SageMaker_FAQs.csv Amazon_SageMaker_FAQs.csv

#Reading FAQs in English, Spanish and Italian
#English FAQs
df_faq_en = pd.read_csv("Amazon_SageMaker_FAQs.csv", header=None)
#Spanish FAQs
df_faq_es = pd.read_csv("Amazon_SageMaker_ES_FAQ.csv", header=None)
#Italian FAQs
df_faq_it = pd.read_csv("Amazon_SageMaker_IT_FAQ.csv", header=None)
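#Create Single FAQ file of all languages
#header=False avoids writing the integer column labels ("0,1") as a first row, which would otherwise be read back as a data row below
pd.concat([df_faq_en, df_faq_es, df_faq_it]).to_csv('Amazon_SageMaker_FAQ_Multilingual.csv', index=False, header=False)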

# Preparing the Data in the required format

data = pd.read_csv("Amazon_SageMaker_FAQ_Multilingual.csv", names=["Questions", "Answers"])
data["id"] = data.index

data_req = data[["id", "Answers"]]

data_req.to_csv("data.csv", index=False, header=False)

# Uploading the data in required format to s3 Bucket
output_bucket = sess.default_bucket()
#You can modify this to your own prefix
output_prefix = "jumpstart-example-multilingual-training"

s3_output_location = f"s3://{output_bucket}/{output_prefix}/output"
training_dataset_s3_path = f"s3://{output_bucket}/{output_prefix}/data/data.csv"

!aws s3 cp data.csv {training_dataset_s3_path}


upload: ./data.csv to s3://sagemaker-us-east-1-385888608451/jumpstart-example-multilingual-training/data/data.csv


### 2.2. Generate embeddings for the multilingual FAQs with a training job and store them in the Multilingual E5 Large embeddings model

In [8]:
from sagemaker import hyperparameters
from sagemaker.jumpstart.estimator import JumpStartEstimator


# Retrieve the default hyperparameters for the model.
# A separate variable name keeps the imported `hyperparameters` module usable if this cell is re-run.
model_hyperparameters = hyperparameters.retrieve_default(model_id=model_id, model_version="*")

# [Optional] Override default hyperparameters with custom values
# store_text_with_embedding is "True" by default
# max_seq_length parameter is the max sequence length of the input to process by the embedding model. The default None value results in using the default max_seq_length for the model.
model_hyperparameters["batch_size"] = "128"
print(model_hyperparameters)

estimator = JumpStartEstimator(
    model_id=model_id, hyperparameters=model_hyperparameters, output_path=s3_output_location
)

# Launch a SageMaker Training job by passing s3 path of the data

estimator.fit({"training": f"s3://{output_bucket}/{output_prefix}/data"}, logs=True)

# Use the estimator from the previous step to deploy to a SageMaker endpoint
predictor_nn = estimator.deploy()

INFO:sagemaker:Creating training-job with name: multilingual-e5-large-2023-10-28-02-06-24-699


{'max_seq_length': 'None', 'batch_size': '128', 'store_text_with_embedding': 'True'}
2023-10-28 02:06:25 Starting - Starting the training job...
2023-10-28 02:06:42 Starting - Preparing the instances for training......
2023-10-28 02:07:54 Downloading - Downloading input data.........
2023-10-28 02:09:24 Training - Downloading the training image........................
2023-10-28 02:13:15 Training - Training image download completed. Training in progress....
bash: cannot set terminal process group (-1): Inappropriate ioctl for device
bash: no job control in this shell
2023-10-28 02:13:45,237 sagemaker-training-toolkit INFO     Imported framework sagemaker_pytorch_container.training
2023-10-28 02:13:45,252 sagemaker-training-toolkit INFO     No Neurons detected (normal if no neurons installed)
2023-10-28 02:13:45,262 sagemaker_pytorch_container.training INFO     Block until all host DNS lookups succeed.
2023-10-28 02:13:45,264 sagemaker_py

INFO:sagemaker.jumpstart:No instance type selected for inference hosting endpoint. Defaulting to ml.g5.2xlarge.
INFO:sagemaker:Creating model with name: multilingual-e5-large-2023-10-28-02-16-11-553


Training seconds: 484
Billable seconds: 484


INFO:sagemaker:Creating endpoint-config with name multilingual-e5-large-2023-10-28-02-16-11-545
INFO:sagemaker:Creating endpoint with name multilingual-e5-large-2023-10-28-02-16-11-545


---------!

### 2.3. Run inference on the deployed model to get the nearest neighbor

You can make queries to the endpoint using a JSON payload containing a batch of input texts, to find the nearest neighbors of the input text from the dataset which is provided during the training job.

* **queries:** Provide the list of inputs for which to find the closest match from the training data
* **top_k:** The number of closest matches to find from the training data
* **mode:** Supply it as "nn_train_data" for getting the nearest neighbors to input queries within the dataset provided
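
The payload fields above, together with the `return_text` flag used in the next cell, can be wrapped in two small helpers. This is a sketch only: `build_nn_payload` and `extract_texts` are hypothetical names, and `mock_response` is a hand-written stand-in that assumes the response shape the notebook relies on (one list of matches per query, each match carrying a `"text"` field).

```python
from typing import Any, Dict, List


def build_nn_payload(queries: List[str], top_k: int = 1) -> Dict[str, Any]:
    """Build the nearest-neighbor request body for a batch of queries."""
    return {
        "queries": queries,
        "top_k": top_k,
        "mode": "nn_train_data",
        "return_text": True,
    }


def extract_texts(response: List[List[Dict[str, Any]]]) -> List[List[str]]:
    """Collect the matched texts for each query, preserving the ranking order."""
    return [[match["text"] for match in matches] for matches in response]


# Hand-written stand-in for an endpoint response (not real output).
mock_response = [[{"text": "first match"}, {"text": "second match"}]]

payload = build_nn_payload(["Qué es Amazon SageMaker Autopilot?"], top_k=2)
print(payload["top_k"], extract_texts(mock_response)[0])
```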

In [9]:
from sagemaker.serializers import JSONSerializer

newline = "\n"
predictor_nn.serializer = JSONSerializer()
predictor_nn.content_type = "application/json"

payload_nearest_neighbour = {
    "queries": ["Qué es Amazon SageMaker Autopilot?"],
    "top_k": 1,
    "mode": "nn_train_data",
    "return_text": True,
}

response = predictor_nn.predict(payload_nearest_neighbour)

question = payload_nearest_neighbour["queries"][0]
answer = response[0][0]["text"]
# Relating the Input Question with the Answer
print(f"The input Question is: {question}{newline}" f"The Corresponding Answer is: {answer}")

The input Question is: Qué es Amazon SageMaker Autopilot?
The Corresponding Answer is: El piloto automático de Amazon SageMaker es la primera función de aprendizaje automático automatizado del sector que le brinda un control y una visibilidad totales de sus modelos de aprendizaje automático. SageMaker Autopilot inspecciona automáticamente los datos sin procesar, aplica procesadores de funciones, selecciona el mejor conjunto de algoritmos, entrena y ajusta varios modelos, realiza un seguimiento de su rendimiento y, a continuación, clasifica los modelos en función del rendimiento, todo ello con solo unos pocos clics. El resultado es el modelo con mejor rendimiento, que se puede implementar en una fracción del tiempo que normalmente se necesita para entrenar el modelo. Obtiene una visibilidad total de cómo se creó el modelo y qué contiene, y SageMaker Autopilot se integra con Amazon SageMaker Studio. Puede explorar hasta 50 modelos diferentes generados por SageMaker Autopilot dentro de Sa

### 2.4 Combine the retrieved documents, prompt, and question to query the LLM

Now we're ready to begin querying our LLM with a **R**etrieval **A**ugmented **G**eneration (RAG) pipeline. Let's see how this will work step-by-step first.

In [10]:
# Get the nearest neighbour for an input question
question = "Qué es Amazon SageMaker Autopilot?"

payload_nearest_neighbour = {
    "queries": [question],
    "top_k": 2,
    "mode": "nn_train_data",
    "return_text": True,
}

response = predictor_nn.predict(payload_nearest_neighbour)[0]

# We get multiple relevant contexts here. We can use these to construct a single `context` to feed into our LLM prompt.
contexts = [ans["text"] for ans in response]

In [11]:
max_section_len = 1000
separator = "\n"

from typing import List


def construct_context(contexts: List[str]) -> str:
    chosen_sections = []
    chosen_sections_len = 0

    for text in contexts:
        text = text.strip()
        # Add contexts until we run out of space.
        chosen_sections_len += len(text) + 2
        if chosen_sections_len > max_section_len:
            break
        chosen_sections.append(text)
    concatenated_doc = separator.join(chosen_sections)
    print(
        f"With maximum sequence length {max_section_len}, selected top {len(chosen_sections)} document sections: \n{concatenated_doc}"
    )
    return concatenated_doc

In [12]:
context_str = construct_context(contexts=contexts)

With maximum sequence length 1000, selected top 0 document sections: 
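
The output above reports 0 selected sections because `construct_context` adds a section's length to the running total before checking the budget: when the first retrieved section alone is longer than `max_section_len` characters, nothing is kept and the LLM receives an empty context. Note also that the budget counts characters, not tokens. One possible alternative, shown here only as a sketch, truncates the first section instead of dropping it; the name `construct_context_with_truncation` and its parameters are hypothetical.

```python
from typing import List


def construct_context_with_truncation(
    contexts: List[str], max_len: int = 1000, separator: str = "\n"
) -> str:
    """Join sections within a character budget; cut the first one if it is too long."""
    chosen: List[str] = []
    used = 0
    for text in contexts:
        text = text.strip()
        remaining = max_len - used
        if remaining <= 0:
            break
        if len(text) > remaining:
            # Keep a truncated copy only when nothing has been selected yet,
            # so the prompt never ends up with an empty context.
            if not chosen:
                chosen.append(text[:remaining])
            break
        chosen.append(text)
        used += len(text) + len(separator)
    return separator.join(chosen)


long_section = "x" * 1500
print(len(construct_context_with_truncation([long_section, "short"])))
```

Truncating mid-sentence can remove the part of the answer the question needs, so raising `max_section_len` or splitting long FAQ answers before the training job may be preferable.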



#### Create the payload function. We use Amazon Comprehend to identify the language of the input question (English, Spanish, or Italian) so that the prompt for Llama 2 can be crafted dynamically to ask it to respond in the same language as the question.
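
The language selection can also be expressed as a dictionary lookup with a fallback for languages outside the three supported ones. This is a sketch under assumptions: `LANGUAGE_INSTRUCTIONS`, `pick_dominant_language` and `language_instruction` are hypothetical names, the English fallback is a design choice rather than something the notebook specifies, and `mock_languages` is a hand-written stand-in for the `Languages` list returned by `detect_dominant_language`.

```python
from typing import Dict, List

LANGUAGE_INSTRUCTIONS: Dict[str, str] = {
    "en": "Respond in English.",
    "es": "Responder en español.",
    "it": "Rispondi in italiano.",
}


def pick_dominant_language(languages: List[dict]) -> str:
    """Return the LanguageCode with the highest Score."""
    return max(languages, key=lambda item: item["Score"])["LanguageCode"]


def language_instruction(lang_code: str, default: str = "en") -> str:
    """Map a language code to its instruction, falling back to the default language."""
    return LANGUAGE_INSTRUCTIONS.get(lang_code, LANGUAGE_INSTRUCTIONS[default])


# Hand-written stand-in for a Comprehend result (not real output).
mock_languages = [
    {"LanguageCode": "pt", "Score": 0.10},
    {"LanguageCode": "es", "Score": 0.88},
]
print(language_instruction(pick_dominant_language(mock_languages)))
```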

In [13]:
def create_payload(question, context_str) -> dict:
    
    #Use Amazon Comprehend to detect the language of the question
    session = boto3.Session()
    comprehend_client = session.client(service_name="comprehend")
    response = comprehend_client.detect_dominant_language(Text=question)
    languages = response["Languages"]
    lang_code = languages[0]["LanguageCode"]
    print(lang_code)
    
    #Select which language to prompt llama2 to respond in based on the language detected by Amazon Comprehend
    if lang_code == "es":
        language = "Responder en español."
    elif lang_code == "it":
        language = "Rispondi in italiano."
    else:
        #English, or any other detected language (assumed fallback so `language` is always defined)
        language = "Respond in English."
    print(language)
    
    #Create Prompt template for llama2
    prompt_template = """You are a friendly multilingual Assistant chatbot. You can speak in English, Spanish and Italian. Answer the following QUESTION based only on the CONTEXT given. Respond using the language of the QUESTION and the CONTEXT. If you do not know the answer and the CONTEXT doesn't 
    contain the answer, truthfully say "I don't know".

    CONTEXT:
    {context}


    ANSWER:
    """
    #Frame a new question that prepends the language instruction to the question
    new_question = language + question
    print(new_question)
    #The template has no {question} placeholder; the question is sent as the user message below
    text_input = prompt_template.replace("{context}", context_str)
    
    #Create payload for llama2 7b chat model.
    payload = {
        "inputs": [
            [
                {"role": "system", "content": text_input},
                {"role": "user", "content": new_question},
            ]
        ],
        "parameters": {
            "max_new_tokens": 1024,
            "top_p": 0.9,
            "temperature": 0.1,
            "return_full_text": False,
        },
    }
    return payload

In [14]:
payload = create_payload(question, context_str)
out = predictor_llm.predict(payload, custom_attributes="accept_eula=true")
generated_text = out[0]["generation"]["content"]
print(f"[Input]: {question}\n[Output]: {generated_text}")

es
Responder en español.
Responder en español.Qué es Amazon SageMaker Autopilot?
[Input]: Qué es Amazon SageMaker Autopilot?
[Output]:  ¡Hola! Amazon SageMaker Autopilot es una herramienta de aprendizaje automático de Amazon Web Services (AWS) que permite a los usuarios crear, entrenar y deployear modelos de aprendizaje automático de manera automática y sencilla.

Con Amazon SageMaker Autopilot, los usuarios pueden utilizar un conjunto de herramientas y recursos para crear modelos de aprendizaje automático de alta calidad sin tener que preocuparse por la configuración técnica detallada. La herramienta utiliza un conjunto de algoritmos de aprendizaje automático de alta calidad y técnicas de aprendizaje automático avanzadas para entrenar y mejorar los modelos.

Amazon SageMaker Autopilot es especialmente útil para los usuarios que no tienen experiencia previa en aprendizaje automático o que necesitan crear modelos de aprendizaje automático de manera rápida y sencilla para aplicaciones de

### 2.5 Let's place all of this logic into a single RAG query function:

In [15]:
def rag_query(question: str) -> str:
    # Get nearest neighbor
    payload_nearest_neighbour = {
        "queries": [question],
        "top_k": 3,
        "mode": "nn_train_data",
        "return_text": True,
    }
    response = predictor_nn.predict(payload_nearest_neighbour)[0]
    # get contexts
    contexts = [ans["text"] for ans in response]
    # build the multiple contexts string
    context_str = construct_context(contexts=contexts)
    # create our retrieval augmented prompt
    payload = create_payload(question, context_str)
    # make prediction
    out = predictor_llm.predict(payload, custom_attributes="accept_eula=true")
    final_text = out[0]["generation"]["content"]
    return final_text

### You can now ask questions in different languages in the cells below. You can tweak the parameters of the RAG query in the cells above and change the questions below to see the responses.

### We can now ask a question about SageMaker features in English: [Amazon SageMaker FAQ English](https://aws.amazon.com/sagemaker/faqs/?nc1=h_ls)

In [17]:
output_en = rag_query("Which open-source models are supported with Amazon SageMaker JumpStart?")
print(output_en)

With maximum sequence length 1000, selected top 2 document sections: 
Amazon SageMaker JumpStart includes 150+ pre-trained open-source models from PyTorch Hub and TensorFlow Hub. For vision tasks such as image classification and object detection, you can use models such as ResNet, MobileNet, and Single-Shot Detector (SSD). For text tasks such as sentence classification, text classification, and question answering, you can use models such as BERT, RoBERTa, and DistilBERT.
Amazon SageMaker JumpStart helps you quickly and easily get started with ML. SageMaker JumpStart provides a set of solutions for the most common use cases that can be deployed readily with just a few clicks. The solutions are fully customizable and showcase the use of AWS CloudFormation templates and reference architectures so you can accelerate your ML journey. SageMaker JumpStart also supports one-click deployment and fine-tuning of more than 150 popular open-source models such as transformer, object detection, and i

### Ask the same question in Italian. [Amazon SageMaker FAQ Italian](https://aws.amazon.com/it/sagemaker/faqs/?nc1=h_ls)

In [18]:
output_it = rag_query("Quali modelli open source sono supportati da SageMaker JumpStart?")
print(output_it)

With maximum sequence length 1000, selected top 1 document sections: 
Amazon SageMaker JumpStart include oltre 150 modelli open source preformati di PyTorch Hub e TensorFlow Hub. Per attività di visione come la classificazione delle immagini e il rilevamento di oggetti, puoi utilizzare modelli come ResNet, MobileNet e Single-Shot Detector (SSD). Per attività di testo come la classificazione delle frasi, la classificazione del testo e la risposta alle domande, è possibile utilizzare modelli come BERT, Roberta e DiStilbert.
it
Rispondi in italiano.
Rispondi in italiano.Quali modelli open source sono supportati da SageMaker JumpStart?
 Buona sera! In base al contesto fornito, SageMaker JumpStart supporta oltre 150 modelli open source di PyTorch Hub e TensorFlow Hub. Questi modelli includono:

* Per attività di visione:
	+ ResNet
	+ MobileNet
	+ Single-Shot Detector (SSD)
* Per attività di testo:
	+ BERT
	+ Roberta
	+ DiStilbert

Spero che questa risposta ti sia utile! Se hai altre domande

### Translate the response to English to see if it matches the English output

In [19]:
#Use Amazon Translate to translate the response back to english
translate = boto3.client(service_name='translate', region_name= aws_region, use_ssl=True)

result = translate.translate_text(Text=output_it, 
            SourceLanguageCode="it", TargetLanguageCode="en")
print('TranslatedText: ' + result.get('TranslatedText'))
print('SourceLanguageCode: ' + result.get('SourceLanguageCode'))
print('TargetLanguageCode: ' + result.get('TargetLanguageCode'))

TranslatedText:  Good evening! Based on the context provided, SageMaker JumpStart supports more than 150 open source PyTorch Hub and TensorFlow Hub models. These models include:

* For viewing activities:
 + ResNet
 + MobileNet
 + Single-Shot Detector (SSD)
* For text activities:
 + BERT
 + Roberta
 + by Stilbert

Hope this answer is helpful to you! If you have any other questions, don't hesitate to ask.
SourceLanguageCode: it
TargetLanguageCode: en


### Ask the same question in Spanish. [Amazon SageMaker FAQ Spanish](https://aws.amazon.com/es/sagemaker/faqs/?nc1=h_ls)

In [20]:
output_es = rag_query("¿Qué modelos de código abierto son compatibles con Amazon SageMaker JumpStart?")
print(output_es)

With maximum sequence length 1000, selected top 1 document sections: 
Amazon SageMaker JumpStart incluye más de 150 modelos de código abierto previamente entrenados de PyTorch Hub y TensorFlow Hub. Para tareas de visión, como la clasificación de imágenes y la detección de objetos, puede utilizar modelos como ResNet, MobileNet y Single-Shot Detector (SSD). Para tareas de texto, como la clasificación de oraciones, la clasificación de textos y la respuesta a preguntas, puede utilizar modelos como BERT, RoberTA y DistiLbert.
es
Responder en español.
Responder en español.¿Qué modelos de código abierto son compatibles con Amazon SageMaker JumpStart?
 ¡Hola! Según el contexto, Amazon SageMaker JumpStart incluye más de 150 modelos de código abierto previamente entrenados de PyTorch Hub y TensorFlow Hub. Estos modelos son compatibles con tareas de visión, como la clasificación de imágenes y la detección de objetos, y tareas de texto, como la clasificación de oraciones, la clasificación de texto

### Translate the response into English to compare it with the English response

In [21]:
#Use Amazon Translate to translate the answer to English to check accuracy.
translate = boto3.client(service_name='translate', region_name= aws_region, use_ssl=True)

result = translate.translate_text(Text=output_es, 
            SourceLanguageCode="es", TargetLanguageCode="en")
print('TranslatedText: ' + result.get('TranslatedText'))
print('SourceLanguageCode: ' + result.get('SourceLanguageCode'))
print('TargetLanguageCode: ' + result.get('TargetLanguageCode'))

TranslatedText:  Hello! Depending on the context, Amazon SageMaker JumpStart includes more than 150 pre-trained open source models from PyTorch Hub and TensorFlow Hub. These models are compatible with vision tasks, such as classifying images and detecting objects, and text tasks, such as classifying sentences, classifying texts and answering questions. Some of the models mentioned in the context are:

* ResNet
* MobileNet
* Single-Shot Detector (SSD)
* BERT
* Roberta
* by Stilbert

I hope this information is useful. If you need anything else, don't hesitate to ask!
SourceLanguageCode: es
TargetLanguageCode: en


#### Check how closely the answers match for the same question. Accuracy and performance may be improved by fine-tuning the Llama 2 model on specific languages for specific tasks.
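
One rough way to quantify how closely two English answers match is the Jaccard overlap of their word sets. This is a sketch of one possible metric, not something the notebook computes; the function names and the example sentences are made up for illustration, and word overlap ignores paraphrasing, so treat the score as a coarse signal only.

```python
import re


def token_set(text: str) -> set:
    """Lowercase the text and return its set of alphanumeric (ASCII) tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def jaccard_similarity(a: str, b: str) -> float:
    """Size of the shared vocabulary divided by the size of the combined vocabulary."""
    tokens_a, tokens_b = token_set(a), token_set(b)
    if not tokens_a and not tokens_b:
        return 1.0  # two empty texts are treated as identical
    return len(tokens_a & tokens_b) / len(tokens_a | tokens_b)


score = jaccard_similarity(
    "BERT and RoBERTa are supported",
    "Supported models: BERT, RoBERTa",
)
print(score)  # 0.5
```

In the notebook, the two arguments would be `output_en` and the `TranslatedText` value returned by Amazon Translate.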

## Step 3: Clean up the endpoints

In [None]:
predictor_llm.delete_model()
predictor_llm.delete_endpoint()
predictor_nn.delete_model()
predictor_nn.delete_endpoint()

#### Notebook CI Test Results

This notebook was tested in us-east-1. 


