# Workshop: Text2Text Generation with SageMaker

Welcome to this workshop on Text2Text Generation with SageMaker. In this workshop, we will be using a pre-trained model deployed on a SageMaker endpoint to perform text-to-text generation tasks.
The workshop is divided into several sections:

1. **Setting up the environment:** In this section, we will import necessary libraries and define some helper functions.
2. **Querying the endpoint:** We will define some example input texts and use them to query the SageMaker endpoint.
3. **Advanced features:** We will explore some advanced features of the model, such as controlling the length of the generated text and the number of output sequences returned.
4. **Prompt Engineering:** We will explore some prompt engineering tactics.
5. **RAG with FAISS:** We will use the langchain library to create a question answering chain and perform similarity searches on a set of documents.
6. **Cleaning up:** Finally, we will shut down the SageMaker endpoint to avoid incurring unnecessary costs.

Let's get started!

## Section 1: Introduction

In this section, we will import the necessary libraries and define some helper functions that we will use throughout the workshop.

We will be using the `json` and `boto3` libraries. The `json` library provides functions for working with JSON data, and the `boto3` library allows us to interact with AWS services, including SageMaker.

Let's start by importing these libraries.

In [2]:
import json
import boto3

Next, we will define some example input texts. These are the texts that we will use to query the SageMaker endpoint. The model will take each text as input and return the generated output for the task it describes.

In [3]:
text1 = "Translate to German:  My name is Arthur"
text2 = "A step by step recipe to make bolognese pasta:"

Now, let's define the endpoint that you have created. We will use this endpoint to query the model and get the generated text. We will also define some formatting variables for better output visualization.

The `endpoint_name` variable should be set to the name of the SageMaker endpoint that you have created. The `newline`, `bold`, and `unbold` variables are used to format the output text for better readability.

In [4]:
newline, bold, unbold = '\n', '\033[1m', '\033[0m'
endpoint_name = 'jumpstart-dft-hf-text2text-flan-t5-xxl'
embedding_endpoint_name = 'jumpstart-dft-hf-textembedding-gpt-j-6b-fp16'

Next, we will define a function to query the endpoint. This function will take the encoded text as input and return the response from the endpoint.

The `query_endpoint` function uses the `boto3` library to create a SageMaker runtime client. It then uses this client to invoke the SageMaker endpoint with the encoded text as input. The function returns the response from the endpoint.

In [5]:
def query_endpoint(encoded_text):
    client = boto3.client('runtime.sagemaker')
    response = client.invoke_endpoint(EndpointName=endpoint_name, ContentType='application/x-text', Body=encoded_text)
    return response

We will also define a function to parse the response from the endpoint. This function will extract the generated text from the response.

In [6]:
def parse_response(query_response):
    model_predictions = json.loads(query_response['Body'].read())
    generated_text = model_predictions['generated_text']
    return generated_text
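
The parsing step can be checked locally without calling the endpoint. The following is a minimal sketch, assuming the endpoint returns a JSON body with a `generated_text` field as described above; `fake_response` is a hypothetical helper that only imitates the `Body` stream found in the dictionary returned by `invoke_endpoint`.

```python
import io
import json


def parse_response(query_response):
    # Same logic as the workshop helper: read the body stream, then decode the JSON.
    model_predictions = json.loads(query_response['Body'].read())
    return model_predictions['generated_text']


def fake_response(generated_text):
    # Hypothetical stand-in for an invoke_endpoint result:
    # 'Body' is a file-like object holding JSON-encoded bytes.
    body = json.dumps({'generated_text': generated_text}).encode('utf-8')
    return {'Body': io.BytesIO(body)}


print(parse_response(fake_response('Ich bin Arthur.')))
```

Because the body is a stream, it can be read only once, so each response should be parsed a single time.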

Now, let's use these functions to query the endpoint with our example texts and print the generated text.

In [7]:
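def get_completion(prompt):
    # Send the prompt to the endpoint and print the generated text.
    query_response = query_endpoint(prompt.encode('utf-8'))
    generated_text = parse_response(query_response)
    # Print the prompt that was passed in, not the global loop variable `text`.
    print(f"Inference:{newline}"
          f"input text: {prompt}{newline}"
          f"generated text: {bold}{generated_text}{unbold}{newline}")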

In [8]:
for text in [text1, text2]:
    get_completion(text)

Inference:
input text: Translate to German:  My name is Arthur
generated text: [1mIch bin Arthur.[0m

Inference:
input text: A step by step recipe to make bolognese pasta:
generated text: [1mAdd the ground beef to a large skillet and cook over medium heat until browned, about[0m



## Section 2: Advanced Features

The model we are using supports many advanced parameters that can be used to control the text generation process. These parameters include:

- **max_length:** This parameter controls the maximum length of the generated text. The model will generate text until the output reaches `max_length` tokens. For an encoder-decoder model such as FLAN-T5, this limit applies to the generated output only, not to the input prompt.
- **num_return_sequences:** This parameter controls the number of output sequences returned by the model.
- **num_beams:** This parameter controls the number of beams used in beam search during text generation.
- **no_repeat_ngram_size:** This parameter ensures that no n-gram (sequence of words) of length `no_repeat_ngram_size` is repeated in the output sequence.
- **temperature:** This parameter controls the randomness of the output. A higher temperature makes low-probability words more likely to be sampled; a lower temperature concentrates the output on high-probability words.
- **early_stopping:** If set to True, text generation is finished when all beam hypotheses reach the end of sentence token.
- **do_sample:** If set to True, the model will sample the next word as per the likelihood.
- **top_k:** In each step of text generation, the model will sample from only the `top_k` most likely words.
- **top_p:** In each step of text generation, the model will sample from the smallest possible set of words with cumulative probability `top_p`.
- **seed:** This parameter can be used to fix the randomized state for reproducibility.
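
To make the sampling parameters more concrete, here is a simplified sketch of how `temperature`, `top_k`, and `top_p` reshape a next-word distribution. It is an illustration only, not the model's actual implementation: the function names and the toy numbers are made up, and real libraries apply these filters to logits over the full vocabulary.

```python
import math


def softmax(logits, temperature=1.0):
    # Dividing the logits by the temperature sharpens (T < 1) or
    # flattens (T > 1) the resulting probability distribution.
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def filter_candidates(probs, top_k=None, top_p=None):
    # Rank token indices from most to least likely.
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    if top_k is not None:
        # Keep only the top_k most likely tokens.
        ranked = ranked[:top_k]
    if top_p is not None:
        # Keep the smallest prefix whose cumulative probability reaches top_p.
        kept, cumulative = [], 0.0
        for i in ranked:
            kept.append(i)
            cumulative += probs[i]
            if cumulative >= top_p:
                break
        ranked = kept
    # Renormalize so the surviving candidates sum to 1 before sampling.
    total = sum(probs[i] for i in ranked)
    return {i: probs[i] / total for i in ranked}


toy_logits = [2.0, 1.0, 0.5, 0.1]      # made-up scores for four candidate words
toy_probs = [0.5, 0.3, 0.15, 0.05]     # made-up probabilities for four candidate words

print(softmax(toy_logits, temperature=0.5))   # more peaked
print(softmax(toy_logits, temperature=2.0))   # flatter
print(filter_candidates(toy_probs, top_k=2))  # keeps words 0 and 1
print(filter_candidates(toy_probs, top_p=0.9))  # keeps words 0, 1 and 2
```

With `do_sample` set to True, the next word would then be drawn at random from the filtered distribution.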

We can specify any subset of these parameters when invoking the endpoint. The next cell shows an example payload that uses several of them.

In [9]:
payload = {"text_inputs":"Tell me the steps to make a pizza", "max_length":50, "num_return_sequences":3, "top_k":50, "top_p":0.95, "do_sample":True}

We will now define a function to query the endpoint with a JSON payload. This function will take the encoded JSON as input and return the response from the endpoint.

The `query_endpoint_with_json_payload` function is similar to the `query_endpoint` function we defined earlier. The difference is that this function takes a JSON payload as input instead of a text. This allows us to pass the advanced parameters to the endpoint.

In [10]:
def query_endpoint_with_json_payload(encoded_json):
    client = boto3.client('runtime.sagemaker')
    response = client.invoke_endpoint(EndpointName=endpoint_name, ContentType='application/json', Body=encoded_json)
    return response

We will also define a function to parse the response from the endpoint when multiple texts are returned. This function will extract the generated texts from the response.

The `parse_response_multiple_texts` function is similar to the `parse_response` function we defined earlier. The difference is that this function extracts the 'generated_texts' field from the JSON instead of the 'generated_text' field. This is because when we request multiple texts from the endpoint, the response contains a 'generated_texts' field with a list of generated texts.

In [11]:
def parse_response_multiple_texts(query_response):
    model_predictions = json.loads(query_response['Body'].read())
    generated_text = model_predictions['generated_texts']
    return generated_text

Now, let's use these functions to query the endpoint with our JSON payload and print the generated texts.

In [12]:
query_response = query_endpoint_with_json_payload(json.dumps(payload).encode('utf-8'))
generated_texts = parse_response_multiple_texts(query_response)
print(generated_texts)

['To make a pizza, you need a pizza stone, pizza dough, pizza sauce, cheese, and toppings. To make a pizza, you need a pizza stone, pizza dough, pizza sauce, cheese, and toppings', 'To make a pizza, you need a pizza pan, pizza sauce, cheese, and toppings.', '1 tbsp olive oil 2 tbsp pizza sauce 3 tbsp grated mozzarella cheese 4 tbsp grated Parmesan cheese 5 tbsp grated Romano cheese']


In [13]:
def my_query_endpoint(query):
    payload = {
        "text_inputs": query,
        "max_length": 5000,
        "num_return_sequences": 1,
        "top_k": 250,
        "top_p": 0.95,
        "do_sample": True,
        "temperature": 0.01
    }
    client = boto3.client('runtime.sagemaker')
    response = client.invoke_endpoint(EndpointName=endpoint_name, ContentType='application/json', Body=json.dumps(payload).encode('utf-8'))
    return response


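# Redefine get_completion so that it sends the JSON payload above and
# returns the list of generated texts instead of printing them.
def get_completion(query):
    return parse_response_multiple_texts(
        my_query_endpoint(query)
    )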

## Section 3: Prompt Engineering

### Prompting Principles

1. Write clear and specific instructions.
2. Give the model time to “think”.

### Tactics for 'Write clear and specific instructions'.

#### Tactic 1: Use delimiters to clearly indicate distinct parts of the input

In [14]:
text = f"""
You should express what you want a model to do by \
providing instructions that are as clear and \
specific as you can possibly make them. \
This will guide the model towards the desired output, \
and reduce the chances of receiving irrelevant \
or incorrect responses. Don't confuse writing a \
clear prompt with writing a short prompt. \
In many cases, longer prompts provide more clarity \
and context for the model, which can lead to \
more detailed and relevant outputs.
"""
delimiter_prompt = f"Summarize the text delimited by triple backticks into a single sentence.\n```{text}```"

get_completion(delimiter_prompt)

['Write clear prompts.']
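
The delimiter tactic can be wrapped in a small helper. This is a sketch only: `build_delimited_prompt` is a hypothetical name, and rejecting text that already contains the delimiter is one possible design choice for keeping the boundary between instruction and input unambiguous.

```python
BACKTICKS = "`" * 3  # triple backticks, built this way to keep the example readable


def build_delimited_prompt(instruction, text, delimiter=BACKTICKS):
    # If the text already contains the delimiter, the model could no longer
    # tell where the input ends, so ask the caller to pick another delimiter.
    if delimiter in text:
        raise ValueError("text contains the delimiter; choose a different one")
    return f"{instruction}\n{delimiter}{text}{delimiter}"


example = build_delimited_prompt(
    "Summarize the text delimited by triple backticks into a single sentence.",
    "Clear and specific instructions guide the model towards the desired output.",
)
print(example)
```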

#### Tactic 2: Ask for a structured output

In [15]:
query = '''Generate a list of 3 fictional book information and write them in the following JSON format:

[
  {
    "book_id": 1, 
    "title": "The Chronicles of Imaginia",
    "author": "Phoebe Imaginaire", 
    "genre": "Fantasy"
  },
  {
    "book_id": 2,
    "title": "Love Among the Stars",
    "author": "Stella Cosmos",
    "genre": "Science Fiction Romance"   
  },
  {
    "book_id": 3,
    "title": "Murder at Midnight Manor",
    "author": "I.M. Mysterious",
    "genre": "Mystery"
  }
]
'''

get_completion(query)

["[['The Chronicles of Imaginia', 'Love Among the Stars', 'Murder at Midnight Manor']]"]
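
When a structured format is requested, it is worth validating the reply before using it. The sketch below uses the standard `json` module; `parse_json_or_none` is a hypothetical helper name. Applied to the output above, it shows that the model did not return valid JSON in the requested shape.

```python
import json


def parse_json_or_none(generated_text):
    # Return the parsed object, or None when the text is not valid JSON.
    try:
        return json.loads(generated_text)
    except json.JSONDecodeError:
        return None


# The reply shown above: a nested list with single quotes, which is not valid JSON.
model_output = "[['The Chronicles of Imaginia', 'Love Among the Stars', 'Murder at Midnight Manor']]"
# A hand-written example of what a valid reply could look like.
valid_output = '[{"book_id": 1, "title": "The Chronicles of Imaginia"}]'

print(parse_json_or_none(model_output))
print(parse_json_or_none(valid_output))
```

A caller could retry with a stricter prompt whenever the helper returns `None`.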

#### Tactic 3: Ask the model to check whether conditions are satisfied

In [16]:
text_1 = f"""The sun is out today."""

prompt = f'''You will be provided with text delimited by triple quotes. 
If the text includes a mention of 'rain', write a mysterious, suspenseful story \
involving strange events that takes place on a dark, stormy night. \
If the text includes a mention of 'sun', write a lighthearted, \
feel-good story.\

\"\"\"{text_1}\"\"\"
Story:
'''

get_completion(prompt)

['The sun is out today. The sun is out today. The sun is out today.']

In [None]:
text_2 = """
The rain falls constantly
"""

prompt = f'''You will be provided with text delimited by triple quotes. 
If the text includes a mention of 'rain', write a mysterious, suspenseful story \
involving strange events that takes place on a dark, stormy night. \
If the text includes a mention of 'sun', write a lighthearted, \
feel-good story.\

\"\"\"{text_2}\"\"\"
Story:
'''

get_completion(prompt)

#### Tactic 4: "Few-shot" prompting

In [49]:
prompt = f"""
Your task is to answer in a consistent style.

<child>: Teach me about patience.

<grandparent>: The river that carves the deepest \
valley flows from a modest spring; the \
grandest symphony originates from a single note; \
the most intricate tapestry begins with a solitary thread.

<child>: Teach me about resilience.
"""
get_completion(prompt)

['grandparent>: The  phoenix rises from the ashes; the  oak grows back from the twig;  the grass grows back from the plowed field.']
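
A few-shot prompt like the one above can also be assembled from a list of example pairs. The following is a small sketch; `build_few_shot_prompt` and its parameter names are hypothetical, and the layout simply mirrors the prompt used in this cell.

```python
def build_few_shot_prompt(task, examples, new_input,
                          ask="<child>", answer="<grandparent>"):
    # Start with the task description, then add one block per worked example.
    lines = [task, ""]
    for question, reply in examples:
        lines += [f"{ask}: {question}", "", f"{answer}: {reply}", ""]
    # Finish with the new question, which the model should answer in the same style.
    lines += [f"{ask}: {new_input}", ""]
    return "\n".join(lines)


few_shot = build_few_shot_prompt(
    "Your task is to answer in a consistent style.",
    [("Teach me about patience.",
      "The river that carves the deepest valley flows from a modest spring.")],
    "Teach me about resilience.",
)
print(few_shot)
```

Adding more pairs to `examples` gives the model more demonstrations of the desired style.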

### Tactics for 'Give the model time to “think”.'

#### Tactic 1: Specify the steps required to complete a task

In [48]:
text = """
In a charming village, siblings Jack and Jill set out on \
a quest to fetch water from a hilltop \
well. As they climbed, singing joyfully, misfortune \
struck—Jack tripped on a stone and tumbled \
down the hill, with Jill following suit. \
Though slightly battered, the pair returned home to \
comforting embraces. Despite the mishap, \
their adventurous spirits remained undimmed, and they \
continued exploring with delight.
"""
# example 1
prompt = f"""
Perform the following actions: 
1 - Summarize the following text delimited by triple \
backticks with 1 sentence.
2 - Translate the summary into French.
3 - List each name in the French summary.
4 - Output a json object that contains the following \
keys: french_summary, num_names.

Separate your answers with line breaks.

Text:
```{text}```
"""
print("\nCompletion for prompt 1:")
get_completion(prompt)


Completion for prompt 1:


[" Dans une ville charmante, les frères Jack et Jill se sont engagés  à obtenir de l'eau d'une source située  à la cime de la montagne.  Pendant que Jack chantait en joie, une misfortune  s'est produite : Jack a tripping sur un rocher et s'est tué  en boucle, avec Jill en suite.  Même si légèrement bruyées, la paire retournait  à la maison pour des hugs rafraichissants. Malgré le dommage,  leurs esprits aventuriers n'avaient pas été enfouis et  ils ont poursuivi leurs activités d'aventure avec élan. "]

#### Ask for output in a specified format

In [47]:
prompt_2 = f"""
Your task is to perform the following actions: 
1 - Summarize the following text delimited by 
  <> with 1 sentence.
2 - Translate the summary into French.
3 - List each name in the French summary.
4 - Output a json object that contains the 
  following keys: french_summary, num_names.

Use the following format:
Text: <text to summarize>
Summary: <summary>
Translation: <summary translation>
Names: <list of names in French summary>
Output JSON: <json with summary and num_names>

Text: <{text}>
"""
print("\nCompletion for prompt 2:")
get_completion(prompt_2)


Completion for prompt 2:


['Une recettes en étapes pour faire une pasta bolognese :>']

#### Tactic 2: Instruct the model to work out its own solution before rushing to a conclusion

In [46]:
prompt = """
Determine if the student's solution is correct or not.

Question:
I'm building a solar power installation and I need \
help working out the financials. 
- Land costs $100 / square foot
- I can buy solar panels for $250 / square foot
- I negotiated a contract for maintenance that will cost \
me a flat $100k per year, and an additional $10 / square \
foot
What is the total cost for the first year of operations 
as a function of the number of square feet.

Student's Solution:
Let x be the size of the installation in square feet.
Costs:
1. Land cost: 100x
2. Solar panel cost: 250x
3. Maintenance cost: 100,000 + 100x
Total cost: 100x + 250x + 100,000 + 100x = 450x + 100,000
"""

get_completion(prompt)

['No']

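Note that the student's solution is actually not correct: the maintenance cost should be 100,000 + 10x rather than 100,000 + 100x, so the total is 360x + 100,000.

A bare yes/no answer gives no indication of whether the model actually checked the arithmetic. We can address this by instructing the model to work out its own solution first.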
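
The arithmetic in the question can be checked directly. The sketch below encodes the three cost items from the question and compares them with the student's formula; the function names are made up for illustration.

```python
def correct_total_cost(x):
    # x is the size of the installation in square feet.
    land = 100 * x
    panels = 250 * x
    maintenance = 100_000 + 10 * x   # flat $100k plus $10 per square foot
    return land + panels + maintenance   # simplifies to 360x + 100,000


def student_total_cost(x):
    # The student used 100x instead of 10x for the variable maintenance cost.
    return 450 * x + 100_000


size = 1_000
print(correct_total_cost(size))   # 460000
print(student_total_cost(size))   # 550000
```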

In [45]:
prompt = f"""
Your task is to determine if the student's solution \
is correct or not.
To solve the problem do the following:
- First, work out your own solution to the problem. 
- Then compare your solution to the student's solution \
and evaluate if the student's solution is correct or not. 
Don't decide if the student's solution is correct until 
you have done the problem yourself.

Use the following format:
Question:
```
question here
```
Student's solution:
```
student's solution here
```
Actual solution:
```
steps to work out the solution and your solution here
```
Is the student's solution the same as actual solution \
just calculated:
```
yes or no
```
Student grade:
```
correct or incorrect
```

Question:
```
I'm building a solar power installation and I need help \
working out the financials. 
- Land costs $100 / square foot
- I can buy solar panels for $250 / square foot
- I negotiated a contract for maintenance that will cost \
me a flat $100k per year, and an additional $10 / square \
foot
What is the total cost for the first year of operations \
as a function of the number of square feet.
``` 
Student's solution:
```
Let x be the size of the installation in square feet.
Costs:
1. Land cost: 100x
2. Solar panel cost: 250x
3. Maintenance cost: 100,000 + 100x
Total cost: 100x + 250x + 100,000 + 100x = 450x + 100,000
```
Actual solution:
"""
get_completion(prompt)

['yes']

## Section 4: Model Limitations: Hallucinations
- Boie is a real company, but the product name is not real, so any details the model gives about the product are made up.

In [44]:
prompt = f"""
Tell me about AeroGlide UltraSlim Smart Toothbrush by Boie
"""
get_completion(prompt)

['AeroGlide UltraSlim Smart Toothbrush by Boie is a smart toothbrush that uses a built-in camera to detect plaque and gum disease. It also has a built-in timer that allows you to brush for a set amount of time.']

## Section 5: RAG with FAISS

Before we proceed to the next steps, let's ensure that we have the necessary libraries installed. We will need the `langchain` library for the following steps. If it's not already installed, we can install it using pip.

The `langchain` library is a Python library that provides utilities for working with large language models. It includes utilities for creating prompts, querying endpoints, parsing responses, and more. We will use this library in the following steps to interact with our SageMaker endpoint.

In [None]:
!apt update
!apt-get install libmagic-dev -y

In [None]:
!pip install --upgrade python-magic unstructured langchain faiss-cpu pandas --quiet

Now, let's import some necessary modules from the `langchain` library.

- `PromptTemplate`: This class is used to create a template for the prompts that we will pass to the language model. 
- `SagemakerEndpoint`: This class is used to interact with the SageMaker endpoint.
- `LLMContentHandler`: This class is used to handle the content that we send to and receive from the language model.
- `load_qa_chain`: This function is used to load a question-answering chain. A chain is a sequence of transformations applied to the input to generate an answer.
- `Document`: This class is used to create documents that the language model can use to find the answer to a question.
- `EmbeddingsContentHandler`: This class is used to handle the content that we send to and receive from the embedding model.
- `SagemakerEndpointEmbeddings`: This class is used to interact with the SageMaker embeddings endpoint.

In [17]:
from langchain import PromptTemplate, SagemakerEndpoint
from langchain.llms.sagemaker_endpoint import LLMContentHandler
from langchain.chains.question_answering import load_qa_chain, LLMChain
from langchain.docstore.document import Document
from langchain.embeddings.sagemaker_endpoint import EmbeddingsContentHandler
from langchain.embeddings import SagemakerEndpointEmbeddings
import json
from typing import Dict, List

We will now create a content handler for the language model to transform input to a format that the SageMaker endpoint expects and output to a form that the language model class expects. We will also define some parameters for the model.

The `ContentHandler` class is a subclass of the `LLMContentHandler` class. It defines two methods:

- `transform_input`: This method takes a prompt and a dictionary of model parameters as input, and returns the input in a format that the SageMaker endpoint expects. In this case, it converts the input to a JSON string and encodes it to bytes.
- `transform_output`: This method takes the output from the SageMaker endpoint and returns it in a form that the language model class expects. In this case, it decodes the output from bytes to a string, parses the JSON, and returns the first entry of the 'generated_texts' list.

The `parameters` dictionary defines the parameters that we will use when querying the language model. These parameters control the behavior of the language model, such as the maximum length of the generated text, the number of sequences to return, and the sampling strategy.

In [18]:
parameters = {
    "max_length": 5000,
    "num_return_sequences": 1,
    "top_k": 250,
    "top_p": 0.95,
    "do_sample": True,
    "temperature": 0.01,
}

class ContentHandler(LLMContentHandler):
    content_type = "application/json"
    accepts = "application/json"

    def transform_input(self, prompt: str, model_kwargs: Dict) -> bytes:
        input_str = json.dumps({"text_inputs": prompt, **model_kwargs})
        return input_str.encode("utf-8")

    def transform_output(self, output: bytes) -> str:
        response_json = json.loads(output.read().decode("utf-8"))
        return response_json['generated_texts'][0]
    


llm_content_handler = ContentHandler()
sm_llm=SagemakerEndpoint(
            endpoint_name=endpoint_name,
            region_name="eu-central-1",
            model_kwargs=parameters,
            content_handler=llm_content_handler,
        )
creative_llm=SagemakerEndpoint(
            endpoint_name=endpoint_name,
            region_name="eu-central-1",
            model_kwargs={
                "max_length": 5000,
                "num_return_sequences": 1,
                "top_k": 250,
                "top_p": 0.95,
                "do_sample": True,  # sampling must be enabled for temperature to have an effect
                "temperature": 2.5
            },
            content_handler=llm_content_handler,
        )
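
To see what the content handler does without calling the endpoint, the two transformations can be reproduced as plain functions. This is a standalone sketch that mirrors the methods above rather than using the langchain classes, and the sample response is made up.

```python
import io
import json


def transform_input(prompt, model_kwargs):
    # Merge the prompt and the generation parameters into one JSON request body.
    return json.dumps({"text_inputs": prompt, **model_kwargs}).encode("utf-8")


def transform_output(output):
    # The endpoint returns a stream; decode it and keep the first generated text.
    response_json = json.loads(output.read().decode("utf-8"))
    return response_json["generated_texts"][0]


request_body = transform_input("Tell me a joke", {"max_length": 50, "do_sample": True})
print(request_body)

# Made-up response body with the same shape as the endpoint's JSON output.
sample_stream = io.BytesIO(json.dumps({"generated_texts": ["first", "second"]}).encode("utf-8"))
print(transform_output(sample_stream))
```

Only the first generated text is returned, so setting `num_return_sequences` above 1 would have no visible effect through this handler.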

Next, we will define a prompt template and load a chain. 

The prompt template is used to format the input to the language model. It accepts a set of parameters from the user that can be used to generate a prompt for a language model. 

The question answering chain is a sequence of transformations applied to the input to generate an answer.

The `PromptTemplate` class takes a template string and a list of input variables as arguments. The template string is a string that contains placeholders for the input variables. The placeholders are enclosed in curly braces `{}` and correspond to the names of the input variables. When we use the prompt template, we will replace the placeholders with the actual values of the input variables.

The `load_qa_chain` function loads a question answering chain. Chains allow us to combine multiple components together to create a single, coherent application. For example, we can create a chain that takes user input, formats it with a PromptTemplate, and then passes the formatted response to an LLM. In this case, the chain includes the language model and the prompt template.

In [19]:
prompt=PromptTemplate(
            template="Use the following pieces of context to answer the question at the end.\n{context}\nQuestion: {question}\nAnswer:",
            input_variables=["context", "question"]
        )
chain = load_qa_chain(
        llm=sm_llm,
        prompt=prompt,
    )
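
Conceptually, the chain fills the template's placeholders before sending the result to the model. The sketch below imitates that step with plain string formatting. It is an approximation, not langchain's implementation: `fill_prompt` is a hypothetical helper, and joining documents with a blank line is an assumption about how the context is assembled.

```python
TEMPLATE = (
    "Use the following pieces of context to answer the question at the end.\n"
    "{context}\nQuestion: {question}\nAnswer:"
)


def fill_prompt(page_contents, question, separator="\n\n"):
    # Assumption: the text of all documents is joined into one context string.
    context = separator.join(page_contents)
    return TEMPLATE.format(context=context, question=question)


filled = fill_prompt(
    ["Managed Spot Training can be used with all instances supported in SageMaker."],
    "Which instances can I use with Managed Spot Training in SageMaker?",
)
print(filled)
```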

Now, let's test our question answering chain with a sample question. The context is a list of documents that the model can use to find the answer to the question. Here we pass a single empty document, so the model has to answer without any supporting information.

The `chain` object is called with a dictionary as input and returns the output of the chain. The input dictionary must contain the 'input_documents' and 'question' keys. The 'input_documents' key corresponds to a list of documents that the model can use to find the answer to the question. The 'question' key corresponds to the question that we want to answer.

In [20]:
query = "Which instances can I use with Managed Spot Training in SageMaker?"

input_documents = [Document(page_content="")]

chain({"input_documents": input_documents, "question": query}, return_only_outputs=True)

{'output_text': 'Spot instances'}

Next, we will create a content handler for embeddings to transform input into the format that the SageMaker endpoint expects and output into the form that the embeddings class expects.

The `SagemakerEndpointEmbeddingsJumpStart` class is a subclass of the `SagemakerEndpointEmbeddings` class. It defines the `embed_documents` method, which computes document embeddings using a SageMaker Inference Endpoint. The method takes a list of texts and a chunk size as input, and returns a list of embeddings.

The `ContentHandler` class is a subclass of the `EmbeddingsContentHandler` class. It defines two methods:

- `transform_input`: This method takes a prompt and a dictionary of model parameters as input, and returns the input in a format that the SageMaker endpoint expects. In this case, it converts the input to a JSON string and encodes it to bytes.
- `transform_output`: This method takes the output from the SageMaker endpoint and returns it in a form that the embeddings class expects. In this case, it decodes the output from bytes to a string, parses the JSON, and returns the 'embedding' field.

In [21]:
class SagemakerEndpointEmbeddingsJumpStart(SagemakerEndpointEmbeddings):
    def embed_documents(self, texts: List[str], chunk_size: int = 5) -> List[List[float]]:
        """Compute doc embeddings using a SageMaker Inference Endpoint.

        Args:
            texts: The list of texts to embed.
            chunk_size: The chunk size defines how many input texts will
                be grouped together as request. If None, will use the
                chunk size specified by the class.

        Returns:
            List of embeddings, one for each text.
        """
        results = []
        _chunk_size = len(texts) if chunk_size > len(texts) else chunk_size

        for i in range(0, len(texts), _chunk_size):
            response = self._embedding_func(texts[i : i + _chunk_size])
            results.extend(response)
        return results

class ContentHandler(EmbeddingsContentHandler):
    content_type = "application/json"
    accepts = "application/json"

    def transform_input(self, prompt: str, model_kwargs={}) -> bytes:
        input_str = json.dumps({"text_inputs": prompt, **model_kwargs})
        return input_str.encode("utf-8")

    def transform_output(self, output: bytes) -> str:
        response_json = json.loads(output.read().decode("utf-8"))
        embeddings = response_json["embedding"]
        return embeddings
embeddings_content_handler=ContentHandler()
embeddings = SagemakerEndpointEmbeddingsJumpStart(
    endpoint_name=embedding_endpoint_name,
    region_name="eu-central-1",
    content_handler=embeddings_content_handler,
)
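
The batching logic in `embed_documents` can be tried out without an endpoint. In the sketch below, `fake_embedding_func` is a made-up stand-in for the call to the embedding endpoint, and the guard for an empty input list is an addition that is not part of the class above.

```python
def embed_in_chunks(texts, embedding_func, chunk_size=5):
    # Added guard: an empty list would otherwise lead to a zero step in range().
    if not texts:
        return []
    results = []
    # Never use a chunk larger than the number of texts.
    _chunk_size = len(texts) if chunk_size > len(texts) else chunk_size
    for i in range(0, len(texts), _chunk_size):
        # One request per chunk; each request returns one vector per text.
        results.extend(embedding_func(texts[i:i + _chunk_size]))
    return results


batch_sizes = []


def fake_embedding_func(batch):
    # Stand-in for the endpoint: record the batch size and return toy 1-d vectors.
    batch_sizes.append(len(batch))
    return [[float(len(text))] for text in batch]


sample_texts = [f"doc {n}" for n in range(12)]
vectors = embed_in_chunks(sample_texts, fake_embedding_func, chunk_size=5)
print(batch_sizes)    # [5, 5, 2]
print(len(vectors))   # 12
```

Twelve texts with a chunk size of 5 result in three requests, and the order of the returned vectors matches the order of the input texts.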

Now we will load the data that we will embed for contextual prompting.

In [22]:
from langchain.document_loaders.url import UnstructuredURLLoader

In [23]:
urls = [
    "https://aws.amazon.com/codewhisperer/faqs/",
    "https://aws.amazon.com/sagemaker/faqs/",
]
headers={"ssl_verify":"False"}
loader = UnstructuredURLLoader(urls=urls,headers=headers)

We will now use the `faiss-cpu` library. The cell below installs it again in case the earlier `pip install` cell was skipped.

FAISS (Facebook AI Similarity Search) is a library developed by Facebook AI that provides efficient similarity search and clustering of dense vectors. Given a set of vectors (in this case, vector representations of document chunks, i.e. embeddings), we can index them using FAISS. Then, using another vector (the query vector), we search for the most similar vectors within the index.
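
The idea behind a similarity search can be shown with a brute-force version in plain Python. This is a conceptual sketch only: the three-dimensional vectors are made up, Euclidean distance is used as one common similarity measure, and FAISS itself relies on optimized index structures rather than a full scan like this.

```python
import math


def l2_distance(a, b):
    # Euclidean distance between two vectors of equal length.
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def nearest(query_vec, vector_index, k=2):
    # Rank every stored vector by its distance to the query and keep the k closest.
    ranked = sorted(vector_index.items(), key=lambda item: l2_distance(query_vec, item[1]))
    return [name for name, _ in ranked[:k]]


# Made-up embeddings for three document chunks.
toy_index = {
    "spot training": [0.9, 0.1, 0.0],
    "serverless inference": [0.1, 0.9, 0.0],
    "pricing": [0.0, 0.2, 0.9],
}
query_vec = [0.8, 0.2, 0.1]

print(nearest(query_vec, toy_index, k=2))   # ['spot training', 'serverless inference']
```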

In [24]:
!pip install faiss-cpu --quiet



We will now create an index of our documents using the `VectorstoreIndexCreator`. This index will allow us to perform efficient similarity searches on our documents.

The `VectorstoreIndexCreator` is a utility that loads the documents, splits them into chunks with the text splitter, computes an embedding (a dense vector) for each chunk, and stores the embeddings in the vector store, here FAISS.

In [25]:
from langchain.indexes import VectorstoreIndexCreator
from langchain.vectorstores import Chroma, AtlasDB, FAISS
from langchain.text_splitter import CharacterTextSplitter,RecursiveCharacterTextSplitter
from langchain.document_loaders import TextLoader,CSVLoader

text_splitter = RecursiveCharacterTextSplitter(
    chunk_size = 200,
    chunk_overlap  = 20,
    length_function = len,
    add_start_index = True,
)

index_creator = VectorstoreIndexCreator(
    vectorstore_cls=FAISS,
    embedding=embeddings,
    text_splitter = text_splitter
)
index = index_creator.from_loaders([loader])

Let's test our index by querying it with a sample question.

The `index.query` function takes a question and a language model as input. It first performs a similarity search on the index to retrieve the documents most relevant to the question. The retrieved documents are then passed to the LLM, which generates a coherent and contextually relevant answer based on them. The function returns that generated answer.

In [26]:
index.query(question=query, llm=sm_llm)

'Managed Spot Training can be used with all instances supported in SageMaker.'

We will now replicate the `index.query` functionality step by step to illustrate what happens.

First, we will create a document search object using the FAISS vector store and our documents. This will allow us to perform similarity searches on our documents. Using it, we retrieve the 3 documents most relevant to our query.

The `FAISS.from_documents` function is used to create a FAISS vector store from our documents. The embeddings of the documents are used to create the vector store. The vector store allows us to perform efficient similarity searches on the documents.

The `docsearch.max_marginal_relevance_search` function is used to search the documents. It takes a query and the number of results to return (`k`) as input. The query is converted into an embedding, which is compared with the embeddings of the documents in the vector store. Maximal marginal relevance then selects documents that are similar to the query while also being different from each other, which avoids returning near-duplicate chunks.

In [27]:
documents = loader.load()
splitdocuments = text_splitter.split_documents(documents)
docsearch = FAISS.from_documents(splitdocuments, embeddings)
docs = docsearch.max_marginal_relevance_search(query, k=3)
docs

[Document(page_content='Q: Which instances can I use with Managed Spot Training?\n\nManaged Spot Training can be used with all instances supported in SageMaker.\n\nQ: Which Regions are supported with Managed Spot Training?', metadata={'source': 'https://aws.amazon.com/sagemaker/faqs/', 'start_index': 46752}),
 Document(page_content='Q: When should I use Managed Spot Training?', metadata={'source': 'https://aws.amazon.com/sagemaker/faqs/', 'start_index': 45014}),
 Document(page_content='Q: Why should I use SageMaker Serverless Inference?', metadata={'source': 'https://aws.amazon.com/sagemaker/faqs/', 'start_index': 55822})]
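
The selection rule behind a maximal marginal relevance search can be sketched in a few lines. This is a simplified illustration with made-up two-dimensional vectors, not the library's implementation; `lambda_mult` here weights relevance to the query against similarity to documents that were already selected.

```python
import math


def cosine(a, b):
    # Cosine similarity between two non-zero vectors.
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def mmr(query_vec, doc_vecs, k=3, lambda_mult=0.5):
    selected = []
    candidates = list(range(len(doc_vecs)))
    while candidates and len(selected) < k:
        def score(i):
            relevance = cosine(query_vec, doc_vecs[i])
            # Penalize documents that resemble something already selected.
            redundancy = max((cosine(doc_vecs[i], doc_vecs[j]) for j in selected), default=0.0)
            return lambda_mult * relevance - (1 - lambda_mult) * redundancy
        best = max(candidates, key=score)
        selected.append(best)
        candidates.remove(best)
    return selected


toy_query = [1.0, 0.0]
toy_docs = [
    [1.0, 0.10],   # 0: very relevant
    [1.0, 0.11],   # 1: near-duplicate of document 0
    [0.6, 0.80],   # 2: less relevant, but different
]

print(mmr(toy_query, toy_docs, k=2, lambda_mult=1.0))   # [0, 1]: pure relevance
print(mmr(toy_query, toy_docs, k=2, lambda_mult=0.3))   # [0, 2]: diversity favored
```

With `lambda_mult=1.0` the ranking reduces to a plain similarity search, while lower values push the near-duplicate out of the results.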

Finally, we will use our question-answering chain to answer our query using the documents we found.

The `chain` function is used to apply our question-answering chain to our query and documents.

In [28]:
chain({"input_documents": docs, "question": query}, return_only_outputs=True)

{'output_text': 'all instances'}

We have looked at two flows so far:

a.

![alt text](flow.png)

b.

![alt text](RAGflow.png)

Now let's apply the same two flows to a retail product dataset.

In [62]:
retail_data = "s3://mysagebucket-4590283737/RAGFiles/"
!aws s3 cp --recursive $retail_data rag_data

download: s3://mysagebucket-4590283737/RAGFiles/retail_items.csv to rag_data/retail_items.csv


In [63]:
import pandas as pd
df = pd.read_csv('rag_data/retail_items.csv')

# Work on an explicit copy so that the assignment below does not
# trigger pandas' SettingWithCopyWarning.
processed_df = df[['description']].copy()
processed_df['description'] = df.apply(lambda row: f"{row['name']} is a {row['style']} in the {row['category']} category. Description: {row['description']} with a Price of ${row['price']} and Current stock is {row['current_stock']}.", axis=1)

processed_df.head(5)


| | description |
|---|---|
| 0 | Sans Pareil Scarf is a scarf in the apparel ca... |
| 1 | Chef Knife is a kitchen in the housewares cate... |
| 2 | Gainsboro Jacket is a jacket in the apparel ca... |
| 3 | High Definition Speakers is a speaker in the e... |
| 4 | Spiffy Sandals is a sandals in the footwear ca... |


In [64]:
processed_df[['description']].to_csv("rag_data/processed_retail_data.csv", index=False)

In [65]:
retail_data_loader = CSVLoader(file_path="rag_data/processed_retail_data.csv")
retail_data_documents = retail_data_loader.load()

In [66]:
text_splitter = RecursiveCharacterTextSplitter(
    chunk_size = 500,
    chunk_overlap  = 20,
    length_function = len,
    add_start_index = True,
)
retail_data_index_creator = VectorstoreIndexCreator(
    vectorstore_cls=FAISS,
    embedding=embeddings,
    text_splitter=text_splitter,
)
retail_data_index = retail_data_index_creator.from_loaders([retail_data_loader])

In [67]:
retail_query="What is the price and stock of Sans Pareil scarf?"

In [68]:
chain({"input_documents": input_documents, "question": retail_query}, return_only_outputs=True)

{'output_text': 'Sans Pareil scarf is a limited edition scarf, produced in a limited edition of 450 pieces.'}

In [69]:
retail_data_index.query(question=retail_query, llm=sm_llm)

'$114.99 and Current stock is 6'

In [70]:
retail_data_docsearch = FAISS.from_documents(retail_data_documents, embeddings)
retail_data_docs = retail_data_docsearch.max_marginal_relevance_search(retail_query, k=3)
retail_data_docs

[Document(page_content='description: Sans Pareil Scarf is a scarf in the apparel category. Description: Sans pareil scarf for women with a Price of $114.99 and Current stock is 6.', metadata={'source': 'rag_data/processed_retail_data.csv', 'row': 199}),
 Document(page_content='description: Set is a set in the tools category. Description: This set is a must-have for your toolbox with a Price of $8.99 and Current stock is 13.', metadata={'source': 'rag_data/processed_retail_data.csv', 'row': 1473}),
 Document(page_content='description: Rich Soap is a bathing in the beauty category. Description: Enjoy the fragrance of this rich soap with a Price of $73.99 and Current stock is 10.', metadata={'source': 'rag_data/processed_retail_data.csv', 'row': 375})]

In [71]:
chain({"input_documents": retail_data_docs, "question": retail_query}, return_only_outputs=True)

{'output_text': '$114.99 and 6'}

## Cleanup

After you have finished with this notebook, you should clean up your AWS resources to avoid any unwanted charges. This includes deleting the SageMaker endpoints used for text generation and embeddings.
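
One possible way to delete an endpoint together with its configuration and models is sketched below. The helper name `delete_endpoint_resources` is hypothetical, and the sketch assumes that every production variant in the endpoint configuration references a model by name. The actual call is left commented out because it deletes real resources.

```python
def delete_endpoint_resources(sm_client, endpoint_name):
    """Delete an endpoint, its endpoint configuration, and the models it uses."""
    # Look up the configuration and models before deleting anything.
    endpoint = sm_client.describe_endpoint(EndpointName=endpoint_name)
    config_name = endpoint["EndpointConfigName"]
    config = sm_client.describe_endpoint_config(EndpointConfigName=config_name)
    model_names = [variant["ModelName"] for variant in config["ProductionVariants"]]

    sm_client.delete_endpoint(EndpointName=endpoint_name)
    sm_client.delete_endpoint_config(EndpointConfigName=config_name)
    for model_name in model_names:
        sm_client.delete_model(ModelName=model_name)
    return config_name, model_names


# Usage in the notebook (uncomment to run; this removes the resources for good):
# import boto3
# sm_client = boto3.client("sagemaker")
# for name in (endpoint_name, embedding_endpoint_name):
#     delete_endpoint_resources(sm_client, name)
```

Deleting the endpoint is what stops the charges; removing the endpoint configuration and models keeps the account tidy.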