# Workshop: Text2Text Generation with SageMaker

Welcome to this workshop on Text2Text Generation with SageMaker. In this workshop, we will be using a pre-trained model deployed on a SageMaker endpoint to perform text-to-text generation tasks.
The workshop is divided into several sections:

1. **Setting up the environment:** In this section, we will import necessary libraries and define some helper functions.
2. **Querying the endpoint:** We will define some example input texts and use them to query the SageMaker endpoint.
3. **Advanced features:** We will explore some advanced features of the model, such as controlling the length of the generated text and the number of output sequences returned.
4. **Prompt Engineering:** We will explore some prompt engineering tactics.
5. **RAG with FAISS:** We will use the `langchain` library to create a question-answering chain and perform similarity searches on a set of documents.
6. **Cleaning up:** Finally, we will shut down the SageMaker endpoint to avoid incurring unnecessary costs.

Let's get started!

## Section 1: Introduction

In this section, we will import the necessary libraries and define some helper functions that we will use throughout the workshop.

We will be using the `json` and `boto3` libraries. The `json` library provides functions for working with JSON data, and the `boto3` library allows us to interact with AWS services, including SageMaker.

Let's start by importing these libraries.

In [12]:
import json
import boto3

Next, we will define some example input texts. These are the texts that we will use to query the SageMaker endpoint. The model takes each text as input and returns the result of the task described in it.

In [13]:
text1 = "Translate the following text to German: My name is Arthur"
text2 = "A step by step recipe to make bolognese pasta:"

Now, let's define the endpoint that you have created. We will use this endpoint to query the model and get the generated text. We will also define some formatting variables for better output visualization.

The `endpoint_name` variable should be set to the name of the SageMaker endpoint that you have created. The `newline`, `bold`, and `unbold` variables are used to format the output text for better readability.

In [14]:
newline, bold, unbold = '\n', '\033[1m', '\033[0m'
endpoint_name = 'jumpstart-dft-hf-text2text-flan-t5-xl'
embedding_endpoint_name = 'jumpstart-dft-hf-textembedding-gpt-j-6b-fp16'

Next, we will define a function to query the endpoint. This function will take the encoded text as input and return the response from the endpoint.

The `query_endpoint` function uses the `boto3` library to create a SageMaker runtime client. It then uses this client to invoke the SageMaker endpoint with the encoded text as input. The function returns the response from the endpoint.

In [15]:
def query_endpoint(encoded_text):
    client = boto3.client('runtime.sagemaker')
    response = client.invoke_endpoint(EndpointName=endpoint_name, ContentType='application/x-text', Body=encoded_text)
    return response

We will also define a function to parse the response from the endpoint. This function will extract the generated text from the response.

In [19]:
def parse_response(query_response):
    model_predictions = json.loads(query_response['Body'].read())
    generated_text = model_predictions["generated_text"]
    return generated_text
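
To check the parsing logic without calling the endpoint, the sketch below applies the same steps as `parse_response` to a hand-built response. The `fake_response` dictionary, and the use of `io.BytesIO` in place of the streaming body that `boto3` returns, are assumptions made purely for illustration.

```python
import io
import json

def parse_response(query_response):
    # The 'Body' entry is a file-like object: read it once and decode the JSON.
    model_predictions = json.loads(query_response['Body'].read())
    return model_predictions["generated_text"]

# Hypothetical stand-in for the dictionary returned by invoke_endpoint.
fake_response = {
    "Body": io.BytesIO(json.dumps({"generated_text": "Ich bin Arthur."}).encode("utf-8"))
}
parsed_text = parse_response(fake_response)
print(parsed_text)
```

Because the body is a stream, it can only be read once; a second call to `parse_response` on the same response would fail to decode.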

Now, let's use these functions to query the endpoint with our example texts and print the generated text.

In [20]:
def get_completion(prompt):
    # Encode the prompt, query the endpoint, and print the prompt next to the generated text.
    query_response = query_endpoint(prompt.encode('utf-8'))
    generated_text = parse_response(query_response)
    print(f"Inference:{newline}"
          f"input text: {prompt}{newline}"
          f"generated text: {bold}{generated_text}{unbold}{newline}")

In [21]:
for text in [text1, text2]:
    get_completion(text)

Inference:
input text: Translate the following text to German: My name is Arthur
generated text: Ich bin Arthur.

Inference:
input text: A step by step recipe to make bolognese pasta:
generated text: In a large saucepan, combine the ground beef, onion, garlic, tomato paste, tomato



## Section 2: Advanced Features

The model we are using supports many advanced parameters that can be used to control the text generation process. These parameters include:

- **max_length:** This parameter controls the maximum length of the generated text. The model will generate text until the output length (which includes the input context length) reaches `max_length`.
- **num_return_sequences:** This parameter controls the number of output sequences returned by the model.
- **num_beams:** This parameter controls the number of beams used in beam search during text generation.
- **no_repeat_ngram_size:** This parameter ensures that a sequence of words of `no_repeat_ngram_size` is not repeated in the output sequence.
- **temperature:** This parameter controls the randomness of the output. A higher temperature makes low-probability words more likely to appear in the output sequence, while a lower temperature favors high-probability words.
- **early_stopping:** If set to True, text generation is finished when all beam hypotheses reach the end of sentence token.
- **do_sample:** If set to True, the model will sample the next word as per the likelihood.
- **top_k:** In each step of text generation, the model will sample from only the `top_k` most likely words.
- **top_p:** In each step of text generation, the model will sample from the smallest possible set of words with cumulative probability `top_p`.
- **seed:** This parameter can be used to fix the randomized state for reproducibility.
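
To make `temperature`, `top_k`, and `top_p` more concrete, here is a conceptual sketch on a toy distribution of four "words". It is not the model's actual implementation; the function names and the toy logits are assumptions chosen for illustration.

```python
import math

def softmax_with_temperature(logits, temperature=1.0):
    # Dividing by the temperature sharpens (<1) or flattens (>1) the distribution.
    scaled = [x / temperature for x in logits]
    largest = max(scaled)
    exps = [math.exp(x - largest) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def top_k_top_p_filter(probs, top_k=None, top_p=None):
    """Return {index: renormalized probability} for the words that survive filtering."""
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    if top_k is not None:
        ranked = ranked[:top_k]  # keep only the k most likely words
    if top_p is not None:
        kept, cumulative = [], 0.0
        for i in ranked:  # keep the smallest set whose probability mass reaches top_p
            kept.append(i)
            cumulative += probs[i]
            if cumulative >= top_p:
                break
        ranked = kept
    total = sum(probs[i] for i in ranked)
    return {i: probs[i] / total for i in ranked}

toy_logits = [2.0, 1.0, 0.5, -1.0]
sharp = softmax_with_temperature(toy_logits, temperature=0.2)
flat = softmax_with_temperature(toy_logits, temperature=2.0)
candidates = top_k_top_p_filter(softmax_with_temperature(toy_logits), top_k=3, top_p=0.8)
print(sharp, flat, candidates)
```

With a low temperature almost all probability mass sits on the most likely word, while the filtered `candidates` dictionary shows which words remain available for sampling.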

We can specify any subset of these parameters when invoking the endpoint. The next cell shows an example payload that sets several of them.
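
As a small illustration of passing only a subset of parameters, the sketch below builds an encoded JSON payload and rejects misspelled parameter names. The helper `build_payload` and the `SUPPORTED_PARAMETERS` set are hypothetical names introduced here; the set simply mirrors the list above.

```python
import json

# Parameter names taken from the list above.
SUPPORTED_PARAMETERS = {
    "max_length", "num_return_sequences", "num_beams", "no_repeat_ngram_size",
    "temperature", "early_stopping", "do_sample", "top_k", "top_p", "seed",
}

def build_payload(text_inputs, **generation_parameters):
    """Return a UTF-8 encoded JSON payload containing only the parameters passed in."""
    unknown = set(generation_parameters) - SUPPORTED_PARAMETERS
    if unknown:
        raise ValueError(f"Unsupported parameters: {sorted(unknown)}")
    return json.dumps({"text_inputs": text_inputs, **generation_parameters}).encode("utf-8")

encoded_payload = build_payload("Tell me the steps to make a pizza:", max_length=50, do_sample=True)
print(encoded_payload)
```

Checking the names locally is a design choice for this sketch: it catches a typo such as `max_lenght` before a request is sent.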

In [22]:
payload = {"text_inputs":"Tell me the steps to make a pizza:", "max_length":50, "num_return_sequences":3, "top_k":50, "top_p":0.95, "do_sample":True, "no_repeat_ngram_size":20}

We will now define a function to query the endpoint with a JSON payload. This function will take the encoded JSON as input and return the response from the endpoint.

The `query_endpoint_with_json_payload` function is similar to the `query_endpoint` function we defined earlier. The difference is that this function takes a JSON payload as input instead of a text. This allows us to pass the advanced parameters to the endpoint.

In [30]:
def query_endpoint_with_json_payload(encoded_json):
    client = boto3.client('runtime.sagemaker')
    response = client.invoke_endpoint(EndpointName=endpoint_name, ContentType='application/json', Body=encoded_json)
    return response

We will also define a function to parse the response from the endpoint when multiple texts are returned. This function will extract the generated texts from the response.

The `parse_response_multiple_texts` function is similar to the `parse_response` function we defined earlier. The difference is that this function extracts the 'generated_texts' field from the JSON instead of the 'generated_text' field. This is because when we request multiple texts from the endpoint, the response contains a 'generated_texts' field with a list of generated texts.

In [31]:
def parse_response_multiple_texts(query_response):
    model_predictions = json.loads(query_response['Body'].read())
    generated_text = model_predictions['generated_texts']
    return generated_text

Now, let's use these functions to query the endpoint with our JSON payload and print the generated texts.

In [32]:
query_response = query_endpoint_with_json_payload(json.dumps(payload).encode('utf-8'))
generated_texts = parse_response_multiple_texts(query_response)
print(generated_texts)

['To make a pizza, first gather your ingredients. Next, spread your pizza sauce on your pizza crust. Next, place your toppings on your pizza crust. Finally, place your pizza in the oven.', 'Gather the ingredients. Place the dough on a floured surface and knead it into a ball. Flatten the ball into a circle and place it on a greased baking sheet. Bake the pizza', 'Spread pizza sauce on the bottom of a large pizza pan. Spread the pizza sauce over the pizza dough. Top the pizza with pepperoni, olives, and other desired toppings. Bake the pizza at 450 degrees F for about']


In [40]:
def my_query_endpoint(query):
    payload = {
        "text_inputs": query,
        "max_length": 5000,
        "num_return_sequences": 1,
        "top_k": 50,
        "top_p": 0.95,
        "do_sample": True,
        "temperature": 0.2,
    }
    client = boto3.client('runtime.sagemaker')
    response = client.invoke_endpoint(EndpointName=endpoint_name, ContentType='application/json', Body=json.dumps(payload).encode('utf-8'))
    return response


def get_completion(query):
    return parse_response_multiple_texts(
        my_query_endpoint(query)
    )

## Section 3: Prompt Engineering

### Prompting Principles

1. Write clear and specific instructions.
2. Give the model time to “think”.

### Tactics for 'Write clear and specific instructions'.

#### Tactic 1: Use delimiters to clearly indicate distinct parts of the input

In [41]:
text = "CodeWhisperer is an AI-powered coding companion designed to assist developers in \
real-time within their Integrated Development Environment (IDE). \
It provides single-line or full-function code suggestions based on natural language comments, \
such as specific tasks or instructions. The suggestions are generated from large language models \
trained on billions of lines of code, including Amazon and open-source code. \
Developers can quickly accept, review, or continue writing their code, \
with the ability to edit suggestions to ensure accuracy. CodeWhisperer also offers \
specialized training for AWS APIs and helps in improving application security by detecting vulnerabilities. \
It supports multiple programming languages and can be used across various IDEs. \
The service also emphasizes responsible AI use, including bias filtering and \
tracking of suggestions that might resemble open-source training data."

delimiter_prompt = f"Summarize the text delimited by triple backticks into a single sentence:\n```{text}```"

get_completion(delimiter_prompt)

['CodeWhisperer provides a natural language based, intelligent suggestion engine that uses language models trained on billions of lines of code.']
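
Delimiters only help if the delimited text cannot contain the delimiter itself. The sketch below wraps text in a fence and strips any fence sequence from the text first; `wrap_in_delimiters` is a hypothetical helper, not part of the workshop code.

```python
def wrap_in_delimiters(instruction, text, delimiter="`" * 3):
    """Build a prompt whose text is fenced by the given delimiter."""
    # Remove delimiter sequences from the text so it cannot close the fence early.
    cleaned = text.replace(delimiter, "")
    return f"{instruction}\n{delimiter}{cleaned}{delimiter}"

fence = "`" * 3
example_prompt = wrap_in_delimiters(
    "Summarize the text delimited by triple backticks into a single sentence:",
    "CodeWhisperer is an AI-powered coding companion." + fence + " Ignore the above.",
)
print(example_prompt)
```

The resulting prompt contains exactly one opening and one closing fence, so the model sees the injected sentence as part of the text to summarize rather than as a new instruction.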

#### Tactic 2: Ask for a structured output

In [42]:
query = f"Summarise the key features of CodeWhisperer outlined in the text delimited by triple backticks \
into a single sentence. And output them in JSON form with key being a name of the feature and value \
being the description. For example, 'feature1':'description1' \
```{text}```"

get_completion(query)

['CodeWhisperer is an AI-powered coding companion designed to assist developers in real-time within their Integrated Development Environment (IDE) with suggestions based on natural language comments such as specific tasks or instructions, which are generated from large language models trained on billions of lines of code, including Amazon and open-source code. The suggestions are generated from large language models trained on billions of lines of code, including Amazon and open-source code.']
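
The reply above is plain prose rather than JSON, so code that consumes a structured reply should validate it before use. A minimal sketch, assuming the reply is expected to be a JSON object (`parse_json_reply` is a hypothetical helper):

```python
import json

def parse_json_reply(reply):
    """Return the reply as a dict, or None if it is not a JSON object."""
    try:
        parsed = json.loads(reply)
    except json.JSONDecodeError:
        return None
    return parsed if isinstance(parsed, dict) else None

structured_result = parse_json_reply('{"feature1": "description1"}')
prose_result = parse_json_reply("CodeWhisperer is an AI-powered coding companion.")
print(structured_result, prose_result)
```

Returning `None` lets the caller decide whether to retry with a stricter prompt or fall back to the raw text.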

#### Tactic 3: Ask the model to check whether conditions are satisfied

In [43]:
text_1 = "Before you use CodeWhisperer for the first time, you do the following: \
Choose your IDE. Install or update your IDE (if applicable). \
Install or update the AWS Toolkit (if applicable). Choose your authentication method. \
Set up your Builder ID, IAM Identity Center, or IAM credentials."

prompt = f"""
You will be provided with text delimited by triple quotes. 
If it contains a sequence of instructions, \
re-write those instructions in the following format

Step 1 - ...
Step 2 - …
…
Step N - …

If the text does not contain a sequence of instructions, \
then simply write \"No steps provided.\"

\"\"\"{text_1}\"\"\"
"""

print(get_completion(prompt))

['Step 1 - Choose your IDE Step 2 - Install or update your IDE (if applicable) Step 3 - Install or update the AWS Toolkit (if applicable) Step 4 - Choose your authentication method']
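
The model returned all steps on a single line. If we want one step per item, a small post-processing sketch like the following could split the reply; `split_steps` is a hypothetical helper and assumes the reply follows the "Step N - ..." pattern requested in the prompt.

```python
import re

def split_steps(reply):
    """Split a single-line 'Step N - ...' reply into a list of step descriptions."""
    parts = re.split(r"\s*Step \d+ - ", reply)
    return [part.strip() for part in parts if part.strip()]

reply = ("Step 1 - Choose your IDE Step 2 - Install or update your IDE (if applicable) "
         "Step 3 - Install or update the AWS Toolkit (if applicable) "
         "Step 4 - Choose your authentication method")
steps = split_steps(reply)
for number, step in enumerate(steps, start=1):
    print(f"Step {number} - {step}")
```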


In [44]:
text = "CodeWhisperer is an AI-powered coding companion designed to assist developers in \
real-time within their Integrated Development Environment (IDE). \
It provides single-line or full-function code suggestions based on natural language comments, \
such as specific tasks or instructions. The suggestions are generated from large language models \
trained on billions of lines of code, including Amazon and open-source code. \
Developers can quickly accept, review, or continue writing their code, \
with the ability to edit suggestions to ensure accuracy. CodeWhisperer also offers \
specialized training for AWS APIs and helps in improving application security by detecting vulnerabilities. \
It supports multiple programming languages and can be used across various IDEs. \
The service also emphasizes responsible AI use, including bias filtering and \
tracking of suggestions that might resemble open-source training data."

prompt = f"""
You will be provided with text delimited by triple quotes. 
If it contains a sequence of instructions, \
re-write those instructions in the following format:

Step 1 - ...
Step 2 - …
…
Step N - …

If the text does not contain a sequence of instructions, \
then simply write \"No steps provided.\"

\"\"\"{text}\"\"\"
"""

get_completion(prompt)

['No steps provided']

#### Tactic 4: "Few-shot" prompting

In [45]:
prompt = f"""
Your task is to answer in a consistent style.

<child>: Teach me about patience.

<grandparent>: The river that carves the deepest \
valley flows from a modest spring; the \
grandest symphony originates from a single note; \
the most intricate tapestry begins with a solitary thread.

<child>: Teach me about resilience.
"""
get_completion(prompt)

['grandparent>: You can  tell a liar by the way he looks at things  you can  guess the best way a writer would tell a story  you can  predict how a story will end,   if you know the outcome, you can  prepare yourself to face it; you can  change your mind about your plans and your ideas  if the first one doesn  work out.']
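
Few-shot prompts are easier to maintain when they are assembled from a list of example pairs. The sketch below is one possible way to do this; `build_few_shot_prompt` is a hypothetical helper, and ending the prompt with an open `<grandparent>:` turn is an assumption of this sketch that differs from the prompt used above.

```python
def build_few_shot_prompt(instruction, examples, new_question):
    """Assemble an instruction, worked example pairs, and an open final turn."""
    lines = [instruction, ""]
    for question, answer in examples:
        lines += [f"<child>: {question}", "", f"<grandparent>: {answer}", ""]
    # Leave the last turn open so the model continues in the grandparent's voice.
    lines += [f"<child>: {new_question}", "", "<grandparent>:"]
    return "\n".join(lines)

few_shot_prompt = build_few_shot_prompt(
    "Your task is to answer in a consistent style.",
    [("Teach me about patience.",
      "The river that carves the deepest valley flows from a modest spring.")],
    "Teach me about resilience.",
)
print(few_shot_prompt)
```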

### Tactics for 'Give the model time to “think”.'

#### Tactic 1: Specify the steps required to complete a task

In [46]:
text = "CodeWhisperer is an AI-powered coding companion designed to assist developers in \
real-time within their Integrated Development Environment (IDE). \
It provides single-line or full-function code suggestions based on natural language comments, \
such as specific tasks or instructions. The suggestions are generated from large language models \
trained on billions of lines of code, including Amazon and open-source code. \
Developers can quickly accept, review, or continue writing their code, \
with the ability to edit suggestions to ensure accuracy. CodeWhisperer also offers \
specialized training for AWS APIs and helps in improving application security by detecting vulnerabilities. \
It supports multiple programming languages and can be used across various IDEs. \
The service also emphasizes responsible AI use, including bias filtering and \
tracking of suggestions that might resemble open-source training data."
# example 1
prompt = f"""
Perform the following actions: 
1 - Summarize the following text delimited by triple \
backticks with 1 sentence.
2 - Translate the summary into French.

Separate your answers with line breaks.

Text:
```{text}```
"""
print("\nCompletion for prompt 1:")
get_completion(prompt)


Completion for prompt 1:


["CodeWhisperer est une aide d'apprentissage de la programmation en application de la CI créée pour aider les développeurs dans l'écriture de code dans un environnement intégré de développement (IDE). Il fournit des suggestions de programmes de sexe à à droite ou de full-function dans une linie de comment, comme des tâches spécifiques ou des instructions. Il prend également en compte des évaluations de normes de modèle de langues sur des milliards de péchés de code, dont des possibilités de disponibilité et de disponibilité non-privée. Les développeurs peuvent rapidement accepter, révuer ou poursuivre l'écriture de leurs programmes, y compris pour les suggestions, avec la possibilité d'éditer les conseils afin de s’assurer d'une précision. CodeWhisperer offre également une formation spécialisée pour les APIs AWS, et contribue à renforcer la sécurité des applications en détectant les vulnérabilités. Le service appuie différents langues de programme et peut être utilisé dans diverses env

#### Tactic 2: Instruct the model to work out its own solution before rushing to a conclusion

In [48]:
prompt = """
Determine if the student's solution is correct or not.

Question:
I'm building a solar power installation and I need \
 help working out the financials. 
- Land costs $100 / square foot
- I can buy solar panels for $250 / square foot
- I negotiated a contract for maintenance that will cost \
me a flat $100k per year, and an additional $10 / square \
foot
What is the total cost for the first year of operations 
as a function of the number of square feet.

Student's Solution:
Let x be the size of the installation in square feet.
Costs:
1. Land cost: 100x
2. Solar panel cost: 250x
3. Maintenance cost: 100,000 + 100x
Total cost: 100x + 250x + 100,000 + 100x = 450x + 100,000
"""

get_completion(prompt)

['Yes']

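Note that the student's solution is actually not correct: the maintenance cost should be 100,000 + 10x rather than 100,000 + 100x, so the correct total is 360x + 100,000.

We can try to address this by instructing the model to work out its own solution first.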
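
The question states that maintenance costs a flat $100k per year plus $10 per square foot, while the student's solution charges 100 per square foot. The short check below works through both versions for an example area of 1,000 square feet; the function names and the example area are chosen here for illustration.

```python
def student_total_cost(square_feet):
    # As written by the student: maintenance charged at 100 per square foot.
    return 100 * square_feet + 250 * square_feet + 100_000 + 100 * square_feet

def correct_total_cost(square_feet):
    # From the question: maintenance is a flat 100,000 plus 10 per square foot.
    return 100 * square_feet + 250 * square_feet + 100_000 + 10 * square_feet

area = 1_000
print(student_total_cost(area))  # 450 * 1000 + 100000
print(correct_total_cost(area))  # 360 * 1000 + 100000
```

The two totals differ by 90 per square foot, which is exactly the error in the maintenance term.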

In [49]:
prompt = f"""
Your task is to determine if the student's solution \
is correct or not.
To solve the problem do the following,
- First, work out your own solution to the problem. 
- Then compare your solution to the student's solution \
and evaluate if the student's solution is correct or not. 
Don't decide if the student's solution is correct until 
you have done the problem yourself.

Use the following format
Question
```
question here
```
Student's solution
```
student's solution here
```
Actual solution
```
steps to work out the solution and your solution here
```
Is the student's solution the same as actual solution \
just calculated?
```
yes or no
```
Student grade
```
correct or incorrect
```

Question:
```
I'm building a solar power installation and I need help \
working out the financials. 
- Land costs $100 / square foot
- I can buy solar panels for $250 / square foot
- I negotiated a contract for maintenance that will cost \
me a flat $100k per year, and an additional $10 / square \
foot
What is the total cost for the first year of operations \
as a function of the number of square feet.
``` 
Student's solution
```
Let x be the size of the installation in square feet.
Costs:
1. Land cost: 100x
2. Solar panel cost: 250x
3. Maintenance cost: 100,000 + 100x
Total cost: 100x + 250x + 100,000 + 100x = 450x + 100,000
```
Actual solution. Analyse it step by step when explaining your reasoning:
"""
get_completion(prompt)

[" X = cost of land / square foot if the installation is 1000 square feet. Then it's 100x if the land is 1000 square feet. Then 250x if the solar panels are 1000 square feet and 10x if the maintenance is 100,000 square feet. Total cost if the land and panels are 1000 square feet: 100x + 250x + 100,000 + 100x = 450x + 100,000.  Correct."]

## Section 4: Model Limitations: Hallucinations
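- The model can produce fluent but unfounded statements. CodeWhisperer is a real product, but the answer below is not grounded in any source text supplied to the model.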

In [50]:
prompt = f"""
How does codewhisperer generate suggestions?
"""
get_completion(prompt)

['For each line of code, Codewhisperer suggests a better solution or a possible refactoring solution. You can also enter a new line to help Codewhisperer understand the problem.']

## Section 5: RAG with FAISS

Before we proceed to the next steps, let's ensure that we have the necessary libraries installed. We will need the `langchain` library for the following steps. If it's not already installed, we can install it using pip. We first install the `libmagic` system package with apt, because `python-magic` is a wrapper around it.

The `langchain` library is a Python library that provides utilities for working with large language models. It includes utilities for creating prompts, querying endpoints, parsing responses, and more. We will use this library in the following steps to interact with our SageMaker endpoint.

In [51]:
!apt update
!apt-get install libmagic-dev -y

Hit:1 http://deb.debian.org/debian bullseye InRelease
Get:2 http://deb.debian.org/debian bullseye-updates InRelease [44.1 kB]
Get:3 http://security.debian.org/debian-security bullseye-security InRelease [48.4 kB]
Get:4 http://security.debian.org/debian-security bullseye-security/main amd64 Packages [253 kB]
Fetched 345 kB in 0s (1269 kB/s)
Reading package lists... Done
Building dependency tree... Done
Reading state information... Done
32 packages can be upgraded. Run 'apt list --upgradable' to see them.
Reading package lists... Done
Building dependency tree... Done
Reading state information... Done
libmagic-dev is already the newest version (1:5.39-3).
0 upgraded, 0 newly installed, 0 to remove and 32 not upgraded.


In [52]:
!pip install --upgrade python-magic unstructured langchain faiss-cpu pandas --quiet

Now, let's import some necessary modules from the `langchain` library.

- `PromptTemplate`: This class is used to create a template for the prompts that we will pass to the language model. 
- `SagemakerEndpoint`: This class is used to interact with the SageMaker endpoint.
- `LLMContentHandler`: This class is used to handle the content that we send to and receive from the language model.
- `load_qa_chain`: This function is used to load a question-answering chain. A chain is a sequence of transformations applied to the input to generate an answer.
- `Document`: This class is used to create documents that the language model can use to find the answer to a question.
- `EmbeddingsContentHandler`: This class is used to handle the content that we send to and receive from the embedding model.
- `SagemakerEndpointEmbeddings`: This class is used to interact with the SageMaker embeddings endpoint.

In [53]:
from langchain import PromptTemplate, SagemakerEndpoint
from langchain.llms.sagemaker_endpoint import LLMContentHandler
from langchain.chains.question_answering import load_qa_chain, LLMChain
from langchain.docstore.document import Document
from langchain.embeddings.sagemaker_endpoint import EmbeddingsContentHandler
from langchain.embeddings import SagemakerEndpointEmbeddings
import json
from typing import Dict, List

We will now create a content handler for the language model to transform input to a format that the SageMaker endpoint expects and output to a form that the language model class expects. We will also define some parameters for the model.

The `ContentHandler` class is a subclass of the `LLMContentHandler` class. It defines two methods:

- `transform_input`: This method takes a prompt and a dictionary of model parameters as input, and returns the input in a format that the SageMaker endpoint expects. In this case, it converts the input to a JSON string and encodes it to bytes.
- `transform_output`: This method takes the output from the SageMaker endpoint and returns it in a form that the language model class expects. In this case, it decodes the output from bytes to a string, parses the JSON, and returns the first entry of the 'generated_texts' list.

The `parameters` dictionary defines the parameters that we will use when querying the language model. These parameters control the behavior of the language model, such as the maximum length of the generated text, the number of sequences to return, and the sampling strategy.
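
The two transformations can be tried out without `langchain` or an endpoint. The sketch below mirrors the logic described above as plain functions; the `fake_body` reply and the example prompt are assumptions used only to exercise the code.

```python
import io
import json

def transform_input(prompt, model_kwargs):
    # Merge the prompt and the generation parameters into one JSON document, then encode it.
    return json.dumps({"text_inputs": prompt, **model_kwargs}).encode("utf-8")

def transform_output(output):
    # `output` is a file-like body; return the first generated text.
    response_json = json.loads(output.read().decode("utf-8"))
    return response_json["generated_texts"][0]

request_body = transform_input("What is SageMaker?", {"max_length": 50})

# Hypothetical endpoint reply, used only to exercise transform_output.
fake_body = io.BytesIO(json.dumps({"generated_texts": ["A managed ML service."]}).encode("utf-8"))
answer = transform_output(fake_body)
print(request_body, answer)
```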

In [54]:
parameters = {
    "max_length": 5000,
    "num_return_sequences": 1,
    "top_k": 250,
    "top_p": 0.95,
    "do_sample": True,
    "temperature": 0.01,
}

class ContentHandler(LLMContentHandler):
    content_type = "application/json"
    accepts = "application/json"

    def transform_input(self, prompt: str, model_kwargs: Dict) -> bytes:
        input_str = json.dumps({"text_inputs": prompt, **model_kwargs})
        return input_str.encode("utf-8")

    def transform_output(self, output: bytes) -> str:
        response_json = json.loads(output.read().decode("utf-8"))
        return response_json['generated_texts'][0]
    


llm_content_handler = ContentHandler()
sm_llm=SagemakerEndpoint(
            endpoint_name=endpoint_name,
            region_name="eu-central-1",
            model_kwargs=parameters,
            content_handler=llm_content_handler,
        )
creative_llm=SagemakerEndpoint(
            endpoint_name=endpoint_name,
            region_name="eu-central-1",
            model_kwargs={
                "max_length": 5000,
                "num_return_sequences": 1,
                "top_k": 250,
                "top_p": 0.95,
                "do_sample": True,  # sampling must be enabled for temperature to take effect
                "temperature": 2.5
            },
            content_handler=llm_content_handler,
        )

Next, we will define a prompt template and load a chain. 

The prompt template is used to format the input to the language model. It accepts a set of parameters from the user that can be used to generate a prompt for a language model. 

The question answering chain is a sequence of transformations applied to the input to generate an answer.

The `PromptTemplate` class takes a template string and a list of input variables as arguments. The template string is a string that contains placeholders for the input variables. The placeholders are enclosed in curly braces `{}` and correspond to the names of the input variables. When we use the prompt template, we will replace the placeholders with the actual values of the input variables.

The `load_qa_chain` function loads a question-answering chain. Chains allow us to combine multiple components together to create a single, coherent application. For example, we can create a chain that takes user input, formats it with a `PromptTemplate`, and then passes the formatted prompt to an LLM. In this case, the chain includes the language model and the prompt template.
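
Conceptually, a prompt template fills named placeholders, and a chain passes the filled prompt on to a model. The sketch below imitates both ideas with plain Python string formatting; `make_chain` and `echo_llm` are hypothetical stand-ins, not `langchain` APIs, and the template string mirrors the one used in the next cell.

```python
def make_chain(template, llm):
    """Return a callable that fills the template and passes the result to the llm."""
    def run(**variables):
        return llm(template.format(**variables))
    return run

def echo_llm(prompt_text):
    # Hypothetical stand-in for a real model: reports the prompt length.
    return f"received {len(prompt_text)} characters"

qa_template = (
    "Use the following pieces of context to answer the question at the end.\n"
    "{context}\nQuestion: {question}\nAnswer:"
)
filled = qa_template.format(
    context="Paris is the capital of France.",
    question="What is the capital of France?",
)
toy_chain = make_chain(qa_template, echo_llm)
result = toy_chain(
    context="Paris is the capital of France.",
    question="What is the capital of France?",
)
print(filled)
print(result)
```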

In [55]:
prompt=PromptTemplate(
            template="Use the following pieces of context to answer the question at the end.\n{context}\nQuestion: {question}\nAnswer:",
            input_variables=["context", "question"]
        )
chain = load_qa_chain(
        llm=sm_llm,
        prompt=prompt,
    )

Now, let's test our question answering chain with a sample question and some context. The context is a list of documents that the model can use to find the answer to the question.

The `chain` object is callable: it takes a dictionary as input and returns the output of the chain. The input dictionary must contain the 'input_documents' and 'question' keys. The 'input_documents' key corresponds to a list of documents that the model can use to find the answer to the question. The 'question' key corresponds to the question that we want to answer.
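
To see how the two keys end up in a single prompt, here is a simplified sketch. It assumes the chain concatenates the document texts into one context string before filling the template, which is our understanding of the default behaviour; `SimpleDocument`, `stuff_documents`, and `build_qa_prompt` are hypothetical names, not `langchain` APIs.

```python
from dataclasses import dataclass

@dataclass
class SimpleDocument:
    # Hypothetical stand-in for langchain's Document, keeping only the text.
    page_content: str

def stuff_documents(documents, separator="\n\n"):
    """Join the document texts into one context string."""
    return separator.join(doc.page_content for doc in documents)

def build_qa_prompt(inputs):
    context = stuff_documents(inputs["input_documents"])
    return (
        "Use the following pieces of context to answer the question at the end.\n"
        f"{context}\nQuestion: {inputs['question']}\nAnswer:"
    )

qa_prompt = build_qa_prompt({
    "input_documents": [
        SimpleDocument("Managed Spot Training can be used with all instances supported in SageMaker."),
        SimpleDocument("Managed Spot Training uses Amazon EC2 Spot instances."),
    ],
    "question": "Which instances can I use with Managed Spot Training in SageMaker?",
})
print(qa_prompt)
```

With an empty document, as in the next cell, the context part of the prompt is blank and the model has to answer from its own knowledge.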

In [56]:
query = "Which instances can I use with Managed Spot Training in SageMaker?"

input_documents = [Document(page_content="")]

chain({"input_documents": input_documents, "question": query}, return_only_outputs=True)

{'output_text': 'SageMaker'}

Next, we will create a content handler for embeddings to transform input to a format that the SageMaker endpoint expects and output to a form that the embeddings class expects.

The `SagemakerEndpointEmbeddingsJumpStart` class is a subclass of the `SagemakerEndpointEmbeddings` class. It defines the `embed_documents` method, which computes document embeddings using a SageMaker Inference Endpoint. The method takes a list of texts and a chunk size as input, and returns a list of embeddings.

The `ContentHandler` class is a subclass of the `EmbeddingsContentHandler` class. It defines two methods:

- `transform_input`: This method takes the texts to embed and a dictionary of model parameters as input, and returns the input in a format that the SageMaker endpoint expects. In this case, it converts the input to a JSON string and encodes it to bytes.
- `transform_output`: This method takes the output from the SageMaker endpoint and returns it in a form that the embeddings class expects. In this case, it decodes the output from bytes to a string, parses the JSON, and returns the 'embedding' field.
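
The batching performed by `embed_documents` can be illustrated without an endpoint. In the sketch below, `embed_in_batches` and `fake_embed_batch` are hypothetical names; the fake function returns a one-dimensional "embedding" per text so that we can observe how many texts each request contains.

```python
def embed_in_batches(texts, embed_batch, chunk_size=5):
    """Send texts to embed_batch in groups of at most chunk_size and collect the results."""
    results = []
    # Clamp the batch size to the number of texts and guard against an empty input.
    step = max(1, min(chunk_size, len(texts)))
    for start in range(0, len(texts), step):
        results.extend(embed_batch(texts[start:start + step]))
    return results

calls = []

def fake_embed_batch(batch):
    # Hypothetical embedding function: records the batch size, returns toy vectors.
    calls.append(len(batch))
    return [[float(len(text))] for text in batch]

vectors = embed_in_batches([f"text {i}" for i in range(12)], fake_embed_batch, chunk_size=5)
print(calls, len(vectors))
```

Twelve texts with a chunk size of 5 result in three requests, and the output keeps one vector per input text in the original order.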

In [57]:
class SagemakerEndpointEmbeddingsJumpStart(SagemakerEndpointEmbeddings):
    def embed_documents(self, texts: List[str], chunk_size: int = 5) -> List[List[float]]:
        """Compute doc embeddings using a SageMaker Inference Endpoint.

        Args:
            texts: The list of texts to embed.
            chunk_size: The chunk size defines how many input texts will
                be grouped together as request. If None, will use the
                chunk size specified by the class.

        Returns:
            List of embeddings, one for each text.
        """
        results = []
        _chunk_size = len(texts) if chunk_size > len(texts) else chunk_size

        for i in range(0, len(texts), _chunk_size):
            response = self._embedding_func(texts[i : i + _chunk_size])
            results.extend(response)
        return results

class ContentHandler(EmbeddingsContentHandler):
    content_type = "application/json"
    accepts = "application/json"

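    def transform_input(self, prompt: List[str], model_kwargs: Dict = None) -> bytes:
        # `prompt` holds the batch of texts to embed; avoid a shared mutable default argument.
        input_str = json.dumps({"text_inputs": prompt, **(model_kwargs or {})})
        return input_str.encode("utf-8")

    def transform_output(self, output: bytes) -> List[List[float]]:
        # Parse the JSON body and return one embedding per input text.
        response_json = json.loads(output.read().decode("utf-8"))
        embeddings = response_json["embedding"]
        return embeddings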
embeddings_content_handler=ContentHandler()
embeddings = SagemakerEndpointEmbeddingsJumpStart(
    endpoint_name=embedding_endpoint_name,
    region_name="eu-central-1",
    content_handler=embeddings_content_handler,
)

Now we will load the data that we will embed for contextual prompting.

In [58]:
from langchain.document_loaders.url import UnstructuredURLLoader

In [59]:
urls = [
    "https://aws.amazon.com/codewhisperer/faqs/",
    "https://aws.amazon.com/sagemaker/faqs/",
]
headers={"ssl_verify":"False"}
loader = UnstructuredURLLoader(urls=urls,headers=headers)

We will now make sure that the `faiss-cpu` library is installed.

FAISS (Facebook AI Similarity Search) is a library developed by Facebook AI for efficient similarity search and clustering of dense vectors. Given a set of vectors (in this case the embeddings that represent our documents), we can index them using FAISS; then, given another vector (the query vector), we search the index for the most similar vectors.
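
The idea of searching for the most similar vectors can be sketched with an exhaustive scan in plain Python. FAISS uses optimized index structures to do this efficiently at scale; the code below only illustrates the concept, and the toy vectors and function names are assumptions.

```python
import math

def l2_distance(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def nearest_neighbours(index_vectors, query_vector, k=2):
    """Return (position, distance) pairs for the k vectors closest to the query."""
    scored = [(i, l2_distance(v, query_vector)) for i, v in enumerate(index_vectors)]
    return sorted(scored, key=lambda pair: pair[1])[:k]

toy_index = [[0.0, 0.0], [1.0, 0.0], [0.0, 2.0], [3.0, 3.0]]
hits = nearest_neighbours(toy_index, [0.9, 0.1], k=2)
print(hits)
```

The positions returned by the search are what a vector store maps back to the original document chunks.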

In [60]:
!pip install faiss-cpu --quiet

We will now create an index of our documents using the VectorstoreIndexCreator. This index will allow us to perform efficient similarity searches on our documents.

The `VectorstoreIndexCreator` is a utility that helps us create an index of our documents. It splits the loaded documents into chunks with the text splitter, computes an embedding (a dense vector) for each chunk, and stores the embeddings in the vector store so that we can search them efficiently.

In [61]:
from langchain.indexes import VectorstoreIndexCreator
from langchain.vectorstores import Chroma, AtlasDB, FAISS
from langchain.text_splitter import CharacterTextSplitter,RecursiveCharacterTextSplitter
from langchain.document_loaders import TextLoader,CSVLoader

text_splitter = RecursiveCharacterTextSplitter(
    chunk_size = 200,
    chunk_overlap  = 20,
    length_function = len,
    add_start_index = True,
)

index_creator = VectorstoreIndexCreator(
    vectorstore_cls=FAISS,
    embedding=embeddings,
    text_splitter = text_splitter
)
index = index_creator.from_loaders([loader])
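
The `chunk_size` and `chunk_overlap` settings used above can be illustrated with a simplified fixed-width splitter. This is not how `RecursiveCharacterTextSplitter` works internally (it prefers to split at separators such as paragraph breaks); `split_with_overlap` is a hypothetical helper that only shows how consecutive chunks share characters.

```python
def split_with_overlap(text, chunk_size=200, chunk_overlap=20):
    """Cut text into fixed-width chunks whose ends overlap by chunk_overlap characters."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append({"start_index": start, "text": text[start:start + chunk_size]})
        if start + chunk_size >= len(text):
            break  # the last chunk already reaches the end of the text
    return chunks

sample = "".join(str(i % 10) for i in range(450))
chunks = split_with_overlap(sample)
print([chunk["start_index"] for chunk in chunks])
```

The overlap means that a sentence cut at a chunk boundary still appears in full in at least one neighbouring chunk when it is shorter than the overlap.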



Let's test our index by querying it with a sample question.

The `index.query` function takes a question and a language model as input. It first performs a similarity search on the index to retrieve the documents most similar to the question. The retrieved documents are then passed to the LLM, which generates a coherent and contextually relevant answer based on them.

In [62]:
index.query(question=query, llm=sm_llm)

'all instances supported in SageMaker'

We will now replicate the `index.query` functionality step by step to illustrate what happens.

We will create a document search object using the FAISS vector store and our documents. This will allow us to perform similarity searches on our documents. Using it, we retrieve the 3 documents that best match our query.

The `FAISS.from_documents` function creates a FAISS vector store from our documents by embedding each of them. The vector store allows us to perform efficient similarity searches on the documents.

The `docsearch.max_marginal_relevance_search` function performs the search. It takes a query and the number of results to return (`k`) as input. The query is converted into an embedding, which is then compared with the embeddings of the documents in the vector store. Unlike a plain `similarity_search`, maximal marginal relevance also penalizes candidates that are very similar to documents already selected, so the results are relevant to the query but also diverse.

In [63]:
documents = loader.load()
splitdocuments = text_splitter.split_documents(documents)
docsearch = FAISS.from_documents(splitdocuments, embeddings)
docs = docsearch.max_marginal_relevance_search(query, k=3)
docs

[Document(page_content='Q: Which instances can I use with Managed Spot Training?\n\nManaged Spot Training can be used with all instances supported in SageMaker.\n\nQ: Which Regions are supported with Managed Spot Training?', metadata={'source': 'https://aws.amazon.com/sagemaker/faqs/', 'start_index': 46752}),
 Document(page_content='Q: When should I use Managed Spot Training?', metadata={'source': 'https://aws.amazon.com/sagemaker/faqs/', 'start_index': 45014}),
 Document(page_content='Q: Why should I use SageMaker Serverless Inference?', metadata={'source': 'https://aws.amazon.com/sagemaker/faqs/', 'start_index': 55822})]
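
The cell above calls `max_marginal_relevance_search`, which trades relevance against diversity. The following is a minimal sketch of the greedy maximal marginal relevance idea in plain Python. The function names, the `lambda_mult` weight, and the toy vectors are chosen for illustration, and this is not LangChain's implementation.

```python
import math

def cosine(a, b):
    """Cosine similarity between two vectors (0.0 if either has zero length)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def mmr(query_vec, doc_vecs, k=3, lambda_mult=0.5):
    """Greedy maximal marginal relevance; returns indices of the selected documents."""
    selected = []
    candidates = list(range(len(doc_vecs)))
    while candidates and len(selected) < k:
        def score(i):
            relevance = cosine(query_vec, doc_vecs[i])
            # Similarity to the closest document that was already picked
            redundancy = max(
                (cosine(doc_vecs[i], doc_vecs[j]) for j in selected), default=0.0
            )
            return lambda_mult * relevance - (1 - lambda_mult) * redundancy
        best = max(candidates, key=score)
        selected.append(best)
        candidates.remove(best)
    return selected

# Toy vectors (hypothetical): documents 0 and 1 are near-duplicates
query_vec = [1.0, 1.0, 0.0]
doc_vecs = [
    [1.0, 0.02, 0.0],  # 0
    [1.0, 0.05, 0.0],  # 1
    [0.0, 1.0, 0.0],   # 2
]

print(mmr(query_vec, doc_vecs, k=2))                   # → [1, 2]
print(mmr(query_vec, doc_vecs, k=2, lambda_mult=1.0))  # → [1, 0]
```

With `lambda_mult=1.0` the redundancy term vanishes and the ranking reduces to plain similarity, which picks both near-duplicates. With the default weight, the second pick is the document that adds new information. This may be why the second and third results above are less directly related to the question than the first.
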

Finally, we will use our question-answering chain to answer our query using the documents we found.

The `chain` object is called with our query and the retrieved documents, and returns the generated answer.

In [64]:
chain({"input_documents": docs, "question": query}, return_only_outputs=True)

{'output_text': 'all instances supported in SageMaker'}
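
Conceptually, a question-answering chain of this kind places the retrieved passages and the question into a single prompt before calling the model. The sketch below illustrates that step with plain string formatting. The template text is borrowed from the prompt used later in this notebook; the helper name, the separator, and the example question are assumptions made for illustration, and the chain used above may assemble its prompt differently.

```python
PROMPT_TEMPLATE = (
    "Use the following pieces of context to answer the question at the end.\n"
    "{context}\nQuestion: {question}\nAnswer:"
)

def build_prompt(passages, question, separator="\n\n"):
    """Join the retrieved passages into one context string and fill the template."""
    context = separator.join(passages)
    return PROMPT_TEMPLATE.format(context=context, question=question)

# Passages taken from the retrieval output shown above
passages = [
    "Managed Spot Training can be used with all instances supported in SageMaker.",
    "Q: When should I use Managed Spot Training?",
]
# Assumed question, for illustration only
prompt = build_prompt(passages, "Which instances can I use with Managed Spot Training?")
print(prompt)
```
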

We have looked at two flows so far:

a.

![alt text](flow.png)

b.

![alt text](RAGflow.png)

Now let's apply the same RAG flow to a retail product dataset.

In [66]:
retail_data = "s3://mysagebucket-4590283737/RAGFiles/"
!aws s3 cp --recursive $retail_data rag_data

download: s3://mysagebucket-4590283737/RAGFiles/retail_items.csv to rag_data/retail_items.csv


In [67]:
import pandas as pd
df = pd.read_csv('rag_data/retail_items.csv')

# Work on an explicit copy so that the assignment below does not trigger pandas' SettingWithCopyWarning
processed_df = df[['description']].copy()
# Fold the key attributes of each item into a single natural-language description
processed_df['description'] = df.apply(lambda row: f"{row['name']} is a {row['style']} in the {row['category']} category. Description: {row['description']} with a Price of ${row['price']} and Current stock is {row['current_stock']}.", axis=1)

processed_df.head(5)


Unnamed: 0,description
0,Sans Pareil Scarf is a scarf in the apparel ca...
1,Chef Knife is a kitchen in the housewares cate...
2,Gainsboro Jacket is a jacket in the apparel ca...
3,High Definition Speakers is a speaker in the e...
4,Spiffy Sandals is a sandals in the footwear ca...


In [68]:
processed_df[['description']].to_csv("rag_data/processed_retail_data.csv", index=False)

In [69]:
retail_data_loader = CSVLoader(file_path="rag_data/processed_retail_data.csv")
retail_data_documents = retail_data_loader.load()

In [70]:
text_splitter = RecursiveCharacterTextSplitter(
    chunk_size = 500,
    chunk_overlap  = 20,
    length_function = len,
    add_start_index = True,
)
retail_data_index_creator = VectorstoreIndexCreator(
    vectorstore_cls=FAISS,
    embedding=embeddings,
    text_splitter=text_splitter,
)
retail_data_index = retail_data_index_creator.from_loaders([retail_data_loader])

In [71]:
retail_query="What is the price and stock of Sans Pareil scarf?"

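In [122]:
# Baseline without retrieval: `input_documents` is defined in a later cell as a single empty Document,
# so the model receives no context for the question
chain({"input_documents": input_documents, "question": retail_query}, return_only_outputs=True)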

{'output_text': 'not enough information'}

In [73]:
retail_data_index.query(question=retail_query, llm=sm_llm)

'$114.99 and Current stock is 6'

## Fine-tuning

In [132]:
!mkdir -p training_data
!touch training_data/retail_items.jsonl
!touch training_data/train_retail_items.jsonl
!touch training_data/valid_retail_items.jsonl
!touch training_data/temp-task-data.jsonl

In [133]:
import csv
import json
from uuid import uuid4

def csv_to_jsonl(csv_file_path, jsonl_file_path):
    with open(csv_file_path, 'r') as csv_file:
        reader = csv.DictReader(csv_file)
        data = list(reader)
        
    jsonl_data = {
        "version": "v2.0",
        "data": []
    }
    
    for row in data:
        gender_affinity = 'women' if row['gender_affinity']=='F' else 'men' if row['gender_affinity']=='M' else 'no gender bias'
        aliases = row['aliases'] if row['aliases'] else 'no aliases'
        context = f"The {row['name']} is a {row['style']} style item in the {row['category']} category. It's described as: {row['description']}. The item is priced at {row['price']} and is currently in stock with {row['current_stock']} units available. You can view the item at {row['url']} and its image at {row['image']}. The item is preferred by {gender_affinity} and its SKU is {row['sk']}. The item is also known as {aliases}. The item's ID is {row['id']}."

        # One (question, answer) pair per attribute of the item
        qa_pairs = [
            ("What is the name of the item?", row['name']),
            ("What is the style of the item?", row['style']),
            ("What category does the item belong to?", row['category']),
            ("What is the description of the item?", row['description']),
            ("What is the price of the item?", row['price']),
            ("How many units of the item are available?", row['current_stock']),
            ("What is the URL of the item?", row['url']),
            ("What is the URL of the item's image?", row['image']),
            ("Who is the item preferred by?", gender_affinity),
            ("What is the SKU of the item?", row['sk']),
            ("What are the aliases of the item?", aliases),
            ("What is the ID of the item?", row['id']),
        ]

        # Build the SQuAD-style question entries; answer_start is the offset of
        # the first occurrence of the answer text in the context
        qas = [
            {
                "question": question,
                "id": str(uuid4()),
                "answers": [
                    {
                        "text": answer,
                        "answer_start": context.find(answer)
                    }
                ],
                "is_impossible": False
            }
            for question, answer in qa_pairs
        ]

        jsonl_data["data"].append(
            {
                "title": row['name'],
                "paragraphs": [
                    {
                        "qas": qas,
                        "context": context
                    }
                ]
            }
        )
        
    with open(jsonl_file_path, 'w') as jsonl_file:
        jsonl_file.write(json.dumps(jsonl_data) + '\n')

csv_to_jsonl('rag_data/retail_items.csv', 'training_data/retail_items.jsonl')
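
One caveat about `answer_start`: `str.find` returns the offset of the first occurrence of the answer, which is not necessarily the intended one, and it returns -1 when the answer text does not occur in the context at all. The sketch below, using a made-up context string in the same shape as the one built above, shows how a short answer such as a stock count can match inside the price. The helper name is hypothetical.

```python
def find_answer_spans(context, answer):
    """Return every character offset at which answer occurs in context."""
    spans = []
    start = context.find(answer)
    while start != -1:
        spans.append(start)
        start = context.find(answer, start + 1)
    return spans

# Hypothetical context, shaped like the ones generated by csv_to_jsonl
context = "The item is priced at 16.99 and is currently in stock with 6 units available."

first_hit = context.find("6")                 # lands inside the price "16.99"
all_hits = find_answer_spans(context, "6")    # both the price digit and the stock count
missing = context.find("42")                  # -1: the answer text is not in the context

print(first_hit, all_hits, missing)
```

If exact offsets matter for the downstream task, a check like this can flag answers with zero or multiple matches before the file is written.
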


In [134]:
def split_jsonl(jsonl_file_path, train_file_path, valid_file_path, valid_ratio=0.2):
    with open(jsonl_file_path, 'r') as jsonl_file:
        data = json.load(jsonl_file)

    # The first `valid_ratio` share of the items becomes the validation set;
    # the remaining items form the training set, so the two sets do not overlap
    valid_size = int(len(data["data"]) * valid_ratio)

    valid_data = {
        "version": "v2.0",
        "data": data["data"][:valid_size]
    }

    train_data = {
        "version": "v2.0",
        "data": data["data"][valid_size:]
    }

    with open(train_file_path, 'w') as train_file:
        train_file.write(json.dumps(train_data) + '\n')

    with open(valid_file_path, 'w') as valid_file:
        valid_file.write(json.dumps(valid_data) + '\n')



split_jsonl('training_data/retail_items.jsonl', 'training_data/train_retail_items.jsonl', 'training_data/valid_retail_items.jsonl')
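
The split above takes the validation items from the front of the list, in file order. A possible variant, sketched here with the standard library only, shuffles the items with a fixed seed first and then cuts the list into two disjoint parts. The function name and the default seed are arbitrary choices for this illustration.

```python
import random

def shuffled_split(items, valid_ratio=0.2, seed=42):
    """Shuffle a copy of items and cut it into disjoint (train, valid) lists."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)  # fixed seed keeps the split reproducible
    valid_size = int(len(shuffled) * valid_ratio)
    return shuffled[valid_size:], shuffled[:valid_size]

train_items, valid_items = shuffled_split(range(10))
print(len(train_items), len(valid_items))  # → 8 2
```
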


In [135]:
!pip install nest-asyncio --quiet
!pip install ipywidgets --quiet
!pip install --upgrade sagemaker --quiet

In [153]:
import boto3
import sagemaker
# Get current region, role, and default bucket
aws_region = boto3.Session().region_name
aws_role = sagemaker.session.Session().get_caller_identity_arn()
output_bucket = sagemaker.Session().default_bucket()
# This will be useful for printing
newline, bold, unbold = "\n", "\033[1m", "\033[0m"
print(f"{bold}aws_region:{unbold} {aws_region}")
print(f"{bold}aws_role:{unbold} {aws_role}")
print(f"{bold}output_bucket:{unbold} {output_bucket}")

aws_region: eu-central-1
aws_role: arn:aws:iam::881683121006:role/service-role/AmazonSageMaker-ExecutionRole-20230623T100989
output_bucket: sagemaker-eu-central-1-881683121006


In [154]:
from sagemaker.instance_types import retrieve_default
# `dropdown` is created in the model-selection cell (In [155]) below; run that cell first
model_id, model_version = dropdown.value, "*"
# Instance types for training and inference
training_instance_type = 'ml.g5.48xlarge'
inference_instance_type = retrieve_default(
    model_id=model_id, model_version=model_version, scope="inference"
)
print(f"{bold}model_id:{unbold} {model_id}")
print(f"{bold}training_instance_type:{unbold} {training_instance_type}")
print(f"{bold}inference_instance_type:{unbold} {inference_instance_type}")

model_id: huggingface-text2text-flan-t5-xl
training_instance_type: ml.g5.48xlarge
inference_instance_type: ml.g5.2xlarge


In [155]:
import IPython
from ipywidgets import Dropdown
from sagemaker.jumpstart.filters import And
from sagemaker.jumpstart.notebook_utils import list_jumpstart_models
# Default model choice
model_id = "huggingface-text2text-flan-t5-xl"
# Identify FLAN T5 models that support fine-tuning
filter_value = And(
    "task == text2text", "framework == huggingface", "training_supported == true"
)
model_list = [m for m in list_jumpstart_models(filter=filter_value) if "flan-t5" in m]
# Display the model IDs in a dropdown, for user to select
dropdown = Dropdown(
    value=model_id,
    options=model_list,
    description="FLAN T5 models available for fine-tuning:",
    style={"description_width": "initial"},
    layout={"width": "max-content"},
)
display(IPython.display.Markdown("### Select a pre-trained model from the dropdown below"))
display(dropdown)



### Select a pre-trained model from the dropdown below

Dropdown(description='FLAN T5 models available for fine-tuning:', index=3, layout=Layout(width='max-content'),…

In [156]:
from sagemaker import image_uris, model_uris, script_uris
# Training instance will use this image
train_image_uri = image_uris.retrieve(
    region=aws_region,
    framework=None,  # automatically inferred from model_id
    model_id=model_id,
    model_version=model_version,
    image_scope="training",
    instance_type=training_instance_type,
)
# Pre-trained model
train_model_uri = model_uris.retrieve(
    model_id=model_id, model_version=model_version, model_scope="training"
)
# Script to execute on the training instance
train_script_uri = script_uris.retrieve(
    model_id=model_id, model_version=model_version, script_scope="training"
)
print(f"{bold}image uri:{unbold} {train_image_uri}")
print(f"{bold}model uri:{unbold} {train_model_uri}")
print(f"{bold}script uri:{unbold} {train_script_uri}")

INFO:sagemaker.image_uris:image_uri is not presented, retrieving image_uri based on instance_type, framework etc.


image uri: 763104351884.dkr.ecr.eu-central-1.amazonaws.com/huggingface-pytorch-training:1.13.1-transformers4.26.0-gpu-py39-cu117-ubuntu20.04
model uri: s3://jumpstart-cache-prod-eu-central-1/huggingface-training/train-huggingface-text2text-flan-t5-xl.tar.gz
script uri: s3://jumpstart-cache-prod-eu-central-1/source-directory-tarballs/huggingface/transfer_learning/text2text/prepack/v1.1.2/sourcedir.tar.gz


In [140]:
import json

original_data_file = "training_data/retail_items.jsonl"

local_data_file = "training_data/temp-task-data.jsonl"  # any name with .jsonl extension

with open(original_data_file) as f:
    data = json.load(f)

with open(local_data_file, "w") as f:
    for article in data["data"]:
        for paragraph in article["paragraphs"]:
            # iterate over questions for a given paragraph
            for qas in paragraph["qas"]:
                example = {"context": paragraph["context"], "question": qas["question"], "answer": qas["answers"][0]["text"]}
                json.dump(example, f)
                f.write("\n")
from sagemaker.s3 import S3Uploader

template = {
    "prompt": "question: {question} context: {context}",
    "completion": "{answer}",
}
with open("template.json", "w") as f:
    json.dump(template, f)


train_data_location = "s3://sagemaker-studio-881683121006-yxaepgxvcl/training_flan/training/notebookgen"
S3Uploader.upload(local_data_file, train_data_location)
S3Uploader.upload("template.json", train_data_location)
print(f"{bold}training data:{unbold} {train_data_location}")

training data: s3://sagemaker-studio-881683121006-yxaepgxvcl/training_flan/training/notebookgen
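
To see how the template and the JSON Lines records fit together, the sketch below fills the `prompt` and `completion` templates from a single record using plain `str.format`. This is an assumption about how the placeholders are substituted, written for illustration; the training script's actual handling of `template.json` is not shown in this notebook. The record values and the helper name are hypothetical.

```python
import json

template = {
    "prompt": "question: {question} context: {context}",
    "completion": "{answer}",
}

def render_example(json_line, template):
    """Fill the prompt and completion templates from one JSON Lines record."""
    record = json.loads(json_line)
    return template["prompt"].format(**record), template["completion"].format(**record)

# Hypothetical record in the shape written to temp-task-data.jsonl
line = json.dumps({
    "context": "The Sans Pareil Scarf is a scarf style item in the apparel category.",
    "question": "What category does the item belong to?",
    "answer": "apparel",
})

prompt_text, completion_text = render_example(line, template)
print(prompt_text)
print(completion_text)
```
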


In [157]:
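# Import the module under an alias so that the `hyperparameters` dict below does not shadow it
# (otherwise re-running this cell fails)
from sagemaker import hyperparameters as jumpstart_hyperparameters

# Retrieve the default hyper-parameters for fine-tuning the model
hyperparameters = jumpstart_hyperparameters.retrieve_default(model_id=model_id, model_version=model_version)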

# We will override some default hyperparameters with custom values
hyperparameters["epochs"] = "3"
# TODO
# hyperparameters["max_input_length"] = "300"  # data inputs will be truncated at this length
# hyperparameters["max_output_length"] = "40"  # data outputs will be truncated at this length
# hyperparameters["generation_max_length"] = "40"  # max length of generated output
print(hyperparameters)

{'epochs': '3', 'max_steps': '-1', 'seed': '42', 'batch_size': '64', 'learning_rate': '0.0001', 'lr_scheduler_type': 'constant_with_warmup', 'warmup_ratio': '0.0', 'warmup_steps': '0', 'validation_split_ratio': '0.05', 'train_data_split_seed': '0', 'max_train_samples': '-1', 'max_eval_samples': '-1', 'max_input_length': '-1', 'max_output_length': '128', 'pad_to_max_length': 'True', 'gradient_accumulation_steps': '1', 'weight_decay': '0.0', 'adam_beta1': '0.9', 'adam_beta2': '0.999', 'adam_epsilon': '1e-08', 'max_grad_norm': '1.0', 'load_best_model_at_end': 'True', 'early_stopping_patience': '3', 'early_stopping_threshold': '0.0', 'label_smoothing_factor': '0', 'logging_strategy': 'steps', 'logging_first_step': 'False', 'logging_steps': '500', 'logging_nan_inf_filter': 'True', 'save_strategy': 'epoch', 'save_steps': '500', 'save_total_limit': '2', 'dataloader_drop_last': 'False', 'dataloader_num_workers': '0', 'evalaution_strategy': 'epoch', 'eval_steps': '500', 'eval_accumulation_steps

In [158]:
from sagemaker.estimator import Estimator
from sagemaker.utils import name_from_base

output_location="s3://sagemaker-studio-881683121006-yxaepgxvcl/finetuned/"
model_name = "-".join(model_id.split("-")[2:])  # get the most informative part of ID
training_job_name = name_from_base(f"js-demo-{model_name}-{hyperparameters['epochs']}")
print(f"{bold}job name:{unbold} {training_job_name}")

training_metric_definitions = [
    {"Name": "val_loss", "Regex": "'eval_loss': ([0-9\\.]+)"},
    {"Name": "train_loss", "Regex": "'loss': ([0-9\\.]+)"},
    {"Name": "epoch", "Regex": "'epoch': ([0-9\\.]+)"},
]

# Create SageMaker Estimator instance
sm_estimator = Estimator(
    role=aws_role,
    image_uri=train_image_uri,
    model_uri=train_model_uri,
    source_dir=train_script_uri,
    entry_point="transfer_learning.py",
    instance_count=1,
    instance_type=training_instance_type,
    volume_size=300,
    max_run=360000,
    hyperparameters=hyperparameters,
    output_path=output_location,
    metric_definitions=training_metric_definitions,
)

# Launch a SageMaker training job over data located in the given S3 path.
# Training jobs can take hours, so it is recommended to set wait=False
# and monitor job status through the SageMaker console
sm_estimator.fit({"training": train_data_location}, job_name=training_job_name, wait=False)

INFO:sagemaker:Creating training-job with name: js-demo-flan-t5-xl-3-2023-08-07-05-09-06-656


job name: js-demo-flan-t5-xl-3-2023-08-07-05-09-06-656


ResourceLimitExceeded: An error occurred (ResourceLimitExceeded) when calling the CreateTrainingJob operation: The account-level service limit 'ml.g5.48xlarge for training job usage' is 0 Instances, with current utilization of 0 Instances and a request delta of 1 Instances. Please use AWS Service Quotas to request an increase for this quota. If AWS Service Quotas is not available, contact AWS support to request an increase for this quota.

In [145]:
from sagemaker import TrainingJobAnalytics

# This can be called while the job is still running
df = TrainingJobAnalytics(training_job_name=training_job_name).dataframe()
df.head(10)
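
The metrics reported by `TrainingJobAnalytics` come from the `metric_definitions` passed to the estimator: each regular expression is matched against the training log, and the captured group becomes the metric value. The sketch below applies the same three expressions to two sample log lines with Python's `re` module. The log lines are hypothetical examples in the dictionary style the expressions expect, and the helper name is made up.

```python
import re

metric_definitions = [
    {"Name": "val_loss", "Regex": "'eval_loss': ([0-9\\.]+)"},
    {"Name": "train_loss", "Regex": "'loss': ([0-9\\.]+)"},
    {"Name": "epoch", "Regex": "'epoch': ([0-9\\.]+)"},
]

def extract_metrics(log_line, definitions):
    """Return {metric name: value} for every definition whose regex matches the line."""
    found = {}
    for definition in definitions:
        match = re.search(definition["Regex"], log_line)
        if match:
            found[definition["Name"]] = float(match.group(1))
    return found

# Hypothetical log lines
train_line = "{'loss': 1.2345, 'learning_rate': 0.0001, 'epoch': 1.0}"
eval_line = "{'eval_loss': 0.9876, 'eval_runtime': 12.3, 'epoch': 1.0}"

print(extract_metrics(train_line, metric_definitions))
print(extract_metrics(eval_line, metric_definitions))
```

Note that the leading quote in `'loss'` keeps the training-loss expression from also matching `'eval_loss'`.
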



In [147]:
from sagemaker.model import Model
from sagemaker.predictor import Predictor
from sagemaker.utils import name_from_base

from sagemaker import image_uris

# Retrieve the inference docker image URI. This is the base HuggingFace container image
deploy_image_uri = image_uris.retrieve(
    region=aws_region,
    framework=None,  # automatically inferred from model_id
    model_id=model_id,
    model_version=model_version,
    image_scope="inference",
    instance_type=inference_instance_type,
)

fine_tuned_name = name_from_base(f"jumpstart-demo-fine-tuned-{model_id}")
fine_tuned_model_uri = f"{output_location}{training_job_name}/output/model.tar.gz"

# Create the SageMaker model instance of the fine-tuned model
fine_tuned_model = Model(
    image_uri=deploy_image_uri,
    model_data=fine_tuned_model_uri,
    role=aws_role,
    predictor_cls=Predictor,
    name=fine_tuned_name,
)

print(f"{bold}image URI:{unbold}{newline} {deploy_image_uri}")
print(f"{bold}model URI:{unbold}{newline} {fine_tuned_model_uri}")
print("Deploying an endpoint ...")

# Deploy the fine-tuned model.
fine_tuned_predictor = fine_tuned_model.deploy(
    initial_instance_count=1,
    instance_type=inference_instance_type,
    predictor_cls=Predictor,
    endpoint_name=fine_tuned_name,
)
print(f"{newline}Deployed an endpoint {fine_tuned_name}")

INFO:sagemaker:Creating model with name: jumpstart-demo-fine-tuned-huggingface-t-2023-08-07-04-51-18-246


image URI:
 763104351884.dkr.ecr.eu-central-1.amazonaws.com/huggingface-pytorch-inference:1.13.1-transformers4.26.0-gpu-py39-cu117-ubuntu20.04
model URI:
 s3://sagemaker-studio-881683121006-yxaepgxvcl/finetuned/js-demo-flan-t5-xl-3-2023-08-06-22-54-21-407/output/model.tar.gz
Deploying an endpoint ...


INFO:sagemaker:Creating endpoint-config with name jumpstart-demo-fine-tuned-huggingface-t-2023-08-07-04-51-18-246
INFO:sagemaker:Creating endpoint with name jumpstart-demo-fine-tuned-huggingface-t-2023-08-07-04-51-18-246


-----------!
Deployed an endpoint jumpstart-demo-fine-tuned-huggingface-t-2023-08-07-04-51-18-246


In [150]:
fn_llm = SagemakerEndpoint(
    endpoint_name=fine_tuned_name,
    region_name="eu-central-1",
    model_kwargs=parameters,
    content_handler=llm_content_handler,
)
prompt = PromptTemplate(
    template="Use the following pieces of context to answer the question at the end.\n{context}\nQuestion: {question}\nAnswer:",
    input_variables=["context", "question"],
)
fn_chain = load_qa_chain(
    llm=fn_llm,
    prompt=prompt,
)

In [152]:
query = "What is the price and stock of Sans Pareil scarf?"

input_documents = [Document(page_content="")]

fn_chain({"input_documents": input_documents, "question": query}, return_only_outputs=True)

ValueError: Error raised by inference endpoint: An error occurred (ModelError) when calling the InvokeEndpoint operation: Received client error (400) from primary with message "{
  "code": 400,
  "type": "InternalServerException",
  "message": "/opt/ml/model does not appear to have a file named config.json. Checkout \u0027https://huggingface.co//opt/ml/model/None\u0027 for available files."
}
". See https://eu-central-1.console.aws.amazon.com/cloudwatch/home?region=eu-central-1#logEventViewer:group=/aws/sagemaker/Endpoints/jumpstart-demo-fine-tuned-huggingface-t-2023-08-07-04-51-18-246 in account 881683121006 for more information.

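The error above says that the inference container found no `config.json` under `/opt/ml/model`. One possible explanation (an assumption, not confirmed by the logs shown here) is that the archive at the model URI does not contain the fine-tuned model files at its top level. Note that the model URI printed during deployment refers to a different training job name than the one launched above, which failed with `ResourceLimitExceeded`. A quick way to check a locally downloaded `model.tar.gz` is sketched below with the standard `tarfile` module; the helper names are hypothetical, and the in-memory archives only stand in for a real download.

```python
import io
import tarfile

def archive_has_file(tar_path_or_fileobj, wanted="config.json"):
    """Return True if the archive contains `wanted` at its top level."""
    if isinstance(tar_path_or_fileobj, str):
        archive = tarfile.open(tar_path_or_fileobj, mode="r:*")
    else:
        archive = tarfile.open(fileobj=tar_path_or_fileobj, mode="r:*")
    with archive:
        # Some archives store members as "./name"; strip that prefix before comparing
        names = {n[2:] if n.startswith("./") else n for n in archive.getnames()}
    return wanted in names

def make_archive(file_names):
    """Build a small in-memory .tar.gz, for demonstration only."""
    buffer = io.BytesIO()
    with tarfile.open(fileobj=buffer, mode="w:gz") as archive:
        for file_name in file_names:
            payload = b"{}"
            info = tarfile.TarInfo(name=file_name)
            info.size = len(payload)
            archive.addfile(info, io.BytesIO(payload))
    buffer.seek(0)
    return buffer

print(archive_has_file(make_archive(["config.json", "pytorch_model.bin"])))  # → True
print(archive_has_file(make_archive(["code/inference.py"])))                 # → False
```

In the notebook, the same check could be run on a local copy of the artifact, for example `archive_has_file("model.tar.gz")` after downloading it from the model URI.
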
## Cleanup

After you have finished with this notebook, clean up your AWS resources to avoid unwanted charges. This includes deleting the SageMaker endpoints used in this workshop, together with their endpoint configurations and models.
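
A minimal sketch of such a cleanup is shown below. It takes a SageMaker client as an argument and uses the `describe_endpoint`, `describe_endpoint_config`, `delete_endpoint`, `delete_endpoint_config`, and `delete_model` operations of the boto3 SageMaker client. The helper name is made up for this example, and the usage lines are left commented out because they require AWS credentials and delete real resources.

```python
def delete_endpoint_resources(sm_client, endpoint_name):
    """Delete an endpoint, its endpoint configuration, and the models it references."""
    # Look up the configuration and models before anything is deleted
    endpoint = sm_client.describe_endpoint(EndpointName=endpoint_name)
    config_name = endpoint["EndpointConfigName"]
    config = sm_client.describe_endpoint_config(EndpointConfigName=config_name)
    model_names = [variant["ModelName"] for variant in config["ProductionVariants"]]

    sm_client.delete_endpoint(EndpointName=endpoint_name)
    sm_client.delete_endpoint_config(EndpointConfigName=config_name)
    for model_name in model_names:
        sm_client.delete_model(ModelName=model_name)
    return {"endpoint": endpoint_name, "config": config_name, "models": model_names}

# Usage in the notebook (commented out on purpose):
# import boto3
# sm_client = boto3.client("sagemaker", region_name=aws_region)
# delete_endpoint_resources(sm_client, fine_tuned_name)
```

The same function can be called once per endpoint, including the endpoint queried in the first sections of the workshop.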