# <div> <b> UKIR ENT Retail SA Workshop - 09August2023 - Generative AI </b> </div>  

---
<div class="alert alert-block alert-info">
<b>Tip:</b> Use the notebook environment: PyTorch 1.13  Python 3.9 GPU Optimized | ml.g4dn.xlarge
</div>

---

## SetUp

---
<div class="alert alert-block alert-warning">
<b> 
    - Install required core packages used in rest of the sections of the notebook.
    - Set global parameters required for the lab.
</b>
</div>

---

In [2]:
!pip install ipywidgets==7.0.0 --quiet
!pip install --upgrade sagemaker --quiet
!pip install langchain --quiet
!pip install ipyparallel --quiet

[0m
[1m[[0m[34;49mnotice[0m[1;39;49m][0m[39;49m A new release of pip is available: [0m[31;49m23.0[0m[39;49m -> [0m[32;49m23.2.1[0m
[1m[[0m[34;49mnotice[0m[1;39;49m][0m[39;49m To update, run: [0m[32;49mpip install --upgrade pip[0m
[0m
[1m[[0m[34;49mnotice[0m[1;39;49m][0m[39;49m A new release of pip is available: [0m[31;49m23.0[0m[39;49m -> [0m[32;49m23.2.1[0m
[1m[[0m[34;49mnotice[0m[1;39;49m][0m[39;49m To update, run: [0m[32;49mpip install --upgrade pip[0m
[0m
[1m[[0m[34;49mnotice[0m[1;39;49m][0m[39;49m A new release of pip is available: [0m[31;49m23.0[0m[39;49m -> [0m[32;49m23.2.1[0m
[1m[[0m[34;49mnotice[0m[1;39;49m][0m[39;49m To update, run: [0m[32;49mpip install --upgrade pip[0m
[0m
[1m[[0m[34;49mnotice[0m[1;39;49m][0m[39;49m A new release of pip is available: [0m[31;49m23.0[0m[39;49m -> [0m[32;49m23.2.1[0m
[1m[[0m[34;49mnotice[0m[1;39;49m][0m[39;49m To update, run: [0m[32;49mpip instal

In [3]:
import sagemaker, boto3, json
from sagemaker import get_execution_role
from sagemaker import image_uris, model_uris, script_uris, hyperparameters
from sagemaker.model import Model
from sagemaker.predictor import Predictor

aws_role = get_execution_role()
aws_region = boto3.Session().region_name
sm_session = sagemaker.Session()

In [4]:
MODEL_ID = 'huggingface-text2text-flan-t5-xl' # this is the default model for this lab
EMBEDDING_MODEL_ID = 'huggingface-textembedding-gpt-j-6b-fp16' # this is the default model for this lab
MODEL_VERSION = '*'
INF_INSTANCE_TYPE = 'ml.g5.2xlarge'
INF_INSTANCE_COUNT = 1
INF_IMAGE_SCOPE = 'inference'
TRN_INSTANCE_TYPE = 'ml.g5.12xlarge'
TRN_INSTANCE_COUNT = 1
TRN_IMAGE_SCOPE = 'training'
MODEL_DATA_DOWNLOAD_TIMEOUT = 3600  # in seconds
CONTAINER_STARTUP_HEALTH_CHECK_TIMEOUT = 3600
EBS_VOLUME_SIZE = 256  # in GB
CONTENT_TYPE = 'application/json'
MODEL_ENDPOINT_PREFIX = 'uki-ent-ret-sa'

# Section 1
<h1><b>Deploying Models</b></h1>
<h2>
   Deploy the state-of-the-art pre-trained models **[FLAN T5 models](https://huggingface.co/docs/transformers/model_doc/flan-t5)** and query the endpoint to generate response from the base model. Also deploy embedding model to generate embeddings for Retrieval Augmented Generation
    
</h2>

---

### **STEPS**
- [1.a Select a model](#1.a-Select-a-model)
- [1.b Retrieve Artifacts & Deploy an Endpoint](#1.b-Retrieve-Artifacts-&-Deploy-an-Endpoint)
- [1.c Query endpoint and parse response](#1.c-Query-endpoint-and-parse-response)
- [1.d Advanced features: How to use various parameters to control the generated text](#1.d-Advanced-features:-How-to-use-various-advanced-parameters-to-control-the-generated-text)

### **1.a Select a pre-trained model**
***
You can continue with the default model, or can choose a different model from the dropdown generated upon running the next cell. A complete list of SageMaker pre-trained models can also be accessed at [SageMaker pre-trained Models](https://sagemaker.readthedocs.io/en/stable/doc_utils/pretrainedmodels.html#).
***

In [5]:
from ipywidgets import Dropdown
from sagemaker.jumpstart.notebook_utils import list_jumpstart_models

# Retrieves all Text Generation models available by SageMaker Built-In Algorithms.
filter_value = "task == text2text"
text_generation_models = list_jumpstart_models(filter=filter_value)

# display the model-ids in a dropdown to select a model for inference.
model_dropdown = Dropdown(
    options=text_generation_models,
    value=MODEL_ID,
    description="Select a model",
    style={"description_width": "initial"},
    layout={"width": "max-content"},
)

In [6]:
display(model_dropdown)

A Jupyter Widget

<div class="alert alert-block alert-warning">
    The current notebook is only tested against the model : <b> huggingface-text2text-flan-t5-xl </b>
</div>

In [7]:
# model_version="*" fetches the latest version of the model
MODEL_ID, MODEL_VERSION = model_dropdown.value, "*"

Later on for RAG we will need an embeddings model to generate embeddings of our prompts and support data so we select that here:

In [8]:
# Retrieves all Text Generation models available by SageMaker Built-In Algorithms.
filter_value = "task == textembedding"
embedding_models = list_jumpstart_models(filter=filter_value)

# display the model-ids in a dropdown to select a model for inference.
embedding_model_dropdown = Dropdown(
    options=embedding_models,
    value=EMBEDDING_MODEL_ID,
    description="Select a model",
    style={"description_width": "initial"},
    layout={"width": "max-content"},
)

In [9]:
display(embedding_model_dropdown)

A Jupyter Widget

In [10]:
# model_version="*" fetches the latest version of the model
EMBEDDING_MODEL_ID, MODEL_VERSION = embedding_model_dropdown.value, "*"

### **1.b Retrieve Artifacts & Deploy an Endpoint for base models**

In [11]:
# Define a unique endpoint name for the current model deployment. Add current timestamp as the suffix if needed
base_endpoint_name = f'{MODEL_ENDPOINT_PREFIX}-base-{MODEL_ID}'

In [12]:
# Retrieve the inference docker container uri.
deploy_image_uri = image_uris.retrieve(
    region=None,
    framework=None,
    image_scope=INF_IMAGE_SCOPE,
    model_id=MODEL_ID,
    model_version=MODEL_VERSION,
    instance_type=INF_INSTANCE_TYPE
)

In [13]:
# Retrieve the model uri.
model_uri = model_uris.retrieve(
    model_id=MODEL_ID, model_version=MODEL_VERSION, model_scope=INF_IMAGE_SCOPE
)

#### **huggingface-text2text-flan-t5-xl** is already packed with the inference script and model artifacts, so the `source_dir` argument and entryPoint script to the Model are not required.


In [14]:
# Create the SageMaker model instance. Note that we need to pass Predictor class when we deploy model through Model class,
# for being able to run inference through the sagemaker API.

model = Model(
    image_uri=deploy_image_uri,
    model_data=model_uri,
    role=aws_role,
    predictor_cls=Predictor,
    name=base_endpoint_name
)

In [15]:
# Define a unique endpoint name for the current model deployment. Add current timestamp as the suffix if needed
base_embedding_endpoint_name = f'{MODEL_ENDPOINT_PREFIX}-base-{EMBEDDING_MODEL_ID}'

In [16]:
# Retrieve the inference docker container uri.
deploy_embedding_image_uri = image_uris.retrieve(
    region=None,
    framework=None,
    image_scope=INF_IMAGE_SCOPE,
    model_id=EMBEDDING_MODEL_ID,
    model_version=MODEL_VERSION,
    instance_type=INF_INSTANCE_TYPE
)

In [17]:
# Retrieve the model uri.
model_embedding_uri = model_uris.retrieve(
    model_id=EMBEDDING_MODEL_ID, model_version=MODEL_VERSION, model_scope=INF_IMAGE_SCOPE
)

In [18]:
# Create the SageMaker model instance. Note that we need to pass Predictor class when we deploy model through Model class,
# for being able to run inference through the sagemaker API.

embedding_model = Model(
    image_uri=deploy_embedding_image_uri,
    model_data=model_embedding_uri,
    role=aws_role,
    predictor_cls=Predictor,
    name=base_embedding_endpoint_name
)

Now we deploy both models

In [19]:
# deploy the Model
base_model_predictor = model.deploy(
    initial_instance_count=INF_INSTANCE_COUNT,
    instance_type=INF_INSTANCE_TYPE,
    endpoint_name=base_endpoint_name
)

----------!

In [20]:
# deploy the Model
base_embedding_model_predictor = embedding_model.deploy(
    initial_instance_count=INF_INSTANCE_COUNT,
    instance_type=INF_INSTANCE_TYPE,
    endpoint_name=base_embedding_endpoint_name
)

-----------!

### **1.c Query endpoint and parse response**

In [21]:
newline, bold, unbold = "\n", "\033[1m", "\033[0m"


def query_endpoint(encoded_text, endpoint_name):
    client = boto3.client("runtime.sagemaker")
    response = client.invoke_endpoint(
        EndpointName=endpoint_name, ContentType="application/x-text", Body=encoded_text
    )
    return response


def parse_response(query_response):
    model_predictions = json.loads(query_response["Body"].read())
    generated_text = model_predictions["generated_text"]
    return generated_text

def my_query_endpoint(query):
    payload = {
        "text_inputs": query,
        "max_length": 5000,
        "num_return_sequences": 1,
        "top_k": 50,
        "top_p": 0.95,
        "do_sample": False,
        "temperature": 0.2,
    }
    client = boto3.client('runtime.sagemaker')
    response = client.invoke_endpoint(EndpointName=base_endpoint_name, ContentType='application/json', Body=json.dumps(payload).encode('utf-8'))
    return response


def get_completion(query):
    return parse_response_multiple_texts(
        my_query_endpoint(query)
    )

In [22]:
newline, bold, unbold = "\n", "\033[1m", "\033[0m"

text1 = "Translate to German:  My name is SageMaker Jumpstart"
text2 = "A step by step guide to deploy a large language model:"


for text in [text1, text2]:
    query_response = query_endpoint(text.encode("utf-8"), endpoint_name=base_endpoint_name)
    generated_text = parse_response(query_response)
    print(
        f"Inference:{newline}"
        f"input text: {text}{newline}"
        f"generated text: {bold}{generated_text}{unbold}{newline}"
    )

Inference:
input text: Translate to German:  My name is SageMaker Jumpstart
generated text: [1mIch bin SageMaker Jumpstart.[0m

Inference:
input text: A step by step guide to deploy a large language model:
generated text: [1mStep 1: Create a large language model. Step 2: Create a large language model[0m



### **1.d. Advanced features: How to use various advanced parameters to control the generated text**

***
This model also supports many advanced parameters while performing inference. They include:

* **max_length:** Model generates text until the output length (which includes the input context length) reaches `max_length`. If specified, it must be a positive integer.
* **num_return_sequences:** Number of output sequences returned. If specified, it must be a positive integer.
* **num_beams:** Number of beams used in the greedy search. If specified, it must be integer greater than or equal to `num_return_sequences`.
* **no_repeat_ngram_size:** Model ensures that a sequence of words of `no_repeat_ngram_size` is not repeated in the output sequence. If specified, it must be a positive integer greater than 1.
* **temperature:** Controls the randomness in the output. Higher temperature results in output sequence with low-probability words and lower temperature results in output sequence with high-probability words. If `temperature` -> 0, it results in greedy decoding. If specified, it must be a positive float.
* **early_stopping:** If True, text generation is finished when all beam hypotheses reach the end of sentence token. If specified, it must be boolean.
* **do_sample:** If True, sample the next word as per the likelihood. If specified, it must be boolean.
* **top_k:** In each step of text generation, sample from only the `top_k` most likely words. If specified, it must be a positive integer.
* **top_p:** In each step of text generation, sample from the smallest possible set of words with cumulative probability `top_p`. If specified, it must be a float between 0 and 1.
* **seed:** Fix the randomized state for reproducibility. If specified, it must be an integer.

We may specify any subset of the parameters mentioned above while invoking an endpoint. Next, we show an example of how to invoke endpoint with these arguments

***

In [23]:
# Input must be a json
payload = {
    "text_inputs": "Tell me the steps to create an ec2 instance on AWS cloud",
    "max_length": 100,
    "num_return_sequences": 1,
    "top_k": 20,
    "top_p": 0.8,
    "do_sample": True,
    "temperature":0.8
}


def query_endpoint_with_json_payload(encoded_json, endpoint_name):
    client = boto3.client("runtime.sagemaker")
    response = client.invoke_endpoint(
        EndpointName=endpoint_name, ContentType="application/json", Body=encoded_json
    )
    return response


query_response = query_endpoint_with_json_payload(
    json.dumps(payload).encode("utf-8"), endpoint_name=base_endpoint_name
)


def parse_response_multiple_texts(query_response):
    model_predictions = json.loads(query_response["Body"].read())
    generated_text = model_predictions["generated_texts"]
    return generated_text


generated_texts = parse_response_multiple_texts(query_response)
print(generated_texts)

['Create a new EC2 instance on AWS. Select an instance type. Enter the instance size.']


---
#### **Text Summarization**

---

In [24]:
text = "CodeWhisperer is an AI coding companion that generates real-time, single-line or full-function code suggestions in your Integrated Development Environment (IDE) to help you quickly build software. With CodeWhisperer, you can write a comment in natural language that outlines a specific task in English, such as “Upload a file with server-side encryption.” Based on this information, CodeWhisperer recommends one or more code snippets directly in the IDE that can accomplish the task. You can quickly and easily accept the top suggestion (tab key), view more suggestions (arrow keys), or continue writing your own code. You should always review a code suggestion before accepting them, and you may need to edit it to ensure it does exactly what you intended. CodeWhisperer helps accelerate software development by providing code suggestions that reduce total development effort and allow more time for ideation, complex problem solving, and writing differentiated code. In addition to general purpose code suggestions, CodeWhisperer has additional training to provide code suggestions for using AWS APIs. CodeWhisperer can also help you improve application security by helping detect and remediate security vulnerabilities. As you are writing code, CodeWhisperer analyzes the English language comments and surrounding code to infer what code is needed to complete the task at hand. CodeWhisperer suggests one or more code snippets directly in the code editor, accelerating you as you code. The code suggestions provided by CodeWhisperer are based on a large language models (LLMs) trained on billions of lines of code, including Amazon and open-source code. You can quickly and more easily accept the top suggestion (tab key), view more suggestions (arrow keys), or continue writing your own code. Always review a code suggestion before accepting it, and you may need to edit it to ensure that it does exactly what you intended."

In [25]:
prompts = [
    "Briefly summarize this sentence: {text}",
    "Write a short summary for this text: {text}",
    "Generate a short summary this sentence:\n{text}",
    "{text}\n\nWrite a brief summary in a sentence or less",
    "{text}\nSummarize the aforementioned text in a single phrase.",
    "{text}\nCan you generate a short summary of the above paragraph?",
    "Write a sentence based on this summary: {text}",
    "Write a sentence based on '{text}'",
    "Summarize this article:\n\n{text}",
]

num_return_sequences = 1
parameters = {
    "max_length": 50,
    "num_return_sequences": num_return_sequences,
    "top_k": 50,
    "top_p": 0.95,
    "do_sample": True,
}

print(f"{bold}Number of return sequences are set as {num_return_sequences}{unbold}{newline}")
for each_prompt in prompts:
    payload = {"text_inputs": each_prompt.replace("{text}", text), **parameters}
    query_response = query_endpoint_with_json_payload(
        json.dumps(payload).encode("utf-8"), endpoint_name=base_endpoint_name
    )
    generated_texts = parse_response_multiple_texts(query_response)
    print(f"{bold} For prompt: '{each_prompt}'{unbold}{newline}")
    print(f"{bold} The {num_return_sequences} summarized results are{unbold}:{newline}")
    for idx, each_generated_text in enumerate(generated_texts):
        print(f"{bold}Result {idx}{unbold}: {each_generated_text}{newline}")

[1mNumber of return sequences are set as 1[0m

[1m For prompt: 'Briefly summarize this sentence: {text}'[0m

[1m The 1 summarized results are[0m:

[1mResult 0[0m: CodeWhisperer is an AI coding companion that generates real-time, single-line or full-function code suggestions in your Integrated Development Environment (IDE). With CodeWhisperer, you can write a comment

[1m For prompt: 'Write a short summary for this text: {text}'[0m

[1m The 1 summarized results are[0m:

[1mResult 0[0m: Download CodeWhisperer to accelerate software development with code suggestions.

[1m For prompt: 'Generate a short summary this sentence:
{text}'[0m

[1m The 1 summarized results are[0m:

[1mResult 0[0m: CodeWhisperer generates intelligent coding suggestions to help software developers accelerate development. With CodeWhisperer, you can write a comment that outlines a specific task in English. CodeWhisperer then recommends

[1m For prompt: '{text}

Write a brief summary in a sentence

---
#### **Question and Answering**

---

In [26]:
context = """The newest and most innovative Kindle yet lets you take notes on millions of books and documents, write lists and journals, and more. 

For readers who have always wished they could write in their eBooks, Amazon’s new Kindle lets them do just that. The Kindle Scribe is the first Kindle for reading and writing and allows users to supplement their books and documents with notes, lists, and more.

Here’s everything you need to know about the Kindle Scribe, including frequently asked questions.

The Kindle Scribe makes it easy to read and write like you would on paper 

The Kindle Scribe features a 10.2-inch, glare-free screen (the largest of all Kindle devices), crisp 300 ppi resolution, and 35 LED front lights that automatically adjust to your environment. Further personalize your experience with the adjustable warm light, font sizes, line spacing, and more.

It comes with your choice of the Basic Pen or the Premium Pen, which you use to write on the screen like you would on paper. They also attach magnetically to your Kindle and never need to be charged. The Premium Pen includes a dedicated eraser and a customizable shortcut button.

The Kindle Scribe has the most storage options of all Kindle devices: choose from 8 GB, 16 GB, or 32 GB to suit your level of reading and writing.
"""
question = "what are the key features of new Kindle?"

In [27]:
prompts = [
    """Answer based on context:\n\n{context}\n\n{question}""",
    """{context}\n\nAnswer this question based on the article: {question}""",
    """{context}\n\n{question}""",
    """{context}\nAnswer this question: {question}""",
    """Read this article and answer this question {context}\n{question}""",
    """{context}\n\nBased on the above article, answer a question. {question}""",
    """Write an article that answers the following question: {question} {context}""",
]


parameters = {
    "max_length": 50,
    "num_return_sequences": 1,
    "top_k": 50,
    "top_p": 0.95,
    "do_sample": True,
}


for each_prompt in prompts:
    input_text = each_prompt.replace("{context}", context)
    input_text = input_text.replace("{question}", question)
    #print(f"{bold} For prompt{unbold}: '{each_prompt}'{newline}")
    payload = {"text_inputs": input_text, **parameters}
    query_response = query_endpoint_with_json_payload(
        json.dumps(payload).encode("utf-8"), endpoint_name=base_endpoint_name
    )
    generated_texts = parse_response_multiple_texts(query_response)
    print(f"{bold} The reasoning result is{unbold}: '{generated_texts}'{newline}")

[1m The reasoning result is[0m: '['a.']'

[1m The reasoning result is[0m: '['Kindle Scribe is the first Kindle for reading and writing and allows users to supplement their books and documents with notes, lists, and more']'

[1m The reasoning result is[0m: '['it’s got a huge, glare-free screen']'

[1m The reasoning result is[0m: '['lets you take notes']'

[1m The reasoning result is[0m: '['lets you take notes on millions of books and documents, write lists and journals, and more.']'

[1m The reasoning result is[0m: '['The Kindle Scribe has the most storage options of all Kindle devices']'

[1m The reasoning result is[0m: '['The Premium Pen is more powerful than ever, allowing you to take notes on paper-sized pages. You can also connect an external digital pen to your Kindle for on-the-go writing and drawing. How does the Kindle Scri']'



---
#### **Imaginary article generation based on a title**

---

In [28]:
title = "AnyCompany business has a new product category coming up"

In [29]:
prompts = [
    """Title: \"{title}\"\\nGiven the above title of an imaginary article, imagine the article.\\n"""
]


parameters = {
    "max_length": 5000,
    "num_return_sequences": 1,
    "top_k": 50,
    "top_p": 0.95,
    "do_sample": True,
}


for each_prompt in prompts:
    input_text = each_prompt.replace("{title}", title)
    #print(f"{bold} For prompt{unbold}: '{input_text}'{newline}")
    payload = {"text_inputs": input_text, **parameters}
    query_response = query_endpoint_with_json_payload(
        json.dumps(payload).encode("utf-8"), endpoint_name=base_endpoint_name
    )
    generated_texts = parse_response_multiple_texts(query_response)
    print(f"{bold} The reasoning result is{unbold}: '{generated_texts}'{newline}")

[1m The reasoning result is[0m: '['nn"I love you all and can\'t wait to work with you"n']'



# Section 2
---
<h1><b> Retrieval Augmented Generation(RAG) </b></h1>

<h2>
   Implementing RAG with a local vector store index using langchain and FAISS.
    <br/>
    
</h2>

---

### **STEPS**
- [2.a Create text2textgeneration(T2T) llm contenthandler](#2.a-Create-text2textgeneration-contenthandler)
- [2.b Create embedding llm contenthandler](#2.b-Create-embediing-contenthandler)
- [2.c Generating embeddings and populating index](#2.c-Generating-embediing-and-populating-index)
- [2.d Querying t2t llm using RAG ](#2.D-Querying-t2t-llm-using-RAG)

Before we proceed to the next steps, let's ensure that we have the necessary libraries installed. We will need the `langchain` library for the following steps. If it's not already installed, we can install it using pip.

The `langchain` library is a Python library that provides utilities for working with large language models. It includes utilities for creating prompts, querying endpoints, parsing responses, and more. We will use this library in the following steps to interact with our SageMaker endpoint.

In [30]:
!apt update
!apt-get install libmagic-dev -y

Get:1 http://archive.ubuntu.com/ubuntu focal InRelease [265 kB]
Get:2 http://security.ubuntu.com/ubuntu focal-security InRelease [114 kB]
Get:3 http://archive.ubuntu.com/ubuntu focal-updates InRelease [114 kB][33m
Get:4 http://archive.ubuntu.com/ubuntu focal-backports InRelease [108 kB]
Get:5 http://security.ubuntu.com/ubuntu focal-security/restricted amd64 Packages [2619 kB]
Get:6 http://security.ubuntu.com/ubuntu focal-security/main amd64 Packages [2925 kB]
Get:7 http://archive.ubuntu.com/ubuntu focal/main amd64 Packages [1275 kB]     [0m
Get:8 http://security.ubuntu.com/ubuntu focal-security/multiverse amd64 Packages [29.3 kB]
Get:9 http://security.ubuntu.com/ubuntu focal-security/universe amd64 Packages [1091 kB]
Get:10 http://archive.ubuntu.com/ubuntu focal/multiverse amd64 Packages [177 kB][0m[33m
Get:11 http://archive.ubuntu.com/ubuntu focal/universe amd64 Packages [11.3 MB]
Get:12 http://archive.ubuntu.com/ubuntu focal/restricted amd64 Packages [33.4 kB]
Get:13 http://archiv

In [31]:
!pip install --upgrade python-magic unstructured faiss-cpu pandas --quiet

[0m
[1m[[0m[34;49mnotice[0m[1;39;49m][0m[39;49m A new release of pip is available: [0m[31;49m23.0[0m[39;49m -> [0m[32;49m23.2.1[0m
[1m[[0m[34;49mnotice[0m[1;39;49m][0m[39;49m To update, run: [0m[32;49mpip install --upgrade pip[0m


Now, let's import some necessary modules from the `langchain` library.

- `PromptTemplate`: This class is used to create a template for the prompts that we will pass to the language model. 
- `SagemakerEndpoint`: This class is used to interact with the SageMaker endpoint.
- `LLMContentHandler`: This class is used to handle the content that we send to and receive from the language model.
- `load_qa_chain`: This function is used to load a question-answering chain. A chain is a sequence of transformations applied to the input to generate an answer.
- `Document`: This class is used to create documents that the language model can use to find the answer to a question.
- `EmbeddingsContentHandler`: This class is used to handle the content that we send to and receive from the embedding model.
- `SagemakerEndpointEmbeddings`: This class is used to interact with the SageMaker embeddings enpoint.

In [32]:
from langchain import PromptTemplate, SagemakerEndpoint
from langchain.llms.sagemaker_endpoint import LLMContentHandler
from langchain.chains.question_answering import load_qa_chain, LLMChain
from langchain.docstore.document import Document
from langchain.embeddings.sagemaker_endpoint import EmbeddingsContentHandler
from langchain.embeddings import SagemakerEndpointEmbeddings
import json
from typing import Dict, List

### **2.a Creating text2textgeneration llm contenthandler**

We will now create a content handler for the language model to transform input to a format that the SageMaker endpoint expects and output to a form that the language model class expects. We will also define some parameters for the model.

The `ContentHandler` class is a subclass of the `LLMContentHandler` class. It defines two methods:

- `transform_input`: This method takes a prompt and a dictionary of model parameters as input, and returns the input in a format that the SageMaker endpoint expects. In this case, it converts the input to a JSON string and encodes it to bytes.
- `transform_output`: This method takes the output from the SageMaker endpoint and returns it in a form that the language model class expects. In this case, it decodes the output from bytes to a string, parses the JSON, and returns the 'generated_texts' field.

The `parameters` dictionary defines the parameters that we will use when querying the language model. These parameters control the behavior of the language model, such as the maximum length of the generated text, the number of sequences to return, and the sampling strategy.

In [33]:
parameters = {
    "max_length": 5000,
    "num_return_sequences": 1,
    "top_k": 250,
    "top_p": 0.95,
    "do_sample": True,
    "temperature": 0.01,
}

class ContentHandler(LLMContentHandler):
    content_type = "application/json"
    accepts = "application/json"

    def transform_input(self, prompt: str, model_kwargs: Dict) -> bytes:
        input_str = json.dumps({"text_inputs": prompt, **model_kwargs})
        return input_str.encode("utf-8")

    def transform_output(self, output: bytes) -> str:
        response_json = json.loads(output.read().decode("utf-8"))
        return response_json['generated_texts'][0]
    


llm_content_handler = ContentHandler()
sm_llm=SagemakerEndpoint(
            endpoint_name=base_endpoint_name,
            region_name="eu-central-1",
            model_kwargs=parameters,
            content_handler=llm_content_handler,
        )
creative_llm=SagemakerEndpoint(
            endpoint_name=base_endpoint_name,
            region_name="eu-central-1",
            model_kwargs={
                "max_length": 5000,
                "num_return_sequences": 1,
                "top_k": 250,
                "top_p": 0.95,
                "do_sample": False,
                "temperature": 2.5
            },
            content_handler=llm_content_handler,
        )

Next, we will define a prompt template and load a chain. 

The prompt template is used to format the input to the language model. It accepts a set of parameters from the user that can be used to generate a prompt for a language model. 

The question answering chain is a sequence of transformations applied to the input to generate an answer.

The `PromptTemplate` class takes a template string and a list of input variables as arguments. The template string is a string that contains placeholders for the input variables. The placeholders are enclosed in curly braces `{}` and correspond to the names of the input variables. When we use the prompt template, we will replace the placeholders with the actual values of the input variables.

The `chain` function loads a chain. A chain is a sequence of transformations applied to the input to generate an answer. Chains allow us to combine multiple components together to create a single, coherent application. For example, we can create a chain that takes user input, formats it with a PromptTemplate, and then passes the formatted response to an LLM. In this case, the chain includes the language model and the prompt template.

In [34]:
prompt=PromptTemplate(
            template="Use the following pieces of context to answer the question at the end.\n{context}\nQuestion: {question}\nAnswer:",
            input_variables=["context", "question"]
        )
chain = load_qa_chain(
        llm=sm_llm,
        prompt=prompt,
    )

Now, let's test our question answering chain with a sample question and some context. The context is a list of documents that the model can use to find the answer to the question.

The `chain` function takes a dictionary as input and returns the output of the chain. The input dictionary must contain the 'input_documents' and 'question' keys. The 'input_documents' key corresponds to a list of documents that the model can use to find the answer to the question. The 'question' key corresponds to the question that we want to answer.

In [35]:
query = "Which instances can I use with Managed Spot Training in SageMaker?"

input_documents = [Document(page_content="")]

chain({"input_documents": input_documents, "question": query}, return_only_outputs=True)

{'output_text': 'SageMaker'}

#### The correct answer should show that all instances in Sagemaker can be used with Managed Spot Training

### **2.b Creating text2textgeneration llm contenthandler**

Next, we will create a content handler for embeddings to transform a format that the SageMaker endpoint expects and output to a form that the embeddings class expects.

The `SagemakerEndpointEmbeddingsJumpStart` class is a subclass of the `SagemakerEndpointEmbeddings` class. It defines the `embed_documents` method, which computes document embeddings using a SageMaker Inference Endpoint. The method takes a list of texts and a chunk size as input, and returns a list of embeddings.

The `ContentHandler` class is a subclass of the `EmbeddingsContentHandler` class. It defines two methods:

- `transform_input`: This method takes a prompt and a dictionary of model parameters as input, and returns the input in a format that the SageMaker endpoint expects. In this case, it converts the input to a JSON string and encodes it to bytes.
- `transform_output`: This method takes the output from the SageMaker endpoint and returns it in a form that the embeddings class expects. In this case, it decodes the output from bytes to a string, parses the JSON, and returns the 'embedding' field.m

In [36]:
class SagemakerEndpointEmbeddingsJumpStart(SagemakerEndpointEmbeddings):
    def embed_documents(self, texts: List[str], chunk_size: int = 5) -> List[List[float]]:
        """Compute doc embeddings using a SageMaker Inference Endpoint.

        Args:
            texts: The list of texts to embed.
            chunk_size: The chunk size defines how many input texts will
                be grouped together as request. If None, will use the
                chunk size specified by the class.

        Returns:
            List of embeddings, one for each text.
        """
        results = []
        _chunk_size = len(texts) if chunk_size > len(texts) else chunk_size

        for i in range(0, len(texts), _chunk_size):
            response = self._embedding_func(texts[i : i + _chunk_size])
            print
            results.extend(response)
        return results

class ContentHandler(EmbeddingsContentHandler):
    content_type = "application/json"
    accepts = "application/json"

    def transform_input(self, prompt: str, model_kwargs={}) -> bytes:
        input_str = json.dumps({"text_inputs": prompt, **model_kwargs})
        return input_str.encode("utf-8")

    def transform_output(self, output: bytes) -> str:
        response_json = json.loads(output.read().decode("utf-8"))
        embeddings = response_json["embedding"]
        return embeddings
embeddings_content_handler=ContentHandler()
embeddings = SagemakerEndpointEmbeddingsJumpStart(
    endpoint_name=base_embedding_endpoint_name,
    region_name="eu-central-1",
    content_handler=embeddings_content_handler,
)

### **2.c Generating embeddings and populating index**

Now we will load the data we will embed for contextual prompting

In [37]:
from langchain.document_loaders.url import UnstructuredURLLoader

In [38]:
urls = [
    "https://aws.amazon.com/codewhisperer/faqs/",
    "https://aws.amazon.com/sagemaker/faqs/",
]
headers={"ssl_verify":"False"}
loader = UnstructuredURLLoader(urls=urls,headers=headers)

We will now install the `faiss-cpu` library

`faiss-cpu` provides efficient similarity search and clustering of dense vectors.

FAISS (Facebook AI Similarity Search) is a library developed by Facebook AI that allows for efficient similarity search and clustering of dense vectors. So, given a set of vectors(in this case a vector representation of a document i.e. an embedding), we can index them using Faiss — then using another vector (the query vector), we search for the most similar vectors within the index.

In [39]:
!pip install faiss-cpu --quiet

[0m
[1m[[0m[34;49mnotice[0m[1;39;49m][0m[39;49m A new release of pip is available: [0m[31;49m23.0[0m[39;49m -> [0m[32;49m23.2.1[0m
[1m[[0m[34;49mnotice[0m[1;39;49m][0m[39;49m To update, run: [0m[32;49mpip install --upgrade pip[0m


We will now create an index of our documents using the VectorstoreIndexCreator. This index will allow us to perform efficient similarity searches on our documents.

The VectorstoreIndexCreator is a utility that helps us create an index of our documents. It uses the embeddings of the documents to create the index. The embeddings are dense vectors that represent the documents. The index allows us to perform efficient similarity searches on the documents.

In [40]:
from langchain.indexes import VectorstoreIndexCreator
from langchain.vectorstores import Chroma, AtlasDB, FAISS
from langchain.text_splitter import CharacterTextSplitter,RecursiveCharacterTextSplitter
from langchain.document_loaders import TextLoader,CSVLoader

text_splitter = RecursiveCharacterTextSplitter(
    chunk_size = 200,
    chunk_overlap  = 20,
    length_function = len,
    add_start_index = True,
)

index_creator = VectorstoreIndexCreator(
    vectorstore_cls=FAISS,
    embedding=embeddings,
    text_splitter = text_splitter
)
index = index_creator.from_loaders([loader])

### **2.d Querying t2t llm using RAG**

Let's test our index by querying it with a sample question.

The `index.query` function is used to perform a similarity search on the index. It takes a question and a language model as input, and returns the most similar documents in the index. After the relevant documents are retrieved, the LLM can be used to generate a coherent and contextually relevant answer based on the retrieved documents.

In [41]:
index.query(question=query, llm=sm_llm)

'all instances supported in SageMaker'

we will create a document search object using the FAISS vector store and our documents. This will allow us to perform similarity searches on our documents. Using this we retrieve the top 3 most similar docs to our query.

The `FAISS.from_documents` function is used to create a FAISS vector store from our documents. The embeddings of the documents are used to create the vector store. The vector store allows us to perform efficient similarity searches on the documents.

The `docsearch.similarity_search` function is used to perform a similarity search on the documents. It takes a query and a number of results to return as input, and returns the most similar documents in the vector store. The query is converted into an embedding and this embedding is then compared with the embeddings of the documents in the vector store.

In [42]:
documents = loader.load()
splitdocuments = text_splitter.split_documents(documents)
docsearch = FAISS.from_documents(splitdocuments, embeddings)
docs = docsearch.max_marginal_relevance_search(query, k=3)
docs

[Document(page_content='Q: Which instances can I use with Managed Spot Training?\n\nManaged Spot Training can be used with all instances supported in SageMaker.\n\nQ: Which Regions are supported with Managed Spot Training?', metadata={'source': 'https://aws.amazon.com/sagemaker/faqs/', 'start_index': 46752}),
 Document(page_content='Q: When should I use Managed Spot Training?', metadata={'source': 'https://aws.amazon.com/sagemaker/faqs/', 'start_index': 45014}),
 Document(page_content='Q: Why should I use SageMaker Serverless Inference?', metadata={'source': 'https://aws.amazon.com/sagemaker/faqs/', 'start_index': 55822})]

Finally, we will use our question-answering chain to answer our query using the documents we found.

The `chain` function is used to apply our question-answering chain to our query and documents.

In [43]:
chain({"input_documents": docs, "question": query}, return_only_outputs=True)

{'output_text': 'all instances supported in SageMaker'}

Now let's try this with some data from our retail demo store. To do this upload the retail_items.csv file in the data provided to an s3 bucket for retrieval by this notebook and replace 'retail_data_s3_path' with your S3 path.

In [44]:
retail_data_s3_path = "s3://mysagebucket-4590283737/RAGFiles/"
!mkdir rag_data
!aws s3 cp --recursive $retail_data_s3_path rag_data

download: s3://mysagebucket-4590283737/RAGFiles/retail_items.csv to rag_data/retail_items.csv


Now let's preprocess the data in the csv to a format for suitable for documents(this is not prescriptive, just came to this using experimentation)

In [45]:
import pandas as pd
df = pd.read_csv('rag_data/retail_items.csv')

processed_df=df[['description']]
processed_df['description'] = df.apply(lambda row: f"{row['name']} is a {row['style']} in the {row['category']} category. Description: {row['description']} with a Price of ${row['price']} and Current stock is {row['current_stock']}.", axis=1)

processed_df.head(5)

A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  processed_df['description'] = df.apply(lambda row: f"{row['name']} is a {row['style']} in the {row['category']} category. Description: {row['description']} with a Price of ${row['price']} and Current stock is {row['current_stock']}.", axis=1)


Unnamed: 0,description
0,Sans Pareil Scarf is a scarf in the apparel ca...
1,Chef Knife is a kitchen in the housewares cate...
2,Gainsboro Jacket is a jacket in the apparel ca...
3,High Definition Speakers is a speaker in the e...
4,Spiffy Sandals is a sandals in the footwear ca...


In [46]:
processed_df[['description']].to_csv("rag_data/processed_retail_data.csv", index=False)

In [47]:
retail_data_loader = CSVLoader(file_path="rag_data/processed_retail_data.csv")
retail_data_documents = retail_data_loader.load() #we now load the data into documents

In [48]:
text_splitter = RecursiveCharacterTextSplitter( #we create a text splitter to split the documents into chunks
    chunk_size = 500,
    chunk_overlap  = 20,
    length_function = len,
    add_start_index = True,
)
retail_data_index_creator = VectorstoreIndexCreator( #we create an index creator to create an index of the embeddings of the documents
    vectorstore_cls=FAISS,
    embedding=embeddings,
    text_splitter=text_splitter,
)
retail_data_index = retail_data_index_creator.from_loaders([retail_data_loader])

Now let's query our base model without RAG

In [49]:
retail_query="What is the price and stock of Sans Pareil scarf?"

In [50]:
chain({"input_documents": input_documents, "question": retail_query}, return_only_outputs=True)

{'output_text': 'not enough information'}

And querying using RAG:

In [51]:
retail_data_index.query(question=retail_query, llm=sm_llm)

'$114.99 and Current stock is 6'

# Section 3
---
<h1><b> Prompt Engineering </b></h1>

<h2>
   Explore Prompt Engineering Techniques
    
</h2>

---

### **Prompting Principles**

- [3.a Write clear and specific instructions.](#3.a-Write-clear-and-specific-instructions)
- [3.b Give the model time to “think”.](#3.b-Give-the-model-time-to-"think")

### **3.a Tactics for 'Write clear and specific instructions'.**

#### Tactic 1: Use delimiters to clearly indicate distinct parts of the input

In [52]:
text = "CodeWhisperer is an AI-powered coding companion designed to assist developers in \
real-time within their Integrated Development Environment (IDE). \
It provides single-line or full-function code suggestions based on natural language comments, \
such as specific tasks or instructions. The suggestions are generated from large language models \
trained on billions of lines of code, including Amazon and open-source code. \
Developers can quickly accept, review, or continue writing their code, \
with the ability to edit suggestions to ensure accuracy. CodeWhisperer also offers \
specialized training for AWS APIs and helps in improving application security by detecting vulnerabilities. \
It supports multiple programming languages and can be used across various IDEs. \
The service also emphasizes responsible AI use, including bias filtering and \
tracking of suggestions that might resemble open-source training data."

delimiter_prompt = f"Summarize the text delimited by triple backticks into only a single sentence:\n```{text}```"

get_completion(delimiter_prompt)

['CodeWhisperer is an AI-powered coding companion designed to assist developers in real-time within their Integrated Development Environment (IDE).']

#### Tactic 2: Ask for a structured output

In [53]:
query = f"Summarise the key features of CodeWhisperer outlined in the text delimited by triple backticks\
into a single sentence. And output them in JSON form with key being a name of the feature and value \
being the description.For example,'feature1':'description1'\
```{text}```"

get_completion(query)

['CodeWhisperer is an AI-powered coding companion designed to assist developers in real-time within their Integrated Development Environment (IDE). It provides single-line or full-function code suggestions based on natural language comments, such as specific tasks or instructions. The suggestions are generated from large language models trained on billions of lines of code, including Amazon and open-source code. Developers can quickly accept, review, or continue writing their code, with the ability to edit suggestions to ensure accuracy. CodeWhisperer also offers specialized training for AWS APIs and helps in improving application security by detecting vulnerabilities. It supports multiple programming languages and can be used across various IDEs. The service also emphasizes responsible AI use, including bias filtering and tracking of suggestions that might resemble open-source training data.']

#### Tactic 3: Ask the model to check whether conditions are satisfied

In [54]:
text_1 = "Before you use CodeWhisperer for the first time, you do the following: \
Choose your IDE. Install or update your IDE (if applicable). \
Install or update the AWS Toolkit (if applicable). Choose your authentication method.\
Set up your Builder ID, IAM Identity Center, or IAM credentials."

prompt = f"""
You will be provided with text delimited by triple quotes. 
If it contains a sequence of instructions, \ 
re-write those instructions in the following format

Step 1 - ...
Step 2 - …
…
Step N - …

If the text does not contain a sequence of instructions, \ 
then simply write \"No steps provided.\"

\"\"\"{text_1}\"\"\"
"""

print(get_completion(prompt))

['Step 1 - Choose your IDE Step 2 - Install or update your IDE (if applicable) Step 3 - Install or update the AWS Toolkit (if applicable) Step 4 - Choose your authentication method Step 5 - Set up your Builder ID, IAM Identity Center, or IAM credentials']


In [55]:
text = "CodeWhisperer is an AI-powered coding companion designed to assist developers in \
real-time within their Integrated Development Environment (IDE). \
It provides single-line or full-function code suggestions based on natural language comments, \
such as specific tasks or instructions. The suggestions are generated from large language models \
trained on billions of lines of code, including Amazon and open-source code. \
Developers can quickly accept, review, or continue writing their code, \
with the ability to edit suggestions to ensure accuracy. CodeWhisperer also offers \
specialized training for AWS APIs and helps in improving application security by detecting vulnerabilities. \
It supports multiple programming languages and can be used across various IDEs. \
The service also emphasizes responsible AI use, including bias filtering and \
tracking of suggestions that might resemble open-source training data."

prompt = f"""
You will be provided with text delimited by triple quotes. 
If it contains a sequence of instructions, \ 
re-write those instructions in the following format:

Step 1 - ...
Step 2 - …
…
Step N - …

If the text does not contain a sequence of instructions, \ 
then simply write \"No steps provided.\"

\"\"\"{text}\"\"\"
"""

get_completion(prompt)

['No steps provided']

#### Tactic 4: "Few-shot" prompting

In [56]:
prompt = f"""
Your task is to answer in a consistent style.

<child>: Teach me about patience.

<grandparent>: The river that carves the deepest \ 
valley flows from a modest spring; the \ 
grandest symphony originates from a single note; \ 
the most intricate tapestry begins with a solitary thread.

<child>: Teach me about resilience.

<grandparent>: The mightiest oak stands tall not because \
it never faced a storm, but because it \
bent and swayed without breaking; the \
most enduring diamond was once mere coal, \
pressured yet unyielding; the brightest stars \
shine after the darkest nights.

<child>: Teach me about humility.
"""
get_completion(prompt)

['grandparent>: The humblest of men is the humblest of all; the proudest of men is the proudest of all; the most mighty of men is the most humble of all; the most mighty of men is the most humble of all.']

### **3.b Tactics for 'Give the model time to “think”.'**

#### Tactic 1: Specify the steps required to complete a task

In [57]:
text = "CodeWhisperer is an AI-powered coding companion designed to assist developers in \
real-time within their Integrated Development Environment (IDE). \
It provides single-line or full-function code suggestions based on natural language comments, \
such as specific tasks or instructions. The suggestions are generated from large language models \
trained on billions of lines of code, including Amazon and open-source code. \
Developers can quickly accept, review, or continue writing their code, \
with the ability to edit suggestions to ensure accuracy. CodeWhisperer also offers \
specialized training for AWS APIs and helps in improving application security by detecting vulnerabilities. \
It supports multiple programming languages and can be used across various IDEs. \
The service also emphasizes responsible AI use, including bias filtering and \
tracking of suggestions that might resemble open-source training data."
# example 1
prompt = f"""
Perform the following actions: 
1 - Summarize the following text delimited by triple \
backticks with 1 sentence.
2 - Translate the summary into French.

Separate your answers with line breaks.

Text:
```{text}```
"""
print("\nCompletion for prompt 1:")
get_completion(prompt)


Completion for prompt 1:


['CodeWhisperer est un aide à la coding à la base d’AI conçu pour aider les développeurs en temps réel dans leur environnement intégré de développement (IDE). Il fournit des suggestions de coding à la base de lignes ou de fonction complète à partir de commentaires de langue naturale, tels que des tâches ou des instructions. Les suggestions sont générées par des grandes modèles de langues éducés sur des milliards de lignes de coding, y compris de coding Amazon et de coding libre. Les développeurs peuvent rapidement accepter, revoir ou poursuivre l’écriture de leur coding, avec la possibilité d’editer les suggestions pour assurer l’exactitude. CodeWhisperer offre également une formation spécialisée pour les APIs de AWS et aide à améliorer la sécurité des applications en détectant les vulnérabilités. Il appuie plusieurs langues de programmation et peut être utilisé dans différentes IDEs. Le service souligne également l’utilisation responsable de l’AI, y compris la filtration des écarts et

#### Tactic 2: Instruct the model to work out its own solution before rushing to a conclusion

In [58]:
prompt = """
Determine if the student's solution is correct or not.

Question:
I'm building a solar power installation and I need \
 help working out the financials. 
- Land costs $100 / square foot
- I can buy solar panels for $250 / square foot
- I negotiated a contract for maintenance that will cost \ 
me a flat $100k per year, and an additional $10 / square \
foot
What is the total cost for the first year of operations 
as a function of the number of square feet.

Student's Solution:
Let x be the size of the installation in square feet.
Costs:
1. Land cost: 100x
2. Solar panel cost: 250x
3. Maintenance cost: 100,000 + 100x
Total cost: 100x + 250x + 100,000 + 100x = 450x + 100,000
"""

get_completion(prompt)

['No']

#### Note that the student's solution is actually not correct.
#### We can fix this by instructing the model to work out its own solution first.

In [60]:
prompt = f"""
Your task is to determine if the student's solution \
is correct or not.
To solve the problem do the following,
- First, work out your own solution to the problem. 
- Then compare your solution to the student's solution \ 
and evaluate if the student's solution is correct or not. 
Don't decide if the student's solution is correct until 
you have done the problem yourself.

Use the following format to outline your answer, break your process into steps, and explain each step
Question
```
question here
```
Student's solution
```
student's solution here
```
Actual solution
```
steps to work out the solution and your solution here
```
Is the student's solution the same as actual solution \
just calculated?
```
yes or no
```
Student grade
```
correct or incorrect
```

Question:
```
I'm building a solar power installation and I need help \
working out the financials. 
- Land costs $100 / square foot
- I can buy solar panels for $250 / square foot
- I negotiated a contract for maintenance that will cost \
me a flat $100k per year, and an additional $10 / square \
foot
What is the total cost for the first year of operations \
as a function of the number of square feet.
``` 
Student's solution
```
Let x be the size of the installation in square feet.
Costs:
1. Land cost: 100x
2. Solar panel cost: 250x
3. Maintenance cost: 100,000 + 100x
Total cost: 100x + 250x + 100,000 + 100x = 450x + 100,000
```
Actual solution. Analyse it step by step when explaining your reasoning why the student's solution is incorrect:
"""
get_completion(prompt)

["No, the student's solution is incorrect. The solar panels cost 250x / square foot, not 250x / square foot. The maintenance cost is 100x + 100x + 100x = 450x + 100,000. The total cost is 450x + 100,000 + 100x = 450x + 450x + 100,000."]

# Clean Up

<div class="alert alert-block alert-warning">
<b> IMPORTANT: Clean Up the resources to aviod charges for the resources in use. </b>
</div>

---
- Delete the endpoints for all the deployed models
- Delete the base model image. You can choose to not delete the trained/fine-tuned models from S3 so that you can redeploy them in future. Be aware of the storage charges involved
- Shutdown the kernel of this notebook and any active kernels on the other notebooks *(Check the running terminals and kernels icon in the left navigation of SageMaker studio)*

---

In [61]:
# Base Model - Delete the SageMaker endpoint and the model stored on S3
base_model_predictor.delete_model()
base_model_predictor.delete_endpoint()

In [62]:
base_embedding_model_predictor.delete_model()
base_embedding_model_predictor.delete_endpoint()

## Release the Notebook Resources

In [63]:
%%html

<p><b>Shutting down your kernel for this notebook to release resources.</b></p>
<button class="sm-command-button" data-commandlinker-command="kernelmenu:shutdown" style="display:none;">Shutdown Kernel</button>
        
<script>
try {
    els = document.getElementsByClassName("sm-command-button");
    els[0].click();
}
catch(err) {
    // NoOp
}    
</script>

<div class="alert alert-block alert-info">
<b>You have successfully completed the lab session. Do not forget to share your feedback through the below survey. </b>
</div>
