# <div> <b> UKIR ENT Retail SA Workshop - 09August2023 - Generative AI </b> </div>  

---
<div class="alert alert-block alert-info">
<b>Tip:</b> Use the notebook environment: PyTorch 1.13  Python 3.8 GPU Optimized | ml.g4dn.xlarge
</div>

---

---
<h1><b> TODO: RS - Update the section </b></h1>

<h2>
   Brief Description of the notebook and agenda with links to each section 
    
</h2>

---

## SetUp

---
<div class="alert alert-block alert-warning">
<b> 
    - Install the core packages used in the rest of the notebook.
    - Set the global parameters required for the lab.
</b>
</div>

---

In [2]:
!pip install ipywidgets==7.0.0 --quiet
!pip install --upgrade sagemaker --quiet



In [3]:
import sagemaker, boto3, json
from sagemaker import get_execution_role
from sagemaker import image_uris, model_uris, script_uris, hyperparameters
from sagemaker.model import Model
from sagemaker.predictor import Predictor

aws_role = get_execution_role()
aws_region = boto3.Session().region_name
sm_session = sagemaker.Session()

In [4]:
MODEL_ID = 'huggingface-text2text-flan-t5-xl'  # this is the default model for this lab
MODEL_VERSION = '*'
INF_INSTANCE_TYPE = 'ml.g5.2xlarge'
INF_INSTANCE_COUNT = 1
INF_IMAGE_SCOPE = 'inference'
TRN_INSTANCE_TYPE = 'ml.g5.12xlarge'
TRN_INSTANCE_COUNT = 1
TRN_IMAGE_SCOPE = 'training'
MODEL_DATA_DOWNLOAD_TIMEOUT = 3600  # in seconds
CONTAINER_STARTUP_HEALTH_CHECK_TIMEOUT = 3600
EBS_VOLUME_SIZE = 256  # in GB
CONTENT_TYPE = 'application/json'
MODEL_ENDPOINT_PREFIX = 'uki-ent-ret-sa'

# Section 1
---
<h1><b> TODO: RS - Update the section </b></h1>

<h2>
   Brief Description of the notebook and agenda with links to each section 

   Deploy a state-of-the-art pre-trained **[FLAN-T5 model](https://huggingface.co/docs/transformers/model_doc/flan-t5)** and query the endpoint to generate responses from the base model.
    
</h2>

---

### **STEPS**
- [1.a Select a pre-trained model](#1.a-Select-a-pre-trained-model)
- [1.b Retrieve Artifacts & Deploy an Endpoint](#1.b-Retrieve-Artifacts-&-Deploy-an-Endpoint)
- [1.c Query endpoint and parse response](#1.c-Query-endpoint-and-parse-response)
- [1.d Advanced features: How to use various parameters to control the generated text](#1.d-Advanced-features:-How-to-use-various-advanced-parameters-to-control-the-generated-text)

### **1.a Select a pre-trained model**
***
You can continue with the default model or choose a different one from the dropdown generated by running the next cell. A complete list of SageMaker pre-trained models is available at [SageMaker pre-trained Models](https://sagemaker.readthedocs.io/en/stable/doc_utils/pretrainedmodels.html#).
***

In [5]:
from ipywidgets import Dropdown
from sagemaker.jumpstart.notebook_utils import list_jumpstart_models

# Retrieve all text-to-text generation models available through SageMaker built-in algorithms.
filter_value = "task == text2text"
text_generation_models = list_jumpstart_models(filter=filter_value)

# display the model-ids in a dropdown to select a model for inference.
model_dropdown = Dropdown(
    options=text_generation_models,
    value=MODEL_ID,
    description="Select a model",
    style={"description_width": "initial"},
    layout={"width": "max-content"},
)

In [6]:
display(model_dropdown)


<div class="alert alert-block alert-warning">
    This notebook has only been tested with the model <b>huggingface-text2text-flan-t5-xl</b>.
</div>

In [7]:
# model_version="*" fetches the latest version of the model
MODEL_ID, MODEL_VERSION = model_dropdown.value, "*"

### **1.b Retrieve Artifacts & Deploy an Endpoint**

In [8]:
# Define a unique endpoint name for the current model deployment. Add current timestamp as the suffix if needed
base_endpoint_name = f'{MODEL_ENDPOINT_PREFIX}-base-{MODEL_ID}'
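The comment above suggests appending a timestamp when the endpoint name must be unique. The sketch below is one way to do that. The helper name `unique_endpoint_name` and the timestamp format are assumptions made for illustration. It relies on the SageMaker naming rule that endpoint names contain only letters, digits, and hyphens and are at most 63 characters long.

```python
import re
from datetime import datetime, timezone

MAX_ENDPOINT_NAME_LEN = 63  # SageMaker limit for endpoint names


def unique_endpoint_name(prefix, model_id, now=None):
    """Hypothetical helper: build '<prefix>-base-<model_id>-<UTC timestamp>'."""
    now = now or datetime.now(timezone.utc)
    suffix = now.strftime("%y%m%d%H%M")  # 10 characters, minute resolution
    # Replace anything other than letters, digits, and hyphens with a hyphen.
    stem = re.sub(r"[^a-zA-Z0-9-]+", "-", f"{prefix}-base-{model_id}").strip("-")
    # Trim the stem so that the hyphen and timestamp always fit in the limit.
    stem = stem[: MAX_ENDPOINT_NAME_LEN - len(suffix) - 1].rstrip("-")
    return f"{stem}-{suffix}"


example_name = unique_endpoint_name(
    "uki-ent-ret-sa",
    "huggingface-text2text-flan-t5-xl",
    now=datetime(2023, 8, 9, 10, 30, tzinfo=timezone.utc),
)
print(example_name, len(example_name))
```

With the default prefix and model ID the stem is already 52 characters, so a longer timestamp format would force the model ID to be truncated.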

In [9]:
# Retrieve the inference docker container uri.
deploy_image_uri = image_uris.retrieve(
    region=None,
    framework=None,
    image_scope=INF_IMAGE_SCOPE,
    model_id=MODEL_ID,
    model_version=MODEL_VERSION,
    instance_type=INF_INSTANCE_TYPE
)

In [10]:
# Retrieve the model uri.
model_uri = model_uris.retrieve(
    model_id=MODEL_ID, model_version=MODEL_VERSION, model_scope=INF_IMAGE_SCOPE
)

#### **huggingface-text2text-flan-t5-xl** is already packaged with the inference script and model artifacts, so the `source_dir` and `entry_point` arguments to `Model` are not required.


In [15]:
# Create the SageMaker model instance. Passing the Predictor class lets model.deploy() return a predictor
# that can run inference through the SageMaker API.

model = Model(
    image_uri=deploy_image_uri,
    model_data=model_uri,
    role=aws_role,
    predictor_cls=Predictor,
    name=base_endpoint_name
)

In [13]:
# deploy the Model. TODO
base_model_predictor = model.deploy(
    initial_instance_count=INF_INSTANCE_COUNT,
    instance_type=INF_INSTANCE_TYPE,
    endpoint_name=base_endpoint_name
)

Using already existing model: uki-ent-ret-sa-base-huggingface-text2text-flan-t5-xl


---------!

### **1.c Query endpoint and parse response**

In [16]:
newline, bold, unbold = "\n", "\033[1m", "\033[0m"


def query_endpoint(encoded_text, endpoint_name):
    client = boto3.client("runtime.sagemaker")
    response = client.invoke_endpoint(
        EndpointName=endpoint_name, ContentType="application/x-text", Body=encoded_text
    )
    return response


def parse_response(query_response):
    model_predictions = json.loads(query_response["Body"].read())
    generated_text = model_predictions["generated_text"]
    return generated_text
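To see what `parse_response` expects without calling a live endpoint, the sketch below feeds it a hand-built response. The fake response is an assumption for illustration only: the real `Body` returned by `invoke_endpoint` is a botocore streaming object, and `io.BytesIO` merely imitates its `read()` method.

```python
import io
import json


def parse_response(query_response):
    """Same logic as the cell above: read the body once and pick the text."""
    model_predictions = json.loads(query_response["Body"].read())
    return model_predictions["generated_text"]


# Stand-in for the dict returned by invoke_endpoint (illustrative values only).
fake_response = {
    "Body": io.BytesIO(json.dumps({"generated_text": "Hallo Welt"}).encode("utf-8"))
}

parsed = parse_response(fake_response)
print(parsed)

# The body is a stream: a second read() returns no data, so parse it only once.
leftover = fake_response["Body"].read()
print(len(leftover))
```

Because the body can be consumed only once, keep the parsed result in a variable if it is needed again.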

In [29]:
newline, bold, unbold = "\n", "\033[1m", "\033[0m"

text1 = "Translate to German:  My name is SageMaker Jumpstart"
text2 = "A step by step guide to deploy a large language model:"


for text in [text1, text2]:
    query_response = query_endpoint(text.encode("utf-8"), endpoint_name=base_endpoint_name)
    generated_text = parse_response(query_response)
    print(
        f"Inference:{newline}"
        f"input text: {text}{newline}"
        f"generated text: {bold}{generated_text}{unbold}{newline}"
    )

Inference:
input text: Translate to German:  My name is SageMaker Jumpstart
generated text: Ich bin SageMaker Jumpstart.

Inference:
input text: A step by step guide to deploy a large language model:
generated text: Step 1: Create a large language model. Step 2: Create a large language model



### **1.d Advanced features: How to use various advanced parameters to control the generated text**

***
This model also supports many advanced parameters while performing inference. They include:

* **max_length:** Model generates text until the output length (which includes the input context length) reaches `max_length`. If specified, it must be a positive integer.
* **num_return_sequences:** Number of output sequences returned. If specified, it must be a positive integer.
* **num_beams:** Number of beams used in beam search. If specified, it must be an integer greater than or equal to `num_return_sequences`.
* **no_repeat_ngram_size:** Model ensures that a sequence of words of `no_repeat_ngram_size` is not repeated in the output sequence. If specified, it must be a positive integer greater than 1.
* **temperature:** Controls the randomness in the output. Higher temperature results in output sequence with low-probability words and lower temperature results in output sequence with high-probability words. If `temperature` -> 0, it results in greedy decoding. If specified, it must be a positive float.
* **early_stopping:** If True, text generation is finished when all beam hypotheses reach the end of sentence token. If specified, it must be boolean.
* **do_sample:** If True, sample the next word as per the likelihood. If specified, it must be boolean.
* **top_k:** In each step of text generation, sample from only the `top_k` most likely words. If specified, it must be a positive integer.
* **top_p:** In each step of text generation, sample from the smallest possible set of words with cumulative probability `top_p`. If specified, it must be a float between 0 and 1.
* **seed:** Fix the randomized state for reproducibility. If specified, it must be an integer.

We may specify any subset of the parameters mentioned above while invoking an endpoint. Next, we show an example of how to invoke the endpoint with these arguments.

***
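The constraints listed above can be checked on the client before a request is sent. The sketch below is only an illustration of those rules: the function name `validate_generation_params` is hypothetical, and the endpoint's own validation may differ.

```python
def validate_generation_params(params):
    """Return a list of problems found; an empty list means none were found."""
    problems = []

    def is_int(value):
        # bool is a subclass of int in Python, so exclude it explicitly.
        return isinstance(value, int) and not isinstance(value, bool)

    def is_number(value):
        return isinstance(value, (int, float)) and not isinstance(value, bool)

    for key in ("max_length", "num_return_sequences", "num_beams", "top_k"):
        if key in params and not (is_int(params[key]) and params[key] > 0):
            problems.append(f"{key} must be a positive integer")

    if "no_repeat_ngram_size" in params:
        value = params["no_repeat_ngram_size"]
        if not (is_int(value) and value > 1):
            problems.append("no_repeat_ngram_size must be an integer greater than 1")

    if "temperature" in params:
        value = params["temperature"]
        if not (is_number(value) and value > 0):
            problems.append("temperature must be a positive float")

    if "top_p" in params:
        value = params["top_p"]
        if not (is_number(value) and 0 <= value <= 1):
            problems.append("top_p must be a float between 0 and 1")

    for key in ("early_stopping", "do_sample"):
        if key in params and not isinstance(params[key], bool):
            problems.append(f"{key} must be a boolean")

    if "seed" in params and not is_int(params["seed"]):
        problems.append("seed must be an integer")

    beams, sequences = params.get("num_beams"), params.get("num_return_sequences")
    if is_int(beams) and is_int(sequences) and beams < sequences:
        problems.append("num_beams must be >= num_return_sequences")

    return problems


good = {"max_length": 100, "num_return_sequences": 1, "top_k": 20,
        "top_p": 0.8, "do_sample": True, "temperature": 0.8}
bad = {"num_beams": 2, "num_return_sequences": 3, "top_p": 1.5}

print(validate_generation_params(good))
print(validate_generation_params(bad))
```

Running the check before `invoke_endpoint` surfaces a malformed payload locally instead of as an error from the endpoint.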

In [48]:
# Input must be a json
payload = {
    "text_inputs": "Tell me the steps to create an ec2 instance on AWS cloud",
    "max_length": 100,
    "num_return_sequences": 1,
    "top_k": 20,
    "top_p": 0.8,
    "do_sample": True,
    "temperature": 0.8,
}


def query_endpoint_with_json_payload(encoded_json, endpoint_name):
    client = boto3.client("runtime.sagemaker")
    response = client.invoke_endpoint(
        EndpointName=endpoint_name, ContentType="application/json", Body=encoded_json
    )
    return response


query_response = query_endpoint_with_json_payload(
    json.dumps(payload).encode("utf-8"), endpoint_name=base_endpoint_name
)


def parse_response_multiple_texts(query_response):
    model_predictions = json.loads(query_response["Body"].read())
    generated_text = model_predictions["generated_texts"]
    return generated_text


generated_texts = parse_response_multiple_texts(query_response)
print(generated_texts)

['Create an AWS account. Navigate to the AWS Console. Click on the Create EC2 instance link. Enter the required information. Click on Create.']


---
#### **Text Summarization**

---

In [54]:
text = "CodeWhisperer is an AI coding companion that generates real-time, single-line or full-function code suggestions in your Integrated Development Environment (IDE) to help you quickly build software. With CodeWhisperer, you can write a comment in natural language that outlines a specific task in English, such as “Upload a file with server-side encryption.” Based on this information, CodeWhisperer recommends one or more code snippets directly in the IDE that can accomplish the task. You can quickly and easily accept the top suggestion (tab key), view more suggestions (arrow keys), or continue writing your own code. You should always review a code suggestion before accepting them, and you may need to edit it to ensure it does exactly what you intended. CodeWhisperer helps accelerate software development by providing code suggestions that reduce total development effort and allow more time for ideation, complex problem solving, and writing differentiated code. In addition to general purpose code suggestions, CodeWhisperer has additional training to provide code suggestions for using AWS APIs. CodeWhisperer can also help you improve application security by helping detect and remediate security vulnerabilities. As you are writing code, CodeWhisperer analyzes the English language comments and surrounding code to infer what code is needed to complete the task at hand. CodeWhisperer suggests one or more code snippets directly in the code editor, accelerating you as you code. The code suggestions provided by CodeWhisperer are based on a large language models (LLMs) trained on billions of lines of code, including Amazon and open-source code. You can quickly and more easily accept the top suggestion (tab key), view more suggestions (arrow keys), or continue writing your own code. Always review a code suggestion before accepting it, and you may need to edit it to ensure that it does exactly what you intended."

In [53]:
prompts = [
    "Briefly summarize this sentence: {text}",
    "Write a short summary for this text: {text}",
    "Generate a short summary this sentence:\n{text}",
    "{text}\n\nWrite a brief summary in a sentence or less",
    "{text}\nSummarize the aforementioned text in a single phrase.",
    "{text}\nCan you generate a short summary of the above paragraph?",
    "Write a sentence based on this summary: {text}",
    "Write a sentence based on '{text}'",
    "Summarize this article:\n\n{text}",
]

num_return_sequences = 1
parameters = {
    "max_length": 50,
    "num_return_sequences": num_return_sequences,
    "top_k": 50,
    "top_p": 0.95,
    "do_sample": True,
}

print(f"{bold}Number of return sequences are set as {num_return_sequences}{unbold}{newline}")
for each_prompt in prompts:
    payload = {"text_inputs": each_prompt.replace("{text}", text), **parameters}
    query_response = query_endpoint_with_json_payload(
        json.dumps(payload).encode("utf-8"), endpoint_name=base_endpoint_name
    )
    generated_texts = parse_response_multiple_texts(query_response)
    print(f"{bold} For prompt: '{each_prompt}'{unbold}{newline}")
    print(f"{bold} The {num_return_sequences} summarized results are{unbold}:{newline}")
    for idx, each_generated_text in enumerate(generated_texts):
        print(f"{bold}Result {idx}{unbold}: {each_generated_text}{newline}")

Number of return sequences are set as 1

 For prompt: 'Briefly summarize this sentence: {text}'

 The 1 summarized results are:

Result 0: CodeWhisperer is an AI coding companion that generates real-time, single-line or full-function code suggestions in your Integrated Development Environment (IDE) to help you quickly build software

 For prompt: 'Write a short summary for this text: {text}'

 The 1 summarized results are:

Result 0: CodeWhisperer is an AI coding companion that generates real-time, single-line or full-function code suggestions in your Integrated Development Environment (IDE) to help you quickly build software.

 For prompt: 'Generate a short summary this sentence:
{text}'

 The 1 summarized results are:

Result 0: CodeWhisperer is an AI coding companion that generates real-time, single-line or full-function code suggestions in your Integrated Development Environment (IDE) to help you quickl

---
#### **Question and Answering**

---

In [58]:
context = """The newest and most innovative Kindle yet lets you take notes on millions of books and documents, write lists and journals, and more. 

For readers who have always wished they could write in their eBooks, Amazon’s new Kindle lets them do just that. The Kindle Scribe is the first Kindle for reading and writing and allows users to supplement their books and documents with notes, lists, and more.

Here’s everything you need to know about the Kindle Scribe, including frequently asked questions.

The Kindle Scribe makes it easy to read and write like you would on paper 

The Kindle Scribe features a 10.2-inch, glare-free screen (the largest of all Kindle devices), crisp 300 ppi resolution, and 35 LED front lights that automatically adjust to your environment. Further personalize your experience with the adjustable warm light, font sizes, line spacing, and more.

It comes with your choice of the Basic Pen or the Premium Pen, which you use to write on the screen like you would on paper. They also attach magnetically to your Kindle and never need to be charged. The Premium Pen includes a dedicated eraser and a customizable shortcut button.

The Kindle Scribe has the most storage options of all Kindle devices: choose from 8 GB, 16 GB, or 32 GB to suit your level of reading and writing.
"""
question = "what are the key features of new Kindle?"

In [72]:
prompts = [
    """Answer based on context:\n\n{context}\n\n{question}""",
    """{context}\n\nAnswer this question based on the article: {question}""",
    """{context}\n\n{question}""",
    """{context}\nAnswer this question: {question}""",
    """Read this article and answer this question {context}\n{question}""",
    """{context}\n\nBased on the above article, answer a question. {question}""",
    """Write an article that answers the following question: {question} {context}""",
]


parameters = {
    "max_length": 50,
    "num_return_sequences": 1,
    "top_k": 50,
    "top_p": 0.95,
    "do_sample": True,
}


for each_prompt in prompts:
    input_text = each_prompt.replace("{context}", context)
    input_text = input_text.replace("{question}", question)
    #print(f"{bold} For prompt{unbold}: '{each_prompt}'{newline}")
    payload = {"text_inputs": input_text, **parameters}
    query_response = query_endpoint_with_json_payload(
        json.dumps(payload).encode("utf-8"), endpoint_name=base_endpoint_name
    )
    generated_texts = parse_response_multiple_texts(query_response)
    print(f"{bold} The reasoning result is{unbold}: '{generated_texts}'{newline}")

 The reasoning result is: '['Kindle Scribe']'

 The reasoning result is: '['10.2-inch, glare-free screen']'

 The reasoning result is: '['The Kindle Scribe features a 10.2-inch, glare-free screen (the largest of all Kindle devices), crisp 300 ppi resolution, and 35 LED front lights that automatically adjust to your environment']'

 The reasoning result is: '['The Kindle Scribe is the first Kindle for reading and writing and allows users to supplement their books and documents with notes, lists, and more.']'

 The reasoning result is: '['the large screen']'

 The reasoning result is: '['1.']'

 The reasoning result is: '['More than 60 percent of customers say they are excited to use the Kindle Scribe.']'
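The loop above fills each template by chained `str.replace` calls. The sketch below wraps that pattern in a small helper. The name `fill_prompt` is hypothetical, and the context and question are made-up illustrative strings.

```python
def fill_prompt(template, **fields):
    """Substitute '{name}' placeholders using plain string replacement."""
    filled = template
    for name, value in fields.items():
        filled = filled.replace("{" + name + "}", value)
    return filled


template = "{context}\n\nAnswer this question based on the article: {question}"

prompt = fill_prompt(
    template,
    context="The device has a {large} screen.",  # braces with no matching field stay as-is
    question="What size is the screen?",
)
print(prompt)
```

Plain replacement leaves unknown placeholders untouched, whereas `str.format` raises `KeyError` for a placeholder that has no value. One caveat: because the substitutions run in sequence, a value that itself contains a later placeholder such as `{question}` is substituted as well.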



---
#### **Imaginary article generation based on a title**

---

In [70]:
title = "AnyCompany business has a new product category coming up"

In [71]:
prompts = [
    """Title: \"{title}\"\\nGiven the above title of an imaginary article, imagine the article.\\n"""
]


parameters = {
    "max_length": 5000,
    "num_return_sequences": 1,
    "top_k": 50,
    "top_p": 0.95,
    "do_sample": True,
}


for each_prompt in prompts:
    input_text = each_prompt.replace("{title}", title)
    #print(f"{bold} For prompt{unbold}: '{input_text}'{newline}")
    payload = {"text_inputs": input_text, **parameters}
    query_response = query_endpoint_with_json_payload(
        json.dumps(payload).encode("utf-8"), endpoint_name=base_endpoint_name
    )
    generated_texts = parse_response_multiple_texts(query_response)
    print(f"{bold} The reasoning result is{unbold}: '{generated_texts}'{newline}")

 The reasoning result is: '['There’s always been a smattering of people talking about the products that AnyCompany has to offer. There is, after all, an entire online marketplace of companies, some of which are known for their expertise in one particular industry.']'



# Section 2
---
<h1><b> TODO: RS - Update the section </b></h1>

<h2>
   Brief Description of the notebook and agenda with links to each section 
    <br/>

   - **Implement RAG** 
    
</h2>

---

# Section 3
---
<h1><b> TODO: RS - Update the section </b></h1>

<h2>
   Brief Description of the notebook and agenda with links to each section 

   - Explore Prompt Engineering Techniques
    
</h2>

---

# Section 4
---
# **TODO: RS - Update the section** #

##   - **Transfer Learning for Domain Adaptation** ##

---

## **TBR - RS**
### 4.1. Retrieve Training artifacts
Here, for the selected model, we retrieve the training Docker container, the training algorithm source, the pre-trained model, and a Python dictionary of the training hyperparameters that the algorithm accepts, with their default values. Note that `model_version="*"` fetches the latest version of the model. We also need to specify the training instance type to fetch `train_image_uri`.

In [None]:
# Retrieve the docker image
train_image_uri = image_uris.retrieve(
    region=None,
    framework=None,
    model_id=MODEL_ID,
    model_version=MODEL_VERSION,
    image_scope=TRN_IMAGE_SCOPE,
    instance_type=TRN_INSTANCE_TYPE,
)

# Retrieve the training script
train_source_uri = script_uris.retrieve(
    model_id=MODEL_ID, model_version=MODEL_VERSION, script_scope=TRN_IMAGE_SCOPE
)

# Retrieve the pre-trained model tarball to further fine-tune
train_model_uri = model_uris.retrieve(
    model_id=MODEL_ID, model_version=MODEL_VERSION, model_scope=TRN_IMAGE_SCOPE
)

### 4.2. Set Training parameters
Now that the setup is complete, we are ready to fine-tune our text-to-text model. To begin, let us create a [`sagemaker.estimator.Estimator`](https://sagemaker.readthedocs.io/en/stable/api/training/estimators.html) object. This estimator will launch the training job.

There are two kinds of parameters that need to be set for training. 

The first set configures the training job itself:

- **Training data path:** the S3 folder in which the input data is stored.
- **Output path:** the S3 folder in which the training output is stored.
- **Training instance type:** the type of machine on which the training runs. GPU instances are typical for this kind of training. The training instance type was defined above in order to fetch the correct `train_image_uri`.
***
The second set consists of algorithm-specific training hyperparameters. They are also used to specify the model name when we want to fine-tune a model that is not present in the dropdown list.
***
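The two groups of parameters described above can be gathered into a single dictionary of keyword arguments for the estimator. The sketch below makes several assumptions for illustration: the helper name `build_estimator_kwargs`, the entry point name `transfer_learning.py`, the S3 locations, the role ARN, and the hyperparameter values are all placeholders. The `Estimator` call itself is left as a comment because it needs AWS access.

```python
def build_estimator_kwargs(image_uri, source_uri, model_uri, role,
                           instance_type, instance_count,
                           output_path, hyperparameters, volume_size_gb):
    """Collect training-job settings and hyperparameters into one dict."""
    return {
        # Training-job parameters
        "role": role,
        "image_uri": image_uri,
        "source_dir": source_uri,
        "model_uri": model_uri,
        "entry_point": "transfer_learning.py",  # assumption: script name in the source bundle
        "instance_type": instance_type,
        "instance_count": instance_count,
        "volume_size": volume_size_gb,
        "output_path": output_path,
        # Algorithm-specific hyperparameters
        "hyperparameters": hyperparameters,
    }


estimator_kwargs = build_estimator_kwargs(
    image_uri="<train_image_uri>",            # from image_uris.retrieve(...)
    source_uri="<train_source_uri>",          # from script_uris.retrieve(...)
    model_uri="<train_model_uri>",            # from model_uris.retrieve(...)
    role="<execution-role-arn>",              # placeholder
    instance_type="ml.g5.12xlarge",           # TRN_INSTANCE_TYPE
    instance_count=1,                         # TRN_INSTANCE_COUNT
    output_path="s3://<your-bucket>/output",  # placeholder location
    hyperparameters={"epochs": "3"},          # placeholder; start from the retrieved defaults
    volume_size_gb=256,                       # EBS_VOLUME_SIZE
)
print(sorted(estimator_kwargs))

# In the notebook, with AWS access, this could then be used as:
# from sagemaker.estimator import Estimator
# estimator = Estimator(**estimator_kwargs)
# estimator.fit({"training": "s3://<your-bucket>/<train-prefix>/"})  # channel name is an assumption
```

Keeping the settings in one dictionary makes it easy to print and review the configuration before a billable training job is launched.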

# Clean Up

<div class="alert alert-block alert-warning">
<b> IMPORTANT: Clean up the resources to avoid charges for resources that remain in use. </b>
</div>

---
- Delete the endpoints for all the deployed models
- Delete the base model image. You can choose not to delete the trained/fine-tuned models from S3 so that you can redeploy them in the future. Be aware of the storage charges involved.
- Shut down the kernel of this notebook and any active kernels on the other notebooks *(check the running terminals and kernels icon in the left navigation of SageMaker Studio)*

---

In [None]:
# Base Model - Delete the SageMaker model, then the endpoint (model artifacts in S3 are not removed)
base_model_predictor.delete_model()
base_model_predictor.delete_endpoint()

In [None]:
# PEFT Model - Delete the SageMaker endpoint (deleting the SageMaker model is optional)
# peft_model_predictor.delete_model() # Optional
peft_model_predictor.delete_endpoint()

In [None]:
# Fine-tuned Model - Delete the SageMaker endpoint (deleting the SageMaker model is optional)
# train_model_predictor.delete_model() # Optional
train_model_predictor.delete_endpoint()
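If only some of the predictors were created in a session, the cleanup cells above fail with `NameError` for the missing ones. The sketch below shows one tolerant alternative. The helper `delete_endpoints` and the `_FakePredictor` stand-in are hypothetical and exist only so that the example runs without AWS access.

```python
def delete_endpoints(predictors):
    """Call delete_endpoint() on each available predictor and report the outcome."""
    results = {}
    for name, predictor in predictors.items():
        if predictor is None:
            results[name] = "skipped (not defined)"
            continue
        try:
            predictor.delete_endpoint()
            results[name] = "deleted"
        except Exception as err:  # broad on purpose: keep cleaning up the rest
            results[name] = f"failed: {err}"
    return results


class _FakePredictor:
    """Stand-in used only to exercise the helper offline."""

    def __init__(self):
        self.deleted = False

    def delete_endpoint(self):
        self.deleted = True


fake = _FakePredictor()
outcome = delete_endpoints({"base": fake, "peft": None})
print(outcome)
```

In the notebook, the dictionary could be built with `globals().get("peft_model_predictor")` and similar lookups, which return `None` for predictors that were never created.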

## Release the Notebook Resources

In [None]:
%%html

<p><b>Shutting down your kernel for this notebook to release resources.</b></p>
<button class="sm-command-button" data-commandlinker-command="kernelmenu:shutdown" style="display:none;">Shutdown Kernel</button>
        
<script>
try {
    els = document.getElementsByClassName("sm-command-button");
    els[0].click();
}
catch(err) {
    // NoOp
}    
</script>

<div class="alert alert-block alert-info">
<b>You have successfully completed the lab session. Do not forget to share your feedback through the survey below. </b>
</div>

### **TODO :: - Pulse Survey Link**
