# SageMaker JumpStart Foundation Models - HuggingFace Text2Text Generation

Note: This notebook was tested on an ml.t3.medium instance in Amazon SageMaker Studio with the Python 3 (Data Science) kernel and on an Amazon SageMaker Notebook instance with the conda_python3 kernel.

### 1. Set Up

---
Before executing the notebook, complete the initial setup steps below. This notebook requires ipywidgets.

---

In [2]:
!pip install ipywidgets==7.0.0 --quiet
!pip install --upgrade sagemaker --quiet

[0m[33mDEPRECATION: pyodbc 4.0.0-unsupported has a non-standard version number. pip 23.3 will enforce this behaviour change. A possible replacement is to upgrade to a newer version of pyodbc or contact the author to suggest that they release a version with a conforming version number. Discussion can be found at https://github.com/pypa/pip/issues/12063[0m[33m
[0m[31mERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
sagemaker-datawrangler 0.4.3 requires sagemaker-data-insights==0.4.0, but you have sagemaker-data-insights 0.3.3 which is incompatible.[0m[31m
[0m[33mDEPRECATION: pyodbc 4.0.0-unsupported has a non-standard version number. pip 23.3 will enforce this behaviour change. A possible replacement is to upgrade to a newer version of pyodbc or contact the author to suggest that they release a version with a conforming version number. Discussion can be 

#### Permissions and environment variables

---
To host on Amazon SageMaker, we need to set up and authenticate the use of AWS services. Here, we use the execution role associated with the current notebook as the AWS account role with SageMaker access. 

---

In [3]:
import sagemaker, boto3, json
from sagemaker.session import Session

sagemaker_session = Session()
aws_role = sagemaker_session.get_caller_identity_arn()
aws_region = boto3.Session().region_name
sess = sagemaker.Session()



sagemaker.config INFO - Not applying SDK defaults from location: /etc/xdg/sagemaker/config.yaml
sagemaker.config INFO - Not applying SDK defaults from location: /root/.config/sagemaker/config.yaml


### 2. Select a pre-trained model
***
You can continue with the default model or choose a different model from the dropdown generated by running the next cell. A complete list of SageMaker pre-trained models is also available at [SageMaker pre-trained Models](https://sagemaker.readthedocs.io/en/stable/doc_utils/pretrainedmodels.html#).
***

In [13]:
#model_id, model_version = "huggingface-text2text-flan-t5-xl", "*"
model_id, model_version = "meta-textgeneration-llama-2-7b", "*"

***
[Optional] Select a different SageMaker pre-trained model. Here, we download the model manifest file from the built-in algorithms S3 bucket, filter for the text generation models, and select a model for inference.
***
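As a rough sketch of the filtering step described above, the snippet below selects text-to-text model IDs (those containing `text2text`) from a manifest that is already loaded in memory. The `filter_text2text_models` helper and the sample entries are hypothetical stand-ins: the real manifest would be downloaded from S3, and its exact schema is an assumption here.

```python
from typing import Dict, List


def filter_text2text_models(manifest: List[Dict[str, str]]) -> List[str]:
    """Return sorted, de-duplicated model IDs whose name contains 'text2text'."""
    model_ids = {
        entry["model_id"]
        for entry in manifest
        if "text2text" in entry.get("model_id", "")
    }
    return sorted(model_ids)


# Hypothetical manifest entries, for illustration only.
sample_manifest = [
    {"model_id": "huggingface-text2text-flan-t5-xl", "version": "1.0.0"},
    {"model_id": "huggingface-text2text-flan-t5-xl", "version": "1.1.0"},
    {"model_id": "huggingface-text2text-flan-t5-large", "version": "1.0.0"},
    {"model_id": "pytorch-ic-mobilenet-v2", "version": "1.0.0"},
]

text2text_models = filter_text2text_models(sample_manifest)
print(text2text_models)
```

Any ID from the resulting list could then be assigned to `model_id` in place of the default.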

### 3. Retrieve Artifacts & Deploy an Endpoint

***

Using SageMaker, we can perform inference on the pre-trained model, even without fine-tuning it first on a new dataset. We start by retrieving the `deploy_image_uri`, `deploy_source_uri`, and `model_uri` for the pre-trained model. To host the pre-trained model, we create an instance of [`sagemaker.model.Model`](https://sagemaker.readthedocs.io/en/stable/api/inference/model.html) and deploy it. This may take a few minutes.

***

In [14]:
def get_sagemaker_session(local_download_dir) -> sagemaker.Session:
    """Return the SageMaker session."""

    sagemaker_client = boto3.client(
        service_name="sagemaker", region_name=boto3.Session().region_name
    )

    session_settings = sagemaker.session_settings.SessionSettings(
        local_download_dir=local_download_dir
    )

    # the unit test will ensure you do not commit this change
    session = sagemaker.session.Session(
        sagemaker_client=sagemaker_client, settings=session_settings
    )

    return session

We need to create a directory to host the downloaded model. 

In [15]:
!mkdir -p download_dir

---
This text-to-text generation task supports a wide variety of model sizes with different compute requirements. Here, we define environment variables for several large models that set the number of model server workers to 1. This helps support the largest possible token lengths, since additional copies of the model are not consuming GPU memory.

---

In [16]:
_large_model_env = {
    "SAGEMAKER_MODEL_SERVER_WORKERS": "1",
    "TS_DEFAULT_WORKERS_PER_MODEL": "1"
}
_model_env_variable_map = {
    "huggingface-text2text-flan-t5-xxl": _large_model_env,
    "huggingface-text2text-flan-t5-xxl-fp16": _large_model_env,
    "huggingface-text2text-flan-t5-xxl-bnb-int8": _large_model_env,
    "huggingface-text2text-flan-t5-xl": {"MMS_DEFAULT_WORKERS_PER_MODEL": "1"},
    "huggingface-text2text-flan-t5-large": {"MMS_DEFAULT_WORKERS_PER_MODEL": "1"},
    "huggingface-text2text-flan-ul2-bf16": _large_model_env,
    "huggingface-text2text-bigscience-t0pp": _large_model_env,
    "huggingface-text2text-bigscience-t0pp-fp16": _large_model_env,
    "huggingface-text2text-bigscience-t0pp-bnb-int8": _large_model_env,
}

In [17]:
from sagemaker import image_uris, instance_types, model_uris, script_uris
from sagemaker.model import Model
from sagemaker.predictor import Predictor
from sagemaker.utils import name_from_base


endpoint_name = name_from_base(f"jumpstart-example-{model_id}")

# Retrieve the inference instance type for the specified model.
instance_type = instance_types.retrieve_default(
    model_id=model_id, model_version=model_version, scope="inference"
)

# Retrieve the inference docker container uri. This is the base HuggingFace container image for the default model above.
deploy_image_uri = image_uris.retrieve(
    region=None,
    framework=None,  # automatically inferred from model_id
    image_scope="inference",
    model_id=model_id,
    model_version=model_version,
    instance_type=instance_type,
)

# Retrieve the inference script uri. This includes all dependencies and scripts for model loading, inference handling etc.
deploy_source_uri = script_uris.retrieve(
    model_id=model_id, model_version=model_version, script_scope="inference"
)

# Retrieve the model uri.
model_uri = model_uris.retrieve(
    model_id=model_id, model_version=model_version, model_scope="inference"
)

# Create the SageMaker model instance
if model_id in _model_env_variable_map:
    # For those large models, we already repack the inference script and model
    # artifacts for you, so the `source_dir` argument to Model is not required.
    model = Model(
        image_uri=deploy_image_uri,
        model_data=model_uri,
        role=aws_role,
        predictor_cls=Predictor,
        name=endpoint_name,
        env=_model_env_variable_map[model_id],
    )
else:
    model = Model(
        image_uri=deploy_image_uri,
        source_dir=deploy_source_uri,
        model_data=model_uri,
        entry_point="inference.py",  # entry point file in source_dir and present in deploy_source_uri
        role=aws_role,
        predictor_cls=Predictor,
        name=endpoint_name,
        sagemaker_session=get_sagemaker_session("download_dir"),
    )

# deploy the Model. Note that we need to pass Predictor class when we deploy model through Model class,
# for being able to run inference through the sagemaker API.
model_predictor = model.deploy(
    initial_instance_count=1,
    instance_type=instance_type,
    predictor_cls=Predictor,
    endpoint_name=endpoint_name,
)



ReadError: not a gzip file

### 4. Query endpoint and parse response

---
Input to the endpoint is any string of text formatted as JSON and encoded in `utf-8`. The output of the endpoint is a JSON object containing the generated text.

---
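To make the request and response formats concrete, here is a minimal offline sketch. It encodes a prompt as UTF-8 bytes and parses a JSON response body. The `fake_response` dictionary is a hand-built stand-in for what `invoke_endpoint` returns, so no endpoint is needed; its contents are illustrative, not real model output.

```python
import io
import json

# Encode a plain-text prompt as UTF-8 bytes before sending it.
prompt = "Tell me the steps to make a pizza"
encoded_text = prompt.encode("utf-8")

# Hand-built stand-in for an invoke_endpoint response (assumption, for illustration).
fake_response = {
    "Body": io.BytesIO(json.dumps({"generated_text": "Example output"}).encode("utf-8"))
}

# The response body is a byte stream holding a JSON document.
model_predictions = json.loads(fake_response["Body"].read())
generated_text = model_predictions["generated_text"]
print(generated_text)
```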

In [14]:
newline, bold, unbold = "\n", "\033[1m", "\033[0m"


def query_endpoint(encoded_text, endpoint_name):
    client = boto3.client("runtime.sagemaker")
    response = client.invoke_endpoint(
        EndpointName=endpoint_name, ContentType="application/x-text", Body=encoded_text
    )
    return response


def parse_response(query_response):
    model_predictions = json.loads(query_response["Body"].read())
    generated_text = model_predictions["generated_text"]
    return generated_text

---
Below, we put in some example input text. You can put in any text, and the model predicts the next words in the sequence. Longer sequences of text can be generated by calling the model repeatedly.

---
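One way to read "calling the model repeatedly" is to append each generated continuation to the prompt and query again. The sketch below assumes that approach and uses a hypothetical `generate_fn` callable in place of a real endpoint call, so it runs offline.

```python
from typing import Callable


def generate_long_text(
    prompt: str, generate_fn: Callable[[str], str], num_rounds: int = 3
) -> str:
    """Extend `prompt` by feeding the growing text back to `generate_fn`."""
    text = prompt
    for _ in range(num_rounds):
        continuation = generate_fn(text)
        if not continuation:  # stop early if nothing new is returned
            break
        text = f"{text} {continuation}"
    return text


# Hypothetical stand-in for an endpoint call: returns the input's word count in brackets.
def fake_generate(text: str) -> str:
    return f"[{len(text.split())}]"


long_text = generate_long_text("Once upon a time", fake_generate, num_rounds=2)
print(long_text)
```

In practice, `generate_fn` would wrap a call to the deployed endpoint and return the generated string.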

In [16]:
# Input must be JSON
payload = {
    "text_inputs": "Tell me the steps to make a pizza",
    "max_length": 50,
    "max_time": 50,
    "num_return_sequences": 3,
    "top_k": 50,
    "top_p": 0.95,
    "do_sample": True,
}


def query_endpoint_with_json_payload(encoded_json, endpoint_name):
    client = boto3.client("runtime.sagemaker")
    response = client.invoke_endpoint(
        EndpointName=endpoint_name, ContentType="application/json", Body=encoded_json
    )
    return response


query_response = query_endpoint_with_json_payload(
    json.dumps(payload).encode("utf-8"), endpoint_name=endpoint_name
)


def parse_response_multiple_texts(query_response):
    model_predictions = json.loads(query_response["Body"].read())
    generated_text = model_predictions["generated_texts"]
    return generated_text


## Step 1. Sentiment Classification

Define the sentence you want to classify and the corresponding options.
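The prompt templates used in this step contain `{sentence}` and `{options_}` placeholders that are filled with plain string replacement. A minimal sketch of that substitution follows; the `build_prompt` helper is a hypothetical name, not part of the notebook or the SageMaker SDK.

```python
def build_prompt(template: str, sentence: str, options: str) -> str:
    """Fill the {sentence} and {options_} placeholders in a prompt template."""
    return template.replace("{sentence}", sentence).replace("{options_}", options)


example_template = (
    "Is the sentiment of the following sentence positive or negative?\n"
    "{sentence}\n{options_}"
)
example_prompt = build_prompt(
    example_template,
    sentence="The service was great.",
    options="OPTIONS:\n-positive \n-negative ",
)
print(example_prompt)
```

Plain `str.replace` leaves any other braces in the template untouched, whereas `str.format` would try to interpret them.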

In [57]:
#sentence = "Help with my tv always telling me I have too many streaming devices at one. I do not. Happens every day!"
#sentence = "The computer chat is bothersome and wastes time for customer.  Make an option to connect to a live agent.  The 'chat' changed the time from midnight to 10pm for talking to someone.  That was weird.  Then, when cancelling Premier Channels, was able to cancel one and then got an unusual message telling me that I could not do it online - ? - weird, I already did the first one,  and the webpage told me to call the phone number ********** to cancel channels.  I called the number, more than once.  This number is to move a phone number not do anything with channels.  I tried repeatedly over and over, as time went by and finally was able to cancel other premier channels that were part of a 3 month trial by going out of the webpages and then back in.  There is a glitch with the times changing during chat, with the wrong phone number given in chat and with changing/canceling channels online giving wrong information and making it difficult -  telling me that I could not do it online, after I already had cancelled one.  Additionally My Bill did not arrive in the mail again, despite telling Direct TV for months that I want mail statements.    I don't know why it keeps switching me to no statements.  The account shows a credit and also shows a late fee.  That doesn't make sense.  I don't want to make a payment with a $25.90 credit showing.    The billing in confusing."
#sentence = "I need someone who can explain the sports package fee.  I use my Hotspot because I live too far to get internet or DSL.  It's hard to explain to offshore folks what I'm talking about."
sentence = "I really enjoy DirecTV service.  Every since the pandemic, I have been addicted to my television thanks to DTV!  Keep up the good work!"


options_ = """OPTIONS:\n-positive \n-negative """

In [58]:
prompts = [
    """Review:\n{sentence}\nIs this movie review sentence negative or positive?\n{options_}""",
    """Short movie review: {sentence}\nDid the critic think positively or negatively of the movie?\n{options_}""",
    """Sentence from a movie review: {sentence}\nWas the movie seen positively or negatively based on the preceding review? \n\n{options_}""",
    """\"{sentence}\"\nHow would the sentiment of this sentence be perceived?\n\n{options_}""",
    """Is the sentiment of the following sentence positive or negative?\n{sentence}\n{options_}""",
    """What is the sentiment of the following movie review sentence?\n{sentence}\n{options_}""",
]

parameters = {
    "max_length": 50,
    "max_time": 50,
    "num_return_sequences": 1,
    "top_k": 50,
    "top_p": 0.95,
    "do_sample": True,
}


for each_prompt in prompts:
    input_text = each_prompt.replace("{sentence}", sentence)
    input_text = input_text.replace("{options_}", options_)
    print(f"{bold} For prompt{unbold}: '{input_text}'{newline}")
    payload = {"text_inputs": input_text, **parameters}
    query_response = query_endpoint_with_json_payload(
        json.dumps(payload).encode("utf-8"), endpoint_name=endpoint_name
    )
    generated_texts = parse_response_multiple_texts(query_response)
    print(f"{bold} The reasoning result is{unbold}: '{generated_texts}'{newline}")

For prompt: 'Review:
I really enjoy DirecTV service.  Every since the pandemic, I have been addicted to my television thanks to DTV!  Keep up the good work!
Is this movie review sentence negative or positive?
OPTIONS:
-positive 
-negative '

The reasoning result is: '['positive']'

For prompt: 'Short movie review: I really enjoy DirecTV service.  Every since the pandemic, I have been addicted to my television thanks to DTV!  Keep up the good work!
Did the critic think positively or negatively of the movie?
OPTIONS:
-positive 
-negative '

The reasoning result is: '['negative']'

For prompt: 'Sentence from a movie review: I really enjoy DirecTV service.  Every since the pandemic, I have been addicted to my television thanks to DTV!  Keep up the good work!
Was the movie seen positively or negatively based on the preceding review? 

OPTIONS:
-positive 
-negative '

The reasoning result is: '['positive']'

For prompt: '"I reall

## Step 2. Email Generation Based on a Subject

In [59]:
#title = "Service outage due to hurricane"
title = sentence

In [60]:
prompts = [
    """Title: \"{title}\"\\nGiven the above title of an imaginary email, imagine the email.\\n""",
    """Subject: \"{title}\"\\nGiven the above subject of an imaginary email, write the email.\\n""",
    """Compose an email based on the subject \"{title}\"."""
]


parameters = {
    "max_length": 5000,
    "max_time": 50,
    "num_return_sequences": 1,
    "top_k": 50,
    "top_p": 0.95,
    "do_sample": True,
}



for each_prompt in prompts:
    input_text = each_prompt.replace("{title}", title)
    print(f"{bold} For prompt{unbold}: '{input_text}'{newline}")
    payload = {"text_inputs": input_text, **parameters}
    query_response = query_endpoint_with_json_payload(
        json.dumps(payload).encode("utf-8"), endpoint_name=endpoint_name
    )
    generated_texts = parse_response_multiple_texts(query_response)
    print(f"{bold} The reasoning result is{unbold}: '{generated_texts}'{newline}")

For prompt: 'Title: "I really enjoy DirecTV service.  Every since the pandemic, I have been addicted to my television thanks to DTV!  Keep up the good work!"\nGiven the above title of an imaginary email, imagine the email.\n'

The reasoning result is: '["We were going to send an email to the President of DirecTV (which is actually the Company), since she was going to be our next President. However, as we were going through the process, we had to change the subject line to fit her schedule... but still wanted to let you know about how much we enjoy your service! We don't plan on switching anytime soon!"]'

For prompt: 'Subject: "I really enjoy DirecTV service.  Every since the pandemic, I have been addicted to my television thanks to DTV!  Keep up the good work!"\nGiven the above subject of an imaginary email, write the email.\n'

The reasoning result is: '["DTV Greetings, Dear DTV Customer, I've always enjoyed the quality and variety of the service t

## Step 3. Chat Log Summarization

Input the chat log you want to summarize.
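Chat logs pasted into a triple-quoted string often carry leading indentation and uneven spacing. As an optional preprocessing sketch (not part of the original workflow), the hypothetical `normalize_chat_log` helper below strips each line before the log is placed in a prompt.

```python
def normalize_chat_log(raw_log: str) -> str:
    """Strip surrounding whitespace from each line and drop empty lines."""
    lines = (line.strip() for line in raw_log.splitlines())
    return "\n".join(line for line in lines if line)


raw_log = """Customer: HI
            Agent: Hello, how can i help you today?
            Customer:I need my account password reset"""

clean_log = normalize_chat_log(raw_log)
print(clean_log)
```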

In [61]:
text = """Customer: HI
            Agent: Hello, how can i help you today? 
            Customer:I need my account password reset 
            Agent: ok, can you provide me your username? 
            Customer: smithzgg 
            Agent:Please give me a moment. 
            Customer:ok 
            Agent:ok your password has been reset, you will need to reset it on your next login. 
            Customer: ok, thank you 
            Agent: no problem, is there anything else I can help with? 
            Customer: no thank you 
            Agent: Great! have a nice day 
            Customer: you too, bye 
            Agent:goodbye"""

In [62]:
prompts = [
    "Summarize this article:\n\n{text}",
]

num_return_sequences = 1
parameters = {
    "max_length": 50,
    "max_time": 50,
    "num_return_sequences": num_return_sequences,
    "top_k": 50,
    "top_p": 0.95,
    "do_sample": True,
}

print(f"{bold}Number of return sequences are set as {num_return_sequences}{unbold}{newline}")
for each_prompt in prompts:
    payload = {"text_inputs": each_prompt.replace("{text}", text), **parameters}
    query_response = query_endpoint_with_json_payload(
        json.dumps(payload).encode("utf-8"), endpoint_name=endpoint_name
    )
    generated_texts = parse_response_multiple_texts(query_response)
    print(f"{bold} For prompt: '{each_prompt}'{unbold}{newline}")
    print(f"{bold} The {num_return_sequences} summarized results are{unbold}:{newline}")
    for idx, each_generated_text in enumerate(generated_texts):
        print(f"{bold}Result {idx}{unbold}: {each_generated_text}{newline}")

Number of return sequences are set as 1

For prompt: 'Summarize this article:

{text}'

The 1 summarized results are:

Result 0: Customer asks to reset his password, but forgot his username and password. The agent reset the password.



## Step 3a. Question Answering

Now, let's ask a question based on the same chat log.

In [63]:
#context = """Customer: HI Agent: Hello, how can i help you today Customer:I need my account password reset Agent: ok, can you provide me your username? Customer: smithzgg Agent:Please give me a moment. Customer:ok Agent:ok your password has been reset, you will need to reset it on your next login. Customer: ok, thank you Agent: no problem, is there anything else I can help with? Customer: no thank you Agent: Great! have a nice day Customer: you too, bye Agent:goodbye
#"""
context = text
question = "Was the customer happy with this action?"

In [64]:
prompts = [
    """Answer based on context:\n\n{context}\n\n{question}"""
]


parameters = {
    "max_length": 50,
    "max_time": 50,
    "num_return_sequences": 1,
    "top_k": 50,
    "top_p": 0.95,
    "do_sample": True,
}


for each_prompt in prompts:
    input_text = each_prompt.replace("{context}", context)
    input_text = input_text.replace("{question}", question)
    print(f"{bold} For prompt{unbold}: '{each_prompt}'{newline}")
    payload = {"text_inputs": input_text, **parameters}
    query_response = query_endpoint_with_json_payload(
        json.dumps(payload).encode("utf-8"), endpoint_name=endpoint_name
    )
    generated_texts = parse_response_multiple_texts(query_response)
    print(f"{bold} The reasoning result is{unbold}: '{generated_texts}'{newline}")

For prompt: 'Answer based on context:

{context}

{question}'

The reasoning result is: '['Yes']'



### 5. Clean up the endpoint

In [None]:
# Delete the SageMaker endpoint
model_predictor.delete_model()
model_predictor.delete_endpoint()