# Generative AI Prompt Engineering

In this lab we will use a famous Encoder-Decoder LLM: Flan-T5. You will first do simple tasks to get your hands dirty.

Then you will learn about few shot prompting and see how, past a certain point, the LLM can no longer do the task.

You will finish by testing the different possible configurations.

## Install Required Dependencies

Install the packages required to use Hugging Face `transformers` and `datasets`. Uncomment the cell below if they are not already installed in your environment.

In [1]:
# !pip install --upgrade pip
# !pip install \
#     transformers==4.35.2 \
#     datasets==2.15.0  --quiet

Import the dataset loader, the Large Language Model (LLM) class, the tokenizer, and the generation configuration class. Do not worry if you do not yet understand all of these components; they are described and discussed later in the notebook.

In [2]:
from datasets import load_dataset
from transformers import AutoModelForSeq2SeqLM
from transformers import AutoTokenizer
from transformers import GenerationConfig

## Doing Simple Tasks with Flan-T5

In this case we will do simple sentiment analysis so that you get the gist of how to use these LLMs. You will use the pre-trained Large Language Model (LLM) FLAN-T5 from Hugging Face. The list of available models in the Hugging Face `transformers` package can be found [here](https://huggingface.co/docs/transformers/index).

In [3]:
huggingface_dataset_name = "imdb"

dataset = load_dataset(huggingface_dataset_name)

In [4]:
dataset

DatasetDict({
    train: Dataset({
        features: ['text', 'label'],
        num_rows: 25000
    })
    test: Dataset({
        features: ['text', 'label'],
        num_rows: 25000
    })
    unsupervised: Dataset({
        features: ['text', 'label'],
        num_rows: 50000
    })
})

Let's sample reviews from the train split; which split we use does not matter for now.

In [5]:
import numpy as np

def get_random_review_and_label():
    # Sample any valid row index of the train split, including index 0
    random_index = np.random.randint(0, len(dataset['train']))
    random_review = dataset['train'][random_index]['text']  # get random review
    label = dataset['train'][random_index]['label']  # get label of that review
    return random_review, label

random_review, label = get_random_review_and_label()

dash_line = '-'.join('' for x in range(100))

print(f'Review: \n\n{random_review}')
print(dash_line)
print(f'Label: {label}')

Review: 

I am oh soooo glad I have not spent money to go to the cinema on it :-). It is nothing more than compilation of elements of few other classic titles like The Thing, Final Fantasy, The Abyss etc. framed in rather dull and meaningless scenario. I really can not figure out what was the purpose of creating this movie - it has absolutely nothing new to offer in its storyline which additionally is also senseless. Moreover there is nothing to watch - the FX'es look like there were taken from a second hand store, you generally saw all of them in other movies. But it is definitely a good lullaby.
---------------------------------------------------------------------------------------------------
Label: 0


Let's now use the model! For that we need to use the Tokenizer to transform the text into the "model language" (more on this during the course). Also we need to download the model.

In [6]:
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

model_name='google/flan-t5-large' # load google's flan-t5


model = AutoModelForSeq2SeqLM.from_pretrained(model_name)
tokenizer = AutoTokenizer.from_pretrained(model_name)




In [7]:
sentence = random_review[:50]
print(f'Review trimmed: {sentence}')

sentence_encoded = tokenizer(sentence)

sentence_decoded = tokenizer.decode(
        sentence_encoded["input_ids"],
        skip_special_tokens=True
    )

print('ENCODED SENTENCE:')
print(sentence_encoded["input_ids"])
print('\nDECODED SENTENCE:')
print(sentence_decoded)

Review trimmed: I am oh soooo glad I have not spent money to go to



ENCODED SENTENCE:
[27, 183, 3, 32, 107, 3, 7, 16780, 3755, 27, 43, 59, 1869, 540, 12, 281, 12, 1]

DECODED SENTENCE:
I am oh soooo glad I have not spent money to go to
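To make the encode/decode round trip more concrete, here is a minimal toy sketch. It assumes a word-level vocabulary built on the fly, which is not how Flan-T5's subword tokenizer works internally; the helper names `build_vocab`, `encode`, and `decode` are hypothetical. It only illustrates the idea that text becomes a list of integer ids, that an end-of-sequence id is appended (the trailing `1` in the output above), and that special tokens can be skipped when decoding.

```python
# Toy word-level tokenizer: an illustration only, NOT Flan-T5's real subword tokenizer.
EOS_ID = 1  # assumed end-of-sequence id, mirroring the trailing 1 in the output above


def build_vocab(text):
    """Assign an integer id to every distinct whitespace-separated word."""
    vocab = {"</s>": EOS_ID}
    for word in text.split():
        if word not in vocab:
            vocab[word] = len(vocab) + 1  # word ids start at 2, after the EOS id
    return vocab


def encode(text, vocab):
    """Map words to ids and append the end-of-sequence id."""
    return [vocab[word] for word in text.split()] + [EOS_ID]


def decode(ids, vocab, skip_special_tokens=True):
    """Map ids back to words, optionally dropping the end-of-sequence token."""
    id_to_word = {idx: word for word, idx in vocab.items()}
    words = [
        id_to_word[idx]
        for idx in ids
        if not (skip_special_tokens and idx == EOS_ID)
    ]
    return " ".join(words)


sentence = "I am oh soooo glad I have not spent money to go to"
vocab = build_vocab(sentence)
ids = encode(sentence, vocab)
print(ids)
print(decode(ids, vocab))
```

A real tokenizer splits rare words such as "soooo" into several subword pieces, which is why the encoded list above is longer than the number of words.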


Now let's call the model. Because it was loaded with `AutoModelForSeq2SeqLM`, it is an LLM for sequence-to-sequence tasks such as summarization or text generation, so let's phrase our prompt accordingly.

In [8]:
# The model and tokenizer loaded earlier are reused here; there is no need to load them again.

inputs = tokenizer("A step by step recipe to make bolognese pasta:", return_tensors="pt")
outputs = model.generate(**inputs)
print(tokenizer.batch_decode(outputs, skip_special_tokens=True))



['Toss the pasta with the sauce, then add the meat and toss again.']


In [9]:
review, label = get_random_review_and_label()

prompt = f"""
Analyze the sentiment of the following review:
====
{review}
====
Sentiment:

"""

print(prompt)  # show the full prompt that is sent to the model

input = tokenizer(prompt, return_tensors="pt")



Analyze the sentiment of the following review:
====
I gave this a 1. There are so many plot twists that you can never be sure to root for. Total mayhem. Everyone gets killed or nearly so. I am tired of cross hairs and changing views. I cannot give the plot away. Convoluted and insane. If I had paid to see this I would demand my money back. I wish reviews were more honest.
====
Sentiment:




In [10]:
tokenizer.batch_decode(model.generate(**input), skip_special_tokens=True)

['negative']

And what was the real sentiment? Remember that in this dataset `0` is negative and `1` is positive.

In [12]:
label

0
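To check more than one review at a time, the generated text has to be mapped back to the dataset's integer labels. The sketch below is one possible way to do that; the helper names `prediction_to_label` and `accuracy` are hypothetical, and it assumes the model answers with the single word "negative" or "positive", which is not guaranteed for every review.

```python
# Mapping from the sentiment word we expect the model to generate to the IMDB label id.
LABEL_IDS = {"negative": 0, "positive": 1}


def prediction_to_label(generated_text):
    """Return 0 or 1 for a generated sentiment word, or None if it is not recognized."""
    return LABEL_IDS.get(generated_text.strip().lower())


def accuracy(predictions, labels):
    """Fraction of generated answers that match the dataset labels."""
    pairs = list(zip(predictions, labels))
    correct = sum(prediction_to_label(text) == label for text, label in pairs)
    return correct / len(pairs)


print(prediction_to_label("negative"))
# An unrecognized answer such as "unsure" simply counts as wrong.
print(accuracy(["negative", "Positive ", "unsure"], [0, 1, 1]))
```

In the notebook, `predictions` would be the decoded strings returned by `tokenizer.batch_decode(...)` and `labels` the corresponding `label` values.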

## Summarize Dialogue without Prompt Engineering

In this use case, you will generate summaries of dialogues with Flan-T5.

Let's load some simple dialogues from the dialogsum Hugging Face dataset. This dataset contains 10,000+ dialogues with the corresponding manually labeled summaries.

In [13]:
huggingface_dataset_name = "knkarthick/dialogsum"

dataset = load_dataset(huggingface_dataset_name)

In [14]:
dataset

DatasetDict({
    train: Dataset({
        features: ['id', 'dialogue', 'summary', 'topic'],
        num_rows: 12460
    })
    validation: Dataset({
        features: ['id', 'dialogue', 'summary', 'topic'],
        num_rows: 500
    })
    test: Dataset({
        features: ['id', 'dialogue', 'summary', 'topic'],
        num_rows: 1500
    })
})

Print a random dialogue with its baseline summary.

In [15]:
def get_random_dialogue_and_summary():
    # Sample any valid row index of the train split, including index 0
    random_index = np.random.randint(0, len(dataset['train']))
    random_dialogue = dataset['train'][random_index]['dialogue']  # get random dialogue
    summary = dataset['train'][random_index]['summary']  # get summary
    return random_dialogue, summary

random_dialogue, summary = get_random_dialogue_and_summary()

dash_line = '-'.join('' for x in range(100))

print(f'Dialogue: \n\n{random_dialogue}')
print(dash_line)
print(f'Summary: {summary}')

Dialogue: 

#Person1#: Can I help you?
#Person2#: Yes, I'm looking for a house.
#Person1#: To buy or to rent?
#Person2#: Oh, to rent.
#Person1#: How much do you want to pay?
#Person2#: About 300 a month.
#Person1#: Well, I've got one here. It's 260 a month.
#Person2#: How big is it?
#Person1#: It's got a kitchen, a bathroom, and one bedroom.
#Person2#: Well, actually I prefer something a bit bigger if that's possible.
#Person1#: Yes, I think so. There is also an interesting one. It's opposite the park.
#Person2#: How much is it?
#Person1#: It's 325 a month. It's the biggest we've got in this area.
#Person2#: What's it like?
#Person1#: Well, There're two bedrooms, a sitting room, a kitchen and a bathroom.
#Person2#: It sounds interesting. Can I go and see it?
#Person1#: Of course, Sir.
---------------------------------------------------------------------------------------------------
Summary: #Person2# wants to rent a big house and #Person1# recommends one opposite the park for 325 a mo

Now it's time to explore how well the base LLM summarizes a dialogue without any prompt engineering. **Prompt engineering** is the practice of rewording the **prompt** (the model's input) to improve the response for a given task.

In [16]:
for i in range(3):
    dialogue, summary = get_random_dialogue_and_summary()
    inputs = tokenizer(dialogue, return_tensors="pt") # Tokenize the dialogue
    output = tokenizer.batch_decode(model.generate(**inputs), skip_special_tokens=True)

    print(dash_line)
    print('Example ', i + 1)
    print(dash_line)
    print(f'Dialogue:\n{dialogue}')
    print(dash_line)
    print(f'Summary:\n{summary}')
    print(dash_line)
    print(f'Model Summary - Without prompt engineering:\n{output}\n')

---------------------------------------------------------------------------------------------------
Example  1
---------------------------------------------------------------------------------------------------
Dialogue:
#Person1#: Can you direct me to some fresh produce that's on sale?
#Person2#: Well, we've got some great mangoes on sale.
#Person1#: Mangoes? What are mangoes?
#Person2#: Well, it's a fruit with a big seed in it.
#Person1#: Can you eat the seed?
#Person2#: No. Peel the skin with a sharp knife, and throw out the seed.
#Person1#: Well, how much are they?
#Person2#: Well, they're on sale today for only $ 1 each.
#Person1#: Can you describe their taste?
#Person2#: They usually taste sweet, but they remind me of an orange.
#Person1#: How can I tell if they're ripe?
#Person2#: You can buy them either ripe or unripe. Unripe ones are hard.
#Person1#: Where do they grow mangoes?
#Person2#: The ones that are on sale are from Central America.
-------------------------------------

You can see that the model's guesses make some sense, but it does not seem to know what task it is supposed to accomplish. It seems to simply make up the next sentence in the dialogue. Prompt engineering can help here.

## Summarize Dialogue with an Instruction Prompt

Prompt engineering is an important concept in using foundation models for text generation.

### Zero Shot Inference with an Instruction Prompt

In order to instruct the model to perform a task - summarize a dialogue - you can take the dialogue and convert it into an instruction prompt. This is often called **zero shot inference**.  
Wrap the dialogue in a descriptive instruction and see how the generated text will change:

In [17]:
for i in range(3):
    dialogue, summary = get_random_dialogue_and_summary()
    prompt = f"""
Summarize the following conversation.
====
{dialogue}
====
Summary:
    """
    inputs = tokenizer(prompt, return_tensors="pt") # Tokenize the instruction prompt, not the raw dialogue
    output = tokenizer.batch_decode(model.generate(**inputs), skip_special_tokens=True)

    print(dash_line)
    print('Example ', i + 1)
    print(dash_line)
    print(f'Dialogue:\n{dialogue}')
    print(dash_line)
    print(f'Summary:\n{summary}')
    print(dash_line)
    print(f'Model Summary - Zero shot inference prompt engineering:\n{output}\n')


---------------------------------------------------------------------------------------------------
Example  1
---------------------------------------------------------------------------------------------------
Dialogue:
#Person1#: My camera has broken down. I'm thinking of buying a new one.
#Person2#: Try MA-205. You won't regret it.
#Person1#: I know this model is of a good quality and design but it's too expensive.
#Person2#: You can buy a cheaper one on the Internet.
#Person1#: On the Internet? How?
#Person2#: Use a search engine and search for cheap MA-205.
#Person1#: A search engine? Em, what's that?
#Person2#: You really live in the stone age. All right, tell me your budget and I'll see whether I can get one for you.
---------------------------------------------------------------------------------------------------
Summary:
#Person2# recommends #Person1# to buy a cheaper MA-205 on the Internet by using a search engine.
------------------------------------------------------------

This is much better! But the model still does not pick up on the nuance of the conversations.
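The instruction wrapper used in the cell above can be factored into a small reusable helper. This is a minimal sketch: the function name `build_zero_shot_prompt` and its `instruction` parameter are hypothetical, and the layout simply mirrors the prompt format used in this notebook.

```python
def build_zero_shot_prompt(dialogue, instruction="Summarize the following conversation."):
    """Wrap a raw dialogue in an instruction so the model knows which task to perform."""
    return f"{instruction}\n====\n{dialogue.strip()}\n====\nSummary:"


example_dialogue = (
    "#Person1#: Can I help you?\n"
    "#Person2#: Yes, I'm looking for a house."
)
prompt = build_zero_shot_prompt(example_dialogue)
print(prompt)
```

The resulting string is what you would pass to the tokenizer in place of the raw dialogue. Changing `instruction` (for example to "What was this conversation about?") is an easy way to compare how different wordings affect the summary.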

## Summarize Dialogue with One Shot and Few Shot Inference

**One shot and few shot inference** are the practices of providing an LLM with one or more full examples of prompt-response pairs that match your task, placed before the actual prompt that you want completed. This is called "in-context learning": the examples in the prompt show the model what your specific task looks like, without changing its weights.

### One Shot Inference



In [18]:
def make_prompt_and_return_real_summary(number_of_shots):
    prompt = 'Summarize the following conversation.\n'
    for i in range(number_of_shots):
        dialogue, summary = get_random_dialogue_and_summary()

        # The stop sequence '{summary}\n\n\n' is important for FLAN-T5. Other models may have their own preferred stop sequence.
        prompt += f"""
Dialogue:
====
{dialogue}
====
Summary: {summary}


"""
    
    dialogue_to_analyze, real_summary = get_random_dialogue_and_summary()

    prompt += f"""
Dialogue:
====
{dialogue_to_analyze}
====
Summary:
"""
    
    return prompt, real_summary

Construct the prompt to perform one shot inference:

In [19]:
one_shot_prompt, real_summary = make_prompt_and_return_real_summary(1)

print(one_shot_prompt)
print(dash_line)
print(real_summary)

Summarize the following conversation.

Dialogue:
====
#Person1#: Mr. Emory? I'd appreciate it if you would look over these letters before you leave today.
#Person2#: I'd be glad to. Just leave them on my desk. I didn't expect you to finish so soon.
#Person1#: Thank you, sir. I'll leave them here. If there are no problems, I'll mail them out this afternoon.
#Person2#: Great. Good work.
====
Summary: #Person1# requests Mr. Emory to check the letters before leaving and he agrees.



Dialogue:
====
#Person1#: Hurry up. It is time for TV.
#Person2#: What are we going to watch?
#Person1#: A football match between Germany and Italy. It will be exciting.
#Person2#: But I am not interested in, football. I like to see a TV film.
#Person1#: Oh. no. You can see a TV film any other day.
#Person2#: There will he oilier football games any other day.
#Person1#: But this game is the most important of the season.
#Person2#: If you insist on watching the game, I will go.
#Person1#: Where are you going? A

Now pass this prompt to perform the one shot inference:

In [20]:
for i in range(3):
  one_shot_prompt, real_summary = make_prompt_and_return_real_summary(1)
  inputs = tokenizer(one_shot_prompt, return_tensors="pt")
  output = tokenizer.batch_decode(model.generate(**inputs), skip_special_tokens=True)

  print(dash_line)
  print(f'Example {i + 1}')
  print(dash_line)
  print(f'Dialogue:\n{one_shot_prompt}')
  print(dash_line)
  print(f'Summary From Dataset:\n{real_summary}')
  print(dash_line)
  print(f'Model Summary - One shot inference prompt engineering:\n{output}\n')

---------------------------------------------------------------------------------------------------
Example 1
---------------------------------------------------------------------------------------------------
Dialogue:
Summarize the following conversation.

Dialogue:
====
#Person1#: I'd like to buy a fridge. What about the quality of higher products? 
#Person2#: I strongly recommend it. As an international enterprise, it produces high quality household appliances. 
#Person1#: Is there a warranty? 
#Person2#: Yes, all their products have warranties. 
#Person1#: How long is it? 
#Person2#: The fridges is covered by a one-year warranty. 
#Person1#: Which model is the best seller of this year? 
#Person2#: This one. How do you like it? 
#Person1#: It's too big for me. Could you recommend something else? 
#Person2#: Sure, this way please. 
====
Summary: #Person1# wants to buy a fridge and asks about Higher products. #Person2# strongly recommends it and offers a detailed introduction.



Dia

Token indices sequence length is longer than the specified maximum sequence length for this model (553 > 512). Running this sequence through the model will result in indexing errors


---------------------------------------------------------------------------------------------------
Example 2
---------------------------------------------------------------------------------------------------
Dialogue:
Summarize the following conversation.

Dialogue:
====
#Person1#: Where are you going?
#Person2#: I'm going to the gym to lift weights. Want to come?
#Person1#: No, thanks. I'm going to prepare for my chemistry midterm. Do you usually just lift weights?
#Person2#: No. I lift to get stronger. Then I swim to help my heart and lungs and I jump rope to improve my balance.
#Person1#: Wow, I wish I had that much training.
#Person2#: Start slowly and a little more each day.
#Person1#: Thanks. Well, have fun.
====
Summary: #Person2#'s going to the gym and suggests #Person1# start slowly and a little more each day if #Person1# wants to take up training.



Dialogue:
====
#Person1#: Congratulations, Francis. Your hard working finally pays off. I am so happy for your promotion.
#Pe

### Few Shot Inference

Let's explore few shot inference by including five full dialogue-summary pairs in your prompt.

In [21]:
for i in range(3):
  few_shot_prompt, real_summary = make_prompt_and_return_real_summary(5)
  inputs = tokenizer(few_shot_prompt, return_tensors="pt")
  output = tokenizer.batch_decode(model.generate(**inputs), skip_special_tokens=True)

  print(dash_line)
  print(f'Example {i + 1}')
  print(dash_line)
  print(f'Dialogue:\n{few_shot_prompt}')
  print(dash_line)
  print(f'Summary:\n{real_summary}')
  print(dash_line)
  print(f'Model Summary - Few shot inference prompt engineering:\n{output}\n')

---------------------------------------------------------------------------------------------------
Example 1
---------------------------------------------------------------------------------------------------
Dialogue:
Summarize the following conversation.

Dialogue:
====
#Person1#: May I recommend you Tsingtao beer?
#Person2#: Tsingtao beer?
#Person1#: Yes, sir. It's one of the best beers in China.
#Person2#: Really?
#Person1#: Yes. The beer is brewed by using carefully selected malts, rice, hops and natural water from the Lao Mountain.
#Person2#: How about its taste?
#Person1#: Fine, sir.
#Person2#: That sounds great. Two Tsingtao beers, please.
#Person1#: Tin or bottle?
#Person2#: Tin, please.
#Person1#: Would you like it on the rocks, sir?
#Person2#: No, thank you.
#Person1#: You're welcome.
====
Summary: #Person1# recommends Tsingtao beer to #Person2# and #Person2# orders two tins.



Dialogue:
====
#Person1#: Hello, this is the International Student Office. This is Leah. How may

In this case, few shot inference did not provide much of an improvement over one shot inference, and going beyond 5 or 6 shots typically does not help much either. You also need to make sure that the prompt does not exceed the model's input context length, which in our case is 512 tokens (note the tokenizer warning in the one shot output above). Anything beyond the context length is not reliably used by the model and, if truncation is enabled in the tokenizer, is dropped entirely.

However, you can see that feeding in at least one full example (one shot) provides the model with more information and qualitatively improves the summary overall.
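One way to respect the 512-token limit is to add examples only while the whole prompt still fits. The sketch below illustrates that idea under stated assumptions: `select_shots` and `count_tokens_approx` are hypothetical helpers, and the default token counter is a crude whitespace word count standing in for the real tokenizer. In the notebook you would pass a counter based on the length of the tokenizer's `input_ids` instead.

```python
def count_tokens_approx(text):
    """Crude stand-in for a real tokenizer: counts whitespace-separated words."""
    return len(text.split())


def select_shots(examples, query_prompt, max_tokens=512, count_tokens=count_tokens_approx):
    """Keep examples, in order, while examples plus the final query fit in max_tokens."""
    budget = max_tokens - count_tokens(query_prompt)
    kept = []
    for example in examples:
        cost = count_tokens(example)
        if cost > budget:
            break  # the next example would push the prompt past the limit
        kept.append(example)
        budget -= cost
    return kept


# Three synthetic 200-"token" examples and a 100-"token" query.
examples = ["word " * 200, "word " * 200, "word " * 200]
query = "word " * 100
kept = select_shots(examples, query)
print(f"Kept {len(kept)} of {len(examples)} examples")
```

The query is budgeted first because it is the one part of the prompt that must never be cut; the examples are optional context.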

## Configuration Parameters

In [28]:
from transformers import GenerationConfig

for step in range(1, 10):
    temperature = step / 10  # sweep the temperature from 0.1 to 0.9
    # Sampling config: the higher the temperature, the more varied the output
    generation_config = GenerationConfig(
        max_new_tokens=111, do_sample=True, temperature=temperature
    )

    inputs = tokenizer(few_shot_prompt, return_tensors="pt")
    output = tokenizer.batch_decode(model.generate(**inputs, generation_config=generation_config), skip_special_tokens=True)
    
    print(dash_line)
    print(f'MODEL GENERATION - FEW SHOT, TEMP={temperature}:\n{output}')
    print(dash_line)
    print(f'BASELINE HUMAN SUMMARY:\n{real_summary}\n')

---------------------------------------------------------------------------------------------------
MODEL GENERATION - FEW SHOT, TEMP=0.1:
['Person2 has just given birth to triplets. She is worried about the birth and labor.']
---------------------------------------------------------------------------------------------------
BASELINE HUMAN SUMMARY:
#Person1# thinks it awesome that #Person2#'s got triplets, but #Person2#'s exhausted. #Person2#'s delivery went smoothly.

---------------------------------------------------------------------------------------------------
MODEL GENERATION - FEW SHOT, TEMP=0.2:
['Person2 has just had her third child.']
---------------------------------------------------------------------------------------------------
BASELINE HUMAN SUMMARY:
#Person1# thinks it awesome that #Person2#'s got triplets, but #Person2#'s exhausted. #Person2#'s delivery went smoothly.

--------------------------------------------------------------------------------------------------

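Comments related to the choice of the parameters in the code cell above:
- Choosing a small value such as `max_new_tokens=10` would make the output text too short, so the dialogue summary would be cut off. The cell above uses `max_new_tokens=111` to leave room for a full summary.
- Setting `do_sample=True` and changing the `temperature` value gives you more variety in the output: low temperatures stay close to the most likely tokens, while higher temperatures produce more diverse text.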
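To see why temperature changes the variety of sampled text, it helps to look at what it does to the next-token probabilities. The sketch below is a simplified illustration with made-up logits for three candidate tokens; `softmax_with_temperature` is a hypothetical helper, not a function from `transformers`. It divides the logits by the temperature before applying softmax, which is the usual way temperature is applied during sampling.

```python
import math


def softmax_with_temperature(logits, temperature):
    """Turn raw scores into probabilities after dividing them by the temperature."""
    scaled = [score / temperature for score in logits]
    largest = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(score - largest) for score in scaled]
    total = sum(exps)
    return [value / total for value in exps]


# Made-up scores for three candidate next tokens.
logits = [2.0, 1.0, 0.1]
for temperature in (0.1, 0.5, 1.0):
    probs = softmax_with_temperature(logits, temperature)
    print(temperature, [round(p, 3) for p in probs])
```

At a low temperature almost all of the probability mass sits on the top-scoring token, so sampling behaves nearly like greedy decoding; as the temperature rises, the distribution flattens and less likely tokens are picked more often.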

As you can see, prompt engineering can take you a long way for this use case, but there are some limitations.