# Generative AI Prompt Engineering

In this lab you will use a well-known encoder-decoder LLM: Flan-T5. You will start with a few simple tasks to get hands-on experience.

Then you will learn about few shot prompting and see where the LLM reaches its limits on the task.

You will finish by experimenting with the generation configuration parameters.

## Install Required Dependencies

Now install the required packages to use Hugging Face transformers and datasets.

In [1]:
!pip install --upgrade pip
!pip install transformers==4.35.2 datasets==2.15.0  --quiet


Load the datasets, Large Language Model (LLM), tokenizer, and configurator. Do not worry if you do not yet understand all of these components; they will be described and discussed later in the notebook.

In [2]:
from datasets import load_dataset
from transformers import TFAutoModelForSeq2SeqLM
from transformers import AutoTokenizer
from transformers import GenerationConfig



## Doing Simple Tasks with Flan-T5

In this case we will do simple sentiment analysis so that you get the gist of how to use these LLMs. You will use the pre-trained Large Language Model (LLM) FLAN-T5 from Hugging Face. The list of available models in the Hugging Face `transformers` package can be found [here](https://huggingface.co/docs/transformers/index).

In [3]:
huggingface_dataset_name = "imdb"

dataset = load_dataset(huggingface_dataset_name)  # Load the dataset

The secret `HF_TOKEN` does not exist in your Colab secrets.
To authenticate with the Hugging Face Hub, create a token in your settings tab (https://huggingface.co/settings/tokens), set it as secret in your Google Colab and restart your session.
You will be able to reuse this secret in all of your notebooks.
Please note that authentication is recommended but still optional to access public models or datasets.


In [35]:
dataset

DatasetDict({
    train: Dataset({
        features: ['id', 'dialogue', 'summary', 'topic'],
        num_rows: 12460
    })
    validation: Dataset({
        features: ['id', 'dialogue', 'summary', 'topic'],
        num_rows: 500
    })
    test: Dataset({
        features: ['id', 'dialogue', 'summary', 'topic'],
        num_rows: 1500
    })
})

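Let's sample from the train split; for our purposes any split would do. Note that the `DatasetDict` output above lists the DialogSum columns (`id`, `dialogue`, `summary`, `topic`): that cell's execution counter (35) indicates it was re-run after the dataset was reassigned later in the notebook. The IMDB rows used below expose `text` and `label` fields.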

In [4]:
import numpy as np
def get_random_review_and_label():
  # Pick a random row from the train split (the upper bound is exclusive)
  random_index = np.random.randint(0, len(dataset['train']))
  random_review = dataset['train'][random_index]['text']
  label = dataset['train'][random_index]['label']
  return random_review, label

random_review, label = get_random_review_and_label()

dash_line = '-'.join('' for x in range(100))

print(f'Review: \n\n{random_review}')
print(dash_line)
print(f'Label: {label}')

Review: 

It could have been a better film. It does drag at points, and the central story shifts from Boyer completing his mission to Boyer avenging Wanda Hendrix's death, but Graham Greene is an author who is really hard to spoil. His stories are all morality tales, due to his own considerations of Catholicism, guilt and innocence (very relative terms in his world view), and the human condition.<br /><br />Boyer is Luis Denard, a well-known concert pianist, who has sided with the Republicans in the Spanish Civil War. He has been sent to England to try to carry through an arms purchase deal that is desperately needed. Unfortunately for Denard he is literally on his own - everyone of his contacts turns out to be a willing turncoat for the Falagists of Spain. In particular Katina Paxinou (Mrs. Melendez) a grim boarding house keeper, and Peter Lorre (Mr. Contreras) a teacher of an "esperanto" type international language. Wanda Hendrix is the drudge of a girl (Else) who works for Mrs. Mele

Let's now use the model! First we need the tokenizer to transform the text into the numeric token IDs that the model consumes (more on this during the course). We also need to download the model. Remember that we are going to use a `TFAutoModelForSeq2SeqLM` model.

In [5]:
model_name = 'google/flan-t5-base'


tokenizer = AutoTokenizer.from_pretrained(model_name)
model = TFAutoModelForSeq2SeqLM.from_pretrained(model_name)



All PyTorch model weights were used when initializing TFT5ForConditionalGeneration.

All the weights of TFT5ForConditionalGeneration were initialized from the PyTorch model.
If your task is similar to the task the model of the checkpoint was trained on, you can already use TFT5ForConditionalGeneration for predictions without further training.


In [22]:
sentence = random_review[:50]
print(f'Review trimmed: {sentence}')

sentence_encoded = tokenizer(sentence)

sentence_decoded = tokenizer.decode(
        sentence_encoded["input_ids"],
        skip_special_tokens=True
    )

print('ENCODED SENTENCE:')
print(sentence_encoded["input_ids"])
print('\nDECODED SENTENCE:')
print(sentence_decoded)

Review trimmed: This was the second of the series of 6 "classic Ta
ENCODED SENTENCE:
[100, 47, 8, 511, 13, 8, 939, 13, 431, 96, 4057, 447, 2067, 1]

DECODED SENTENCE:
This was the second of the series of 6 "classic Ta
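
The trailing `1` in the encoded IDs above is T5's end-of-sequence token (`</s>`), and `skip_special_tokens=True` drops such tokens when decoding. The toy sketch below mimics that filtering step on the IDs printed above. The constants and the `strip_special_ids` helper are hypothetical names used for illustration only; this is not the tokenizer's actual implementation.

```python
# Toy illustration only: the real tokenizer handles this internally.
# Assumption: for T5-family tokenizers, id 0 is <pad> and id 1 is </s>.
PAD_ID = 0
EOS_ID = 1

def strip_special_ids(token_ids, special_ids=(PAD_ID, EOS_ID)):
    """Return token_ids without the given special-token ids."""
    return [t for t in token_ids if t not in special_ids]

# The encoded sentence printed in the cell above
encoded = [100, 47, 8, 511, 13, 8, 939, 13, 431, 96, 4057, 447, 2067, 1]
print(strip_special_ids(encoded))
```

Only the final `1` is removed here, which is why the decoded sentence matches the trimmed review exactly.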


Now let's call the model. `TFAutoModelForSeq2SeqLM` loads a sequence-to-sequence (seq2seq) LLM, which maps an input text to an output text, as in summarization or text generation, so let's phrase our prompt accordingly.

In [6]:
import tensorflow as tf
review, label = get_random_review_and_label()

prompt = f"""
Analyze the sentiment of the following review:

{review}

Sentiment:

"""

input = tokenizer(prompt)

In [7]:
model.generate(tf.constant([input['input_ids']]), max_new_tokens=50)

<tf.Tensor: shape=(1, 3), dtype=int32, numpy=array([[   0, 1465,    1]], dtype=int32)>

In [8]:
tokenizer.decode(
        model.generate(tf.constant([input['input_ids']]), max_new_tokens=50)[0],
        skip_special_tokens=True
    )

'positive'

And what was the real sentiment? Remember that in this dataset `0` is negative and `1` is positive.

In [9]:
label

1
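
A single example says little about how reliable the model is. As a minimal sketch, the generated text can be mapped back to the dataset's integer labels and scored over several reviews. The helper names `sentiment_to_label` and `accuracy` and the sample outputs are hypothetical and are not part of the lab; in practice the outputs would be the decoded generations from the cells above.

```python
def sentiment_to_label(generated_text):
    """Map free-text model output to the dataset's integer labels.

    Returns 1 for positive, 0 for negative, and None when the text
    contains neither word (or both), so ambiguous outputs are not guessed.
    """
    text = generated_text.strip().lower()
    has_pos = "positive" in text
    has_neg = "negative" in text
    if has_pos == has_neg:
        return None
    return 1 if has_pos else 0

def accuracy(predictions, labels):
    """Fraction of examples whose mapped prediction equals the true label."""
    if not labels:
        raise ValueError("labels must not be empty")
    correct = sum(sentiment_to_label(p) == y for p, y in zip(predictions, labels))
    return correct / len(labels)

# Hypothetical outputs standing in for decoded model generations
example_outputs = ["positive", "Negative", "positive", "mixed"]
example_labels = [1, 0, 0, 1]
print(accuracy(example_outputs, example_labels))
```

Returning `None` for unrecognized outputs counts them as errors rather than silently assigning them to a class.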

## Summarize Dialogue without Prompt Engineering

In this use case, you will generate summaries of dialogues with Flan-T5.

Let's load some simple dialogues from the DialogSum Hugging Face dataset. This dataset contains 10,000+ dialogues with the corresponding manually labeled summaries.

In [10]:
huggingface_dataset_name = "knkarthick/dialogsum"

dataset = load_dataset(huggingface_dataset_name)  # Load the dataset

In [16]:
print(dataset.keys())
dataset.shape

dict_keys(['train', 'validation', 'test'])


{'train': (12460, 4), 'validation': (500, 4), 'test': (1500, 4)}

Print a random dialogue with its baseline summary.

In [18]:
def get_random_dialogue_and_summary():
  # Pick a random row from the train split (the upper bound is exclusive)
  random_index = np.random.randint(0, len(dataset['train']))
  random_dialogue = dataset['train'][random_index]['dialogue']
  summary = dataset['train'][random_index]['summary']
  return random_dialogue, summary

random_dialogue, summary = get_random_dialogue_and_summary()

dash_line = '-'.join('' for x in range(100))

print(f'Dialogue: \n\n{random_dialogue}')
print(dash_line)
print(f'Summary: {summary}')

Dialogue: 

#Person1#: I'd like a cup of coffee and a cheeseburger, please.
#Person2#: I'm sorry, but we don't have any burgers at the moment.
#Person1#: But you always serve your whole menu for breakfast, lunch and dinner. That's why I come here.
#Person2#: You're right, but one of our cooks is sick. So we had to take some things off the menu for a while. If you want to come back in half an hour, we'll definitely have our normal lunch menu.
#Person1#: That's OK, I'm really hungry. Let me see. I'll still take the coffee and I'll have a bacon and egg sandwich, instead, please.
#Person2#: Do you want breakfast potatoes with that?
#Person1#: No, thank you.
#Person2#: OK, your total is $6.50.
#Person1#: Here is a 10.
#Person2#: And here's your change and receipt.
---------------------------------------------------------------------------------------------------
Summary: #Person1# wants a cheeseburger and a coffee, but the burger isn't available at the moment, so #Person1# takes an egg sand


Now it's time to explore how well the base LLM summarizes a dialogue without any prompt engineering. **Prompt engineering** is the practice of modifying the **prompt** (input) to improve the response for a given task.

In [20]:
for i in range(3):
    dialogue, summary = get_random_dialogue_and_summary()
    inputs = tokenizer(dialogue)
    model_output =  model.generate(tf.constant([inputs['input_ids']]), max_new_tokens=50)[0] # Generate model output
    output = tokenizer.decode(
        model_output,
        skip_special_tokens=True
    )

    print(dash_line)
    print('Example ', i + 1)
    print(dash_line)
    print(f'Dialogue:\n{dialogue}')
    print(dash_line)
    print(f'Summary:\n{summary}')
    print(dash_line)
    print(f'Model Summary - Without prompt engineering:\n{output}\n')

Token indices sequence length is longer than the specified maximum sequence length for this model (1188 > 512). Running this sequence through the model will result in indexing errors


---------------------------------------------------------------------------------------------------
Example  1
---------------------------------------------------------------------------------------------------
Dialogue:
#Person1#: Hi, Tim. So, are you doing some last-minute shopping before the weekend?
#Person2#: Well, actually, I'm looking for supplies to put together 72-hour kits for each member of my family.
#Person1#: [A] 72-hour kit? What's that?
#Person2#: Basically, a 72-hour kit contains emergency supplies you would need to sustain yourself for three days in case of an emergency, like an earthquake.
#Person1#: An earthquake?! We haven't had an earthquake in years.
#Person2#: Well, you never know; you have to be prepared. Hey, if earthquakes don't get you, it could be a flood, hurricane, snowstorm, power outage, fire, alien attack. [Alien attack!] Well, you never know. Think of any situation in which you might find yourself without the basic necessities of life, including shelt

You can see that the model's guesses make some sense, but it does not seem to know which task it is supposed to accomplish. It appears to simply make up the next sentence of the dialogue. Prompt engineering can help here.

## Summarize Dialogue with an Instruction Prompt

Prompt engineering is an important concept in using foundation models for text generation.

### Zero Shot Inference with an Instruction Prompt

To instruct the model to perform a task (here, summarizing a dialogue), you can wrap the dialogue in an instruction prompt. Because the prompt contains no worked examples, this is often called **zero shot inference**.  
Wrap the dialogue in a descriptive instruction and see how the generated text changes:

In [21]:
for i in range(3):
    dialogue, summary = get_random_dialogue_and_summary()
    prompt = f"""
Summarize the following conversation.

{dialogue}

Summary:
    """
    inputs = tokenizer(prompt)
    output = tokenizer.decode(
        model.generate(tf.constant([inputs['input_ids']]), max_new_tokens=50)[0],
        skip_special_tokens=True
    )

    print(dash_line)
    print('Example ', i + 1)
    print(dash_line)
    print(f'Dialogue:\n{dialogue}')
    print(dash_line)
    print(f'Summary:\n{summary}')
    print(dash_line)
    print(f'Model Summary - Zero shot inference prompt engineering:\n{output}\n')


---------------------------------------------------------------------------------------------------
Example  1
---------------------------------------------------------------------------------------------------
Dialogue:
#Person1#: Hello. Can I help you?
#Person2#: Hello. Is my laundry ready? My room number is 210.
#Person1#: I'm afraid it is still being washed.
#Person2#: Can you take the stain off?
#Person1#: Yes, we can. But you need wait a moment.
#Person2#: That's right. Can I get it back in the afternoon? I really need them tonight.
#Person1#: Yes, it will be ready then.
#Person2#: OK. By the way, please get them pressed.
#Person1#: No problem.
---------------------------------------------------------------------------------------------------
Summary:
#Person2# requests #Person1# to get the laundry ready and pressed by the afternoon.
---------------------------------------------------------------------------------------------------
Model Summary - Zero shot inference prompt engin

This is much better! But the model still does not pick up on the nuance of the conversations.
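
Since the wording of the instruction influences the result, it can be useful to keep several templates side by side and compare their outputs. The sketch below is one way to organize that. `TEMPLATES`, `build_zero_shot_prompt`, and the second template's wording are hypothetical; only the first template mirrors the prompt used in the cell above.

```python
# Hypothetical instruction templates; only "summarize" mirrors this notebook.
TEMPLATES = {
    "summarize": "Summarize the following conversation.\n\n{dialogue}\n\nSummary:\n",
    "what_happened": "Dialogue:\n\n{dialogue}\n\nWhat was going on?\n",
}

def build_zero_shot_prompt(dialogue, template_name="summarize"):
    """Wrap a dialogue in the chosen instruction template."""
    return TEMPLATES[template_name].format(dialogue=dialogue)

sample = "#Person1#: Is my laundry ready?\n#Person2#: It will be ready this afternoon."
for name in TEMPLATES:
    print(f"--- {name} ---")
    print(build_zero_shot_prompt(sample, name))
```

Each resulting string can be passed to the tokenizer and `model.generate` exactly as in the loop above.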

## Summarize Dialogue with One Shot and Few Shot Inference

**One shot and few shot inference** are the practices of providing an LLM with one or more full examples of prompt-response pairs that match your task, placed before the actual prompt that you want completed. This is called "in-context learning" and conditions the model on your specific task.  

### One Shot Inference



In [22]:
def make_prompt_and_return_real_summary(number_of_shots):
    prompt = ''
    for i in range(number_of_shots):
        dialogue, summary = get_random_dialogue_and_summary()

        # The stop sequence '{summary}\n\n\n' is important for FLAN-T5. Other models may have their own preferred stop sequence.
        prompt += f"""
Dialogue:

{dialogue}

Summary:

{summary}


"""

    dialogue_to_analyze, real_summary = get_random_dialogue_and_summary()

    prompt += f"""
Dialogue:

{dialogue_to_analyze}

Summary:

"""

    return prompt, real_summary
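
The function above draws its examples at random, so each call produces a different prompt. When comparing runs, a deterministic variant that takes the examples as arguments may be easier to reason about. The sketch below reproduces the same prompt layout. `build_few_shot_prompt` and the sample pairs are hypothetical and not part of the lab.

```python
def build_few_shot_prompt(examples, query_dialogue):
    """Build an in-context prompt from explicit (dialogue, summary) pairs.

    Uses the same layout as the random version: each example ends with
    the summary followed by blank lines, and the final block leaves the
    summary empty for the model to complete.
    """
    parts = []
    for dialogue, summary in examples:
        parts.append(f"\nDialogue:\n\n{dialogue}\n\nSummary:\n\n{summary}\n\n\n")
    parts.append(f"\nDialogue:\n\n{query_dialogue}\n\nSummary:\n\n")
    return "".join(parts)

# Hypothetical example pairs for illustration
shots = [
    ("#Person1#: Are you free tonight?\n#Person2#: Yes, let's have dinner.",
     "#Person1# and #Person2# agree to have dinner tonight."),
]
query = "#Person1#: Is the report done?\n#Person2#: I will send it tomorrow."
print(build_few_shot_prompt(shots, query))
```

Passing an empty list of examples yields a zero shot prompt in the same format.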

Construct the prompt to perform one shot inference:

In [23]:
one_shot_prompt, real_summary = make_prompt_and_return_real_summary(1)

print(one_shot_prompt)


Dialogue:

#Person1#: Have you ever been to Xi ' an?
#Person2#: Yes, I ' Ve been there several times on business trips. But I have never really seen the terra-cotta warriors as it is outside the city.
#Person1#: I ' Ve heard many people saying that it is a place worth touring. I really want to see the old walls and terra-cotta warriors one day. Of course I won ' t miss the local food either. You know, the sites interests a food in scenery, food is a key factor when visiting a place.
#Person2#: I agree. As long as the food is not too bizarre once I saw some people eating insects. That is frightening.
#Person1#: Sure it is. Is it convenient to get there by plane?
#Person2#: Well, the airport is quite far from the downtown area, but it is still more convenient than taking the train.

Summary:

#Person1# and #Person2# are talking about Xi'an. #Person1# wants to see the site-interests and try the local food. #Person2# tells #Person1# it's more convenient to go to Xi'an by air.



Dialogue:

Now pass this prompt to perform the one shot inference:

In [24]:
for i in range (3):
  one_shot_prompt, real_summary = make_prompt_and_return_real_summary(1)
  inputs = tokenizer(one_shot_prompt)
  output = tokenizer.decode(
      model.generate(tf.constant([inputs['input_ids']]), max_new_tokens=50)[0],
      skip_special_tokens=True
  )  # Decode the generated ids, dropping special tokens such as <pad> and </s>
  print(dash_line)
  print(f'Example {i + 1}')
  print(dash_line)
  print(f'Dialogue:\n{one_shot_prompt}')
  print(dash_line)
  print(f'Summary:\n{real_summary}')
  print(dash_line)
  print(f'Model Summary - One shot inference prompt engineering:\n{output}\n')

---------------------------------------------------------------------------------------------------
Example 1
---------------------------------------------------------------------------------------------------
Dialogue:

Dialogue:

#Person1#: Good morning. Can I help you?
#Person2#: Yes. I wonder if you have a one-bedroom apartment to rent.
#Person1#: Let me check. Yes, we have one. It's on Nanjing Street, near a shopping center and a subway station.
#Person2#: Sounds nice. Does it face south?
#Person1#: Well. the bedroom faces east and the living room north. But it looks out on a beautiful park.
#Person2#: Mmm, is the living room large?
#Person1#: Yes. it's quite big. And there's a small kitchen and a bathroom as well. It's very comfortable.
#Person2#: Well, what's the rent per month?
#Person1#: 800 yuan.
#Person2#: Mmm. it's more than I have in mind. Let me think it over. I'll call you back in a day or two.
#Person1#: Certainly.

Summary:

#Person2# wants to rent an apartment. #Perso

### Few Shot Inference

Let's explore few shot inference by placing five full dialogue-summary pairs in your prompt.

In [25]:
for i in range (3):
  few_shot_prompt, real_summary = make_prompt_and_return_real_summary(5)
  inputs = tokenizer(few_shot_prompt)
  output = tokenizer.decode(
      model.generate(tf.constant([inputs['input_ids']]), max_new_tokens=50)[0],
      skip_special_tokens=True
  )

  print(dash_line)
  print(f'Example {i + 1}')
  print(dash_line)
  print(f'Dialogue:\n{few_shot_prompt}')
  print(dash_line)
  print(f'Summary:\n{real_summary}')
  print(dash_line)
  print(f'Model Summary - Few shot inference prompt engineering:\n{output}\n')

---------------------------------------------------------------------------------------------------
Example 1
---------------------------------------------------------------------------------------------------
Dialogue:

Dialogue:

#Person1#: Are you ready to go shopping?
#Person2#: Not yet. I'm not finished with my research yet.
#Person1#: What research?
#Person2#: Reading my fashion magazines! How do you think I know so much about all the latest trends?
#Person1#: But they're just ads. . .
#Person2#: Duh. . . That's the point. The people in the ads are wearing what's in. Plus, there are articles on new trends. . .

Summary:

#Person1# wants to go shopping with #Person2# but #Person2# hasn't finished reading fashion magazines.



Dialogue:

#Person1#: Mom, I'm starving.
#Person2#: Here are some biscuits. Why are you back so early today?
#Person1#: My teacher had a sudden stomachache, so the class was cut shot. You?
#Person2#: Me what?
#Person1#: You are cooking at least two hours earl

In this case, few shot did not provide much of an improvement over one shot inference. Going beyond 5 or 6 shots typically does not help much either. You also need to make sure that you do not exceed the model's input context length, which in our case is 512 tokens: the tokenizer warning shown earlier (`1188 > 512`) flags exactly this. Content beyond the context length is either cut off (when truncation is enabled) or falls outside what the model was trained to handle.
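
One way to respect the context length is to add examples only while the prompt still fits a token budget. The sketch below does this greedily. `select_shots_within_budget` is a hypothetical helper, and its default token counter (whitespace-separated words) is a crude stand-in that undercounts real subword tokens.

```python
def select_shots_within_budget(examples, query, max_tokens=512, count_tokens=None):
    """Greedily keep examples, in order, while the total stays within max_tokens.

    Returns the kept examples and the number of tokens used, including
    the query, which is always counted first.
    """
    if count_tokens is None:
        # Stand-in assumption: whitespace-separated words approximate tokens.
        count_tokens = lambda text: len(text.split())
    used = count_tokens(query)
    kept = []
    for example in examples:
        cost = count_tokens(example)
        if used + cost > max_tokens:
            break  # Stop at the first example that would overflow the budget
        kept.append(example)
        used += cost
    return kept, used

# Tiny budget so the cutoff is visible
kept, used = select_shots_within_budget(["a b c", "d e", "f g h i"], "q r", max_tokens=6)
print(kept, used)
```

For realistic counts, the notebook's tokenizer could be plugged in, for example `count_tokens=lambda text: len(tokenizer(text)["input_ids"])`. That slightly overcounts, because each call appends an end-of-sequence token.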

However, you can see that feeding in at least one full example (one shot) provides the model with more information and qualitatively improves the summary overall.
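
To complement the qualitative comparison, a rough word-overlap score between the model summary and the human summary can be computed. The sketch below is a simplified unigram F1 in the spirit of ROUGE-1. It is not the official ROUGE implementation, and `unigram_f1` is a hypothetical helper.

```python
from collections import Counter

def unigram_f1(candidate, reference):
    """Harmonic mean of unigram precision and recall (case-insensitive)."""
    cand = Counter(candidate.lower().split())
    ref = Counter(reference.lower().split())
    # Multiset intersection: each word counts at most as often as in both texts
    overlap = sum((cand & ref).values())
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)

human = "#Person2# suggests they eat out."
model_out = "#Person2# suggests eating out."
print(round(unigram_f1(model_out, human), 3))
```

Word overlap ignores paraphrases and word order, so the score is only a coarse signal alongside reading the summaries.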

## Configuration Parameters

In [26]:
generation_config = GenerationConfig(max_new_tokens=100, do_sample=True, temperature=2.0)

inputs = tokenizer(few_shot_prompt)
output = tokenizer.decode(
    model.generate(tf.constant([inputs['input_ids']]), generation_config=generation_config)[0],
    skip_special_tokens=True
)

print(dash_line)
print(f'MODEL GENERATION - FEW SHOT:\n{output}')
print(dash_line)
print(f'BASELINE HUMAN SUMMARY:\n{real_summary}\n')

---------------------------------------------------------------------------------------------------
MODEL GENERATION - FEW SHOT:
People are hungry. #S vecm to meet the other indiana diner who offered them suggestions. One can learn how do cost in restaurant food as they eat out almost every other day of it....
---------------------------------------------------------------------------------------------------
BASELINE HUMAN SUMMARY:
#Person2# suggests they eat out. But #Person1# wants a home-cook meal, because #Person2# ate out almost every day last week, and promises to cook.



Comments on the choice of parameters in the code cell above:
- A small value such as `max_new_tokens=10` makes the output too short, so the dialogue summary gets cut off.
- Setting `do_sample=True` switches from deterministic decoding to sampling, and the temperature controls how random the sampling is: higher values give more varied output, and at `temperature=2.0` the output above is already largely incoherent.
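
The effect of temperature can be illustrated without the model. During sampling, the scores (logits) for the next token are divided by the temperature before the softmax, so larger temperatures flatten the distribution. The sketch below uses made-up logits for three candidate tokens; `softmax_with_temperature` is a hypothetical helper.

```python
import math

def softmax_with_temperature(logits, temperature=1.0):
    """Convert logits to probabilities after dividing by the temperature."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # Subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

logits = [4.0, 2.0, 1.0]  # Hypothetical scores for three candidate tokens
for t in (0.5, 1.0, 2.0):
    probs = softmax_with_temperature(logits, t)
    print(t, [round(p, 3) for p in probs])
```

At low temperature almost all probability mass sits on the top-scoring token. At high temperature unlikely tokens are sampled more often, which matches the erratic output seen above.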

As you can see, prompt engineering can take you a long way for this use case, but there are some limitations.