# Generative AI Prompt Engineering

In this lab we will use a famous Encoder-Decoder LLM: Flan-T5. You will first do simple tasks to get your hands dirty.

Then you will learn about few shot prompting, and see how at a certain point the LLM just cannot do the task.

You will finish by testing the different possible configurations.

## Install Required Dependencies

Now install the required packages to use Hugging Face transformers and datasets.

In [None]:
!pip install --upgrade pip
!pip install transformers==4.35.2 datasets==2.15.0  --quiet

Collecting pip
  Downloading pip-23.3.2-py3-none-any.whl (2.1 MB)
Installing collected packages: pip
  Attempting uninstall: pip
    Found existing installation: pip 23.1.2
    Uninstalling pip-23.1.2:
      Successfully uninstalled pip-23.1.2
Successfully installed pip-23.3.2

Import the tools for loading the datasets, the Large Language Model (LLM), the tokenizer, and the generation configuration. Do not worry if you do not yet understand all of these components; they will be described and discussed later in the notebook.

In [None]:
from datasets import load_dataset
from transformers import TFAutoModelForSeq2SeqLM
from transformers import AutoTokenizer
from transformers import GenerationConfig

## Doing Simple Tasks with Flan-T5

In this case we will do simple sentiment analysis so you get the gist of how to use these LLMs. You will use the pre-trained Large Language Model (LLM) FLAN-T5 from Hugging Face. The list of available models in the Hugging Face `transformers` package can be found [here](https://huggingface.co/docs/transformers/index).

In [None]:
huggingface_dataset_name = "imdb"

dataset = None # Load the dataset

In [None]:
dataset

DatasetDict({
    train: Dataset({
        features: ['text', 'label'],
        num_rows: 25000
    })
    test: Dataset({
        features: ['text', 'label'],
        num_rows: 25000
    })
    unsupervised: Dataset({
        features: ['text', 'label'],
        num_rows: 50000
    })
})

Let's sample a review from the train split; for our purposes any split would do.

In [None]:
import numpy as np
def get_random_review_and_label():
  # randint excludes the upper bound, so this covers every row, including index 0
  random_index = np.random.randint(0, len(dataset['train']))
  random_review = dataset['train'][random_index]['text']
  label = dataset['train'][random_index]['label']
  return random_review, label

random_review, label = get_random_review_and_label()

dash_line = '-' * 99

print(f'Review: \n\n{random_review}')
print(dash_line)
print(f'Label: {label}')

Review: 

First off, I agree with quite a bit that escapes Mr. Chomsky's mouth. His matter-of-fact delivery of interesting counterpoint is what makes the man a hit on the university campus circus. He comes across likable, unassuming, pragmatic. He doesn't cater to the current political style (obnoxious bi-partisanship) and he sets his sights on the far left as well as the far right, chastising both, and for good reason.<br /><br />Unfortunately, the film itself is a dud. In fact, I would not even call this a documentary but rather just a collection of speeches. Watching "Rebel Without a Pause" is no different from watching a speaker on a 3am taped segment on CSPAN. There are no camera movements, no edits, no stylistic touches. There is no story, no narrative.<br /><br />Technically speaking, the production is strictly amateurish. Audio is terrible and inconsistent; sometimes we cannot hear Noam speak, other times we cannot hear the questions that are being posited by those in attendanc

Let's now use the model! For that we need the tokenizer to transform the text into the "model language" (more on this during the course). We also need to download the model. Remember that we are going to use a `TFAutoModelForSeq2SeqLM` model.

In [None]:
model_name='google/flan-t5-base'

model = None  # Load model
tokenizer = None # Load tokenizer

All PyTorch model weights were used when initializing TFT5ForConditionalGeneration.

All the weights of TFT5ForConditionalGeneration were initialized from the PyTorch model.
If your task is similar to the task the model of the checkpoint was trained on, you can already use TFT5ForConditionalGeneration for predictions without further training.


In [None]:
sentence = random_review[:50]
print(f'Review trimmed: {sentence}')

sentence_encoded = tokenizer(sentence)

sentence_decoded = tokenizer.decode(
        sentence_encoded["input_ids"],
        skip_special_tokens=True
    )

print('ENCODED SENTENCE:')
print(sentence_encoded["input_ids"])
print('\nDECODED SENTENCE:')
print(sentence_decoded)

Review trimmed: First off, I agree with quite a bit that escapes M
ENCODED SENTENCE:
[1485, 326, 6, 27, 2065, 28, 882, 3, 9, 720, 24, 6754, 7, 283, 1]

DECODED SENTENCE:
First off, I agree with quite a bit that escapes M


Now let's call the model. `TFAutoModelForSeq2SeqLM` loads an LLM for sequence-to-sequence (seq2seq) tasks, such as summarization or text generation, so let's phrase our prompt accordingly.

In [None]:
import tensorflow as tf
review, label = get_random_review_and_label()

prompt = f"""
Analyze the sentiment of the following review:

{review}

Sentiment:

"""

input = tokenizer(prompt)

Token indices sequence length is longer than the specified maximum sequence length for this model (685 > 512). Running this sequence through the model will result in indexing errors
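The warning above says the prompt was encoded into 685 tokens, more than the 512 this model is configured for. Hugging Face tokenizers accept `truncation=True` together with `max_length` to cut the input at encoding time. The sketch below only illustrates the idea on a plain list of ids: `truncate_ids` is a hypothetical helper, and re-appending the end-of-sequence id (1 for T5) is an assumption about the desired behavior.

```python
def truncate_ids(ids, max_len=512, eos_id=1):
    """Cut a list of token ids to at most max_len, keeping a trailing EOS id."""
    if len(ids) <= max_len:
        return list(ids)
    # Keep the first max_len - 1 ids and re-append EOS so the sequence still terminates.
    return list(ids[:max_len - 1]) + [eos_id]


# Made-up stand-in for an over-long encoded prompt (685 ids, as in the warning).
fake_ids = list(range(2, 686)) + [1]
short_ids = truncate_ids(fake_ids)
print(len(fake_ids), len(short_ids), short_ids[-1])
```

Note that cutting the tail of this particular prompt would also drop the closing `Sentiment:` cue, so in practice it may be preferable to shorten the review itself before building the prompt.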


In [None]:
model.generate(tf.constant([input['input_ids']]), max_new_tokens=50)

<tf.Tensor: shape=(1, 3), dtype=int32, numpy=array([[   0, 2841,    1]], dtype=int32)>

In [None]:
tokenizer.decode(
        model.generate(tf.constant([input['input_ids']]), max_new_tokens=50)[0],
        skip_special_tokens=True
    )

'negative'
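The raw `generate` output shown earlier holds three ids, `[0, 2841, 1]`, yet the decoded text is a single word. For T5 checkpoints, id 0 is the padding token (also used as the decoder start token) and id 1 marks the end of the sequence, so only the middle id carries content. The sketch below mimics what `skip_special_tokens=True` does at decode time; `strip_special_ids` is a hypothetical helper, not part of the `transformers` API.

```python
T5_PAD_ID = 0  # padding token, also used as the decoder start token in T5
T5_EOS_ID = 1  # end-of-sequence marker


def strip_special_ids(ids, special_ids=(T5_PAD_ID, T5_EOS_ID)):
    """Drop special token ids, keeping only ids that carry text content."""
    return [i for i in ids if i not in special_ids]


generated = [0, 2841, 1]  # ids taken from the model output above
content_ids = strip_special_ids(generated)
print(content_ids)
```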

And what was the real sentiment? Remember that in this dataset `0` is negative and `1` is positive.

In [None]:
label

1
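To compare the model's answer with the dataset label across more than one review, the generated text has to be mapped back to the dataset's `0`/`1` encoding. This is a minimal sketch, assuming the model answers with the word "positive" or "negative" (possibly with different casing or surrounding whitespace); `text_to_label` is a hypothetical helper that returns `None` for anything else.

```python
LABEL_IDS = {"negative": 0, "positive": 1}


def text_to_label(generated_text):
    """Map the model's text answer to the dataset's integer label, or None if unrecognized."""
    return LABEL_IDS.get(generated_text.strip().lower())


prediction = text_to_label("negative")  # the model's answer in the run above
true_label = 1                          # the dataset label in the run above
print(f"predicted={prediction}, true={true_label}, correct={prediction == true_label}")
```

In the run above, the model's answer and the label disagree, so this review would count as a miss.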

## Summarize Dialogue without Prompt Engineering

In this use case, you will generate a summary of a dialogue with Flan-T5.

Let's load some simple dialogues from the DialogSum Hugging Face dataset. This dataset contains 10,000+ dialogues with the corresponding manually labeled summaries.

In [None]:
huggingface_dataset_name = "knkarthick/dialogsum"

dataset = None  # Load the dataset

In [None]:
dataset

DatasetDict({
    train: Dataset({
        features: ['id', 'dialogue', 'summary', 'topic'],
        num_rows: 12460
    })
    validation: Dataset({
        features: ['id', 'dialogue', 'summary', 'topic'],
        num_rows: 500
    })
    test: Dataset({
        features: ['id', 'dialogue', 'summary', 'topic'],
        num_rows: 1500
    })
})

Print a couple of dialogues with their baseline summaries.

In [None]:
def get_random_dialogue_and_summary():
  # Implement this method

  random_index = np.random.randint(1, 10000)
  random_dialogue = None
  summary = None
  return random_dialogue, summary

random_dialogue, summary = get_random_dialogue_and_summary()

dash_line = '-' * 99

print(f'Dialogue: \n\n{random_dialogue}')
print(dash_line)
print(f'Summary: {summary}')

Dialogue: 

#Person1#: Kathy, you look worried, why?
#Person2#: According to the screen, our flight to Sydney has been delayed by 3 hours. So now we won't be boarding the plane until 2:00 PM. But we have a meeting at night.
#Person1#: That shouldn't be a problem. The meeting with our customers isn't until 8:00 o'clock. Unfortunately, we won't have time to take a tour of the city as we planned. I have been looking forward to it for a long time.
#Person2#: What a pity! However, we can look around next time.
---------------------------------------------------------------------------------------------------
Summary: Kathy is worried due to the delayed flight. #Person1# comforts her that they won't be late for the meeting. But they won't have time for sightseeing.
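One possible way to fill in the stub above is sketched here on a tiny in-memory stand-in, so it runs without downloading anything. It assumes you sample from the `train` split and read the `dialogue` and `summary` fields listed in the `DatasetDict` output; `toy_dataset`, its contents, and the name `sample_dialogue_and_summary` are made up for illustration.

```python
import random

# Made-up stand-in with the same field names as the real dataset.
toy_dataset = {
    "train": [
        {"dialogue": "#Person1#: Hi.\n#Person2#: Hello.",
         "summary": "Two people greet each other."},
        {"dialogue": "#Person1#: Is it late?\n#Person2#: Yes.",
         "summary": "#Person2# says it is late."},
    ]
}


def sample_dialogue_and_summary(data=toy_dataset, split="train"):
    """Return the dialogue and reference summary of one randomly chosen row."""
    rows = data[split]
    # randrange(n) draws from 0..n-1, so every row can be selected.
    random_index = random.randrange(len(rows))
    example = rows[random_index]
    return example["dialogue"], example["summary"]


random_dialogue, summary = sample_dialogue_and_summary()
print(random_dialogue)
print(summary)
```

The same `rows[index]['field']` access pattern is what the IMDB helper earlier in this notebook uses on the real dataset object.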


Now it's time to explore how well the base LLM summarizes a dialogue without any prompt engineering. **Prompt engineering** is the act of a human changing the **prompt** (input) to improve the response for a given task.

In [None]:
for i in range(3):
    dialogue, summary = get_random_dialogue_and_summary()
    inputs = tokenizer(dialogue)
    model_output = None  # Generate model output
    output = tokenizer.decode(
        model_output,
        skip_special_tokens=True
    )

    print(dash_line)
    print('Example ', i + 1)
    print(dash_line)
    print(f'Dialogue:\n{dialogue}')
    print(dash_line)
    print(f'Summary:\n{summary}')
    print(dash_line)
    print(f'Model Summary - Without prompt engineering:\n{output}\n')

---------------------------------------------------------------------------------------------------
Example  1
---------------------------------------------------------------------------------------------------
Dialogue:
#Person1#: Why am I being charged $ 10 for a movie that I never ordered?
#Person2#: Sir, according to your file, you spent Monday evening watching'Titanic. '
#Person1#: Well, the file is wrong. I was at a great concert that night.
#Person2#: Well, this wouldn't be the first time that a file was wrong. Just a moment, please.
#Person1#: Thank you for taking care of it so quickly.
#Person2#: Sir, when I deleted the $ 10, the program automatically added a $ 2 service charge.
#Person1#: You can't do that! You can't charge me for a mistake that you made!
#Person2#: Sometimes you can't win for losing, sir.
#Person1#: Well, now I've seen it all! What a rip-off this place is!
#Person2#: I don't blame you, sir. Two dollars is a lot of money.
-------------------------------------

You can see that the model's guesses make some sense, but it does not seem to know what task it is supposed to accomplish. It seems to just make up the next sentence in the dialogue. Prompt engineering can help here.

## Summarize Dialogue with an Instruction Prompt

Prompt engineering is an important concept in using foundation models for text generation.

### Zero Shot Inference with an Instruction Prompt

In order to instruct the model to perform a task - summarize a dialogue - you can take the dialogue and convert it into an instruction prompt. This is often called **zero shot inference**.  
Wrap the dialogue in a descriptive instruction and see how the generated text will change:

In [None]:
for i in range(3):
    dialogue, summary = get_random_dialogue_and_summary()
    prompt = f"""
Summarize the following conversation.

{dialogue}

Summary:
    """
    inputs = tokenizer(prompt)
    output = tokenizer.decode(
        model.generate(tf.constant([inputs['input_ids']]), max_new_tokens=50)[0],
        skip_special_tokens=True
    )

    print(dash_line)
    print('Example ', i + 1)
    print(dash_line)
    print(f'Dialogue:\n{dialogue}')
    print(dash_line)
    print(f'Summary:\n{summary}')
    print(dash_line)
    print(f'Model Summary - Zero shot inference prompt engineering:\n{output}\n')


---------------------------------------------------------------------------------------------------
Example  1
---------------------------------------------------------------------------------------------------
Dialogue:
#Person1#: I'm tired of watching television. Let's go to cinema to- night.
#Person2#: All right. Do you want to go downtown? Or is there a good movie in the neighborhood?
#Person1#: I'd rather not spend a lot of money. What does the pa- per say about neighborhood theaters?
#Person2#: Here's the list on page... Column 6. Here it is. Where's the Rialto? There's a perfect movie there.
#Person1#: That's too far away. And it's hard to find a place to park there.
#Person2#: Well, the Grand Theater has Gone with the wind.
#Person1#: I saw that years ago. I couldn't wait to see it again. Moreover, it's too long. We wouldn't get home until midnight.
#Person2#: The Center has a horror film. You wouldn't want to see that?
#Person1#: No, indeed. I wouldn't be able to sleep tonight

This is much better! But the model still does not pick up on the nuance of the conversations.

## Summarize Dialogue with One Shot and Few Shot Inference

**One shot and few shot inference** are the practices of providing an LLM with one or several full examples of prompt-response pairs that match your task, placed before the actual prompt you want completed. This is called "in-context learning": the examples in the prompt show the model what your specific task looks like.

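### One Shot Inference

The function below builds a prompt with `number_of_shots` complete dialogue-summary examples followed by the dialogue to summarize, and returns it together with the reference summary.
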
In [None]:
def make_prompt_and_return_real_summary(number_of_shots):
    prompt = ''
    for i in range(number_of_shots):
        dialogue, summary = get_random_dialogue_and_summary()

        # The stop sequence '{summary}\n\n\n' is important for FLAN-T5. Other models may have their own preferred stop sequence.
        prompt += f"""
Dialogue:

{dialogue}

Summary:

{summary}


"""

    dialogue_to_analyze, real_summary = get_random_dialogue_and_summary()

    prompt += f"""
Dialogue:

{dialogue_to_analyze}

Summary:

"""

    return prompt, real_summary

Construct the prompt to perform one shot inference:

In [None]:
one_shot_prompt, real_summary = make_prompt_and_return_real_summary(1)

print(one_shot_prompt)


Dialogue:

#Person1#: What are we going to do? I can't get the car out of this ditch. I'm stuck!
#Person2#: I'm worried, Tom. I haven't seen any other cars for almost an hour.
#Person1#: I know. This is terrible. What can we do? This snow doesn't stop falling!
#Person2#: I told you we should have stayed in town today.The weather report said 100 percent chance of snow.Why did you want to come up here?
#Person1#: I wanted to show you the cabin. We only had another half-hour to go.
#Person2#: Well, now we're stuck. What can we do?
#Person1#: I don't know.
#Person2#: I've heard that when this happens, it's important to save energy.
#Person1#: What do you mean?
#Person2#: We're stranded here, Tom. We may be here a long time.We need to conserve the gas in the car. The car's energy is what will keep us warm.
#Person1#: I have plenty of gas.
#Person2#: Yes, but the gas and the battery both have to stay working.We can't just let the car run and run.If we do, it will die soon. Then we'll freeze

Now pass this prompt to perform the one shot inference:

In [None]:
for i in range(3):
  one_shot_prompt, real_summary = make_prompt_and_return_real_summary(1)
  inputs = tokenizer(one_shot_prompt)
  output = None # Get output
  print(dash_line)
  print(f'Example {i + 1}')
  print(dash_line)
  print(f'Dialogue:\n{one_shot_prompt}')
  print(dash_line)
  print(f'Summary:\n{real_summary}')
  print(dash_line)
  print(f'Model Summary - One shot inference prompt engineering:\n{output}\n')

---------------------------------------------------------------------------------------------------
Example 1
---------------------------------------------------------------------------------------------------
Dialogue:

Dialogue:

#Person1#: I want to lose some weight!
#Person2#: So do I!
#Person1#: I have a yoga class tomorrow. Do you want to come with me?
#Person2#: No, it's too expensive for me. I've decided to take some exercises on my own.
#Person1#: What are you going to do?
#Person2#: Run around the track. In the morning I run for an hour, and in the afternoon I run around the building.
#Person1#: Good, I am sure it will work if you can persist.
#Person2#: I hope so. Would you like to join me?
#Person1#: Sounds good!

Summary:

#Person2# thinks yoga class is too expensive so #Person2# decides to take exercises on #Person2#'s own to lose weight.



Dialogue:

#Person1#: I can't stand the stupid guy any longer. It's unbelievable.
#Person2#: Oh, my dear lady, take it easy. You sho

### Few Shot Inference

Let's explore few shot inference by providing five full dialogue-summary pairs in your prompt.

In [None]:
for i in range(3):
  few_shot_prompt, real_summary = make_prompt_and_return_real_summary(5)
  inputs = tokenizer(few_shot_prompt)
  output = tokenizer.decode(
      model.generate(tf.constant([inputs['input_ids']]), max_new_tokens=50)[0],
      skip_special_tokens=True
  )

  print(dash_line)
  print(f'Example {i + 1}')
  print(dash_line)
  print(f'Dialogue:\n{few_shot_prompt}')
  print(dash_line)
  print(f'Summary:\n{real_summary}')
  print(dash_line)
  print(f'Model Summary - Few shot inference prompt engineering:\n{output}\n')

---------------------------------------------------------------------------------------------------
Example 1
---------------------------------------------------------------------------------------------------
Dialogue:

Dialogue:

#Person1#: Let's see if we can reach some sort of agreement over your curfew.
#Person2#: Ok, every one asks their parents let them stay out until 2 or 3 in the morning.
#Person1#: Well, I'm not everyone-else's father. I think you need be in the house by ten o'clock.
#Person2#: That's absurd. I know some junior high kids who can stay out later than that.
#Person1#: I'll be worried if you stay out late.
#Person2#: Ok, how about midnight curfew. and I'll let you know where I am.

Summary:

#Person2# wants to put off the curfew #Person1# made. #Person1# worries about #Person2# if #Person2# stays out late.



Dialogue:

#Person1#: Hi Wei, what are you going to do this weekend?
#Person2#: I think I'll stay in on Saturday and rest.
#Person1#: Oh right. . . How abou

In this case, few shot did not provide much of an improvement over one shot inference. Anything above 5 or 6 shots will typically not help much either. Also, you need to make sure that you do not exceed the model's input-context length which, in our case, is 512 tokens. Anything above the context length will be ignored.

However, you can see that feeding in at least one full example (one shot) provides the model with more information and qualitatively improves the summary overall.
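Since every extra shot uses up part of the 512-token budget, it can help to check how many examples fit before building the prompt. This is a rough sketch, assuming you already have a token count for each candidate example and for the final dialogue (for instance from `len(tokenizer(text)["input_ids"])`); `count_shots_that_fit` is a hypothetical helper and the counts below are made up.

```python
def count_shots_that_fit(example_token_counts, query_token_count, max_tokens=512):
    """Count how many leading examples fit in the budget alongside the final query."""
    remaining = max_tokens - query_token_count
    shots = 0
    for count in example_token_counts:
        if count > remaining:
            break  # the next example would overflow the context window
        remaining -= count
        shots += 1
    return shots


example_counts = [180, 150, 210, 90, 120]  # made-up token counts per example
shots = count_shots_that_fit(example_counts, query_token_count=160)
print(shots)
```

The estimate is approximate, because tokenizing the assembled prompt as one string can give a slightly different total than summing the pieces.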

## Configuration Parameters

In [None]:
generation_config = GenerationConfig(max_new_tokens=100, do_sample=True, temperature=2.0)

inputs = tokenizer(few_shot_prompt)
output = tokenizer.decode(
    model.generate(tf.constant([inputs['input_ids']]), generation_config=generation_config)[0],
    skip_special_tokens=True
)

print(dash_line)
print(f'MODEL GENERATION - FEW SHOT:\n{output}')
print(dash_line)
print(f'BASELINE HUMAN SUMMARY:\n{real_summary}\n')

---------------------------------------------------------------------------------------------------
MODEL GENERATION - FEW SHOT:
From Person2 you must have met greatgrandmother Janet Maffie.
---------------------------------------------------------------------------------------------------
BASELINE HUMAN SUMMARY:
Annie tells #Person1# about the picture of Annie's seventh great grandfather who was a very personable man.



Comments related to the choice of the parameters in the code cell above:
- Choosing `max_new_tokens=10` will make the output text too short, so the dialogue summary will be cut.
- Setting `do_sample=True` and changing the `temperature` value gives you more variety in the output: higher temperatures make the sampling more random.
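To see why the temperature changes the output, recall that with sampling enabled the next token is drawn from a softmax over the model's scores (logits), and the temperature divides those scores before the softmax. The sketch below shows only this arithmetic on made-up logits for three candidate tokens; it is not the library's internal code, and `softmax_with_temperature` is a hypothetical helper.

```python
import math


def softmax_with_temperature(logits, temperature=1.0):
    """Turn raw scores into probabilities after dividing them by the temperature."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the maximum for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


logits = [2.0, 1.0, 0.1]  # made-up scores for three candidate tokens
for t in (0.5, 1.0, 2.0):
    probs = softmax_with_temperature(logits, t)
    print(t, [round(p, 3) for p in probs])
```

A higher temperature flattens the distribution, so unlikely tokens are sampled more often. That is consistent with the off-topic generation produced with `temperature=2.0` in the cell above.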

As you can see, prompt engineering can take you a long way for this use case, but there are some limitations.