# Generative AI Prompt Engineering

In this lab you will use a well-known encoder-decoder LLM: Flan-T5. You will start with simple tasks to get hands-on experience.

Then you will learn about few shot prompting and see where the LLM reaches its limits on the task.

You will finish by testing different generation configurations.

## Install Required Dependencies

Now install the required packages to use Hugging Face transformers and datasets.

In [1]:
!pip install --upgrade pip
!pip install \
    transformers==4.35.2 \
    datasets==2.15.0  --quiet

Collecting pip
  Downloading pip-23.3.1-py3-none-any.whl (2.1 MB)
Installing collected packages: pip
  Attempting uninstall: pip
    Found existing installation: pip 23.1.2
    Uninstalling pip-23.1.2:
      Successfully uninstalled pip-23.1.2
Successfully installed pip-23.3.1

Import the classes for loading the datasets, the Large Language Model (LLM), the tokenizer, and the generation configuration. Do not worry if you do not yet understand all of these components; they will be described and discussed later in the notebook.

In [3]:
from datasets import load_dataset
from transformers import TFAutoModelForSeq2SeqLM
from transformers import AutoTokenizer
from transformers import GenerationConfig

## Doing Simple Tasks with Flan-T5

In this section you will do simple sentiment analysis to get the gist of how to use these LLMs. You will use the pre-trained Large Language Model (LLM) FLAN-T5 from Hugging Face. The list of available models in the Hugging Face `transformers` package can be found [here](https://huggingface.co/docs/transformers/index).

In [4]:
huggingface_dataset_name = "imdb"

dataset = load_dataset(huggingface_dataset_name)

In [6]:
dataset

DatasetDict({
    train: Dataset({
        features: ['text', 'label'],
        num_rows: 25000
    })
    test: Dataset({
        features: ['text', 'label'],
        num_rows: 25000
    })
    unsupervised: Dataset({
        features: ['text', 'label'],
        num_rows: 50000
    })
})

Let's sample a review from the train split; for our purposes any split would work equally well.

In [14]:
import numpy as np

def get_random_review_and_label():
  # Sample over the whole train split (index 0 included).
  random_index = np.random.randint(0, len(dataset['train']))
  example = dataset['train'][random_index]
  return example['text'], example['label']

random_review, label = get_random_review_and_label()

# 99 dashes, used as a visual separator in the printed output.
dash_line = '-' * 99

print(f'Review: \n\n{random_review}')
print(dash_line)
print(f'Label: {label}')

Review: 

the author of the book, by the same title, should not have let her name be used for this movie. if you have read the book, this movie takes such a liberal interpretation of the actual events in the book and its spirit that the movie and book seem to have quite little in common except the title and some superficial details. the movie adds nothing, in terms of artistic merit, to the book's own literary achievement.<br /><br />for those who have not read the book: you will also be disappointed. not only does the plot move at an incredibly slow pace, it doesn't offer anything more while it is moving slowly (like character development, for example). some viewers might be entertained by some of the graphic lesbian love scenes later on in the movie, but you might as well watch a showtime special for the stuff they show in therese and isabelle--its fairly tame and not imaginative at all.
-------------------------------------------------------------------------------------------------

Let's now use the model! First we need the tokenizer to transform the text into the "model language", that is, sequences of integer token IDs (more on this during the course). We also need to download the model.

In [15]:
model_name='google/flan-t5-base'

model = TFAutoModelForSeq2SeqLM.from_pretrained(model_name)
tokenizer = AutoTokenizer.from_pretrained(model_name)

All PyTorch model weights were used when initializing TFT5ForConditionalGeneration.

All the weights of TFT5ForConditionalGeneration were initialized from the PyTorch model.
If your task is similar to the task the model of the checkpoint was trained on, you can already use TFT5ForConditionalGeneration for predictions without further training.


In [16]:
sentence = random_review[:50]
print(f'Review trimmed: {sentence}')

sentence_encoded = tokenizer(sentence)

sentence_decoded = tokenizer.decode(
        sentence_encoded["input_ids"],
        skip_special_tokens=True
    )

print('ENCODED SENTENCE:')
print(sentence_encoded["input_ids"])
print('\nDECODED SENTENCE:')
print(sentence_decoded)

Review trimmed: the author of the book, by the same title, should 
ENCODED SENTENCE:
[8, 2291, 13, 8, 484, 6, 57, 8, 337, 2233, 6, 225, 3, 1]

DECODED SENTENCE:
the author of the book, by the same title, should 
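
The encoded list above ends with the ID `1`, which is T5's end-of-sequence token; `skip_special_tokens=True` drops it again when decoding. The sketch below mimics that round trip with a tiny word-level vocabulary. Everything in it (`TOY_VOCAB`, `toy_encode`, `toy_decode`, and the IDs above 2) is made up for illustration; the real tokenizer splits text into SentencePiece subwords rather than whole words.

```python
# Toy word-level tokenizer. The vocabulary and IDs are hypothetical;
# only the pad/EOS/unk convention (0, 1, 2) mirrors T5.
PAD_ID, EOS_ID, UNK_ID = 0, 1, 2
TOY_VOCAB = {"<pad>": PAD_ID, "</s>": EOS_ID, "<unk>": UNK_ID,
             "the": 3, "author": 4, "of": 5, "book": 6}
ID_TO_TOKEN = {i: t for t, i in TOY_VOCAB.items()}
SPECIAL_IDS = {PAD_ID, EOS_ID, UNK_ID}


def toy_encode(text):
    """Map each whitespace-separated word to an ID and append EOS."""
    ids = [TOY_VOCAB.get(word, UNK_ID) for word in text.lower().split()]
    return ids + [EOS_ID]


def toy_decode(ids, skip_special_tokens=True):
    """Map IDs back to text, optionally dropping special tokens."""
    tokens = [ID_TO_TOKEN[i] for i in ids
              if not (skip_special_tokens and i in SPECIAL_IDS)]
    return " ".join(tokens)


ids = toy_encode("the author of the book")
print(ids)                                         # [3, 4, 5, 3, 6, 1]
print(toy_decode(ids))                             # the author of the book
print(toy_decode(ids, skip_special_tokens=False))  # the author of the book </s>
```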


Now let's call the model. `TFAutoModelForSeq2SeqLM` loads a sequence-to-sequence (seq2seq) LLM, which maps input text to output text for tasks like summarization or text generation, so let's phrase the task as a text prompt.

In [31]:
import tensorflow as tf
review, label = get_random_review_and_label()

prompt = f"""
Analyze the sentiment of the following review:

{review}

Sentiment:

"""

input = tokenizer(prompt)

Token indices sequence length is longer than the specified maximum sequence length for this model (545 > 512). Running this sequence through the model will result in indexing errors


<tf.Tensor: shape=(1, 545), dtype=int32, numpy=
array([[ 5331,   120,   776,     8,  6493,    13,     8,   826,  1132,
           10,   366,  3396, 24786, 11359,   345,    47,   166,  1883,
            6,     8, 10836,    18,    18,    60, 29462,    30,     8,
         1189,    13,    48,  5677,    18,    18,  1647,  3737,     3,
            9,  7693,     3,     9,  2917,    12,     3,     9, 15612,
           23,   157,    31,     7, 18640,    15,     5,    94,    19,
           46,  2016,  1023,    10,  5330,  1545,    21,     8,  1726,
           57,    27,    52,     9, 16755,    29,     6,   113,     3,
        23800,   224, 10080,   930,    38,   391, 22177, 13845,   476,
           31,   134,   272,  5359,   476,    11,  1853,     3, 15258,
        12017, 18400,   549,  8087,   134,     6,     8,   577,    47,
           80,    13, 17963,    31,     7,   167,  2581,  9135,   277,
            6,    11,   365,   925,    26,  3186,  2318,  3493,    31,
            7,  2212,    34, 
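
The warning above says this prompt has 545 tokens, more than the 512 the model specifies. The Hugging Face tokenizer can cut the input for you with `truncation=True` and `max_length=512`, but that trims from the end and would remove the closing `Sentiment:` cue. A possible alternative, sketched here on plain lists of IDs, is to shorten only the review. `truncate_review_ids` is a hypothetical helper and the IDs are placeholders, not real T5 token IDs.

```python
def truncate_review_ids(prefix_ids, review_ids, suffix_ids, max_len=512, eos_id=1):
    """Trim only the review so prefix + review + suffix + EOS fits in max_len."""
    budget = max_len - len(prefix_ids) - len(suffix_ids) - 1  # one slot for EOS
    if budget < 0:
        raise ValueError("Instruction text alone exceeds max_len")
    return prefix_ids + review_ids[:budget] + suffix_ids + [eos_id]


# Placeholder IDs standing in for tokenized text (not real T5 IDs).
prefix = [10, 11, 12]           # stands for "Analyze the sentiment ..."
review = list(range(100, 700))  # a 600-token review
suffix = [20, 21]               # stands for "Sentiment:"

ids = truncate_review_ids(prefix, review, suffix, max_len=512)
print(len(ids))   # 512
print(ids[-3:])   # [20, 21, 1]
```

With the notebook's tokenizer, the three ID lists would come from tokenizing the instruction, the review, and the closing cue separately with `add_special_tokens=False`.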

In [35]:
model.generate(tf.constant([input['input_ids']]), max_new_tokens=50)

<tf.Tensor: shape=(1, 3), dtype=int32, numpy=array([[   0, 1465,    1]], dtype=int32)>

In [37]:
tokenizer.decode(
        model.generate(tf.constant([input['input_ids']]), max_new_tokens=50)[0],
        skip_special_tokens=True
    )

'positive'

And what was the real sentiment? Remember that in this dataset `0` is negative and `1` is positive.

In [38]:
label

1
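
A single matching example is encouraging but not a measurement. One way to score a batch of reviews is to map each generated string back to the dataset's label convention and compute accuracy. This is a minimal sketch: `label_from_text` and `accuracy` are hypothetical helpers, and the `predictions` list is made up to stand in for decoded model outputs.

```python
def label_from_text(text):
    """Map generated text to the IMDB convention: 1 = positive, 0 = negative.

    Returns None when the text matches neither word.
    """
    cleaned = text.strip().lower()
    if cleaned.startswith("positive"):
        return 1
    if cleaned.startswith("negative"):
        return 0
    return None


def accuracy(generated_texts, true_labels):
    """Fraction of examples whose parsed label equals the true label."""
    if not true_labels:
        raise ValueError("No examples to score")
    correct = sum(label_from_text(t) == y
                  for t, y in zip(generated_texts, true_labels))
    return correct / len(true_labels)


# Made-up outputs standing in for decoded model generations.
predictions = ["positive", "negative", "Positive", "mixed"]
labels = [1, 0, 0, 1]
print(accuracy(predictions, labels))  # 0.5
```

Unrecognized outputs such as `"mixed"` count as wrong here, which keeps the score conservative.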

## Summarize Dialogue without Prompt Engineering

In this use case, you will generate a summary of a dialogue with Flan-T5.

Let's load some simple dialogues from the dialogsum Hugging Face dataset. This dataset contains 10,000+ dialogues with the corresponding manually labeled summaries.

In [42]:
huggingface_dataset_name = "knkarthick/dialogsum"

dataset = load_dataset(huggingface_dataset_name)

In [43]:
dataset

DatasetDict({
    train: Dataset({
        features: ['id', 'dialogue', 'summary', 'topic'],
        num_rows: 12460
    })
    validation: Dataset({
        features: ['id', 'dialogue', 'summary', 'topic'],
        num_rows: 500
    })
    test: Dataset({
        features: ['id', 'dialogue', 'summary', 'topic'],
        num_rows: 1500
    })
})

Print a random dialogue with its baseline summary.

In [47]:
def get_random_dialogue_and_summary():
  # Sample over the whole train split rather than a hard-coded range.
  random_index = np.random.randint(0, len(dataset['train']))
  example = dataset['train'][random_index]
  return example['dialogue'], example['summary']

random_dialogue, summary = get_random_dialogue_and_summary()

# 99 dashes, used as a visual separator in the printed output.
dash_line = '-' * 99

print(f'Dialogue: \n\n{random_dialogue}')
print(dash_line)
print(f'Summary: {summary}')

Dialogue: 

#Person1#: Good morning! What can I do for you, Madam?
#Person2#: Good morning! I'm looking for a coat.
#Person1#: What color would you like?
#Person2#: Could you show me some? I'd like a middle sized red coat.
#Person1#: Sorry. We haven't anything in your size.
#Person2#: Do you have a smaller size?
#Person1#: I'm sorry. The small size coats have just been sold out. What about the blue one? It looks nice and maybe fits you.
#Person2#: Well, may I try it on?
#Person1#: Yes, please.
#Person2#: It seems nice on me. How much is it?
#Person1#: 168 yuan.
#Person2#: OK. Here is 170 yuan. You keep the change please!
#Person1#: Thanks.
---------------------------------------------------------------------------------------------------
Summary: #Person2# is looking for a middle-sized red coat. #Person1# doesn't have it and recommends a blue one. #Person2# takes it.


Now it's time to explore how well the base LLM summarizes a dialogue without any prompt engineering. **Prompt engineering** is the practice of changing the **prompt** (input) to improve the response for a given task.

In [50]:
for i in range(3):
    dialogue, summary = get_random_dialogue_and_summary()
    inputs = tokenizer(dialogue)
    output = tokenizer.decode(
        model.generate(tf.constant([inputs['input_ids']]), max_new_tokens=50)[0],
        skip_special_tokens=True
    )

    print(dash_line)
    print('Example ', i + 1)
    print(dash_line)
    print(f'Dialogue:\n{dialogue}')
    print(dash_line)
    print(f'Summary:\n{summary}')
    print(dash_line)
    print(f'Model Summary - Without prompt engineering:\n{output}\n')

---------------------------------------------------------------------------------------------------
Example  1
---------------------------------------------------------------------------------------------------
Dialogue:
#Person1#: Can you help me pick out a gift for my daughter? 
#Person2#: She might like a laptop computer. 
#Person1#: Yes, that sounds like a good idea. 
#Person2#: Might I suggest a Mac? 
#Person1#: Okay. How much? 
#Person2#: Well, a 15-inch Pro is $2, 100. 
#Person1#: Oh, that sounds great. I'll take it. 
#Person2#: Great. How would you like to pay for it? 
#Person1#: Here's my VISA. 
#Person2#: Let me ring you up. Okay, sign here, please. 
#Person1#: Everything I need is in this box? 
#Person2#: It'll take her only a few minutes to get online. 
#Person1#: Thank you for your help. 
#Person2#: So long. Thank you for shopping here. 
---------------------------------------------------------------------------------------------------
Summary:
#Person2# recommends a Mac c

You can see that the model's guesses make some sense, but it does not seem to know which task it is supposed to accomplish. It seems to just make up the next sentence in the dialogue. Prompt engineering can help here.

## Summarize Dialogue with an Instruction Prompt

Prompt engineering is an important concept in using foundation models for text generation.

### Zero Shot Inference with an Instruction Prompt

In order to instruct the model to perform a task - summarize a dialogue - you can take the dialogue and convert it into an instruction prompt. This is often called **zero shot inference**.  
Wrap the dialogue in a descriptive instruction and see how the generated text will change:

In [51]:
for i in range(3):
    dialogue, summary = get_random_dialogue_and_summary()
    prompt = f"""
Summarize the following conversation.

{dialogue}

Summary:
    """
    inputs = tokenizer(prompt)
    output = tokenizer.decode(
        model.generate(tf.constant([inputs['input_ids']]), max_new_tokens=50)[0],
        skip_special_tokens=True
    )

    print(dash_line)
    print('Example ', i + 1)
    print(dash_line)
    print(f'Dialogue:\n{dialogue}')
    print(dash_line)
    print(f'Summary:\n{summary}')
    print(dash_line)
    print(f'Model Summary - Zero shot inference prompt engineering:\n{output}\n')


---------------------------------------------------------------------------------------------------
Example  1
---------------------------------------------------------------------------------------------------
Dialogue:
#Person1#: I went to Roth's to interview her, you know, Edith Roth is the author of a book about moths.
#Person2#: Is she? I thought she was a mathematician.
#Person1#: I'm so thirsty.
#Person2#: Are you? I thought you had something to drink at her home.
#Person1#: No. Edith didn't give anything to drink.
#Person2#: I'll buy you a drink.
#Person1#: Oh! Thank you.
---------------------------------------------------------------------------------------------------
Summary:
#Person1# went to interview Roth and is thirsty. #Person2#'ll buy #Person1# a drink.
---------------------------------------------------------------------------------------------------
Model Summary - Zero shot inference prompt engineering:
The author of a book about moths, Edith Roth, is a mathematicia

This is much better! But the model still does not pick up on the nuance of the conversations.
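
Judgments like "much better" are made by eye here. A rough numeric check is to count how many words the model summary shares with the human summary, in the spirit of ROUGE-1. The sketch below is a simplified word-overlap F1, not the official ROUGE implementation, and `unigram_f1` is a hypothetical helper name.

```python
from collections import Counter


def unigram_f1(candidate, reference):
    """F1 over whitespace-separated, lower-cased words shared by both texts."""
    cand = Counter(candidate.lower().split())
    ref = Counter(reference.lower().split())
    overlap = sum((cand & ref).values())  # shared words, counted with multiplicity
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)


reference = "#Person1# went to interview Roth and is thirsty."
print(unigram_f1("#Person1# is thirsty.", reference))
print(unigram_f1(reference, reference))  # 1.0
```

Word overlap rewards copying the right terms but says nothing about whether the summary is faithful, so it complements reading the outputs rather than replacing it.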

## Summarize Dialogue with One Shot and Few Shot Inference

**One shot and few shot inference** are the practices of providing an LLM with one or more full examples of prompt-response pairs that match your task, placed before the actual prompt that you want completed. This is called "in-context learning": the examples show the model what your specific task looks like.

### One Shot Inference



In [59]:
def make_prompt_and_return_real_summary(number_of_shots):
    prompt = ''
    for i in range(number_of_shots):
        dialogue, summary = get_random_dialogue_and_summary()

        # The stop sequence '{summary}\n\n\n' is important for FLAN-T5. Other models may have their own preferred stop sequence.
        prompt += f"""
Dialogue:

{dialogue}

Summary:

{summary}


"""

    dialogue_to_analyze, real_summary = get_random_dialogue_and_summary()

    prompt += f"""
Dialogue:

{dialogue_to_analyze}

Summary:

"""

    return prompt, real_summary

Construct the prompt to perform one shot inference:

In [60]:
one_shot_prompt, real_summary = make_prompt_and_return_real_summary(1)

print(one_shot_prompt)


Dialogue:

#Person1#: Mom, can we get cable TV or a satellite dish?
#Person2#: It costs money, dear. What's wrong with the regular television stations?
#Person1#: The shows are dull and they're too many advertisements.
#Person2#: Well, you already watch too much TV instead of doing your homework, anyway.
#Person1#: There're educational stations too. I could learn while I watched TV.
#Person2#: Well, that's true, but you'd probably only watch cartoons and action movies.
#Person1#: No I wouldn't. . . can't we get cable? Everybody has cable.
#Person2#: Well, if everybody jumped off a bridge, would you jump too?
#Person1#: Mom!!! Please. All my friends have had it for years.
#Person2#: Get new friends.
#Person1#: Why are you always so mean?
#Person2#: Because you'd end up spoiled rotten if I wasn't.
#Person1#: I could help pay for it.
#Person2#: Let's see how your grades are this semester, and maybe I'll talk to your father about it.
#Person1#: O. K. Thanks, Mom!

Summary:

#Person1# want

Now pass this prompt to perform the one shot inference:

In [63]:
for i in range (3):
  one_shot_prompt, real_summary = make_prompt_and_return_real_summary(1)
  inputs = tokenizer(one_shot_prompt)
  output = tokenizer.decode(
      model.generate(tf.constant([inputs['input_ids']]), max_new_tokens=50)[0],
      skip_special_tokens=True
  )

  print(dash_line)
  print(f'Example {i + 1}')
  print(dash_line)
  print(f'Dialogue:\n{one_shot_prompt}')
  print(dash_line)
  print(f'Summary:\n{real_summary}')
  print(dash_line)
  print(f'Model Summary - One shot inference prompt engineering:\n{output}\n')

---------------------------------------------------------------------------------------------------
Example 1
---------------------------------------------------------------------------------------------------
Dialogue:

Dialogue:

#Person1#: Good morning, young lady. You can call me Oma. Do you see anything you like? 
#Person2#: Yes. Many things! I especially love this beautiful quilt. 
#Person1#: That quilt was passed down to me from my oma in Holland. 
#Person2#: It sounds like a special quilt. Why do you want to sell it? 
#Person1#: Well, this home is too big for me now, so I'm moving to an apartment that is much smaller. Therefore, I need to part with a few things. 
#Person2#: Oh, I see. Umm, how much do you want for the quilt? 
#Person1#: Is fifteen dollars OK? 

Summary:

Oma wants to sell the quilt #Person1# likes because she needs to part with things before moving to a smaller apartment even though it's special.



Dialogue:

#Person1#: We're nearly there. Will we be allowed t

### Few Shot Inference

Let's explore few shot inference by placing five full dialogue-summary pairs in the prompt before the dialogue to summarize.

In [64]:
for i in range (3):
  few_shot_prompt, real_summary = make_prompt_and_return_real_summary(5)
  inputs = tokenizer(few_shot_prompt)
  output = tokenizer.decode(
      model.generate(tf.constant([inputs['input_ids']]), max_new_tokens=50)[0],
      skip_special_tokens=True
  )

  print(dash_line)
  print(f'Example {i + 1}')
  print(dash_line)
  print(f'Dialogue:\n{few_shot_prompt}')
  print(dash_line)
  print(f'Summary:\n{real_summary}')
  print(dash_line)
  print(f'Model Summary - Few shot inference prompt engineering:\n{output}\n')

---------------------------------------------------------------------------------------------------
Example 1
---------------------------------------------------------------------------------------------------
Dialogue:

Dialogue:

#Person1#: Doctor, here's my report for my IVP examination.
#Person2#: Let me have a look. Can you see there's a stone in your kidney?
#Person1#: Oh, yes, is it dangerous?
#Person2#: No, but it's painful.
#Person1#: Do I have to have an operation?
#Person2#: No, it's not necessary since the stone is not big.
#Person1#: Good, I can still attend the Olympic Games.
#Person2#: Yes, you're lucky. But you should go to the Ultrasonic Department to disperse the stone. Meanwhile, I'll give you some herbal medicine.
#Person1#: Oh, I've heard a lot about the Chinese herbal medicine. I believe it will work.

Summary:

#Person2#, the doctor, reads the IVP examination report of #Person1# and suggests that #Person1# should disperse the stone in the body and take some herba

In this case, few shot did not provide much of an improvement over one shot inference. Anything above 5 or 6 shots will typically not help much either. Also, you need to make sure that you do not exceed the model's input-context length, which in our case is 512 tokens. Anything above the context length will be ignored.

However, you can see that feeding in at least one full example (one shot) provides the model with more information and qualitatively improves the summary overall.
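
Because every shot consumes part of the 512-token context, it can help to count tokens while assembling the prompt and stop adding examples once the budget is reached. The sketch below illustrates the idea under loud assumptions: `count_tokens` simply splits on whitespace as a stand-in, and `build_prompt_within_budget` is a hypothetical helper. With the notebook's tokenizer you would pass a counter such as `lambda text: len(tokenizer(text)["input_ids"])` instead.

```python
def count_tokens(text):
    """Stand-in token counter; whitespace words are NOT real model tokens."""
    return len(text.split())


def build_prompt_within_budget(examples, query_dialogue, max_tokens=512,
                               counter=count_tokens):
    """Prepend as many (dialogue, summary) shots as fit alongside the query."""
    query_block = f"Dialogue:\n\n{query_dialogue}\n\nSummary:\n\n"
    used = counter(query_block)
    shots = []
    for dialogue, summary in examples:
        block = f"Dialogue:\n\n{dialogue}\n\nSummary:\n\n{summary}\n\n\n"
        cost = counter(block)
        if used + cost > max_tokens:
            break  # the next shot would overflow the budget
        shots.append(block)
        used += cost
    return "".join(shots) + query_block, len(shots)


# Tiny made-up examples and a deliberately small budget.
examples = [("A: hi B: hello", "They greet."), ("A: bye B: bye", "They part.")]
prompt, n_shots = build_prompt_within_budget(
    examples, "A: thanks B: welcome", max_tokens=20)
print(n_shots)  # 1
```

The query dialogue is always kept, so the budget only limits how many worked examples precede it.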

## Configuration Parameters

In [66]:
generation_config = GenerationConfig(max_new_tokens=100, do_sample=True, temperature=2.0)

inputs = tokenizer(few_shot_prompt)
output = tokenizer.decode(
    model.generate(tf.constant([inputs['input_ids']]), generation_config=generation_config)[0],
    skip_special_tokens=True
)

print(dash_line)
print(f'MODEL GENERATION - FEW SHOT:\n{output}')
print(dash_line)
print(f'BASELINE HUMAN SUMMARY:\n{real_summary}\n')

---------------------------------------------------------------------------------------------------
MODEL GENERATION - FEW SHOT:
Person1 provides lodging at the Holiday Inn Hotel for the evening and weekends. Booking a single room will cost $20 one night with 2% discounts in that one night price per bedroom for Friday morning starting next weekend. Number of nights, including meals and meals per stay, may also include service Charge (Codest.ri) when booking during these other periods during the day. Room has received an "Acknowloader". All parties booked, or should change prior of their booking
---------------------------------------------------------------------------------------------------
BASELINE HUMAN SUMMARY:
Jones phones to book a single room for Friday night and Saturday night. #Person1# helps make the reservation and charges Jones $ 80.



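Comments related to the choice of the parameters in the code cell above:
- `max_new_tokens` caps the length of the generated text. A small value such as `max_new_tokens=10` would make the output too short, so the dialogue summary would be cut off.
- Setting `do_sample=True` and changing the `temperature` value gives you more varied output; higher temperatures make the text more random, as the `temperature=2.0` generation above shows.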
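
To see why a high temperature produces rambling text: when sampling is enabled, the model's next-token scores (logits) are divided by the temperature before the softmax, so values above 1 flatten the distribution and values below 1 sharpen it. The sketch below shows this effect on three made-up logits; `softmax_with_temperature` is a hypothetical helper, not part of `transformers`.

```python
import math


def softmax_with_temperature(logits, temperature=1.0):
    """Convert logits to probabilities after dividing them by the temperature."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


logits = [4.0, 2.0, 1.0]  # made-up scores for three candidate tokens
for t in (0.5, 1.0, 2.0):
    probs = softmax_with_temperature(logits, t)
    print(t, [round(p, 3) for p in probs])
```

As the temperature grows, the most likely token loses probability mass to the others, so unlikely tokens are sampled more often.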

As you can see, prompt engineering can take you a long way for this use case, but there are some limitations.