In this exercise, you will perform prompt engineering on a dialogue summarization task using [Flan-T5](https://huggingface.co/google/flan-t5-large) and the [dialogsum dataset](https://huggingface.co/datasets/knkarthick/dialogsum). You will explore how different prompts affect the output of the model, and compare zero-shot and few-shot inference. <br/>
Complete the code in the cells below.

### 1. Set up Required Dependencies

In [1]:
!pip install datasets -q


In [34]:
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer, GenerationConfig
from datasets import load_dataset

### 2. Explore the Dataset

In [33]:
from datasets import load_dataset

dataset = load_dataset('knkarthick/dialogsum')

Print several dialogues with their baseline summaries.

In [26]:
example_indices = [0, 42, 800]
dash_line = '-' * 100

for i, index in enumerate(example_indices):
    print(dash_line)
    print('Example', i + 1)
    print(dash_line)
    print('INPUT DIALOGUE:')
    print(dataset['test'][index]['dialogue'])
    print(dash_line)
    print('BASELINE HUMAN SUMMARY:')
    print(dataset['test'][index]['summary'])
    print(dash_line)
    print()

----------------------------------------------------------------------------------------------------
Example 1
----------------------------------------------------------------------------------------------------
INPUT DIALOGUE:
#Person1#: Ms. Dawson, I need you to take a dictation for me.
#Person2#: Yes, sir...
#Person1#: This should go out as an intra-office memorandum to all employees by this afternoon. Are you ready?
#Person2#: Yes, sir. Go ahead.
#Person1#: Attention all staff... Effective immediately, all office communications are restricted to email correspondence and official memos. The use of Instant Message programs by employees during working hours is strictly prohibited.
#Person2#: Sir, does this apply to intra-office communications only? Or will it also restrict external communications?
#Person1#: It should apply to all communications, not only in this office between employees, but also any outside communications.
#Person2#: But sir, many employees use Instant Messaging to 

### 3. Summarize Dialogues without Prompt Engineering

Load the Flan-T5-large model and its tokenizer.

In [35]:
model_name = 'google/flan-t5-large'

model = AutoModelForSeq2SeqLM.from_pretrained(model_name)
tokenizer = AutoTokenizer.from_pretrained(model_name)

**Exercise**: Use the pre-trained model to summarize the example dialogues without any prompt engineering. Use the `model.generate()` function with `max_new_tokens=50`.

In [27]:
### WRITE YOUR CODE HERE
example_indices = [0, 42, 800]
dash_line = '-' * 100

for i, index in enumerate(example_indices):
    print(dash_line)
    print('Example', i + 1)
    print(dash_line)
    print('INPUT DIALOGUE:')
    print(dataset['test'][index]['dialogue'])
    print(dash_line)
    print('BASELINE HUMAN SUMMARY:')
    print(dataset['test'][index]['summary'])
    print(dash_line)
    print('GENERATED SUMMARY:')
    # Tokenize the raw dialogue (no instruction) and generate up to 50 new tokens.
    input_ids = tokenizer(dataset['test'][index]['dialogue'], return_tensors="pt").input_ids
    outputs = model.generate(input_ids, max_new_tokens=50)
    print(tokenizer.decode(outputs[0]))
    print(dash_line)
    print()




----------------------------------------------------------------------------------------------------
Example 1
----------------------------------------------------------------------------------------------------
INPUT DIALOGUE:
#Person1#: Ms. Dawson, I need you to take a dictation for me.
#Person2#: Yes, sir...
#Person1#: This should go out as an intra-office memorandum to all employees by this afternoon. Are you ready?
#Person2#: Yes, sir. Go ahead.
#Person1#: Attention all staff... Effective immediately, all office communications are restricted to email correspondence and official memos. The use of Instant Message programs by employees during working hours is strictly prohibited.
#Person2#: Sir, does this apply to intra-office communications only? Or will it also restrict external communications?
#Person1#: It should apply to all communications, not only in this office between employees, but also any outside communications.
#Person2#: But sir, many employees use Instant Messaging to 

You can see that the model generations make some sense, but the model doesn't seem to be sure what task it is supposed to accomplish and it often just makes up the next sentence in the dialogue. Prompt engineering can help here.

### 4. Summarize Dialogues with Instruction Prompts

In order to instruct the model to perform a task (e.g., summarize a dialogue), you can take the dialogue and convert it into an instruction prompt. This is often called **zero-shot inference**.
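The wrapping step itself is plain string handling. A minimal sketch, assuming a hypothetical helper name (`make_zero_shot_prompt`) and a made-up two-line dialogue; neither comes from the dataset or from any library:

```python
def make_zero_shot_prompt(dialogue, instruction="Summarize the following conversation.", suffix="Summary:"):
    """Wrap a raw dialogue in an instruction so the task is stated explicitly."""
    parts = [instruction, "", dialogue.strip()]
    if suffix:
        # An optional cue at the end of the prompt, e.g. "Summary:".
        parts += ["", suffix]
    return "\n".join(parts)


# Hypothetical toy dialogue, written in the dataset's #PersonN# style.
toy_dialogue = "#Person1#: Are you free at noon?\n#Person2#: Yes, let's have lunch."
zero_shot_prompt = make_zero_shot_prompt(toy_dialogue)
print(zero_shot_prompt)
```

In this notebook, the resulting string would be passed to `tokenizer(..., return_tensors="pt")` in place of the raw dialogue.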

**Exercise**: Wrap the dialogues in a descriptive instruction (e.g., "Summarize the following conversation."), and examine how the generated text changes.

In [28]:
### WRITE YOUR CODE HERE
example_indices = [0, 42, 800]
dash_line = '-' * 100

for i, index in enumerate(example_indices):
    print(dash_line)
    print('Example', i + 1)
    print(dash_line)
    print('INPUT DIALOGUE:')
    print(dataset['test'][index]['dialogue'])
    print(dash_line)
    print('BASELINE HUMAN SUMMARY:')
    print(dataset['test'][index]['summary'])
    print(dash_line)
    print('GENERATED SUMMARY:')
    input_ids = tokenizer("Summarize the following conversation: "+dataset['test'][index]['dialogue'], return_tensors="pt").input_ids
    outputs = model.generate(input_ids,max_new_tokens=50)
    print(tokenizer.decode(outputs[0]))
    print(dash_line)
    print()




----------------------------------------------------------------------------------------------------
Example 1
----------------------------------------------------------------------------------------------------
INPUT DIALOGUE:
#Person1#: Ms. Dawson, I need you to take a dictation for me.
#Person2#: Yes, sir...
#Person1#: This should go out as an intra-office memorandum to all employees by this afternoon. Are you ready?
#Person2#: Yes, sir. Go ahead.
#Person1#: Attention all staff... Effective immediately, all office communications are restricted to email correspondence and official memos. The use of Instant Message programs by employees during working hours is strictly prohibited.
#Person2#: Sir, does this apply to intra-office communications only? Or will it also restrict external communications?
#Person1#: It should apply to all communications, not only in this office between employees, but also any outside communications.
#Person2#: But sir, many employees use Instant Messaging to 

This is much better! However, the model still does not pick up on the nuance of the conversations.

**Exercise:** Experiment with the prompt text and see how it influences the generated output. Do the inferences change if you end the prompt with an empty string vs. `Summary: `?

Answer: For the first example the inference changed.

In [31]:
### WRITE YOUR CODE HERE
example_indices = [0, 42, 800]
dash_line = '-' * 100

for i, index in enumerate(example_indices):
    print(dash_line)
    print('Example', i + 1)
    print(dash_line)
    print('INPUT DIALOGUE:')
    print(dataset['test'][index]['dialogue'])
    print(dash_line)
    print('BASELINE HUMAN SUMMARY:')
    print(dataset['test'][index]['summary'])
    print(dash_line)
    print('GENERATED SUMMARY (ending prompt with empty string ):')
    input_ids = tokenizer("Summarize the following conversation: "+dataset['test'][index]['dialogue'], return_tensors="pt").input_ids
    outputs = model.generate(input_ids,max_new_tokens=50)
    print(tokenizer.decode(outputs[0]))
    print(dash_line)
    print('GENERATED SUMMARY (ending prompt with "Summary:" ):')
    input_ids = tokenizer("Summarize the following conversation: "+dataset['test'][index]['dialogue']+" Summary:", return_tensors="pt").input_ids
    outputs = model.generate(input_ids,max_new_tokens=50)
    print(tokenizer.decode(outputs[0]))
    print(dash_line)
    print()




----------------------------------------------------------------------------------------------------
Example 1
----------------------------------------------------------------------------------------------------
INPUT DIALOGUE:
#Person1#: Ms. Dawson, I need you to take a dictation for me.
#Person2#: Yes, sir...
#Person1#: This should go out as an intra-office memorandum to all employees by this afternoon. Are you ready?
#Person2#: Yes, sir. Go ahead.
#Person1#: Attention all staff... Effective immediately, all office communications are restricted to email correspondence and official memos. The use of Instant Message programs by employees during working hours is strictly prohibited.
#Person2#: Sir, does this apply to intra-office communications only? Or will it also restrict external communications?
#Person1#: It should apply to all communications, not only in this office between employees, but also any outside communications.
#Person2#: But sir, many employees use Instant Messaging to 

**Exercise:** Flan-T5 has many prompt templates that are published for certain tasks [here](https://github.com/google-research/FLAN/blob/main/flan/v2/templates.py). Try using its pre-built prompts for dialogue summarization (e.g., the ones under the `"samsum"` key) and see how they influence the outputs.
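One way to try several templates on the same dialogue is to keep them as format strings with a `{dialogue}` placeholder. The sketch below assumes that layout; the three template strings are taken from this section's solution cell (whitespace normalized) and have not been re-verified against the linked file. The names `SAMSUM_STYLE_TEMPLATES` and `render_prompts` and the toy dialogue are hypothetical.

```python
# Template strings as used in this notebook; {dialogue} is filled by str.format.
SAMSUM_STYLE_TEMPLATES = [
    "{dialogue}\n\nBriefly summarize that dialogue.",
    "Here is a dialogue:\n{dialogue}\n\nWrite a short summary!",
    "Dialogue:\n{dialogue}\nWhat were the main points in that conversation?",
]


def render_prompts(dialogue, templates=SAMSUM_STYLE_TEMPLATES):
    """Return one prompt per template for the same dialogue."""
    return [template.format(dialogue=dialogue) for template in templates]


# Hypothetical toy dialogue for illustration only.
toy_dialogue = "#Person1#: Did you send the memo?\n#Person2#: Yes, this morning."
prompts = render_prompts(toy_dialogue)
for number, prompt in enumerate(prompts, start=1):
    print(f"--- template {number} ---")
    print(prompt)
```

Keeping the templates in a list makes it easy to add or swap one without touching the generation loop.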


In [32]:
### WRITE YOUR CODE HERE
example_indices = [0, 42, 800,100,200]
dash_line = '-' * 100

for i, index in enumerate(example_indices):
    print(dash_line)
    print('Example', i + 1)
    print(dash_line)
    print('INPUT DIALOGUE:')
    print(dataset['test'][index]['dialogue'])
    print(dash_line)
    print('BASELINE HUMAN SUMMARY:')
    print(dataset['test'][index]['summary'])
    print(dash_line)
    print('GENERATED SUMMARY with prompt template-1')
    input_ids = tokenizer(dataset['test'][index]['dialogue']+"\n\nBriefly summarize that dialogue.", return_tensors="pt").input_ids
    outputs = model.generate(input_ids,max_new_tokens=50)
    print(tokenizer.decode(outputs[0]))
    print(dash_line)
    print('GENERATED SUMMARY with prompt template-2')
    input_ids = tokenizer("Here is a dialogue:\n"+dataset['test'][index]['dialogue']+"\n\nWrite a short summary!", return_tensors="pt").input_ids
    outputs = model.generate(input_ids,max_new_tokens=50)
    print(tokenizer.decode(outputs[0]))
    print(dash_line)
    print('GENERATED SUMMARY with prompt template-3')
    input_ids = tokenizer("Dialogue:\n"+dataset['test'][index]['dialogue']+"\nWhat were the main points in that conversation?", return_tensors="pt").input_ids
    outputs = model.generate(input_ids,max_new_tokens=50)
    print(tokenizer.decode(outputs[0]))





----------------------------------------------------------------------------------------------------
Example 1
----------------------------------------------------------------------------------------------------
INPUT DIALOGUE:
#Person1#: Ms. Dawson, I need you to take a dictation for me.
#Person2#: Yes, sir...
#Person1#: This should go out as an intra-office memorandum to all employees by this afternoon. Are you ready?
#Person2#: Yes, sir. Go ahead.
#Person1#: Attention all staff... Effective immediately, all office communications are restricted to email correspondence and official memos. The use of Instant Message programs by employees during working hours is strictly prohibited.
#Person2#: Sir, does this apply to intra-office communications only? Or will it also restrict external communications?
#Person1#: It should apply to all communications, not only in this office between employees, but also any outside communications.
#Person2#: But sir, many employees use Instant Messaging to 

Notice that the prompts from Flan-T5 did help, but the model still struggles to pick up on the nuance of the conversation in some cases. This is what you will try to solve with few-shot inferencing.

### 5. Summarize Dialogues with a Few-Shot Inference

**Few-shot inference** is the practice of providing an LLM with several examples of prompt-response pairs that match your task, placed before the actual prompt that you want completed. This is called "in-context learning": the examples condition the model on your specific task without changing its weights.

**Exercise:** Build a function that takes a list of `in_context_example_indices`, generates a prompt with the examples, then at the end appends the prompt that you want the model to complete (`test_example_index`). Use the same Flan-T5 prompt template from Section 4. Make sure to separate the examples with `"\n\n\n"`.

In [99]:
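# I used a different prompt template to produce better results: each in-context example asks for a summary in less than 50 words.
# Note that the final prompt for the test example does not repeat the word limit.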
def make_prompt(in_context_example_indices, test_example_index):
    ### WRITE YOUR CODE HERE
    prompt = ''

    for index in in_context_example_indices:
        prompt += 'Summarize the following conversation in less than 50 words:\n'+dataset['test'][index]['dialogue'] + '\nSummary: '+ dataset['test'][index]['summary']+ '\n\n\n'

    prompt +=  'Summarize the following conversation:\n'+dataset['test'][test_example_index]['dialogue'] + '\nSummary: '
    return prompt

In [116]:
# in_context_example_indices = [0, 10, 20]  # exceeds the model's token limit
in_context_example_indices = [10, 20]
test_example_index = 800

few_shot_prompt = make_prompt(in_context_example_indices, test_example_index)
print(few_shot_prompt)

Summarize the following conversation in less than 50 words:
#Person1#: Happy Birthday, this is for you, Brian.
#Person2#: I'm so happy you remember, please come in and enjoy the party. Everyone's here, I'm sure you have a good time.
#Person1#: Brian, may I have a pleasure to have a dance with you?
#Person2#: Ok.
#Person1#: This is really wonderful party.
#Person2#: Yes, you are always popular with everyone. and you look very pretty today.
#Person1#: Thanks, that's very kind of you to say. I hope my necklace goes with my dress, and they both make me look good I feel.
#Person2#: You look great, you are absolutely glowing.
#Person1#: Thanks, this is a fine party. We should have a drink together to celebrate your birthday
Summary: #Person1# attends Brian's birthday party. Brian thinks #Person1# looks great and charming.


Summarize the following conversation in less than 50 words:
#Person1#: What's wrong with you? Why are you scratching so much?
#Person2#: I feel itchy! I can't stand it an

Now pass this prompt to the model to perform few-shot inference:

In [118]:
### WRITE YOUR CODE HERE
input_ids = tokenizer(few_shot_prompt, return_tensors="pt").input_ids
outputs = model.generate(input_ids,max_new_tokens=50)
print('Generated Summary: '+tokenizer.decode(outputs[0]))
print('Actual Summary: '+dataset['test'][test_example_index]['summary'])

Generated Summary: <pad> #Person1# is talking to his father about his uncle Bill, his wife and two of their daughters.</s>
Actual Summary: #Person2# tells #Person1# about the relationships between their family and the uncle Bill's, who will visit them next year.


**Exercise:** Experiment with the few-shot inferencing:
- Choose different dialogues - change the indices in the `in_context_example_indices` list and `test_example_index` value.
- Change the number of examples. Be sure to stay within the model's 512-token context length, however.

How well does few-shot inference work with other examples?
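Staying inside the context length can be checked while the prompt is being built. The sketch below is one possible approach, not part of the exercise: `build_few_shot_prompt` and `whitespace_token_count` are hypothetical names, the toy examples are made up, and the whitespace counter is only a crude stand-in for a real tokenizer.

```python
def build_few_shot_prompt(examples, query_dialogue, count_tokens, max_tokens=512):
    """Add (dialogue, summary) examples in order while the prompt fits the budget.

    Returns the prompt and the number of examples that were kept.
    """
    query_block = f"Summarize the following conversation:\n{query_dialogue}\nSummary: "
    kept_blocks = []
    for dialogue, summary in examples:
        block = f"Summarize the following conversation:\n{dialogue}\nSummary: {summary}\n\n\n"
        candidate = "".join(kept_blocks) + block + query_block
        if count_tokens(candidate) > max_tokens:
            break  # adding this example would overflow the budget
        kept_blocks.append(block)
    return "".join(kept_blocks) + query_block, len(kept_blocks)


def whitespace_token_count(text):
    """Crude stand-in for a real tokenizer: counts whitespace-separated words."""
    return len(text.split())


# Hypothetical toy data; a tiny budget is used so the cut-off is visible.
toy_examples = [
    ("#Person1#: Hi.\n#Person2#: Hello.", "They greet each other."),
    ("#Person1#: Bye.\n#Person2#: See you.", "They say goodbye."),
]
prompt, kept = build_few_shot_prompt(
    toy_examples, "#Person1#: Lunch?\n#Person2#: Sure.", whitespace_token_count, max_tokens=25
)
print("examples kept:", kept)
print(prompt)
```

With the notebook's tokenizer loaded, a more faithful counter would be `lambda text: len(tokenizer(text).input_ids)`, since word counts underestimate subword token counts.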

In [119]:
### WRITE YOUR CODE HERE
in_context_example_indices = [15, 23]
test_example_index = 812

few_shot_prompt = make_prompt(in_context_example_indices, test_example_index)

input_ids = tokenizer(few_shot_prompt, return_tensors="pt").input_ids
outputs = model.generate(input_ids,max_new_tokens=50)
print('Generated Summary: '+tokenizer.decode(outputs[0]))
print('Actual Summary: '+dataset['test'][test_example_index]['summary'])


Generated Summary: <pad> #Person2# is looking for a job. He is currently paid 1,800 yuan a month. #Person1# is willing to pay him 2,500 yuan a month.</s>
Actual Summary: Mr. Brown decides to hire #Person2# with a higher salary and other benefits.


In [120]:
### Experimenting with shorter conversations to increase number of examples
in_context_example_indices = [555,556,557]
test_example_index = 812

few_shot_prompt = make_prompt(in_context_example_indices, test_example_index)

input_ids = tokenizer(few_shot_prompt, return_tensors="pt").input_ids
outputs = model.generate(input_ids,max_new_tokens=50)
print('Generated Summary: '+tokenizer.decode(outputs[0]))
print('Actual Summary: '+dataset['test'][test_example_index]['summary'])

Generated Summary: <pad> #Person2# is looking for a job. #Person1# is looking for a secretary. #Person2# is currently paid 1,800 yuan a month. #Person1# is looking for
Actual Summary: Mr. Brown decides to hire #Person2# with a higher salary and other benefits.


In [121]:
# Prompting with short conversations seems to worsen summarization quality here

In [122]:
### Experimenting with a single long conversation as the in-context example
in_context_example_indices = [1302]
test_example_index = 812

few_shot_prompt = make_prompt(in_context_example_indices, test_example_index)

input_ids = tokenizer(few_shot_prompt, return_tensors="pt").input_ids
outputs = model.generate(input_ids,max_new_tokens=50)
print(tokenizer.decode(outputs[0]))
print('Actual Summary: '+dataset['test'][test_example_index]['summary'])

<pad> Mr. Brown is looking for a secretary to work for him.</s>
Actual Summary: Mr. Brown decides to hire #Person2# with a higher salary and other benefits.
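The comparisons above are made by eye. A rough way to put a number on them is to measure how many words a generated summary shares with the human one. The sketch below assumes that simple word overlap is an acceptable proxy; it is not a standard metric such as ROUGE, it ignores word order, and `word_overlap_f1` is a hypothetical helper. The summaries are copied from the outputs above with the special tokens removed.

```python
import re
from collections import Counter


def word_overlap_f1(candidate, reference):
    """F1 over shared lowercase words; word order is ignored."""
    def words(text):
        return re.findall(r"[a-z0-9#]+", text.lower())

    candidate_counts = Counter(words(candidate))
    reference_counts = Counter(words(reference))
    # Counter intersection keeps the minimum count of each shared word.
    overlap = sum((candidate_counts & reference_counts).values())
    if overlap == 0:
        return 0.0
    precision = overlap / sum(candidate_counts.values())
    recall = overlap / sum(reference_counts.values())
    return 2 * precision * recall / (precision + recall)


reference = "Mr. Brown decides to hire #Person2# with a higher salary and other benefits."
generated = {
    "examples [15, 23]": "#Person2# is looking for a job. He is currently paid 1,800 yuan a month. "
                         "#Person1# is willing to pay him 2,500 yuan a month.",
    "example [1302]": "Mr. Brown is looking for a secretary to work for him.",
}
for label, summary in generated.items():
    print(f"{label}: {word_overlap_f1(summary, reference):.2f}")
```

A score like this only hints at lexical similarity; a summary can share few words with the reference and still be accurate.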


### 6. Generative Configuration Parameters for Inference

You can change the configuration parameters of the `generate()` method to see a different output from the LLM. So far the only parameter that you have been setting was `max_new_tokens=50`, which defines the maximum number of tokens to generate. A convenient way of organizing the configuration parameters is to use the `GenerationConfig` class. By setting `do_sample=True`, you switch from deterministic decoding to sampling, where the next token is drawn from the model's probability distribution over the entire vocabulary. You can then shape that distribution by changing `temperature` and other parameters (such as `top_k` and `top_p`). A full list of available parameters can be found in the [Hugging Face Generation documentation](https://huggingface.co/docs/transformers/v4.29.1/en/main_classes/text_generation#transformers.GenerationConfig).
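To build intuition for what `temperature`, `top_k` and `top_p` do, the following is a conceptual sketch in plain Python. It is not the library's implementation: the function names and the four-word toy vocabulary with its logits are made up for illustration.

```python
import math
import random


def _normalize(pairs):
    """Rescale (token, probability) pairs so the probabilities sum to 1."""
    mass = sum(probability for _, probability in pairs)
    return [(token, probability / mass) for token, probability in pairs]


def filtered_distribution(logits, temperature=1.0, top_k=None, top_p=None):
    """Return {token: probability} after temperature, top-k and top-p filtering."""
    # Temperature rescales logits: values below 1 sharpen, values above 1 flatten.
    scaled = {token: logit / temperature for token, logit in logits.items()}
    peak = max(scaled.values())  # subtract the max for numerical stability
    weights = [(token, math.exp(value - peak)) for token, value in scaled.items()]
    ranked = sorted(_normalize(weights), key=lambda pair: pair[1], reverse=True)

    if top_k is not None:
        ranked = _normalize(ranked[:top_k])  # keep only the k most likely tokens
    if top_p is not None:
        kept, cumulative = [], 0.0
        for token, probability in ranked:
            kept.append((token, probability))
            cumulative += probability
            if cumulative >= top_p:
                break  # smallest prefix whose probability mass reaches top_p
        ranked = _normalize(kept)
    return dict(ranked)


def sample_token(distribution, rng):
    """Draw one token according to the given probabilities."""
    tokens = list(distribution)
    return rng.choices(tokens, weights=[distribution[t] for t in tokens], k=1)[0]


# Hypothetical next-token logits over a toy vocabulary.
toy_logits = {"family": 2.0, "uncle": 1.5, "visit": 0.5, "banana": -1.0}
rng = random.Random(0)
settings = [
    ("temperature=1.0", {"temperature": 1.0}),
    ("temperature=0.2", {"temperature": 0.2}),
    ("top_k=2", {"top_k": 2}),
    ("top_p=0.85", {"top_p": 0.85}),
]
for label, kwargs in settings:
    dist = filtered_distribution(toy_logits, **kwargs)
    rounded = {token: round(p, 3) for token, p in dist.items()}
    print(label, rounded, "->", sample_token(dist, rng))
```

Lower temperatures concentrate probability on the top token, while `top_k` and `top_p` remove unlikely tokens from the candidate set before sampling.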

**Exercise:** Change the configuration parameters to investigate their influence on the output. Analyze your results.

In [135]:
### WRITE YOUR CODE HERE
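from transformers import GenerationConfig
# Sampling with a low temperature, restricted to the 20 most likely tokens.
config1 = GenerationConfig.from_pretrained(model_name, max_new_tokens=50, do_sample=True, temperature=0.2, top_k=20)
# Beam search combined with sampling (3 beams), using nucleus (top-p) filtering.
config2 = GenerationConfig.from_pretrained(model_name, max_new_tokens=50, do_sample=True, temperature=0.2, top_p=0.85, num_beams=3)
# Deterministic diverse beam search: 4 beams split into 2 groups with a diversity penalty.
config3 = GenerationConfig.from_pretrained(model_name, max_new_tokens=50, do_sample=False, num_beams=4, num_beam_groups=2, diversity_penalty=0.2)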


In [136]:
in_context_example_indices = [10, 20]
test_example_index = 800

few_shot_prompt = make_prompt(in_context_example_indices, test_example_index)

input_ids = tokenizer(few_shot_prompt, return_tensors="pt").input_ids

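# Note: unpacking to_dict() also passes the config's default max_length (20) explicitly,
# which is why a warning appears in the output; generation_config=config1 is the more idiomatic call.
o1 = model.generate(input_ids, **config1.to_dict())
print('Generated Summary: ' + tokenizer.decode(o1[0]) + '\n')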

o2 = model.generate(input_ids,**config2.to_dict())
print('Generated Summary: '+tokenizer.decode(o2[0])+'\n')

o3 = model.generate(input_ids,**config3.to_dict())
print('Generated Summary: '+tokenizer.decode(o3[0])+'\n')


print('Actual Summary: '+dataset['test'][test_example_index]['summary'])

Both `max_new_tokens` (=50) and `max_length`(=20) seem to have been set. `max_new_tokens` will take precedence. Please refer to the documentation for more information. (https://huggingface.co/docs/transformers/main/en/main_classes/text_generation)


Generated Summary: <pad> #Person1#'s father, #Person2#, is visiting his family in New Zealand.</s>



Generated Summary: <pad> #Person1# is talking to his father about his uncle Bill, his wife and two of their daughters.</s>



Generated Summary: <pad> #Person1#'s father is visiting his family in New Zealand. #Person2#'s uncle Bill, his wife and two of their daughters are his cousins. #Person1#'s cousins are Sarah and

Actual Summary: #Person2# tells #Person1# about the relationships between their family and the uncle Bill's, who will visit them next year.
