In this exercise, you will perform prompt engineering on a dialogue summarization task using [Flan-T5](https://huggingface.co/google/flan-t5-large) and the [dialogsum dataset](https://huggingface.co/datasets/knkarthick/dialogsum). You will explore how different prompts affect the output of the model, and compare zero-shot and few-shot inferences. <br/>
Complete the code in the cells below.

### 1. Set up Required Dependencies

In [27]:
!pip install datasets -q

In [28]:
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer, GenerationConfig
from datasets import load_dataset

### 2. Explore the Dataset

In [29]:
from datasets import load_dataset

dataset = load_dataset('knkarthick/dialogsum')

Print several dialogues with their baseline summaries.

In [30]:
example_indices = [0, 42, 800]
dash_line = '-' * 100

for i, index in enumerate(example_indices):
    print(dash_line)
    print('Example', i + 1)
    print(dash_line)
    print('INPUT DIALOGUE:')
    print(dataset['test'][index]['dialogue'])
    print(dash_line)
    print('BASELINE HUMAN SUMMARY:')
    print(dataset['test'][index]['summary'])
    print(dash_line)
    print()

----------------------------------------------------------------------------------------------------
Example 1
----------------------------------------------------------------------------------------------------
INPUT DIALOGUE:
#Person1#: Ms. Dawson, I need you to take a dictation for me.
#Person2#: Yes, sir...
#Person1#: This should go out as an intra-office memorandum to all employees by this afternoon. Are you ready?
#Person2#: Yes, sir. Go ahead.
#Person1#: Attention all staff... Effective immediately, all office communications are restricted to email correspondence and official memos. The use of Instant Message programs by employees during working hours is strictly prohibited.
#Person2#: Sir, does this apply to intra-office communications only? Or will it also restrict external communications?
#Person1#: It should apply to all communications, not only in this office between employees, but also any outside communications.
#Person2#: But sir, many employees use Instant Messaging to 

### 3. Summarize Dialogues without Prompt Engineering

Load the Flan-T5-large model and its tokenizer.

In [31]:
model_name = 'google/flan-t5-large'

model = AutoModelForSeq2SeqLM.from_pretrained(model_name)
tokenizer = AutoTokenizer.from_pretrained(model_name)

**Exercise**: Use the pre-trained model to summarize the example dialogues without any prompt engineering. Use the `model.generate()` function with `max_new_tokens=50`.

In [32]:
### WRITE YOUR CODE HERE
for i, index in enumerate(example_indices):
    print(dash_line)
    print('Example', i + 1)
    print(dash_line)

    input_text = dataset['test'][index]['dialogue']
    # Tokenize the raw dialogue; this baseline adds no instruction
    inputs = tokenizer(input_text, return_tensors='pt')

    response = model.generate(**inputs, max_new_tokens=50)
    response_text = tokenizer.decode(response[0], skip_special_tokens=True)

    print('INPUT DIALOGUE:')
    print(dataset['test'][index]['dialogue'])
    print(dash_line)

    print('BASELINE HUMAN SUMMARY:')
    print(dataset['test'][index]['summary'])
    print(dash_line)

    print('AI SUMMARY:')
    print(response_text)
    print(dash_line)

    print()



----------------------------------------------------------------------------------------------------
Example 1
----------------------------------------------------------------------------------------------------
INPUT DIALOGUE:
#Person1#: Ms. Dawson, I need you to take a dictation for me.
#Person2#: Yes, sir...
#Person1#: This should go out as an intra-office memorandum to all employees by this afternoon. Are you ready?
#Person2#: Yes, sir. Go ahead.
#Person1#: Attention all staff... Effective immediately, all office communications are restricted to email correspondence and official memos. The use of Instant Message programs by employees during working hours is strictly prohibited.
#Person2#: Sir, does this apply to intra-office communications only? Or will it also restrict external communications?
#Person1#: It should apply to all communications, not only in this office between employees, but also any outside communications.
#Person2#: But sir, many employees use Instant Messaging to 

You can see that the model generations make some sense, but the model doesn't seem to be sure what task it is supposed to accomplish and it often just makes up the next sentence in the dialogue. Prompt engineering can help here.

### 4. Summarize Dialogues with Instruction Prompts

In order to instruct the model to perform a task (e.g., summarize a dialogue), you can take the dialogue and convert it into an instruction prompt. This is often called **zero-shot inference**.
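As a minimal sketch of that conversion, the helper below wraps a dialogue in an instruction. The function name `build_zero_shot_prompt`, its `suffix` parameter, and the toy dialogue are illustrative assumptions, not part of the exercise or of any library.

```python
def build_zero_shot_prompt(dialogue, instruction="Summarize the following conversation.", suffix=""):
    """Wrap a dialogue in an instruction; no worked examples are included (zero-shot)."""
    # Blank lines keep the instruction, the dialogue, and the optional cue apart
    parts = [instruction, dialogue.strip()]
    if suffix:
        parts.append(suffix)
    return "\n\n".join(parts)


# Hypothetical toy dialogue, written in the dataset's "#PersonN#:" style
toy_dialogue = "#Person1#: Are we still meeting at noon?\n#Person2#: Yes, see you in the lobby."

print(build_zero_shot_prompt(toy_dialogue))
print("-" * 40)
print(build_zero_shot_prompt(toy_dialogue, suffix="Summary:"))
```

The resulting string can be tokenized and passed to `model.generate()` in the same way as a raw dialogue.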

**Exercise**: Wrap the dialogues in a descriptive instruction (e.g., "Summarize the following conversation."), and examine how the generated text changes.

In [33]:
### WRITE YOUR CODE HERE
for i, index in enumerate(example_indices):
    print(dash_line)
    print('Example', i + 1)
    print(dash_line)

    input_text = dataset['test'][index]['dialogue']
    instruction = "Summarize the following conversation."

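    # Put a blank line between the instruction and the dialogue so they are not fused together
    prompt = instruction + "\n\n" + input_text

    inputs = tokenizer(prompt, return_tensors='pt')

    response = model.generate(**inputs, max_new_tokens=50)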
    response_text = tokenizer.decode(response[0], skip_special_tokens=True)

    print('INPUT DIALOGUE:')
    print(dataset['test'][index]['dialogue'])
    print(dash_line)

    print('BASELINE HUMAN SUMMARY:')
    print(dataset['test'][index]['summary'])
    print(dash_line)

    print('AI SUMMARY:')
    print(response_text)
    print(dash_line)

    print()


----------------------------------------------------------------------------------------------------
Example 1
----------------------------------------------------------------------------------------------------
INPUT DIALOGUE:
#Person1#: Ms. Dawson, I need you to take a dictation for me.
#Person2#: Yes, sir...
#Person1#: This should go out as an intra-office memorandum to all employees by this afternoon. Are you ready?
#Person2#: Yes, sir. Go ahead.
#Person1#: Attention all staff... Effective immediately, all office communications are restricted to email correspondence and official memos. The use of Instant Message programs by employees during working hours is strictly prohibited.
#Person2#: Sir, does this apply to intra-office communications only? Or will it also restrict external communications?
#Person1#: It should apply to all communications, not only in this office between employees, but also any outside communications.
#Person2#: But sir, many employees use Instant Messaging to 

This is much better! However, the model still does not pick up on the nuance of the conversations.

**Exercise:** Experiment with the prompt text and see how it influences the generated output. Do the inferences change if you end the prompt with just an empty string vs. `Summary: `?

In [34]:
### WRITE YOUR CODE HERE
for i, index in enumerate(example_indices):
    print(dash_line)
    print('Example', i + 1)
    print(dash_line)

    input_text = dataset['test'][index]['dialogue']
    instruction = "Write a concise summary of the conversation, focusing only on the key points and outcomes."

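    # Put a blank line between the instruction and the dialogue so they are not fused together
    prompt = instruction + "\n\n" + input_text

    inputs = tokenizer(prompt, return_tensors='pt')

    response = model.generate(**inputs, max_new_tokens=50)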
    response_text = tokenizer.decode(response[0], skip_special_tokens=True)

    print('INPUT DIALOGUE:')
    print(dataset['test'][index]['dialogue'])
    print(dash_line)

    print('BASELINE HUMAN SUMMARY:')
    print(dataset['test'][index]['summary'])
    print(dash_line)

    print('AI SUMMARY:')
    print(response_text)
    print(dash_line)

    print()



----------------------------------------------------------------------------------------------------
Example 1
----------------------------------------------------------------------------------------------------
INPUT DIALOGUE:
#Person1#: Ms. Dawson, I need you to take a dictation for me.
#Person2#: Yes, sir...
#Person1#: This should go out as an intra-office memorandum to all employees by this afternoon. Are you ready?
#Person2#: Yes, sir. Go ahead.
#Person1#: Attention all staff... Effective immediately, all office communications are restricted to email correspondence and official memos. The use of Instant Message programs by employees during working hours is strictly prohibited.
#Person2#: Sir, does this apply to intra-office communications only? Or will it also restrict external communications?
#Person1#: It should apply to all communications, not only in this office between employees, but also any outside communications.
#Person2#: But sir, many employees use Instant Messaging to 

In [35]:
for i, index in enumerate(example_indices):
    print(dash_line)
    print('Example', i + 1)
    print(dash_line)

    input_text = dataset['test'][index]['dialogue']
    instruction = "Summarize the conversation in a structured way, listing each person's name followed by their main points."

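    # Put a blank line between the instruction and the dialogue so they are not fused together
    prompt = instruction + "\n\n" + input_text

    inputs = tokenizer(prompt, return_tensors='pt')

    response = model.generate(**inputs, max_new_tokens=50)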
    response_text = tokenizer.decode(response[0], skip_special_tokens=True)

    print('INPUT DIALOGUE:')
    print(dataset['test'][index]['dialogue'])
    print(dash_line)

    print('BASELINE HUMAN SUMMARY:')
    print(dataset['test'][index]['summary'])
    print(dash_line)

    print('AI SUMMARY:')
    print(response_text)
    print(dash_line)

    print()

----------------------------------------------------------------------------------------------------
Example 1
----------------------------------------------------------------------------------------------------
INPUT DIALOGUE:
#Person1#: Ms. Dawson, I need you to take a dictation for me.
#Person2#: Yes, sir...
#Person1#: This should go out as an intra-office memorandum to all employees by this afternoon. Are you ready?
#Person2#: Yes, sir. Go ahead.
#Person1#: Attention all staff... Effective immediately, all office communications are restricted to email correspondence and official memos. The use of Instant Message programs by employees during working hours is strictly prohibited.
#Person2#: Sir, does this apply to intra-office communications only? Or will it also restrict external communications?
#Person1#: It should apply to all communications, not only in this office between employees, but also any outside communications.
#Person2#: But sir, many employees use Instant Messaging to 

In [36]:
for i, index in enumerate(example_indices):
    print(dash_line)
    print('Example', i + 1)
    print(dash_line)

    input_text = dataset['test'][index]['dialogue']
    instruction = "Summarize the following conversation."

    # Instruction, dialogue, then the "Summary: " cue named in the exercise
    prompt = instruction + "\n\n" + input_text + "\n\nSummary: "

    inputs = tokenizer(prompt, return_tensors='pt')

    response = model.generate(**inputs, max_new_tokens=50)
    response_text = tokenizer.decode(response[0], skip_special_tokens=True)

    print('INPUT DIALOGUE:')
    print(dataset['test'][index]['dialogue'])
    print(dash_line)

    print('BASELINE HUMAN SUMMARY:')
    print(dataset['test'][index]['summary'])
    print(dash_line)

    print('AI SUMMARY:')
    print(response_text)
    print(dash_line)

    print()

----------------------------------------------------------------------------------------------------
Example 1
----------------------------------------------------------------------------------------------------
INPUT DIALOGUE:
#Person1#: Ms. Dawson, I need you to take a dictation for me.
#Person2#: Yes, sir...
#Person1#: This should go out as an intra-office memorandum to all employees by this afternoon. Are you ready?
#Person2#: Yes, sir. Go ahead.
#Person1#: Attention all staff... Effective immediately, all office communications are restricted to email correspondence and official memos. The use of Instant Message programs by employees during working hours is strictly prohibited.
#Person2#: Sir, does this apply to intra-office communications only? Or will it also restrict external communications?
#Person1#: It should apply to all communications, not only in this office between employees, but also any outside communications.
#Person2#: But sir, many employees use Instant Messaging to 

Observation: A clear prompt with explicit instructions gives better results. Ending the prompt with "Summary:" improves the output slightly, but the model still struggles to capture the nuances.

**Exercise:** Flan-T5 has many prompt templates that are published for certain tasks [here](https://github.com/google-research/FLAN/blob/main/flan/v2/templates.py). Try using its pre-built prompts for dialogue summarization (e.g., the ones under the `"samsum"` key) and see how they influence the outputs.


In [37]:
### WRITE YOUR CODE HERE

prompt_templates = [
    ("{dialogue}\n\nBriefly summarize that dialogue.", "{summary}"),
    ("Here is a dialogue:\n{dialogue}\n\nWrite a short summary!", "{summary}"),
    ("Dialogue:\n{dialogue}\n\nWhat is a summary of this dialogue?", "{summary}"),
    ("{dialogue}\n\nWhat was that dialogue about, in two sentences or less?", "{summary}"),
    ("Here is a dialogue:\n{dialogue}\n\nWhat were they talking about?", "{summary}"),
    ("Dialogue:\n{dialogue}\nWhat were the main points in that conversation?", "{summary}"),
    ("Dialogue:\n{dialogue}\nWhat was going on in that conversation?", "{summary}"),
]

for i, index in enumerate(example_indices):
    print(dash_line)
    print(f"Example {i + 1}")
    print(dash_line)

    dialogue = dataset['test'][index]['dialogue']
    summary = dataset['test'][index]['summary']

    for j, (prompt_template, _target_template) in enumerate(prompt_templates):
        # Every template's target is "{summary}", so only the input side is formatted
        prompt = prompt_template.format(dialogue=dialogue)

        # Tokenize and generate output
        inputs = tokenizer(prompt, return_tensors='pt', truncation=True)
        outputs = model.generate(**inputs, max_new_tokens=100)
        generated = tokenizer.decode(outputs[0], skip_special_tokens=True)

        # Print results clearly
        print(f"\n--- Prompt Style {j + 1} ---")
        print("\nPROMPT:\n" + "-"*50)
        print(prompt)
        print("\nGENERATED OUTPUT:\n" + "-"*50)
        print(generated)
        print("\n" + dash_line)


----------------------------------------------------------------------------------------------------
Example 1
----------------------------------------------------------------------------------------------------

--- Prompt Style 1 ---

PROMPT:
--------------------------------------------------
#Person1#: Ms. Dawson, I need you to take a dictation for me.
#Person2#: Yes, sir...
#Person1#: This should go out as an intra-office memorandum to all employees by this afternoon. Are you ready?
#Person2#: Yes, sir. Go ahead.
#Person1#: Attention all staff... Effective immediately, all office communications are restricted to email correspondence and official memos. The use of Instant Message programs by employees during working hours is strictly prohibited.
#Person2#: Sir, does this apply to intra-office communications only? Or will it also restrict external communications?
#Person1#: It should apply to all communications, not only in this office between employees, but also any outside communic

Notice that the prompts from Flan-T5 did help, but the model still struggles to pick up on the nuance of the conversation in some cases. This is what you will try to solve with few-shot inferencing.

### 5. Summarize Dialogues with a Few-Shot Inference

**Few-shot inference** means placing several example prompt-response pairs for your task ahead of the prompt that you actually want completed. This is called "in-context learning": the examples show the model what kind of output is expected, without changing its weights.
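To make the structure concrete, here is a minimal sketch that assembles a few-shot prompt from in-memory pairs. The function name `build_few_shot_prompt`, the `Summary: ` cue, and the toy dialogues are assumptions chosen for illustration; they are not taken from the dataset.

```python
def build_few_shot_prompt(examples, query_dialogue, separator="\n\n\n"):
    """Join solved (dialogue, summary) pairs, then append the unsolved dialogue."""
    # Each solved example shows the model the input followed by the desired output
    blocks = [f"{dialogue}\nSummary: {summary}" for dialogue, summary in examples]
    # The final block ends on the same cue, left open for the model to complete
    blocks.append(f"{query_dialogue}\nSummary: ")
    return separator.join(blocks)


# Hypothetical toy pairs, written in the dataset's "#PersonN#:" style
toy_examples = [
    ("#Person1#: Can you send the report today?\n#Person2#: Sure, by 5 pm.",
     "#Person2# agrees to send the report by 5 pm."),
    ("#Person1#: Is the printer fixed?\n#Person2#: Not yet, the part arrives Friday.",
     "#Person2# says the printer will stay broken until a part arrives on Friday."),
]
toy_query = "#Person1#: Shall we move the meeting?\n#Person2#: Yes, Tuesday works better."

print(build_few_shot_prompt(toy_examples, toy_query))
```

With two solved examples this would be called two-shot inference; with one it would be one-shot.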

**Exercise:** Build a function that takes a list of `in_context_example_indices`, generates a prompt with the examples, then at the end appends the prompt that you want the model to complete (`test_example_index`). Use the same Flan-T5 prompt template from Section 4. Make sure to separate the examples with `"\n\n\n"`.

In [38]:
def make_prompt(in_context_example_indices, test_example_index):
    ### WRITE YOUR CODE HERE
    prompt_few_shot = ""
    for index in in_context_example_indices:
        example_dialogue = dataset['test'][index]['dialogue']
        example_summary = dataset['test'][index]['summary']

        # Each in-context example is a dialogue followed by its human summary
        example_prompt = f"{example_dialogue}\nSummary: {example_summary}"
        prompt_few_shot += example_prompt + "\n\n\n"

    # The final dialogue ends on the same "Summary: " cue, left open for the model
    input_text = dataset['test'][test_example_index]['dialogue']
    prompt = prompt_few_shot + f"{input_text}\nSummary: "

    return prompt

In [39]:
in_context_example_indices = [0, 10, 20]
test_example_index = 800

few_shot_prompt = make_prompt(in_context_example_indices, test_example_index)
print(few_shot_prompt)

#Person1#: Ms. Dawson, I need you to take a dictation for me.
#Person2#: Yes, sir...
#Person1#: This should go out as an intra-office memorandum to all employees by this afternoon. Are you ready?
#Person2#: Yes, sir. Go ahead.
#Person1#: Attention all staff... Effective immediately, all office communications are restricted to email correspondence and official memos. The use of Instant Message programs by employees during working hours is strictly prohibited.
#Person2#: Sir, does this apply to intra-office communications only? Or will it also restrict external communications?
#Person1#: It should apply to all communications, not only in this office between employees, but also any outside communications.
#Person2#: But sir, many employees use Instant Messaging to communicate with their clients.
#Person1#: They will just have to change their communication methods. I don't want any - one using Instant Messaging in this office. It wastes too much time! Now, please continue with the memo. Wh

Now pass this prompt to the model to perform few-shot inference:

In [40]:
### WRITE YOUR CODE HERE
few_shot_inputs = tokenizer(few_shot_prompt, return_tensors='pt')
# generate() returns token ids; decode them to get the summary text
output_ids = model.generate(**few_shot_inputs, max_new_tokens=50)

response = tokenizer.decode(output_ids[0], skip_special_tokens=True)
print(response)


Token indices sequence length is longer than the specified maximum sequence length for this model (1160 > 512). Running this sequence through the model will result in indexing errors


#Person1#'s father keeps talking about his family in New Zealand. #Person2#'s father's brother is his brother-in-law. #Person1#'s uncle Bill, his wife and two
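The tokenizer warning in the output above reports a 1160-token prompt against a 512-token maximum. One way to stay inside such a budget is to drop in-context examples until the prompt fits. The sketch below is an assumption-laden illustration: `fit_examples_to_budget` and `whitespace_token_count` are hypothetical helpers, and the whitespace counter is only a stand-in for a real tokenizer.

```python
def fit_examples_to_budget(examples, query_dialogue, max_tokens, count_tokens, separator="\n\n\n"):
    """Drop trailing (dialogue, summary) examples until the prompt fits max_tokens.

    Returns the prompt and the number of examples kept. If even the bare query
    exceeds the budget, the query-only prompt is returned with 0 examples.
    """
    kept = list(examples)
    while True:
        blocks = [f"{dialogue}\nSummary: {summary}" for dialogue, summary in kept]
        blocks.append(f"{query_dialogue}\nSummary: ")
        prompt = separator.join(blocks)
        if count_tokens(prompt) <= max_tokens or not kept:
            return prompt, len(kept)
        kept.pop()  # remove the last remaining example and try again


def whitespace_token_count(text):
    """Stand-in counter (assumption): one token per whitespace-separated word."""
    return len(text.split())


# Hypothetical toy data: each solved example costs 5 "tokens", the query block costs 3
toy_examples = [("a b c", "d"), ("e f g", "h"), ("i j k", "l")]
prompt, kept = fit_examples_to_budget(toy_examples, "q r", max_tokens=14,
                                      count_tokens=whitespace_token_count)
print(f"kept {kept} of {len(toy_examples)} examples")
print(f"prompt length: {whitespace_token_count(prompt)} tokens")
```

In this notebook, a counter based on the real tokenizer could be passed instead, for example `lambda text: len(tokenizer(text)["input_ids"])`.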


**Exercise:** Experiment with the few-shot inferencing:
- Choose different dialogues - change the indices in the `in_context_example_indices` list and `test_example_index` value.
- Change the number of examples. Be sure to stay within the model's 512-token context length, however.

How well does few-shot inference work with other examples?

In [41]:
### WRITE YOUR CODE HERE
in_context_example_indices = [7, 19, 1]
test_example_index = 550

few_shot_prompt = make_prompt(in_context_example_indices, test_example_index)
print(few_shot_prompt)

few_shot_inputs = tokenizer(few_shot_prompt, return_tensors='pt')
# generate() returns token ids; decode them to get the summary text
output_ids = model.generate(**few_shot_inputs, max_new_tokens=50)

response = tokenizer.decode(output_ids[0], skip_special_tokens=True)
print(response)


#Person1#: Kate, you never believe what's happened.
#Person2#: What do you mean?
#Person1#: Masha and Hero are getting divorced.
#Person2#: You are kidding. What happened?
#Person1#: Well, I don't really know, but I heard that they are having a separation for 2 months, and filed for divorce.
#Person2#: That's really surprising. I always thought they are well matched. What about the kids? Who get custody?
#Person1#: Masha, it seems quiet and makable, no quarrelling about who get the house and stock and then contesting the divorce with other details worked out.
#Person2#: That's the change from all the back stepping we usually hear about. Well, I still can't believe it, Masha and Hero, the perfect couple. When would they divorce be final?
#Person1#: Early in the New Year I guess.
Summary: #Person1# tells Kate that Masha and Hero are getting a peaceful divorce. Kate feels surprised and asks about their kids.


#Person1#: What's wrong with you? Why are you scratching so much?
#Person2#: I 

Analysis: Even though the few-shot prompt shows the model the task (summarization) through worked examples, the generated summary still falls short of expectations. It is vague and does not clearly state each speaker's main points.

### 6. Generative Configuration Parameters for Inference

You can change the configuration parameters of the `generate()` method to see a different output from the LLM. So far the only parameter that you have been setting is `max_new_tokens`, which defines the maximum number of tokens to generate. A convenient way of organizing the configuration parameters is to use the `GenerationConfig` class. Setting `do_sample=True` makes the model sample each next token from the probability distribution over the entire vocabulary instead of always taking the most likely token. You can then adjust the outputs by changing `temperature` and other parameters (such as `top_k` and `top_p`). A full list of available parameters can be found in the [Hugging Face Generation documentation](https://huggingface.co/docs/transformers/v4.29.1/en/main_classes/text_generation#transformers.GenerationConfig).
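To give a feel for what `top_k` and `top_p` do during sampling, the sketch below filters a toy next-token distribution in plain Python. It is a simplified illustration under stated assumptions, not the library's implementation: `filter_top_k_top_p` is a hypothetical helper, and the four probabilities are made up.

```python
import random


def filter_top_k_top_p(probs, top_k=0, top_p=1.0):
    """Return {token_index: renormalized probability} after top-k, then top-p filtering."""
    ranked = sorted(enumerate(probs), key=lambda pair: pair[1], reverse=True)
    if top_k > 0:
        # top-k: keep only the k most likely tokens, then renormalize
        ranked = ranked[:top_k]
        mass = sum(p for _, p in ranked)
        ranked = [(index, p / mass) for index, p in ranked]
    kept, cumulative = [], 0.0
    for index, p in ranked:
        # top-p: keep the smallest prefix whose cumulative probability reaches top_p
        kept.append((index, p))
        cumulative += p
        if cumulative >= top_p:
            break
    total = sum(p for _, p in kept)
    return {index: p / total for index, p in kept}


# Hypothetical next-token probabilities for a four-token vocabulary
toy_probs = [0.5, 0.3, 0.15, 0.05]

print(filter_top_k_top_p(toy_probs, top_k=2))
print(filter_top_k_top_p(toy_probs, top_p=0.9))

# Sample one token index from the surviving candidates
candidates = filter_top_k_top_p(toy_probs, top_k=3, top_p=0.9)
rng = random.Random(0)
token_index = rng.choices(list(candidates), weights=list(candidates.values()), k=1)[0]
print("sampled token index:", token_index)
```

Lower `top_k` or `top_p` values shrink the candidate set, so sampling stays closer to the most likely continuation.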

**Exercise:** Change the configuration parameters to investigate their influence on the output. Analyze your results.

In [42]:
### WRITE YOUR CODE HERE
from transformers import GenerationConfig

gen_config = GenerationConfig(
    max_new_tokens=80,
    do_sample=True,
    temperature=0.3,
    top_k=50,
    top_p=0.9,
    repetition_penalty=1.1
)

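# `inputs` still holds the last prompt tokenized in the template loop of Section 4
# (dialogue 800 with the final template)
outputs = model.generate(**inputs, generation_config=gen_config)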

generated_text = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(generated_text)


Dad keeps talking about family in New Zealand. They are Uncle Bill, his wife and two of their daughters. Sarah and Jane are cousins. They want to travel to Europe next year and will visit us at the same Ae.


In [43]:
gen_config = GenerationConfig(
    max_new_tokens=80,
    do_sample=True,
    temperature=0.6,
    top_k=50,
    top_p=0.9,
    repetition_penalty=1.1
)

outputs = model.generate(**inputs, generation_config=gen_config)

generated_text = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(generated_text)


The father of the daughter is talking about her uncle Bill, his wife and two of their daughters in New Zealand.


In [44]:
gen_config = GenerationConfig(
    max_new_tokens=80,
    do_sample=True,
    temperature=0.9,
    top_k=50,
    top_p=0.9,
    repetition_penalty=1.1
)

outputs = model.generate(**inputs, generation_config=gen_config)

generated_text = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(generated_text)


Sarah and Jane are cousins of #Person1#. #Person2#'s uncle Bill is from New Zealand. #Person1#'s uncle Bill is cousins with Jack and his wife.


### Analysis

In this experiment, `top_k` and `top_p` were kept constant while `temperature` was increased from **0.3 → 0.6 → 0.9**.  
- **Temperature = 0.3** - Output was concise, factually accurate, and closely followed the dialogue.  
- **Temperature = 0.6** - Output became shorter and omitted details that appeared at 0.3 (the cousins Sarah and Jane, the planned trip to Europe), showing increased randomness.  
- **Temperature = 0.9** - Output introduced factual errors (e.g., “Bill is cousins with Jack”), indicating that higher temperature boosts creativity but also increases the risk of hallucinations.
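
The trend in the list above matches how temperature reshapes the next-token distribution: logits are divided by the temperature before the softmax, so higher values flatten the distribution and give unlikely tokens more chance of being sampled. The sketch below illustrates this with made-up logits; `softmax_with_temperature` is a hypothetical helper, not a library function.

```python
import math


def softmax_with_temperature(logits, temperature):
    """Convert logits to probabilities after dividing them by the temperature."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the maximum for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical scores for four candidate next tokens
toy_logits = [3.0, 2.0, 1.0, 0.0]

for temperature in (0.3, 0.6, 0.9):
    probs = softmax_with_temperature(toy_logits, temperature)
    print(f"temperature={temperature}: top-token probability = {probs[0]:.3f}, "
          f"least likely token = {probs[-1]:.5f}")
```

As the temperature rises, the top token's share falls and the tail's share grows, which is consistent with the less faithful output observed at 0.9.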