# Summarize Dialogue
In this notebook we perform dialogue summarization with generative AI. We explore how the input text affects the model's output and use prompt engineering to steer the model toward the task we need, comparing zero-shot, one-shot and few-shot inference. Prompt engineering can improve the generative output of LLMs.

## Installing required packages

In [1]:
%pip install --upgrade pip
%pip install --disable-pip-version-check \
    torch==1.13.1 \
    torchdata==0.5.1 --quiet

%pip install \
    transformers==4.27.2 \
    datasets==2.11.0  --quiet


Note: you may need to restart the kernel to use updated packages.


ERROR: Could not find a version that satisfies the requirement torch==1.13.1 (from versions: 2.0.0, 2.0.1, 2.1.0, 2.1.1, 2.1.2, 2.2.0)
ERROR: No matching distribution found for torch==1.13.1


Note: you may need to restart the kernel to use updated packages.
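The install log above shows that the pinned `torch==1.13.1` could not be resolved, so the versions actually present may differ from the pins. The following is a minimal sketch, using only the standard library, for checking which versions are installed; the helper name `installed_versions` is hypothetical and not part of the original notebook.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_versions(packages):
    """Return {package name: installed version, or None if it is not installed}."""
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


# Report the packages this notebook tries to pin.
for name, ver in installed_versions(["torch", "torchdata", "transformers", "datasets"]).items():
    print(f"{name}: {ver or 'not installed'}")
```

If a package reports a version other than the pinned one, later cells may still run, but their output can differ from what is shown here.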


Import the dataset loader, the LLM class, the tokenizer and the generation configuration class.

In [2]:
from datasets import load_dataset
from transformers import AutoModelForSeq2SeqLM
from transformers import AutoTokenizer
from transformers import GenerationConfig



## Summarizing Dialogue without Prompt Engineering
Here we will use the FLAN-T5 model (an LLM) available on Hugging Face. Here is the list of models available in `transformers` :- https://huggingface.co/docs/transformers/index <br>
We use the dataset https://huggingface.co/datasets/knkarthick/dialogsum from Hugging Face Datasets. This dataset contains 10K+ dialogues with corresponding manually labeled summaries and topics.

In [3]:
huggingface_dataset_name = "knkarthick/dialogsum"

dataset = load_dataset(huggingface_dataset_name)

Found cached dataset csv (C:/Users/sayed/.cache/huggingface/datasets/knkarthick___csv/knkarthick--dialogsum-cd36827d3490488d/0.0.0/6954658bab30a358235fa864b05cf819af0e179325c740e4bc853bcc7ec513e1)
100%|███████████████████████████████████████████████████████████████████████████████████| 3/3 [00:00<00:00, 133.69it/s]


In [4]:
example_indices = [40, 200]

dash_line = '-' * 99  # 99-character separator line

for i, index in enumerate(example_indices):
    print(dash_line)
    print('Example ', i + 1)
    print(dash_line)
    print('INPUT DIALOGUE:')
    print(dataset['test'][index]['dialogue'])
    print(dash_line)
    print('BASELINE HUMAN SUMMARY:')
    print(dataset['test'][index]['summary'])
    print(dash_line)
    print()

---------------------------------------------------------------------------------------------------
Example  1
---------------------------------------------------------------------------------------------------
INPUT DIALOGUE:
#Person1#: What time is it, Tom?
#Person2#: Just a minute. It's ten to nine by my watch.
#Person1#: Is it? I had no idea it was so late. I must be off now.
#Person2#: What's the hurry?
#Person1#: I must catch the nine-thirty train.
#Person2#: You've plenty of time yet. The railway station is very close. It won't take more than twenty minutes to get there.
---------------------------------------------------------------------------------------------------
BASELINE HUMAN SUMMARY:
#Person1# is in a hurry to catch a train. Tom tells #Person1# there is plenty of time.
---------------------------------------------------------------------------------------------------

---------------------------------------------------------------------------------------------------
Exa

Now load the FLAN-T5 model by creating an instance of the `AutoModelForSeq2SeqLM` class with its `.from_pretrained()` method. For more info on FLAN-T5 :- https://huggingface.co/docs/transformers/model_doc/flan-t5 

In [5]:
model_name='google/flan-t5-base'

model = AutoModelForSeq2SeqLM.from_pretrained(model_name)

config.json: 100%|████████████████████████████████████████████████████████████████████████| 1.40k/1.40k [00:00<?, ?B/s]
To support symlinks on Windows, you either need to activate Developer Mode or to run Python as an administrator. In order to see activate developer mode, see this article: https://docs.microsoft.com/en-us/windows/apps/get-started/enable-your-device-for-development
model.safetensors: 100%|████████████████████████████████████████████████████████████| 990M/990M [01:48<00:00, 9.11MB/s]
generation_config.json: 100%|█████████████████████████████████████████████████████████████████| 147/147 [00:00<?, ?B/s]


To perform encoding and decoding we need to tokenize the text. **Tokenization** is the process of splitting text into smaller units (tokens) that the model can process. We download the tokenizer for the FLAN-T5 model with the `AutoTokenizer.from_pretrained()` method.

In [6]:
tokenizer = AutoTokenizer.from_pretrained(model_name, use_fast=True)

tokenizer_config.json: 100%|██████████████████████████████████████████████████████████████| 2.54k/2.54k [00:00<?, ?B/s]
spiece.model: 100%|█████████████████████████████████████████████████████████████████| 792k/792k [00:00<00:00, 9.08MB/s]
tokenizer.json: 100%|█████████████████████████████████████████████████████████████| 2.42M/2.42M [00:00<00:00, 2.50MB/s]
special_tokens_map.json: 100%|████████████████████████████████████████████████████████████| 2.20k/2.20k [00:00<?, ?B/s]


We can test the tokenizer by encoding and decoding a simple sentence.

In [7]:
sentence = "Hello, my name is Ebad! How are you?"
encoded = tokenizer(sentence, return_tensors='pt')
decoded = tokenizer.decode(encoded["input_ids"][0], skip_special_tokens=True)
print('Encoded Sentence :', encoded["input_ids"][0])
print('\nDecoded Sentence :', decoded)

Encoded Sentence : tensor([8774,    6,   82,  564,   19,  262, 5514,   55,  571,   33,   25,   58,
           1])

Decoded Sentence : Hello, my name is Ebad! How are you?


In [8]:
for i, index in enumerate(example_indices):
    dialogue = dataset['test'][index]['dialogue']
    summary = dataset['test'][index]['summary']
    
    inputs = tokenizer(dialogue, return_tensors='pt')
    output = tokenizer.decode(model.generate(inputs["input_ids"], max_new_tokens=50)[0],skip_special_tokens=True)
    
    print(dash_line)
    print('Example ', i + 1)
    print(dash_line)
    print(f'INPUT PROMPT:\n{dialogue}')
    print(dash_line)
    print(f'BASELINE HUMAN SUMMARY:\n{summary}')
    print(dash_line)
    print(f'MODEL GENERATION - WITHOUT PROMPT ENGINEERING:\n{output}\n')

---------------------------------------------------------------------------------------------------
Example  1
---------------------------------------------------------------------------------------------------
INPUT PROMPT:
#Person1#: What time is it, Tom?
#Person2#: Just a minute. It's ten to nine by my watch.
#Person1#: Is it? I had no idea it was so late. I must be off now.
#Person2#: What's the hurry?
#Person1#: I must catch the nine-thirty train.
#Person2#: You've plenty of time yet. The railway station is very close. It won't take more than twenty minutes to get there.
---------------------------------------------------------------------------------------------------
BASELINE HUMAN SUMMARY:
#Person1# is in a hurry to catch a train. Tom tells #Person1# there is plenty of time.
---------------------------------------------------------------------------------------------------
MODEL GENERATION - WITHOUT PROMPT ENGINEERING:
Person1: It's ten to nine.

-------------------------------

We can see that the model's output makes some sense, but the model doesn't seem sure what task it is supposed to accomplish. It just makes up the next sentence in the dialogue.

## Summarizing Dialogue with an Instruction Prompt
Prompt engineering is the practice of changing the prompt (input) to improve the response for a given task. It is an important concept when using foundation models for text generation. <br>
**Amazon Science :-** https://www.amazon.science/blog/emnlp-prompt-engineering-is-the-new-feature-engineering

### 1. Zero Shot Inference with an Instruction Prompt
To instruct the model, we can wrap the dialogue in a descriptive instruction, turning it into an instruction prompt. Because the prompt contains no worked examples of the task, this is called **zero shot inference**. With the instruction in place, the generated text should be closer to a summary.
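The wrapping step can be isolated into a small helper, which makes it easier to experiment with different instructions. This is a minimal sketch; the function name `build_zero_shot_prompt` and the shortened sample dialogue are illustrative assumptions, not part of the original notebook.

```python
def build_zero_shot_prompt(dialogue, instruction="Summarize the following conversation."):
    """Wrap a dialogue in an instruction and a 'Summary:' cue, with no worked examples."""
    return f"{instruction}\n\n{dialogue.strip()}\n\nSummary:"


# A shortened dialogue used only for illustration.
sample_dialogue = (
    "#Person1#: What time is it, Tom?\n"
    "#Person2#: Just a minute. It's ten to nine by my watch."
)

print(build_zero_shot_prompt(sample_dialogue))
```

Changing the `instruction` argument is then enough to try a different wording while the rest of the prompt stays fixed.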

In [10]:
for i, index in enumerate(example_indices):
    dialogue = dataset['test'][index]['dialogue']
    summary = dataset['test'][index]['summary']

    prompt = f"""
Summarize the following conversation.

{dialogue}

Summary:
    """

    # Input constructed prompt instead of the dialogue.
    inputs = tokenizer(prompt, return_tensors='pt')
    output = tokenizer.decode(model.generate(inputs["input_ids"], max_new_tokens=50)[0],skip_special_tokens=True)
    
    print(dash_line)
    print('Example ', i + 1)
    print(dash_line)
    print(f'INPUT PROMPT:\n{prompt}')
    print(dash_line)
    print(f'BASELINE HUMAN SUMMARY:\n{summary}')
    print(dash_line)    
    print(f'MODEL GENERATION - ZERO SHOT:\n{output}\n')

---------------------------------------------------------------------------------------------------
Example  1
---------------------------------------------------------------------------------------------------
INPUT PROMPT:

Summarize the following conversation.

#Person1#: What time is it, Tom?
#Person2#: Just a minute. It's ten to nine by my watch.
#Person1#: Is it? I had no idea it was so late. I must be off now.
#Person2#: What's the hurry?
#Person1#: I must catch the nine-thirty train.
#Person2#: You've plenty of time yet. The railway station is very close. It won't take more than twenty minutes to get there.

Summary:
    
---------------------------------------------------------------------------------------------------
BASELINE HUMAN SUMMARY:
#Person1# is in a hurry to catch a train. Tom tells #Person1# there is plenty of time.
---------------------------------------------------------------------------------------------------
MODEL GENERATION - ZERO SHOT:
The train is about to

We can keep experimenting with the prompt to see the changes.

### 2. Zero Shot Inference with the Prompt Template from FLAN-T5
FLAN-T5 has many published prompt templates for specific tasks. Here we will use one of the pre-built FLAN-T5 prompts. For more prompts :- https://github.com/google-research/FLAN/tree/main/flan/v2 

In [11]:
for i, index in enumerate(example_indices):
    dialogue = dataset['test'][index]['dialogue']
    summary = dataset['test'][index]['summary']
        
    prompt = f"""
Dialogue:

{dialogue}

What was going on?
"""

    inputs = tokenizer(prompt, return_tensors='pt')
    output = tokenizer.decode(model.generate(inputs["input_ids"], max_new_tokens=50)[0],skip_special_tokens=True)

    print(dash_line)
    print('Example ', i + 1)
    print(dash_line)
    print(f'INPUT PROMPT:\n{prompt}')
    print(dash_line)
    print(f'BASELINE HUMAN SUMMARY:\n{summary}\n')
    print(dash_line)
    print(f'MODEL GENERATION - ZERO SHOT:\n{output}\n')

---------------------------------------------------------------------------------------------------
Example  1
---------------------------------------------------------------------------------------------------
INPUT PROMPT:

Dialogue:

#Person1#: What time is it, Tom?
#Person2#: Just a minute. It's ten to nine by my watch.
#Person1#: Is it? I had no idea it was so late. I must be off now.
#Person2#: What's the hurry?
#Person1#: I must catch the nine-thirty train.
#Person2#: You've plenty of time yet. The railway station is very close. It won't take more than twenty minutes to get there.

What was going on?

---------------------------------------------------------------------------------------------------
BASELINE HUMAN SUMMARY:
#Person1# is in a hurry to catch a train. Tom tells #Person1# there is plenty of time.

---------------------------------------------------------------------------------------------------
MODEL GENERATION - ZERO SHOT:
Tom is late for the train.

--------------

Notice that this prompt from FLAN-T5 helped a bit, but the model still struggles to pick up on the nuance of the conversation. This is what we will try to solve with few shot inference.

## Summarize Dialogue with One Shot and Few Shot Inference
**One shot** and **few shot inference** are the practices of providing an LLM with one or more full examples of prompt-response pairs that match our task, placed before the actual prompt we want completed. This is called **`in-context learning`**: the examples in the prompt show the model what our specific task looks like.

### 1. One Shot Inference
`make_prompt` is a function that takes a list of indices `example_indices_full`, generates a prompt with those full examples, then appends at the end the dialogue we want the model to summarize, `example_index_to_summarize`.

In [12]:
def make_prompt(example_indices_full, example_index_to_summarize):
    prompt = ''
    for index in example_indices_full:
        dialogue = dataset['test'][index]['dialogue']
        summary = dataset['test'][index]['summary']
        
        # The stop sequence '{summary}\n\n\n' is important for FLAN-T5. Other models may have their own preferred stop sequence.
        prompt += f"""
Dialogue:

{dialogue}

What was going on?
{summary}


"""
    
    dialogue = dataset['test'][example_index_to_summarize]['dialogue']
    
    prompt += f"""
Dialogue:

{dialogue}

What was going on?
"""
        
    return prompt

In [13]:
example_indices_full = [40]
example_index_to_summarize = 200

one_shot_prompt = make_prompt(example_indices_full, example_index_to_summarize)

print(one_shot_prompt)


Dialogue:

#Person1#: What time is it, Tom?
#Person2#: Just a minute. It's ten to nine by my watch.
#Person1#: Is it? I had no idea it was so late. I must be off now.
#Person2#: What's the hurry?
#Person1#: I must catch the nine-thirty train.
#Person2#: You've plenty of time yet. The railway station is very close. It won't take more than twenty minutes to get there.

What was going on?
#Person1# is in a hurry to catch a train. Tom tells #Person1# there is plenty of time.



Dialogue:

#Person1#: Have you considered upgrading your system?
#Person2#: Yes, but I'm not sure what exactly I would need.
#Person1#: You could consider adding a painting program to your software. It would allow you to make up your own flyers and banners for advertising.
#Person2#: That would be a definite bonus.
#Person1#: You might also want to upgrade your hardware because it is pretty outdated now.
#Person2#: How can we do that?
#Person1#: You'd probably need a faster processor, to begin with. And you also ne

Now we will pass this prompt to perform one shot inference.

In [14]:
summary = dataset['test'][example_index_to_summarize]['summary']

inputs = tokenizer(one_shot_prompt, return_tensors='pt')
output = tokenizer.decode(model.generate(inputs["input_ids"], max_new_tokens=50)[0],skip_special_tokens=True)

print(dash_line)
print(f'BASELINE HUMAN SUMMARY:\n{summary}\n')
print(dash_line)
print(f'MODEL GENERATION - ONE SHOT:\n{output}')

---------------------------------------------------------------------------------------------------
BASELINE HUMAN SUMMARY:
#Person1# teaches #Person2# how to upgrade software and hardware in #Person2#'s system.

---------------------------------------------------------------------------------------------------
MODEL GENERATION - ONE SHOT:
#Person1 wants to upgrade his system. #Person2 wants to add a painting program to his software. #Person1 wants to add a CD-ROM drive.


### 2. Few Shot Inference
Let's apply few shot inference by adding two more full dialogue-summary pairs to our prompt.

In [15]:
example_indices_full = [40, 80, 120]
example_index_to_summarize = 200

few_shot_prompt = make_prompt(example_indices_full, example_index_to_summarize)

print(few_shot_prompt)


Dialogue:

#Person1#: What time is it, Tom?
#Person2#: Just a minute. It's ten to nine by my watch.
#Person1#: Is it? I had no idea it was so late. I must be off now.
#Person2#: What's the hurry?
#Person1#: I must catch the nine-thirty train.
#Person2#: You've plenty of time yet. The railway station is very close. It won't take more than twenty minutes to get there.

What was going on?
#Person1# is in a hurry to catch a train. Tom tells #Person1# there is plenty of time.



Dialogue:

#Person1#: May, do you mind helping me prepare for the picnic?
#Person2#: Sure. Have you checked the weather report?
#Person1#: Yes. It says it will be sunny all day. No sign of rain at all. This is your father's favorite sausage. Sandwiches for you and Daniel.
#Person2#: No, thanks Mom. I'd like some toast and chicken wings.
#Person1#: Okay. Please take some fruit salad and crackers for me.
#Person2#: Done. Oh, don't forget to take napkins disposable plates, cups and picnic blanket.
#Person1#: All set. 

In [16]:
summary = dataset['test'][example_index_to_summarize]['summary']

inputs = tokenizer(few_shot_prompt, return_tensors='pt')
output = tokenizer.decode(model.generate(inputs["input_ids"], max_new_tokens=50)[0],skip_special_tokens=True)

print(dash_line)
print(f'BASELINE HUMAN SUMMARY:\n{summary}\n')
print(dash_line)
print(f'MODEL GENERATION - FEW SHOT:\n{output}')

Token indices sequence length is longer than the specified maximum sequence length for this model (819 > 512). Running this sequence through the model will result in indexing errors


---------------------------------------------------------------------------------------------------
BASELINE HUMAN SUMMARY:
#Person1# teaches #Person2# how to upgrade software and hardware in #Person2#'s system.

---------------------------------------------------------------------------------------------------
MODEL GENERATION - FEW SHOT:
#Person1 wants to upgrade his system. #Person2 wants to add a painting program to his software. #Person1 wants to upgrade his hardware.


Here few shot inference did not provide much of an improvement over one shot inference, and anything above 5 or 6 shots will typically not help much either. We also need to make sure that we do not exceed the model's input-context length, which in our case is 512 tokens; the tokenizer warning above shows that the few shot prompt is 819 tokens long. Anything beyond the context length may be ignored or handled unreliably.<br>
However, we can see that feeding in at least one full example (one shot) provides the model with more information and qualitatively improves the summary overall.
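One way to respect the context length is to drop examples until the prompt fits. The sketch below assumes the same prompt layout as `make_prompt`, but takes explicit (dialogue, summary) pairs instead of dataset indices; the helper name `fit_examples_to_budget` is hypothetical, and the whitespace-based `count_tokens` is only a crude stand-in for a real tokenizer.

```python
def fit_examples_to_budget(examples, query_dialogue, count_tokens, max_tokens=512):
    """Drop leading examples until the few shot prompt fits within max_tokens.

    examples: list of (dialogue, summary) pairs.
    count_tokens: callable returning the token count of a string.
    Returns (prompt, number of examples kept).
    """
    example_template = "Dialogue:\n\n{dialogue}\n\nWhat was going on?\n{summary}\n\n\n"
    query_part = f"Dialogue:\n\n{query_dialogue}\n\nWhat was going on?\n"
    kept = list(examples)
    while True:
        shots = "".join(example_template.format(dialogue=d, summary=s) for d, s in kept)
        prompt = shots + query_part
        # Stop when the prompt fits, or when there are no examples left to drop.
        if count_tokens(prompt) <= max_tokens or not kept:
            return prompt, len(kept)
        kept.pop(0)


# Crude stand-in for a tokenizer: counts whitespace-separated words (an assumption).
def count_words(text):
    return len(text.split())


toy_examples = [("a b c", "s1"), ("d e f", "s2")]
prompt, n_kept = fit_examples_to_budget(toy_examples, "q", count_words, max_tokens=20)
print(n_kept)
print(prompt)
```

In the notebook itself, `count_tokens` could be `lambda text: len(tokenizer(text)["input_ids"])`, so that the budget is measured in the model's own tokens.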

We can change the configuration parameters of the `generate()` method to see different output from the LLM. A convenient way of organizing these parameters is the `GenerationConfig` class.
**Available parameters :-** https://huggingface.co/docs/transformers/v4.29.1/en/main_classes/text_generation#transformers.GenerationConfig


In [17]:
generation_config = GenerationConfig(max_new_tokens=50)
# generation_config = GenerationConfig(max_new_tokens=10)
# generation_config = GenerationConfig(max_new_tokens=50, do_sample=True, temperature=0.1)
# generation_config = GenerationConfig(max_new_tokens=50, do_sample=True, temperature=0.5)
# generation_config = GenerationConfig(max_new_tokens=50, do_sample=True, temperature=1.0)

inputs = tokenizer(few_shot_prompt, return_tensors='pt')
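# Pass the config object so that the alternatives above take effect when uncommented.
output = tokenizer.decode(model.generate(inputs["input_ids"], generation_config=generation_config)[0], skip_special_tokens=True)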

print(dash_line)
print(f'MODEL GENERATION - FEW SHOT:\n{output}')
print(dash_line)
print(f'BASELINE HUMAN SUMMARY:\n{summary}\n')

---------------------------------------------------------------------------------------------------
MODEL GENERATION - FEW SHOT:
#Person1 wants to upgrade his system. #Person2 wants to add a painting program to his software. #Person1 wants to upgrade his hardware.
---------------------------------------------------------------------------------------------------
BASELINE HUMAN SUMMARY:
#Person1# teaches #Person2# how to upgrade software and hardware in #Person2#'s system.



Choosing `max_new_tokens=10` makes the output too short, so the dialogue summary is cut off.<br>
Setting `do_sample=True` and raising the temperature makes the output more varied; a low temperature such as 0.1 keeps it close to the most likely tokens.
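
To see why temperature changes the variety of sampled output, the sketch below applies temperature scaling to a softmax over a few toy scores. It illustrates the general technique in plain Python rather than the library's internal code; the function name and the `toy_logits` values are made up for illustration.

```python
import math


def softmax_with_temperature(logits, temperature=1.0):
    """Convert raw scores to probabilities after dividing them by the temperature."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the maximum for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical scores for three candidate next tokens.
toy_logits = [2.0, 1.0, 0.5]

for t in (0.1, 0.5, 1.0):
    probs = softmax_with_temperature(toy_logits, t)
    print(t, [round(p, 3) for p in probs])
```

At a low temperature almost all probability mass sits on the highest-scoring token, so sampling behaves much like greedy decoding; at higher temperatures the mass spreads out and less likely tokens are sampled more often.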