# Generative AI Use Case: Summarize Dialogue

This notebook demonstrates how input text influences the output of a language model and introduces the concept of prompt engineering. 

It compares zero-shot, one-shot, and few-shot inference, showing how different prompting techniques can guide the model toward a specific task. By exploring these methods, you will see how prompt engineering can improve the output of Large Language Models.

## Outcome overview

We will use an open-source dialogue dataset, available through the Hugging Face <code>datasets</code> library, and summarize its conversations with a pre-trained LLM.

## 1. Package installation

These packages are required to use PyTorch and the Hugging Face <code>transformers</code> and <code>datasets</code> libraries.

In [18]:
%pip install --upgrade pip
%pip install torch torchdata -q
%pip install -U datasets -q
%pip install transformers -q

Note: you may need to restart the kernel to use updated packages.
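As a quick sanity check after installation, the sketch below reports which of the packages are installed and at which version. It relies only on the standard library's `importlib.metadata`; the helper name `installed_versions` is a hypothetical one chosen for this example. Note that it reports what is installed on disk, so a running kernel may still hold an older imported version until it is restarted.

```python
from importlib.metadata import PackageNotFoundError, version

# Packages installed by the cell above.
REQUIRED_PACKAGES = ["torch", "torchdata", "datasets", "transformers"]


def installed_versions(packages):
    """Return {package: version string, or None if the package is not installed}."""
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


for name, ver in installed_versions(REQUIRED_PACKAGES).items():
    print(f"{name}: {ver if ver is not None else 'NOT INSTALLED'}")
```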


Here, we import the dataset loader, the model class, the tokenizer class, and the generation configuration class.

In [1]:
from datasets import load_dataset
from transformers import AutoModelForSeq2SeqLM
from transformers import AutoTokenizer
from transformers import GenerationConfig



## 2. Summarize Dialogue without Prompt Engineering

In this section, we will generate a summary of a dialogue with the pre-trained LLM FLAN-T5 from Hugging Face. The list of models available in the Hugging Face <code>transformers</code> package can be found [here](https://huggingface.co/docs/transformers/en/index).

We will work with sample dialogues from the "DialogSum" Hugging Face dataset. This dataset contains 10,000+ dialogues with corresponding manually labelled summaries and topics.

In [2]:
dataset_name = "knkarthick/dialogsum"

dataset = load_dataset(dataset_name)

Explore the dataset by printing some dialogues with their baseline summaries.

In [3]:
example_indices = [40, 200]

dash_line = '-' * 99  # 99-character separator line

for i, index in enumerate(example_indices):
    print(dash_line)
    print('Example', i+1)
    print(dash_line)
    print('INPUT DIALOGUE:')
    print(dataset['test'][index]['dialogue'])
    print(dash_line)
    print('BASELINE HUMAN SUMMARY:')
    print(dataset['test'][index]['summary'])
    print(dash_line)
    print()

---------------------------------------------------------------------------------------------------
Example 1
---------------------------------------------------------------------------------------------------
INPUT DIALOGUE:
#Person1#: What time is it, Tom?
#Person2#: Just a minute. It's ten to nine by my watch.
#Person1#: Is it? I had no idea it was so late. I must be off now.
#Person2#: What's the hurry?
#Person1#: I must catch the nine-thirty train.
#Person2#: You've plenty of time yet. The railway station is very close. It won't take more than twenty minutes to get there.
---------------------------------------------------------------------------------------------------
BASELINE HUMAN SUMMARY:
#Person1# is in a hurry to catch a train. Tom tells #Person1# there is plenty of time.
---------------------------------------------------------------------------------------------------


Load the FLAN-T5 model, creating an instance of the <code>AutoModelForSeq2SeqLM</code> class with the <code>.from_pretrained()</code> method.

In [4]:
model_name = 'google/flan-t5-base'

model = AutoModelForSeq2SeqLM.from_pretrained(model_name)



To perform encoding and decoding, you need to work with text in tokenized form. Tokenization splits text into smaller units (tokens) and maps each one to an integer ID. The model then looks up a vector (embedding) for each ID and processes those vectors.
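To make the text-to-IDs round trip concrete, here is a minimal toy sketch in plain Python. The vocabulary, the ID values, and the `encode`/`decode` helpers are all hypothetical and exist only for illustration; the real FLAN-T5 tokenizer uses a much larger, learned subword vocabulary rather than a word-level lookup.

```python
import re

# Hypothetical toy vocabulary, invented for this example only.
EOS_TOKEN = "</s>"
VOCAB = {EOS_TOKEN: 1, "<unk>": 2, "what": 3, "time": 4, "is": 5,
         "it": 6, ",": 7, "tom": 8, "?": 9}
ID_TO_TOKEN = {token_id: token for token, token_id in VOCAB.items()}


def encode(text):
    """Split text into words and punctuation, map each to an ID, append end-of-sequence."""
    tokens = re.findall(r"\w+|[^\w\s]", text.lower())
    ids = [VOCAB.get(token, VOCAB["<unk>"]) for token in tokens]
    return ids + [VOCAB[EOS_TOKEN]]


def decode(ids, skip_special_tokens=True):
    """Map IDs back to tokens, optionally dropping the end-of-sequence marker."""
    tokens = [ID_TO_TOKEN[token_id] for token_id in ids]
    if skip_special_tokens:
        tokens = [token for token in tokens if token != EOS_TOKEN]
    return " ".join(tokens)


ids = encode("What time is it, Tom?")
print(ids)          # [3, 4, 5, 6, 7, 8, 9, 1]
print(decode(ids))  # what time is it , tom ?
```

The toy decoder joins tokens with spaces, so it does not restore the original spacing or capitalization; a real tokenizer tracks enough information to reconstruct the text more faithfully.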

Download the tokenizer for the FLAN-T5 model using the <code>AutoTokenizer.from_pretrained()</code> method. The <code>use_fast</code> parameter enables the fast tokenizer implementation.

In [None]:
tokenizer = AutoTokenizer.from_pretrained(model_name, use_fast=True)


Test the tokenizer's encoding and decoding with a simple sentence. The tokenizer converts raw text into integer IDs; each ID indexes a vector (embedding) that the model uses in its computations, both at inference time and during training with backpropagation.

In [6]:
sentence = "what time is it, Tom?"

sentence_encoded = tokenizer(sentence, return_tensors='pt')

sentence_decoded = tokenizer.decode(
    sentence_encoded['input_ids'][0], 
    skip_special_tokens=True
    )

print('ENCODED SENTENCE:')
print(sentence_encoded['input_ids'][0])
print('\nDECODED SENTENCE:')
print(sentence_decoded)

ENCODED SENTENCE:
tensor([ 125,   97,   19,   34,    6, 3059,   58,    1])

DECODED SENTENCE:
what time is it, Tom?
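The idea that token IDs "point to" embedding vectors can be sketched as a plain table lookup. The sizes, the random values, and the `embed` helper below are assumptions made for illustration; in a real model the table is a learned weight matrix (in PyTorch, typically an embedding layer) with far more rows and columns.

```python
import random

VOCAB_SIZE = 10    # hypothetical; real vocabularies have tens of thousands of entries
EMBEDDING_DIM = 4  # hypothetical; real models use hundreds of dimensions

# Stand-in for a learned embedding matrix: one row of floats per token ID.
rng = random.Random(0)
embedding_table = [
    [rng.uniform(-1.0, 1.0) for _ in range(EMBEDDING_DIM)]
    for _ in range(VOCAB_SIZE)
]


def embed(token_ids):
    """Look up one embedding row per token ID."""
    return [embedding_table[token_id] for token_id in token_ids]


token_ids = [3, 4, 5, 1]
vectors = embed(token_ids)
print(len(vectors), len(vectors[0]))  # prints: 4 4
```

Each ID always selects the same row, so identical tokens start from identical vectors before the model mixes in context.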


Now, we will let the base LLM summarize a dialogue without any prompt engineering. <b>Prompt engineering</b> is the practice of changing the <b>prompt</b> (input) to improve the response for a given LLM task.

In [11]:
for i, index in enumerate(example_indices):
    dialogue = dataset['test'][index]['dialogue']
    summary = dataset['test'][index]['summary']

    inputs = tokenizer(dialogue, return_tensors='pt')
    model_output = model.generate(
            inputs['input_ids'],
            max_new_tokens=50
        )
    no_pe_output = tokenizer.decode(model_output[0], skip_special_tokens=True)

    print(dash_line)
    print('Example', i+1)
    print(dash_line)
    print(f'INPUT PROMPT:\n{dialogue}')
    print(dash_line)
    print(f'BASELINE HUMAN SUMMARY:\n{summary}')
    print(dash_line)
    print(f'MODEL GENERATION - WITHOUT PROMPT ENGINEERING:\n{no_pe_output}\n')

---------------------------------------------------------------------------------------------------
Example 1
---------------------------------------------------------------------------------------------------
INPUT PROMPT:
#Person1#: What time is it, Tom?
#Person2#: Just a minute. It's ten to nine by my watch.
#Person1#: Is it? I had no idea it was so late. I must be off now.
#Person2#: What's the hurry?
#Person1#: I must catch the nine-thirty train.
#Person2#: You've plenty of time yet. The railway station is very close. It won't take more than twenty minutes to get there.
---------------------------------------------------------------------------------------------------
BASELINE HUMAN SUMMARY:
#Person1# is in a hurry to catch a train. Tom tells #Person1# there is plenty of time.
---------------------------------------------------------------------------------------------------
MODEL GENERATION - WITHOUT PROMPT ENGINEERING:
Person1: It's ten to nine.


## 3. Summarize Dialogue with an Instruction Prompt

Prompt engineering is an important concept in using foundation models for text generation.

### 3.1 Zero Shot Inference with an Instruction Prompt
To instruct the model to perform a task, here summarizing a dialogue, you can wrap the dialogue in an instruction prompt. This is often called zero-shot inference, because the prompt contains no solved examples. Refer to [this blog post](https://aws.amazon.com/blogs/machine-learning/zero-shot-prompting-for-the-flan-t5-foundation-model-in-amazon-sagemaker-jumpstart/) for more information on zero-shot learning and why it is important for LLMs.

In the next cell, we will wrap the dialogue in a descriptive instruction and see how the generated text will change. 

In [26]:
for i, index in enumerate(example_indices):
    dialogue = dataset['test'][index]['dialogue']
    summary = dataset['test'][index]['summary']

    # Develop the prompt with an instruction
    prompt = f'''
Summarize the following conversation. 

{dialogue}
'''
    # Without prompt engineering
    no_pe_inputs = tokenizer(dialogue, return_tensors='pt')
    no_pe_model_output = model.generate(no_pe_inputs['input_ids'], max_new_tokens=50)
    no_pe_output = tokenizer.decode(no_pe_model_output[0], skip_special_tokens=True)
    
    # Use the constructed prompt as input instead of the raw dialogue
    zero_shot_inputs = tokenizer(prompt, return_tensors='pt')
    zero_shot_model_output = model.generate(zero_shot_inputs['input_ids'], max_new_tokens=50)
    zero_shot_output = tokenizer.decode(zero_shot_model_output[0], skip_special_tokens=True)

    print(dash_line)
    print('Example', i+1)
    print(dash_line)
    print(f'INPUT PROMPT:\n{dialogue}')
    print(dash_line)
    print(f'BASELINE HUMAN SUMMARY:\n{summary}')
    print(dash_line)
    print(f'MODEL GENERATION - WITHOUT PROMPT ENGINEERING:\n{no_pe_output}')
    print(dash_line)
    print(f'MODEL GENERATION - ZERO SHOT:\n{zero_shot_output}\n')

---------------------------------------------------------------------------------------------------
Example 1
---------------------------------------------------------------------------------------------------
INPUT PROMPT:
#Person1#: What time is it, Tom?
#Person2#: Just a minute. It's ten to nine by my watch.
#Person1#: Is it? I had no idea it was so late. I must be off now.
#Person2#: What's the hurry?
#Person1#: I must catch the nine-thirty train.
#Person2#: You've plenty of time yet. The railway station is very close. It won't take more than twenty minutes to get there.
---------------------------------------------------------------------------------------------------
BASELINE HUMAN SUMMARY:
#Person1# is in a hurry to catch a train. Tom tells #Person1# there is plenty of time.
---------------------------------------------------------------------------------------------------
MODEL GENERATION - WITHOUT PROMPT ENGINEERING:
Person1: It's ten to nine.

While the response for the first dialogue has improved, it still does not pick up on the nuance of the conversation. The next cell rephrases the prompt to see how this influences the generated output.
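When trying several phrasings, it can help to keep the prompt wordings in one place and render them all for the same dialogue. The sketch below is one possible way to do that; the template names, the `PROMPT_TEMPLATES` dictionary, and the `render_prompts` helper are hypothetical, and the two templates are modeled on the prompt wordings used in this notebook.

```python
# Hypothetical template registry; each template has a single {dialogue} slot.
PROMPT_TEMPLATES = {
    "instruction_first": "Summarize the following conversation.\n\n{dialogue}\n",
    "question_last": "Dialogue:\n\n{dialogue}\n\nWhat was going on?\n",
}


def render_prompts(dialogue, templates=PROMPT_TEMPLATES):
    """Fill every template with the same dialogue and return {template name: prompt}."""
    return {
        name: template.format(dialogue=dialogue)
        for name, template in templates.items()
    }


sample_dialogue = (
    "#Person1#: What time is it, Tom?\n"
    "#Person2#: Just a minute. It's ten to nine by my watch."
)

for name, prompt in render_prompts(sample_dialogue).items():
    print(f"[{name}]")
    print(prompt)
```

Each rendered prompt can then be tokenized and passed to `model.generate()` in the same way as the other prompts in this notebook, which makes it easy to add a third or fourth wording later.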

In [28]:
for i, index in enumerate(example_indices):
    dialogue = dataset['test'][index]['dialogue']
    summary = dataset['test'][index]['summary']

    # Develop the prompt with an instruction
    prompt_1 = f'''
Summarize the following conversation. 

{dialogue}

'''
    
    prompt_2 = f'''
Dialogue: 

{dialogue}

What was going on?
'''
    # Without prompt engineering
    no_pe_inputs = tokenizer(dialogue, return_tensors='pt')
    no_pe_model_output = model.generate(no_pe_inputs['input_ids'], max_new_tokens=50)
    no_pe_output = tokenizer.decode(no_pe_model_output[0], skip_special_tokens=True)
    
    # Use the constructed prompt as input instead of the raw dialogue
    zero_shot_inputs = tokenizer(prompt_1, return_tensors='pt')
    zero_shot_model_output = model.generate(zero_shot_inputs['input_ids'], max_new_tokens=50)
    zero_shot_output = tokenizer.decode(zero_shot_model_output[0], skip_special_tokens=True)

    # Updated prompt
    zero_shot_inputs_2 = tokenizer(prompt_2, return_tensors='pt')
    zero_shot_model_output_2 = model.generate(zero_shot_inputs_2['input_ids'], max_new_tokens=50)
    zero_shot_output_2 = tokenizer.decode(zero_shot_model_output_2[0], skip_special_tokens=True)

    print(dash_line)
    print('Example', i+1)
    print(dash_line)
    print(f'INPUT PROMPT:\n{dialogue}')
    print(dash_line)
    print(f'BASELINE HUMAN SUMMARY:\n{summary}')
    print(dash_line)
    print(f'MODEL GENERATION - WITHOUT PROMPT ENGINEERING:\n{no_pe_output}')
    print(dash_line)
    print(f'MODEL GENERATION - ZERO SHOT:\n{zero_shot_output}')
    print(dash_line)
    print(f'MODEL GENERATION - ZERO SHOT with Prompt Engineering:\n{zero_shot_output_2}\n')

---------------------------------------------------------------------------------------------------
Example 1
---------------------------------------------------------------------------------------------------
INPUT PROMPT:
#Person1#: What time is it, Tom?
#Person2#: Just a minute. It's ten to nine by my watch.
#Person1#: Is it? I had no idea it was so late. I must be off now.
#Person2#: What's the hurry?
#Person1#: I must catch the nine-thirty train.
#Person2#: You've plenty of time yet. The railway station is very close. It won't take more than twenty minutes to get there.
---------------------------------------------------------------------------------------------------
BASELINE HUMAN SUMMARY:
#Person1# is in a hurry to catch a train. Tom tells #Person1# there is plenty of time.
---------------------------------------------------------------------------------------------------
MODEL GENERATION - WITHOUT PROMPT ENGINEERING:
Person1: It's ten to nine.

The output has improved, but it still misses some nuance. We will try to address this with few-shot inference.
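As a preview of the idea, a one-shot or few-shot prompt places one or more solved examples (a dialogue followed by its summary) in front of the dialogue to be summarized. The sketch below is a minimal illustration under that assumption; the `build_few_shot_prompt` helper and the second dialogue are hypothetical, and the layout reuses the "Dialogue: ... What was going on?" wording from the zero-shot prompt above.

```python
def build_few_shot_prompt(examples, dialogue_to_summarize):
    """Build a prompt from solved examples plus one unsolved dialogue.

    examples: list of (dialogue, summary) pairs used as demonstrations.
    One pair gives a one-shot prompt; several pairs give a few-shot prompt.
    """
    parts = []
    for example_dialogue, example_summary in examples:
        parts.append(
            f"Dialogue:\n\n{example_dialogue}\n\nWhat was going on?\n{example_summary}\n"
        )
    # The final block has no summary: the model is expected to complete it.
    parts.append(f"Dialogue:\n\n{dialogue_to_summarize}\n\nWhat was going on?\n")
    return "\n".join(parts)


demonstration = (
    "#Person1#: What time is it, Tom?\n"
    "#Person2#: Just a minute. It's ten to nine by my watch.",
    "#Person1# is in a hurry to catch a train. Tom tells #Person1# there is plenty of time.",
)

# Hypothetical dialogue, invented for this example.
new_dialogue = (
    "#Person1#: Is the library open today?\n"
    "#Person2#: Yes, until six."
)

print(build_few_shot_prompt([demonstration], new_dialogue))
```

Each added example lengthens the prompt, and models have a limited input length, so the number of demonstrations is a trade-off rather than something to maximize.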