In [1]:
# !pip install --upgrade pip

In [2]:
# !pip install --disable-pip-version-check torch torchdata --quiet

In [3]:
# !pip install transformers datasets

In [4]:
from datasets import load_dataset
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM, GenerationConfig

#### Summarize Dialogue without Prompt Engineering

Here, we generate a summary of a dialogue with FLAN-T5, a pre-trained LLM from Hugging Face.
Let's load some simple dialogues from the `dialogsum` Hugging Face dataset. This dataset contains 10,000+ dialogues with
corresponding manually labeled summaries and topics.

In [5]:
huggingface_dataset_name = 'knkarthick/dialogsum'
dataset = load_dataset(huggingface_dataset_name)

In [6]:
# print a couple of dialogues with their baseline summaries
example_indices = [40,200]
dash_line = '-' * 99  # separator line of 99 dashes
for i,index in enumerate(example_indices):
    print(dash_line)
    print('Example',i+1)
    print(dash_line)
    print('INPUT DIALOGUE:')
    print(dataset['test'][index]['dialogue'])
    print(dash_line)
    print('BASELINE HUMAN SUMMARY:')
    print(dataset['test'][index]['summary'])
    print(dash_line)
    print()
    

---------------------------------------------------------------------------------------------------
Example 1
---------------------------------------------------------------------------------------------------
INPUT DIALOGUE:
#Person1#: What time is it, Tom?
#Person2#: Just a minute. It's ten to nine by my watch.
#Person1#: Is it? I had no idea it was so late. I must be off now.
#Person2#: What's the hurry?
#Person1#: I must catch the nine-thirty train.
#Person2#: You've plenty of time yet. The railway station is very close. It won't take more than twenty minutes to get there.
---------------------------------------------------------------------------------------------------
BASELINE HUMAN SUMMARY:
#Person1# is in a hurry to catch a train. Tom tells #Person1# there is plenty of time.
---------------------------------------------------------------------------------------------------

---------------------------------------------------------------------------------------------------
Exam

Now, we will generate a summary with our model.

Load the `Flan-T5` model by creating an instance of the `AutoModelForSeq2SeqLM` class with the `.from_pretrained()` method.

In [7]:
model_name = 'google/flan-t5-base'
model = AutoModelForSeq2SeqLM.from_pretrained(model_name)




To perform encoding and decoding, you need the text in tokenized form. Tokenization is the process of splitting text
into smaller units (tokens) that an LLM can process.
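As a rough illustration of the encode/decode round trip described above, here is a toy word-level tokenizer. The `ToyTokenizer` class and its vocabulary are hypothetical and exist only for this sketch; FLAN-T5 itself uses a subword (SentencePiece) tokenizer, so its token ids and splits will differ.

```python
import re


class ToyTokenizer:
    """Word-level tokenizer for illustration only (not the FLAN-T5 tokenizer)."""

    def __init__(self, corpus):
        # Reserve id 0 for unknown tokens and id 1 for end-of-sequence.
        self.token_to_id = {"<unk>": 0, "</s>": 1}
        for token in self._split(corpus):
            self.token_to_id.setdefault(token, len(self.token_to_id))
        self.id_to_token = {i: t for t, i in self.token_to_id.items()}

    @staticmethod
    def _split(text):
        # Words and individual punctuation marks become separate tokens.
        return re.findall(r"\w+|[^\w\s]", text)

    def encode(self, text):
        # Map each token to its id (0 if unseen) and append the end-of-sequence id.
        ids = [self.token_to_id.get(t, 0) for t in self._split(text)]
        return ids + [1]

    def decode(self, ids, skip_special_tokens=True):
        tokens = [self.id_to_token[i] for i in ids]
        if skip_special_tokens:
            tokens = [t for t in tokens if t not in ("<unk>", "</s>")]
        return " ".join(tokens)


toy = ToyTokenizer("What time is it, Tom?")
ids = toy.encode("What time is it, Tom?")
print(ids)              # [2, 3, 4, 5, 6, 7, 8, 1]
print(toy.decode(ids))  # What time is it , Tom ?
```

Note that the toy decoder does not restore the original spacing around punctuation; real tokenizers keep enough information to do so.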

Download the tokenizer for the `Flan-T5` model using the `AutoTokenizer.from_pretrained()` method. The `use_fast` parameter
switches on the fast tokenizer.

In [8]:
tokenizer = AutoTokenizer.from_pretrained(model_name,use_fast=True)

These classes all come from the Hugging Face `transformers` library.

#### Test the tokenizer by encoding and decoding a sentence

In [9]:
sentence = 'What time is it, Tom?'
sentence_encoded = tokenizer(sentence,return_tensors='pt')

sentence_decoded = tokenizer.decode(sentence_encoded['input_ids'][0],skip_special_tokens=True)
print("Encoded Sentence: ")
print(sentence_encoded['input_ids'][0])
print('\n Decoded Sentence: ')
print(sentence_decoded)

Encoded Sentence: 
tensor([ 363,   97,   19,   34,    6, 3059,   58,    1])

 Decoded Sentence: 
What time is it, Tom?


Now it's time to explore how well the base LLM summarizes a dialogue without any prompt engineering. Prompt
engineering is the practice of revising the prompt to improve the model's response for a given task.

In [10]:
for i,index in enumerate(example_indices):
    dialogue = dataset['test'][index]['dialogue']
    summary =  dataset['test'][index]['summary']
    inputs = tokenizer(dialogue,return_tensors = 'pt')
    output = tokenizer.decode(model.generate(inputs['input_ids'], max_new_tokens=50,)[0], skip_special_tokens=True)
    print(dash_line)
    print('Example',i+1)
    print(dash_line)
    print(f'INPUT PROMPT:\n{dialogue}')
    print(dash_line)
    print(f'BASELINE HUMAN SUMMARY:\n{summary}')
    print(dash_line)
    print(f'MODEL GENERATION - WITHOUT PROMPT ENGINEERING:\n {output}\n')

---------------------------------------------------------------------------------------------------
Example 1
---------------------------------------------------------------------------------------------------
INPUT PROMPT:
#Person1#: What time is it, Tom?
#Person2#: Just a minute. It's ten to nine by my watch.
#Person1#: Is it? I had no idea it was so late. I must be off now.
#Person2#: What's the hurry?
#Person1#: I must catch the nine-thirty train.
#Person2#: You've plenty of time yet. The railway station is very close. It won't take more than twenty minutes to get there.
---------------------------------------------------------------------------------------------------
BASELINE HUMAN SUMMARY:
#Person1# is in a hurry to catch a train. Tom tells #Person1# there is plenty of time.
---------------------------------------------------------------------------------------------------
MODEL GENERATION - WITHOUT PROMPT ENGINEERING:
 Person1: It's ten to nine.

-------------------------------

The model does not do a very good job: it echoes a fragment of the dialogue instead of summarizing it.

#### Summarize Dialogue with an Instruction Prompt

Prompt Engineering is an important concept in using foundation models for text generation tasks.

##### Zero Shot Inference with an Instruction Prompt:

To instruct the model to perform a task (here, summarizing a dialogue), you can wrap the dialogue
in an instruction prompt. Because the prompt contains no worked examples, this is often called "zero-shot inference".
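A minimal sketch of wrapping a dialogue in an instruction is shown here. The helper name `build_zero_shot_prompt` and its default instruction text are assumptions for illustration. Building the string explicitly also avoids the leading indentation that a triple-quoted string picks up when it is written inside an indented loop.

```python
def build_zero_shot_prompt(dialogue, instruction="Summarize the following conversation."):
    """Wrap a dialogue in an instruction, with no in-context examples."""
    # Instruction first, then the dialogue, then a cue that tells the model where to answer.
    return f"{instruction}\n\n{dialogue.strip()}\n\nSummary:"


dialogue = "#Person1#: What time is it, Tom?\n#Person2#: It's ten to nine by my watch."
prompt = build_zero_shot_prompt(dialogue)
print(prompt)
```

The resulting string can be passed to the tokenizer in place of the raw dialogue.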

In [11]:
for i,index in enumerate(example_indices):
    dialogue = dataset['test'][index]['dialogue']
    summary =  dataset['test'][index]['summary']
    prompt = f"""
    
    Summarize the following conversation.
    {dialogue}
    Summary:
    """
    # Pass the constructed prompt instead of the raw dialogue.
    inputs = tokenizer(prompt,return_tensors = 'pt')
    output = tokenizer.decode(model.generate(inputs['input_ids'], max_new_tokens=50,)[0], skip_special_tokens=True)
    print(dash_line)
    print('Example',i+1)
    print(dash_line)
    print(f'INPUT PROMPT:\n{prompt}')
    print(dash_line)
    print(f'BASELINE HUMAN SUMMARY:\n{summary}')
    print(dash_line)
    print(f'MODEL GENERATION - ZERO SHOT:\n {output}\n')

---------------------------------------------------------------------------------------------------
Example 1
---------------------------------------------------------------------------------------------------
INPUT PROMPT:


    Summarize the following conversation.
    #Person1#: What time is it, Tom?
#Person2#: Just a minute. It's ten to nine by my watch.
#Person1#: Is it? I had no idea it was so late. I must be off now.
#Person2#: What's the hurry?
#Person1#: I must catch the nine-thirty train.
#Person2#: You've plenty of time yet. The railway station is very close. It won't take more than twenty minutes to get there.
    Summary:
    
---------------------------------------------------------------------------------------------------
BASELINE HUMAN SUMMARY:
#Person1# is in a hurry to catch a train. Tom tells #Person1# there is plenty of time.
---------------------------------------------------------------------------------------------------
MODEL GENERATION - ZERO SHOT:
 The train 

In [12]:
for i,index in enumerate(example_indices):
    dialogue = dataset['test'][index]['dialogue']
    summary =  dataset['test'][index]['summary']
    prompt = f"""
    
    Dialogue:
    {dialogue}
    What was going on?
    """
    # Pass the constructed prompt instead of the raw dialogue.
    inputs = tokenizer(prompt,return_tensors = 'pt')
    # do_sample=True is required for temperature and top_p to take effect;
    # without it, generation is greedy and these flags are ignored.
    generation_config = GenerationConfig(max_new_tokens=50, do_sample=True, temperature=0.7, top_p=0.9)
    output = tokenizer.decode(model.generate(inputs['input_ids'], generation_config=generation_config)[0], skip_special_tokens=True)
    print(dash_line)
    print('Example',i+1)
    print(dash_line)
    print(f'INPUT PROMPT:\n{prompt}')
    print(dash_line)
    print(f'BASELINE HUMAN SUMMARY:\n{summary}')
    print(dash_line)
    print(f'MODEL GENERATION - ZERO SHOT WITH TEMPERATURE AND TOP-P SAMPLING:\n {output}\n')


---------------------------------------------------------------------------------------------------
Example 1
---------------------------------------------------------------------------------------------------
INPUT PROMPT:


    Dialogue:
    #Person1#: What time is it, Tom?
#Person2#: Just a minute. It's ten to nine by my watch.
#Person1#: Is it? I had no idea it was so late. I must be off now.
#Person2#: What's the hurry?
#Person1#: I must catch the nine-thirty train.
#Person2#: You've plenty of time yet. The railway station is very close. It won't take more than twenty minutes to get there.
    What was going on?
    
---------------------------------------------------------------------------------------------------
BASELINE HUMAN SUMMARY:
#Person1# is in a hurry to catch a train. Tom tells #Person1# there is plenty of time.
---------------------------------------------------------------------------------------------------
MODEL GENERATION - ZERO SHOT WITH TEMPERATURE AND TOP-P SAM

#### Summarize dialogue with One Shot and Few Shot Inference

One-shot and few-shot inference provide an LLM with one or more complete prompt-response examples that match your task,
placed before the actual prompt you want completed. This is called "in-context learning": the examples condition
the model on your specific task without changing its weights.

##### One Shot Inference:

In [13]:
def make_prompt(example_indices_full, example_index_to_summarize):
    prompt = ''
    for idx in example_indices_full:
        dialogue = dataset['test'][idx]['dialogue']
        summary = dataset['test'][idx]['summary']
        # Each example ends with its summary followed by a line break, so the model can see where one
        # example stops and the next begins. Other models may prefer a different separator.
        prompt += f"""
        Dialogue:
        {dialogue}
        What was going on?
        {summary}
        """
    dialogue_to_summarize = dataset['test'][example_index_to_summarize]['dialogue']
    prompt += f"""
    Dialogue:
    {dialogue_to_summarize}
    What was going on?
    """
    return prompt

Construct the prompt to perform one-shot inference.

In [14]:
example_indices_full = [40]
example_index_to_summarize = 200
one_shot_prompt = make_prompt(example_indices_full, example_index_to_summarize)
print('ONE SHOT PROMPT:')
print(one_shot_prompt)

ONE SHOT PROMPT:

        Dialogue:
        #Person1#: What time is it, Tom?
#Person2#: Just a minute. It's ten to nine by my watch.
#Person1#: Is it? I had no idea it was so late. I must be off now.
#Person2#: What's the hurry?
#Person1#: I must catch the nine-thirty train.
#Person2#: You've plenty of time yet. The railway station is very close. It won't take more than twenty minutes to get there.
        What was going on?
        #Person1# is in a hurry to catch a train. Tom tells #Person1# there is plenty of time.
        
    Dialogue:
    #Person1#: Have you considered upgrading your system?
#Person2#: Yes, but I'm not sure what exactly I would need.
#Person1#: You could consider adding a painting program to your software. It would allow you to make up your own flyers and banners for advertising.
#Person2#: That would be a definite bonus.
#Person1#: You might also want to upgrade your hardware because it is pretty outdated now.
#Person2#: How can we do that?
#Person1#: You'd prob

Now, pass the prompt to perform the one shot inference.

In [15]:
summary =  dataset['test'][example_index_to_summarize]['summary']
inputs = tokenizer(one_shot_prompt,return_tensors = 'pt')
output = tokenizer.decode(model.generate(inputs['input_ids'], max_new_tokens=50,)[0], skip_special_tokens=True)
print(dash_line)
print(f'BASELINE HUMAN SUMMARY:\n{summary}')
print(dash_line)
print(f'MODEL GENERATION - ONE SHOT:\n{output}\n')

---------------------------------------------------------------------------------------------------
BASELINE HUMAN SUMMARY:
#Person1# teaches #Person2# how to upgrade software and hardware in #Person2#'s system.
---------------------------------------------------------------------------------------------------
MODEL GENERATION - ONE SHOT:
#Person1 wants to upgrade his system. #Person2 wants to add a painting program to his software. #Person1 wants to add a CD-ROM drive.



Prompt engineering with zero-shot, one-shot, and few-shot inference should always be the first thing to try when you have a hub of pre-trained models to choose from.

##### Few Shot Inference:

In [16]:
# Provide three full examples to give the model more context.
example_indices_full = [40,80,120]
example_index_to_summarize = 200
few_shot_prompt = make_prompt(example_indices_full, example_index_to_summarize)
print('FEW SHOT PROMPT:')
print(few_shot_prompt)

FEW SHOT PROMPT:

        Dialogue:
        #Person1#: What time is it, Tom?
#Person2#: Just a minute. It's ten to nine by my watch.
#Person1#: Is it? I had no idea it was so late. I must be off now.
#Person2#: What's the hurry?
#Person1#: I must catch the nine-thirty train.
#Person2#: You've plenty of time yet. The railway station is very close. It won't take more than twenty minutes to get there.
        What was going on?
        #Person1# is in a hurry to catch a train. Tom tells #Person1# there is plenty of time.
        
        Dialogue:
        #Person1#: May, do you mind helping me prepare for the picnic?
#Person2#: Sure. Have you checked the weather report?
#Person1#: Yes. It says it will be sunny all day. No sign of rain at all. This is your father's favorite sausage. Sandwiches for you and Daniel.
#Person2#: No, thanks Mom. I'd like some toast and chicken wings.
#Person1#: Okay. Please take some fruit salad and crackers for me.
#Person2#: Done. Oh, don't forget to take napk

In [17]:
generation_config = GenerationConfig(max_new_tokens=50, do_sample=True, temperature=0.7, top_p=0.9)
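
To give a rough sense of what `temperature` and `top_p` control when `do_sample=True`, here is a simplified, pure-Python sketch of sampling a single token. The function `sample_top_p` is a hypothetical helper written for this illustration; it is not the `transformers` implementation, which works on tensors and handles more cases.

```python
import math
import random


def sample_top_p(logits, temperature=0.7, top_p=0.9, rng=random):
    """Sample one token index using temperature scaling and top-p (nucleus) filtering."""
    # Temperature below 1 sharpens the distribution; above 1 flattens it.
    scaled = [x / temperature for x in logits]
    # Numerically stable softmax over the scaled logits.
    peak = max(scaled)
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    probs = [e / total for e in exps]
    # Keep the most likely tokens until their cumulative probability reaches top_p.
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, cumulative = [], 0.0
    for i in order:
        kept.append(i)
        cumulative += probs[i]
        if cumulative >= top_p:
            break
    # Sample among the kept tokens in proportion to their probabilities.
    weights = [probs[i] for i in kept]
    return rng.choices(kept, weights=weights, k=1)[0]


# With one dominant logit, only that token survives the top-p filter.
print(sample_top_p([5.0, 1.0, 0.0]))  # 0
```

Lower `temperature` and lower `top_p` both make the output more predictable; higher values allow more varied (and sometimes less coherent) text.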

In [18]:
summary = dataset['test'][example_index_to_summarize]['summary']
inputs = tokenizer(few_shot_prompt,return_tensors = 'pt')
output = tokenizer.decode(model.generate(inputs['input_ids'], generation_config=generation_config)[0], skip_special_tokens=True)
print(dash_line)
print(f'BASELINE HUMAN SUMMARY:\n{summary}')
print(dash_line)
print(f'MODEL GENERATION - FEW SHOT:\n{output}\n')

Token indices sequence length is longer than the specified maximum sequence length for this model (818 > 512). Running this sequence through the model will result in indexing errors


---------------------------------------------------------------------------------------------------
BASELINE HUMAN SUMMARY:
#Person1# teaches #Person2# how to upgrade software and hardware in #Person2#'s system.
---------------------------------------------------------------------------------------------------
MODEL GENERATION - FEW SHOT:
#1 is ready, is interested- will introduce an optional technology as per #2, will update in his or at her request. Will add on their new business services on disc/media only; #2 already gets PC software softwares to enhance skills in his



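In this case, few-shot inference did not provide much of an improvement over one-shot inference. Anything above five or six shots typically does not help much either. You also need to make sure you do not exceed the model's input context length, which in
our case is 512 tokens. The few-shot prompt above is 818 tokens long, as the tokenizer warning shows. Input beyond the context length is either truncated or falls outside what the model was trained to handle, so it cannot be relied on.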
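
One way to stay under the context limit is to keep only as many in-context examples as fit a token budget. The sketch below is an assumption-laden illustration: `fit_examples_to_budget` is a hypothetical helper, and the whitespace-based `count_tokens` is a stand-in so the example runs without downloading a model.

```python
def fit_examples_to_budget(examples, query, count_tokens, max_tokens=512):
    """Return a prompt with as many leading examples as fit, always keeping the query."""
    budget = max_tokens - count_tokens(query)
    if budget < 0:
        raise ValueError("The query alone exceeds the token budget")
    kept = []
    for example in examples:
        cost = count_tokens(example)
        if cost > budget:
            break  # stop at the first example that no longer fits
        kept.append(example)
        budget -= cost
    return "".join(kept) + query


# Stand-in token counter (assumption): one token per whitespace-separated word.
def count_tokens(text):
    return len(text.split())


examples = [
    "Dialogue:\nA: hi\nB: hello\nWhat was going on?\nA greets B.\n\n",
    "Dialogue:\nA: thanks\nB: sure\nWhat was going on?\nA thanks B.\n\n",
]
query = "Dialogue:\nA: bye\nWhat was going on?\n"

# With a budget of 20 "tokens", only the first example fits alongside the query.
prompt = fit_examples_to_budget(examples, query, count_tokens, max_tokens=20)
print(prompt)
```

In this notebook you could instead pass `lambda text: len(tokenizer(text)['input_ids'])` as `count_tokens`. Per-piece counts are only approximate (each call also counts an end-of-sequence token), so it is worth re-checking the length of the final prompt.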