# Dependencies

In [1]:
%pip install --upgrade pip
%pip install --disable-pip-version-check \
    torch==1.13.1 \
    torchdata==0.5.1 --quiet

%pip install \
    transformers==4.27.2 \
    datasets==2.15.0  --quiet

Note: you may need to restart the kernel to use updated packages.


# Import

In [2]:
from datasets import load_dataset
from transformers import AutoModelForSeq2SeqLM
from transformers import AutoTokenizer
from transformers import GenerationConfig

# Load FLAN-T5

Load the [FLAN-T5 model](https://huggingface.co/docs/transformers/model_doc/flan-t5), creating an instance of the `AutoModelForSeq2SeqLM` class with the `.from_pretrained()` method. 

In [3]:
model_name='google/flan-t5-base'

model = AutoModelForSeq2SeqLM.from_pretrained(model_name)

In [4]:
tokenizer = AutoTokenizer.from_pretrained(model_name, use_fast=True)

# Load the Hugging Face DialogSum dataset

In [5]:
huggingface_dataset_name = "knkarthick/dialogsum"

dataset = load_dataset(huggingface_dataset_name)

# Predict 1, 2, 3

You can change the configuration parameters of the `generate()` method to get different output from the LLM. The baseline call sets only `max_new_tokens=50`, which limits the number of tokens to generate. A full list of available parameters can be found in the [Hugging Face Generation documentation](https://huggingface.co/docs/transformers/v4.29.1/en/main_classes/text_generation#transformers.GenerationConfig). 

Setting `do_sample=True` switches `generate()` from greedy decoding to sampling: each next token is drawn from the probability distribution over the entire vocabulary instead of always being the most likely one. You can then shape the output with `temperature` and related parameters such as `top_k` and `top_p`. 
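To make the effect of these parameters concrete, here is a minimal pure-Python sketch of temperature scaling followed by top-k and top-p filtering on a toy distribution. It is a simplified illustration, not the `transformers` implementation; the function names and `toy_logits` are hypothetical and chosen only for this example.

```python
import math
import random


def softmax_with_temperature(logits, temperature=1.0):
    """Turn raw scores into probabilities; a lower temperature sharpens the distribution."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def top_k_top_p_filter(probs, top_k=0, top_p=1.0):
    """Keep the top_k most likely tokens, then the smallest prefix whose mass reaches top_p."""
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    if top_k > 0:
        ranked = ranked[:top_k]
    subtotal = sum(probs[i] for i in ranked)
    kept, mass = [], 0.0
    for i in ranked:
        kept.append(i)
        mass += probs[i] / subtotal
        if mass >= top_p:
            break
    total = sum(probs[i] for i in kept)
    # Renormalize so the surviving candidates sum to 1.
    return {i: probs[i] / total for i in kept}


def sample_next_token(logits, temperature=1.0, top_k=0, top_p=1.0, rng=random):
    """Draw one token index from the filtered, renormalized distribution."""
    probs = softmax_with_temperature(logits, temperature)
    candidates = top_k_top_p_filter(probs, top_k, top_p)
    tokens = list(candidates)
    weights = [candidates[t] for t in tokens]
    return rng.choices(tokens, weights=weights, k=1)[0]


toy_logits = [2.0, 1.0, 0.5, -1.0]  # hypothetical scores for a 4-token vocabulary
for t in (0.5, 1.0, 2.0):
    print(t, [round(p, 3) for p in softmax_with_temperature(toy_logits, t)])
```

A temperature below 1 concentrates probability on the highest-scoring token, while a temperature above 1 flattens the distribution and makes unlikely tokens more probable. With `temperature=1.0` the distribution is left unchanged, so in that case the only difference from the baseline is that the token is sampled rather than picked greedily.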


In [6]:
example_indices = [40, 200]
for i, index in enumerate(example_indices):
    dialogue = dataset['test'][index]['dialogue']
    summary = dataset['test'][index]['summary']
    
    inputs = tokenizer(dialogue, return_tensors='pt')
    # Baseline: generate from the raw dialogue with default (greedy) decoding.
    output = tokenizer.decode(
        model.generate(
            inputs["input_ids"], 
            max_new_tokens=50,
        )[0], 
        skip_special_tokens=True
    )
    # Same input, but with sampling enabled and a shorter output limit.
    output2 = tokenizer.decode(
        model.generate(
            inputs["input_ids"], 
            max_new_tokens=20, do_sample=True, temperature=1.0,
        )[0], 
        skip_special_tokens=True
    )

    # Zero-shot prompt: wrap the dialogue in an instruction.
    prompt = f"""
Dialogue:

{dialogue}

What was going on?
"""

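    # Tokenize the prompt itself; reusing `inputs` would send the bare dialogue again.
    prompt_inputs = tokenizer(prompt, return_tensors='pt')
    output_prompt = tokenizer.decode(
        model.generate(
            prompt_inputs["input_ids"], 
            max_new_tokens=50,
        )[0], 
        skip_special_tokens=True
    )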
    print("==========================================================================")
    print('Example ', i + 1)
    print("--------------------------------------------------------------------------")
    print(f'INPUT PROMPT:\n{dialogue}')
    print("--------------------------------------------------------------------------")
    print(f'BASELINE HUMAN SUMMARY:\n{summary}')
    print("--------------------------------------------------------------------------")
    print(f'MODEL GENERATION - WITHOUT PROMPT ENGINEERING 1:\n{output}\n')
    print("--------------------------------------------------------------------------")
    print(f'MODEL GENERATION - WITHOUT PROMPT ENGINEERING 2:\n{output2}\n')
    print("--------------------------------------------------------------------------")
    print(f'MODEL GENERATION - ZERO SHOT:\n{output_prompt}\n')

Example  1
--------------------------------------------------------------------------
INPUT PROMPT:
#Person1#: What time is it, Tom?
#Person2#: Just a minute. It's ten to nine by my watch.
#Person1#: Is it? I had no idea it was so late. I must be off now.
#Person2#: What's the hurry?
#Person1#: I must catch the nine-thirty train.
#Person2#: You've plenty of time yet. The railway station is very close. It won't take more than twenty minutes to get there.
--------------------------------------------------------------------------
BASELINE HUMAN SUMMARY:
#Person1# is in a hurry to catch a train. Tom tells #Person1# there is plenty of time.
--------------------------------------------------------------------------
MODEL GENERATION - WITHOUT PROMPT ENGINEERING 1:
Person1: It's ten to nine.

--------------------------------------------------------------------------
MODEL GENERATION - WITHOUT PROMPT ENGINEERING 2:
Person1#: It looks like it will take you about 20 minutes if you catch the train
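
The cell above repeats the same tokenize, generate, decode sequence for each variant. One possible refactoring is a small helper that wraps those three steps. This is only a sketch: `generate_text` is a hypothetical name, not part of `transformers`, and it assumes `model` and `tokenizer` behave like the objects loaded earlier.

```python
def generate_text(model, tokenizer, text, **generate_kwargs):
    """Tokenize `text`, run model.generate(), and decode the first returned sequence."""
    inputs = tokenizer(text, return_tensors='pt')
    # Extra keyword arguments (max_new_tokens, do_sample, ...) pass straight through.
    output_ids = model.generate(inputs["input_ids"], **generate_kwargs)
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)
```

In this notebook, `generate_text(model, tokenizer, dialogue, max_new_tokens=50)` would correspond to the baseline call, and adding `do_sample=True` would correspond to the sampled one.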

# One/Multi-Shot Inference

Let's build a function that takes a list of `example_indices_full`, generates a prompt containing those complete examples, and then appends the dialogue that you want the model to summarize (`example_index_to_summarize`). It reuses the FLAN-T5 prompt template from the zero-shot example above. 

In [7]:
def make_prompt(example_indices_full, example_index_to_summarize):
    prompt = ''
    for index in example_indices_full:
        dialogue = dataset['test'][index]['dialogue']
        summary = dataset['test'][index]['summary']
        
        # The stop sequence '{summary}\n\n\n' is important for FLAN-T5. Other models may have their own preferred stop sequence.
        prompt += f"""
Dialogue:

{dialogue}

What was going on?
{summary}


"""
    
    dialogue = dataset['test'][example_index_to_summarize]['dialogue']
    
    prompt += f"""
Dialogue:

{dialogue}

What was going on?
"""
        
    return prompt
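
Every extra shot lengthens the prompt, and the tokenizer used here reports a maximum sequence length of 512 tokens for this model. A minimal sketch of one way to keep a prompt within such a budget follows: it drops the earliest shots until the prompt fits. The names `format_shot`, `format_query`, `fit_shots_to_budget`, and `whitespace_count` are hypothetical, and the whitespace counter is only a stand-in for a real tokenizer.

```python
def format_shot(dialogue, summary):
    # Same layout as make_prompt: dialogue, question, answer, then blank-line separator.
    return f"\nDialogue:\n\n{dialogue}\n\nWhat was going on?\n{summary}\n\n\n"


def format_query(dialogue):
    # The final, unanswered example that the model is asked to complete.
    return f"\nDialogue:\n\n{dialogue}\n\nWhat was going on?\n"


def fit_shots_to_budget(shots, query_dialogue, count_tokens, max_tokens=512):
    """Drop the earliest shots until the whole prompt fits within max_tokens.

    Returns (prompt, number_of_shots_kept).
    """
    query = format_query(query_dialogue)
    kept = list(shots)
    while kept:
        prompt = "".join(format_shot(d, s) for d, s in kept) + query
        if count_tokens(prompt) <= max_tokens:
            return prompt, len(kept)
        kept.pop(0)
    # No shots fit; the query alone may still exceed the budget, so callers should check.
    return query, 0


def whitespace_count(text):
    # Assumption: a crude stand-in. With a Hugging Face tokenizer one could
    # count len(tokenizer(text)["input_ids"]) instead.
    return len(text.split())


toy_shots = [
    ("#Person1#: Hi. #Person2#: Hello.", "They greet each other."),
    ("#Person1#: Bye. #Person2#: See you.", "They say goodbye."),
]
prompt, shots_kept = fit_shots_to_budget(
    toy_shots, "#Person1#: Lunch? #Person2#: Sure.", whitespace_count, max_tokens=25
)
print(shots_kept)
print(prompt)
```

Dropping shots from the front is an arbitrary choice made for this sketch; keeping the shortest examples, or the ones most similar to the target dialogue, would be equally valid strategies.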

## Shots test

In [8]:
example_indices_full = [40, 80, 120]
example_index_to_summarize = 200

shot_prompt = make_prompt(example_indices_full, example_index_to_summarize)

inputs = tokenizer(shot_prompt, return_tensors='pt')
output = tokenizer.decode(
    model.generate(
        inputs["input_ids"],
        max_new_tokens=50,
    )[0], 
    skip_special_tokens=True
)

print("--------------------------------------------------------------------------")
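# Look up the target example explicitly instead of relying on `index` and `summary` left over from cell [6].
dialogue = dataset['test'][example_index_to_summarize]['dialogue']
summary = dataset['test'][example_index_to_summarize]['summary']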
print("--------------------------------------------------------------------------")
print(f'INPUT PROMPT:\n{dialogue}')
print("--------------------------------------------------------------------------")
print(f'BASELINE HUMAN SUMMARY:\n{summary}\n')
print("--------------------------------------------------------------------------")
print(f'MODEL GENERATION - FEW SHOT:\n{output}')

Token indices sequence length is longer than the specified maximum sequence length for this model (819 > 512). Running this sequence through the model will result in indexing errors


--------------------------------------------------------------------------
--------------------------------------------------------------------------
INPUT PROMPT:
#Person1#: Have you considered upgrading your system?
#Person2#: Yes, but I'm not sure what exactly I would need.
#Person1#: You could consider adding a painting program to your software. It would allow you to make up your own flyers and banners for advertising.
#Person2#: That would be a definite bonus.
#Person1#: You might also want to upgrade your hardware because it is pretty outdated now.
#Person2#: How can we do that?
#Person1#: You'd probably need a faster processor, to begin with. And you also need a more powerful hard disc, more memory and a faster modem. Do you have a CD-ROM drive?
#Person2#: No.
#Person1#: Then you might want to add a CD-ROM drive too, because most new software programs are coming out on Cds.
#Person2#: That sounds great. Thanks.
--------------------------------------------------------------------