# Generative AI use Case: Summarise Dialogue
welcome to the practicle side of this course. In this lab you will do the dialogue summerisation task using generative AI. You will explore how to input text affects the output of th model, and perform prompt engineering to direct it towrds the task you need. By comparing zero shot, one shot and few shot interfeence, you will take the first step towards prompt engineering and see how it can enhance the genrative output of LLMs

# Table of Content 

1. Set up Kernel and required Dependencies
2. Summarise Dialogue without prompt
3. Summarise Dialogue with an instruction prompt
    - 3.1 zEROSHOT INTERFERENCE WITH AN instruction prompt
    - 3.2 zEROSHOT INTERFERENCE WITH the prompt template from FLAN-T5
4. Summarise Dialogue with One Shot and Few Shot interference
    - 4.1 One Shot interference
    - 4.2 Few Shot interference
5. Generative Configuration Parameters for inference

## 1. Set up Kernel and required Dependencies

In [None]:
%pip install --upgrade pip
%pip install --disable-pip-version-check \
    tokenizers==0.12.1 \
    torch==1.13.1 \
    torchdata==0.5.1 --quiet

%pip install \
    transformers==4.27.2 \
    datasets==2.11.0 --quiet


Load the datasets, LLM, tokenizer annd configurator.

In [4]:
from datasets import load_dataset
from transformers import AutoModelForSeq2SeqLM
from transformers import AutoTokenizer
from transformers import GenerationConfig

## 2. Summarise Dialogue without prompt

In thie use case you will be generating a summary of a dialogue with the pre-trined LLM FLAN-T5 from hugging face.

The dataset contains 10,000+ dialogues with the corresponding manually labelled summeries and topics

In [None]:
hugginface_dataset_name = "knkarthick/dialogsum"

dataset = load_dataset(hugginface_dataset_name)

Print a couple of dialogues with thier baseline summeries

In [7]:
example_indices = [40,200]

dash_line = '-'.join('' for x in range(100))

for i, index in enumerate(example_indices):
    print(dash_line)
    print('Example ', i + 1)
    print(dash_line)
    print('INPUT DIALOGUE: ')
    print(dataset['test'][index]['dialogue'])
    print(dash_line)
    print('BASELINE HUMAn SUMMARY:')
    print(dataset['test'][index]['summary'])
    print(dash_line)
    print()


---------------------------------------------------------------------------------------------------
Example  1
---------------------------------------------------------------------------------------------------
INPUT DIALOGUE: 
#Person1#: What time is it, Tom?
#Person2#: Just a minute. It's ten to nine by my watch.
#Person1#: Is it? I had no idea it was so late. I must be off now.
#Person2#: What's the hurry?
#Person1#: I must catch the nine-thirty train.
#Person2#: You've plenty of time yet. The railway station is very close. It won't take more than twenty minutes to get there.
---------------------------------------------------------------------------------------------------
BASELINE HUMA SUMMARY:
#Person1# is in a hurry to catch a train. Tom tells #Person1# there is plenty of time.
---------------------------------------------------------------------------------------------------

---------------------------------------------------------------------------------------------------
Exa

Laod the FLAN-T5 model, creating an instance of the AutoModelForSeq2SeqLM class with the from_pretrained() method

In [8]:
model_name= 'google/flan-t5-base'

model = AutoModelForSeq2SeqLM.from_pretrained(model_name)

Downloading (…)lve/main/config.json:   0%|          | 0.00/1.40k [00:00<?, ?B/s]

Downloading model.safetensors:   0%|          | 0.00/990M [00:00<?, ?B/s]

Downloading (…)neration_config.json:   0%|          | 0.00/147 [00:00<?, ?B/s]

To perform encoding and decoding you need to work with the text in a tokenized for. tokenization is the process of splitting text into smaller units that can be processed by the LLMs models.

Doenalod the tokenizer for the FLAN-T5 model using AutoTokenizer.from_pretrained() method. Parameter use_fast switches on the fast tokenizer.

In [9]:
tokenizer = AutoTokenizer.from_pretrained(model_name, use_fast=True)

Downloading (…)okenizer_config.json:   0%|          | 0.00/2.54k [00:00<?, ?B/s]

Downloading spiece.model:   0%|          | 0.00/792k [00:00<?, ?B/s]

Downloading (…)/main/tokenizer.json:   0%|          | 0.00/2.42M [00:00<?, ?B/s]

Downloading (…)cial_tokens_map.json:   0%|          | 0.00/2.20k [00:00<?, ?B/s]

Test the tokenizer encoding and decoding a sample sentence:

In [10]:
sentence = "What time is it, Tom?"

sentence_encoded = tokenizer(sentence, return_tensors='pt')

sentence_decoded = tokenizer.decode(
    sentence_encoded["input_ids"][0],
    skip_special_tokens=True
    )

print('ENCODED SENTENCE: ')
print(sentence_encoded["input_ids"][0])
print('\nDECODED SENTENCE: ')
print(sentence_decoded)

ENCODED SENTENCE: 
tensor([ 363,   97,   19,   34,    6, 3059,   58,    1])

DECODED SENTENCE: 
What time is it, Tom?


Now it is time to explore how well the base LLM summerizes a dialogue without any prompt engineering. Prompt Engineering is an act of a human changing the prompt (input) to improve the response for a given task.

In [12]:
for i, index in enumerate(example_indices):
    dialogue = dataset['test'][index]['dialogue']
    summary = dataset['test'][index]['summary']

    inputs = tokenizer(dialogue, return_tensors='pt')
    output = tokenizer.decode(
        model.generate(
            inputs["input_ids"],
            max_new_tokens=50
        )[0],
        skip_special_tokens=True
    )

    print(dash_line)
    print('Example ', i + 1)
    print(dash_line)
    print(f'INPUT PROMPT:\n{dialogue}')
    print(dash_line)
    print(f'BASELINE HUMAN SUMMARY:\n{summary}')
    print(dash_line)
    print(f'MODEL GENERATION - WITHOUT PROMPT ENGINEERING:\n{output}\n')

---------------------------------------------------------------------------------------------------
Example  1
---------------------------------------------------------------------------------------------------
INPUT PROMPT:
#Person1#: What time is it, Tom?
#Person2#: Just a minute. It's ten to nine by my watch.
#Person1#: Is it? I had no idea it was so late. I must be off now.
#Person2#: What's the hurry?
#Person1#: I must catch the nine-thirty train.
#Person2#: You've plenty of time yet. The railway station is very close. It won't take more than twenty minutes to get there.
---------------------------------------------------------------------------------------------------
BASELINE HUMAN SUMMARY:
#Person1# is in a hurry to catch a train. Tom tells #Person1# there is plenty of time.
---------------------------------------------------------------------------------------------------
MODEL GENERATION - WITHOUT PROMPT ENGINEERING:
Person1: It's ten to nine.

-------------------------------

# 3. Summarise Dialogue with an instruction prompt

Prompt engineering is an important concept in using foundation models for text generation.

## 3.1 zEROSHOT INTERFERENCE WITH AN instruction prompt

In order to instruct the model to perform a task - summeraise a dialogue - you can take the dialogue and convet it into an instruction prompt. This is often called zero shot inference. 

Wrap the dialogue in a descriptive instrcutin and see how the generated text will change:

In [13]:
for i, index in enumerate(example_indices):
    dialogue = dataset['test'][index]['dialogue']
    summary = dataset['test'][index]['summary']

    prompt = f"""
    Summarise the following conversation.

    {dialogue}

    Summary:
    """

    inputs = tokenizer(prompt, return_tensors='pt')
    output = tokenizer.decode(
        model.generate(
            inputs["input_ids"],
            max_new_tokens=50
        )[0],
        skip_special_tokens=True
    )

    print(dash_line)
    print('Example ', i + 1)
    print(dash_line)
    print(f'INPUT PROMPT:\n{dialogue}')
    print(dash_line)
    print(f'BASELINE HUMAN SUMMARY:\n{summary}')
    print(dash_line)
    print(f'MODEL GENERATION - WITHOUT PROMPT ENGINEERING:\n{output}\n')

---------------------------------------------------------------------------------------------------
Example  1
---------------------------------------------------------------------------------------------------
INPUT PROMPT:
#Person1#: What time is it, Tom?
#Person2#: Just a minute. It's ten to nine by my watch.
#Person1#: Is it? I had no idea it was so late. I must be off now.
#Person2#: What's the hurry?
#Person1#: I must catch the nine-thirty train.
#Person2#: You've plenty of time yet. The railway station is very close. It won't take more than twenty minutes to get there.
---------------------------------------------------------------------------------------------------
BASELINE HUMAN SUMMARY:
#Person1# is in a hurry to catch a train. Tom tells #Person1# there is plenty of time.
---------------------------------------------------------------------------------------------------
MODEL GENERATION - WITHOUT PROMPT ENGINEERING:
The train is about to leave.

-----------------------------

This is a much better But the model still does not pick up on the nauce of the conversation though.

EXERCISE:
    - Experiment with the prompt text and see how the interfernce will be changed.
    - Try to rephrase the beginning of the prompt text from Summarize the following conversation. to something different and see how it influence the generated output.

In [14]:
for i, index in enumerate(example_indices):
    dialogue = dataset['test'][index]['dialogue']
    summary = dataset['test'][index]['summary']

    prompt = f"""
    Dialogue:

    {dialogue}

    What is going on?
    """

    inputs = tokenizer(prompt, return_tensors='pt')
    output = tokenizer.decode(
        model.generate(
            inputs["input_ids"],
            max_new_tokens=50
        )[0],
        skip_special_tokens=True
    )

    print(dash_line)
    print('Example ', i + 1)
    print(dash_line)
    print(f'INPUT PROMPT:\n{dialogue}')
    print(dash_line)
    print(f'BASELINE HUMAN SUMMARY:\n{summary}')
    print(dash_line)
    print(f'MODEL GENERATION - WITHOUT PROMPT ENGINEERING:\n{output}\n')

---------------------------------------------------------------------------------------------------
Example  1
---------------------------------------------------------------------------------------------------
INPUT PROMPT:
#Person1#: What time is it, Tom?
#Person2#: Just a minute. It's ten to nine by my watch.
#Person1#: Is it? I had no idea it was so late. I must be off now.
#Person2#: What's the hurry?
#Person1#: I must catch the nine-thirty train.
#Person2#: You've plenty of time yet. The railway station is very close. It won't take more than twenty minutes to get there.
---------------------------------------------------------------------------------------------------
BASELINE HUMAN SUMMARY:
#Person1# is in a hurry to catch a train. Tom tells #Person1# there is plenty of time.
---------------------------------------------------------------------------------------------------
MODEL GENERATION - WITHOUT PROMPT ENGINEERING:
Tom is late for the train.

-------------------------------

# 4. Summarise Dialogue with One Shot and Few Shot interference

One shot and few shot inference are the practice of providing an LLM with either one or more ful examples of prompt-reponse pairs that match your task - before your actual prompt that you want completed. This is called "in-context Learning" and puts your model into a state that understands your specific task.


## 4.1 One Shot interference
Let's build a function that takes a list of example_indices_full, generates a prompt withfull examples, then at the end appends the prompt which you want the model to complete (example_index_to_summarise). you will use the same FLAN-T5 prompt template from section 3.2

In [59]:
def make_prompt(example_indices_full, example_index_to_summarise):
    prompt = ''
    for index in example_indices_full:
        #print(index)
        dialogue = dataset['test'][index]['dialogue']
        summary = dataset['test'][index]['summary']

        # The stop sequence '{summary}\n\n\n' is important for FLAN-T5. Other models may have their own preferred stop sequence.

        prompt += f"""
Dialogue {index}:

{dialogue}

What is going on?
{summary}
    
        """

        dialogue = dataset['test'][example_index_to_summarise]['dialogue']

        prompt += f"""
Dialogue:

{dialogue}

What is going on?
        """

    return prompt

Construct the prompt to perform one shot inference:

In [52]:
example_indices_full = [40]
example_index_to_summarise = 200

one_shot_prompt = make_prompt(example_indices_full, example_index_to_summarise)

print(one_shot_prompt)

40

    Dialogue:

    #Person1#: What time is it, Tom?
#Person2#: Just a minute. It's ten to nine by my watch.
#Person1#: Is it? I had no idea it was so late. I must be off now.
#Person2#: What's the hurry?
#Person1#: I must catch the nine-thirty train.
#Person2#: You've plenty of time yet. The railway station is very close. It won't take more than twenty minutes to get there.

    What is going on?
    #Person1# is in a hurry to catch a train. Tom tells #Person1# there is plenty of time.
    
        
    Dialogue:

    #Person1#: Have you considered upgrading your system?
#Person2#: Yes, but I'm not sure what exactly I would need.
#Person1#: You could consider adding a painting program to your software. It would allow you to make up your own flyers and banners for advertising.
#Person2#: That would be a definite bonus.
#Person1#: You might also want to upgrade your hardware because it is pretty outdated now.
#Person2#: How can we do that?
#Person1#: You'd probably need a faster proc

Now pass this prompt to perform the one shot inference:

In [53]:
summary = dataset['test'][example_index_to_summarise]['summary']

inputs = tokenizer(one_shot_prompt, return_tensors='pt')
output = tokenizer.decode(
    model.generate(
        inputs["input_ids"],
        max_new_tokens=50
    )[0],
    skip_special_tokens=True
)

print(dash_line)
print(f'BASELINE HUMAN SUMMARY:\n{summary}')
print(dash_line)
print(f'MODEL GENERATION - WITHOUT PROMPT ENGINEERING:\n{output}\n')

---------------------------------------------------------------------------------------------------
BASELINE HUMAN SUMMARY:
#Person1# teaches #Person2# how to upgrade software and hardware in #Person2#'s system.
---------------------------------------------------------------------------------------------------
MODEL GENERATION - WITHOUT PROMPT ENGINEERING:
#Person1 wants to upgrade his system. #Person2 wants to add a painting program to his software. #Person1 wants to add a CD-ROM drive.



## 4.2 Few Shot inference

Let's explore few shot inference by adding two more full dialogue-summary pairs to your prompt

In [60]:
example_indices_full = [40, 80, 120]
example_index_to_summarise = 200

few_shot_prompt = make_prompt(example_indices_full, example_index_to_summarise)

print(few_shot_prompt)


Dialogue 40:

#Person1#: What time is it, Tom?
#Person2#: Just a minute. It's ten to nine by my watch.
#Person1#: Is it? I had no idea it was so late. I must be off now.
#Person2#: What's the hurry?
#Person1#: I must catch the nine-thirty train.
#Person2#: You've plenty of time yet. The railway station is very close. It won't take more than twenty minutes to get there.

What is going on?
#Person1# is in a hurry to catch a train. Tom tells #Person1# there is plenty of time.
    
        
Dialogue:

#Person1#: Have you considered upgrading your system?
#Person2#: Yes, but I'm not sure what exactly I would need.
#Person1#: You could consider adding a painting program to your software. It would allow you to make up your own flyers and banners for advertising.
#Person2#: That would be a definite bonus.
#Person1#: You might also want to upgrade your hardware because it is pretty outdated now.
#Person2#: How can we do that?
#Person1#: You'd probably need a faster processor, to begin with. An

Now pass this prompt to perform a few shot inference:

In [61]:
summary = dataset['test'][example_index_to_summarise]['summary']

inputs = tokenizer(few_shot_prompt, return_tensors='pt')
output = tokenizer.decode(
    model.generate(
        inputs["input_ids"],
        max_new_tokens=50
    )[0],
    skip_special_tokens=True
)

print(dash_line)
print(f'BASELINE HUMAN SUMMARY:\n{summary}')
print(dash_line)
print(f'MODEL GENERATION - WITHOUT PROMPT ENGINEERING:\n{output}\n')

---------------------------------------------------------------------------------------------------
BASELINE HUMAN SUMMARY:
#Person1# teaches #Person2# how to upgrade software and hardware in #Person2#'s system.
---------------------------------------------------------------------------------------------------
MODEL GENERATION - WITHOUT PROMPT ENGINEERING:
#Person1 wants to upgrade his computer. #Person2 wants to upgrade his hardware. #Person1 wants to upgrade his computer.



# 5. Generative Configuration Parameters for inference

You can change the configuratio parameters of the generate() method to see a different output from the LLM. So far the only parameter that you have been setting was max_new_tokens=50, which defines the maximum number of tokens to generate.

A convient way of organizing the configuration parameters is use GenerationConfig class

Exercise:

Change the configuration parameters to investigate their influence on the output.

Putting the parameter do_sample=True, you activate various decoding strategies which influence the next token from the probabilty distribution over the entire vocabulary, you can then adjust the outputs changing tempature and opther parameters (such as top_k anf top_k)

Un comment the lines in the cell below and rerun the code. Try to analyse the reults.

In [62]:
#generation_config = GenerationConfig(max_new_tokens=50)
#generation_config = GenerationConfig(max_new_tokens=10)
#generation_config = GenerationConfig(max_new_tokens=50, do_sample=True, tempature=0.1)
#generation_config = GenerationConfig(max_new_tokens=50, do_sample=True, tempature=0.5)
generation_config = GenerationConfig(max_new_tokens=50, do_sample=True, tempature=2.0)

inputs = tokenizer(few_shot_prompt, return_tensors='pt')
output = tokenizer.decode(
    model.generate(
        inputs["input_ids"],
        generation_config=generation_config,
    )[0],
    skip_special_tokens=True
)

print(dash_line)
print(f'BASELINE HUMAN SUMMARY:\n{summary}')
print(dash_line)
print(f'MODEL GENERATION - WITHOUT PROMPT ENGINEERING:\n{output}\n')

---------------------------------------------------------------------------------------------------
BASELINE HUMAN SUMMARY:
#Person1# teaches #Person2# how to upgrade software and hardware in #Person2#'s system.
---------------------------------------------------------------------------------------------------
MODEL GENERATION - WITHOUT PROMPT ENGINEERING:
#Person1 wants to install a more advanced version of software on his computer. The person who recommended would think about this.

