# Generative AI Task: Summarise Dialogue

This notebook performs dialogue summarisation using Generative AI. By applying different techniques at inference time, it explores how different prompts (the input text) affect the completion (the output) of the model. Prompt engineering is carried out by comparing zero shot, one shot and few shot inference, with the aim of finding how best to improve the generated output of the Large Language Model.

### Install the required dependencies

Given the scope of the task, we need to install packages for PyTorch and the Hugging Face `transformers` and `datasets` libraries.

In [1]:
!pip install torch==1.13.1
!pip install torchdata==0.5.1 --quiet
!pip install transformers==4.27.2
!pip install datasets==2.11.0 --quiet


In [3]:
from datasets import load_dataset
from transformers import AutoModelForSeq2SeqLM
from transformers import AutoTokenizer
from transformers import GenerationConfig

In [1]:
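# Upgrades datasets beyond the 2.11.0 pinned above; restart the runtime
# before re-running the imports so that the new version is loaded
!pip install -U datasets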



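The imports above bring in the resources used throughout this notebook: the dataset loader (`load_dataset`), the Large Language Model (LLM) class, the tokeniser and the generation configuration (`GenerationConfig`).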

### Summarise dialogue without Prompt Engineering

In this use case, a summary of each dialogue is generated with the pre-trained Large Language Model (LLM) `FLAN-T5` from Hugging Face. The list of models available in the Hugging Face `transformers` package can be found [here](https://huggingface.co/docs/transformers/index).

We can now load some simple dialogues from the [DialogSum](https://huggingface.co/datasets/knkarthick/dialogsum) Hugging Face dataset. This dataset contains over 10,000 dialogues, each with a manually labelled summary and topic.
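Each record in a DialogSum split is a dictionary, and the notebook reads its `dialogue` and `summary` fields with `dataset['test'][index][...]`. The sketch below uses a hypothetical stand-in for that structure (a plain dict named `mock_dataset`, not the real `DatasetDict`), filled with the first example printed later in this notebook, plus a small illustrative helper, `split_turns`, that splits a dialogue into its `#PersonN#` speaker turns.

```python
# Hypothetical stand-in for the loaded dataset: a plain dict of splits, each a
# list of records. Only the 'dialogue' and 'summary' fields used in this
# notebook are included; the text is the first example printed in the notebook.
mock_dataset = {
    "test": [
        {
            "dialogue": (
                "#Person1#: What time is it, Tom?\n"
                "#Person2#: Just a minute. It's ten to nine by my watch.\n"
                "#Person1#: Is it? I had no idea it was so late. I must be off now.\n"
                "#Person2#: What's the hurry?\n"
                "#Person1#: I must catch the nine-thirty train.\n"
                "#Person2#: You've plenty of time yet. The railway station is very close. "
                "It won't take more than twenty minutes to get there."
            ),
            "summary": (
                "#Person1# is in a hurry to catch a train. "
                "Tom tells #Person1# there is plenty of time."
            ),
        }
    ]
}


def split_turns(dialogue: str) -> list[tuple[str, str]]:
    """Split a DialogSum-style dialogue into (speaker_tag, utterance) pairs."""
    turns = []
    for line in dialogue.splitlines():
        # Each turn has the form "#PersonN#: utterance"
        speaker, _, utterance = line.partition(": ")
        turns.append((speaker, utterance))
    return turns


# Same access pattern as dataset['test'][index]['dialogue'] in the notebook
record = mock_dataset["test"][0]
turns = split_turns(record["dialogue"])
print(f"{len(turns)} turns")
for speaker, utterance in turns[:2]:
    print(f"{speaker} -> {utterance}")
print(record["summary"])
```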

In [4]:
huggingface_dataset_name = "knkarthick/dialogsum"

dataset = load_dataset(huggingface_dataset_name)

The secret `HF_TOKEN` does not exist in your Colab secrets.
To authenticate with the Hugging Face Hub, create a token in your settings tab (https://huggingface.co/settings/tokens), set it as secret in your Google Colab and restart your session.
You will be able to reuse this secret in all of your notebooks.
Please note that authentication is recommended but still optional to access public models or datasets.



Now that the dataset is loaded, we can print a few dialogues from the test split, together with their baseline human summaries.

In [18]:
example_dialogue_indice_slice: list = [40, 200, 35, 10, 25]

# A divider line of 115 dashes, used to separate the printed outputs
dashed_line_ouptut_divider = '-' * 115

for example, index in enumerate(example_dialogue_indice_slice):
    print(dashed_line_ouptut_divider)
    print(f"Example: {example + 1}")
    print(dashed_line_ouptut_divider)

    print("INPUT DIALOGUE:")
    print(dataset['test'][index]['dialogue'])
    print(dashed_line_ouptut_divider)

    print("BASELINE HUMAN SUMMARY:")
    print(dataset['test'][index]['summary'])
    print(dashed_line_ouptut_divider)
    print()

-------------------------------------------------------------------------------------------------------------------
Example: 1
-------------------------------------------------------------------------------------------------------------------
INPUT DIALOGUE:
#Person1#: What time is it, Tom?
#Person2#: Just a minute. It's ten to nine by my watch.
#Person1#: Is it? I had no idea it was so late. I must be off now.
#Person2#: What's the hurry?
#Person1#: I must catch the nine-thirty train.
#Person2#: You've plenty of time yet. The railway station is very close. It won't take more than twenty minutes to get there.
-------------------------------------------------------------------------------------------------------------------
BASELINE HUMAN SUMMARY:
#Person1# is in a hurry to catch a train. Tom tells #Person1# there is plenty of time.
-------------------------------------------------------------------------------------------------------------------

---------------------------------------

Now we can load the [FLAN-T5 model](https://huggingface.co/docs/transformers/model_doc/flan-t5) by creating an instance of the `AutoModelForSeq2SeqLM` class with the `.from_pretrained()` method.

In [19]:
model_name = "google/flan-t5-base"

model = AutoModelForSeq2SeqLM.from_pretrained(model_name)

Before the model can encode and decode text, the text must be tokenised. `Tokenisation` is the process of splitting text into smaller units (tokens) that an LLM can process. Each token, which may be a whole word or a piece of a word, is then converted into a number: its position (ID) in the vocabulary of all the tokens the model can work with.

We can download the tokeniser for the `FLAN-T5` model using the `AutoTokenizer.from_pretrained()` method. Setting `use_fast=True` loads the fast, Rust-based tokeniser if one is available for the model; otherwise the standard Python-based tokeniser is returned instead.

In [20]:
tokenizer = AutoTokenizer.from_pretrained(model_name, use_fast=True)

We can now test that the tokeniser can encode and decode a simple sentence.

In [21]:
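sentence = "What time is it, Tom?"

# return_tensors selects the type of tensor returned:
# "np" = NumPy, "tf" = TensorFlow, "pt" = PyTorch
encoded_sentence = tokenizer(sentence, return_tensors="pt")

# skip_special_tokens=True removes special tokens such as the end-of-sequence token
decoded_sentence = tokenizer.decode(
    encoded_sentence["input_ids"][0],
    skip_special_tokens=True
)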

print("ENCODED SENTENCE:")
print(encoded_sentence["input_ids"][0])
print("\nDECODED SENTENCE:")
print(decoded_sentence)


ENCODED SENTENCE:
tensor([ 363,   97,   19,   34,    6, 3059,   58,    1])

DECODED SENTENCE:
What time is it, Tom?
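The encoded sentence ends with ID `1`, which T5 tokenisers reserve for the end-of-sequence token `</s>`; `skip_special_tokens=True` is why it does not appear in the decoded text. The toy tokeniser below, a hypothetical `ToyTokenizer` written only for illustration, mimics that encode/decode round trip. It is a deliberate simplification: FLAN-T5 splits text into SentencePiece sub-word pieces rather than whitespace-separated words, so its vocabulary and IDs differ from the ones here.

```python
class ToyTokenizer:
    """Hypothetical word-level tokeniser illustrating encode/decode.

    Real FLAN-T5 tokenisation uses SentencePiece sub-words; this only
    splits on whitespace.
    """

    def __init__(self, corpus: list[str]):
        # Special tokens first, following the T5 ID convention: pad=0, eos=1, unk=2
        self.vocab = {"<pad>": 0, "</s>": 1, "<unk>": 2}
        self.special_ids = set(self.vocab.values())
        # Every new word in the corpus gets the next free integer ID
        for text in corpus:
            for word in text.split():
                self.vocab.setdefault(word, len(self.vocab))
        self.id_to_token = {i: tok for tok, i in self.vocab.items()}

    def encode(self, text: str) -> list[int]:
        # Unknown words map to <unk>; an end-of-sequence ID is appended
        ids = [self.vocab.get(word, self.vocab["<unk>"]) for word in text.split()]
        return ids + [self.vocab["</s>"]]

    def decode(self, ids: list[int], skip_special_tokens: bool = False) -> str:
        tokens = [
            self.id_to_token[i]
            for i in ids
            if not (skip_special_tokens and i in self.special_ids)
        ]
        return " ".join(tokens)


tok = ToyTokenizer(["What time is it, Tom?", "It's ten to nine by my watch."])
ids = tok.encode("What time is it, Tom?")
print(ids)                                         # [3, 4, 5, 6, 7, 1]
print(tok.decode(ids, skip_special_tokens=True))   # What time is it, Tom?
print(tok.decode(ids))                             # What time is it, Tom? </s>
```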


Now we can assess how well the base LLM summarises a dialogue without any prompt engineering.
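In this section, `model.generate` is called with only `max_new_tokens=50`, so unless the model's own generation config says otherwise it decodes greedily: at each step it appends the single most likely next token, stopping at the end-of-sequence token or once `max_new_tokens` tokens have been produced. The following is a toy sketch of that loop, assuming a hypothetical hard-coded score table (`TOY_SCORES`) in place of FLAN-T5. A real model scores each next token using the whole input and everything generated so far, not just the previous token.

```python
# Hypothetical next-token scores: previous token -> {candidate: score}.
# A stand-in for the model's predicted distribution, for illustration only.
TOY_SCORES = {
    "<start>": {"Tom": 0.6, "The": 0.4},
    "Tom": {"is": 0.7, "</s>": 0.3},
    "is": {"late": 0.5, "early": 0.2, "</s>": 0.3},
    "late": {"</s>": 0.9, "is": 0.1},
}


def greedy_generate(scores, max_new_tokens, start="<start>", eos="</s>"):
    """Greedy decoding: always take the highest-scoring next token."""
    generated = []
    current = start
    for _ in range(max_new_tokens):
        candidates = scores[current]
        current = max(candidates, key=candidates.get)  # argmax, no randomness
        if current == eos:  # stop early at end-of-sequence
            break
        generated.append(current)
    return generated


print(greedy_generate(TOY_SCORES, max_new_tokens=50))  # ['Tom', 'is', 'late']
print(greedy_generate(TOY_SCORES, max_new_tokens=2))   # ['Tom', 'is']
```

Because there is no randomness, running greedy decoding twice on the same input gives the same completion, which is why the outputs in this notebook are reproducible until sampling is switched on.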

In [30]:
for example, index in enumerate(example_dialogue_indice_slice):
    dialogue = dataset['test'][index]['dialogue']
    summary = dataset['test'][index]['summary']

    inputs = tokenizer(dialogue, return_tensors="pt")
    output = tokenizer.decode(
        model.generate(
            inputs["input_ids"],
            max_new_tokens=50,
        )[0],
        skip_special_tokens=True
    )

    print(dashed_line_ouptut_divider)
    print(f"Example: {example + 1}")
    print(dashed_line_ouptut_divider)

    print(f"INPUT PROMPT:\n{dialogue}")
    print(dashed_line_ouptut_divider)
    print(f"BASELINE HUMAN SUMMARY:\n{summary}")
    print(dashed_line_ouptut_divider)

    print(f"MODEL GENERATION - WITHOUT PROMPT ENGINEERING:\n{output}\n")

-------------------------------------------------------------------------------------------------------------------
Example: 1
-------------------------------------------------------------------------------------------------------------------
INPUT PROMPT:
#Person1#: What time is it, Tom?
#Person2#: Just a minute. It's ten to nine by my watch.
#Person1#: Is it? I had no idea it was so late. I must be off now.
#Person2#: What's the hurry?
#Person1#: I must catch the nine-thirty train.
#Person2#: You've plenty of time yet. The railway station is very close. It won't take more than twenty minutes to get there.
-------------------------------------------------------------------------------------------------------------------
BASELINE HUMAN SUMMARY:
#Person1# is in a hurry to catch a train. Tom tells #Person1# there is plenty of time.
-------------------------------------------------------------------------------------------------------------------
MODEL GENERATION - WITHOUT PROMPT ENGINEER

The guesses made by the model make some sense, but they do not quite capture what each dialogue is about. Prompt engineering can help here.

### Zero shot inference with an instruction Prompt

In [23]:
for example, index in enumerate(example_dialogue_indice_slice):
    dialogue = dataset['test'][index]['dialogue']
    summary = dataset['test'][index]['summary']

    prompt = f"""
Summarize the following conversation.

{dialogue}

Summary:
    """

    # Input constructed prompt instead of the dialogue
    inputs = tokenizer(prompt, return_tensors="pt")
    output = tokenizer.decode(
        model.generate(
            inputs["input_ids"],
            max_new_tokens=50,
        )[0],
        skip_special_tokens=True
    )

    print(dashed_line_ouptut_divider)
    print(f"Example: {example + 1}")
    print(dashed_line_ouptut_divider)

    print(f"INPUT PROMPT:\n{prompt}")
    print(dashed_line_ouptut_divider)
    print(f"BASELINE HUMAN SUMMARY:\n{summary}")
    print(dashed_line_ouptut_divider)

    print(f"MODEL GENERATION - ZERO SHOT:\n{output}\n")

-------------------------------------------------------------------------------------------------------------------
Example: 1
-------------------------------------------------------------------------------------------------------------------
INPUT PROMPT:

Summarize the following conversation.

#Person1#: What time is it, Tom?
#Person2#: Just a minute. It's ten to nine by my watch.
#Person1#: Is it? I had no idea it was so late. I must be off now.
#Person2#: What's the hurry?
#Person1#: I must catch the nine-thirty train.
#Person2#: You've plenty of time yet. The railway station is very close. It won't take more than twenty minutes to get there.

Summary:
    
-------------------------------------------------------------------------------------------------------------------
BASELINE HUMAN SUMMARY:
#Person1# is in a hurry to catch a train. Tom tells #Person1# there is plenty of time.
-------------------------------------------------------------------------------------------

### Zero shot inference with a Prompt template from FLAN-T5

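FLAN-T5 was instruction-tuned on prompts generated from a collection of [pre-built prompt templates](https://github.com/google-research/FLAN/blob/main/flan/v2/templates.py), so phrasing the prompt in the same style may help the model. The prompt in this section follows the `Dialogue: ... What was going on?` pattern.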
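Because only the wording around the dialogue changes between these experiments, the prompt styles can be kept as named format strings and filled in per dialogue. A minimal sketch, assuming a hypothetical `PROMPT_TEMPLATES` dictionary that holds the wording of the two prompts used in this notebook (the instruction prompt from the previous section and the FLAN-style prompt from this one); it does not reproduce the structure of the linked `templates.py` file.

```python
# Hypothetical dictionary holding the wording of the two prompt styles used in
# this notebook; {dialogue} is replaced with the conversation to summarise.
PROMPT_TEMPLATES = {
    "instruction": "Summarize the following conversation.\n\n{dialogue}\n\nSummary:",
    "flan_dialogue": "Dialogue:\n\n{dialogue}\n\nWhat was going on?",
}


def render_prompt(style: str, dialogue: str) -> str:
    """Fill the chosen template with a dialogue.

    str.format only parses the template, so braces inside the dialogue are kept as-is.
    """
    return PROMPT_TEMPLATES[style].format(dialogue=dialogue)


sample_dialogue = (
    "#Person1#: What time is it, Tom?\n"
    "#Person2#: Just a minute. It's ten to nine by my watch."
)
for style in PROMPT_TEMPLATES:
    print(f"--- {style} ---")
    print(render_prompt(style, sample_dialogue))
```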

In [24]:
for example, index in enumerate(example_dialogue_indice_slice):
    dialogue = dataset['test'][index]['dialogue']
    summary = dataset['test'][index]['summary']

    prompt = f"""
Dialogue:

{dialogue}

What was going on?
"""

    inputs = tokenizer(prompt, return_tensors="pt")
    output = tokenizer.decode(
        model.generate(
            inputs["input_ids"],
            max_new_tokens=50,
        )[0],
        skip_special_tokens=True
    )

    print(dashed_line_ouptut_divider)
    print(f"Example: {example + 1}")
    print(dashed_line_ouptut_divider)

    print(f"INPUT PROMPT:\n{prompt}")
    print(dashed_line_ouptut_divider)
    print(f"BASELINE HUMAN SUMMARY:\n{summary}")
    print(dashed_line_ouptut_divider)

    print(f"MODEL GENERATION - ZERO SHOT:\n{output}\n")

-------------------------------------------------------------------------------------------------------------------
Example: 1
-------------------------------------------------------------------------------------------------------------------
INPUT PROMPT:

Dialogue:

#Person1#: What time is it, Tom?
#Person2#: Just a minute. It's ten to nine by my watch.
#Person1#: Is it? I had no idea it was so late. I must be off now.
#Person2#: What's the hurry?
#Person1#: I must catch the nine-thirty train.
#Person2#: You've plenty of time yet. The railway station is very close. It won't take more than twenty minutes to get there.

What was going on?

-------------------------------------------------------------------------------------------------------------------
BASELINE HUMAN SUMMARY:
#Person1# is in a hurry to catch a train. Tom tells #Person1# there is plenty of time.
-----------------------------------------------------------------------------------------------------------------

### One shot inference

We can build a function that takes a list of example indices (`example_indices_full`) and builds a prompt containing each of those full examples (dialogue and summary). At the end, it appends the dialogue we want the model to summarise (`example_index_to_summarize`), leaving its summary for the model to complete.

In [25]:
def make_prompt(example_indices_full, example_index_to_summarize):
    prompt = ""
    for index in example_indices_full:
        dialogue = dataset['test'][index]['dialogue']
        summary = dataset['test'][index]['summary']

        # The stop sequence '{summary}\n\n\n' is important for FLAN-T5...
        # ...other models may have their own preferred stop sequence
        prompt += f"""
Dialogue:

{dialogue}

What was going on?
{summary}


"""

    dialogue = dataset['test'][example_index_to_summarize]['dialogue']

    prompt += f"""
Dialogue:

{dialogue}

What was going on?
"""

    return prompt
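`make_prompt` reads its solved examples from `dataset['test']`. The same construction can be written against plain `(dialogue, summary)` pairs, which makes the zero, one and few shot cases easy to compare: with no pairs the result is a zero shot prompt, with one pair it is one shot, and with several it is few shot. `make_prompt_from_pairs` is a hypothetical helper written for this sketch (the example pair is made up); it produces the same layout as `make_prompt`.

```python
def make_prompt_from_pairs(examples, dialogue_to_summarize):
    """Build a prompt from (dialogue, summary) pairs plus one dialogue to summarise.

    Hypothetical variant of make_prompt: zero pairs gives a zero shot prompt,
    one pair gives one shot, several pairs give few shot.
    """
    prompt = ""
    for dialogue, summary in examples:
        # Solved example: dialogue, question, then its summary and blank lines
        prompt += f"\nDialogue:\n\n{dialogue}\n\nWhat was going on?\n{summary}\n\n\n"
    # Unsolved example: the prompt ends right after the question
    prompt += f"\nDialogue:\n\n{dialogue_to_summarize}\n\nWhat was going on?\n"
    return prompt


# Made-up example pair, for illustration only
pairs = [("#Person1#: Ready?\n#Person2#: Yes.", "#Person2# is ready.")]
zero_shot = make_prompt_from_pairs([], "#Person1#: Hello.")
one_shot = make_prompt_from_pairs(pairs, "#Person1#: Hello.")
print(one_shot)
```

Each extra pair lengthens the prompt by a whole dialogue and summary, which is what pushes the few shot prompt later in this notebook past the model's input limit.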

We can now construct the prompt to perform one shot inference.

In [26]:
example_indices_full = [40]
example_index_to_summarize = 200

one_shot_prompt = make_prompt(example_indices_full, example_index_to_summarize)

print(one_shot_prompt)


Dialogue:

#Person1#: What time is it, Tom?
#Person2#: Just a minute. It's ten to nine by my watch.
#Person1#: Is it? I had no idea it was so late. I must be off now.
#Person2#: What's the hurry?
#Person1#: I must catch the nine-thirty train.
#Person2#: You've plenty of time yet. The railway station is very close. It won't take more than twenty minutes to get there.

What was going on?
#Person1# is in a hurry to catch a train. Tom tells #Person1# there is plenty of time.



Dialogue:

#Person1#: Have you considered upgrading your system?
#Person2#: Yes, but I'm not sure what exactly I would need.
#Person1#: You could consider adding a painting program to your software. It would allow you to make up your own flyers and banners for advertising.
#Person2#: That would be a definite bonus.
#Person1#: You might also want to upgrade your hardware because it is pretty outdated now.
#Person2#: How can we do that?
#Person1#: You'd probably need a faster processor, to begin with. And you also ne

Now we can pass the prompt above to the model to perform one shot inference.

In [27]:
summary = dataset['test'][example_index_to_summarize]['summary']

inputs = tokenizer(one_shot_prompt, return_tensors="pt")
output = tokenizer.decode(
    model.generate(
        inputs["input_ids"],
        max_new_tokens=50,
    )[0],
    skip_special_tokens=True
)

print(dashed_line_ouptut_divider)
print(f"BASELINE HUMAN SUMMARY:\n{summary}\n")
print(dashed_line_ouptut_divider)
print(f"MODEL GENERATION - ONE SHOT:\n{output}")

-------------------------------------------------------------------------------------------------------------------
BASELINE HUMAN SUMMARY:
#Person1# teaches #Person2# how to upgrade software and hardware in #Person2#'s system.

-------------------------------------------------------------------------------------------------------------------
MODEL GENERATION - ONE SHOT:
#Person1 wants to upgrade his system. #Person2 wants to add a painting program to his software. #Person1 wants to add a CD-ROM drive.


### Few shot inference

We can add two more dialogue-summary pairs to the prompt before performing few shot inference.

In [28]:
example_indices_full = [40, 80, 120]
example_index_to_summarize = 200

few_shot_prompt = make_prompt(example_indices_full, example_index_to_summarize)

print(few_shot_prompt)


Dialogue:

#Person1#: What time is it, Tom?
#Person2#: Just a minute. It's ten to nine by my watch.
#Person1#: Is it? I had no idea it was so late. I must be off now.
#Person2#: What's the hurry?
#Person1#: I must catch the nine-thirty train.
#Person2#: You've plenty of time yet. The railway station is very close. It won't take more than twenty minutes to get there.

What was going on?
#Person1# is in a hurry to catch a train. Tom tells #Person1# there is plenty of time.



Dialogue:

#Person1#: May, do you mind helping me prepare for the picnic?
#Person2#: Sure. Have you checked the weather report?
#Person1#: Yes. It says it will be sunny all day. No sign of rain at all. This is your father's favorite sausage. Sandwiches for you and Daniel.
#Person2#: No, thanks Mom. I'd like some toast and chicken wings.
#Person1#: Okay. Please take some fruit salad and crackers for me.
#Person2#: Done. Oh, don't forget to take napkins disposable plates, cups and picnic blanket.
#Person1#: All set. 

As with one shot inference, we can now pass the prompt to the model to perform few shot inference.

In [29]:
summary = dataset['test'][example_index_to_summarize]['summary']

inputs = tokenizer(few_shot_prompt, return_tensors="pt")
output = tokenizer.decode(
    model.generate(
        inputs["input_ids"],
        max_new_tokens=50,
    )[0],
    skip_special_tokens=True
)

print(dashed_line_ouptut_divider)
print(f"BASELINE HUMAN SUMMARY:\n{summary}\n")
print(dashed_line_ouptut_divider)
print(f"MODEL GENERATION - FEW SHOT:\n{output}")

Token indices sequence length is longer than the specified maximum sequence length for this model (819 > 512). Running this sequence through the model will result in indexing errors


-------------------------------------------------------------------------------------------------------------------
BASELINE HUMAN SUMMARY:
#Person1# teaches #Person2# how to upgrade software and hardware in #Person2#'s system.

-------------------------------------------------------------------------------------------------------------------
MODEL GENERATION - FEW SHOT:
#Person1 wants to upgrade his system. #Person2 wants to add a painting program to his software. #Person1 wants to upgrade his hardware.


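Given the outputs above, few shot inference did not provide much of an improvement over one shot inference in this case. It is also important to keep the prompt within the model's maximum input length of 512 tokens. The warning above shows that the few shot prompt is 819 tokens long: by default the tokeniser only warns about this and does not truncate the input, so prompts this long should be shortened (for example by using fewer examples) or truncated explicitly with `truncation=True`, which discards the tokens beyond the limit.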

Even so, passing in a single full example (one shot inference) gives the model more information about the expected output than the instruction alone, which can help improve the overall completion.
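One way to stay within the input limit is to drop solved examples until the prompt fits. The sketch below does that with a hypothetical `fit_prompt` helper. Token counting is passed in as a function: the default, a whitespace word count, is only a rough stand-in, and with the notebook's tokeniser you could pass `lambda p: len(tokenizer(p)["input_ids"])` instead. Dropping the most recently added example first is an arbitrary choice made for this sketch.

```python
def build_prompt(examples, dialogue_to_summarize):
    """Same layout as make_prompt, built from (dialogue, summary) pairs."""
    prompt = ""
    for dialogue, summary in examples:
        prompt += f"\nDialogue:\n\n{dialogue}\n\nWhat was going on?\n{summary}\n\n\n"
    prompt += f"\nDialogue:\n\n{dialogue_to_summarize}\n\nWhat was going on?\n"
    return prompt


def approx_token_count(text):
    # Rough stand-in: whitespace-separated words, not real tokeniser tokens
    return len(text.split())


def fit_prompt(examples, dialogue_to_summarize, max_tokens=512,
               count_tokens=approx_token_count):
    """Drop examples from the end until the prompt fits within max_tokens.

    Returns (prompt, number_of_examples_kept). If even the zero shot prompt is
    too long, it is returned anyway and the caller must truncate it.
    """
    kept = list(examples)
    while True:
        prompt = build_prompt(kept, dialogue_to_summarize)
        if count_tokens(prompt) <= max_tokens or not kept:
            return prompt, len(kept)
        kept.pop()  # remove the most recently added example


# Made-up toy examples so the word counts are easy to follow
toy_examples = [("a b c", "s1"), ("d e f", "s2")]
for budget in (100, 20, 5):
    _, n_kept = fit_prompt(toy_examples, "x y", max_tokens=budget)
    print(budget, n_kept)
```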

### Changing generation configuration parameters

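We can change some of the generation configuration parameters to influence how the model decides which token to generate next. With the default `GenerationConfig`, `generate` uses greedy decoding and always picks the most likely next token. Setting `do_sample=True` makes it sample from the predicted probability distribution instead; `temperature`, `top_k` and `top_p` then control how that distribution is reshaped or truncated. Lower temperatures make the output more predictable, higher ones make it more varied.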
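The following simplified, pure-Python sketch shows what those parameters do to a single vector of next-token scores (logits) when sampling is enabled: temperature rescales the logits before the softmax, `top_k` keeps only the k most likely tokens, and `top_p` keeps the smallest set of most likely tokens whose probabilities add up to at least `top_p`. `apply_temperature` and `sample_next_token` are hypothetical helpers written for illustration; Hugging Face implements these steps as logits warpers operating on batched PyTorch tensors, and details may differ.

```python
import math
import random


def apply_temperature(logits, temperature=1.0):
    """Softmax of logits / temperature (temperature must be > 0)."""
    scaled = [x / temperature for x in logits]
    largest = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - largest) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def sample_next_token(logits, temperature=1.0, top_k=0, top_p=1.0, rng=random):
    """Sample one token index after temperature, top-k and top-p filtering."""
    probs = apply_temperature(logits, temperature)
    # Token indices ordered from most to least likely
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    if top_k > 0:
        ranked = ranked[:top_k]  # top-k: keep the k most likely tokens
    # top-p (nucleus): keep the smallest prefix whose renormalised probability
    # mass reaches top_p; at least one token is always kept
    mass = sum(probs[i] for i in ranked)
    kept, cumulative = [], 0.0
    for i in ranked:
        kept.append(i)
        cumulative += probs[i] / mass
        if cumulative >= top_p:
            break
    # random.choices treats the weights as relative, so no explicit renormalising is needed
    return rng.choices(kept, weights=[probs[i] for i in kept], k=1)[0]


logits = [1.0, 3.0, 2.0]
print([round(p, 3) for p in apply_temperature(logits, 1.0)])
print([round(p, 3) for p in apply_temperature(logits, 0.5)])  # sharper: more mass on index 1
rng = random.Random(0)
# With top_p=0.9, only indices 1 and 2 can be drawn for these logits
print([sample_next_token(logits, top_p=0.9, rng=rng) for _ in range(10)])
```

Setting `top_k=1` reduces sampling to greedy decoding, since only the most likely token survives the filter.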

In [None]:
generation_config = GenerationConfig(max_new_tokens=50)
# generation_config = GenerationConfig(max_new_tokens=10)
# generation_config = GenerationConfig(max_new_tokens=50, do_sample=True, temperature=0.1)
# generation_config = GenerationConfig(max_new_tokens=50, do_sample=True, temperature=0.5)
# generation_config = GenerationConfig(max_new_tokens=50, do_sample=True, temperature=1.0)

inputs = tokenizer(few_shot_prompt, return_tensors="pt")
output = tokenizer.decode(
    model.generate(
        inputs["input_ids"],
        generation_config=generation_config,
    )[0],
    skip_special_tokens=True
)

print(dashed_line_ouptut_divider)
print(f"MODEL GENERATION - FEW SHOT:\n{output}")
print(dashed_line_ouptut_divider)
print(f"BASELINE HUMAN SUMMARY:\n{summary}\n")

-------------------------------------------------------------------------------------------------------------------
MODEL GENERATION - FEW SHOT:
#Person1 wants to upgrade his system. #Person2 wants to add a painting program to his software. #Person1 wants to upgrade his hardware.
-------------------------------------------------------------------------------------------------------------------
BASELINE HUMAN SUMMARY:
#Person1# teaches #Person2# how to upgrade software and hardware in #Person2#'s system.

