# Exploring Generative AI for Dialogue Summarization
## Effective Prompt Engineering for Dialogue Summarization with Generative AI Google FLAN-T5-Base (Part 1)
Let's turn to the practical side of our study. In this notebook, we focus on the dialogue summarization task using generative AI. Our objective is to observe how changes to the input text affect the model's output. Through prompt engineering, we steer the model toward our specific task requirements. By comparing zero shot, one shot, and few shot inference, we'll see firsthand how prompt engineering can improve the generative output of Large Language Models.

# Table of Contents

- [ 1 - Configuring Kernel and Installing Dependencies](#1)
- [ 2 - Dialogue Summarization without Prompt Engineering](#2)
- [ 3 - Summarizing Dialogue Using an Instruction Prompt](#3)
  - [ 3.1 - Zero Shot Inference Using an Instruction Prompt](#3.1)
  - [ 3.2 - Zero Shot Inference Using the FLAN-T5 Prompt Template](#3.2)
- [ 4 - Summarizing Dialogue Using One Shot and Few Shot Inference](#4)
  - [ 4.1 - One Shot Inference](#4.1)
  - [ 4.2 - Few Shot Inference](#4.2)
- [ 5 - Configuration Parameters for Generative Inference](#5)


# 1 - Configuring Kernel and Installing Dependencies
Let's set up the kernel and install the necessary packages to leverage PyTorch, Hugging Face transformers, and datasets.

Note: Executing this cell may require a few minutes.


In [1]:
%pip install --upgrade pip
%pip install --disable-pip-version-check \
    torch==1.13.1 \
    torchdata==0.5.1 --quiet

%pip install \
    transformers==4.27.2 \
    datasets==2.11.0  --quiet

Collecting pip
  Downloading pip-23.2.1-py3-none-any.whl (2.1 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 2.1/2.1 MB 36.9 MB/s eta 0:00:00
Installing collected packages: pip
  Attempting uninstall: pip
    Found existing installation: pip 23.1.2
    Uninstalling pip-23.1.2:
      Successfully uninstalled pip-23.1.2
Successfully installed pip-23.2.1
Note: you may need to restart the kernel to use updated packages.
Note: you may need to restart the kernel to use updated packages.
ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
apache-beam 2.46.0 requires dill<0.3.2,>=0.3.1.1, but you have dill 0.3.6 which is incompatible.
apache-beam 2.46.0 requires pyarrow<10.0.0,>=3.0.0, but you have pyarrow 11.0.0 which is incompatible.
pathos 0.3.1 requires dill>=0.3.7, but you have dill 0.3.6 which is

Import the classes used to load the datasets, the Large Language Model (LLM), the tokenizer, and the generation configuration. Don't worry if you haven't grasped all these components yet; they'll be explained and discussed later in the notebook.

In [2]:
from datasets import load_dataset
from transformers import AutoModelForSeq2SeqLM
from transformers import AutoTokenizer
from transformers import GenerationConfig

In [3]:
# Specify the model name
model_name='google/flan-t5-base'

# Load the model
model = AutoModelForSeq2SeqLM.from_pretrained(model_name)


To handle encoding and decoding, the model needs text in tokenized form. Tokenization breaks text into smaller units (tokens), each mapped to an integer ID that the LLM can process.
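To make the idea concrete, here is a minimal sketch of an encode/decode round trip with a toy whitespace tokenizer. It is an illustration only: FLAN-T5's real tokenizer uses a SentencePiece subword vocabulary, and the helper names (`build_vocab`, `encode`, `decode`) and the special tokens below are hypothetical.

```python
# Toy illustration only; not how the FLAN-T5 tokenizer is implemented.
def build_vocab(corpus):
    """Assign an integer id to every distinct whitespace-separated word."""
    vocab = {"<unk>": 0, "</s>": 1}  # hypothetical special tokens
    for text in corpus:
        for word in text.split():
            vocab.setdefault(word, len(vocab))
    return vocab


def encode(text, vocab):
    """Map text to ids and append the end-of-sequence id."""
    ids = [vocab.get(word, vocab["<unk>"]) for word in text.split()]
    return ids + [vocab["</s>"]]


def decode(ids, vocab, skip_special_tokens=True):
    """Map ids back to text, optionally dropping special tokens."""
    id_to_word = {i: w for w, i in vocab.items()}
    special = {vocab["<unk>"], vocab["</s>"]}
    words = [id_to_word[i] for i in ids if not (skip_special_tokens and i in special)]
    return " ".join(words)


toy_vocab = build_vocab(["This is a test sentence."])
toy_ids = encode("This is a test sentence.", toy_vocab)
print(toy_ids)
print(decode(toy_ids, toy_vocab))
```

Subword tokenizers follow the same round trip, but they split rare words into smaller pieces instead of mapping them to an unknown token.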

Retrieve the tokenizer for the FLAN-T5 model by employing the AutoTokenizer.from_pretrained() method. The use_fast parameter activates the fast tokenizer. Currently, we won't delve into the intricacies of this setting, but you can explore the tokenizer parameters further in the [documentation](https://huggingface.co/docs/transformers/v4.28.1/en/model_doc/auto#transformers.AutoTokenizer).

In [4]:
# Load the tokenizer
tokenizer = AutoTokenizer.from_pretrained(model_name, use_fast=True)


Test the tokenizer by encoding and decoding a simple sentence:

In [5]:
# Define a test sentence
sentence = "This is a test sentence."

# Encode the sentence using the tokenizer, returning PyTorch tensors
sentence_encoded = tokenizer(sentence, return_tensors='pt')

# Decode the encoded sentence, skipping special tokens
sentence_decoded = tokenizer.decode(
        sentence_encoded["input_ids"][0], 
        skip_special_tokens=True
    )

# Print the encoded sentence's representation
print('ENCODED SENTENCE:')
print(sentence_encoded["input_ids"][0])

# Print the decoded sentence
print('\nDECODED SENTENCE:')
print(sentence_decoded)




ENCODED SENTENCE:
tensor([ 100,   19,    3,    9,  794, 7142,    5,    1])

DECODED SENTENCE:
This is a test sentence.


<a name='2'></a>
# 2 - Dialogue Summarization without Prompt Engineering

Let's assess how well the base LLM summarizes a dialogue without any prompt engineering. 
**Prompt engineering** is the practice of adjusting the input (the prompt) to improve the model's response for a specific task.
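As a rough illustration of what "adjusting the input" means, the sketch below builds two inputs from the same text: the raw text alone, and the text wrapped in an instruction with a cue for the answer. The function names and the sample dialogue are made up for this example.

```python
def bare_prompt(text):
    """No prompt engineering: the model sees only the raw text."""
    return text


def instruction_prompt(text, instruction="Summarize the following text."):
    """Zero-shot prompt: an instruction, the text, then a cue for the answer."""
    return f"{instruction}\n\n{text}\n\nSummary:"


# Made-up sample input, used only to show the two prompt shapes
sample_text = "A: Are we still meeting at noon?\nB: Yes, see you in the lobby."

print(bare_prompt(sample_text))
print("-" * 40)
print(instruction_prompt(sample_text))
```

Only the string passed to the tokenizer changes between the two variants; the model and its weights stay the same.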

<a name='3'></a>
# 3 - Summarizing Dialogue Using an Instruction Prompt

<a name='3.1'></a>
## 3.1 - Zero Shot Inference Using an Instruction Prompt



In [6]:
import pandas as pd
dataset=pd.read_csv('/kaggle/input/noaa-incidents-oilspill/incidents.csv')

In [7]:
oil_spill_desc=dataset['description']

In [8]:
oil_spill_desc=oil_spill_desc.values.astype('str')

In [9]:
oil_spill_desc

array(['Late on August 31, 2023, a crude oil spill occurred at Port Manatee in Tampa Bay. The estimated volume of release oil is 3500 gallons. Most of the oil is within the Port Manatee Basin but some has escaped the basin and entered Tampa Bay.  USCG is on the scene investigating. Currently, the source is uncertain. Imagery from NOAA NGS post-hurricane Idalia flights show some sheen in the Port basin, This imagery was obtained sometime during the morning on 1 Sept, 2023. NOAA is preparing a trajectory forecast, oil fate analysis, Resources at Risk, and an initial draft emergency consultation form for the USCG.',
       "A drug runner sailboat modified to operate as a semi-submarine grounded and was abandoned on Mona Island, Puerto Rico on or about 31 August, 2023. The vessel is leaking diesel fuel and is in prime sea turtle habitat. NOAA's Scientific Support Coordinator has discussed the issue with NOAA's Office of Protected Resources and the US Coast Guard.",
       'On 28-AUG-2023, 

In [10]:
dash_line='----------------------------------------------------------------------------'

In [None]:
# Iterate through the incident descriptions, building one prompt per incident
for incident in oil_spill_desc:
        
    # Construct an instruction prompt for summarizing the incident
    prompt = f"""
Summarize the following incident.

{incident}

Summary:
    """

    # Tokenize the constructed prompt and convert it to PyTorch tensors
    inputs = tokenizer(prompt, return_tensors='pt')
    
    # Generate an output using the model, limiting the new tokens to 50
    # This uses the LLM to generate a summary of the incident with the constructed prompt
    output = tokenizer.decode(
        model.generate(
            inputs["input_ids"], 
            max_new_tokens=50,
        )[0], 
        skip_special_tokens=True
    )
    
    # Show the results

    print(dash_line)
    print(f'INPUT PROMPT:\n{prompt}')
    print(dash_line)    
    print(f'MODEL GENERATION - ZERO SHOT:\n{output}\n')

----------------------------------------------------------------------------
INPUT PROMPT:

Summarize the following incident.

Late on August 31, 2023, a crude oil spill occurred at Port Manatee in Tampa Bay. The estimated volume of release oil is 3500 gallons. Most of the oil is within the Port Manatee Basin but some has escaped the basin and entered Tampa Bay.  USCG is on the scene investigating. Currently, the source is uncertain. Imagery from NOAA NGS post-hurricane Idalia flights show some sheen in the Port basin, This imagery was obtained sometime during the morning on 1 Sept, 2023. NOAA is preparing a trajectory forecast, oil fate analysis, Resources at Risk, and an initial draft emergency consultation form for the USCG.

Summary:
    
----------------------------------------------------------------------------
MODEL GENERATION - ZERO SHOT:
The USCG is investigating a crude oil spill at Port Manatee in Tampa Bay.

-----------------------------------------------------------------

Token indices sequence length is longer than the specified maximum sequence length for this model (530 > 512). Running this sequence through the model will result in indexing errors


----------------------------------------------------------------------------
INPUT PROMPT:

Summarize the following incident.

45 foot aluminium hull vessel grounded on Squibnocket Beach (SE facing side) overnight.  The vessel is carrying 800 gallons of diesel fuel (no release).  The owner attempted to haul the vessel off the beach prior to high tide, resulting in a breach of the hull.  Coast Guard is on scene along with a professional salvage company with hopes of refloating the vessel on the high tide.  If that is not possible, a more complex salvage plan will be required.Access is difficult and the area is important to two endangered species, the piping plover (currently with chicks) and the northeastern beach tiger beetle.  This is critical habitat for the beetle in particular.USFWS and MA Div. of Fisheries and Wildlife experts have been engaged.

Summary:
    
----------------------------------------------------------------------------
MODEL GENERATION - ZERO SHOT:
A vessel ca

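This result is an improvement over giving the model no instruction, but there's still room for enhancement. The model doesn't capture all the details in the incident descriptions; the first summary, for example, leaves out the estimated spill volume.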

<a name='3.2'></a>
## 3.2 - Zero Shot Inference Using the FLAN-T5 Prompt Template

Now, let's switch things up a bit with a different prompt. FLAN-T5 offers various prompt templates tailored for specific tasks, and you can find them **[here](https://github.com/google-research/FLAN/tree/main/flan/v2)**. In the upcoming code, we'll employ one of the **[pre-built FLAN-T5 prompts](https://github.com/google-research/FLAN/blob/main/flan/v2/templates.py)**:

In [None]:
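# NOTE (assumption): this cell needs a Hugging Face dataset with a 'test' split holding
# 'dialogue' and 'summary' fields, plus a list of example indices. Neither is defined
# earlier in this notebook, so the DialogSum dataset name and the indices below are
# illustrative choices (40 and 200 are the indices used in section 4).
# This rebinds `dataset`, which held the oil-spill DataFrame until now.
dataset = load_dataset('knkarthick/dialogsum')
example_indices = [40, 200]

# Iterate through example indices, where each index represents a specific example
for i, index in enumerate(example_indices):
    # Retrieve dialogue and summary for the current example
    dialogue = dataset['test'][index]['dialogue']
    summary = dataset['test'][index]['summary']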
    
    # Construct a prompt for summarizing the dialogue using the FLAN-T5 template
    prompt = f"""
Dialogue:

{dialogue}

What was going on?
"""

    # Tokenize the constructed prompt and convert it to PyTorch tensors
    inputs = tokenizer(prompt, return_tensors='pt')
    
    # Generate an output using the model, limiting the new tokens to 50
    # This uses the LLM to generate a summary of the dialogue with the constructed prompt
    output = tokenizer.decode(
        model.generate(
            inputs["input_ids"], 
            max_new_tokens=50,
        )[0], 
        skip_special_tokens=True
    )
    
    # Show the results
    print(dash_line)
    print('Example ', i + 1)
    print(dash_line)
    print(f'INPUT PROMPT:\n{prompt}')
    print(dash_line)
    print(f'BASELINE HUMAN SUMMARY:\n{summary}\n')
    print(dash_line)
    print(f'MODEL GENERATION - ZERO SHOT:\n{output}\n')

Notice that this FLAN-T5 prompt helped a bit, but the model still struggles to pick up on the nuance of the conversation. This is what you will try to address with few shot inference.
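Because templates differ only in the text wrapped around the input, it can help to keep them side by side and render the same input through each one. The sketch below does this with plain Python string formatting. The registry name `PROMPT_TEMPLATES`, its keys, and `render_prompts` are hypothetical, and only the two wordings used in this notebook are included.

```python
# Hypothetical registry of prompt templates; "{text}" marks where the input goes.
PROMPT_TEMPLATES = {
    "instruction": "Summarize the following incident.\n\n{text}\n\nSummary:",
    "flan_what_was_going_on": "Dialogue:\n\n{text}\n\nWhat was going on?",
}


def render_prompts(text, templates=PROMPT_TEMPLATES):
    """Fill every template with the same input so the prompts can be compared."""
    return {name: template.format(text=text) for name, template in templates.items()}


# Made-up dialogue, used only to show the rendered prompts
sample = "A: Is the report ready?\nB: Almost, I'll send it tonight."

for name, prompt in render_prompts(sample).items():
    print(f"--- {name} ---")
    print(prompt)
    print()
```

Each rendered string could then be tokenized and passed to `model.generate` in the same way as the prompts in the cells above.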

<a name='4'></a>
# 4 - Summarizing Dialogue Using One Shot and Few Shot Inference

In **one-shot and few-shot inference**, you give the LLM one or a handful of complete prompt-response examples that match your task, followed by the prompt you want completed. This practice, known as "in-context learning," conditions the model on the specifics of your task without changing its weights. You can delve deeper into this concept by reading **[this blog from HuggingFace](https://huggingface.co/blog/few-shot-learning-gpt-neo-and-inference-api)**.

<a name='4.1'></a>
## 4.1 - One Shot Inference

We'll construct a function that accepts a list of indices (**`full_examples_indices`**), creates a prompt from those full examples, and finally appends the dialogue you want the model to summarize (**`index_to_summarize`**). For this, we'll use the same FLAN-T5 prompt template from section [3.2](#3.2). 

In [None]:
def make_prompt(full_examples_indices, index_to_summarize):
    """
    Construct a prompt for one-shot or few-shot inference.

    Parameters
    ----------
    full_examples_indices : list
        A list containing indices for complete dialogues to be included in the prompt. These dialogues serve as examples 
        for the model to learn from (for one-shot or few-shot inference).
    index_to_summarize : int
        The index for the dialogue that the model is expected to give a summary for.

    Returns
    -------
    str
        A prompt string that is constructed as per the given parameters - full dialogues examples followed by a dialogue 
        that needs to be summarized.
    """
    prompt = ''

    # Go through each index in the full examples list
    for index in full_examples_indices:
        dialogue = dataset['test'][index]['dialogue']
        summary = dataset['test'][index]['summary']

        # Add each dialogue and its summary to the prompt string, followed by a stop sequence. The stop sequence 
        # '{summary}\n\n\n' is essential for FLAN-T5 model. Other models may have their own different stop sequence.
        prompt += f"""
Dialogue:

{dialogue}

What was going on?
{summary}


"""

    # Now add the dialogue that needs to be summarized by the model
    dialogue_to_summarize = dataset['test'][index_to_summarize]['dialogue']

    # Append this new dialogue to the prompt string
    prompt += f"""
Dialogue:

{dialogue_to_summarize}

What was going on?
"""

    # Return the constructed prompt
    return prompt

Create the prompt for one-shot inference:

In [None]:
# Define index for full example to be included in the prompt as a one-shot example
full_examples_indices = [40]
# Define the index for the dialogue that the model is expected to give a summary for
example_index_to_summarize = 200

# Create the prompt for one-shot inference
one_shot_prompt = make_prompt(full_examples_indices, example_index_to_summarize)

print(one_shot_prompt)

Now, let's use this prompt for one-shot inference: generate a summary with the LLM and compare it with the human-written baseline:

In [None]:
# Retrieve the human-generated summary for the 'example_index_to_summarize' example
summary = dataset['test'][example_index_to_summarize]['summary']

# Tokenize the one-shot prompt and convert it to PyTorch tensors
inputs = tokenizer(one_shot_prompt, return_tensors='pt')

# Generate an output using the model, limiting the new tokens to 50
# This uses the LLM to generate a summary of the dialogue with the one-shot prompt
output = tokenizer.decode(
    model.generate(
        inputs["input_ids"],
        max_new_tokens=50,
    )[0], 
    skip_special_tokens=True
)

# Show the results
print(dash_line)
print(f'BASELINE HUMAN SUMMARY:\n{summary}\n')
print(dash_line)
print(f'MODEL GENERATION - ONE SHOT:\n{output}')

<a name='4.2'></a>
## 4.2 - Few Shot Inference

Now, let's explore few-shot inference by incorporating two additional full dialogue-summary pairs into our prompt.

In [None]:
# Define indices for full examples to be included in the prompt as few-shot examples
full_examples_indices = [40, 80, 120]
# Define the index for the dialogue that the model is expected to give a summary for
example_index_to_summarize = 200

# Create the prompt for few-shot inference
few_shot_prompt = make_prompt(full_examples_indices, example_index_to_summarize)

print(few_shot_prompt)

Now pass this prompt to the model to perform few-shot inference:

In [None]:
# Retrieve the human-generated summary for the specified example
summary = dataset['test'][example_index_to_summarize]['summary']

# Tokenize the few-shot prompt and convert it to PyTorch tensors
inputs = tokenizer(few_shot_prompt, return_tensors='pt')

# Generate an output using the model, limiting the new tokens to 50
# This uses the LLM to generate a summary of the dialogue with the few-shot prompt
output = tokenizer.decode(
    model.generate(
        inputs["input_ids"],
        max_new_tokens=50,
    )[0], 
    skip_special_tokens=True
)

# Show the results
print(dash_line)
print(f'BASELINE HUMAN SUMMARY:\n{summary}\n')
print(dash_line)
print(f'MODEL GENERATION - FEW SHOT:\n{output}')

In this scenario, using few-shot inference didn't yield a significant improvement over one-shot inference. Moreover, going beyond 5 or 6 shots generally doesn't offer much help either. It's crucial to be mindful of not exceeding the model's input-context length, which, in our case, is 512 tokens. Any content beyond this context length will be disregarded.

However, it's noticeable that including at least one full example (one shot) furnishes the model with additional information, resulting in a qualitative enhancement in the overall summary.
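One way to respect the context-length limit is to measure the prompt before calling the model and drop examples until it fits. The following is a minimal sketch under stated assumptions: `count_tokens` is a whitespace stand-in for a real token count, `fit_examples` is a hypothetical helper, and the example pairs are made up. The prompt layout mirrors `make_prompt` above.

```python
def count_tokens(text):
    """Stand-in token counter that counts whitespace-separated words.

    For real counts, pass counter=lambda t: len(tokenizer(t)["input_ids"]).
    """
    return len(text.split())


def fit_examples(examples, query, max_tokens=512, counter=count_tokens):
    """Keep as many leading (dialogue, summary) examples as fit with the query."""

    def render(pairs):
        shots = "".join(
            f"\nDialogue:\n\n{d}\n\nWhat was going on?\n{s}\n\n\n" for d, s in pairs
        )
        return shots + f"\nDialogue:\n\n{query}\n\nWhat was going on?\n"

    kept = list(examples)
    # Drop the last example and re-measure until the prompt fits (or none remain)
    while kept and counter(render(kept)) > max_tokens:
        kept.pop()
    return render(kept), len(kept)


# Made-up example pairs and query, used only to exercise the helper
demo_examples = [
    ("A: Hi. B: Hello.", "Two people greet."),
    ("A: Bye. B: Bye.", "Two people part."),
]
demo_query = "A: Lunch? B: Sure."

prompt, shots_used = fit_examples(demo_examples, demo_query, max_tokens=25)
print(f"Examples kept: {shots_used}")
print(prompt)
```

If the query alone exceeds the budget, this sketch still returns it with zero examples, so the caller would need to truncate or shorten the query separately.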