# Exploring Generative AI for Dialogue Summarization
## Effective Prompt Engineering for Dialogue Summarization with Generative AI Google FLAN-T5-Base (Part 1)
In this notebook, we focus on the dialogue summarization task using generative AI. Our objective is to observe how changes to the input text affect the model's output, and to use prompt engineering to steer the model toward the task we want. By comparing zero-shot, one-shot, and few-shot inference, we take a first step into prompt engineering and see firsthand how it can improve the generative output of Large Language Models.

# Table of Contents

- [ 1 - Configuring Kernel and Installing Dependencies](#1)
- [ 2 - Dialogue Summarization without Prompt Engineering](#2)
- [ 3 - Summarizing Dialogue Using an Instruction Prompt](#3)
  - [ 3.1 - Zero Shot Inference Using an Instruction Prompt](#3.1)
  - [ 3.2 - Zero Shot Inference Using the FLAN-T5 Prompt Template](#3.2)
- [ 4 - Summarizing Dialogue Using One Shot and Few Shot Inference](#4)
  - [ 4.1 - One Shot Inference](#4.1)
  - [ 4.2 - Few Shot Inference](#4.2)
- [ 5 - Configuration Parameters for Generative Inference](#5)


<a name='1'></a>
# 1 - Configuring Kernel and Installing Dependencies
Let's set up the kernel and install the necessary packages to leverage PyTorch, Hugging Face transformers, and datasets.

Note: Executing this cell may require a few minutes.


In [1]:
%pip install --upgrade pip
%pip install --disable-pip-version-check \
    torch==1.13.1 \
    torchdata==0.5.1 --quiet

%pip install \
    transformers==4.27.2 \
    datasets==2.11.0  --quiet

Collecting pip
  Downloading pip-23.2.1-py3-none-any.whl (2.1 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 2.1/2.1 MB 26.0 MB/s eta 0:00:00
Installing collected packages: pip
  Attempting uninstall: pip
    Found existing installation: pip 23.1.2
    Uninstalling pip-23.1.2:
      Successfully uninstalled pip-23.1.2
Successfully installed pip-23.2.1
Note: you may need to restart the kernel to use updated packages.
Note: you may need to restart the kernel to use updated packages.
ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
apache-beam 2.46.0 requires dill<0.3.2,>=0.3.1.1, but you have dill 0.3.6 which is incompatible.
apache-beam 2.46.0 requires pyarrow<10.0.0,>=3.0.0, but you have pyarrow 11.0.0 which is incompatible.
pathos 0.3.1 requires dill>=0.3.7, but you have dill 0.3.6 which is

Import the dataset loader, the Large Language Model (LLM) class, the tokenizer, and the generation configuration class. Don't worry if these components are unfamiliar; each is explained later in the notebook.

In [2]:
from datasets import load_dataset
from transformers import AutoModelForSeq2SeqLM
from transformers import AutoTokenizer
from transformers import GenerationConfig

<a name='2'></a>
# 2 - Dialogue Summarization without Prompt Engineering

In this scenario, the goal is to generate a summary of a dialogue using the pre-trained Large Language Model (LLM) FLAN-T5 from Hugging Face. You can explore the list of available models in the Hugging Face transformers package [here](https://huggingface.co/docs/transformers/index). 

Let's load some straightforward dialogues from the DialogSum Hugging Face dataset. This dataset comprises over 10,000 dialogues, each accompanied by manually labeled summaries and topics.

Load the dataset from Hugging Face. The dataset is already preprocessed and split into train, validation, and test sets. You will use the test set to evaluate the model performance.

In [3]:
huggingface_dataset_name = "knkarthick/dialogsum"

# Load the dataset using Hugging Face's datasets library
dataset = load_dataset(huggingface_dataset_name)

Downloading readme:   0%|          | 0.00/4.65k [00:00<?, ?B/s]



Downloading and preparing dataset csv/knkarthick--dialogsum to /root/.cache/huggingface/datasets/knkarthick___csv/knkarthick--dialogsum-cd36827d3490488d/0.0.0/6954658bab30a358235fa864b05cf819af0e179325c740e4bc853bcc7ec513e1...


Downloading data files:   0%|          | 0/3 [00:00<?, ?it/s]

Downloading data:   0%|          | 0.00/11.3M [00:00<?, ?B/s]

Downloading data:   0%|          | 0.00/1.35M [00:00<?, ?B/s]

Downloading data:   0%|          | 0.00/442k [00:00<?, ?B/s]

Extracting data files:   0%|          | 0/3 [00:00<?, ?it/s]

Generating train split: 0 examples [00:00, ? examples/s]

Generating test split: 0 examples [00:00, ? examples/s]

Generating validation split: 0 examples [00:00, ? examples/s]

Dataset csv downloaded and prepared to /root/.cache/huggingface/datasets/knkarthick___csv/knkarthick--dialogsum-cd36827d3490488d/0.0.0/6954658bab30a358235fa864b05cf819af0e179325c740e4bc853bcc7ec513e1. Subsequent calls will reuse this data.


  0%|          | 0/3 [00:00<?, ?it/s]

Print a couple of dialogues with their baseline summaries.

In [4]:
example_indices = [40, 200]

dash_line = '-' * 99  # 99-dash separator, same result as '-'.join('' for x in range(100))

for i, index in enumerate(example_indices):
    print(dash_line)
    print('Example ', i + 1)
    print(dash_line)
    print('INPUT DIALOGUE:')
    print(dataset['test'][index]['dialogue'])
    print(dash_line)
    print('BASELINE HUMAN SUMMARY:')
    print(dataset['test'][index]['summary'])
    print(dash_line)
    print()

---------------------------------------------------------------------------------------------------
Example  1
---------------------------------------------------------------------------------------------------
INPUT DIALOGUE:
#Person1#: What time is it, Tom?
#Person2#: Just a minute. It's ten to nine by my watch.
#Person1#: Is it? I had no idea it was so late. I must be off now.
#Person2#: What's the hurry?
#Person1#: I must catch the nine-thirty train.
#Person2#: You've plenty of time yet. The railway station is very close. It won't take more than twenty minutes to get there.
---------------------------------------------------------------------------------------------------
BASELINE HUMAN SUMMARY:
#Person1# is in a hurry to catch a train. Tom tells #Person1# there is plenty of time.
---------------------------------------------------------------------------------------------------

---------------------------------------------------------------------------------------------------
Exa

Load the pre-trained FLAN-T5 model from Hugging Face.
To incorporate FLAN-T5 into our workflow, we create an instance of the `AutoModelForSeq2SeqLM` class using the `.from_pretrained()` method, which downloads the pre-trained weights and makes the model ready for inference.


In [5]:
# Specify the model name
model_name='google/flan-t5-base'

# Load the model
model = AutoModelForSeq2SeqLM.from_pretrained(model_name)

Downloading (…)lve/main/config.json:   0%|          | 0.00/1.40k [00:00<?, ?B/s]

Downloading model.safetensors:   0%|          | 0.00/990M [00:00<?, ?B/s]

Downloading (…)neration_config.json:   0%|          | 0.00/147 [00:00<?, ?B/s]

The model works with text in tokenized form, so we need a way to encode and decode it. Tokenization breaks text into smaller units (tokens) and maps each one to an integer ID that the LLM can process.
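To make the idea concrete, here is a minimal sketch of an encode/decode round trip. The word-level vocabulary, the IDs, and the `toy_encode`/`toy_decode` helpers are all invented for illustration; FLAN-T5's actual tokenizer is a subword (SentencePiece) model with a far larger vocabulary, so its IDs will differ.

```python
# Toy word-level tokenizer: illustrative only, NOT the FLAN-T5 tokenizer.
# The vocabulary and IDs below are hypothetical.
TOY_VOCAB = {
    "<eos>": 1, "<unk>": 2,
    "this": 10, "is": 11, "a": 12, "test": 13, "sentence": 14, ".": 15,
}
ID_TO_TOKEN = {idx: tok for tok, idx in TOY_VOCAB.items()}
SPECIAL_TOKENS = {"<eos>", "<unk>"}


def toy_encode(text):
    """Lowercase, separate periods, map each word to an ID, append <eos>."""
    words = text.lower().replace(".", " .").split()
    ids = [TOY_VOCAB.get(word, TOY_VOCAB["<unk>"]) for word in words]
    return ids + [TOY_VOCAB["<eos>"]]


def toy_decode(ids, skip_special_tokens=True):
    """Map IDs back to tokens and join them into a string."""
    tokens = [ID_TO_TOKEN[i] for i in ids]
    if skip_special_tokens:
        tokens = [tok for tok in tokens if tok not in SPECIAL_TOKENS]
    return " ".join(tokens).replace(" .", ".")


ids = toy_encode("This is a test sentence.")
print(ids)
print(toy_decode(ids))
```

The real tokenizer follows the same pattern: text goes in, a sequence of integer IDs (ending in an end-of-sequence marker) comes out, and decoding reverses the mapping.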

Retrieve the tokenizer for the FLAN-T5 model with the `AutoTokenizer.from_pretrained()` method. The `use_fast` parameter selects the fast tokenizer implementation. We won't go into this setting here, but you can explore the tokenizer parameters further in the [documentation](https://huggingface.co/docs/transformers/v4.28.1/en/model_doc/auto#transformers.AutoTokenizer).

In [6]:
# Load the tokenizer
tokenizer = AutoTokenizer.from_pretrained(model_name, use_fast=True)

Downloading (…)okenizer_config.json:   0%|          | 0.00/2.54k [00:00<?, ?B/s]

Downloading spiece.model:   0%|          | 0.00/792k [00:00<?, ?B/s]

Downloading (…)/main/tokenizer.json:   0%|          | 0.00/2.42M [00:00<?, ?B/s]

Downloading (…)cial_tokens_map.json:   0%|          | 0.00/2.20k [00:00<?, ?B/s]

Test the tokenizer by encoding and decoding a simple sentence:

In [7]:
# Define a test sentence
sentence = "This is a test sentence."

# Encode the sentence using the tokenizer, returning PyTorch tensors
sentence_encoded = tokenizer(sentence, return_tensors='pt')

# Decode the encoded sentence, skipping special tokens
sentence_decoded = tokenizer.decode(
        sentence_encoded["input_ids"][0], 
        skip_special_tokens=True
    )

# Print the encoded sentence's representation
print('ENCODED SENTENCE:')
print(sentence_encoded["input_ids"][0])

# Print the decoded sentence
print('\nDECODED SENTENCE:')
print(sentence_decoded)


ENCODED SENTENCE:
tensor([ 100,   19,    3,    9,  794, 7142,    5,    1])

DECODED SENTENCE:
This is a test sentence.


Let's dive into assessing how effectively the base LLM summarizes a dialogue without incorporating any prompt engineering. 
In simpler terms, **prompt engineering** involves humans tweaking the input to enhance the model's response for a specific task.

In [8]:
# Iterate through example indices (We defined them above), where each index represents a specific example
for i, index in enumerate(example_indices):

    # Retrieve dialogue and summary for the current example
    dialogue = dataset['test'][index]['dialogue']
    summary = dataset['test'][index]['summary']

    # Tokenize the dialogue and convert it to a vector of PyTorch tensors
    inputs = tokenizer(dialogue, return_tensors='pt')

    # Generate an output using the model, limiting the new tokens to 50
    # This uses the LLM to generate a summary of the dialogue without any prompt engineering
    output = tokenizer.decode(
        model.generate(
            inputs["input_ids"], 
            max_new_tokens=50,
        )[0], 
        skip_special_tokens=True
    )

    # Show the results
    print(dash_line)
    print('Example ', i + 1)
    print(dash_line)
    print(f'INPUT PROMPT:\n{dialogue}')
    print(dash_line)
    print(f'BASELINE HUMAN SUMMARY:\n{summary}')
    print(dash_line)
    print(f'MODEL GENERATION - WITHOUT PROMPT ENGINEERING:\n{output}\n')

---------------------------------------------------------------------------------------------------
Example  1
---------------------------------------------------------------------------------------------------
INPUT PROMPT:
#Person1#: What time is it, Tom?
#Person2#: Just a minute. It's ten to nine by my watch.
#Person1#: Is it? I had no idea it was so late. I must be off now.
#Person2#: What's the hurry?
#Person1#: I must catch the nine-thirty train.
#Person2#: You've plenty of time yet. The railway station is very close. It won't take more than twenty minutes to get there.
---------------------------------------------------------------------------------------------------
BASELINE HUMAN SUMMARY:
#Person1# is in a hurry to catch a train. Tom tells #Person1# there is plenty of time.
---------------------------------------------------------------------------------------------------
MODEL GENERATION - WITHOUT PROMPT ENGINEERING:
Person1: It's ten to nine.

-------------------------------

The model's output makes some sense, but it seems unsure about the task. It doesn't seem to understand that we want a summary of the dialogue; instead, it appears to produce a plausible next line of the conversation.
Let's see if we can help by telling the model what we want it to do with the dialogue (prompt engineering).

<a name='3'></a>
# 3 - Summarizing Dialogue Using an Instruction Prompt

Prompt engineering is an important skill when working with foundation models for text generation. For a brief introduction to prompt engineering, you might find [this blog](https://www.amazon.science/blog/emnlp-prompt-engineering-is-the-new-feature-engineering) from Amazon Science interesting.

<a name='3.1'></a>
## 3.1 - Zero Shot Inference Using an Instruction Prompt

When you want to guide the model to perform a specific task, such as summarizing a dialogue, one approach is to wrap the dialogue in an instruction prompt without providing any worked examples. This technique is commonly known as **zero-shot inference**. For background on zero-shot prompting and why it matters for LLMs, you might find **[this blog from AWS](https://aws.amazon.com/blogs/machine-learning/zero-shot-prompting-for-the-flan-t5-foundation-model-in-amazon-sagemaker-jumpstart/)** helpful.

Here we wrap the dialogue in a clear instruction and observe how the generated text changes:

In [9]:
# Iterate through example indices, where each index represents a specific example
for i, index in enumerate(example_indices):
    # Retrieve dialogue and summary for the current example
    dialogue = dataset['test'][index]['dialogue']
    summary = dataset['test'][index]['summary']

    # Construct an instruction prompt for summarizing the dialogue 
    prompt = f"""
Summarize the following conversation.

{dialogue}

Summary:
    """

    # Tokenize the constructed prompt and convert it to PyTorch tensors
    inputs = tokenizer(prompt, return_tensors='pt')
    
    # Generate an output using the model, limiting the new tokens to 50
    # This uses the LLM to generate a summary of the dialogue with the constructed prompt
    output = tokenizer.decode(
        model.generate(
            inputs["input_ids"], 
            max_new_tokens=50,
        )[0], 
        skip_special_tokens=True
    )
    
    # Show the results
    print(dash_line)
    print('Example ', i + 1)
    print(dash_line)
    print(f'INPUT PROMPT:\n{prompt}')
    print(dash_line)
    print(f'BASELINE HUMAN SUMMARY:\n{summary}')
    print(dash_line)    
    print(f'MODEL GENERATION - ZERO SHOT:\n{output}\n')

---------------------------------------------------------------------------------------------------
Example  1
---------------------------------------------------------------------------------------------------
INPUT PROMPT:

Summarize the following conversation.

#Person1#: What time is it, Tom?
#Person2#: Just a minute. It's ten to nine by my watch.
#Person1#: Is it? I had no idea it was so late. I must be off now.
#Person2#: What's the hurry?
#Person1#: I must catch the nine-thirty train.
#Person2#: You've plenty of time yet. The railway station is very close. It won't take more than twenty minutes to get there.

Summary:
    
---------------------------------------------------------------------------------------------------
BASELINE HUMAN SUMMARY:
#Person1# is in a hurry to catch a train. Tom tells #Person1# there is plenty of time.
---------------------------------------------------------------------------------------------------
MODEL GENERATION - ZERO SHOT:
The train is about to

This result shows improvement, but there's still room for enhancement. The model doesn't seem to capture the subtleties present in the conversations.

**Optional:**
- Explore variations in the prompt text to observe changes in the output. Test whether ending the prompt with an empty string instead of `Summary:` affects the generated text.
- Try rephrasing the opening instruction from `Summarize the following conversation.` to something else, and observe its impact on the generated output.
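For these experiments, it can help to build prompt variants from a single helper. The sketch below is one possible way to do that; `build_zero_shot_prompt` and the `variants` dictionary are hypothetical names, the rephrased instruction is just an example, and the whitespace differs slightly from the hand-written prompt in the cell above.

```python
def build_zero_shot_prompt(
    dialogue,
    instruction="Summarize the following conversation.",
    suffix="Summary:",
):
    """Wrap a dialogue in an instruction and an optional closing cue."""
    parts = [instruction, dialogue]
    if suffix:  # an empty suffix leaves the prompt ending after the dialogue
        parts.append(suffix)
    return "\n\n".join(parts) + "\n"


# Short excerpt of the first example dialogue, used as sample input
sample_dialogue = (
    "#Person1#: What time is it, Tom?\n"
    "#Person2#: Just a minute. It's ten to nine by my watch."
)

variants = {
    "default": build_zero_shot_prompt(sample_dialogue),
    "no_suffix": build_zero_shot_prompt(sample_dialogue, suffix=""),
    "rephrased": build_zero_shot_prompt(
        sample_dialogue,
        instruction="Briefly describe what the speakers discuss.",
    ),
}

for name, prompt in variants.items():
    print(f"--- {name} ---\n{prompt}")
```

Each variant can then be tokenized and passed to `model.generate()` exactly as in the loop above, which makes it easier to compare outputs side by side.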

<a name='3.2'></a>
## 3.2 - Zero Shot Inference Using the FLAN-T5 Prompt Template

Now, let's switch things up a bit with a different prompt. FLAN-T5 offers various prompt templates tailored for specific tasks, and you can find them **[here](https://github.com/google-research/FLAN/tree/main/flan/v2)**. In the upcoming code, we'll employ one of the **[pre-built FLAN-T5 prompts](https://github.com/google-research/FLAN/blob/main/flan/v2/templates.py)**:

In [10]:
# Iterate through example indices, where each index represents a specific example
for i, index in enumerate(example_indices):
    # Retrieve dialogue and summary for the current example
    dialogue = dataset['test'][index]['dialogue']
    summary = dataset['test'][index]['summary']
    
    # Construct a prompt for summarizing the dialogue using the FLAN-T5 template
    prompt = f"""
Dialogue:

{dialogue}

What was going on?
"""

    # Tokenize the constructed prompt and convert it to PyTorch tensors
    inputs = tokenizer(prompt, return_tensors='pt')
    
    # Generate an output using the model, limiting the new tokens to 50
    # This uses the LLM to generate a summary of the dialogue with the constructed prompt
    output = tokenizer.decode(
        model.generate(
            inputs["input_ids"], 
            max_new_tokens=50,
        )[0], 
        skip_special_tokens=True
    )
    
    # Show the results
    print(dash_line)
    print('Example ', i + 1)
    print(dash_line)
    print(f'INPUT PROMPT:\n{prompt}')
    print(dash_line)
    print(f'BASELINE HUMAN SUMMARY:\n{summary}\n')
    print(dash_line)
    print(f'MODEL GENERATION - ZERO SHOT:\n{output}\n')

---------------------------------------------------------------------------------------------------
Example  1
---------------------------------------------------------------------------------------------------
INPUT PROMPT:

Dialogue:

#Person1#: What time is it, Tom?
#Person2#: Just a minute. It's ten to nine by my watch.
#Person1#: Is it? I had no idea it was so late. I must be off now.
#Person2#: What's the hurry?
#Person1#: I must catch the nine-thirty train.
#Person2#: You've plenty of time yet. The railway station is very close. It won't take more than twenty minutes to get there.

What was going on?

---------------------------------------------------------------------------------------------------
BASELINE HUMAN SUMMARY:
#Person1# is in a hurry to catch a train. Tom tells #Person1# there is plenty of time.

---------------------------------------------------------------------------------------------------
MODEL GENERATION - ZERO SHOT:
Tom is late for the train.

--------------

Notice that this FLAN-T5 prompt template helped a bit, but the model still struggles to pick up on the nuance of the conversation. This is what few-shot inference tries to address.

<a name='4'></a>
# 4 - Summarizing Dialogue Using One Shot and Few Shot Inference

In **one-shot and few-shot inference**, you give the LLM one or a handful of complete prompt-response examples that match your task, followed by the prompt you actually want completed. This practice is known as "in-context learning": the examples in the prompt condition the model on the specifics of your task without changing its weights. You can read more about this concept in **[this blog from HuggingFace](https://huggingface.co/blog/few-shot-learning-gpt-neo-and-inference-api)**.

<a name='4.1'></a>
## 4.1 - One Shot Inference

We'll construct a function that accepts a list of **`full_examples_indices`**, builds a prompt from those complete examples, and finally appends the dialogue you want the model to summarize (**`index_to_summarize`**). For this, we'll use the same FLAN-T5 prompt template as in section [3.2](#3.2).

In [11]:
def make_prompt(full_examples_indices, index_to_summarize):
    """
    Construct a prompt for one-shot or few-shot inference.

    Parameters
    ----------
    full_examples_indices : list
        A list containing indices for complete dialogues to be included in the prompt. These dialogues serve as examples 
        for the model to learn from (for one-shot or few-shot inference).
    index_to_summarize : int
        The index for the dialogue that the model is expected to give a summary for.

    Returns
    -------
    str
        A prompt string that is constructed as per the given parameters - full dialogues examples followed by a dialogue 
        that needs to be summarized.
    """
    prompt = ''

    # Go through each index in the full examples list
    for index in full_examples_indices:
        dialogue = dataset['test'][index]['dialogue']
        summary = dataset['test'][index]['summary']

        # Add each dialogue and its summary to the prompt string, followed by a stop sequence. The stop sequence 
        # '{summary}\n\n\n' is essential for FLAN-T5 model. Other models may have their own different stop sequence.
        prompt += f"""
Dialogue:

{dialogue}

What was going on?
{summary}


"""

    # Now add the dialogue that needs to be summarized by the model
    dialogue_to_summarize = dataset['test'][index_to_summarize]['dialogue']

    # Append this new dialogue to the prompt string
    prompt += f"""
Dialogue:

{dialogue_to_summarize}

What was going on?
"""

    # Return the constructed prompt
    return prompt

Create the prompt for one-shot inference:

In [12]:
# Define index for full example to be included in the prompt as a one-shot example
full_examples_indices = [40]
# Define the index for the dialogue that the model is expected to give a summary for
example_index_to_summarize = 200

# Create the prompt for one-shot inference
one_shot_prompt = make_prompt(full_examples_indices, example_index_to_summarize)

print(one_shot_prompt)


Dialogue:

#Person1#: What time is it, Tom?
#Person2#: Just a minute. It's ten to nine by my watch.
#Person1#: Is it? I had no idea it was so late. I must be off now.
#Person2#: What's the hurry?
#Person1#: I must catch the nine-thirty train.
#Person2#: You've plenty of time yet. The railway station is very close. It won't take more than twenty minutes to get there.

What was going on?
#Person1# is in a hurry to catch a train. Tom tells #Person1# there is plenty of time.



Dialogue:

#Person1#: Have you considered upgrading your system?
#Person2#: Yes, but I'm not sure what exactly I would need.
#Person1#: You could consider adding a painting program to your software. It would allow you to make up your own flyers and banners for advertising.
#Person2#: That would be a definite bonus.
#Person1#: You might also want to upgrade your hardware because it is pretty outdated now.
#Person2#: How can we do that?
#Person1#: You'd probably need a faster processor, to begin with. And you also ne

Now let's pass this prompt to the LLM to generate a summary with one-shot inference and observe the result:

In [13]:
# Retrieve the human-generated summary for the 'example_index_to_summarize' example
summary = dataset['test'][example_index_to_summarize]['summary']

# Tokenize the one-shot prompt and convert it to PyTorch tensors
inputs = tokenizer(one_shot_prompt, return_tensors='pt')

# Generate an output using the model, limiting the new tokens to 50
# This uses the LLM to generate a summary of the dialogue with the one-shot prompt
output = tokenizer.decode(
    model.generate(
        inputs["input_ids"],
        max_new_tokens=50,
    )[0], 
    skip_special_tokens=True
)

# Show the results
print(dash_line)
print(f'BASELINE HUMAN SUMMARY:\n{summary}\n')
print(dash_line)
print(f'MODEL GENERATION - ONE SHOT:\n{output}')

---------------------------------------------------------------------------------------------------
BASELINE HUMAN SUMMARY:
#Person1# teaches #Person2# how to upgrade software and hardware in #Person2#'s system.

---------------------------------------------------------------------------------------------------
MODEL GENERATION - ONE SHOT:
#Person1 wants to upgrade his system. #Person2 wants to add a painting program to his software. #Person1 wants to add a CD-ROM drive.


<a name='4.2'></a>
## 4.2 - Few Shot Inference

Now, let's explore few-shot inference by incorporating two additional full dialogue-summary pairs into our prompt.

In [14]:
# Define indices for full examples to be included in the prompt as few-shot examples
full_examples_indices = [40, 80, 120]
# Define the index for the dialogue that the model is expected to give a summary for
example_index_to_summarize = 200

# Create the prompt for few-shot inference
few_shot_prompt = make_prompt(full_examples_indices, example_index_to_summarize)

print(few_shot_prompt)


Dialogue:

#Person1#: What time is it, Tom?
#Person2#: Just a minute. It's ten to nine by my watch.
#Person1#: Is it? I had no idea it was so late. I must be off now.
#Person2#: What's the hurry?
#Person1#: I must catch the nine-thirty train.
#Person2#: You've plenty of time yet. The railway station is very close. It won't take more than twenty minutes to get there.

What was going on?
#Person1# is in a hurry to catch a train. Tom tells #Person1# there is plenty of time.



Dialogue:

#Person1#: May, do you mind helping me prepare for the picnic?
#Person2#: Sure. Have you checked the weather report?
#Person1#: Yes. It says it will be sunny all day. No sign of rain at all. This is your father's favorite sausage. Sandwiches for you and Daniel.
#Person2#: No, thanks Mom. I'd like some toast and chicken wings.
#Person1#: Okay. Please take some fruit salad and crackers for me.
#Person2#: Done. Oh, don't forget to take napkins disposable plates, cups and picnic blanket.
#Person1#: All set. 

Now pass this prompt to the model to perform few-shot inference:

In [15]:
# Retrieve the human-generated summary for the specified example
summary = dataset['test'][example_index_to_summarize]['summary']

# Tokenize the few-shot prompt and convert it to PyTorch tensors
inputs = tokenizer(few_shot_prompt, return_tensors='pt')

# Generate an output using the model, limiting the new tokens to 50
# This uses the LLM to generate a summary of the dialogue with the few-shot prompt
output = tokenizer.decode(
    model.generate(
        inputs["input_ids"],
        max_new_tokens=50,
    )[0], 
    skip_special_tokens=True
)

# Show the results
print(dash_line)
print(f'BASELINE HUMAN SUMMARY:\n{summary}\n')
print(dash_line)
print(f'MODEL GENERATION - FEW SHOT:\n{output}')

Token indices sequence length is longer than the specified maximum sequence length for this model (819 > 512). Running this sequence through the model will result in indexing errors


---------------------------------------------------------------------------------------------------
BASELINE HUMAN SUMMARY:
#Person1# teaches #Person2# how to upgrade software and hardware in #Person2#'s system.

---------------------------------------------------------------------------------------------------
MODEL GENERATION - FEW SHOT:
#Person1 wants to upgrade his system. #Person2 wants to add a painting program to his software. #Person1 wants to upgrade his hardware.


In this scenario, few-shot inference didn't yield a significant improvement over one-shot inference, and going beyond 5 or 6 shots generally doesn't help much either. Be careful not to exceed the model's input context length, which in our case is 512 tokens; content beyond this length may be disregarded or degrade the output. Note that the tokenizer warning above reports the few-shot prompt at 819 tokens, which is already over that limit.

However, including at least one full example (one shot) gives the model more information about the task and qualitatively improves the overall summary.
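If you want a rough number to accompany the qualitative comparison, one simple option is a word-overlap score between each generated summary and the human baseline. The sketch below is an assumption of ours, not part of the original notebook: `unigram_f1` is a hypothetical helper in the spirit of ROUGE-1, and it is a crude proxy that ignores word order and meaning. The strings are the summaries printed above.

```python
import re
from collections import Counter


def unigram_f1(candidate, reference):
    """F1 over overlapping lowercase word counts (a rough ROUGE-1-like score)."""
    cand_counts = Counter(re.findall(r"\w+", candidate.lower()))
    ref_counts = Counter(re.findall(r"\w+", reference.lower()))
    # Counter intersection keeps the minimum count of each shared word
    overlap = sum((cand_counts & ref_counts).values())
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand_counts.values())
    recall = overlap / sum(ref_counts.values())
    return 2 * precision * recall / (precision + recall)


reference = (
    "#Person1# teaches #Person2# how to upgrade software and hardware "
    "in #Person2#'s system."
)
one_shot = (
    "#Person1 wants to upgrade his system. #Person2 wants to add a painting "
    "program to his software. #Person1 wants to add a CD-ROM drive."
)
few_shot = (
    "#Person1 wants to upgrade his system. #Person2 wants to add a painting "
    "program to his software. #Person1 wants to upgrade his hardware."
)

print(f"one-shot overlap: {unigram_f1(one_shot, reference):.2f}")
print(f"few-shot overlap: {unigram_f1(few_shot, reference):.2f}")
```

A score like this should be read alongside the text itself: two summaries can share many words and still describe the conversation differently.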

Feel free to experiment with few-shot inference:

- Select different dialogues by modifying the indices in the `full_examples_indices` list and the `example_index_to_summarize` value.
- Adjust the number of shots, keeping the prompt within the model's 512-token context length for a fair comparison.

Observe how well few-shot inference performs with other examples.
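When adjusting the number of shots, it helps to check the prompt length before calling the model. The sketch below shows one possible policy, dropping the earliest examples until the prompt fits. `fit_examples_to_budget` and `whitespace_token_count` are hypothetical helpers, the dialogues are synthetic filler, and counting whitespace-separated words is only a crude stand-in for a real tokenizer, which usually produces more tokens.

```python
def fit_examples_to_budget(examples, query, count_tokens, max_tokens=512):
    """Drop the earliest examples until examples + query fit in max_tokens."""
    kept = list(examples)
    while kept and count_tokens("".join(kept) + query) > max_tokens:
        kept.pop(0)  # assumption: the oldest example is the least important
    return kept


def whitespace_token_count(text):
    # Crude stand-in for a tokenizer: counts whitespace-separated words
    return len(text.split())


# Synthetic filler examples and query, only to exercise the helper
examples = [f"Dialogue {i}: " + "word " * 200 + "\n\n" for i in range(3)]
query = "Dialogue:\n\n" + "word " * 100 + "\n\nWhat was going on?\n"

kept = fit_examples_to_budget(examples, query, whitespace_token_count)
print(f"kept {len(kept)} of {len(examples)} examples")
```

In the notebook itself, you could pass `lambda text: len(tokenizer(text)["input_ids"])` as `count_tokens` so that the budget is measured in the model's own tokens.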

<a name='5'></a>
# 5 - Configuration Parameters for Generative Inference
In this section, we look at the configuration parameters that control generative inference.

Feel free to alter the configuration parameters of the **`generate()`** method to observe varied outputs from the LLM. Up until now, the only parameter set was **`max_new_tokens=50`**, determining the maximum number of tokens to generate. For a comprehensive list of available parameters, refer to the **[Hugging Face Generation documentation](https://huggingface.co/docs/transformers/v4.29.1/en/main_classes/text_generation#transformers.GenerationConfig)**.

A convenient method for organizing configuration parameters is by using the **`GenerationConfig`** class.

In [16]:
# Define a GenerationConfig with specific parameters
generation_config = GenerationConfig(max_new_tokens=50, do_sample=True, temperature=0.5)

# Tokenize the few-shot prompt and convert it to PyTorch tensors
inputs = tokenizer(few_shot_prompt, return_tensors='pt')

# Generate an output using the model, limiting the new tokens to 50
# This uses the LLM to generate a summary of the dialogue with the few-shot prompt
output = tokenizer.decode(
    model.generate(
        inputs["input_ids"],
        generation_config=generation_config,
    )[0], 
    skip_special_tokens=True
)

# Show the results
print(dash_line)
print(f'MODEL GENERATION - FEW SHOT:\n{output}')
print(dash_line)
print(f'BASELINE HUMAN SUMMARY:\n{summary}\n')

---------------------------------------------------------------------------------------------------
MODEL GENERATION - FEW SHOT:
#Person1 wants to upgrade his system and hardware.
---------------------------------------------------------------------------------------------------
BASELINE HUMAN SUMMARY:
#Person1# teaches #Person2# how to upgrade software and hardware in #Person2#'s system.



You can also modify the configuration parameters to explore their impact on the output. Setting `do_sample=True` enables sampling-based decoding, in which the next token is drawn from the probability distribution over the entire vocabulary instead of always being the most likely token. You can shape that distribution further with parameters such as `temperature`, `top_k`, and `top_p`.

Comments on the parameter choices:

- Lowering the limit to **`max_new_tokens=10`** makes the output very short and may truncate the dialogue summary.
- Enabling **`do_sample=True`** and adjusting the **`temperature`** value gives more varied output.
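To build intuition for what these sampling parameters do, here is a minimal pure-Python sketch of temperature scaling followed by top-k and top-p filtering on a toy distribution. It is an illustration under our own assumptions, not the Hugging Face implementation: `sample_next_token` and the four-word `toy_logits` are hypothetical, and the library's filtering details may differ.

```python
import math
import random


def sample_next_token(logits, temperature=1.0, top_k=None, top_p=None, rng=random):
    """Sample one token from {token: logit} after temperature, top-k, top-p."""
    # Temperature below 1 sharpens the distribution; above 1 flattens it
    scaled = {tok: logit / temperature for tok, logit in logits.items()}
    # Softmax, subtracting the maximum for numerical stability
    peak = max(scaled.values())
    exps = {tok: math.exp(value - peak) for tok, value in scaled.items()}
    total = sum(exps.values())
    ranked = sorted(
        ((tok, e / total) for tok, e in exps.items()),
        key=lambda pair: pair[1],
        reverse=True,
    )
    # Top-k: keep only the k most probable tokens
    if top_k is not None:
        ranked = ranked[:top_k]
    # Top-p: keep the smallest prefix whose cumulative probability reaches top_p
    if top_p is not None:
        kept, cumulative = [], 0.0
        for tok, prob in ranked:
            kept.append((tok, prob))
            cumulative += prob
            if cumulative >= top_p:
                break
        ranked = kept
    tokens, weights = zip(*ranked)
    # random.choices renormalizes the remaining weights
    return rng.choices(tokens, weights=weights, k=1)[0]


# Hypothetical next-token scores, for illustration only
toy_logits = {"train": 2.0, "time": 1.5, "watch": 0.5, "banana": -1.0}

rng = random.Random(0)
for temperature in (0.1, 1.0, 2.0):
    draws = [
        sample_next_token(toy_logits, temperature=temperature, rng=rng)
        for _ in range(1000)
    ]
    print(temperature, {tok: draws.count(tok) for tok in toy_logits})
```

At a low temperature nearly all draws land on the highest-scoring token, while higher temperatures spread the draws across more candidates; `top_k` and `top_p` then cut off the unlikely tail before sampling.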

As you can see, prompt engineering can take you a long way for this use case, but there are some limitations. While one-shot and few-shot inference strategies offer valuable insights, exceeding a certain number of shots may not necessarily improve performance, and it's crucial to stay within the model's context length.

The choice of generation parameters such as `do_sample`, `temperature`, and `max_new_tokens` significantly influences the generated output. Adjusting them lets you tailor the summary to your requirements and adapt the model's behavior without retraining it.

In the second notebook, we will explore how fine-tuning can be employed to enhance your LLM's understanding of a particular use case, unlocking even more potential for nuanced and accurate generative text. Stay tuned for a deeper dive into fine-tuning and its impact on model performance!