<a href="https://githubtocolab.com/Hbvsa/LLMs/blob/main/Prompting_examples_and_simple_chat_bot/prompt_tactics_examples_corrected.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Prompt engineering for the summarization of dialogues using the FLAN-T5 model


# Table of Contents

- [ 1 - Summarize Dialogue without Prompt Engineering](#1)
- [ 2 - Summarize Dialogue with an Instruction Prompt](#2)
- [ 3 - Summarize Dialogue with One Shot and Few Shot Inference](#3)
  - [ 3.1 - One Shot Inference](#3.1)
  - [ 3.2 - Few Shot Inference](#3.2)
- [ 4 - Generative Configuration Parameters for Inference](#4)
- [ 5 - Finetuning the LLM](#5)
  - [ 5.1 - Tokenize the train, test and validation datasets with the instruction prompt](#5.1)
  - [ 5.2 - LoRA Finetuning](#5.2)
- [ 6 - Evaluate the LoRA model versus baseline](#6)
  - [ 6.1 - Qualitatively](#6.1)
  - [ 6.2 - Quantitatively with ROUGE scores](#6.2)

In [1]:
from datasets import load_dataset
from transformers import AutoModelForSeq2SeqLM
from transformers import AutoTokenizer
from transformers import GenerationConfig



In [2]:
dataset = load_dataset("knkarthick/dialogsum")

<a name='1'></a>
## 1 - Summarization without Prompt Engineering

This section generates a summary of a dialogue with FLAN-T5, a pre-trained Large Language Model (LLM) from Hugging Face, using dialogues from the [DialogSum](https://huggingface.co/datasets/knkarthick/dialogsum) Hugging Face dataset. The models available in the Hugging Face `transformers` package can be found [here](https://huggingface.co/docs/transformers/index).

Explore the dataset examples

In [None]:
dataset.shape

{'train': (12460, 4), 'validation': (500, 4), 'test': (1500, 4)}

In [None]:
dataset['train'][0]

{'id': 'train_0',
 'dialogue': "#Person1#: Hi, Mr. Smith. I'm Doctor Hawkins. Why are you here today?\n#Person2#: I found it would be a good idea to get a check-up.\n#Person1#: Yes, well, you haven't had one for 5 years. You should have one every year.\n#Person2#: I know. I figure as long as there is nothing wrong, why go see the doctor?\n#Person1#: Well, the best way to avoid serious illnesses is to find out about them early. So try to come at least once a year for your own good.\n#Person2#: Ok.\n#Person1#: Let me see here. Your eyes and ears look fine. Take a deep breath, please. Do you smoke, Mr. Smith?\n#Person2#: Yes.\n#Person1#: Smoking is the leading cause of lung cancer and heart disease, you know. You really should quit.\n#Person2#: I've tried hundreds of times, but I just can't seem to kick the habit.\n#Person1#: Well, we have classes and some medications that might help. I'll give you more information before you leave.\n#Person2#: Ok, thanks doctor.",
 'summary': "Mr. Smith'

In [None]:
dash_line = '-'.join('' for x in range(100))
for i, sample in enumerate(dataset['test']):
  print("Example",i)
  print(dash_line)
  print("Dialogue")
  print(dash_line)
  print(sample['dialogue'])
  print(dash_line)
  print("Summary")
  print(dash_line)
  print(sample['summary'])
  break

Example 0
---------------------------------------------------------------------------------------------------
Dialogue
---------------------------------------------------------------------------------------------------
#Person1#: Ms. Dawson, I need you to take a dictation for me.
#Person2#: Yes, sir...
#Person1#: This should go out as an intra-office memorandum to all employees by this afternoon. Are you ready?
#Person2#: Yes, sir. Go ahead.
#Person1#: Attention all staff... Effective immediately, all office communications are restricted to email correspondence and official memos. The use of Instant Message programs by employees during working hours is strictly prohibited.
#Person2#: Sir, does this apply to intra-office communications only? Or will it also restrict external communications?
#Person1#: It should apply to all communications, not only in this office between employees, but also any outside communications.
#Person2#: But sir, many employees use Instant Messaging to communica

Load the [FLAN-T5 model](https://huggingface.co/docs/transformers/model_doc/flan-t5)

In [None]:
model_name='google/flan-t5-small'
model = AutoModelForSeq2SeqLM.from_pretrained(model_name)

To perform encoding and decoding, you need to work with text in a tokenized form. Download the tokenizer for the FLAN-T5 model using the `AutoTokenizer.from_pretrained()` method.

In [None]:
tokenizer = AutoTokenizer.from_pretrained(model_name, use_fast=True)

Test the tokenizer encoding and decoding a simple sentence:

In [None]:
sentence = "Is skarner jungle good in this meta?"

sentence_encoded = tokenizer(sentence, return_tensors='pt')

sentence_decoded = tokenizer.decode(
        sentence_encoded["input_ids"][0],
        skip_special_tokens=True
    )

print('ENCODED SENTENCE:')
print(sentence_encoded["input_ids"][0])
print('\nDECODED SENTENCE:')
print(sentence_decoded)

ENCODED SENTENCE:
tensor([   27,     7,     3,     7,  4031,   687, 19126,   207,    16,    48,
        10531,    58,     1])

DECODED SENTENCE:
Is skarner jungle good in this meta?


Without prompt engineering, the model does not understand the task very well.

In [None]:
for i, sample in enumerate(dataset['test']):

    dialogue = sample['dialogue']
    summary = sample['summary']

    inputs = tokenizer(dialogue, return_tensors='pt')
    summary_generated = model.generate(
            inputs["input_ids"],
            max_new_tokens=50,
        )[0]

    output = tokenizer.decode(summary_generated,skip_special_tokens=True)

    print(dash_line)
    print('Example ', i + 1)
    print(dash_line)
    print(f'INPUT PROMPT:\n{dialogue}')
    print(dash_line)
    print(f'BASELINE HUMAN SUMMARY:\n{summary}')
    print(dash_line)
    print(f'MODEL GENERATION - WITHOUT PROMPT ENGINEERING:\n{output}\n')

    if i ==0:#change according to how many examples you want
      break

---------------------------------------------------------------------------------------------------
Example  1
---------------------------------------------------------------------------------------------------
INPUT PROMPT:
#Person1#: Ms. Dawson, I need you to take a dictation for me.
#Person2#: Yes, sir...
#Person1#: This should go out as an intra-office memorandum to all employees by this afternoon. Are you ready?
#Person2#: Yes, sir. Go ahead.
#Person1#: Attention all staff... Effective immediately, all office communications are restricted to email correspondence and official memos. The use of Instant Message programs by employees during working hours is strictly prohibited.
#Person2#: Sir, does this apply to intra-office communications only? Or will it also restrict external communications?
#Person1#: It should apply to all communications, not only in this office between employees, but also any outside communications.
#Person2#: But sir, many employees use Instant Messaging to com

<a name='2'></a>
## 2 - Summarize Dialogue with an Instruction Prompt
Inject an instruction prompt to help the model understand the required task. Compared to the first example, the model's output improves.



In [None]:
for i, sample in enumerate(dataset['test']):

    dialogue = sample['dialogue']
    summary = sample['summary']

    prompt = f"""
Summarize the following dialogue.
{dialogue}
Summary:
"""

    inputs = tokenizer(prompt, return_tensors='pt')
    summary_generated = model.generate(
            inputs["input_ids"],
            max_new_tokens=50,
        )[0]

    output = tokenizer.decode(summary_generated,skip_special_tokens=True)

    print(dash_line)
    print('Example ', i + 1)
    print(dash_line)
    print(f'INPUT PROMPT:\n{prompt}')
    print(dash_line)
    print(f'BASELINE HUMAN SUMMARY:\n{summary}')
    print(dash_line)
    print(f'MODEL GENERATION - ZERO SHOT:\n{output}\n')

    if i ==0:#change according to how many examples you want
      break

---------------------------------------------------------------------------------------------------
Example  1
---------------------------------------------------------------------------------------------------
INPUT PROMPT:

Summarize the following dialogue.
#Person1#: Ms. Dawson, I need you to take a dictation for me.
#Person2#: Yes, sir...
#Person1#: This should go out as an intra-office memorandum to all employees by this afternoon. Are you ready?
#Person2#: Yes, sir. Go ahead.
#Person1#: Attention all staff... Effective immediately, all office communications are restricted to email correspondence and official memos. The use of Instant Message programs by employees during working hours is strictly prohibited.
#Person2#: Sir, does this apply to intra-office communications only? Or will it also restrict external communications?
#Person1#: It should apply to all communications, not only in this office between employees, but also any outside communications.
#Person2#: But sir, many emp

<a name='3'></a>
## 3 - Summarize Dialogue with One Shot and Few Shot Inference
**One shot and few shot inference** provide the LLM with examples of the task we want it to perform. This is also called "in-context learning": the examples give the model the context it needs to understand the specific task.


<a name='3.1'></a>
### 3.1 - One Shot Inference

The function below takes `example_samples` and builds a prompt from those completed examples. After the examples, it appends the dialogue you want to summarize, taken from `sample_to_summarize`.

In [None]:
def make_prompt(example_samples, sample_to_summarize):



    #Initialize prompt
    prompt = ''

    #Add examples
    for index in example_samples:
        dialogue = dataset['test'][index]['dialogue']
        summary = dataset['test'][index]['summary']
        prompt += f"""
Dialogue:
{dialogue}
Summarize the dialogue.
{summary}
"""
    #Add the dialogue of the sample you want to summarize and the instruction
    dialogue = dataset['test'][sample_to_summarize]['dialogue']

    prompt += f"""
Dialogue:
{dialogue}
Summarize the dialogue.
"""
    # return all the examples plus the dialogue you want to summarize
    return prompt
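
The same prompt layout can also be built without reaching into the global `dataset`, which makes it easier to try on hand-written examples. The following is a minimal sketch; `build_few_shot_prompt` and its arguments are illustrative names that are not part of the notebook, and the toy dialogues are made up.

```python
def build_few_shot_prompt(examples, dialogue_to_summarize):
    """Build a prompt from (dialogue, summary) pairs plus one open dialogue.

    `examples` is a list of (dialogue, summary) tuples. The names are
    hypothetical and only mirror the layout used by `make_prompt`.
    """
    parts = []
    for dialogue, summary in examples:
        # Completed example: dialogue, instruction, then the reference summary.
        parts.append(f"\nDialogue:\n{dialogue}\nSummarize the dialogue.\n{summary}\n")
    # Final block: same layout, but the summary is left for the model to write.
    parts.append(f"\nDialogue:\n{dialogue_to_summarize}\nSummarize the dialogue.\n")
    return "".join(parts)


# Made-up example pair and target dialogue, for illustration only.
examples = [
    ("#Person1#: Hi.\n#Person2#: Hello.", "#Person1# and #Person2# greet each other."),
]
prompt = build_few_shot_prompt(examples, "#Person1#: Is it raining?\n#Person2#: Yes.")
print(prompt)
```

Passing an empty `examples` list gives a prompt with only the final dialogue and instruction, and one pair gives a one shot prompt.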

Construct the prompt to perform one shot inference:

In [None]:
example_samples = [10]
sample_to_summarize = 200
one_shot_prompt = make_prompt(example_samples, sample_to_summarize)
print(one_shot_prompt)


Dialogue:
#Person1#: Happy Birthday, this is for you, Brian.
#Person2#: I'm so happy you remember, please come in and enjoy the party. Everyone's here, I'm sure you have a good time.
#Person1#: Brian, may I have a pleasure to have a dance with you?
#Person2#: Ok.
#Person1#: This is really wonderful party.
#Person2#: Yes, you are always popular with everyone. and you look very pretty today.
#Person1#: Thanks, that's very kind of you to say. I hope my necklace goes with my dress, and they both make me look good I feel.
#Person2#: You look great, you are absolutely glowing.
#Person1#: Thanks, this is a fine party. We should have a drink together to celebrate your birthday
Summarize the dialogue.
#Person1# attends Brian's birthday party. Brian thinks #Person1# looks great and charming.

Dialogue:
#Person1#: Have you considered upgrading your system?
#Person2#: Yes, but I'm not sure what exactly I would need.
#Person1#: You could consider adding a painting program to your software. It woul

Now pass this prompt to perform the one shot inference:

In [None]:
summary = dataset['test'][sample_to_summarize]['summary']
inputs = tokenizer(one_shot_prompt, return_tensors='pt')
generated_summary = model.generate(
        inputs["input_ids"],
        max_new_tokens=50,
    )[0]

output = tokenizer.decode(generated_summary, skip_special_tokens=True)

print(dash_line)
print(f'BASELINE HUMAN SUMMARY:\n{summary}\n')
print(dash_line)
print(f'MODEL GENERATION - ONE SHOT:\n{output}')

---------------------------------------------------------------------------------------------------
BASELINE HUMAN SUMMARY:
#Person1# teaches #Person2# how to upgrade software and hardware in #Person2#'s system.

---------------------------------------------------------------------------------------------------
MODEL GENERATION - ONE SHOT:
#Person1#: I'd like to upgrade my computer.


<a name='3.2'></a>
### 3.2 - Few Shot Inference

Including extra examples does not seem to improve the model's performance much, although including at least one example helps. More than 5 or 6 examples are not normally used.

In [None]:
example_samples = [10, 20,30]
sample_to_summarize = 200
few_shot_prompt = make_prompt(example_samples, sample_to_summarize)
print(few_shot_prompt)


Dialogue:
#Person1#: Happy Birthday, this is for you, Brian.
#Person2#: I'm so happy you remember, please come in and enjoy the party. Everyone's here, I'm sure you have a good time.
#Person1#: Brian, may I have a pleasure to have a dance with you?
#Person2#: Ok.
#Person1#: This is really wonderful party.
#Person2#: Yes, you are always popular with everyone. and you look very pretty today.
#Person1#: Thanks, that's very kind of you to say. I hope my necklace goes with my dress, and they both make me look good I feel.
#Person2#: You look great, you are absolutely glowing.
#Person1#: Thanks, this is a fine party. We should have a drink together to celebrate your birthday
Summarize the dialogue.
#Person1# attends Brian's birthday party. Brian thinks #Person1# looks great and charming.

Dialogue:
#Person1#: What's wrong with you? Why are you scratching so much?
#Person2#: I feel itchy! I can't stand it anymore! I think I may be coming down with something. I feel lightheaded and weak.
#Per

Now pass this prompt to perform a few shot inference:

In [None]:
summary = dataset['test'][sample_to_summarize]['summary']
inputs = tokenizer(few_shot_prompt, return_tensors='pt')
generated_summary = model.generate(
        inputs["input_ids"],
        max_new_tokens=50,
    )[0]

output = tokenizer.decode(generated_summary,skip_special_tokens=True)

print(dash_line)
print(f'BASELINE HUMAN SUMMARY:\n{summary}\n')
print(dash_line)
print(f'MODEL GENERATION - FEW SHOT:\n{output}')

Token indices sequence length is longer than the specified maximum sequence length for this model (965 > 512). Running this sequence through the model will result in indexing errors


---------------------------------------------------------------------------------------------------
BASELINE HUMAN SUMMARY:
#Person1# teaches #Person2# how to upgrade software and hardware in #Person2#'s system.

---------------------------------------------------------------------------------------------------
MODEL GENERATION - FEW SHOT:
#Person1#: I'd like to upgrade my computer.
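
The tokenizer warning in the output above reports a 965-token prompt against a 512-token maximum, so the few shot prompt is longer than the length the tokenizer expects for this model. One possible way to handle this is to drop the earliest examples until the prompt fits. The sketch below is only an illustration under stated assumptions: `fit_examples_to_budget`, `count_tokens` and `whitespace_count` are hypothetical names, and the whitespace counter is a crude stand-in that keeps the example self-contained. In the notebook, `count_tokens` could be `lambda text: len(tokenizer(text).input_ids)`.

```python
def fit_examples_to_budget(example_blocks, final_block, count_tokens, max_tokens=512):
    """Keep as many trailing example blocks as fit within `max_tokens`.

    All names are hypothetical. `count_tokens` is any callable that maps a
    string to a token count.
    """
    kept = list(example_blocks)
    # Drop the earliest example until the assembled prompt fits the budget.
    while kept and count_tokens("".join(kept) + final_block) > max_tokens:
        kept.pop(0)
    return "".join(kept) + final_block


def whitespace_count(text):
    # Crude stand-in for a real tokenizer: counts whitespace-separated words.
    return len(text.split())


# Toy blocks standing in for formatted examples and the final dialogue.
blocks = ["one two three four\n", "five six seven\n", "eight nine\n"]
final = "ten eleven\n"
trimmed = fit_examples_to_budget(blocks, final, whitespace_count, max_tokens=8)
print(trimmed)
```

If the final block alone exceeds the budget, this sketch returns it unchanged, so the dialogue itself would still need truncating in that case.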


<a name='4'></a>
## 4 - Generation parameters

This section changes the generation parameters. The temperature rescales the probability distribution over the next token. A higher temperature raises the probability of otherwise unlikely tokens, which gives more varied ("creative") output but also more hallucinations.
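
The effect of temperature can be illustrated on a handful of scores. This is a minimal sketch of the usual softmax-with-temperature formulation, assuming three made-up logits. It is not taken from the `transformers` source, and `softmax_with_temperature` is an illustrative name.

```python
import math


def softmax_with_temperature(logits, temperature):
    """Convert raw scores to probabilities after dividing by the temperature."""
    scaled = [value / temperature for value in logits]
    # Subtract the maximum for numerical stability before exponentiating.
    peak = max(scaled)
    exps = [math.exp(value - peak) for value in scaled]
    total = sum(exps)
    return [value / total for value in exps]


logits = [2.0, 1.0, 0.1]  # made-up scores for three candidate tokens
for temperature in (0.1, 0.5, 1.0, 2.0):
    probs = softmax_with_temperature(logits, temperature)
    print(temperature, [round(p, 3) for p in probs])
```

At low temperature almost all of the probability sits on the highest-scoring token, while at high temperature the distribution flattens and the less likely tokens are sampled more often.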

In [None]:
#generation_config = GenerationConfig(max_new_tokens=50)
# generation_config = GenerationConfig(max_new_tokens=10)
# generation_config = GenerationConfig(max_new_tokens=50, do_sample=True, temperature=0.1)
# generation_config = GenerationConfig(max_new_tokens=50, do_sample=True, temperature=0.5)
generation_config = GenerationConfig(max_new_tokens=50, do_sample=True, temperature=0.5)

inputs = tokenizer(few_shot_prompt, return_tensors='pt')
model_generation = model.generate(
        inputs["input_ids"],
        generation_config=generation_config,
    )[0]

output = tokenizer.decode(model_generation,skip_special_tokens=True)

print(dash_line)
print(f'MODEL GENERATION - FEW SHOT:\n{output}')
print(dash_line)
print(f'BASELINE HUMAN SUMMARY:\n{summary}\n')

---------------------------------------------------------------------------------------------------
MODEL GENERATION - FEW SHOT:
Person1 is going to see Alice later.
---------------------------------------------------------------------------------------------------
BASELINE HUMAN SUMMARY:
#Person1# greets Mrs. Todd and then they say goodbye to each other.



<a name='5'></a>
## 5 - Finetuning the LLM

In [3]:
from transformers import TrainingArguments, Trainer
import torch
import time
import evaluate
import pandas as pd
import numpy as np

In [None]:
model_name='google/flan-t5-small'
model = AutoModelForSeq2SeqLM.from_pretrained(model_name, torch_dtype=torch.bfloat16)
tokenizer = AutoTokenizer.from_pretrained(model_name)
def print_number_of_trainable_model_parameters(model):
    trainable_model_params = 0
    all_model_params = 0
    for _, param in model.named_parameters():
        all_model_params += param.numel()
        if param.requires_grad:
            trainable_model_params += param.numel()
    return f"trainable model parameters: {trainable_model_params}\nall model parameters: {all_model_params}\npercentage of trainable model parameters: {100 * trainable_model_params / all_model_params:.2f}%"

print(print_number_of_trainable_model_parameters(model))

trainable model parameters: 76961152
all model parameters: 76961152
percentage of trainable model parameters: 100.00%


<a name='5.1'></a>
### 5.1 - Tokenize the train, test and validation datasets with the instruction prompt

In [None]:
def tokenize_function(sample):
    start_prompt = 'Summarize the following conversation.\n\n'
    end_prompt = '\n\nSummary: '
    #Add the instruction prompts
    prompt = [start_prompt + dialogue + end_prompt for dialogue in sample["dialogue"]]
    #Tokenize the inputs and labels/responses
    sample['input_ids'] = tokenizer(prompt, padding="max_length", truncation=True, return_tensors="pt").input_ids
    sample['labels'] = tokenizer(sample["summary"], padding="max_length", truncation=True, return_tensors="pt").input_ids

    return sample

#the map function distributes the function across all samples across all splits
tokenized_datasets = dataset.map(tokenize_function, batched=True)
tokenized_datasets = tokenized_datasets.remove_columns(['id', 'topic', 'dialogue', 'summary',])
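
With `padding="max_length"`, the `labels` contain pad token ids wherever a summary is shorter than the maximum length. A common convention in Hugging Face training code, which this notebook does not apply, is to replace those ids with `-100` so that the loss skips the padded positions. The sketch below shows the idea on plain lists; `mask_padding_in_labels` is a hypothetical helper, the token ids are made up, and the pad id of 0 is an assumption that should be read from `tokenizer.pad_token_id` rather than hard-coded.

```python
def mask_padding_in_labels(label_ids, pad_token_id, ignore_index=-100):
    """Replace pad token ids with `ignore_index` in a batch of label sequences."""
    return [
        [ignore_index if token == pad_token_id else token for token in sequence]
        for sequence in label_ids
    ]


# Toy batch: ids are made up; 0 is assumed to be the padding id.
labels = [[71, 1712, 1, 0, 0], [27, 1, 0, 0, 0]]
masked = mask_padding_in_labels(labels, pad_token_id=0)
print(masked)
```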

<a name='5.2'></a>
### 5.2 - LoRA Finetuning

In [None]:
from peft import LoraConfig, get_peft_model, TaskType

lora_config = LoraConfig(
    r=32, # Rank
    lora_alpha=32,
    target_modules=["q", "v"],
    lora_dropout=0.05,
    bias="none",
    task_type=TaskType.SEQ_2_SEQ_LM # FLAN-T5
)

In [None]:
peft_model = get_peft_model(model,
                            lora_config)
print(print_number_of_trainable_model_parameters(peft_model))

trainable model parameters: 1376256
all model parameters: 78337408
percentage of trainable model parameters: 1.76%
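
The trainable count reported above can be reproduced by hand. LoRA adds two low-rank matrices to each targeted linear layer, one of shape `in_features x r` and one of shape `r x out_features`, so each adapted layer contributes `r * (in_features + out_features)` parameters. The sketch below assumes the `flan-t5-small` dimensions (hidden size 512, attention projection width 384, 8 encoder blocks and 8 decoder blocks). These values are not stated in the notebook, so check them against `model.config`. `lora_param_count` is an illustrative name.

```python
def lora_param_count(rank, in_features, out_features, num_layers):
    """Parameters added by LoRA: A (in x r) plus B (r x out) per adapted layer."""
    return num_layers * rank * (in_features + out_features)


# Assumed flan-t5-small dimensions (verify against model.config).
d_model = 512        # hidden size
inner_dim = 384      # attention projection width
encoder_blocks = 8   # one self-attention module each
decoder_blocks = 8   # one self-attention and one cross-attention module each

attention_modules = encoder_blocks + 2 * decoder_blocks
adapted_layers = attention_modules * 2  # the "q" and "v" projections in each module

trainable = lora_param_count(32, d_model, inner_dim, adapted_layers)
total = 76961152 + trainable  # base parameter count printed earlier in the notebook
print(trainable, total, f"{100 * trainable / total:.2f}%")
```

Under these assumptions the arithmetic matches the printed values, which suggests the adapters were attached to every `q` and `v` projection in both the encoder and the decoder.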


In [None]:
output_dir = f'./peft-dialogue-summary-training-{str(int(time.time()))}'

peft_training_args = TrainingArguments(
    output_dir=output_dir,
    auto_find_batch_size=True,
    learning_rate=1e-3, # Higher learning rate than full fine-tuning.
    num_train_epochs=1,
)

peft_trainer = Trainer(
    model=peft_model,
    args=peft_training_args,
    train_dataset=tokenized_datasets["train"],
)

In [None]:
peft_trainer.train()

Step,Training Loss
500,1.9427
1000,1.8109
1500,1.7897


TrainOutput(global_step=1558, training_loss=1.8454599245827985, metrics={'train_runtime': 712.9349, 'train_samples_per_second': 17.477, 'train_steps_per_second': 2.185, 'total_flos': 2368874804674560.0, 'train_loss': 1.8454599245827985, 'epoch': 1.0})
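
The `global_step=1558` in the output is consistent with a single pass over the 12,460 training samples. The quick check below assumes an effective batch size of 8. The notebook does not set this explicitly, so it is inferred here from the ratio of the reported samples per second to steps per second.

```python
import math

train_samples = 12460        # from dataset.shape
samples_per_second = 17.477  # from the TrainOutput metrics
steps_per_second = 2.185     # from the TrainOutput metrics

# The ratio of the two throughputs gives the implied batch size.
implied_batch_size = round(samples_per_second / steps_per_second)

# The last, partially filled batch still counts as one step.
steps_per_epoch = math.ceil(train_samples / implied_batch_size)
print(implied_batch_size, steps_per_epoch)
```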

In [None]:
peft_model_path=r'Desktop/GithubLLMs/LLMs/peft-dialogue-summary-checkpoint-local'

peft_trainer.model.save_pretrained(peft_model_path)
tokenizer.save_pretrained(peft_model_path)

('Desktop/GithubLLMs/LLMs/peft-dialogue-summary-checkpoint-local\\tokenizer_config.json',
 'Desktop/GithubLLMs/LLMs/peft-dialogue-summary-checkpoint-local\\special_tokens_map.json',
 'Desktop/GithubLLMs/LLMs/peft-dialogue-summary-checkpoint-local\\tokenizer.json')

<a name='6'></a>
## 6 - Evaluating the LoRA model

In [6]:
from transformers import TrainingArguments, Trainer
import torch
import time
import evaluate
import pandas as pd
import numpy as np

In [7]:
#The normal model
model_name='google/flan-t5-small'
tokenizer = AutoTokenizer.from_pretrained(model_name)
original_model = AutoModelForSeq2SeqLM.from_pretrained(model_name, torch_dtype=torch.bfloat16)

In [4]:
#The peft model
from peft import PeftModel, PeftConfig
peft_model_base = AutoModelForSeq2SeqLM.from_pretrained("google/flan-t5-small", torch_dtype=torch.bfloat16)
peft_model = PeftModel.from_pretrained(peft_model_base,
                                       'Desktop/GithubLLMs/LLMs/peft-dialogue-summary-checkpoint-local',
                                       torch_dtype=torch.bfloat16,
                                       is_trainable=False)

<a name='6.1'></a>
### 6.1 - Qualitatively

In [8]:
index = 200
dialogue = dataset['test'][index]['dialogue']
baseline_human_summary = dataset['test'][index]['summary']

prompt = f"""
Summarize the following conversation.

{dialogue}

Summary: """

input_ids = tokenizer(prompt, return_tensors="pt").input_ids

original_model_outputs = original_model.generate(input_ids=input_ids, generation_config=GenerationConfig(max_new_tokens=200, num_beams=1))
original_model_text_output = tokenizer.decode(original_model_outputs[0], skip_special_tokens=True)

instruct_model_outputs = peft_model.generate(input_ids=input_ids, generation_config=GenerationConfig(max_new_tokens=200, num_beams=1))
instruct_model_text_output = tokenizer.decode(instruct_model_outputs[0], skip_special_tokens=True)


In [9]:
print("---------------------------------------------")
print(f'BASELINE HUMAN SUMMARY:\n{baseline_human_summary}')
print("---------------------------------------------")
print(f'ORIGINAL MODEL:\n{original_model_text_output}')
print("---------------------------------------------")
print(f'INSTRUCT MODEL:\n{instruct_model_text_output}')

---------------------------------------------
BASELINE HUMAN SUMMARY:
#Person1# teaches #Person2# how to upgrade software and hardware in #Person2#'s system.
---------------------------------------------
ORIGINAL MODEL:
How would you like to upgrade your computer?
---------------------------------------------
INSTRUCT MODEL:
#Person1# considers adding a painting program to #Person2#'s software. #Person2# thinks #Person2#'s need is a faster processor, more memory and a faster modem. #Person2# thinks #Person2#'s need is a CD-ROM drive.


<a name='6.2'></a>
### 6.2 - Quantitatively with ROUGE scores

In [28]:
dialogues = dataset['test'][0:10]['dialogue']
human_baseline_summaries = dataset['test'][0:10]['summary']

original_model_summaries = []
peft_model_summaries = []

import torch
# Check if CUDA is available
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

# Move your models to CUDA device
original_model = original_model.to(device)
peft_model = peft_model.to(device)

# Loop over your dialogues
for idx, dialogue in enumerate(dialogues):
    prompt = f"""
Summarize the following conversation.

{dialogue}

Summary: """

    input_ids = tokenizer(prompt, return_tensors="pt").input_ids.to(device)  # Move input tensor to CUDA

    # Generate summaries using models
    with torch.no_grad():  # Ensure no gradients are computed during inference
        original_model_outputs = original_model.generate(input_ids=input_ids, generation_config=GenerationConfig(max_new_tokens=200))
        original_model_text_output = tokenizer.decode(original_model_outputs[0], skip_special_tokens=True)

        peft_model_outputs = peft_model.generate(input_ids=input_ids, generation_config=GenerationConfig(max_new_tokens=200))
        peft_model_text_output = tokenizer.decode(peft_model_outputs[0], skip_special_tokens=True)

    original_model_summaries.append(original_model_text_output)
    peft_model_summaries.append(peft_model_text_output)

# Once inference is done, move your models back to CPU if needed
original_model = original_model.to('cpu')
peft_model = peft_model.to('cpu')

zipped_summaries = list(zip(human_baseline_summaries, original_model_summaries, peft_model_summaries))

df = pd.DataFrame(zipped_summaries, columns = ['human_baseline_summaries', 'original_model_summaries', 'peft_model_summaries'])
df

| | human_baseline_summaries | original_model_summaries | peft_model_summaries |
|---|---|---|---|
| 0 | Ms. Dawson helps #Person1# to write a memo to ... | Is this all correct? | #Person1# asks Ms. Dawson to take a dictation ... |
| 1 | In order to prevent employees from wasting tim... | Is this all correct? | #Person1# asks Ms. Dawson to take a dictation ... |
| 2 | Ms. Dawson takes a dictation for #Person1# abo... | Is this all correct? | #Person1# asks Ms. Dawson to take a dictation ... |
| 3 | #Person2# arrives late because of traffic jam.... | Talk to your boss. | #Person1# thinks #Person2#'s car is a lot less... |
| 4 | #Person2# decides to follow #Person1#'s sugges... | Talk to your boss. | #Person1# thinks #Person2#'s car is a lot less... |
| 5 | #Person2# complains to #Person1# about the tra... | Talk to your boss. | #Person1# thinks #Person2#'s car is a lot less... |
| 6 | #Person1# tells Kate that Masha and Hero get d... | Kate, you know, I'm not sure. | Kate and Masha are getting divorced. #Person1#... |
| 7 | #Person1# tells Kate that Masha and Hero are g... | Kate, you know, I'm not sure. | Kate and Masha are getting divorced. #Person1#... |
| 8 | #Person1# and Kate talk about the divorce betw... | Kate, you know, I'm not sure. | Kate and Masha are getting divorced. #Person1#... |
| 9 | #Person1# and Brian are at the birthday party ... | Brian, how are you? | Brian invites #Person2# to have a dance with B... |


In [29]:
rouge = evaluate.load('rouge')

original_model_results = rouge.compute(
    predictions=original_model_summaries,
    references=human_baseline_summaries[0:len(original_model_summaries)],
    use_aggregator=True,
    use_stemmer=True,
)


peft_model_results = rouge.compute(
    predictions=peft_model_summaries,
    references=human_baseline_summaries[0:len(peft_model_summaries)],
    use_aggregator=True,
    use_stemmer=True,
)

print('ORIGINAL MODEL:')
print(original_model_results)
print('PEFT MODEL:')
print(peft_model_results)



ORIGINAL MODEL:
{'rouge1': 0.07110365882105012, 'rouge2': 0.0, 'rougeL': 0.07008000855826943, 'rougeLsum': 0.07055675001327175}
PEFT MODEL:
{'rouge1': 0.3429774614636471, 'rouge2': 0.07247332595485587, 'rougeL': 0.23558768111323858, 'rougeLsum': 0.23565232991721374}
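
ROUGE-1 measures the unigram overlap between a generated summary and its reference. The following is a simplified sketch of that idea, assuming lowercase whitespace tokenization and no stemming. It is not the implementation behind `evaluate.load('rouge')` and will not reproduce the scores above exactly. `rouge1_f1` is an illustrative name and the sentences are made up.

```python
from collections import Counter


def rouge1_f1(prediction, reference):
    """Simplified ROUGE-1 F1: unigram overlap with whitespace tokenization."""
    pred_counts = Counter(prediction.lower().split())
    ref_counts = Counter(reference.lower().split())
    # Clipped overlap: a word counts at most as often as it appears in both texts.
    overlap = sum((pred_counts & ref_counts).values())
    if overlap == 0:
        return 0.0
    precision = overlap / sum(pred_counts.values())
    recall = overlap / sum(ref_counts.values())
    return 2 * precision * recall / (precision + recall)


reference = "the cat sat on the mat"
score_close = rouge1_f1("the cat sat on a mat", reference)
score_far = rouge1_f1("dogs bark loudly", reference)
print(score_close, score_far)
```

A prediction that shares most of its words with the reference scores close to 1, and one with no words in common scores 0. This is the kind of gap visible between the PEFT model and the original model in the results above.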
