# Project name: Text Summarization.
# LLM Model: google/flan-t5-base.
# Dataset: knkarthick/dialogsum from HuggingFaceHub

# Project outline:

### 1. Project objective
### 2. Dataset details
### 3. Prompt engineering
### 4. Full fine tuning
### 5. Fine tuning using PEFT technique
### 6. Evaluation metric
### 7. Conclusion


# 1. Project objective:

Summarization of conversational dialogue using Large language models and also includes understanding of fine tuning, Full-Fine tuning of LLMs and PEFT techniques to downstream LLMs for specific applications.

# 2. Dataset details:

These datasets contain face-to-face spoken dialogues that cover a wide range of daily-life topics, including schooling, work, medication, shopping, leisure, travel. Most conversations take place between friends, colleagues, and between service providers and customers.

Dataset contains 4 fields.

1. Dialogue: text of dialogue.
2. summary: human written summary of the dialogue.
3. topic: human written topic/one liner of the dialogue.
4. id: unique file id of an example.

There are total 14460 conversations(dialogue-summary) and split is as follows.

Train samples: 12460

Validation samples: 500

Test samples: 1500

# 3. Prompt engineering:

Looking for better dialogue summarization using different prompts.

1. Zero-shot inference
2. One-shot inference
3. Few-shot inference

In [2]:
from datasets import load_dataset
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer, GenerationConfig, TrainingArguments, Trainer
import torch
import torchdata
import evaluate
import time
import numpy as np
import pandas as pd

caused by: ['/opt/conda/lib/python3.10/site-packages/tensorflow_io/python/ops/libtensorflow_io_plugins.so: undefined symbol: _ZN3tsl6StatusC1EN10tensorflow5error4CodeESt17basic_string_viewIcSt11char_traitsIcEENS_14SourceLocationE']
caused by: ['/opt/conda/lib/python3.10/site-packages/tensorflow_io/python/ops/libtensorflow_io.so: undefined symbol: _ZTVN10tensorflow13GcsFileSystemE']


In [3]:
# Huggingface dataset id and model name.
Huggingface_dataset='knkarthick/dialogsum'
model_name='google/flan-t5-base'

dataset=load_dataset(Huggingface_dataset)
original_model=AutoModelForSeq2SeqLM.from_pretrained(model_name, torch_dtype=torch.bfloat16)
tokenizer=AutoTokenizer.from_pretrained(model_name)

# Since not having enough GPU memory, saving and running original, instruct and peft model from a local directory. 
output_dir='/kaggle/working/original_model'
original_model.save_pretrained(output_dir)
tokenizer.save_pretrained(output_dir)

Downloading readme:   0%|          | 0.00/4.56k [00:00<?, ?B/s]

Downloading and preparing dataset csv/knkarthick--dialogsum to /root/.cache/huggingface/datasets/knkarthick___csv/knkarthick--dialogsum-c8fac5d84cd35861/0.0.0/6954658bab30a358235fa864b05cf819af0e179325c740e4bc853bcc7ec513e1...


Downloading data files:   0%|          | 0/3 [00:00<?, ?it/s]

Downloading data:   0%|          | 0.00/11.3M [00:00<?, ?B/s]

Downloading data:   0%|          | 0.00/1.35M [00:00<?, ?B/s]

Downloading data:   0%|          | 0.00/442k [00:00<?, ?B/s]

Extracting data files:   0%|          | 0/3 [00:00<?, ?it/s]

Generating train split: 0 examples [00:00, ? examples/s]

Generating test split: 0 examples [00:00, ? examples/s]

Generating validation split: 0 examples [00:00, ? examples/s]

Dataset csv downloaded and prepared to /root/.cache/huggingface/datasets/knkarthick___csv/knkarthick--dialogsum-c8fac5d84cd35861/0.0.0/6954658bab30a358235fa864b05cf819af0e179325c740e4bc853bcc7ec513e1. Subsequent calls will reuse this data.


  0%|          | 0/3 [00:00<?, ?it/s]

Downloading (…)lve/main/config.json:   0%|          | 0.00/1.40k [00:00<?, ?B/s]

Downloading model.safetensors:   0%|          | 0.00/990M [00:00<?, ?B/s]

Downloading (…)neration_config.json:   0%|          | 0.00/147 [00:00<?, ?B/s]

Downloading (…)okenizer_config.json:   0%|          | 0.00/2.54k [00:00<?, ?B/s]

Downloading spiece.model:   0%|          | 0.00/792k [00:00<?, ?B/s]

Downloading (…)/main/tokenizer.json:   0%|          | 0.00/2.42M [00:00<?, ?B/s]

Downloading (…)cial_tokens_map.json:   0%|          | 0.00/2.20k [00:00<?, ?B/s]

('/kaggle/working/original_model/tokenizer_config.json',
 '/kaggle/working/original_model/special_tokens_map.json',
 '/kaggle/working/original_model/spiece.model',
 '/kaggle/working/original_model/added_tokens.json',
 '/kaggle/working/original_model/tokenizer.json')

In [8]:
def print_number_of_trainable_model_parameters(model):
    "Function returns total parameters and trainable parameters"
    
    trainable_model_params = 0
    all_model_params = 0
    for _, param in model.named_parameters():
        all_model_params += param.numel()
        if param.requires_grad:
            trainable_model_params += param.numel()
    
    return f"Trainable model parameters: {trainable_model_params}\nAll model parameters: {all_model_params}\nPercentage of trainable model parameters: {100 * trainable_model_params / all_model_params:.2f}%"

print(print_number_of_trainable_model_parameters(original_model))

Trainable model parameters: 247577856
All model parameters: 247577856
Percentage of trainable model parameters: 100.00%


### 3.1. Prompt engineering: Zero-shot inference.

In [9]:
def prompt_eng_zero_shot(index_number):
    " This function takes conversation/row index number and returns zero-shot inference output "
    
    outputs=[]
    
    summary=dataset['test'][index_number]['summary']
    dialogue=dataset['test'][index_number]['dialogue']

    prompt1 = f"""Summarize the following conversation.
                  {dialogue}
                  summary:
                  """

    prompt2 = f"""Extract the key takeaways from below conversation.
                  {dialogue}
                  summary:
                  """

    prompt3 = f"""Conclusion of below dialogue.
                  {dialogue}
                  summary:
                  """

    for prompt in [prompt1, prompt2, prompt3]:
        inputs=tokenizer(prompt, return_tensors='pt').input_ids
        output=tokenizer.decode(original_model.generate(inputs, max_new_tokens=200)[0], skip_special_tokens=True)
        outputs.append(output)
    
    return print(f"Prompt1 summary: {outputs[0]} \nPrompt2 summary: {outputs[1]}  \nPrompt3 summary: {outputs[2]}")

prompt_eng_zero_shot(index_number=300)

Prompt1 summary: The President of the United States is a man of faith in Trump. 
Prompt2 summary: The president of the United States is a man of faith in Trump.  
Prompt3 summary: Person1: I cannot imagine if Trump were to be our President again. Person2: I have nothing but faith in Trump.


### 3.2. Prompt engineering: One-shot inference

In [10]:
def prompt_eng_one_shot(index_number_full, index_number_summarize):
    " This function takes conversation/row index number and returns one-shot inference output "
    
    outputs=[]
    
    summary1=dataset['test'][index_number_full]['summary']
    dialogue1=dataset['test'][index_number_full]['dialogue']
    
    summary2=dataset['test'][index_number_summarize]['summary']
    dialogue2=dataset['test'][index_number_summarize]['dialogue']
    
    prompt1 = f"""Summarize the following conversation.
                  {dialogue1}
                  summary:{summary1}
                  
                  Summarize the following conversation.
                  {dialogue2}
                  summary:
                  """

    prompt2 = f"""Extract the key takeaways from below conversation.
                  {dialogue1}
                  summary:{summary1}
                  
                  Extract the key takeaways from below conversation.
                  {dialogue2}
                  summary:
                  """

    prompt3 = f"""Conclusion of below dialogue.
                  {dialogue1}
                  summary:{summary1}
                  
                  Conclusion of below dialogue
                  {dialogue2}
                  summary:
                  """

    for prompt in [prompt1, prompt2, prompt3]:
        inputs=tokenizer(prompt, return_tensors='pt').input_ids
        output=tokenizer.decode(original_model.generate(inputs, max_new_tokens=200)[0], skip_special_tokens=True)
        outputs.append(output)
    
    return print(f"Prompt1 summary: {outputs[0]} \nPrompt2 summary: {outputs[1]}  \nPrompt3 summary: {outputs[2]}")

prompt_eng_one_shot(index_number_full=200, index_number_summarize=300)

Token indices sequence length is longer than the specified maximum sequence length for this model (513 > 512). Running this sequence through the model will result in indexing errors


Prompt1 summary: #Person1: I am not sure if Trump is the right person to be our President again. #Person2: I am not sure if he is the right person to be our President again. #Person1: I am not sure if he is the right person to be our President again. #Person2: I am not sure if he is the right person to be our President again. 
Prompt2 summary: Person1: I cannot imagine if Trump were to be our President again.  
Prompt3 summary: Person1: I cannot imagine if Trump were to be our President again.


### 3.3. Prompt engineering: Few-shot inference

In few-shot inference, we have to supply more than one dialogue-summary pair hence considering 3 pairs.

If model is not giving better results for 5 or 6 dialogue-summary pairs then its better to go for fine tuning technique.

In [11]:
def prompt_eng_few_shot(index_number_full, index_number_summarize):
    
    " This function takes conversation/row index number and returns few-shot inference output "
    
    outputs=[]
    
    summary1=dataset['test'][index_number_full[0]]['summary']
    dialogue1=dataset['test'][index_number_full[0]]['dialogue']
    
    summary2=dataset['test'][index_number_full[1]]['summary']
    dialogue2=dataset['test'][index_number_full[1]]['dialogue']
    
    summary3=dataset['test'][index_number_full[2]]['summary']
    dialogue3=dataset['test'][index_number_full[2]]['dialogue']
    
    summary4=dataset['test'][index_number_summarize]['summary']
    dialogue4=dataset['test'][index_number_summarize]['dialogue']
    
    prompt1 = f"""Summarize the following conversation.
                  {dialogue1}
                  summary:{summary1}
                  
                  Summarize the following conversation.
                  {dialogue2}
                  summary:{summary2}
                  
                  Summarize the following conversation.
                  {dialogue3}
                  summary:{summary3}
                  
                  Summarize the following conversation.
                  {dialogue4}
                  summary:
                  """

    prompt2 = f"""Extract the key takeaways from below conversation.
                  {dialogue1}
                  summary:{summary1}
                  
                  Extract the key takeaways from below conversation.
                  {dialogue2}
                  summary:{summary2}
                  
                  Extract the key takeaways from below conversation.
                  {dialogue3}
                  summary:{summary3}
                  
                  Extract the key takeaways from below conversation.
                  {dialogue4}
                  summary:
                  """

    prompt3 = f"""Conclusion of below dialogue
                  {dialogue1}
                  summary:{summary1}
                  
                  Conclusion of below dialogue
                  {dialogue2}
                  summary:{summary2}
                  
                  Conclusion of below dialogue
                  {dialogue3}
                  summary:{summary3}
                  
                  Conclusion of below dialogue
                  {dialogue4}
                  summary:
                  """

    for prompt in [prompt1, prompt2, prompt3]:
        inputs=tokenizer(prompt, return_tensors='pt').input_ids
        output=tokenizer.decode(original_model.generate(inputs, max_new_tokens=200)[0], skip_special_tokens=True)
        outputs.append(output)
    
    return print(f"Prompt1 summary: {outputs[0]} \nPrompt2 summary: {outputs[1]}  \nPrompt3 summary: {outputs[2]}")

prompt_eng_few_shot(index_number_full=[100, 200, 400], index_number_summarize=300)

Prompt1 summary: The President of the United States is a man of faith in Trump. 
Prompt2 summary: The President of the United States is a man of faith in Trump.  
Prompt3 summary: Person1 is proud to say that he is our President, and he will be really happy if he could be re-elected.


### Takeway from prompt engineering technique:
Zero-shot: completions are single line and informative.

One-shot: more or less all completions are very similar for different prompts.

Few-shot: Completions are more or less similar to zero shot inferences.

# 4. Full fine tuning:
In Full fine tuning, understanding the behaviour of model after full fine tuning, which means there is a chances that most of the parameters will get updated hence could expect coherent and reliable completions.

There is enough resources(GPU) requirement involved in full fine tuning activity, hence considered small dataset to train model. There is already exist a fully trained original model hence that model will be used for inferences.

In [12]:
def tokenize_function(example):
    
    "This function helps to tokenize prompt-summary pairs and save their tensor id in dataset."
    
    start_prompt='Summarize the following conversation. \n\n'
    end_prompt='\n\nSummary:'
    prompt=[start_prompt+dialogue+end_prompt for dialogue in example['dialogue']]
    example['input_ids']=tokenizer(prompt, padding="max_length", truncation=True, return_tensors='pt').input_ids
    example['labels']=tokenizer(prompt, padding="max_length",truncation=True, return_tensors='pt').input_ids
    
    return example

tokenized_dataset=dataset.map(tokenize_function, batched=True)
tokenized_dataset=tokenized_dataset.remove_columns(['id', 'topic', 'dialogue', 'summary'])
tokenized_datasets = tokenized_dataset.filter(lambda example, index: index % 100 == 0, with_indices=True)

# Training dataset shape.
print(f"Shapes of the datasets:")
print(f"Training: {tokenized_datasets['train'].shape}")
print(f"Validation: {tokenized_datasets['validation'].shape}")
print(f"Test: {tokenized_datasets['test'].shape}")

Map:   0%|          | 0/12460 [00:00<?, ? examples/s]

Map:   0%|          | 0/1500 [00:00<?, ? examples/s]

Map:   0%|          | 0/500 [00:00<?, ? examples/s]

Filter:   0%|          | 0/12460 [00:00<?, ? examples/s]

Filter:   0%|          | 0/1500 [00:00<?, ? examples/s]

Filter:   0%|          | 0/500 [00:00<?, ? examples/s]

Shapes of the datasets:
Training: (125, 2)
Validation: (5, 2)
Test: (15, 2)


In [13]:
output_dir=f'./dialogue-summary-training-{str(int(time.time()))}'

# Defining training configurations.
training_args=TrainingArguments(output_dir=output_dir, learning_rate=1e-5, num_train_epochs=1, weight_decay=0.01, logging_steps=1, max_steps=1)
trainer=Trainer(model=original_model, args=training_args, train_dataset=tokenized_datasets['train'], eval_dataset=tokenized_datasets['validation'])

# original model training.
trainer.train()

# Saving trained model and tokenizer in a local directory.
model_path='/kaggle/working/full_fine_model'
trainer.model.save_pretrained(model_path)
tokenizer.save_pretrained(model_path)

[34m[1mwandb[0m: Logging into wandb.ai. (Learn how to deploy a W&B server locally: https://wandb.me/wandb-server)
[34m[1mwandb[0m: You can find your API key in your browser here: https://wandb.ai/authorize
[34m[1mwandb[0m: Paste an API key from your profile and hit enter, or press ctrl+c to quit:

  ········································


[34m[1mwandb[0m: Appending key for api.wandb.ai to your netrc file: /root/.netrc


Step,Training Loss
1,27.625


('/kaggle/working/full_fine_model/tokenizer_config.json',
 '/kaggle/working/full_fine_model/special_tokens_map.json',
 '/kaggle/working/full_fine_model/spiece.model',
 '/kaggle/working/full_fine_model/added_tokens.json',
 '/kaggle/working/full_fine_model/tokenizer.json')

In [15]:
def original_instruct_model_prediction(index): 
    " This model takes datset index as input and returns original and instruct model completions."
    
    # Below instruct model trained on 1500 datapoints and which is different than a model trained above. Above instruct model is trained on 125 datapoints, which really doesn't affect almost parameters.
    instruct_model=AutoModelForSeq2SeqLM.from_pretrained('/kaggle/input/instruct-model', torch_dtype=torch.bfloat16) 
    original_model=AutoModelForSeq2SeqLM.from_pretrained('/kaggle/working/original_model', torch_dtype=torch.bfloat16)
    
    dialogue = dataset['test'][index]['dialogue']
    human_baseline_summary = dataset['test'][index]['summary']

    prompt = f"""
    Summarize the following conversation.

    {dialogue}

    Summary:
    """

    input_ids = tokenizer(prompt, return_tensors="pt").input_ids

    original_model_outputs = original_model.generate(input_ids=input_ids, generation_config=GenerationConfig(max_new_tokens=200, num_beams=1))
    original_model_text_output = tokenizer.decode(original_model_outputs[0], skip_special_tokens=True)

    instruct_model_outputs = instruct_model.generate(input_ids=input_ids, generation_config=GenerationConfig(max_new_tokens=200, num_beams=1))
    instruct_model_text_output = tokenizer.decode(instruct_model_outputs[0], skip_special_tokens=True)
    
    return print(f"Human_baseline_summary:{human_baseline_summary} \n\nOriginal_model_output:{original_model_text_output} \n\nInstruct_model_output:{instruct_model_text_output}")

original_instruct_model_prediction(index=200)

Human_baseline_summary:#Person1# teaches #Person2# how to upgrade software and hardware in #Person2#'s system. 

Original_model_output:#Person1#: I'm thinking of upgrading my computer. 

Instruct_model_output:#Person1# suggests #Person2# upgrading #Person2#'s system, hardware, and CD-ROM drive. #Person2# thinks it's great.


# 5. Fine tuning using PEFT technique:

Advantage of using PEFT method is:

1. Requires less GPU memory.
2. Only few % of parameters will get updated which is most required factor.
3. Overcoming catastrophic forgetting.
4. PEFT model results are very low deviated from instruct model and very much evolved from original/base model.

In [16]:
from peft import LoraConfig, get_peft_model, TaskType, PeftConfig

lora_config=LoraConfig(r=32, lora_alpha=32, target_modules=['q','v'], lora_dropout=0.05, bias='none', task_type=TaskType.SEQ_2_SEQ_LM)

peft_model=get_peft_model(original_model, lora_config)

print(print_number_of_trainable_model_parameters(peft_model))

# Train PEFT model.
output_dir=f'./peft-dialogue-summary-training-{str(int(time.time()))}'

peft_traning_args=TrainingArguments(output_dir=output_dir, auto_find_batch_size=True, learning_rate=1e-3, num_train_epochs=1, logging_steps=1, max_steps=1)
peft_trainer=Trainer(model=peft_model, args=peft_traning_args, train_dataset=tokenized_datasets['train'])
peft_trainer.train()

# Saving peft_model in a local directory.
peft_model_path='/kaggle/working/peft_model'
peft_trainer.model.save_pretrained(peft_model_path)
tokenizer.save_pretrained(peft_model_path)

Trainable model parameters: 3538944
All model parameters: 251116800
Percentage of trainable model parameters: 1.41%


Step,Training Loss
1,27.625


('/kaggle/working/peft_model/tokenizer_config.json',
 '/kaggle/working/peft_model/special_tokens_map.json',
 '/kaggle/working/peft_model/spiece.model',
 '/kaggle/working/peft_model/added_tokens.json',
 '/kaggle/working/peft_model/tokenizer.json')

In [17]:
from peft import PeftModel, PeftConfig

# Load peft config for pre-trained checkpoint etc.
peft_model_id = "/kaggle/input/peft-model/peft_model"
config = PeftConfig.from_pretrained(peft_model_id)

# load base LLM model and tokenizer
model = AutoModelForSeq2SeqLM.from_pretrained(config.base_model_name_or_path,  torch_dtype=torch.bfloat16)
tokenizer = AutoTokenizer.from_pretrained(config.base_model_name_or_path)

# Load the Lora model
peft_model = PeftModel.from_pretrained(model, peft_model_id, is_trainable=False)
peft_model.eval()

print("Peft model loaded")

Peft model loaded


In [18]:
def original_instruct_peft_model_prediction(index): 
    " This model takes datset index as input and returns original, instruct and peft model completions."
    
    instruct_model=AutoModelForSeq2SeqLM.from_pretrained('/kaggle/input/instruct-model', torch_dtype=torch.bfloat16)
    original_model=AutoModelForSeq2SeqLM.from_pretrained('/kaggle/working/original_model', torch_dtype=torch.bfloat16)
    
    dialogue = dataset['test'][index]['dialogue']
    human_baseline_summary = dataset['test'][index]['summary']

    prompt = f"""
    Summarize the following conversation.

    {dialogue}

    Summary:
    """

    input_ids = tokenizer(prompt, return_tensors="pt").input_ids

    original_model_outputs = original_model.generate(input_ids=input_ids, generation_config=GenerationConfig(max_new_tokens=200, num_beams=1))
    original_model_text_output = tokenizer.decode(original_model_outputs[0], skip_special_tokens=True)

    instruct_model_outputs = instruct_model.generate(input_ids=input_ids, generation_config=GenerationConfig(max_new_tokens=200, num_beams=1))
    instruct_model_text_output = tokenizer.decode(instruct_model_outputs[0], skip_special_tokens=True)
    
    peft_model_outputs = peft_model.generate(input_ids=input_ids, generation_config=GenerationConfig(max_new_tokens=200, num_beams=1))
    peft_model_text_output = tokenizer.decode(peft_model_outputs[0], skip_special_tokens=True)
    
    return print(f"Human_baseline_summary:{human_baseline_summary} \n\nOriginal_model_output:{original_model_text_output} \n\nInstruct_model_output:{instruct_model_text_output} \n\nPeft_model_output:{peft_model_text_output}")

original_instruct_peft_model_prediction(index=200)

Human_baseline_summary:#Person1# teaches #Person2# how to upgrade software and hardware in #Person2#'s system. 

Original_model_output:#Person1#: I'm thinking of upgrading my computer. 

Instruct_model_output:#Person1# suggests #Person2# upgrading #Person2#'s system, hardware, and CD-ROM drive. #Person2# thinks it's great. 

Peft_model_output:#Person1# recommends adding a painting program to #Person2#'s software and upgrading hardware. #Person2# also wants to upgrade the hardware because it's outdated now.


# 6. Evaluation metric:

In [19]:
rouge = evaluate.load('rouge')

def generate_model_summaries(index):
    
    "This function takes input as index range and returns instruct and peft model completions."
    
    instruct_model=AutoModelForSeq2SeqLM.from_pretrained('/kaggle/input/instruct-model', torch_dtype=torch.bfloat16)
    original_model=AutoModelForSeq2SeqLM.from_pretrained('/kaggle/working/original_model', torch_dtype=torch.bfloat16)
    peft_model = PeftModel.from_pretrained(model, peft_model_id, is_trainable=False)
    
    dialogues = dataset['test'][index[0]:index[1]]['dialogue']
    human_baseline_summaries = dataset['test'][index[0]:index[1]]['summary']

    original_model_summaries = []
    instruct_model_summaries = []
    peft_model_summaries = []

    for idx, dialogue in enumerate(dialogues):
        
        prompt = f"""
                 Summarize the following conversation.

                {dialogue}

                Summary: """

        input_ids = tokenizer(prompt, return_tensors="pt").input_ids

        human_baseline_text_output = human_baseline_summaries[idx]
        original_model_outputs = original_model.generate(input_ids=input_ids, generation_config=GenerationConfig(max_new_tokens=200))
        original_model_text_output = tokenizer.decode(original_model_outputs[0], skip_special_tokens=True)

        instruct_model_outputs = instruct_model.generate(input_ids=input_ids, generation_config=GenerationConfig(max_new_tokens=200))
        instruct_model_text_output = tokenizer.decode(instruct_model_outputs[0], skip_special_tokens=True)

        peft_model_outputs = peft_model.generate(input_ids=input_ids, generation_config=GenerationConfig(max_new_tokens=200))
        peft_model_text_output = tokenizer.decode(peft_model_outputs[0], skip_special_tokens=True)

        original_model_summaries.append(original_model_text_output)
        instruct_model_summaries.append(instruct_model_text_output)
        peft_model_summaries.append(peft_model_text_output)
        zipped_summaries = list(zip(human_baseline_summaries, original_model_summaries, instruct_model_summaries, peft_model_summaries))
    
    original_model_results = rouge.compute(predictions=original_model_summaries, references=human_baseline_summaries[0:len(original_model_summaries)], use_aggregator=True, use_stemmer=True)
    instruct_model_results = rouge.compute(predictions=instruct_model_summaries, references=human_baseline_summaries[0:len(instruct_model_summaries)], use_aggregator=True, use_stemmer=True)
    peft_model_results = rouge.compute(predictions=peft_model_summaries, references=human_baseline_summaries[0:len(peft_model_summaries)], use_aggregator=True, use_stemmer=True)
    
    return zipped_summaries, original_model_results, instruct_model_results, peft_model_results
    
zipped_summaries, original_model_results, instruct_model_results, peft_model_results = generate_model_summaries([0,100])    

df = pd.DataFrame(zipped_summaries, columns = ['human_baseline_summaries', 'original_model_summaries', 'instruct_model_summaries', 'peft_model_summaries'])

print(f'ORIGINAL MODEL:{original_model_results} \n\nINSTRUCT MODEL:{instruct_model_results} \n\nPEFT MODEL:{peft_model_results}')

Downloading builder script:   0%|          | 0.00/6.27k [00:00<?, ?B/s]

ORIGINAL MODEL:{'rouge1': 0.22008050937450813, 'rouge2': 0.060234779819727094, 'rougeL': 0.1905827047052721, 'rougeLsum': 0.19015343359756315} 

INSTRUCT MODEL:{'rouge1': 0.3840066445307293, 'rouge2': 0.14574678121958184, 'rougeL': 0.3057682999785146, 'rougeLsum': 0.3057114144860572} 

PEFT MODEL:{'rouge1': 0.3966762315764254, 'rouge2': 0.14731879414729326, 'rougeL': 0.3149559315358753, 'rougeLsum': 0.31460735226341496}


In [20]:
print("Percentage improvement of PEFT MODEL over ORIGINAL MODEL")

improvement = (np.array(list(peft_model_results.values())) - np.array(list(original_model_results.values())))
for key, value in zip(peft_model_results.keys(), improvement):
    print(f'{key}: {value*100:.2f}%')
    
print("Percentage improvement of PEFT MODEL over INSTRUCT MODEL")

improvement = (np.array(list(peft_model_results.values())) - np.array(list(instruct_model_results.values())))
for key, value in zip(peft_model_results.keys(), improvement):
    print(f'{key}: {value*100:.2f}%')

Percentage improvement of PEFT MODEL over ORIGINAL MODEL
rouge1: 17.66%
rouge2: 8.71%
rougeL: 12.44%
rougeLsum: 12.45%
Percentage improvement of PEFT MODEL over INSTRUCT MODEL
rouge1: 1.27%
rouge2: 0.16%
rougeL: 0.92%
rougeLsum: 0.89%


# 7. Conclusion:

1. From above improvement values, there is no big difference in performance between instruct and peft model considering several other advantages(GPU memory and etc) of peft_model. Hence PEFT model will be our final model.

2. Most cases PEFT model gives good performance in comparison with Instruct model(little compromise in performance) with good amount of samples/datapoints(~1500).

In [None]:
### Note: There are some computational challenges like GPU and memory constriants may lead model to less accurate. 