<a href="https://colab.research.google.com/github/Yazanjian/text-summarization/blob/master/PEFT_Fine_Tuning.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Fine-Tune an LLM for Dialogue Summarization

In this notebook, we fine-tune a pre-trained generative model for text summarization. The model is FLAN-T5, available on Hugging Face, which can already summarize text. We use a Parameter-Efficient Fine-Tuning (PEFT) approach, specifically Low-Rank Adaptation (LoRA), with the aim of improving the generated summaries.

We evaluate the model with ROUGE metrics.

In [3]:
!pip install --upgrade huggingface-cli


Collecting huggingface-cli
  Downloading huggingface_cli-0.1-py3-none-any.whl (1.0 kB)
Installing collected packages: huggingface-cli
Successfully installed huggingface-cli-0.1


In [4]:
pip install datasets --quiet

In [5]:
pip install evaluate rouge_score loralib peft --quiet

  Preparing metadata (setup.py) ... done
  Building wheel for rouge_score (setup.py) ... done


In [6]:
from datasets import load_dataset
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer, GenerationConfig, TrainingArguments, Trainer
import torch
import time
import evaluate
import pandas as pd
import numpy as np

In [7]:
from google.colab import drive
drive.mount('/content/drive')

Mounted at /content/drive


## Load the Dataset and the LLM.
In this section we download the DialogSum dataset and load the pre-trained FLAN-T5 model.


In [8]:
huggingface_dataset = 'knkarthick/dialogsum'
dataset = load_dataset(huggingface_dataset)
dataset

The secret `HF_TOKEN` does not exist in your Colab secrets.
To authenticate with the Hugging Face Hub, create a token in your settings tab (https://huggingface.co/settings/tokens), set it as secret in your Google Colab and restart your session.
You will be able to reuse this secret in all of your notebooks.
Please note that authentication is recommended but still optional to access public models or datasets.


Downloading readme:   0%|          | 0.00/4.65k [00:00<?, ?B/s]

Downloading data:   0%|          | 0.00/11.3M [00:00<?, ?B/s]

Downloading data:   0%|          | 0.00/442k [00:00<?, ?B/s]

Downloading data:   0%|          | 0.00/1.35M [00:00<?, ?B/s]

Generating train split:   0%|          | 0/12460 [00:00<?, ? examples/s]

Generating validation split:   0%|          | 0/500 [00:00<?, ? examples/s]

Generating test split:   0%|          | 0/1500 [00:00<?, ? examples/s]

DatasetDict({
    train: Dataset({
        features: ['id', 'dialogue', 'summary', 'topic'],
        num_rows: 12460
    })
    validation: Dataset({
        features: ['id', 'dialogue', 'summary', 'topic'],
        num_rows: 500
    })
    test: Dataset({
        features: ['id', 'dialogue', 'summary', 'topic'],
        num_rows: 1500
    })
})

In [9]:
dataset['train'][0]['dialogue']

"#Person1#: Hi, Mr. Smith. I'm Doctor Hawkins. Why are you here today?\n#Person2#: I found it would be a good idea to get a check-up.\n#Person1#: Yes, well, you haven't had one for 5 years. You should have one every year.\n#Person2#: I know. I figure as long as there is nothing wrong, why go see the doctor?\n#Person1#: Well, the best way to avoid serious illnesses is to find out about them early. So try to come at least once a year for your own good.\n#Person2#: Ok.\n#Person1#: Let me see here. Your eyes and ears look fine. Take a deep breath, please. Do you smoke, Mr. Smith?\n#Person2#: Yes.\n#Person1#: Smoking is the leading cause of lung cancer and heart disease, you know. You really should quit.\n#Person2#: I've tried hundreds of times, but I just can't seem to kick the habit.\n#Person1#: Well, we have classes and some medications that might help. I'll give you more information before you leave.\n#Person2#: Ok, thanks doctor."

In [10]:
model_id = "google/flan-t5-base"
tokenizer = AutoTokenizer.from_pretrained(model_id)
original_model = AutoModelForSeq2SeqLM.from_pretrained(model_id, torch_dtype=torch.bfloat16)




tokenizer_config.json:   0%|          | 0.00/2.54k [00:00<?, ?B/s]

spiece.model:   0%|          | 0.00/792k [00:00<?, ?B/s]

tokenizer.json:   0%|          | 0.00/2.42M [00:00<?, ?B/s]

special_tokens_map.json:   0%|          | 0.00/2.20k [00:00<?, ?B/s]

config.json:   0%|          | 0.00/1.40k [00:00<?, ?B/s]

model.safetensors:   0%|          | 0.00/990M [00:00<?, ?B/s]

generation_config.json:   0%|          | 0.00/147 [00:00<?, ?B/s]

In [11]:
def print_number_of_trainable_parameters(model):
  trainable_model_params = 0
  all_model_params = 0
  for _, param in model.named_parameters():
    all_model_params += param.numel()
    if param.requires_grad:
      trainable_model_params += param.numel()
  return(f"Trainable model params: {trainable_model_params}.\nAll model params: {all_model_params}.\nPercentage of trainable params is: {(float(trainable_model_params)/float(all_model_params))*100}%")

In [12]:
print(print_number_of_trainable_parameters(original_model))

Trainable model params: 247577856.
All model params: 247577856.
Percentage of trainable params is: 100.0%


In [13]:
def tokenize(sample):
  start_prompt = "Summarize the following conversation.\n\n"
  end_prompt = "\n\nSummary:"
  prompt = [start_prompt + dialogue + end_prompt for dialogue in sample['dialogue']]
  sample['input_ids'] = tokenizer(prompt, padding="max_length", truncation=True,return_tensors="pt").input_ids
  sample['labels'] =  tokenizer(sample['summary'], padding="max_length", truncation=True,return_tensors="pt").input_ids

  return sample
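The `tokenize` function above pads `labels` to the maximum length, so padding token ids become part of the training targets. A common refinement, which this notebook does not apply, is to replace label padding with `-100`, the default `ignore_index` of PyTorch's cross-entropy loss, so that padded positions do not contribute to the loss. The sketch below shows the idea on plain Python lists. `mask_label_padding` is a hypothetical helper introduced for illustration, the token ids are made up, and the pad id of `0` is the T5 tokenizer's pad token id.

```python
IGNORE_INDEX = -100  # default ignore_index of torch.nn.CrossEntropyLoss


def mask_label_padding(label_ids, pad_token_id):
    """Replace padding ids in each label sequence with IGNORE_INDEX.

    Hypothetical helper for illustration; it is not part of the notebook.
    """
    return [
        [IGNORE_INDEX if token == pad_token_id else token for token in sequence]
        for sequence in label_ids
    ]


# Toy batch of two label sequences, padded with 0 (illustrative ids only).
padded_labels = [[71, 1040, 1, 0, 0], [363, 1, 0, 0, 0]]
masked_labels = mask_label_padding(padded_labels, pad_token_id=0)
print(masked_labels)
```

If this were adopted, the masking would have to be applied to `sample['labels']` only, never to `sample['input_ids']`.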

In [14]:
tokenized_dataset = dataset.map(tokenize, batched=True)
tokenized_dataset = tokenized_dataset.remove_columns(['id', 'dialogue', 'summary', 'topic'])

Map:   0%|          | 0/12460 [00:00<?, ? examples/s]

Map:   0%|          | 0/500 [00:00<?, ? examples/s]

Map:   0%|          | 0/1500 [00:00<?, ? examples/s]

In [15]:
dataset['validation']['summary'][0]

'#Person2# has trouble breathing. The doctor asks #Person2# about it and will send #Person2# to a pulmonary specialist.'

In [16]:
tokenizer.decode(tokenized_dataset['validation']['labels'][0], skip_special_tokens=True)

'#Person2# has trouble breathing. The doctor asks #Person2# about it and will send #Person2# to a pulmonary specialist.'

## Use PEFT to fine-tune the model.

Set up the LoRA configuration, then build the PEFT model from the original FLAN-T5 model.

In [17]:
from peft import LoraConfig, get_peft_model, TaskType

lora_config = LoraConfig(
    r=32,
    lora_alpha=32,
    target_modules=["q", "v"],
    lora_dropout=0.05,
    bias="none",
    task_type=TaskType.SEQ_2_SEQ_LM #Flan-T5
)
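To make the roles of `r` and `lora_alpha` in the configuration above more concrete, here is a minimal NumPy sketch of a single LoRA-adapted linear layer. It is an illustration under stated assumptions, not PEFT's implementation: the dimension of 768 is assumed for the `q` and `v` projections of `flan-t5-base`, `A` is drawn from a small Gaussian (PEFT uses a different random initialization by default), and dropout is omitted. The names `lora_forward`, `W`, `A`, and `B` are chosen for this example.

```python
import numpy as np

rng = np.random.default_rng(0)

d_in, d_out = 768, 768       # assumed projection size for flan-t5-base
r, lora_alpha = 32, 32       # values from the LoraConfig above
scaling = lora_alpha / r     # the low-rank update is scaled by alpha / r

W = rng.normal(size=(d_out, d_in))       # frozen pre-trained weight
A = rng.normal(size=(r, d_in)) * 0.01    # trainable, random init (simplified)
B = np.zeros((d_out, r))                 # trainable, zero init


def lora_forward(x, W, A, B, scaling):
    """Frozen projection plus the scaled low-rank update B @ A."""
    return x @ W.T + scaling * (x @ A.T @ B.T)


x = rng.normal(size=(2, d_in))
base_output = x @ W.T
lora_output = lora_forward(x, W, A, B, scaling)

# With B initialised to zero, the adapted layer starts out identical to the base layer.
print(np.allclose(base_output, lora_output))
print("trainable values in this layer:", A.size + B.size)
```

Because `B` starts at zero, training begins from the behaviour of the pre-trained model and only `A` and `B` receive gradient updates.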

In [None]:
peft_model = get_peft_model(original_model, lora_config)
print(print_number_of_trainable_parameters(peft_model))

Trainable model params: 3538944.
All model params: 251116800.
Percentage of trainable params is: 1.4092820552029972%


Only about 1.4% of the PEFT model's parameters will be trained, which significantly reduces the time and resources needed.
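The trainable-parameter count printed above can be reproduced with a short calculation. The sketch below assumes the `flan-t5-base` architecture (12 encoder layers, 12 decoder layers, and a hidden and attention width of 768); these figures are not stated in the notebook. Each encoder layer has one attention block and each decoder layer has two (self-attention and cross-attention), and LoRA is attached to the `q` and `v` projections of each.

```python
d_model = 768            # assumed hidden size of flan-t5-base
inner_dim = 768          # assumed attention projection width
r = 32                   # LoRA rank from the LoraConfig
n_encoder_layers = 12    # assumed
n_decoder_layers = 12    # assumed

# Each adapted projection adds A (r x d_model) and B (inner_dim x r).
params_per_module = r * d_model + inner_dim * r

# q and v in encoder self-attention: 2 modules per encoder layer.
# q and v in decoder self-attention and cross-attention: 4 modules per decoder layer.
n_modules = n_encoder_layers * 2 + n_decoder_layers * 4

lora_params = n_modules * params_per_module
base_params = 247_577_856            # "All model params" of the original model
total_params = base_params + lora_params

print(f"LoRA parameters:  {lora_params}")
print(f"Total parameters: {total_params}")
print(f"Trainable share:  {100 * lora_params / total_params:.2f}%")
```

Under these assumptions the calculation matches the 3,538,944 trainable parameters and 251,116,800 total parameters reported by the notebook.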

In [None]:
output_dir = f'/content/drive/MyDrive/PEFT-FLAN-T5/peft-dialogue-summary-training-{str(int(time.time()))}'


In [None]:
peft_training_args = TrainingArguments(
    output_dir=output_dir,
    auto_find_batch_size=True,
    learning_rate=1e-3,
    num_train_epochs=5,
    logging_steps=1,
    max_steps=100
)

peft_trainer = Trainer(
    model=peft_model,
    args=peft_training_args,
    train_dataset=tokenized_dataset['train']
)

max_steps is given, it will override any value given in num_train_epochs


In [None]:
peft_trainer.train()

| Step | Training Loss |
|---|---|
| 1 | 49.25 |
| 2 | 46.75 |
| 3 | 44.75 |
| 4 | 38.25 |
| 5 | 34.25 |
| 6 | 30.0 |
| 7 | 27.25 |
| 8 | 25.5 |
| 9 | 24.0 |
| 10 | 20.875 |


TrainOutput(global_step=100, training_loss=5.72296875, metrics={'train_runtime': 234.0494, 'train_samples_per_second': 3.418, 'train_steps_per_second': 0.427, 'total_flos': 556503190732800.0, 'train_loss': 5.72296875, 'epoch': 0.06418485237483953})
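The `epoch` value in the output above shows how little of the training set the run covered. The arithmetic below is a sketch that assumes an effective batch size of 8 on a single device with no gradient accumulation. The batch size is not set explicitly in the notebook; 8 is the `TrainingArguments` default for `per_device_train_batch_size` and is consistent with the logged throughput (3.418 samples per second divided by 0.427 steps per second is roughly 8).

```python
import math

num_train_samples = 12460   # size of the DialogSum train split
batch_size = 8              # assumption, see the note above
max_steps = 100             # from TrainingArguments

steps_per_epoch = math.ceil(num_train_samples / batch_size)
epochs_completed = max_steps / steps_per_epoch
samples_seen = max_steps * batch_size

print(f"Steps per epoch:  {steps_per_epoch}")
print(f"Epochs completed: {epochs_completed:.4f}")
print(f"Samples seen:     {samples_seen} of {num_train_samples}")
```

Under this assumption, `max_steps=100` stops training after about 6% of a single epoch, far short of the five epochs requested through `num_train_epochs`.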

In [None]:
peft_trainer.model.save_pretrained(output_dir)
tokenizer.save_pretrained(output_dir)



('/content/drive/MyDrive/PEFT-FLAN-T5/peft-dialogue-summary-training-1715442013/tokenizer_config.json',
 '/content/drive/MyDrive/PEFT-FLAN-T5/peft-dialogue-summary-training-1715442013/special_tokens_map.json',
 '/content/drive/MyDrive/PEFT-FLAN-T5/peft-dialogue-summary-training-1715442013/spiece.model',
 '/content/drive/MyDrive/PEFT-FLAN-T5/peft-dialogue-summary-training-1715442013/added_tokens.json',
 '/content/drive/MyDrive/PEFT-FLAN-T5/peft-dialogue-summary-training-1715442013/tokenizer.json')

## Load the model after training.

Most of the time, we need to load the saved PEFT model before we can test it. The following cells do exactly that.

In [18]:
from peft import PeftModel, PeftConfig

model_id = "google/flan-t5-base"
tokenizer = AutoTokenizer.from_pretrained(model_id)
peft_base_model = AutoModelForSeq2SeqLM.from_pretrained(model_id, torch_dtype=torch.bfloat16)
output_dir = '/content/drive/MyDrive/PEFT-FLAN-T5/peft-dialogue-summary-training-1715442013'

peft_model_saved = PeftModel.from_pretrained(peft_base_model, output_dir,  torch_dtype=torch.bfloat16, is_trainable=False)

In [19]:
print(print_number_of_trainable_parameters(peft_model_saved))

Trainable model params: 0.
All model params: 251116800.
Percentage of trainable params is: 0.0%


## Evaluate the model.
In this section we evaluate the model qualitatively, by reading a sample summary, and quantitatively, using ROUGE.

### Human Evaluation

In [20]:
random_index = 50
dialogue = dataset['test'][random_index]['dialogue']
summary =  dataset['test'][random_index]['summary']

prompt = f""" Summarize the following conversation.

{dialogue}

Summary: """

input_ids = tokenizer(prompt, return_tensors="pt").input_ids

original_model_gen = original_model.generate(input_ids=input_ids, generation_config=GenerationConfig(max_new_tokens=200, num_beams=1))
original_model_txt = tokenizer.batch_decode(original_model_gen, skip_special_tokens=True)

peft_model_gen = peft_model_saved.generate(input_ids=input_ids, generation_config=GenerationConfig(max_new_tokens=200, num_beams=1))
peft_model_txt = tokenizer.batch_decode(peft_model_gen, skip_special_tokens=True)

In [21]:
print(f"Base Human Summary:\n{dataset['test'][random_index]['summary']}")
print("======================================================")
print(f"Baseline model summary:\n{original_model_txt}")
print("======================================================")
print(f"PEFT model summary:\n{peft_model_txt}")

Base Human Summary:
#Person1# is about to make a prank. #Person2# thinks it's cruel at first but then joins.
Baseline model summary:
['#Person1#: Okay.']
PEFT model summary:
["@Person1#: Yeah. Just pull on this strip. #Person1#: Yeah. But it's fun."]


### ROUGE Evaluation

In [33]:
dialogues = dataset['test'][0:10]['dialogue']
human_original_txts =  dataset['test'][0:10]['summary']

baseline_model_txts = []
tuned_model_txts = []

for i, dialogue in enumerate(dialogues):
  prompt = f""" Summarize the following conversation.

{dialogue}

Summary: """

  original_input_ids_ = tokenizer(prompt, return_tensors="pt").input_ids
  original_model_gen = original_model.generate(input_ids=original_input_ids_, generation_config=GenerationConfig(max_new_tokens=200, num_beams=1))
  original_model_txt = tokenizer.batch_decode(original_model_gen, skip_special_tokens=True)

  tuned_input_ids = tokenizer(prompt, return_tensors="pt").input_ids
  peft_model_gen = peft_model_saved.generate(input_ids=tuned_input_ids, generation_config=GenerationConfig(max_new_tokens=200, num_beams=1))
  peft_model_txt = tokenizer.batch_decode(peft_model_gen, skip_special_tokens=True)

  baseline_model_txts.append(original_model_txt[0])
  tuned_model_txts.append(peft_model_txt[0])




In [39]:
all_txts = list(zip(human_original_txts, baseline_model_txts, tuned_model_txts))

df = pd.DataFrame(all_txts, columns=['Human Summary', 'Original Model', 'PEFT Model'])
df

|  | Human Summary | Original Model | PEFT Model |
|---|---|---|---|
| 0 | Ms. Dawson helps #Person1# to write a memo to ... | #Person1#: I need to take a dictation for you. | #Person1#: Ms. Dawson, I need you to take a di... |
| 1 | In order to prevent employees from wasting tim... | #Person1#: I need to take a dictation for you. | #Person1#: Ms. Dawson, I need you to take a di... |
| 2 | Ms. Dawson takes a dictation for #Person1# abo... | #Person1#: I need to take a dictation for you. | #Person1#: Ms. Dawson, I need you to take a di... |
| 3 | #Person2# arrives late because of traffic jam.... | The traffic jam at the Carrefour intersection ... | @Person1#: I got stuck in traffic again. |
| 4 | #Person2# decides to follow #Person1#'s sugges... | The traffic jam at the Carrefour intersection ... | @Person1#: I got stuck in traffic again. |
| 5 | #Person2# complains to #Person1# about the tra... | The traffic jam at the Carrefour intersection ... | @Person1#: I got stuck in traffic again. |
| 6 | #Person1# tells Kate that Masha and Hero get d... | Masha and Hero are getting divorced. | #Person1#: Kate, you never believe what's happ... |
| 7 | #Person1# tells Kate that Masha and Hero are g... | Masha and Hero are getting divorced. | #Person1#: Kate, you never believe what's happ... |
| 8 | #Person1# and Kate talk about the divorce betw... | Masha and Hero are getting divorced. | #Person1#: Kate, you never believe what's happ... |
| 9 | #Person1# and Brian are at the birthday party ... | #Person1#: Happy birthday, Brian. #Person2#: I... | @Person1#: I'm so happy you remember, please c... |




In [36]:
rouge = evaluate.load('rouge')

original_model_results = rouge.compute(predictions=baseline_model_txts, references=human_original_txts, use_aggregator=True, use_stemmer=True)
tuned_model_results = rouge.compute(predictions=tuned_model_txts, references=human_original_txts, use_aggregator=True, use_stemmer=True)


print(f"Baseline model:\n{original_model_results}")
print("======================================================")
print(f"PEFT model:\n{tuned_model_results}")

Baseline model:
{'rouge1': 0.2434679951690821, 'rouge2': 0.11633304916169365, 'rougeL': 0.22016350798959497, 'rougeLsum': 0.21975466059705184}
PEFT model:
{'rouge1': 0.19334273834273832, 'rouge2': 0.03031727379553466, 'rougeL': 0.1794479506979507, 'rougeLsum': 0.17840170314308246}
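To help interpret the scores above, here is a simplified sketch of how a ROUGE-1 score is computed for one prediction and one reference: it counts overlapping unigrams and combines precision and recall into an F-measure, which is the kind of value the `rouge` metric reports. This is not the `rouge_score` implementation used by `evaluate`. It skips stemming, `rouge1_scores` is a hypothetical helper, and the example sentences are made up.

```python
import re
from collections import Counter


def rouge1_scores(prediction, reference):
    """Return (precision, recall, f1) based on clipped unigram overlap."""
    pred_tokens = re.findall(r"[a-z0-9]+", prediction.lower())
    ref_tokens = re.findall(r"[a-z0-9]+", reference.lower())

    # Counter intersection keeps the minimum count of each shared token.
    overlap = sum((Counter(pred_tokens) & Counter(ref_tokens)).values())
    if overlap == 0:
        return 0.0, 0.0, 0.0

    precision = overlap / len(pred_tokens)
    recall = overlap / len(ref_tokens)
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


precision, recall, f1 = rouge1_scores("the cat sat", "the cat sat on the mat")
print(f"precision={precision:.3f} recall={recall:.3f} f1={f1:.3f}")
```

A short prediction that copies a few words from the reference can reach high precision but low recall, which is one reason the single-utterance outputs in the table above score poorly against full-sentence human summaries.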


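The results above show that the PEFT model scores below the baseline on every ROUGE metric for these 10 test dialogues (for example, a ROUGE-1 of 0.193 versus 0.243). The current PEFT model is therefore not performing as required, most likely because of insufficient training: the run stopped after 100 steps, about 0.06 of one epoch.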