<a href="https://colab.research.google.com/github/apa017/hugging-face-learn/blob/main/04_FineTuning_PEFT.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Introduction to Parameter Efficient Fine Tuning (PEFT)

In this notebook we fine-tune a BART model for dialogue summarization.

Specifically, we use **LoRA (Low-Rank Adaptation)**, a form of **parameter-efficient fine-tuning** (PEFT): the pretrained weights are frozen, and only small low-rank matrices added to selected layers are trained.

To do this we use Hugging Face's `peft` library, which works on top of `transformers` models.
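To make the idea concrete, here is a minimal NumPy sketch of what a LoRA-adapted linear layer computes, assuming the standard formulation h = W x + (alpha / r) · B A x, where W is frozen and only A and B are trained. It illustrates the technique and is not how the `peft` library implements it internally. The dimensions (1024, r = 32, alpha = 32) mirror the configuration used later in this notebook; the helper name `lora_forward` is hypothetical.

```python
import numpy as np

def lora_forward(x, W, A, B, alpha, r):
    """Compute W x + (alpha / r) * B (A x) for inputs stored as columns of x."""
    return W @ x + (alpha / r) * (B @ (A @ x))

rng = np.random.default_rng(0)
d_in, d_out, r, alpha = 1024, 1024, 32, 32

W = rng.standard_normal((d_out, d_in))      # frozen pretrained weight
A = rng.standard_normal((r, d_in)) * 0.01   # trainable, small random init
B = np.zeros((d_out, r))                    # trainable, zero init

x = rng.standard_normal((d_in, 4))          # 4 input vectors as columns
h = lora_forward(x, W, A, B, alpha, r)

print(h.shape)                              # (1024, 4)
print(np.allclose(h, W @ x))                # True: B starts at zero, so the layer starts unchanged

full = d_in * d_out                         # parameters in the frozen weight
lora = r * (d_in + d_out)                   # parameters in A and B together
print(full, lora)                           # 1048576 65536
```

Because B starts at zero, the adapted model initially behaves exactly like the pretrained one, and training only has to learn 65,536 numbers per layer instead of about a million.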

<br>

### WARNING

Online tools like [Google Colab](https://colab.research.google.com/) provide GPU runtimes in addition to CPU ones.

Running the fine-tuning locally on a CPU takes a lot of time and is computationally intensive.

For this reason, it is recommended to run this notebook in the cloud or on a machine with a GPU.
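Before starting, it can be worth checking whether a GPU is actually visible. The helper below (`pick_device`, a hypothetical name) is a small sketch that falls back to CPU when PyTorch is not installed or no CUDA device is found. On Colab, a GPU can be enabled from *Runtime > Change runtime type*. `Trainer` uses an available GPU automatically, so this is only a diagnostic.

```python
import importlib.util

def pick_device() -> str:
    """Return "cuda" if PyTorch is installed and sees a GPU, otherwise "cpu"."""
    if importlib.util.find_spec("torch") is None:
        return "cpu"
    import torch
    return "cuda" if torch.cuda.is_available() else "cpu"

device = pick_device()
print(f"Running on: {device}")
```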

<hr>

## Notebook Settings

Install the following packages on this runtime.

In [None]:
!pip install "transformers[torch]" datasets evaluate
!pip install py7zr    # needed to decompress the SAMSum dataset archive
!pip install peft     # the library we use to perform parameter-efficient fine-tuning



## Load Model & Dataset

In this notebook we start from our previously fine-tuned BART model (`Kain17/bart-cnn-samsum-finetuned`), but any other pretrained sequence-to-sequence model can be used.

In [None]:
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM
from datasets import load_dataset

In [None]:
# load tokenizer and model
tokenizer = AutoTokenizer.from_pretrained("Kain17/bart-cnn-samsum-finetuned")
model = AutoModelForSeq2SeqLM.from_pretrained("Kain17/bart-cnn-samsum-finetuned")



In [None]:
# load dataset
dataset = load_dataset("samsum")
dataset

DatasetDict({
    train: Dataset({
        features: ['id', 'dialogue', 'summary'],
        num_rows: 14732
    })
    test: Dataset({
        features: ['id', 'dialogue', 'summary'],
        num_rows: 819
    })
    validation: Dataset({
        features: ['id', 'dialogue', 'summary'],
        num_rows: 818
    })
})

## Prepare Dataset

We turn each dialogue into a prompt and tokenize prompts and summaries so the model can ingest them.

In [None]:
# helper function
def tokenize_inputs(example):

  ## Create a structure for the prompt
  start_prompt = "Summarize the following conversation:\n\n"
  end_prompt = "\n\nSummary: "

  ## Construct the prompt
  prompt = [start_prompt + dialogue + end_prompt for dialogue in example["dialogue"]]

  ## Tokenize the prompt
  example['input_ids'] = tokenizer(prompt,
                                   padding="max_length",
                                   truncation=True,
                                   return_tensors="pt",
                                   max_length=1024).input_ids

  ## Tokenize the label
  example['labels'] = tokenizer(example["summary"],
                                padding="max_length",
                                truncation=True,
                                return_tensors="pt",
                                max_length=1024).input_ids

  ## return tokenized example
  return example



### EXECUTION AND SELECTION OF EVERY 100TH EXAMPLE ###

# Use the end-of-sequence (eos) token as the padding token.
# Note: BART's tokenizer already defines a dedicated `<pad>` token; this line overrides it.
tokenizer.pad_token = tokenizer.eos_token

# Apply the `tokenize_inputs` function to each dataset example.
# The `batched=True` argument ensures that the function is applied to batches of examples at a time (faster for large datasets).
tokenized_datasets = dataset.map(tokenize_inputs, batched=True)

# Remove unnecessary columns from the dataset, keeping only the tokenized data.
tokenized_datasets = tokenized_datasets.remove_columns(['id', 'dialogue', 'summary'])

# Filter the dataset to keep only every 100th example.
# The `with_indices=True` allows the lambda function to access both the example and its index.
# This results in a much smaller subset of the original dataset.
tokenized_datasets = tokenized_datasets.filter(lambda example, index: index % 100 == 0, with_indices=True)
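One caveat about the labels: `tokenize_inputs` pads every summary to 1024 tokens, and those padded positions stay in `labels`, so the loss is also computed on them. Since most of each label sequence is padding, this may be part of why the training losses reported further down are so low. A common variant, not used in this notebook, is to set padded label positions to -100, the index PyTorch's cross-entropy loss ignores by default. Because `pad_token` is set to `eos_token` here, padding cannot be told apart from the real end-of-sequence token by id alone, so the sketch uses the attention mask instead (inside `tokenize_inputs`, it would come from the `attention_mask` field returned by the same `tokenizer(...)` call used for the labels). `mask_padding_in_labels` is a hypothetical helper, and the token ids in the toy batch are illustrative only.

```python
from typing import List

IGNORE_INDEX = -100  # label value skipped by PyTorch's CrossEntropyLoss by default

def mask_padding_in_labels(label_ids: List[List[int]],
                           attention_mask: List[List[int]]) -> List[List[int]]:
    """Replace label tokens at padded positions (attention_mask == 0) with -100."""
    return [
        [tok if keep == 1 else IGNORE_INDEX for tok, keep in zip(ids, mask)]
        for ids, mask in zip(label_ids, attention_mask)
    ]

# Toy batch: the real </s> (id 2) and the padding share the same id,
# so only the attention mask distinguishes them.
labels = [[0, 713, 16, 2, 2, 2]]
mask = [[1, 1, 1, 1, 0, 0]]
print(mask_padding_in_labels(labels, mask))  # [[0, 713, 16, 2, -100, -100]]
```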

In [None]:
# Check result
tokenized_datasets

DatasetDict({
    train: Dataset({
        features: ['input_ids', 'labels'],
        num_rows: 148
    })
    test: Dataset({
        features: ['input_ids', 'labels'],
        num_rows: 9
    })
    validation: Dataset({
        features: ['input_ids', 'labels'],
        num_rows: 9
    })
})
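The row counts above follow directly from keeping indices 0, 100, 200, and so on: a split with n rows keeps ceil(n / 100) of them. A quick check in plain Python, using the split sizes printed earlier:

```python
import math

# Split sizes printed by `dataset` above
split_sizes = {"train": 14732, "test": 819, "validation": 818}

# Keeping indices 0, 100, 200, ... leaves ceil(n / 100) rows per split
subset_sizes = {split: math.ceil(n / 100) for split, n in split_sizes.items()}
print(subset_sizes)  # {'train': 148, 'test': 9, 'validation': 9}
```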

## Create PEFT Model using LoRA

We prepare the LoRA configuration and use `peft` to wrap the base model into a "PEFT model".

In [None]:
from peft import LoraConfig, get_peft_model, TaskType

In [None]:
# provide LoRA config

lora_config = LoraConfig(
    r=32,                             # rank of the low-rank update matrices (smaller = fewer trainable parameters)
    lora_alpha=32,                    # scaling factor: the LoRA update is multiplied by lora_alpha / r (here 32 / 32 = 1)
    lora_dropout=0.05,                # dropout on the LoRA branch (regularization against overfitting); 0.05 = 5% rate
    bias="none",                      # which bias terms to train: can also be "lora_only" or "all" (see documentation)
    task_type=TaskType.SEQ_2_SEQ_LM   # sequence-to-sequence language modeling (e.g. summarization)
    )

In [None]:
# wrap the base model with LoRA adapters according to the config
peft_model = get_peft_model(model, peft_config=lora_config)

peft_model

PeftModelForSeq2SeqLM(
  (base_model): LoraModel(
    (model): BartForConditionalGeneration(
      (model): BartModel(
        (shared): BartScaledWordEmbedding(50264, 1024, padding_idx=1)
        (encoder): BartEncoder(
          (embed_tokens): BartScaledWordEmbedding(50264, 1024, padding_idx=1)
          (embed_positions): BartLearnedPositionalEmbedding(1026, 1024)
          (layers): ModuleList(
            (0-11): 12 x BartEncoderLayer(
              (self_attn): BartSdpaAttention(
                (k_proj): Linear(in_features=1024, out_features=1024, bias=True)
                (v_proj): lora.Linear(
                  (base_layer): Linear(in_features=1024, out_features=1024, bias=True)
                  (lora_dropout): ModuleDict(
                    (default): Dropout(p=0.05, inplace=False)
                  )
                  (lora_A): ModuleDict(
                    (default): Linear(in_features=1024, out_features=32, bias=False)
                  )
                  (lora_B): 

## Initiate Training

We log in to the Hugging Face Hub (needed later to push the trained adapter) and start the training.

As with full-parameter fine-tuning, we use the `Trainer` class from the `transformers` library.

In [None]:
from huggingface_hub import notebook_login

notebook_login()


In [None]:
from transformers import TrainingArguments, Trainer

In [None]:
# Set Training Arguments
training_args = TrainingArguments(
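    output_dir="./bart-cnn-samsum-peft",         # local output directory
    hub_model_id="Kain17/bart-cnn-samsum-peft",  # Hugging Face Hub repository id
    learning_rate=1e-5,
    num_train_epochs=5,
    weight_decay=0.01,
    auto_find_batch_size=True,                   # automatically lower the batch size if it does not fit in memory
    evaluation_strategy="epoch",                 # newer transformers versions name this argument `eval_strategy`
    logging_steps=10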
)

# Initialize trainer for peft model
trainer = Trainer(
    model=peft_model,
    args=training_args,
    train_dataset=tokenized_datasets["train"],
    eval_dataset=tokenized_datasets["validation"]
)



<br>

We can compare the number of trainable parameters with the total number of parameters:

In [None]:
# Print trainable vs. total parameters of the PEFT model
peft_model.print_trainable_parameters()

trainable params: 4,718,592 || all params: 411,009,024 || trainable%: 1.1481
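The 4,718,592 trainable parameters can be reproduced by hand. The printed model shows `v_proj` wrapped with LoRA while `k_proj` is not; the sketch below assumes PEFT's default targets for BART are the `q_proj` and `v_proj` projections of every attention block (encoder self-attention, decoder self-attention and decoder cross-attention) in the 12 encoder and 12 decoder layers of this model. Each adapted 1024 × 1024 layer gets `lora_A` (32 × 1024) and `lora_B` (1024 × 32). The module names are a reconstruction for illustration.

```python
# Reconstructed (illustrative) attention projection names for a 12+12-layer BART
module_names = []
for i in range(12):
    for proj in ("q_proj", "k_proj", "v_proj", "out_proj"):
        module_names.append(f"model.encoder.layers.{i}.self_attn.{proj}")
for i in range(12):
    for attn in ("self_attn", "encoder_attn"):
        for proj in ("q_proj", "k_proj", "v_proj", "out_proj"):
            module_names.append(f"model.decoder.layers.{i}.{attn}.{proj}")

target_modules = ("q_proj", "v_proj")  # assumed PEFT default targets for BART
r, d_in, d_out = 32, 1024, 1024

adapted = [name for name in module_names if name.split(".")[-1] in target_modules]
per_module = r * d_in + d_out * r      # lora_A (r x d_in) + lora_B (d_out x r)
trainable = len(adapted) * per_module

all_params = 411_009_024               # from print_trainable_parameters() above
print(len(adapted), trainable)         # 72 4718592
print(f"{100 * trainable / all_params:.4f}%")  # 1.1481%
```

Since the count per module is r × (d_in + d_out), halving `r` to 16 would halve the number of trainable parameters.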


In [None]:
# Train the peft model
trainer.train()

| Epoch | Training Loss | Validation Loss |
|------:|--------------:|----------------:|
| 1 | 0.0636 | 0.105758 |
| 2 | 0.0704 | 0.105603 |
| 3 | 0.0754 | 0.105477 |
| 4 | 0.0571 | 0.105616 |
| 5 | 0.0633 | 0.105597 |


TrainOutput(global_step=185, training_loss=0.06710950345606417, metrics={'train_runtime': 559.7062, 'train_samples_per_second': 1.322, 'train_steps_per_second': 0.331, 'total_flos': 1625110767206400.0, 'train_loss': 0.06710950345606417, 'epoch': 5.0})
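The numbers in `TrainOutput` are consistent with each other. With 148 training rows and 5 epochs, 185 optimizer steps means 37 steps per epoch, which (assuming a single GPU, no gradient accumulation, and a final partial batch that is kept) corresponds to a batch size of 4. Since the default `per_device_train_batch_size` is 8, this suggests that `auto_find_batch_size=True` lowered it to fit in memory. The sketch only checks the arithmetic:

```python
import math

num_train_rows = 148      # size of the filtered training split
num_epochs = 5            # num_train_epochs in TrainingArguments
global_step = 185         # from TrainOutput above
train_runtime = 559.7062  # seconds, from TrainOutput above

steps_per_epoch = global_step // num_epochs
# Batch sizes that give exactly this many steps per epoch
candidates = [b for b in range(1, num_train_rows + 1)
              if math.ceil(num_train_rows / b) == steps_per_epoch]

print(steps_per_epoch, candidates)                            # 37 [4]
print(round(num_train_rows * num_epochs / train_runtime, 3))  # 1.322 samples per second
print(round(global_step / train_runtime, 3))                  # 0.331 steps per second
```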

## Push model to HF Hub

In [None]:
trainer.push_to_hub()


CommitInfo(commit_url='https://huggingface.co/Kain17/bart-cnn-samsum-peft/commit/335cfd94518d1ca71ecda85e8912780b9e3741e2', commit_message='End of training', commit_description='', oid='335cfd94518d1ca71ecda85e8912780b9e3741e2', pr_url=None, pr_revision=None, pr_num=None)

## Test the fine-tuned model

In [None]:
from peft import PeftModel, PeftConfig

In [None]:
# helper function (summary generation)

def generate_summary(dialogue, llm):
  # Build the prompt with the same template used during training
  prompt = "Summarize the following conversation:\n\n" + dialogue + "\n\nSummary: "

  # Tokenize the prompt
  inputs = tokenizer(prompt, return_tensors="pt", truncation=True, max_length=1024)

  # Produce tokenized output
  tokenized_output = llm.generate(
      input_ids=inputs["input_ids"],
      attention_mask=inputs["attention_mask"],
      min_length=30,
      max_length=200
  )

  # Decode tokenized output
  output = tokenizer.decode(tokenized_output[0], skip_special_tokens=True)

  return output

In [None]:
# Reload the base model that the LoRA adapter was trained on
tokenizer = AutoTokenizer.from_pretrained("Kain17/bart-cnn-samsum-finetuned")
peft_model_base = AutoModelForSeq2SeqLM.from_pretrained("Kain17/bart-cnn-samsum-finetuned")

# Load peft model
loaded_peft_model = PeftModel.from_pretrained(
    peft_model_base,
    "Kain17/bart-cnn-samsum-peft",
    is_trainable=False
)


In [None]:
sample = dataset['test'][0]['dialogue']
label = dataset['test'][0]['summary']

# Create output from sample
output = generate_summary(sample, llm=loaded_peft_model)


# print results
print("SAMPLE: ")
print(sample)
print('-'*50)
print("GENERATED OUTPUT: ")
print(output)
print('-'*50)
print('GROUND TRUTH: ')
print(label)


SAMPLE: 
Hannah: Hey, do you have Betty's number?
Amanda: Lemme check
Hannah: <file_gif>
Amanda: Sorry, can't find it.
Amanda: Ask Larry
Amanda: He called her last time we were at the park together
Hannah: I don't know him well
Hannah: <file_gif>
Amanda: Don't be shy, he's very nice
Hannah: If you say so..
Hannah: I'd rather you texted him
Amanda: Just text him 🙂
Hannah: Urgh.. Alright
Hannah: Bye
Amanda: Bye bye
--------------------------------------------------
GENERATED OUTPUT: 
Hannah asked Betty for Larry's number. Betty's number is Betty's. Hannah doesn't know Larry well enough to call him, so she texted him.
--------------------------------------------------
GROUND TRUTH: 
Hannah needs Betty's number but Amanda doesn't have it. She needs to contact Larry.


<br>

## Final Considerations

- On this test example, the model fine-tuned with PEFT appears to perform a bit better than the fully fine-tuned model, while training only about 1.15% of the parameters.
- On the other hand, the output is still far from excellent: it mixes up who asked whom (Hannah asked Amanda, not Betty, for Betty's number) and contains a redundant sentence. The fine-tuning should be repeated, for example with more epochs.
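These observations rest on a single example. A more systematic check is to score generated summaries against the references with ROUGE; the `evaluate` library installed at the start provides it via `evaluate.load("rouge")`. As a dependency-free illustration, the sketch below computes a simplified ROUGE-1 F1 (clipped unigram overlap on lowercased words, no stemming), so its scores will not exactly match the official implementation. `rouge1_f1` is a hypothetical helper, applied here to the output and ground truth printed above.

```python
import re
from collections import Counter

def rouge1_f1(prediction: str, reference: str) -> float:
    """Simplified ROUGE-1 F1: clipped unigram overlap between lowercased word tokens."""
    pred = re.findall(r"[a-z0-9']+", prediction.lower())
    ref = re.findall(r"[a-z0-9']+", reference.lower())
    overlap = sum((Counter(pred) & Counter(ref)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred)
    recall = overlap / len(ref)
    return 2 * precision * recall / (precision + recall)

generated = ("Hannah asked Betty for Larry's number. Betty's number is Betty's. "
             "Hannah doesn't know Larry well enough to call him, so she texted him.")
ground_truth = "Hannah needs Betty's number but Amanda doesn't have it. She needs to contact Larry."

print(f"ROUGE-1 F1: {rouge1_f1(generated, ground_truth):.3f}")  # ROUGE-1 F1: 0.378
```

Averaging such a score over the whole test split gives a number that can be compared across the fully fine-tuned and PEFT models, instead of judging from one dialogue.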

<hr>

###### End of the Notebook