<a href="https://colab.research.google.com/github/IshaanKetchup/ML-tools-and-techniques/blob/main/Fine%20Tuning%20LLMs.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

##PEFT - Parameter Efficient Fine Tuning

###Choosing which layers to fine tune

In [1]:
# Load pre-trained BERT model
from transformers import BertForSequenceClassification

model = BertForSequenceClassification.from_pretrained('bert-base-uncased', num_labels=3)

# Step 1: Freeze the pre-trained encoder so that only the classification head stays trainable
for param in model.base_model.parameters():
    param.requires_grad = False

# Optional: also fine-tune the last 2 encoder layers by unfreezing them
# (skip this loop to train the classification head only)
for param in model.base_model.encoder.layer[-2:].parameters():
    param.requires_grad = True

Some weights of BertForSequenceClassification were not initialized from the model checkpoint at bert-base-uncased and are newly initialized: ['classifier.bias', 'classifier.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
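
To check which parameters the freezing step actually left trainable, a small helper can count them. This is a minimal sketch, not part of the original notebook: `count_parameters` and `describe_trainable` are hypothetical names, and the helper only assumes a PyTorch-style object whose `parameters()` yields tensors with `numel()` and `requires_grad`.

```python
def count_parameters(model):
    """Return (trainable, total) parameter counts for a PyTorch-style module."""
    trainable, total = 0, 0
    for param in model.parameters():
        n = param.numel()
        total += n
        if param.requires_grad:  # frozen parameters have requires_grad == False
            trainable += n
    return trainable, total


def describe_trainable(model):
    """Format the trainable share as a short, human-readable summary."""
    trainable, total = count_parameters(model)
    share = 100.0 * trainable / total if total else 0.0
    return f"trainable: {trainable:,} / {total:,} ({share:.2f}%)"


# Usage with the model from the cell above (left commented out on purpose):
# print(describe_trainable(model))
```

Running it once after the freeze loop and again after the optional unfreeze loop shows how much each choice changes the trainable share.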


###Set-up fine tuning with PEFT

This example uses a pretrained BERT model from Hugging Face.

The library's Trainer and TrainingArguments classes handle the fine-tuning loop and let us specify settings such as the number of epochs, the batch size, and the training and evaluation datasets.

####Instructions for fine-tuning with PEFT
1. Freeze the layers of the model (as shown in the previous code block).
2. Set up the fine-tuning process using Hugging Face’s Trainer class and TrainingArguments, continuing from Step 1.
3. Fine-tune the model based on the trainer setup, which is also shown in this code block.

In [3]:
from transformers import Trainer, TrainingArguments

# Step 1: Set training arguments for fine-tuning the model
training_args = TrainingArguments(
    output_dir='./results',             # Directory where results will be stored
    num_train_epochs=3,                 # Number of epochs (full passes through the dataset)
    per_device_train_batch_size=16,     # Batch size per GPU/CPU during training
    eval_strategy="epoch",              # Evaluate the model at the end of each epoch
)

# Step 2: Fine-tune only the parameters left trainable earlier (the classification head and the last 2 encoder layers)
trainer = Trainer(
    model=model,                        # Pre-trained BERT model with frozen layers
    args=training_args,                 # Training arguments
    train_dataset=train_data,           # Training data for fine-tuning
    eval_dataset=val_data,              # Validation data to evaluate performance during training
)

# Step 3: Train the model; only the unfrozen parameters are updated, which is what makes this parameter-efficient
trainer.train()

NameError: name 'train_data' is not defined
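
The error above occurs because `train_data` and `val_data` are never defined in this notebook. The following is a minimal sketch of one way to wrap tokenized text and labels in a map-style dataset. `EncodedDataset` and the toy rows are hypothetical, and the sketch assumes the default `Trainer` data collator can batch dictionaries of integer lists.

```python
class EncodedDataset:
    """Map-style dataset pairing tokenizer output with integer labels."""

    def __init__(self, encodings, labels):
        # encodings: dict of equally long lists, e.g. input_ids and attention_mask
        lengths = {len(values) for values in encodings.values()}
        if lengths != {len(labels)}:
            raise ValueError("every encoding field needs one row per label")
        self.encodings = encodings
        self.labels = list(labels)

    def __len__(self):
        return len(self.labels)

    def __getitem__(self, index):
        # One example: the tokenizer fields for this row plus its label
        item = {key: values[index] for key, values in self.encodings.items()}
        item["labels"] = self.labels[index]
        return item


# Hypothetical toy rows standing in for real tokenizer output
toy_encodings = {
    "input_ids": [[101, 2023, 102], [101, 2008, 102]],
    "attention_mask": [[1, 1, 1], [1, 1, 1]],
}
toy_train_data = EncodedDataset(toy_encodings, labels=[0, 2])
```

With real data, the encodings would typically come from a tokenizer call such as `tokenizer(texts, truncation=True, padding=True)`, with one such dataset built for each of the training, validation, and test splits.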

###Monitor and Evaluate Performance

In [None]:
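# Evaluate the model on the held-out test set
results = trainer.evaluate(eval_dataset=test_data)
# Note: 'eval_accuracy' is only present if the Trainer was created with a
# compute_metrics function that returns an 'accuracy' entry
print(f"Test Accuracy: {results['eval_accuracy']}")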
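
The `eval_accuracy` key read above is only reported when the `Trainer` is given a `compute_metrics` callback; by default, `trainer.evaluate` returns the loss and timing statistics only. Here is a minimal sketch of such a callback, assuming the evaluation output unpacks into an array of logits and an array of integer labels.

```python
import numpy as np


def compute_metrics(eval_pred):
    """Turn (logits, labels) from the Trainer into an accuracy score."""
    logits, labels = eval_pred
    predictions = np.argmax(logits, axis=-1)  # most likely class per example
    return {"accuracy": float((predictions == labels).mean())}
```

Passing it as `Trainer(..., compute_metrics=compute_metrics)` makes the metric appear in the results with the `eval_` prefix, that is, as `eval_accuracy`.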

###Optimize PEFT for your task

In [None]:
# Example of adjusting learning rate for PEFT optimization
training_args = TrainingArguments(
    output_dir='./results',
    learning_rate=5e-5,  # Experiment with different learning rates
    num_train_epochs=5,
    per_device_train_batch_size=16,
)

##Low Rank Adaptation (LoRA)
Full fine-tuning updates every parameter in a model, which is resource-intensive, especially for large transformer-based models like BERT, RoBERTa, and GPT. As models grow larger, the computational and memory costs of full fine-tuning increase substantially. LoRA addresses this by freezing the pre-trained weights and adding a pair of small trainable matrices to selected layers. Their product is a low-rank update to the layer's weight matrix, so only a small fraction of the parameters is trained.

The benefits of LoRA:
* Reduced Memory Usage
* Lower Computational Cost
* Faster Training and Experimentation
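
The low-rank update and the resulting parameter savings can be sketched in NumPy. This is a minimal illustration under stated assumptions: it uses the common formulation in which a frozen weight `W` is adjusted by `(alpha / rank) * B @ A`, and the sizes, `alpha`, and all variable names are illustrative values rather than settings taken from this notebook.

```python
import numpy as np

rng = np.random.default_rng(0)
d_out, d_in, rank, alpha = 768, 768, 8, 16  # illustrative sizes

W = rng.normal(size=(d_out, d_in))              # frozen pre-trained weight
A = rng.normal(scale=0.01, size=(rank, d_in))   # trainable, small random init
B = np.zeros((d_out, rank))                     # trainable, zero init so training starts from W


def lora_forward(x, W, A, B, alpha, rank):
    """Apply the frozen weight plus the scaled low-rank update (alpha / rank) * B @ A."""
    return x @ W.T + (alpha / rank) * (x @ A.T) @ B.T


x = rng.normal(size=(4, d_in))
y = lora_forward(x, W, A, B, alpha, rank)

full_params = W.size            # parameters updated by full fine-tuning of this layer
lora_params = A.size + B.size   # parameters updated by LoRA for this layer
print(full_params, lora_params)
```

For this single 768 x 768 layer, the adapters hold 12,288 trainable values against 589,824 in the full matrix, which is where the memory and compute savings listed above come from.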

####Explanation:
* print(name): prints each model component to help locate the attention layers where LoRA can be applied.
* LoraConfig(...): describes the adapters using the Hugging Face `peft` library, here rank-8 updates on the `query` and `value` projections of each attention layer.
* get_peft_model(model, lora_config): inserts the adapters and freezes the pre-trained weights, so only the LoRA parameters (and, for sequence classification, the classification head) are fine-tuned.

In [None]:
from peft import LoraConfig, TaskType, get_peft_model
from transformers import BertForSequenceClassification

# Load a pre-trained BERT model for classification tasks
model = BertForSequenceClassification.from_pretrained('bert-base-uncased', num_labels=3)

# Print module names to identify the attention layers where LoRA can be applied
for name, module in model.named_modules():
    print(name)  # e.g. bert.encoder.layer.0.attention.self.query

# Describe the adapters (r, lora_alpha and lora_dropout are example values to tune)
lora_config = LoraConfig(
    task_type=TaskType.SEQ_CLS,
    r=8,
    lora_alpha=16,
    lora_dropout=0.1,
    target_modules=["query", "value"],  # attention projections in BERT
)

# Wrap the model: the base weights are frozen and only the adapters are trained
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()

####Fine tune with LoRA

In [None]:
from transformers import Trainer, TrainingArguments

# Configure training parameters
training_args = TrainingArguments(
    output_dir='./results',
    num_train_epochs=3,
    per_device_train_batch_size=16,
    eval_strategy="epoch",
)

# Set up the Trainer to handle fine-tuning
trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=train_data,
    eval_dataset=val_data,
)

# Begin training
trainer.train()

####Evaluate

In [None]:
# Evaluate the LoRA fine-tuned model on the test set
results = trainer.evaluate(eval_dataset=test_data)
# 'eval_accuracy' requires a compute_metrics function that returns 'accuracy'
print(f"Test Accuracy: {results['eval_accuracy']}")

####Optimize LoRA

In [None]:
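# Example of adjusting the rank in LoRA
from peft import LoraConfig, TaskType, get_peft_model
from transformers import BertForSequenceClassification

# The rank is fixed when the adapters are created, so reload the base model and
# wrap it with a lower-rank config; experiment with values for optimal performance
base_model = BertForSequenceClassification.from_pretrained('bert-base-uncased', num_labels=3)
low_rank_config = LoraConfig(task_type=TaskType.SEQ_CLS, r=2, lora_alpha=16, target_modules=["query", "value"])
model = get_peft_model(base_model, low_rank_config)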
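
To see what a change of rank means for model size, the number of added parameters can be computed directly. This is a rough sketch under stated assumptions: `lora_parameter_count` is a hypothetical helper, and the setup assumes bert-base-uncased (12 encoder layers, hidden size 768) with adapters on the query and value projections of every layer.

```python
def lora_parameter_count(rank, in_features, out_features, num_adapted_layers):
    """Trainable parameters added by LoRA: one (rank x in) and one (out x rank) matrix per adapted layer."""
    return num_adapted_layers * rank * (in_features + out_features)


# Assumed setup: 12 encoder layers x 2 adapted projections (query, value) = 24 adapted layers
for rank in (2, 4, 8, 16):
    added = lora_parameter_count(rank, 768, 768, num_adapted_layers=24)
    print(f"rank={rank}: {added:,} added parameters")
```

The count grows linearly with the rank: under these assumptions, rank 2 adds 73,728 parameters and rank 8 adds 294,912. A lower rank is cheaper but gives the update less capacity, so it is worth comparing validation accuracy across a few values.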

##Quantized LoRA (QLoRA)
QLoRA enhances the fine-tuning process by applying quantization, which reduces the precision of the model's weights (e.g., from 32-bit to 8-bit or even 4-bit), lowering the memory and computational requirements. Quantizing a model involves approximating the model's weight values to lower-precision numbers, significantly reducing the memory footprint while preserving much of the model's performance. This makes fine-tuning feasible on smaller hardware such as consumer graphics processing units (GPUs).
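
The idea of approximating weights with lower-precision numbers can be illustrated with a simple scheme. This is a minimal sketch, assuming per-tensor absmax quantization to 8-bit integers. The function names are hypothetical, and libraries used in practice apply more elaborate schemes than this one.

```python
import numpy as np


def quantize_absmax_int8(weights):
    """Map float weights to int8 using a single absmax scale for the whole tensor."""
    scale = np.abs(weights).max() / 127.0
    quantized = np.clip(np.round(weights / scale), -127, 127).astype(np.int8)
    return quantized, scale


def dequantize(quantized, scale):
    """Recover approximate float32 weights from the int8 values."""
    return quantized.astype(np.float32) * scale


rng = np.random.default_rng(0)
weights = rng.normal(scale=0.02, size=(768, 768)).astype(np.float32)  # stand-in weight matrix
quantized, scale = quantize_absmax_int8(weights)
restored = dequantize(quantized, scale)

print(weights.nbytes, quantized.nbytes)   # storage in bytes at 32-bit vs 8-bit
print(np.abs(weights - restored).max())   # worst-case rounding error
```

The 8-bit copy takes a quarter of the storage, and each restored weight differs from the original by at most about half a quantization step. That is the trade-off the paragraph above describes.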

In this example, the pretrained GPT-2 model is quantized to 8 bits, which reduces the memory taken by its weights to roughly a quarter of the 32-bit footprint. LoRA is then applied to specific layers, such as the attention projections, so that only a small subset of parameters is fine-tuned.

In [None]:
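from transformers import GPT2ForSequenceClassification, BitsAndBytesConfig
from peft import LoraConfig, TaskType, get_peft_model, prepare_model_for_kbit_training

# Load the pre-trained GPT-2 model with its weights quantized to 8 bits
# (assumes the bitsandbytes package and a CUDA GPU are available)
quantized_model = GPT2ForSequenceClassification.from_pretrained(
    'gpt2',
    quantization_config=BitsAndBytesConfig(load_in_8bit=True),
)
# GPT-2 has no padding token by default; reuse the end-of-sequence token for batching
quantized_model.config.pad_token_id = quantized_model.config.eos_token_id

# Prepare the quantized model for training
quantized_model = prepare_model_for_kbit_training(quantized_model)

# Apply LoRA to the attention layers; in GPT-2 the fused query/key/value projection
# is named c_attn (r and lora_alpha are example values to tune)
lora_config = LoraConfig(task_type=TaskType.SEQ_CLS, r=8, lora_alpha=16, target_modules=["c_attn"])
quantized_model = get_peft_model(quantized_model, lora_config)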

In [None]:
from transformers import Trainer, TrainingArguments

# Set up training arguments
training_args = TrainingArguments(
    output_dir='./results',
    num_train_epochs=3,
    per_device_train_batch_size=16,
    eval_strategy="epoch",
)

# Fine-tune the QLoRA-enhanced model
trainer = Trainer(
    model=quantized_model,
    args=training_args,
    train_dataset=train_data,
    eval_dataset=val_data,
)

# Train the model
trainer.train()

In [None]:
# Evaluate the model on the test set
results = trainer.evaluate(eval_dataset=test_data)
# 'eval_accuracy' requires a compute_metrics function that returns 'accuracy'
print(f"Test Accuracy: {results['eval_accuracy']}")

In [None]:
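from peft import LoraConfig, TaskType

# Adjust the rank of the low-rank matrices: the rank is set by `r` when the adapters
# are created, so pass this config to peft.get_peft_model on a freshly loaded quantized model
lora_config = LoraConfig(task_type=TaskType.SEQ_CLS, r=4, lora_alpha=16, target_modules=["c_attn"])  # Experiment with different rank values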