<a href="https://colab.research.google.com/github/SushmaMahankali/Fine-Tuning-LLMs-using-LoRA/blob/main/Fine_Tuning_LLMs_to_Write_Positive_Reviews_using_LoRA.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

Step 1: Setting Up the Environment

In [None]:
!pip install -q transformers datasets peft trl accelerate

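I'm using:

transformers & datasets: For loading the base model (GPT-2) and the IMDB data.

peft: The library that implements LoRA.

trl: Provides SFTTrainer; it is installed here, but the training step below uses the standard Trainer from transformers instead.

accelerate: A helper that transformers relies on to run training efficiently on the GPU.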
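
Library versions change quickly, so it can help to record what the install step actually pulled in. The following is a minimal sketch using only the standard library; `installed_versions` is a hypothetical helper name, not something these packages provide.

```python
from importlib import metadata


def installed_versions(packages):
    """Return {package name: version string, or None if it is not installed}."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


# Report the packages installed in Step 1
for name, version in installed_versions(
    ["transformers", "datasets", "peft", "trl", "accelerate"]
).items():
    print(f"{name}: {version or 'not installed'}")
```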

Step 2: The “Before” Snapshot – How Does a Base Model Behave?

In [None]:
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM
from datasets import load_dataset

# The model we want to fine-tune
model_name = "gpt2"

# Load the tokenizer
tokenizer = AutoTokenizer.from_pretrained(model_name)

# Set the padding token if it's not already set
if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token # Use the end-of-sequence token as the padding token

# Load the model
model = AutoModelForCausalLM.from_pretrained(model_name)

# A prompt to test the model
prompt = "The movie started with a captivating scene that"

# Tokenize the input
inputs = tokenizer(prompt, return_tensors="pt")

# Move the model and inputs to the GPU if available
device = "cuda" if torch.cuda.is_available() else "cpu"
model.to(device)
inputs = inputs.to(device)

# Generate text
generate_ids = model.generate(inputs.input_ids, max_length=50)
response = tokenizer.decode(generate_ids[0], skip_special_tokens=True)

print("--- Base Model Response ---")
print(response)

The attention mask and the pad token id were not set. As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results.
Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.
The attention mask is not set and cannot be inferred from input because pad token is same as eos token. As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results.


--- Base Model Response ---
The movie started with a captivating scene that was shot in the middle of the night. The scene was shot in the middle of the night, and the camera was on the ground. The scene was shot in the middle of the night, and the
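
The attention-mask and pad-token warnings in the output above appear because `generate` received only `input_ids`. Here is a minimal sketch of one way to pass the missing arguments, assuming the `inputs`, `tokenizer`, and `model` objects from the cell above. `build_generate_kwargs` is a hypothetical helper name, not part of transformers.

```python
def build_generate_kwargs(inputs, tokenizer, max_length=50):
    """Collect the arguments that generate() warned were missing.

    `inputs` is the mapping returned by the tokenizer call; it holds
    both "input_ids" and "attention_mask".
    """
    return {
        "input_ids": inputs["input_ids"],
        "attention_mask": inputs["attention_mask"],  # marks real tokens vs. padding
        "pad_token_id": tokenizer.eos_token_id,      # GPT-2 has no dedicated pad token
        "max_length": max_length,
    }


# In the notebook this would be used as:
# generate_ids = model.generate(**build_generate_kwargs(inputs, tokenizer))
```

The repeated sentences in the response are typical of greedy decoding. `generate` also accepts options such as `no_repeat_ngram_size` or `do_sample=True`, which may reduce the looping, though that is not tested here.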


Step 3: Data Preparation

In [None]:
# Load the IMDB dataset
dataset = load_dataset("imdb", split="train")

# Filter for only positive reviews (label 1)
positive_reviews = dataset.filter(lambda example: example["label"] == 1)

# To make this demo run quickly, let's just use a small subset of the data
small_dataset = positive_reviews.select(range(500)) # Using 500 examples for speed

# Format each example into a single text string for training
def format_review(example):
    # Wrap the review in a simple template so the model sees a consistent pattern
    return {"text": "Review: " + example["text"] + " TL;DR: Positive."}

formatted_dataset = small_dataset.map(format_review)
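
To see what the template produces, the same function can be applied to a single made-up example. The review text below is invented for illustration and is not taken from IMDB.

```python
def format_review(example):
    # Same template as in the cell above
    return {"text": "Review: " + example["text"] + " TL;DR: Positive."}


toy_example = {"text": "A warm, funny film.", "label": 1}  # made-up review
print(format_review(toy_example)["text"])
# Review: A warm, funny film. TL;DR: Positive.
```

When used with `Dataset.map`, the returned dictionary is merged into each example, so the `text` column is overwritten and the `label` column is kept.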

Step 4: Configuring LoRA

Next, we define a LoraConfig, which tells the peft library how and where to inject its small adapter layers:

In [None]:
from peft import LoraConfig, get_peft_model

# Create the LoRA configuration
lora_config = LoraConfig(
    r=8,  # The rank of the update matrices. A small number is usually sufficient.
    lora_alpha=16, # Scaling factor: the update is scaled by lora_alpha / r. A common rule of thumb is 2*r.
    target_modules=["c_attn"], # The layers to adapt. In GPT-2, c_attn is the fused query/key/value projection in each attention block.
    lora_dropout=0.1,
    bias="none",
    task_type="CAUSAL_LM"
)

# Wrap the base model with the PEFT model
peft_model = get_peft_model(model, lora_config)

# Let's see how many parameters we are actually training!
peft_model.print_trainable_parameters()

trainable params: 294,912 || all params: 124,734,720 || trainable%: 0.2364
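
The trainable-parameter count above can be reproduced by hand. This is a sketch under the assumption that each adapted layer receives two matrices, A of shape (r x in_features) and B of shape (out_features x r), and that the 124M-parameter GPT-2 has 12 blocks with hidden size 768, so `c_attn` maps 768 to 3 * 768 features. `lora_param_count` is a hypothetical helper name.

```python
def lora_param_count(in_features, out_features, r, n_layers):
    """Parameters added by LoRA: A is (r x in), B is (out x r), per adapted layer."""
    return n_layers * r * (in_features + out_features)


hidden = 768         # GPT-2 (124M) hidden size
n_layers = 12        # number of transformer blocks
r, lora_alpha = 8, 16

trainable = lora_param_count(hidden, 3 * hidden, r, n_layers)
total = 124_734_720  # "all params" from the output above

print(trainable)                          # 294912
print(f"{100 * trainable / total:.4f}")   # 0.2364
print(lora_alpha / r)                     # 2.0 (scale applied to the LoRA update)
```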




Step 5: The Training Session



In [None]:
# Make sure to run the cells above to define `peft_model` and `tokenizer`

from transformers import TrainingArguments, Trainer, DataCollatorForLanguageModeling

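# Training-time settings
peft_model.config.use_cache = False  # the generation cache is not needed during training
tokenizer.padding_side = "right"
if tokenizer.pad_token_id is None:
    tokenizer.pad_token = tokenizer.eos_token
# Set this outside the `if`: the pad token was already assigned in Step 2,
# so the branch above does not run here
peft_model.config.pad_token_id = tokenizer.pad_token_id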

# Tokenize dataset
def tokenize_fn(batch):
    return tokenizer(
        batch["text"],
        truncation=True,
        padding="max_length",
        max_length=512,
    )

tokenized_ds = formatted_dataset.map(
    tokenize_fn,
    batched=True,
    remove_columns=formatted_dataset.column_names,
)

# Causal LM collator (no MLM)
data_collator = DataCollatorForLanguageModeling(
    tokenizer=tokenizer,
    mlm=False,
)

training_args = TrainingArguments(
    output_dir="./gpt2-imdb-finetune",
    per_device_train_batch_size=2,
    gradient_accumulation_steps=2,
    learning_rate=2e-4,
    num_train_epochs=2,
    logging_steps=50,
    fp16=torch.cuda.is_available(),  # mixed precision requires a GPU; torch is imported in Step 2
    bf16=False,
    remove_unused_columns=False,
    # You can disable WANDB logging if not needed by setting report_to="none"
    # report_to="none",
)

trainer = Trainer(
    model=peft_model,
    args=training_args,
    train_dataset=tokenized_ds,
    data_collator=data_collator,
)

print("Starting training...")
trainer.train()
print("Training complete!")

Starting training...


Step,Training Loss
50,3.8228
100,3.7402
150,3.7441
200,3.6697
250,3.717


Training complete!
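
Two quick checks help when reading the training log. The sketch below estimates the number of optimizer steps from the settings in `TrainingArguments` and converts a logged loss into perplexity. It assumes a single device, and the step formula is an approximation of how the Trainer counts steps. Both function names are hypothetical.

```python
import math


def optimizer_steps(n_examples, per_device_batch, grad_accum, epochs, n_devices=1):
    """Approximate number of optimizer updates for a full training run."""
    effective_batch = per_device_batch * grad_accum * n_devices
    steps_per_epoch = math.ceil(n_examples / effective_batch)
    return steps_per_epoch * epochs


def perplexity(loss):
    """Perplexity is the exponential of the mean cross-entropy loss."""
    return math.exp(loss)


print(optimizer_steps(500, per_device_batch=2, grad_accum=2, epochs=2))  # 250
print(perplexity(3.8228))  # loss logged at step 50
print(perplexity(3.717))   # loss logged at step 250
```

The result of 250 steps matches the last row of the log. The loss moves only from 3.82 to 3.72, which corresponds to a perplexity of roughly 41 at the end. These are training-batch losses, not a held-out evaluation.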


Step 6: The “After” Snapshot – Our Specialized Model

In [None]:
# Let's test the fine-tuned model with the same prompt
print("\n--- Fine-Tuned Model Response ---")

# trainer.model is the same LoRA-wrapped model that was trained
fine_tuned_model = trainer.model
fine_tuned_model.eval()  # training leaves dropout enabled; switch it off for generation

# Generate text using the fine-tuned model
generate_ids = fine_tuned_model.generate(inputs.input_ids, max_length=50)
response = tokenizer.decode(generate_ids[0], skip_special_tokens=True)

print(response)

The attention mask and the pad token id were not set. As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results.
Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.



--- Fine-Tuned Model Response ---
The movie started with a captivating scene that was shot in a dark room. The scene was shot in a dark room with a white background. The scene was shot in a dark room with a white background. The scene was shot in a dark room
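
The training examples all began with "Review: ", while the test prompt does not, so one thing worth trying is a prompt that follows the training template. This is a sketch only: `make_review_prompt` is a hypothetical helper, and whether the prefix changes the output was not tested here.

```python
def make_review_prompt(opening):
    """Prefix a prompt with the "Review: " marker used by format_review."""
    return "Review: " + opening


prompt = make_review_prompt("The movie started with a captivating scene that")
print(prompt)
# Review: The movie started with a captivating scene that

# In the notebook (assumes tokenizer, fine_tuned_model, and device exist):
# inputs = tokenizer(prompt, return_tensors="pt").to(device)
# generate_ids = fine_tuned_model.generate(
#     **inputs, max_length=50, pad_token_id=tokenizer.eos_token_id
# )
# print(tokenizer.decode(generate_ids[0], skip_special_tokens=True))
```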
