# T5 Model Fine-tuning with LoRA for Summarization

This notebook demonstrates how to fine-tune a T5 model with LoRA (Low-Rank Adaptation) for text summarization. LoRA is a parameter-efficient fine-tuning method: only a small set of added weights is trained, which lowers memory and compute requirements compared with full fine-tuning.

## What is LoRA?

LoRA freezes the pre-trained model weights and injects trainable low-rank decomposition matrices into selected layers of the Transformer (in this notebook, the attention query and value projections). Because only these small matrices are updated, the number of trainable parameters drops significantly.
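
To make the parameter savings concrete, here is a minimal sketch of the arithmetic. It assumes a single hypothetical 512×512 weight matrix (illustrative dimensions, not read from the model) and the rank `r=16` configured later in this notebook. LoRA expresses the update to a `d_out × d_in` matrix as the product of a `d_out × r` matrix and an `r × d_in` matrix, so only those two factors are trained.

```python
def lora_trainable_params(d_in: int, d_out: int, r: int) -> int:
    """Parameters in the two low-rank factors: B (d_out x r) and A (r x d_in)."""
    return r * (d_in + d_out)


def full_params(d_in: int, d_out: int) -> int:
    """Parameters in the original dense weight matrix."""
    return d_in * d_out


# Hypothetical layer size, chosen only for illustration
d_in, d_out, r = 512, 512, 16

lora = lora_trainable_params(d_in, d_out, r)
full = full_params(d_in, d_out)
ratio = lora / full

print(f"LoRA params: {lora:,} | full params: {full:,} | ratio: {ratio:.2%}")
```

For this example layer, the LoRA factors hold 16,384 parameters against 262,144 in the dense matrix, or 6.25%. Across the whole model the ratio is lower still, because most layers receive no adapter at all.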

## Steps in this Notebook:
1. Install required dependencies
2. Load the base T5 model
3. Apply LoRA configuration
4. Load and preprocess the CNN/DailyMail dataset
5. Train the model
6. Save and export the fine-tuned model
7. Test the fine-tuned model on a sample article

In [None]:
# STEP 1: Install Only Missing Dependencies
!pip install -q peft rouge-score
!pip install -q evaluate

# STEP 2: Import Libraries
import torch
from transformers import (
    AutoModelForSeq2SeqLM, AutoTokenizer, DataCollatorForSeq2Seq,
    TrainingArguments, Trainer
)
from datasets import load_dataset
from peft import LoraConfig, get_peft_model
import evaluate
import shutil

## Load Base Model & Apply LoRA Configuration

We'll use `google/flan-t5-small` as the base model; it offers a good balance between quality and computational requirements.

In [None]:
# STEP 3: Load Base Model + Tokenizer
model_name = "google/flan-t5-small"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSeq2SeqLM.from_pretrained(model_name)

#  STEP 4: Apply LoRA Config
lora_config = LoraConfig(
    r=16,                      # Rank of the low-rank update matrices
    lora_alpha=32,             # Scaling factor: updates are scaled by lora_alpha / r
    target_modules=["q", "v"], # Apply LoRA to the attention query and value projections
    lora_dropout=0.1,          # Dropout probability for LoRA layers
    bias="none",               # Do not train any bias parameters
    task_type="SEQ_2_SEQ_LM"   # Sequence-to-sequence language modeling
)
model = get_peft_model(model, lora_config)

# Print trainable parameters info
model.print_trainable_parameters()

## Load and Prepare Dataset

We'll use the CNN/DailyMail dataset (version 3.0.0), which contains news articles paired with multi-sentence summaries stored in the `highlights` field. To keep training time manageable, we load only the first 50,000 examples of the training split.

In [None]:
#  STEP 5: Load Dataset (subset for demo)
dataset = load_dataset("cnn_dailymail", "3.0.0", split="train[:50000]")
print(f"Dataset loaded: {len(dataset)} examples")
print("\nSample example:")
print(f"Article excerpt: {dataset[0]['article'][:200]}...")
print(f"Summary: {dataset[0]['highlights']}")

In [None]:
#  STEP 6: Preprocess Dataset
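def preprocess(examples):
    # With batched=True, each field is a list of strings
    inputs = examples["article"]
    targets = examples["highlights"]
    # Tokenize articles, truncating to at most 512 tokens
    model_inputs = tokenizer(
        inputs, max_length=512, truncation=True
    )
    # Tokenize reference summaries as targets, truncating to at most 150 tokens
    labels = tokenizer(
        text_target=targets, max_length=150, truncation=True
    )
    # Padding is left to the data collator, which pads each batch dynamically
    model_inputs["labels"] = labels["input_ids"]
    return model_inputs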

tokenized_dataset = dataset.map(
    preprocess, batched=True, remove_columns=["article", "highlights", "id"]
)

#  STEP 7: Split train/test
split_dataset = tokenized_dataset.train_test_split(test_size=0.1, seed=42)  # Fixed seed for a reproducible 90/10 split
train_ds = split_dataset['train']
eval_ds = split_dataset['test']

print(f"Training set: {len(train_ds)} examples")
print(f"Validation set: {len(eval_ds)} examples")

## Configure Training Parameters

Now we'll set up the training configuration and data collator.
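
As a rough illustration of what the data collator does, the sketch below pads a toy batch by hand. It is a simplification, not the `DataCollatorForSeq2Seq` implementation: the token ids are invented, `pad_batch` is a hypothetical helper, and the real collator returns tensors and can also prepare decoder inputs when given the model. The key points it shows are that inputs are padded only to the longest sequence in the batch, and that labels are padded with `-100` so the loss ignores those positions.

```python
IGNORE_INDEX = -100  # label value skipped by PyTorch's cross-entropy loss by default


def pad_batch(features, pad_token_id):
    """Pad input_ids with pad_token_id and labels with IGNORE_INDEX to the batch maximum."""
    max_in = max(len(f["input_ids"]) for f in features)
    max_lab = max(len(f["labels"]) for f in features)
    batch = {"input_ids": [], "attention_mask": [], "labels": []}
    for f in features:
        n_in = len(f["input_ids"])
        n_lab = len(f["labels"])
        batch["input_ids"].append(f["input_ids"] + [pad_token_id] * (max_in - n_in))
        # 1 marks real tokens, 0 marks padding
        batch["attention_mask"].append([1] * n_in + [0] * (max_in - n_in))
        batch["labels"].append(f["labels"] + [IGNORE_INDEX] * (max_lab - n_lab))
    return batch


# Toy token ids, invented for illustration
features = [
    {"input_ids": [10, 11, 12, 1], "labels": [20, 1]},
    {"input_ids": [13, 1], "labels": [21, 22, 23, 1]},
]
batch = pad_batch(features, pad_token_id=0)
for key, rows in batch.items():
    print(key, rows)
```

Padding per batch rather than to a fixed length keeps short batches small, which matters here because the preprocessing step truncates but does not pad.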

In [None]:
#  STEP 8: Setup Data Collator
data_collator = DataCollatorForSeq2Seq(tokenizer=tokenizer, model=model)

#  STEP 9: Training Arguments
output_dir = "./results"
training_args = TrainingArguments(
    output_dir=output_dir,
    eval_strategy="steps",      # Evaluation strategy: steps or epoch
    eval_steps=500,             # Evaluation steps  
    logging_steps=50,           # Logging steps
    learning_rate=3e-4,         # Learning rate     
    per_device_train_batch_size=2,
    per_device_eval_batch_size=2,
    gradient_accumulation_steps=8,
    num_train_epochs=4,         # Number of training epochs       
    weight_decay=0.01,          # Weight decay
    save_total_limit=1,         # Limit the total amount of checkpoints
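    # T5-family models can overflow in fp16; if the loss becomes NaN, try bf16=True on supported hardware
    fp16=torch.cuda.is_available(),  # Mixed precision only when a CUDA GPU is present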
    save_strategy="epoch",      # Save strategy
    logging_dir="./logs",       # Directory for logs
    report_to="none"            # No reporting
)
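
To see what these settings imply for the length of the run, here is a back-of-the-envelope calculation. It assumes a single GPU and the 45,000 training examples left after the 90/10 split; the step counts are approximate, since the `Trainer` applies its own rounding.

```python
import math

num_train_examples = 45_000  # 90% of the 50,000-example subset
per_device_batch = 2         # per_device_train_batch_size
grad_accum = 8               # gradient_accumulation_steps
num_devices = 1              # assumption: single GPU
epochs = 4                   # num_train_epochs
eval_every = 500             # eval_steps

# Number of examples contributing to each optimizer update
effective_batch = per_device_batch * grad_accum * num_devices

# Approximate optimizer steps per epoch and over the whole run
steps_per_epoch = math.ceil(num_train_examples / effective_batch)
total_steps = steps_per_epoch * epochs

# Number of evaluation passes triggered during training
num_evals = total_steps // eval_every

print(f"Effective batch size: {effective_batch}")
print(f"Approx. steps per epoch: {steps_per_epoch}")
print(f"Approx. total steps: {total_steps}")
print(f"Approx. evaluation runs: {num_evals}")
```

Gradient accumulation gives an effective batch size of 16 while only two examples are held in memory per forward pass, at the cost of eight forward and backward passes per optimizer update.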

## Train the Model

Now we're ready to set up the trainer and start training the model.

In [None]:
#  STEP 10: Trainer Setup
trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=train_ds,
    eval_dataset=eval_ds,
    tokenizer=tokenizer,
    data_collator=data_collator
)

#  STEP 11: Train Model
print("Starting training...")
trainer.train()
print("Training completed!")

## Save and Export the Model

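Next, we'll save the fine-tuned LoRA adapter and tokenizer, then zip them for download. Note that `save_pretrained` on a PEFT model stores only the adapter weights and configuration, so the base model is still needed at load time.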

In [None]:
#  STEP 12: Save Fine-Tuned Model + Tokenizer
save_path = "./finetuned_model"
model.save_pretrained(save_path)
tokenizer.save_pretrained(save_path)

print("✅ Model fine-tuned and saved to:", save_path)

#  STEP 13: Zip model for download (optional)
try:
    zip_path = "./finetuned_model.zip"
    shutil.make_archive("./finetuned_model", 'zip', save_path)
    print("✅ Model zipped successfully to:", zip_path)
except Exception as e:
    print(f"Could not create zip file: {e}")
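
For later use in a fresh session, the saved adapter can be attached to a newly loaded base model. The following is a sketch under stated assumptions rather than part of the original workflow: `load_finetuned` is a hypothetical helper name, the default paths mirror the ones used above, and the imports are placed inside the function only so that it can be defined before the libraries are loaded.

```python
def load_finetuned(adapter_dir="./finetuned_model",
                   base_name="google/flan-t5-small",
                   merge=False):
    """Rebuild the fine-tuned model from the base checkpoint plus the saved LoRA adapter."""
    # Lazy imports: only needed when the function is actually called
    from transformers import AutoModelForSeq2SeqLM, AutoTokenizer
    from peft import PeftModel

    # Load the frozen base model, then attach the trained adapter weights
    base = AutoModelForSeq2SeqLM.from_pretrained(base_name)
    model = PeftModel.from_pretrained(base, adapter_dir)

    if merge:
        # Fold the LoRA weights into the base weights for adapter-free inference
        model = model.merge_and_unload()

    # The tokenizer was saved alongside the adapter
    tokenizer = AutoTokenizer.from_pretrained(adapter_dir)
    model.eval()
    return model, tokenizer
```

Calling `model, tokenizer = load_finetuned()` would then give a model ready for `generate`. Passing `merge=True` trades the ability to keep training the adapter for a plain model without the PEFT wrapper.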

## Test the Fine-tuned Model

Let's test our model on a sample article to see how well it performs.

In [None]:
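# Reuse the fine-tuned model that is still in memory from training
model.eval()

# Get a sample article. Index 10000 lies inside the 50,000-example subset,
# so it may have been part of the training split rather than held-out data.
test_article = dataset[10000]['article']
print("Original article excerpt:")
print(test_article[:500] + "...\n")

print("Original summary:")
print(dataset[10000]['highlights'] + "\n")

# Generate summary with our fine-tuned model.
# Inputs are moved to the model's device, since training leaves the model on the GPU.
inputs = tokenizer(
    test_article, return_tensors="pt", max_length=512, truncation=True
).to(model.device)
with torch.no_grad():
    output = model.generate(
        input_ids=inputs["input_ids"],
        attention_mask=inputs["attention_mask"],
        max_new_tokens=150,
        min_length=30,
        no_repeat_ngram_size=3
    )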
summary = tokenizer.decode(output[0], skip_special_tokens=True)

print("Generated summary:")
print(summary)
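
Reading one generated summary gives only a qualitative impression. As a rough quantitative check, the sketch below computes a simplified ROUGE-1 F1 score, that is, unigram overlap between a reference and a candidate. It is a stand-in for illustration and not the `rouge-score` package's implementation: it lowercases and splits on whitespace, with no stemming or punctuation handling, and `rouge1_f1` is a hypothetical helper name.

```python
from collections import Counter


def rouge1_f1(reference: str, candidate: str) -> float:
    """Simplified ROUGE-1 F1: harmonic mean of unigram precision and recall."""
    ref_counts = Counter(reference.lower().split())
    cand_counts = Counter(candidate.lower().split())
    # Counter intersection keeps the minimum count of each shared token
    overlap = sum((ref_counts & cand_counts).values())
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand_counts.values())
    recall = overlap / sum(ref_counts.values())
    return 2 * precision * recall / (precision + recall)


# Toy sentences, invented for illustration
reference = "the cat sat on the mat"
candidate = "the cat lay on the mat"
score = rouge1_f1(reference, candidate)
print(f"ROUGE-1 F1: {score:.3f}")
```

In the notebook, the same function could be applied as `rouge1_f1(dataset[10000]['highlights'], summary)`. For reported results, a standard ROUGE implementation evaluated over the whole held-out split would be the more reliable choice.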

## Conclusion

In this notebook, we've:
1. Fine-tuned FLAN-T5-small using LoRA for summarization
2. Used a 50,000-example subset of the CNN/DailyMail dataset for training and evaluation
3. Saved the LoRA adapter and tokenizer for later use

The fine-tuned model can now be integrated into the Smart Notes Summarizer project.