# Lab 5: (IA)³ - Fine-Tuning a GPT-2 Model with Extreme Parameter Efficiency
---
## Notebook 2: The Training Process

**Goal:** In this notebook, you will fine-tune a `gpt2` model to generate positive movie reviews using **(IA)³**. You will see firsthand how few parameters are needed for this method to be effective.

**You will learn to:**
-   Load and prepare the `imdb` dataset.
-   Load a pre-trained GPT-2 model.
-   Deeply understand and configure `peft.IA3Config`.
-   Apply (IA)³ scaling vectors to the model.
-   Fine-tune the model by training *only* the (IA)³ vectors.


### Step 1: Load Dataset and Preprocess

This step is identical to the Prefix Tuning lab. We will load the `imdb` dataset, filter for positive reviews, and tokenize the text.


In [None]:
from datasets import load_dataset
from transformers import AutoTokenizer

model_checkpoint = "gpt2"
tokenizer = AutoTokenizer.from_pretrained(model_checkpoint, use_fast=True)
tokenizer.pad_token = tokenizer.eos_token

# --- Load and Filter Dataset ---
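# The IMDB train split is grouped by label (negative reviews come first), so
# shuffle before taking a subset; otherwise the filter below can return nothing.
dataset = load_dataset("imdb", split="train").shuffle(seed=42).select(range(500))
positive_dataset = dataset.filter(lambda example: example["label"] == 1)
positive_dataset = positive_dataset.train_test_split(test_size=0.1, seed=42)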

# --- Preprocessing Function ---
def preprocess_function(examples):
    outputs = tokenizer(examples["text"], truncation=True, padding="max_length", max_length=256)
    outputs["labels"] = outputs["input_ids"]
    return outputs

# --- Apply Preprocessing ---
tokenized_datasets = positive_dataset.map(preprocess_function, batched=True)
tokenized_datasets = tokenized_datasets.remove_columns(["text", "label"])
tokenized_datasets.set_format("torch")

print("✅ Dataset loaded and preprocessed.")
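
A note on the `labels` column created in `preprocess_function`: with `mlm=False`, the `DataCollatorForLanguageModeling` used later in Step 4 should rebuild the labels from `input_ids` and set padding positions to `-100`, which the loss ignores. Because the pad token in this lab is the EOS token, every EOS position is masked as well. The sketch below mimics that masking on a hand-made batch. It illustrates the behaviour and is not the collator's own code; the token ids other than GPT-2's EOS id (50256) are arbitrary placeholders.

```python
import torch

PAD_TOKEN_ID = 50256  # GPT-2's EOS id, reused as the pad token in this lab
IGNORE_INDEX = -100   # label value skipped by PyTorch's cross-entropy loss

# Hand-made batch: two "real" tokens followed by padding (placeholder ids).
input_ids = torch.tensor([[15496, 995, PAD_TOKEN_ID, PAD_TOKEN_ID]])

# Copy the inputs, then mask every padding position so it adds no loss.
labels = input_ids.clone()
labels[labels == PAD_TOKEN_ID] = IGNORE_INDEX

print(labels.tolist())  # [[15496, 995, -100, -100]]
```

With `max_length=256` and short reviews, a large share of each row can be padding, so this masking matters for the reported loss.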


### Step 2: Load the Base Model

We load the standard `gpt2` model for causal language modeling.


In [None]:
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(model_checkpoint)

print("✅ Base GPT-2 model loaded.")


### Step 3: Configure (IA)³

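Here we configure (IA)³ (Infused Adapter by Inhibiting and Amplifying Inner Activations). Instead of adding new layers or reparameterizing weight matrices, (IA)³ learns a small set of vectors that rescale the model's internal activations element-wise while every pre-trained weight stays frozen. Each targeted module gains only one vector the size of its activation, which is what makes the method so lightweight.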
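
The idea can be sketched in a few lines of PyTorch. This is a minimal illustration under stated assumptions, not the `peft` implementation: the class name `IA3ScaledLinear` and its `is_feedforward` flag are hypothetical, chosen to mirror the distinction `peft` draws between attention and feed-forward targets. The vector starts at all ones, so the wrapped layer behaves exactly like the frozen base layer before training.

```python
import torch
import torch.nn as nn


class IA3ScaledLinear(nn.Module):
    """Hypothetical wrapper: a frozen linear layer plus one learned scaling vector."""

    def __init__(self, base: nn.Linear, is_feedforward: bool = False):
        super().__init__()
        self.base = base
        for param in self.base.parameters():
            param.requires_grad = False  # pre-trained weights stay frozen
        self.is_feedforward = is_feedforward
        # Feed-forward targets scale the layer input, other targets scale the output.
        size = base.in_features if is_feedforward else base.out_features
        self.ia3_l = nn.Parameter(torch.ones(size))  # ones => identity at init

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        if self.is_feedforward:
            return self.base(x * self.ia3_l)
        return self.base(x) * self.ia3_l


torch.manual_seed(0)
base = nn.Linear(8, 4)
layer = IA3ScaledLinear(base)
x = torch.randn(2, 8)

same_at_init = torch.allclose(layer(x), base(x))
trainable = sum(p.numel() for p in layer.parameters() if p.requires_grad)
total = sum(p.numel() for p in layer.parameters())
print(same_at_init, trainable, total)

ff_layer = IA3ScaledLinear(nn.Linear(8, 4), is_feedforward=True)
ff_trainable = sum(p.numel() for p in ff_layer.parameters() if p.requires_grad)
```

Only the scaling vector receives gradients: 4 trainable values against 40 parameters in total for this toy layer.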

#### Key Hugging Face `peft` Components:

-   `peft.IA3Config`: The configuration class for this method.
    -   `task_type="CAUSAL_LM"`: We specify the task type.
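    -   `target_modules`: This is crucial for (IA)³. The original paper applies the scaling vectors to the **key** and **value** projections in the attention layers and to the intermediate activation of the **feed-forward** network. In the `transformers` implementation of `gpt2`, the attention projections are fused into a single `c_attn` module that produces Q, K, and V together, so targeting `c_attn` rescales the queries as well as the keys and values. The feed-forward output projection is named `mlp.c_proj`. We will target both.
    -   `feedforward_modules`: This tells `peft` which of the target modules belong to the feed-forward network. For these modules the learned vector rescales the module's input rather than its output, which matches the paper's placement of the vector on the intermediate feed-forward activation. Every entry here must also appear in `target_modules`. For GPT-2, this is `mlp.c_proj`.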
-   `peft.get_peft_model`: Applies the configuration.


In [None]:
from peft import get_peft_model, IA3Config, TaskType

# --- (IA)³ Configuration ---
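ia3_config = IA3Config(
    task_type=TaskType.CAUSAL_LM,
    target_modules=["c_attn", "mlp.c_proj"],
    feedforward_modules=["mlp.c_proj"],  # must be a subset of target_modules
    fan_in_fan_out=True,  # GPT-2 uses Conv1D layers, which store weights as (fan_in, fan_out)
)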

# --- Create PeftModel ---
peft_model = get_peft_model(model, ia3_config)

# --- Print Trainable Parameters ---
# Notice how incredibly small the number of trainable parameters is!
peft_model.print_trainable_parameters()
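
The trainable-parameter count can also be estimated by hand. The arithmetic below is a rough sketch that assumes the standard GPT-2 (small) dimensions and assumes that `peft` sizes each vector by the output width of `c_attn` and the input width of `mlp.c_proj`. The figure printed by `print_trainable_parameters()` should land close to it, though the reported total may also include the added vectors.

```python
# GPT-2 (small) dimensions.
N_LAYERS = 12
HIDDEN = 768
FFN_INNER = 4 * HIDDEN  # 3072
VOCAB = 50257
N_POSITIONS = 1024

# (IA)^3 vectors per transformer block (assumed sizing, see the note above).
c_attn_vector = 3 * HIDDEN       # scales the fused Q/K/V output
mlp_c_proj_vector = FFN_INNER    # scales the input of the feed-forward output projection
ia3_params = N_LAYERS * (c_attn_vector + mlp_c_proj_vector)

# Frozen base-model parameters (weights plus biases).
per_block = (
    2 * HIDDEN                            # ln_1
    + HIDDEN * 3 * HIDDEN + 3 * HIDDEN    # attn.c_attn
    + HIDDEN * HIDDEN + HIDDEN            # attn.c_proj
    + 2 * HIDDEN                          # ln_2
    + HIDDEN * FFN_INNER + FFN_INNER      # mlp.c_fc
    + FFN_INNER * HIDDEN + HIDDEN         # mlp.c_proj
)
base_params = (
    VOCAB * HIDDEN           # token embeddings (tied with the LM head)
    + N_POSITIONS * HIDDEN   # position embeddings
    + N_LAYERS * per_block
    + 2 * HIDDEN             # final layer norm
)

print(f"(IA)^3 params: {ia3_params:,}")
print(f"base params:   {base_params:,}")
print(f"ratio:         {100 * ia3_params / base_params:.4f}%")
```

Under these assumptions the vectors amount to roughly 0.05% of the base model.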


### Step 4: Set Up Training

The final step is to configure and run the training process. This is identical to the setup in the Prefix Tuning lab.


In [None]:
from transformers import TrainingArguments, Trainer, DataCollatorForLanguageModeling

# --- Training Arguments ---
training_args = TrainingArguments(
    output_dir="./gpt2-ia3-imdb",
    auto_find_batch_size=True,
    learning_rate=1e-3, # A higher learning rate can be effective for these lightweight methods
    num_train_epochs=5,
    logging_steps=50,
    evaluation_strategy="epoch",  # newer transformers releases rename this to eval_strategy
    save_strategy="epoch",
    load_best_model_at_end=True,
)

# --- Create Trainer ---
trainer = Trainer(
    model=peft_model,
    args=training_args,
    train_dataset=tokenized_datasets["train"],
    eval_dataset=tokenized_datasets["test"],
    tokenizer=tokenizer,  # newer transformers releases prefer processing_class=tokenizer
    data_collator=DataCollatorForLanguageModeling(tokenizer=tokenizer, mlm=False),
)

# --- Start Training ---
print("🚀 Starting training with (IA)³...")
trainer.train()
print("✅ Training complete!")
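
Since the evaluation loss is logged once per epoch, it can be handy to convert it to perplexity, which is easier to compare across runs. The helper below is a small sketch: `perplexity_from_loss` is a hypothetical name, and the metrics dictionary is a made-up placeholder in the shape returned by `trainer.evaluate()`, not a result from this lab.

```python
import math


def perplexity_from_loss(eval_loss: float) -> float:
    """Perplexity is the exponential of the mean cross-entropy loss (in nats)."""
    return math.exp(eval_loss)


# Placeholder value for illustration only; in the notebook, use
# metrics = trainer.evaluate() instead.
metrics = {"eval_loss": 3.5}
perplexity = perplexity_from_loss(metrics["eval_loss"])
print(f"Perplexity: {perplexity:.2f}")
```

A loss of 0 corresponds to a perplexity of 1, and lower is better in both cases.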
