# Lab 2: Adapter Layers - Fine-Tuning a BERT Model for Classification
---
## Notebook 2: The Training Process

**Goal:** In this notebook, you will fine-tune a `bert-base-uncased` model on a sequence classification task (predicting whether two sentences are paraphrases) using **Adapter Layers**.

**You will learn to:**
-   Load a dataset for sequence classification and preprocess it with a tokenizer.
-   Load a pre-trained BERT model intended for `SequenceClassification`.
-   Deeply understand and configure `peft.AdapterConfig`.
-   Apply the Adapter configuration to the base model.
-   Fine-tune the model by training *only* the adapter weights using the `transformers.Trainer`.


### Step 1: Load Dataset and Preprocess

First, we'll load the GLUE MRPC (Microsoft Research Paraphrase Corpus) dataset. Each item in the dataset consists of two sentences and a label indicating whether they are semantically equivalent.

#### Key Hugging Face Components:

-   `datasets.load_dataset`: Fetches a dataset from the Hugging Face Hub.
-   `transformers.AutoTokenizer`: Loads the appropriate tokenizer for our `bert-base-uncased` model.
-   `tokenizer()`: The tokenizer will convert the sentence pairs into the format BERT expects: `[CLS] sentence1 [SEP] sentence2 [SEP]`, along with `token_type_ids` and an `attention_mask`.
-   `dataset.map()`: A powerful method to apply a processing function to every example in the dataset. We use `batched=True` for efficient processing.
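
To make the input format concrete, here is a minimal sketch of what a batched preprocessing function receives and returns. It uses a hypothetical `toy_encode_pairs` helper with whitespace splitting in place of the real WordPiece tokenizer, so the tokens and the naive truncation are illustrative only. The point is the layout of `token_type_ids` and `attention_mask`, and the dict-of-lists shape that `batched=True` passes in.

```python
def toy_encode_pairs(batch, max_length=12):
    """Toy stand-in for tokenizer(sentence1, sentence2, ...) applied to a batch.

    `batch` is a dict of lists, the shape that dataset.map(..., batched=True)
    hands to the processing function.
    """
    out = {"tokens": [], "token_type_ids": [], "attention_mask": []}
    for s1, s2 in zip(batch["sentence1"], batch["sentence2"]):
        first = ["[CLS]"] + s1.lower().split() + ["[SEP]"]
        second = s2.lower().split() + ["[SEP]"]
        tokens = (first + second)[:max_length]  # naive truncation (illustrative)
        # Segment 0 marks the first sentence, segment 1 the second.
        type_ids = ([0] * len(first) + [1] * len(second))[:max_length]
        pad = max_length - len(tokens)
        out["tokens"].append(tokens + ["[PAD]"] * pad)
        out["token_type_ids"].append(type_ids + [0] * pad)
        # 1 for real tokens, 0 for padding.
        out["attention_mask"].append([1] * len(tokens) + [0] * pad)
    return out

batch = {"sentence1": ["He ate lunch"], "sentence2": ["He had lunch"]}
encoded = toy_encode_pairs(batch)
print(encoded["tokens"][0])
print(encoded["token_type_ids"][0])
print(encoded["attention_mask"][0])
```

The real tokenizer additionally returns `input_ids`, the integer vocabulary indices of the tokens, which is what the model actually consumes.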


In [None]:
from datasets import load_dataset
from transformers import AutoTokenizer

# --- Load Dataset ---
dataset = load_dataset("glue", "mrpc")
model_checkpoint = "bert-base-uncased"

# --- Load Tokenizer ---
tokenizer = AutoTokenizer.from_pretrained(model_checkpoint, use_fast=True)

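# --- Preprocessing Function ---
def preprocess_function(examples):
    # Tokenize each sentence pair. Without an explicit max_length, padding="max_length"
    # pads every example to the model maximum (512 tokens for BERT), which wastes
    # compute on MRPC's short sentences, so we cap the length at 128.
    return tokenizer(
        examples["sentence1"],
        examples["sentence2"],
        truncation=True,
        padding="max_length",
        max_length=128,
    )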

# --- Apply Preprocessing ---
encoded_dataset = dataset.map(preprocess_function, batched=True)

# The Trainer expects a column named 'labels', but the dataset has 'label', so we rename it.
encoded_dataset = encoded_dataset.rename_column("label", "labels")

# Expose only the columns the model consumes, returned as PyTorch tensors.
encoded_dataset.set_format("torch", columns=["input_ids", "attention_mask", "token_type_ids", "labels"])

print("✅ Dataset loaded and preprocessed.")
print(encoded_dataset["train"][0])


### Step 2: Load the Base Model

Next, we load the `bert-base-uncased` model. Since this is a classification task, we use `AutoModelForSequenceClassification`.

#### Key Hugging Face Component:
- `transformers.AutoModelForSequenceClassification`: This class automatically loads a pre-trained model with a classification head on top.
    - `num_labels`: We need to tell the model how many classes we are predicting. The MRPC dataset has two labels (0 for not a paraphrase, 1 for a paraphrase).
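
As a rough illustration of what the classification head produces, the sketch below turns a pair of logits (one per label) into probabilities and a predicted class. The logit values are made up for illustration. In the real model they come from a linear layer applied to BERT's pooled output.

```python
import math

def softmax(logits):
    """Convert raw logits into probabilities that sum to 1."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical logits for one sentence pair; num_labels = 2 gives two values.
logits = [-0.4, 1.2]
probs = softmax(logits)

# The predicted class is the index of the largest probability (argmax).
predicted = max(range(len(probs)), key=probs.__getitem__)
label_names = {0: "not a paraphrase", 1: "paraphrase"}
print(probs, label_names[predicted])
```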


In [None]:
from transformers import AutoModelForSequenceClassification

num_labels = 2
model = AutoModelForSequenceClassification.from_pretrained(model_checkpoint, num_labels=num_labels)

print("✅ Base BERT model loaded.")
# print(model) # Uncomment to see the model architecture


### Step 3: Configure Adapters

Here, we'll use the `peft` library to configure our Adapter layers. Unlike LoRA, which modifies existing weights via reparameterization, Adapters add new layers to the model.
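
To show what "adding a new layer" means, here is a minimal NumPy sketch of a single bottleneck adapter, assuming the usual design: a down-projection, a non-linearity, an up-projection, and a residual connection. The weight names, the initialisation, and the bottleneck size of 48 are illustrative choices, not taken from any library.

```python
import numpy as np

rng = np.random.default_rng(0)
d_model, bottleneck = 768, 48  # 768 is the hidden size of BERT base

# Adapter weights: these would be the only trainable tensors.
W_down = rng.normal(scale=0.01, size=(d_model, bottleneck))
b_down = np.zeros(bottleneck)
W_up = np.zeros((bottleneck, d_model))  # zero init: the adapter starts as the identity
b_up = np.zeros(d_model)

def adapter_forward(hidden):
    """Project down, apply ReLU, project back up, then add the residual."""
    z = np.maximum(hidden @ W_down + b_down, 0.0)
    return hidden + (z @ W_up + b_up)

hidden = rng.normal(size=(2, 5, d_model))  # (batch, seq_len, d_model)
out = adapter_forward(hidden)
print(out.shape)
```

Because of the residual connection and the zero-initialised up-projection, the adapter initially leaves the frozen model's hidden states unchanged and only learns a small correction during training.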

#### Key Hugging Face `peft` Components:

-   `peft.TaskType.SEQ_CLS`: We explicitly tell `peft` that this is a Sequence Classification task.
-   `peft.get_peft_model`: This function works for all PEFT methods. It takes the base model and the configuration, and returns the modified `PeftModel`.
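-   `peft.AdapterConfig`: The config for classic bottleneck Adapter layers (also known as Houlsby Adapters). Note that the parameter names below match the bottleneck-adapter configuration of the AdapterHub `adapters` library. If `from peft import AdapterConfig` raises an `ImportError`, your installed `peft` version does not provide this class.
    - `mh_adapter`: Whether to add an adapter layer after the multi-head attention block.
    - `output_adapter`: Whether to add an adapter layer after the feed-forward block.
    - `reduction_factor`: The most important parameter. It controls the bottleneck dimension of the adapter, which is `d_model / reduction_factor`. For `bert-base-uncased` (`d_model = 768`), a reduction factor of 16 gives a bottleneck of 48. A larger reduction factor therefore means fewer trainable parameters.
    - `non_linearity`: The activation function used inside the adapter, e.g., "relu" or "gelu".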
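
The effect of `reduction_factor` on model size can be estimated with a little arithmetic. The sketch below assumes each adapter is one down-projection and one up-projection with biases, and that both adapters are enabled in each of BERT base's 12 layers. It ignores anything else an implementation may train (layer norms, the classification head), so the totals will not match `print_trainable_parameters()` exactly.

```python
def adapter_param_count(d_model, reduction_factor):
    """Return (bottleneck size, parameters in one bottleneck adapter)."""
    bottleneck = d_model // reduction_factor
    down = d_model * bottleneck + bottleneck  # weights + bias
    up = bottleneck * d_model + d_model       # weights + bias
    return bottleneck, down + up

d_model, num_layers = 768, 12  # BERT base
adapters_per_layer = 2         # mh_adapter + output_adapter

for rf in (2, 16, 64):
    bottleneck, per_adapter = adapter_param_count(d_model, rf)
    total = per_adapter * adapters_per_layer * num_layers
    print(f"reduction_factor={rf:>2}  bottleneck={bottleneck:>3}  "
          f"per adapter={per_adapter:,}  total={total:,}")
```

With `reduction_factor=16` this gives 74,544 parameters per adapter and about 1.8 million in total, a small fraction of BERT base's roughly 110 million parameters.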


In [None]:
from peft import get_peft_model, AdapterConfig, TaskType

# --- Adapter Configuration ---
adapter_config = AdapterConfig(
    task_type=TaskType.SEQ_CLS,
    mh_adapter=True,
    output_adapter=True,
    reduction_factor=16,
    non_linearity="relu"
)

# --- Create PeftModel ---
peft_model = get_peft_model(model, adapter_config)

# --- Print Trainable Parameters ---
peft_model.print_trainable_parameters()


### Step 4: Set Up Training and Evaluation

The final step is to configure the training process. This is very similar to the LoRA lab, but we'll add an evaluation step to see how well our model is learning.

#### Key Hugging Face Components:

-   `transformers.TrainingArguments`: We configure this as before, but add:
    -   `evaluation_strategy="epoch"`: Tells the `Trainer` to run an evaluation at the end of each epoch.
-   `compute_metrics`: We'll define a function to calculate evaluation metrics (accuracy and F1 score) and pass it to the `Trainer`.
-   `transformers.Trainer`: We instantiate the trainer with the model, arguments, datasets, tokenizer, and our new metrics function.
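
To see what the metrics function has to compute, here is a library-free sketch of accuracy and binary F1, starting from logits as the `Trainer` provides them. The `accuracy_and_f1` helper and the example logits are hypothetical and only serve to illustrate the arithmetic.

```python
def accuracy_and_f1(predictions, labels, positive=1):
    """Compute accuracy and binary F1 for the positive class."""
    pairs = list(zip(predictions, labels))
    tp = sum(p == positive and y == positive for p, y in pairs)
    fp = sum(p == positive and y != positive for p, y in pairs)
    fn = sum(p != positive and y == positive for p, y in pairs)
    accuracy = sum(p == y for p, y in pairs) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "f1": f1}

# Made-up logits for five sentence pairs: one row per example, one column per label.
logits = [[0.2, 1.1], [1.5, -0.3], [0.1, 0.4], [0.9, 0.2], [-1.0, 2.0]]
labels = [1, 0, 0, 1, 1]

# argmax over the label axis turns logits into class predictions.
predictions = [row.index(max(row)) for row in logits]
print(predictions, accuracy_and_f1(predictions, labels))
```

F1 is reported alongside accuracy because it balances precision and recall on the positive (paraphrase) class, which accuracy alone does not capture.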


In [None]:
import numpy as np
import evaluate  # replaces `datasets.load_metric`, which was deprecated and later removed
from transformers import TrainingArguments, Trainer

# Load the metrics once, outside compute_metrics, so they are not reloaded at every evaluation.
accuracy_metric = evaluate.load("accuracy")
f1_metric = evaluate.load("f1")

# --- Metrics Calculation Function ---
def compute_metrics(eval_pred):
    # eval_pred unpacks into the model's logits and the reference labels
    logits, labels = eval_pred
    predictions = np.argmax(logits, axis=1)

    # Calculate metrics
    accuracy = accuracy_metric.compute(predictions=predictions, references=labels)
    f1 = f1_metric.compute(predictions=predictions, references=labels)

    return {
        "accuracy": accuracy["accuracy"],
        "f1": f1["f1"],
    }

# --- Training Arguments ---
training_args = TrainingArguments(
    output_dir="./bert-adapters-mrpc",
    learning_rate=1e-3, # Adapters often use a higher learning rate than full fine-tuning
    per_device_train_batch_size=16,
    per_device_eval_batch_size=16,
    num_train_epochs=5,
    weight_decay=0.01,
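    evaluation_strategy="epoch",  # named `eval_strategy` in recent transformers releases
    save_strategy="epoch",        # must match the evaluation strategy for load_best_model_at_end
    load_best_model_at_end=True,  # "best" defaults to the lowest evaluation loss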
)

# --- Create Trainer ---
trainer = Trainer(
    model=peft_model,
    args=training_args,
    train_dataset=encoded_dataset["train"],
    eval_dataset=encoded_dataset["validation"],
    tokenizer=tokenizer,
    compute_metrics=compute_metrics,
)

# --- Start Training ---
print("🚀 Starting training with Adapters...")
trainer.train()
print("✅ Training complete!")
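
As a rough sanity check on the training length, the sketch below works out how many optimizer steps these settings imply. It assumes the MRPC training split has 3,668 examples (the standard GLUE size, not stated in this notebook), a single device, and no gradient accumulation.

```python
import math

num_train_examples = 3668  # assumed size of the MRPC train split
batch_size = 16            # per_device_train_batch_size
num_epochs = 5             # num_train_epochs

# The last, smaller batch still counts as a step, hence the ceiling.
steps_per_epoch = math.ceil(num_train_examples / batch_size)
total_steps = steps_per_epoch * num_epochs
print(f"{steps_per_epoch} steps per epoch, {total_steps} steps in total")
```

With evaluation and saving set to run once per epoch, this corresponds to five evaluation passes and five checkpoints over the whole run.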


### A Note on `peft` Configuration Classes

Two other `peft` configuration classes are easy to confuse with the Adapter configuration used in Step 3:

-   `peft.PeftConfig`: The base class for PEFT configurations. Each PEFT method provides its own specific variant.
-   `peft.AdaptionPromptConfig`: The configuration for adaption prompts, which is a different technique from the bottleneck Adapter layers used in this lab.
    - `adapter_len`: The number of adapter tokens to insert.
    - `adapter_layers`: The number of adapter layers to insert.