# Level 2 Fine-Tuning: 3-Class Classification

This notebook fine-tunes the `microsoft/Phi-4-mini-instruct` model for a 3-class classification task. The goal is to train the model to distinguish between "Correct" (Label 0), "Conceptual Error" (Label 1), and "Computational Error" (Label 2) solutions.

Following the methodology of the Level 1 experiment, this notebook runs a two-part experiment:
1.  **Linear Probe Baseline**: Training only a classification head on top of the frozen, pre-trained model.
2.  **Full LoRA Fine-Tuning**: Training LoRA adapters and a new classification head simultaneously.

This notebook is streamlined and code-focused. For detailed explanations of the concepts and code blocks, please refer to the extensively annotated Level 1 fine-tuning notebook.

## 1. Environment Setup

We begin by setting up the environment in Google Colab: mounting Google Drive for persistent storage, installing the Python libraries required for model training and data handling, and unzipping the dataset.

In [1]:
# --- 1: Environment Setup ---

# Mount google drive
from google.colab import drive
drive.mount('/content/drive')

# Install Required Libraries

!pip install -Uq transformers
!pip install -Uq peft
!pip install -Uq trl
!pip install -Uq accelerate
!pip install -Uq datasets
!pip install -Uq bitsandbytes

!pip install flash-attn==2.7.4.post1 \
  --extra-index-url https://download.pytorch.org/whl/cu124 \
  --no-build-isolation

# Unzip the Level 2 dataset from Google Drive to the local Colab environment
# Note: Adjust the path if your ZIP file is located elsewhere.
!unzip -oq /content/drive/MyDrive/level-2-three-class.zip -d /content/
print("\nLevel 2 dataset successfully unzipped.")

Mounted at /content/drive

In [4]:
# Unzip the Level 2 dataset from Google Drive to the local Colab environment
# Note: Adjust the path if your ZIP file is located elsewhere.
!unzip -oq /content/drive/MyDrive/level-2-three-class.zip -d /content/
print("\nLevel 2 dataset successfully unzipped.")


Level 2 dataset successfully unzipped.


## 2. Configuration

In [2]:
# --- 2: Project Configuration ---

class Config:
    # Model ID from Hugging Face Hub
    MODEL_ID = "microsoft/Phi-4-mini-instruct"

    # Local path to the unzipped Level 2 dataset
    DATASET_PATH = "/content/level-2-three-class"

    # Number of labels for the 3-class classification task
    NUM_LABELS = 3

## 3. Data Loading & Preprocessing

In [5]:
# --- 3: Data Loading & Preprocessing ---
from datasets import load_from_disk, concatenate_datasets, DatasetDict
from transformers import AutoTokenizer

# Load the raw dataset from disk
raw_dataset = load_from_disk(Config.DATASET_PATH)

# Load and configure the tokenizer
tokenizer = AutoTokenizer.from_pretrained(
    Config.MODEL_ID,
    trust_remote_code=True
)
tokenizer.padding_side = "left"
if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token

# Define and apply the preprocessing function
def preprocess_function(examples):
    system_prompt = "Analyze the following mathematical problem and solution to determine if the solution is correct or flawed."
    input_texts = [
        f"{system_prompt}\n\n### Problem:\n{q}\n\n### Solution:\n{s}"
        for q, s in zip(examples["question"], examples["solution"])
    ]
    return tokenizer(
        input_texts,
        truncation=True,
        max_length=512,
        padding=False
    )

tokenized_dataset = raw_dataset.map(
    preprocess_function,
    batched=True,
    remove_columns=["question", "solution"]
)

# Combine training and validation splits
full_train_dataset = concatenate_datasets(
    [tokenized_dataset["train"], tokenized_dataset["validation"]]
)
final_dataset = DatasetDict({
    "train": full_train_dataset,
    "test": tokenized_dataset["test"]
})

print("--- Final Dataset for Training and Evaluation ---")
print(final_dataset)

Map:   0%|          | 0/2426 [00:00<?, ? examples/s]

Map:   0%|          | 0/302 [00:00<?, ? examples/s]

Map:   0%|          | 0/305 [00:00<?, ? examples/s]

--- Final Dataset for Training and Evaluation ---
DatasetDict({
    train: Dataset({
        features: ['index', 'label', 'input_ids', 'attention_mask'],
        num_rows: 2728
    })
    test: Dataset({
        features: ['index', 'label', 'input_ids', 'attention_mask'],
        num_rows: 305
    })
})
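
To make the preprocessing concrete, the sketch below rebuilds the prompt string for a single record using the same template as `preprocess_function`, and checks the split sizes shown in the `Map` progress lines against the combined training set. The example question and solution are invented for illustration, and `build_prompt` is a hypothetical helper that does not appear in the notebook.

```python
SYSTEM_PROMPT = (
    "Analyze the following mathematical problem and solution "
    "to determine if the solution is correct or flawed."
)

def build_prompt(question: str, solution: str) -> str:
    """Hypothetical helper mirroring the f-string in preprocess_function."""
    return f"{SYSTEM_PROMPT}\n\n### Problem:\n{question}\n\n### Solution:\n{solution}"

# Invented record, for illustration only.
prompt = build_prompt("What is 2 + 3?", "2 + 3 = 6")
print(prompt)

# Split sizes as shown in the Map progress output (three splits in total).
mapped_sizes = [2426, 302, 305]
test_size = 305  # num_rows of the test split in the final DatasetDict

# Everything that is not the test split ends up in the combined training set.
combined_train = sum(mapped_sizes) - test_size
print(combined_train)
```

The combined size agrees with the `num_rows: 2728` reported for the `train` split.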


## 4. Model Architecture

In [6]:
# --- 4: Define the Custom Classifier Class ---
import torch.nn as nn

class GPTSequenceClassifier(nn.Module):
    def __init__(self, base_model, num_labels):
        super().__init__()
        self.base = base_model
        hidden_size = base_model.config.hidden_size
        self.classifier = nn.Linear(hidden_size, num_labels, bias=True)
        self.num_labels = num_labels

    def forward(self, input_ids=None, attention_mask=None, labels=None, **kwargs):
        outputs = self.base(
            input_ids=input_ids,
            attention_mask=attention_mask,
            output_hidden_states=True,
            **kwargs,
            )
        last_hidden_state = outputs.hidden_states[-1]
        pooled_output = last_hidden_state[:, -1, :]
        logits = self.classifier(pooled_output)
        loss = None
        if labels is not None:
            loss = nn.functional.cross_entropy(logits.view(-1, self.num_labels), labels.view(-1))
        return {"loss": loss, "logits": logits} if loss is not None else {"logits": logits}

print("GPTSequenceClassifier class defined successfully.")

GPTSequenceClassifier class defined successfully.
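
The classifier pools with `last_hidden_state[:, -1, :]`, which relies on the `tokenizer.padding_side = "left"` setting from Section 3. The toy sketch below uses plain Python lists in place of tensors to show why: with left padding, the final position of every row holds a real token, whereas with right padding it would hold a pad token for any sequence shorter than the batch maximum. `PAD`, `pad_batch`, and the token IDs are made up for the illustration.

```python
PAD = 0  # hypothetical pad token ID

def pad_batch(sequences, side):
    """Pad every sequence to the batch maximum on the given side."""
    width = max(len(seq) for seq in sequences)
    padded = []
    for seq in sequences:
        fill = [PAD] * (width - len(seq))
        padded.append(fill + seq if side == "left" else seq + fill)
    return padded

batch = [[11, 12, 13, 14], [21, 22]]  # made-up token IDs

left = pad_batch(batch, "left")
right = pad_batch(batch, "right")

# Position -1 is what last_hidden_state[:, -1, :] would read for each row.
last_left = [row[-1] for row in left]
last_right = [row[-1] for row in right]

print(left, last_left)
print(right, last_right)
```

With right padding, the second row's last position is the pad token, so the head would be reading a padding position rather than the end of the solution text.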


## 5. Common Training Components

In [7]:
# --- 5: Define Common Training Components ---
import numpy as np
from transformers import TrainingArguments
from transformers.trainer_utils import EvalPrediction

# Define shared TrainingArguments
training_args = TrainingArguments(
    output_dir="/content/training_output_level2",
    num_train_epochs=3,
    per_device_train_batch_size=8,
    gradient_accumulation_steps=4,
    optim="paged_adamw_8bit",
    learning_rate=2e-4,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    bf16=True,
    gradient_checkpointing=False,
    logging_strategy="steps",
    logging_steps=25,
    save_strategy="epoch",
    save_total_limit=1,
    report_to="none",
    save_safetensors=False,
)

# Define shared evaluation metric function
def compute_metrics(p: EvalPrediction):
    logits = p.predictions[0] if isinstance(p.predictions, tuple) else p.predictions
    preds = np.argmax(logits, axis=1)
    return {"accuracy": (preds == p.label_ids).mean().item()}

print("Common TrainingArguments and compute_metrics function defined.")

Common TrainingArguments and compute_metrics function defined.
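
As a rough worked example of what these arguments imply, the sketch below derives the effective batch size and an approximate optimizer step count for the 2,728-example training set on a single device. The step formula is an assumption about how `Trainer` rounds partial accumulation windows and may differ slightly between `transformers` versions, so the totals are estimates.

```python
import math

num_examples = 2728      # rows in final_dataset["train"]
per_device_batch = 8     # per_device_train_batch_size
grad_accum = 4           # gradient_accumulation_steps
epochs = 3               # num_train_epochs
warmup_ratio = 0.1

# Examples contributing to each optimizer update (single device assumed).
effective_batch = per_device_batch * grad_accum
batches_per_epoch = math.ceil(num_examples / per_device_batch)

# Assumption: one optimizer step per full accumulation window.
steps_per_epoch = batches_per_epoch // grad_accum
total_steps = steps_per_epoch * epochs
warmup_steps = math.ceil(warmup_ratio * total_steps)

print(effective_batch, batches_per_epoch, steps_per_epoch, total_steps, warmup_steps)
```

An estimate of roughly 255 steps is consistent with the training logs in Sections 6 and 7, where the last logged step at `logging_steps=25` is 250.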


## 6. Experiment 1: Linear Probe Baseline

In [8]:
# --- 6.1 & 6.2: Define Frozen Backbone and Initialize Probe Trainer ---
import torch
import copy
from transformers import AutoModelForCausalLM, BitsAndBytesConfig, Trainer, DataCollatorWithPadding

# --- Define and Freeze Backbone ---
DTYPE = torch.bfloat16
quantization_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=DTYPE,
)

backbone_probe = AutoModelForCausalLM.from_pretrained(
    Config.MODEL_ID,
    quantization_config=quantization_config,
    device_map="auto",
    trust_remote_code=True,
    attn_implementation="flash_attention_2",
)

for param in backbone_probe.parameters():
    param.requires_grad = False
backbone_probe.config.pad_token_id = tokenizer.pad_token_id

# --- Initialize Trainer ---
model_probe = GPTSequenceClassifier(backbone_probe, Config.NUM_LABELS)

# Verify trainable parameters
print("--- Trainable Status for Linear Probe Model ---")
total_params = sum(p.numel() for p in model_probe.parameters())
trainable_params = sum(p.numel() for p in model_probe.parameters() if p.requires_grad)
print(f"Trainable params: {trainable_params:,} || All params: {total_params:,} || Trainable %: {100 * trainable_params / total_params:.4f}")

probe_training_args = copy.deepcopy(training_args)
probe_training_args.output_dir = "/content/training_output_level2/linear_probe"

data_collator = DataCollatorWithPadding(tokenizer=tokenizer)

trainer_probe = Trainer(
    model=model_probe,
    args=probe_training_args,
    train_dataset=final_dataset["train"],
    eval_dataset=final_dataset["test"],
    tokenizer=tokenizer,
    data_collator=data_collator,
    compute_metrics=compute_metrics,
)

print("\nTrainer for linear probing initialized successfully.")

--- Trainable Status for Linear Probe Model ---
Trainable params: 9,219 || All params: 2,225,418,243 || Trainable %: 0.0004

Trainer for linear probing initialized successfully.
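
The 9,219 trainable parameters reported above are what a single `nn.Linear(hidden_size, 3)` layer with bias would contribute if the backbone's hidden size is 3072. That hidden size is inferred here from the count itself rather than read from the model config, so treat it as an assumption to verify against `base_model.config.hidden_size`.

```python
def linear_param_count(in_features: int, out_features: int, bias: bool = True) -> int:
    """Number of parameters in a dense layer: weight matrix plus optional bias."""
    return in_features * out_features + (out_features if bias else 0)

hidden_size = 3072   # assumption inferred from the reported count
num_labels = 3       # Config.NUM_LABELS

head_params = linear_param_count(hidden_size, num_labels)
print(head_params)
```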



### 6.3 Train and Evaluate Baseline Model

In [9]:
# --- 6.3: Train the Classifier Head (Linear Probe) ---
print("--- Starting training for the linear probe baseline ---")
trainer_probe.train()
print("\n--- Linear probe training complete ---")

--- Starting training for the linear probe baseline ---


Step,Training Loss
25,4.533
50,4.4415
75,4.2827
100,4.2281
125,4.2564
150,4.2544
175,4.0058
200,4.0501
225,4.2366
250,4.0727



--- Linear probe training complete ---


In [10]:
# --- 6.4: Evaluate the Linear Probe Baseline ---
print("\n--- Evaluating the linear probe model on the test set ---")
probe_results = trainer_probe.evaluate()
print("\n--- Linear Probe Baseline Performance ---")
print(probe_results)


--- Evaluating the linear probe model on the test set ---



--- Linear Probe Baseline Performance ---
{'eval_loss': 1.0438300371170044, 'eval_accuracy': 0.4360655737704918, 'eval_runtime': 6.897, 'eval_samples_per_second': 44.222, 'eval_steps_per_second': 5.655, 'epoch': 3.0}
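
For context when reading these numbers, a classifier that assigns equal probability to all three classes has a cross-entropy of ln 3 and, on a class-balanced test set, an accuracy of one third. The short sketch below computes both reference values and the probe's distance from them. Class balance is an assumption, since the notebook does not report the label distribution.

```python
import math

num_labels = 3

# Cross-entropy of a uniform prediction: -log(1 / num_labels).
uniform_loss = -math.log(1.0 / num_labels)

# Expected accuracy of uniform guessing, assuming balanced classes.
chance_accuracy = 1.0 / num_labels

probe_loss = 1.0438300371170044      # eval_loss reported above
probe_accuracy = 0.4360655737704918  # eval_accuracy reported above

print(f"uniform loss:       {uniform_loss:.4f}")
print(f"chance accuracy:    {chance_accuracy:.4f}")
print(f"probe loss gap:     {uniform_loss - probe_loss:.4f}")
print(f"probe accuracy gap: {probe_accuracy - chance_accuracy:.4f}")
```

Under that assumption, the frozen-backbone probe sits only modestly above uniform guessing on both measures.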


In [11]:
# --- 6.5: Save Baseline Model Predictions ---
import torch
import pandas as pd

print("\n--- Generating and saving baseline model predictions for the test set ---")
pred_outputs_probe = trainer_probe.predict(final_dataset["test"])
logits_probe = pred_outputs_probe.predictions[0] if isinstance(pred_outputs_probe.predictions, tuple) else pred_outputs_probe.predictions
probs_probe = torch.softmax(torch.tensor(logits_probe), dim=-1).numpy()

df_probe = pd.DataFrame(probs_probe, columns=[f"p(class={i})" for i in range(Config.NUM_LABELS)])
df_probe["index"] = final_dataset["test"]["index"]
df_probe["true_label"] = final_dataset["test"]["label"]

cols = ["index", "true_label"] + [c for c in df_probe.columns if c.startswith("p(")]
df_probe = df_probe[cols]

output_path = "/content/probe_baseline_predictions_level2.csv"
df_probe.to_csv(output_path, index=False)
print(f"Baseline predictions saved to {output_path}")


--- Generating and saving baseline model predictions for the test set ---


Baseline predictions saved to /content/probe_baseline_predictions_level2.csv
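
A predictions file in this layout can be turned into predicted labels and a confusion matrix with a few lines of standard-library Python. The sketch below is illustrative only: the three rows are invented, not taken from the actual output, and `LABEL_NAMES` simply restates the label meanings given in the introduction.

```python
import csv
import io

LABEL_NAMES = {0: "Correct", 1: "Conceptual Error", 2: "Computational Error"}

# Invented rows in the same column layout as the saved CSV.
sample_csv = """index,true_label,p(class=0),p(class=1),p(class=2)
0,0,0.70,0.20,0.10
1,1,0.30,0.30,0.40
2,2,0.05,0.15,0.80
"""

num_labels = len(LABEL_NAMES)
# confusion[true][pred] counts examples with that true/predicted label pair.
confusion = [[0] * num_labels for _ in range(num_labels)]

for row in csv.DictReader(io.StringIO(sample_csv)):
    probs = [float(row[f"p(class={i})"]) for i in range(num_labels)]
    pred = max(range(num_labels), key=probs.__getitem__)  # argmax over classes
    confusion[int(row["true_label"])][pred] += 1

correct = sum(confusion[i][i] for i in range(num_labels))
total = sum(map(sum, confusion))
accuracy = correct / total

for i, counts in enumerate(confusion):
    print(f"{LABEL_NAMES[i]:<20} {counts}")
print(f"accuracy = {accuracy:.3f}")
```

To run this on the real file, replace `io.StringIO(sample_csv)` with an open file handle for the saved CSV path.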


In [12]:
# delete unused variables
del backbone_probe
del model_probe
del trainer_probe
del probe_results
del pred_outputs_probe
del logits_probe
del probs_probe
del df_probe

In [13]:
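# Run garbage collection so the deleted objects are actually released, then clear the GPU cache
import gc
gc.collect()
torch.cuda.empty_cache()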

## 7. Experiment 2: Full LoRA Fine-Tuning

In [14]:
# --- 7.1 & 7.2: Define LoRA-Enabled Model and Initialize Trainer ---
from peft import LoraConfig, get_peft_model, TaskType

# --- Define LoRA-Enabled Model ---
backbone_lora = AutoModelForCausalLM.from_pretrained(
    Config.MODEL_ID,
    quantization_config=quantization_config,
    device_map="auto",
    trust_remote_code=True,
    attn_implementation="flash_attention_2",
)
backbone_lora.config.pad_token_id = tokenizer.pad_token_id

lora_config = LoraConfig(
    task_type=TaskType.SEQ_CLS,
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    bias="none",
    target_modules="all-linear",
)

lora_backbone = get_peft_model(backbone_lora, lora_config)
model_lora = GPTSequenceClassifier(lora_backbone, Config.NUM_LABELS)

# Verify trainable parameters
print("--- Trainable Status for LoRA Fine-Tuning Model ---")
total_params = sum(p.numel() for p in model_lora.parameters())
trainable_params = sum(p.numel() for p in model_lora.parameters() if p.requires_grad)
print(f"Trainable params: {trainable_params:,} || All params: {total_params:,} || Trainable %: {100 * trainable_params / total_params:.4f}")


# --- Initialize Trainer ---
lora_training_args = copy.deepcopy(training_args)
lora_training_args.output_dir = "/content/training_output_level2/lora_finetune"

trainer_lora = Trainer(
    model=model_lora,
    args=lora_training_args,
    train_dataset=final_dataset["train"],
    eval_dataset=final_dataset["test"],
    tokenizer=tokenizer,
    data_collator=data_collator,
    compute_metrics=compute_metrics,
)

print("\nTrainer for full LoRA fine-tuning initialized successfully.")

--- Trainable Status for LoRA Fine-Tuning Model ---
Trainable params: 23,077,891 || All params: 2,248,486,915 || Trainable %: 1.0264

Trainer for full LoRA fine-tuning initialized successfully.
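
The trainable count above can be reproduced by hand: a rank-r LoRA adapter on a linear layer with input width d_in and output width d_out adds r × (d_in + d_out) parameters. The sketch below applies that to four linear projections per decoder layer. The layer shapes and the layer count are assumptions about the Phi-4-mini architecture (fused `qkv_proj` and `gate_up_proj`, 32 layers) and are not printed anywhere in this notebook, so check them against the model config before relying on them.

```python
RANK = 16            # r in LoraConfig
NUM_LAYERS = 32      # assumed number of decoder layers
HEAD_PARAMS = 9_219  # classification head, as reported for the linear probe

# Assumed (in_features, out_features) of the linear modules in one decoder layer.
LAYER_SHAPES = {
    "qkv_proj": (3072, 5120),
    "o_proj": (3072, 3072),
    "gate_up_proj": (3072, 16384),
    "down_proj": (8192, 3072),
}

def lora_params(in_features: int, out_features: int, rank: int) -> int:
    """LoRA adds A (rank x in) and B (out x rank) next to the frozen weight."""
    return rank * (in_features + out_features)

per_layer = sum(lora_params(i, o, RANK) for i, o in LAYER_SHAPES.values())
lora_total = per_layer * NUM_LAYERS
trainable_total = lora_total + HEAD_PARAMS

print(per_layer, lora_total, trainable_total)
```

Under these assumptions the total comes to 23,077,891, matching the reported figure, which suggests that `target_modules="all-linear"` adapted the attention and MLP projections but left the output embedding untouched.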



In [15]:
# --- 7.3: Fine-Tune the LoRA Model ---
print("--- Starting full LoRA fine-tuning ---")
trainer_lora.train()
print("\n--- Full LoRA fine-tuning complete ---")

The input hidden states seems to be silently casted in float32, this might be related to the fact you have upcasted embedding or layer norm layers in float32. We will cast back the input in torch.bfloat16.


--- Starting full LoRA fine-tuning ---


Step,Training Loss
25,4.6142
50,4.2348
75,3.952
100,3.9349
125,3.2044
150,1.9341
175,1.7407
200,0.986
225,0.9819
250,0.9186



--- Full LoRA fine-tuning complete ---


In [16]:
# --- 7.4: Evaluate the Fine-Tuned LoRA Model ---
print("\n--- Evaluating the fine-tuned LoRA model on the test set ---")
lora_results = trainer_lora.evaluate()
print("\n--- LoRA Fine-Tuned Performance ---")
print(lora_results)


--- Evaluating the fine-tuned LoRA model on the test set ---



--- LoRA Fine-Tuned Performance ---
{'eval_loss': 0.39422768354415894, 'eval_accuracy': 0.8327868852459016, 'eval_runtime': 9.9002, 'eval_samples_per_second': 30.808, 'eval_steps_per_second': 3.939, 'epoch': 3.0}
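
Because the test set has 305 examples, the two reported accuracies can be converted back into counts of correctly classified solutions, which makes the gap between the experiments easier to read. The sketch below does that conversion using only the `eval_accuracy` values printed in this notebook.

```python
test_size = 305
probe_accuracy = 0.4360655737704918   # linear probe eval_accuracy
lora_accuracy = 0.8327868852459016    # LoRA eval_accuracy

# Accuracy is correct / test_size, so multiplying back recovers the count.
probe_correct = round(probe_accuracy * test_size)
lora_correct = round(lora_accuracy * test_size)

print(f"linear probe: {probe_correct}/{test_size} correct")
print(f"LoRA:         {lora_correct}/{test_size} correct")
print(f"difference:   {lora_correct - probe_correct} examples "
      f"({(lora_accuracy - probe_accuracy) * 100:.1f} percentage points)")
```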


In [17]:
# --- 7.5: Save Fine-Tuned Model Predictions ---
print("\n--- Generating and saving fine-tuned model predictions for the test set ---")
pred_outputs_lora = trainer_lora.predict(final_dataset["test"])
logits_lora = pred_outputs_lora.predictions[0] if isinstance(pred_outputs_lora.predictions, tuple) else pred_outputs_lora.predictions
probs_lora = torch.softmax(torch.tensor(logits_lora), dim=-1).numpy()

df_lora = pd.DataFrame(probs_lora, columns=[f"p(class={i})" for i in range(Config.NUM_LABELS)])
df_lora["index"] = final_dataset["test"]["index"]
df_lora["true_label"] = final_dataset["test"]["label"]

cols = ["index", "true_label"] + [c for c in df_lora.columns if c.startswith("p(")]
df_lora = df_lora[cols]

output_path = "/content/lora_finetuned_predictions_level2.csv"
df_lora.to_csv(output_path, index=False)
print(f"Fine-tuned predictions saved to {output_path}")


--- Generating and saving fine-tuned model predictions for the test set ---


Fine-tuned predictions saved to /content/lora_finetuned_predictions_level2.csv
