# Lightweight Fine-Tuning Project

The choices made for this project:

* PEFT technique: LoRA
* Model: gpt2
* Evaluation approach: Hugging Face `Trainer.evaluate` with scikit-learn metrics (accuracy, precision, recall, F1)
* Fine-tuning dataset: imdb

## Loading and Evaluating a Foundation Model

In the cells below, we load the chosen pre-trained Hugging Face model and evaluate its performance prior to fine-tuning. This step includes loading an appropriate tokenizer and dataset.

In [2]:
# !pip install scikit-learn  # uncomment if scikit-learn is not installed

from transformers import AutoModelForSequenceClassification, AutoTokenizer
from transformers import TrainingArguments, Trainer, DataCollatorWithPadding

from datasets import load_dataset

import numpy as np
from sklearn.metrics import accuracy_score, precision_recall_fscore_support



In [3]:
ds = load_dataset("imdb")


In [51]:
model_name = "gpt2"
tokenizer = AutoTokenizer.from_pretrained(model_name)
tokenizer.pad_token = tokenizer.eos_token  # GPT-2 has no pad token by default



In [52]:

model = AutoModelForSequenceClassification.from_pretrained(model_name, num_labels=2)

# GPT-2 classifies from the last non-padding token, so the model must know the pad id
model.config.pad_token_id = tokenizer.eos_token_id

Some weights of GPT2ForSequenceClassification were not initialized from the model checkpoint at gpt2 and are newly initialized: ['score.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


In [6]:
def tokenize_function(example):
    # Pad or truncate every review to exactly 512 tokens
    return tokenizer(example["text"], padding="max_length", truncation=True, max_length=512)
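
To make the effect of `padding="max_length"` and `truncation=True` concrete, here is a minimal pure-Python sketch of the same idea. The helper `pad_or_truncate` and the toy token ids are hypothetical and not part of the tokenizer API. It assumes right-side padding and uses a tiny `max_length` so the result is easy to read.

```python
PAD_ID = 50256  # GPT-2's EOS id, reused as the pad token in this notebook

def pad_or_truncate(ids, max_length, pad_id=PAD_ID):
    """Return (input_ids, attention_mask), both exactly max_length long."""
    ids = ids[:max_length]                       # truncation=True
    n_pad = max_length - len(ids)                # padding="max_length"
    input_ids = ids + [pad_id] * n_pad           # assumed right-side padding
    attention_mask = [1] * len(ids) + [0] * n_pad
    return input_ids, attention_mask

# Toy ids for illustration only; real reviews produce hundreds of tokens.
short_ids, short_mask = pad_or_truncate([11, 22, 33], max_length=5)
long_ids, long_mask = pad_or_truncate([1, 2, 3, 4, 5, 6, 7], max_length=5)

print(short_ids, short_mask)
print(long_ids, long_mask)
```

Short inputs gain pad ids with a zero attention mask, while long inputs lose their tail.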

In [7]:
# DataCollatorWithPadding is already imported in the first cell.
# Trainer builds the same collator by default when it is given a tokenizer.
data_collator = DataCollatorWithPadding(tokenizer=tokenizer)


In [8]:
ds_train = ds["train"]
ds_test = ds["test"]

In [9]:
tokenized_test = ds_test.map(tokenize_function, batched=True)


In [10]:
tokenized_test = tokenized_test.rename_column("label", "labels")  # Trainer expects "labels"

In [11]:
tokenized_test

Dataset({
    features: ['text', 'labels', 'input_ids', 'attention_mask'],
    num_rows: 25000
})

In [12]:
def compute_metrics(eval_pred):
    logits, labels = eval_pred
    predictions = np.argmax(logits, axis=-1)
    precision, recall, f1, _ = precision_recall_fscore_support(labels, predictions, average='binary')
    acc = accuracy_score(labels, predictions)
    return {
        'accuracy': acc,
        'f1': f1,
        'precision': precision,
        'recall': recall
    }


In [13]:
training_args = TrainingArguments(
    output_dir="./results",
    per_device_eval_batch_size=16,
    report_to="none"
)

# Create Trainer
trainer = Trainer(
    model=model,
    args=training_args,
    compute_metrics=compute_metrics,
    tokenizer=tokenizer,
)

In [14]:
eval_results = trainer.evaluate(eval_dataset=tokenized_test)


You're using a GPT2TokenizerFast tokenizer. Please note that with a fast tokenizer, using the `__call__` method is faster than using a method to encode the text followed by a call to the `pad` method to get a padded encoding.


In [15]:
print("Base GPT-2 performance without fine-tuning:")
print(f"Accuracy: {eval_results['eval_accuracy']:.4f}")
print(f"Precision: {eval_results['eval_precision']:.4f}")
print(f"Recall: {eval_results['eval_recall']:.4f}")
print(f"F1 Score: {eval_results['eval_f1']:.4f}")

Base GPT-2 performance without fine-tuning:
Accuracy: 0.5004
Precision: 0.5625
Recall: 0.0036
F1 Score: 0.0072
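
An accuracy near 0.5 combined with a recall near zero suggests that the untrained head labels almost every review as negative. As a rough check, the sketch below rebuilds the four metrics from confusion counts. The counts are not taken from the run: they are inferred from the reported numbers, assuming the IMDB test split holds 12,500 positive and 12,500 negative reviews.

```python
# Hypothetical confusion counts, chosen to be consistent with the reported metrics.
n_pos, n_neg = 12_500, 12_500      # assumed balanced IMDB test split
tp, fp = 45, 35                    # only 80 reviews predicted positive
fn, tn = n_pos - tp, n_neg - fp    # everything else predicted negative

accuracy = (tp + tn) / (n_pos + n_neg)
precision = tp / (tp + fp)
recall = tp / (tp + fn)
f1 = 2 * precision * recall / (precision + recall)

print(f"Accuracy: {accuracy:.4f}")
print(f"Precision: {precision:.4f}")
print(f"Recall: {recall:.4f}")
print(f"F1 Score: {f1:.4f}")
```

Under these assumptions the model predicts the positive class for only 80 of 25,000 reviews, which is the kind of behavior a randomly initialized classification head can produce.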


## Performing Parameter-Efficient Fine-Tuning

In the cells below, we create a PEFT model from the loaded model, run a training loop, and save the PEFT model weights.

In [16]:
from peft import LoraConfig, TaskType
lora_config = LoraConfig(
    r=8,                       # Rank of the low-rank update matrices
    lora_alpha=32,             # Scaling numerator; the update is scaled by lora_alpha / r
    target_modules=["c_attn", "c_proj"],  # GPT-2 layers that receive LoRA adapters
    lora_dropout=0.1,          # Dropout probability for LoRA layers
    bias="none",               # Leave bias terms frozen
    task_type=TaskType.SEQ_CLS # Sequence classification task
)
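
To get a feel for how small the adapter is, the following sketch counts the LoRA parameters that this configuration would add. It assumes GPT-2 small's shapes (12 blocks, hidden size 768, `c_attn` 768 to 2304, attention `c_proj` 768 to 768, MLP `c_proj` 3072 to 768) and that the `c_proj` target matches both the attention and the MLP projection in each block. The newly initialized `score` head is not counted.

```python
R = 8             # LoRA rank from lora_config
LORA_ALPHA = 32   # scaling numerator from lora_config
N_BLOCKS = 12     # GPT-2 small (assumed)

# (in_features, out_features) of each targeted layer in one block (assumed shapes)
targets = {
    "attn.c_attn": (768, 2304),
    "attn.c_proj": (768, 768),
    "mlp.c_proj": (3072, 768),
}

def lora_params(in_features, out_features, r=R):
    """Parameters in A (r x in_features) plus B (out_features x r)."""
    return r * in_features + out_features * r

per_block = sum(lora_params(i, o) for i, o in targets.values())
total = per_block * N_BLOCKS
scaling = LORA_ALPHA / R  # the low-rank update B @ A is multiplied by this factor

print(f"LoRA parameters per block: {per_block:,}")
print(f"LoRA parameters in total:  {total:,}")
print(f"Scaling factor alpha / r:  {scaling}")
```

Under these assumptions the adapter adds about 0.8 million parameters, well under 1% of GPT-2 small's roughly 124 million.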


In [17]:
from peft import get_peft_model
lora_model = get_peft_model(model, lora_config)

# Show the module tree first, so the injected LoRA layers are visible
print(lora_model)

# print_trainable_parameters() prints its own summary and returns None,
# so call it directly instead of passing the method object to print()
lora_model.print_trainable_parameters()

PeftModelForSequenceClassification(
  (base_model): LoraModel(
    (model): GPT2ForSequenceClassification(
      (transformer): GPT2Model(
        (wte): Embedding(50257, 768)
        (wpe): Embedding(1024, 768)
        (drop): Dropout(p=0.1, inplace=False)
        (h): ModuleList(
          (0-11): 12 x GPT2Block(
            (ln_1): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
            (attn): GPT2Attention(
              (c_attn): Linear(
                in_features=768, out_features=2304, bias=True
                (lora_dropout): ModuleDict(
                  (default): Dropout(p=0.1, inplace=False)
                )
                (lora_A): ModuleDict(
                  (default): Linear(in_features=768, out_features=8, bias=False)
                )
                (lora_B): ModuleDict(
                  (default): Linear(in_features=8, out_features=2304, bias=False)
                )

In [18]:
tokenized_train = ds_train.map(tokenize_function, batched=True)
tokenized_train = tokenized_train.rename_column("label", "labels")  # Trainer expects "labels"


In [19]:
training_args = TrainingArguments(
    output_dir="./results/gpt2-lora-imdb",
    learning_rate=2e-5,
    per_device_train_batch_size=8,
    per_device_eval_batch_size=8,
    gradient_accumulation_steps=4,
    num_train_epochs=1,
    weight_decay=0.01,
    evaluation_strategy="epoch",
    save_strategy="epoch",
    load_best_model_at_end=True,
    fp16=True
)

trainer = Trainer(
    model=lora_model,
    args=training_args,
    train_dataset=tokenized_train,
    eval_dataset=tokenized_test,
    tokenizer=tokenizer,
    compute_metrics=compute_metrics,
)

In [42]:
import torch
import gc
gc.collect()
torch.cuda.empty_cache()

In [21]:
trainer.train()

| Epoch | Training Loss | Validation Loss | Accuracy | F1 | Precision | Recall |
|---|---|---|---|---|---|---|
| 0 | 1.4707 | 0.425818 | 0.82544 | 0.809099 | 0.892664 | 0.73984 |


TrainOutput(global_step=781, training_loss=1.0957841665582986, metrics={'train_runtime': 1971.7203, 'train_samples_per_second': 12.679, 'train_steps_per_second': 0.396, 'total_flos': 6592711800913920.0, 'train_loss': 1.0957841665582986, 'epoch': 1.0})
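
The `global_step=781` in the output follows from the batch settings in `training_args`. A quick check, assuming a single GPU and that the number of optimizer steps per epoch is rounded down:

```python
import math

n_train = 25_000         # rows in the IMDB train split
per_device_batch = 8     # per_device_train_batch_size
grad_accum = 4           # gradient_accumulation_steps
n_devices = 1            # assumption: a single GPU

# Number of examples that contribute to each optimizer step
effective_batch = per_device_batch * grad_accum * n_devices

# Batches the dataloader yields per epoch, then one optimizer step per grad_accum batches
batches_per_epoch = math.ceil(n_train / (per_device_batch * n_devices))
steps_per_epoch = batches_per_epoch // grad_accum

print(f"Effective batch size: {effective_batch}")
print(f"Optimizer steps per epoch: {steps_per_epoch}")
```

Each optimizer step therefore sees 32 reviews, even though only 8 are held in GPU memory at a time.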

In [77]:
import torch

if torch.cuda.is_available():
    print(f"GPU is available: {torch.cuda.get_device_name(0)}")
    print(f"Current CUDA device index: {torch.cuda.current_device()}")
else:
    print("GPU is not available. Training will run on CPU.")


GPU is available: Tesla T4
Current CUDA device index: 0


In [37]:
lora_model.save_pretrained("lora-gpt")


###  ⚠️ IMPORTANT ⚠️
Due to cloud workspace storage constraints, do not store the model weights in the workspace directory. Always save them under `/tmp` to avoid irrecoverable workspace crashes.

In [45]:
# Save the PEFT adapter under /tmp rather than in the workspace
lora_model.save_pretrained("/tmp/lora-gpt")


## Performing Inference with a PEFT Model

In the cells below, we load the saved PEFT model weights, evaluate the performance of the trained PEFT model, and compare the results with those from before fine-tuning.

In [43]:
from peft import AutoPeftModelForSequenceClassification

In [47]:
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
print(f"Using device: {device}")

Using device: cuda


In [48]:
adapter_path = "/tmp/lora-gpt"
base_model_name = "gpt2"


In [50]:
print(f"Loading PEFT model from: {adapter_path}")
loaded_lora_model = AutoPeftModelForSequenceClassification.from_pretrained(adapter_path)
loaded_lora_model.to(device) # Move model to GPU if available
print("Model loaded successfully.")

Loading PEFT model from: /tmp/lora-gpt


Some weights of GPT2ForSequenceClassification were not initialized from the model checkpoint at gpt2 and are newly initialized: ['score.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


Model loaded successfully.


In [57]:

eval_args = TrainingArguments(
    output_dir="./results/lora_eval", 
    per_device_eval_batch_size=1,  
    report_to="none"               
)

lora_eval_trainer = Trainer(
    model=loaded_lora_model,        
    args=eval_args,
    compute_metrics=compute_metrics, 
    tokenizer=tokenizer            
)


In [58]:
print("\nEvaluating the fine-tuned LoRA model...")
lora_eval_results = lora_eval_trainer.evaluate(eval_dataset=tokenized_test)  # tokenized test split from cell [10]



Evaluating the fine-tuned LoRA model...


In [66]:
print("Base GPT-2 performance without fine-tuning:")
print(f"Accuracy: {eval_results['eval_accuracy']:.4f}")
print(f"Precision: {eval_results['eval_precision']:.4f}")
print(f"Recall: {eval_results['eval_recall']:.4f}")
print(f"F1 Score: {eval_results['eval_f1']:.4f}")

print("\n\n\n")

print("LoRA GPT2 performance with PEFT:")
print(f"Accuracy: {lora_eval_results['eval_accuracy']:.4f}")
print(f"Precision: {lora_eval_results['eval_precision']:.4f}")
print(f"Recall: {lora_eval_results['eval_recall']:.4f}")
print(f"F1 Score: {lora_eval_results['eval_f1']:.4f}")

Base GPT-2 performance without fine-tuning:
Accuracy: 0.5004
Precision: 0.5625
Recall: 0.0036
F1 Score: 0.0072




LoRA GPT2 performance with PEFT:
Accuracy: 0.5249
Precision: 0.8228
Recall: 0.0635
F1 Score: 0.1179
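
The reloaded adapter reaches only 0.5249 accuracy here, while the training run reported 0.8254 on the same test split, so the gap probably comes from how the model was reloaded rather than from the fine-tuning itself. One plausible cause, offered as a hypothesis rather than a confirmed diagnosis: `config.pad_token_id` was set on the original `model` but not on `loaded_lora_model`. GPT-2's sequence-classification head reads the hidden state of a single token, and without a pad id it falls back to the last position, which holds padding for every review shorter than 512 tokens. The sketch below is a simplified pure-Python illustration of that selection rule. `pooled_position` and the toy ids are hypothetical, not library code.

```python
PAD_ID = 50256  # GPT-2's EOS id, used as the pad token in this notebook

def pooled_position(input_ids, pad_token_id):
    """Index of the token whose hidden state is passed to the classification head."""
    if pad_token_id is None:
        return len(input_ids) - 1              # last slot, even when it holds padding
    for i, tok in enumerate(input_ids):
        if tok == pad_token_id:
            return (i - 1) % len(input_ids)    # token just before the first pad
    return len(input_ids) - 1                  # no padding: the final real token

# Toy example: three real tokens followed by three pads
review = [101, 102, 103, PAD_ID, PAD_ID, PAD_ID]

with_pad_id = pooled_position(review, pad_token_id=PAD_ID)
without_pad_id = pooled_position(review, pad_token_id=None)

print(f"pad id set:   position {with_pad_id}")
print(f"pad id unset: position {without_pad_id}")
```

If this is the cause, setting `loaded_lora_model.config.pad_token_id = tokenizer.pad_token_id` before calling `evaluate` should bring the reloaded model back in line with the training-time metrics, and it should also allow an evaluation batch size larger than 1.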
