# Lightweight Fine-Tuning Project

TODO: In this cell, describe your choices for each of the following

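* PEFT technique: LoRA (low-rank adaptation)
* Model: DistilBERT (`distilbert-base-uncased`)
* Evaluation approach: classification accuracy on a held-out split
* Fine-tuning dataset: Amazon Polarity (first 500 training examples)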
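
To give a sense of how little LoRA trains, the arithmetic below counts the adapter weights added when rank-4 adapters (the `r=4` used later in this notebook) are attached to the query and value projections of DistilBERT. The dimensions (hidden size 768, 6 transformer layers) are those of `distilbert-base-uncased`; the helper name `lora_params` is illustrative, and the count deliberately ignores the classification head.

```python
def lora_params(in_features: int, out_features: int, r: int) -> int:
    """Weights in one LoRA adapter: A is (r x in_features), B is (out_features x r)."""
    return r * in_features + out_features * r

HIDDEN = 768   # hidden size of distilbert-base-uncased
LAYERS = 6     # transformer layers in distilbert-base-uncased
TARGETS = 2    # query and value projections per layer
RANK = 4       # LoRA rank used later in this notebook

per_module = lora_params(HIDDEN, HIDDEN, RANK)
total = per_module * TARGETS * LAYERS
print(per_module, total)  # 6144 73728
```

For comparison, a single full 768 x 768 projection matrix already holds 589,824 weights, so all twelve adapters together are about an eighth of one such matrix.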

## Loading and Evaluating a Foundation Model

TODO: In the cells below, load your chosen pre-trained Hugging Face model and evaluate its performance prior to fine-tuning. This step includes loading an appropriate tokenizer and dataset.

In [1]:
from transformers import AutoTokenizer, AutoModelForSequenceClassification, Trainer, TrainingArguments
from datasets import load_dataset
from torch.utils.data import Dataset



In [2]:

# foundation model (DistilBERT) 
model_name = "distilbert-base-uncased"  
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)

dataset = load_dataset("amazon_polarity", split="train[:500]")

# Tokenize the dataset
def tokenize_batch(batch):
    return tokenizer(batch["content"], padding=True, truncation=True)

tokenized_dataset = dataset.map(tokenize_batch, batched=True)

# Split the tokenized dataset into training and evaluation sets
split_ratio = 0.8
split_index = int(len(tokenized_dataset["input_ids"]) * split_ratio)

train_dataset = {
    "input_ids": tokenized_dataset["input_ids"][:split_index],
    "attention_mask": tokenized_dataset["attention_mask"][:split_index],
    "label": tokenized_dataset["label"][:split_index],
}
eval_dataset = {
    "input_ids": tokenized_dataset["input_ids"][split_index:],
    "attention_mask": tokenized_dataset["attention_mask"][split_index:],
    "label": tokenized_dataset["label"][split_index:],
}

Some weights of DistilBertForSequenceClassification were not initialized from the model checkpoint at distilbert-base-uncased and are newly initialized: ['pre_classifier.weight', 'pre_classifier.bias', 'classifier.bias', 'classifier.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.

In [3]:
# Convert datasets to PyTorch Dataset objects
class CustomDataset(Dataset):
    def __init__(self, input_ids, attention_mask, label):
        self.input_ids = input_ids
        self.attention_mask = attention_mask
        self.label = label

    def __len__(self):
        return len(self.input_ids)

    def __getitem__(self, idx):
        return {
            "input_ids": self.input_ids[idx],
            "attention_mask": self.attention_mask[idx],
            "label": self.label[idx],
        }

In [4]:
train_dataset = CustomDataset(**train_dataset)
eval_dataset = CustomDataset(**eval_dataset)

# Define the evaluation function for the Trainer
def compute_metrics(p):
    return {"accuracy": (p.predictions.argmax(axis=1) == p.label_ids).mean()}

# Trainer for foundation model evaluation
training_args_foundation = TrainingArguments(
    output_dir="./foundation_output",
    per_device_eval_batch_size=8,
)
trainer_foundation = Trainer(
    model=model,
    args=training_args_foundation,
    compute_metrics=compute_metrics,
    train_dataset=train_dataset,
    eval_dataset=eval_dataset,
)
foundation_results = trainer_foundation.evaluate()
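
The `compute_metrics` function above takes the arg-max over each row of logits and compares it with the label. A dependency-free sketch of the same calculation follows; the logits are made-up values for illustration only, and the function name `accuracy_from_logits` is hypothetical.

```python
def accuracy_from_logits(logits, labels):
    """Fraction of rows whose highest-scoring class equals the label."""
    predictions = [max(range(len(row)), key=row.__getitem__) for row in logits]
    correct = sum(int(p == y) for p, y in zip(predictions, labels))
    return correct / len(labels)

# Illustrative values only, not model outputs.
toy_logits = [[0.2, 1.3], [2.0, -1.0], [0.1, 0.4], [0.9, 0.8]]
toy_labels = [1, 0, 0, 0]
print(accuracy_from_logits(toy_logits, toy_labels))  # 0.75
```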


## Performing Parameter-Efficient Fine-Tuning

TODO: In the cells below, create a PEFT model from your loaded model, run a training loop, and save the PEFT model weights.

In [5]:
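# Fresh copy of the base model. No LoRA adapter is attached in this cell, so the
# Trainer below updates every weight (full fine-tuning); the final cell wraps the model with LoRA.
peft_model = AutoModelForSequenceClassification.from_pretrained(model_name)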



In [6]:
# training arguments for PEFT
training_args_peft = TrainingArguments(
    output_dir="./peft_output",
    evaluation_strategy="epoch",
    learning_rate=2e-5,
    per_device_train_batch_size=8,
    num_train_epochs=5,
)

In [7]:
# Trainer for PEFT
trainer_peft = Trainer(
    model=peft_model,
    args=training_args_peft,
    compute_metrics=compute_metrics,
    train_dataset=train_dataset,
    eval_dataset=eval_dataset,
)

## Performing Inference with a PEFT Model

TODO: In the cells below, load the saved PEFT model weights and evaluate the performance of the trained PEFT model. Be sure to compare the results to the results from prior to fine-tuning.

In [8]:
trainer_peft.train()

# Evaluate the PEFT model 
peft_results = trainer_peft.evaluate()

# Compare results with the foundation model's performance
print("Foundation Model Results:", foundation_results['eval_accuracy'])
print("PEFT Model Results:", peft_results['eval_accuracy'])


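| Epoch | Training Loss | Validation Loss | Accuracy |
|-------|---------------|-----------------|----------|
| 1     | No log        | 0.376335        | 0.87     |
| 2     | No log        | 0.6559          | 0.76     |
| 3     | No log        | 0.373182        | 0.88     |
| 4     | No log        | 0.412556        | 0.88     |
| 5     | No log        | 0.422427        | 0.87     |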


Foundation Model Results: 0.71
PEFT Model Results: 0.87
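
The per-epoch log suggests the final checkpoint is not the best one: validation loss is lowest at epoch 3. The sketch below picks that epoch from the logged values and estimates the sampling noise on accuracy. It assumes the evaluation split holds 100 examples (20% of the 500 loaded) and uses a simple binomial standard error; the variable names are illustrative.

```python
import math

# (epoch, validation loss, accuracy) copied from the training log above
history = [
    (1, 0.376335, 0.87),
    (2, 0.6559, 0.76),
    (3, 0.373182, 0.88),
    (4, 0.412556, 0.88),
    (5, 0.422427, 0.87),
]

# Select the epoch with the lowest validation loss.
best_epoch, best_loss, best_acc = min(history, key=lambda row: row[1])
print(best_epoch, best_loss, best_acc)  # 3 0.373182 0.88

n_eval = 100  # assumption: 20% of the 500 loaded examples
std_err = math.sqrt(best_acc * (1 - best_acc) / n_eval)
print(round(std_err, 3))  # 0.032
```

With a standard error of roughly three percentage points, the differences among epochs 1, 3, 4 and 5 are within noise, while the gap to the 0.71 baseline is several standard errors wide.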


In [14]:
from transformers import AutoTokenizer, AutoModelForSequenceClassification, Trainer, TrainingArguments
from datasets import load_dataset
from torch.utils.data import Dataset
from peft import LoraConfig, get_peft_model

# Foundation model: DistilBERT
model_name = "distilbert-base-uncased"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)

# Load a small slice of Amazon Polarity (binary sentiment classification),
# small enough for the Udacity Workspace
dataset = load_dataset("amazon_polarity", split="train[:500]")

# Tokenize the dataset
def tokenize_batch(batch):
    return tokenizer(batch["content"], padding=True, truncation=True)

tokenized_dataset = dataset.map(tokenize_batch, batched=True)

# Split the tokenized dataset into training and evaluation sets
split_ratio = 0.8
split_index = int(len(tokenized_dataset["input_ids"]) * split_ratio)

train_dataset = {
    "input_ids": tokenized_dataset["input_ids"][:split_index],
    "attention_mask": tokenized_dataset["attention_mask"][:split_index],
    "label": tokenized_dataset["label"][:split_index],
}
eval_dataset = {
    "input_ids": tokenized_dataset["input_ids"][split_index:],
    "attention_mask": tokenized_dataset["attention_mask"][split_index:],
    "label": tokenized_dataset["label"][split_index:],
}

# Convert datasets to PyTorch Dataset objects
class CustomDataset(Dataset):
    def __init__(self, input_ids, attention_mask, label):
        self.input_ids = input_ids
        self.attention_mask = attention_mask
        self.label = label

    def __len__(self):
        return len(self.input_ids)

    def __getitem__(self, idx):
        return {
            "input_ids": self.input_ids[idx],
            "attention_mask": self.attention_mask[idx],
            "label": self.label[idx],
        }

train_dataset = CustomDataset(**train_dataset)
eval_dataset = CustomDataset(**eval_dataset)

# Define the evaluation function for the Trainer
def compute_metrics(p):
    return {"accuracy": (p.predictions.argmax(axis=1) == p.label_ids).mean()}

# Create a Trainer for foundation model evaluation
training_args_foundation = TrainingArguments(
    output_dir="./foundation_output",
    per_device_eval_batch_size=8,
)
trainer_foundation = Trainer(
    model=model,
    args=training_args_foundation,
    compute_metrics=compute_metrics,
    train_dataset=train_dataset,
    eval_dataset=eval_dataset,
)
foundation_results = trainer_foundation.evaluate()

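# Create a PEFT config with LoRA.
# DistilBERT names its attention query/value projections "q_lin" and "v_lin";
# target_modules=["q", "v"] matches no module and raises a ValueError.
peft_config = LoraConfig(
    task_type="SEQ_CLS",  # keeps the classification head trainable and saved with the adapter
    r=4,
    lora_dropout=0.05,
    bias="none",
    target_modules=["q_lin", "v_lin"],
)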

# Create a PEFT model using the foundation model and PEFT config
peft_model = get_peft_model(model, peft_config)

# Define the training arguments for PEFT
training_args_peft = TrainingArguments(
    output_dir="./peft_output",
    evaluation_strategy="epoch",
    learning_rate=2e-5,
    per_device_train_batch_size=8,
    num_train_epochs=1,
)

# Create a Trainer for PEFT
trainer_peft = Trainer(
    model=peft_model,
    args=training_args_peft,
    compute_metrics=compute_metrics,
    train_dataset=train_dataset,
    eval_dataset=eval_dataset,
)
trainer_peft.train()

# Print trainable parameters
peft_model.print_trainable_parameters()

# Save the trained PEFT model
peft_model.save_pretrained("bert-lora")

# Evaluate the PEFT model on the same dataset
peft_results = trainer_peft.evaluate()

# Compare the results with the foundation model's performance
print("Foundation Model Results:", foundation_results['eval_accuracy'])
print("PEFT Model Results:", peft_results['eval_accuracy'])

# Load the PEFT model for inference
from peft import AutoPeftModelForSequenceClassification
loaded_peft_model = AutoPeftModelForSequenceClassification.from_pretrained("bert-lora")




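ValueError: Target modules ['q', 'v'] not found in the base model. Please check the target modules and try again.

This error is raised when `target_modules=["q", "v"]` is passed to `LoraConfig`: DistilBERT names its attention query and value projections `q_lin` and `v_lin`, so neither `q` nor `v` matches any module.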
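
One way to see why a short name such as `q` finds nothing is to mimic the name matching on a few DistilBERT module paths. The sketch below assumes a target matches when the module path equals the target or ends with `.` plus the target. The helpers `matches_target` and `find_matches` and the hard-coded path list are illustrative stand-ins, not PEFT's own code.

```python
def matches_target(module_path: str, target: str) -> bool:
    """Assumed rule: exact match, or the path ends with '.<target>'."""
    return module_path == target or module_path.endswith("." + target)

# Attention and feed-forward sub-module paths in the first DistilBERT layer
module_paths = [
    "distilbert.transformer.layer.0.attention.q_lin",
    "distilbert.transformer.layer.0.attention.k_lin",
    "distilbert.transformer.layer.0.attention.v_lin",
    "distilbert.transformer.layer.0.attention.out_lin",
    "distilbert.transformer.layer.0.ffn.lin1",
    "distilbert.transformer.layer.0.ffn.lin2",
]

def find_matches(targets):
    """Return the paths that at least one target name matches."""
    return [p for p in module_paths if any(matches_target(p, t) for t in targets)]

print(find_matches(["q", "v"]))                 # []
print(len(find_matches(["q_lin", "v_lin"])))    # 2
```

On a loaded model, the actual names can be listed with `[name for name, _ in model.named_modules()]` before choosing `target_modules`.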