# Lightweight Fine-Tuning Project (Hints)

TODO: In this cell, describe your choices for each of the following:

* PEFT technique: DoRA (Weight-Decomposed Low-Rank Adaptation)
* Model: distilbert-base-uncased
* Evaluation approach: Accuracy
* Fine-tuning dataset: sms_spam

## Loading and Evaluating a Foundation Model

TODO: In the cells below, load your chosen pre-trained Hugging Face model and evaluate its performance prior to fine-tuning. This step includes loading an appropriate tokenizer and dataset.

In [1]:
# Installing required libraries
# Note: After running this cell, restart the kernel to ensure the new packages are properly loaded.
# Instructions to restart the kernel:
# 1. Go to the top menu in Jupyter Notebook and click on "Kernel".
# 2. From the dropdown, select "Restart".
# 3. Confirm the restart when prompted.
# 4. Wait for the kernel to restart (indicated by the kernel icon becoming active again).
# 5. Once the kernel is restarted, continue executing cells from the next one onwards.

!pip install -q -U transformers
!pip install -q -U accelerate
!pip install -q -U datasets
!pip install -q -U peft

ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
autogluon-multimodal 1.2 requires nvidia-ml-py3==7.352.0, which is not installed.
autogluon-multimodal 1.2 requires accelerate<1.0,>=0.34.0, but you have accelerate 1.3.0 which is incompatible.
autogluon-multimodal 1.2 requires jsonschema<4.22,>=4.18, but you have jsonschema 4.23.0 which is incompatible.
autogluon-multimodal 1.2 requires nltk<3.9,>=3.4.5, but you have nltk 3.9.1 which is incompatible.
autogluon-multimodal 1.2 requires omegaconf<2.3.0,>=2.1.1, but you have omegaconf 2.3.0 which is incompatible.
autogluon-timeseries 1.2 requires accelerate<1.0,>=0.34.0, but you have accelerate 1.3.0 which is incompatible.

In [2]:
# Suppressing unrelated warnings
import torchvision
torchvision.disable_beta_transforms_warning()

# Necessary imports
from datasets import load_dataset
from peft import (AutoPeftModelForSequenceClassification,
                  LoraConfig,
                  get_peft_model,
                  TaskType)
from transformers import (AutoTokenizer,
                          AutoModelForSequenceClassification,
                          DataCollatorWithPadding,
                          Trainer,
                          TrainingArguments)
import numpy as np

2025-02-05 21:55:35.198829: I tensorflow/core/platform/cpu_feature_guard.cc:210] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: SSE4.1 SSE4.2 AVX AVX2 AVX512F FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.


In [3]:
# Loading the sms_spam dataset
# Dataset here: https://huggingface.co/datasets/sms_spam
# The sms_spam dataset only has a train split, so we use the train_test_split method to split it into train and test
dataset = load_dataset("sms_spam", split="train").train_test_split(
    test_size=0.2, shuffle=True, seed=23
)

splits = ["train", "test"]

# View the dataset characteristics
dataset["train"]

Dataset({
    features: ['sms', 'label'],
    num_rows: 4459
})
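
Because accuracy is the chosen evaluation metric, it can help to know what a trivial classifier would score before reading any model results. The following is a minimal sketch, not part of the original notebook: `majority_baseline_accuracy` is a hypothetical helper, and the toy label list stands in for `dataset["train"]["label"]`.

```python
from collections import Counter


def majority_baseline_accuracy(labels):
    """Accuracy obtained by always predicting the most frequent label."""
    counts = Counter(labels)
    _, majority_count = counts.most_common(1)[0]
    return majority_count / len(labels)


# Toy labels for illustration only (0 = not spam, 1 = spam); not real dataset values.
toy_labels = [0, 0, 0, 1, 0, 0, 1, 0]
baseline = majority_baseline_accuracy(toy_labels)
print(f"majority-class baseline accuracy: {baseline:.2f}")
```

In the notebook, the same helper could be called on `dataset["train"]["label"]`; a fine-tuned model is only informative if its accuracy clearly exceeds this baseline.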

In [13]:
# TODO: Load the tokenizer for the "distilbert-base-uncased" model.
# Hint 1: Use AutoTokenizer.from_pretrained("<model-name>").
# Hint 2: The tokenizer will help convert text to tokens that the model can process.
#tokenizer = None  # Replace None with the appropriate code.
tokenizer = AutoTokenizer.from_pretrained("distilbert-base-uncased")

# TODO: Tokenize the dataset using the loaded tokenizer.
# Hint 1: Use the map() function with a lambda to apply tokenization to the "sms" column.
# Hint 2: Set truncation=True to ensure the sequences fit within the model's maximum input length.
tokenized_dataset = {}
for split in splits:
#    tokenized_dataset[split] = None  # Replace None with the appropriate code.
    tokenized_dataset[split] = dataset[split].map(
        lambda x: tokenizer(x["sms"], truncation=True), batched=True
    )



<details>
<summary>Click to see the answer</summary>

### Solution for Tokenizer Initialization and Tokenization
```python
tokenizer = AutoTokenizer.from_pretrained("distilbert-base-uncased")

tokenized_dataset = {}
for split in splits:
    tokenized_dataset[split] = dataset[split].map(
        lambda x: tokenizer(x["sms"], truncation=True), batched=True
    )
```
</details>


In [16]:
# TODO: Implement a function to compute accuracy from predictions and labels.
# Hint 1: Use np.argmax to find the predicted class from the logits.
# Hint 2: Compare the predicted classes with the true labels to calculate accuracy.
def compute_metrics(eval_pred):
    predictions, labels = eval_pred
    predictions = None  # Replace None with np.argmax to get predicted classes.
    return {"accuracy": None}  # Replace None to compute the mean of correct predictions.

<details>
<summary>Click to see the answer</summary>

### Solution for Metrics Function
```python
def compute_metrics(eval_pred):
    predictions, labels = eval_pred
    predictions = np.argmax(predictions, axis=1)
    return {"accuracy": (predictions == labels).mean()}
```
</details>
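
To see what the metrics function does without running a model, it can be applied to hand-written logits. This is a small sketch under the assumption that `eval_pred` unpacks into a `(logits, labels)` pair of NumPy arrays, as in the solution above; the toy arrays are made up for illustration.

```python
import numpy as np


def compute_metrics(eval_pred):
    predictions, labels = eval_pred
    # Pick the class with the highest logit for each example.
    predictions = np.argmax(predictions, axis=1)
    # Fraction of examples where the predicted class matches the label.
    return {"accuracy": (predictions == labels).mean()}


# Four toy examples with two logits each (column 0 = not spam, column 1 = spam).
toy_logits = np.array([[2.0, -1.0], [0.1, 0.3], [1.5, 0.2], [-0.5, 0.9]])
toy_labels = np.array([0, 1, 1, 1])

result = compute_metrics((toy_logits, toy_labels))
print(result)
```

Here the predicted classes are 0, 1, 0, 1, so three of the four toy examples match their labels and the accuracy is 0.75.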

In [14]:
# Loading distilbert-base-uncased for sequence classification
model = AutoModelForSequenceClassification.from_pretrained(
    "distilbert-base-uncased",
    num_labels=2,
    id2label={0: "not spam", 1: "spam"},
    label2id={"not spam": 0, "spam": 1},
)

# Define a function to print the trainable parameters of the model
def print_number_of_trainable_model_parameters(model):
    trainable_model_params = 0
    all_model_params = 0
    for _, param in model.named_parameters():
        all_model_params += param.numel()
        if param.requires_grad:
            trainable_model_params += param.numel()
    return f"trainable model parameters: {trainable_model_params}\nall model parameters: {all_model_params}\npercentage of trainable model parameters: {100 * trainable_model_params / all_model_params:.2f}%"

print()
print(print_number_of_trainable_model_parameters(model))

Some weights of DistilBertForSequenceClassification were not initialized from the model checkpoint at distilbert-base-uncased and are newly initialized: ['classifier.bias', 'classifier.weight', 'pre_classifier.bias', 'pre_classifier.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.



trainable model parameters: 66955010
all model parameters: 66955010
percentage of trainable model parameters: 100.00%


In [7]:
# TODO: Create the Trainer class instance to evaluate the pre-trained model on the test set.
# Hint 1: Trainer's eval_dataset argument takes the evaluation dataset (i.e. tokenized_dataset["test"]), and its compute_metrics argument takes the compute_metrics function defined above.
# Hint 2: Pass args=TrainingArguments(output_dir="./result-distilbert-base", per_device_eval_batch_size=4, report_to="none") to Trainer.
# Hint 3: Also pass tokenizer=tokenizer and data_collator=DataCollatorWithPadding(tokenizer=tokenizer) so that each batch is padded to a common length.
print("Evaluating the model before fine-tuning...")
trainer = None  # Replace None with a Trainer instance built from the hints above.
pre_finetune_eval = None  # Replace None with the appropriate trainer.evaluate() call.
print(pre_finetune_eval)

Evaluating the model before fine-tuning...
None


<details>
<summary>Click to see the answer</summary>

### Solution for Model Evaluation Before Fine-Tuning
```python
trainer = Trainer(
    model=model,
    args=TrainingArguments(
        output_dir="./result-distilbert-base",
        per_device_eval_batch_size=4,
        report_to="none"
    ),
    eval_dataset=tokenized_dataset["test"],
    tokenizer=tokenizer,
    data_collator=DataCollatorWithPadding(tokenizer=tokenizer),
    compute_metrics=compute_metrics,
)
pre_finetune_eval = trainer.evaluate()
print(pre_finetune_eval)
```
</details>
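
The tokenization step uses `truncation=True` but no padding, so sequences in a batch have different lengths until the data collator pads them to the longest one in that batch. The sketch below illustrates that idea in plain Python; `pad_batch` is a hypothetical helper, not the library's implementation, and the token ids and the pad id of 0 are assumptions for illustration.

```python
def pad_batch(batch_input_ids, pad_token_id=0):
    """Pad every sequence to the longest one in the batch and build attention masks."""
    max_len = max(len(ids) for ids in batch_input_ids)
    padded, masks = [], []
    for ids in batch_input_ids:
        n_pad = max_len - len(ids)
        padded.append(ids + [pad_token_id] * n_pad)
        # 1 marks a real token, 0 marks padding the model should ignore.
        masks.append([1] * len(ids) + [0] * n_pad)
    return {"input_ids": padded, "attention_mask": masks}


# Two toy sequences of different lengths (ids are illustrative).
batch = pad_batch([[101, 7592, 102], [101, 2489, 4578, 2085, 102]])
print(batch["input_ids"])
print(batch["attention_mask"])
```

Padding per batch rather than to a fixed maximum length keeps short SMS messages from being inflated to the model's full input size.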

## Performing Parameter-Efficient Fine-Tuning

TODO: In the cells below, create a PEFT model from your loaded model, run a training loop, and save the PEFT model weights.

In [8]:
# TODO: Complete the LoRA configuration.
# Hint 1: Use r=16, lora_alpha=16, and lora_dropout=0.05 as default values.
# Hint 2: Specify the target_modules where LoRA is applied. For DistilBERT's attention layers, use target_modules=["q_lin", "k_lin", "v_lin", "out_lin"].
# Hint 3: Use use_dora=True
# Hint 4: Use task_type=TaskType.SEQ_CLS for sequence classification tasks.
lora_config = LoraConfig(
    r=16,
    lora_alpha=16,
    lora_dropout=0.05,
    target_modules=None,  # Replace None with the list of target modules.
    bias='none',
    use_dora=True,  # Hint 3: enables DoRA on top of the LoRA adapters.
    task_type=None  # Replace None with the correct task type.
)

# Get PEFT model
peft_model = get_peft_model(model, lora_config)

# Reduced trainable parameters
print(print_number_of_trainable_model_parameters(peft_model))

ValueError: Please specify `target_modules` in `peft_config`

<details>
<summary>Click to see the answer</summary>

### Solution for LoRA Configuration
```python
lora_config = LoraConfig(
    r=16,
    lora_alpha=16,
    lora_dropout=0.05,
    target_modules=["q_lin", "k_lin", "v_lin", "out_lin"],
    bias='none',
    use_dora=True,
    task_type=TaskType.SEQ_CLS
)
```
</details>
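
For intuition about what `use_dora=True` changes, DoRA splits each adapted weight into a trainable magnitude and a direction, and applies the low-rank LoRA update to the direction only. The NumPy sketch below is a conceptual illustration under stated assumptions, not PEFT's implementation: the matrix sizes are toy values, `dora_weight` is a hypothetical helper, and taking one norm per output row is an assumed convention.

```python
import numpy as np

rng = np.random.default_rng(0)
d_out, d_in, r, alpha = 6, 4, 2, 2  # toy sizes, far smaller than DistilBERT's

W0 = rng.normal(size=(d_out, d_in))            # frozen pre-trained weight
A = rng.normal(size=(r, d_in)) * 0.01          # trainable low-rank factor
B = np.zeros((d_out, r))                       # trainable, zero at start so the update is 0
m = np.linalg.norm(W0, axis=1, keepdims=True)  # trainable magnitude, initialized from W0


def dora_weight(W0, A, B, m, scaling):
    """Recompose the adapted weight as magnitude times unit-norm direction."""
    V = W0 + scaling * (B @ A)  # direction before normalization
    return m * V / np.linalg.norm(V, axis=1, keepdims=True)


# At initialization the adapted weight equals the pre-trained weight.
W_init = dora_weight(W0, A, B, m, alpha / r)

# After B moves away from zero, the direction changes but row norms stay equal to m.
B_trained = rng.normal(size=(d_out, r))
W_new = dora_weight(W0, A, B_trained, m, alpha / r)
row_norms = np.linalg.norm(W_new, axis=1, keepdims=True)
```

The two properties worth noticing are that training starts exactly from the pre-trained weights, and that the magnitude `m` and the low-rank factors `A` and `B` are the only quantities that would be trained.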


In [None]:
# Training and evaluating the model prepared for PEFT
trainer = Trainer(
    model=peft_model,
    args=TrainingArguments(
        output_dir="./result-distilbert-lora",
        learning_rate=2e-5,
        per_device_train_batch_size=4,
        per_device_eval_batch_size=4,
        save_strategy="epoch",
        eval_strategy="epoch",  # named evaluation_strategy in transformers releases before 4.41
        save_steps=1,
        num_train_epochs=2,
        weight_decay=0.01,
        load_best_model_at_end=True,
        report_to="none"
    ),
    train_dataset=tokenized_dataset["train"],
    eval_dataset=tokenized_dataset["test"],
    tokenizer=tokenizer,
    data_collator=DataCollatorWithPadding(tokenizer=tokenizer),
    compute_metrics=compute_metrics,
)

trainer.train()

In [None]:
# Evaluate the model after fine-tuning
print("Evaluating the model after fine-tuning...")
post_finetune_eval = trainer.evaluate()
print(post_finetune_eval)

In [None]:
# TODO: Save the fine-tuned model to a directory.
# Hint: Use model.save_pretrained("<directory-name>").
peft_model.save_pretrained(None)  # Replace None with the directory path.

<details>
<summary>Click to see the answer</summary>

### Solution for Saving the Model
```python
peft_model.save_pretrained("distilbert-lora")
```
</details>

## Performing Inference with a PEFT Model

TODO: In the cells below, load the saved PEFT model weights and evaluate the performance of the trained PEFT model. Be sure to compare the results to the results from prior to fine-tuning.

In [None]:
# TODO: Load the saved model from the directory.
# Hint: Use AutoPeftModelForSequenceClassification.from_pretrained("<directory-name>").
lora_model = None  # Replace None with the appropriate code to load the model.

<details>
<summary>Click to see the answer</summary>

### Solution for Loading the Model
```python
lora_model = AutoPeftModelForSequenceClassification.from_pretrained("distilbert-lora")
```
</details>


In [None]:
# Evaluate the loaded fine-tuned model
loaded_trainer = Trainer(
    model=lora_model,
    args=TrainingArguments(
        output_dir="./result-distilbert-lora",
        per_device_eval_batch_size=4,
        report_to="none"
    ),
    eval_dataset=tokenized_dataset["test"],
    tokenizer=tokenizer,
    data_collator=DataCollatorWithPadding(tokenizer=tokenizer),
    compute_metrics=compute_metrics,
)

print("Evaluating the loaded fine-tuned model...")
loaded_model_eval = loaded_trainer.evaluate()
print(loaded_model_eval)
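
To compare the results with those from before fine-tuning, the two dictionaries returned by `evaluate()` can be placed side by side. The sketch below assumes the accuracy is stored under the key `"eval_accuracy"` (the Trainer prefixes metric names with `eval_`); `summarize_improvement` is a hypothetical helper, and the numbers are placeholders rather than results from this notebook.

```python
def summarize_improvement(before, after, metric="eval_accuracy"):
    """Return the before and after values of a metric and their absolute difference."""
    delta = after[metric] - before[metric]
    return {"before": before[metric], "after": after[metric], "delta": delta}


# Placeholder values for illustration only; these are not results from this notebook.
example_before = {"eval_accuracy": 0.50}
example_after = {"eval_accuracy": 0.75}

summary = summarize_improvement(example_before, example_after)
print(
    f"accuracy: {summary['before']:.2%} -> {summary['after']:.2%} "
    f"({summary['delta']:+.2%})"
)
```

In the notebook, the call would be `summarize_improvement(pre_finetune_eval, loaded_model_eval)` once both evaluations have run.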