# Lightweight Fine-Tuning Project

TODO: In this cell, describe your choices for each of the following
* PEFT technique: LoRA
* Model: DistilBERT
* Evaluation approach: Accuracy
* Fine-tuning dataset: IMDb (small subset for quick experimentation)

## Loading and Evaluating a Foundation Model

TODO: In the cells below, load your chosen pre-trained Hugging Face model and evaluate its performance prior to fine-tuning. This step includes loading an appropriate tokenizer and dataset.


In [6]:
# Import necessary libraries
from transformers import (
    AutoTokenizer,
    AutoModelForSequenceClassification,
    Trainer,
    TrainingArguments,
    DataCollatorWithPadding,
)
from datasets import load_dataset
from evaluate import load

# Load the first 250 training examples from IMDb.
# Note: this is a contiguous slice, not a random sample, so its label balance should be checked.
dataset = load_dataset("imdb", split="train[:250]")

# Load the tokenizer that matches the model checkpoint
tokenizer = AutoTokenizer.from_pretrained("distilbert-base-uncased")

# Tokenize a batch of reviews, padding or truncating each to the model's maximum length
def preprocess_function(examples):
    return tokenizer(examples['text'], padding="max_length", truncation=True)

# Apply the preprocess function to the training and test subsets
tokenized_train_dataset = dataset.map(preprocess_function, batched=True)
tokenized_test_dataset = load_dataset("imdb", split="test[:250]").map(preprocess_function, batched=True)

# Set up the evaluation metric
metric = load("accuracy")

# Define the compute metrics function
def compute_metrics(eval_pred):
    predictions, labels = eval_pred
    predictions = predictions.argmax(axis=1)
    return metric.compute(predictions=predictions, references=labels)

# Load the pre-trained model with a new, randomly initialized 2-class head and evaluate it as a baseline
model = AutoModelForSequenceClassification.from_pretrained("distilbert-base-uncased", num_labels=2)

trainer = Trainer(
    model=model,
    eval_dataset=tokenized_test_dataset,
    compute_metrics=compute_metrics,
)

eval_result = trainer.evaluate()
print(f"Initial evaluation result: {eval_result}")





Some weights of DistilBertForSequenceClassification were not initialized from the model checkpoint at distilbert-base-uncased and are newly initialized: ['classifier.bias', 'classifier.weight', 'pre_classifier.bias', 'pre_classifier.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


Initial evaluation result: {'eval_loss': 0.6566242575645447, 'eval_model_preparation_time': 0.0007, 'eval_accuracy': 0.96, 'eval_runtime': 14.9079, 'eval_samples_per_second': 16.77, 'eval_steps_per_second': 2.147}
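
The baseline accuracy of `0.96` comes from a model whose classification head is randomly initialized (see the warning above), so it is worth checking the label mix of the 250-example slice before reading much into it. The IMDb splits on the Hub appear to be ordered by label, in which case `train[:250]` and `test[:250]` would each contain a single class, and a model that always predicts that class would score close to 100%. The sketch below is a plain-Python check; `label_counts` and `majority_baseline_accuracy` are illustrative helper names, not part of `datasets` or `evaluate`, and the label lists are made up for the demonstration.

```python
from collections import Counter

def label_counts(labels):
    """Return how many examples carry each label."""
    return dict(Counter(labels))

def majority_baseline_accuracy(labels):
    """Accuracy obtained by always predicting the most frequent label."""
    counts = Counter(labels)
    return max(counts.values()) / len(labels)

# Hypothetical label lists, for illustration only.
single_class = [0] * 250
balanced = [0] * 125 + [1] * 125

print(label_counts(single_class), majority_baseline_accuracy(single_class))
print(label_counts(balanced), majority_baseline_accuracy(balanced))
```

In this notebook the check could be run as `label_counts(tokenized_test_dataset["label"])`. If the slice turns out to be single-class, one option is `load_dataset("imdb", split="test").shuffle(seed=42).select(range(250))`, which draws a random 250-example subset instead of the first 250 rows.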


## Performing Parameter-Efficient Fine-Tuning

TODO: In the cells below, create a PEFT model from your loaded model, run a training loop, and save the PEFT model weights.


In [7]:
from peft import LoraConfig, get_peft_model

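# LoRA configuration applied to the four attention projections in each transformer block
config = LoraConfig(
    r=16,                # rank of the low-rank update matrices
    lora_alpha=32,       # scaling numerator; the effective scale is lora_alpha / r = 2
    target_modules=["attention.q_lin", "attention.k_lin", "attention.v_lin", "attention.out_lin"],
    lora_dropout=0.1,    # dropout applied on the LoRA branch during training
    bias="none"          # leave all bias terms frozen
)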
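
For reference, the standard LoRA formulation that these settings correspond to can be written as follows. The symbols $W_0$, $A$, $B$, $d$, and $k$ are notation introduced here, not names from the notebook; $W_0$ is the frozen pre-trained weight of a targeted linear layer, and only $A$ and $B$ are trained.

```latex
W = W_0 + \frac{\alpha}{r}\, B A,
\qquad B \in \mathbb{R}^{d \times r},\quad A \in \mathbb{R}^{r \times k},
\qquad r = 16,\ \alpha = 32 \;\Rightarrow\; \frac{\alpha}{r} = 2
```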




In [8]:
# Convert the model into a PEFT model
peft_model = get_peft_model(model, config)

# Print trainable parameters to confirm the configuration
peft_model.print_trainable_parameters()


trainable params: 589,824 || all params: 67,544,834 || trainable%: 0.8732
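
The reported count can be reproduced by hand, which also shows what is and is not being trained. The sketch assumes the published DistilBERT-base shape (6 transformer layers, hidden size 768) and that each targeted linear layer is 768×768; `lora_param_count` is an illustrative helper, not a PEFT function.

```python
def lora_param_count(n_layers, modules_per_layer, d_in, d_out, r):
    """LoRA adds A (r x d_in) and B (d_out x r) to each adapted linear layer."""
    per_module = r * d_in + d_out * r
    return n_layers * modules_per_layer * per_module

trainable = lora_param_count(n_layers=6, modules_per_layer=4, d_in=768, d_out=768, r=16)
all_params = 67_544_834  # total reported by print_trainable_parameters()

print(f"trainable params: {trainable:,}")
print(f"trainable%: {100 * trainable / all_params:.4f}")
```

Because the total matches the LoRA matrices exactly, the newly initialized `pre_classifier` and `classifier` weights are not among the trainable parameters in this configuration. Passing `task_type="SEQ_CLS"` to `LoraConfig` is, as far as the PEFT documentation describes it, the usual way to have the classification layer trained and saved alongside the adapter.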


In [9]:
# Create a data collator
data_collator = DataCollatorWithPadding(tokenizer=tokenizer)

# Set training arguments
training_args = TrainingArguments(
    output_dir="./results",
    evaluation_strategy="epoch",
    learning_rate=2e-5,
    per_device_train_batch_size=8,
    per_device_eval_batch_size=8,
    num_train_epochs=3,
    weight_decay=0.01,
    logging_dir="./logs",
    logging_steps=10,
)

# Initialize the Trainer with the PEFT model
trainer = Trainer(
    model=peft_model,
    args=training_args,
    train_dataset=tokenized_train_dataset,
    eval_dataset=tokenized_test_dataset,
    compute_metrics=compute_metrics,
    data_collator=data_collator,  # Ensure data collator is used
)

# Fine-tune the PEFT model
trainer.train()






{'loss': 0.66, 'grad_norm': 0.2797170877456665, 'learning_rate': 1.7916666666666667e-05, 'epoch': 0.31}
{'loss': 0.6413, 'grad_norm': 0.30734312534332275, 'learning_rate': 1.5833333333333333e-05, 'epoch': 0.62}
{'loss': 0.6272, 'grad_norm': 0.3300333023071289, 'learning_rate': 1.375e-05, 'epoch': 0.94}


{'eval_runtime': 16.0668, 'eval_samples_per_second': 15.56, 'eval_steps_per_second': 1.992, 'epoch': 1.0}
{'loss': 0.6234, 'grad_norm': 0.32832804322242737, 'learning_rate': 1.1666666666666668e-05, 'epoch': 1.25}
{'loss': 0.6108, 'grad_norm': 0.31453680992126465, 'learning_rate': 9.583333333333335e-06, 'epoch': 1.56}
{'loss': 0.6108, 'grad_norm': 0.30401551723480225, 'learning_rate': 7.500000000000001e-06, 'epoch': 1.88}


{'eval_runtime': 16.8694, 'eval_samples_per_second': 14.82, 'eval_steps_per_second': 1.897, 'epoch': 2.0}
{'loss': 0.5929, 'grad_norm': 0.34274908900260925, 'learning_rate': 5.416666666666667e-06, 'epoch': 2.19}
{'loss': 0.5928, 'grad_norm': 0.3522214889526367, 'learning_rate': 3.3333333333333333e-06, 'epoch': 2.5}
{'loss': 0.5848, 'grad_norm': 0.3245944082736969, 'learning_rate': 1.25e-06, 'epoch': 2.81}


{'eval_runtime': 16.9995, 'eval_samples_per_second': 14.706, 'eval_steps_per_second': 1.882, 'epoch': 3.0}
{'train_runtime': 193.6326, 'train_samples_per_second': 3.873, 'train_steps_per_second': 0.496, 'train_loss': 0.6140142704049746, 'epoch': 3.0}


TrainOutput(global_step=96, training_loss=0.6140142704049746, metrics={'train_runtime': 193.6326, 'train_samples_per_second': 3.873, 'train_steps_per_second': 0.496, 'total_flos': 100709503488000.0, 'train_loss': 0.6140142704049746, 'epoch': 3.0})

## Performing Inference with a PEFT Model

TODO: In the cells below, load the saved PEFT model weights and evaluate the performance of the trained PEFT model. Be sure to compare the results to the results from prior to fine-tuning.
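
The training cell fine-tunes the adapter but never writes it to disk, so there is nothing yet to load. A minimal sketch of the save and reload step follows, assuming the `peft_model` from the training cell. The directory name `./peft-distilbert-imdb` and the function name are placeholders, and the imports sit inside the function so that defining it has no side effects.

```python
def save_and_reload_adapter(peft_model, adapter_dir="./peft-distilbert-imdb"):
    """Save the LoRA adapter weights, then attach them to a fresh base model."""
    from transformers import AutoModelForSequenceClassification
    from peft import PeftModel

    # Writes only the adapter weights and adapter config, not the full model.
    peft_model.save_pretrained(adapter_dir)

    # Rebuild the same base architecture used earlier in the notebook.
    base_model = AutoModelForSequenceClassification.from_pretrained(
        "distilbert-base-uncased", num_labels=2
    )

    # Load the saved adapter on top of the base model for inference.
    return PeftModel.from_pretrained(base_model, adapter_dir)
```

One caveat applies under the configuration used here: the classification head was created at random and is not part of the adapter, so a freshly loaded base model receives a different random head. Evaluating the reloaded model is only meaningful if the head is saved too, for example by setting `task_type="SEQ_CLS"` in `LoraConfig` before training.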


In [13]:
# Evaluate the fine-tuned model
eval_result = trainer.evaluate()

# Print the fine-tuned evaluation results
print(f"Fine-tuned evaluation result: {eval_result}")



Fine-tuned evaluation result: {'eval_runtime': 18.8561, 'eval_samples_per_second': 13.258, 'eval_steps_per_second': 1.697, 'epoch': 3.0}
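
Comparing the two runs comes down to reading the same keys from both result dictionaries. The sketch below uses the dictionaries printed in this notebook, trimmed to the relevant keys; `metric_delta` is an illustrative helper that returns `None` when a key is missing from either side rather than guessing a value.

```python
def metric_delta(before, after, key):
    """Return after[key] - before[key], or None if either side lacks the key."""
    if key not in before or key not in after:
        return None
    return after[key] - before[key]

# Values copied from the evaluation outputs in this notebook.
baseline = {"eval_loss": 0.6566242575645447, "eval_accuracy": 0.96}
fine_tuned = {"eval_runtime": 18.8561, "eval_samples_per_second": 13.258}

for key in ("eval_loss", "eval_accuracy"):
    print(key, metric_delta(baseline, fine_tuned, key))
```

Both comparisons come back as `None` because the fine-tuned result contains neither `eval_loss` nor `eval_accuracy`. A likely cause is that `Trainer` could not identify a label column for the wrapped PEFT model. Setting `label_names=["labels"]` in `TrainingArguments` is a commonly suggested remedy, though it was not tested in this notebook.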


### Observations

1. **Training Loss & Evaluation**:
   - The training loss decreased from `0.66` at epoch 0.31 to `0.5848` at epoch 2.81. This is a steady but modest drop: the starting value is close to `ln 2 ≈ 0.693`, the loss of a two-class model that guesses at random.
   - Evaluation ran at the end of each epoch, but the logged results contain only timing information. For instance, at the end of the first epoch the evaluation took `16.0668` seconds (`15.56` samples per second); no `eval_loss` or `eval_accuracy` was reported.

2. **Gradient Norm**:
   - The gradient norm stayed within a narrow band throughout training, from `0.2797` at the first logged step to a maximum of `0.3522`, which suggests stable optimization with no exploding or vanishing gradients.

3. **Learning Rate**:
   - The learning rate was configured as `2e-5`. The first logged value, `1.7916666666666667e-05` at step 10, already reflects the decay schedule, and the last logged value was `1.25e-06` at epoch 2.81. This gradual decay helps the model converge by taking smaller steps late in training.

4. **Evaluation Metrics**:
   - Evaluation ran on the 250-example test subset after each epoch. The final evaluation at epoch 3.0 took `16.9995` seconds (`14.706` samples per second). These are throughput figures only and do not show how well the model classifies.
   - The reported `train_loss` of `0.6140` is the average training loss over all 96 steps, not a final or held-out loss, so it says nothing about how well the model generalizes to unseen data.

5. **Overall Training Performance**:
   - Training completed in `193.6326` seconds at `3.873` samples per second. The final global step count was `96`, and the run reported `100709503488000.0` (about 1.0 × 10^14) total floating-point operations.

6. **Fine-tuned Model Performance**:
   - The final evaluation of the fine-tuned model took `18.8561` seconds (`13.258` samples per second). The output contains no `eval_accuracy` or `eval_loss`, so the fine-tuned model cannot be compared with the baseline accuracy of `0.96` from this run. Runtime and throughput describe speed, not prediction quality.
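
The logged learning rates in item 3 are consistent with a linear decay from the configured `2e-5` to zero over the 96 optimizer steps. A short check follows, assuming the default `linear` scheduler with no warmup; `linear_lr` is an illustrative helper, not a `transformers` function.

```python
import math

def linear_lr(step, base_lr, total_steps):
    """Learning rate after `step` optimizer updates under linear decay to zero."""
    return base_lr * max(0.0, (total_steps - step) / total_steps)

steps_per_epoch = math.ceil(250 / 8)   # 250 examples, batch size 8
total_steps = steps_per_epoch * 3      # 3 epochs

print(steps_per_epoch, total_steps)
for step in (10, 20, 90):
    print(step, linear_lr(step, 2e-5, total_steps))
```

The values for steps 10, 20, and 90 line up with the logged `1.79e-05`, `1.58e-05`, and `1.25e-06`, and the step arithmetic matches the `global_step=96` reported by `TrainOutput`.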

Overall, training ran stably: the loss decreased gradually and the gradient norms stayed within a narrow range. Whether fine-tuning improved classification accuracy remains open, because no accuracy was recorded after training, and overfitting cannot be assessed without an evaluation loss.
