# Lightweight Fine-Tuning Project

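The choices made in this notebook are:

* PEFT technique: LoRA (`LoraConfig` with `r=4`, `lora_alpha=32`, task type `SEQ_CLS`)
* Model: `bert-base-uncased` with a two-label sequence classification head
* Evaluation approach: `Trainer.evaluate` on the validation split, reporting accuracy and weighted F1, precision, and recall
* Fine-tuning dataset: SST-2 from the GLUE benchmark (`load_dataset('glue', 'sst2')`)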


## Loading and Evaluating a Foundation Model

TODO: In the cells below, load your chosen pre-trained Hugging Face model and evaluate its performance prior to fine-tuning. This step includes loading an appropriate tokenizer and dataset.

In [1]:
!pip install datasets
!pip install transformers
!pip install peft
!pip install evaluate

Defaulting to user installation because normal site-packages is not writeable


### Imports

In [2]:
from datasets import load_dataset, DatasetDict, Dataset
from transformers import (
    AutoTokenizer,
    AutoConfig,
    AutoModelForSequenceClassification,
    DataCollatorWithPadding,
    TrainingArguments,
    Trainer)

from peft import PeftModel, PeftConfig, get_peft_model, LoraConfig
from sklearn.metrics import precision_recall_fscore_support, accuracy_score
import evaluate
import torch
import numpy as np

### Helper methods

In [3]:
def compute_metrics(eval_pred):
    '''
    Computes accuracy and support-weighted F1, precision, and recall
    from the (logits, labels) pair that Trainer passes at evaluation time.
    '''
    # Unpacking the logits and labels from the eval_pred tuple
    logits, labels = eval_pred
    preds = np.argmax(logits, axis=-1)

    # Calculating metrics
    precision, recall, f1, _ = precision_recall_fscore_support(labels, preds, average='weighted')
    acc = accuracy_score(labels, preds)

    return {
        'accuracy': acc,
        'f1': f1,
        'precision': precision,
        'recall': recall
    }

def print_trainable_parameters(model):
    '''
    Prints the trainable and total parameter counts of a model
    and the percentage of parameters that are trainable.
    '''
    total_params = sum(p.numel() for p in model.parameters())
    trainable_params = sum(p.numel() for p in model.parameters() if p.requires_grad)
    print(f"Trainable parameters: {trainable_params}")
    print(f"Total parameters: {total_params}")
    print(f"Percentage: {trainable_params/total_params * 100} %")
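To make the `average='weighted'` setting concrete, here is a minimal NumPy-only sketch of the same metrics on four made-up logits. It is a hand-rolled illustration, not scikit-learn's implementation, and `weighted_metrics`, `toy_logits`, and `toy_labels` are hypothetical names introduced only for this example.

```python
import numpy as np

def weighted_metrics(logits, labels):
    """Accuracy plus support-weighted precision, recall, and F1 (illustrative)."""
    logits = np.asarray(logits, dtype=float)
    labels = np.asarray(labels)
    preds = logits.argmax(axis=-1)  # same argmax step as compute_metrics

    totals = {"precision": 0.0, "recall": 0.0, "f1": 0.0}
    for cls in np.unique(labels):
        tp = np.sum((preds == cls) & (labels == cls))   # true positives for this class
        predicted = np.sum(preds == cls)                # how often the class was predicted
        support = np.sum(labels == cls)                 # how often the class truly occurs
        precision = tp / predicted if predicted else 0.0
        recall = tp / support
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        weight = support / len(labels)                  # weight each class by its support
        totals["precision"] += weight * precision
        totals["recall"] += weight * recall
        totals["f1"] += weight * f1

    totals["accuracy"] = float(np.mean(preds == labels))
    return {name: float(value) for name, value in totals.items()}

# Made-up logits for four examples; predictions are [0, 1, 0, 0]
toy_logits = [[2.0, 1.0], [0.0, 3.0], [1.0, 0.0], [0.2, 0.1]]
toy_labels = [0, 1, 1, 0]
toy_result = weighted_metrics(toy_logits, toy_labels)
print(toy_result)
```

Support-weighted recall always equals accuracy, because each class's recall is weighted by the share of examples it covers. This is why `eval_recall` and `eval_accuracy` are identical in the results further down.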

### Load the BERT model for classification


In [4]:
# define label maps
id2label = {0: "Negative", 1: "Positive"}
label2id = {"Negative":0, "Positive":1}

In [5]:
model_name = "bert-base-uncased"  # Using the base version for simplicity
model = AutoModelForSequenceClassification.from_pretrained(
    model_name,
    num_labels=2,  # SST-2 sentiment analysis has two labels
    id2label=id2label,
    label2id=label2id)
tokenizer = AutoTokenizer.from_pretrained(model_name)


Some weights of BertForSequenceClassification were not initialized from the model checkpoint at bert-base-uncased and are newly initialized: ['classifier.weight', 'classifier.bias']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


In [6]:
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

### Load the SST dataset for sentiment analysis.

In [7]:
# https://huggingface.co/datasets/stanfordnlp/sst2
dataset = load_dataset('glue', 'sst2')  # SST-2 task from the GLUE benchmark
dataset

DatasetDict({
    train: Dataset({
        features: ['sentence', 'label', 'idx'],
        num_rows: 67349
    })
    validation: Dataset({
        features: ['sentence', 'label', 'idx'],
        num_rows: 872
    })
    test: Dataset({
        features: ['sentence', 'label', 'idx'],
        num_rows: 1821
    })
})

### Tokenizing & Padding the dataset

In [8]:
def preprocess_function(examples):
    '''Tokenizing the dataset'''
    return tokenizer(examples['sentence'], truncation=True)

encoded_dataset = dataset.map(preprocess_function, batched=True)


In [9]:
encoded_dataset

DatasetDict({
    train: Dataset({
        features: ['sentence', 'label', 'idx', 'input_ids', 'token_type_ids', 'attention_mask'],
        num_rows: 67349
    })
    validation: Dataset({
        features: ['sentence', 'label', 'idx', 'input_ids', 'token_type_ids', 'attention_mask'],
        num_rows: 872
    })
    test: Dataset({
        features: ['sentence', 'label', 'idx', 'input_ids', 'token_type_ids', 'attention_mask'],
        num_rows: 1821
    })
})

#### Collator for padding.

In [10]:
# create data collator
data_collator = DataCollatorWithPadding(tokenizer=tokenizer)
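Because the tokenizer was called without `padding`, the sequences still have different lengths, and the collator pads each batch to its longest member. The sketch below is a simplified pure-Python illustration of that idea, not the library's implementation. `pad_batch` and the short token id lists are hypothetical, and the pad id of 0 is assumed from `padding_idx=0` in the BERT embedding layer.

```python
def pad_batch(features, pad_token_id=0):
    """Pad every sequence in a batch to the length of the longest one."""
    max_len = max(len(f["input_ids"]) for f in features)
    batch = {"input_ids": [], "attention_mask": []}
    for f in features:
        ids = list(f["input_ids"])
        n_pad = max_len - len(ids)
        # Real tokens keep mask 1, padding positions get mask 0
        batch["input_ids"].append(ids + [pad_token_id] * n_pad)
        batch["attention_mask"].append([1] * len(ids) + [0] * n_pad)
    return batch

# Hypothetical, shortened token id sequences
toy_features = [
    {"input_ids": [101, 5342, 2047, 102]},
    {"input_ids": [101, 102]},
]
padded = pad_batch(toy_features)
print(padded)
```

Padding per batch rather than to a fixed maximum length keeps short SST-2 sentences from being processed as long, mostly empty sequences.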

#### Explore

In [11]:
print("Input ids format in training set:\n", encoded_dataset['train'][0]['input_ids'])
print()
print("Label format in training set:", encoded_dataset['train'][0]['label'])
print()
print(encoded_dataset['validation'].column_names)

Input ids format in training set:
 [101, 5342, 2047, 3595, 8496, 2013, 1996, 18643, 3197, 102]

Label format in training set: 0

['sentence', 'label', 'idx', 'input_ids', 'token_type_ids', 'attention_mask']


In [12]:
model = model.to(device)

In [15]:
# Create a trainer with the base model (before fine-tuning)
base_trainer = Trainer(
    model=model,                 # the original base model
    # no args passed: Trainer falls back to default TrainingArguments
    train_dataset=encoded_dataset['train'], 
    eval_dataset=encoded_dataset['validation'],
    tokenizer=tokenizer,
    data_collator=data_collator,
    compute_metrics=compute_metrics
)

# Evaluate the base model
base_eval_results = base_trainer.evaluate()
print("Evaluation results before fine-tuning:", base_eval_results)

Evaluation results before fine-tuning: {'eval_loss': 0.6959120631217957, 'eval_accuracy': 0.5, 'eval_f1': 0.4848450460673854, 'eval_precision': 0.5037709730542496, 'eval_recall': 0.5, 'eval_runtime': 2.3437, 'eval_samples_per_second': 372.059, 'eval_steps_per_second': 46.507}
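The baseline numbers are what a randomly initialized classification head would be expected to produce on a two-class task. As a quick sanity check, the sketch below compares the reported loss and accuracy with the chance-level values; the variable names are introduced only for this example.

```python
import math

num_labels = 2
chance_accuracy = 1 / num_labels
chance_loss = math.log(num_labels)  # cross-entropy of a uniform prediction over two classes

reported_loss = 0.6959120631217957  # eval_loss from the output above
reported_accuracy = 0.5             # eval_accuracy from the output above

print(f"Chance accuracy: {chance_accuracy:.3f} (reported {reported_accuracy:.3f})")
print(f"Chance loss:     {chance_loss:.4f} (reported {reported_loss:.4f})")
```

Both reported values sit at or very close to chance, which is consistent with the earlier warning that `classifier.weight` and `classifier.bias` were newly initialized.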


## Performing Parameter-Efficient Fine-Tuning

TODO: In the cells below, create a PEFT model from your loaded model, run a training loop, and save the PEFT model weights.

`https://huggingface.co/docs/peft/v0.10.0/en/package_reference/peft_model#peft.PeftModel`

In [16]:
peft_config = LoraConfig(task_type="SEQ_CLS", r=4, lora_alpha=32)
peft_config

LoraConfig(peft_type=<PeftType.LORA: 'LORA'>, auto_mapping=None, base_model_name_or_path=None, revision=None, task_type='SEQ_CLS', inference_mode=False, r=4, target_modules=None, lora_alpha=32, lora_dropout=0.0, fan_in_fan_out=False, bias='none', modules_to_save=None, init_lora_weights=True, layers_to_transform=None, layers_pattern=None)
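With `r=4` and `lora_alpha=32`, the low-rank update is scaled by `lora_alpha / r = 8`. The following NumPy sketch shows the LoRA forward pass for a single 768-wide linear layer under those settings. It is an illustration of the technique rather than PEFT's code: the weights are random stand-ins, and `lora_linear` is a hypothetical name.

```python
import numpy as np

rng = np.random.default_rng(0)
d, r, lora_alpha = 768, 4, 32
scaling = lora_alpha / r  # 32 / 4

W = rng.normal(size=(d, d))               # frozen pretrained weight (random stand-in)
A = rng.normal(scale=0.01, size=(r, d))   # trainable down-projection (stand-in init)
B = np.zeros((d, r))                      # trainable up-projection, zero at the start

def lora_linear(x):
    """Base projection plus the scaled low-rank update B @ A."""
    return x @ W.T + scaling * (x @ A.T @ B.T)

x = rng.normal(size=(2, d))
base_out = x @ W.T
lora_out = lora_linear(x)
print("scaling:", scaling)
print("same output before training:", np.allclose(base_out, lora_out))
print("adapter parameters for this layer:", A.size + B.size)
```

Because `B` starts at zero, the adapted layer initially reproduces the base layer exactly, and training only has to learn the `A.size + B.size` adapter values instead of the full `d * d` weight matrix.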

In [17]:
lora_model = get_peft_model(model, peft_config)
lora_model

PeftModelForSequenceClassification(
  (base_model): LoraModel(
    (model): BertForSequenceClassification(
      (bert): BertModel(
        (embeddings): BertEmbeddings(
          (word_embeddings): Embedding(30522, 768, padding_idx=0)
          (position_embeddings): Embedding(512, 768)
          (token_type_embeddings): Embedding(2, 768)
          (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
          (dropout): Dropout(p=0.1, inplace=False)
        )
        (encoder): BertEncoder(
          (layer): ModuleList(
            (0-11): 12 x BertLayer(
              (attention): BertAttention(
                (self): BertSelfAttention(
                  (query): Linear(
                    in_features=768, out_features=768, bias=True
                    (lora_dropout): ModuleDict(
                      (default): Identity()
                    )
                    (lora_A): ModuleDict(
                      (default): Linear(in_features=768, out_features=4, bias=Fa

In [18]:
print("Using the PEFT built-in method.")
lora_model.print_trainable_parameters()

Using the PEFT built-in method.
trainable params: 150,532 || all params: 109,632,772 || trainable%: 0.13730565893198432


In [19]:
print("Using the custom helper function.")
print_trainable_parameters(lora_model)

Using the custom helper function.
Trainable parameters: 150532
Total parameters: 109632772
Percentage: 0.13730565893198432 %
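One way to account for the 150,532 trainable parameters is sketched below. It rests on two assumptions that the truncated model printout does not fully confirm: that LoRA is applied to the `query` and `value` projections in each of the 12 layers, and that this PEFT version counts both the original classifier head and its trainable copy.

```python
hidden, r, num_layers, num_labels = 768, 4, 12, 2
adapted_per_layer = 2  # assumption: query and value projections

# Each adapter has a (r x hidden) matrix A and a (hidden x r) matrix B
lora_params = num_layers * adapted_per_layer * (hidden * r + r * hidden)

# Classification head: weight (num_labels x hidden) plus bias
classifier_params = hidden * num_labels + num_labels

# Assumption: the original head and its trainable copy are both counted
trainable = lora_params + 2 * classifier_params

total_reported = 109632772  # "Total parameters" from the output above
percentage = trainable / total_reported * 100

print(f"LoRA adapter parameters: {lora_params}")
print(f"Classifier parameters:   {classifier_params}")
print(f"Trainable (estimated):   {trainable}")
print(f"Percentage:              {percentage}")
```

Under these assumptions the estimate matches the reported count exactly, which suggests the adapters account for 147,456 of the trainable parameters and the classification head for the rest.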


### Hyperparameters

In [20]:
lr = 2e-5        # passed to TrainingArguments; the custom optimizer created below sets its own learning rate
batch_size = 16
num_epochs = 1   # the recorded run below trained for one epoch

In [21]:
training_args = TrainingArguments(
    output_dir='./results/-lora-text-classification',  # directory to save model checkpoints
    learning_rate=lr,
    per_device_train_batch_size=batch_size,
    per_device_eval_batch_size=batch_size,
    # Evaluate and save the model after each epoch
    evaluation_strategy="epoch",
    save_strategy="epoch",
    num_train_epochs=num_epochs,
    weight_decay=0.01,
    load_best_model_at_end=True,
)


In [22]:
encoded_dataset['train']

Dataset({
    features: ['sentence', 'label', 'idx', 'input_ids', 'token_type_ids', 'attention_mask'],
    num_rows: 67349
})

In [23]:
lora_model = lora_model.to(device)

In [29]:
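# 8-bit Adam optimizer from bitsandbytes (requires `pip install bitsandbytes`).
# This quantizes the optimizer state only; the model weights are not quantized, so this is not QLoRA.
import bitsandbytes as bnb

# Passing this optimizer to Trainer overrides the learning rate set in TrainingArguments
optimizer = bnb.optim.Adam8bit(lora_model.parameters(), lr=1e-5)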

In [31]:
trainer = Trainer(
    model=lora_model,               # the PEFT model to be trained
    args=training_args,             # training arguments, defining how to train
    train_dataset=encoded_dataset['train'],  # training dataset
    eval_dataset=encoded_dataset['validation'],  # evaluation dataset
    tokenizer=tokenizer,
    data_collator=data_collator,
    compute_metrics=compute_metrics,  # function to compute metrics during evaluation
    optimizers=(optimizer, None)  # custom optimizer; with None, Trainer creates its default scheduler
)


In [32]:
train_result = trainer.train()
print(train_result)

Epoch,Training Loss,Validation Loss,Accuracy,F1,Precision,Recall
1,0.2735,0.26737,0.893349,0.893255,0.894013,0.893349


Checkpoint destination directory ./results/-lora-text-classification/checkpoint-4210 already exists and is non-empty.Saving will proceed but saved results may be invalid.


TrainOutput(global_step=4210, training_loss=0.283609543390342, metrics={'train_runtime': 343.1934, 'train_samples_per_second': 196.242, 'train_steps_per_second': 12.267, 'total_flos': 1218786755125800.0, 'train_loss': 0.283609543390342, 'epoch': 1.0})
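The step count and throughput in `TrainOutput` can be cross-checked against the dataset size. The sketch below assumes a single device and no gradient accumulation; the variable names are introduced only for this check.

```python
import math

train_rows = 67349       # rows in the SST-2 training split
batch = 16               # per-device train batch size
epochs = 1
train_runtime = 343.1934 # seconds, from TrainOutput above

steps = math.ceil(train_rows / batch) * epochs  # the last batch is partial
throughput = train_rows * epochs / train_runtime

print(f"Expected optimizer steps: {steps}")
print(f"Samples per second:       {throughput:.3f}")
```

The result agrees with `global_step=4210` and `train_samples_per_second` of about 196.24, so the run covered the full training split exactly once.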


## Performing Inference with a PEFT Model

TODO: In the cells below, load the saved PEFT model weights and evaluate the performance of the trained PEFT model. Be sure to compare the results to the results from prior to fine-tuning.
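The cells in this notebook evaluate the in-memory `lora_model` directly and do not save or reload the adapter. A minimal sketch of that round trip is given here, assuming the standard `save_pretrained` and `PeftModel.from_pretrained` calls from PEFT. The helper name `save_and_reload_adapter` and the directory in the usage comment are hypothetical.

```python
def save_and_reload_adapter(peft_model, save_dir, base_model_name="bert-base-uncased"):
    """Save the LoRA adapter weights, then rebuild a PEFT model from disk."""
    # Imports are local so the helper can be defined before the libraries are loaded
    from peft import PeftModel
    from transformers import AutoModelForSequenceClassification

    # Writes the adapter config and adapter weights, not the full base model
    peft_model.save_pretrained(save_dir)

    # Recreate the base model with the same label setup used for training
    base_model = AutoModelForSequenceClassification.from_pretrained(
        base_model_name,
        num_labels=2,
        id2label={0: "Negative", 1: "Positive"},
        label2id={"Negative": 0, "Positive": 1},
    )

    # Attach the saved adapter on top of the freshly loaded base model
    reloaded = PeftModel.from_pretrained(base_model, save_dir)
    reloaded.eval()
    return reloaded

# Hypothetical usage with the objects defined in this notebook:
# reloaded_model = save_and_reload_adapter(lora_model, "./results/lora-sst2-adapter")
```

Evaluating the reloaded model on the validation split should then reproduce the post-training metrics, which would confirm that the saved adapter is complete.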

In [33]:
# Evaluation after fine-tuning
eval_results_after = trainer.evaluate(eval_dataset=encoded_dataset['validation'])
print("Evaluation results after fine-tuning:", eval_results_after)

Evaluation results after fine-tuning: {'eval_loss': 0.2673702538013458, 'eval_accuracy': 0.893348623853211, 'eval_f1': 0.8932552005187875, 'eval_precision': 0.8940128781532564, 'eval_recall': 0.893348623853211, 'eval_runtime': 2.5055, 'eval_samples_per_second': 348.039, 'eval_steps_per_second': 21.952, 'epoch': 1.0}


In [34]:
lora_model.to('cpu')
lora_model.eval()  # disable dropout for inference
# define list of examples
text_list = ["Terrible stuff all around", "Pretty beautiful .", "The movie sucked ! ."]

print("Fine-tuned model predictions:")
print("==============================")
for text in text_list:
    # tokenize text
    inputs = tokenizer.encode(text, return_tensors="pt")
    # compute logits without tracking gradients
    with torch.no_grad():
        logits = lora_model(inputs).logits
    # convert logits to a label id
    prediction = torch.argmax(logits, dim=-1).item()

    print(text + " > SENTIMENT IS " + id2label[prediction])

Fine-tuned model predictions:
==============================
Terrible stuff all around > SENTIMENT IS Negative
Pretty beautiful . > SENTIMENT IS Positive
The movie sucked ! . > SENTIMENT IS Negative
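The loop above handles one sentence at a time. For a batch, the argmax has to run along the last axis so that each row gets its own label, and a softmax turns the logits into a confidence score. The NumPy sketch below illustrates this on made-up logits; `logits_to_predictions` is a hypothetical helper and the values are not model output.

```python
import numpy as np

id2label = {0: "Negative", 1: "Positive"}

def logits_to_predictions(logits):
    """Convert a batch of logits into (label, probability) pairs."""
    logits = np.asarray(logits, dtype=float)
    shifted = logits - logits.max(axis=-1, keepdims=True)  # subtract row max for numerical stability
    exp = np.exp(shifted)
    probs = exp / exp.sum(axis=-1, keepdims=True)           # row-wise softmax
    pred_ids = probs.argmax(axis=-1)                        # one prediction per row
    return [(id2label[int(i)], float(p[i])) for i, p in zip(pred_ids, probs)]

# Made-up logits for two sentences
toy_logits = [[2.0, -1.0], [-0.5, 1.5]]
predictions = logits_to_predictions(toy_logits)
for label, prob in predictions:
    print(f"{label} ({prob:.3f})")
```

Reporting the probability alongside the label makes it easier to spot borderline sentences that the fine-tuned model is unsure about.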


In [35]:
# Print evaluation metrics before fine-tuning
print("Evaluation results before fine-tuning:")
print(f"Accuracy: {base_eval_results['eval_accuracy']}")
print(f"F1 Score: {base_eval_results['eval_f1']}")
print(f"Precision: {base_eval_results['eval_precision']}")
print(f"Recall: {base_eval_results['eval_recall']}")

# Print evaluation metrics after fine-tuning
print("\nEvaluation results after fine-tuning:")
print(f"Accuracy: {eval_results_after['eval_accuracy']}")
print(f"F1 Score: {eval_results_after['eval_f1']}")
print(f"Precision: {eval_results_after['eval_precision']}")
print(f"Recall: {eval_results_after['eval_recall']}")

Evaluation results before fine-tuning:
Accuracy: 0.5
F1 Score: 0.4848450460673854
Precision: 0.5037709730542496
Recall: 0.5

Evaluation results after fine-tuning:
Accuracy: 0.893348623853211
F1 Score: 0.8932552005187875
Precision: 0.8940128781532564
Recall: 0.893348623853211
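To make the comparison easier to read, the before and after values can be placed side by side with the absolute gain per metric. The sketch below copies the numbers from the output above; the dictionary names are introduced only for this example.

```python
# Values copied from the evaluation outputs above
before = {"accuracy": 0.5, "f1": 0.4848450460673854,
          "precision": 0.5037709730542496, "recall": 0.5}
after = {"accuracy": 0.893348623853211, "f1": 0.8932552005187875,
         "precision": 0.8940128781532564, "recall": 0.893348623853211}

# Absolute improvement per metric
gains = {name: after[name] - before[name] for name in before}

print(f"{'metric':<10}{'before':>8}{'after':>8}{'gain':>8}")
for name in before:
    print(f"{name:<10}{before[name]:>8.3f}{after[name]:>8.3f}{gains[name]:>+8.3f}")
```

Accuracy rises by roughly 39 percentage points after one epoch of LoRA fine-tuning, with the other metrics improving by a similar margin.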


In [None]:
                                ### THANK YOU !!! ####