# Lightweight Fine-Tuning Project

#### Student: Rodrigo Quezada Reyes

#### The Udacity platform repeatedly failed with "Connection Failed" during training and evaluation, so I ran the project locally on my own GPU and am submitting it as a zip file.

***

Describing my choices for each of the following:

* PEFT technique: LoRA, as I find it a great option for fine-tuning while freezing most of the parameters for computational efficiency.
* Model: DeBERTa, because it is a strong model for text classification tasks and this is one of those. The code below loads the `microsoft/deberta-v3-small` checkpoint with the `microsoft/deberta-v2-xlarge` tokenizer.
* Evaluation approach: I first evaluate the foundation model, then run the same evaluation on the trained PEFT model. This allows a fair comparison between the model as is and the fine-tuned model.
* Fine-tuning dataset: the Hugging Face tweet_eval dataset, an interesting collection of tweet-based benchmark tasks designed for evaluating text classification models on social media content.

## Loading and Evaluating a Foundation Model

* I am loading DeBERTa as my pre-trained foundation model (the `microsoft/deberta-v3-small` checkpoint in the code below) after assessing its capability for classification tasks.
* I will evaluate its performance before and after fine-tuning.
* I will also load an appropriate tokenizer and dataset.

***

#### Import all dependencies including the Hugging Face PEFT Library

In [1]:
#!pip install torch==2.7.1 torchvision==0.22.1 torchaudio==2.7.1 --index-url https://download.pytorch.org/whl/cu126

In [2]:
#!pip install sentencepiece

In [3]:
#import os
#os._exit(00)

In [4]:
import os
os.environ["PYTORCH_CUDA_ALLOC_CONF"] = "expandable_segments:True"

In [5]:
# Importing the torch library

import torch
print(torch.__version__)
print(torch.version.cuda)
print(torch.cuda.is_available())

2.7.1+cu126
12.6
True


In [6]:
# Importing the rest of the libraries

from peft import LoraConfig
from peft import TaskType
from transformers import AutoModelForCausalLM,AutoModelForSequenceClassification
from peft import get_peft_model
from peft import AutoPeftModelForCausalLM
from peft import PeftModel, PeftConfig
from datasets import load_dataset
from transformers import AutoTokenizer
from sklearn.metrics import accuracy_score, classification_report
import numpy as np
from transformers import pipeline, DataCollatorWithPadding, Trainer, TrainingArguments
from transformers import DebertaV2ForSequenceClassification, DebertaV2Tokenizer
import evaluate
from torch.utils.data import DataLoader
import sentencepiece


In [7]:
# Verify if GPU is available

import torch
print(torch.cuda.is_available())
print(torch.cuda.device_count())
print(torch.version.cuda)

True
1
12.6


In [8]:
# I set this up because my session timed out while training the model on the Udacity platform.
# I am now running locally, so it is no longer needed.

#import time
#import threading

#def keep_alive():
 #   while True:
  #      print("Keeping the session alive...")
   #     time.sleep(300)  # Sleep for 5 minutes

# Start the keep-alive thread
#keep_alive_thread = threading.Thread(target=keep_alive)
#keep_alive_thread.daemon = True  # This allows the thread to exit when the main program does
#keep_alive_thread.start()


### Function to evaluate model performance (the same function is used before and after fine-tuning)

In [9]:
# Function to evaluate the pre-trained model on the dataset first,
# and later the fine-tuned model on the same dataset

kpi = evaluate.load("accuracy")

def model_evaluating(model, dataset, batch_size=1):
    model.eval()
    model.to("cuda")
    dataloader = DataLoader(dataset, batch_size=batch_size)
    
    for i in dataloader:
        input_ids = i["input_ids"].to("cuda")
        attention_mask = i["attention_mask"].to("cuda")
        labels = i["label"].to("cuda")
        
        with torch.no_grad():
            outputs = model(input_ids, attention_mask=attention_mask)
            predictions = outputs.logits.argmax(dim=-1)
        
        kpi.add_batch(predictions=predictions, references=labels)
    
    return kpi.compute()
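
To make the metric concrete, here is a minimal pure-Python sketch of what the evaluation loop above measures: the argmax of each row of logits is taken as the prediction, and accuracy is the fraction of predictions that match the labels. The helper names and the toy logits are hypothetical; the notebook itself relies on `evaluate.load("accuracy")`.

```python
def argmax(values):
    """Return the index of the largest value (the first one wins on ties)."""
    return max(range(len(values)), key=values.__getitem__)


def accuracy_from_logits(logits, labels):
    """Fraction of rows whose highest-scoring class equals the reference label."""
    predictions = [argmax(row) for row in logits]
    correct = sum(int(p == y) for p, y in zip(predictions, labels))
    return correct / len(labels)


# Hypothetical logits for three examples over four classes.
toy_logits = [
    [0.1, 2.0, -1.0, 0.3],  # predicted class 1
    [1.5, 0.2, 0.1, 0.0],   # predicted class 0
    [0.0, 0.1, 0.2, 3.0],   # predicted class 3
]
toy_labels = [1, 0, 2]

toy_accuracy = accuracy_from_logits(toy_logits, toy_labels)
print(toy_accuracy)  # two of the three predictions match the labels
```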

## Loading the selected pre-trained model

***

In [10]:
# Creating a model object by loading my chosen pre-trained Hugging Face model: microsoft/deberta-v3-small

# bert-base-uncased worked great on my local machine, but I switched to an alternative per the platform options
#my_model = AutoModelForSequenceClassification.from_pretrained(
 #   "bert-base-uncased",
  #  num_labels=6)
    
#my_model = DebertaV2ForSequenceClassification.from_pretrained(
 #   "microsoft/deberta-v3-base",
  #  num_labels=6
#)
           
my_model = DebertaV2ForSequenceClassification.from_pretrained(
    'microsoft/deberta-v3-small', 
    num_labels=4,
    ignore_mismatched_sizes=True,
    use_safetensors=True 
    )


Some weights of DebertaV2ForSequenceClassification were not initialized from the model checkpoint at microsoft/deberta-v3-small and are newly initialized: ['classifier.bias', 'classifier.weight', 'pooler.dense.bias', 'pooler.dense.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


## Performing Parameter-Efficient Fine-Tuning

Creating a PEFT model from my loaded model, running a training loop, and saving the PEFT model weights.

In [11]:
# 1) Create a PEFT model from your loaded model

# Creating a PEFT configuration object with LoraConfig

my_config = LoraConfig(
    task_type=TaskType.SEQ_CLS,  # For sequence classification
    r=8,
    lora_alpha=32,
    lora_dropout=0.1
)

# Creating a trainable PEFT model object with get_peft_model

lora_peft_model = get_peft_model(my_model, my_config)
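
To make the `r` and `lora_alpha` settings more concrete, here is a minimal pure-Python sketch of the LoRA update `y = W x + (lora_alpha / r) * B (A x)`. The toy shapes, weights, and helper names are hypothetical; only the scaling ratio mirrors the configuration above (32 / 8 = 4). This is an illustration of the idea rather than PEFT's implementation, and it omits dropout.

```python
def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def lora_forward(W, A, B, x, r, lora_alpha):
    """Frozen base output plus the scaled low-rank update B(Ax)."""
    scaling = lora_alpha / r
    base = matvec(W, x)               # frozen pre-trained weights
    update = matvec(B, matvec(A, x))  # trainable low-rank path
    return [b + scaling * u for b, u in zip(base, update)]


# Toy 2-dimensional layer with rank 2 and lora_alpha 8 (same 4.0 ratio as 32 / 8).
W = [[1.0, 0.0], [0.0, 1.0]]
A = [[1.0, 0.0], [0.0, 1.0]]
x = [1.0, 2.0]

# B starts at zero, so a freshly created adapter leaves the base output unchanged.
B_initial = [[0.0, 0.0], [0.0, 0.0]]
out_initial = lora_forward(W, A, B_initial, x, r=2, lora_alpha=8)

# After training, B is non-zero and the update is scaled by lora_alpha / r.
B_trained = [[0.5, 0.0], [0.0, 0.5]]
out_trained = lora_forward(W, A, B_trained, x, r=2, lora_alpha=8)

print(out_initial, out_trained)
```

Only `A` and `B` would receive gradients in this picture, which is why the trainable parameter count stays small.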

In [12]:
# Confirming that LoRA is active, so only a small fraction of the parameters is trainable

lora_peft_model.print_trainable_parameters()

trainable params: 150,532 || all params: 142,048,520 || trainable%: 0.1060
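
The trainable count above can be reproduced with a short calculation. This is a sketch under two assumptions that the truncated printout does not confirm: that LoRA adapts two 768x768 projections per layer (`query_proj` is visible; the second is assumed to be `value_proj`), and that the 4-way classifier head is the only other trainable module.

```python
hidden_size = 768              # from the printed Linear(in_features=768, ...)
num_layers = 6                 # from "(0-5): 6 x DebertaV2Layer"
rank = 8                       # r in LoraConfig
num_labels = 4                 # num_labels passed to from_pretrained
adapted_modules_per_layer = 2  # assumption: query_proj and value_proj

# Each adapted module adds A (rank x hidden) and B (hidden x rank), without biases.
lora_params_per_module = hidden_size * rank + rank * hidden_size
lora_params = lora_params_per_module * adapted_modules_per_layer * num_layers

# Assumption: the classifier head (weights + bias) is trained in full.
classifier_params = hidden_size * num_labels + num_labels

trainable_params = lora_params + classifier_params
all_params = 142_048_520       # taken from the output above
trainable_percent = 100 * trainable_params / all_params

print(f"trainable params: {trainable_params:,} || trainable%: {trainable_percent:.4f}")
```

Under these assumptions the arithmetic lands on the same 150,532 figure that `print_trainable_parameters()` reports.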


In [13]:
# Observing the model object

lora_peft_model

PeftModelForSequenceClassification(
  (base_model): LoraModel(
    (model): DebertaV2ForSequenceClassification(
      (deberta): DebertaV2Model(
        (embeddings): DebertaV2Embeddings(
          (word_embeddings): Embedding(128100, 768, padding_idx=0)
          (LayerNorm): LayerNorm((768,), eps=1e-07, elementwise_affine=True)
          (dropout): Dropout(p=0.1, inplace=False)
        )
        (encoder): DebertaV2Encoder(
          (layer): ModuleList(
            (0-5): 6 x DebertaV2Layer(
              (attention): DebertaV2Attention(
                (self): DisentangledSelfAttention(
                  (query_proj): lora.Linear(
                    (base_layer): Linear(in_features=768, out_features=768, bias=True)
                    (lora_dropout): ModuleDict(
                      (default): Dropout(p=0.1, inplace=False)
                    )
                    (lora_A): ModuleDict(
                      (default): Linear(in_features=768, out_features=8, bias=False)
          

In [14]:
# Saving the original peft model

lora_peft_model.save_pretrained("./tmp/original_lora_peft_8142133_model")

In [15]:
# Practicing correct LoRA peft load

# 1. Load PEFT config to find base model name
peft_model_path = "./tmp/original_lora_peft_8142133_model"  
peft_config = PeftConfig.from_pretrained(peft_model_path)

# 2. Load base model with correct label count
my_model = DebertaV2ForSequenceClassification.from_pretrained(
    'microsoft/deberta-v3-small', 
    num_labels=4,
    ignore_mismatched_sizes=True,
    use_safetensors=True 
    )

# 3. Load LoRA adapter on top of base model
original_lora_peft = PeftModel.from_pretrained(my_model, peft_model_path)


Some weights of DebertaV2ForSequenceClassification were not initialized from the model checkpoint at microsoft/deberta-v3-small and are newly initialized: ['classifier.bias', 'classifier.weight', 'pooler.dense.bias', 'pooler.dense.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
  result[k] = f.get_tensor(k)


In [16]:
# Observing the retrieved model object

original_lora_peft

PeftModelForSequenceClassification(
  (base_model): LoraModel(
    (model): DebertaV2ForSequenceClassification(
      (deberta): DebertaV2Model(
        (embeddings): DebertaV2Embeddings(
          (word_embeddings): Embedding(128100, 768, padding_idx=0)
          (LayerNorm): LayerNorm((768,), eps=1e-07, elementwise_affine=True)
          (dropout): Dropout(p=0.1, inplace=False)
        )
        (encoder): DebertaV2Encoder(
          (layer): ModuleList(
            (0-5): 6 x DebertaV2Layer(
              (attention): DebertaV2Attention(
                (self): DisentangledSelfAttention(
                  (query_proj): lora.Linear(
                    (base_layer): Linear(in_features=768, out_features=768, bias=True)
                    (lora_dropout): ModuleDict(
                      (default): Dropout(p=0.1, inplace=False)
                    )
                    (lora_A): ModuleDict(
                      (default): Linear(in_features=768, out_features=8, bias=False)
          

I am selecting the tweet_eval dataset, a collection of tweet-based benchmark tasks designed for evaluating text classification models on social media content. It includes several sub-tasks such as emotion classification, hate speech detection, irony, stance detection, and more, each with its own labeled subset. Tweets are short, informal, and often noisy, making the dataset ideal for developing and testing models in real-world, low-resource language scenarios. The dataset is widely used for benchmarking due to its diversity and compact size: some sub-tasks (like the "emotion" subset) contain fewer than 4,000 training examples, which makes them especially suitable for training with limited computing resources. I switched to it because training crashed when I used the `emotion` dataset.

In [17]:
# Now select and load a dataset

from datasets import load_dataset

#my_dataset = load_dataset("emotion")
my_dataset = load_dataset("tweet_eval", "emotion")

dataset_splits = ['train', 'validation', 'test']

print(my_dataset)

DatasetDict({
    train: Dataset({
        features: ['text', 'label'],
        num_rows: 3257
    })
    test: Dataset({
        features: ['text', 'label'],
        num_rows: 1421
    })
    validation: Dataset({
        features: ['text', 'label'],
        num_rows: 374
    })
})


In [18]:
# Review overall train dataset

my_dataset["train"]

Dataset({
    features: ['text', 'label'],
    num_rows: 3257
})

In [19]:
# Review overall test dataset

my_dataset["test"]

Dataset({
    features: ['text', 'label'],
    num_rows: 1421
})

In [20]:
# Review the first example from the train dataset

my_dataset["train"][0]

{'text': "“Worry is a down payment on a problem you may never have'. \xa0Joyce Meyer.  #motivation #leadership #worry",
 'label': 2}

In [21]:
# Review the first example from the test dataset

my_dataset["test"][0]

{'text': '#Deppression is real. Partners w/ #depressed people truly dont understand the depth in which they affect us. Add in #anxiety &amp;makes it worse',
 'label': 3}
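
The integer labels are easier to read with their class names. The sketch below assumes the label order given on the tweet_eval dataset card for the "emotion" subset; it is worth confirming with `my_dataset["train"].features["label"].names` before relying on it. The `ID2LABEL` mapping and `label_name` helper are hypothetical names.

```python
# Assumed label order for the tweet_eval "emotion" subset (verify against the dataset).
ID2LABEL = {0: "anger", 1: "joy", 2: "optimism", 3: "sadness"}


def label_name(label_id):
    """Translate an integer label into its assumed class name."""
    return ID2LABEL[label_id]


# Labels of the first train and first test examples shown above.
first_train_label = label_name(2)
first_test_label = label_name(3)
print(first_train_label, first_test_label)
```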

In [22]:
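# Loading the selected tokenizer
# Note: this tokenizer checkpoint (deberta-v2-xlarge) differs from the model checkpoint (deberta-v3-small);
# loading both from the same checkpoint name is the usual practice.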

my_tokenizer = DebertaV2Tokenizer.from_pretrained('microsoft/deberta-v2-xlarge')

#my_tokenizer = AutoTokenizer.from_pretrained("microsoft/deberta-v3-base")

In [23]:
# Tokenize every split; with batched=True, padding=True pads each mapped batch to its longest sequence

my_tokenized_dataset = {}

for split in dataset_splits:
    my_tokenized_dataset[split] = my_dataset[split].map(
        #lambda x: my_tokenizer(x["text"], truncation=True, padding="max_length"), 
        lambda x: my_tokenizer(x["text"], truncation=True, padding=True, return_tensors="pt"), 
        batched=True
    )

# Inspect the available columns in the dataset
print(my_tokenized_dataset["train"].column_names)

['text', 'label', 'input_ids', 'token_type_ids', 'attention_mask']
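
The `input_ids` and `attention_mask` columns follow a simple padding rule, visible in the tokenized examples printed later in this notebook: shorter sequences are filled with the pad id, and the mask marks real tokens with 1 and padding with 0. Below is a minimal pure-Python sketch of that rule. The `pad_batch` helper and the short id lists are hypothetical, and pad id 0 is assumed from the trailing zeros in the printed examples.

```python
def pad_batch(sequences, pad_token_id=0):
    """Pad every sequence to the longest one and build matching attention masks."""
    longest = max(len(seq) for seq in sequences)
    input_ids, attention_mask = [], []
    for seq in sequences:
        padding = longest - len(seq)
        input_ids.append(list(seq) + [pad_token_id] * padding)
        attention_mask.append([1] * len(seq) + [0] * padding)
    return {"input_ids": input_ids, "attention_mask": attention_mask}


# Two hypothetical token-id sequences of different lengths.
batch = pad_batch([[1, 68, 2], [1, 1539, 99185, 13, 2]])
print(batch["input_ids"])
print(batch["attention_mask"])
```

Because the padding length depends on the longest sequence in each batch, rows from different batches can end up with different lengths.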


In [24]:
print(my_tokenized_dataset["test"].column_names)

['text', 'label', 'input_ids', 'token_type_ids', 'attention_mask']


In [25]:
print(my_tokenized_dataset["train"].column_names)

['text', 'label', 'input_ids', 'token_type_ids', 'attention_mask']


In [26]:
print(my_tokenized_dataset["train"][0])

{'text': "“Worry is a down payment on a problem you may never have'. \xa0Joyce Meyer.  #motivation #leadership #worry", 'label': 2, 'input_ids': [1, 68, 43422, 41870, 13, 10, 184, 1574, 21, 10, 453, 17, 111, 252, 30, 25, 4, 15282, 15583, 4, 1539, 76839, 1539, 71038, 1539, 118308, 2, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0], 'token_type_ids': [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0], 'attention_mask': [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]}


In [27]:
print(my_tokenized_dataset["test"][0])

{'text': '#Deppression is real. Partners w/ #depressed people truly dont understand the depth in which they affect us. Add in #anxiety &amp;makes it worse', 'label': 3, 'input_ids': [1, 1539, 99185, 56743, 13, 340, 4, 8583, 2564, 96, 1539, 2539, 30606, 98, 1276, 5826, 513, 5, 3291, 11, 59, 49, 2271, 120, 4, 1962, 11, 1539, 63270, 169, 10087, 93, 54082, 22, 2416, 2, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0], 'token_type_ids': [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0], 'attention_mask': [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]}


## Performing the baseline evaluation of the pre-trained model

***

In [28]:
my_tokenized_dataset["test"].set_format(type="torch", columns=["input_ids", "attention_mask", "label"])

#my_tokenized_dataset["test"].set_format(type="torch", columns=["input_ids", "attention_mask", "label"], padding=True, truncation=True, return_tensors="pt")

#tokenizer(texts, padding=True, truncation=True, return_tensors="pt")

print(my_tokenized_dataset["test"])

Dataset({
    features: ['text', 'label', 'input_ids', 'token_type_ids', 'attention_mask'],
    num_rows: 1421
})


In [29]:
# Making sure the testing dataset is ready for evaluation

my_testing_tokenized_dataset = my_tokenized_dataset["test"].map(batched=True)

my_testing_tokenized_dataset.set_format(type="torch", columns=["input_ids", "attention_mask", "label"])


In [30]:
# Clear the CUDA cache before running to free up memory

torch.cuda.empty_cache()

In [31]:
# Test initial accuracy of the pretrained Model on the selected dataset

baseline_results = model_evaluating(original_lora_peft, my_testing_tokenized_dataset)
print("Base Model Evaluation:", baseline_results)

Base Model Evaluation: {'accuracy': 0.26882477128782545}


## Applying PEFT to fine-tune the model

***

In [32]:
# Clear the CUDA cache before training to free up memory

torch.cuda.empty_cache()
torch.cuda.reset_peak_memory_stats()


In [33]:
# 2) Run a training loop

def compute_metrics(eval_pred):
    predictions, labels = eval_pred
    predictions = np.argmax(predictions, axis=1)
    return {"accuracy": (predictions == labels).mean()}

trainer = Trainer(
    model=original_lora_peft,
    args=TrainingArguments(
        output_dir="./tmp/patent_class",
        # Set the learning rate
        learning_rate=2e-5,
        # Set the per device train batch size and eval batch size
        per_device_train_batch_size=1,
        per_device_eval_batch_size=1,
        # Evaluate and save the model after each epoch
        #evaluation_strategy="epoch",
        eval_strategy="epoch",
        save_strategy="epoch",
        # Set the number of training epochs and the weight decay
        num_train_epochs=2,
        weight_decay=0.01,
        load_best_model_at_end=True,
    ),
    train_dataset=my_tokenized_dataset["train"],
    eval_dataset=my_tokenized_dataset["test"],
        
    tokenizer=my_tokenizer,
    data_collator=DataCollatorWithPadding(tokenizer=my_tokenizer),
    compute_metrics=compute_metrics,
)

trainer.train()

  trainer = Trainer(
No label_names provided for model class `PeftModelForSequenceClassification`. Since `PeftModel` hides base models input arguments, if label_names is not given, label_names can't be set automatically within `Trainer`. Note that empty label_names list will be used instead.


| Epoch | Training Loss | Validation Loss | Accuracy |
|-------|---------------|-----------------|----------|
| 1 | 1.3277 | 1.388205 | 0.392681 |
| 2 | 1.3001 | 1.420024 | 0.392681 |


TrainOutput(global_step=6514, training_loss=1.3221184489256332, metrics={'train_runtime': 617.5189, 'train_samples_per_second': 10.549, 'train_steps_per_second': 10.549, 'total_flos': 114033337882464.0, 'train_loss': 1.3221184489256332, 'epoch': 2.0})
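
The step count and throughput in this output follow from the dataset size and the training arguments. The sketch below redoes the arithmetic, assuming a single device and no gradient accumulation (neither is set explicitly in the notebook).

```python
import math

train_rows = 3257                 # size of the train split
per_device_train_batch_size = 1   # from TrainingArguments
num_train_epochs = 2              # from TrainingArguments
train_runtime_seconds = 617.5189  # from the TrainOutput above

# Assumption: one device and no gradient accumulation, so one batch is one step.
steps_per_epoch = math.ceil(train_rows / per_device_train_batch_size)
total_steps = steps_per_epoch * num_train_epochs

samples_per_second = train_rows * num_train_epochs / train_runtime_seconds

print(total_steps, round(samples_per_second, 3))
```

With a batch size of 1, steps per second and samples per second coincide, which matches the two identical throughput values in the output.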

###  ⚠️ IMPORTANT ⚠️

Due to workspace storage constraints, you should not store the model weights in the same directory but rather use `/tmp` to avoid workspace crashes which are irrecoverable.
Ensure you save it in /tmp always.

In [35]:
# Now merge LoRA weights with base model

ft_model = original_lora_peft.merge_and_unload()
ft_model.save_pretrained("./tmp/finetuned_814_2347_model")
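
Conceptually, merging folds the low-rank update into the base weights, `W_merged = W + (lora_alpha / r) * B A`, so the adapter is no longer needed at inference time. The pure-Python sketch below illustrates that idea with hypothetical toy matrices and helper names; it is not the PEFT implementation of `merge_and_unload()`.

```python
def matmul(left, right):
    """Multiply two matrices given as lists of rows."""
    return [[sum(l * r for l, r in zip(row, col)) for col in zip(*right)]
            for row in left]


def merge_lora(W, A, B, r, lora_alpha):
    """Return W + (lora_alpha / r) * (B @ A) as a new matrix."""
    scaling = lora_alpha / r
    delta = matmul(B, A)
    return [[w + scaling * d for w, d in zip(w_row, d_row)]
            for w_row, d_row in zip(W, delta)]


# Toy layer: 2x2 base weight with a rank-1 adapter (hypothetical values).
W = [[1.0, 0.0], [0.0, 1.0]]
A = [[1.0, 0.0]]     # shape (r, d) with r = 1
B = [[0.5], [0.25]]  # shape (d, r)

merged = merge_lora(W, A, B, r=1, lora_alpha=4)

# The merged matrix gives the same output as base path plus adapter path.
x = [[1.0], [2.0]]
merged_out = matmul(merged, x)
adapter_out = [[b[0] + 4.0 * u[0]] for b, u in zip(matmul(W, x), matmul(B, matmul(A, x)))]

print(merged, merged_out, adapter_out)
```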


## Performing Inference with a PEFT Model

TODO: In the cells below, load the saved PEFT model weights and evaluate the performance of the trained PEFT model. Be sure to compare the results to the results from prior to fine-tuning.

## Evaluating the foundation model's performance after PEFT fine-tuning

In [36]:
# Clear the CUDA cache before running to free up memory

torch.cuda.empty_cache()

In [37]:
# Using the same evaluation method, now for the fine-tuned/trained peft model

new_results = model_evaluating(ft_model, my_testing_tokenized_dataset)
print("Post PEFT fine-tune Model Evaluation:", new_results)

Post PEFT fine-tune Model Evaluation: {'accuracy': 0.39268121041520054}


## Compare performance before and after fine-tuning

***

In [38]:
# Clear the CUDA cache before running to free up memory

torch.cuda.empty_cache()

In [39]:
print("Original Model: ", baseline_results)
print("Fine-tuned Model: ", new_results)

Original Model:  {'accuracy': 0.26882477128782545}
Fine-tuned Model:  {'accuracy': 0.39268121041520054}
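
The two accuracies can be turned into an absolute gain, a relative gain, and counts of correct predictions on the 1,421 test rows. The sketch below also includes a majority-class baseline helper. That baseline is a common sanity check rather than part of the original notebook, and the `toy_labels` list is hypothetical.

```python
from collections import Counter

baseline_accuracy = 0.26882477128782545   # reported before fine-tuning
finetuned_accuracy = 0.39268121041520054  # reported after fine-tuning
test_rows = 1421                          # size of the test split

absolute_gain = finetuned_accuracy - baseline_accuracy
relative_gain = absolute_gain / baseline_accuracy

# Number of correctly classified test rows implied by each accuracy.
correct_before = round(baseline_accuracy * test_rows)
correct_after = round(finetuned_accuracy * test_rows)


def majority_class_accuracy(labels):
    """Accuracy of always predicting the most frequent label."""
    most_common_count = Counter(labels).most_common(1)[0][1]
    return most_common_count / len(labels)


# Hypothetical labels, only to show the helper in use.
toy_labels = [0, 0, 0, 1, 2, 3, 3, 0]
toy_baseline = majority_class_accuracy(toy_labels)

print(f"absolute gain: {absolute_gain:.4f}, relative gain: {relative_gain:.1%}")
print(correct_before, correct_after, toy_baseline)
```

Since the accuracy was identical after both epochs, running `majority_class_accuracy` on the real test labels may help show whether the fine-tuned model has learned more than a constant prediction.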


-The End-