# Lightweight Fine-Tuning Project

TODO: In this cell, describe your choices for each of the following

* PEFT technique: LoRA. I chose to keep it simple for this part. LoRA trains small low-rank adapter matrices while the pretrained weights stay frozen, which keeps fine-tuning cheap.
* Model: distilbert-base-uncased, because case doesn't matter in the dataset I chose and I wanted a model that was fast and lightweight. DistilBERT is commonly used for sequence classification tasks.
* Evaluation approach: accuracy on the held-out test split, taking the argmax over the logits as the predicted class.
* Fine-tuning dataset: clinc_oos, because I wanted to apply text classification concepts when there are more than two classes.
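
To make the evaluation approach concrete, here is a minimal sketch of accuracy computed from an argmax over logits. The logits and labels are made-up toy values, and `accuracy_from_logits` is a hypothetical helper name used only for this illustration.

```python
import numpy as np

def accuracy_from_logits(logits, labels):
    """Return the fraction of rows whose highest-scoring class matches the label."""
    predictions = np.argmax(logits, axis=-1)
    return float((predictions == labels).mean())

# Made-up logits for 4 examples and 3 classes (illustrative values only).
toy_logits = np.array([
    [2.0, 0.1, -1.0],   # predicts class 0
    [0.2, 1.5, 0.3],    # predicts class 1
    [0.0, 0.4, 0.9],    # predicts class 2
    [1.1, 0.2, 0.5],    # predicts class 0
])
toy_labels = np.array([0, 1, 2, 2])

toy_accuracy = accuracy_from_logits(toy_logits, toy_labels)
print(toy_accuracy)  # 0.75
```

Three of the four toy rows are classified correctly, so the accuracy is 0.75.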

## Loading and Evaluating a Foundation Model

TODO: In the cells below, load your chosen pre-trained Hugging Face model and evaluate its performance prior to fine-tuning. This step includes loading an appropriate tokenizer and dataset.

In [1]:
# imports

from datasets import load_dataset, Features, Value, ClassLabel
from transformers import AutoTokenizer
from transformers import AutoModelForSequenceClassification
import numpy as np
from transformers import DataCollatorWithPadding, Trainer, TrainingArguments
from peft import get_peft_model, LoraConfig

  warn("The installed version of bitsandbytes was compiled without GPU support. "


/opt/conda/lib/python3.10/site-packages/bitsandbytes/libbitsandbytes_cpu.so: undefined symbol: cadam32bit_grad_fp32


In [2]:
# load a dataset 

dataset = load_dataset('clinc/clinc_oos', 'small', split='train').train_test_split(
    test_size=0.2, shuffle=True, seed=23
)

# limit the dataset to 3 classes only for simplicity
dataset = dataset.filter(lambda x: (x["intent"] == 0) or (x["intent"] == 1) or (x["intent"] == 9))

# change column names for ease
dataset = dataset.rename_column("intent", "labels")

# change label 9 to 2 for ease
def change_label(x):
    if (x["labels"] == 9):
        x['labels'] = 2
    return x

dataset = dataset.map(change_label)

splits = ["train", "test"]

print(dataset["train"])

print(dataset["test"])
print(dataset["test"]['labels'])

Dataset({
    features: ['text', 'labels'],
    num_rows: 115
})
Dataset({
    features: ['text', 'labels'],
    num_rows: 35
})
[0, 2, 2, 0, 2, 0, 2, 1, 0, 2, 0, 0, 2, 1, 1, 2, 0, 1, 2, 2, 2, 2, 0, 2, 2, 2, 2, 1, 2, 0, 1, 1, 1, 0, 0]
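
The filter and `change_label` steps above amount to keeping three intents and renumbering them as 0, 1, 2. A minimal sketch of the same idea driven by a single mapping, using plain Python dictionaries instead of a `datasets` object; `KEPT_INTENTS`, `keep_and_remap`, and the toy rows are hypothetical names and data for illustration only.

```python
# Original clinc_oos intent ids kept in this notebook, mapped to contiguous ids.
KEPT_INTENTS = [0, 1, 9]
label_map = {old: new for new, old in enumerate(KEPT_INTENTS)}

def keep_and_remap(rows):
    """Drop rows whose intent is not kept and renumber the rest as 0..N-1."""
    return [
        {"text": row["text"], "labels": label_map[row["intent"]]}
        for row in rows
        if row["intent"] in label_map
    ]

# Hypothetical rows, for illustration only (not taken from the dataset).
toy_rows = [
    {"text": "is this place any good", "intent": 0},
    {"text": "how many calories are in a bagel", "intent": 1},
    {"text": "what is the weather", "intent": 5},
    {"text": "do they take reservations", "intent": 9},
]
remapped = keep_and_remap(toy_rows)
print([row["labels"] for row in remapped])  # [0, 1, 2]
```

Keeping the mapping in one place means adding a fourth intent only requires extending `KEPT_INTENTS`.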


In [3]:
# tokenize the dataset

tokenizer = AutoTokenizer.from_pretrained("distilbert-base-uncased")

tokenized_dataset = {}
for split in splits:
    tokenized_dataset[split] = dataset[split].map(
        lambda x: tokenizer(x["text"], truncation=True), batched=True
    )

Map:   0%|          | 0/35 [00:00<?, ? examples/s]
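
The tokenizer is called with `truncation=True` but without padding, so the tokenized examples have different lengths and each batch has to be padded when it is assembled. As far as I understand, `Trainer` falls back to a padding collator when it is given a tokenizer, which would explain why training runs with the `data_collator` line commented out. A minimal NumPy sketch of that dynamic padding idea follows; `pad_batch` is a hypothetical helper, the token ids are made up, and a pad id of 0 is an assumption.

```python
import numpy as np

def pad_batch(sequences, pad_id=0):
    """Right-pad token-id lists to the longest one and build the attention mask."""
    max_len = max(len(seq) for seq in sequences)
    input_ids = np.full((len(sequences), max_len), pad_id, dtype=np.int64)
    attention_mask = np.zeros((len(sequences), max_len), dtype=np.int64)
    for row, seq in enumerate(sequences):
        input_ids[row, :len(seq)] = seq
        attention_mask[row, :len(seq)] = 1
    return input_ids, attention_mask

# Hypothetical token ids of different lengths (illustrative, not real tokenizer output).
toy_batch = [[101, 2054, 102], [101, 2079, 2027, 2202, 102]]
toy_ids, toy_mask = pad_batch(toy_batch)
print(toy_ids.shape)      # (2, 5)
print(toy_mask.tolist())  # [[1, 1, 1, 0, 0], [1, 1, 1, 1, 1]]
```

Padding per batch rather than to a global maximum keeps short batches small, which matters on CPU-only runs like this one.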

In [4]:
# load a model

model = AutoModelForSequenceClassification.from_pretrained(
    'distilbert-base-uncased',
    num_labels=3,
    id2label={0: 'restaurant_reviews', 1: 'nutrition_info', 2: 'accept_reservations'},
    label2id={'restaurant_reviews': 0, 'nutrition_info': 1, 'accept_reservations': 2},
)

Some weights of DistilBertForSequenceClassification were not initialized from the model checkpoint at distilbert-base-uncased and are newly initialized: ['classifier.weight', 'classifier.bias', 'pre_classifier.bias', 'pre_classifier.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


In [5]:
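# baseline on the 3-class subset
# NOTE: trainer.train() below updates all weights for 10 epochs at lr=0.001 before
# evaluate() is called, so these numbers describe a fully fine-tuned model,
# not the untouched pretrained checkpoint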

def compute_metrics(eval_pred):
    predictions, labels = eval_pred
#     print(predictions)
    predictions = np.argmax(predictions, axis=-1)
    return {"accuracy": (predictions == labels).mean()}

trainer = Trainer(
    model=model,
    args=TrainingArguments(
        output_dir="./data/intent_analysis",
        learning_rate=0.001,
        per_device_train_batch_size=8,
        per_device_eval_batch_size=8,
        num_train_epochs=10, 
        weight_decay=0.01,
        evaluation_strategy="epoch",
        save_strategy="epoch",
        load_best_model_at_end=True,
    ),
    train_dataset=tokenized_dataset["train"],
    eval_dataset=tokenized_dataset["test"],
    tokenizer=tokenizer,
#     data_collator=DataCollatorWithPadding(tokenizer),
    compute_metrics=compute_metrics,
)

trainer.train()

# RESULTS WITH 1 EPOCH AND NO PADDING:
# 1	No log	1.482860	0.228571

# TrainOutput(global_step=115, training_loss=1.7422258460003397, 
# metrics={'train_runtime': 442.9046, 'train_samples_per_second': 0.26, 'train_steps_per_second': 0.26, 
# 'total_flos': 319272822468.0, 'train_loss': 1.7422258460003397, 'epoch': 1.0})

# RESULTS WITH 10 EPOCHS AND NO PADDING AND LR=0.001:

# Epoch	Training Loss	Validation Loss	Accuracy
# 1	No log	1.132571	0.314286
# 2	No log	1.124154	0.314286
# 3	No log	1.161006	0.228571
# 4	No log	1.131212	0.228571
# 5	No log	1.107636	0.228571
# 6	No log	1.138629	0.228571
# 7	No log	1.114104	0.228571
# 8	No log	1.120871	0.228571
# 9	No log	1.129556	0.314286
# 10	No log	1.118878	0.314286


You're using a DistilBertTokenizerFast tokenizer. Please note that with a fast tokenizer, using the `__call__` method is faster than using a method to encode the text followed by a call to the `pad` method to get a padded encoding.


Epoch,Training Loss,Validation Loss,Accuracy
1,No log,1.132571,0.314286
2,No log,1.124154,0.314286
3,No log,1.161006,0.228571
4,No log,1.131212,0.228571
5,No log,1.107636,0.228571
6,No log,1.138629,0.228571
7,No log,1.114104,0.228571
8,No log,1.120871,0.228571
9,No log,1.129556,0.314286
10,No log,1.118878,0.314286


TrainOutput(global_step=150, training_loss=1.1418226114908854, metrics={'train_runtime': 576.777, 'train_samples_per_second': 1.994, 'train_steps_per_second': 0.26, 'total_flos': 4204621262502.0, 'train_loss': 1.1418226114908854, 'epoch': 10.0})

In [6]:
trainer.evaluate()

# {'eval_loss': 1.1076363325119019,
#  'eval_accuracy': 0.22857142857142856,
#  'eval_runtime': 3.1586,
#  'eval_samples_per_second': 11.081,
#  'eval_steps_per_second': 1.583,
#  'epoch': 10.0}

{'eval_loss': 1.1076363325119019,
 'eval_accuracy': 0.22857142857142856,
 'eval_runtime': 3.1586,
 'eval_samples_per_second': 11.081,
 'eval_steps_per_second': 1.583,
 'epoch': 10.0}
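
The accuracies logged above only take the values 0.314286 and 0.228571. A quick check against the test labels printed earlier shows that these are exactly the scores of a classifier that always predicts class 0 or always predicts class 1. This suggests, though it does not prove, that the model collapsed to a constant prediction. The sketch below only uses the labels already shown in this notebook.

```python
from collections import Counter

# Test labels copied from the printed output of the dataset cell above.
test_labels = [0, 2, 2, 0, 2, 0, 2, 1, 0, 2, 0, 0, 2, 1, 1, 2, 0, 1,
               2, 2, 2, 2, 0, 2, 2, 2, 2, 1, 2, 0, 1, 1, 1, 0, 0]

counts = Counter(test_labels)
# Accuracy of a classifier that always predicts the same class.
constant_accuracy = {
    label: round(counts[label] / len(test_labels), 6) for label in sorted(counts)
}
print(constant_accuracy)  # {0: 0.314286, 1: 0.228571, 2: 0.457143}
```

The third value, 0.457143, is the share of class 2 and matches the accuracy reported for the reloaded model later in the notebook.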

## Performing Parameter-Efficient Fine-Tuning

TODO: In the cells below, create a PEFT model from your loaded model, run a training loop, and save the PEFT model weights.
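
As a reminder of what LoRA adds to a layer, here is a minimal NumPy sketch of one adapted linear projection, using the `r=8` and `lora_alpha=32` values from this notebook. It is a simplified illustration, not PEFT's implementation: dropout is left out, the weights are random, and `lora_forward` is a hypothetical name.

```python
import numpy as np

rng = np.random.default_rng(0)
d_in, d_out, r, lora_alpha = 768, 768, 8, 32

W = rng.normal(size=(d_out, d_in))          # frozen pretrained weight
A = rng.normal(scale=0.01, size=(r, d_in))  # trainable down-projection
B = np.zeros((d_out, r))                    # trainable up-projection, starts at zero

def lora_forward(x):
    """Base projection plus the scaled low-rank update (alpha / r) * B @ A."""
    return x @ W.T + (lora_alpha / r) * (x @ A.T @ B.T)

x = rng.normal(size=(2, d_in))
# With B at zero the adapter is a no-op, so training starts from the base model.
starts_as_base = np.allclose(lora_forward(x), x @ W.T)
extra_params = A.size + B.size
print(starts_as_base, extra_params, W.size)  # True 12288 589824
```

The adapter adds 12,288 trainable values next to a frozen matrix of 589,824, which is where the parameter savings come from.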

In [8]:
# print(model)

In [9]:
# set up LoRA

lora_config = LoraConfig(
    r=8, 
    lora_alpha=32,
    lora_dropout=0.05, 
    # I am targeting a lot of layers because the initial model is so poor
    target_modules=['word_embeddings', 'position_embeddings', 'q_lin', 'k_lin', 'v_lin'],
    bias='none',
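    # NOTE: "CAUSAL_LM" is the value that produced the results recorded below;
    # for a classification model, "SEQ_CLS" is the task type PEFT provides
    task_type="CAUSAL_LM"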
)

model = AutoModelForSequenceClassification.from_pretrained(
    'distilbert-base-uncased',
    num_labels=3,
    id2label={0: 'restaurant_reviews', 1: 'nutrition_info', 2: 'accept_reservations'},
    label2id={'restaurant_reviews': 0, 'nutrition_info': 1, 'accept_reservations': 2},
)

lora_model = get_peft_model(model, lora_config)

lora_model.print_trainable_parameters()
# trainable params: 481,744 || all params: 67,437,523 || trainable%: 0.7143560121566149

Some weights of DistilBertForSequenceClassification were not initialized from the model checkpoint at distilbert-base-uncased and are newly initialized: ['classifier.weight', 'classifier.bias', 'pre_classifier.bias', 'pre_classifier.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


trainable params: 481,744 || all params: 67,437,523 || trainable%: 0.7143560121566149
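
The trainable count can be reproduced by hand, which is a useful check on what LoRA actually touched. The sketch assumes the standard DistilBERT-base dimensions (vocabulary 30,522, 512 positions, hidden size 768, 6 layers), which are not printed in this notebook; `lora_params` is a hypothetical helper.

```python
# Assumed DistilBERT-base dimensions (not printed anywhere in this notebook).
VOCAB_SIZE, MAX_POSITIONS, HIDDEN, NUM_LAYERS = 30522, 512, 768, 6
R = 8  # LoRA rank from the LoraConfig above

def lora_params(n_in, n_out, rank=R):
    """Size of a low-rank pair with shapes (rank x n_in) and (n_out x rank)."""
    return rank * (n_in + n_out)

attention = NUM_LAYERS * 3 * lora_params(HIDDEN, HIDDEN)  # q_lin, k_lin, v_lin
word_emb = lora_params(VOCAB_SIZE, HIDDEN)
position_emb = lora_params(MAX_POSITIONS, HIDDEN)
trainable = attention + word_emb + position_emb

all_params = 67_437_523  # total reported by print_trainable_parameters()
print(trainable)                               # 481744
print(round(100 * trainable / all_params, 4))  # 0.7144
```

The total matches the reported 481,744 without counting `pre_classifier` or `classifier`, so under this configuration the classification head does not appear to be among the trainable parameters.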


In [10]:
# train the LoRA model
trainerLoRA = Trainer(
    model=lora_model,
    args=TrainingArguments(
        output_dir="./data/peft_intent_analysis",
        learning_rate=0.001,
        per_device_train_batch_size=8,
        per_device_eval_batch_size=8,
        num_train_epochs=10,
        weight_decay=0.01,
        evaluation_strategy="epoch",
        save_strategy="epoch",
        load_best_model_at_end=True,
    ),
    train_dataset=tokenized_dataset["train"],
    eval_dataset=tokenized_dataset["test"],
    tokenizer=tokenizer,
#     data_collator=DataCollatorWithPadding(tokenizer),
    compute_metrics=compute_metrics,
)

trainerLoRA.train()

# RESULTS WITH 10 EPOCHS, NO PADDING:
# Epoch	Training Loss	Validation Loss	Accuracy
# 1	No log	0.932374	0.914286
# 2	No log	0.525102	0.971429
# 3	No log	0.263483	0.971429
# 4	No log	0.150468	0.971429
# 5	No log	0.138190	0.971429
# 6	No log	0.120842	0.971429
# 7	No log	0.123744	0.971429
# 8	No log	0.125688	0.971429
# 9	No log	0.114711	0.971429
# 10	No log	0.111369	0.971429

Epoch,Training Loss,Validation Loss,Accuracy
1,No log,0.932374,0.914286
2,No log,0.525102,0.971429
3,No log,0.263483,0.971429
4,No log,0.150468,0.971429
5,No log,0.13819,0.971429
6,No log,0.120842,0.971429
7,No log,0.123744,0.971429
8,No log,0.125688,0.971429
9,No log,0.114711,0.971429
10,No log,0.111369,0.971429


TrainOutput(global_step=150, training_loss=0.2744485473632812, metrics={'train_runtime': 273.786, 'train_samples_per_second': 4.2, 'train_steps_per_second': 0.548, 'total_flos': 4251594192966.0, 'train_loss': 0.2744485473632812, 'epoch': 10.0})

In [11]:
trainerLoRA.evaluate()
# {'eval_loss': 0.11136851459741592,
#  'eval_accuracy': 0.9714285714285714,
#  'eval_runtime': 2.3284,
#  'eval_samples_per_second': 15.031,
#  'eval_steps_per_second': 2.147,
#  'epoch': 10.0}

# this is much better than the first model on both loss (0.111 vs 1.108) and accuracy (0.971 vs 0.229)

{'eval_loss': 0.11136851459741592,
 'eval_accuracy': 0.9714285714285714,
 'eval_runtime': 2.3284,
 'eval_samples_per_second': 15.031,
 'eval_steps_per_second': 2.147,
 'epoch': 10.0}

In [24]:
# save LoRA model
# model.save_model("./data/LoRA_model3")
lora_model.save_pretrained("./data/LoRA_model4")

## Performing Inference with a PEFT Model

TODO: In the cells below, load the saved PEFT model weights and evaluate the performance of the trained PEFT model. Be sure to compare the results to the results from prior to fine-tuning.

In [30]:
# load the saved LoRA model

# saved_model = AutoModelForSequenceClassification.from_pretrained("./data/LoRA_Model")
# from peft import PeftModel
# saved_model = PeftModel.from_pretrained("./data/LoRA_Model")

# saved_model.eval()

# I AM HERE - https://knowledge.udacity.com/questions/1038277

# saved_tokenizer = AutoTokenizer.from_pretrained('./data/LoRA_model')
# saved_model = AutoModelForSequenceClassification.from_pretrained('./data/peft_intent_analysis/checkpoint-150')

saved_model = AutoModelForSequenceClassification.from_pretrained('./data/LoRA_model4',
    num_labels=3,
    id2label={0: 'restaurant_reviews', 1: 'nutrition_info', 2: 'accept_reservations'},
    label2id={'restaurant_reviews': 0, 'nutrition_info': 1, 'accept_reservations': 2},
)

# saved_config = LoraConfig.from_json_file("./data/LoRA_model4/adapter_config.json")
# saved_config=LoraConfig(saved_config)
# config1=LoraConfig(config)



Some weights of DistilBertForSequenceClassification were not initialized from the model checkpoint at distilbert-base-uncased and are newly initialized: ['classifier.weight', 'classifier.bias', 'pre_classifier.bias', 'pre_classifier.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
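
The warning above says the classification head was newly initialized on reload. For comparison, a minimal sketch of the loading pattern I understand PEFT to expect: rebuild the base classifier, then attach the adapter with `PeftModel.from_pretrained`, which takes the base model as its first argument. `load_lora_classifier` is a hypothetical helper. One caveat, stated as an assumption: the adapter here was saved with `task_type="CAUSAL_LM"`, so I would expect the head weights not to be stored with it, and this pattern alone may not recover the 0.971 accuracy.

```python
def load_lora_classifier(adapter_dir, base_name="distilbert-base-uncased", num_labels=3):
    """Rebuild the base classifier, then attach the saved LoRA adapter to it."""
    # Imported inside the helper so that defining it does not require the libraries.
    from transformers import AutoModelForSequenceClassification
    from peft import PeftModel

    base_model = AutoModelForSequenceClassification.from_pretrained(
        base_name, num_labels=num_labels
    )
    # PeftModel.from_pretrained takes the base model first, then the adapter path.
    peft_model = PeftModel.from_pretrained(base_model, adapter_dir)
    return peft_model.eval()

# Usage, assuming the adapter directory written by the save cell exists:
# saved_model = load_lora_classifier("./data/LoRA_model4")
```

Calling `trainer.evaluate()` on a model loaded this way, without another `train()` call, would give the number to compare against the pre-fine-tuning baseline.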


In [37]:
# NOTE: requires_grad = True makes the base-model parameters trainable (unfrozen);
# set it to False to freeze them so nothing gets updated

for param in saved_model.base_model.parameters():
    param.requires_grad = True

In [39]:
# train the reloaded model on the training split for 10 epochs, evaluating on the test split after each epoch
trainer_saved_model = Trainer(
    model=saved_model,
    args=TrainingArguments(
        output_dir="./data/LoRA_model4",
        learning_rate=0.001,
        per_device_train_batch_size=8,
        per_device_eval_batch_size=8,
        num_train_epochs=10,
        weight_decay=0.01,
        evaluation_strategy="epoch",
        save_strategy="epoch",
        load_best_model_at_end=True,
    ),
    train_dataset=tokenized_dataset["train"],
    eval_dataset=tokenized_dataset["test"],
    tokenizer=tokenizer,
#     data_collator=DataCollatorWithPadding(tokenizer),
    compute_metrics=compute_metrics,
)

trainer_saved_model.train()

# RESULTS - 10 EPOCHS:
# Epoch	Training Loss	Validation Loss	Accuracy
# 1	No log	1.072840	0.457143
# 2	No log	1.212983	0.228571
# 3	No log	1.181420	0.314286
# 4	No log	1.169748	0.228571
# 5	No log	1.139324	0.228571
# 6	No log	1.116392	0.228571
# 7	No log	1.114447	0.228571
# 8	No log	1.148039	0.228571
# 9	No log	1.146256	0.228571
# 10	No log	1.127559	0.228571


Epoch,Training Loss,Validation Loss,Accuracy
1,No log,1.248721,0.228571




FileNotFoundError: [Errno 2] No such file or directory: './data/LoRA_model4/checkpoint-15/pytorch_model.bin'

In [62]:
# evaluate the model on the test set
# I AM HERE

trainer_saved_model.evaluate()
# {'eval_loss': 1.0728404521942139,
#  'eval_accuracy': 0.45714285714285713,
#  'eval_runtime': 7.5757,
#  'eval_samples_per_second': 4.62,
#  'eval_steps_per_second': 0.66,
#  'epoch': 10.0}

{'eval_loss': 1.0728404521942139,
 'eval_accuracy': 0.45714285714285713,
 'eval_runtime': 7.5757,
 'eval_samples_per_second': 4.62,
 'eval_steps_per_second': 0.66,
 'epoch': 10.0}

In [None]:
# final notes