<a href="https://colab.research.google.com/github/chitraju-chaithanya/FineTuneSequentialModel/blob/main/FineTuneSequentialModel.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

Apply Lightweight Fine-Tuning to a Foundation Model - LoRA



1. Using the Hugging Face ecosystem to fine-tune a language model to classify text as ‘positive’ or ‘negative’.

2. Fine-tuning distilbert-base-uncased, a ~67M-parameter distilled version of BERT.

3. Transfer learning is used to replace the base model's head with a new, randomly initialized classification head.

4. LoRA (Low-Rank Adaptation) is used to fine-tune the model by training small low-rank update matrices while the pre-trained weights stay frozen.
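The idea behind item 4 can be sketched in a few lines of NumPy. This is a minimal illustration, not PEFT's implementation. The frozen weight `W` is left untouched, and a low-rank update `B @ A` of rank `r` is added and scaled by `lora_alpha / r`. The sizes are assumptions chosen to mirror DistilBERT's 768×768 query projection and the `r=4`, `lora_alpha=32` settings used later. Dropout is omitted.

```python
import numpy as np

rng = np.random.default_rng(0)

d_in, d_out = 768, 768    # assumed: DistilBERT's query projection (q_lin) is 768 x 768
r, lora_alpha = 4, 32     # same values as the LoraConfig used later in this notebook
scaling = lora_alpha / r  # 8.0

# Frozen pre-trained weight and bias (random stand-ins for the real values)
W = rng.normal(scale=0.02, size=(d_out, d_in))
b = np.zeros(d_out)

# Trainable LoRA factors: A is small and random, B starts at zero,
# so at initialization the adapted layer behaves exactly like the frozen one
A = rng.normal(scale=0.01, size=(r, d_in))
B = np.zeros((d_out, r))

def lora_linear(x):
    """Frozen linear layer plus the scaled low-rank update; x has shape (batch, d_in)."""
    base = x @ W.T + b
    update = ((x @ A.T) @ B.T) * scaling
    return base + update

x = rng.normal(size=(2, d_in))
print(np.allclose(lora_linear(x), x @ W.T + b))  # True, because B is all zeros
print(A.size + B.size, "trainable vs", W.size, "frozen")  # 6144 trainable vs 589824 frozen
```

During training only `A` and `B` would be updated: 6,144 numbers per layer here, instead of the 589,824 in the full matrix.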

In [1]:
pip install evaluate

Collecting evaluate
  Downloading evaluate-0.4.1-py3-none-any.whl (84 kB)
Collecting datasets>=2.0.0 (from evaluate)
  Downloading datasets-2.17.1-py3-none-any.whl (536 kB)
Collecting dill (from evaluate)
  Downloading dill-0.3.8-py3-none-any.whl (116 kB)
Collecting multiprocess (from evaluate)
  Downloading multiprocess-0.70.16-py310-none-any.whl (134 kB)
Collecting responses<0.19 (from evaluate)
  Downloading responses-0.18.0-py3-none-any.whl (38 kB)
Installing collected packages: dill, responses, 

In [2]:
pip install scikit-learn



In [3]:
pip install peft

Collecting peft
  Downloading peft-0.8.2-py3-none-any.whl (183 kB)
Collecting accelerate>=0.21.0 (from peft)
  Downloading accelerate-0.27.2-py3-none-any.whl (279 kB)
Installing collected packages: accelerate, peft
Successfully installed accelerate-0.27.2 peft-0.8.2


In [4]:
pip install accelerate -U



In [5]:
! pip install -U accelerate
! pip install -U transformers

Collecting transformers
  Downloading transformers-4.38.1-py3-none-any.whl (8.5 MB)
Installing collected packages: transformers
  Attempting uninstall: transformers
    Found existing installation: transformers 4.37.2
    Uninstalling transformers-4.37.2:
      Successfully uninstalled transformers-4.37.2
Successfully installed transformers-4.38.1


In [29]:

#Load imports

from datasets import load_dataset, DatasetDict, Dataset

from transformers import (
    AutoTokenizer,
    AutoConfig,
    AutoModelForSequenceClassification,
    DataCollatorWithPadding,
    TrainingArguments,
    Trainer)

from peft import PeftModel, PeftConfig, get_peft_model, LoraConfig
import torch
import numpy as np

In [30]:
#Load base model

model_checkpoint = 'distilbert-base-uncased'

# define label maps
id2label = {0: "Negative", 1: "Positive"}
label2id = {"Negative":0, "Positive":1}

# generate classification model from model_checkpoint
model = AutoModelForSequenceClassification.from_pretrained(
    model_checkpoint, num_labels=2, id2label=id2label, label2id=label2id)

Some weights of DistilBertForSequenceClassification were not initialized from the model checkpoint at distilbert-base-uncased and are newly initialized: ['classifier.bias', 'classifier.weight', 'pre_classifier.bias', 'pre_classifier.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


In [31]:
#Load data set


# load dataset
dataset = load_dataset("shawhin/imdb-truncated")

# dataset =
# DatasetDict({
#     train: Dataset({
#         features: ['label', 'text'],
#         num_rows: 1000
#     })
#     validation: Dataset({
#         features: ['label', 'text'],
#         num_rows: 1000
#     })
# })

In [32]:
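# create tokenizer
# (add_prefix_space is meant for byte-level BPE tokenizers such as RoBERTa/GPT-2;
#  DistilBERT's WordPiece tokenizer does not need it)
tokenizer = AutoTokenizer.from_pretrained(model_checkpoint, add_prefix_space=True)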

In [33]:
# create tokenize function
def tokenize_function(examples):
    # extract text
    text = examples["text"]

    # tokenize and truncate text; truncating from the left keeps the end of long reviews
    tokenizer.truncation_side = "left"
    tokenized_inputs = tokenizer(
        text,
        return_tensors="np",
        truncation=True,
        max_length=512
    )

    return tokenized_inputs

# add pad token if none exists (distilbert-base-uncased already has [PAD], so this does nothing here)
if tokenizer.pad_token is None:
    tokenizer.add_special_tokens({'pad_token': '[PAD]'})
    model.resize_token_embeddings(len(tokenizer))

# tokenize training and validation datasets
tokenized_dataset = dataset.map(tokenize_function, batched=True)


# tokenized_dataset =
# DatasetDict({
#     train: Dataset({
#        features: ['label', 'text', 'input_ids', 'attention_mask'],
#         num_rows: 1000
#     })
#     validation: Dataset({
#         features: ['label', 'text', 'input_ids', 'attention_mask'],
#         num_rows: 1000
#     })
# })

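Setting `truncation_side = "left"` removes tokens from the start of an over-long review rather than the end, so with `max_length=512` the last part of a long review is kept. Plain Python lists are enough to show the difference. This toy sketch ignores the `[CLS]`/`[SEP]` special tokens, which the real tokenizer keeps in place. The token IDs are made up.

```python
def truncate(token_ids, max_length, side="right"):
    """Toy truncation: keep at most max_length tokens, cutting from the given side."""
    if len(token_ids) <= max_length:
        return list(token_ids)
    if side == "left":
        return list(token_ids[-max_length:])  # drop the beginning, keep the end
    return list(token_ids[:max_length])  # drop the end, keep the beginning

review = [11, 12, 13, 14, 15, 16]  # hypothetical token IDs
print(truncate(review, 4, side="right"))  # [11, 12, 13, 14]
print(truncate(review, 4, side="left"))   # [13, 14, 15, 16]
```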

In [34]:
# create data collator
data_collator = DataCollatorWithPadding(tokenizer=tokenizer)
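`DataCollatorWithPadding` pads each batch only to the length of its longest example and builds the matching `attention_mask`. The pure-Python function below is a simplified stand-in for that behavior, not the library's code. It assumes right-side padding and pad ID 0, which is `[PAD]` in the uncased BERT vocabulary. The token IDs are illustrative.

```python
def pad_batch(batch, pad_id=0):
    """Pad a list of {"input_ids": [...]} examples to the longest one in the batch."""
    max_len = max(len(ex["input_ids"]) for ex in batch)
    input_ids, attention_mask = [], []
    for ex in batch:
        n_pad = max_len - len(ex["input_ids"])
        input_ids.append(ex["input_ids"] + [pad_id] * n_pad)             # right-pad with [PAD]
        attention_mask.append([1] * len(ex["input_ids"]) + [0] * n_pad)  # 0 = ignore this position
    return {"input_ids": input_ids, "attention_mask": attention_mask}

batch = [{"input_ids": [101, 2009, 102]}, {"input_ids": [101, 2009, 2001, 2204, 102]}]
print(pad_batch(batch))
# {'input_ids': [[101, 2009, 102, 0, 0], [101, 2009, 2001, 2204, 102]],
#  'attention_mask': [[1, 1, 1, 0, 0], [1, 1, 1, 1, 1]]}
```

Padding per batch, rather than padding every example to 512 tokens up front, avoids wasting computation on padding when a batch contains only short reviews.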

In [35]:
import evaluate

# import accuracy evaluation metric
accuracy = evaluate.load("accuracy")

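# define an evaluation function to pass into trainer later
def compute_metrics(p):
    predictions, labels = p
    predictions = np.argmax(predictions, axis=1)

    # accuracy.compute() already returns {"accuracy": value}; returning it directly
    # (instead of nesting it in another dict) lets Trainer log a scalar metric
    return accuracy.compute(predictions=predictions, references=labels)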
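The sketch below shows what `compute_metrics` receives and returns, using a dependency-free version that computes accuracy with NumPy instead of `evaluate`. The function name and the logits and labels are made up for illustration. It returns a flat `{"accuracy": value}` dict, which keeps the logged metric a scalar.

```python
import numpy as np

def compute_metrics_np(p):
    """Same contract as compute_metrics: p = (logits, labels) -> {"accuracy": float}."""
    logits, labels = p
    predictions = np.argmax(logits, axis=1)  # index of the higher-scoring class per row
    return {"accuracy": float(np.mean(predictions == labels))}

# hypothetical logits for 4 examples (columns: Negative, Positive) and their true labels
logits = np.array([[2.0, -1.0], [0.1, 0.3], [-0.5, 1.5], [1.2, 0.4]])
labels = np.array([0, 0, 1, 1])
print(compute_metrics_np((logits, labels)))  # {'accuracy': 0.5}
```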

In [36]:
# hyperparameters
lr = 1e-3 # size of optimization step
batch_size = 4 # number of examples processed per optimization step
num_epochs = 4 # number of times model runs through training data

# define training arguments
training_args = TrainingArguments(
    output_dir= model_checkpoint + "-lora-text-classificationv1-without-finetunning",
    learning_rate=lr,
    per_device_train_batch_size=batch_size,
    per_device_eval_batch_size=batch_size,
    num_train_epochs=num_epochs,
    weight_decay=0.01,
    evaluation_strategy="epoch",
    save_strategy="epoch",
    load_best_model_at_end=True,
)
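With 1,000 training examples, these settings fix the number of optimizer steps. That accounts for the `checkpoint-250` and `checkpoint-500` directories and for `global_step=1000` in the training logs. A quick check, assuming a single device so that the effective batch size equals `per_device_train_batch_size`:

```python
import math

num_train_examples = 1000  # size of the "train" split of shawhin/imdb-truncated
batch_size = 4
num_epochs = 4

steps_per_epoch = math.ceil(num_train_examples / batch_size)
total_steps = steps_per_epoch * num_epochs

# save_strategy="epoch" writes one checkpoint at the end of each epoch
checkpoints = [f"checkpoint-{steps_per_epoch * (e + 1)}" for e in range(num_epochs)]
print(steps_per_epoch, total_steps)  # 250 1000
print(checkpoints)  # ['checkpoint-250', 'checkpoint-500', 'checkpoint-750', 'checkpoint-1000']
```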

In [37]:
# create trainer object
trainer = Trainer(
    model=model, # base model: all parameters are trained (no LoRA)
    args=training_args, # hyperparameters
    train_dataset=tokenized_dataset["train"], # training data
    eval_dataset=tokenized_dataset["validation"], # validation data
    tokenizer=tokenizer, # define tokenizer
    data_collator=data_collator, # this will dynamically pad examples in each batch to be equal length
    compute_metrics=compute_metrics, # evaluates model using compute_metrics() function from before
)

# baseline: train the full model without LoRA, using the same hyperparameters
trainer.train()

Epoch,Training Loss,Validation Loss,Accuracy
1,No log,0.989105,{'accuracy': 0.5}
2,0.738000,0.695266,{'accuracy': 0.5}
3,0.738000,0.693339,{'accuracy': 0.5}
4,0.696900,0.693998,{'accuracy': 0.5}


Trainer is attempting to log a value of "{'accuracy': 0.5}" of type <class 'dict'> for key "eval/accuracy" as a scalar. This invocation of Tensorboard's writer.add_scalar() is incorrect so we dropped this attribute.
Checkpoint destination directory distilbert-base-uncased-lora-text-classificationv1-without-finetunning/checkpoint-250 already exists and is non-empty. Saving will proceed but saved results may be invalid.
Trainer is attempting to log a value of "{'accuracy': 0.5}" of type <class 'dict'> for key "eval/accuracy" as a scalar. This invocation of Tensorboard's writer.add_scalar() is incorrect so we dropped this attribute.
Checkpoint destination directory distilbert-base-uncased-lora-text-classificationv1-without-finetunning/checkpoint-500 already exists and is non-empty. Saving will proceed but saved results may be invalid.
Trainer is attempting to log a value of "{'accuracy': 0.5}" of type <class 'dict'> for key "eval/accuracy" as a scalar. This invocation of Tensorboard's wri

TrainOutput(global_step=1000, training_loss=0.7174591064453125, metrics={'train_runtime': 246.9115, 'train_samples_per_second': 16.2, 'train_steps_per_second': 4.05, 'total_flos': 438218713178880.0, 'train_loss': 0.7174591064453125, 'epoch': 4.0})

In [38]:
# Show the performance of the model on the validation set
trainer.evaluate()

Trainer is attempting to log a value of "{'accuracy': 0.5}" of type <class 'dict'> for key "eval/accuracy" as a scalar. This invocation of Tensorboard's writer.add_scalar() is incorrect so we dropped this attribute.


{'eval_loss': 0.6933394074440002,
 'eval_accuracy': {'accuracy': 0.5},
 'eval_runtime': 15.0816,
 'eval_samples_per_second': 66.306,
 'eval_steps_per_second': 16.577,
 'epoch': 4.0}
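The baseline's validation loss of about 0.693 is telling. It equals ln 2, the cross-entropy of a two-class model that assigns probability 0.5 to each class. Together with the 0.5 accuracy and the all-Negative predictions that follow, this suggests the baseline collapsed to an almost constant prediction. One plausible cause, not verified here, is the learning rate: 1e-3 is far higher than the values typically used when updating all of BERT's weights, which are on the order of 1e-5 to 5e-5. The arithmetic:

```python
import math

def cross_entropy(p_true):
    """Cross-entropy for one example whose true class receives probability p_true."""
    return -math.log(p_true)

# A model that always outputs [0.5, 0.5] has this loss on every example
print(round(cross_entropy(0.5), 6))  # 0.693147 (= ln 2)

# Baseline eval_loss reported by trainer.evaluate() above
observed = 0.6933394074440002
print(abs(observed - math.log(2)) < 1e-3)  # True
```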

In [39]:
import torch

# Define the device and move the model there in inference mode
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
model.to(device)
model.eval()  # disable dropout

# define list of examples
text_list = ["It was good.", "Not a fan, don't recommed.",
             "Better than the first one.", "This is not worth watching even once.",
             "This one is a pass."]

print("model predictions without finetunning:")
print("----------------------------")
with torch.no_grad():  # gradients are not needed for inference
    for text in text_list:
        # tokenize text
        inputs = tokenizer.encode(text, return_tensors="pt").to(device)
        # compute logits
        logits = model(inputs).logits
        # convert logits to label
        predictions = torch.argmax(logits)

        print(text + " - " + id2label[predictions.tolist()])


model predictions without finetunning:
----------------------------
It was good. - Negative
Not a fan, don't recommed. - Negative
Better than the first one. - Negative
This is not worth watching even once. - Negative
This one is a pass. - Negative


In [41]:
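# Fine-tuning with LoRA

peft_config = LoraConfig(task_type="SEQ_CLS", # sequence classification
                        r=4, # rank of the low-rank update matrices
                        lora_alpha=32, # scaling factor: the LoRA update is multiplied by lora_alpha / r
                        lora_dropout=0.01, # dropout probability applied to the LoRA branch input
                        target_modules = ['q_lin']) # apply LoRA to the attention query projections only

# Reload a fresh base model (the model above was already modified by the baseline training run)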

model_checkpoint = 'distilbert-base-uncased'

# define label maps
id2label = {0: "Negative", 1: "Positive"}
label2id = {"Negative":0, "Positive":1}

# generate classification model from model_checkpoint
model = AutoModelForSequenceClassification.from_pretrained(
    model_checkpoint, num_labels=2, id2label=id2label, label2id=label2id)

Some weights of DistilBertForSequenceClassification were not initialized from the model checkpoint at distilbert-base-uncased and are newly initialized: ['classifier.bias', 'classifier.weight', 'pre_classifier.bias', 'pre_classifier.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


In [42]:
# Wrap the base model with LoRA adapters; only the adapters and the classification head remain trainable

model = get_peft_model(model, peft_config)
model.print_trainable_parameters()

trainable params: 628,994 || all params: 67,584,004 || trainable%: 0.9306847223789819
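The printed parameter counts can be reproduced by hand. This sketch uses DistilBERT-base's sizes (hidden size 768, 6 transformer layers) and applies LoRA with `r=4` to each layer's `q_lin`. It treats the classification head as `pre_classifier` (768 to 768) plus `classifier` (768 to 2). It further assumes that:

- PEFT trains the head alongside the adapters and keeps a trainable copy of it next to the frozen original.
- The base classification model has 66,955,010 parameters.

Under these assumptions the numbers match the output above exactly.

```python
hidden, n_layers, r, n_labels = 768, 6, 4, 2  # DistilBERT-base sizes; r from LoraConfig

# LoRA on q_lin: A is (r x hidden) and B is (hidden x r) in every transformer layer
lora_params = n_layers * (r * hidden + hidden * r)

# Classification head: pre_classifier (768 -> 768) and classifier (768 -> 2), weights + biases
head_params = (hidden * hidden + hidden) + (hidden * n_labels + n_labels)

trainable = lora_params + head_params
print(lora_params, head_params, trainable)  # 36864 592130 628994

# Assumed: 66,955,010 base parameters, plus the LoRA factors,
# plus the trainable copy of the head that PEFT keeps next to the frozen original
all_params = 66_955_010 + lora_params + head_params
print(all_params, f"{100 * trainable / all_params:.4f}%")  # 67584004 0.9307%
```

Only about 6% of the trainable parameters (36,864) are LoRA weights. The remaining 592,130 belong to the freshly initialized classification head.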


In [43]:
# hyperparameters
lr = 1e-3 # size of optimization step
batch_size = 4 # number of examples processed per optimization step
num_epochs = 4 # number of times model runs through training data

# define training arguments
training_args = TrainingArguments(
    output_dir= model_checkpoint + "-lora-text-classificationv1",
    learning_rate=lr,
    per_device_train_batch_size=batch_size,
    per_device_eval_batch_size=batch_size,
    num_train_epochs=num_epochs,
    weight_decay=0.01,
    evaluation_strategy="epoch",
    save_strategy="epoch",
    load_best_model_at_end=True,
)

In [44]:
# create trainer object
trainer = Trainer(
    model=model, # our peft model
    args=training_args, # hyperparameters
    train_dataset=tokenized_dataset["train"], # training data
    eval_dataset=tokenized_dataset["validation"], # validation data
    tokenizer=tokenizer, # define tokenizer
    data_collator=data_collator, # this will dynamically pad examples in each batch to be equal length
    compute_metrics=compute_metrics, # evaluates model using compute_metrics() function from before
)

# train model
trainer.train()

Epoch,Training Loss,Validation Loss,Accuracy
1,No log,0.408457,{'accuracy': 0.879}
2,0.395100,0.427934,{'accuracy': 0.894}
3,0.395100,0.525885,{'accuracy': 0.901}
4,0.106700,0.555101,{'accuracy': 0.899}


Trainer is attempting to log a value of "{'accuracy': 0.879}" of type <class 'dict'> for key "eval/accuracy" as a scalar. This invocation of Tensorboard's writer.add_scalar() is incorrect so we dropped this attribute.
Checkpoint destination directory distilbert-base-uncased-lora-text-classificationv1/checkpoint-250 already exists and is non-empty. Saving will proceed but saved results may be invalid.
Trainer is attempting to log a value of "{'accuracy': 0.894}" of type <class 'dict'> for key "eval/accuracy" as a scalar. This invocation of Tensorboard's writer.add_scalar() is incorrect so we dropped this attribute.
Checkpoint destination directory distilbert-base-uncased-lora-text-classificationv1/checkpoint-500 already exists and is non-empty. Saving will proceed but saved results may be invalid.
Trainer is attempting to log a value of "{'accuracy': 0.901}" of type <class 'dict'> for key "eval/accuracy" as a scalar. This invocation of Tensorboard's writer.add_scalar() is incorrect so w

TrainOutput(global_step=1000, training_loss=0.25089546585083006, metrics={'train_runtime': 183.5504, 'train_samples_per_second': 21.792, 'train_steps_per_second': 5.448, 'total_flos': 444610902443520.0, 'train_loss': 0.25089546585083006, 'epoch': 4.0})

In [45]:
# Show the performance of the fine-tuned model on the validation set
trainer.evaluate()

Trainer is attempting to log a value of "{'accuracy': 0.879}" of type <class 'dict'> for key "eval/accuracy" as a scalar. This invocation of Tensorboard's writer.add_scalar() is incorrect so we dropped this attribute.


{'eval_loss': 0.40845710039138794,
 'eval_accuracy': {'accuracy': 0.879},
 'eval_runtime': 15.3821,
 'eval_samples_per_second': 65.01,
 'eval_steps_per_second': 16.253,
 'epoch': 4.0}
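Why does `trainer.evaluate()` report an accuracy of 0.879 instead of the last epoch's 0.899? With `load_best_model_at_end=True` and no `metric_for_best_model`, the Trainer keeps the checkpoint with the lowest validation loss. Here that is epoch 1, and its loss of 0.4085 matches the `eval_loss` above. Validation loss then rises while training loss keeps falling, a typical sign of overfitting. The selection can be replayed from the logged table:

```python
# (epoch, validation loss, validation accuracy) copied from the LoRA training log
history = [
    (1, 0.408457, 0.879),
    (2, 0.427934, 0.894),
    (3, 0.525885, 0.901),
    (4, 0.555101, 0.899),
]

# Default selection: lowest validation loss
best_by_loss = min(history, key=lambda row: row[1])
# Alternative: highest validation accuracy
best_by_acc = max(history, key=lambda row: row[2])

print(best_by_loss)  # (1, 0.408457, 0.879)
print(best_by_acc)   # (3, 0.525885, 0.901)
```

Selecting by accuracy would instead mean setting `metric_for_best_model="accuracy"`. For metric names not ending in "loss", the Trainer treats higher as better. This requires `compute_metrics` to return a plain number under the `"accuracy"` key.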

In [46]:
# Save the LoRA adapter weights (including the trained classification head) and the tokenizer

model.save_pretrained("peftmodelV18")
tokenizer.save_pretrained("peftmodelV18")

('peftmodelV18/tokenizer_config.json',
 'peftmodelV18/special_tokens_map.json',
 'peftmodelV18/vocab.txt',
 'peftmodelV18/added_tokens.json',
 'peftmodelV18/tokenizer.json')

In [47]:
# Load the saved model

import torch
from peft import PeftModel, PeftConfig
from transformers import AutoModelForSequenceClassification, AutoTokenizer

peft_model_id = "peftmodelV18"

# the adapter config records which base checkpoint the adapter was trained on
config = PeftConfig.from_pretrained(peft_model_id)

model = AutoModelForSequenceClassification.from_pretrained(
    config.base_model_name_or_path, num_labels=2, id2label=id2label, label2id=label2id, return_dict=True)
tokenizer = AutoTokenizer.from_pretrained(config.base_model_name_or_path)

# attach the saved LoRA weights and the trained classification head; the
# "newly initialized" warning from loading the bare base model is therefore expected
model = PeftModel.from_pretrained(model, peft_model_id)

Some weights of DistilBertForSequenceClassification were not initialized from the model checkpoint at distilbert-base-uncased and are newly initialized: ['classifier.bias', 'classifier.weight', 'pre_classifier.bias', 'pre_classifier.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


In [48]:
print("Trained model predictions:")
print("--------------------------")
for text in text_list:
    inputs = tokenizer.encode(text, return_tensors="pt")

    logits = model(inputs).logits
    predictions = torch.max(logits,1).indices

    print(text + " - " + id2label[predictions.tolist()[0]])

# Note: "This one is a pass." is the tricky example: "a pass" here means "skip it",
# so Negative is the intended label even though no explicitly negative word appears.

Trained model predictions:
--------------------------
It was good. - Positive
Not a fan, don't recommed. - Negative
Better than the first one. - Positive
This is not worth watching even once. - Negative
This one is a pass. - Negative


# Results

Baseline without LoRA (all DistilBERT weights trained with the same hyperparameters):


```
Epoch	Training Loss	Validation Loss	Accuracy
1	No log	0.989105	{'accuracy': 0.5}
2	0.738000	0.695266	{'accuracy': 0.5}
3	0.738000	0.693339	{'accuracy': 0.5}
4	0.696900	0.693998	{'accuracy': 0.5}
```



```
{'eval_loss': 0.6933394074440002,
 'eval_accuracy': {'accuracy': 0.5},
 'eval_runtime': 15.0816,
 'eval_samples_per_second': 66.306,
 'eval_steps_per_second': 16.577,
 'epoch': 4.0}
```




```
model predictions without finetunning:
----------------------------
It was good. - Negative
Not a fan, don't recommed. - Negative
Better than the first one. - Negative
This is not worth watching even once. - Negative
This one is a pass. - Negative
```




Results after fine-tuning with LoRA:



```

Epoch	Training Loss	Validation Loss	Accuracy
1	No log	0.408457	{'accuracy': 0.879}
2	0.395100	0.427934	{'accuracy': 0.894}
3	0.395100	0.525885	{'accuracy': 0.901}
4	0.106700	0.555101	{'accuracy': 0.899}
```



```
{'eval_loss': 0.40845710039138794,
 'eval_accuracy': {'accuracy': 0.879},
 'eval_runtime': 15.3821,
 'eval_samples_per_second': 65.01,
 'eval_steps_per_second': 16.253,
 'epoch': 4.0}
```



```
Fine Tuned - Trained model predictions:
--------------------------
It was good. - Positive
Not a fan, don't recommed. - Negative
Better than the first one. - Positive
This is not worth watching even once. - Negative
This one is a pass. - Negative
```

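Overall, fine-tuning with LoRA gave clearly better results than the baseline:

- The final-epoch training loss fell from 0.697 to 0.107.
- The best validation loss fell from 0.693 to 0.408.
- Validation accuracy rose from 0.5 to 0.879, reaching 0.901 at its best epoch.
- The sample sentences were no longer all predicted as Negative.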





