# Parameter-Efficient Fine-Tuning of BERT for Text Classification using QLoRA

## Objective:

The main objective of this project is to apply parameter-efficient fine-tuning (PEFT) techniques, specifically QLoRA (Quantized Low-Rank Adaptation), to fine-tune a `BERT-base language model for a text classification task` while significantly reducing the computational and memory requirements typically associated with fine-tuning large language models (LLMs).

## Background and Motivation:

Pre-trained transformer language models such as BERT have revolutionized NLP tasks such as sentiment analysis, question answering, and text classification. However, full fine-tuning of such models is computationally expensive and requires substantial resources. To make fine-tuning feasible in resource-constrained environments (e.g., personal machines or small servers), researchers have developed PEFT methods such as LoRA and its quantized variant, QLoRA.

QLoRA quantizes the frozen base model (e.g., to 4 bits) and trains small low-rank adapters on top of it, so only a small fraction of the parameters is updated. This reduces memory (and often compute) costs without significantly compromising accuracy.
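The low-rank part of the idea can be sketched in NumPy: the pre-trained weight `W` stays frozen, and only two small matrices `A` and `B` of rank `r` are trained, with their product scaled by `alpha / r`. This is a minimal illustration under stated assumptions (random stand-in weights, `B` initialised to zero as PEFT does by default, and the same `r=8`, `lora_alpha=32` values used later in this notebook); it leaves out the 4-bit quantization of `W` that QLoRA adds.

```python
import numpy as np

# Toy sizes matching one BERT-base attention projection (768 x 768) and the
# r / lora_alpha values used in this notebook's LoraConfig.
d_in, d_out, r, alpha = 768, 768, 8, 32
rng = np.random.default_rng(0)

# Frozen "pre-trained" weight (random here, purely for illustration).
W = rng.standard_normal((d_out, d_in)).astype(np.float32)

# Trainable low-rank factors: A is small and random, B starts at zero,
# so the adapted layer initially behaves exactly like the frozen one.
A = (rng.standard_normal((r, d_in)) * 0.01).astype(np.float32)
B = np.zeros((d_out, r), dtype=np.float32)
scaling = alpha / r

def lora_forward(x: np.ndarray) -> np.ndarray:
    """Compute W x + (alpha / r) * B A x for a batch of row vectors x."""
    return x @ W.T + scaling * (x @ A.T) @ B.T

x = rng.standard_normal((4, d_in)).astype(np.float32)
y = lora_forward(x)

frozen_params = W.size
trainable_params = A.size + B.size
print(f"frozen: {frozen_params:,} | trainable: {trainable_params:,} "
      f"({100 * trainable_params / frozen_params:.2f}% of this layer)")
# prints: frozen: 589,824 | trainable: 12,288 (2.08% of this layer)
```

Starting `B` at zero means training begins from the unmodified pre-trained model, and the adapter only drifts away from it as gradients update `A` and `B`.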


## Tasks:

1. Data Preparation and Tokenization:
   * Load a binary classification dataset: [From HuggingFace](https://huggingface.co/datasets/dipanjanS/imdb_sentiment_finetune_dataset20k) and [KaggleHub](https://www.kaggle.com/datasets/bhavikjikadara/imdb-dataset-sentiment-analysis)
   * Preprocess the data: tokenize the input text using a tokenizer compatible with BERT or HuggingFace models.
   * Split dataset into train and test sets.
2. Model Setup:
   * Load a pre-trained bert-base-uncased model using HuggingFace Transformers.
   * Apply 4-bit quantization using bitsandbytes.
   * Integrate QLoRA adapters with LoRA configuration.
3. Training the Model:
   * Configure a Trainer using HuggingFace's transformers.Trainer API
   * Train the model using parameter-efficient strategies.
   * Monitor evaluation metrics such as loss and accuracy, and log training runs to Weights & Biases.
4. Evaluation:
   * Evaluate the fine-tuned model on the test set
   * Compare performance in terms of classification accuracy and memory/parameter efficiency.
5. Analysis and Interpretation:
   * Analyze the number of trainable parameters before and after applying QLoRA.
   * Determine the memory efficiency gained by using QLoRA instead of full fine-tuning.

# QLoRA fine-tuning of a BERT SLM for Classification

![](https://i.imgur.com/2Kw1yTZ.gif)

Transfer learning is the practice of leveraging already-trained models and tuning or adapting them to our own downstream tasks.

This notebook shows, step by step, how to fine-tune a BERT small language model (SLM) for a simple yet essential NLP task: text classification for sentiment analysis.

Instead of full fine-tuning, it uses PEFT methods, most notably the Quantized Low-Rank Adaptation (QLoRA) technique.

In [1]:
!pip install torch==2.8.0



In [2]:
import torch
torch.cuda.empty_cache()

In [3]:
# Check for Compute power (CUDA)
!nvidia-smi

zsh:1: command not found: nvidia-smi


# 1. HuggingFace Environment Setup

In [4]:
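import os

# ------------------------------------------------------- HuggingFace API Environment Setup ----------------------------------------------------
with open('HUGGINGFACE_API_TOKEN.txt', 'r') as f:
    hf_token = f.read().strip()  # strip the trailing newline from the token file
os.environ["HF_TOKEN"] = os.environ["HUGGINGFACE_API_KEY"] = hf_token  # HF_TOKEN is the variable huggingface_hub reads
os.environ["TOKENIZERS_PARALLELISM"] = "false"
# os.environ["TRANSFORMERS_OFFLINE"] = "1"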

## 2. Load Datasets

In [5]:
import pandas as pd
from datasets import load_dataset


SEED = 42

# Load Dataset
dataset = pd.read_csv('/Users/emmanueldanielchonza/Documents/Parameter-Efficient-Fine-tuning-LLMs/data/IMDB_dataset.csv', encoding='utf-8')

# View the first 5 rows
dataset.head()

|   | review | sentiment |
|---|--------|-----------|
| 0 | One of the other reviewers has mentioned that ... | positive |
| 1 | A wonderful little production. <br /><br />The... | positive |
| 2 | I thought this was a wonderful way to spend ti... | positive |
| 3 | Basically there's a family where a little boy ... | negative |
| 4 | Petter Mattei's "Love in the Time of Money" is... | positive |


In [6]:
# View the dataset shape
dataset.shape

(50000, 2)

### Get a manageable data size for computational efficiency

In [7]:
# Get a reproducible sample of 20000 reviews (40% of the data)
data_sample = dataset.sample(frac=0.4, random_state=SEED)
data_sample.shape

(20000, 2)

In [8]:
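# Train and Test Data Split: first 12000 sampled rows for training, remaining 8000 for testing
# .copy() creates independent DataFrames, which avoids pandas' SettingWithCopyWarning later
train_df = data_sample.iloc[:12000].copy()
test_df = data_sample.iloc[12000:].copy()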

# Print Dataset shapes
print(f"The Train set shape: {train_df.shape}")
print(f"The test Set Shape is: {test_df.shape}")

The Train set shape: (12000, 2)
The test Set Shape is: (8000, 2)
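The split above simply takes the first 12000 rows of a random sample, which works here because the IMDB data is balanced. If you wanted each split to keep the class ratio exactly, one option (not what this notebook does) is to sample a fixed fraction per class. A minimal pandas sketch on a hypothetical, deliberately imbalanced toy frame; `toy`, `train_part` and `test_part` are illustrative names only:

```python
import pandas as pd

SEED = 42

# Hypothetical toy frame standing in for `data_sample` (same column names),
# deliberately imbalanced: 15 positive and 5 negative reviews.
toy = pd.DataFrame({
    "review": [f"review {i}" for i in range(20)],
    "sentiment": ["positive"] * 15 + ["negative"] * 5,
})

# Sample 60% of each class for training so both splits keep the 3:1 ratio.
train_part = toy.groupby("sentiment").sample(frac=0.6, random_state=SEED)
# Every row not sampled for training goes to the test split.
test_part = toy.drop(train_part.index)

print(train_part["sentiment"].value_counts())
print(test_part["sentiment"].value_counts())
```

The 60/40 fraction mirrors the 12000/8000 split used in this notebook.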


In [9]:
train_df.head()

|   | review | sentiment |
|---|--------|-----------|
| 30379 | I had numerous problems with this film.<br /><... | negative |
| 381 | This film was pretty good. I am not too big a ... | positive |
| 11668 | I remember when this came out a lot of kids we... | negative |
| 9179 | I saw this movie in the theater when I was a k... | negative |
| 21458 | I absolutely love Promised Land. The first epi... | positive |


In [10]:
test_df.head()

|   | review | sentiment |
|---|--------|-----------|
| 32706 | For me an unsatisfactory, unconvincing heist m... | negative |
| 36025 | Without a doubt one of the worst movies I've s... | negative |
| 25702 | Saw this movie when it came out and then a cou... | positive |
| 1737 | I don't have much to say about this movie. It ... | negative |
| 33090 | I'd never seen an independent movie and I was ... | positive |


In [11]:
# Check for missing values in the train set
train_df.isna().mean()

review       0.0
sentiment    0.0
dtype: float64

In [12]:
# Check for missing values in the test set
test_df.isna().mean()

review       0.0
sentiment    0.0
dtype: float64

In [13]:
# Check for datatype
train_df.info()

<class 'pandas.core.frame.DataFrame'>
Index: 12000 entries, 30379 to 36178
Data columns (total 2 columns):
 #   Column     Non-Null Count  Dtype 
---  ------     --------------  ----- 
 0   review     12000 non-null  object
 1   sentiment  12000 non-null  object
dtypes: object(2)
memory usage: 281.2+ KB


##### Both columns have the `object` (string) dtype

# Dataset Preprocessing

In [14]:
# Map LABEL2ID and ID2LABEL

LABEL2ID = {'positive': 1, 'negative': 0}
ID2LABEL = {1: 'positive', 0: 'negative'}

# Create the label column
train_df['label'] = train_df['sentiment'].map(LABEL2ID)
test_df['label'] = test_df['sentiment'].map(LABEL2ID)



In [15]:
# View the train set with a new column
train_df.head(25)

|   | review | sentiment | label |
|---|--------|-----------|-------|
| 30379 | I had numerous problems with this film.<br /><... | negative | 0 |
| 381 | This film was pretty good. I am not too big a ... | positive | 1 |
| 11668 | I remember when this came out a lot of kids we... | negative | 0 |
| 9179 | I saw this movie in the theater when I was a k... | negative | 0 |
| 21458 | I absolutely love Promised Land. The first epi... | positive | 1 |
| 1887 | A young couple -- father Ben (solid Charles Ba... | positive | 1 |
| 10967 | STRANGER THAN FICTION angered me so much, I si... | negative | 0 |
| 21935 | I decided I need to lengthen up my review for ... | positive | 1 |
| 13933 | Directed by Michael Curtiz, Four Daughters is ... | positive | 1 |
| 6900 | Dolemite may not have been the first black exp... | positive | 1 |


In [16]:
test_df.head(25)

|   | review | sentiment | label |
|---|--------|-----------|-------|
| 32706 | For me an unsatisfactory, unconvincing heist m... | negative | 0 |
| 36025 | Without a doubt one of the worst movies I've s... | negative | 0 |
| 25702 | Saw this movie when it came out and then a cou... | positive | 1 |
| 1737 | I don't have much to say about this movie. It ... | negative | 0 |
| 33090 | I'd never seen an independent movie and I was ... | positive | 1 |
| 6474 | Directed by the younger brother of great direc... | positive | 1 |
| 12111 | I generally love this type of movie. However, ... | negative | 0 |
| 26517 | After watching many of the "Next Action Star" ... | negative | 0 |
| 9763 | The Mascot is Ladislaw Starewicz's masterpiece... | positive | 1 |
| 2054 | I loved Long Way Round and wasn't even aware o... | positive | 1 |


In [17]:
# View the data distribution in the target column
train_df['sentiment'].value_counts()

sentiment
positive    6053
negative    5947
Name: count, dtype: int64

In [18]:
# View the data distribution in the target column
test_df['sentiment'].value_counts()

sentiment
positive    4049
negative    3951
Name: count, dtype: int64

The classes are nearly perfectly balanced in both the train and test sets.

In [19]:
# Inspect the full train split
train_df

|   | review | sentiment | label |
|---|--------|-----------|-------|
| 30379 | I had numerous problems with this film.<br /><... | negative | 0 |
| 381 | This film was pretty good. I am not too big a ... | positive | 1 |
| 11668 | I remember when this came out a lot of kids we... | negative | 0 |
| 9179 | I saw this movie in the theater when I was a k... | negative | 0 |
| 21458 | I absolutely love Promised Land. The first epi... | positive | 1 |
| ... | ... | ... | ... |
| 41673 | On one level, this film can bring out the chil... | positive | 1 |
| 28736 | Oh mY God That has got to be one of the Most U... | negative | 0 |
| 320 | "Quitting" may be as much about exiting a pre-... | positive | 1 |
| 10027 | I have just seen this movie and have not read ... | negative | 0 |


# Load Model and Tokenizer

In [20]:
from transformers import AutoTokenizer
from transformers import logging

# Get the model and instantiate tokenizer
model_checkpoint = "google-bert/bert-base-uncased"

logging.set_verbosity_error()  # suppress info/progress messages

tokenizer = AutoTokenizer.from_pretrained(model_checkpoint, use_fast=True)

In [21]:
tokenizer

BertTokenizerFast(name_or_path='google-bert/bert-base-uncased', vocab_size=30522, model_max_length=512, is_fast=True, padding_side='right', truncation_side='right', special_tokens={'unk_token': '[UNK]', 'sep_token': '[SEP]', 'pad_token': '[PAD]', 'cls_token': '[CLS]', 'mask_token': '[MASK]'}, clean_up_tokenization_spaces=False, added_tokens_decoder={
	0: AddedToken("[PAD]", rstrip=False, lstrip=False, single_word=False, normalized=False, special=True),
	100: AddedToken("[UNK]", rstrip=False, lstrip=False, single_word=False, normalized=False, special=True),
	101: AddedToken("[CLS]", rstrip=False, lstrip=False, single_word=False, normalized=False, special=True),
	102: AddedToken("[SEP]", rstrip=False, lstrip=False, single_word=False, normalized=False, special=True),
	103: AddedToken("[MASK]", rstrip=False, lstrip=False, single_word=False, normalized=False, special=True),
}
)

# Convert Data to HuggingFace Datasets

In [22]:
from datasets import Dataset, DatasetDict


# Convert to Hugging Face Dataset
dataset = DatasetDict({
    'train': Dataset.from_pandas(train_df[['review', 'label']], preserve_index=False),
    'test': Dataset.from_pandas(test_df[['review', 'label']], preserve_index=False),
})

In [23]:
dataset

DatasetDict({
    train: Dataset({
        features: ['review', 'label'],
        num_rows: 12000
    })
    test: Dataset({
        features: ['review', 'label'],
        num_rows: 8000
    })
})

## Save HuggingFace Formatted dataset

In [24]:
from datasets import load_dataset, DatasetDict

# Save the dataset locally
dataset.save_to_disk("/Users/emmanueldanielchonza/Documents/Parameter-Efficient-Fine-tuning-LLMs/data")


In [25]:
from datasets import load_from_disk

dataset = load_from_disk("/Users/emmanueldanielchonza/Documents/Parameter-Efficient-Fine-tuning-LLMs/data")
print(dataset)

DatasetDict({
    train: Dataset({
        features: ['review', 'label'],
        num_rows: 12000
    })
    test: Dataset({
        features: ['review', 'label'],
        num_rows: 8000
    })
})


In [26]:
# View the data split
dataset.keys()

dict_keys(['train', 'test'])

In [27]:
# Looking at the first row of the train dataset
dataset['train'][:1]

{'review': ['I had numerous problems with this film.<br /><br />It contains some basic factual information concerning quantum mechanics, which is fine. Although quantum physics has been around for over 50 years, the film presents this information in a grandiose way that seems to be saying: "Aren\'t you just blown away by this!" Well, not really. These aren\'t earth shattering revelations anymore. At any rate, I was already familiar with quantum theory, and the fact that particles have to be described by wave equations, etc. is not new.<br /><br />The main problem I have with this movie, however, is the way these people use quantum theory as a way of providing a scientific basis for mysticism and spiritualism. I don\'t have any serious problem with mysticism and spiritualism, but quantum mechanics doesn\'t really have anything to do with these things, and it should be kept separate. The people they interviewed for this movie start with the ideas of quantum theory and then make the leap 

# Create the Evaluation Metrics

In [28]:
import evaluate

# Key Classification Task Metrics
metric1 = evaluate.load("precision")
metric2 = evaluate.load("recall")
metric3 = evaluate.load("f1")
metric4 = evaluate.load("accuracy")

# Create a function to compute metrics
def evaluate_performance(predictions, references):
    precision = metric1.compute(predictions=predictions, references=references, average="macro")["precision"]
    recall = metric2.compute(predictions=predictions, references=references, average="macro")["recall"]
    f1 = metric3.compute(predictions=predictions, references=references, average="macro")["f1"]
    accuracy = metric4.compute(predictions=predictions, references=references)["accuracy"]
    return {"precision": precision, "recall": recall, "f1": f1, "accuracy": accuracy}

In [29]:
# Test the function
preds = [1,0,1,1,0]
actuals = [1,1,0,1,0]
scores = evaluate_performance(
    predictions=preds, references=actuals
)

# View scores
scores

{'precision': 0.5833333333333333,
 'recall': 0.5833333333333333,
 'f1': 0.5833333333333333,
 'accuracy': 0.6}
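The macro-averaged numbers can be checked by hand: compute precision, recall and F1 for each class, then take the unweighted mean over classes. For class 1, 2 of the 3 predicted positives are correct (precision 2/3) and 2 of the 3 actual positives are found (recall 2/3); for class 0 both are 1/2, so each macro score is (2/3 + 1/2) / 2 = 7/12 ≈ 0.5833. A small pure-Python sketch of the same computation, independent of the `evaluate` library:

```python
# Recompute the macro-averaged scores for the toy example by hand,
# as a sanity check on what average="macro" means.
preds = [1, 0, 1, 1, 0]
actuals = [1, 1, 0, 1, 0]

def per_class_scores(cls):
    """Precision, recall and F1 treating `cls` as the positive class."""
    tp = sum(p == cls and a == cls for p, a in zip(preds, actuals))
    predicted = sum(p == cls for p in preds)
    actual = sum(a == cls for a in actuals)
    precision = tp / predicted if predicted else 0.0
    recall = tp / actual if actual else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

scores_by_class = [per_class_scores(c) for c in (0, 1)]
# Macro average: unweighted mean of each score over the two classes.
macro = [sum(s[i] for s in scores_by_class) / 2 for i in range(3)]
accuracy = sum(p == a for p, a in zip(preds, actuals)) / len(preds)
print(macro, accuracy)
```

Macro averaging weights both classes equally, which is why it is a reasonable summary here even though the IMDB classes are already balanced.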

# Create Tokens of the reviews

In [30]:
# Check the tokenization functionality
tokenizer("Hello, this is a sentence!")

{'input_ids': [101, 7592, 1010, 2023, 2003, 1037, 6251, 999, 102], 'token_type_ids': [0, 0, 0, 0, 0, 0, 0, 0, 0], 'attention_mask': [1, 1, 1, 1, 1, 1, 1, 1, 1]}

In [31]:
# Create a function to tokenize dataset
def preprocess_function(examples):
    # max_length is 512 because that is the context window limit of BERT models;
    # longer reviews are truncated to their first 512 tokens
    model_inputs = tokenizer(examples['review'], max_length=512, truncation=True)
    model_inputs["label"] = examples["label"]
    return model_inputs

In [32]:
# Tokenize the first row of the train set
preprocess_function(dataset["train"][:1])

{'input_ids': [[101, 1045, 2018, 3365, 3471, 2007, 2023, 2143, 1012, 1026, 7987, 1013, 1028, 1026, 7987, 1013, 1028, 2009, 3397, 2070, 3937, 25854, 2592, 7175, 8559, 9760, 1010, 2029, 2003, 2986, 1012, 2348, 8559, 5584, 2038, 2042, 2105, 2005, 2058, 2753, 2086, 1010, 1996, 2143, 7534, 2023, 2592, 1999, 1037, 2882, 10735, 2063, 2126, 2008, 3849, 2000, 2022, 3038, 1024, 1000, 4995, 1005, 1056, 2017, 2074, 10676, 2185, 2011, 2023, 999, 1000, 2092, 1010, 2025, 2428, 1012, 2122, 4995, 1005, 1056, 3011, 21797, 22191, 4902, 1012, 2012, 2151, 3446, 1010, 1045, 2001, 2525, 5220, 2007, 8559, 3399, 1010, 1998, 1996, 2755, 2008, 9309, 2031, 2000, 2022, 2649, 2011, 4400, 11380, 1010, 4385, 1012, 2003, 2025, 2047, 1012, 1026, 7987, 1013, 1028, 1026, 7987, 1013, 1028, 1996, 2364, 3291, 1045, 2031, 2007, 2023, 3185, 1010, 2174, 1010, 2003, 1996, 2126, 2122, 2111, 2224, 8559, 3399, 2004, 1037, 2126, 1997, 4346, 1037, 4045, 3978, 2005, 17477, 2964, 1998, 6259, 2964, 1012, 1045, 2123, 1005, 1056, 2031, 2

In [33]:
# Tokenized dataset
tokenized_datasets = dataset.map(preprocess_function, batched=True)


In [34]:
tokenized_datasets

DatasetDict({
    train: Dataset({
        features: ['review', 'label', 'input_ids', 'token_type_ids', 'attention_mask'],
        num_rows: 12000
    })
    test: Dataset({
        features: ['review', 'label', 'input_ids', 'token_type_ids', 'attention_mask'],
        num_rows: 8000
    })
})

In [35]:
# Remove the raw text column; keep 'label' because the Trainer needs it to compute the loss and metrics
tokenized_datasets = tokenized_datasets.remove_columns('review')

In [36]:
tokenized_datasets

DatasetDict({
    train: Dataset({
        features: ['label', 'input_ids', 'token_type_ids', 'attention_mask'],
        num_rows: 12000
    })
    test: Dataset({
        features: ['label', 'input_ids', 'token_type_ids', 'attention_mask'],
        num_rows: 8000
    })
})

# Configure Quantization and Load the Model

## 1. Quantization

Quantization represents data with fewer bits, which makes it a useful technique for reducing memory usage and accelerating inference, especially for large language models (LLMs).
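To make "fewer bits" concrete, here is a simplified sketch of block-wise absmax quantization to 4-bit integers: each block of weights stores one float scale plus small integers in [-7, 7]. This is an illustration only. bitsandbytes' NF4 format uses a different, normal-distribution-shaped set of 16 levels, and the block size of 64 and the integer range here are choices made for this sketch.

```python
import numpy as np

def quantize_absmax_4bit(w: np.ndarray, block_size: int = 64):
    """Symmetric absmax quantization to signed 4-bit integers in [-7, 7], per block."""
    blocks = w.reshape(-1, block_size)
    # One float scale per block: the largest magnitude in that block.
    scales = np.abs(blocks).max(axis=1, keepdims=True)
    scales[scales == 0] = 1.0  # avoid division by zero for all-zero blocks
    # Integers in [-7, 7] fit in 4 bits (stored in int8 here for simplicity).
    q = np.round(blocks / scales * 7).astype(np.int8)
    return q, scales

def dequantize(q: np.ndarray, scales: np.ndarray, shape):
    """Map the integers back to approximate float weights."""
    return (q.astype(np.float32) / 7 * scales).reshape(shape)

rng = np.random.default_rng(0)
w = rng.standard_normal((768, 768)).astype(np.float32)
q, scales = quantize_absmax_4bit(w)
w_hat = dequantize(q, scales, w.shape)

print("distinct levels:", np.unique(q).size)
print("mean abs error:", float(np.abs(w - w_hat).mean()))
```

The reconstruction error per weight is at most half a quantization step (scale / 14 for its block), which is the precision loss that makes directly training quantized weights unstable.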

Note that a quantized model typically isn't trained further for downstream tasks, because training can be unstable due to the lower precision of the weights and activations.

Because PEFT methods keep the base weights frozen and only add a small number of extra trainable parameters, a quantized model can still be fine-tuned: the quantized weights stay fixed while a higher-precision PEFT adapter is trained on top of them.

Combining quantization with PEFT can therefore make it possible to train even very large models on a single GPU.

For example, `QLoRA is a method that quantizes a model to 4 bits and then trains it with LoRA`.
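A rough, weights-only estimate shows where the savings come from. This sketch assumes about 110 million parameters for BERT-base (an approximation) and ignores activations, optimizer state and quantization constants. In practice bitsandbytes only quantizes linear layers, so the embeddings (the word-embedding matrix alone is 30522 x 768 ≈ 23.4M parameters) and any skipped modules such as the classifier stay in higher precision, and the real saving is smaller than this estimate suggests.

```python
# Rough weight-only memory estimate for BERT-base at different precisions.
# NUM_PARAMS is an approximation (~110M); the exact count depends on the head.
NUM_PARAMS = 110_000_000

def weight_memory_mb(num_params: int, bits_per_param: int) -> float:
    """Bytes needed to store the weights, converted to MiB."""
    return num_params * bits_per_param / 8 / 1024**2

for name, bits in [("fp32", 32), ("fp16/bf16", 16), ("4-bit", 4)]:
    print(f"{name:>9}: {weight_memory_mb(NUM_PARAMS, bits):7.1f} MiB")
```

That is roughly 420 MiB in fp32 versus roughly 52 MiB at 4 bits, an 8x reduction for the layers that are actually quantized.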

In [37]:
!pip install bitsandbytes



In [38]:
!pip install --quiet bitsandbytes
!pip install --quiet --upgrade transformers # Install latest version of transformers
!pip install --quiet --upgrade accelerate
!pip install --quiet sentencepiece

In [39]:
!pip install transformers bitsandbytes accelerate --quiet

In [40]:
import bitsandbytes as bnb
print(bnb.__version__)

'NoneType' object has no attribute 'cadam32bit_grad_fp32'
0.42.0


  warn("The installed version of bitsandbytes was compiled without GPU support. "


In [41]:
import torch
from transformers import AutoModelForSequenceClassification, TrainingArguments, Trainer, BitsAndBytesConfig
import bitsandbytes

# Create the 4-bit quantization config
config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_use_double_quant=True,
    bnb_4bit_compute_dtype=torch.bfloat16,  # Or torch.float16 if bfloat16 unsupported
    # Keep BERT's classification head unquantized; llm_int8_skip_modules is the
    # parameter name transformers reads, and it also applies to 4-bit loading
    llm_int8_skip_modules=["classifier"]
)

In [42]:
# import torch
# from transformers import AutoModelForSequenceClassification, TrainingArguments, Trainer, BitsAndBytesConfig

# config = BitsAndBytesConfig(
#     # Quantize the model weights to 4-bit precision upon loading, reducing memory usage.
#     load_in_4bit=True,  
#     # Use the 'Normalized Float 4' (NF4) data type, which uses a normal distribution to encode weights with just 4 bits
#     bnb_4bit_quant_type="nf4",  
#     # Apply double quantization: first quantize weights to 4-bit, then quantize the quantization constants used for quantizing weights
#     bnb_4bit_use_double_quant=True,  
#     # Utilize bfloat16 for computation, which takes less memory
#     bnb_4bit_compute_dtype=torch.bfloat16,  
#     # Skip quantization for specified modules, which will be trained separately
#     llm_int8_skip_modules=["classifier", "pre_classifier"]  
# )

In [43]:
import os

# Check current working directory
print("Current dir:", os.getcwd())

# List contents of current directory
print(os.listdir("."))

Current dir: /Users/emmanueldanielchonza/Documents/Parameter-Efficient-Fine-tuning-LLMs
['.DS_Store', 'requirements.txt', 'peft-BERT.py', 'Solutions', 'PEFT-QLoRA-BERT-Classifier.ipynb', 'README.md', 'peft_env_py311', 'HUGGINGFACE_API_TOKEN.txt', 'transformers', 'mergedQLoRA-Adapter-BERT-BaseModel.py', 'inference.py', '.ipynb_checkpoints', 'data', 'peft_env']


## 2. Model Configuration

In [44]:
# Configure the model
model = AutoModelForSequenceClassification.from_pretrained(model_checkpoint,
                                                           id2label=ID2LABEL,
                                                           label2id=LABEL2ID,
                                                           num_labels=2,
                                                           quantization_config=config)

ImportError: The installed version of bitsandbytes (<0.43.1) requires CUDA, but CUDA is not available. You may need to install PyTorch with CUDA support or upgrade bitsandbytes to >=0.43.1.

In [None]:
from peft import prepare_model_for_kbit_training

model = prepare_model_for_kbit_training(model)

In [None]:
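# Create a function to print trainable parameters
def print_trainable_parameters(model):
    """
    Prints the number of trainable parameters in the model.
    """
    trainable_params = 0
    all_param = 0
    for _, param in model.named_parameters():
        num_params = param.numel()
        # With the default uint8 storage, bitsandbytes packs two 4-bit values
        # per stored element, so numel() reports half the real parameter count
        if param.__class__.__name__ == "Params4bit":
            num_params *= 2
        all_param += num_params
        if param.requires_grad:
            trainable_params += num_params
    print(f"trainable params: {trainable_params:,} || all params: {all_param:,} || trainable%: {100 * trainable_params / all_param:.4f}")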

In [None]:
# Print Parameters
print_trainable_parameters(model)

In [None]:
# Model architecture
model

# Train LoRA Using the QLoRA Config.

In [None]:
from peft import LoraConfig, get_peft_model, TaskType

# Set up the LoRA configuration for the model
lora_config = LoraConfig(
    r=8,  # Rank of the LoRA matrices; a smaller rank reduces memory usage but may affect model performance.
    lora_alpha=32,  # Scaling factor applied to the LoRA updates; helps control the contribution of the LoRA weights.
    # BERT's attention projections ("q_lin", "k_lin", "v_lin", "out_lin" are DistilBERT module names and do not exist in BERT).
    target_modules=["query", "key", "value", "attention.output.dense"],
    lora_dropout=0.05,  # Dropout probability for LoRA layers to prevent overfitting during training.
    bias="none",  # Which bias parameters to train: "none", "all", or "lora_only".
    task_type=TaskType.SEQ_CLS  # Defines the task type, here it's set to sequence classification.
)

# Apply the LoRA configuration to the model
peft_model = get_peft_model(model, lora_config)

# Print the number of trainable parameters in the model after applying LoRA
print_trainable_parameters(peft_model)
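As a back-of-the-envelope check on the trainable-parameter count, the adapter size follows directly from the LoRA settings. This sketch assumes LoRA is applied to the query, key, value and attention-output projections (each 768 x 768) in all 12 BERT-base encoder layers, and that PEFT additionally trains the classification head, as it does by default for sequence-classification tasks via `modules_to_save`:

```python
# Expected LoRA adapter size for BERT-base with r=8 on four 768x768
# projections (query, key, value, attention output) in each encoder layer.
hidden, layers, r = 768, 12, 8
targets_per_layer = 4

# Each adapted matrix adds A (r x in_features) + B (out_features x r).
lora_params = layers * targets_per_layer * r * (hidden + hidden)

# Assumed: PEFT also trains the classification head (768 -> 2, plus bias).
num_labels = 2
head_params = hidden * num_labels + num_labels

print(f"LoRA params: {lora_params:,}")
print(f"classifier head: {head_params:,}")
print(f"total trainable: {lora_params + head_params:,}")
```

That is 589,824 adapter weights plus 1,538 head weights, well under 1% of BERT-base's roughly 110M parameters.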

In [None]:
peft_model.device

In [None]:
type(peft_model)

In [None]:
# With a batch size of 32 and 12000 training documents,
# how many steps (batches) are needed to complete 1 full epoch?
12000 // 32
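The same arithmetic also shows how much of the data the training run below actually covers. Assuming a single device and no gradient accumulation, 12000 reviews at batch size 32 give 375 steps per epoch, so `max_steps=250` stops after about two thirds of an epoch, and `eval_steps=20` triggers 12 evaluations on the 8000-review test set along the way:

```python
import math

num_train_examples = 12_000
batch_size = 32
max_steps = 250
eval_steps = 20

# The last partial batch (if any) still counts as a step.
steps_per_epoch = math.ceil(num_train_examples / batch_size)
epochs_covered = max_steps / steps_per_epoch
# Evaluations happen at every multiple of eval_steps up to max_steps.
num_evals = max_steps // eval_steps

print(f"steps per epoch: {steps_per_epoch}")
print(f"epochs covered by max_steps={max_steps}: {epochs_covered:.2f}")
print(f"evaluations during training: {num_evals}")
```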

In [None]:
from transformers import TrainingArguments

batch_size = 32 
metric_name = "f1"

# Set up the training arguments
args = TrainingArguments(
    output_dir="distilbert-cls-qlorafinetune-runs",  # Directory where the model checkpoints and outputs will be saved.
    eval_strategy="steps",                          # Perform evaluation at regular intervals during training.
    save_strategy="steps",                          # Save the model checkpoint at regular intervals.
    learning_rate=1e-4,                             # Initial learning rate for the optimizer.
    logging_steps=20,                               # Log training metrics every 20 steps.
    eval_steps=20,                                  # Perform evaluation every 20 steps.
    save_steps=50,                                  # Save the model checkpoint every 50 steps.
    per_device_train_batch_size=batch_size,         # Batch size per GPU/TPU core/CPU during training.
    per_device_eval_batch_size=batch_size,          # Batch size per GPU/TPU core/CPU during evaluation.
    max_steps=250,                                  # Stop training after 250 total steps.
    weight_decay=0.01,                              # Apply weight decay to reduce overfitting.
    metric_for_best_model=metric_name,              # Metric used to track the best checkpoint (restored only if load_best_model_at_end=True).
    push_to_hub=False,                              # Do not push the model to the Hugging Face Hub after training.
    fp16=True,                                      # 16-bit mixed precision; on GPUs with bfloat16 support, bf16=True matches bnb_4bit_compute_dtype.
    optim="paged_adamw_8bit",                       # Use an 8-bit AdamW optimizer for memory efficiency and faster computation.
)

In [None]:
from transformers import DataCollatorWithPadding
# Create the data collator to dynamically pad each batch of tokenized reviews
data_collator = DataCollatorWithPadding(tokenizer=tokenizer)
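`DataCollatorWithPadding` pads each batch on the fly, only up to the length of the longest sequence in that batch rather than to a fixed 512 tokens. The idea is sketched below in plain Python as a simplified illustration: the real collator returns tensors and also pads `token_type_ids`. The `[PAD]` id of 0 and the example token ids are taken from the tokenizer output earlier in this notebook; `pad_batch` is a hypothetical helper name.

```python
# Pure-Python sketch of dynamic (per-batch) right-padding.
PAD_ID = 0  # BERT's [PAD] token id

def pad_batch(batch_input_ids):
    """Pad every sequence to the longest one in the batch and build attention masks."""
    max_len = max(len(ids) for ids in batch_input_ids)
    input_ids, attention_mask = [], []
    for ids in batch_input_ids:
        pad = max_len - len(ids)
        input_ids.append(ids + [PAD_ID] * pad)
        # 1 marks real tokens, 0 marks padding the model should ignore.
        attention_mask.append([1] * len(ids) + [0] * pad)
    return {"input_ids": input_ids, "attention_mask": attention_mask}

batch = [[101, 7592, 102], [101, 7592, 1010, 2023, 102]]
padded = pad_batch(batch)
print(padded["input_ids"])       # [[101, 7592, 102, 0, 0], [101, 7592, 1010, 2023, 102]]
print(padded["attention_mask"])  # [[1, 1, 1, 0, 0], [1, 1, 1, 1, 1]]
```

Since many reviews are far shorter than 512 tokens, padding per batch avoids a lot of wasted computation on padding tokens.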

In [None]:
import numpy as np

# Create the metric compute
def compute_metrics(eval_pred):
    predictions, labels = eval_pred
    predictions = np.argmax(predictions, axis=1)
    return evaluate_performance(predictions=predictions, references=labels)

In [None]:
# Create the trainer
trainer = Trainer(
    model=peft_model,
    args=args,
    train_dataset=tokenized_datasets["train"],
    eval_dataset=tokenized_datasets['test'],
    processing_class=tokenizer,  # named `tokenizer=` in older transformers releases
    data_collator=data_collator,
    compute_metrics=compute_metrics
)

In [None]:
# Train the Model
trainer.train()

In [None]:
save_path = 'qlora-bert-sentiment-adapter'
trainer.save_model(save_path)

In [None]:
# remove model checkpoints
!rm -rf distilbert-cls-qlorafinetune-runs

In [None]:
!du -sh * | sort -hr | grep qlora

## Load Classification LoRA Adapter into Base Model

In [None]:
# Load the base BERT model first (it must be the same checkpoint the adapter was trained on)
cls_model = AutoModelForSequenceClassification.from_pretrained(model_checkpoint,
                                                                id2label=ID2LABEL,
                                                                label2id=LABEL2ID,
                                                                num_labels=2)
tokenizer = AutoTokenizer.from_pretrained(model_checkpoint, use_fast=True)

In [None]:
cls_model.load_adapter(peft_model_id='qlora-bert-sentiment-adapter',
                       adapter_name='sentiment-classifier')

# Using the Fine-tuned model for Classification

In [None]:
from transformers import pipeline

# Here you can load your locally trained/saved model
clf = pipeline(task='text-classification', 
               model=cls_model, 
               tokenizer=tokenizer, 
               device='cuda')

In [None]:
document = "The movie was not good at all"

In [None]:
clf(document)

In [None]:
document1 = "The movie was amazing"

In [None]:
clf(document1)

## Use the Fine-tuned Transformer to Make Predictions on Test Data

In [None]:
dataset['test'][:2]

In [None]:
%%time

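# Reduce batch_size if the GPU runs out of memory
predictions = clf(dataset['test']['review'],
                  batch_size=512,
                  max_length=512,
                  truncation=True)
predictions = [pred['label'] for pred in predictions]

# The pipeline returns the id2label names ('positive'/'negative'), so map them back to ids
predictions = [LABEL2ID[item] for item in predictions]
labels = dataset['test']['label']  # the saved dataset has 'review' and 'label' columns, not 'sentiment'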

# Evaluate the Model Performance on Test Set

In [None]:
from sklearn.metrics import confusion_matrix, classification_report

print(classification_report(labels, predictions))
pd.DataFrame(confusion_matrix(labels, predictions))

## Merge Classification LoRA Adapter into Base BERT Model

Instead of loading the LoRA adapter weights into the base model every time before running inference, you can merge the adapter weights directly into the base model's weights and save a single final model.

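This makes inference faster and simpler: there is no need to load both the base model and the adapter each time, and the merged layers no longer perform the extra adapter computations.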
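Merging works because the LoRA update is linear: W x + (alpha / r) B A x = (W + (alpha / r) B A) x, so the low-rank product can be added into the base weight once. The NumPy sketch below uses random stand-in matrices (not the trained adapter) with the notebook's `r=8` and `lora_alpha=32` to confirm that both forms produce the same output:

```python
import numpy as np

rng = np.random.default_rng(0)
d, r, alpha = 768, 8, 32
scaling = alpha / r

W = rng.standard_normal((d, d))          # frozen base weight
A = rng.standard_normal((r, d)) * 0.01   # stand-ins for trained LoRA factors
B = rng.standard_normal((d, r)) * 0.01

x = rng.standard_normal((4, d))

# Unmerged: base path plus adapter path (extra matmuls on every call).
y_adapter = x @ W.T + scaling * (x @ A.T) @ B.T

# Merged: fold the low-rank update into a single dense weight once.
W_merged = W + scaling * (B @ A)
y_merged = x @ W_merged.T

print("max difference:", np.abs(y_adapter - y_merged).max())
```

The difference is only floating-point rounding. The merged weight has exactly the original shape, which is why the merged model can be saved and reloaded as an ordinary `AutoModelForSequenceClassification` checkpoint.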

In [None]:
from peft import PeftModel, PeftConfig
from transformers import AutoModelForSequenceClassification, AutoTokenizer

# Load the adapter config, then the full-precision (unquantized) base model it was trained on
peft_model_id = "qlora-bert-sentiment-adapter"
config = PeftConfig.from_pretrained(save_path) # peft_model_id or save_path
base_model = AutoModelForSequenceClassification.from_pretrained(config.base_model_name_or_path,
                                                                id2label=ID2LABEL,
                                                                label2id=LABEL2ID,
                                                                num_labels=2)
# Create tokenizer
tokenizer = AutoTokenizer.from_pretrained(config.base_model_name_or_path, use_fast=True)

# Load the model to device
peft_model = PeftModel.from_pretrained(base_model, save_path).to('cuda')

In [None]:
# Merge the models
merged_cls_model = peft_model.merge_and_unload()

In [None]:
# Save the model
save_path = 'merged-qlora-bert-classifier'

merged_cls_model.save_pretrained(save_path)
tokenizer.save_pretrained(save_path)

In [None]:
# load the merged BERT model 
cls_model = AutoModelForSequenceClassification.from_pretrained('merged-qlora-bert-classifier',
                                                                id2label=ID2LABEL,
                                                                label2id=LABEL2ID,
                                                                num_labels=2)

# Create tokenizer using the merged model
tokenizer = AutoTokenizer.from_pretrained('merged-qlora-bert-classifier', use_fast=True)

In [None]:
# Instantiate the classifier
clf = pipeline(task='text-classification', 
               model=cls_model, 
               tokenizer=tokenizer, 
               device='cuda')

In [None]:
document = "The movie was not good at all"

In [None]:
clf(document)

In [None]:
document2 = "The movie was amazing"

In [None]:
clf(document2)