# **Mastering Fine-Tuning: Leveraging LoRA, 4-bit Quantization, and PEFT for Efficient Model Training on the IMDB Dataset**
---

## **Overview**
This notebook demonstrates how to fine-tune a pre-trained language model on the IMDB movie reviews dataset using LoRA (Low-Rank Adaptation), a parameter-efficient fine-tuning (PEFT) technique. The code relies mainly on `unsloth`, a library for memory-efficient training of large language models, together with the `transformers` and `trl` (Transformer Reinforcement Learning) libraries.

The goal is to adapt the pre-trained model with LoRA, which trains a small set of added weights while the base model stays frozen, cutting compute and memory overhead. Combined with loading the base model in 4-bit precision, this makes it possible to fine-tune large models on hardware with limited VRAM, such as the single Tesla T4 GPU used in this run.

## **Installing Required Packages**
- `unsloth`: A library for efficient training and fine-tuning of language models.
- `transformers`: The core library from Hugging Face for handling transformer models.
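- `trl`: Hugging Face's library for post-training language models; it covers reinforcement learning methods and also provides the `SFTTrainer` used here for supervised fine-tuning.
- `torch` and `torchvision`: PyTorch and its vision companion package, upgraded to the latest available versions (2.5.1 and 0.20.1 in this run).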

In [1]:
!pip install "unsloth[cu121-torch240] @ git+https://github.com/unslothai/unsloth.git"

Collecting unsloth@ git+https://github.com/unslothai/unsloth.git (from unsloth[cu121-torch240]@ git+https://github.com/unslothai/unsloth.git)
  Cloning https://github.com/unslothai/unsloth.git to /tmp/pip-install-953p2zhm/unsloth_4e61c4d343c542548f34eb4d8150975d
  Running command git clone --filter=blob:none --quiet https://github.com/unslothai/unsloth.git /tmp/pip-install-953p2zhm/unsloth_4e61c4d343c542548f34eb4d8150975d
  Resolved https://github.com/unslothai/unsloth.git to commit 0c8c5ed81e423658ab9ae81eac5aab8d18f5d7af
  Installing build dependencies ... done
  Getting requirements to build wheel ... done
  Preparing metadata (pyproject.toml) ... done
Collecting bitsandbytes>=0.43.3 (from unsloth@ git+https://github.com/unslothai/unsloth.git->unsloth[cu121-torch240]@ git+https://github.com/unslothai/unsloth.git)
  Downloading bitsandbytes-0.44.1-py3-none-manylinux_2_24_x86_64.whl.metadata (3.5 kB)
Collecting xformers@ https://download.pytorch.org

In [2]:
!pip install "git+https://github.com/huggingface/transformers.git"

Collecting git+https://github.com/huggingface/transformers.git
  Cloning https://github.com/huggingface/transformers.git to /tmp/pip-req-build-0lh2g831
  Running command git clone --filter=blob:none --quiet https://github.com/huggingface/transformers.git /tmp/pip-req-build-0lh2g831
  Resolved https://github.com/huggingface/transformers.git to commit a06a0d12636756352494b99b5b264ac9955bc735
  Installing build dependencies ... done
  Getting requirements to build wheel ... done
  Preparing metadata (pyproject.toml) ... done
Building wheels for collected packages: transformers
  Building wheel for transformers (pyproject.toml) ... done
  Created wheel for transformers: filename=transformers-4.47.0.dev0-py3-none-any.whl size=10051734 sha256=f9214891012553ef3ff57ee08b12ec9b9ec87b9ba43a574afccaca6e91ff70bc
  Stored in directory: /tmp/pip-ephem-wheel-cache-x9t4vg9i/wheels/e7/9c/5b/e1a9c8007c343041e61cc484433d512ea9274272e3fcbe7c16
Successfully b

In [3]:
!pip install trl



In [4]:
!pip install --upgrade torch torchvision

Collecting torch
  Downloading torch-2.5.1-cp310-cp310-manylinux1_x86_64.whl.metadata (28 kB)
Collecting torchvision
  Downloading torchvision-0.20.1-cp310-cp310-manylinux1_x86_64.whl.metadata (6.1 kB)
Collecting nvidia-cuda-nvrtc-cu12==12.4.127 (from torch)
  Downloading nvidia_cuda_nvrtc_cu12-12.4.127-py3-none-manylinux2014_x86_64.whl.metadata (1.5 kB)
Collecting nvidia-cuda-runtime-cu12==12.4.127 (from torch)
  Downloading nvidia_cuda_runtime_cu12-12.4.127-py3-none-manylinux2014_x86_64.whl.metadata (1.5 kB)
Collecting nvidia-cuda-cupti-cu12==12.4.127 (from torch)
  Downloading nvidia_cuda_cupti_cu12-12.4.127-py3-none-manylinux2014_x86_64.whl.metadata (1.6 kB)
Collecting nvidia-cublas-cu12==12.4.5.8 (from torch)
  Downloading nvidia_cublas_cu12-12.4.5.8-py3-none-manylinux2014_x86_64.whl.metadata (1.5 kB)
Collecting nvidia-cufft-cu12==11.2.1.3 (from torch)
  Downloading nvidia_cufft_cu12-11.2.1.3-py3-none-manylinux2014_x86_64.whl.metadata (1.5 kB)
Collecting nvidia-curand-cu12==10.3.5
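
Because the cells above pull packages from several sources, it can help to confirm what actually got installed before continuing. The following is a minimal sketch using only the standard library; the helper name `installed_version` is hypothetical and not part of any of these packages.

```python
from importlib import metadata


def installed_version(package):
    """Return the installed version string of `package`, or None if it is missing."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


# Packages installed by the cells above.
for name in ("unsloth", "transformers", "trl", "torch", "torchvision"):
    print(f"{name}: {installed_version(name) or 'not installed'}")
```

Running this after the install cells should list one version per package; any `not installed` entry points to an install step that did not complete.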

## **1. Importing Libraries and Disabling Weights & Biases Logging**
- Importing required libraries:
  - `datasets`: For loading and handling datasets.
  - `torch`: The fundamental library for deep learning in Python.
  - `FastLanguageModel`: An optimized wrapper around language models provided by unsloth.
  - `SFTTrainer`: A specialized trainer for supervised fine-tuning (SFT).
  - `TrainingArguments`: Configurations for training models using Hugging Face’s transformers.
- Disabling Weights & Biases (W&B), a popular experiment-tracking tool, by setting the `WANDB_MODE` environment variable to `disabled`, since tracking is not needed for this run.



In [5]:
from datasets import load_dataset
import torch
from unsloth import FastLanguageModel
from trl import SFTTrainer
from transformers import TrainingArguments

🦥 Unsloth: Will patch your computer to enable 2x faster free finetuning.


In [26]:
import os
os.environ["WANDB_MODE"] = "disabled"

## **2. Loading the Dataset**
- Loads the IMDB dataset of movie reviews from the Hugging Face Hub. Only the `train` split is loaded, which holds 25,000 examples with a `text` and a `label` column.
- The reviews are commonly used for sentiment analysis; this notebook fine-tunes on the `text` column only.

In [6]:
dataset = load_dataset("imdb", split="train")

README.md:   0%|          | 0.00/7.81k [00:00<?, ?B/s]

train-00000-of-00001.parquet:   0%|          | 0.00/21.0M [00:00<?, ?B/s]

test-00000-of-00001.parquet:   0%|          | 0.00/20.5M [00:00<?, ?B/s]

unsupervised-00000-of-00001.parquet:   0%|          | 0.00/42.0M [00:00<?, ?B/s]

Generating train split:   0%|          | 0/25000 [00:00<?, ? examples/s]

Generating test split:   0%|          | 0/25000 [00:00<?, ? examples/s]

Generating unsupervised split:   0%|          | 0/50000 [00:00<?, ? examples/s]

In [7]:
dataset

Dataset({
    features: ['text', 'label'],
    num_rows: 25000
})
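
To make the schema concrete, here is a small sketch with made-up records that follow the same `text`/`label` layout. The records and the helper name `label_counts` are hypothetical; the label mapping (0 = negative, 1 = positive) is the convention this dataset uses on the Hugging Face Hub.

```python
from collections import Counter

# Label convention of the IMDB dataset: 0 = negative, 1 = positive.
LABEL_NAMES = {0: "neg", 1: "pos"}

# Made-up records for illustration only; real reviews are much longer.
records = [
    {"text": "A made-up glowing review.", "label": 1},
    {"text": "A made-up scathing review.", "label": 0},
    {"text": "Another made-up glowing review.", "label": 1},
]


def label_counts(rows):
    """Count how many rows carry each sentiment label."""
    return Counter(LABEL_NAMES[row["label"]] for row in rows)


print(label_counts(records))

# Supervised fine-tuning later in this notebook reads only the "text" column.
training_texts = [row["text"] for row in records]
print(training_texts[0])
```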

## **3. Loading the Pre-trained Model**
- Loads a pre-trained, pre-quantized model (`unsloth/mistral-7b-bnb-4bit`) published by unsloth on the Hugging Face Hub.

**Parameters**:
  - `model_name`: Specifies the pre-trained model to load.
  - `max_seq_length`: Sets the maximum sequence length (2048 tokens).
  - `dtype`: Sets the compute data type; `None` lets unsloth pick one automatically based on the GPU.
  - `load_in_4bit`: Loads the model weights in 4-bit precision, reducing memory usage.

**Concept: Quantization**
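- **4-bit quantization**: Stores each model weight in 4 bits instead of a 16- or 32-bit floating point number, cutting weight memory to roughly a quarter of the 16-bit size. The main benefit here is the smaller memory footprint; the weights are dequantized to a higher-precision type for computation, so a speedup is not guaranteed.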
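
The memory effect can be estimated with simple arithmetic. The sketch below assumes a round figure of 7 billion parameters and counts weight storage only; it ignores quantization constants, activations, and optimizer state, so real usage will be higher.

```python
def weight_memory_gb(num_params, bits_per_param):
    """Approximate weight storage in gigabytes (1 GB = 1e9 bytes)."""
    return num_params * bits_per_param / 8 / 1e9


NUM_PARAMS = 7_000_000_000  # assumed round figure for a "7B" model

for bits in (32, 16, 4):
    print(f"{bits:>2}-bit weights: ~{weight_memory_gb(NUM_PARAMS, bits):.1f} GB")
```

Under these assumptions the 16-bit weights alone (about 14 GB) would nearly fill a GPU with roughly 14.7 GB of memory, while the 4-bit weights (about 3.5 GB) leave room for activations and the LoRA adapters.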


In [19]:
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="unsloth/mistral-7b-bnb-4bit",
    max_seq_length=2048,
    dtype=None,
    load_in_4bit=True
)

==((====))==  Unsloth 2024.11.5: Fast Mistral patching. Transformers = 4.47.0.dev0.
   \\   /|    GPU: Tesla T4. Max memory: 14.748 GB. Platform = Linux.
O^O/ \_/ \    Pytorch: 2.5.1+cu124. CUDA = 7.5. CUDA Toolkit = 12.4.
\        /    Bfloat16 = FALSE. FA [Xformers = 0.0.28.post1. FA2 = False]
 "-____-"     Free Apache license: http://github.com/unslothai/unsloth
Unsloth: Fast downloading is enabled - ignore downloading bars which are red colored!


## **4. Applying LoRA (Low-Rank Adaptation)**
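- Attaches LoRA adapters to the pre-trained model:
  - `r`: Rank of the low-rank update matrices; higher values add more trainable parameters.
  - `lora_alpha`: Scaling factor for the LoRA update, which is multiplied by `lora_alpha / r`.
  - `lora_dropout`: Dropout rate applied to the input of the LoRA layers.
  - `target_modules`: Specifies which layers receive adapters; here, the attention and MLP projection layers.
  - `bias`: Selects which bias terms are trained; `"none"` leaves all biases frozen.
  - `use_gradient_checkpointing`: Reduces memory usage by recomputing activations during the backward pass instead of storing them, at the cost of extra compute.
  - `random_state`: Seed for reproducibility.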

**Concept: LoRA**
- **Low-Rank Adaptation (LoRA)**: Fine-tunes large models by injecting small, trainable matrices into certain layers. This reduces the number of parameters that need to be updated, making training more efficient.
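
As a rough illustration of the idea, the sketch below computes a LoRA-style forward pass, `W x + (lora_alpha / r) * B (A x)`, in plain Python. All matrices and sizes are toy values chosen for this example, and the helper names are hypothetical; real implementations operate on tensors inside the model's layers.

```python
def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector (list of floats)."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def lora_forward(W, A, B, x, lora_alpha, r):
    """Compute W x + (lora_alpha / r) * B (A x); only A and B would be trained."""
    base = matvec(W, x)                 # frozen pre-trained path
    update = matvec(B, matvec(A, x))    # low-rank path through rank r
    scale = lora_alpha / r
    return [b + scale * u for b, u in zip(base, update)]


# Toy sizes: a frozen 3x4 weight with a rank-2 update.
W = [[1.0, 0.0, 0.0, 0.0],
     [0.0, 1.0, 0.0, 0.0],
     [0.0, 0.0, 1.0, 1.0]]
A = [[0.5, 0.5, 0.0, 0.0],               # shape: r x in_features
     [0.0, 0.0, 0.5, 0.5]]
B_zero = [[0.0, 0.0] for _ in range(3)]  # shape: out_features x r, zero at the start
B_trained = [[1.0, 0.0], [0.0, 0.0], [0.0, 1.0]]  # made-up values after "training"
x = [1.0, 2.0, 3.0, 4.0]

print(lora_forward(W, A, B_zero, x, lora_alpha=32, r=2))     # equals matvec(W, x)
print(lora_forward(W, A, B_trained, x, lora_alpha=32, r=2))
```

With `B` set to zero, as is common at initialization, the adapted layer reproduces the frozen layer exactly, so training starts from the pre-trained behavior.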

In [20]:
model = FastLanguageModel.get_peft_model(
    model,
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=["q_proj", "v_proj", "k_proj", "o_proj", "gate_proj", "up_proj", "down_proj"],
    bias="none",
    use_gradient_checkpointing=True,
    random_state=3407,
    max_seq_length=2048
)
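
The training log later in this notebook reports 41,943,040 trainable parameters. That figure can be reproduced by hand: each adapted layer adds `r * (in_features + out_features)` weights. The layer dimensions below are assumptions about the Mistral-7B architecture and are not stated anywhere in this notebook.

```python
# Assumed Mistral-7B dimensions (not stated in this notebook).
HIDDEN = 4096          # model width
KV_DIM = 1024          # 8 key/value heads x 128 dims (grouped-query attention)
INTERMEDIATE = 14336   # MLP width
NUM_LAYERS = 32
R = 16                 # LoRA rank used above

# (in_features, out_features) of each targeted projection in one layer.
TARGET_SHAPES = {
    "q_proj": (HIDDEN, HIDDEN),
    "k_proj": (HIDDEN, KV_DIM),
    "v_proj": (HIDDEN, KV_DIM),
    "o_proj": (HIDDEN, HIDDEN),
    "gate_proj": (HIDDEN, INTERMEDIATE),
    "up_proj": (HIDDEN, INTERMEDIATE),
    "down_proj": (INTERMEDIATE, HIDDEN),
}


def lora_params(in_features, out_features, r):
    """A is r x in_features and B is out_features x r."""
    return r * (in_features + out_features)


per_layer = sum(lora_params(i, o, R) for i, o in TARGET_SHAPES.values())
total = per_layer * NUM_LAYERS
print(f"{total:,}")  # 41,943,040
```

Against a model of roughly 7 billion parameters, this is well under 1% of the weights, which is where the memory savings of LoRA come from.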

## **5. Setting Up Training Arguments**
- Defines the training configuration:
  - `per_device_train_batch_size`: Batch size per GPU.
  - `gradient_accumulation_steps`: Accumulates gradients over several forward/backward passes before each optimizer update, simulating a larger batch size.
  - `warmup_steps`: Number of initial steps over which the learning rate ramps up to its full value.
  - `max_steps`: Total number of optimizer updates; overrides `num_train_epochs`.
  - `fp16`/`bf16`: Enables mixed-precision training; `bf16` is used when the GPU supports it, otherwise `fp16`.
  - `optim`: Uses an 8-bit variant of the AdamW optimizer to reduce optimizer-state memory.

In [28]:
training_args = TrainingArguments(
    per_device_train_batch_size=2,
    gradient_accumulation_steps=4,
    warmup_steps=10,
    max_steps=60,
    fp16=not torch.cuda.is_bf16_supported(),
    bf16=torch.cuda.is_bf16_supported(),
    logging_steps=1,
    output_dir="unsloth_outputs",
    optim="adamw_8bit",
    report_to="none",
    seed=3407
)
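
To show what `warmup_steps=10` and `max_steps=60` imply for the learning rate, here is a sketch of a linear warmup followed by linear decay. It assumes the `transformers` defaults of a linear scheduler and a base learning rate of 5e-5, since the cell above sets neither; the function name is hypothetical, and the trainer's exact step indexing may differ by one.

```python
def linear_warmup_decay(step, warmup_steps, total_steps, base_lr):
    """Learning rate after `step` optimizer updates: linear warmup, then linear decay."""
    if step < warmup_steps:
        return base_lr * step / warmup_steps
    remaining = max(0, total_steps - step)
    return base_lr * remaining / (total_steps - warmup_steps)


BASE_LR = 5e-5  # assumed default, since the cell does not set learning_rate

for step in (0, 5, 10, 35, 60):
    lr = linear_warmup_decay(step, warmup_steps=10, total_steps=60, base_lr=BASE_LR)
    print(f"step {step:>2}: lr = {lr:.2e}")
```

Under these assumptions the learning rate peaks at step 10 and falls back to zero by step 60, so only a short stretch of training runs near the full rate.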

## **6. Initializing and Training the Model**
- `SFTTrainer` sets up supervised fine-tuning with the specified dataset and training arguments; `dataset_text_field="text"` tells it which column holds the training text.
- `train()`: Starts the training process.

In [29]:
trainer = SFTTrainer(
    model=model,
    train_dataset=dataset,
    dataset_text_field="text",
    max_seq_length=2048,
    tokenizer=tokenizer,
    args=training_args,
)

Map:   0%|          | 0/25000 [00:00<?, ? examples/s]

max_steps is given, it will override any value given in num_train_epochs


In [30]:
trainer.train()

==((====))==  Unsloth - 2x faster free finetuning | Num GPUs = 1
   \\   /|    Num examples = 25,000 | Num Epochs = 1
O^O/ \_/ \    Batch size per device = 2 | Gradient Accumulation steps = 4
\        /    Total batch size = 8 | Total steps = 60
 "-____-"     Number of trainable parameters = 41,943,040


Step,Training Loss
1,2.4361
2,2.3472
3,2.3889
4,2.4532
5,2.5867
6,2.7237
7,2.306
8,2.3266
9,2.2072
10,2.4582


TrainOutput(global_step=60, training_loss=2.3912086606025698, metrics={'train_runtime': 858.7019, 'train_samples_per_second': 0.559, 'train_steps_per_second': 0.07, 'total_flos': 9642624560529408.0, 'train_loss': 2.3912086606025698, 'epoch': 0.0192})
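
The figures in the log and in `TrainOutput` follow directly from the training arguments. The short calculation below reproduces the total batch size, the epoch fraction, and the throughput from values printed above; nothing here is newly measured.

```python
per_device_batch = 2
grad_accum = 4
num_gpus = 1
max_steps = 60
num_examples = 25_000
train_runtime_s = 858.7019  # 'train_runtime' from TrainOutput

effective_batch = per_device_batch * grad_accum * num_gpus
examples_seen = effective_batch * max_steps
epoch_fraction = examples_seen / num_examples
samples_per_second = examples_seen / train_runtime_s

print(effective_batch)               # 8
print(examples_seen)                 # 480
print(epoch_fraction)                # 0.0192
print(round(samples_per_second, 3))  # 0.559
```

The run therefore covers 480 reviews, under 2% of the training split, so it is best read as a quick demonstration of the pipeline rather than a full fine-tune.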

## **7. Inference with the Fine-tuned Model**
- Encodes a sample review using the tokenizer.
- Uses the fine-tuned model to generate a continuation of the review.
- Decodes the generated token IDs into human-readable text.

In [31]:
review = "I really enjoyed this movie. It was great!"

In [32]:
# Named `inputs` to avoid shadowing Python's built-in input().
inputs = tokenizer(
    [review],
    return_tensors="pt",
    padding=True,
).to("cuda")

In [35]:
model = FastLanguageModel.for_inference(model)

In [36]:
output = model.generate(**inputs, max_new_tokens=128, use_cache=True)

In [37]:
tokenizer.batch_decode(output)

['<s> I really enjoyed this movie. It was great! I was a little worried when I saw the previews because it looked like it was going to be a typical teen movie. But it was so much more than that. It was a great story about a girl who is trying to find herself. I really liked the way the movie was filmed. It was very realistic and I felt like I was there with the characters. The acting was great and the story was very well written. I would definitely recommend this movie to anyone who likes a good story. I give it a 9 out of 10.</s>']
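
The output above starts with the prompt and ends with `</s>` before reaching 128 new tokens. The toy sketch below illustrates why, assuming greedy decoding: tokens are appended one at a time until an end-of-sequence token appears or the `max_new_tokens` budget runs out. The lookup table stands in for the model and is entirely made up; a real model scores every token in its vocabulary at each step.

```python
EOS = "</s>"

# Hypothetical "model": maps the last token to its single most likely successor.
NEXT_TOKEN = {
    "great!": "I",
    "I": "loved",
    "loved": "it.",
    "it.": EOS,
}


def greedy_generate(prompt_tokens, max_new_tokens):
    """Append one token at a time until EOS appears or the budget runs out."""
    tokens = list(prompt_tokens)
    for _ in range(max_new_tokens):
        next_token = NEXT_TOKEN.get(tokens[-1], EOS)
        tokens.append(next_token)
        if next_token == EOS:
            break
    return tokens


print(greedy_generate(["It", "was", "great!"], max_new_tokens=128))
print(greedy_generate(["It", "was", "great!"], max_new_tokens=2))
```

The first call stops early at the end-of-sequence token, while the second is cut off by the token budget, which mirrors the two ways `generate` can finish.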

## **8. Saving the Fine-tuned Model**
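- Saves the fine-tuned LoRA adapter to disk. Because `model` is a PEFT model, `save_pretrained` writes the adapter weights and configuration rather than a full copy of the base model.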

In [38]:
model.save_pretrained("unsloth_lora_model")

## **9. Uploading Model to Hugging Face Hub**
- Optionally pushes the fine-tuned model to the Hugging Face Hub for sharing. The calls below are commented out; uncomment them and log in to run the upload.

In [44]:
# from huggingface_hub import notebook_login

# notebook_login()
# model.push_to_hub("Ali-Naqvi/unsloth_finetuning_using_imdb_dataset")