# Implementing QLoRA for Fine-Tuning the Falcon-7B LLM

Here we implement QLoRA to fine-tune the Falcon-7B LLM on a custom dataset. Note that this notebook expects a 40 GB NVIDIA A100 GPU. Thanks to [Amod](https://medium.com/@amodwrites) for the excellent article, which is credited at the bottom.

## Loading the required libraries

The first step is to install and import the necessary libraries:

* `bitsandbytes`: A lightweight wrapper around CUDA custom functions, particularly 8-bit optimizers and quantization functions. It’s used to handle the quantization process in QLoRA.
* `peft`: A library by 🤗 that enables parameter-efficient fine-tuning.
* `transformers`: A library by 🤗 that provides pre-trained models and training utilities for various natural language processing tasks.
* `datasets`: Another library by 🤗 that provides easy access to a wide range of datasets.
* `accelerate`: A library by 🤗 that abstracts the boilerplate code related to multi-GPU/TPU/fp16 setups, making it easier to write the training loop of PyTorch models.
* `loralib`: A PyTorch implementation of Low-Rank Adaptation (LoRA), a parameter-efficient approach to adapt a large pre-trained deep learning model.
* `einops`: A library that simplifies tensor operations.
* `xformers`: A collection of composable Transformer building blocks.

!pip install -qU bitsandbytes transformers datasets accelerate loralib einops xformers
!pip install -q -U git+https://github.com/huggingface/peft.git
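
Before importing anything heavy, it can help to confirm that the installs above actually succeeded. The following is a minimal sketch using only the standard library; the helper name `find_missing` is made up for this illustration and is not part of any of the listed packages.

```python
import importlib.util

# Import names of the packages installed above (peft is installed from GitHub).
REQUIRED = [
    "bitsandbytes", "peft", "transformers", "datasets",
    "accelerate", "loralib", "einops", "xformers",
]

def find_missing(packages):
    """Return the packages that cannot be located in the current environment."""
    return [name for name in packages if importlib.util.find_spec(name) is None]

missing = find_missing(REQUIRED)
print("Missing packages:", missing if missing else "none")
```

If the list is not empty, re-running the install cell (and restarting the runtime) is usually the first thing to try.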

In [None]:
import os
import bitsandbytes as bnb
import pandas as pd
import torch
import torch.nn as nn
import transformers
from datasets import load_dataset
from peft import (
    LoraConfig,
    PeftConfig,
    get_peft_model,
    prepare_model_for_kbit_training,
)
from transformers import (
    AutoConfig,
    AutoModelForCausalLM,
    AutoTokenizer,
    BitsAndBytesConfig,
)

## Loading the Pre-Trained Model

Here we load the pre-trained model. In this case:
* the Falcon-7B model is loaded using the `AutoModelForCausalLM.from_pretrained()` function from the 🤗 transformers library.
* the weights are loaded in 4-bit precision via `BitsAndBytesConfig`, a transformers class that configures the bitsandbytes quantization backend.

This is part of the QLoRA process, which involves quantizing the pre-trained weights of the model to 4-bit and keeping them fixed during fine-tuning.
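
To give a feel for what "quantizing to 4-bit" means, here is a simplified sketch of block-wise absmax quantization in plain Python. It is an illustration under stated assumptions, not what bitsandbytes does internally: it uses 16 evenly spaced levels, whereas the NF4 data type uses non-uniform levels chosen for normally distributed weights, and the real kernels run on the GPU. The function names and sample weights are made up for this example.

```python
def quantize_block(values, n_levels=16):
    """Quantize one block of floats to integer codes in [0, n_levels - 1]."""
    absmax = max(abs(v) for v in values) or 1.0  # per-block scaling constant
    step = 2.0 / (n_levels - 1)                  # spacing of levels on [-1, 1]
    codes = [round((v / absmax + 1.0) / step) for v in values]
    return codes, absmax

def dequantize_block(codes, absmax, n_levels=16):
    """Map integer codes back to approximate float values."""
    step = 2.0 / (n_levels - 1)
    return [(c * step - 1.0) * absmax for c in codes]

# Made-up weights standing in for one block of a weight matrix.
weights = [0.12, -0.53, 0.98, -0.07, 0.31, -0.88, 0.45, 0.02]
codes, absmax = quantize_block(weights)
restored = dequantize_block(codes, absmax)
max_error = max(abs(w - r) for w, r in zip(weights, restored))

print("4-bit codes:", codes)
print("max reconstruction error:", max_error)
```

Each weight is stored as a 4-bit code plus one shared scaling constant per block; the frozen weights are dequantized on the fly during the forward pass, while only the LoRA adapters receive gradient updates.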

In [None]:
model_id = "tiiuae/falcon-7b"

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_use_double_quant=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

model = AutoModelForCausalLM.from_pretrained(
    model_id,
    device_map="auto",
    trust_remote_code=True,
    quantization_config=bnb_config,
)

tokenizer = AutoTokenizer.from_pretrained(model_id)
tokenizer.pad_token = tokenizer.eos_token
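
As a rough back-of-the-envelope check on why 4-bit loading matters, the sketch below estimates the memory needed just to hold the weights at different precisions. It assumes a round figure of 7 billion parameters and ignores quantization constants, activations, and optimizer state, so treat the numbers as approximate.

```python
def weight_memory_gb(n_params, bits_per_param):
    """Approximate memory for the weights alone, in gigabytes (10^9 bytes)."""
    return n_params * bits_per_param / 8 / 1e9

N_PARAMS = 7e9  # assumption: rough parameter count for a "7B" model

for label, bits in [("fp32", 32), ("bf16", 16), ("4-bit", 4)]:
    print(f"{label:>5}: {weight_memory_gb(N_PARAMS, bits):.1f} GB")
```

Under these assumptions the weights take about 14 GB in bf16 but only about 3.5 GB in 4-bit, which leaves far more GPU memory for activations and the adapter's optimizer state.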

## Preparing the Model for QLoRA

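The model is then prepared for QLoRA using the `prepare_model_for_kbit_training()` function. In current PEFT versions, this function freezes the base model's weights, upcasts the remaining non-quantized parameters (such as layer norms) to fp32 for numerical stability, and enables gradient checkpointing by default.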

In [None]:
model = prepare_model_for_kbit_training(model)

## Configuring LoRA

The LoRA configuration is set up using the `LoraConfig` class. The parameters in this configuration include:

* `r`: The rank of the update matrices. Lower rank results in smaller update matrices with fewer trainable parameters.
* `lora_alpha`: The LoRA scaling factor; the low-rank update is scaled by `lora_alpha / r`.
* `target_modules`: The modules (for example, attention blocks) to which the LoRA update matrices are applied.
* `lora_dropout`: Dropout probability of the LoRA layers.
* `bias`: Specifies whether the bias parameters should be trained. Can be `none`, `all` or `lora_only`.

The model is then updated with the LoRA configuration using the `get_peft_model()` function.
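
To make the roles of `r` and `lora_alpha` concrete, here is a toy sketch of a LoRA forward pass in plain Python: the output is the frozen weight's output plus a low-rank update scaled by `lora_alpha / r`. The matrices, sizes, and function names are invented for illustration; PEFT implements the same idea with PyTorch tensors.

```python
def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(m * v for m, v in zip(row, vector)) for row in matrix]

def lora_forward(W, A, B, x, lora_alpha, r):
    """Compute W x + (lora_alpha / r) * B (A x) for a single input vector."""
    base = matvec(W, x)               # frozen pre-trained path
    update = matvec(B, matvec(A, x))  # trainable low-rank path
    scale = lora_alpha / r
    return [b + scale * u for b, u in zip(base, update)]

# Toy sizes: a 3x4 frozen weight with rank r = 2.
W = [[0.1, 0.2, 0.3, 0.4], [0.5, 0.6, 0.7, 0.8], [0.9, 1.0, 1.1, 1.2]]
A = [[0.01, -0.02, 0.03, 0.00], [0.02, 0.01, -0.01, 0.04]]  # r x d_in
B = [[0.0, 0.0], [0.0, 0.0], [0.0, 0.0]]                    # d_out x r, zero at init
x = [1.0, 2.0, 3.0, 4.0]

y = lora_forward(W, A, B, x, lora_alpha=32, r=2)
print("output with zero-initialized B:", y)
print("output of the frozen layer:    ", matvec(W, x))
```

Because `B` starts at zero, the adapted layer initially behaves exactly like the frozen one; training then moves only `A` and `B`.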

In [None]:
config = LoraConfig(
    r=16,
    lora_alpha=32,
    target_modules=["query_key_value"],
    lora_dropout=0.05,
    bias="none",
    task_type="CAUSAL_LM"
)

model = get_peft_model(model, config)
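
The effect of `r=16` on the number of trainable parameters can be estimated by hand: each adapted linear layer of shape `d_in x d_out` gains `r * (d_in + d_out)` parameters. The sketch below does that arithmetic. The layer shapes are assumptions about Falcon-7B's `query_key_value` projection (hidden size 4544, fused output size 4672, 32 layers) and should be checked against the loaded model's config.

```python
def lora_param_count(d_in, d_out, r, n_layers):
    """Trainable parameters added by LoRA: A is r x d_in, B is d_out x r."""
    return r * (d_in + d_out) * n_layers

# Assumed shapes for Falcon-7B's fused attention projection.
HIDDEN = 4544    # assumed hidden size (input of query_key_value)
QKV_OUT = 4672   # assumed output size: 71 query heads + 1 key + 1 value head, 64 dims each
N_LAYERS = 32    # assumed number of decoder layers

trainable = lora_param_count(HIDDEN, QKV_OUT, r=16, n_layers=N_LAYERS)
scaling = 32 / 16  # lora_alpha / r from the configuration above

print(f"trainable LoRA parameters: {trainable:,}")
print(f"update scaling factor: {scaling}")
```

Under these assumptions the adapters add roughly 4.7 million trainable parameters, a small fraction of a percent of the base model. Calling `model.print_trainable_parameters()` on the PEFT model reports the actual figure.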

## Loading and Preparing the Dataset

The dataset is loaded using the `load_dataset()` function from the 🤗 datasets library. The dataset is then shuffled and mapped to the `generate_and_tokenize_prompt()` function, which generates and tokenizes each data point in the dataset.

In [None]:
def generate_prompt(data_point):
  return f"""
<Human>: {data_point["Context"]}
<AI>: {data_point["Response"]}
  """.strip()

def generate_and_tokenize_prompt(data_point):
  full_prompt = generate_prompt(data_point)
  tokenized_full_prompt = tokenizer(full_prompt, padding=True, truncation=True)
  return tokenized_full_prompt

from datasets import load_dataset
dataset_name = 'Amod/mental_health_counseling_conversations'
dataset = load_dataset(dataset_name, split="train")

dataset = dataset.shuffle().map(generate_and_tokenize_prompt)
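
To see what a single training example looks like after templating, the sketch below applies the same prompt format to a made-up record. The sample text and the `is_usable` filter are hypothetical additions for illustration; only the `Context` and `Response` field names come from the code above.

```python
def generate_prompt(data_point):
    # Same template as above, restated so this snippet runs on its own.
    return f"<Human>: {data_point['Context']}\n<AI>: {data_point['Response']}".strip()

def is_usable(data_point):
    """Hypothetical filter: keep rows where both fields contain text."""
    return all((data_point.get(key) or "").strip() for key in ("Context", "Response"))

# Made-up record with the same field names as the dataset.
sample = {
    "Context": "I have trouble sleeping before exams.",
    "Response": "That sounds stressful. A steady wind-down routine may help.",
}

prompt = generate_prompt(sample)
print(prompt)
print("usable:", is_usable(sample))
```

A filter like this could be applied with `dataset.filter(is_usable)` before the `map()` call if the dataset turns out to contain empty rows.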

## Setting Up the Training Arguments

The training arguments are set up using the `TrainingArguments` class from the transformers library. These arguments include:

* `auto_find_batch_size`: If set to True, the trainer retries with a smaller batch size whenever it runs out of memory, so training settles on a batch size that fits.
* `num_train_epochs`: The number of training epochs.
* `learning_rate`: The learning rate for the optimizer.
* `bf16`: If set to True, the trainer will use bf16 precision for training.
* `save_total_limit`: The maximum number of checkpoints to keep; older checkpoints are deleted once the limit is exceeded.
* `logging_steps`: The number of update steps between two log entries.
* `output_dir`: The directory where the model checkpoints will be saved.
* `save_strategy`: The strategy to use for saving checkpoints. In this case, a checkpoint is saved after each epoch.

In [None]:
training_args = transformers.TrainingArguments(
    auto_find_batch_size=True,
    num_train_epochs=4,
    learning_rate=2e-4,
    bf16=True,
    save_total_limit=4,
    logging_steps=10,
    output_dir="./output",
    save_strategy='epoch',
)
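
These settings determine how long training runs. The sketch below estimates the number of optimizer steps and log entries for a single-GPU run. The dataset size and batch size are placeholder assumptions: the real batch size depends on what `auto_find_batch_size` settles on, and the helper `total_optimizer_steps` is written for this illustration only.

```python
import math

def total_optimizer_steps(n_examples, batch_size, num_train_epochs, grad_accum=1):
    """Approximate optimizer steps for a single-device run."""
    batches_per_epoch = math.ceil(n_examples / batch_size)
    updates_per_epoch = max(batches_per_epoch // grad_accum, 1)
    return updates_per_epoch * num_train_epochs

N_EXAMPLES = 3500  # hypothetical dataset size
BATCH_SIZE = 8     # hypothetical; auto_find_batch_size picks the real value

steps = total_optimizer_steps(N_EXAMPLES, BATCH_SIZE, num_train_epochs=4)
log_entries = steps // 10  # logging_steps=10

print("optimizer steps:", steps)
print("log entries:", log_entries)
```

With `save_strategy='epoch'` and four epochs, one checkpoint is written per epoch, so `save_total_limit=4` keeps all of them.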

## Training the Model

Finally, the model is trained using the `Trainer` class from the transformers library. The trainer takes in:
* the model
* the dataset
* the training arguments
* a data collator for language modeling

The training process is then started using the `train()` method of the trainer.

In [None]:
trainer = transformers.Trainer(
    model=model,
    train_dataset=dataset,
    args=training_args,
    data_collator=transformers.DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
# Disable the KV cache during training; it only speeds up generation.
model.config.use_cache = False
trainer.train()
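
The data collator with `mlm=False` is what turns tokenized examples into causal-LM batches: it pads each batch to a common length and copies the input IDs into the labels, replacing padding positions with `-100` so they are ignored by the loss. The following plain-Python sketch mimics that behavior under the assumption of right padding; it illustrates the idea and is not the library's implementation.

```python
def collate_for_causal_lm(batch, pad_token_id):
    """Pad token-id lists to equal length and build labels for causal LM."""
    max_len = max(len(ids) for ids in batch)
    input_ids, attention_mask, labels = [], [], []
    for ids in batch:
        n_pad = max_len - len(ids)
        padded = ids + [pad_token_id] * n_pad
        input_ids.append(padded)
        attention_mask.append([1] * len(ids) + [0] * n_pad)
        # Every position holding the pad token id is excluded from the loss.
        labels.append([-100 if t == pad_token_id else t for t in padded])
    return {"input_ids": input_ids, "attention_mask": attention_mask, "labels": labels}

# Made-up token ids; 0 stands in for the pad token id.
batch = collate_for_causal_lm([[5, 6, 7], [8, 9]], pad_token_id=0)
print(batch["input_ids"])
print(batch["labels"])
```

One consequence worth knowing: because the notebook sets the pad token to the EOS token, positions holding a genuine EOS token are masked out of the loss as well.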

## Summary

That completes a high-level walkthrough of the code: load Falcon-7B in 4-bit, prepare it for k-bit training, attach LoRA adapters, tokenize the dataset, and train. Together these steps make it possible to fine-tune the model on a custom dataset with a fraction of the memory that full fine-tuning would need. After training has completed, you can save the LoRA adapters locally or upload them to Hugging Face to be used with 🤗 PEFT. Alternatively, you can merge the adapters into the corresponding foundation LLM using the `merge_and_unload()` method from the PEFT library.
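
Conceptually, merging folds the low-rank update into the base weight, `W' = W + (lora_alpha / r) * B A`, so that inference no longer needs a separate adapter path. The toy sketch below checks this identity with small made-up matrices in plain Python; it illustrates the arithmetic only and says nothing about how PEFT handles quantized weights during a merge.

```python
def matmul(X, Y):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)] for row in X]

def matvec(matrix, vector):
    return [sum(m * v for m, v in zip(row, vector)) for row in matrix]

def merge_lora(W, A, B, lora_alpha, r):
    """Return W + (lora_alpha / r) * B A."""
    scale = lora_alpha / r
    BA = matmul(B, A)
    return [[w + scale * d for w, d in zip(w_row, d_row)] for w_row, d_row in zip(W, BA)]

# Made-up toy matrices: 2x3 base weight, rank r = 1.
W = [[0.5, -0.2, 0.1], [0.3, 0.8, -0.4]]
A = [[0.1, 0.2, -0.1]]   # r x d_in
B = [[0.05], [-0.03]]    # d_out x r
x = [1.0, 2.0, 3.0]
lora_alpha, r = 2, 1

scale = lora_alpha / r
with_adapter = [b + scale * u for b, u in zip(matvec(W, x), matvec(B, matvec(A, x)))]
merged_output = matvec(merge_lora(W, A, B, lora_alpha, r), x)
max_diff = max(abs(a - m) for a, m in zip(with_adapter, merged_output))

print("max difference between adapter and merged outputs:", max_diff)
```

The two outputs agree up to floating-point rounding, which is why a merged model can be served like an ordinary checkpoint.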

## Credit

https://medium.com/@amodwrites/a-definitive-guide-to-qlora-fine-tuning-falcon-7b-with-peft-78f500a1f337