<a href="https://colab.research.google.com/github/arkeodev/nlp/blob/main/Fine_Tuning/02_LoRA.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

$$
\begin{array}{c}
\text{$\Large "Everything\ should\ be\ made\ as\ simple\ as\ possible,\ but\ not\ simpler."$} \\
{\text{{$\small Albert\ Einstein$}}} \\
\end{array}
$$

# LoRA Tuning with PEFT from Hugging Face

## Introduction to LoRA and PEFT

### What is LoRA?

Low-Rank Adaptation (LoRA) is a technique for adapting large language models by training only a small number of additional parameters. Instead of modifying the pre-trained weight matrices directly, it represents the weight update for each adapted matrix as the product of two low-rank matrices. This significantly reduces the number of trainable parameters and the memory required for fine-tuning.

### What is PEFT?

PEFT (Parameter-Efficient Fine-Tuning) is a library from Hugging Face that implements parameter-efficient fine-tuning methods such as LoRA and makes them easy to apply to Transformers models in your projects.

## Step-by-Step Process of Fine-Tuning with LoRA

1. **Pre-Trained Model Initialization**:
   - Start with a pre-trained model, such as GPT-3 or BERT. These models have already been trained on large corpora and have established strong baseline performance on various tasks.

2. **Understand Low-Rank Adaptation (LoRA)**:
   - **Objective**: LoRA aims to reduce the number of trainable parameters by injecting low-rank trainable matrices into each layer of the transformer model. This allows fine-tuning to be more efficient in terms of both computation and memory. Specifically, in the Transformer architecture, LoRA is typically applied to the weight matrices within the self-attention modules (query, key, and value projections) and can be extended to the MLP layers. This modularity allows for selective adaptation, which can be more efficient depending on the specific task requirements.

   - **Concept**: Instead of updating the entire weight matrix of a transformer layer during fine-tuning, LoRA factorizes the weight updates into two low-rank matrices. This significantly reduces the number of parameters that need to be updated and stored.

3. **Injecting LoRA Matrices**:
   - **Original Weight Matrix**: Consider a weight matrix $W \in \mathbb{R}^{d \times d}$ in a transformer layer.

   - **Decomposition**: Decompose the weight update into two smaller matrices $A \in \mathbb{R}^{d \times r}$ and $B \in \mathbb{R}^{r \times d}$, where $r$ is the rank of the approximation and much smaller than $d$.

   - **Modified Weight Update**: During fine-tuning, the weight matrix $W$ is modified as:
     $$
     W_{\text{new}} = W + \Delta W
     $$
     where $\Delta W = A \cdot B$.

4. **Training Process with LoRA**:
   - **Freeze Original Weights**: Keep the original pre-trained weights $W$ frozen and only update the matrices $A$ and $B$.

   - **Forward Pass**: During the forward pass, compute the output using the modified weight matrix $W_{\text{new}}$.

   - **Backward Pass**: Compute gradients and update the low-rank matrices $A$ and $B$ only.

5. **Implementation Details**:
   - **Choosing Rank $r$**: The rank $r$ should balance model capacity against computational efficiency. Common choices are small integers like 4 or 8.

   - **Initialization**: Initialize one of the two factors with small random values and the other with zeros, so that $\Delta W$ is zero at the start of training and the adapted model initially behaves exactly like the pre-trained one.

   - **Optimizer**: Use standard optimizers (e.g., Adam) to update $A$ and $B$.

6. **Integration into Training Pipeline**:
   - **Data Preparation**: Prepare your training data specific to the task you want to fine-tune the model on.

   - **Training Loop**: Incorporate the LoRA adaptation into your training loop. Ensure that only $A$ and $B$ are updated during training.
   
   - **Evaluation**: After fine-tuning, evaluate the model on validation and test sets to ensure that the fine-tuning has improved performance on the specific task.
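
The forward pass described in the steps above can be sketched in plain Python. This is a minimal illustration, not the PEFT implementation: the helper names (`matvec`, `matmul`, `lora_forward`) and the tiny matrices are made up for the example, and the shapes follow the convention used in the text ($W$ is $d \times d$, $A$ is $d \times r$, $B$ is $r \times d$).

```python
# Minimal pure-Python sketch of a LoRA-adapted linear map (illustrative only).
# Convention follows the text: W is d x d, A is d x r, B is r x d, delta_W = A B.

def matvec(M, x):
    """Multiply matrix M (a list of rows) by vector x."""
    return [sum(m * v for m, v in zip(row, x)) for row in M]

def matmul(P, Q):
    """Multiply matrix P (n x k) by matrix Q (k x m)."""
    cols = list(zip(*Q))
    return [[sum(p * q for p, q in zip(row, col)) for col in cols] for row in P]

def lora_forward(W, A, B, x):
    """Compute (W + A B) x as W x + A (B x), without building the d x d update."""
    base = matvec(W, x)                  # frozen pre-trained path
    low_rank = matvec(A, matvec(B, x))   # trainable path through the rank-r bottleneck
    return [b + l for b, l in zip(base, low_rank)]

# Hypothetical toy values with d = 3 and r = 1
W = [[1, 0, 2], [0, 1, 0], [3, 0, 1]]   # frozen weights
A = [[1], [2], [0]]                     # d x r factor
B = [[0, 1, 1]]                         # r x d factor
x = [1, 2, 3]

# Reference computation: materialize W_new = W + A B and apply it
delta_W = matmul(A, B)
W_new = [[w + dw for w, dw in zip(w_row, dw_row)] for w_row, dw_row in zip(W, delta_W)]
y_explicit = matvec(W_new, x)
y_lora = lora_forward(W, A, B, x)

# When one factor is all zeros, the adapted layer reproduces the frozen layer
B_zero = [[0, 0, 0]]
y_start = lora_forward(W, A, B_zero, x)

print(y_lora, y_explicit, y_start)
```

Computing $A(Bx)$ instead of $(AB)x$ keeps the extra cost proportional to $r$, which is the reason the update never needs to be stored as a full $d \times d$ matrix during training.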

<figure>
    <img src="https://raw.githubusercontent.com/arkeodev/nlp/main/Fine_Tuning/images/LoRA.png" width="1000" height="400" alt="LoRA">
    <figcaption>LoRA</figcaption>
</figure>

## Benefits of Using LoRA

- **Parameter Efficiency**: By reducing the number of trainable parameters, LoRA makes the fine-tuning process more memory efficient.

- **Speed**: Fine-tuning with fewer parameters can be significantly faster, making it feasible to fine-tune large models on smaller datasets or with limited computational resources.

- **Flexibility**: LoRA allows the adaptation of pre-trained models to new tasks without requiring extensive computational resources.
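
To make the parameter-efficiency claim concrete, the sketch below counts trainable parameters for a single weight matrix with and without LoRA. The function name `lora_param_counts` is hypothetical; 768 is the hidden size of `bert-base-uncased`, and `r = 8` is used here as an example rank.

```python
def lora_param_counts(d_in, d_out, r):
    """Return (full, lora) trainable-parameter counts for one weight matrix."""
    full = d_in * d_out        # updating the whole matrix
    lora = r * (d_in + d_out)  # updating the two rank-r factors only
    return full, lora

full, lora = lora_param_counts(768, 768, 8)
print(full, lora, f"{lora / full:.2%}")  # 589824 12288 2.08%
```

For this matrix size, the two low-rank factors hold roughly 2% of the parameters of the full matrix, and the ratio shrinks further as the hidden size grows relative to the rank.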

## Implementation

### 1. Dataset Selection and Preparation

We'll use the "Quora Question Pairs" dataset from Kaggle, which contains pairs of questions and a label indicating if they are paraphrases.

Next, we install the `kaggle` package and upload the `kaggle.json` API token file from the local machine.

In [18]:
import os
import sys
import json
from IPython.display import display
from ipywidgets import FileUpload


kaggle_json_path = './kaggle.json'

def setup_kaggle_api(file_content):
    # Save the uploaded file
    with open(kaggle_json_path, 'wb') as f:
        f.write(file_content)

    # Read the kaggle.json file
    with open(kaggle_json_path, 'r') as f:
        kaggle_token = json.load(f)

    # Set up environment variables for Kaggle API credentials
    os.environ['KAGGLE_USERNAME'] = kaggle_token['username']
    os.environ['KAGGLE_KEY'] = kaggle_token['key']

    print("Kaggle API credentials are set up successfully.")

def create_upload_widget():
    # Create a file upload widget
    upload_widget = FileUpload(accept='.json', multiple=False)

    def on_upload_change(change):
        # ipywidgets 7 stores uploads as a dict keyed by filename;
        # ipywidgets 8 stores them as a tuple of dicts. Handle both.
        value = change['new']
        if not value:
            return
        uploaded_file = next(iter(value.values())) if isinstance(value, dict) else value[0]

        # Check that the uploaded entry carries the raw file content
        if 'content' not in uploaded_file:
            raise ValueError("Uploaded file is not valid or missing 'content'.")

        # Setup Kaggle API with the uploaded file content (bytes or memoryview)
        setup_kaggle_api(uploaded_file['content'])

    # Attach the callback function to the widget
    upload_widget.observe(on_upload_change, names='value')

    # Display the widget
    display(upload_widget)

In [19]:
# Install the kaggle package
! pip install kaggle -q

# Call the function to create and display the upload widget
create_upload_widget()

FileUpload(value={}, accept='.json', description='Upload')

Kaggle API credentials are set up successfully.
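
If the upload widget is not available (for example, when running outside a notebook), the same environment variables can be set from a `kaggle.json` file already on disk. The following is a minimal sketch under that assumption; the helper names `load_kaggle_credentials` and `export_kaggle_credentials` are hypothetical and not part of the Kaggle package.

```python
import json
import os
from pathlib import Path

def load_kaggle_credentials(path):
    """Read a kaggle.json file and return (username, key); raise if a field is missing."""
    token = json.loads(Path(path).read_text())
    missing = [field for field in ("username", "key") if not token.get(field)]
    if missing:
        raise ValueError(f"kaggle.json is missing fields: {missing}")
    return token["username"], token["key"]

def export_kaggle_credentials(path, env=os.environ):
    """Copy the credentials into the environment variables used by the cell above."""
    username, key = load_kaggle_credentials(path)
    env["KAGGLE_USERNAME"] = username
    env["KAGGLE_KEY"] = key

# Example (assumes the token file sits next to the notebook):
# export_kaggle_credentials("./kaggle.json")
```

Validating the two fields up front gives a clearer error than a `KeyError` raised later during the download step.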


Before the dataset can be downloaded, you need to accept the competition rules. Here's how you can do it:

1. Go to the [Quora Question Pairs competition page](https://www.kaggle.com/c/quora-question-pairs).
2. Sign in with your Kaggle account.
3. Click on the "Rules" tab.
4. Scroll down and click the "I Understand and Accept" button.

After accepting the rules, you can download the dataset using the Kaggle API in your Jupyter Notebook.

In [22]:
# Download the Quora Question Pairs dataset
! kaggle competitions download -c quora-question-pairs

# Unzip the dataset
! unzip quora-question-pairs.zip -d quora_question_pairs
! unzip -o ./quora_question_pairs/test.csv.zip -d ./quora_question_pairs
! unzip -o ./quora_question_pairs/train.csv.zip -d ./quora_question_pairs

quora-question-pairs.zip: Skipping, found more recently modified local copy (use --force to force download)
Archive:  quora-question-pairs.zip
  inflating: quora_question_pairs/sample_submission.csv.zip  
  inflating: quora_question_pairs/test.csv  
  inflating: quora_question_pairs/test.csv.zip  
  inflating: quora_question_pairs/train.csv.zip  
Archive:  ./quora_question_pairs/test.csv.zip
  inflating: ./quora_question_pairs/test.csv  
Archive:  ./quora_question_pairs/train.csv.zip
  inflating: ./quora_question_pairs/train.csv  


### 2. Environment Setup

We need to install and import the necessary libraries, including Hugging Face's Transformers, Datasets, and PEFT.

In [23]:
# Install required libraries
! pip install transformers datasets peft -q


In [24]:
import os
import pandas as pd
import torch
from datasets import load_dataset, Dataset
from transformers import AutoTokenizer, AutoModelForSequenceClassification, TrainingArguments, Trainer
from peft import LoraConfig, get_peft_model, PeftModel

### 3. Data Preprocessing

Load and preprocess the dataset to make it suitable for fine-tuning.

In [25]:
import pandas as pd
from datasets import Dataset
from transformers import AutoTokenizer

# Load the datasets as pandas dataframes for initial inspection
train_df = pd.read_csv('quora_question_pairs/train.csv')
test_df = pd.read_csv('quora_question_pairs/test.csv')

# Remove any rows with missing data in important columns
train_df.dropna(subset=['question1', 'question2', 'is_duplicate'], inplace=True)
test_df.dropna(subset=['question1', 'question2'], inplace=True)

# Drop the columns we don't need
train_df = train_df[['question1', 'question2', 'is_duplicate']]
test_df = test_df[['question1', 'question2']]

# Convert the dataframes to Hugging Face datasets
train_dataset = Dataset.from_pandas(train_df)
test_dataset = Dataset.from_pandas(test_df)

# Initialize the tokenizer
tokenizer = AutoTokenizer.from_pretrained('bert-base-uncased')

# Define the preprocessing function
def preprocess_function(examples):
    try:
        inputs = tokenizer(examples['question1'], examples['question2'], truncation=True, padding='max_length', max_length=128)
        if 'is_duplicate' in examples:
            inputs['labels'] = examples['is_duplicate']
        return inputs
    except Exception as e:
        print(f"Error processing example: {e}")
        return {}

# Preprocess the datasets and handle errors
encoded_train_dataset = train_dataset.map(preprocess_function, batched=True, remove_columns=train_dataset.column_names)
encoded_test_dataset = test_dataset.map(preprocess_function, batched=True, remove_columns=test_dataset.column_names)

# Filter out empty results
encoded_train_dataset = encoded_train_dataset.filter(lambda x: x['input_ids'] is not None)
encoded_test_dataset = encoded_test_dataset.filter(lambda x: x['input_ids'] is not None)

# Print to verify the datasets
print(encoded_train_dataset)
print(encoded_test_dataset)


Dataset({
    features: ['input_ids', 'token_type_ids', 'attention_mask', 'labels'],
    num_rows: 404287
})
Dataset({
    features: ['input_ids', 'token_type_ids', 'attention_mask'],
    num_rows: 3563466
})
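
The `padding='max_length'` and `truncation=True` settings used above give every example the same length. The sketch below is a simplified, standard-library illustration of that idea for a single sequence of token ids; `pad_or_truncate` is a hypothetical helper, the token ids are illustrative, and the real tokenizer additionally handles sentence pairs and keeps the closing special token when truncating.

```python
def pad_or_truncate(token_ids, max_length, pad_id=0):
    """Cut a sequence to max_length, then pad it with pad_id and build the attention mask."""
    ids = list(token_ids[:max_length])
    mask = [1] * len(ids)                 # 1 marks real tokens
    n_pad = max_length - len(ids)
    return ids + [pad_id] * n_pad, mask + [0] * n_pad  # 0 marks padding

short_ids, short_mask = pad_or_truncate([101, 2129, 102], max_length=5)
print(short_ids)   # [101, 2129, 102, 0, 0]
print(short_mask)  # [1, 1, 1, 0, 0]

long_ids, long_mask = pad_or_truncate(list(range(1, 9)), max_length=5)
print(long_ids)    # [1, 2, 3, 4, 5]
print(long_mask)   # [1, 1, 1, 1, 1]
```

The attention mask is what lets the model ignore the padded positions, which is why it appears next to `input_ids` in the dataset features printed above.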


### 4. Model Selection and Configuration

Select a pre-trained model and configure it for fine-tuning.

In [26]:
from transformers import AutoModelForSequenceClassification
from peft import LoraConfig, get_peft_model

# Load a pre-trained model for sequence classification
model = AutoModelForSequenceClassification.from_pretrained('bert-base-uncased', num_labels=2)

# Define LoRA (Low-Rank Adaptation) configuration
lora_config = LoraConfig(
    task_type="SEQ_CLS",   # Sequence classification; PEFT then keeps the newly initialized classifier head trainable.
    r=8,                   # Rank of the low-rank adaptation matrices. Controls the size of the low-rank projection.
    lora_alpha=1,          # Scaling numerator: the low-rank update is multiplied by lora_alpha / r.
    lora_dropout=0.1,      # Dropout applied to the input of the LoRA branch. Helps prevent overfitting.
    bias="none",           # Which bias parameters are trained. Options: "none", "all", "lora_only".
    target_modules=["query", "key", "value"]  # Names of the attention projection modules to which LoRA is applied.
)

# Apply PEFT (Parameter-Efficient Fine-Tuning) using the defined LoRA configuration
model = get_peft_model(model, lora_config)

Some weights of BertForSequenceClassification were not initialized from the model checkpoint at bert-base-uncased and are newly initialized: ['classifier.bias', 'classifier.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


`lora_alpha` is a scaling factor applied to the low-rank update in the LoRA approach. It adjusts how strongly the product of the low-rank matrices contributes to the adapted weights: the product is scaled by `lora_alpha / r` before it is added to the frozen model parameters.

  Mathematically:
  $$
  W_{\text{new}} = W + \frac{\alpha}{r} (A \cdot B)
  $$

Here, $A \cdot B$ is the low-rank update, $\alpha$ is `lora_alpha`, and $r$ is the rank. With the configuration above (`lora_alpha=1`, `r=8`), the update is scaled by $1/8$.
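
In PEFT's standard LoRA implementation, the factor that multiplies the low-rank product is `lora_alpha / r` rather than `lora_alpha` alone. The arithmetic can be sketched as follows; `lora_scaling` is a hypothetical helper written for this example, not a PEFT function, and the second call uses an arbitrary alternative setting for comparison.

```python
def lora_scaling(lora_alpha, r):
    """Effective multiplier applied to the low-rank update in standard LoRA."""
    return lora_alpha / r

print(lora_scaling(1, 8))   # 0.125 (the configuration used in this notebook)
print(lora_scaling(16, 8))  # 2.0   (a larger alpha makes the update weigh more)
```

Because the multiplier depends on both values, changing the rank without adjusting `lora_alpha` also changes how strongly the adapter influences the output.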

### 5. LoRA Tuning with PEFT

Fine-tune the model using the Trainer API from Hugging Face.

In [27]:
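# The Kaggle test set has no labels, so hold out part of the labeled training data
# for evaluation (the 10% size and the seed are arbitrary choices)
split = encoded_train_dataset.train_test_split(test_size=0.1, seed=42)

# Define training arguments
training_args = TrainingArguments(
    output_dir='./results',
    evaluation_strategy="epoch",  # Renamed to `eval_strategy` in recent transformers releases
    per_device_train_batch_size=16,
    per_device_eval_batch_size=16,
    num_train_epochs=3,
    weight_decay=0.01,
    logging_dir='./logs',
    logging_steps=10,
)

# Initialize the Trainer
trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=split['train'],
    eval_dataset=split['test'],
    tokenizer=tokenizer,
)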

# Train the model
trainer.train()



Epoch,Training Loss,Validation Loss




KeyboardInterrupt: 
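
The run above was interrupted before the first epoch finished. A rough estimate of the number of optimizer steps helps explain why: the sketch below assumes a single device, no gradient accumulation, and that all 404,287 preprocessed training rows are used. The helper name `optimizer_steps` is hypothetical.

```python
import math

def optimizer_steps(num_examples, batch_size, epochs):
    """Estimate optimizer steps on one device without gradient accumulation."""
    steps_per_epoch = math.ceil(num_examples / batch_size)  # the last batch may be partial
    return steps_per_epoch * epochs

print(optimizer_steps(404287, 16, 1))  # 25268 steps per epoch
print(optimizer_steps(404287, 16, 3))  # 75804 steps for three epochs
```

Training on a subset of the data or for fewer epochs is a common way to get a first result quickly before committing to a full run.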

### 6. Evaluation and Analysis

Evaluate the model on the evaluation set passed to the `Trainer` and analyze the results.

In [None]:
# Evaluate the model
eval_results = trainer.evaluate()
print(eval_results)

# Analyze the results
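def predict(question1, question2):
    model.eval()  # Disable dropout for inference
    device = next(model.parameters()).device  # The Trainer may have moved the model to a GPU
    inputs = tokenizer(question1, question2, return_tensors='pt', truncation=True, padding='max_length', max_length=128)
    inputs = {name: tensor.to(device) for name, tensor in inputs.items()}  # Keep inputs on the model's device
    with torch.no_grad():
        logits = model(**inputs).logits
    probabilities = torch.nn.functional.softmax(logits, dim=-1)
    return probabilities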

# Test the model with example questions
question1 = "How do I cook pasta?"
question2 = "What is the process of cooking pasta?"
print(predict(question1, question2))
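
By default, `trainer.evaluate()` reports the loss but no classification metrics. A metrics function along the following lines could be supplied to the `Trainer`; this is a dependency-free sketch that assumes binary labels with 1 meaning "duplicate", and it is not part of the original notebook.

```python
def compute_metrics(eval_pred):
    """Compute accuracy and F1 for the positive class from (logits, labels)."""
    logits, labels = eval_pred
    # Predicted class = index of the largest logit in each row
    preds = [max(range(len(row)), key=lambda i: row[i]) for row in logits]
    labels = [int(y) for y in labels]

    tp = sum(p == 1 and y == 1 for p, y in zip(preds, labels))
    fp = sum(p == 1 and y == 0 for p, y in zip(preds, labels))
    fn = sum(p == 0 and y == 1 for p, y in zip(preds, labels))

    accuracy = sum(p == y for p, y in zip(preds, labels)) / len(labels)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "f1": f1}

# Toy check with made-up logits and labels
example = compute_metrics(([[2.0, -1.0], [0.1, 0.9], [0.3, 0.2], [-1.0, 3.0]], [0, 1, 1, 1]))
print(example)
```

To use it, pass `compute_metrics=compute_metrics` when constructing the `Trainer`; the evaluation set must contain labels for the metrics to be computed.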

## Conclusion

In this notebook, we have covered the process of fine-tuning a pre-trained language model using LoRA with the PEFT library from Hugging Face.

## Additional Resources

- The original paper on arXiv: [LoRA: Low-Rank Adaptation of Large Language Models](https://arxiv.org/abs/2106.09685)

- Hugging Face LoRA adapter documentation: [Hugging Face](https://huggingface.co/docs/peft/package_reference/lora)