<a href="https://colab.research.google.com/github/atharv-arya/Fine-Tuning-LLAMA-2-using-LoRA-and-QLoRA/blob/main/Fine_tuning_LLAMA_2.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Installing Required Packages

In [None]:
!pip install --upgrade pip setuptools wheel
!pip install tokenizers==0.22.1 transformers==4.56.2 accelerate peft bitsandbytes trl

Collecting tokenizers==0.22.1
  Using cached tokenizers-0.22.1-cp39-abi3-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (6.8 kB)
Collecting transformers==4.56.2
  Using cached transformers-4.56.2-py3-none-any.whl.metadata (40 kB)
Collecting bitsandbytes
  Using cached bitsandbytes-0.47.0-py3-none-manylinux_2_24_x86_64.whl.metadata (11 kB)
Collecting trl
  Using cached trl-0.23.0-py3-none-any.whl.metadata (11 kB)
Using cached tokenizers-0.22.1-cp39-abi3-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (3.3 MB)
Using cached transformers-4.56.2-py3-none-any.whl (11.6 MB)
Using cached bitsandbytes-0.47.0-py3-none-manylinux_2_24_x86_64.whl (61.3 MB)
Using cached trl-0.23.0-py3-none-any.whl (564 kB)
Installing collected packages: tokenizers, transformers, bitsandbytes, trl
  Attempting uninstall: tokenizers
    Found existing installation: tokenizers 0.22.0
    Uninstalling tokenizers-0.22.0:
      Successfully uninstalled tokenizers-0.22.0
  Attempting uninstall: 

In [None]:
import os
import torch
from datasets import load_dataset
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    BitsAndBytesConfig,
    HfArgumentParser,
    TrainingArguments,
    pipeline,
    logging,
)
from peft import LoraConfig, PeftModel
from trl import SFTTrainer

# For LLAMA 2, the prompt template used for chat models is as follows:

* System Prompt (optional): guides the model
* User Prompt (required): gives the instructions
* Model Answer (required)



```
<s> [INST] <<SYS>>
System Prompt
<</SYS>>

User Prompt [/INST] Model Answer </s>
```
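As a minimal sketch, a single-turn sample could be assembled in this template as follows. The function name `build_llama2_prompt` is hypothetical, the instruction tags are written as `[INST]` and `[/INST]`, and the exact whitespace simply mirrors the template shown above, so treat it as an assumption rather than a reference implementation.

```python
from typing import Optional


def build_llama2_prompt(user_prompt: str,
                        model_answer: str,
                        system_prompt: Optional[str] = None) -> str:
    """Format one single-turn training sample in the Llama 2 chat template.

    Hypothetical helper: whitespace follows the template shown in this notebook.
    """
    if system_prompt:
        # The optional system prompt is wrapped in <<SYS>> ... <</SYS>> tags
        # and placed inside the [INST] block, before the user prompt.
        system_block = f"<<SYS>>\n{system_prompt}\n<</SYS>>\n\n"
    else:
        system_block = ""
    return f"<s> [INST] {system_block}{user_prompt} [/INST] {model_answer} </s>"


print(build_llama2_prompt("What is QLoRA?", "A 4-bit fine-tuning method.", "Be brief."))
```

Leaving out `system_prompt` drops the `<<SYS>>` block entirely, which matches the "optional" status of the system prompt in the list above.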









# We will reformat our instruction dataset to follow LLAMA 2's template

* Original dataset: https://huggingface.co/datasets/timdettmers/openassistant-guanaco

* Reformatted version of the above dataset following the LLAMA 2 template (1k samples): https://huggingface.co/datasets/mlabonne/guanaco-llama2-1k

* Complete reformatted dataset following the LLAMA 2 template: https://huggingface.co/datasets/mlabonne/guanaco-llama2
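To make the reformatting concrete, here is a rough sketch of how one sample could be converted. It assumes the original dataset stores each conversation as a single string with `### Human:` and `### Assistant:` markers, and it keeps only the first exchange. The helper name `guanaco_to_llama2` is hypothetical, and this is not necessarily how the reformatted datasets linked above were produced.

```python
import re

# Assumed turn markers in the original dataset's text (an assumption, see above).
_FIRST_TURN = re.compile(r"### Human:(.*?)### Assistant:(.*)", re.DOTALL)


def guanaco_to_llama2(text: str) -> str:
    """Convert the first Human/Assistant exchange to the Llama 2 template."""
    match = _FIRST_TURN.search(text)
    if match is None:
        raise ValueError("text does not contain a Human/Assistant pair")
    user_prompt = match.group(1).strip()
    # Drop any follow-up turns so that only the first answer is kept.
    model_answer = match.group(2).split("### Human:")[0].strip()
    return f"<s> [INST] {user_prompt} [/INST] {model_answer} </s>"


sample = "### Human: What is LoRA?### Assistant: A low-rank adapter method."
print(guanaco_to_llama2(sample))
```

With the `datasets` library, such a function could be applied row by row through `Dataset.map`, assuming the conversation lives in a column named `text`.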

# How to Fine Tune LLAMA 2
* Free Google Colab offers a GPU with about 15 GB of VRAM, which is barely enough to store Llama 2–7b’s weights in 16-bit precision (roughly 7 billion parameters × 2 bytes ≈ 14 GB)

* We also need to consider the overhead due to optimizer states, gradients, and forward activations

* Full fine-tuning is not possible here: we need parameter-efficient fine-tuning (PEFT) techniques like LoRA or QLoRA.

* To drastically reduce the VRAM usage, we must fine-tune the model in 4-bit precision, which is why we’ll use QLoRA here.
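The memory argument in the bullets above can be checked with some back-of-the-envelope arithmetic. The sketch below assumes a nominal 7 billion parameters and, for full fine-tuning, a common rule of thumb of about 16 bytes per parameter for mixed-precision Adam (FP16 weights and gradients plus FP32 master weights and two optimizer moments). Both figures are assumptions, and activations and quantization constants are ignored.

```python
def weights_gb(num_params: float, bits_per_param: int) -> float:
    """Memory needed for the weights alone, in GB (1 GB = 1e9 bytes)."""
    return num_params * bits_per_param / 8 / 1e9


NUM_PARAMS = 7e9  # nominal size of Llama 2 7B (assumption: rounded)

fp16_gb = weights_gb(NUM_PARAMS, 16)  # 14.0 GB: nearly fills a 15 GB GPU
nf4_gb = weights_gb(NUM_PARAMS, 4)    # 3.5 GB: leaves room for LoRA training

# Rule-of-thumb assumption: ~16 bytes per parameter for full fine-tuning
# with Adam in mixed precision, before counting activations.
full_finetune_gb = NUM_PARAMS * 16 / 1e9  # 112.0 GB

print(f"FP16 weights:     {fp16_gb:.1f} GB")
print(f"4-bit weights:    {nf4_gb:.1f} GB")
print(f"Full fine-tuning: {full_finetune_gb:.1f} GB (weights, gradients, optimizer)")
```

Even as a rough estimate, the gap between about 3.5 GB and well over 100 GB shows why 4-bit loading combined with a small set of trainable adapter weights is the practical option on a free Colab GPU.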


Steps:
1. Load the llama-2-7b-chat-hf model (chat model)
2. Train it on the reformatted dataset (mlabonne/guanaco-llama2-1k), which will produce our fine-tuned model, Llama-2-7b-chat-finetune

QLoRA will use a rank of 64 with a scaling parameter of 16. We will load the Llama 2 model directly in 4-bit precision using the NF4 type and train it for one epoch.
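The settings just described might translate into configuration objects roughly as sketched below. The rank, scaling parameter, NF4 type and epoch count come from the text; the dropout value, the `float16` compute dtype and the helper name `build_qlora_configs` are assumptions added for illustration. The library imports are placed inside the function so that the constants can be inspected even where `torch`, `transformers`, `peft` and `bitsandbytes` are not installed.

```python
# Values stated in the text.
LORA_R = 64            # LoRA rank
LORA_ALPHA = 16        # LoRA scaling parameter
QUANT_TYPE = "nf4"     # 4-bit NormalFloat quantization
NUM_TRAIN_EPOCHS = 1

# Assumed value, not given in the text.
LORA_DROPOUT = 0.1

# By default, LoRA scales the low-rank update by lora_alpha / r.
LORA_SCALING = LORA_ALPHA / LORA_R  # 0.25


def build_qlora_configs():
    """Build the 4-bit quantization config and the LoRA config (hypothetical helper)."""
    import torch
    from peft import LoraConfig
    from transformers import BitsAndBytesConfig

    bnb_config = BitsAndBytesConfig(
        load_in_4bit=True,                     # load the base model in 4-bit
        bnb_4bit_quant_type=QUANT_TYPE,        # use the NF4 data type
        bnb_4bit_compute_dtype=torch.float16,  # assumption: FP16 compute
    )
    peft_config = LoraConfig(
        r=LORA_R,
        lora_alpha=LORA_ALPHA,
        lora_dropout=LORA_DROPOUT,
        bias="none",
        task_type="CAUSAL_LM",
    )
    return bnb_config, peft_config
```

In a typical setup, `bnb_config` would be passed as `quantization_config` to `AutoModelForCausalLM.from_pretrained`, and `peft_config` would be passed as `peft_config` to `SFTTrainer`.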