<a href="https://www.kaggle.com/code/aisuko/producing-adapter-with-vera?scriptVersionId=185168250" target="_blank"><img align="left" alt="Kaggle" title="Open in Kaggle" src="https://kaggle.com/static/images/open-in-kaggle.svg"></a>

# Overview

VeRA (Vector-based Random Matrix Adaptation) fine-tunes vectors on top of random matrices. Like LoRA, VeRA adds two low-rank matrices, A and B, to each adapted layer. In VeRA, however, these matrices are frozen, randomly initialized, and shared across layers. The trainable parameters are two vectors, d and b, placed after A and B respectively; d and b are not shared across layers. In other words, VeRA reuses fixed random projections and learns only small per-layer scaling vectors. Since VeRA trains only these two vectors per layer, it has significantly fewer trainable parameters than LoRA.
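A minimal NumPy sketch of the forward pass described above, following the paper's formulation h = W0·x + Λ_b·B·Λ_d·A·x. The class name `VeRALinear`, the toy sizes, the scaled-Gaussian initialization of A and B, and the starting value 0.1 for d are illustrative assumptions; this is not PEFT's implementation. Because b starts at zero, an adapted layer initially computes exactly what the frozen layer does.

```python
import numpy as np

rng = np.random.default_rng(0)
d_in, d_out, r = 16, 32, 8  # toy sizes (assumption), not Llama 3's

# Frozen random matrices, created once and shared by every adapted layer
# (scaled Gaussian init here for simplicity)
A = rng.standard_normal((r, d_in)) / np.sqrt(d_in)
B = rng.standard_normal((d_out, r)) / np.sqrt(r)


class VeRALinear:
    """Toy VeRA layer: h = W0 @ x + b * (B @ (d * (A @ x)))."""

    def __init__(self, W0, A, B, d_init=0.1):
        self.W0 = W0                          # frozen pretrained weight, shape (d_out, d_in)
        self.A, self.B = A, B                 # frozen and shared across layers
        self.d = np.full(A.shape[0], d_init)  # trainable, scales the r outputs of A
        self.b = np.zeros(B.shape[0])         # trainable, scales the d_out outputs of B

    def __call__(self, x):
        return self.W0 @ x + self.b * (self.B @ (self.d * (self.A @ x)))

    def n_trainable(self):
        return self.d.size + self.b.size


# Two layers with their own frozen W0 and their own (d, b), but the same A and B
layers = [VeRALinear(rng.standard_normal((d_out, d_in)), A, B) for _ in range(2)]
x = rng.standard_normal(d_in)

# LoRA would train its own A (r x d_in) and B (d_out x r) in every layer
lora_trainable_per_layer = r * (d_in + d_out)
print(layers[0].n_trainable(), lora_trainable_per_layer)  # 40 384
```

Every layer gets its own `d` and `b`, but all layers point at the same `A` and `B`, which is also why the adapted modules must share one shape.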

It is implemented in Hugging Face PEFT. As of June 8th, 2024, this implementation has some limitations worth knowing:
* It doesn't support VeRA on top of a quantized model; it can only target modules implemented as `nn.Linear`.
* All targeted modules must have the same shape.
* VeRA can produce a larger adapter than LoRA, because PEFT saves the random matrices in addition to the fine-tuned vectors. Storing them, rather than relying on regenerating them from a random seed, guarantees that the fine-tuned adapter stays portable to other hardware/software configurations.
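The numbers behind these points for the configuration used later in this notebook (r=512 on `gate_proj` and `up_proj`) can be worked out by hand. This sketch assumes Llama 3 8B's MLP projections map 4,096 to 14,336 features in each of its 32 decoder layers, and that the shared A and B are stored once in the saved adapter.

```python
HIDDEN, INTERMEDIATE, N_LAYERS = 4096, 14_336, 32  # assumed Llama 3 8B sizes
R = 512  # r in the VeraConfig used below

# (in_features, out_features) of each targeted nn.Linear in one decoder layer
target_shapes = {"gate_proj": (HIDDEN, INTERMEDIATE), "up_proj": (HIDDEN, INTERMEDIATE)}

# One shared A and B can only fit modules that all have the same shape
assert len(set(target_shapes.values())) == 1, "VeRA targets must share a shape"
d_in, d_out = next(iter(target_shapes.values()))
n_adapted = len(target_shapes) * N_LAYERS  # 64 adapted linear layers

trainable = n_adapted * (R + d_out)  # one d (size r) and one b (size d_out) per layer
projections = R * d_in + d_out * R   # frozen shared A (r x d_in) and B (d_out x r)

print(f"trainable vector params:    {trainable:,}")
print(f"saved random-matrix params: {projections:,}")
print(f"adapter params in total:    {trainable + projections:,}")
```

Under these assumptions, about 91% of what ends up in the saved adapter is the frozen projections rather than the trained vectors.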

In [None]:
!pip install -U -q transformers==4.39.3
!pip install -U -q accelerate==0.28.0
!pip install -U -q datasets==2.18.0
!pip install -U -q peft==0.11.1
!pip install -U -q bitsandbytes==0.43.1
!pip install -U -q trl==0.8.6

In [None]:
import os
from huggingface_hub import login
from kaggle_secrets import UserSecretsClient
user_secrets = UserSecretsClient()
login(token=user_secrets.get_secret("HUGGINGFACE_TOKEN"))

os.environ["WANDB_API_KEY"]=user_secrets.get_secret("WANDB_API_KEY")
os.environ["WANDB_PROJECT"] = "Fine-tuning Llama 3 8B with vera"
os.environ["WANDB_NAME"] = "ft-Llama3-8b-vera"
os.environ["MODEL_NAME"] = "meta-llama/Meta-Llama-3-8B"
os.environ["DATASET"] = "timdettmers/openassistant-guanaco"

In [None]:
!accelerate estimate-memory ${MODEL_NAME} --library_name transformers
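The weight-memory figures reported by `accelerate estimate-memory` are, to first order, parameter count × bytes per parameter. A back-of-envelope cross-check, assuming Llama 3 8B's published configuration (vocabulary 128,256, hidden size 4,096, MLP size 14,336, 32 layers, 32 query heads, 8 key/value heads, untied embeddings, no bias terms):

```python
# Assumed Llama 3 8B configuration values (compare with model.config)
VOCAB, HIDDEN, INTERMEDIATE, N_LAYERS = 128_256, 4096, 14_336, 32
N_HEADS, N_KV_HEADS = 32, 8
HEAD_DIM = HIDDEN // N_HEADS  # 128


def llama_param_count():
    kv_dim = N_KV_HEADS * HEAD_DIM                    # grouped-query attention: K/V are narrower than Q
    attn = 2 * HIDDEN * HIDDEN + 2 * HIDDEN * kv_dim  # q_proj + o_proj, k_proj + v_proj
    mlp = 3 * HIDDEN * INTERMEDIATE                   # gate_proj, up_proj, down_proj
    norms = 2 * HIDDEN                                # two RMSNorm weight vectors per layer
    embeddings = 2 * VOCAB * HIDDEN                   # embed_tokens + untied lm_head
    return N_LAYERS * (attn + mlp + norms) + embeddings + HIDDEN  # + final RMSNorm


BYTES_PER_PARAM = {"float32": 4, "float16/bfloat16": 2, "int8": 1, "int4": 0.5}
n_params = llama_param_count()
print(f"parameters: {n_params:,}")
for dtype, nbytes in BYTES_PER_PARAM.items():
    print(f"{dtype:>16}: {n_params * nbytes / 1024**3:6.2f} GiB for the weights alone")
```

In float16/bfloat16 the weights alone take about 15 GiB before activations, gradients or optimizer state, so `device_map="auto"` below is useful: it can spread the layers over more than one GPU.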

In [None]:
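import torch

# Use bfloat16 where the GPU supports it, otherwise fall back to float16
compute_dtype = torch.bfloat16 if torch.cuda.is_bf16_supported() else torch.float16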

In [None]:
from transformers import AutoTokenizer


tokenizer=AutoTokenizer.from_pretrained(os.getenv("MODEL_NAME"))

In [None]:
tokenizer.pad_token="<|eot_id|>"
tokenizer.pad_token_id=128009
tokenizer.padding_side="left"

# Loading data

In [None]:
from datasets import load_dataset

# load the dataset (train and test splits)
ds=load_dataset(os.getenv("DATASET"))
ds

In [None]:
import multiprocessing

# append Llama 3's end-of-text (EOS) token so the model learns where a response ends
def pre_process(x):
    x["text"]=x["text"]+"<|end_of_text|>"
    return x

ds=ds.map(pre_process, num_proc=multiprocessing.cpu_count(), load_from_cache_file=False)
ds
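Appending the EOS token only helps if it survives label masking. Assuming `SFTTrainer` falls back to `DataCollatorForLanguageModeling(mlm=False)`, which copies `input_ids` into `labels` and sets every position equal to the pad token id to -100, choosing `<|eot_id|>` (128009) rather than `<|end_of_text|>` (128001) as the pad token matters. The pure-Python sketch below mimics left padding and that masking rule; `left_pad` and `causal_lm_labels` are hypothetical helpers, and the token ids other than the special tokens are made up.

```python
PAD_ID = 128009  # "<|eot_id|>", set as pad token earlier in this notebook
EOS_ID = 128001  # "<|end_of_text|>", appended by pre_process
BOS_ID = 128000  # "<|begin_of_text|>"


def left_pad(batch, pad_id):
    """Pad every sequence on the left to the longest length; return ids and attention masks."""
    width = max(len(seq) for seq in batch)
    ids = [[pad_id] * (width - len(seq)) + seq for seq in batch]
    masks = [[0] * (width - len(seq)) + [1] * len(seq) for seq in batch]
    return ids, masks


def causal_lm_labels(ids, pad_id):
    """Copy inputs to labels and ignore (-100) every position equal to the pad id."""
    return [[-100 if tok == pad_id else tok for tok in seq] for seq in ids]


# Two hypothetical tokenized examples, each ending with the appended EOS
batch = [[BOS_ID, 1001, 1002, EOS_ID], [BOS_ID, 1003, EOS_ID]]

ids, masks = left_pad(batch, PAD_ID)
labels = causal_lm_labels(ids, PAD_ID)  # EOS stays in the loss

# If the pad token were "<|end_of_text|>", the EOS itself would be masked
bad_labels = causal_lm_labels(left_pad(batch, EOS_ID)[0], EOS_ID)
print(labels[1], bad_labels[1])
```

With the EOS id used as padding, the appended EOS would be dropped from the loss and the model would not be trained to emit it.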

# Loading model

In [None]:
from transformers import AutoModelForCausalLM

model=AutoModelForCausalLM.from_pretrained(
    os.getenv("MODEL_NAME"), 
    device_map="auto",
    torch_dtype=compute_dtype,
    # optionally: attn_implementation="flash_attention_2" (needs the flash-attn package and a supported GPU)
)
model

In [None]:
model.device

In [None]:
def print_trainable_parameters(model):
    trainable_params=0
    all_params=0
    for _, param in model.named_parameters():
        all_params+=param.numel()
        if param.requires_grad:
            trainable_params+=param.numel()
    print(f"trainable params: {trainable_params} || all params: {all_params} || trainable%: {100 * trainable_params/all_params:.2f}")

print_trainable_parameters(model)

In [None]:
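# Recompute activations during the backward pass to save memory
model.gradient_checkpointing_enable()
# The base model is frozen, so make its input embeddings require grad;
# otherwise gradients cannot flow through the checkpointed blocks to the adapter
model.enable_input_require_grads()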

In [None]:
import torch
from transformers import TrainingArguments
from trl import SFTTrainer
from peft import VeraConfig

peft_config=VeraConfig(
    vera_dropout=0.05,
    r=512,  # rank of the shared, frozen random matrices A and B
    bias="none",
    task_type="CAUSAL_LM",
    target_modules=["gate_proj","up_proj"]  # same shape in every layer, as VeRA requires
)

# trl 0.8.6 has no SFTConfig: training options go into TrainingArguments,
# while dataset_text_field and max_seq_length are passed to SFTTrainer
training_arguments=TrainingArguments(
    output_dir=os.getenv("WANDB_NAME"),
    evaluation_strategy="steps",
    optim="paged_adamw_8bit",
    per_device_train_batch_size=2,
    gradient_accumulation_steps=16,
    per_device_eval_batch_size=2,
    log_level="debug",
    save_strategy="epoch",
    logging_steps=100,
    learning_rate=1e-4,
    fp16=not torch.cuda.is_bf16_supported(),  # fp16 and bf16 cannot both be enabled
    bf16=torch.cuda.is_bf16_supported(),
    eval_steps=100,
    num_train_epochs=3,
    warmup_ratio=0.1,
    lr_scheduler_type="linear",
    report_to="wandb",
    run_name=os.getenv('WANDB_NAME')
)

trainer=SFTTrainer(
    model=model,
    train_dataset=ds["train"],
    eval_dataset=ds["test"],
    peft_config=peft_config,
    dataset_text_field="text",
    max_seq_length=512,
    tokenizer=tokenizer,
    args=training_arguments
)

trainer.train()
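With a single process (`device_map="auto"` splits the model across GPUs inside one process), the arguments above give an effective batch of 2 × 16 = 32 sequences per optimizer step. The sketch below approximates how the Hugging Face Trainer turns these arguments into step counts and the linear warm-up/decay learning-rate schedule. The helper names are hypothetical, packing is assumed to be off, and the 9,846-example train size is an assumption, so substitute `len(ds["train"])`.

```python
import math


def training_schedule(n_examples, per_device_bs, grad_accum, epochs, warmup_ratio, world_size=1):
    """Approximate optimizer-step counts derived from TrainingArguments (no packing, no drop_last)."""
    batches_per_epoch = math.ceil(n_examples / (per_device_bs * world_size))
    steps_per_epoch = max(batches_per_epoch // grad_accum, 1)
    total_steps = math.ceil(epochs * steps_per_epoch)
    warmup_steps = math.ceil(total_steps * warmup_ratio)
    return steps_per_epoch, total_steps, warmup_steps


def linear_lr(step, base_lr, warmup_steps, total_steps):
    """Linear warm-up to base_lr, then linear decay to zero (lr_scheduler_type="linear")."""
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    return base_lr * max(0.0, (total_steps - step) / max(1, total_steps - warmup_steps))


# Values from the TrainingArguments above; 9_846 is an assumed train-split size
steps_per_epoch, total_steps, warmup_steps = training_schedule(9_846, 2, 16, 3, 0.1)
print(steps_per_epoch, total_steps, warmup_steps)
for step in (0, warmup_steps // 2, warmup_steps, total_steps // 2, total_steps):
    print(step, f"{linear_lr(step, 1e-4, warmup_steps, total_steps):.2e}")
```

With `logging_steps=100` and `eval_steps=100`, this schedule means a loss log and an evaluation roughly every 100 of those optimizer steps.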

# Credit
* https://towardsdatascience.com/fine-tune-tiny-adapters-for-llama-3-with-vera-7c48f4391d84
* https://arxiv.org/abs/2310.11454
* https://www.kaggle.com/code/aisuko/fine-tune-llama3-with-orpo