<a href="https://colab.research.google.com/github/SmallChungus1/screen_assist/blob/main/visualwebbenchmark_qlora.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

Refs:

* QLoRA example notebook: https://colab.research.google.com/drive/1VoYNfYDKcKRQRor98Zbf2-9VQTtGJ24k?usp=sharing#scrollTo=E0Nl5mWL0k2T
* LoRA SmolVLM example: https://huggingface.co/learn/cookbook/en/fine_tuning_smol_vlm_sft_trl
* QLoRA HF blog: https://huggingface.co/blog/4bit-transformers-bitsandbytes
* VisualWebBench paper: https://arxiv.org/pdf/2404.05955
* Moondream VLM HF: https://huggingface.co/vikhyatk/moondream2

In [20]:
!pip install -q -U bitsandbytes
!pip install -q -U git+https://github.com/huggingface/transformers.git
!pip install -q -U git+https://github.com/huggingface/peft.git
!pip install -q -U git+https://github.com/huggingface/accelerate.git
!pip install -q -U git+https://github.com/huggingface/trl.git
!pip install -q datasets
!pip install flash-attn --no-build-isolation



In [31]:
from datasets import load_dataset
import torch
from transformers import Idefics3ForConditionalGeneration, AutoProcessor, BitsAndBytesConfig
from peft import prepare_model_for_kbit_training, LoraConfig, get_peft_model
from trl import SFTConfig, SFTTrainer
from PIL import Image
DEVICE = "cuda" if torch.cuda.is_available() else "cpu"

# Mount Google Drive so checkpoints can be written to it
from google.colab import drive
drive.mount('/content/drive')

Drive already mounted at /content/drive; to attempt to forcibly remount, call drive.mount("/content/drive", force_remount=True).


In [25]:
#free up gpu ram for reloading models
import gc
import time

def clear_memory():
    # Drop references to the large objects if they exist in the global scope
    for name in ("inputs", "model", "processor", "trainer", "peft_model", "bnb_config"):
        globals().pop(name, None)
    time.sleep(2)

    # Garbage collection and clearing CUDA memory
    gc.collect()
    time.sleep(2)
    torch.cuda.empty_cache()
    torch.cuda.synchronize()
    time.sleep(2)
    gc.collect()
    time.sleep(2)

    print(f"GPU allocated memory: {torch.cuda.memory_allocated() / 1024**3:.2f} GB")
    print(f"GPU reserved memory: {torch.cuda.memory_reserved() / 1024**3:.2f} GB")


In [12]:
vbench_webqa_ds = load_dataset("visualwebbench/VisualWebBench", "webqa")['test']
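# random 80/20 train/validation split carved out of the 'test' split loaded above;
# the seed is fixed so the split is reproducible across runs
splits = vbench_webqa_ds.train_test_split(test_size=0.2, seed=42)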
vbench_webqa_train = splits["train"]
vbench_webqa_val = splits["test"]

In [13]:
vbench_webqa_train

Dataset({
    features: ['id', 'task_type', 'website', 'image', 'image_size', 'question', 'answer'],
    num_rows: 251
})

In [14]:
def format_data(sample):
    return {
        "images": [sample["image"]],    # actual image data (PIL or array)
        "messages": [
            {
                "role": "user",
                "content": [
                    {"type": "image"},  # image placeholder, NOT the pixels
                    {"type": "text", "text": sample["question"]}
                ]
            },
            {
                "role": "assistant",
                "content": [
                    {"type": "text", "text": sample["answer"]}
                ]
            }
        ]
    }

In [15]:
train_dataset = [format_data(sample) for sample in vbench_webqa_train]
val_dataset = [format_data(sample) for sample in vbench_webqa_val]
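
To make the target structure concrete, here is a minimal sketch that runs `format_data` on a made-up row. The `sample` values are hypothetical, and the image is a placeholder string rather than the PIL image the real dataset provides.

```python
def format_data(sample):
    # Same structure as the cell above: raw image under "images",
    # chat turns under "messages" with an image placeholder entry.
    return {
        "images": [sample["image"]],
        "messages": [
            {
                "role": "user",
                "content": [
                    {"type": "image"},
                    {"type": "text", "text": sample["question"]},
                ],
            },
            {
                "role": "assistant",
                "content": [{"type": "text", "text": sample["answer"]}],
            },
        ],
    }

# Hypothetical row; a real VisualWebBench row carries a PIL image here.
sample = {
    "image": "<PIL image placeholder>",
    "question": "What is the title of this page?",
    "answer": "Example Domain",
}

record = format_data(sample)
roles = [message["role"] for message in record["messages"]]
print(roles)
print(record["messages"][0]["content"])
```

The user turn holds only a placeholder for the image; the pixels travel separately in `images`, in the same order as the placeholders.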

In [26]:
model_id = "HuggingFaceTB/SmolVLM-Instruct"
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_use_double_quant=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

model = Idefics3ForConditionalGeneration.from_pretrained(
    model_id,
    device_map="auto",
    quantization_config=bnb_config,  # without this the 4-bit config above is never applied
    torch_dtype=torch.bfloat16,
    attn_implementation="flash_attention_2" if DEVICE == "cuda" else "eager",
)

processor = AutoProcessor.from_pretrained(model_id)

Loading weights:   0%|          | 0/657 [00:00<?, ?it/s]
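
As a rough back-of-the-envelope check on what the `bnb_config` settings would imply for weight memory, the sketch below compares bf16 storage with 4-bit storage. It takes the parameter counts printed in the LoRA output further down, and it assumes (optimistically) that every base weight is quantized and ignores the quantization constants, so the 4-bit figure is a lower bound rather than a measurement.

```python
# Figures taken from the "trainable params || all params" output in this notebook.
all_params = 2_257_542_128
adapter_params = 11_269_248
base_params = all_params - adapter_params

GIB = 1024 ** 3

# bf16 stores 2 bytes per weight; NF4 stores 4 bits (0.5 bytes) per weight.
bf16_gib = base_params * 2 / GIB
nf4_gib = base_params * 0.5 / GIB  # assumption: all base weights quantized, no overhead

print(f"bf16 weights: {bf16_gib:.2f} GiB")
print(f"4-bit weights (lower bound): {nf4_gib:.2f} GiB")
```

In practice embeddings, norms, and any layers left unquantized stay in higher precision, so the real footprint sits somewhere between the two numbers.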

In [27]:
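# Prepare the model for k-bit training: prepare_model_for_kbit_training freezes the base
# weights and upcasts the remaining half-precision parameters to fp32 for stability;
# gradient checkpointing trades extra compute for lower activation memory.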
model.gradient_checkpointing_enable()
model = prepare_model_for_kbit_training(model)

In [28]:
def print_trainable_parameters(model):
    """
    Prints the number of trainable parameters in the model.
    """
    trainable_params = 0
    all_param = 0
    for _, param in model.named_parameters():
        all_param += param.numel()
        if param.requires_grad:
            trainable_params += param.numel()
    print(
        f"trainable params: {trainable_params} || all params: {all_param} || trainable%: {100 * trainable_params / all_param}"
    )

LoRA Setup

In [29]:
peft_config = LoraConfig(
    r=8,
    lora_alpha=8,
    lora_dropout=0.1,
    target_modules=['down_proj','o_proj','k_proj','q_proj','gate_proj','up_proj','v_proj'],
    use_dora=True,
    init_lora_weights="gaussian"
)
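
For intuition about what `r` and `lora_alpha` control, here is a minimal pure-Python sketch of a plain LoRA update on one linear layer. The layer sizes are hypothetical and tiny, the Gaussian scale for `A` is an assumption, and the sketch covers only the low-rank update, not the magnitude/direction decomposition that `use_dora=True` adds.

```python
import random

rng = random.Random(0)
d_in, d_out = 64, 48        # hypothetical layer sizes, far smaller than the real model
r, lora_alpha = 8, 8        # values from the LoraConfig above
scaling = lora_alpha / r    # 1.0 with these settings

def matvec(matrix, vector):
    """Multiply a list-of-rows matrix by a vector."""
    return [sum(w * v for w, v in zip(row, vector)) for row in matrix]

# Frozen base weight W (d_out x d_in).
W = [[rng.gauss(0.0, 0.02) for _ in range(d_in)] for _ in range(d_out)]
# Trainable low-rank factors: A (r x d_in) starts Gaussian (scale assumed here),
# B (d_out x r) starts at zero so the adapter is a no-op before training.
A = [[rng.gauss(0.0, 1.0 / r) for _ in range(d_in)] for _ in range(r)]
B = [[0.0] * r for _ in range(d_out)]

def lora_forward(x):
    # y = W x + (lora_alpha / r) * B (A x)
    base = matvec(W, x)
    update = matvec(B, matvec(A, x))
    return [b + scaling * u for b, u in zip(base, update)]

x = [rng.uniform(-1.0, 1.0) for _ in range(d_in)]
trainable = r * (d_in + d_out)   # parameters in A and B
frozen = d_in * d_out            # parameters in W

print("adapter is a no-op at init:", lora_forward(x) == matvec(W, x))
print("trainable:", trainable, "frozen:", frozen)
```

Because `B` starts at zero, the adapted layer initially reproduces the base layer exactly; training then moves only the `r * (d_in + d_out)` adapter weights per targeted module.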

In [30]:
# Apply PEFT model adaptation
peft_model = get_peft_model(model, peft_config)

# Print trainable parameters
peft_model.print_trainable_parameters()

trainable params: 11,269,248 || all params: 2,257,542,128 || trainable%: 0.4992


In [23]:
#clear_memory()

GPU allocated memory: 0.01 GB
GPU reserved memory: 0.02 GB


In [36]:
training_args = SFTConfig(
    output_dir="/content/drive/MyDrive/!personalMLProject/screen_qa/smolVlm_webbench_qlora",
    num_train_epochs=1,
    per_device_train_batch_size=4,
    gradient_accumulation_steps=4,
    warmup_steps=50,
    learning_rate=1e-4,
    weight_decay=0.01,
    logging_steps=25,
    save_strategy="steps",
    save_steps=25,
    save_total_limit=1,
    optim="adamw_torch_fused",
    bf16=True,
    push_to_hub=False,
    report_to="none",
    max_length=None
)
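
It can help to check how many optimizer steps these settings actually produce. The sketch below does the arithmetic for the 251 training rows shown earlier, assuming a single GPU and that the last partial batch is kept; the exact rounding inside the trainer may differ by a step depending on the library version.

```python
import math

num_train_examples = 251          # num_rows of vbench_webqa_train
per_device_train_batch_size = 4
gradient_accumulation_steps = 4
num_train_epochs = 1
warmup_steps = 50
logging_steps = 25
save_steps = 25

# Assumption: one device, last partial batch not dropped.
batches_per_epoch = math.ceil(num_train_examples / per_device_train_batch_size)
optimizer_steps = math.ceil(batches_per_epoch / gradient_accumulation_steps) * num_train_epochs

print("batches per epoch:", batches_per_epoch)
print("optimizer steps:", optimizer_steps)
print("warmup longer than training:", warmup_steps > optimizer_steps)
print("logging interval reached:", logging_steps <= optimizer_steps)
print("save interval reached:", save_steps <= optimizer_steps)
```

If this estimate holds, a 50-step warmup never finishes within the single epoch, and the 25-step logging and checkpoint intervals are never reached, so lowering `warmup_steps`, `logging_steps`, and `save_steps` (or training for more epochs) is worth considering.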

In [37]:
trainer = SFTTrainer(
    model=peft_model,  # already wrapped by get_peft_model, so peft_config is not passed again
    args=training_args,
    train_dataset=train_dataset,
    eval_dataset=val_dataset,
    processing_class=processor,
)

The model is already on multiple devices. Skipping the move to device specified in `args`.


In [None]:
trainer.train()

The tokenizer has new PAD/BOS/EOS tokens that differ from the model config and generation config. The model config and generation config were aligned accordingly, being updated with the tokenizer's values. Updated tokens: {'eos_token_id': 49154, 'bos_token_id': 1, 'pad_token_id': 2}.
Casting fp32 inputs back to torch.bfloat16 for flash-attn compatibility.


Step,Training Loss


In [None]:
trainer.save_model(training_args.output_dir)