# Setting up environment

Check CUDA version

In [1]:
!nvidia-smi

Tue Apr  1 08:54:04 2025       
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 560.35.03              Driver Version: 560.35.03      CUDA Version: 12.6     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|   0  Tesla T4                       Off |   00000000:00:04.0 Off |                    0 |
| N/A   46C    P8             10W /   70W |       1MiB /  15360MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
|   1  Tesla T4                       Off |   00

Clone GitHub repo

In [6]:
!git clone --depth 1 https://github.com/hiyouga/LLaMA-Factory.git

Cloning into 'LLaMA-Factory'...
remote: Enumerating objects: 348, done.
remote: Counting objects: 100% (348/348), done.
remote: Compressing objects: 100% (289/289), done.
remote: Total 348 (delta 82), reused 163 (delta 44), pack-reused 0 (from 0)
Receiving objects: 100% (348/348), 9.53 MiB | 32.86 MiB/s, done.
Resolving deltas: 100% (82/82), done.


Change directory

In [7]:
%cd LLaMA-Factory

/kaggle/working/LLaMA-Factory


Install packages

In [8]:
!pip install -e ".[torch,metrics]"
!pip install deepspeed triton
!pip install flash-attn --no-build-isolation

Obtaining file:///kaggle/working/LLaMA-Factory
  Installing build dependencies ... done
  Checking if build backend supports build_editable ... done
  Getting requirements to build editable ... done
  Preparing editable metadata (pyproject.toml) ... done
Collecting transformers!=4.46.*,!=4.47.*,!=4.48.0,<=4.50.0,>=4.41.2 (from llamafactory==0.9.3.dev0)
  Downloading transformers-4.50.0-py3-none-any.whl.metadata (39 kB)
Collecting trl<=0.9.6,>=0.8.6 (from llamafactory==0.9.3.dev0)
  Downloading trl-0.9.6-py3-none-any.whl.metadata (12 kB)
Collecting gradio<=5.21.0,>=4.38.0 (from llamafactory==0.9.3.dev0)
  Downloading gradio-5.21.0-py3-none-any.whl.metadata (16 kB)
Collecting uvicorn (from llamafactory==0.9.3.dev0)
  Downloading uvicorn-0.34.0-py3-none-any.whl.metadata (6.5 kB)
Collecting fastapi (from llamafactory==0.9.3.dev0)
  Downloading fastapi-0.115.12-py3-none-any.whl.metadata (27 kB)
Collecting sse-starlette (from llamafactory==0.9.

# Loading Dataset

Import packages

In [9]:
from datasets import load_dataset, concatenate_datasets, Value
import json

Load video mapping

In [10]:
with open("/kaggle/input/d/seanjeanmoey/next-qa-dataset/map_vid_vidorID.json") as file:
    video_dir_map = json.load(file)

Format data

In [11]:
def format_data(sample):
    return {
        "messages": [
            {
                "content": f"<video>{sample['question']}",
                "role": "user"
            },
            {
                "content": f"{sample['answer']}",
                "role": "assistant"
            }
        ],
        "videos": [
            f"/kaggle/input/d/seanjeanmoey/next-qa-dataset/NExTVideo/NExTVideo/{video_dir_map[sample['video']]}.mp4"
        ]
    }
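To make the record shape concrete, the sketch below applies the same formatting to one made-up sample. The `VIDEO_ROOT` constant, the mapping entry, and the sample values are hypothetical stand-ins for illustration only; they are not taken from the dataset.

```python
# Hypothetical stand-ins: the mapping entry and the sample values are made up
# for illustration and do not come from the NExT-QA files.
VIDEO_ROOT = "/kaggle/input/d/seanjeanmoey/next-qa-dataset/NExTVideo/NExTVideo"
video_dir_map = {"123": "0000/123"}


def format_data(sample):
    """Turn one QA sample into a messages/videos record."""
    return {
        "messages": [
            # The <video> placeholder marks where the clip belongs in the prompt.
            {"content": f"<video>{sample['question']}", "role": "user"},
            {"content": f"{sample['answer']}", "role": "assistant"},
        ],
        "videos": [f"{VIDEO_ROOT}/{video_dir_map[sample['video']]}.mp4"],
    }


example = format_data(
    {"video": "123", "question": "what is the dog doing", "answer": "running"}
)
print(example["messages"][0]["content"])
print(example["videos"][0])
```

Each record carries exactly one `<video>` placeholder and one video path, so the two stay aligned.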

Format MCQ

In [12]:
def reformat_mcq(sample):
    choice_labels = ["A", "B", "C", "D", "E"]
    choices = [sample[f"a{i}"] for i in range(5)]
    formatted_choices = "\n".join([f"{choice_labels[i]}. {choice}" for i, choice in enumerate(choices)])
    
    return {
        "video": sample["video"],
        "frame_count": sample["frame_count"],
        "width": sample["width"],
        "height": sample["height"],
        "question": f"{sample['question']}\n{formatted_choices}\nSelect one best answer to the above multiple-choice question based on the video. Respond with only the letter (A, B, C, D or E) of the correct option.",
        "answer": choice_labels[sample["answer"]],
        "qid": sample["qid"],
        "type": sample["type"],
        "additional_ref_answer": None
    }
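As a quick illustration of what this reformatting produces, the sketch below builds the prompt and answer letter for one made-up question. The helper name `build_mcq_prompt` and the sample values are hypothetical; only the labels and the instruction sentence follow the cell above.

```python
CHOICE_LABELS = ["A", "B", "C", "D", "E"]
INSTRUCTION = (
    "Select one best answer to the above multiple-choice question based on the video. "
    "Respond with only the letter (A, B, C, D or E) of the correct option."
)


def build_mcq_prompt(question, choices):
    """Join the question, the lettered choices, and the answering instruction."""
    lettered = "\n".join(
        f"{label}. {choice}" for label, choice in zip(CHOICE_LABELS, choices)
    )
    return f"{question}\n{lettered}\n{INSTRUCTION}"


# Made-up sample: the answer is stored as an index into the five choices.
sample = {
    "question": "why is the boy smiling",
    "choices": ["he is tired", "he is cold", "he got a gift", "he is lost", "he is hungry"],
    "answer": 2,
}

prompt = build_mcq_prompt(sample["question"], sample["choices"])
answer_letter = CHOICE_LABELS[sample["answer"]]
print(prompt)
print(answer_letter)
```

The integer answer index becomes a single letter, so the multiple-choice targets are one-character strings while the open-ended targets remain free text.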

Load dataset

In [13]:
dataset_id = 'lmms-lab/NExTQA'

mcq_dataset = load_dataset(dataset_id, 'MC')['test'].map(reformat_mcq, remove_columns=['a0', 'a1', 'a2', 'a3', 'a4'])
new_features = mcq_dataset.features.copy()
new_features["video"] = Value("string")
new_features["frame_count"] = Value("int32")
new_features["width"] = Value("int32")
new_features["height"] = Value("int32")
new_features["qid"] = Value("int32")
mcq_dataset = mcq_dataset.cast(new_features)
train_test_split = mcq_dataset.train_test_split(test_size=0.3, seed=42)
val_test_split = train_test_split['test'].train_test_split(test_size=2/3, seed=42)
mcq_train_dataset = train_test_split['train']
mcq_eval_dataset = val_test_split['train']
mcq_test_dataset = val_test_split['test']

oe_train_dataset, oe_eval_dataset, oe_test_dataset = load_dataset(dataset_id, 'OE', split=['train', 'validation', 'test'])

train_dataset = concatenate_datasets([mcq_train_dataset, oe_train_dataset])
eval_dataset = concatenate_datasets([mcq_eval_dataset, oe_eval_dataset])
test_dataset = concatenate_datasets([mcq_test_dataset, oe_test_dataset])

train_dataset = [format_data(sample) for sample in train_dataset]
eval_dataset = [format_data(sample) for sample in eval_dataset]
test_dataset = [format_data(sample) for sample in test_dataset]

dataset = train_dataset + eval_dataset + test_dataset

README.md:   0%|          | 0.00/1.44k [00:00<?, ?B/s]

test-00000-of-00001.parquet:   0%|          | 0.00/899k [00:00<?, ?B/s]

Generating test split:   0%|          | 0/8564 [00:00<?, ? examples/s]

Map:   0%|          | 0/8564 [00:00<?, ? examples/s]

Casting the dataset:   0%|          | 0/8564 [00:00<?, ? examples/s]

train-00000-of-00001.parquet:   0%|          | 0.00/2.22M [00:00<?, ?B/s]

validation-00000-of-00001.parquet:   0%|          | 0.00/289k [00:00<?, ?B/s]

test-00000-of-00001.parquet:   0%|          | 0.00/572k [00:00<?, ?B/s]

Generating train split:   0%|          | 0/37523 [00:00<?, ? examples/s]

Generating validation split:   0%|          | 0/5343 [00:00<?, ? examples/s]

Generating test split:   0%|          | 0/9178 [00:00<?, ? examples/s]
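The two chained `train_test_split` calls above carve the multiple-choice data into roughly 70% train, 10% eval, and 20% test. The sketch below only works through that arithmetic; the example counts are approximate, since the exact sizes depend on how the library rounds each split.

```python
test_size_first = 0.3     # first split: 30% of the MC data is held out
test_size_second = 2 / 3  # second split: two thirds of the held-out part become test

train_frac = 1 - test_size_first
eval_frac = test_size_first * (1 - test_size_second)
test_frac = test_size_first * test_size_second

n_mcq = 8564  # size of the MC test split reported in the loading output
for name, frac in [("train", train_frac), ("eval", eval_frac), ("test", test_frac)]:
    # Counts are approximate; the library's rounding decides the exact sizes.
    print(f"{name}: {frac:.0%} (about {round(frac * n_mcq)} examples)")
```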

Create dataset info json

In [14]:
args = { 
    "nextqa": {
        "file_name": "nextqa.json",
        "formatting": "sharegpt",
        "columns": {
            "messages": "messages",
            "videos": "videos"
        },
        "tags": {
            "role_tag": "role",
            "content_tag": "content",
            "user_tag": "user",
            "assistant_tag": "assistant"
        }
    }
}
with open("data/dataset_info.json", "w", encoding="utf-8") as f: 
    json.dump(args, f, ensure_ascii=False, indent=4)

Create dataset json

In [15]:
with open("data/nextqa.json", "w", encoding="utf-8") as f: 
    json.dump(dataset, f, ensure_ascii=False, indent=4)
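Before training, it can help to sanity-check that the records match the layout declared in the dataset info. The sketch below is a hypothetical helper, not part of LLaMA-Factory; it assumes that turns alternate between user and assistant and that the number of `<video>` placeholders should equal the number of listed videos.

```python
def validate_record(record):
    """Return a list of problems found in one record (empty when it looks fine)."""
    problems = []
    messages = record.get("messages", [])
    videos = record.get("videos", [])

    if not messages:
        problems.append("no messages")

    # Assumption: turns alternate user, assistant, user, assistant, ...
    for i, msg in enumerate(messages):
        expected_role = "user" if i % 2 == 0 else "assistant"
        if msg.get("role") != expected_role:
            problems.append(f"message {i}: expected role {expected_role!r}")

    # Assumption: one <video> placeholder per entry in "videos".
    n_placeholders = sum(msg.get("content", "").count("<video>") for msg in messages)
    if n_placeholders != len(videos):
        problems.append(
            f"{n_placeholders} <video> placeholder(s) but {len(videos)} video path(s)"
        )
    return problems


good = {
    "messages": [
        {"content": "<video>what is happening", "role": "user"},
        {"content": "a dog runs", "role": "assistant"},
    ],
    "videos": ["clip.mp4"],
}
bad = {"messages": [{"content": "no placeholder", "role": "user"}], "videos": ["clip.mp4"]}

print(validate_record(good))
print(validate_record(bad))
```

Running such a check over `dataset` before writing the file would surface malformed records early rather than during preprocessing.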

# Fine-tuning Model

Import packages

In [2]:
import json
import wandb
from kaggle_secrets import UserSecretsClient

Log in to wandb

In [3]:
wandb.login(key=UserSecretsClient().get_secret("WANDB_API_KEY"))

wandb: Using wandb-core as the SDK backend.  Please refer to https://wandb.me/wandb-core for more information.
wandb: Currently logged in as: seanjeanmoey123 (seanjeanmoey123-nus). Use `wandb login --relogin` to force relogin
wandb: Appending key for api.wandb.ai to your netrc file: /root/.netrc


True

Create fine-tuning config

In [16]:
args = {
    "model_name_or_path": "Qwen/Qwen2.5-VL-3B-Instruct",
    "image_max_pixels": 4096,
    "video_max_pixels": 4096,
    "trust_remote_code": True,
    "stage": "sft",
    "do_train": True,
    "finetuning_type": "lora",
    "deepspeed": "examples/deepspeed/ds_z3_offload_config.json",
    "use_fast_tokenizer": True,
    "lora_rank": 8,
    "lora_target": "all",
    "dataset": "nextqa",
    "template": "qwen2_vl",
    # "cutoff_len": 2048,
    "max_samples": 128,
    "overwrite_cache": True,
    "preprocessing_num_workers": 128,
    # "dataloader_num_workers": 4,
    "output_dir": "/kaggle/working/finetuned",
    "logging_steps": 10,
    "save_steps": 500,
    "plot_loss": True,
    "overwrite_output_dir": True,
    "save_only_model": False,
    "per_device_train_batch_size": 1,
    "gradient_accumulation_steps": 8,
    "learning_rate": 1.0e-4,
    "num_train_epochs": 3.0,
    "lr_scheduler_type": "cosine",
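    # Both warmup settings are given; the Hugging Face Trainer uses warmup_steps
    # when it is non-zero, so warmup_ratio has no effect here.
    "warmup_steps": 100,
    "weight_decay": 0.1,
    "warmup_ratio": 0.1,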
    "bf16": True,
    "ddp_timeout": 180000000,
    "resume_from_checkpoint": None,
    # "val_size": 0.1,
    # "per_device_eval_batch_size": 1,
    # "eval_strategy": "steps",
    # "eval_steps": 500,
}
with open("train.json", "w", encoding="utf-8") as f: 
    json.dump(args, f, ensure_ascii=False, indent=4)
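The batch settings above determine how many optimizer steps the run takes. The sketch below works through that arithmetic, assuming that training uses two GPUs (the training log reports 2 distributed tasks) and that `max_samples` caps the dataset at 128 examples.

```python
import math

max_samples = 128
per_device_train_batch_size = 1
gradient_accumulation_steps = 8
num_train_epochs = 3
warmup_steps = 100
num_gpus = 2  # assumption: one process per T4, as the training log suggests

# Examples consumed per optimizer step across all devices.
effective_batch = per_device_train_batch_size * gradient_accumulation_steps * num_gpus
steps_per_epoch = math.ceil(max_samples / effective_batch)
total_steps = steps_per_epoch * num_train_epochs

print(f"effective batch size: {effective_batch}")
print(f"optimizer steps per epoch: {steps_per_epoch}")
print(f"total optimizer steps: {total_steps}")
print(f"warmup longer than the whole run: {warmup_steps > total_steps}")
```

Under these assumptions the run takes 24 optimizer steps, which would be consistent with a final checkpoint named `checkpoint-24`. With `warmup_steps` set to 100, the learning rate would still be ramping up when training ends, so a smaller warmup may be worth considering for a run this short.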

Train model

In [17]:
!llamafactory-cli train train.json

2025-04-01 16:27:42.833337: E external/local_xla/xla/stream_executor/cuda/cuda_fft.cc:485] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered
2025-04-01 16:27:43.052643: E external/local_xla/xla/stream_executor/cuda/cuda_dnn.cc:8454] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered
2025-04-01 16:27:43.115424: E external/local_xla/xla/stream_executor/cuda/cuda_blas.cc:1452] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
[2025-04-01 16:27:56,226] [INFO] [real_accelerator.py:239:get_accelerator] Setting ds_accelerator to cuda (auto detect)
[INFO|2025-04-01 16:28:02] llamafactory.cli:143 >> Initializing 2 distributed tasks at: 127.0.0.1:40901
W0401 16:28:04.229000 337 torch/distributed/run.py:793] 
W0401 16:28:04.229000 337 torch/distributed/run.py:793] ****************************

# Merging Fine-tuned Model

Import packages

In [18]:
import json

Create merging config

In [20]:
args = {
    "model_name_or_path": "Qwen/Qwen2.5-VL-3B-Instruct",
    "adapter_name_or_path": "/kaggle/working/finetuned",
    "template": "qwen2_vl",
    "finetuning_type": "lora",
    "trust_remote_code": True,
    "export_dir": "/kaggle/working/merged",
    "export_size": 5,
    "export_device": "cpu",
    "export_legacy_format": False,
}
with open("merge.json", "w", encoding="utf-8") as f: 
    json.dump(args, f, ensure_ascii=False, indent=4)
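Conceptually, merging folds each low-rank LoRA update back into the base weight it adapts, so the exported model no longer needs the adapter. The sketch below illustrates that idea on a tiny matrix in pure Python, using the standard LoRA formulation W' = W + (alpha / r) * B A. It is an illustration under that assumption, not LLaMA-Factory's export code, and all values are made up.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def merge_lora(weight, lora_a, lora_b, alpha, rank):
    """Return weight + (alpha / rank) * (lora_b @ lora_a)."""
    scale = alpha / rank
    delta = matmul(lora_b, lora_a)  # same shape as weight
    return [
        [w + scale * d for w, d in zip(w_row, d_row)]
        for w_row, d_row in zip(weight, delta)
    ]


# Toy example: a 2x2 base weight with a rank-1 adapter.
weight = [[1.0, 0.0], [0.0, 1.0]]
lora_a = [[1.0, 2.0]]    # shape (rank, in_features)
lora_b = [[0.5], [0.0]]  # shape (out_features, rank)

merged = merge_lora(weight, lora_a, lora_b, alpha=2, rank=1)
print(merged)  # [[2.0, 2.0], [0.0, 1.0]]
```

After merging, the adapter matrices are no longer needed at inference time, which is why the export writes a standalone model directory.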

Merge model

In [21]:
!llamafactory-cli export merge.json

2025-04-01 17:03:45.071478: E external/local_xla/xla/stream_executor/cuda/cuda_fft.cc:485] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered
2025-04-01 17:03:45.221905: E external/local_xla/xla/stream_executor/cuda/cuda_dnn.cc:8454] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered
2025-04-01 17:03:45.267249: E external/local_xla/xla/stream_executor/cuda/cuda_blas.cc:1452] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
[2025-04-01 17:03:51,487] [INFO] [real_accelerator.py:239:get_accelerator] Setting ds_accelerator to cuda (auto detect)
[INFO|tokenization_utils_base.py:2060] 2025-04-01 17:03:57,717 >> loading file vocab.json from cache at /root/.cache/huggingface/hub/models--Qwen--Qwen2.5-VL-3B-Instruct/snapshots/c747f21f03e7d0792c30766310bd7d8de17eeeb3/vocab.json
[INFO|tokeniz

Zip the output

In [30]:
%cd /kaggle/working
!7z a -r finetuned.zip finetuned
!7z a -r merged.zip merged

/kaggle/working

7-Zip [64] 16.02 : Copyright (c) 1999-2016 Igor Pavlov : 2016-05-21
p7zip Version 16.02 (locale=en_US.UTF-8,Utf16=on,HugeFiles=on,64 bits,4 CPUs Intel(R) Xeon(R) CPU @ 2.00GHz (50653),ASM,AES-NI)

Open archive: finetuned.zip
--
Path = finetuned.zip
Type = zip
Physical Size = 221819401

Scanning the drive:
  0M Sca        5 folders, 40 files, 272561333 bytes (260 MiB)

Updating archive: finetuned.zip

Items to compress: 45

 13% 13 U finetuned/checkpoint-24/global_s . rank_1_mp_rank_00_optim_states.