# Fine-Tune Qwen3-0.6B and Qwen3-1.7B on Kaggle

This notebook fine-tunes the Qwen3-0.6B and Qwen3-1.7B models on a BrainDrive Q&A dataset using Hugging Face `transformers`, `peft` (LoRA), and `trl`. It runs on Kaggle's T4 x2 GPUs, installs dependencies, fetches the base models from Hugging Face, merges the trained LoRA adapters into the base model, and packages the result for download. GGUF conversion is performed afterwards on a local machine.

## Prerequisites
- **Dataset**: Upload `braindrive_qa_dataset.jsonl` to Kaggle as a dataset (e.g., `/kaggle/input/braindrive-qa/braindrive_qa_dataset.jsonl`).
- **Kaggle Settings**: Enable GPU (T4 x2) in the notebook settings.
- **Output**: LoRA adapters, merged models, and zip archives are saved to `/kaggle/working/` for download.
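
Checking the dataset file before uploading it can save a failed run. The sketch below assumes each JSONL line is an object with a `messages` list of `{"role", "content"}` entries, which is the shape `apply_chat_template` usually expects. The function name `validate_chat_record` and the allowed role names are illustrative assumptions, not part of the notebook.

```python
import json

# Assumed role names; adjust if your dataset uses others.
ALLOWED_ROLES = ("system", "user", "assistant")

def validate_chat_record(line):
    """Return a list of problems found in one JSONL line (empty list means OK)."""
    try:
        record = json.loads(line)
    except json.JSONDecodeError as exc:
        return [f"invalid JSON: {exc.msg}"]
    if not isinstance(record, dict):
        return ["record is not a JSON object"]
    messages = record.get("messages")
    if not isinstance(messages, list) or not messages:
        return ["missing or empty 'messages' list"]
    problems = []
    for i, msg in enumerate(messages):
        if not isinstance(msg, dict):
            problems.append(f"message {i} is not an object")
            continue
        if msg.get("role") not in ALLOWED_ROLES:
            problems.append(f"message {i} has unexpected role {msg.get('role')!r}")
        if not isinstance(msg.get("content"), str):
            problems.append(f"message {i} has non-string content")
    return problems
```

Running this over every line of the file and printing the line numbers with a non-empty result is usually enough to catch malformed records early.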

## Steps
1. Install dependencies.
2. Fine-tune Qwen3-0.6B and Qwen3-1.7B with LoRA.
3. Merge the LoRA adapters into the base model.
4. Zip the merged model for download (GGUF conversion is done locally afterwards).

In [1]:
# Install dependencies
!pip install torch transformers peft trl datasets accelerate bitsandbytes numpy rich fsspec protobuf google-api-core huggingface_hub

# Install llama.cpp without dependencies to avoid protobuf conflicts
# !pip install git+https://github.com/ggerganov/llama.cpp.git --no-deps --no-cache-dir -q

# Verify GPU availability
import torch
print(f"CUDA available: {torch.cuda.is_available()}")
print(f"GPU count: {torch.cuda.device_count()}")
print(f"GPU name: {torch.cuda.get_device_name(0) if torch.cuda.is_available() else 'None'}")

Collecting trl
  Downloading trl-0.17.0-py3-none-any.whl.metadata (12 kB)
Collecting bitsandbytes
  Downloading bitsandbytes-0.45.5-py3-none-manylinux_2_24_x86_64.whl.metadata (5.0 kB)
Collecting nvidia-cudnn-cu12==9.1.0.70 (from torch)
  Downloading nvidia_cudnn_cu12-9.1.0.70-py3-none-manylinux2014_x86_64.whl.metadata (1.6 kB)
Collecting nvidia-cublas-cu12==12.4.5.8 (from torch)
  Downloading nvidia_cublas_cu12-12.4.5.8-py3-none-manylinux2014_x86_64.whl.metadata (1.5 kB)
Collecting nvidia-cufft-cu12==11.2.1.3 (from torch)
  Downloading nvidia_cufft_cu12-11.2.1.3-py3-none-manylinux2014_x86_64.whl.metadata (1.5 kB)
Collecting nvidia-curand-cu12==10.3.5.147 (from torch)
  Downloading nvidia_curand_cu12-10.3.5.147-py3-none-manylinux2014_x86_64.whl.metadata (1.5 kB)
Collecting nvidia-cusolver-cu12==11.6.1.9 (from torch)
  Downloading nvidia_cusolver_cu12-11.6.1.9-py3-none-manylinux2014_x86_64.whl.metadata (1.6 kB)
Collecting nvidia-cusparse-cu12==12.3.1.170 (from torch)
  Downloading nvidi

## Setup Logging and Imports

Configure logging to track progress and import required libraries.

In [2]:
import os
import logging
from logging.handlers import RotatingFileHandler
from transformers import AutoModelForCausalLM, AutoTokenizer, TrainingArguments, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model
from datasets import load_dataset
from trl import SFTTrainer
import torch

# Configure logging
logger = logging.getLogger(__name__)
logger.setLevel(logging.INFO)
formatter = logging.Formatter('%(asctime)s - %(levelname)s - %(message)s')

# Console handler
console_handler = logging.StreamHandler()
console_handler.setFormatter(formatter)
logger.addHandler(console_handler)

# File handler
os.makedirs('/kaggle/working/logs', exist_ok=True)
file_handler = RotatingFileHandler('/kaggle/working/logs/finetune_qwen3.log', maxBytes=10*1024*1024, backupCount=5)
file_handler.setFormatter(formatter)
logger.addHandler(file_handler)

2025-05-19 01:08:32.917232: E external/local_xla/xla/stream_executor/cuda/cuda_fft.cc:477] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered
E0000 00:00:1747616913.352057      35 cuda_dnn.cc:8310] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered
E0000 00:00:1747616913.476987      35 cuda_blas.cc:1418] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered


## Define Utility Functions

Define a function to check dataset token lengths and ensure compatibility with the model's max sequence length.

In [3]:
def check_dataset_lengths(dataset_path, model_name, max_seq_length):
    """Check token lengths in dataset to validate max_seq_length."""
    logger.info("Checking dataset token lengths")
    tokenizer = AutoTokenizer.from_pretrained(model_name)
    dataset = load_dataset("json", data_files=dataset_path)["train"]
    lengths = []
    for example in dataset:
        text = tokenizer.apply_chat_template(example["messages"], tokenize=False)
        length = len(tokenizer(text).input_ids)
        lengths.append(length)
    max_length = max(lengths)
    percentile_95 = sorted(lengths)[int(0.95 * len(lengths))]
    logger.info(f"Dataset token lengths: Max={max_length}, 95th percentile={percentile_95}")
    if max_length > max_seq_length:
        logger.warning(f"Some examples exceed max_seq_length={max_seq_length}. Consider increasing or truncating.")
    return max_length, percentile_95
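
The length check boils down to sorting the token counts and reading off a position. Below is a minimal standalone sketch of that arithmetic, assuming a plain list of integer lengths. The helper name `summarize_lengths` is hypothetical, the sample values are made up, and the percentile index uses integer arithmetic with a clamp rather than the exact expression in the cell above.

```python
def summarize_lengths(lengths, max_seq_length):
    """Summarize token lengths: max, approximate 95th percentile, and count over the limit."""
    if not lengths:
        raise ValueError("lengths must not be empty")
    ordered = sorted(lengths)
    # Integer arithmetic for the index, clamped to the last element.
    p95_index = min(len(ordered) - 1, (95 * len(ordered)) // 100)
    too_long = sum(1 for n in ordered if n > max_seq_length)
    return {"max": ordered[-1], "p95": ordered[p95_index], "too_long": too_long}

# Toy values for illustration only.
print(summarize_lengths([12, 40, 33, 7, 25], max_seq_length=30))
```

The `too_long` count shows how many examples would be affected if sequences were truncated at `max_seq_length`.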

## Fine-Tune Function

Define a reusable function that fine-tunes a model with LoRA and saves the resulting adapter and tokenizer. GGUF conversion is done later on a local machine.

In [4]:
from transformers import DataCollatorForLanguageModeling

def finetune_model(model_name, output_dir, dataset_path, max_seq_length=2048):
    """Fine-tune a model with LoRA and save it."""
    logger.info(f"Starting fine-tuning for {model_name}")

    # 4-bit quantization config
    # bnb_config = BitsAndBytesConfig(
    #     load_in_4bit=True,
    #     bnb_4bit_quant_type="nf4",
    #     bnb_4bit_compute_dtype=torch.float16,
    #     bnb_4bit_use_double_quant=True,
    # )

    # Load model and tokenizer
    logger.info(f"Loading model and tokenizer: {model_name}")
    model = AutoModelForCausalLM.from_pretrained(
        model_name,
        # quantization_config=bnb_config,
        device_map={'': torch.cuda.current_device()},  # Explicitly place on primary GPU
        torch_dtype=torch.float16,
    )
    tokenizer = AutoTokenizer.from_pretrained(model_name)

    # Configure LoRA
    logger.info("Configuring LoRA adapters")
    lora_config = LoraConfig(
        r=32,
        lora_alpha=64,
        target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
        lora_dropout=0.05,
        bias="none",
        task_type="CAUSAL_LM",
    )
    model = get_peft_model(model, lora_config)

    # Load dataset
    logger.info("Loading and preprocessing dataset")
    dataset = load_dataset("json", data_files=dataset_path)

    # Check token lengths (max_seq_length is only used for this warning; it is not passed to the trainer)
    max_length, percentile_95 = check_dataset_lengths(dataset_path, model_name, max_seq_length)

    # Format dataset
    def format_chat(example):
        return {"text": tokenizer.apply_chat_template(example["messages"], tokenize=False)}
    dataset = dataset.map(format_chat, num_proc=4)

    # Split dataset
    logger.info("Splitting dataset for training and evaluation")
    dataset = dataset["train"].train_test_split(test_size=0.1, seed=42)
    train_dataset = dataset["train"]
    eval_dataset = dataset["test"]
    logger.info(f"Dataset size: {len(train_dataset)} train, {len(eval_dataset)} eval")

    # Training arguments
    logger.info("Setting up training arguments")
    training_args = TrainingArguments(
        output_dir=output_dir,
        per_device_train_batch_size=1,
        per_device_eval_batch_size=1,
        gradient_accumulation_steps=16,
        learning_rate=2e-5,
        num_train_epochs=10,
        save_strategy="epoch",
        eval_strategy="epoch",
        logging_steps=10,
        fp16=True,
        optim="adamw_torch",
        warmup_ratio=0.1,
        lr_scheduler_type="cosine",
        report_to="none",
        save_total_limit=1,
        load_best_model_at_end=True,
        metric_for_best_model="eval_loss",
    )

    # Initialize data collator
    data_collator = DataCollatorForLanguageModeling(
        tokenizer=tokenizer,
        mlm=False,  # Causal LM, not masked LM
    )

    # Initialize trainer
    logger.info("Initializing SFTTrainer")
    trainer = SFTTrainer(
        model=model,
        train_dataset=train_dataset,
        eval_dataset=eval_dataset,
        data_collator=data_collator,
        args=training_args,
    )

    # Train
    logger.info("Starting training")
    trainer.train()

    # Save model
    logger.info(f"Saving fine-tuned model to {output_dir}")
    model.save_pretrained(output_dir)
    tokenizer.save_pretrained(output_dir)

    # Note: GGUF conversion moved to local machine
    logger.info(f"Model saved to {output_dir}. GGUF conversion to be performed locally.")

    return model, tokenizer
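
To get a feel for what `r=32` and `lora_alpha=64` mean, the sketch below counts the parameters LoRA adds and computes the standard `alpha / r` scaling. It assumes the usual LoRA layout of one `(r, d_in)` matrix and one `(d_out, r)` matrix per adapted linear layer, with no bias terms. The layer shapes in the example are hypothetical placeholders, not values read from the Qwen3 configs.

```python
def lora_param_count(rank, layer_shapes):
    """Count trainable LoRA parameters.

    Each adapted linear layer of shape (d_in, d_out) gets two matrices:
    A with shape (rank, d_in) and B with shape (d_out, rank).
    """
    return sum(rank * (d_in + d_out) for d_in, d_out in layer_shapes)

def lora_scaling(lora_alpha, rank):
    """Scaling applied to the low-rank update (standard LoRA: alpha / r)."""
    return lora_alpha / rank

# Hypothetical shapes for the four attention projections of ONE transformer block.
example_block = [(1024, 2048), (1024, 1024), (1024, 1024), (2048, 1024)]
per_block = lora_param_count(rank=32, layer_shapes=example_block)
print(per_block, lora_scaling(lora_alpha=64, rank=32))
```

Multiplying the per-block count by the number of transformer blocks gives the total for the model. After `get_peft_model`, calling `model.print_trainable_parameters()` reports the real figure.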

## Fine-Tune Qwen3-0.6B

Fine-tune the smaller Qwen3-0.6B model first.

In [7]:
# Dataset path (adjust to your Kaggle dataset path)
dataset_path = "/kaggle/input/braindrive-concierge-qa-pairs/braindrive_qa_dataset.jsonl"

# Fine-tune Qwen3-0.6B
model_06b, tokenizer_06b = finetune_model(
    model_name="Qwen/Qwen3-0.6B",
    output_dir="/kaggle/working/finetuned_qwen3-0.6B",
    dataset_path=dataset_path,
    max_seq_length=2048
)

2025-05-19 01:11:06,414 - INFO - Starting fine-tuning for Qwen/Qwen3-0.6B
2025-05-19 01:11:06,416 - INFO - Loading model and tokenizer: Qwen/Qwen3-0.6B
2025-05-19 01:11:07,947 - INFO - Configuring LoRA adapters
2025-05-19 01:11:08,181 - INFO - Loading and preprocessing dataset
2025-05-19 01:11:08,294 - INFO - Checking dataset token lengths
2025-05-19 01:11:09,776 - INFO - Dataset token lengths: Max=280, 95th percentile=156
2025-05-19 01:11:09,959 - INFO - Splitting dataset for training and evaluation
2025-05-19 01:11:09,963 - INFO - Dataset size: 1089 train, 121 eval
2025-05-19 01:11:09,964 - INFO - Setting up training arguments
2025-05-19 01:11:09,993 - INFO - Initializing SFTTrainer
No label_names provided for model class `PeftModelForCausalLM`. Since `PeftModel` hides base models input arguments, if label_names is not given, label_names can't be set automatically within `Trainer`. Note that empty label_names list will be used instead.
2025-05-19 01:11:11,020 - INFO - Starting traini

| Epoch | Training Loss | Validation Loss |
|---|---|---|
| 1 | 1.5532 | 2.426454 |
| 2 | 0.8826 | 1.576433 |
| 3 | 0.7302 | 1.490416 |
| 4 | 0.8013 | 1.449658 |
| 5 | 0.717 | 1.423204 |
| 6 | 0.7296 | 1.404234 |
| 7 | 0.6568 | 1.36547 |
| 8 | 0.7643 | 1.354662 |
| 9 | 0.6671 | 1.350945 |


2025-05-19 01:36:39,949 - INFO - Saving fine-tuned model to /kaggle/working/finetuned_qwen3-0.6B
2025-05-19 01:36:40,329 - INFO - Model saved to /kaggle/working/finetuned_qwen3-0.6B. GGUF conversion to be performed locally.
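
The batch size, gradient accumulation, and warmup settings translate into a concrete number of optimizer steps. The sketch below estimates them for the 1089 training examples reported in the log. It is an approximation only: the helper `training_schedule` is hypothetical, `num_devices=1` is an assumption (the trainer may spread batches over both T4s), and the library's own rounding can differ by a step.

```python
import math

def training_schedule(num_examples, per_device_batch, grad_accum,
                      num_devices, epochs, warmup_ratio):
    """Approximate optimizer-step counts for a run (rounding up per epoch)."""
    effective_batch = per_device_batch * grad_accum * num_devices
    steps_per_epoch = math.ceil(num_examples / effective_batch)
    total_steps = steps_per_epoch * epochs
    # Rounded estimate; the trainer's exact warmup rounding may differ.
    warmup_steps = round(total_steps * warmup_ratio)
    return {
        "effective_batch": effective_batch,
        "steps_per_epoch": steps_per_epoch,
        "total_steps": total_steps,
        "warmup_steps": warmup_steps,
    }

print(training_schedule(num_examples=1089, per_device_batch=1, grad_accum=16,
                        num_devices=1, epochs=10, warmup_ratio=0.1))
```

With `logging_steps=10`, a run of this size logs training loss only a handful of times per epoch, which is worth keeping in mind when comparing training and validation loss.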


## Fine-Tune Qwen3-1.7B

Fine-tune the larger Qwen3-1.7B model.

In [None]:
# Fine-tune Qwen3-1.7B
model_17b, tokenizer_17b = finetune_model(
    model_name="Qwen/Qwen3-1.7B",
    output_dir="/kaggle/working/finetuned_qwen3-1.7B",
    dataset_path=dataset_path,
    max_seq_length=2048
)

In [None]:
# Merge LoRA with base below!
# !zip -r /kaggle/working/finetuned_qwen3_models.zip /kaggle/working/finetuned_qwen3-0.6B /kaggle/working/finetuned_qwen3-1.7B
# print("Zipped models saved to /kaggle/working/finetuned_qwen3_models.zip")

In [8]:
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel
import torch
import os

# Paths
base_model_name = "Qwen/Qwen3-0.6B" # "Qwen/Qwen3-1.7B"
lora_adapter_path = "/kaggle/working/finetuned_qwen3-0.6B"
merged_model_path = "/kaggle/working/merged_qwen3-0.6B"

# Load base model and tokenizer
print("Loading base model...")
base_model = AutoModelForCausalLM.from_pretrained(
    base_model_name,
    torch_dtype=torch.float16,
    device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained(base_model_name)

# Load LoRA adapters
print("Loading LoRA adapters...")
model = PeftModel.from_pretrained(base_model, lora_adapter_path)

# Merge adapters with base model
print("Merging adapters...")
merged_model = model.merge_and_unload()

# Save merged model
print(f"Saving merged model to {merged_model_path}...")
os.makedirs(merged_model_path, exist_ok=True)
merged_model.save_pretrained(merged_model_path)
tokenizer.save_pretrained(merged_model_path)
print("Merged model saved.")

Loading base model...
Loading LoRA adapters...
Merging adapters...
Saving merged model to /kaggle/working/merged_qwen3-0.6B...
Merged model saved.
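
Conceptually, merging folds the low-rank update into each adapted weight matrix, so the adapter is no longer needed at inference time. The toy sketch below shows the standard LoRA identity `W' = W + (alpha / r) * B @ A` with tiny hand-written matrices in pure Python. It illustrates the math under that assumption and is not how `merge_and_unload()` is implemented internally.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def merge_lora(weight, lora_a, lora_b, lora_alpha, rank):
    """Return W + (alpha / r) * (B @ A) as a new matrix."""
    scale = lora_alpha / rank
    delta = matmul(lora_b, lora_a)
    return [[w + scale * d for w, d in zip(w_row, d_row)]
            for w_row, d_row in zip(weight, delta)]

def linear(weight, x):
    """Compute y = W x for a vector x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in weight]

# Toy example: 2x2 base weight, rank-1 adapter.
W = [[1.0, 0.0], [0.0, 1.0]]
A = [[1.0, 2.0]]        # shape (rank, d_in)
B = [[0.5], [0.25]]     # shape (d_out, rank)
x = [1.0, 1.0]

merged = merge_lora(W, A, B, lora_alpha=2, rank=1)

# Unmerged path: base output plus the scaled adapter output.
base_out = linear(W, x)
adapter_out = linear(B, linear(A, x))
unmerged = [b + 2.0 * a for b, a in zip(base_out, adapter_out)]

print(linear(merged, x), unmerged)
```

Both paths give the same output, which is why the merged checkpoint can be loaded as an ordinary model without `peft`.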


In [10]:
!zip -r /kaggle/working/merged_qwen3_0.6B.zip /kaggle/working/merged_qwen3-0.6B/*
!ls -dh /kaggle/working/merged_qwen3_0.6B.zip

/kaggle/working/merged_qwen3_0.6B.zip


In [12]:
from IPython.display import FileLink
print("If this takes you to a 404 page > take a smoke break and come back...\nFor whatever reason Kaggle takes its sweet time getting this ready to download")
FileLink('/kaggle/working/merged_qwen3_0.6B.zip')


If this takes you to a 404 page > take a smoke break and come back...
For whatever reason Kaggle takes its sweet time getting this ready to download
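
Once the zip is downloaded and extracted, the GGUF conversion happens on the local machine. The sketch below only builds the command line for llama.cpp's `convert_hf_to_gguf.py` script. It assumes a local checkout of llama.cpp with its Python requirements installed, and every path and the output file name are hypothetical placeholders.

```python
import sys
from pathlib import Path

def build_gguf_command(llama_cpp_dir, merged_model_dir, outfile, outtype="f16"):
    """Build the argument list for llama.cpp's Hugging Face to GGUF converter."""
    script = Path(llama_cpp_dir) / "convert_hf_to_gguf.py"
    return [
        sys.executable, str(script), str(merged_model_dir),
        "--outfile", str(outfile),
        "--outtype", outtype,
    ]

# Hypothetical local paths; replace them with your own.
cmd = build_gguf_command("llama.cpp", "merged_qwen3-0.6B", "qwen3-0.6b-merged-f16.gguf")
print(" ".join(cmd))

# To run the conversion locally:
# import subprocess
# subprocess.run(cmd, check=True)
```

Converting the merged model rather than the adapter directory keeps this step simple, since the converter then sees an ordinary Hugging Face checkpoint.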


In [2]:
# Delete a single file
# !rm /kaggle/working/finetuned_qwen3_models.zip

# Delete the entire directory
!rm -rf /kaggle/working/logs