# Multi-Task Fine-tuning of Qwen/Qwen3-14B with LoRA in Colab

This notebook demonstrates how to fine-tune the Qwen/Qwen3-14B model for multiple tasks using LoRA (Low-Rank Adaptation) and a custom data collator and trainer.

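**Note:** Qwen/Qwen3-14B is a large model. You will likely need a Colab Pro subscription with access to a high-memory GPU (e.g., an A100) to run this notebook successfully. The training configuration below uses bf16, which older GPUs such as the V100 do not support natively. Alternatively, replace "14B" with "0.6B" in the model name to run this demo on a smaller GPU.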

## 1. Setup Environment

Install the necessary libraries. Restart the runtime after installation if prompted.

Download the training dataset.

In [1]:
!pip install transformers datasets peft trl bitsandbytes accelerate safetensors torch tensorboard gdown

Collecting datasets
  Downloading datasets-3.6.0-py3-none-any.whl.metadata (19 kB)
Collecting trl
  Downloading trl-0.17.0-py3-none-any.whl.metadata (12 kB)
Collecting bitsandbytes
  Downloading bitsandbytes-0.45.5-py3-none-manylinux_2_24_x86_64.whl.metadata (5.0 kB)
Collecting dill<0.3.9,>=0.3.0 (from datasets)
  Downloading dill-0.3.8-py3-none-any.whl.metadata (10 kB)
Collecting xxhash (from datasets)
  Downloading xxhash-3.5.0-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (12 kB)
Collecting multiprocess<0.70.17 (from datasets)
  Downloading multiprocess-0.70.16-py311-none-any.whl.metadata (7.2 kB)
Collecting fsspec<=2025.3.0,>=2023.1.0 (from fsspec[http]<=2025.3.0,>=2023.1.0->datasets)
  Downloading fsspec-2025.3.0-py3-none-any.whl.metadata (11 kB)
Collecting nvidia-cuda-nvrtc-cu12==12.4.127 (from torch)
  Downloading nvidia_cuda_nvrtc_cu12-12.4.127-py3-none-manylinux2014_x86_64.whl.metadata (1.5 kB)
Collecting nvidia-cuda-runtime-cu12==12.4.127 (from torch)
  

In [2]:
import os
os.makedirs("dataset", exist_ok=True)

file_id = "1-RWIK5uMI0SRlnDfVYqLx2WRtzECNkA1"
output_path = f"dataset/trainset.json"

!gdown --id {file_id} -O {output_path}

print(f"✅ Dataset Download success to : {output_path}")

Downloading...
From: https://drive.google.com/uc?id=1-RWIK5uMI0SRlnDfVYqLx2WRtzECNkA1
To: /content/dataset/trainset.json
100% 8.64M/8.64M [00:00<00:00, 54.1MB/s]
✅ Dataset Download success to : dataset/trainset.json


## 2. Configuration

Set up the parameters for the fine-tuning process. These were originally passed as command-line arguments.

In [3]:
import os
import torch
import sys

SUBSET_VAL = "fine"
WEIGHT_BETA = 1.0
WEIGHT_GAMMA = 0.0

MAX_LEN = 2048  # Max sequence length for processing
MODEL_NAME = "Qwen/Qwen3-14B" # Model from Hugging Face Hub

# --- Dataset Path ---
# IMPORTANT: Update this path if your dataset is located elsewhere (e.g., Google Drive)
DATASET_FILE_PATH = f"dataset/trainset.json"
# Example for Google Drive: DATASET_FILE_PATH = f"/content/drive/MyDrive/datasets/trainset.json"

OUTPUT_DIR = f"{SUBSET_VAL}/" # Directory to save checkpoints and final model

# Create output directory if it doesn't exist
os.makedirs(OUTPUT_DIR, exist_ok=True)

print(f"Configuration:\n"
      f"  Subset: {SUBSET_VAL}\n"
      f"  Dataset file path: {DATASET_FILE_PATH}\n"
      f"  Weight Beta (explanation loss): {WEIGHT_BETA}\n"
      f"  Weight Gamma (hunk loss): {WEIGHT_GAMMA}\n"
      f"  Max length: {MAX_LEN}\n"
      f"  Model name: {MODEL_NAME}\n"
      f"  Output directory: {OUTPUT_DIR}")

Configuration:
  Subset: fine
  Dataset file path: dataset/trainset.json
  Weight Beta (explanation loss): 1.0
  Weight Gamma (hunk loss): 0.0
  Max length: 2048
  Model name: Qwen/Qwen3-14B
  Output directory: fine/
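
Since these values were originally command-line arguments, a script version of the notebook might parse them as in the sketch below. The flag names are hypothetical (the original script's argument names are not shown here); the defaults mirror the constants in the cell above.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    # Flag names are illustrative assumptions; defaults mirror the notebook constants.
    parser = argparse.ArgumentParser(description="Multi-task LoRA fine-tuning")
    parser.add_argument("--subset", default="fine")
    parser.add_argument("--weight-beta", type=float, default=1.0,
                        help="weight of the explanation loss")
    parser.add_argument("--weight-gamma", type=float, default=0.0,
                        help="weight of the hunk loss")
    parser.add_argument("--max-len", type=int, default=2048)
    parser.add_argument("--model-name", default="Qwen/Qwen3-14B")
    return parser


# An explicit argument list is parsed so the sketch also runs inside a notebook,
# where sys.argv holds Jupyter's own arguments.
args = build_parser().parse_args(
    ["--model-name", "Qwen/Qwen3-0.6B", "--weight-gamma", "0.5"]
)
print(args.subset, args.weight_beta, args.weight_gamma, args.max_len, args.model_name)
```

In a standalone script, calling `parse_args()` with no list would read the real command line instead.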


## 3. Load Dependencies

In [4]:
from datasets import load_dataset, DatasetDict, concatenate_datasets
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    BitsAndBytesConfig,
    TrainingArguments,
    pipeline,
    logging,
    DataCollatorForLanguageModeling,
    Trainer,
    Seq2SeqTrainer,
    Seq2SeqTrainingArguments
)
from safetensors.torch import save_model
import pandas as pd
from typing import Any, Dict, List, Optional, Tuple, Union
from peft import AutoPeftModelForCausalLM, LoraConfig, get_peft_model, prepare_model_for_kbit_training, PeftModel
from trl import SFTTrainer, SFTConfig
import torch.nn as nn
import numpy as np
import gc

logging.set_verbosity_info() # Set logging verbosity

## 4. Load Dataset

**Important:** The download cell in section 1 already places the dataset at `dataset/trainset.json`. To use your own dataset instead:
1. Create a directory named `dataset` in your Colab root.
2. Upload your JSON file into this `dataset` directory and make sure `DATASET_FILE_PATH` in the configuration cell matches its name.

Alternatively, if your dataset is on Google Drive, mount your drive and update `DATASET_FILE_PATH` in the configuration cell.

In [5]:
if not os.path.exists(DATASET_FILE_PATH):
    print(f"ERROR: Dataset file not found at {DATASET_FILE_PATH}")
    print("Please upload your dataset to the specified path or update the DATASET_FILE_PATH variable.")
else:
    full_dataset = load_dataset("json", data_files=DATASET_FILE_PATH, split="train")
    print(f"Dataset loaded successfully: {full_dataset}")
    train_dataset = full_dataset # Using the full dataset for training as per original script
    eval_dataset = None

Generating train split: 0 examples [00:00, ? examples/s]

Dataset loaded successfully: Dataset({
    features: ['text'],
    num_rows: 1535
})
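
The collator in section 6 skips any example that contains fewer than three separator tokens, so it can be worth checking the raw records before training. The following is a minimal sketch, assuming the separator appears in the `text` field as the literal string `<|im_end|>`; the helper name `find_short_records` is hypothetical.

```python
from typing import Dict, Iterable, List

SEPARATOR = "<|im_end|>"  # assumed textual form of the separator token
MIN_SEPARATORS = 3        # prefix | prediction | explanation | hunks...


def find_short_records(records: Iterable[Dict[str, str]]) -> List[int]:
    """Return indices of records whose 'text' has too few separators."""
    bad = []
    for i, rec in enumerate(records):
        text = rec.get("text", "")
        if text.count(SEPARATOR) < MIN_SEPARATORS:
            bad.append(i)
    return bad


# Toy records: the first is well-formed, the second lacks two separators.
sample = [
    {"text": "prompt<|im_end|>fix<|im_end|>why<|im_end|>hunk 1"},
    {"text": "prompt<|im_end|>fix only"},
]
print(find_short_records(sample))  # → [1]
```

The same function should accept any iterable of row dictionaries, so passing the loaded dataset's rows is one way to apply it to the real data.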


## 5. Model and Tokenizer Initialization

Load the base model with 4-bit quantization and the tokenizer.

In [6]:
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
    bnb_4bit_use_double_quant=True,
)

base_model = AutoModelForCausalLM.from_pretrained(
    MODEL_NAME,
    torch_dtype=torch.bfloat16,
    quantization_config=bnb_config,
    trust_remote_code=True,
    device_map={"":torch.cuda.current_device()} # Ensure model is on GPU
)
base_model.config.use_cache = False # Recommended for training
base_model = prepare_model_for_kbit_training(base_model)

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME, trust_remote_code=True)
tokenizer.pad_token = tokenizer.eos_token
print(f"EOS token: {tokenizer.eos_token}, EOS token ID: {tokenizer.eos_token_id}")
# The original script splits on token ID 151645, which is <|im_end|> for Qwen
# (the same token as EOS; the printed output of this cell confirms the decoding)
SPLIT_TOKEN_ID = 151645
print(f"Using SPLIT_TOKEN_ID: {SPLIT_TOKEN_ID} ({tokenizer.decode(SPLIT_TOKEN_ID)}) for splitting tasks.")
END_OF_CHUNK_TOKEN_ID = 2 # Original script appends [2]; with the Qwen tokenizer this ID decodes to '#', not to a newline or EOS token
print(f"Using END_OF_CHUNK_TOKEN_ID: {END_OF_CHUNK_TOKEN_ID} ({repr(tokenizer.decode(END_OF_CHUNK_TOKEN_ID))}) to mark end of input chunks.")

tokenizer.padding_side = "right"  # Fix for potential overflow issues with fp16 training

config.json:   0%|          | 0.00/728 [00:00<?, ?B/s]

loading configuration file config.json from cache at /root/.cache/huggingface/hub/models--Qwen--Qwen3-14B/snapshots/231c69a380487f6c0e52d02dcf0d5456d1918201/config.json
Model config Qwen3Config {
  "architectures": [
    "Qwen3ForCausalLM"
  ],
  "attention_bias": false,
  "attention_dropout": 0.0,
  "bos_token_id": 151643,
  "eos_token_id": 151645,
  "head_dim": 128,
  "hidden_act": "silu",
  "hidden_size": 5120,
  "initializer_range": 0.02,
  "intermediate_size": 17408,
  "max_position_embeddings": 40960,
  "max_window_layers": 40,
  "model_type": "qwen3",
  "num_attention_heads": 40,
  "num_hidden_layers": 40,
  "num_key_value_heads": 8,
  "rms_norm_eps": 1e-06,
  "rope_scaling": null,
  "rope_theta": 1000000,
  "sliding_window": null,
  "tie_word_embeddings": false,
  "torch_dtype": "bfloat16",
  "transformers_version": "4.51.3",
  "use_cache": true,
  "use_sliding_window": false,
  "vocab_size": 151936
}



model.safetensors.index.json:   0%|          | 0.00/36.5k [00:00<?, ?B/s]

loading weights file model.safetensors from cache at /root/.cache/huggingface/hub/models--Qwen--Qwen3-14B/snapshots/231c69a380487f6c0e52d02dcf0d5456d1918201/model.safetensors.index.json


Fetching 8 files:   0%|          | 0/8 [00:00<?, ?it/s]

Xet Storage is enabled for this repo, but the 'hf_xet' package is not installed. Falling back to regular HTTP download. For better performance, install the package with: `pip install huggingface_hub[hf_xet]` or `pip install hf_xet`

model-00006-of-00008.safetensors:   0%|          | 0.00/3.96G [00:00<?, ?B/s]

model-00007-of-00008.safetensors:   0%|          | 0.00/3.96G [00:00<?, ?B/s]

model-00003-of-00008.safetensors:   0%|          | 0.00/3.96G [00:00<?, ?B/s]

model-00002-of-00008.safetensors:   0%|          | 0.00/3.96G [00:00<?, ?B/s]

model-00005-of-00008.safetensors:   0%|          | 0.00/3.96G [00:00<?, ?B/s]

model-00004-of-00008.safetensors:   0%|          | 0.00/3.96G [00:00<?, ?B/s]

model-00001-of-00008.safetensors:   0%|          | 0.00/3.84G [00:00<?, ?B/s]

model-00008-of-00008.safetensors:   0%|          | 0.00/1.91G [00:00<?, ?B/s]

Instantiating Qwen3ForCausalLM model under default dtype torch.bfloat16.
Generate config GenerationConfig {
  "bos_token_id": 151643,
  "eos_token_id": 151645
}



Loading checkpoint shards:   0%|          | 0/8 [00:00<?, ?it/s]

All model checkpoint weights were used when initializing Qwen3ForCausalLM.

All the weights of Qwen3ForCausalLM were initialized from the model checkpoint at Qwen/Qwen3-14B.
If your task is similar to the task the model of the checkpoint was trained on, you can already use Qwen3ForCausalLM for predictions without further training.


generation_config.json:   0%|          | 0.00/239 [00:00<?, ?B/s]

loading configuration file generation_config.json from cache at /root/.cache/huggingface/hub/models--Qwen--Qwen3-14B/snapshots/231c69a380487f6c0e52d02dcf0d5456d1918201/generation_config.json
Generate config GenerationConfig {
  "bos_token_id": 151643,
  "do_sample": true,
  "eos_token_id": [
    151645,
    151643
  ],
  "pad_token_id": 151643,
  "temperature": 0.6,
  "top_k": 20,
  "top_p": 0.95
}



tokenizer_config.json:   0%|          | 0.00/9.68k [00:00<?, ?B/s]

vocab.json:   0%|          | 0.00/2.78M [00:00<?, ?B/s]

merges.txt:   0%|          | 0.00/1.67M [00:00<?, ?B/s]



tokenizer.json:   0%|          | 0.00/11.4M [00:00<?, ?B/s]

loading file vocab.json from cache at /root/.cache/huggingface/hub/models--Qwen--Qwen3-14B/snapshots/231c69a380487f6c0e52d02dcf0d5456d1918201/vocab.json
loading file merges.txt from cache at /root/.cache/huggingface/hub/models--Qwen--Qwen3-14B/snapshots/231c69a380487f6c0e52d02dcf0d5456d1918201/merges.txt
loading file tokenizer.json from cache at /root/.cache/huggingface/hub/models--Qwen--Qwen3-14B/snapshots/231c69a380487f6c0e52d02dcf0d5456d1918201/tokenizer.json
loading file added_tokens.json from cache at None
loading file special_tokens_map.json from cache at None
loading file tokenizer_config.json from cache at /root/.cache/huggingface/hub/models--Qwen--Qwen3-14B/snapshots/231c69a380487f6c0e52d02dcf0d5456d1918201/tokenizer_config.json
loading file chat_template.jinja from cache at None
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.


EOS token: <|im_end|>, EOS token ID: 151645
Using SPLIT_TOKEN_ID: 151645 (<|im_end|>) for splitting tasks.
Using END_OF_CHUNK_TOKEN_ID: 2 ('#') to mark end of input chunks.


## 6. Custom Data Collator

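This data collator splits each tokenized example at a separator token (`SPLIT_TOKEN_ID`) into a shared prefix followed by prediction, explanation, and hunk segments, then builds one training sequence per task by prepending the prefix to each segment. It assumes your input data is formatted with this separator and skips any example that contains fewer than three separators.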

In [7]:
class TaskPrefixDataCollator(DataCollatorForLanguageModeling):
    def __call__(self, features, return_tensors=None):
        pred_features, expl_features, hunk_features_list = [], [], []

        for feature in features:
            # Assuming 'text' field is tokenized into 'input_ids' and 'attention_mask'
            # If your dataset loading doesn't do this automatically, you might need to tokenize here or earlier
            if 'input_ids' not in feature:
                 # Example tokenization (adapt if needed based on how your dataset is structured)
                 tokenized = tokenizer(feature['text'], truncation=True, max_length=MAX_LEN + 100, padding=False) # Pad later in collator
                 input_ids = tokenized['input_ids']
                 attention_mask = tokenized['attention_mask']
            else:
                 input_ids = feature['input_ids']
                 attention_mask = feature['attention_mask']
            # print(f"Original input_ids length: {len(input_ids)}")

            # Find indices of the split token (e.g., <|im_end|>, ID 151645 for Qwen)
            split_indices = [i for i, x in enumerate(input_ids) if x == SPLIT_TOKEN_ID]

            # Ensure at least three split points for prefix, task1_content, task2_content, task3_content ...
            # Format expected: <prefix><SPLIT_TOKEN_ID><pred_content><SPLIT_TOKEN_ID><expl_content><SPLIT_TOKEN_ID><hunk1_content><SPLIT_TOKEN_ID>...<hunkN_content>
            if len(split_indices) < 3: # Needs prefix, pred, expl separators
                print(f"Warning: Not enough split points ({len(split_indices)}) found in an example. Expected at least 3. Skipping example.")
                # print(f"Problematic input_ids: {input_ids}")
                # print(f"Decoded: {tokenizer.decode(input_ids)}")
                continue

            prefix_ids = input_ids[:split_indices[0]]
            prefix_mask = attention_mask[:split_indices[0]]

            # Prediction task: prefix + prediction content
            pred_content_ids = input_ids[split_indices[0]+1:split_indices[1]]
            pred_input_ids = (prefix_ids + pred_content_ids)[:MAX_LEN-1] + [END_OF_CHUNK_TOKEN_ID]
            pred_input_mask = (prefix_mask + attention_mask[split_indices[0]+1:split_indices[1]])[:MAX_LEN-1] + [1]
            pred_features.append({
                'input_ids': pred_input_ids,
                'attention_mask': pred_input_mask
            })

            # Explanation task: prefix + explanation content
            expl_content_ids = input_ids[split_indices[1]+1:split_indices[2]]
            expl_input_ids = (prefix_ids + expl_content_ids)[:MAX_LEN-1] + [END_OF_CHUNK_TOKEN_ID]
            expl_input_mask = (prefix_mask + attention_mask[split_indices[1]+1:split_indices[2]])[:MAX_LEN-1] + [1]
            expl_features.append({
                'input_ids': expl_input_ids,
                'attention_mask': expl_input_mask
            })

            # Hunk tasks: prefix + hunk_i content
            current_hunk_batch = []
            # Iterate through hunk separators until the end
            for i in range(2, len(split_indices)):
                 start_idx = split_indices[i] + 1
                 end_idx = split_indices[i+1] if (i + 1) < len(split_indices) else len(input_ids) # Go to end if last hunk
                 hunk_content_ids = input_ids[start_idx:end_idx]
                 if not hunk_content_ids: # Skip if a hunk segment is empty
                     # print(f"Warning: Empty hunk segment detected at index {i}. Split indices: {split_indices}")
                     continue

                 hunk_input_ids = (prefix_ids + hunk_content_ids)[:MAX_LEN-1] + [END_OF_CHUNK_TOKEN_ID]
                 hunk_input_mask = (prefix_mask + attention_mask[start_idx:end_idx])[:MAX_LEN-1] + [1]
                 current_hunk_batch.append({
                     'input_ids': hunk_input_ids,
                     'attention_mask': hunk_input_mask
                 })
            if current_hunk_batch: # only add if hunks were processed
                 hunk_features_list.append(current_hunk_batch)
            elif WEIGHT_GAMMA != 0.0: # If gamma is non-zero, we expect hunks
                print(f"Warning: No hunks processed for an example, but WEIGHT_GAMMA is {WEIGHT_GAMMA}. Split indices: {split_indices}")

        if not pred_features or not expl_features:
             # This can happen if all examples in a batch are skipped or invalid
             print("Warning: No valid prediction or explanation features to collate after processing. Batch might be empty or all examples were invalid.")
             # Return empty/dummy batch structure expected by the trainer
             dummy_batch = super().__call__([tokenizer("", return_tensors="pt")], return_tensors=return_tensors) # Create a dummy batch using base class
             # Need labels for loss computation, clone input_ids for Causal LM
             if 'input_ids' in dummy_batch: dummy_batch['labels'] = dummy_batch['input_ids'].clone()

             return {
                 'pred': dummy_batch,
                 'expl': dummy_batch,
                 'hunk': [], # Hunks expect a list of batches
             }

        if WEIGHT_GAMMA != 0.0 and not hunk_features_list:
             # If gamma is non-zero but no hunks were found in the *entire batch*, issue a warning.
             # We still proceed with pred/expl.
             print(f"Warning: WEIGHT_GAMMA is {WEIGHT_GAMMA}, but no valid hunk features found in the entire batch.")

        # Use base class's __call__ to handle padding and tensor conversion for each task type
        collated_pred_features = super().__call__(pred_features, return_tensors)
        collated_expl_features = super().__call__(expl_features, return_tensors)

        collated_hunk_features_batches = []
        if WEIGHT_GAMMA != 0.0:
            for hunk_batch in hunk_features_list: # each item is a list of hunk dicts for ONE original example
                if hunk_batch: # if there are actual hunks for this example
                     # Collate the hunks belonging to the *same original example* together
                     collated_hunks_for_example = super().__call__(hunk_batch, return_tensors)
                     collated_hunk_features_batches.append(collated_hunks_for_example)

        if 'labels' not in collated_pred_features and 'input_ids' in collated_pred_features:
            collated_pred_features['labels'] = collated_pred_features['input_ids'].clone()
        if 'labels' not in collated_expl_features and 'input_ids' in collated_expl_features:
            collated_expl_features['labels'] = collated_expl_features['input_ids'].clone()
        for hunk_batch in collated_hunk_features_batches:
             if 'labels' not in hunk_batch and 'input_ids' in hunk_batch:
                  hunk_batch['labels'] = hunk_batch['input_ids'].clone()

        return {
            'pred': collated_pred_features,
            'expl': collated_expl_features,
            'hunk': collated_hunk_features_batches, # This is now a list of batches, one per original example that had hunks
        }
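
The splitting step inside the collator can be illustrated in isolation. The sketch below works on plain lists of integers with a toy separator ID; `split_tasks` is a hypothetical helper, not part of the notebook, and it leaves out truncation, attention masks, and the end-of-chunk token.

```python
from typing import List, Tuple


def split_tasks(
    input_ids: List[int], sep_id: int
) -> Tuple[List[int], List[int], List[int], List[List[int]]]:
    """Split token IDs into (prefix, prediction, explanation, hunks)."""
    cuts = [i for i, tok in enumerate(input_ids) if tok == sep_id]
    if len(cuts) < 3:
        raise ValueError(f"expected at least 3 separators, found {len(cuts)}")
    prefix = input_ids[:cuts[0]]
    pred = input_ids[cuts[0] + 1:cuts[1]]
    expl = input_ids[cuts[1] + 1:cuts[2]]
    # Everything after the third separator is hunk content; each later
    # separator starts a new hunk, and empty segments are dropped.
    bounds = cuts[2:] + [len(input_ids)]
    hunks = [input_ids[a + 1:b] for a, b in zip(bounds, bounds[1:])]
    hunks = [h for h in hunks if h]
    return prefix, pred, expl, hunks


SEP = 99  # toy separator ID, standing in for SPLIT_TOKEN_ID
ids = [1, 2, SEP, 3, SEP, 4, 5, SEP, 6, SEP, 7, 8]
prefix, pred, expl, hunks = split_tasks(ids, SEP)
print(prefix, pred, expl, hunks)  # → [1, 2] [3] [4, 5] [[6], [7, 8]]
```

Each task sequence in the collator is then the prefix concatenated with one of these segments.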


## 7. Custom Trainer

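This custom trainer overrides the `compute_loss` method to calculate a weighted loss across the three tasks (prediction, explanation, and hunks). The total is `pred_loss + WEIGHT_BETA * expl_loss + WEIGHT_GAMMA * hunk_loss`, where `hunk_loss` is the mean over the per-example hunk batches. With the configuration above (`WEIGHT_GAMMA = 0.0`), the hunk term is skipped entirely.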

In [8]:
class CustomTrainer(Trainer):
    def compute_loss(self, model, inputs, return_outputs=False, **kwargs):
        pred_inputs = inputs.get('pred')
        expl_inputs = inputs.get('expl')
        hunk_inputs_list = inputs.get('hunk') # This is a list of batches

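        total_loss = 0
        pred_loss = 0
        expl_loss = 0
        hunk_loss_val = 0 # Stays a plain number unless a hunk loss tensor is computed
        num_hunk_batches_processed = 0
        outputs_pred = None # Defined up front so the return statement cannot raise NameError
        outputs_expl = None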

        # Prediction task loss
        if pred_inputs and pred_inputs.get('input_ids').numel() > 0 : # Check if pred_inputs is not empty
            outputs_pred = model(**pred_inputs)
            pred_loss = outputs_pred.loss
            total_loss += pred_loss

        # Explanation task loss
        if WEIGHT_BETA > 0 and expl_inputs and expl_inputs.get('input_ids').numel() > 0: # Check if expl_inputs is not empty
            outputs_expl = model(**expl_inputs)
            expl_loss = outputs_expl.loss
            total_loss += WEIGHT_BETA * expl_loss
        elif WEIGHT_BETA > 0:
            # print("Skipping explanation loss, expl_inputs is empty or invalid but WEIGHT_BETA > 0.")
            pass

        # Hunk task loss - careful here, hunk_inputs_list is a list of batch dictionaries
        if WEIGHT_GAMMA > 0 and hunk_inputs_list:
            current_hunk_loss_sum = 0
            for hunk_batch_inputs in hunk_inputs_list: # Iterate over list of batches
                if hunk_batch_inputs and hunk_batch_inputs.get('input_ids').numel() > 0 : # Check if batch is not empty
                    outputs_hunk = model(**hunk_batch_inputs)
                    current_hunk_loss_sum += outputs_hunk.loss
                    num_hunk_batches_processed += 1

            if num_hunk_batches_processed > 0:
                hunk_loss_val = current_hunk_loss_sum / num_hunk_batches_processed # Average loss over hunk batches
                total_loss += WEIGHT_GAMMA * hunk_loss_val

        # Log individual losses
        self.log({
            "pred_loss": pred_loss.item() if isinstance(pred_loss, torch.Tensor) else pred_loss,
            "expl_loss": expl_loss.item() if isinstance(expl_loss, torch.Tensor) else expl_loss,
            "hunk_loss": hunk_loss_val.item() if isinstance(hunk_loss_val, torch.Tensor) else hunk_loss_val,
            "total_weighted_loss": total_loss.item() if isinstance(total_loss, torch.Tensor) else total_loss
        })

        return (total_loss, {"pred_outputs": outputs_pred, "expl_outputs": outputs_expl}) if return_outputs else total_loss
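
To make the weighting concrete, here is a small numeric sketch of the same combination on plain floats. `combine_losses` is a hypothetical stand-in for the tensor arithmetic in `compute_loss`, not a function from the notebook.

```python
from typing import Sequence


def combine_losses(
    pred_loss: float,
    expl_loss: float,
    hunk_losses: Sequence[float],
    beta: float = 1.0,
    gamma: float = 0.0,
) -> float:
    """total = pred + beta * expl + gamma * mean(hunk losses)."""
    total = pred_loss
    if beta > 0:
        total += beta * expl_loss
    if gamma > 0 and hunk_losses:
        total += gamma * sum(hunk_losses) / len(hunk_losses)
    return total


# Notebook defaults (beta=1.0, gamma=0.0): the hunk losses are ignored.
print(combine_losses(0.5, 0.25, [1.0, 3.0]))             # → 0.75
# With gamma=0.5 the mean hunk loss (2.0) contributes 1.0.
print(combine_losses(0.5, 0.25, [1.0, 3.0], gamma=0.5))  # → 1.75
```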


## 8. LoRA Configuration

Set up the LoRA (Low-Rank Adaptation) configuration for efficient fine-tuning.

In [9]:
# LoRA configuration
lora_config = LoraConfig(
    r=16,                             # Rank of the LoRA matrices
    lora_alpha=32,                    # Alpha parameter for LoRA scaling
    target_modules=[
        "q_proj",
        "k_proj",
        "v_proj",
        "o_proj",
        "gate_proj",
        "up_proj",
        "down_proj"
    ],                                # Modules to apply LoRA to (specific to Qwen architecture)
    lora_dropout=0.05,                # Dropout probability for LoRA layers
    bias="none",                      # Bias type for LoRA. 'none' is common.
    task_type="CAUSAL_LM"             # Task type
)

# Apply LoRA to the model
model = get_peft_model(base_model, lora_config)

# Print a summary of the trainable parameters
model.print_trainable_parameters()

# Instantiate the custom data collator
data_collator = TaskPrefixDataCollator(tokenizer=tokenizer, mlm=False) # mlm=False for Causal LM


trainable params: 64,225,280 || all params: 14,832,532,480 || trainable%: 0.4330
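
The trainable-parameter count can be reproduced by hand from the model configuration printed in section 5. The sketch below assumes the usual layout of the seven targeted linear layers (query and output projections of size `num_attention_heads * head_dim`, key and value projections of size `num_key_value_heads * head_dim`) and counts `r * (in + out)` weights per adapted layer.

```python
# Shapes taken from the Qwen3Config dump printed in section 5.
hidden, head_dim = 5120, 128
n_heads, n_kv_heads = 40, 8
intermediate, n_layers = 17408, 40
r = 16  # LoRA rank from lora_config

# (in_features, out_features) of each targeted linear layer (assumed layout)
shapes = {
    "q_proj": (hidden, n_heads * head_dim),
    "k_proj": (hidden, n_kv_heads * head_dim),
    "v_proj": (hidden, n_kv_heads * head_dim),
    "o_proj": (n_heads * head_dim, hidden),
    "gate_proj": (hidden, intermediate),
    "up_proj": (hidden, intermediate),
    "down_proj": (intermediate, hidden),
}

# Each adapted layer adds A (r x in) and B (out x r): r * (in + out) weights.
per_layer = sum(r * (fin + fout) for fin, fout in shapes.values())
total = per_layer * n_layers
print(f"{total:,}")  # → 64,225,280
```

The result matches the `trainable params` figure reported by `print_trainable_parameters()`.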


## 9. Training Arguments

Configure the training arguments. Adjust these based on your available resources and desired training time.

In [10]:
# Training arguments
training_args = TrainingArguments(
    output_dir=OUTPUT_DIR,
    num_train_epochs=3,                     # Number of training epochs (adjust as needed)
    per_device_train_batch_size=1,          # Batch size per GPU (reduce if OOM errors)
    gradient_accumulation_steps=1,          # Accumulate gradients over X steps (effective batch size = X * per_device_train_batch_size)
    gradient_checkpointing=True,            # Use gradient checkpointing to save memory
    optim="paged_adamw_32bit",              # Optimizer
    save_steps=200,                         # Save checkpoint every X steps
    logging_steps=20,                       # Log metrics every X steps
    learning_rate=2e-4,                     # Learning rate
    weight_decay=0.001,                     # Weight decay
    fp16=False,                             # Set to True if your GPU supports FP16 and you want faster training
    bf16=True,                              # Set to True if your GPU supports BF16 (e.g., A100, H100)
    max_grad_norm=0.3,                      # Max gradient norm for clipping
    max_steps=-1,                           # Number of training steps (set to -1 for full epochs)
    warmup_ratio=0.03,                      # Warmup ratio; has no effect with the "constant" scheduler below (use "constant_with_warmup" to apply it)
    group_by_length=False,                  # Group sequences by length (can improve efficiency)
    lr_scheduler_type="constant",           # Learning rate scheduler type
    report_to="tensorboard",                # Log to tensorboard
    # evaluation_strategy="steps" if eval_dataset else "no", # Evaluate periodically if eval_dataset exists
    # eval_steps=200 if eval_dataset else None, # Evaluation frequency
    save_total_limit=2,                     # Only keep the last 2 checkpoints
    load_best_model_at_end=False,           # Whether to load the best model (if evaluating) at the end
    remove_unused_columns=False,            # Important for custom collator that expects 'text' or specific structures
)


PyTorch: setting up devices
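
As a sanity check on these arguments, the number of optimizer steps can be estimated from the dataset size. This is a rough sketch: `total_optimizer_steps` is a hypothetical helper, and the library's own rounding of a partial final accumulation window may differ slightly when `gradient_accumulation_steps` is greater than 1.

```python
import math
from typing import Tuple


def total_optimizer_steps(
    num_examples: int,
    per_device_batch: int,
    grad_accum: int,
    num_devices: int,
    epochs: int,
) -> Tuple[int, int]:
    """Return (effective batch size, estimated optimizer steps)."""
    effective_batch = per_device_batch * grad_accum * num_devices
    steps_per_epoch = math.ceil(num_examples / effective_batch)
    return effective_batch, steps_per_epoch * epochs


# Settings used in this notebook: 1,535 examples, batch size 1, 3 epochs.
print(total_optimizer_steps(1535, 1, 1, 1, 3))  # → (1, 4605)
```

The 4,605 steps agree with the "Total optimization steps" line in the training log of section 10.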


## 10. Initialize Trainer and Start Training

Initialize the custom trainer with the model, datasets, tokenizer, data collator, and training arguments. Then, start the training process.

In [11]:
# Initialize the CustomTrainer
trainer = CustomTrainer(
    model=model,
    args=training_args,
    train_dataset=train_dataset,
    eval_dataset=eval_dataset, # Will be None if not created
    tokenizer=tokenizer,
    data_collator=data_collator,
)

# Start training
print("Starting training...")
trainer.train()

# Save the fine-tuned model
print("Saving model...")
trainer.save_model(OUTPUT_DIR) # Save the LoRA adapter

print(f"Training complete. Model saved to {OUTPUT_DIR}")

# Clean up GPU memory (optional, but good practice in Colab)
del model
del base_model
del trainer
gc.collect()
torch.cuda.empty_cache()


Using auto half precision backend
No label_names provided for model class `PeftModelForCausalLM`. Since `PeftModel` hides base models input arguments, if label_names is not given, label_names can't be set automatically within `Trainer`. Note that empty label_names list will be used instead.


Starting training...


***** Running training *****
  Num examples = 1,535
  Num Epochs = 3
  Instantaneous batch size per device = 1
  Total train batch size (w. parallel, distributed & accumulation) = 1
  Gradient Accumulation steps = 1
  Total optimization steps = 4,605
  Number of trainable parameters = 64,225,280
Could not estimate the number of tokens of the input, floating-point operations will not be computed


| Step | Training Loss |
|------|---------------|
| 20 | 1.9699 |
| 40 | 1.2676 |
| 60 | 1.0871 |
| 80 | 0.8571 |
| 100 | 0.776 |
| 120 | 0.6558 |
| 140 | 0.7085 |
| 160 | 0.7259 |
| 180 | 0.5436 |
| 200 | 0.5789 |


Saving model checkpoint to fine/checkpoint-200

Saving model...


tokenizer config file saved in fine/tokenizer_config.json
Special tokens file saved in fine/special_tokens_map.json


Training complete. Model saved to fine/


## 11. Inference with the Fine-tuned Model

This section demonstrates how to load the fine-tuned LoRA adapters and use the model for inference.

**Note:**
* If you saved the full merged model, you would load it directly using `AutoModelForCausalLM.from_pretrained("YOUR_OUTPUT_DIR/final_merged_checkpoint")` and `AutoTokenizer.from_pretrained("YOUR_OUTPUT_DIR/final_merged_checkpoint")`.
* For LoRA, we load the base model and then apply the saved adapters.

In [19]:
from peft import PeftModel
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
import gc
import re
from transformers import pipeline
import difflib # Import difflib to calculate differences

# --- Configuration ---
PEFT_MODEL_PATH = OUTPUT_DIR # Assumes OUTPUT_DIR is defined in a previous cell

if 'bnb_config' not in globals():
    print("Re-defining bnb_config...")
    bnb_config = BitsAndBytesConfig(
        load_in_4bit=True, bnb_4bit_quant_type="nf4",
        bnb_4bit_compute_dtype=torch.bfloat16, bnb_4bit_use_double_quant=True,
    )

print(f"Loading base model: {MODEL_NAME}")
inference_base_model = AutoModelForCausalLM.from_pretrained(
    MODEL_NAME, quantization_config=bnb_config, torch_dtype=torch.bfloat16,
    trust_remote_code=True, device_map="auto"
)

print(f"Loading tokenizer for: {MODEL_NAME}")
inference_tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME, trust_remote_code=True)
inference_tokenizer.pad_token = inference_tokenizer.eos_token
inference_tokenizer.padding_side = "left"

print(f"Loading LoRA adapters from: {PEFT_MODEL_PATH}")
inference_model = PeftModel.from_pretrained(inference_base_model, PEFT_MODEL_PATH)
inference_model = inference_model.eval()
print("Fine-tuned model ready.")

BOF = '<|system|><|im_end|><|user|>'
EOF = '<|im_end|><|assistant|>'

def extract_first_cpp_code(text_to_search_in):
    match = re.search(r"```cpp\n(.*?)\n```", text_to_search_in, re.DOTALL)
    if match: return match.group(1).strip()
    match = re.search(r"```c\+\+\n(.*?)\n```", text_to_search_in, re.DOTALL)
    if match: return match.group(1).strip()
    return None

print(f"Creating pipeline on device: {inference_model.device}")
pipe = pipeline("text-generation", model=inference_model, tokenizer=inference_tokenizer)
print("Pipeline created.")

_incorrect_code_text = """/*
Given a non-empty vector of integers lst. add the even elements that are at odd indices..


Examples:
    add({4, 2, 6, 7}) ==> 2
*/
#include<stdio.h>
#include<vector>
using namespace std;
int add(vector<int> lst){
    int sum=0;
    for (int i=0;i*2+1<lst.size();i++)
        if (lst[i*2+1]%2==1) sum+=lst[i*2+1];
    return sum;
}
"""

def generate_repaired_code_via_pipeline(incorrect_code, tokenizer_for_pipe, pipe_instance):
    filename_placeholder = "add.cpp"
    prompt_suffix_for_assistant = """/*
Given a non-empty vector of integers lst. add the even elements that are at odd indices..


Examples:
    add({4, 2, 6, 7}) ==> 2
*/
#include<stdio.h>
#include<vector>
using namespace std;
int add(vector<int> lst){
"""
    prompt = (BOF +
              f" This is an incorrect code ({filename_placeholder}):\n```c++\n{incorrect_code}\n```\n" +
              "You are a software engineer. Can you repair the incorrect code?\n" +
              EOF + "\n```c++\n" + prompt_suffix_for_assistant)

    print(f"\n--- Generating repair (humaneval-cpp.py style) ---")
    prompt_token_count = len(tokenizer_for_pipe.tokenize(prompt))
    min_new_tokens = 64
    max_new_tokens = 512
    max_attempts = 2

    # Decoder-only models should be left-padded for generation; remember the caller's setting.
    original_padding_side = tokenizer_for_pipe.padding_side
    tokenizer_for_pipe.padding_side = "left"
    try:
        outputs = pipe_instance(
            prompt, min_length=prompt_token_count + min_new_tokens,
            max_length=prompt_token_count + max_new_tokens,
            temperature=0.3, do_sample=True, num_return_sequences=1,
            pad_token_id=tokenizer_for_pipe.eos_token_id,
            eos_token_id=tokenizer_for_pipe.eos_token_id
        )
    finally:
        # Restore the padding side even if generation raises.
        tokenizer_for_pipe.padding_side = original_padding_side

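    full_generated_text = outputs[0]['generated_text']
    # By default the pipeline returns the prompt plus the completion, so everything
    # after the first EOF marker is the assistant's turn.
    parts_after_eof = full_generated_text.split(EOF, 1)
    if len(parts_after_eof) < 2:
        return "// Error: assistant marker not found in generated text"

    assistant_response = parts_after_eof[1].strip()
    repaired_code = extract_first_cpp_code(assistant_response)
    if repaired_code is None:
        # Typically means the model never closed the code fence within the token budget.
        return "// Error: no fenced C++ code block found in model output"
    return repaired_code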

# --- Execute Inference ---
repaired_code_result = generate_repaired_code_via_pipeline(_incorrect_code_text, inference_tokenizer, pipe)

print("\n--- Final Repaired Code ---")
print(repaired_code_result)

# --- Calculate and Print Diff (Filtered) ---
# Skip the diff when extraction returned nothing or an error marker.
if repaired_code_result and not repaired_code_result.startswith("// Error"):
    print("\n--- Diff showing only content changes ---")
    original_lines = _incorrect_code_text.splitlines(keepends=True)
    repaired_lines = repaired_code_result.splitlines(keepends=True)

    diff = difflib.unified_diff(
        original_lines, repaired_lines,
        fromfile='original_buggy.cpp', tofile='repaired_generated.cpp',
        lineterm='\n'
    )

    diff_output_exists = False
    for line in diff:
        if line.startswith(('---', '+++', '@@', ' ')):
            print(line, end='')
            diff_output_exists = True
        elif line.startswith(('-', '+')) and line[1:].strip(): # Check if content exists after +/-
            print(line, end='')
            diff_output_exists = True

    if not diff_output_exists:
        print("(No significant content differences found, only whitespace changes)")

else:
    print("\n--- Diff not calculated due to generation error ---")


loading configuration file config.json from cache at /root/.cache/huggingface/hub/models--Qwen--Qwen3-14B/snapshots/231c69a380487f6c0e52d02dcf0d5456d1918201/config.json
Model config Qwen3Config {
  "architectures": [
    "Qwen3ForCausalLM"
  ],
  "attention_bias": false,
  "attention_dropout": 0.0,
  "bos_token_id": 151643,
  "eos_token_id": 151645,
  "head_dim": 128,
  "hidden_act": "silu",
  "hidden_size": 5120,
  "initializer_range": 0.02,
  "intermediate_size": 17408,
  "max_position_embeddings": 40960,
  "max_window_layers": 40,
  "model_type": "qwen3",
  "num_attention_heads": 40,
  "num_hidden_layers": 40,
  "num_key_value_heads": 8,
  "rms_norm_eps": 1e-06,
  "rope_scaling": null,
  "rope_theta": 1000000,
  "sliding_window": null,
  "tie_word_embeddings": false,
  "torch_dtype": "bfloat16",
  "transformers_version": "4.51.3",
  "use_cache": true,
  "use_sliding_window": false,
  "vocab_size": 151936
}

loading weights file model.safetensors from cache at /root/.cache/huggingfac

Loading base model: Qwen/Qwen3-14B


target_dtype {target_dtype} is replaced by `CustomDtype.INT4` for 4-bit BnB quantization


Loading checkpoint shards:   0%|          | 0/8 [00:00<?, ?it/s]

All model checkpoint weights were used when initializing Qwen3ForCausalLM.

All the weights of Qwen3ForCausalLM were initialized from the model checkpoint at Qwen/Qwen3-14B.
If your task is similar to the task the model of the checkpoint was trained on, you can already use Qwen3ForCausalLM for predictions without further training.
loading configuration file generation_config.json from cache at /root/.cache/huggingface/hub/models--Qwen--Qwen3-14B/snapshots/231c69a380487f6c0e52d02dcf0d5456d1918201/generation_config.json
Generate config GenerationConfig {
  "bos_token_id": 151643,
  "do_sample": true,
  "eos_token_id": [
    151645,
    151643
  ],
  "pad_token_id": 151643,
  "temperature": 0.6,
  "top_k": 20,
  "top_p": 0.95
}

loading file vocab.json from cache at /root/.cache/huggingface/hub/models--Qwen--Qwen3-14B/snapshots/231c69a380487f6c0e52d02dcf0d5456d1918201/vocab.json
loading file merges.txt from cache at /root/.cache/huggingface/hub/models--Qwen--Qwen3-14B/snapshots/231c69a380

Loading tokenizer for: Qwen/Qwen3-14B


Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.


Loading LoRA adapters from: fine/


Device set to use cuda:0
The model 'PeftModelForCausalLM' is not supported for text-generation. Supported models are ['AriaTextForCausalLM', 'BambaForCausalLM', 'BartForCausalLM', 'BertLMHeadModel', 'BertGenerationDecoder', 'BigBirdForCausalLM', 'BigBirdPegasusForCausalLM', 'BioGptForCausalLM', 'BlenderbotForCausalLM', 'BlenderbotSmallForCausalLM', 'BloomForCausalLM', 'CamembertForCausalLM', 'LlamaForCausalLM', 'CodeGenForCausalLM', 'CohereForCausalLM', 'Cohere2ForCausalLM', 'CpmAntForCausalLM', 'CTRLLMHeadModel', 'Data2VecTextForCausalLM', 'DbrxForCausalLM', 'DeepseekV3ForCausalLM', 'DiffLlamaForCausalLM', 'ElectraForCausalLM', 'Emu3ForCausalLM', 'ErnieForCausalLM', 'FalconForCausalLM', 'FalconMambaForCausalLM', 'FuyuForCausalLM', 'GemmaForCausalLM', 'Gemma2ForCausalLM', 'Gemma3ForConditionalGeneration', 'Gemma3ForCausalLM', 'GitForCausalLM', 'GlmForCausalLM', 'Glm4ForCausalLM', 'GotOcr2ForConditionalGeneration', 'GPT2LMHeadModel', 'GPT2LMHeadModel', 'GPTBigCodeForCausalLM', 'GPTNeoFo

Fine-tuned model ready.
Creating pipeline on device: cuda:0
Pipeline created.

--- Generating repair (humaneval-cpp.py style) ---

--- Final Repaired Code ---
/*
Given a non-empty vector of integers lst. add the even elements that are at odd indices..


Examples:
    add({4, 2, 6, 7}) ==> 2
*/
#include<stdio.h>
#include<vector>
using namespace std;
int add(vector<int> lst){
    int sum=0;
    for (int i=0;i*2+1<lst.size();i++)
        if (lst[i*2+1]%2==0) sum+=lst[i*2+1];
    return sum;
}

--- Diff showing only content changes ---
--- original_buggy.cpp
+++ repaired_generated.cpp
@@ -11,6 +11,6 @@
 int add(vector<int> lst){
     int sum=0;
     for (int i=0;i*2+1<lst.size();i++)
-        if (lst[i*2+1]%2==1) sum+=lst[i*2+1];
+        if (lst[i*2+1]%2==0) sum+=lst[i*2+1];
     return sum;
-}
+}
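
The closing `-}` / `+}` pair in the diff above is not a code change. The original string ends with a newline, while the extracted repair has been stripped, so the last line differs only in its terminator. One way to suppress this is to compare lines without their terminators. The following is a minimal sketch under that assumption; the helper name `content_diff` and the two short sample strings are hypothetical and not part of the notebook.

```python
import difflib

def content_diff(original: str, repaired: str) -> list[str]:
    """Unified diff over terminator-free lines (hypothetical helper)."""
    # splitlines() without keepends drops "\n", so a missing final newline
    # no longer makes the last line look changed.
    original_lines = original.splitlines()
    repaired_lines = repaired.splitlines()
    return list(difflib.unified_diff(
        original_lines, repaired_lines,
        fromfile="original_buggy.cpp", tofile="repaired_generated.cpp",
        lineterm="",  # keep the header lines free of trailing newlines as well
    ))

buggy = "int f(){\n    return 1;\n}\n"   # ends with a newline
fixed = "int f(){\n    return 0;\n}"     # stripped, no final newline

for line in content_diff(buggy, fixed):
    print(line)
```

With this approach only the `return` line is reported as changed, and two inputs that differ solely in the trailing newline produce an empty diff.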