# Qwen2.5-VL Error Bar Detection Fine-tuning

This notebook fine-tunes Qwen2.5-VL-7B-Instruct for error bar detection in scientific plots.

## Task:
- **Input**: Scientific plot image + data point coordinates (x, y)
- **Output**: Error bar distances (topBarPixelDistance, bottomBarPixelDistance)
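
As a rough illustration of this input/output contract, the sketch below derives the model input from one labeled point. The numeric values are taken from the sample printed later in this notebook; the helper name `to_model_input` is hypothetical and is not part of the notebook's code.

```python
from typing import Dict, List

def to_model_input(labeled_points: List[Dict]) -> List[Dict]:
    """Strip the error bar fields, leaving only the (x, y) coordinates."""
    return [{"x": p["x"], "y": p["y"]} for p in labeled_points]

# One labeled point: coordinates plus the two distances the model must predict.
labeled = [
    {
        "x": 177.0,
        "y": 94.2,
        "topBarPixelDistance": 100.7,
        "bottomBarPixelDistance": 52.7,
    },
]

model_input = to_model_input(labeled)
print(model_input)
```

The model sees the image plus `model_input` and is trained to emit records shaped like `labeled`.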

## Approach:
- Use Qwen2.5-VL (Vision-Language Model) with LoRA fine-tuning
- Train on labeled data with known error bar distances
- Evaluate on test data

## 1. Install Required Libraries

In [1]:
# Install libraries for VLM fine-tuning
!pip install -U transformers accelerate bitsandbytes -q
!pip install -U peft trl datasets -q
!pip install -U huggingface_hub -q
!pip install pillow pandas tqdm -q
!pip install qwen-vl-utils -q  # For Qwen VL processing

print("All libraries installed successfully!")

ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
sentence-transformers 5.1.1 requires transformers<5.0.0,>=4.41.0, but you have transformers 5.0.0 which is incompatible.
gradio 5.49.1 requires pydantic<2.12,>=2.0, but you have pydantic 2.12.5 which is incompatible.

## 2. Import Libraries

In [2]:
import torch
import os
import json
import gc
import numpy as np
from pathlib import Path
from typing import List, Dict, Optional, Tuple
from PIL import Image
from tqdm import tqdm
import warnings
warnings.filterwarnings('ignore')

# Transformers
from transformers import (
    Qwen2_5_VLForConditionalGeneration,
    AutoProcessor,
    BitsAndBytesConfig,
    TrainingArguments,
)

# PEFT for LoRA
from peft import (
    LoraConfig,
    get_peft_model,
    prepare_model_for_kbit_training,
    TaskType,
    PeftModel,
)

# Datasets
from datasets import Dataset

# HuggingFace Hub
from huggingface_hub import login, HfApi, create_repo

# Check GPU
print(f"PyTorch version: {torch.__version__}")
print(f"CUDA available: {torch.cuda.is_available()}")
if torch.cuda.is_available():
    print(f"GPU: {torch.cuda.get_device_name(0)}")
    print(f"GPU Memory: {torch.cuda.get_device_properties(0).total_memory / 1e9:.1f} GB")

PyTorch version: 2.8.0+cu126
CUDA available: True
GPU: Tesla T4
GPU Memory: 15.8 GB


## 3. Configuration

In [3]:
import os

# ============== DATA PATHS (Kaggle) ==============
BASE_PATH = "/kaggle/input/graph-plots"

# Training data
TRAIN_IMAGES = os.path.join(BASE_PATH, "Train", "images")
TRAIN_LABELS = os.path.join(BASE_PATH, "Train", "labels")  # Ground truth with error bars

# Test data
TEST_IMAGES = os.path.join(BASE_PATH, "Test", "images")
TEST_INPUT_LABELS = os.path.join(BASE_PATH, "Test", "test_labels")  # Input: x,y only
TEST_GROUND_TRUTH = os.path.join(BASE_PATH, "Test", "labels")  # Ground truth with error bars

# ============== MODEL CONFIGURATION ==============
BASE_MODEL = "Qwen/Qwen2.5-VL-7B-Instruct"
NEW_MODEL_NAME = "Chartqwen"

# HuggingFace Hub
HF_USERNAME = "Sayeem26s"
HF_REPO_NAME = "Chartqwen"
HF_REPO_ID = f"{HF_USERNAME}/{HF_REPO_NAME}"

# ============== TRAINING CONFIGURATION ==============
OUTPUT_DIR = "/kaggle/working/qwen_vl_lora_output"
CHECKPOINT_DIR = "/kaggle/working/qwen_vl_checkpoints"

MAX_SEQ_LENGTH = 2048
IMAGE_MAX_SIZE = 768

BATCH_SIZE = 1  # Kept at 1 so image + text activations fit in GPU memory
GRADIENT_ACCUMULATION_STEPS = 16  # Effective batch = 16

LEARNING_RATE = 2e-4  # Slightly higher for faster convergence
NUM_EPOCHS = 1  # 1 epoch for 600 samples
WARMUP_RATIO = 0.03

SAVE_STEPS = 30  # Save every 30 steps
LOGGING_STEPS = 5  # Log frequently

MAX_TRAIN_SAMPLES = 600  # Fast training: ~1-1.5 hours
MAX_GRAD_NORM = 1.0

# ============== LoRA CONFIGURATION ==============
LORA_R = 32
LORA_ALPHA = 64
LORA_DROPOUT = 0.05

# ============== CREATE DIRECTORIES ==============
os.makedirs(OUTPUT_DIR, exist_ok=True)
os.makedirs(CHECKPOINT_DIR, exist_ok=True)

# ============== PRINT CONFIGURATION ==============
print("Configuration:")
print(f"  Base Model          : {BASE_MODEL}")
print(f"  New Model           : {NEW_MODEL_NAME}")
print(f"  Train Images        : {TRAIN_IMAGES}")
print(f"  Train Labels        : {TRAIN_LABELS}")
print(f"  Test Images         : {TEST_IMAGES}")
print(f"  HF Repo             : {HF_REPO_ID}")

print("\nTraining Settings:")
print(f"  - Samples           : {MAX_TRAIN_SAMPLES}")
print(f"  - Epochs            : {NUM_EPOCHS}")
print(f"  - Effective Batch   : {BATCH_SIZE * GRADIENT_ACCUMULATION_STEPS}")
print(f"  - Learning Rate     : {LEARNING_RATE}")
print(f"  - Est. Time         : ~1 hour on T4 GPU")

print("\nLoRA Settings:")
print(f"  - Rank: {LORA_R}, Alpha: {LORA_ALPHA}, Dropout: {LORA_DROPOUT}")

Configuration:
  Base Model          : Qwen/Qwen2.5-VL-7B-Instruct
  New Model           : Chartqwen
  Train Images        : /kaggle/input/graph-plots/Train/images
  Train Labels        : /kaggle/input/graph-plots/Train/labels
  Test Images         : /kaggle/input/graph-plots/Test/images
  HF Repo             : Sayeem26s/Chartqwen

Training Settings:
  - Samples           : 600
  - Epochs            : 1
  - Effective Batch   : 16
  - Learning Rate     : 0.0002
  - Est. Time         : ~1 hour on T4 GPU

LoRA Settings:
  - Rank: 32, Alpha: 64, Dropout: 0.05
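
The relationship between sample count, batch size, accumulation and warmup can be checked with a little arithmetic. The sketch below is a stand-alone calculation, not the notebook's own code; the function name `schedule_summary` is hypothetical, and it assumes one optimizer step per full accumulation window.

```python
import math

def schedule_summary(num_samples: int, batch_size: int, accumulation_steps: int,
                     num_epochs: int, warmup_ratio: float) -> dict:
    # Micro-batches seen per epoch (the last one may be smaller than batch_size)
    micro_batches = math.ceil(num_samples / batch_size)
    # Optimizer steps that complete a full accumulation window
    steps_per_epoch = micro_batches // accumulation_steps
    # Micro-batches left over after the last full window in each epoch
    leftover = micro_batches % accumulation_steps
    total_steps = steps_per_epoch * num_epochs
    return {
        "micro_batches_per_epoch": micro_batches,
        "optimizer_steps": total_steps,
        "leftover_micro_batches_per_epoch": leftover,
        "warmup_steps": int(total_steps * warmup_ratio),
    }

summary = schedule_summary(600, 1, 16, 1, 0.03)
print(summary)
```

With the settings above, 600 micro-batches yield 37 full optimizer steps, 8 micro-batches fall outside a full window, and a 3% warmup rounds down to a single step. Whether the leftover gradients are ever applied depends on whether the training loop performs a final step at the end of the epoch.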


## 4. Login to HuggingFace Hub

In [4]:
# Login to HuggingFace Hub
try:
    from kaggle_secrets import UserSecretsClient
    user_secrets = UserSecretsClient()
    HF_TOKEN = user_secrets.get_secret("HF_TOKEN")
    print("HF Token loaded from Kaggle Secrets")
except Exception:  # e.g. kaggle_secrets is unavailable outside Kaggle
    HF_TOKEN = "YOUR_HF_TOKEN_HERE"  # Replace with your token
    print("Using hardcoded HF token (replace with your actual token)")

login(token=HF_TOKEN)
print(f"Logged in to HuggingFace Hub as: {HF_USERNAME}")

HF Token loaded from Kaggle Secrets
Logged in to HuggingFace Hub as: Sayeem26s
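
If the Kaggle secret is missing, the cell above falls back to a placeholder string and only fails later, inside `login`. One possible alternative is to reject the placeholder up front. The sketch below assumes the token may also be supplied through an `HF_TOKEN` environment variable (an assumption chosen to match the secret name used above); `resolve_hf_token` is a hypothetical helper.

```python
import os
from typing import Mapping, Optional

PLACEHOLDER = "YOUR_HF_TOKEN_HERE"

def resolve_hf_token(env: Mapping[str, str] = os.environ) -> Optional[str]:
    """Return a usable token, or None if it is missing or still the placeholder."""
    token = env.get("HF_TOKEN", "").strip()
    if not token or token == PLACEHOLDER:
        return None
    return token

# Demo with explicit mappings so the example does not depend on the real environment
print(resolve_hf_token({"HF_TOKEN": PLACEHOLDER}))
print(resolve_hf_token({"HF_TOKEN": " hf_example "}))
```

A caller could raise a clear error when the result is `None` instead of attempting a login that is bound to fail.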


## 5. Load and Prepare Training Data

In [5]:
def load_label_file(json_path: str) -> List[Dict]:
    """Load label JSON file."""
    with open(json_path, 'r') as f:
        return json.load(f)

def get_image_filename(label_filename: str, images_dir: str) -> Optional[str]:
    """
    Find corresponding image file for a label.
    Label: xxxx.json -> Image: xxxx.png or xxxx.jpg
    """
    base_name = label_filename.replace('.json', '')
    
    for ext in ['.png', '.jpg', '.jpeg', '.PNG', '.JPG', '.JPEG']:
        img_path = os.path.join(images_dir, base_name + ext)
        if os.path.exists(img_path):
            return img_path
    return None

def create_training_sample(image_path: str, label_data: List[Dict]) -> Dict:
    """
    Create a training sample from image and labels.
    
    Input format (label_data):
    [
        {
            "label": {"lineName": "..."},
            "points": [
                {"x": 100, "y": 200, "topBarPixelDistance": 10, "bottomBarPixelDistance": 15, ...},
                ...
            ]
        }
    ]
    
    Output format:
    {
        "image_path": "path/to/image.png",
        "input_points": [{"x": 100, "y": 200}, ...],
        "output_points": [{"x": 100, "y": 200, "topBarPixelDistance": 10, "bottomBarPixelDistance": 15}, ...]
    }
    """
    all_input_points = []
    all_output_points = []
    
    for line_data in label_data:
        points = line_data.get('points', [])
        for pt in points:
            # Skip axis labels
            if pt.get('label', '') in ['xmin', 'xmax', 'ymin', 'ymax']:
                continue
            
            x = pt.get('x', 0)
            y = pt.get('y', 0)
            top_dist = pt.get('topBarPixelDistance', 0)
            bottom_dist = pt.get('bottomBarPixelDistance', 0)
            
            all_input_points.append({"x": round(x, 1), "y": round(y, 1)})
            all_output_points.append({
                "x": round(x, 1),
                "y": round(y, 1),
                "topBarPixelDistance": round(top_dist, 1),
                "bottomBarPixelDistance": round(bottom_dist, 1)
            })
    
    return {
        "image_path": image_path,
        "input_points": all_input_points,
        "output_points": all_output_points
    }

def load_all_training_data(labels_dir: str, images_dir: str, max_samples: Optional[int] = None) -> List[Dict]:
    """
    Load all training samples from labels directory.
    """
    samples = []
    label_files = sorted([f for f in os.listdir(labels_dir) if f.endswith('.json')])
    
    if max_samples:
        label_files = label_files[:max_samples]
    
    print(f"Loading {len(label_files)} training samples...")
    
    for label_file in tqdm(label_files):
        try:
            # Find image
            image_path = get_image_filename(label_file, images_dir)
            if not image_path:
                continue
            
            # Load labels
            label_path = os.path.join(labels_dir, label_file)
            label_data = load_label_file(label_path)
            
            # Create training sample
            sample = create_training_sample(image_path, label_data)
            
            # Only include samples with points
            if sample['input_points']:
                samples.append(sample)
                
        except Exception as e:
            print(f"Error loading {label_file}: {e}")
            continue
    
    print(f"Loaded {len(samples)} valid training samples")
    return samples

# Load training data
print("Loading training data...")
training_samples = load_all_training_data(TRAIN_LABELS, TRAIN_IMAGES, MAX_TRAIN_SAMPLES)

# Show sample
if training_samples:
    print(f"\nSample training data:")
    print(f"  Image: {training_samples[0]['image_path']}")
    print(f"  Num points: {len(training_samples[0]['input_points'])}")
    if training_samples[0]['input_points']:
        print(f"  First input: {training_samples[0]['input_points'][0]}")
        print(f"  First output: {training_samples[0]['output_points'][0]}")

Loading training data...
Loading 600 training samples...


100%|██████████| 600/600 [00:18<00:00, 31.71it/s]

Loaded 600 valid training samples

Sample training data:
  Image: /kaggle/input/graph-plots/Train/images/001f3fd0-23eb-4371-a52a-41b9b71c36bf.png
  Num points: 23
  First input: {'x': 177.0, 'y': 94.2}
  First output: {'x': 177.0, 'y': 94.2, 'topBarPixelDistance': 100.7, 'bottomBarPixelDistance': 52.7}
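
Before training it can help to know how many points actually carry an error bar, since points without one are labeled with zero distances. The following sketch summarizes a list shaped like `training_samples`; the helper name `summarize_samples` and the demo records are made up for illustration.

```python
from typing import Dict, List

def summarize_samples(samples: List[Dict]) -> Dict[str, float]:
    """Count points overall and points whose two distances are both zero."""
    total = 0
    without_bars = 0
    for sample in samples:
        for pt in sample["output_points"]:
            total += 1
            if pt["topBarPixelDistance"] == 0 and pt["bottomBarPixelDistance"] == 0:
                without_bars += 1
    return {
        "samples": len(samples),
        "points": total,
        "points_without_bars": without_bars,
        "mean_points_per_sample": total / len(samples) if samples else 0.0,
    }

# Illustrative records only; real data would come from load_all_training_data
demo_samples = [
    {"output_points": [
        {"x": 1.0, "y": 2.0, "topBarPixelDistance": 10.0, "bottomBarPixelDistance": 5.0},
        {"x": 3.0, "y": 4.0, "topBarPixelDistance": 0, "bottomBarPixelDistance": 0},
    ]},
    {"output_points": [
        {"x": 5.0, "y": 6.0, "topBarPixelDistance": 7.5, "bottomBarPixelDistance": 7.5},
    ]},
]

stats = summarize_samples(demo_samples)
print(stats)
```

A large share of zero-distance points would be worth knowing about, because a model can reach a low loss on them by predicting zeros everywhere.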





## 6. Load Model and Processor

In [6]:
# 4-bit Quantization config
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
    bnb_4bit_use_double_quant=True,
)

print("Loading processor...")
processor = AutoProcessor.from_pretrained(
    BASE_MODEL,
    trust_remote_code=True,
)

# Set pad token
if processor.tokenizer.pad_token is None:
    processor.tokenizer.pad_token = processor.tokenizer.eos_token

print(f"Processor loaded!")

print("\nLoading model with 4-bit quantization...")
print("This may take 3-5 minutes...")

model = Qwen2_5_VLForConditionalGeneration.from_pretrained(
    BASE_MODEL,
    quantization_config=bnb_config,
    device_map="auto",
    trust_remote_code=True,
    torch_dtype=torch.bfloat16,
)

print(f"Model loaded!")

# Prepare model for k-bit training
model = prepare_model_for_kbit_training(model)

# Disable cache for training (required for gradient checkpointing)
model.config.use_cache = False

if torch.cuda.is_available():
    print(f"GPU Memory Used: {torch.cuda.memory_allocated() / 1e9:.2f} GB")

Loading processor...


The image processor of type `Qwen2VLImageProcessor` is now loaded as a fast processor by default, even if the model checkpoint was saved with a slow processor. This is a breaking change and may produce slightly different outputs. To continue using the slow processor, instantiate this class with `use_fast=False`. 


Processor loaded!

Loading model with 4-bit quantization...
This may take 3-5 minutes...


Model loaded!
GPU Memory Used: 2.67 GB


## 7. Configure LoRA

In [7]:
# Target modules for Qwen2.5-VL:
# attention projections (q/k/v/o) and MLP projections (gate/up/down)
target_modules = [
    "q_proj",
    "k_proj",
    "v_proj",
    "o_proj",
    "gate_proj",
    "up_proj",
    "down_proj",
]

# LoRA Configuration
lora_config = LoraConfig(
    r=LORA_R,
    lora_alpha=LORA_ALPHA,
    target_modules=target_modules,
    lora_dropout=LORA_DROPOUT,
    bias="none",
    task_type=TaskType.CAUSAL_LM,
)

# Apply LoRA
model = get_peft_model(model, lora_config)

# Print trainable parameters
model.print_trainable_parameters()

print(f"\nLoRA Configuration:")
print(f"  Rank (r): {LORA_R}")
print(f"  Alpha: {LORA_ALPHA}")
print(f"  Target modules: {target_modules}")

trainable params: 95,178,752 || all params: 8,387,345,408 || trainable%: 1.1348

LoRA Configuration:
  Rank (r): 32
  Alpha: 64
  Target modules: ['q_proj', 'k_proj', 'v_proj', 'o_proj', 'gate_proj', 'up_proj', 'down_proj']
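
The trainable parameter count can be cross-checked by hand: a LoRA adapter on a linear layer of shape `d_in x d_out` adds `r * (d_in + d_out)` weights. The sketch below does this arithmetic under assumed layer dimensions, which are not stated anywhere in this notebook and should be verified against `model.config`. It also assumes that, in the vision tower, only the MLP projections match the listed module names.

```python
def lora_params(d_in: int, d_out: int, r: int) -> int:
    """LoRA adds A (r x d_in) and B (d_out x r) for each adapted linear layer."""
    return r * (d_in + d_out)

R = 32  # LORA_R from the configuration cell

# ASSUMED language-model dimensions (not given in this notebook)
HIDDEN, KV_DIM, MLP_DIM, NUM_LAYERS = 3584, 512, 18944, 28
per_layer = (
    2 * lora_params(HIDDEN, HIDDEN, R)      # q_proj, o_proj
    + 2 * lora_params(HIDDEN, KV_DIM, R)    # k_proj, v_proj
    + 2 * lora_params(HIDDEN, MLP_DIM, R)   # gate_proj, up_proj
    + lora_params(MLP_DIM, HIDDEN, R)       # down_proj
)
language_total = per_layer * NUM_LAYERS

# ASSUMED vision-tower MLP dimensions (gate_proj, up_proj, down_proj per block)
V_HIDDEN, V_MLP, V_BLOCKS = 1280, 3420, 32
vision_total = 3 * lora_params(V_HIDDEN, V_MLP, R) * V_BLOCKS

total = language_total + vision_total
print(f"language: {language_total:,} | vision: {vision_total:,} | total: {total:,}")
print(f"trainable%: {100 * total / 8_387_345_408:.4f}")
```

Under these assumptions the total comes to 95,178,752, the same figure printed by `print_trainable_parameters()`. That agreement suggests the name-based `target_modules` list also attaches adapters inside the vision encoder, not only the language model, though this is an inference from the arithmetic rather than something the notebook states.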


## 8. Create Training Prompts

In [8]:
SYSTEM_PROMPT = """You are a precise error bar detection system for scientific plots.
Given an image of a scientific plot and data point coordinates, detect the error bars.
For each point, output the pixel distance from the data point to the top and bottom of the error bar.
If no error bar exists for a point, output 0 for both distances."""

def create_input_prompt(input_points: List[Dict]) -> str:
    """
    Create the input prompt with data point coordinates.
    """
    points_str = json.dumps(input_points, indent=2)
    
    prompt = f"""Analyze this scientific plot image and detect error bars for the following data points:

{points_str}

For each point, measure:
- topBarPixelDistance: pixel distance from data point to top of error bar (0 if none)
- bottomBarPixelDistance: pixel distance from data point to bottom of error bar (0 if none)

Output as JSON array:
[
  {{"x": <x>, "y": <y>, "topBarPixelDistance": <top>, "bottomBarPixelDistance": <bottom>}}
]"""
    
    return prompt

def create_output_response(output_points: List[Dict]) -> str:
    """
    Create the expected output response.
    """
    return json.dumps(output_points, indent=2)

print("Prompt functions defined!")

# Test prompts
if training_samples:
    sample = training_samples[0]
    print("\nSample input prompt:")
    print("-" * 60)
    print(create_input_prompt(sample['input_points'][:2]))
    print("-" * 60)
    print("\nSample output response:")
    print(create_output_response(sample['output_points'][:2]))

Prompt functions defined!

Sample input prompt:
------------------------------------------------------------
Analyze this scientific plot image and detect error bars for the following data points:

[
  {
    "x": 177.0,
    "y": 94.2
  },
  {
    "x": 207.2,
    "y": 105.5
  }
]

For each point, measure:
- topBarPixelDistance: pixel distance from data point to top of error bar (0 if none)
- bottomBarPixelDistance: pixel distance from data point to bottom of error bar (0 if none)

Output as JSON array:
[
  {"x": <x>, "y": <y>, "topBarPixelDistance": <top>, "bottomBarPixelDistance": <bottom>}
]
------------------------------------------------------------

Sample output response:
[
  {
    "x": 177.0,
    "y": 94.2,
    "topBarPixelDistance": 100.7,
    "bottomBarPixelDistance": 52.7
  },
  {
    "x": 207.2,
    "y": 105.5,
    "topBarPixelDistance": 98.2,
    "bottomBarPixelDistance": 39.9
  }
]
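
Both the prompt and the target are serialized with `indent=2`, which spends characters on whitespace and newlines. As a point of comparison, the sketch below contrasts that with a compact encoding. Character count is only a rough proxy for token count, and switching formats would change what the model is trained to emit, so prompt, targets and parser would all have to agree.

```python
import json

points = [{"x": 177.0, "y": 94.2}, {"x": 207.2, "y": 105.5}]

pretty = json.dumps(points, indent=2)                 # format used in this notebook
compact = json.dumps(points, separators=(",", ":"))   # no spaces, no newlines

print(len(pretty), "chars (indent=2)")
print(len(compact), "chars (compact)")
print(compact)
```

Both strings decode to the same list, so the choice affects only sequence length, which matters here because inputs are limited by `MAX_SEQ_LENGTH`.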


## 9. Prepare Dataset for Training

In [9]:
def prepare_training_example(sample: Dict, processor) -> Dict:
    """
    Prepare a single training example with image and text.
    """
    # Load and resize image
    image = Image.open(sample['image_path']).convert('RGB')
    if max(image.size) > IMAGE_MAX_SIZE:
        ratio = IMAGE_MAX_SIZE / max(image.size)
        new_size = (int(image.size[0] * ratio), int(image.size[1] * ratio))
        image = image.resize(new_size, Image.BILINEAR)
    
    # Create prompts
    input_prompt = create_input_prompt(sample['input_points'])
    output_response = create_output_response(sample['output_points'])
    
    # Create conversation format
    messages = [
        {
            "role": "user",
            "content": [
                {"type": "image", "image": image},
                {"type": "text", "text": f"{SYSTEM_PROMPT}\n\n{input_prompt}"}
            ]
        },
        {
            "role": "assistant",
            "content": output_response
        }
    ]
    
    # Apply chat template
    text = processor.apply_chat_template(
        messages,
        tokenize=False,
        add_generation_prompt=False
    )
    
    return {
        "text": text,
        "image": image,
        "image_path": sample['image_path']
    }

# Prepare training data
print("Preparing training examples...")
prepared_data = []

for i, sample in enumerate(tqdm(training_samples)):
    try:
        prepared = prepare_training_example(sample, processor)
        prepared_data.append(prepared)
    except Exception as e:
        print(f"Error preparing sample {i}: {e}")
        continue

print(f"\nPrepared {len(prepared_data)} training examples")

# Show sample
if prepared_data:
    print(f"\nSample text length: {len(prepared_data[0]['text'])} chars")
    print(f"Sample text preview:")
    print(prepared_data[0]['text'][:500])

Preparing training examples...


100%|██████████| 600/600 [00:10<00:00, 59.88it/s]


Prepared 600 training examples

Sample text length: 4298 chars
Sample text preview:
<|im_start|>system
You are a helpful assistant.<|im_end|>
<|im_start|>user
<|vision_start|><|image_pad|><|vision_end|>You are a precise error bar detection system for scientific plots.
Given an image of a scientific plot and data point coordinates, detect the error bars.
For each point, output the pixel distance from the data point to the top and bottom of the error bar.
If no error bar exists for a point, output 0 for both distances.

Analyze this scientific plot image and detect error bars for
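
`prepare_training_example` shrinks any image whose longest side exceeds `IMAGE_MAX_SIZE`, but the coordinates placed in the prompt are left unchanged. If the labels are expressed in original-image pixels (this section does not say), coordinates and image would disagree for large images. The sketch below shows one way to rescale points by the same ratio. Both helper names are hypothetical, and any further resizing done inside the processor is not accounted for.

```python
from typing import Dict, List, Tuple

def resize_ratio(size: Tuple[int, int], max_side: int) -> float:
    """Scale factor applied when the longest side exceeds max_side, else 1.0."""
    longest = max(size)
    return max_side / longest if longest > max_side else 1.0

def scale_points(points: List[Dict], ratio: float) -> List[Dict]:
    """Scale coordinates and, when present, the pixel distances by the same ratio."""
    scaled = []
    for p in points:
        q = {"x": round(p["x"] * ratio, 1), "y": round(p["y"] * ratio, 1)}
        for key in ("topBarPixelDistance", "bottomBarPixelDistance"):
            if key in p:
                q[key] = round(p[key] * ratio, 1)
        scaled.append(q)
    return scaled

ratio = resize_ratio((1536, 1024), 768)  # longest side 1536 -> 768
scaled = scale_points(
    [{"x": 400.0, "y": 200.0, "topBarPixelDistance": 30.0, "bottomBarPixelDistance": 10.0}],
    ratio,
)
print(ratio, scaled)
```

If this rescaling were used for training, predictions at inference time would need to be divided by the same ratio to get back to original-image pixels.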





## 10. Custom Training Loop (for VLM)

In [10]:
from torch.utils.data import DataLoader
from torch.optim import AdamW
from transformers import get_linear_schedule_with_warmup
import random
from tqdm.auto import tqdm as tqdm_auto

def collate_fn(batch, processor):
    """
    Custom collate function for VLM training.
    """
    texts = [item['text'] for item in batch]
    images = [item['image'] for item in batch]

    inputs = processor(
        text=texts,
        images=images,
        padding=True,
        truncation=True,
        max_length=MAX_SEQ_LENGTH,
        return_tensors="pt"
    )

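    # Use the inputs as labels, but ignore padded positions in the loss
    labels = inputs['input_ids'].clone()
    labels[inputs['attention_mask'] == 0] = -100
    inputs['labels'] = labels
    return inputs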


def train_vlm_lora(
    model,
    processor,
    train_data,
    num_epochs=NUM_EPOCHS,
    batch_size=BATCH_SIZE,
    learning_rate=LEARNING_RATE,
    gradient_accumulation_steps=GRADIENT_ACCUMULATION_STEPS,
    save_steps=SAVE_STEPS,
):
    """
    Custom training loop for VLM with LoRA.
    """

    random.shuffle(train_data)

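    # Optimize only the LoRA weights; the quantized base weights are frozen
    trainable_params = [p for p in model.parameters() if p.requires_grad]
    optimizer = AdamW(trainable_params, lr=learning_rate)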

    total_steps = (len(train_data) // batch_size) * num_epochs // gradient_accumulation_steps
    warmup_steps = int(total_steps * WARMUP_RATIO)

    scheduler = get_linear_schedule_with_warmup(
        optimizer,
        num_warmup_steps=warmup_steps,
        num_training_steps=total_steps
    )

    print("\nTraining Configuration:")
    print(f"  Samples: {len(train_data)} | Epochs: {num_epochs} | Steps: {total_steps}")
    print(f"  Batch: {batch_size} x {gradient_accumulation_steps} = {batch_size * gradient_accumulation_steps}")
    print(f"  LR: {learning_rate} | Warmup: {warmup_steps} steps")

    model.train()
    global_step = 0

    epoch_progress = tqdm_auto(range(num_epochs), desc="Training Epochs", position=0)

    for epoch in epoch_progress:
        print("\n" + "=" * 60)
        print(f"Epoch {epoch + 1}/{num_epochs}")
        print("=" * 60)

        epoch_loss = 0.0
        num_batches = 0

        batch_progress = tqdm_auto(
            range(0, len(train_data), batch_size),
            desc=f"Epoch {epoch + 1} Batches",
            position=1,
            leave=False
        )

        for i in batch_progress:
            batch = train_data[i:i + batch_size]

            try:
                inputs = collate_fn(batch, processor)
                inputs = {k: v.to(model.device) for k, v in inputs.items()}

                outputs = model(**inputs)
                loss = outputs.loss / gradient_accumulation_steps
                loss.backward()

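                # Undo the accumulation scaling so the logged loss is the true per-batch loss
                epoch_loss += loss.item() * gradient_accumulation_steps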
                num_batches += 1

                if num_batches % gradient_accumulation_steps == 0:
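                    # Apply the clipping threshold defined in the configuration
                    torch.nn.utils.clip_grad_norm_(model.parameters(), MAX_GRAD_NORM)
                    optimizer.step()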
                    scheduler.step()
                    optimizer.zero_grad()
                    global_step += 1

                    current_loss = epoch_loss / num_batches
                    batch_progress.set_postfix({
                        'loss': f'{current_loss:.4f}',
                        'step': f'{global_step}/{total_steps}',
                        'lr': f'{scheduler.get_last_lr()[0]:.2e}'
                    })

                    if global_step % LOGGING_STEPS == 0:
                        print(
                            f"Step {global_step}/{total_steps} | "
                            f"Loss: {current_loss:.4f} | "
                            f"LR: {scheduler.get_last_lr()[0]:.2e}"
                        )

                    if global_step % save_steps == 0:
                        checkpoint_path = os.path.join(
                            CHECKPOINT_DIR, f"checkpoint-{global_step}"
                        )
                        model.save_pretrained(checkpoint_path)
                        print(f"Saved checkpoint: {checkpoint_path}")

                del inputs, outputs, loss
                if torch.cuda.is_available():
                    torch.cuda.empty_cache()

            except torch.cuda.OutOfMemoryError:
                print("OOM encountered, skipping batch")
                optimizer.zero_grad()
                if torch.cuda.is_available():
                    torch.cuda.empty_cache()
                continue

        avg_epoch_loss = epoch_loss / num_batches if num_batches > 0 else 0.0
        epoch_progress.set_postfix({'epoch_loss': f'{avg_epoch_loss:.4f}'})
        print(f"Epoch {epoch + 1} complete | Avg Loss: {avg_epoch_loss:.4f}")

    print("\n" + "=" * 60)
    print("TRAINING COMPLETE")
    print("=" * 60)

    return model
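
`collate_fn` copies `input_ids` into `labels`, so the loss also covers the prompt: the instructions, the image placeholder tokens and the input coordinates. A common alternative is to compute the loss only on the assistant's reply by setting every earlier label to `-100`, the index that PyTorch's cross-entropy loss ignores by default. The sketch below shows the idea on plain lists with made-up token ids. In practice the header would be the tokenization of the assistant turn marker produced by the chat template, and `mask_prompt` is a hypothetical helper.

```python
from typing import List, Sequence

IGNORE_INDEX = -100  # label value ignored by PyTorch's cross-entropy loss by default

def mask_prompt(input_ids: Sequence[int], assistant_header: Sequence[int]) -> List[int]:
    """Ignore everything up to and including the last assistant header."""
    n, m = len(input_ids), len(assistant_header)
    start = -1
    # Scan right to left so the last occurrence of the header wins
    for i in range(n - m, -1, -1):
        if list(input_ids[i:i + m]) == list(assistant_header):
            start = i + m
            break
    if start < 0:
        # Header not found (for example after truncation): train on nothing
        return [IGNORE_INDEX] * n
    return [IGNORE_INDEX] * start + list(input_ids[start:])

# Made-up ids: [9, 8] stands in for the assistant header tokens
input_ids = [1, 2, 3, 9, 8, 4, 5, 6]
labels = mask_prompt(input_ids, assistant_header=[9, 8])
print(labels)
```

Only the last three positions keep their labels here, so gradients would come from the JSON answer alone.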

## 11. Train the Model

In [None]:
# Train
model = train_vlm_lora(
    model=model,
    processor=processor,
    train_data=prepared_data,
    num_epochs=NUM_EPOCHS,
    batch_size=BATCH_SIZE,
    learning_rate=LEARNING_RATE,
    gradient_accumulation_steps=GRADIENT_ACCUMULATION_STEPS,
    save_steps=SAVE_STEPS,
)


Training Configuration:
  Samples: 600 | Epochs: 1 | Steps: 37
  Batch: 1 x 16 = 16
  LR: 0.0002 | Warmup: 1 steps


Training Epochs:   0%|          | 0/1 [00:00<?, ?it/s]


Epoch 1/1


Epoch 1 Batches:   0%|          | 0/600 [00:00<?, ?it/s]

`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...


## 12. Save Final Model

In [None]:
# Save final LoRA adapter
final_checkpoint_path = os.path.join(CHECKPOINT_DIR, "final_lora_adapter")

print(f"Saving LoRA adapter to: {final_checkpoint_path}")
model.save_pretrained(final_checkpoint_path)
processor.save_pretrained(final_checkpoint_path)

print(f"\nCheckpoint saved!")
print(f"Contents:")
for f in os.listdir(final_checkpoint_path):
    size = os.path.getsize(os.path.join(final_checkpoint_path, f)) / 1024
    print(f"  - {f} ({size:.1f} KB)")

## 13. Upload to HuggingFace Hub

In [None]:
# Create repository
api = HfApi()

try:
    create_repo(
        repo_id=HF_REPO_ID,
        repo_type="model",
        private=False,
        exist_ok=True,
    )
    print(f"Repository created/verified: {HF_REPO_ID}")
except Exception as e:
    print(f"Repo note: {e}")

# Push to Hub
print(f"\nUploading to HuggingFace Hub: {HF_REPO_ID}")

model.push_to_hub(
    repo_id=HF_REPO_ID,
    commit_message="Upload fine-tuned Qwen2.5-VL error bar detector",
)

processor.push_to_hub(
    repo_id=HF_REPO_ID,
    commit_message="Upload processor",
)

print(f"\n{'='*60}")
print(f"MODEL UPLOADED: https://huggingface.co/{HF_REPO_ID}")
print(f"{'='*60}")

---
# Part 2: Inference on Test Data
---

## 14. Load Fine-tuned Model for Inference

In [None]:
# Clear training memory
gc.collect()
if torch.cuda.is_available():
    torch.cuda.empty_cache()

# Set model to evaluation mode
model.eval()

print("Model ready for inference!")

## 15. Inference Function

In [None]:
def infer_error_bars(image_path: str, data_points: List[Dict]) -> Optional[Dict]:
    """
    Infer error bars for given data points in an image.
    
    Args:
        image_path: Path to the plot image
        data_points: List of {"x": float, "y": float}
    
    Returns:
        Dict with measurements or None if failed
    """
    try:
        # Load and resize image
        image = Image.open(image_path).convert('RGB')
        if max(image.size) > IMAGE_MAX_SIZE:
            ratio = IMAGE_MAX_SIZE / max(image.size)
            new_size = (int(image.size[0] * ratio), int(image.size[1] * ratio))
            image = image.resize(new_size, Image.BILINEAR)
        
        # Create prompt
        input_prompt = create_input_prompt(data_points)
        
        # Create messages
        messages = [
            {
                "role": "user",
                "content": [
                    {"type": "image", "image": image},
                    {"type": "text", "text": f"{SYSTEM_PROMPT}\n\n{input_prompt}"}
                ]
            }
        ]
        
        # Apply chat template
        text = processor.apply_chat_template(
            messages,
            tokenize=False,
            add_generation_prompt=True
        )
        
        # Process inputs
        inputs = processor(
            text=[text],
            images=[image],
            padding=True,
            return_tensors="pt"
        ).to(model.device)
        
        # Generate
        num_points = len(data_points)
        max_tokens = min(2048, max(512, num_points * 80))
        
        with torch.no_grad():
            generated_ids = model.generate(
                **inputs,
                max_new_tokens=max_tokens,
                do_sample=False,
                num_beams=1,
                pad_token_id=processor.tokenizer.pad_token_id,
            )
        
        # Decode
        generated_ids_trimmed = [
            out_ids[len(in_ids):]
            for in_ids, out_ids in zip(inputs.input_ids, generated_ids)
        ]
        response = processor.batch_decode(
            generated_ids_trimmed,
            skip_special_tokens=True
        )[0]
        
        # Parse response
        result = parse_response(response, data_points)
        
        # Cleanup
        del inputs, generated_ids, image
        
        return result
        
    except Exception as e:
        print(f"Inference error: {e}")
        return None

def parse_response(response_text: str, original_points: List[Dict]) -> Optional[Dict]:
    """
    Parse model response to extract error bar measurements.
    """
    try:
        # Clean response
        cleaned = response_text.strip()
        
        # Remove markdown code blocks
        if '```json' in cleaned:
            start = cleaned.find('```json') + 7
            end = cleaned.find('```', start)
            if end > start:
                cleaned = cleaned[start:end].strip()
        elif '```' in cleaned:
            start = cleaned.find('```') + 3
            end = cleaned.find('```', start)
            if end > start:
                cleaned = cleaned[start:end].strip()
        
        # Find JSON
        if cleaned.startswith('['):
            json_str = cleaned
        else:
            start_idx = cleaned.find('[')
            end_idx = cleaned.rfind(']') + 1
            if start_idx >= 0 and end_idx > start_idx:
                json_str = cleaned[start_idx:end_idx]
            else:
                return None
        
        # Parse JSON
        parsed = json.loads(json_str)
        
        # Convert to standard format
        measurements = []
        for item in parsed:
            x = float(item.get('x', 0))
            y = float(item.get('y', 0))
            top_dist = float(item.get('topBarPixelDistance', 0))
            bottom_dist = float(item.get('bottomBarPixelDistance', 0))
            
            measurements.append({
                "data_point": {"x": x, "y": y},
                "upper_error_bar": {"x": x, "y": y - top_dist},
                "lower_error_bar": {"x": x, "y": y + bottom_dist},
                "topBarPixelDistance": top_dist,
                "bottomBarPixelDistance": bottom_dist
            })
        
        return {"measurements": measurements}
        
    except json.JSONDecodeError as e:
        print(f"JSON parse error: {e}")
        return None
    except Exception as e:
        print(f"Parse error: {e}")
        return None

print("Inference functions defined!")
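
`parse_response` trusts the coordinates echoed by the model, which may be slightly altered, reordered or missing for some points. One way to tie predictions back to the requested points is a nearest-neighbour match within a small pixel tolerance. The sketch below is a minimal version of that idea; `align_predictions` and the 2-pixel tolerance are assumptions, not part of the notebook.

```python
import math
from typing import Dict, List, Optional

def align_predictions(
    inputs: List[Dict],
    measurements: List[Dict],
    tolerance: float = 2.0,
) -> List[Optional[Dict]]:
    """For each input point, return the closest measurement within tolerance, else None."""
    aligned: List[Optional[Dict]] = []
    for p in inputs:
        best, best_d = None, tolerance
        for m in measurements:
            d = math.hypot(m["data_point"]["x"] - p["x"], m["data_point"]["y"] - p["y"])
            if d <= best_d:
                best, best_d = m, d
        aligned.append(best)
    return aligned

points = [{"x": 177.0, "y": 94.2}, {"x": 207.2, "y": 105.5}]
measurements = [
    {
        "data_point": {"x": 177.0, "y": 94.0},  # model echoed y slightly off
        "topBarPixelDistance": 100.7,
        "bottomBarPixelDistance": 52.7,
    },
]

aligned = align_predictions(points, measurements)
print([None if m is None else m["topBarPixelDistance"] for m in aligned])
```

The first point is matched and the second gets `None`, which makes missing predictions explicit. This simple version does not stop one measurement from being matched to several nearby inputs.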

## 16. Test on Sample Image

In [None]:
# Load test data
test_label_files = sorted([f for f in os.listdir(TEST_INPUT_LABELS) if f.endswith('.json')])[:1]

if test_label_files:
    test_file = test_label_files[0]
    print(f"Testing on: {test_file}\n")
    
    # Load input labels (x, y only)
    with open(os.path.join(TEST_INPUT_LABELS, test_file), 'r') as f:
        test_input = json.load(f)
    
    image_file = test_input['image_file']
    image_path = os.path.join(TEST_IMAGES, image_file)
    
    # Get data points
    data_points = []
    for line_data in test_input.get('data_points', []):
        for pt in line_data.get('points', []):
            data_points.append({"x": round(pt['x'], 1), "y": round(pt['y'], 1)})
    
    print(f"Image: {image_path}")
    print(f"Data points: {len(data_points)}")
    print(f"First point: {data_points[0] if data_points else 'none'}")
    
    # Run inference
    result = infer_error_bars(image_path, data_points)
    
    if result and 'measurements' in result:
        print(f"\nInference successful!")
        print(f"Got {len(result['measurements'])} measurements")
        
        print("\nFirst 3 measurements:")
        for i, m in enumerate(result['measurements'][:3]):
            print(f"  [{i+1}] Point: ({m['data_point']['x']:.1f}, {m['data_point']['y']:.1f})")
            print(f"       Top: {m['topBarPixelDistance']:.1f}px, Bottom: {m['bottomBarPixelDistance']:.1f}px")
    else:
        print("Inference failed")

## 17. Process All Test Data

In [None]:
# Process all test files
all_test_files = sorted([f for f in os.listdir(TEST_INPUT_LABELS) if f.endswith('.json')])
print(f"Processing {len(all_test_files)} test files...\n")

all_predictions = {}
all_results = []
failed_count = 0
processed_count = 0

for i, test_file in enumerate(all_test_files):
    try:
        # Load input
        with open(os.path.join(TEST_INPUT_LABELS, test_file), 'r') as f:
            test_input = json.load(f)
        
        image_file = test_input['image_file']
        image_path = os.path.join(TEST_IMAGES, image_file)
        
        # Get all data points
        all_points = []
        for line_data in test_input.get('data_points', []):
            for pt in line_data.get('points', []):
                all_points.append({"x": round(pt['x'], 1), "y": round(pt['y'], 1)})
        
        # Run inference
        result = infer_error_bars(image_path, all_points)
        
        if result and 'measurements' in result:
            all_predictions[test_file] = {
                'image_file': image_file,
                'measurements': result['measurements']
            }
            processed_count += 1
        else:
            failed_count += 1
        
        # Progress
        if (i + 1) % 10 == 0:
            print(f"✓ Processed {i+1}/{len(all_test_files)} | Success: {processed_count} | Failed: {failed_count}")
        
        # Clear cache
        if torch.cuda.is_available():
            torch.cuda.empty_cache()
            
    except Exception as e:
        failed_count += 1
        if failed_count <= 5:
            print(f"Error on {test_file}: {e}")

print(f"\n{'='*60}")
print("PROCESSING COMPLETE")
print(f"Processed: {processed_count}")
print(f"Failed: {failed_count}")
print(f"{'='*60}")
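
Sections 16 and 17 both flatten the nested `data_points` structure with the same loop. A minimal sketch of that step as a reusable helper is shown here; `flatten_points` and the sample record are hypothetical, and the record's values are made up for illustration.

```python
from typing import Dict, List


def flatten_points(test_input: Dict) -> List[Dict[str, float]]:
    """Collect every (x, y) point across all lines, rounded to 1 decimal."""
    points = []
    for line_data in test_input.get("data_points", []):
        for pt in line_data.get("points", []):
            points.append({"x": round(pt["x"], 1), "y": round(pt["y"], 1)})
    return points


# Hypothetical test-input record (illustrative values only)
sample_input = {
    "image_file": "example.png",
    "data_points": [
        {"points": [{"x": 10.04, "y": 20.06}, {"x": 30.0, "y": 40.0}]},
        {"points": [{"x": 50.26, "y": 60.0}]},
    ],
}

flat = flatten_points(sample_input)
print(len(flat), flat[0])
```

Because the helper uses `.get` with empty defaults, a record without `data_points` yields an empty list rather than raising, which matches the behavior of the loops in the cells above.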

## 18. Evaluate Against Ground Truth

In [None]:
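def calculate_metrics(predictions: Dict, ground_truth_dir: str) -> Optional[Dict]:
    """
    Compute absolute pixel errors between predictions and ground truth.

    Predictions and ground-truth points are paired by position, so both must
    list points in the same order; extra points on either side are ignored.
    Returns None if no points could be compared.
    """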
    all_top_errors = []
    all_bottom_errors = []
    
    for json_file, pred_data in predictions.items():
        try:
            # Load ground truth
            gt_path = os.path.join(ground_truth_dir, json_file)
            if not os.path.exists(gt_path):
                continue
            
            with open(gt_path, 'r') as f:
                gt_data = json.load(f)
            
            # Collect all GT points
            gt_points = []
            for line_data in gt_data:
                for pt in line_data.get('points', []):
                    if pt.get('label', '') not in ['xmin', 'xmax', 'ymin', 'ymax']:
                        gt_points.append(pt)
            
            # Compare with predictions
            pred_measurements = pred_data['measurements']
            
            for pred_m, gt_pt in zip(pred_measurements, gt_points):
                pred_top = pred_m.get('topBarPixelDistance', 0)
                pred_bottom = pred_m.get('bottomBarPixelDistance', 0)
                gt_top = gt_pt.get('topBarPixelDistance', 0)
                gt_bottom = gt_pt.get('bottomBarPixelDistance', 0)
                
                all_top_errors.append(abs(pred_top - gt_top))
                all_bottom_errors.append(abs(pred_bottom - gt_bottom))
                
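        except Exception as e:
            # Report the file instead of silently dropping it from the metrics
            print(f"Skipping {json_file}: {e}")
            continue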
    
    if not all_top_errors:
        return None
    
    all_mean_errors = [(t + b) / 2 for t, b in zip(all_top_errors, all_bottom_errors)]
    
    metrics = {
        'num_points': len(all_top_errors),
        'mean_top_error': np.mean(all_top_errors),
        'mean_bottom_error': np.mean(all_bottom_errors),
        'mean_overall_error': np.mean(all_mean_errors),
        'median_top_error': np.median(all_top_errors),
        'median_bottom_error': np.median(all_bottom_errors),
        'accuracy_5px': sum(1 for e in all_mean_errors if e <= 5) / len(all_mean_errors) * 100,
        'accuracy_10px': sum(1 for e in all_mean_errors if e <= 10) / len(all_mean_errors) * 100,
        'accuracy_20px': sum(1 for e in all_mean_errors if e <= 20) / len(all_mean_errors) * 100,
    }
    
    return metrics

# Calculate metrics
if all_predictions:
    metrics = calculate_metrics(all_predictions, TEST_GROUND_TRUTH)
    
    if metrics:
        print("\n" + "="*60)
        print("EVALUATION RESULTS")
        print("="*60)
        print(f"\nTotal Points Evaluated: {metrics['num_points']}")
        print(f"\nPixel Error:")
        print(f"  Mean Top Error: {metrics['mean_top_error']:.2f} px")
        print(f"  Mean Bottom Error: {metrics['mean_bottom_error']:.2f} px")
        print(f"  Mean Overall Error: {metrics['mean_overall_error']:.2f} px")
        print(f"\nAccuracy:")
        print(f"  Within 5px: {metrics['accuracy_5px']:.1f}%")
        print(f"  Within 10px: {metrics['accuracy_10px']:.1f}%")
        print(f"  Within 20px: {metrics['accuracy_20px']:.1f}%")
        print("="*60)
else:
    print("No predictions to evaluate")
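
To make the metric definitions concrete, here is a small worked example on made-up numbers. It mirrors the arithmetic in `calculate_metrics` (per-point absolute error for top and bottom, their average, and threshold accuracy) but skips the file loading. The prediction and ground-truth values, and the helper name `accuracy_within`, are invented for illustration.

```python
import numpy as np

# Invented (top, bottom) distances in pixels for four points
pred = [(12.0, 10.0), (30.0, 28.0), (5.0, 5.0), (50.0, 40.0)]
truth = [(10.0, 10.0), (20.0, 20.0), (5.0, 9.0), (10.0, 10.0)]

# Absolute error per point, separately for the top and bottom bars
top_err = [abs(p[0] - t[0]) for p, t in zip(pred, truth)]
bottom_err = [abs(p[1] - t[1]) for p, t in zip(pred, truth)]

# Per-point average of the two errors, as used for the accuracy thresholds
mean_err = [(t + b) / 2 for t, b in zip(top_err, bottom_err)]


def accuracy_within(errors, threshold_px):
    """Percentage of points whose averaged error is at most threshold_px."""
    return sum(1 for e in errors if e <= threshold_px) / len(errors) * 100


print("mean top error:", float(np.mean(top_err)))
print("median top error:", float(np.median(top_err)))
print("within 5px:", accuracy_within(mean_err, 5))
print("within 10px:", accuracy_within(mean_err, 10))
```

Note that the thresholds apply to the average of the top and bottom errors. The third point has a 4 px bottom error and a 0 px top error, so its averaged error is 2 px; a one-sided miss is halved before the threshold is checked.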

## 19. Save Predictions

In [None]:
# Save predictions
OUTPUT_PREDICTIONS_DIR = "/kaggle/working/predictions"
os.makedirs(OUTPUT_PREDICTIONS_DIR, exist_ok=True)

print(f"Saving {len(all_predictions)} prediction files...\n")

for json_file, pred_data in all_predictions.items():
    try:
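        # Convert to output format.
        # Assumes each measurement carries 'upper_error_bar' and 'lower_error_bar';
        # a missing key raises KeyError and the file is reported and skipped below.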
        output = {
            "image_file": pred_data['image_file'],
            "model": "Chartqwen",
            "error_bars": [{
                "lineName": "",
                "points": [
                    {
                        "data_point": m['data_point'],
                        "upper_error_bar": m['upper_error_bar'],
                        "lower_error_bar": m['lower_error_bar']
                    }
                    for m in pred_data['measurements']
                ]
            }]
        }
        
        output_path = os.path.join(OUTPUT_PREDICTIONS_DIR, json_file)
        with open(output_path, 'w') as f:
            json.dump(output, f, indent=2)
            
    except Exception as e:
        print(f"Error saving {json_file}: {e}")

print(f"Predictions saved to: {OUTPUT_PREDICTIONS_DIR}")

# Create ZIP
import zipfile
from datetime import datetime

zip_path = f"/kaggle/working/qwen_vl_predictions_{datetime.now().strftime('%Y%m%d_%H%M%S')}.zip"

with zipfile.ZipFile(zip_path, 'w', zipfile.ZIP_DEFLATED) as zipf:
    for f in os.listdir(OUTPUT_PREDICTIONS_DIR):
        if f.endswith('.json'):
            zipf.write(os.path.join(OUTPUT_PREDICTIONS_DIR, f), f"predictions/{f}")

print(f"ZIP created: {zip_path}")
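
A quick sanity check before downloading the archive is to reopen it and confirm that each entry parses as JSON. The sketch below does this on a throwaway archive built in a temporary directory; `verify_zip` is a hypothetical helper and is not part of the notebook.

```python
import json
import os
import tempfile
import zipfile


def verify_zip(zip_path: str) -> list:
    """Return the names of archive entries that parse as JSON."""
    valid = []
    with zipfile.ZipFile(zip_path) as zf:
        for name in zf.namelist():
            if name.endswith(".json"):
                json.loads(zf.read(name))  # raises on malformed JSON
                valid.append(name)
    return valid


# Build a small demo archive with the same "predictions/<file>" layout
tmp_dir = tempfile.mkdtemp()
demo_zip = os.path.join(tmp_dir, "demo.zip")
with zipfile.ZipFile(demo_zip, "w", zipfile.ZIP_DEFLATED) as zf:
    zf.writestr("predictions/a.json", json.dumps({"image_file": "a.png"}))
    zf.writestr("predictions/b.json", json.dumps({"image_file": "b.png"}))

checked = verify_zip(demo_zip)
print(checked)
```

Pointing `verify_zip` at `zip_path` from the cell above should list one entry per saved prediction file.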

## Summary

This notebook demonstrated:

### Fine-tuning:
- **Model**: Qwen2.5-VL-7B-Instruct (Vision-Language Model)
- **Method**: LoRA with 4-bit quantization
- **Task**: Error bar detection in scientific plots
- **Input**: Image + data point coordinates (x, y)
- **Output**: Error bar distances (topBarPixelDistance, bottomBarPixelDistance)

### Data Format:
- **Training Labels**: JSON with `points` array containing `x`, `y`, `topBarPixelDistance`, `bottomBarPixelDistance`
- **Test Input**: JSON with `image_file` and `data_points` (x, y only)
- **Test Ground Truth**: Same format as training labels
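
As an illustration of these formats, the sketch below builds one record of each kind as Python dictionaries. All field values are invented, and the top-level layout of the label file (a list of line objects, each with a `points` array, where axis markers carry a `label`) is an assumption inferred from how the evaluation code iterates over the ground truth.

```python
# Training label / test ground truth: a list of line objects (assumed layout)
ground_truth = [
    {
        "points": [
            {
                "x": 100.0,
                "y": 200.0,
                "topBarPixelDistance": 12.5,
                "bottomBarPixelDistance": 11.0,
            },
            # Axis marker: excluded from evaluation by its label
            {"x": 0.0, "y": 0.0, "label": "xmin"},
        ]
    }
]

# Test input: image name plus (x, y) points only
test_input = {
    "image_file": "example.png",
    "data_points": [{"points": [{"x": 100.0, "y": 200.0}]}],
}

# Same filter the evaluation applies before comparing points
AXIS_LABELS = {"xmin", "xmax", "ymin", "ymax"}
eval_points = [
    pt
    for line in ground_truth
    for pt in line.get("points", [])
    if pt.get("label", "") not in AXIS_LABELS
]
print(len(eval_points), eval_points[0]["topBarPixelDistance"])
```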

### Key Features:
- Custom training loop for VLM with image inputs
- Memory-efficient training with gradient accumulation
- Checkpointing and HuggingFace Hub upload
- Evaluation metrics: mean and median absolute pixel error, plus accuracy within 5, 10, and 20 px

### Model Path:
- HuggingFace: `Sayeem26s/qwen-vl-errorbar-detector`