Skip to content

abinshoby/Overthinking

Repository files navigation

Overthinking Causes Hallucination: Tracing Confounder Propagation in Vision Language Models


Overview

Official implementation for our CVPR 2026 paper "Overthinking Causes Hallucination: Tracing Confounder Propagation in Vision Language Models".

Our key finding: Vision-Language Models can "overthink" and hallucinate. As VLMs process an image across decoder layers, they keep revising their belliefs about the visual content. When an incorrect hypothesis emerges, it can act as a confounder that propagates through subsequent layers, eventually leading to a hallucinated answer. We introduce an Overthinking Score that captures this internal instability and helps detect hallucinations.

The pipeline supports three VLMs out of the box: LLaVA-1.5, Qwen3-VL, and Gemma3.


Method

For each generated token, we probe the model's internal states and extract three families of features:

  • Layer-wise entropy (H_i) — uncertainty of the next-token distribution at each transformer layer
  • Image attention (IA_i) — how strongly the last token attends to visual tokens at each layer
  • Text attention (TA_i) — how strongly the last token attends to non-visual (text) tokens at each layer
  • Overthinking score — a composite metric combining mean entropy and prediction diversity across layers

These features are used to train a binary classifier (Logistic Regression, Gradient Boosting, or MLP) to distinguish grounded tokens (label 0) from hallucinated tokens (label 1).


Repository Structure

.
├── feature_extract.py        # Feature extraction pipeline (LLaVA, Qwen3, Gemma3)
├── train_evaluate.py         # Classifier training & evaluation (LR, GB, MLP)
├── Dockerfile                # Docker environment (NVIDIA PyTorch 23.12 base)
├── docker_run.sh             # Launch script with GPU support + Jupyter
├── requirements.txt          # Python dependencies
└── data/
    ├── LLaVA-1.5/
    │   ├── llava_train.jsonl
    │   └── llava_test.jsonl
    ├── Qwen3/
    │   ├── qwen_train.jsonl
    │   └── qwen_test.jsonl
    └── gemma/
        ├── gemma_train.jsonl
        └── gemma_test.jsonl

Environment Setup

Two Dockerfiles are provided depending on the target model:

Dockerfile Models Base Image
Dockerfile LLaVA-1.5 nvcr.io/nvidia/pytorch:23.12-py3
Dockerfile_new_transformer Qwen3-VL, Gemma3 nvcr.io/nvidia/pytorch:23.12-py3 + PyTorch 2.6, Transformers 4.57

Qwen3-VL and Gemma3 require a newer transformers library (4.57) and updated PyTorch (2.6.0), which are installed in Dockerfile_new_transformer.

Build

# For LLaVA-1.5
docker build -f Dockerfile -t detect .

# For Qwen3-VL or Gemma3
docker build -f Dockerfile_new_transformer -t detect_new .

Launch

bash docker_run.sh

The script starts a container with full GPU access, mounts your data at /workspace/data and your code at /workspace/code, and launches a Jupyter Notebook server on port 8788.


Data Format

Each .jsonl file contains one JSON object per line with the following fields:

Field Type Description
image_id str Unique identifier for the image
image str Filename of the image (relative to image_dir)
description str Caption/description generated by the VLM
grounded_tokens list[str] Tokens that are visually grounded
hallucinated_tokens list[str] Tokens that are hallucinated
annotations any Ground-truth annotations
captions any Reference captions

Images are expected in a directory such as /workspace/data/COCODataset/val2014/val2014.


Usage

Step 1: Feature Extraction

Extract internal model features for each token in the dataset:

python feature_extract.py \
  --input_data_path ./data/LLaVA-1.5/llava_train.jsonl \
  --output_data_path ./data/LLaVA-1.5/train_features.csv \
  --image_dir /path/to/COCO/val2014 \
  --model_path /path/to/llava-v1.5-7b

For Qwen3-VL:

python feature_extract.py \
  --input_data_path ./data/Qwen3/qwen_train.jsonl \
  --output_data_path ./data/Qwen3/train_features.csv \
  --image_dir /path/to/COCO/val2014 \
  --model_path /path/to/Qwen3-VL

For Gemma3:

python feature_extract.py \
  --input_data_path ./data/gemma/gemma_train.jsonl \
  --output_data_path ./data/gemma/train_features.csv \
  --image_dir /path/to/COCO/val2014 \
  --model_path /path/to/gemma3

Arguments:

Argument Default Description
--input_data_path ./data/LLaVA-1.5/train_migrate_2.jsonl Path to input JSONL file
--output_data_path ./data/LLaVA-1.5/train_features.csv Path to save extracted features (CSV)
--image_dir /workspace/data/COCODataset/val2014/val2014 Directory containing COCO images
--model_path /workspace/data/pretrained/llava-v1.5-7b Path to pre-trained VLM checkpoint

Step 2: Train & Evaluate Classifier

Train a hallucination detector on the extracted features:

python train_evaluate.py \
  --train_path ./data/LLaVA-1.5/train_features.csv \
  --test_path  ./data/LLaVA-1.5/test_features.csv \
  --model_type LR \
  --output_csv ./results/metrics.csv \
  --model_save_dir ./saved_models

Arguments:

Argument Default Description
--train_path ./data/LLaVA-1.5/train.csv Path to training features CSV
--test_path ./data/LLaVA-1.5/test.csv Path to test features CSV
--model_type LR Classifier type: LR, GB, or MLP
--output_csv ./results/metrics.csv Where to append test metrics
--model_save_dir ./saved_models Directory to save the best model

Supported classifiers:

  • LR — Logistic Regression
  • GB — Gradient Boosting
  • MLP — Multi-Layer Perceptron

All classifiers are validated on a held-out 10% split of the training data. The best model and threshold are saved to model_save_dir.


Outputs

  • Feature CSV — one row per (image, token, position) triple, with entropy, overthinking score, attention, and top-k token columns
  • ./results/metrics.csv — appended with test-set metrics (accuracy, macro F1, AUC, AP, per-class precision/recall/F1) and best hyperparameters for each run
  • ./saved_models/best_<MODEL>.pkl — serialised best classifier (and threshold for GB)

Supported Models

Model Identifier (in path) Notes
LLaVA-1.5 llava Requires ./models/LLaVA submodule
Qwen3-VL qwen Requires transformers >= 4.x with Qwen3VLForConditionalGeneration
Gemma3 gemma Requires transformers >= 4.x with Gemma3ForConditionalGeneration

The model type is inferred automatically from the model path.


Citation

If you find this work useful, please cite our paper:

@inproceedings{overthinking2026cvpr,
  title     = {Overthinking Causes Hallucination: Tracing Confounder Propagation in Vision Language Models},
  booktitle = {Findings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
  year      = {2026},
}

License

This project is released for research purposes. Please refer to the respective model licenses (LLaVA, Qwen3, Gemma3) and the COCO dataset license for usage restrictions.

About

Overthinking Causes Hallucination: Tracing Confounder Propagation in Vision Language Models

Topics

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors