Overthinking Causes Hallucination: Tracing Confounder Propagation in Vision Language Models

Overview

Official implementation for our CVPR 2026 paper "Overthinking Causes Hallucination: Tracing Confounder Propagation in Vision Language Models".

Our key finding: Vision-Language Models can "overthink" and hallucinate. As VLMs process an image across decoder layers, they keep revising their beliefs about the visual content. When an incorrect hypothesis emerges, it can act as a confounder that propagates through subsequent layers, eventually leading to a hallucinated answer. We introduce an Overthinking Score that captures this internal instability and helps detect hallucinations.

The pipeline supports three VLMs out of the box: LLaVA-1.5, Qwen3-VL, and Gemma3.

Method

For each generated token, we probe the model's internal states and extract four families of features:

Layer-wise entropy (H_i) — uncertainty of the next-token distribution at each transformer layer
Image attention (IA_i) — how strongly the last token attends to visual tokens at each layer
Text attention (TA_i) — how strongly the last token attends to non-visual (text) tokens at each layer
Overthinking score — a composite metric combining mean entropy and prediction diversity across layers

These features are used to train a binary classifier (Logistic Regression, Gradient Boosting, or MLP) to distinguish grounded tokens (label 0) from hallucinated tokens (label 1).
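The features listed above can be illustrated with a small, dependency-free sketch. It is not the code in this repository: it assumes each layer's hidden state has already been projected to vocabulary logits, and that the attention row of the last token has already been averaged over heads. The exact form of the overthinking score is not spelled out here, so the product of mean entropy and the fraction of distinct top-1 predictions used below is an assumption, and all function names are hypothetical.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def entropy(probs):
    """Shannon entropy in nats; zero-probability entries contribute nothing."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)

def layer_entropies(layer_logits):
    """H_i for each layer, given one next-token logit vector per layer."""
    return [entropy(softmax(logits)) for logits in layer_logits]

def attention_split(attn_row, image_mask):
    """Split the last token's attention row into image mass (IA) and text mass (TA)."""
    ia = sum(a for a, is_img in zip(attn_row, image_mask) if is_img)
    ta = sum(a for a, is_img in zip(attn_row, image_mask) if not is_img)
    return ia, ta

def overthinking_score(layer_logits):
    """ASSUMED form: mean layer entropy times the fraction of distinct top-1 tokens."""
    hs = layer_entropies(layer_logits)
    top1 = [max(range(len(l)), key=l.__getitem__) for l in layer_logits]
    diversity = len(set(top1)) / len(top1)
    return (sum(hs) / len(hs)) * diversity

# Toy inputs: 4 layers, vocabulary of 3 tokens (illustrative values only).
stable = [[5.0, 0.0, 0.0]] * 4                       # every layer agrees, low entropy
unstable = [[1.0, 0.9, 0.8], [0.8, 1.0, 0.9],
            [0.9, 0.8, 1.0], [1.0, 0.9, 0.8]]        # top-1 keeps changing, high entropy
ia, ta = attention_split([0.1, 0.2, 0.3, 0.4], [True, True, False, False])
```

Under this assumed definition, the `unstable` token scores higher than the `stable` one, which matches the intuition that repeated revision across layers signals a possible hallucination.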

Repository Structure

.
├── feature_extract.py        # Feature extraction pipeline (LLaVA, Qwen3, Gemma3)
├── train_evaluate.py         # Classifier training & evaluation (LR, GB, MLP)
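├── Dockerfile                # Docker environment for LLaVA-1.5 (NVIDIA PyTorch 23.12 base)
├── Dockerfile_new_transformer # Docker environment for Qwen3-VL and Gemma3 (PyTorch 2.6, Transformers 4.57)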
├── docker_run.sh             # Launch script with GPU support + Jupyter
├── requirements.txt          # Python dependencies
└── data/
    ├── LLaVA-1.5/
    │   ├── llava_train.jsonl
    │   └── llava_test.jsonl
    ├── Qwen3/
    │   ├── qwen_train.jsonl
    │   └── qwen_test.jsonl
    └── gemma/
        ├── gemma_train.jsonl
        └── gemma_test.jsonl

Environment Setup

Two Dockerfiles are provided depending on the target model:

Dockerfile	Models	Base Image
`Dockerfile`	LLaVA-1.5	`nvcr.io/nvidia/pytorch:23.12-py3`
`Dockerfile_new_transformer`	Qwen3-VL, Gemma3	`nvcr.io/nvidia/pytorch:23.12-py3` + PyTorch 2.6, Transformers 4.57

Qwen3-VL and Gemma3 require a newer transformers library (4.57) and updated PyTorch (2.6.0), which are installed in Dockerfile_new_transformer.

Build

# For LLaVA-1.5
docker build -f Dockerfile -t detect .

# For Qwen3-VL or Gemma3
docker build -f Dockerfile_new_transformer -t detect_new .

Launch

bash docker_run.sh

The script starts a container with full GPU access, mounts your data at /workspace/data and your code at /workspace/code, and launches a Jupyter Notebook server on port 8788.

Data Format

Each .jsonl file contains one JSON object per line with the following fields:

Field	Type	Description
`image_id`	str	Unique identifier for the image
`image`	str	Filename of the image (relative to `image_dir`)
`description`	str	Caption/description generated by the VLM
`grounded_tokens`	list[str]	Tokens that are visually grounded
`hallucinated_tokens`	list[str]	Tokens that are hallucinated
`annotations`	any	Ground-truth annotations
`captions`	any	Reference captions

Images are expected in a directory such as /workspace/data/COCODataset/val2014/val2014.
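As a rough illustration of how the fields above map to classifier labels, the sketch below reads JSONL lines and emits one (image_id, token, label) tuple per annotated token, with 0 for grounded and 1 for hallucinated. The helper name `iter_token_labels` and the sample record are hypothetical and only mirror the field names in the table; the real pipeline may pair tokens with positions differently.

```python
import json

def iter_token_labels(lines):
    """Yield (image_id, token, label) from JSONL lines; 0 = grounded, 1 = hallucinated."""
    for line in lines:
        line = line.strip()
        if not line:
            continue  # tolerate blank lines between records
        record = json.loads(line)
        for token in record.get("grounded_tokens", []):
            yield record["image_id"], token, 0
        for token in record.get("hallucinated_tokens", []):
            yield record["image_id"], token, 1

# Hypothetical record, made up for illustration (not taken from the dataset).
sample = json.dumps({
    "image_id": "42",
    "image": "example.jpg",
    "description": "A dog on the grass catching a frisbee.",
    "grounded_tokens": ["dog", "grass"],
    "hallucinated_tokens": ["frisbee"],
})
rows = list(iter_token_labels([sample]))
```

Because an open file iterates line by line, the same function can be called as `iter_token_labels(open(path, encoding="utf-8"))` on one of the provided `.jsonl` files.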

Usage

Step 1: Feature Extraction

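Extract internal model features for each token in the dataset. Run the command once per split (train and test), changing --input_data_path and --output_data_path, so that Step 2 has both feature files: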

python feature_extract.py \
  --input_data_path ./data/LLaVA-1.5/llava_train.jsonl \
  --output_data_path ./data/LLaVA-1.5/train_features.csv \
  --image_dir /path/to/COCO/val2014 \
  --model_path /path/to/llava-v1.5-7b

For Qwen3-VL:

python feature_extract.py \
  --input_data_path ./data/Qwen3/qwen_train.jsonl \
  --output_data_path ./data/Qwen3/train_features.csv \
  --image_dir /path/to/COCO/val2014 \
  --model_path /path/to/Qwen3-VL

For Gemma3:

python feature_extract.py \
  --input_data_path ./data/gemma/gemma_train.jsonl \
  --output_data_path ./data/gemma/train_features.csv \
  --image_dir /path/to/COCO/val2014 \
  --model_path /path/to/gemma3

Arguments:

Argument	Default	Description
`--input_data_path`	`./data/LLaVA-1.5/train_migrate_2.jsonl`	Path to input JSONL file
`--output_data_path`	`./data/LLaVA-1.5/train_features.csv`	Path to save extracted features (CSV)
`--image_dir`	`/workspace/data/COCODataset/val2014/val2014`	Directory containing COCO images
`--model_path`	`/workspace/data/pretrained/llava-v1.5-7b`	Path to pre-trained VLM checkpoint

Step 2: Train & Evaluate Classifier

Train a hallucination detector on the extracted features:

python train_evaluate.py \
  --train_path ./data/LLaVA-1.5/train_features.csv \
  --test_path  ./data/LLaVA-1.5/test_features.csv \
  --model_type LR \
  --output_csv ./results/metrics.csv \
  --model_save_dir ./saved_models

Arguments:

Argument	Default	Description
`--train_path`	`./data/LLaVA-1.5/train.csv`	Path to training features CSV
`--test_path`	`./data/LLaVA-1.5/test.csv`	Path to test features CSV
`--model_type`	`LR`	Classifier type: `LR`, `GB`, or `MLP`
`--output_csv`	`./results/metrics.csv`	Where to append test metrics
`--model_save_dir`	`./saved_models`	Directory to save the best model

Supported classifiers:

LR — Logistic Regression
GB — Gradient Boosting
MLP — Multi-Layer Perceptron

All classifiers are validated on a held-out 10% split of the training data. The best model and its decision threshold are saved to the directory given by --model_save_dir.
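The validation step can be sketched without any ML library. The code below holds out 10% of the rows and then picks the probability threshold that maximises macro F1 on that split. Both the random (non-stratified) split and the macro-F1 selection criterion are assumptions made for illustration, and the function names are hypothetical; `train_evaluate.py` may do either differently.

```python
import random

def holdout_split(rows, val_fraction=0.1, seed=0):
    """Shuffle a copy of rows and split off val_fraction of them for validation."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    n_val = max(1, int(round(len(shuffled) * val_fraction)))
    return shuffled[n_val:], shuffled[:n_val]

def f1_for_class(y_true, y_pred, cls):
    """F1 of a single class, returning 0.0 when the class never appears."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == cls and p == cls)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != cls and p == cls)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == cls and p != cls)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0

def macro_f1(y_true, y_pred):
    """Unweighted mean of the grounded (0) and hallucinated (1) F1 scores."""
    return (f1_for_class(y_true, y_pred, 0) + f1_for_class(y_true, y_pred, 1)) / 2

def best_threshold(y_true, scores, candidates=None):
    """Pick the threshold on P(hallucinated) with the highest validation macro F1."""
    if candidates is None:
        candidates = [i / 100 for i in range(5, 100, 5)]
    return max(candidates,
               key=lambda t: macro_f1(y_true, [int(s >= t) for s in scores]))

# Toy validation scores (illustrative values only).
train_rows, val_rows = holdout_split(range(100))
y_val = [0, 0, 0, 1, 1]
p_val = [0.10, 0.20, 0.35, 0.40, 0.90]
threshold = best_threshold(y_val, p_val)
```

Tuning the threshold on the held-out split rather than the test set keeps the reported test metrics free of selection bias.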

Outputs

Feature CSV — one row per (image, token, position) triple, with entropy, overthinking score, attention, and top-k token columns
./results/metrics.csv — appended with test-set metrics (accuracy, macro F1, AUC, AP, per-class precision/recall/F1) and best hyperparameters for each run
./saved_models/best_<MODEL>.pkl — serialised best classifier (and threshold for GB)

Supported Models

Model	Identifier (in path)	Notes
LLaVA-1.5	`llava`	Requires `./models/LLaVA` submodule
Qwen3-VL	`qwen`	Requires `transformers` 4.57 (see `Dockerfile_new_transformer`) with `Qwen3VLForConditionalGeneration`
Gemma3	`gemma`	Requires `transformers` 4.57 (see `Dockerfile_new_transformer`) with `Gemma3ForConditionalGeneration`

The model type is inferred automatically from the model path.
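One plausible way to implement this inference is a substring match on the identifiers from the table above. The sketch below is an assumption about the behaviour, not the repository's code: the function name `infer_model_type` is hypothetical, and the case-insensitive matching is a choice made here so that paths such as `/path/to/Qwen3-VL` resolve as expected.

```python
def infer_model_type(model_path):
    """Return 'llava', 'qwen', or 'gemma' based on a substring of the path."""
    lowered = model_path.lower()
    for identifier in ("llava", "qwen", "gemma"):
        if identifier in lowered:
            return identifier
    raise ValueError(f"Cannot infer model type from path: {model_path!r}")

# The example paths from the Usage section resolve as follows.
detected = [infer_model_type(p) for p in
            ("/path/to/llava-v1.5-7b", "/path/to/Qwen3-VL", "/path/to/gemma3")]
```

With a scheme like this, a checkpoint directory whose name contains none of the identifiers would need to be renamed or symlinked before it can be used.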

Citation

If you find this work useful, please cite our paper:

@inproceedings{overthinking2026cvpr,
  title     = {Overthinking Causes Hallucination: Tracing Confounder Propagation in Vision Language Models},
  booktitle = {Findings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
  year      = {2026},
}

License

This project is released for research purposes. Please refer to the respective model licenses (LLaVA, Qwen3, Gemma3) and the COCO dataset license for usage restrictions.
