This is the official PyTorch implementation for our ICCV 2025 paper:
Why LVLMs Are More Prone to Hallucinations in Longer Responses: The Role of Context
> Ge Zheng<sup>1,2</sup>* Jiaye Qian<sup>2</sup>* Jiajin Tang<sup>2</sup> Sibei Yang<sup>1†</sup>
> <sup>1</sup>School of Computer Science and Engineering, Sun Yat-sen University&nbsp;&nbsp;<sup>2</sup>ShanghaiTech University
Large Vision-Language Models (LVLMs) have made significant progress in recent years but are also prone to hallucination issues. They exhibit more hallucinations in longer, free-form responses, often attributed to accumulated uncertainties. In this paper, we ask: Does increased hallucination result solely from length-induced errors, or is there a deeper underlying mechanism? After a series of preliminary experiments and findings, we suggest that the risk of hallucinations is not caused by length itself but by the increased reliance on context for coherence and completeness in longer responses. Building on these insights, we propose a novel “induce-detect-suppress” framework that actively induces hallucinations through deliberately designed contexts, leverages induced instances for early detection of high-risk cases, and ultimately suppresses potential object-level hallucinations during actual decoding. Our approach achieves consistent, significant improvements across all benchmarks, demonstrating its efficacy. The strong detection and improved hallucination mitigation not only validate our framework but, more importantly, re-validate our hypothesis on context. Rather than solely pursuing performance gains, this study aims to provide new insights and serves as a first step toward a deeper exploration of hallucinations in LVLMs’ longer responses.
Our code requires Python ≥ 3.9. When evaluating different models, we use specific versions of the transformers library for each model family. Due to API changes across different versions of transformers, using other versions may result in errors. Our code includes version assertions in certain modules to prevent unexpected behaviors. The versions are listed below:
| Model | transformers Version |
|---|---|
| LLaVA 1.5 | 4.37.2 |
| Qwen VL | 4.32.0 |
| MiniGPT-4 | 4.30.0 |
| Qwen2 VL | 4.45.0 |
| Janus Pro | 4.48.3 |
We recommend following the official installation instructions provided on each model's GitHub repository for setting up their dependencies.
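For example, one way to keep these per-model requirements from conflicting is to use a separate environment per model family and pin the matching transformers release from the table above. This is only a minimal sketch; the environment name and Python version below are illustrative, and the remaining dependencies should still be installed from each model's official instructions:

```bash
# Illustrative: a dedicated environment for LLaVA 1.5 (name and Python version are placeholders)
conda create -n haltrapper-llava python=3.10 -y
conda activate haltrapper-llava

# Pin the transformers release listed in the table above for this model family
pip install "transformers==4.37.2"
```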
Additionally, to evaluate CHAIR and AMBER, install the following:
pip install spacy nltk "numpy<2"
python -m spacy download en_core_web_lg

You need to specify the paths in playground/path_table.py, replacing the path/to/xxx placeholders with your actual paths.
To evaluate CHAIR and AMBER, you must download the COCO and AMBER datasets. Links are provided below:
For the COCO dataset, please specify the path to the val2014/ folder, which contains the image files directly. For the AMBER dataset, please use the path to the data/ folder from the repository above. We assume that the AMBER images are located under the data/image/ folder in the AMBER root.
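As a quick sanity check (the paths below are placeholders for whatever you put into playground/path_table.py), the image files should be directly visible under the folders you configure:

```bash
# COCO: val2014/ should contain the image files directly
ls path/to/coco/val2014 | head -n 3

# AMBER: images are expected under data/image/ inside the AMBER root
ls path/to/AMBER/data/image | head -n 3
```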
To evaluate MiniGPT-4, you need to specify the root path of the official MiniGPT-4 repository and set up MiniGPT-4 first. We assume that the MiniGPT-4 config file is located in the MiniGPT-4 repository under eval_configs/minigpt4_llama2_eval.yaml. Please make sure the specified configuration file path matches the model architecture you are evaluating.
To evaluate on the CHAIR benchmark using greedy decoding, run:
python decontext.py \
--model [model] \
--method [method] \
--eval chair \
--fixed True

The --fixed True flag ensures the evaluation uses a fixed set of 500 questions rather than randomly sampling 500 questions.
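For example, a fully specified CHAIR run would look like the following; the model and method names here are just one choice from the options listed further below:

```bash
# Illustrative: evaluate LLaVA 1.5 with our method on CHAIR using greedy decoding
python decontext.py \
    --model llava \
    --method haltrapper \
    --eval chair \
    --fixed True
```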
To evaluate on the AMBER benchmark using greedy decoding, run:
python decontext.py \
--model [model] \
--method [method] \
--eval amber \
--split g \
--change-prompt True

The --split g flag specifies evaluation on the generative subset only.
The available options for [model] are:
- `llava` for LLaVA v1.5 7B
- `qwenvl` for Qwen VL
- `minigpt4` for MiniGPT-4
- `qwen2vl` for Qwen2 VL
- `januspro` for Janus Pro
The available options for [method] are:
- `baseline` for the vanilla model evaluation
- `vcd` for Visual Contrastive Decoding
- `icd` for Instruction Contrastive Decoding
- `pai` for Paying More Attention to Image
- `code` for Countering Description Contrastive Decoding
- `haltrapper` for ours
You can add --sample to enable nucleus sampling, or --num_beams 5 to enable beam search.
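For example (again using llava, haltrapper, and the CHAIR setting purely as an illustration), the same run with nucleus sampling or beam search enabled would be:

```bash
# Nucleus sampling
python decontext.py --model llava --method haltrapper --eval chair --fixed True --sample

# Beam search with 5 beams
python decontext.py --model llava --method haltrapper --eval chair --fixed True --num_beams 5
```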
At the end of evaluation, the results will be printed, and the detailed model inference outputs will be automatically saved in the result/ directory as a .jsonl file. A corresponding configuration record of this inference will also be saved as a YAML file with the suffix -config.yaml. If you want to manually evaluate existing .jsonl results only, run the following commands:
For CHAIR
python -m playground.eval \
[path/to/model-outputs.jsonl] \
--eval chair \
--fixed True

For AMBER
python -m playground.eval \
[path/to/model-outputs.jsonl] \
--eval amber \
--split g \
--change-prompt True

Our approach involves two main steps: generating hallucination candidates for each image, and subsequently mitigating hallucinations. Since the candidate generation step is relatively slow and its results can be reused, our implementation automatically stores the hallucination candidates for each image processed by a model in the cache/ folder. You can manually delete these cache files to clear the cached data.
Thanks to this caching mechanism, subsequent runs on the same image are significantly faster than the initial processing.
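If you want to force the candidates to be regenerated (for example, after switching checkpoints), you can simply delete the cached files. The sketch below assumes the default cache/ location mentioned above:

```bash
# Remove all cached hallucination candidates; they will be regenerated on the next run
rm -rf cache/*
```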
Our implementation incorporates or modifies code from the following open-source repositories. We extend our sincere gratitude to the authors of these projects (listed in no particular order):
- junyangwang0410/AMBER
- IVY-LVLM/CODE
- hillzhang1999/ICD
- kinsDev/Janus-Pro
- haotian-liu/LLaVA
- Vision-CAIR/MiniGPT-4
- LALBJ/PAI
- QwenLM/Qwen-VL
- huggingface/transformers
- DAMO-NLP-SG/VCD
If you find our work useful, please cite us as:
@InProceedings{Zheng_2025_HalTrapper,
author = {Zheng, Ge and Qian, Jiaye and Tang, Jiajin and Yang, Sibei},
title = {Why LVLMs Are More Prone to Hallucinations in Longer Responses: The Role of Context},
booktitle = {Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV)},
month = {October},
year = {2025},
pages = {4101-4113}
}