We propose Vision-Guided Attention (VGA), a method that guides a model's visual attention through visual grounding.
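This section does not spell out the mechanism, so the snippet below is only a rough illustration of the general idea: attention over visual tokens is biased toward regions selected by a grounding model. The function name, tensor shapes, and the `alpha` scale are assumptions for the sketch, not the repository's actual implementation.

import torch
import torch.nn.functional as F


def guide_attention(attn_logits: torch.Tensor,
                    grounding_mask: torch.Tensor,
                    alpha: float = 2.0) -> torch.Tensor:
    """Illustrative re-weighting of attention over visual tokens.

    attn_logits:    (..., num_visual_tokens) pre-softmax attention scores.
    grounding_mask: (num_visual_tokens,) with 1 for tokens inside the
                    grounded region and 0 elsewhere.
    alpha:          additive bias applied to grounded tokens (assumed value).
    """
    bias = alpha * grounding_mask.to(attn_logits.dtype)
    return F.softmax(attn_logits + bias, dim=-1)


if __name__ == "__main__":
    torch.manual_seed(0)
    logits = torch.randn(1, 8, 16)   # (batch, heads, visual tokens)
    mask = torch.zeros(16)
    mask[4:8] = 1.0                  # tokens covered by a hypothetical grounded box
    guided = guide_attention(logits, mask)
    print(guided.sum(-1))            # each row still sums to 1 after softmax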
conda create -n vga -y python=3.10
conda activate vga
pip install -r requirements.txt

Note: The dependencies follow LLaVA-v1.5. For LLaVA-NeXT and Qwen2.5-VL-Instruct, you can also easily set up the environments by following the instructions in their official repositories.
All benchmarks need to be processed into structurally consistent JSON files.
Some example entries can be found in data/samples.json.
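The authoritative schema is whatever data/samples.json contains; the loader below only sketches what "structurally consistent JSON" means in practice. The field names checked here ("image", "question") are assumptions for illustration, not the repository's actual keys.

import json


def load_benchmark(path: str):
    """Load a benchmark that has been converted to the unified JSON format."""
    with open(path, "r", encoding="utf-8") as f:
        samples = json.load(f)
    for sample in samples:
        # Assumed minimal fields: an image reference and a question per entry.
        assert "image" in sample and "question" in sample
    return samples


if __name__ == "__main__":
    data = load_benchmark("data/samples.json")
    print(f"Loaded {len(data)} samples")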
We provide a shell script, scripts/all.sh, that runs the benchmarks end-to-end.