CODE: Contrasting Self-generated Description to Combat Hallucination in Large Multi-modal Models [Project][arXiv]
Official implementation of 'CODE: Contrasting Self-generated Description to Combat Hallucination in Large Multi-modal Models'.
- Abstract
- Environment Setup
- Default Setting
- Project Structure
- Download Datasets
- Evaluate Models on Benchmarks
Large Multi-modal Models (LMMs) have recently demonstrated remarkable abilities in visual context understanding and coherent response generation. However, alongside these advancements, hallucination has emerged as a significant challenge, with models producing erroneous responses that are unrelated to the visual content. In this paper, we introduce a novel contrastive-based decoding method, COuntering DEscription Contrastive Decoding (CODE), which leverages self-generated descriptions as contrasting references during the decoding phase of LMMs to address hallucination issues. CODE uses the model's own comprehensive description as a visual counterpart to correct responses and improve their alignment with the actual visual content. By dynamically adjusting the information flow and the distribution of next-token predictions over the LMM's vocabulary, CODE enhances the coherence and informativeness of generated responses. Extensive experiments demonstrate that our method significantly reduces hallucinations and improves cross-modal consistency across various benchmarks and cutting-edge LMMs. Our method provides a simple yet effective decoding strategy that can be integrated into existing LMM frameworks without additional training.
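To make the decoding idea concrete, below is a minimal, self-contained sketch of contrasting an image-conditioned prediction against one conditioned on the self-generated description. The fixed `cd_alpha` weighting and the helper name `contrastive_next_token` are illustrative assumptions; the paper's dynamic adjustment is not reproduced here.

```python
import torch
import torch.nn.functional as F

def contrastive_next_token(logits_with_image: torch.Tensor,
                           logits_with_description: torch.Tensor,
                           cd_alpha: float = 0.5) -> torch.Tensor:
    """One decoding step: contrast image-conditioned logits against logits
    conditioned on the model's own self-generated description."""
    # Boost tokens supported by the actual image and penalize tokens that are
    # just as likely given only the textual description (a common contrastive-
    # decoding weighting; CODE's dynamic adjustment is more involved).
    contrasted = (1 + cd_alpha) * logits_with_image - cd_alpha * logits_with_description
    return F.softmax(contrasted, dim=-1)

# Toy example with random logits over a 10-token vocabulary.
probs = contrastive_next_token(torch.randn(1, 10), torch.randn(1, 10), cd_alpha=0.5)
print(torch.argmax(probs, dim=-1))  # index of the greedily selected next token
```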
conda create -n code -y python=3.9
conda activate code
# install packaging, pytorch
pip3 install packaging torch torchvision torchaudio
# install dependencies
pip install -r requirements.txt
pip install -e transformers
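Optionally, you can verify the installation before moving on. The snippet below only checks that PyTorch and the bundled transformers copy import correctly; the printed version and path depend on the modified package.

```python
import torch
import transformers

# Confirm that the editable transformers install is the one being imported.
print("torch:", torch.__version__, "| CUDA available:", torch.cuda.is_available())
print("transformers:", transformers.__version__, "| loaded from:", transformers.__file__)
```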
Before running the code, complete the YAML file below by specifying the folder paths and your OpenAI API key.
# default_settings.yaml
settings:
  log_folder: <LOG FOLDER>
  data_folder: <DATA FOLDER>
  openai_api_key: <OPENAI API KEY>
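For reference, these settings can be read with a standard YAML loader. The snippet below only illustrates the expected keys; it is not the repository's own loading code.

```python
import yaml  # PyYAML

with open("default_settings.yaml") as f:
    settings = yaml.safe_load(f)["settings"]

log_folder = settings["log_folder"]          # where result/log files are written
data_folder = settings["data_folder"]        # where benchmark images are stored
openai_api_key = settings["openai_api_key"]  # OpenAI API key
```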
Here is the project structure. It primarily includes four directories: benchmarks, file_utils, huggingface_file, and models, plus a modified copy of transformers. The file evaluate.py is used to run evaluations on the benchmarks.
.
├── benchmarks # 6 Evaluation Benchmarks (+Chair)
│ ├── __init__.py
│ ├── base_eval_dataset.py
│ ├── coco-chair.py
│ ├── llavabench.py
│ ├── llava-qa90.py
│ ├── mmhalbench.py
│ ├── mmvp.py
│ ├── pope.py
│ └── realworld-qa.py
├── file_utils
│ └── result_file_manage.py
├── huggingface_file # modified huggingface code
│ └── modules
├── models # 6 Models
│ ├── __init__.py
│ ├── base_model.py
│ ├── contrastive_decoding
│ ├── emu2-chat.py
│ ├── internlm-xc2.py
│ ├── intern-vl.py
│ ├── llava-model-hf.py
│ ├── llava-next.py
│ └── yi-vl.py
├── default_settings.yaml
├── evaluate.py
├── README.md
├── requirements.txt
└── transformers # modified transformers
├── README.md
├── setup.py
└── src
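Each file under models/ and benchmarks/ builds on the shared base classes in base_model.py and base_eval_dataset.py. The sketch below only illustrates that plugin-style pattern; the class and method names are hypothetical and do not reflect the repository's actual interfaces.

```python
from abc import ABC, abstractmethod

class BaseModel(ABC):
    """Hypothetical interface for the wrappers under models/."""
    @abstractmethod
    def generate(self, image_path: str, prompt: str) -> str:
        """Return the model's response for one image/prompt pair."""

class BaseEvalDataset(ABC):
    """Hypothetical interface for the benchmarks under benchmarks/."""
    @abstractmethod
    def evaluate(self, model: BaseModel) -> dict:
        """Run the benchmark with the given model and return its metrics."""
```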
You must first prepare the benchmark datasets. Place the image files in the designated directories according to the folder structure below.
.
├── llavabench
│ ├── 001.jpg
│ ├── 002.jpg
│ └── ...
├── llava-qa90
│ ├── 000000020650.jpg
│ ├── 000000034096.jpg
│ └── ...
├── mmhalbench
│ ├── 10172500456_1f40b6bd38_o.jpg
│ ├── 11715451803_24861529ab_o.jpg
│ └── ...
├── mmvp
│ └── MMVP Images
│ ├── 1.jpg
│ ├── 2.jpg
│ └── ...
├── realworldqa
│ ├── annotations.json
│ └── images
│ ├── 0.jpg
│ ├── 1.jpg
│ └── ...
└── pope
├── COCO_val2014_000000001171.jpg
├── COCO_val2014_000000003845.jpg
└── ...
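As an optional check, the snippet below verifies that the expected benchmark folders exist. It assumes the directories above are placed directly under the data_folder from default_settings.yaml; that location is an assumption, not something stated explicitly above.

```python
import os
import yaml

with open("default_settings.yaml") as f:
    data_folder = yaml.safe_load(f)["settings"]["data_folder"]

# Top-level benchmark directories from the layout above.
expected = ["llavabench", "llava-qa90", "mmhalbench", "mmvp", "realworldqa", "pope"]
for name in expected:
    path = os.path.join(data_folder, name)
    print(("ok     " if os.path.isdir(path) else "MISSING"), path)
```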
- Run the evaluation code
# activate the environment
conda activate code
# evaluate <model_name> on <benchmark_name> with CODE decoding
python evaluate.py --models <model_name> --datasets <benchmark_name> --alt-text --contrastive --cd_alpha <cd_alpha>
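If you prefer to drive runs from Python (for example, to sweep over settings), the same command can be launched with subprocess. The model and benchmark names below come from the project tree, and the cd_alpha value is a placeholder rather than a recommended setting.

```python
import subprocess

# Illustrative run: "llava-next" and "pope" appear in the project tree;
# the cd_alpha value 0.5 is only a placeholder.
subprocess.run(
    ["python", "evaluate.py",
     "--models", "llava-next",
     "--datasets", "pope",
     "--alt-text", "--contrastive",
     "--cd_alpha", "0.5"],
    check=True,
)
```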
- Select result file
A list of result files will be displayed. Select "New result file" to start the evaluation from the beginning, or select an existing file to resume a previous evaluation.
<<<<<=====Result file=====>>>>>
Here's the list of result files for the given model and benchmark:
1. New result file
2. llavabench_emu2-chat_results_0731_v1.jsonl
Enter the number of your selection(1-2):
- Check the results
The results are printed to the console, and you can also check the log files in the log directory, which is set via log_folder in default_settings.yaml.
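Result files are stored as .jsonl (one JSON record per line, as in the example file name above). Below is a minimal way to inspect one, assuming it sits in the configured log_folder and contains flat JSON records; the exact fields depend on the benchmark.

```python
import json
import os

log_folder = "<LOG FOLDER>"  # same value as in default_settings.yaml
path = os.path.join(log_folder, "llavabench_emu2-chat_results_0731_v1.jsonl")

with open(path) as f:
    records = [json.loads(line) for line in f if line.strip()]

print(len(records), "records")
print(records[0])  # available fields depend on the benchmark
```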