What if...?: Counterfactual Inception to Mitigate Hallucination Effects in Large Multimodal Models

Official implementation of 'What if...?: Counterfactual Inception to Mitigate Hallucination Effects in Large Multimodal Models'.

📄 Table of contents

Summary
Environment Setup
Default Setting
Project Structure
Benchmark Folder Structure
Generate Counterfactual Keywords with GPT-4V
Evaluate Models on Benchmarks
Add new prompts
Download Datasets

✏️ Summary

This paper presents a way to enhance the reliability of Large Multimodal Models (LMMs) against hallucination effects, where models generate incorrect or unrelated responses. Without any additional instruction tuning, we introduce Counterfactual Inception, a novel method that implants counterfactual thoughts into LMMs using carefully chosen, misaligned counterfactual keywords. This method is grounded in counterfactual thinking, a cognitive process in which humans consider alternative realities and outcomes. By applying this human-like reasoning mechanism to LMMs, we aim to reduce hallucination effects and improve the models' trustworthiness. We also propose the Dual-modality Verification Process (DVP), a rigorous framework for selecting optimal counterfactual keywords to trigger counterfactual thinking in LMMs, considering visual and linguistic context concurrently. Our extensive experiments across various LMMs, including both open-source and proprietary models, corroborate that our method significantly mitigates hallucination phenomena across different datasets.

👀 Environment Setup

conda create -n CFI -y python=3.9
conda activate CFI

# install pytorch
pip3 install torch torchvision torchaudio

# install dependencies
pip install -r requirements.txt
pip install -e .
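After installation, a quick way to confirm that the packages installed above are importable is to look them up with the standard library's `importlib.util.find_spec`, which locates a module without importing it. The list below only covers the three PyTorch packages named in the install command; the contents of requirements.txt are not reproduced here, so extending the list is left to you (an assumption of this sketch, and `check_modules` is a hypothetical helper).

```python
import importlib.util

# Top-level module names from the PyTorch install command above.
# Assumption: add the packages from requirements.txt yourself; they are not listed here.
REQUIRED_MODULES = ["torch", "torchvision", "torchaudio"]


def check_modules(names):
    """Return {module name: True if it can be found on sys.path, else False}."""
    return {name: importlib.util.find_spec(name) is not None for name in names}


if __name__ == "__main__":
    for name, found in check_modules(REQUIRED_MODULES).items():
        print(f"{name:<12} {'ok' if found else 'MISSING'}")
```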

👏 Default Setting

Before executing the code, you must complete the YAML file below by specifying the folder paths and API keys.

# default_settings.yaml
settings:
  log_folder: <LOG FOLDER>
  counterfactual_folder: <COUNTERFACTUAL FOLDER>
  data_folder: <DATA PARENT FOLDER>
  openai_api_key: <OPENAI API KEY>
  gemini_api_key: <GEMINI API KEY>
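Because every value in the template is an angle-bracket placeholder, a small check can catch fields left unfilled before a long run starts. This is a minimal sketch that works on the already-parsed settings; the helper name `find_unfilled` is hypothetical, and treating all five keys as required is an assumption, since runs that use only open-source models may not need both API keys.

```python
import re

# Matches template placeholders such as "<LOG FOLDER>" or "<OPENAI API KEY>".
PLACEHOLDER = re.compile(r"^<[^<>]+>$")

# Keys from the settings template; assumed here to all be required.
REQUIRED_KEYS = ("log_folder", "counterfactual_folder", "data_folder",
                 "openai_api_key", "gemini_api_key")


def find_unfilled(config):
    """Return the keys under `settings` that are missing, empty, or still a placeholder."""
    settings = (config or {}).get("settings") or {}
    unfilled = []
    for key in REQUIRED_KEYS:
        value = settings.get(key)
        text = "" if value is None else str(value).strip()
        if not text or PLACEHOLDER.match(text):
            unfilled.append(key)
    return unfilled
```

Load default_settings.yaml with PyYAML's `yaml.safe_load` (assuming PyYAML is installed) and pass the result to `find_unfilled`; an empty list means every field has been filled in.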

🏠 Project Structure

The project consists of five main directories: benchmarks, file_utils, models, prompts, and tools. The script evaluate.py evaluates models on the benchmarks, while generate_counterfactual_keywords_gpt4v.py generates counterfactual keywords with GPT-4V.

.
├── benchmarks                # 5 Evaluation Benchmarks
│   ├── llavabench.py
│   ├── llava-qa90.py
│   ├── mmhalbench.py
│   ├── mmvp.py
│   └── pope.py
├── file_utils
│   ├── counterfactual_file_manage.py
│   ├── counterfactual_utilization_prompt_manage.py
│   └── result_file_manage.py
├── models                    # 6 Models
│   ├── cog-vlm.py
│   ├── gemini.py
│   ├── gpt4v.py
│   ├── llava-model-hf.py
│   ├── qwen-vl.py
│   └── yi-vl.py
├── tools
│   ├── clip_similarity.py
│   ├── nli_score.py
│   └── read_yaml.py
├── prompts
│   ├── counterfactual_inception
│   └── counterfactual_keywords_generation
├── default_settings.yaml                         # Default settings before run
├── evaluate.py                                   # Evaluate models on Benchmarks
├── generate_counterfactual_keywords_gpt4v.py     # Generate counterfactual keywords
├── LICENSE
├── requirements.txt
└── README.md
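Several files in this tree have hyphenated names (for example `models/cog-vlm.py` and `benchmarks/llava-qa90.py`). These are not valid Python identifiers, so they cannot be loaded with a plain `import` statement. The README does not say how evaluate.py loads them; the sketch below shows one standard-library way to import such a file by path with `importlib.util.spec_from_file_location`. The helper name `load_module_from_path` and the hyphen-to-underscore module naming are assumptions.

```python
import importlib.util
import sys
from pathlib import Path


def load_module_from_path(path):
    """Import a Python source file whose name is not a valid identifier (e.g. 'cog-vlm.py')."""
    path = Path(path)
    # Derive a legal module name, e.g. 'llava-model-hf' -> 'llava_model_hf'.
    module_name = path.stem.replace("-", "_")
    spec = importlib.util.spec_from_file_location(module_name, str(path))
    if spec is None or spec.loader is None:
        raise ImportError(f"cannot load {path}")
    module = importlib.util.module_from_spec(spec)
    sys.modules[module_name] = module  # register before executing, as a normal import would
    spec.loader.exec_module(module)
    return module
```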

✅ Benchmark Folder Structure

To generate counterfactual keywords or evaluate models, you must first prepare the benchmark datasets. Place the image files in the directories shown in the folder structure below.

.
├── llavabench
│   ├── 001.jpg
│   ├── 002.jpg
│   └── ...
├── llava-qa90
│   ├── 000000020650.jpg
│   ├── 000000034096.jpg
│   └── ...
├── mmhalbench
│   ├── 10172500456_1f40b6bd38_o.jpg
│   ├── 11715451803_24861529ab_o.jpg
│   └── ...
├── mmvp
│   └── MMVP Images
│       ├── 1.jpg
│       ├── 2.jpg
│       └── ...
└── pope
    ├── COCO_val2014_000000001171.jpg
    ├── COCO_val2014_000000003845.jpg
    └── ...
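Before generating keywords, it can help to confirm that the images landed in the expected places. This sketch assumes the benchmark folders above sit directly under the `data_folder` set in default_settings.yaml (the key is described as the data parent folder, but the exact lookup is not documented) and that images use .jpg, .jpeg, or .png extensions; `count_images` is a hypothetical helper. Note that MMVP keeps its images one level deeper, in `mmvp/MMVP Images`.

```python
from pathlib import Path

# Image folder of each benchmark relative to the data folder, taken from the tree above.
BENCHMARK_IMAGE_DIRS = {
    "llavabench": "llavabench",
    "llava-qa90": "llava-qa90",
    "mmhalbench": "mmhalbench",
    "mmvp": "mmvp/MMVP Images",
    "pope": "pope",
}
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png"}  # assumed set of image extensions


def count_images(data_folder):
    """Return {benchmark: number of image files}, or None where the folder is missing."""
    root = Path(data_folder)
    counts = {}
    for name, rel in BENCHMARK_IMAGE_DIRS.items():
        folder = root / rel
        if not folder.is_dir():
            counts[name] = None
            continue
        counts[name] = sum(1 for p in folder.iterdir()
                           if p.is_file() and p.suffix.lower() in IMAGE_SUFFIXES)
    return counts
```

A `None` entry means the folder is missing, and `0` means it exists but holds no images.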

🔑 Generate Counterfactual Keywords with GPT-4V

Run the generation code

# activate the environment
conda activate CFI

# generate the counterfactual keywords of <benchmark_name> with <model_name>
python generate_counterfactual_keywords_gpt4v.py --models <model_name> --datasets <benchmark_name>

Select Counterfactual prompt file

<<<<<=====Counterfactual Prompt File=====>>>>>
1: short_version.txt
2: detailed_version.txt
Please select the counterfactual prompt file: 1

Select Counterfactual file

If you select an existing file, counterfactual keyword generation resumes and continues writing to that file.

<<<<<=====CounterFactual file=====>>>>>
Here's the list of counterfactual files for the given model and benchmark: 
1. mmhalbench_openai-gpt4_counterfactuals_0221_v1.jsonl
2. Create a new counterfactual file
3. Get counterfactual file of other model
===========================================================================
Enter the number of your selection(1-3): 1
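Choosing an existing file lets generation pick up where it stopped. The script's actual resume logic is not shown in this README; the sketch below illustrates one common approach for JSONL files (one JSON object per line): collect the IDs already written and skip those samples. The field name `image_id` and both helper functions are hypothetical.

```python
import json
from pathlib import Path


def load_done_ids(jsonl_path, id_key="image_id"):
    """Collect the IDs already present in an existing JSONL file.

    `id_key` is a hypothetical field name; check the real files for the actual key.
    """
    path = Path(jsonl_path)
    done = set()
    if not path.exists():
        return done
    with path.open(encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if not line:
                continue  # tolerate blank lines, e.g. a trailing newline
            record = json.loads(line)
            if id_key in record:
                done.add(record[id_key])
    return done


def remaining(samples, done_ids, id_key="image_id"):
    """Yield only the samples whose ID has not been processed yet."""
    for sample in samples:
        if sample[id_key] not in done_ids:
            yield sample
```

When writing new records, open the file in append mode (`"a"`) so the existing lines are preserved.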

🔨 Evaluate Models on Benchmarks

Run the evaluation code

# activate the environment
conda activate CFI

# evaluate <model_name> on <benchmark_name>
python evaluate.py --models <model_name> --datasets <benchmark_name>

# evaluate <model_name> on <benchmark_name> with counterfactual inception
python evaluate.py --models <model_name> --datasets <benchmark_name> --counterfactual
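To compare a model with and without Counterfactual Inception across all five benchmarks, the two documented commands can be generated in pairs. This sketch only assembles command lines from the flags shown above; it assumes the benchmark folder names are the accepted `--datasets` values and uses `gpt4v` as an example model name (taken from the result file names below), neither of which the README confirms. `evaluation_commands` is a hypothetical helper.

```python
import shlex

# Benchmark folder names; assumed (not confirmed) to be valid --datasets values.
BENCHMARKS = ["llavabench", "llava-qa90", "mmhalbench", "mmvp", "pope"]


def evaluation_commands(model_name, benchmarks=BENCHMARKS):
    """Build a baseline and a counterfactual-inception command for each benchmark."""
    commands = []
    for bench in benchmarks:
        base = ["python", "evaluate.py", "--models", model_name, "--datasets", bench]
        commands.append(base)                          # plain evaluation
        commands.append(base + ["--counterfactual"])   # with counterfactual inception
    return commands


if __name__ == "__main__":
    for cmd in evaluation_commands("gpt4v"):
        print(shlex.join(cmd))
```

Because evaluate.py asks for file selections interactively, run the printed commands one at a time in a terminal rather than in a non-interactive batch job.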

Select Counterfactual keyword file

The list displays only the counterfactual keyword files generated by the model being evaluated.

To use counterfactual keywords created by a different model, choose 'Get counterfactual file of other model' (option 2 in the example below).

<<<<<=====CounterFactual file=====>>>>>
Here's the list of counterfactual files for the given model and benchmark: 
1. mmhalbench_gpt4v_counterfactuals_0221_v1.jsonl
2. Get counterfactual file of other model
===========================================================================
Enter the number of your selection(1-2): 1
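The file names in these menus appear to follow the pattern `<benchmark>_<model>_counterfactuals_<MMDD>_v<version>.jsonl`, inferred from examples such as `mmhalbench_gpt4v_counterfactuals_0221_v1.jsonl`; the README does not specify it. Under that assumption, the sketch below separates files produced by the evaluated model from those produced by other models, mirroring the 'Get counterfactual file of other model' option. `split_counterfactual_files` is a hypothetical helper.

```python
import re

# Pattern inferred from example file names; an assumption, not a documented spec.
CF_NAME = re.compile(
    r"^(?P<benchmark>[^_]+)_(?P<model>[^_]+)_counterfactuals_"
    r"(?P<date>\d{4})_v(?P<version>\d+)\.jsonl$"
)


def split_counterfactual_files(filenames, benchmark, model):
    """Return (own, other): files for `benchmark` made by `model` vs. by other models."""
    own, other = [], []
    for name in filenames:
        m = CF_NAME.match(name)
        if not m or m["benchmark"] != benchmark:
            continue  # unrelated file or a different benchmark
        (own if m["model"] == model else other).append(name)
    return own, other
```

Matching each name segment with `[^_]+` keeps hyphenated names such as `llava-qa90` or `openai-gpt4` intact, but it would fail for a model name that itself contains an underscore.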

Select Result file to record the evaluation results

If you choose an existing file, recording continues from the last record in that file.

<<<<<=====Result file=====>>>>>
Here's the list of result files for the given model and benchmark: 
1. New result file
2. mmhalbench_gpt4v_results_0314_v1_cf.jsonl
3. mmhalbench_gpt4v_results_0314_v2_cf.jsonl
Enter the number of your selection(1-3): 1
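The example result files (`mmhalbench_gpt4v_results_0314_v1_cf.jsonl` and `mmhalbench_gpt4v_results_0314_v2_cf.jsonl`) suggest a naming scheme of benchmark, model, `results`, an MMDD date, a version number, and a `_cf` suffix for runs with counterfactual inception. That reading is an inference from two examples, not documented behaviour. The hypothetical helper below picks the next free version number for a given date and mode.

```python
import re
from datetime import date


def next_result_filename(existing, benchmark, model, counterfactual, today=None):
    """Return the next free '<benchmark>_<model>_results_<MMDD>_v<N>[_cf].jsonl' name."""
    today = today or date.today()
    stamp = today.strftime("%m%d")
    suffix = "_cf" if counterfactual else ""  # assumed meaning of the '_cf' suffix
    prefix = f"{benchmark}_{model}_results_{stamp}_v"
    pattern = re.compile(re.escape(prefix) + r"(\d+)" + re.escape(suffix) + r"\.jsonl$")
    # Collect version numbers of files with the same prefix and mode.
    versions = [int(m.group(1)) for name in existing if (m := pattern.match(name))]
    return f"{prefix}{max(versions, default=0) + 1}{suffix}.jsonl"
```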

Select Counterfactual prompt file

<<<<<=====Counterfactual Prompt File=====>>>>>
Here's the list of counterfactual prompt files: 
1. long_prompt.txt
2. short_prompt.txt
Enter the number of your selection(1-2): 1

➕ Add new prompts

You can add prompts for Counterfactual Inception and Counterfactual keyword generation.

For Counterfactual Inception, add a new prompt as a .txt file in 'prompts/counterfactual_inception'. As in the example below, the prompt must include placeholders for the counterfactual keywords and the task prompt, written as {counterfactual_keyword} and {text_prompt}, respectively.

# prompts/counterfactual_inception/short_prompt.txt
Please use counterfactual keywords that are different from the facts as a guide to understand the image well. Then, answer the questions.
Counterfactual keywords: {counterfactual_keyword}.
Question: {text_prompt}
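The two placeholders are written like Python format fields, so filling the template can be sketched with `str.format`. Whether the repository fills them this way, and how it joins multiple keywords (a comma-separated list here), are assumptions; `build_prompt` is a hypothetical helper, and the template string is copied from short_prompt.txt above.

```python
# Template text copied from prompts/counterfactual_inception/short_prompt.txt.
TEMPLATE = (
    "Please use counterfactual keywords that are different from the facts as a guide "
    "to understand the image well. Then, answer the questions.\n"
    "Counterfactual keywords: {counterfactual_keyword}.\n"
    "Question: {text_prompt}"
)


def build_prompt(template, keywords, question):
    """Fill both placeholders; joining keywords with ', ' is an assumption."""
    return template.format(counterfactual_keyword=", ".join(keywords), text_prompt=question)


if __name__ == "__main__":
    print(build_prompt(TEMPLATE, ["dog", "red umbrella"], "What animal is on the sofa?"))
```

If a new prompt needs literal curly braces, double them (`{{` and `}}`) so `str.format` does not treat them as placeholders.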

For counterfactual keyword generation, add a new prompt as a .txt file in 'prompts/counterfactual_keywords_generation'.
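The selection menus list the available prompt files by number, which suggests the scripts discover .txt files in these folders at run time, so a newly added file should appear without code changes (an assumption; the discovery code is not shown). A minimal sketch of such a menu follows. It sorts names alphabetically for a stable order, whereas the generation menu lists short_version.txt before detailed_version.txt, so the real ordering evidently differs. Both helper names are hypothetical.

```python
from pathlib import Path


def list_prompt_files(folder):
    """Return the .txt prompt files in `folder`, sorted by name for a stable menu order."""
    return sorted(p for p in Path(folder).glob("*.txt") if p.is_file())


def choose(files, selection):
    """Map a 1-based menu number (as typed by the user) to a file path."""
    index = int(selection) - 1
    if not 0 <= index < len(files):
        raise ValueError(f"selection must be between 1 and {len(files)}")
    return files[index]


if __name__ == "__main__":
    files = list_prompt_files("prompts/counterfactual_keywords_generation")
    for i, p in enumerate(files, start=1):
        print(f"{i}: {p.name}")
```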
