Official implementation of 'What if...?: Counterfactual Inception to Mitigate Hallucination Effects in Large Multimodal Models'.
- Summary
- Environment Setup
- Default Setting
- Project Structure
- Benchmark Folder Structure
- Generate Counterfactual Keywords with GPT-4V
- Evaluate Models on Benchmarks
- Add new prompts
- Download Datasets
This paper presents a way to make Large Multimodal Models (LMMs) more reliable against hallucination, where models generate incorrect or unrelated responses. Without any additional instruction tuning, we introduce Counterfactual Inception, a novel method that implants counterfactual thoughts into LMMs using carefully chosen, misaligned counterfactual keywords. This method is grounded in counterfactual thinking, a cognitive process in which humans consider alternative realities and outcomes. By applying this human-like reasoning mechanism to LMMs, we aim to reduce hallucination effects and improve the models' trustworthiness. We also propose the Dual-modality Verification Process (DVP), a rigorous framework for selecting optimal counterfactual keywords that trigger counterfactual thinking in LMMs while considering both visual and linguistic context. Our extensive experiments across various LMMs, including both open-source and proprietary models, show that our method significantly mitigates hallucination across different datasets.
conda create -n CFI -y python=3.9
conda activate CFI
# install pytorch
pip3 install torch torchvision torchaudio
# install dependencies
pip install -r requirements.txt
pip install -e .
Before executing the code, you must complete the YAML file below by specifying the folder paths and API keys.
# default_settings.yaml
settings:
  log_folder: <LOG FOLDER>
  counterfactual_folder: <COUNTERFACTUAL FOLDER>
  data_folder: <DATA PARENT FOLDER>
  openai_api_key: <OPENAI API KEY>
  gemini_api_key: <GEMINI API KEY>
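Before launching a run, it can help to confirm that no placeholder was left in the settings. The sketch below is not part of the repository: `find_unfilled_settings` is a hypothetical helper, and it assumes the YAML file has already been parsed into a dictionary (for example with PyYAML's `yaml.safe_load`) in which the five keys shown above are nested under `settings`.

```python
import re

# Keys taken from default_settings.yaml as shown above.
REQUIRED_KEYS = (
    "log_folder",
    "counterfactual_folder",
    "data_folder",
    "openai_api_key",
    "gemini_api_key",
)

# A value such as "<LOG FOLDER>" is treated as an unfilled placeholder.
PLACEHOLDER = re.compile(r"^<[^<>]*>$")


def find_unfilled_settings(config):
    """Return the required keys that are missing, empty, or still placeholders.

    Hypothetical helper for illustration; `config` is the parsed YAML document
    and the keys are assumed to be nested under "settings".
    """
    settings = config.get("settings") or {}
    unfilled = []
    for key in REQUIRED_KEYS:
        value = settings.get(key)
        text = "" if value is None else str(value).strip()
        if not text or PLACEHOLDER.match(text):
            unfilled.append(key)
    return unfilled
```

This treats all five keys as required, which may be stricter than a given run needs (for instance, a run that never calls Gemini). Relax `REQUIRED_KEYS` accordingly.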
The project structure is shown below. It primarily includes four directories: benchmarks, file_utils, models, and tools. The file evaluate.py runs evaluations on the benchmarks, while generate_counterfactual_keywords_gpt4v.py generates counterfactual keywords using GPT-4V.
.
├── benchmarks # 5 Evaluation Benchmarks
│ ├── llavabench.py
│ ├── llava-qa90.py
│ ├── mmhalbench.py
│ ├── mmvp.py
│ └── pope.py
├── file_utils
│ ├── counterfactual_file_manage.py
│ ├── counterfactual_utilization_prompt_manage.py
│ └── result_file_manage.py
├── models # 6 Models
│ ├── cog-vlm.py
│ ├── gemini.py
│ ├── gpt4v.py
│ ├── llava-model-hf.py
│ ├── qwen-vl.py
│ └── yi-vl.py
├── tools
│ ├── clip_similarity.py
│ ├── nli_score.py
│ └── read_yaml.py
├── prompts
│ ├── counterfactual_inception
│ └── counterfactual_keywords_generation
├── default_settings.yaml # Default settings before run
├── evaluate.py # Evaluate models on Benchmarks
├── generate_counterfactual_keywords_gpt4v.py # Generate counterfactual keywords
├── LICENSE
├── requirements.txt
└── README.md
To generate and evaluate counterfactual keywords, you must first prepare the benchmark datasets. Place the image files in the directories shown in the folder structure below.
.
├── llavabench
│ ├── 001.jpg
│ ├── 002.jpg
│ └── ...
├── llava-qa90
│ ├── 000000020650.jpg
│ ├── 000000034096.jpg
│ └── ...
├── mmhalbench
│ ├── 10172500456_1f40b6bd38_o.jpg
│ ├── 11715451803_24861529ab_o.jpg
│ └── ...
├── mmvp
│ └── MMVP Images
│     ├── 1.jpg
│     ├── 2.jpg
│     └── ...
└── pope
    ├── COCO_val2014_000000001171.jpg
    ├── COCO_val2014_000000003845.jpg
    └── ...
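A quick way to check that the images landed in the expected places is to count the image files per benchmark folder. The sketch below is a hypothetical helper, not part of the repository. It assumes the five benchmark folders sit directly under the data folder, and it counts `.jpeg` and `.png` files as well, although only `.jpg` appears in the layout above.

```python
from pathlib import Path

# Benchmark folder names taken from the layout above.
BENCHMARK_FOLDERS = ("llavabench", "llava-qa90", "mmhalbench", "mmvp", "pope")

# Only .jpg appears in the layout above; the other suffixes are an assumption.
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png"}


def count_benchmark_images(data_folder):
    """Map each benchmark name to its image count, or None if the folder is missing.

    Hypothetical helper; the search is recursive so that nested folders such as
    "mmvp/MMVP Images" are included.
    """
    root = Path(data_folder)
    counts = {}
    for name in BENCHMARK_FOLDERS:
        folder = root / name
        if not folder.is_dir():
            counts[name] = None
            continue
        counts[name] = sum(
            1
            for p in folder.rglob("*")
            if p.is_file() and p.suffix.lower() in IMAGE_SUFFIXES
        )
    return counts
```

A `None` entry points to a missing folder, and a `0` entry to a folder that exists but holds no images yet.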
- Run the generation code
# activate the environment
conda activate CFI
# generate the counterfactual keywords of <benchmark_name> with <model_name>
python generate_counterfactual_keywords_gpt4v.py --models <model_name> --datasets <benchmark_name>
- Select Counterfactual prompt file
<<<<<=====Counterfactual Prompt File=====>>>>>
1: short_version.txt
2: detailed_version.txt
Please select the counterfactual prompt file: 1
- Select Counterfactual file
If you choose an existing file, counterfactual keyword generation continues from where that file left off.
<<<<<=====CounterFactual file=====>>>>>
Here's the list of counterfactual files for the given model and benchmark:
1. mmhalbench_openai-gpt4_counterfactuals_0221_v1.jsonl
2. Create a new counterfactual file
3. Get counterfactual file of other model
===========================================================================
Enter the number of your selection(1-3): 1
- Run the evaluation code
# activate the environment
conda activate CFI
# evaluate <model_name> on <benchmark_name>
python evaluate.py --models <model_name> --datasets <benchmark_name>
# evaluate <model_name> on <benchmark_name> with counterfactual inception
python evaluate.py --models <model_name> --datasets <benchmark_name> --counterfactual
- Select Counterfactual keyword file
The list shows only the counterfactual keyword files generated by the model being evaluated.
To use counterfactual keywords created by a different model, choose option '2. Get counterfactual file of other model'.
<<<<<=====CounterFactual file=====>>>>>
Here's the list of counterfactual files for the given model and benchmark:
1. mmhalbench_gpt4v_counterfactuals_0221_v1.jsonl
2. Get counterfactual file of other model
===========================================================================
Enter the number of your selection(1-2): 1
- Select Result file to record the evaluation results
If you choose an existing file, recording continues from the last record in that file.
<<<<<=====Result file=====>>>>>
Here's the list of result files for the given model and benchmark:
1. New result file
2. mmhalbench_gpt4v_results_0314_v1_cf.jsonl
3. mmhalbench_gpt4v_results_0314_v2_cf.jsonl
Enter the number of your selection(1-3): 1
- Select Counterfactual prompt file
<<<<<=====Counterfactual Prompt File=====>>>>>
Here's the list of counterfactual prompt files:
1. long_prompt.txt
2. short_prompt.txt
Enter the number of your selection(1-2): 1
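Continuing in an existing result file, as described in the result-file step above, amounts to skipping the samples that already have a record. The following is a minimal sketch of that idea, assuming one JSON object per line and records written in benchmark order. `count_records` and `remaining_samples` are hypothetical names, and the repository's own logic in file_utils/result_file_manage.py may work differently.

```python
import json
from pathlib import Path


def count_records(jsonl_path):
    """Count the JSON records in a .jsonl file (0 if the file does not exist)."""
    path = Path(jsonl_path)
    if not path.exists():
        return 0
    count = 0
    with path.open("r", encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if not line:
                continue  # ignore blank lines
            json.loads(line)  # raises ValueError on a corrupted record
            count += 1
    return count


def remaining_samples(samples, jsonl_path):
    """Return the samples still to be processed, assuming in-order writes."""
    return samples[count_records(jsonl_path):]
```

Counting lines only works if results are appended in the same order as the benchmark samples. If that assumption does not hold, matching on a per-sample identifier would be the safer choice.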
You can add prompts for both Counterfactual Inception and counterfactual keyword generation.
For Counterfactual Inception, add a new prompt as a txt file in 'prompts/counterfactual_inception'. As in the example below, the prompt must include placeholders for the counterfactual keywords and the task prompt, written as {counterfactual_keyword} and {text_prompt}, respectively.
# prompts/counterfactual_inception/short_prompt.txt
Please use counterfactual keywords that are different from the facts as a guide to understand the image well. Then, answer the questions.
Counterfactual keywords: {counterfactual_keyword}.
Question: {text_prompt}
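To illustrate how the two placeholders in the template above get filled, here is a minimal sketch using Python's built-in `str.format`. `fill_inception_prompt` is a hypothetical helper, joining the keywords with ", " is an assumption, and the keywords and question are made-up examples. The repository may substitute the placeholders differently.

```python
REQUIRED_PLACEHOLDERS = ("{counterfactual_keyword}", "{text_prompt}")


def fill_inception_prompt(template, counterfactual_keywords, text_prompt):
    """Insert keywords and the task prompt into a Counterfactual Inception template.

    Hypothetical helper; the keywords are joined with ", " (an assumption).
    """
    missing = [p for p in REQUIRED_PLACEHOLDERS if p not in template]
    if missing:
        raise ValueError(f"template is missing placeholders: {missing}")
    return template.format(
        counterfactual_keyword=", ".join(counterfactual_keywords),
        text_prompt=text_prompt,
    )


# Template text copied from prompts/counterfactual_inception/short_prompt.txt above.
template = (
    "Please use counterfactual keywords that are different from the facts "
    "as a guide to understand the image well. Then, answer the questions.\n"
    "Counterfactual keywords: {counterfactual_keyword}.\n"
    "Question: {text_prompt}"
)

# Made-up keywords and question, for illustration only.
prompt = fill_inception_prompt(
    template, ["red umbrella", "two dogs"], "What is the person holding?"
)
print(prompt)
```

Because `str.format` interprets every brace pair, a new prompt file that contains literal braces would need them doubled (`{{` and `}}`) under this approach.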
For counterfactual keyword generation, add a new prompt as a txt file in 'prompts/counterfactual_keywords_generation'.