
Hello, this is very good work. May I ask which specific file HALC is in? Could the README explain this more clearly? #2

Open
qppwdd0324 opened this issue Apr 16, 2024 · 3 comments

Comments

@qppwdd0324

No description provided.

@BillChan226
Owner

Hi,

Thanks for your interest in this project!!

Sorry if the README is not written clearly enough. Here is a snippet from the README that describes how to run caption generation for CHAIR and POPE.

🪑 Running CHAIR evaluation for LVLMs object hallucination

Following Evaluating Object Hallucination in Large Vision-Language Models, we use "Please describe this image in detail." as the prompt to query the LVLM for captions of the 500 images randomly sampled from the COCO 2014 Val dataset. Under the root directory, run

python run_scripts/caption_generation.py --model [LVLM Backbone] --data_path [COCO_DIR] -d [Decoding Strategy] --num_samples 500 --seed [SEED] --gpu-id [GPU_IDs] --output_dir ./generated_captions/ --debugging 1

--debugging 1 will print the intermediate hallucination correction process of HALC.
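For example, a concrete run could look like the following; the backbone name minigpt4, the decoding strategy halc, the COCO path, seed, and GPU id are placeholder values chosen only for illustration, so please check run_scripts/caption_generation.py for the exact accepted names:

# hypothetical example values; substitute your own backbone, data path, seed, and GPU id
python run_scripts/caption_generation.py --model minigpt4 --data_path ./data/coco2014 -d halc --num_samples 500 --seed 42 --gpu-id 0 --output_dir ./generated_captions/ --debugging 1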

🤵‍♂️ Running POPE evaluation for LVLMs object hallucination

Since OPOPE evaluates directly on the caption generated for each image, it follows the same caption generation procedure as CHAIR and differs only in the subsequent metric calculation. To collect samples for the conventional POPE evaluation, under the root directory, run

python run_scripts/pope_eval.py --model [LVLM Backbone] --data_path [COCO_DIR] -d [Decoding Strategy] --pope_type [random/popular/adversarial] --num_images 100 --seed [SEED] --gpu_id [GPU_IDs] --output_dir ./generated_captions/
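For instance, with hypothetical values (the backbone name, decoding strategy, data path, seed, and GPU id below are illustrative placeholders, not the only valid options):

# hypothetical example values; substitute your own settings
python run_scripts/pope_eval.py --model minigpt4 --data_path ./data/coco2014 -d halc --pope_type random --num_images 100 --seed 42 --gpu_id 0 --output_dir ./generated_captions/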

You can also run the demo file here directly to test single-image captioning. To run this demo, put the path of the image you want to evaluate in this list, and then run

python run_scripts/demo_inference.py --model [LVLM Backbone] -d [Decoding Strategy] --seed [SEED]
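For example (again with hypothetical values for the backbone, decoding strategy, and seed):

# hypothetical example values
python run_scripts/demo_inference.py --model minigpt4 -d halc --seed 42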

We hope this helps you run HALC; we will improve the README later to make it clearer. If you have any further questions, please don't hesitate to ask :)

@qppwdd0324
Author

Thank you for your prompt reply. May I ask what these functions in halc.py represent? Their comments are the same: "The method uses a list of context windows rooted from the DINO detection one and applies the contrasting decoding method to each context window pair to get a list of contrasting logits. Then we use the..."
Uploading 捕获.JPG… (the image attachment did not upload)

@BillChan226
Owner

Sorry, the image doesn't seem to have uploaded successfully. The functions that share this comment are different contrasting methods for contrasting the various sampled FOV logits with each other. You can view this line to see how they are used. In the end, we used the context_layer_double_multi_contrastive_decoding function, as described in our paper.
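For intuition only, here is a minimal sketch of what pairwise contrastive decoding over sampled FOV logits can look like, based on the docstring quoted above. It is not the repository's actual implementation; the function name pairwise_contrast and the alpha weighting are assumptions made purely for illustration.

import torch

def pairwise_contrast(fov_logits, alpha=1.0):
    # fov_logits: list of [vocab_size] tensors, one per sampled FOV
    # (context windows rooted at the DINO-detected one).
    # For each ordered pair (i, j), amplify what window i supports
    # relative to window j: (1 + alpha) * logits_i - alpha * logits_j.
    # Illustrative sketch only, not HALC's exact formulation.
    contrasted = []
    for i, logits_i in enumerate(fov_logits):
        for j, logits_j in enumerate(fov_logits):
            if i == j:
                continue
            contrasted.append((1 + alpha) * logits_i - alpha * logits_j)
    return contrasted  # one contrasting-logits tensor per window pair

if __name__ == "__main__":
    vocab = 8
    fovs = [torch.randn(vocab) for _ in range(3)]
    print(len(pairwise_contrast(fovs)))  # 3 windows -> 6 ordered pairs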
