
Adding MobileCaptureVQA to the benchmark #127

Open
arnaudstiegler opened this issue Jun 20, 2024 · 1 comment


arnaudstiegler commented Jun 20, 2024

Hi team,
I'd be interested to see whether we could add the MobileCaptureVQA dataset to this benchmark.

This VQA dataset focuses on mobile capture (i.e., images taken with a phone) and aims to assess models' extraction capabilities specifically in that setting.

Contrary to existing VQA benchmarks (DocVQA, ChartQA), it puts the emphasis on mobile-capture-specific noise such as bad lighting and document skew, and it covers a much wider variety of text in the wild (a receipt, a bottle of wine, food packaging, etc.). Like other VQA datasets, it is meant to be purely extractive, i.e. the answer to the question is written somewhere in the image, which allows for easy scoring.
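
To illustrate what "easy scoring" means here, evaluation can be as simple as normalized string matching against the reference answers. A minimal sketch (the normalization below is illustrative, not the dataset's official metric):

```python
def normalize(s: str) -> str:
    # Lowercase and collapse whitespace so trivial formatting
    # differences aren't penalized.
    return " ".join(s.lower().split())

def exact_match(prediction: str, references: list[str]) -> bool:
    # A prediction counts as correct if it matches any reference
    # answer after normalization.
    pred = normalize(prediction)
    return any(pred == normalize(ref) for ref in references)

# Example: both of these count as correct.
assert exact_match("  Chateau Margaux ", ["chateau margaux"])
assert exact_match("12.99", ["12.99", "$12.99"])
```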

The dataset is already available on HuggingFace: https://huggingface.co/datasets/arnaudstiegler/mobile_capture_vqa
It contains ~850 questions for ~120 unique images.
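
For anyone who wants to take a quick look, here is a minimal sketch using the HuggingFace `datasets` library; the column names below ("question", "answers") are assumptions, so check the dataset card for the actual schema and split names:

```python
from datasets import load_dataset

ds = load_dataset("arnaudstiegler/mobile_capture_vqa")
print(ds)  # shows the available splits and columns

split = next(iter(ds.values()))  # grab the first split without assuming its name
print(split[0]["question"], split[0]["answers"])  # assumed column names
```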

I'd be happy to contribute the code to add the dataset if there's any interest!

Here's one sample from the dataset (the question/answer pair is shown at the top):
[Screenshot: sample image from the dataset, 2024-06-19]

kcz358 (Contributor) commented Jun 22, 2024

Hi, feel free to contribute the dataset and benchmark into our pipeline. Once you create a PR, we will review your code and work with you toward a merge.
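
For context, tasks in lmms-eval are registered through a YAML config plus a small utils module. A hypothetical sketch for this dataset, loosely modeled on the repo's existing task YAMLs (the split name, answer column, utils helpers, and metric choice are all assumptions, not a confirmed setup):

```yaml
# Hypothetical lmms-eval task config for MobileCaptureVQA.
dataset_path: arnaudstiegler/mobile_capture_vqa
task: "mobilecapturevqa"
test_split: test                # assumed split name
output_type: generate_until
doc_to_visual: !function utils.mobilecapturevqa_doc_to_visual  # hypothetical helper
doc_to_text: !function utils.mobilecapturevqa_doc_to_text      # hypothetical helper
doc_to_target: "answers"        # assumed answer column
generation_kwargs:
  max_new_tokens: 32
metric_list:
  - metric: anls                # common choice for extractive VQA; an assumption here
    aggregation: mean
    higher_is_better: true
```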

Dannoopsy pushed a commit to Dannoopsy/lmms-eval that referenced this issue Jun 25, 2024
…MMs-Lab#127)

* Resolve conflict when merging kr_ego with internal_main_dev

* fix the bug of file overwrite

* Optimize the inference of videochatgpt dataset

* Resolve conflict

* delete repeated line

* reformat the code

* rename the file name for inference results

* group the same task together for cvrr and videochatgpt

* group the same task together for videochatgpt and cvrr

* reformat the code

* fix the bug of videochatgpt_consistency multiprocessing

* Rename the metric from submission to subtask

* fix the bug of consistency where different answers are generated in pred2

* add accuracy into the evaluation of cvrr

* add accuracy metric to cvrr dataset

* remove duplicate rows when merging from main branch

* Refactor videochatgpt_gen and videochatgpt_temporal for correct score parsing

* enable the webm video loader for llavavid as required in cvrr dataset

* Refactor process_results function to handle full_docs in videochatgpt task

* add tqdm to consistency gpt_eval

* Refactor the cvrr for correct aggregate logic

* change backend to decord for videochatgpt eval

* Fix for mkv video path

* add perceptiontest dataset test split

* doublecheck and optimize the code in egoschema

* rename metric name of perceptiontest

* add perceptiontest_validation dataset

* remove egoschema aggregate function name

* add tempcompass mc dataset

* remove redundant files

* add tempcompass yes_no, captioning, caption_matching subsets

* add all the 5 aspects as metrics

* reformat the output dict for successful match

* remove redundant aggregation function in videochatgpt and rename some function names

* remove redundant aggregation function in activitynetqa and video_detail_description

* remove redundant aggregate functions in cvrr

* remove redundant rows in perception test

* use black ./ to reformat code

* debug: load webm file is now successful

* Remove perceptiontest and perceptiontest_val default template YAML files

* put post prompt in yaml for tempcompass dataset

* align gpt eval model name for cvrr and debug the tempcompass in case match is unsuccessful

* "debug tempcompass captioning for multi-choice, and optimze matching rule in egoschema"

* "add a period to each option in egoschema becuase mplugowl always end with period"

* "add readme for egoschema"

* "change readme citation for egoschema"

* "delete redundant print, lint the repo"

* "add higher_is_better for submission metric to avoid warning"

---------

Co-authored-by: Bo Li <drluodian@gmail.com>
Co-authored-by: kcz358 <kaichenzhang358@outlook.com>