
Adding MobileCaptureVQA to the benchmark #127

Open
arnaudstiegler opened this issue Jun 20, 2024 · 1 comment


arnaudstiegler commented Jun 20, 2024

Hi team,
I'd be interested to see whether we could add the MobileCaptureVQA dataset to this benchmark.

This VQA dataset focuses on mobile capture (i.e., images taken with a phone) and aims to assess models' extraction capabilities specifically in that setting.

Contrary to existing VQA benchmarks (DocVQA, ChartQA), it puts the emphasis on mobile-capture-specific noise such as bad lighting and document skew, and it covers a much wider variety of text in the wild (a receipt, a bottle of wine, food packaging, etc.). Like other VQA datasets, it is meant to be purely extractive, i.e. the answer to the question is written somewhere in the image, which allows for easy scoring.
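
To illustrate what "easy scoring" means here, evaluation can be as simple as normalized string matching against the reference answers. A minimal sketch (the normalization below is illustrative, not the dataset's official metric):

```python
def normalize(s: str) -> str:
    # Lowercase and collapse whitespace so trivial formatting
    # differences aren't penalized.
    return " ".join(s.lower().split())

def exact_match(prediction: str, references: list[str]) -> bool:
    # A prediction counts as correct if it matches any reference
    # answer after normalization.
    pred = normalize(prediction)
    return any(pred == normalize(ref) for ref in references)

# Example: both of these count as correct.
assert exact_match("  Chateau Margaux ", ["chateau margaux"])
assert exact_match("12.99", ["12.99", "$12.99"])
```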

The dataset is already available on HuggingFace: https://huggingface.co/datasets/arnaudstiegler/mobile_capture_vqa
It contains ~850 questions for ~120 unique images.
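
For anyone who wants to take a quick look, here is a minimal sketch using the HuggingFace `datasets` library; the column names below ("question", "answers") are assumptions, so check the dataset card for the actual schema and split names:

```python
from datasets import load_dataset

ds = load_dataset("arnaudstiegler/mobile_capture_vqa")
print(ds)  # shows the available splits and columns

split = next(iter(ds.values()))  # grab the first split without assuming its name
print(split[0]["question"], split[0]["answers"])  # assumed column names
```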

I'd be happy to contribute the code to add the dataset if there's any interest!

Here's one sample from the dataset (the question/answer pair is shown at the top):
[Screenshot: sample image from the dataset, 2024-06-19]

kcz358 (Contributor) commented Jun 22, 2024

Hi, feel free to contribute the dataset and benchmark into our pipeline. Once you create a PR, we will review your code and work with you toward a merge.
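
For context, tasks in lmms-eval are registered through a YAML config plus a small utils module. A hypothetical sketch for this dataset, loosely modeled on the repo's existing task YAMLs (the split name, answer column, utils helpers, and metric choice are all assumptions, not a confirmed setup):

```yaml
# Hypothetical lmms-eval task config for MobileCaptureVQA.
dataset_path: arnaudstiegler/mobile_capture_vqa
task: "mobilecapturevqa"
test_split: test                # assumed split name
output_type: generate_until
doc_to_visual: !function utils.mobilecapturevqa_doc_to_visual  # hypothetical helper
doc_to_text: !function utils.mobilecapturevqa_doc_to_text      # hypothetical helper
doc_to_target: "answers"        # assumed answer column
generation_kwargs:
  max_new_tokens: 32
metric_list:
  - metric: anls                # common choice for extractive VQA; an assumption here
    aggregation: mean
    higher_is_better: true
```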

Dannoopsy pushed a commit to Dannoopsy/lmms-eval that referenced this issue Jun 25, 2024
…MMs-Lab#127)

* Resolve conflict when merging kr_ego with internal_main_dev

* fix the bug of file overwrite

* Optimize the inference of videochatgpt dataset

* Resolve conflict

* delete repeated line

* reformat the code

* rename the file name for inference results

* group the same task together for cvrr and videochatgpt

* group the same task together for videochatgpt and cvrr

* reformat the code

* fix the bug of videochatgpt_consistency multiprocessing

* Rename the metric from submission to subtask

* fix the bug of consistency where different answers are generated in pred2

* add accuracy into the evaluation of cvrr

* add accuracy metric to cvrr dataset

* remove duplicate rows when merging from main branch

* Refactor videochatgpt_gen and videochatgpt_temporal for correct score parsing

* enable the webm video loader for llavavid as required in cvrr dataset

* Refactor process_results function to handle full_docs in videochatgpt task

* add tqdm to consistency gpt_eval

* Refactor the cvrr for correct aggregate logic

* change backend to decord for videochatgpt eval

* Fix for mkv video path

* add perceptiontest dataset test split

* doublecheck and optimize the code in egoschema

* rename metric name of perceptiontest

* add perceptiontest_validation dataset

* remove egoschema aggregate function name

* add tempcompass mc dataset

* remove redundant files

* add tempcompass yes_no, captioning, caption_matching subsets

* add all the 5 aspects as metrics

* reformat the output dict for successful match

* remove redundant aggregation function in videochatgpt and rename some function names

* remove redundant aggregation function in activitynetqa and video_detail_description

* remove redundant aggregate functions in cvrr

* remove redundant rows in perception test

* use black ./ to reformat code

* debug: load webm file is now successful

* Remove perceptiontest and perceptiontest_val default template YAML files

* put post prompt in yaml for tempcompass dataset

* align gpt eval model name for cvrr and debug the tempcompass in case match is unsuccessful

* "debug tempcompass captioning for multi-choice, and optimze matching rule in egoschema"

* "add a period to each option in egoschema becuase mplugowl always end with period"

* "add readme for egoschema"

* "change readme citation for egoschema"

* "delete redundant print, lint the repo"

* "add higher_is_better for submission metric to avoid warning"

---------

Co-authored-by: Bo Li <drluodian@gmail.com>
Co-authored-by: kcz358 <kaichenzhang358@outlook.com>