Remove hideInDatasets for multimodal tasks #1495

Merged
merged 2 commits into from
Jun 20, 2025
Conversation

merveenoyan
Contributor

More and more datasets are showing up for multimodal tasks, and some authors are picking the wrong task tags because hideInDatasets is true for those tasks, so this PR removes it.
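To make the effect of the flag concrete, here is a minimal sketch of how a `hideInDatasets` flag could filter the task tags offered to dataset authors. Everything in it is an assumption for illustration: the `TaskInfo` shape, the `TASKS` entries, and the helpers `datasetTaskTags` and `showMultimodalInDatasets` are hypothetical and are not taken from the project's code.

```typescript
// Hypothetical, simplified task registry. The real project's data shape may differ.
interface TaskInfo {
  name: string;
  modality: "multimodal" | "cv" | "nlp";
  // When true, the task is not offered as a tag for datasets (assumed semantics).
  hideInDatasets?: boolean;
}

// Illustrative entries only; which tasks actually carried the flag is not stated in this thread.
const TASKS: Record<string, TaskInfo> = {
  "image-text-to-text": { name: "Image-Text-to-Text", modality: "multimodal", hideInDatasets: true },
  "any-to-any": { name: "Any-to-Any", modality: "multimodal", hideInDatasets: true },
  "visual-question-answering": { name: "Visual Question Answering", modality: "multimodal" },
  "image-classification": { name: "Image Classification", modality: "cv" },
};

// Task ids a dataset author could choose from: every task not marked as hidden.
function datasetTaskTags(tasks: Record<string, TaskInfo>): string[] {
  return Object.keys(tasks).filter((id) => !tasks[id].hideInDatasets);
}

// Returns a copy of the registry in which multimodal tasks are no longer hidden.
function showMultimodalInDatasets(tasks: Record<string, TaskInfo>): Record<string, TaskInfo> {
  const result: Record<string, TaskInfo> = {};
  for (const id of Object.keys(tasks)) {
    const info = tasks[id];
    result[id] = info.modality === "multimodal" ? { ...info, hideInDatasets: false } : info;
  }
  return result;
}

const before = datasetTaskTags(TASKS);
const after = datasetTaskTags(showMultimodalInDatasets(TASKS));
console.log(before, after);
```

In this sketch, an author with an image-text-to-text dataset has no matching tag in `before` and would have to pick the closest visible one, which is the mis-tagging described above. In `after`, the matching tag is available.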

Member

@Vaibhavs10 Vaibhavs10 left a comment

Do we have any intuition of how many such cases there are?

@pcuenca
Member

pcuenca commented May 26, 2025

Yes, a few examples would be great for better understanding. I see, for example, this one that could possibly be assigned an image-text-to-text tag, but I wonder if other VQA datasets, such as the Cauldron, should have the same tag.

Member

@pcuenca pcuenca left a comment

Spoke offline with Merve and had another look at things.

I'd be supportive of merging, given that:

But please, let's wait for vb to come back and see if he has additional insight!

Member

@julien-c julien-c left a comment

no objection!

Member

@Vaibhavs10 Vaibhavs10 left a comment

Thanks for pulling the numbers! My only suggestion would be to tag a few more datasets for the following:

any-to-any (13), visual-document-retrieval (8)

at least so that we have one full page of datasets.

@pcuenca
Member

pcuenca commented Jun 20, 2025

Can we maybe merge this PR? We can always iterate later.

@merveenoyan
Contributor Author

merveenoyan commented Jun 20, 2025

Sorry, with the releases going on I couldn't work on this. @pcuenca I'm currently opening automatic PRs to a lot of models, and I think it's OK to merge this.

@merveenoyan
Contributor Author

I have opened more than 100 PRs, merging this, thanks a ton!

@merveenoyan merveenoyan merged commit 0e2b369 into main Jun 20, 2025
4 of 5 checks passed
@merveenoyan merveenoyan deleted the mm-datasets branch June 20, 2025 13:50

4 participants