
ROME: Evaluating Pre-trained Vision-Language Models on Reasoning beyond Visual Common Sense

The dataset can be downloaded from the link below: https://drive.google.com/drive/folders/17D-zgMaJ8UP-piNfDzCTd5ZpiCmbWtB8?usp=sharing

Disclaimer

The images in the color, shape and material categories are generated with DALL·E and are subject to the Content Policy (https://labs.openai.com/policies/content-policy) and Terms (https://openai.com/api/policies/terms/) of OpenAI.

The images in the size and positional relation categories are generated with Image Creator from Microsoft Bing and are subject to the Content Policy (https://www.bing.com/images/create/contentpolicy?FORM=GEN2CP) and Terms (https://www.bing.com/new/termsofuse?FORM=GENTOS) of Microsoft Bing.

Citation

To be updated