Hello! According to the XTD-10 repo, the test set contains 800 images from the MSCOCO train set. During training you also use the MSCOCO train set, so it seems you have a data leak. Or maybe I am misunderstanding something.
Now that you mention it, it looks like XTD's translated captions do cover images from the train set. In my humble opinion that is a rather odd decision, at least while there is still val+test data that they have not used.
So yes, there seems to be data leakage in our evaluation.
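For anyone who wants to check the overlap themselves, here is a minimal sketch. It assumes you have already loaded the XTD-10 test image names and the training image names into plain Python lists; the function names and the toy data are hypothetical and not part of this repo. It relies on the MSCOCO 2014 naming convention, where the split is embedded in the file name (e.g. `COCO_train2014_000000000009.jpg`).

```python
from collections import Counter


def split_counts(image_names):
    """Count images per MSCOCO 2014 split, inferred from the file-name prefix."""
    counts = Counter()
    for name in image_names:
        parts = name.split("_")
        # MSCOCO 2014 names look like COCO_<split>_<12-digit id>.jpg
        if len(parts) >= 3 and parts[0] == "COCO":
            counts[parts[1]] += 1
        else:
            counts["unknown"] += 1
    return counts


def leaked_images(test_names, train_names):
    """Return the sorted image names that appear in both the test and train lists."""
    return sorted(set(test_names) & set(train_names))


# Toy data for illustration only (hypothetical, not the real XTD-10 lists).
test_names = [
    "COCO_train2014_000000000009.jpg",
    "COCO_val2014_000000000042.jpg",
    "COCO_train2014_000000000025.jpg",
]
train_names = [
    "COCO_train2014_000000000009.jpg",
    "COCO_train2014_000000000025.jpg",
    "COCO_train2014_000000000030.jpg",
]

print(split_counts(test_names))
print(leaked_images(test_names, train_names))
```

A non-empty result from `leaked_images` means the evaluation images were seen during training. `split_counts` alone is a quick sanity check when only the test list is at hand.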
We're currently working on a better evaluation system at CLIP_BENCHMARK, which we intend to extend with multilingual evaluations.
The evaluations in this repo should be updated once those are available.
How did you evaluate Table 1 in the original paper ('Cross-lingual and Multilingual CLIP')? Was the space of retrievable images the 1k images from the XTD-10 dataset? I ask because there is no intersection between the images of that dataset and the MSCOCO 2014 test set.