Hey, I believe there's a bug when evaluating hard-negative augmented training. Your code uses open_clip, which in turn supports both the original ViT-B/32 architecture, which uses QuickGELU (they name it `ViT-B-32-quickgelu`), and their "standard" one (`ViT-B-32`).
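For context, as far as I can tell the only difference between the two configs is the MLP activation: OpenAI's original CLIP uses a fast sigmoid-based approximation of GELU. A minimal PyTorch sketch of the two activations (matching the `QuickGELU` module in the CLIP/open_clip sources):

```python
import torch

class QuickGELU(torch.nn.Module):
    """OpenAI CLIP's activation: a sigmoid-based approximation of GELU."""
    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return x * torch.sigmoid(1.702 * x)

# open_clip's "standard" ViT-B-32 config uses the exact GELU instead:
gelu = torch.nn.GELU()
```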
When you use their code, you specify the `model` (config) and the `pretrained` checkpoint, where the `pretrained` checkpoint is either a name supported for the given model or a path. They support the "openai" checkpoint for both `ViT-B-32` and `ViT-B-32-quickgelu` (and similarly for `RN50`) because they hardcode this pretrained checkpoint name to switch to a QuickGELU implementation, regardless of which of the two model names was used.

The problem is that `ViT-B-32` also seems to have been used for evaluation (by specifying a path to `pretrained` instead of "openai"). However, this makes the model use GELU instead of QuickGELU, because the hardcoded `if` path won't be triggered, and this affects the results. This is error-prone behavior on open_clip's part, in my humble opinion. The fix would be to use `ViT-B-32-quickgelu` in the evaluation, or to pass the flag `--force-quick-gelu`, as in the sketch below.
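A minimal sketch of the loading paths via the open_clip Python API (the checkpoint path is a placeholder for one of the shared checkpoints):

```python
import open_clip

# With the "openai" pretrained tag, open_clip hardcodes a switch to QuickGELU,
# so both model names end up with the correct activation:
model, _, preprocess = open_clip.create_model_and_transforms(
    "ViT-B-32", pretrained="openai"
)

# With a local checkpoint path, that hardcoded branch is not taken, so
# "ViT-B-32" silently builds the GELU variant -- this is the bug:
model, _, preprocess = open_clip.create_model_and_transforms(
    "ViT-B-32", pretrained="path/to/checkpoint.pt"
)

# Fix 1: use the QuickGELU model name explicitly:
model, _, preprocess = open_clip.create_model_and_transforms(
    "ViT-B-32-quickgelu", pretrained="path/to/checkpoint.pt"
)

# Fix 2: keep "ViT-B-32" but force the activation (the Python-API counterpart
# of the --force-quick-gelu CLI flag):
model, _, preprocess = open_clip.create_model_and_transforms(
    "ViT-B-32", pretrained="path/to/checkpoint.pt", force_quick_gelu=True
)
```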
How do I know that you ran it this way for evaluation (i.e., that you ran into this bug)? Because when I use GELU, I can reproduce your numbers from Table 6, but when I use QuickGELU, I get different numbers. I'm reproducing the numbers using a fork of open_clip and running my own evaluation of SugarCrepe with the checkpoints you shared.
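For reference, a stripped-down sketch of the kind of per-example check I'm running; this is not your or SugarCrepe's actual harness, and the file name, captions, and checkpoint path are made-up placeholders. SugarCrepe counts an example as correct when the positive caption outscores the hard negative for the same image:

```python
import torch
import open_clip
from PIL import Image

# Load with the QuickGELU config so the activation matches training.
model, _, preprocess = open_clip.create_model_and_transforms(
    "ViT-B-32-quickgelu", pretrained="path/to/checkpoint.pt"
)
tokenizer = open_clip.get_tokenizer("ViT-B-32-quickgelu")
model.eval()

image = preprocess(Image.open("example.jpg")).unsqueeze(0)
# First caption is the positive, second is the hard negative.
texts = tokenizer(["a cat on a mat", "a mat on a cat"])

with torch.no_grad():
    img = model.encode_image(image)
    txt = model.encode_text(texts)
    img = img / img.norm(dim=-1, keepdim=True)
    txt = txt / txt.norm(dim=-1, keepdim=True)
    sims = (img @ txt.T).squeeze(0)  # cosine similarity per caption

correct = bool(sims[0] > sims[1])  # positive must beat the hard negative
```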
Next, I compare the results I obtained with the ones you reported for two checkpoints:
Numbers for NegCLIP FT:
[results tables: `ViT-B-32` vs. `ViT-B-32-quickgelu`]
Numbers for ViT-B-32 fine-tuned with Replace:
[results tables: `ViT-B-32` vs. `ViT-B-32-quickgelu`]
BTW, the original NegCLIP paper also seems to have had this issue.
The numbers improve considerably for other benchmarks, such as ImageNet (I have also tried others). For example, see the numbers for Replace:
[results tables: `ViT-B-32` vs. `ViT-B-32-quickgelu`]
As we can see, the numbers are much closer to the original OpenAI pre-trained CLIP numbers once this bug is fixed.