Evaluation bug when using GELU vs QuickGELU -- changes the results for the trained models #7

bryant1410 opened this issue Dec 23, 2023 · 0 comments


Hey, I believe there's a bug in how the hard-negative-augmented models are evaluated. Your code uses open_clip, which supports both the original ViT-B/32 architecture with QuickGELU activations (they name it ViT-B-32-quickgelu) and their "standard" variant with plain GELU (ViT-B-32).
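
For context, the two activations differ only in how GELU is computed: QuickGELU is the sigmoid approximation used in the original OpenAI CLIP code. A minimal sketch in PyTorch:

```python
import torch

def quick_gelu(x: torch.Tensor) -> torch.Tensor:
    # QuickGELU as used in the original OpenAI CLIP: a sigmoid approximation of GELU.
    return x * torch.sigmoid(1.702 * x)

# "Standard" GELU, which open_clip's plain ViT-B-32 config uses instead:
gelu = torch.nn.GELU()
```

The two functions are close but not identical, so weights trained with one give noticeably different outputs when run with the other.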

When you use their code, you specify a model (config) name and a pretrained checkpoint, where the pretrained checkpoint is either a name supported for that model or a path. They accept the "openai" checkpoint for both ViT-B-32 and ViT-B-32-quickgelu (and similarly for RN50) because this pretrained name is hardcoded to switch the model to the QuickGELU implementation, regardless of which of the two configs was requested.

The problem is that ViT-B-32 also seems to have been used for evaluation, but with a path passed as pretrained instead of "openai". In that case the hardcoded branch is never triggered, so GELU is used instead of QuickGELU, and this changes the results. This is error-prone behavior on open_clip's part, in my humble opinion. The fix is to use ViT-B-32-quickgelu for evaluation, or to pass the flag --force-quick-gelu.
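
To make the difference concrete, here is a minimal sketch of the loading paths in open_clip (the checkpoint path negclip_ft.pt is a placeholder for one of your shared checkpoints; exact keyword names may vary slightly across open_clip versions):

```python
import open_clip

# Loading a fine-tuned checkpoint from a local path with the plain "ViT-B-32"
# config builds the towers with nn.GELU: the QuickGELU override only kicks in
# when pretrained == "openai" (or when the -quickgelu config / force flag is used).
model_gelu, _, preprocess = open_clip.create_model_and_transforms(
    "ViT-B-32", pretrained="negclip_ft.pt"  # placeholder path
)

# Either of the following builds the architecture the checkpoint was actually
# trained with (QuickGELU, matching the original OpenAI weights):
model_qg, _, _ = open_clip.create_model_and_transforms(
    "ViT-B-32-quickgelu", pretrained="negclip_ft.pt"
)
# or, equivalently:
model_qg2, _, _ = open_clip.create_model_and_transforms(
    "ViT-B-32", pretrained="negclip_ft.pt", force_quick_gelu=True
)
```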

How do I know that you ran the evaluation this way (i.e., that you hit this bug)? Because when I use GELU I can reproduce your numbers from Table 6, but when I use QuickGELU I get different numbers. I reproduced them using a fork of open_clip and my own SugarCrepe evaluation of the checkpoints you shared.
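
For reference, my per-example SugarCrepe scoring is the standard image-to-text comparison. A minimal sketch, assuming an open_clip model, its tokenizer, and a preprocessed image tensor (helper names here are my own, not from your code):

```python
import torch

@torch.no_grad()
def sugarcrepe_correct(model, tokenizer, image, pos_caption, neg_caption):
    # An example counts as correct when the image is more similar to the
    # positive caption than to the hard negative.
    texts = tokenizer([pos_caption, neg_caption])
    image_features = model.encode_image(image.unsqueeze(0))
    text_features = model.encode_text(texts)
    image_features = image_features / image_features.norm(dim=-1, keepdim=True)
    text_features = text_features / text_features.norm(dim=-1, keepdim=True)
    sims = (image_features @ text_features.T).squeeze(0)
    return bool(sims[0] > sims[1])
```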

Below I compare the results I obtained with the ones you reported, for two checkpoints:

Numbers for NegCLIP FT:

| Model | Replace-obj | Replace-att | Replace-rel | Swap-obj | Swap-att | Add-obj | Add-att |
|---|---|---|---|---|---|---|---|
| Reported by you | 92.68 | 85.91 | 76.46 | 75.20 | 75.38 | 88.80 | 82.80 |
| My evaluation with ViT-B-32 | 92.62 | 85.91 | 76.81 | 75.61 | 75.08 | 88.80 | 82.95 |
| My evaluation with ViT-B-32-quickgelu | 93.83 | 88.20 | 74.54 | 75.61 | 76.88 | 89.91 | 85.12 |

Numbers for ViT-B-32 fine-tuned with Replace:

| Model | Replace-obj | Replace-att | Replace-rel | Swap-obj | Swap-att | Add-obj | Add-att |
|---|---|---|---|---|---|---|---|
| Reported by you | 93.46 | 90.36 | 81.01 | 73.98 | 75.23 | 90.93 | 87.86 |
| My evaluation using ViT-B-32 | 93.46 | 90.23 | 80.94 | 73.98 | 75.53 | 90.93 | 88.01 |
| My evaluation using ViT-B-32-quickgelu | 95.34 | 89.97 | 80.01 | 75.61 | 76.58 | 90.93 | 87.27 |

BTW, the original NegCLIP paper also seems to have had this issue.

The numbers improve considerably on other benchmarks, such as ImageNet (I have also tried others). For example, see the numbers for the Replace checkpoint:

| Model | ImageNet |
|---|---|
| My evaluation using ViT-B-32 | 52.9 |
| My evaluation using ViT-B-32-quickgelu | 59.1 |
| My evaluation for OpenAI CLIP | 63.4 |

As we can see, the numbers are much closer to those of the original OpenAI-pretrained CLIP once this bug is fixed.
