a lot of questions about how to reproduce and cite the experimental results #16

Closed
vicmax opened this issue Oct 18, 2022 · 6 comments

vicmax commented Oct 18, 2022

Thank you for sharing the code for your paper! I have several questions about how to reproduce and cite its experimental results.

  1. My first question is how to reproduce the results in the original paper. I noticed that a new version (UNO v2) was released with higher performance, and that many hyperparameters were changed along the way. I wonder what values these hyperparameters were set to in the original paper. Here is my guess (also summarized in a sketch at the end of this comment):
  • For multi-view, only the two large crops were used to build the swapped-prediction loss; the two small crops were added in UNO v2.
  • The base learning rate was originally base_lr=0.1 as described in the paper, not base_lr=0.4 as in the current commit.
  • The batch size was originally 512 as described in the paper. (Actually, I am not sure, since it was set to 256 in earlier commits of this repo.)
  • The discovery epochs were originally max_epochs=200 for all datasets as described in the paper, rather than max_epochs=500 for CIFAR10/CIFAR100-20/CIFAR100-50 and max_epochs=60 for ImageNet as in the current commit.
  • As for data augmentations, Solarize and Equalize were only added in UNO v2 and were not used in the original paper.
  2. My second question is how to cite the experimental results. I noticed that several training tricks (i.e., doubled training epochs, two extra small crops for multi-view, and more data-augmentation transformations) were introduced in UNO v2. However, some of these tricks make the comparison with previous work unfair. For example, the representative previous work RS [1,2] only used batch_size=128 and max_epochs=200, without any of the complex augmentations used in your paper.

As shown in your update, as well as in my own experiments, some changes, such as the number of discovery epochs or the batch size, have a significant effect on the final performance. So I am really confused about how to make fair comparisons...

References:
[1] Automatically Discovering and Learning New Visual Categories with Ranking Statistics. ICLR 2020.
[2] AutoNovel: Automatically Discovering and Learning Novel Visual Categories. TPAMI 2021.
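
To make my guess for question 1 concrete, here is a side-by-side summary of the UNO v1 settings I am assuming versus the current UNO v2 defaults. The v1 values are only my reading of the paper, not confirmed:

```python
# My guess of the original (UNO v1) settings vs. the current UNO v2 defaults.
# The v1 values are unconfirmed assumptions taken from the paper.
uno_v1_guess = {
    "num_large_crops": 2,
    "num_small_crops": 0,          # small crops only added in UNO v2
    "base_lr": 0.1,
    "batch_size": 512,
    "max_epochs": 200,             # for all datasets
    "extra_augmentations": [],     # no Solarize / Equalize
}

uno_v2_current = {
    "num_large_crops": 2,
    "num_small_crops": 2,
    "base_lr": 0.4,
    "batch_size": 512,
    "max_epochs": 500,             # 60 for ImageNet
    "extra_augmentations": ["Solarize", "Equalize"],
}
```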

DonkeyShot21 (Owner) commented

Hi! Thanks for your interest in our work.

An easy solution is to check out the commit before the addition of UNO v2, run one of the commands in the README, and print the args and the transforms. Then you can reuse those hyperparameters for UNO v2, if that is what you would like to do.

However, an even easier solution would be to just report both UNO v1 and v2. For UNO v1 you can use the results in the paper and compare with RS. For UNO v2 you can use the results in the README and compare with RS+ (which uses 400 epochs). The batch size does not impact UNO too much, so I would not worry about that. The best thing, in my opinion, is to lay out the results in your paper and let the reviewers decide.

Regarding augmentations, multi-crop does not use that much more compute because the extra crops are resized to a lower resolution (roughly as in the sketch below). Moreover, it is reasonable that different methods work best with different augmentations. For instance, I found UNO to be extremely robust to different augmentations, while RS loses performance with stronger augmentations. I think it is fair to compare each method with its best set of augmentations; this is what people have been doing in the SSL literature: SimCLR, BYOL, SwAV, and DINO all use different augmentations that fit the method.
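
To be concrete, multi-crop roughly looks like the sketch below; the sizes and scales here are just illustrative (CIFAR-style), not the exact UNO transforms:

```python
# Illustrative multi-crop pipeline: two full-resolution "large" crops plus a few
# low-resolution "small" crops, so the extra views add relatively little compute.
# Sizes and scales are placeholders, not the exact UNO values.
import torchvision.transforms as T

large_crop = T.Compose([
    T.RandomResizedCrop(32, scale=(0.4, 1.0)),   # full resolution
    T.RandomHorizontalFlip(),
    T.ToTensor(),
])
small_crop = T.Compose([
    T.RandomResizedCrop(18, scale=(0.05, 0.4)),  # lower resolution -> cheaper
    T.RandomHorizontalFlip(),
    T.ToTensor(),
])

def multicrop_views(img, num_large=2, num_small=2):
    # In SwAV-style multi-crop, targets come from the large crops only;
    # the small crops are trained to match those targets.
    return [large_crop(img) for _ in range(num_large)] + \
           [small_crop(img) for _ in range(num_small)]
```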

I hope that helps.

vicmax commented Oct 18, 2022

Hi @DonkeyShot21

Thank you for your reply! It helps!

I also have some observations from running your code.

The settings of the four runs listed below are (from top to bottom):

| wandb run name | settings | task-agnostic/test/acc |
|---|---|---|
| 1-CIFAR100-UNO-discover-resnet18-80_20_bs256_smallcrop | batch_size=256, num_small_crops=2 | 0.6661 ± 0.03 |
| 2-CIFAR100-UNO-discover-resnet18-80_20_bs256_woSmallcrop | batch_size=256, num_small_crops=0 | 0.7321 ± 0.02 |
| 3-CIFAR100-UNO-discover-resnet18-80_20_bs512_woSmallcrop | batch_size=512, num_small_crops=0 | 0.7648 ± 0.009 |
| 4-CIFAR100-discover-resnet18-80_20 | batch_size=512, num_small_crops=2 | 0.7803 ± 0.006 |

All other arguments are kept the same (e.g., max_epochs=500, base_lr=0.4 (which perhaps should be set smaller for smaller batch sizes), num_large_crops=2).

We can observe that a smaller batch size leads to worse performance (and larger fluctuations)...

[Attached screenshot: wandb training curves for the four runs]

DonkeyShot21 (Owner) commented

Of course, when you modify the batch size you need to re-tune the hyperparameters, at least the learning rate (e.g., with something like the linear scaling rule sketched below)! It is not normal that your accuracy on the unlabelled set goes down.
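
As a starting point, many people use the linear scaling rule when changing the batch size; this is a common heuristic, not necessarily what we tuned with:

```python
# Linear scaling rule (Goyal et al., 2017): scale the learning rate linearly
# with the batch size. A common heuristic / starting point, not a guarantee.
def scaled_lr(base_lr: float, batch_size: int, reference_batch_size: int = 512) -> float:
    return base_lr * batch_size / reference_batch_size

# Example: if base_lr=0.4 was tuned for batch_size=512, a first guess for
# batch_size=256 would be:
print(scaled_lr(0.4, 256))  # -> 0.2
```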

vicmax commented Oct 18, 2022

Hi @DonkeyShot21

Thank you. I will do more experiments.

I am also wondering why you adjusted the learning rate from 0.1 to 0.4 in UNO v2. Does it bring benefits, or is it just that longer training can tolerate a larger initial learning rate, since the annealing schedule then spends more steps at small learning rates (see the toy sketch below)?
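
To illustrate what I mean, here is a toy calculation (purely illustrative numbers, not your exact schedule) counting how many epochs a cosine-annealed schedule spends at small learning rates:

```python
# Toy sketch: with cosine annealing, a longer schedule spends more absolute
# epochs at small learning rates, which might make a larger base LR tolerable.
# Numbers are purely illustrative, not the actual UNO schedule.
import math

def cosine_lr(base_lr: float, epoch: int, total_epochs: int) -> float:
    return 0.5 * base_lr * (1 + math.cos(math.pi * epoch / total_epochs))

for base_lr, total_epochs in [(0.1, 200), (0.4, 500)]:
    low_lr_epochs = sum(cosine_lr(base_lr, e, total_epochs) < 0.05
                        for e in range(total_epochs))
    print(f"base_lr={base_lr}, epochs={total_epochs}: "
          f"{low_lr_epochs} epochs below lr=0.05")
```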

DonkeyShot21 (Owner) commented

I think I tuned it, but I don't remember exactly. Also remember that for very small batch sizes you might need a queue (take a look at SwAV's code; a rough sketch of the idea is below).
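
Roughly, the queue idea is the following (a sketch of the concept, not SwAV's or UNO's actual implementation):

```python
# Conceptual sketch of a SwAV-style feature queue: keep recent features and
# concatenate them with the current batch before computing assignments, so the
# assignment step sees enough samples even when the batch is small.
import torch

class FeatureQueue:
    def __init__(self, size: int, dim: int):
        self.buffer = torch.zeros(size, dim)
        self.num_filled = 0

    @torch.no_grad()
    def enqueue(self, feats: torch.Tensor) -> None:
        # Drop the oldest features and append the newest batch (FIFO).
        self.buffer = torch.cat([self.buffer[feats.size(0):], feats.detach()], dim=0)
        self.num_filled = min(self.num_filled + feats.size(0), self.buffer.size(0))

    def features(self) -> torch.Tensor:
        # Only return slots that have actually been filled.
        return self.buffer[self.buffer.size(0) - self.num_filled:]

# Usage idea: compute cluster assignments on
# torch.cat([queue.features(), batch_feats]), but backpropagate the loss only
# through the current batch, then call queue.enqueue(batch_feats).
```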

vicmax commented Oct 18, 2022

Thank you very much for your detailed replies! I don't have further questions so far.

Have a nice day!

vicmax closed this as completed Oct 18, 2022