swapped_prediction computation #4
Yes, you are right, it is a small typo. I changed the code multiple times and forgot to remove the loop. Anyway, the swapped assignment is performed in the same way, since I only index the view axis inside the loop.
Yeah... it should be preds = F.log_softmax(preds / 0.1, dim=2), applying log_softmax on the last dim.
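To see why the dim matters: for a predictions tensor of shape (num_heads, batch, num_classes), normalizing on dim=2 makes each sample's class scores a valid distribution, while normalizing on dim=1 would (incorrectly) normalize across the batch. A minimal numpy sketch, with illustrative shapes that are not taken from the repository:

```python
import numpy as np

def log_softmax(x, axis):
    # numerically stable log-softmax along the given axis
    x = x - x.max(axis=axis, keepdims=True)
    return x - np.log(np.exp(x).sum(axis=axis, keepdims=True))

# hypothetical shape: (num_heads, batch, num_classes)
rng = np.random.default_rng(0)
preds = rng.normal(size=(4, 8, 20))
temperature = 0.1

# correct: normalize over the class axis (the last dim)
logp = log_softmax(preds / temperature, axis=2)
probs = np.exp(logp)
# probabilities over classes sum to 1 for every (head, sample) pair
assert np.allclose(probs.sum(axis=2), 1.0)

# the bug discussed above: normalizing on axis=1 sums to 1 across the
# batch instead of across classes
wrong = np.exp(log_softmax(preds / temperature, axis=1))
assert np.allclose(wrong.sum(axis=1), 1.0)
```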
Yes, you are right. For some reason, this still works. Let me look into it.
Ok.
Hi, I fixed it and ran CIFAR100-20. I got similar results on the test set, while performance is slightly worse on the training set. I am now doing some hyperparameter tuning and will upload the fix as soon as possible.
nice~ 👍 |
Hi, I have some good news. It seems that normalizing on the correct dimension improves performance quite significantly. I needed to tune the parameters a bit, but I just had one run hit 55% on the training set. I also went back to my logs and found that the ImageNet experiments were probably run without the bug, while all other datasets were affected. I will upload the new version as soon as I finish running experiments.
Okay.
I have just merged a pull request that fixes this bug. Closing. Thanks @DeepTecher |
UNO/main_discover.py
Line 180 in 50022c9
Thanks for your nice work~
However, I have a question: why do we compute swapped_prediction in a loop over num_heads? It seems swapped_prediction performs the same computation on every iteration.
Hoping for your reply.
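For readers following the discussion: the redundancy raised here is that a per-head loop only differs across iterations if the logits and targets are actually indexed by head inside the loop body. A minimal numpy sketch of a swapped-prediction loss over heads, with hypothetical names and shapes that are not the repository's actual code:

```python
import numpy as np

def log_softmax(x, axis=-1):
    # numerically stable log-softmax along the given axis
    x = x - x.max(axis=axis, keepdims=True)
    return x - np.log(np.exp(x).sum(axis=axis, keepdims=True))

def cross_entropy(logits, targets, temperature=0.1):
    # soft cross-entropy between target distributions and predictions,
    # with log_softmax taken over the last (class) axis
    logp = log_softmax(logits / temperature, axis=-1)
    return float(np.mean(-np.sum(targets * logp, axis=-1)))

rng = np.random.default_rng(1)
num_heads, batch, num_classes = 3, 8, 20

# two augmented views' logits per head (illustrative random data)
logits_v1 = rng.normal(size=(num_heads, batch, num_classes))
logits_v2 = rng.normal(size=(num_heads, batch, num_classes))
targets_v1 = np.exp(log_softmax(logits_v1, axis=-1))
targets_v2 = np.exp(log_softmax(logits_v2, axis=-1))

# indexing by [h] inside the loop gives each head its own swapped loss;
# without the [h] index, every iteration would compute the same value,
# which is exactly the redundancy pointed out in this issue
losses = []
for h in range(num_heads):
    loss = (cross_entropy(logits_v2[h], targets_v1[h])
            + cross_entropy(logits_v1[h], targets_v2[h])) / 2
    losses.append(loss)

total = sum(losses) / num_heads
```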