
swapped_prediction computation #4

Closed
DeepTecher opened this issue Sep 28, 2021 · 9 comments

@DeepTecher

loss_cluster.append(self.swapped_prediction(logits, targets))

Thanks for your nice work!

However, I have a question: why do we compute swapped_prediction in a loop over num_heads? Every iteration seems to perform the same computation (a rough sketch of what I mean is below).

Hoping for your reply.
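For reference, a minimal runnable sketch of the pattern I mean (the shapes, the dummy swapped_prediction, and the flip over views are placeholders, not your actual code): the loop body never uses the loop index, so every iteration appends the same loss value.

```python
import torch
import torch.nn.functional as F

num_heads, num_views, batch_size, num_classes = 4, 2, 8, 10

# hypothetical stand-ins for the real logits / targets
logits = torch.randn(num_views, batch_size, num_classes)
targets = torch.softmax(torch.randn(num_views, batch_size, num_classes), dim=-1)

def swapped_prediction(logits, targets):
    # toy swapped assignment: predictions of one view vs. targets of the other
    preds = F.log_softmax(logits / 0.1, dim=-1)
    return -(targets.flip(0) * preds).sum(dim=-1).mean()

loss_cluster = []
for _ in range(num_heads):  # the loop index is never used ...
    loss_cluster.append(swapped_prediction(logits, targets))  # ... so the same value is appended every time

print(torch.stack(loss_cluster))  # num_heads identical entries
```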

@DonkeyShot21
Owner

Yes, you are right; it is a small typo. I changed the code multiple times and forgot to remove the loop. Anyway, the swapped assignment is still performed in the same way, since I only index the view axis inside the swapped_assignment method, and the cross-entropy loss compares tensors element-wise. The losses are then averaged, so nothing should change (see the toy check below).
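To make the averaging point concrete, here is a toy check (the quadratic loss is just a placeholder for the swapped-prediction loss): averaging num_heads identical copies of a loss gives the same value and the same gradients as using the loss once.

```python
import torch

num_heads = 4
w = torch.tensor(2.0, requires_grad=True)

def toy_loss(w):
    # placeholder for the swapped-prediction loss
    return (w - 1.0) ** 2

# averaging num_heads identical copies of the loss ...
loss_avg = torch.stack([toy_loss(w) for _ in range(num_heads)]).mean()
loss_avg.backward()
grad_avg = w.grad.clone()

# ... matches computing the loss a single time, both in value and in gradient
w.grad = None
loss_once = toy_loss(w)
loss_once.backward()

assert torch.isclose(loss_avg, loss_once)
assert torch.isclose(grad_avg, w.grad)
```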

@DeepTecher
Author

Yeah, but there is still a problem in the cross_entropy_loss function.
If I understand correctly, preds has shape [num_views, batch_size, num_labeled + num_unlabeled], so the F.log_softmax should be

preds = F.log_softmax(preds / 0.1, dim=2)

i.e. applied along the last dimension. With dim=1 it normalizes over batch_size instead. Is that right? (A quick check is below.)
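Here is a small self-contained check of what I mean (all shapes are hypothetical): with preds of shape [num_views, batch_size, num_classes], dim=2 (or dim=-1) normalizes over the class logits, whereas dim=1 normalizes across the batch, so the result is no longer a distribution over classes.

```python
import torch
import torch.nn.functional as F

num_views, batch_size, num_classes = 2, 8, 70  # num_classes stands in for num_labeled + num_unlabeled
preds = torch.randn(num_views, batch_size, num_classes)

# intended: log-probabilities over the class dimension
log_p_classes = F.log_softmax(preds / 0.1, dim=-1)   # dim=-1 is dim=2 here
print(log_p_classes.exp().sum(dim=-1))               # all ones: one distribution per sample

# with dim=1 the normalization runs over the batch dimension instead
log_p_batch = F.log_softmax(preds / 0.1, dim=1)
print(log_p_batch.exp().sum(dim=-1))                 # not ones: no longer a distribution over classes
print(log_p_batch.exp().sum(dim=1))                  # these sum to one across the batch instead
```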

@DonkeyShot21
Owner

Yes, you are right. For some reason, this still works. Let me look into it.

@DeepTecher
Author

Ok. If you reach a new conclusion, please let me know. Many thanks!

@DonkeyShot21
Owner

Hi, I fixed it and ran CIFAR100-20. Results on the test set are similar, while performance on the training set is slightly worse. I am now doing some hyperparameter tuning and will upload the fix as soon as possible.

@DeepTecher
Author

nice~ 👍

@DonkeyShot21
Owner

Hi, I have some good news. It seems that normalizing on the correct dimension improves performance quite significantly. I needed to tune the parameters a bit, but I just had one run hit 55% on the training set (unlab/train/acc) and 56% on the test set (unlab/test/acc) for CIFAR100-50. I am testing if the same parameters work on the other settings.

I also went back to my logs and found that ImageNet experiments were probably run without the bug, while all other datasets were affected. I will upload the new version as soon as I finish running experiments.

@DeepTecher
Author

Okay, I can't wait for the new results!

@DonkeyShot21
Owner

I have just merged a pull request that fixes this bug. Closing. Thanks @DeepTecher
