Question about the cons weight. #10
Hi,
There are a few factors that I can think of.
For a start, the original mean teacher averages the MSE consistency loss
over the class dimension:
https://github.com/CuriousAI/mean-teacher/blob/546348ff863c998c26be4339021425df973b4a36/pytorch/mean_teacher/losses.py#L27
Note that size_average=False makes F.mse_loss compute the sum of the
squared errors, which is then divided by num_classes. Then:
https://github.com/CuriousAI/mean-teacher/blob/546348ff863c998c26be4339021425df973b4a36/pytorch/main.py#L263
in which the consistency loss is divided by the mini-batch size, so overall
they compute the mean of the MSE loss over all dimensions.
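To make the reduction concrete, here is a minimal sketch (shapes are made up
for illustration; size_average=False corresponds to reduction='sum' in
current PyTorch):

```python
import torch
import torch.nn.functional as F

# Illustrative shapes only, not taken from either repo.
batch_size, num_classes = 8, 10
student = torch.randn(batch_size, num_classes)
teacher = torch.randn(batch_size, num_classes)

# losses.py: size_average=False (reduction='sum' today) sums the squared
# errors, and the result is then divided by num_classes.
cons = F.mse_loss(student, teacher, reduction='sum') / num_classes

# main.py then divides the consistency loss by the mini-batch size;
# together that is exactly the mean over every element.
assert torch.allclose(cons / batch_size,
                      F.mse_loss(student, teacher, reduction='mean'))
```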
Given that the mean teacher paper stated (I seem to recall) that you have to
scale the consistency loss weight with the number of classes, we figured we
would sum over the class dimension and use the same consistency weight
throughout. To account for this on a 10-class dataset, you divide the loss
weight by 10.
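As a quick sanity check (numbers made up), averaging over classes with
weight w is equivalent to summing over classes with weight w / C:

```python
import torch

C = 10
sq_err = torch.randn(4, C) ** 2  # per-element squared errors, batch of 4
w = 100.0

mean_over_classes = w * sq_err.mean(dim=1).mean()      # their convention
sum_over_classes = (w / C) * sq_err.sum(dim=1).mean()  # our convention
assert torch.allclose(mean_over_classes, sum_over_classes)
```

So a weight of 100 with a class-averaged loss matches a weight of 10 with a
class-summed loss on 10 classes.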
Now take a look at:
https://github.com/CuriousAI/mean-teacher/blob/546348ff863c998c26be4339021425df973b4a36/pytorch/experiments/cifar10_test.py#L35
They draw a batch of 128 unsupervised samples for each batch of 31
supervised samples, so a ratio of roughly 4:1. We, on the other hand, use a
ratio of 1:1.
If we consider that, when training ImageNet networks with large batch sizes
(e.g. 1024 or more), people tend to scale the learning rate linearly with
the batch size, it makes sense that you would need a 4x higher consistency
loss weight when using 4x the unsupervised samples per batch. So this
accounts for a further 4x difference.
Accounting for these differences, their consistency loss weight is 'in
effect' 2.5x higher than ours. From the results of my parameter sweeps, I
don't recall this making a huge difference.
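For completeness, the arithmetic behind that figure (illustrative only):

```python
their_weight = 100.0
class_factor = 10   # they average over 10 classes where we sum
ratio_factor = 4    # their ~4:1 unlabeled:labeled batches vs our 1:1

print(their_weight / class_factor / ratio_factor)  # 2.5
```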
I hope this helps.
Kind regards
Geoff
Thank you very much for your detailed reply! When I used a large consistency loss weight, mean teacher did not work properly and instead degraded segmentation performance, so your answer is very helpful for understanding this. Best,
Glad I could help.
Thanks.
Hi. In mean teacher, the consistency weight is 100, but in this work, all consistency weights are 1. Isn't this value too small?
Can you tell me the details of setting this parameter? I have seen other work (such as CPS, CVPR 2021) that uses a consistency weight of around 100 when reproducing the mean-teacher method as well.
Looking forward to your help.
Best,