Question about the cons weight. #10
Hi,
There are a few factors that I can think of.
For a start, the original mean teacher averages the MSE consistency loss
over the class dimension:
https://github.com/CuriousAI/mean-teacher/blob/546348ff863c998c26be4339021425df973b4a36/pytorch/mean_teacher/losses.py#L27
Note that size_average=False makes F.mse_loss compute the sum of the
squared errors, which is then divided by num_classes. Then:
https://github.com/CuriousAI/mean-teacher/blob/546348ff863c998c26be4339021425df973b4a36/pytorch/main.py#L263
in which the consistency loss is divided by the mini-batch size, so overall
they compute the mean of the MSE loss over all dimensions.
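To make the reduction concrete, here is a minimal sketch (shapes are made up
for illustration; size_average=False corresponds to reduction='sum' in
current PyTorch):

```python
import torch
import torch.nn.functional as F

# Illustrative shapes only, not taken from either repo.
batch_size, num_classes = 8, 10
student = torch.randn(batch_size, num_classes)
teacher = torch.randn(batch_size, num_classes)

# losses.py: size_average=False (reduction='sum' today) sums the squared
# errors, and the result is then divided by num_classes.
cons = F.mse_loss(student, teacher, reduction='sum') / num_classes

# main.py then divides the consistency loss by the mini-batch size;
# together that is exactly the mean over every element.
assert torch.allclose(cons / batch_size,
                      F.mse_loss(student, teacher, reduction='mean'))
```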
Given that the mean teacher paper stated (I seem to recall) that you have to
scale the consistency loss weight with the number of classes, we figured we
would sum over the class dimension and use the same consistency weight
throughout. To account for this on a 10-class dataset, you divide the loss
weight by 10.
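As a quick sanity check (numbers made up), averaging over classes with
weight w is equivalent to summing over classes with weight w / C:

```python
import torch

C = 10
sq_err = torch.randn(4, C) ** 2  # per-element squared errors, batch of 4
w = 100.0

mean_over_classes = w * sq_err.mean(dim=1).mean()      # their convention
sum_over_classes = (w / C) * sq_err.sum(dim=1).mean()  # our convention
assert torch.allclose(mean_over_classes, sum_over_classes)
```

So a weight of 100 with a class-averaged loss matches a weight of 10 with a
class-summed loss on 10 classes.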
Now take a look at:
https://github.com/CuriousAI/mean-teacher/blob/546348ff863c998c26be4339021425df973b4a36/pytorch/experiments/cifar10_test.py#L35
They draw a batch of 128 unsupervised samples for each batch of 31
supervised samples, so a ratio of roughly 4:1. We, on the other hand, use a
ratio of 1:1.
If we consider that, when training ImageNet networks with large batch sizes
(e.g. 1024 or more), people tend to scale the learning rate linearly with
the batch size, it makes sense that you would need a 4x higher consistency
loss weight when using 4x the unsupervised samples per batch. So this
accounts for a further 4x difference.
Accounting for these differences, their consistency loss weight is 'in
effect' 2.5x higher than ours. From the results of my parameter sweeps, I
don't recall this making a huge difference.
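For completeness, the arithmetic behind that figure (illustrative only):

```python
their_weight = 100.0
class_factor = 10   # they average over 10 classes where we sum
ratio_factor = 4    # their ~4:1 unlabeled:labeled batches vs our 1:1

print(their_weight / class_factor / ratio_factor)  # 2.5
```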
I hope this helps.
Kind regards
Geoff
Thank you very much for your detailed reply! When I used a large consistency loss weight, mean teacher did not work properly and instead degraded segmentation performance, so your answer is very helpful for understanding this. Best,
Glad I could help.
Thanks.
Hi. In mean teacher, the consistency weight is 100, but in this work, all consistency weights are 1. Isn't this value too small?
Can you tell me the details of setting this parameter? I have seen other work (such as CPS, CVPR 2021) that uses a consistency weight of around 100 when reproducing the mean-teacher method as well.
Looking forward to your help.
Best,