
Why add 0 loss to the original loss? #7

Closed

ghost opened this issue Feb 25, 2021 · 2 comments

@ghost commented Feb 25, 2021

mmnas/search_vqa.py

Lines 285 to 288 in 552e29e

# avoid errors from unused params during backward
loss += 0 * sum(p.sum() for p in net.module.alpha_prob_parameters())
loss += 0 * sum(p.sum() for p in net.module.alpha_gate_parameters())
loss += 0 * sum(p.sum() for p in net.module.net_parameters())

What is this part of the code aimed at?
I'd appreciate it if anyone could explain it.

@cuiyuhao1996 (Member) commented

It is a simple trick for distributed training (DistributedDataParallel) with the NCCL backend. If you use NCCL for GPU communication in PyTorch, parameters that receive no gradient can cause errors during the backward pass. This does not affect the multi-threaded DataParallel method.
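For context, here is a minimal, self-contained sketch of the pattern (the model and names below are hypothetical, not from mmnas): multiplying a sum of all parameters by zero leaves the loss value unchanged but pulls every parameter into the autograd graph, so each one receives a (zero) gradient and NCCL's gradient all-reduce never waits on a gradient that was never produced.

import torch
import torch.nn as nn

# Hypothetical model: the second branch is never used in forward(),
# which is the situation that makes DistributedDataParallel + NCCL
# complain about parameters that received no gradient.
class TwoBranchNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.used = nn.Linear(8, 8)
        self.unused = nn.Linear(8, 8)  # gets no gradient without the trick

    def forward(self, x):
        return self.used(x)

net = TwoBranchNet()
# In real distributed code the model would be wrapped, e.g.:
# net = nn.parallel.DistributedDataParallel(net)

x = torch.randn(4, 8)
loss = net(x).sum()

# The trick from search_vqa.py: add every parameter to the loss with
# zero weight. The loss value is unchanged, but autograd now produces
# a (zero) gradient for every parameter, so nothing is "unused".
loss = loss + 0 * sum(p.sum() for p in net.parameters())
loss.backward()

assert net.unused.weight.grad is not None  # gradient exists (all zeros)

Note that DistributedDataParallel also accepts find_unused_parameters=True, which handles this case automatically at some extra runtime cost; the zero-loss trick avoids that overhead.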

@ghost (Author) commented Feb 27, 2021

@cuiyuhao1996 I got it. Thanks a lot.

@MIL-VLG closed this as completed Mar 2, 2021