
Why add 0 loss to the original loss? #7

Closed

ghost opened this issue Feb 25, 2021 · 2 comments

@ghost commented Feb 25, 2021

mmnas/search_vqa.py

Lines 285 to 288 in 552e29e

# avoid errors from unused params during backward
loss += 0 * sum(p.sum() for p in net.module.alpha_prob_parameters())
loss += 0 * sum(p.sum() for p in net.module.alpha_gate_parameters())
loss += 0 * sum(p.sum() for p in net.module.net_parameters())

What is this part of the code aimed at?
I'd appreciate it if anyone could explain it.

@cuiyuhao1996 (Member) commented

It is a simple trick for distributed training (DistributedDataParallel) with the NCCL backend. If you use NCCL for GPU communication in PyTorch, parameters that receive no gradient can cause errors during the backward pass. This does not affect the multi-threaded DataParallel method.
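For context, here is a minimal, self-contained sketch of the pattern (the model and names below are hypothetical, not from mmnas): multiplying a sum of all parameters by zero leaves the loss value unchanged but pulls every parameter into the autograd graph, so each one receives a (zero) gradient and NCCL's gradient all-reduce never waits on a gradient that was never produced.

import torch
import torch.nn as nn

# Hypothetical model: the second branch is never used in forward(),
# which is the situation that makes DistributedDataParallel + NCCL
# complain about parameters that received no gradient.
class TwoBranchNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.used = nn.Linear(8, 8)
        self.unused = nn.Linear(8, 8)  # gets no gradient without the trick

    def forward(self, x):
        return self.used(x)

net = TwoBranchNet()
# In real distributed code the model would be wrapped, e.g.:
# net = nn.parallel.DistributedDataParallel(net)

x = torch.randn(4, 8)
loss = net(x).sum()

# The trick from search_vqa.py: add every parameter to the loss with
# zero weight. The loss value is unchanged, but autograd now produces
# a (zero) gradient for every parameter, so nothing is "unused".
loss = loss + 0 * sum(p.sum() for p in net.parameters())
loss.backward()

assert net.unused.weight.grad is not None  # gradient exists (all zeros)

Note that DistributedDataParallel also accepts find_unused_parameters=True, which handles this case automatically at some extra runtime cost; the zero-loss trick avoids that overhead.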

@ghost (Author) commented Feb 27, 2021

@cuiyuhao1996 I got it. Thanks a lot.

@MIL-VLG closed this as completed Mar 2, 2021