
Proper way to log things when using DDP #6501


Hi all,
Sorry we did not get back to you sooner. Let me try to answer some of your questions:

  1. Is validation_epoch_end only called on rank 0?

No, it is called on every process.

  2. What does the sync_dist flag do?

Here is the essential code:
https://github.com/PyTorchLightning/pytorch-lightning/blob/a72a7992a283f2eb5183d129a8cf6466903f1dc8/pytorch_lightning/core/step_result.py#L108-L115
If sync_dist=True, it will by default call the sync_ddp function, which sums the value across all processes using torch.distributed.all_reduce:
https://github.com/PyTorchLightning/pytorch-lightning/blob/a72a7992a283f2eb5183d129a8cf6466903f1dc8/pytorch_lightning/utilities/distributed.py#L120
Use this …
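
To make the reduction described above concrete, here is a minimal sketch in plain Python that only simulates what a sum all-reduce does to a scalar logged on several processes. It does not call torch.distributed or Lightning; the function name `simulated_all_reduce_sum` and the loss values are hypothetical and chosen purely for illustration.

```python
from typing import List


def simulated_all_reduce_sum(per_rank_values: List[float]) -> List[float]:
    """Mimic a SUM all-reduce: every rank ends up holding the same total.

    This is a single-process stand-in (an assumption for illustration),
    not the real torch.distributed.all_reduce.
    """
    total = sum(per_rank_values)
    return [total for _ in per_rank_values]


# Hypothetical validation losses, one computed independently on each of 4 processes.
per_rank_loss = [0.5, 0.25, 1.0, 0.25]

# After the reduction, every rank sees the same summed value.
reduced = simulated_all_reduce_sum(per_rank_loss)

# Dividing the sum by the number of processes gives the cross-process mean.
world_size = len(per_rank_loss)
mean_loss = reduced[0] / world_size

print(reduced)    # [2.0, 2.0, 2.0, 2.0]
print(mean_loss)  # 0.5
```

Without the synchronization, each process would log only its own entry of `per_rank_loss`, so the value you see would depend on which rank wrote it.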

Answer selected by jandonov