Will ModelCheckpoint save checkpoints in all ranks? #17388
Unanswered
pengzhenghao asked this question in DDP / multi-GPU / multi-node
This confuses me because I need to run validation on rank 0 only and report the result from rank 0 only.
If ModelCheckpoint saves checkpoints on all ranks (which seems unnecessary), it will fail on the other ranks, because the validation metric it monitors is only available on rank 0.
Is it possible to make ModelCheckpoint do its monitoring on rank 0 only? Thanks!
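To make the failure mode concrete, here is a toy, single-process simulation in plain Python. It does not use PyTorch Lightning or torch.distributed, and every name in it (`run_validation`, `monitor_step`, `broadcast_from_rank0`, the `val_loss` metric) is hypothetical. It assumes validation logs the metric on rank 0 only, so a callback that looks the metric up on every rank finds nothing on ranks 1-3. Copying rank 0's metrics to every rank first gives all ranks the same value to monitor.

```python
# Toy, single-process model of the failure described above. It does NOT use
# PyTorch Lightning; all names here are hypothetical stand-ins.
from typing import Dict, List, Optional

WORLD_SIZE = 4
MONITOR = "val_loss"  # hypothetical metric name


def run_validation(rank: int) -> Dict[str, float]:
    """Validation runs on rank 0 only, so only rank 0 logs the metric."""
    return {MONITOR: 0.123} if rank == 0 else {}


def monitor_step(rank: int, metrics: Dict[str, float]) -> Optional[str]:
    """Mimic a checkpoint callback that reads the monitored metric on every rank.

    Returns an error message instead of raising, so the simulation can
    report the outcome for every rank.
    """
    if MONITOR not in metrics:
        return f"rank {rank}: monitored metric {MONITOR!r} not found"
    return None


def broadcast_from_rank0(per_rank: List[Dict[str, float]]) -> List[Dict[str, float]]:
    """Toy broadcast: every rank receives a copy of rank 0's metrics."""
    return [dict(per_rank[0]) for _ in per_rank]


# Without synchronization: ranks 1..3 have no metric to monitor.
per_rank = [run_validation(r) for r in range(WORLD_SIZE)]
errors = [e for r, m in enumerate(per_rank) if (e := monitor_step(r, m))]
print(len(errors))  # 3

# After copying rank 0's metrics to all ranks, every rank can monitor it.
synced = broadcast_from_rank0(per_rank)
errors_after = [e for r, m in enumerate(synced) if (e := monitor_step(r, m))]
print(len(errors_after))  # 0
```

In a real multi-process job the copy step would have to be an actual collective (torch.distributed provides `broadcast_object_list`, for example). Whether that is the right fix for a given Lightning setup is an assumption this toy does not test.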