How to correctly set the T_max variable (maximum number of iterations) for the CosineAnnealingLR scheduler in DDP training #17307
Unanswered
liutianlin0121 asked this question in DDP / multi-GPU / multi-node
Replies: 0 comments
Hi there! I have a question regarding how to correctly set the `T_max` variable (maximum number of iterations) for the `CosineAnnealingLR` scheduler in DDP training.

Suppose I am using only 1 GPU and wish to anneal the learning rate after each batch. In that case, I would simply set `T_max` to `max_epochs * len(train_loader)`, where `len(train_loader)` is the number of batches in my dataset.

Now, let's consider the scenario where I am using DDP training with 2 GPUs. Since each GPU sees only half of the dataset, the number of batches processed by each GPU is effectively halved. In that case, to achieve consistent behavior, should I set `T_max` to `max_epochs * len(train_loader) / 2` in the `CosineAnnealingLR` scheduler?

Thank you!