Gradient accumulation with the distributed trainer #3537
Conversation
allennlp/training/trainer.py (Outdated)

@@ -343,21 +353,20 @@ def _train_epoch(self, epoch: int) -> Dict[str, float]:
     train_generator_tqdm = train_generator

     cumulative_batch_size = 0
-    for batch in train_generator_tqdm:
+    for batches in lazy_groups_of(train_generator_tqdm, self._num_gradient_accumulation_steps):
Nit: Maybe `accumulation_groups` (or `batch_group`, or whatever) instead of `batches`? It might be confusing for a reader to see lines like `cur_batch = sum(training_util.get_batch_size(batch) for batch in batches)`.
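For context on what the loop above iterates over: `lazy_groups_of` chunks a batch iterator into fixed-size groups lazily, so each accumulation group can be processed before the next is pulled from the iterator. A minimal sketch of its behavior (this reimplementation is illustrative, not the library's actual code):

```python
from itertools import islice
from typing import Iterable, Iterator, List, TypeVar

T = TypeVar("T")

def lazy_groups_of(iterable: Iterable[T], group_size: int) -> Iterator[List[T]]:
    # Yield successive lists of up to `group_size` items without
    # materializing the whole iterable in memory.
    iterator = iter(iterable)
    while True:
        group = list(islice(iterator, group_size))
        if not group:
            return
        yield group

# Seven "batches" accumulated in groups of three; the final group is short.
print(list(lazy_groups_of(range(7), 3)))  # [[0, 1, 2], [3, 4, 5], [6]]
```

With `self._num_gradient_accumulation_steps` as the group size, the trainer sees one group per optimizer step instead of one batch per step.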
allennlp/training/trainer.py (Outdated)

@@ -323,7 +331,9 @@ def _train_epoch(self, epoch: int) -> Dict[str, float]:

     # Get tqdm for the training batches
     train_generator = self.iterator(self.train_data, num_epochs=1, shuffle=self.shuffle)
The `lazy_groups_of` call should be here to create a new generator, and `tqdm` should be applied to that. As in:

allennlp/allennlp/training/trainer.py, line 306 in 3dda5ac:

    raw_train_generator = self.iterator(self.train_data, num_epochs=1, shuffle=self.shuffle)
@brendan-ai2, I took some license with the naming, and fixed the
LGTM. Thanks!
I re-did gradient accumulation with the distributed trainer.
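The core idea being wired into the trainer, sketched framework-free: accumulating appropriately scaled gradients over several micro-batches and then taking a single step is equivalent to one step on the combined batch. All names and values below are illustrative, not the PR's actual code.

```python
def grad(w, batch):
    # d/dw of mean((w*x - y)^2) over the batch: mean(2*x*(w*x - y)).
    return sum(2 * x * (w * x - y) for x, y in batch) / len(batch)

lr = 0.1
micro_batches = [[(1.0, 2.0), (2.0, 4.0)], [(3.0, 6.0), (4.0, 8.0)]]

# Accumulate gradients over the group, scaling each micro-batch's
# contribution so the sum matches the gradient of the combined batch
# (exact here because the micro-batches are equally sized).
w = 0.0
accumulated = sum(grad(w, b) for b in micro_batches) / len(micro_batches)
w_accum = w - lr * accumulated

# Single step on the combined batch, for comparison.
full_batch = [ex for b in micro_batches for ex in b]
w_full = 0.0 - lr * grad(0.0, full_batch)

print(w_accum, w_full)  # 3.0 3.0
```

This is why the trainer can simulate a larger effective batch size: memory scales with the micro-batch, while the optimizer step behaves as if it saw the whole group at once.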