Gradient accumulation with the distributed trainer #3537
Conversation
allennlp/training/trainer.py (Outdated)

@@ -343,21 +353,20 @@ def _train_epoch(self, epoch: int) -> Dict[str, float]:
     train_generator_tqdm = train_generator

     cumulative_batch_size = 0
-    for batch in train_generator_tqdm:
+    for batches in lazy_groups_of(train_generator_tqdm, self._num_gradient_accumulation_steps):
Nit: Maybe `accumulation_groups` (or `batch_group`, or whatever) instead of `batches`? It might be confusing for a reader to see lines like `cur_batch = sum(training_util.get_batch_size(batch) for batch in batches)`.
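For context on what the loop above iterates over: `lazy_groups_of` chunks a batch iterator into fixed-size groups lazily, so each accumulation group can be processed before the next is pulled from the iterator. A minimal sketch of its behavior (this reimplementation is illustrative, not the library's actual code):

```python
from itertools import islice
from typing import Iterable, Iterator, List, TypeVar

T = TypeVar("T")

def lazy_groups_of(iterable: Iterable[T], group_size: int) -> Iterator[List[T]]:
    # Yield successive lists of up to `group_size` items without
    # materializing the whole iterable in memory.
    iterator = iter(iterable)
    while True:
        group = list(islice(iterator, group_size))
        if not group:
            return
        yield group

# Seven "batches" accumulated in groups of three; the final group is short.
print(list(lazy_groups_of(range(7), 3)))  # [[0, 1, 2], [3, 4, 5], [6]]
```

With `self._num_gradient_accumulation_steps` as the group size, the trainer sees one group per optimizer step instead of one batch per step.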
allennlp/training/trainer.py (Outdated)

@@ -323,7 +331,9 @@ def _train_epoch(self, epoch: int) -> Dict[str, float]:

     # Get tqdm for the training batches
     train_generator = self.iterator(self.train_data, num_epochs=1, shuffle=self.shuffle)
The `lazy_groups_of` call should be here to create a new generator, and `tqdm` should be applied to that. As in:

allennlp/allennlp/training/trainer.py, line 306 in 3dda5ac:

    raw_train_generator = self.iterator(self.train_data, num_epochs=1, shuffle=self.shuffle)
@brendan-ai2, I took some license with the naming, and fixed the
LGTM. Thanks!
I re-did gradient accumulation with the distributed trainer.
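The core idea being wired into the trainer, sketched framework-free: accumulating appropriately scaled gradients over several micro-batches and then taking a single step is equivalent to one step on the combined batch. All names and values below are illustrative, not the PR's actual code.

```python
def grad(w, batch):
    # d/dw of mean((w*x - y)^2) over the batch: mean(2*x*(w*x - y)).
    return sum(2 * x * (w * x - y) for x, y in batch) / len(batch)

lr = 0.1
micro_batches = [[(1.0, 2.0), (2.0, 4.0)], [(3.0, 6.0), (4.0, 8.0)]]

# Accumulate gradients over the group, scaling each micro-batch's
# contribution so the sum matches the gradient of the combined batch
# (exact here because the micro-batches are equally sized).
w = 0.0
accumulated = sum(grad(w, b) for b in micro_batches) / len(micro_batches)
w_accum = w - lr * accumulated

# Single step on the combined batch, for comparison.
full_batch = [ex for b in micro_batches for ex in b]
w_full = 0.0 - lr * grad(0.0, full_batch)

print(w_accum, w_full)  # 3.0 3.0
```

This is why the trainer can simulate a larger effective batch size: memory scales with the micro-batch, while the optimizer step behaves as if it saw the whole group at once.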