
FairScale integration #5242

Merged: 104 commits from fairscale into main on Jul 19, 2021

Conversation

@epwalsh (Member) commented Jun 3, 2021

Still TODO ☑️ 👇

(stars indicate relative difficulty / complexity / how much I'm dreading this item)

  • ★☆☆☆☆ Make BeamSearch a lazy parameter to T5 module and model
  • ★☆☆☆☆ Add a configuration file to allennlp_models/training_configs/generation/ for fine-tuning T5 on CNN/DM
  • ★★★★☆ Implement abstraction for activation/gradient checkpointing (plus optional activation CPU offloading)
  • ★★★☆☆ Integrate activation/gradient checkpointing and CPU offloading into T5
  • ★☆☆☆☆ Get rid of the do_auto_wrap option.
  • ★★☆☆☆ Improve the internal API/handling of _MODULE_SHARDED_FLAG and _WRAPPED_MODULE_GETTER. Maybe add a function in nn.util like set_module_sharded (which would take a wrapped_module_getter function) and an is_sharded_module function, or just create a mixin base class with the needed methods (see the sketch after this list).
  • ★★★★★ Get state checkpointing working
  • ★★☆☆☆ Add AdaFactor optimizer
  • ★★☆☆☆ API clean up
  • ★☆☆☆☆ Update CHANGELOG
  • ★☆☆☆☆ Remove allennlp-models branch patch in .github/workflows/ci.yml, and do the same in the corresponding allennlp-models PR (requirements.txt and Makefile)
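
For the sharded-module item above, a minimal sketch of the mixin idea; the names ShardedModuleMixin, get_original_module, and is_sharded_module are illustrative here, not a final API:

```python
import torch.nn as nn


class ShardedModuleMixin:
    """
    Hypothetical mixin that marks a module as sharded and exposes the
    underlying wrapped module, replacing the private _MODULE_SHARDED_FLAG
    and _WRAPPED_MODULE_GETTER attributes.
    """

    def get_original_module(self) -> nn.Module:
        raise NotImplementedError


def is_sharded_module(module: nn.Module) -> bool:
    # A plain isinstance check is enough once the mixin exists.
    return isinstance(module, ShardedModuleMixin)
```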

For reviewers of this PR, I would suggest you start by looking at the new functionality provided in allennlp/nn/parallel/ and allennlp/nn/checkpoint/. Then look at allennlp/modules/transformer/t5.py to see how these features are integrated into a model. The training config in the models PR is a complete example of how to specify these options in a config.

@epwalsh epwalsh marked this pull request as ready for review June 29, 2021 22:24
@epwalsh epwalsh changed the title [WIP] FairScale integration FairScale integration Jun 29, 2021
@epwalsh epwalsh requested review from dirkgr and AkshitaB June 29, 2021 22:25
@epwalsh (Member, Author) commented Jun 29, 2021

@jacobdanovitch you might find this interesting!

@dirkgr (Member) left a comment

100 lines of code for distributed training. 1400 lines for checkpointing. I see why you gave it all those stars.

I'm not a big fan of those wrappers, or wrapper factories. They change the model init API in unintuitive ways, they are annoying to pass around, they mess with serialization. Is there some way we could do this from outside the model, cleanly? For example, what if we gave regexes that tell the trainer which modules to wrap? Or maybe an API where the model can optionally return a list of modules that it would like wrapped by the trainer during initialization? What scenarios would be broken if we had that approach?
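
A rough sketch of the regex idea in the paragraph above, under the assumption that the trainer, not the model, decides what to wrap; wrap_matching_modules and its patterns argument are made up for illustration, and wrap_fn stands in for whatever wrapping call would be used:

```python
import re
from typing import Callable, Iterable

import torch.nn as nn


def _get_submodule(root: nn.Module, dotted_name: str) -> nn.Module:
    # Walk a dotted submodule name like "encoder.layers.3" down from the root.
    module = root
    for part in dotted_name.split("."):
        module = getattr(module, part)
    return module


def wrap_matching_modules(
    model: nn.Module,
    patterns: Iterable[str],
    wrap_fn: Callable[[nn.Module], nn.Module],
) -> None:
    """Replace each named submodule matching any pattern with wrap_fn(submodule)."""
    regexes = [re.compile(p) for p in patterns]
    # Collect matches first so the module tree isn't mutated while iterating.
    matches = [
        name
        for name, _ in model.named_modules()
        if name and any(r.search(name) for r in regexes)
    ]
    for name in matches:
        parent_name, _, child_name = name.rpartition(".")
        parent = _get_submodule(model, parent_name) if parent_name else model
        setattr(parent, child_name, wrap_fn(_get_submodule(model, name)))
```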

I also seem to remember you saying that this wrapper factory approach mirrors the approach that FairScale took. Can you point me to some examples of that?

CHANGELOG.md (comment on lines 52 to 55):
- The type of the `grad_norm` parameter of `GradientDescentTrainer` is now `Union[float, bool]`,
with a default value of `False`. `False` means gradients are not rescaled and the gradient
norm is never even calculated. `True` means the gradients are still not rescaled but the gradient
norm is calculated and passed on to callbacks. A `float` value means gradients are rescaled.
Member:

I haven't seen the code yet, but I'm not too wild about this API. That means you have to know whether some other component needs the gradient norm or not. I'd rather provide a function called get_grad_norm() or something like that, which calculates it lazily.
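
A minimal sketch of what such a lazily computed helper could look like; get_grad_norm here is a hypothetical free function, not an existing trainer method:

```python
from typing import Optional

import torch


def get_grad_norm(model: torch.nn.Module) -> Optional[torch.Tensor]:
    """Compute the total L2 norm of all parameter gradients, or return None if there are none."""
    grads = [p.grad.detach() for p in model.parameters() if p.grad is not None]
    if not grads:
        return None
    return torch.norm(torch.stack([torch.norm(g) for g in grads]))
```

With this shape, callbacks that need the norm call the helper themselves, and callbacks that don't never pay for the computation.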

Resolved review threads: README.md, allennlp/models/model.py, allennlp/nn/module.py, allennlp/nn/parallel/ddp_wrapper.py, allennlp/nn/util.py, allennlp/training/checkpointer.py, allennlp/training/gradient_descent_trainer.py
return amp.GradScaler()


class DdpWrapper(Registrable):
Member:

This is not really a wrapper though. This is more like a wrapper factory. Is there any scenario where we would create this object and then wrap multiple models with it?

@epwalsh (Member, Author):

It's a terrible name. I've renamed it DdpAccelerator.
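
For a sense of the "wrapper factory" shape under discussion, a hypothetical sketch; the actual DdpAccelerator API in this PR may differ in method names and signatures:

```python
import torch.nn as nn

from allennlp.common.registrable import Registrable


class DdpAcceleratorSketch(Registrable):
    """
    Illustrative only: a registrable factory that produces wrapped modules
    rather than being a wrapper itself.
    """

    def wrap_model(self, model: nn.Module) -> nn.Module:
        """Wrap the root model for distributed training."""
        raise NotImplementedError

    def wrap_module(self, module: nn.Module) -> nn.Module:
        """Optionally wrap an individual submodule (e.g. with FairScale's FSDP)."""
        raise NotImplementedError
```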

Resolved review thread: scripts/py2md.py
@epwalsh epwalsh requested a review from dirkgr July 8, 2021 22:48
@dirkgr dirkgr linked an issue Jul 13, 2021 that may be closed by this pull request
"model", it gets specified as "ddp_wrapper" in the "distributed" part of the config, and is then
passed in to the model separately.
"model", it gets specified as "ddp_accelerator" in the "distributed" part of the config, and is then
passed in to the model automatically.
Member:

Why? Can we just pass it to the model and save ourselves one exception?
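
For reference, the documented option would sit in a training config roughly as sketched below, shown as the equivalent Python dict; the keys and type name are illustrative, not copied from the actual config in the models PR:

```python
# Hypothetical sketch of the relevant config section.
config = {
    "distributed": {
        "cuda_devices": [0, 1],
        # Declared under "distributed" rather than "model"; the trainer builds
        # it and passes it to the model automatically.
        "ddp_accelerator": {"type": "fairscale_fsdp"},
    },
    # ... "dataset_reader", "model", "trainer", etc.
}
```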

@epwalsh epwalsh merged commit ca656fc into main Jul 19, 2021
@epwalsh epwalsh deleted the fairscale branch July 19, 2021 23:39
Development

Successfully merging this pull request may close these issues:

Add option to set find_unused_parameters (if necessary?)