FairScale integration and T5-11B fine-tuning #271
Conversation
// AMP is currently unusably slow with t5-11b, which be due to a bug bug within
// FairScale, but I'm not sure yet.
Typo
local wandb_callback = {
    "type": "wandb",
    "project": "allennlp-t5",
    "entity": "allenai-team1",
    "watch_model": false,
    "summary_interval": 1,
    "should_log_parameter_statistics": false,
    "should_log_learning_rate": false,
};
I don't know if we want this in a default config. Virtually everyone using this will have to change it.
Though I would be entirely in favor of having a commented-out wandb config in every jsonnet file.
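Something like this, say (just a sketch; the project and entity values are placeholders that each user would fill in):

// Uncomment and fill in your own project/entity to enable wandb logging:
// local wandb_callback = {
//     "type": "wandb",
//     "project": "my-project",  // placeholder: your wandb project
//     "entity": "my-team",      // placeholder: your wandb entity/team
//     "watch_model": false,
//     "summary_interval": 1,
//     "should_log_parameter_statistics": false,
//     "should_log_learning_rate": false,
// };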
    [if !debug then "callbacks"]: [wandb_callback],
},
"distributed": {
    "cuda_devices": if debug then [0, 1] else [0, 1, 2, 3, 4, 5, 6, 7],
I use this kind of thing to get a variable number of GPUs:
https://github.com/allenai/allennlp-models/blob/main/training_config/vision/vilbert_vqa.jsonnet#L101
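Roughly this pattern (a sketch of the linked approach; NUM_GPUS is an assumed external-variable name, and I'm assuming it gets passed through to jsonnet via std.extVar, e.g. from an environment variable):

// Derive the device list from a variable instead of hard-coding it.
local num_gpus = std.parseInt(std.extVar("NUM_GPUS"));

{
    // std.range(0, n - 1) expands to [0, 1, ..., n - 1].
    [if num_gpus > 1 then "distributed"]: {
        "cuda_devices": std.range(0, num_gpus - 1),
    },
}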
@@ -0,0 +1,125 @@
// =================== Configurable Settings ======================
The filename is confusing if this works for both 11b models and smaller ones.
Corresponding PR: allenai/allennlp#5242