FairScale integration and T5-11B fine-tuning #271
Conversation
// AMP is currently unusably slow with t5-11b, which be due to a bug bug within
// FairScale, but I'm not sure yet.
Typo
local wandb_callback = {
    "type": "wandb",
    "project": "allennlp-t5",
    "entity": "allenai-team1",
    "watch_model": false,
    "summary_interval": 1,
    "should_log_parameter_statistics": false,
    "should_log_learning_rate": false,
};
I don't know if we want this in a default config. Virtually everyone using this will have to change it.
Though I would be entirely in favor of having a commented-out wandb config in every jsonnet file.
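Something like this, say (just a sketch; the project and entity values are placeholders that each user would fill in):

// Uncomment and fill in your own project/entity to enable wandb logging:
// local wandb_callback = {
//     "type": "wandb",
//     "project": "my-project",  // placeholder: your wandb project
//     "entity": "my-team",      // placeholder: your wandb entity/team
//     "watch_model": false,
//     "summary_interval": 1,
//     "should_log_parameter_statistics": false,
//     "should_log_learning_rate": false,
// };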
    [if !debug then "callbacks"]: [wandb_callback],
},
"distributed": {
    "cuda_devices": if debug then [0, 1] else [0, 1, 2, 3, 4, 5, 6, 7],
I use this kind of thing to get a variable number of GPUs:
https://github.com/allenai/allennlp-models/blob/main/training_config/vision/vilbert_vqa.jsonnet#L101
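Roughly this pattern (a sketch of the linked approach; NUM_GPUS is an assumed external-variable name, and I'm assuming it gets passed through to jsonnet via std.extVar, e.g. from an environment variable):

// Derive the device list from a variable instead of hard-coding it.
local num_gpus = std.parseInt(std.extVar("NUM_GPUS"));

{
    // std.range(0, n - 1) expands to [0, 1, ..., n - 1].
    [if num_gpus > 1 then "distributed"]: {
        "cuda_devices": std.range(0, num_gpus - 1),
    },
}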
@@ -0,0 +1,125 @@
// =================== Configurable Settings ======================
The filename is confusing if this works for both 11b models and smaller ones.
Corresponding PR: allenai/allennlp#5242