make sure we initialize accelerator before model #1132
Signed-off-by: Peter St. John <pstjohn@nvidia.com>
Walkthrough
Adds a new Hydra YAML config for an esm2_t48_15B_UR50D performance test and moves Accelerate initialization earlier in train.py main(), ahead of model construction; removes the post-creation accelerator state log. No other training, dataset, or evaluation logic changed.
Sequence Diagram(s)
```mermaid
sequenceDiagram
    autonumber
    participant CLI as CLI/Launcher
    participant Train as train.py
    participant Accel as Accelerate
    participant Model as Model/Config
    participant Trainer as Trainer/Loop
    participant Logger as Logger
    CLI->>Train: start main()
    Train->>Accel: create PartialState()/Accelerator (early)
    Accel-->>Train: provides device, process info
    Train->>Logger: log local_process_index, num_processes, device
    Train->>Model: build model/config on assigned device
    Train->>Trainer: configure dataloaders, optimizer, etc.
    Trainer->>Accel: prepare components (wrap with accelerator)
    Trainer->>Trainer: run training/eval (stop_after_n_steps)
    Note over Logger,Trainer: removed post-creation accelerator state print
```
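The "provides device, process info" step comes down to each launched process claiming its own GPU. The following is a simplified, stdlib-only illustration, not Accelerate's implementation: it assumes a torchrun-style launcher that exports `LOCAL_RANK` for every process, and `per_process_device` is a hypothetical helper. Until a process calls `torch.cuda.set_device` with a value like this, a bare `"cuda"` device typically means `cuda:0`, so every rank builds its layers on the same GPU.

```python
import os
from typing import Mapping, Optional


def per_process_device(env: Optional[Mapping[str, str]] = None) -> str:
    """Return the CUDA device string this process should own.

    Hypothetical helper: torchrun-style launchers export LOCAL_RANK, the index
    of this process on its node. A plain single-process run has no LOCAL_RANK,
    so we fall back to GPU 0.
    """
    env = os.environ if env is None else env
    local_rank = int(env.get("LOCAL_RANK", "0"))
    return f"cuda:{local_rank}"


if __name__ == "__main__":
    # In a real script this value would be passed to torch.cuda.set_device(...)
    # *before* the model (and any TE layers) is constructed.
    print(per_process_device())
```

Constructing `PartialState()` (or `Accelerator()`, as the PR does) early in `main()` handles this mapping together with process-group setup, which is why it has to run before the model is built.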
Estimated code review effort: 🎯 2 (Simple) | ⏱️ ~10 minutes
Pre-merge checks (2 passed, 1 warning)
❌ Failed Checks (1 warning)
✅ Passed Checks (2 passed)
📜 Recent review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
🚧 Files skipped from review as they are similar to previous changes (1)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
Actionable comments posted: 0
🧹 Nitpick comments (4)
recipes/esm2_accelerate/train.py (2)
39-42: Early Accelerate init fixes device placement; consider PartialState instead.
Creating an unused Accelerator just to set the current device works, but it is heavier than needed. PartialState initializes the distributed state and sets the torch device without constructing a full Accelerator instance.
Apply this diff:
```diff
-    # We need to initialize the Accelerator manually prior to creating our model, otherwise we won't end up setting the
-    # current torch device and the model creation will all happen on a single GPU, typically leading to an OOM.
-    _ = Accelerator()
+    # Initialize Accelerate's distributed state early so torch device is set per process
+    state = PartialState()
+    logger.info(
+        "Accelerate initialized (local_process_index=%s, num_processes=%s, device=%s)",
+        state.local_process_index,
+        state.num_processes,
+        state.device,
+    )
```
Note: See the import change on Line 22.
22-22: Import PartialState (lighter) instead of Accelerator (unused).
Avoid constructing an unused Accelerator instance; use PartialState for early device init.
Apply this diff:
```diff
-from accelerate import Accelerator
+from accelerate import PartialState
```
recipes/esm2_accelerate/hydra_config/L1_15B_perf_test.yaml (2)
12-14: Use an int for warmup_steps (drop the underscore).
Some YAML parsers and OmegaConf setups don't accept numeric separators (the YAML 1.2 core schema has no `_` in integers), so "20_000" can be loaded as a string and break TrainingArguments, which expects an int.
Apply this diff:
```diff
-  warmup_steps: 20_000
+  warmup_steps: 20000
```
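Whether `20_000` survives as an integer depends on the schema the loader uses to resolve plain scalars. A simplified stdlib sketch of just the two decimal-integer rules (not a real YAML parser; hex, octal, and floats are ignored): YAML 1.1 loaders such as PyYAML accept the underscore, while the YAML 1.2 core schema leaves the scalar as a string, which is the failure mode described above. `resolve_plain_scalar` is a hypothetical name.

```python
import re

# YAML 1.1 decimal ints (the spec PyYAML implements) allow "_" as a digit separator.
YAML11_INT = re.compile(r"[-+]?(?:0|[1-9][0-9_]*)")
# The YAML 1.2 core schema decimal int has no separator.
YAML12_CORE_INT = re.compile(r"[-+]?[0-9]+")


def resolve_plain_scalar(text: str, int_pattern: re.Pattern) -> object:
    """Resolve an unquoted scalar to int if it matches the pattern, else keep the str."""
    if int_pattern.fullmatch(text):
        return int(text.replace("_", ""))
    return text


for schema, pattern in [("YAML 1.1", YAML11_INT), ("YAML 1.2 core", YAML12_CORE_INT)]:
    value = resolve_plain_scalar("20_000", pattern)
    print(f"{schema}: {value!r} ({type(value).__name__})")
```

Writing `20000` resolves to the same integer under both rules, which is why dropping the underscore is the portable choice.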
7-14: Optional: align Trainer precision with model dtype.
The model is created in bfloat16; consider setting bf16: true (and tf32: true on Ampere/Hopper) for better performance and consistent autocast during training.
Example:
```diff
 trainer:
   run_name: "esm2_t48_15B_UR50D_perf"
   per_device_train_batch_size: 12
   per_device_eval_batch_size: 12
   report_to: "wandb"
   learning_rate: 1.6e-4
   weight_decay: 0.1
-  warmup_steps: 20000
+  warmup_steps: 20000
+  bf16: true
+  tf32: true
```
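To keep the model dtype and the Trainer flags from drifting apart, a run could check the config before training starts. A minimal sketch, assuming the `trainer:` section is available as a plain dict (for example via `OmegaConf.to_container`) and the model dtype is known as a string; `precision_warnings` and `FLAG_FOR_DTYPE` are hypothetical names, not part of the recipe. tf32 is left out because it depends on the GPU generation rather than on the model dtype.

```python
from typing import Any, Dict, List

# Hypothetical mapping: Trainer flag that turns on mixed precision for a model dtype.
FLAG_FOR_DTYPE = {"bfloat16": "bf16", "float16": "fp16"}


def precision_warnings(model_dtype: str, trainer_cfg: Dict[str, Any]) -> List[str]:
    """Warn when the trainer config does not enable the precision mode
    that matches the dtype the model was created in."""
    warnings = []
    flag = FLAG_FOR_DTYPE.get(model_dtype)
    if flag is not None and not trainer_cfg.get(flag, False):
        warnings.append(f"model dtype is {model_dtype}, but trainer.{flag} is not enabled")
    return warnings


if __name__ == "__main__":
    before = {"learning_rate": 1.6e-4, "weight_decay": 0.1, "warmup_steps": 20000}
    after = {**before, "bf16": True, "tf32": True}
    print(precision_warnings("bfloat16", before))  # one warning about trainer.bf16
    print(precision_warnings("bfloat16", after))   # empty list
```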
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
recipes/esm2_accelerate/hydra_config/L1_15B_perf_test.yaml (1 hunk)
recipes/esm2_accelerate/train.py (2 hunks)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: Analyze (rust)
🔇 Additional comments (2)
recipes/esm2_accelerate/train.py (1)
22-22: Dependency confirmed: accelerate is declared in recipes/esm2_accelerate/requirements.txt. No further action needed.
recipes/esm2_accelerate/hydra_config/L1_15B_perf_test.yaml (1)
1-3: defaults.yaml present
Verified that recipes/esm2_accelerate/hydra_config/defaults.yaml exists and resolves correctly.
/ok to test 710e405
We need to initialize the Accelerator object before creating TE (Transformer Engine) layers, or they all end up on a single device.
Summary by CodeRabbit
New Features
Bug Fixes