Conversation
…ersion (#1120) Signed-off-by: Stan Kirdey <stan@inflection.ai> Signed-off-by: NeMo Bot <nemo-bot@nvidia.com>
📝 WalkthroughWalkthroughAdds a guarded import for temporary_distributed_context and wraps Megatron model load-and-save operations inside temporary_distributed_context(backend="gloo") to ensure CPU-based export within a distributed context. Preserves overwrite checks and state reset. No public API changes. Changes
Sequence Diagram(s)sequenceDiagram
autonumber
participant U as Caller
participant CI as community_import.py
participant TDC as temporary_distributed_context
participant MG as Megatron Loader
participant HF as HF Saver
U->>CI: export(model_path, output_path, overwrite)
alt temporary_distributed_context importable
CI->>TDC: with backend="gloo"
activate TDC
TDC-->>CI: distributed context active
CI->>MG: load_model(skip_temp_dist_context=True)
MG-->>CI: model
CI->>HF: save_to_hf(model, output_path)
HF-->>CI: done
CI->>CI: reset mcore state
deactivate TDC
CI-->>U: success
else Import missing
CI-->>U: raise ImportError("temporary_distributed_context not available")
end
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~20 minutes Possibly related PRs
Suggested labels
Suggested reviewers
Pre-merge checks and finishing touches❌ Failed checks (2 warnings)
✅ Passed checks (2 passed)
✨ Finishing touches
🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 0
🧹 Nitpick comments (1)
nemo_rl/models/megatron/community_import.py (1)
107-112: Preserve the original ImportError contextRe-raising a fresh
ImportErrorhere hides the original cause, making debugging harder (Ruff B904). Chain the original exception when raising the custom message.- except ImportError: - raise ImportError("megatron.bridge.training is not available.") + except ImportError as err: + raise ImportError("megatron.bridge.training is not available.") from err
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
nemo_rl/models/megatron/community_import.py(1 hunks)
🧰 Additional context used
📓 Path-based instructions (2)
**/*.py
📄 CodeRabbit inference engine (CODING_GUIDELINES.md)
**/*.py: Follow the Google Python Style Guide for all Python code
Target Python 3.12+ for all Python code in NeMo-RL
Indent Python code with 4 spaces; do not use tabs
Python filenames should be snake_case (e.g., some_file.py)
Class names should be PascalCase
Function and method names should be snake_case
Local variable names should be snake_case; if starting with a number, prefix with k (e.g., k_99th_percentile)
Global variables should be UPPER_SNAKE_CASE and prefixed with G_ (e.g., G_MY_GLOBAL)
Constants should be UPPER_SNAKE_CASE
Avoid shadowing variables declared in an outer scope
Initialize all externally visible members of a class in the constructor
For public interfaces used outside a file, prefer docstrings over comments
Use comments mainly for code within a function or interfaces local to a file
Commented-out code must include a nearby comment explaining usage and why it is commented out; otherwise remove before merging
Use Google-style docstrings for classes and functions (Sphinx-parseable)
Avoid using reflection when functionality can be easily achieved without it
Limit except clauses to the smallest specific set of exceptions possible
For duck-typing via try/except, keep the try body minimal and use else for main logic
Add the NVIDIA copyright header (with current year) at the top of all Python files, excluding tests/ and test-only scripts
Files:
nemo_rl/models/megatron/community_import.py
nemo_rl/**/*.py
📄 CodeRabbit inference engine (CODING_GUIDELINES.md)
nemo_rl/**/*.py: Do not set non-None configuration defaults in code; YAML is the single source of truth for defaults
Access required config attributes directly (e.g., policy_cfg["precision"]) and assume presence; do not introduce hidden defaults
Express configuration optionality via TypedDict using typing.NotRequired
When adding a new config key to a TypedDict subclass, document the key’s purpose, valid values/types, and recommended default in code
For any class or function decorated with @ray.remote, add '# pragma: no cover' on the class/def line (and on remote functions)
Files:
nemo_rl/models/megatron/community_import.py
🪛 Ruff (0.13.3)
nemo_rl/models/megatron/community_import.py
112-112: Within an except clause, raise exceptions with raise ... from err or raise ... from None to distinguish them from errors in exception handling
(B904)
112-112: Avoid specifying long messages outside the exception class
(TRY003)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (4)
- GitHub Check: CI quality check
- GitHub Check: Lint check
- GitHub Check: Post submodule check comment / Comment on PR
- GitHub Check: Post automodel integration comment / Comment on PR
beep boop [🤖]: Hi @skirdey-inflection 👋,
Summary by CodeRabbit