
Fix regex capture for Megatron KD PP layer renaming #1355

Merged
AAnoosheh merged 2 commits into main from aanoosheh/safer-megatron-pp-regex
Apr 27, 2026

Conversation

@AAnoosheh
Contributor

@AAnoosheh AAnoosheh commented Apr 27, 2026

What does this PR do?

Type of change: Bug fix

Previously, the regex we used required a dot after the integer layer number, but that dot is missing when the layer index is the last segment of the submodule name.
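To illustrate the difference, here is a minimal standalone sketch. Both patterns are hypothetical reconstructions written for this example, not the actual expressions in modelopt/torch/distill/plugins/megatron.py: the "old" one requires a dot after the layer index, and the "new" one also accepts the end of the string.

```python
import re

# Hypothetical patterns for illustration only (not copied from megatron.py).
# Old behavior: the layer index must be followed by another dot.
OLD_PATTERN = re.compile(r"\.(\d+)\.")
# New behavior: the index may be followed by a dot OR the end of the string.
# The lookahead checks for the dot/end without consuming it.
NEW_PATTERN = re.compile(r"\.(\d+)(?=\.|$)")


def find_layer_index(pattern, name):
    """Return the first layer index matched by `pattern` in `name`, or None."""
    match = pattern.search(name)
    return int(match.group(1)) if match else None


print(find_layer_index(OLD_PATTERN, "decoder.layers.12.mlp"))  # 12
print(find_layer_index(OLD_PATTERN, "decoder.layers.12"))      # None: no dot after 12
print(find_layer_index(NEW_PATTERN, "decoder.layers.12"))      # 12
```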

Usage

Testing

Before your PR is "Ready for review"

Make sure you read and follow Contributor guidelines and your commits are signed (git commit -s -S).

Make sure you read and follow the Security Best Practices (e.g. avoiding hardcoded trust_remote_code=True, torch.load(..., weights_only=False), pickle, etc.).

  • Is this change backward compatible?: ✅ / ❌ / N/A
  • If you copied code from any other sources or added a new PIP dependency, did you follow guidance in CONTRIBUTING.md: ✅ / ❌ / N/A
  • Did you write any new necessary tests?: ✅ / ❌ / N/A
  • Did you update Changelog?: ✅ / ❌ / N/A

Additional Information

Summary by CodeRabbit

  • Bug Fixes
    • Improved detection and handling of pipeline-parallel layer indices in model submodule names so numeric layer identifiers at the end of names are recognized and safely replaced. This reduces accidental renaming of other numeric sequences and improves compatibility with varied naming conventions in distillation workflows.

Signed-off-by: Asha Anoosheh <aanoosheh@nvidia.com>
@AAnoosheh AAnoosheh requested a review from jenchen13 April 27, 2026 16:34
@AAnoosheh AAnoosheh self-assigned this Apr 27, 2026
@AAnoosheh AAnoosheh requested a review from a team as a code owner April 27, 2026 16:34
@AAnoosheh AAnoosheh requested a review from kinjalpatel27 April 27, 2026 16:34
@copy-pr-bot

copy-pr-bot Bot commented Apr 27, 2026

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

@coderabbitai
Contributor

coderabbitai Bot commented Apr 27, 2026

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 6c5ad787-dd20-40f4-a8c6-c85a2b0c16eb

📥 Commits

Reviewing files that changed from the base of the PR and between eb73662 and c0312ef.

📒 Files selected for processing (1)
  • modelopt/torch/distill/plugins/megatron.py

📝 Walkthrough


Updated _adjust_layer_index_for_pp to recognize pipeline-parallel layer indices that appear as a trailing dot-separated numeric segment in submodule names and to replace the matched dot-prefixed numeric segment (e.g., .12) when renaming.
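A minimal sketch of this renaming, assuming the helper subtracts a pipeline-parallel layer offset to turn a global layer index into a rank-local one (the real code obtains the offset via TransformerLayer._get_layer_offset(model_cfg)). The pattern and the function name shift_layer_index are hypothetical and are not taken from megatron.py.

```python
import re

# Hypothetical pattern (not copied from megatron.py): a dot, the layer index,
# then either another dot or the end of the string (checked via lookahead).
LAYER_PATTERN = re.compile(r"\.(\d+)(?=\.|$)")


def shift_layer_index(submodule_name: str, offset: int) -> str:
    """Subtract `offset` from the first dot-delimited layer index in the name.

    Illustrative only: assumes the offset is subtracted to map a global
    layer index to a rank-local one. Names without an index are unchanged.
    """
    # count=1 rewrites only the first match; the replacement keeps the leading
    # dot, so digits inside other segments (e.g. "block12") are left alone.
    return LAYER_PATTERN.sub(
        lambda m: f".{int(m.group(1)) - offset}", submodule_name, count=1
    )


print(shift_layer_index("decoder.layers.12", 4))      # decoder.layers.8
print(shift_layer_index("decoder.layers.12.mlp", 4))  # decoder.layers.8.mlp
print(shift_layer_index("block12.layers.12", 4))      # block12.layers.8
print(shift_layer_index("embedding", 4))              # embedding
```

Passing a function as the replacement means only the matched segment is rebuilt; the span-slicing approach suggested in the review achieves the same effect.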

Changes

Cohort / File(s) | Summary
Pipeline Parallelism Detection (modelopt/torch/distill/plugins/megatron.py) | Regex in _adjust_layer_index_for_pp broadened to match numeric layer indices followed by either a dot or end-of-string; renaming now substitutes the dot-prefixed matched segment rather than bare digits.

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~10 minutes

🚥 Pre-merge checks | ✅ 6
✅ Passed checks (6 passed)
Check name | Status | Explanation
Description Check | ✅ Passed | Check skipped: CodeRabbit’s high-level summary is enabled.
Title check | ✅ Passed | The title accurately reflects the main change: fixing a regex used for Megatron knowledge distillation pipeline-parallel layer renaming.
Docstring Coverage | ✅ Passed | Docstring coverage is 100.00%, which is sufficient. The required threshold is 80.00%.
Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request.
Security Anti-Patterns | ✅ Passed | PR modifies only regex pattern matching and string replacement logic in megatron.py with no security-sensitive operations or anti-patterns detected against SECURITY.md requirements.


@github-actions
Contributor

github-actions Bot commented Apr 27, 2026

PR Preview Action v1.8.1
Preview removed because the pull request was closed.
2026-04-27 20:43 UTC

Contributor

@coderabbitai coderabbitai Bot left a comment


Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
modelopt/torch/distill/plugins/megatron.py (1)

166-176: ⚠️ Potential issue | 🟠 Major

Replace only the matched numeric segment, not all identical substrings.

At Line 175, submodule_name.replace(match.group(0), str(new_layer_idx)) replaces every occurrence of the matched digit sequence in the string, not only the occurrence that was matched. With the new regex on Line 166 (which now also matches end-of-string indices), this corrupts unrelated numeric parts of the name. For example, block12.layers.12 becomes block8.layers.8: both 12s are replaced instead of only the second one.

Use span-based replacement instead:

Fix
-    new_submodule_name = submodule_name.replace(match.group(0), str(new_layer_idx))
+    start, end = match.span(0)
+    new_submodule_name = f"{submodule_name[:start]}{new_layer_idx}{submodule_name[end:]}"
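A minimal standalone reproduction of the reviewer's point. The bare trailing-digits pattern and the offset of 4 are assumptions chosen to reproduce the 12 to 8 example above; neither is taken from megatron.py.

```python
import re

name = "block12.layers.12"
# Hypothetical pattern: bare trailing digits, so match.group(0) == "12".
match = re.search(r"(\d+)$", name)
# Assumed offset of 4, matching the review's 12 -> 8 example.
new_layer_idx = int(match.group(0)) - 4

# Buggy: str.replace rewrites EVERY "12" in the string.
buggy = name.replace(match.group(0), str(new_layer_idx))

# Fixed: splice the new index into the matched span only.
start, end = match.span(0)
fixed = f"{name[:start]}{new_layer_idx}{name[end:]}"

print(buggy)  # block8.layers.8
print(fixed)  # block12.layers.8
```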
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@modelopt/torch/distill/plugins/megatron.py` around lines 166 - 176, The
replacement currently uses submodule_name.replace(match.group(0), ...) which
replaces all identical digit sequences; instead use the match span to replace
only the matched segment: get the start/end via match.span(), slice
submodule_name into prefix + new index string + suffix, and assign that to
new_submodule_name (keeping the existing logic that computes offset via
TransformerLayer._get_layer_offset(model_cfg) and new_layer_idx); this ensures
only the specific matched numeric part is changed rather than every occurrence
of match.group(0).

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: bc20afe5-7ae0-4bc1-a159-cf01547d97e6

📥 Commits

Reviewing files that changed from the base of the PR and between 5b41ba4 and eb73662.

📒 Files selected for processing (1)
  • modelopt/torch/distill/plugins/megatron.py

@codecov

codecov Bot commented Apr 27, 2026

Codecov Report

❌ Patch coverage is 0% with 2 lines in your changes missing coverage. Please review.
✅ Project coverage is 75.82%. Comparing base (1ec931c) to head (c0312ef).
⚠️ Report is 8 commits behind head on main.

Files with missing lines | Patch % | Lines
modelopt/torch/distill/plugins/megatron.py | 0.00% | 2 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #1355      +/-   ##
==========================================
+ Coverage   75.72%   75.82%   +0.10%     
==========================================
  Files         471      471              
  Lines       50375    50375              
==========================================
+ Hits        38146    38199      +53     
+ Misses      12229    12176      -53     
Flag | Coverage Δ
examples | 41.60% <0.00%> (+0.98%) ⬆️
gpu | 58.40% <0.00%> (-0.50%) ⬇️
regression | 14.92% <0.00%> (+0.22%) ⬆️
unit | 52.75% <0.00%> (-0.02%) ⬇️

Flags with carried forward coverage won't be shown.


@AAnoosheh AAnoosheh enabled auto-merge (squash) April 27, 2026 17:12
@AAnoosheh AAnoosheh requested review from kevalmorabia97 and removed request for kinjalpatel27 April 27, 2026 17:12
@AAnoosheh AAnoosheh added the cherry-pick-0.44.0 After code freeze, cherry-pick to release branch for next rc (bulk update). Only for bug fixes / doc label Apr 27, 2026
@AAnoosheh AAnoosheh disabled auto-merge April 27, 2026 17:20
Signed-off-by: Asha Anoosheh <aanoosheh@nvidia.com>
@AAnoosheh AAnoosheh enabled auto-merge (squash) April 27, 2026 17:25
@AAnoosheh
Contributor Author

/ok to test c0312ef

@AAnoosheh AAnoosheh merged commit 6e08b13 into main Apr 27, 2026
47 checks passed
@AAnoosheh AAnoosheh deleted the aanoosheh/safer-megatron-pp-regex branch April 27, 2026 20:42
@kevalmorabia97 kevalmorabia97 added the cherry-pick-done Added by bot once PR is cherry-picked to the release branch label May 4, 2026

Labels

cherry-pick-0.44.0 After code freeze, cherry-pick to release branch for next rc (bulk update). Only for bug fixes / doc cherry-pick-done Added by bot once PR is cherry-picked to the release branch

3 participants