Fix regex capture for Megatron KD PP layer renaming by AAnoosheh · Pull Request #1355 · NVIDIA/Model-Optimizer

AAnoosheh · 2026-04-27T16:34:25Z

What does this PR do?

Type of change: Bug fix

Previously, the regex required a dot after the integer layer number, but that dot is absent when the layer index is the last segment of the submodule name.
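As a rough illustration of the difference, here is a minimal sketch. Both patterns and the helper `find_layer_index` are hypothetical stand-ins written for this example; the actual expressions in `megatron.py` may differ.

```python
import re

# Hypothetical patterns for illustration only (not copied from megatron.py).
OLD_PATTERN = re.compile(r"\.(\d+)\.")        # index must be followed by a dot
NEW_PATTERN = re.compile(r"\.(\d+)(?=\.|$)")  # index followed by a dot OR end-of-string


def find_layer_index(pattern, submodule_name):
    """Return the first layer index found by `pattern`, or None if nothing matches."""
    match = pattern.search(submodule_name)
    return int(match.group(1)) if match else None


print(find_layer_index(OLD_PATTERN, "decoder.layers.12.mlp"))  # 12
print(find_layer_index(OLD_PATTERN, "decoder.layers.12"))      # None (no trailing dot)
print(find_layer_index(NEW_PATTERN, "decoder.layers.12"))      # 12
```

The lookahead `(?=\.|$)` checks for the dot or the end of the string without consuming it, so the text after the index is left untouched.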

Usage

# Add a code snippet demonstrating how to use this

Testing

Before your PR is "Ready for review"

Make sure you read and follow Contributor guidelines and your commits are signed (git commit -s -S).

Make sure you read and follow the Security Best Practices (e.g. avoiding hardcoded trust_remote_code=True, torch.load(..., weights_only=False), pickle, etc.).

Is this change backward compatible?: ✅ / ❌ / N/A
If you copied code from any other sources or added a new PIP dependency, did you follow guidance in CONTRIBUTING.md: ✅ / ❌ / N/A
Did you write any new necessary tests?: ✅ / ❌ / N/A
Did you update Changelog?: ✅ / ❌ / N/A

Additional Information

Summary by CodeRabbit

Bug Fixes
- Improved detection and handling of pipeline-parallel layer indices in model submodule names so numeric layer identifiers at the end of names are recognized and safely replaced. This reduces accidental renaming of other numeric sequences and improves compatibility with varied naming conventions in distillation workflows.

Signed-off-by: Asha Anoosheh <aanoosheh@nvidia.com>

copy-pr-bot · 2026-04-27T16:34:29Z

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

coderabbitai · 2026-04-27T16:34:39Z

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 6c5ad787-dd20-40f4-a8c6-c85a2b0c16eb

📥 Commits

Reviewing files that changed from the base of the PR and between eb73662 and c0312ef.

📒 Files selected for processing (1)

modelopt/torch/distill/plugins/megatron.py

📝 Walkthrough

Updated `_adjust_layer_index_for_pp` to recognize pipeline-parallel layer indices that appear as a trailing dot-separated numeric segment in submodule names and to replace the matched dot-prefixed numeric segment (e.g., `.12`) when renaming.

Changes

Cohort / File(s)	Summary
Pipeline Parallelism Detection `modelopt/torch/distill/plugins/megatron.py`	Regex in `_adjust_layer_index_for_pp` broadened to match numeric layer indices followed by either a dot or end-of-string; renaming now substitutes the dot-prefixed matched segment rather than bare digits.
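The dot-prefixed substitution summarized above might look roughly like the sketch below. The function name `adjust_layer_index`, the pattern, and the `offset` parameter are assumptions made for this example. In the real code the offset is reportedly obtained from `TransformerLayer._get_layer_offset(model_cfg)`, and adding it to the local index is also an assumption here.

```python
import re

# Hypothetical pattern: a dot, then digits, then either a dot or end-of-string.
LAYER_IDX_PATTERN = re.compile(r"\.(\d+)(?=\.|$)")


def adjust_layer_index(submodule_name, offset):
    """Shift the first dot-prefixed layer index in `submodule_name` by `offset`.

    Names without a dot-prefixed numeric segment are returned unchanged.
    """
    def shift(match):
        # Rebuild the whole matched segment, including its leading dot.
        return f".{int(match.group(1)) + offset}"

    # count=1 limits the substitution to the first matching segment.
    return LAYER_IDX_PATTERN.sub(shift, submodule_name, count=1)


print(adjust_layer_index("decoder.layers.3.self_attention", 8))  # decoder.layers.11.self_attention
print(adjust_layer_index("decoder.layers.3", 8))                 # decoder.layers.11
print(adjust_layer_index("block3.layers.3", 8))                  # block3.layers.11
```

Because the match includes the leading dot, digits embedded in a segment name such as `block3` are not candidates for replacement.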

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~10 minutes

🚥 Pre-merge checks | ✅ 6

✅ Passed checks (6 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Title check	✅ Passed	The title accurately reflects the main change: fixing a regex used for Megatron knowledge distillation pipeline-parallel layer renaming.
Docstring Coverage	✅ Passed	Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%.
Linked Issues check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Security Anti-Patterns	✅ Passed	PR modifies only regex pattern matching and string replacement logic in megatron.py with no security-sensitive operations or anti-patterns detected against SECURITY.md requirements.

github-actions · 2026-04-27T16:37:57Z

PR Preview Action v1.8.1
Preview removed because the pull request was closed.
2026-04-27 20:43 UTC

coderabbitai

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)

modelopt/torch/distill/plugins/megatron.py (1)
166-176: ⚠️ Potential issue | 🟠 Major

Replace only the matched numeric segment, not all identical substrings.

At Line 175, `submodule_name.replace(match.group(0), str(new_layer_idx))` replaces all occurrences of the matched digit sequence globally. With the new regex on Line 166 (now matching end-of-string indices), this corrupts unrelated numeric parts. For example, `block12.layers.12` becomes `block8.layers.8` (both `12`s are replaced instead of just the second one).

Use span-based replacement instead:
Fix
-    new_submodule_name = submodule_name.replace(match.group(0), str(new_layer_idx))
+    start, end = match.span(0)
+    new_submodule_name = f"{submodule_name[:start]}{new_layer_idx}{submodule_name[end:]}"
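To make the difference concrete, the following sketch compares the two approaches on the name from the example. The pattern and both helper functions are hypothetical and written only for this illustration; they assume the match covers the bare digits of the index.

```python
import re

# Hypothetical pattern: digits preceded by a dot and followed by a dot or end-of-string.
PATTERN = re.compile(r"(?<=\.)\d+(?=\.|$)")


def rename_with_replace(name, new_layer_idx):
    """Global str.replace: rewrites every occurrence of the matched digits."""
    match = PATTERN.search(name)
    if match is None:
        return name
    return name.replace(match.group(0), str(new_layer_idx))


def rename_with_span(name, new_layer_idx):
    """Span-based replacement: rewrites only the characters the regex matched."""
    match = PATTERN.search(name)
    if match is None:
        return name
    start, end = match.span(0)
    return f"{name[:start]}{new_layer_idx}{name[end:]}"


print(rename_with_replace("block12.layers.12", 8))  # block8.layers.8
print(rename_with_span("block12.layers.12", 8))     # block12.layers.8
```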
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@modelopt/torch/distill/plugins/megatron.py` around lines 166 - 176, The
replacement currently uses submodule_name.replace(match.group(0), ...) which
replaces all identical digit sequences; instead use the match span to replace
only the matched segment: get the start/end via match.span(), slice
submodule_name into prefix + new index string + suffix, and assign that to
new_submodule_name (keeping the existing logic that computes offset via
TransformerLayer._get_layer_offset(model_cfg) and new_layer_idx); this ensures
only the specific matched numeric part is changed rather than every occurrence
of match.group(0).

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: bc20afe5-7ae0-4bc1-a159-cf01547d97e6

📥 Commits

Reviewing files that changed from the base of the PR and between 5b41ba4 and eb73662.

📒 Files selected for processing (1)

modelopt/torch/distill/plugins/megatron.py

codecov · 2026-04-27T16:47:57Z

Codecov Report

❌ Patch coverage is 0% with 2 lines in your changes missing coverage. Please review.
✅ Project coverage is 75.82%. Comparing base (1ec931c) to head (c0312ef).
⚠️ Report is 8 commits behind head on main.

Files with missing lines	Patch %	Lines
modelopt/torch/distill/plugins/megatron.py	0.00%	2 Missing ⚠️

Additional details and impacted files

@@            Coverage Diff             @@
##             main    #1355      +/-   ##
==========================================
+ Coverage   75.72%   75.82%   +0.10%     
==========================================
  Files         471      471              
  Lines       50375    50375              
==========================================
+ Hits        38146    38199      +53     
+ Misses      12229    12176      -53

Flag	Coverage Δ
examples	`41.60% <0.00%> (+0.98%)`	⬆️
gpu	`58.40% <0.00%> (-0.50%)`	⬇️
regression	`14.92% <0.00%> (+0.22%)`	⬆️
unit	`52.75% <0.00%> (-0.02%)`	⬇️

Signed-off-by: Asha Anoosheh <aanoosheh@nvidia.com>

AAnoosheh · 2026-04-27T19:41:50Z

/ok to test c0312ef

Make regex capture better

eb73662

Signed-off-by: Asha Anoosheh <aanoosheh@nvidia.com>

AAnoosheh requested a review from jenchen13 April 27, 2026 16:34

AAnoosheh self-assigned this Apr 27, 2026

AAnoosheh requested a review from a team as a code owner April 27, 2026 16:34

AAnoosheh requested a review from kinjalpatel27 April 27, 2026 16:34

coderabbitai Bot reviewed Apr 27, 2026

jenchen13 approved these changes Apr 27, 2026

AAnoosheh enabled auto-merge (squash) April 27, 2026 17:12

AAnoosheh requested review from kevalmorabia97 and removed request for kinjalpatel27 April 27, 2026 17:12

kevalmorabia97 approved these changes Apr 27, 2026

AAnoosheh added the `cherry-pick-0.44.0` label (After code freeze, cherry-pick to release branch for next rc (bulk update). Only for bug fixes / doc) Apr 27, 2026

AAnoosheh disabled auto-merge April 27, 2026 17:20

Smarter string replacement

c0312ef

Signed-off-by: Asha Anoosheh <aanoosheh@nvidia.com>

AAnoosheh enabled auto-merge (squash) April 27, 2026 17:25

AAnoosheh merged commit 6e08b13 into main Apr 27, 2026
47 checks passed

AAnoosheh deleted the aanoosheh/safer-megatron-pp-regex branch April 27, 2026 20:42

kevalmorabia97 mentioned this pull request May 4, 2026

[Cherry-pick] PRs #1352 #1351 #1330 #1354 #1355 #1360 #1342 #1324 #1340 #1368 #1373 #1359 #1361 #1325 #1369 #1370 #1371 #1385

Open

kevalmorabia97 added the `cherry-pick-done` label (Added by bot once PR is cherry-picked to the release branch) May 4, 2026

AAnoosheh merged 2 commits into main from aanoosheh/safer-megatron-pp-regex
