fix: PTQ 1GPU, export PP divisibility, hidden states conversations key#1293

Merged
ChenhanYu merged 1 commit into main from chenhany/fix-ptq-export-conversations
Apr 20, 2026
Conversation

@ChenhanYu ChenhanYu (Collaborator) commented Apr 18, 2026

Summary

  • megatron_lm_ptq.yaml: run Qwen3-8B PTQ on a single GPU for L40 clusters (TP=1 for all tasks)
  • quantize.sh: auto-find the largest PP that divides the model's num_hidden_layers for the export step. Qwen3-8B has 36 layers, which isn't divisible by 8, causing an AssertionError on 8-GPU nodes
  • compute_hidden_states_trtllm.py: use messages with a conversations fallback, matching the HF version. Fixes KeyError: 'conversations' when data uses the OpenAI messages format
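
In sketch form, the fallback described in the last bullet looks like this (the helper name and surrounding structure are illustrative, not the script's actual code; only the `messages`/`conversations` key handling comes from the PR):

```python
def get_conversation(entry: dict) -> list:
    """Prefer the OpenAI-style 'messages' key, fall back to 'conversations'."""
    conv = entry.get("messages") or entry.get("conversations")
    if conv is None:
        raise KeyError("entry has neither 'messages' nor 'conversations'")
    return conv

# OpenAI-style entry (previously raised KeyError: 'conversations'):
openai_entry = {"messages": [{"role": "user", "content": "hello"}]}
# Legacy ShareGPT-style entry keeps working via the fallback:
legacy_entry = {"conversations": [{"from": "human", "value": "hello"}]}

print(get_conversation(openai_entry)[0]["role"])  # user
print(get_conversation(legacy_entry)[0]["from"])  # human
```

Because `dict.get` returns `None` on a missing key, the `or` chain checks `messages` first and only then consults `conversations`.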

Test plan

  • Qwen3-8B PTQ runs on single L40 GPU
  • Export PP auto-selects valid divisor (36 layers → PP=6 on 8 GPUs, PP=4 on 4 GPUs, PP=1 on 1 GPU)
  • EAGLE3 offline pipeline reads data with messages field
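
The divisor search the export step performs can be sketched as a standalone Python function (a minimal version of the logic quantize.sh embeds in a one-liner; the function name is illustrative):

```python
def largest_valid_pp(num_hidden_layers: int, total_gpus: int) -> int:
    """Largest pipeline-parallel size <= total_gpus that evenly divides the layer count."""
    pp = max(total_gpus, 1)
    # Walk down from the GPU count until the layers split evenly.
    while pp > 1 and num_hidden_layers % pp != 0:
        pp -= 1
    return pp

# Qwen3-8B has 36 layers:
print(largest_valid_pp(36, 8))  # 6
print(largest_valid_pp(36, 4))  # 4
print(largest_valid_pp(36, 1))  # 1
```

This reproduces the test-plan cases above: 36 is not divisible by 8 or 7, so the search lands on PP=6 for an 8-GPU node.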

🤖 Generated with Claude Code

Summary by CodeRabbit

  • New Features

    • Dataset input handling now supports multiple field formats for enhanced compatibility.
  • Bug Fixes

    • Optimized GPU resource allocation during model quantization with improved pipeline parallelism computation.
    • Updated quantization configuration for more efficient resource utilization.

- megatron_lm_ptq.yaml: Qwen3-8B to single GPU for L40 clusters
- quantize.sh: auto-find largest PP dividing model num_hidden_layers
  for export (Qwen3-8B has 36 layers, not divisible by 8)
- compute_hidden_states_trtllm.py: use messages with conversations
  fallback (matching the HF version)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Signed-off-by: Chenhan Yu <chenhany@nvidia.com>
copy-pr-bot Bot commented Apr 18, 2026

Auto-sync is disabled for draft pull requests in this repository. Workflows must be run manually.


coderabbitai Bot (Contributor) commented Apr 18, 2026

📝 Walkthrough

Walkthrough

This PR modifies three areas: a Python dataset processing script to check the messages field before conversations when extracting chat prompts, a quantization shell script to dynamically compute pipeline parallel size based on model config rather than using all GPUs, and a YAML configuration that reduces tensor-parallel settings from 8 to 1 across quantization and evaluation tasks.

Changes

Cohort / File(s) Summary
Data Processing
examples/speculative_decoding/collect_hidden_states/compute_hidden_states_trtllm.py
Modified dataset entry field extraction to check entry.get("messages") first before falling back to entry.get("conversations") when building chat prompts.
Quantization Pipeline
tools/launcher/common/megatron_lm/quantize/quantize.sh
Added dynamic pipeline parallel (PP) size computation during checkpoint export; reads num_hidden_layers from model config JSON and selects largest pp value where num_hidden_layers % pp == 0, falling back to TOTAL_GPUS if read fails.
Configuration
tools/launcher/examples/Qwen/Qwen3-8B/megatron_lm_ptq.yaml
Reduced tensor-parallel (TP) settings from 8 to 1 across all pipeline steps with corresponding Slurm resource adjustments (ntasks_per_node and gpus_per_node reduced from 8 to 1).
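
The configuration change amounts to a fragment of roughly this shape (key names are illustrative; the actual structure of megatron_lm_ptq.yaml may differ):

```yaml
# Hypothetical sketch of the per-task change; real key names may differ.
quantize:
  tensor_model_parallel_size: 1   # was 8
  slurm:
    ntasks_per_node: 1            # was 8
    gpus_per_node: 1              # was 8
```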

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~12 minutes

🚥 Pre-merge checks | ✅ 3 | ❌ 1

❌ Failed checks (1 warning)

  • Docstring Coverage: ⚠️ Warning. Docstring coverage is 0.00%, below the required 80.00% threshold. Resolution: write docstrings for the functions missing them.
✅ Passed checks (3 passed)
  • Description Check: ✅ Passed. Check skipped because CodeRabbit's high-level summary is enabled.
  • Title Check: ✅ Passed. The title references all three main changes: PTQ 1GPU configuration, export PP divisibility logic, and hidden states conversations key handling.
  • Security Anti-Patterns: ✅ Passed. The PR introduces no security anti-patterns from SECURITY.md; the Python changes use a safe dict.get() fallback, the shell script uses safe subprocess practices, the YAML contains only configuration adjustments, and no new dependencies or nosec comments were added.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.


github-actions Bot (Contributor) commented Apr 18, 2026

PR Preview Action v1.8.1
Preview removed because the pull request was closed.
2026-04-20 19:44 UTC

codecov Bot commented Apr 18, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 77.33%. Comparing base (e9a4989) to head (6e456b0).
⚠️ Report is 7 commits behind head on main.

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #1293      +/-   ##
==========================================
+ Coverage   76.61%   77.33%   +0.71%     
==========================================
  Files         459      461       +2     
  Lines       49153    49524     +371     
==========================================
+ Hits        37661    38300     +639     
+ Misses      11492    11224     -268     
Flag Coverage Δ
examples 41.88% <ø> (-0.15%) ⬇️
regression 14.98% <ø> (+0.07%) ⬆️
unit 52.95% <ø> (-0.02%) ⬇️

Flags with carried forward coverage won't be shown.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.


@ChenhanYu ChenhanYu marked this pull request as ready for review April 20, 2026 15:53
@ChenhanYu ChenhanYu requested a review from a team as a code owner April 20, 2026 15:53
@ChenhanYu ChenhanYu requested a review from yeyu-nvidia April 20, 2026 15:53
@coderabbitai coderabbitai Bot (Contributor) left a comment

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@tools/launcher/common/megatron_lm/quantize/quantize.sh`:
- Around line 46-59: TOTAL_GPUS and EXPORT_PP can be zero/unvalidated causing an
invalid PP=0 during the export invocation; before using them (the TOTAL_GPUS
assignment and the EXPORT_PP calculation plus the final export invocation that
sets TP/PP/EP/ETP), validate and clamp both to >=1 (and optionally to NUM_GPUS
if provided) and fallback to 1 on errors; update the python one-liners or add a
small shell check after them to coerce non-numeric or zero values to 1 and log
the corrected values so the final export invocation always receives a valid PP
(and TOTAL_GPUS) >=1.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro Plus

Run ID: 6f03ef3f-5620-445e-9c40-3500d0b3c23a

📥 Commits

Reviewing files that changed from the base of the PR and between e9a4989 and 6e456b0.

📒 Files selected for processing (3)
  • examples/speculative_decoding/collect_hidden_states/compute_hidden_states_trtllm.py
  • tools/launcher/common/megatron_lm/quantize/quantize.sh
  • tools/launcher/examples/Qwen/Qwen3-8B/megatron_lm_ptq.yaml

Comment on lines 46 to +59 of tools/launcher/common/megatron_lm/quantize/quantize.sh:

 TOTAL_GPUS=$(python3 -c "import torch; print(torch.cuda.device_count())" 2>/dev/null || echo ${NUM_GPUS:-1})
-echo "=== Exporting ${MLM_MODEL_CFG} ${QUANT_CFG} (PP=${TOTAL_GPUS}) ==="
+EXPORT_PP=$(python3 -c "
+import json, os
+cfg = os.path.join('${HF_MODEL_CKPT}', 'config.json')
+n_layers = json.load(open(cfg)).get('num_hidden_layers', 1) if os.path.exists(cfg) else 1
+gpus = ${TOTAL_GPUS}
+pp = gpus
+while pp > 1 and n_layers % pp != 0:
+    pp -= 1
+print(pp)
+" 2>/dev/null || echo ${TOTAL_GPUS})
+echo "=== Exporting ${MLM_MODEL_CFG} ${QUANT_CFG} (PP=${EXPORT_PP}, ${TOTAL_GPUS} GPUs) ==="
 export MLM_EXTRA_ARGS=
-TP=1 PP=${TOTAL_GPUS} EP=1 ETP=1 MLM_MODEL_CKPT=${MLM_MODEL_SAVE} ${EXPORT_EXE} ${MLM_MODEL_CFG}
+TP=1 PP=${EXPORT_PP} EP=1 ETP=1 MLM_MODEL_CKPT=${MLM_MODEL_SAVE} ${EXPORT_EXE} ${MLM_MODEL_CFG}

⚠️ Potential issue | 🟠 Major

Validate and clamp GPU/PP values before export invocation.

At Line 46, torch.cuda.device_count() can return 0; then Line 55 can emit EXPORT_PP=0, and Line 59 runs export with invalid PP=0. Also, the values are used without numeric validation.

🔧 Suggested hardening patch
-TOTAL_GPUS=$(python3 -c "import torch; print(torch.cuda.device_count())" 2>/dev/null || echo ${NUM_GPUS:-1})
+TOTAL_GPUS=$(python3 -c "import torch; print(torch.cuda.device_count())" 2>/dev/null || echo "${NUM_GPUS:-1}")
+if ! [[ "${TOTAL_GPUS}" =~ ^[0-9]+$ ]] || [[ "${TOTAL_GPUS}" -lt 1 ]]; then
+    TOTAL_GPUS=1
+fi
 EXPORT_PP=$(python3 -c "
 import json, os
 cfg = os.path.join('${HF_MODEL_CKPT}', 'config.json')
 n_layers = json.load(open(cfg)).get('num_hidden_layers', 1) if os.path.exists(cfg) else 1
-gpus = ${TOTAL_GPUS}
+gpus = int('${TOTAL_GPUS}')
 pp = gpus
 while pp > 1 and n_layers % pp != 0:
     pp -= 1
 print(pp)
-" 2>/dev/null || echo ${TOTAL_GPUS})
+" 2>/dev/null || echo "${TOTAL_GPUS}")
 echo "=== Exporting ${MLM_MODEL_CFG} ${QUANT_CFG} (PP=${EXPORT_PP}, ${TOTAL_GPUS} GPUs) ==="
 export MLM_EXTRA_ARGS=
-TP=1 PP=${EXPORT_PP} EP=1 ETP=1 MLM_MODEL_CKPT=${MLM_MODEL_SAVE} ${EXPORT_EXE} ${MLM_MODEL_CFG}
+TP=1 PP="${EXPORT_PP}" EP=1 ETP=1 MLM_MODEL_CKPT="${MLM_MODEL_SAVE}" ${EXPORT_EXE} "${MLM_MODEL_CFG}"
🧰 Tools
🪛 Shellcheck (0.11.0)

[info] 46-46: Double quote to prevent globbing and word splitting.

(SC2086)


[info] 56-56: Double quote to prevent globbing and word splitting.

(SC2086)


[info] 59-59: Double quote to prevent globbing and word splitting.

(SC2086)


@ChenhanYu ChenhanYu merged commit 355c6b7 into main Apr 20, 2026
44 checks passed
@ChenhanYu ChenhanYu deleted the chenhany/fix-ptq-export-conversations branch April 20, 2026 19:44
@kevalmorabia97 kevalmorabia97 added the cherry-pick-0.44.0 label Apr 21, 2026
kevalmorabia97 pushed a commit that referenced this pull request Apr 21, 2026
fix: PTQ 1GPU, export PP divisibility, hidden states conversations key (#1293)

## Summary
- **megatron_lm_ptq.yaml**: Qwen3-8B PTQ to single GPU for L40 clusters
(TP=1, all tasks)
- **quantize.sh**: Auto-find largest PP dividing model's
`num_hidden_layers` for export step. Qwen3-8B has 36 layers which isn't
divisible by 8, causing `AssertionError` on 8-GPU nodes
- **compute_hidden_states_trtllm.py**: Use `messages` with
`conversations` fallback, matching the HF version. Fixes `KeyError:
'conversations'` when data uses OpenAI `messages` format

## Test plan
- [x] Qwen3-8B PTQ runs on single L40 GPU
- [x] Export PP auto-selects valid divisor (36 layers → PP=6 on 8 GPUs,
PP=4 on 4 GPUs, PP=1 on 1 GPU)
- [x] EAGLE3 offline pipeline reads data with `messages` field

🤖 Generated with [Claude Code](https://claude.com/claude-code)

<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->

## Summary by CodeRabbit

* **New Features**
* Dataset input handling now supports multiple field formats for
enhanced compatibility.

* **Bug Fixes**
* Optimized GPU resource allocation during model quantization with
improved pipeline parallelism computation.
* Updated quantization configuration for more efficient resource
utilization.

<!-- end of auto-generated comment: release notes by coderabbit.ai -->

Signed-off-by: Chenhan Yu <chenhany@nvidia.com>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@kevalmorabia97 kevalmorabia97 added the cherry-pick-done label Apr 21, 2026

Labels

  • cherry-pick-0.44.0: After code freeze, cherry-pick to release branch for next rc (bulk update). Only for bug fixes / doc
  • cherry-pick-done: Added by bot once PR is cherry-picked to the release branch
