fix: PTQ 1GPU, export PP divisibility, hidden states conversations key#1293
Conversation
- `megatron_lm_ptq.yaml`: switch Qwen3-8B PTQ to a single GPU for L40 clusters
- `quantize.sh`: auto-find the largest PP that divides the model's `num_hidden_layers` for export (Qwen3-8B has 36 layers, which is not divisible by 8)
- `compute_hidden_states_trtllm.py`: use the `messages` key with a `conversations` fallback (matching the HF version)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Signed-off-by: Chenhan Yu <chenhany@nvidia.com>
Auto-sync is disabled for draft pull requests in this repository. Workflows must be run manually.
📝 **Walkthrough**

This PR modifies three areas: the Python dataset-processing script now reads the `messages` key with a `conversations` fallback; the quantization shell script computes the largest pipeline-parallel size that divides the model's `num_hidden_layers` before export; and the Qwen3-8B PTQ config is updated to run on a single GPU.
Estimated code review effort: 🎯 2 (Simple) | ⏱️ ~12 minutes

🚥 Pre-merge checks: ✅ 3 passed | ❌ 1 failed (1 warning)
**Codecov Report** — ✅ All modified and coverable lines are covered by tests.

```
@@            Coverage Diff             @@
##             main    #1293      +/-   ##
==========================================
+ Coverage   76.61%   77.33%   +0.71%
==========================================
  Files         459      461       +2
  Lines       49153    49524     +371
==========================================
+ Hits        37661    38300     +639
+ Misses      11492    11224     -268
==========================================
```

Flags with carried forward coverage won't be shown.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@tools/launcher/common/megatron_lm/quantize/quantize.sh`:
- Around line 46-59: TOTAL_GPUS and EXPORT_PP can be zero/unvalidated causing an
invalid PP=0 during the export invocation; before using them (the TOTAL_GPUS
assignment and the EXPORT_PP calculation plus the final export invocation that
sets TP/PP/EP/ETP), validate and clamp both to >=1 (and optionally to NUM_GPUS
if provided) and fallback to 1 on errors; update the python one-liners or add a
small shell check after them to coerce non-numeric or zero values to 1 and log
the corrected values so the final export invocation always receives a valid PP
(and TOTAL_GPUS) >=1.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro Plus
Run ID: 6f03ef3f-5620-445e-9c40-3500d0b3c23a
📒 Files selected for processing (3)

- `examples/speculative_decoding/collect_hidden_states/compute_hidden_states_trtllm.py`
- `tools/launcher/common/megatron_lm/quantize/quantize.sh`
- `tools/launcher/examples/Qwen/Qwen3-8B/megatron_lm_ptq.yaml`
```diff
 TOTAL_GPUS=$(python3 -c "import torch; print(torch.cuda.device_count())" 2>/dev/null || echo ${NUM_GPUS:-1})
-echo "=== Exporting ${MLM_MODEL_CFG} ${QUANT_CFG} (PP=${TOTAL_GPUS}) ==="
+EXPORT_PP=$(python3 -c "
+import json, os
+cfg = os.path.join('${HF_MODEL_CKPT}', 'config.json')
+n_layers = json.load(open(cfg)).get('num_hidden_layers', 1) if os.path.exists(cfg) else 1
+gpus = ${TOTAL_GPUS}
+pp = gpus
+while pp > 1 and n_layers % pp != 0:
+    pp -= 1
+print(pp)
+" 2>/dev/null || echo ${TOTAL_GPUS})
+echo "=== Exporting ${MLM_MODEL_CFG} ${QUANT_CFG} (PP=${EXPORT_PP}, ${TOTAL_GPUS} GPUs) ==="
 export MLM_EXTRA_ARGS=
-TP=1 PP=${TOTAL_GPUS} EP=1 ETP=1 MLM_MODEL_CKPT=${MLM_MODEL_SAVE} ${EXPORT_EXE} ${MLM_MODEL_CFG}
+TP=1 PP=${EXPORT_PP} EP=1 ETP=1 MLM_MODEL_CKPT=${MLM_MODEL_SAVE} ${EXPORT_EXE} ${MLM_MODEL_CFG}
```
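For clarity, the divisor search embedded in the shell one-liner above can be sketched as a standalone Python helper. This is a minimal sketch, not the script itself: `largest_valid_pp` is a hypothetical name, and the clamp to ≥1 reflects the reviewer's hardening suggestion rather than the current code.

```python
import json
import os


def largest_valid_pp(ckpt_dir: str, total_gpus: int) -> int:
    """Largest pipeline-parallel size <= total_gpus that evenly divides
    the model's num_hidden_layers (defaults to 1 layer if config.json is missing)."""
    cfg = os.path.join(ckpt_dir, "config.json")
    n_layers = 1
    if os.path.exists(cfg):
        with open(cfg) as f:
            n_layers = json.load(f).get("num_hidden_layers", 1)
    # Clamp so a zero GPU count (e.g. torch.cuda.device_count() == 0) can't yield PP=0
    pp = max(total_gpus, 1)
    # Walk downward until pp divides the layer count; terminates at pp=1
    while pp > 1 and n_layers % pp != 0:
        pp -= 1
    return pp
```

With Qwen3-8B's 36 layers this reproduces the test plan's expectations: 8 GPUs → PP=6, 4 GPUs → PP=4, 1 GPU → PP=1.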
Validate and clamp GPU/PP values before export invocation.
At Line 46, torch.cuda.device_count() can return 0; then Line 55 can emit EXPORT_PP=0, and Line 59 runs export with invalid PP=0. Also, the values are used without numeric validation.
🔧 Suggested hardening patch

```diff
-TOTAL_GPUS=$(python3 -c "import torch; print(torch.cuda.device_count())" 2>/dev/null || echo ${NUM_GPUS:-1})
+TOTAL_GPUS=$(python3 -c "import torch; print(torch.cuda.device_count())" 2>/dev/null || echo "${NUM_GPUS:-1}")
+if ! [[ "${TOTAL_GPUS}" =~ ^[0-9]+$ ]] || [[ "${TOTAL_GPUS}" -lt 1 ]]; then
+  TOTAL_GPUS=1
+fi
 EXPORT_PP=$(python3 -c "
 import json, os
 cfg = os.path.join('${HF_MODEL_CKPT}', 'config.json')
 n_layers = json.load(open(cfg)).get('num_hidden_layers', 1) if os.path.exists(cfg) else 1
-gpus = ${TOTAL_GPUS}
+gpus = int('${TOTAL_GPUS}')
 pp = gpus
 while pp > 1 and n_layers % pp != 0:
     pp -= 1
 print(pp)
-" 2>/dev/null || echo ${TOTAL_GPUS})
+" 2>/dev/null || echo "${TOTAL_GPUS}")
 echo "=== Exporting ${MLM_MODEL_CFG} ${QUANT_CFG} (PP=${EXPORT_PP}, ${TOTAL_GPUS} GPUs) ==="
 export MLM_EXTRA_ARGS=
-TP=1 PP=${EXPORT_PP} EP=1 ETP=1 MLM_MODEL_CKPT=${MLM_MODEL_SAVE} ${EXPORT_EXE} ${MLM_MODEL_CFG}
+TP=1 PP="${EXPORT_PP}" EP=1 ETP=1 MLM_MODEL_CKPT="${MLM_MODEL_SAVE}" ${EXPORT_EXE} "${MLM_MODEL_CFG}"
```

🧰 Tools
🪛 Shellcheck (0.11.0)

- [info] 46-46: Double quote to prevent globbing and word splitting. (SC2086)
- [info] 56-56: Double quote to prevent globbing and word splitting. (SC2086)
- [info] 59-59: Double quote to prevent globbing and word splitting. (SC2086)
#1293

## Summary

- **megatron_lm_ptq.yaml**: Qwen3-8B PTQ to a single GPU for L40 clusters (TP=1, all tasks)
- **quantize.sh**: Auto-find the largest PP dividing the model's `num_hidden_layers` for the export step. Qwen3-8B has 36 layers, which isn't divisible by 8, causing an `AssertionError` on 8-GPU nodes
- **compute_hidden_states_trtllm.py**: Use `messages` with a `conversations` fallback, matching the HF version. Fixes `KeyError: 'conversations'` when data uses the OpenAI `messages` format

## Test plan

- [x] Qwen3-8B PTQ runs on a single L40 GPU
- [x] Export PP auto-selects a valid divisor (36 layers → PP=6 on 8 GPUs, PP=4 on 4 GPUs, PP=1 on 1 GPU)
- [x] EAGLE3 offline pipeline reads data with the `messages` field

🤖 Generated with [Claude Code](https://claude.com/claude-code)

## Summary by CodeRabbit

* **New Features**
  * Dataset input handling now supports multiple field formats for enhanced compatibility.
* **Bug Fixes**
  * Optimized GPU resource allocation during model quantization with improved pipeline parallelism computation.
  * Updated quantization configuration for more efficient resource utilization.

Signed-off-by: Chenhan Yu <chenhany@nvidia.com>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
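The `messages`-with-`conversations` fallback described for `compute_hidden_states_trtllm.py` can be sketched roughly as follows. This is an illustrative helper, not the script's actual code: `get_conversation` is a hypothetical name, and the real implementation may differ in structure.

```python
def get_conversation(sample: dict) -> list:
    """Prefer the OpenAI-style 'messages' key; fall back to the older
    'conversations' key, matching the behavior of the HF version."""
    if "messages" in sample:
        return sample["messages"]
    if "conversations" in sample:
        return sample["conversations"]
    raise KeyError("sample has neither 'messages' nor 'conversations'")
```

Checking `messages` first means datasets in OpenAI chat format no longer trigger the `KeyError: 'conversations'` this PR fixes, while ShareGPT-style data keeps working.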