
GLM-4.7 MTP support#792

Merged
cjluo-nv merged 9 commits into main from zhiyu/glm-4.7-mtp-support
Feb 4, 2026

Conversation

@Edwardf0t1 (Contributor) commented Jan 16, 2026

What does this PR do?

Type of change: ?

Overview: Enable the GLM-4.7 PTQ workflow, including loading the standalone MTP modules and exporting them as-is.

Usage

python3 hf_ptq.py --pyt_ckpt_path /home/omniml_data_3/models/GLM-4.7 --qformat nvfp4_mlp_only --export_path /home/omniml_data_3/zhiyuc/checkpoints/GLM-4.7-NVFP4-0203 --trust_remote_code

Testing

Before your PR is "Ready for review"

  • Make sure you read and follow Contributor guidelines and your commits are signed.
  • Is this change backward compatible?: Yes
  • Did you write any new necessary tests?: Yes/No
  • Did you add or update any necessary documentation?: Yes/No
  • Did you update Changelog?: Yes

Additional Information

Summary by CodeRabbit

  • New Features

    • Added quantization support for GLM-4.7 model with automatic handling of specialized layer architecture.
    • Added image-text data calibration capabilities for Nemotron VL model quantization.
  • Documentation

    • Updated support matrix to reflect newly supported models and quantization features.

copy-pr-bot bot commented Jan 16, 2026

Auto-sync is disabled for draft pull requests in this repository. Workflows must be run manually.

Contributors can view more details about this message here.

codecov bot commented Jan 16, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 73.73%. Comparing base (e024097) to head (3bda6d8).
⚠️ Report is 7 commits behind head on main.

Additional details and impacted files
@@            Coverage Diff             @@
##             main     #792      +/-   ##
==========================================
- Coverage   73.73%   73.73%   -0.01%     
==========================================
  Files         196      196              
  Lines       20412    20450      +38     
==========================================
+ Hits        15050    15078      +28     
- Misses       5362     5372      +10     

☔ View full report in Codecov by Sentry.

Signed-off-by: Zhiyu Cheng <zhiyuc@nvidia.com>
@Edwardf0t1 Edwardf0t1 force-pushed the zhiyu/glm-4.7-mtp-support branch from ac6b609 to 39c6195 Compare February 4, 2026 00:14
coderabbitai bot commented Feb 4, 2026

Important

Review skipped

Auto incremental reviews are disabled on this repository.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.

📝 Walkthrough


Adds PTQ support for GLM-4.7 models by implementing utilities to load MTP layer weights from separate files and automatically exclude these layers from quantization during the PTQ process. Includes documentation updates reflecting the new model support.
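The MTP modules ship as extra decoder layers past the model's `num_hidden_layers`, so their prefixes can be discovered by scanning the checkpoint's safetensors index for out-of-range layer indices. A minimal sketch of that discovery step (the helper name and the `model.layers.N` naming scheme are assumptions, not the PR's actual code):

```python
import json
from pathlib import Path


def find_mtp_layer_prefixes(ckpt_dir: str, num_hidden_layers: int) -> list[str]:
    """Scan a safetensors index for decoder layers beyond the standard depth.

    Hypothetical helper: GLM-style checkpoints place MTP (multi-token
    prediction) modules as extra layers past num_hidden_layers, so any
    weight name with a layer index >= num_hidden_layers is an MTP layer.
    """
    index_path = Path(ckpt_dir) / "model.safetensors.index.json"
    with open(index_path) as f:
        weight_map = json.load(f)["weight_map"]

    prefixes = set()
    for name in weight_map:
        # e.g. "model.layers.92.eh_proj.weight" -> layer index 92
        parts = name.split(".")
        if len(parts) > 2 and parts[1] == "layers" and parts[2].isdigit():
            if int(parts[2]) >= num_hidden_layers:
                prefixes.add(f"model.layers.{parts[2]}")
    return sorted(prefixes)
```

The returned prefixes can then be attached to the model instance (as the PR's `get_model()` integration does) and consumed later when building the quantization config.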

Changes

  • Documentation (CHANGELOG.rst, examples/llm_ptq/README.md): Added changelog entries and support matrix documentation for GLM-4.7 PTQ support, including footnotes describing MTP layer behavior and exclusion from quantization.
  • MTP weight loading & integration (examples/llm_ptq/example_utils.py): Introduces a load_mtp_weights_if_needed() utility to inspect safetensors indices, load non-standard weight shards, and identify MTP layer prefixes. Integrated into model initialization in get_model() to attach the discovered prefixes to the model instance.
  • Quantization config exclusion (examples/llm_ptq/hf_ptq.py, modelopt/torch/export/unified_export_hf.py): Reads the MTP layer prefixes and constructs exclusion patterns in the quantization configuration so these layers are skipped during PTQ.
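Once the prefixes are known, folding them into the quantization config amounts to adding disabled wildcard patterns. A sketch assuming the ModelOpt convention that entries mapping a pattern to `{"enable": False}` under `quant_cfg` skip matching modules (the helper name and exact pattern shape are assumptions, not taken from the PR diff):

```python
from copy import deepcopy


def exclude_mtp_layers(quant_cfg: dict, mtp_prefixes: list[str]) -> dict:
    """Return a config copy with quantization disabled for MTP layers.

    Hypothetical helper: assumes the ModelOpt-style config layout where
    quant_cfg["quant_cfg"] maps wildcard patterns to quantizer settings
    and {"enable": False} disables quantization for matching modules.
    """
    cfg = deepcopy(quant_cfg)
    for prefix in mtp_prefixes:
        # Disable both weight and input quantizers under this prefix.
        cfg["quant_cfg"][f"*{prefix}*"] = {"enable": False}
    return cfg
```

Because the MTP layers are excluded rather than quantized, export can pass their original weights through unchanged, which matches the PR's "export as-is" behavior.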

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

🚥 Pre-merge checks | ✅ 2 passed | ❌ 1 failed
❌ Failed checks (1 warning)
  • Docstring Coverage ⚠️ Warning — Docstring coverage is 42.86%, below the required threshold of 80.00%. Resolution: write docstrings for the functions that are missing them.
✅ Passed checks (2)
  • Description Check — Passed (check skipped; CodeRabbit's high-level summary is enabled).
  • Title Check — Passed: the title 'GLM-4.7 MTP support' directly describes the main change, adding support for GLM-4.7's MTP (multi-token prediction) modules in the PTQ workflow.


Signed-off-by: Zhiyu Cheng <zhiyuc@nvidia.com>
@Edwardf0t1 Edwardf0t1 marked this pull request as ready for review February 4, 2026 01:45
@Edwardf0t1 Edwardf0t1 requested review from a team as code owners February 4, 2026 01:45
@Edwardf0t1 Edwardf0t1 self-assigned this Feb 4, 2026
Signed-off-by: Zhiyu Cheng <zhiyuc@nvidia.com>
@cjluo-nv cjluo-nv merged commit 452c5a0 into main Feb 4, 2026
37 checks passed
@cjluo-nv cjluo-nv deleted the zhiyu/glm-4.7-mtp-support branch February 4, 2026 23:43
