fix: preserve q/k/v quantizer mapping in AST attention patching#1307

Open
Brumbelow wants to merge 1 commit into NVIDIA:main from Brumbelow:fix/issue-1064-kv-attention-ast-ordering

Conversation


@Brumbelow Brumbelow commented Apr 21, 2026

Summary

Preserve q/k/v quantizer wiring when register_attention_for_kv_quant() patches AST-generated attention wrappers.

Motivation

The old AST patching logic relied on the breadth-first order of ast.walk(), which can visit nested and sequential attention matmuls in an order different from runtime evaluation. That could attach q/k/v quantizers to the wrong operands.
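The ordering difference can be demonstrated with the standard ast module alone. The snippet below is an illustrative sketch, not the plugin's actual code:

```python
import ast

# A fused attention expression: the inner matmul (q @ k^T) executes first,
# but breadth-first traversal visits the outer matmul first.
src = "out = torch.matmul(torch.matmul(q, k.transpose(-2, -1)), v)"
tree = ast.parse(src)

def is_matmul(node):
    return (isinstance(node, ast.Call)
            and isinstance(node.func, ast.Attribute)
            and node.func.attr == "matmul")

# ast.walk is breadth-first: parents are yielded before children.
walk_order = [n for n in ast.walk(tree) if is_matmul(n)]

# Depth-first post-order yields children before parents,
# matching the order the matmuls actually execute.
def post_order(node):
    for child in ast.iter_child_nodes(node):
        yield from post_order(child)
    yield node

eval_order = [n for n in post_order(tree) if is_matmul(n)]

# walk_order[0] is the outer (attn @ v) matmul, while eval_order[0] is the
# inner (q @ k^T) score matmul that actually runs first.
```

A patcher that assumes "first node seen = first matmul executed" therefore gets the mapping backwards under ast.walk for nested expressions like this one.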

Changes

  • switch attention matmul collection to deterministic post-order traversal
  • patch the first matmul as q/k score computation and the second as attention/value aggregation
  • keep the transpose wrapper only on the key operand for per-token KV-cache quantization
  • add sequential unit coverage for torch.matmul, torch.bmm, and @
  • assert that q, k, and v quantizers see the expected tensors while preserving forward outputs
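The patching idea behind the first two bullets can be sketched with a stdlib ast.NodeTransformer: calling generic_visit before patching gives post-order, so the score matmul is instrumented first and the attention/value matmul second. The quantizer attribute names (q_bmm_quantizer, k_bmm_quantizer, v_bmm_quantizer) and the rewrite itself are illustrative assumptions, not the plugin's actual implementation:

```python
import ast

def _wrap(expr, quantizer_attr):
    # Build `self.<quantizer_attr>(<expr>)` around an operand expression.
    return ast.Call(
        func=ast.Attribute(value=ast.Name(id="self", ctx=ast.Load()),
                           attr=quantizer_attr, ctx=ast.Load()),
        args=[expr], keywords=[])

class KVQuantPatcher(ast.NodeTransformer):
    """Illustrative sketch: instrument the two attention matmuls in
    execution (post-order) order."""
    def __init__(self):
        self.seen = 0  # matmul nodes patched so far

    def visit_Call(self, node):
        self.generic_visit(node)  # patch children first -> post-order
        if isinstance(node.func, ast.Attribute) and node.func.attr == "matmul":
            if self.seen == 0:    # first matmul: q @ k^T score computation
                node.args[0] = _wrap(node.args[0], "q_bmm_quantizer")
                node.args[1] = _wrap(node.args[1], "k_bmm_quantizer")
            elif self.seen == 1:  # second matmul: attn @ v aggregation
                node.args[1] = _wrap(node.args[1], "v_bmm_quantizer")
            self.seen += 1
        return node

tree = ast.parse("out = torch.matmul(torch.matmul(q, k_t), v)")
patched = ast.unparse(ast.fix_missing_locations(KVQuantPatcher().visit(tree)))
```

Because the counter advances in post-order, the q/k quantizers land on the inner score matmul and the v quantizer on the outer aggregation matmul, regardless of nesting depth.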

Testing

Run with:

  • python -m pytest tests/unit/torch/quantization/plugins/test_attention_quant.py
  • python -m pytest tests/unit/torch/quantization/test_quantize_replace.py
  • pre-commit run --all-files

Checklist

  • Backward compatible
  • Followed contribution guidelines; no copied code
  • Added tests
  • No docs changes (no API changes)

Additional information:
Closes #1064.

Summary by CodeRabbit

  • New Features

    • Enhanced attention quantization with improved operand instrumentation and more accurate quantizer application order.
    • Better determinism when identifying quantization targets within attention mechanisms.
  • Tests

    • Added comprehensive test coverage for attention quantization verification.
    • New parametrized tests validate quantizer behavior and ensure numerical correctness of quantized attention outputs across different attention implementations.

Signed-off-by: Andrew Brumbelow <andrewbrumbelow@gmail.com>
@Brumbelow Brumbelow requested a review from a team as a code owner April 21, 2026 02:43
@Brumbelow Brumbelow requested a review from sychen52 April 21, 2026 02:43

copy-pr-bot Bot commented Apr 21, 2026

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.


coderabbitai Bot commented Apr 21, 2026

📝 Walkthrough

Walkthrough

The changes fix quantizer wiring in attention mechanisms by replacing breadth-first AST traversal with depth-first post-order traversal for node collection, updating operand indexing logic via a new helper, and reordering which quantizers are applied to BMM and binary matmul operations. Tests validate correct quantizer invocation.

Changes

  • Attention Plugin Core Logic — modelopt/torch/quantization/plugins/attention.py
    Introduced collect_attention_nodes() for depth-first post-order AST traversal; added a get_operand_indices() helper to determine which operands to instrument; generalized transpose behavior via a transpose_quantizers collection; reordered quantizer application targets for the len(bmm_nodes) == 2 and len(bin_matmul_nodes) == 2 cases to patch the correct operand indices.
  • Attention Quantization Tests — tests/unit/torch/quantization/plugins/test_attention_quant.py
    Added three sequential attention modules (SequentialMatmulAttention, SequentialBMMAttention, SequentialBinMatmulAttention) that compute attention via explicit matmul/bmm/@ operations; introduced RecordingIdentityQuantizer to record cloned inputs; added the parametrized test test_kv_quant_sequential_attention_wiring, which validates that quantizers are invoked exactly once with the expected q, k, v operands.

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

🚥 Pre-merge checks | ✅ 5 | ❌ 1

❌ Failed checks (1 warning)

  • Docstring Coverage — ⚠️ Warning: docstring coverage is 18.75%, below the required 80.00% threshold. Resolution: write docstrings for the functions missing them.
✅ Passed checks (5 passed)
  • Title check — ✅ Passed: the title "fix: preserve q/k/v quantizer mapping in AST attention patching" clearly and specifically summarizes the main change.
  • Linked Issues check — ✅ Passed: the PR addresses issue #1064 by implementing deterministic AST traversal, preserving correct q/k/v quantizer wiring, and adding test coverage for sequential attention modules.
  • Out of Scope Changes check — ✅ Passed: all changes (AST patching logic, operand indexing, and corresponding test cases) are directly related to fixing the q/k/v quantizer mapping issue.
  • Security Anti-Patterns — ✅ Passed: review of attention.py and test_attention_quant.py found no anti-patterns (no eval/exec, torch.load with weights_only=False, numpy.load with allow_pickle=True, trust_remote_code=True, or # nosec suppressions).
  • Description Check — ✅ Passed: check skipped because CodeRabbit's high-level summary is enabled.



Development

Successfully merging this pull request may close these issues.

Bug for register_attention_for_kv_quant