fix: preserve q/k/v quantizer mapping in AST attention patching #1307
Open
Brumbelow wants to merge 1 commit into NVIDIA:main from
Conversation
Signed-off-by: Andrew Brumbelow <andrewbrumbelow@gmail.com>
Contributor
📝 Walkthrough

The changes fix quantizer wiring in attention mechanisms by replacing breadth-first AST traversal with depth-first post-order traversal for node collection, updating operand indexing logic via a new helper, and reordering which quantizers are applied to BMM and binary matmul operations. Tests validate correct quantizer invocation.
Summary
Preserve q/k/v quantizer wiring when register_attention_for_kv_quant() patches AST-generated attention wrappers.

Motivation
The old AST patching logic relied on breadth-first ast.walk() order, which can visit nested and sequential attention matmuls in a different order than runtime evaluation. That could attach q/k/v quantizers to the wrong operands.

Changes
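The ordering difference can be seen with a small standalone sketch using the standard ast module (this is illustrative, not the ModelOpt implementation): breadth-first ast.walk() yields the outer attention matmul before the nested q/k one, while a depth-first post-order traversal matches runtime evaluation order.

```python
import ast

# Typical scaled-dot-product attention shape: the inner q/k matmul is
# evaluated first at runtime, then its result is matmul'd with v.
src = "out = torch.matmul(torch.matmul(q, k), v)"
tree = ast.parse(src)

def is_matmul(node):
    return (
        isinstance(node, ast.Call)
        and isinstance(node.func, ast.Attribute)
        and node.func.attr == "matmul"
    )

# Breadth-first: ast.walk() reaches the OUTER (attn @ v) matmul first.
bfs = [ast.unparse(n) for n in ast.walk(tree) if is_matmul(n)]

# Depth-first post-order: children before parents, matching runtime
# evaluation order (q @ k first, then attn @ v).
def post_order(node):
    for child in ast.iter_child_nodes(node):
        yield from post_order(child)
    yield node

dfs = [ast.unparse(n) for n in post_order(tree) if is_matmul(n)]

print(bfs[0])  # torch.matmul(torch.matmul(q, k), v)
print(dfs[0])  # torch.matmul(q, k)
```

Wiring quantizers by BFS visit order would therefore hand the first (q/k) quantizer pair to the outer matmul, which is exactly the mismatch this PR fixes.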
- Replace breadth-first ast.walk() node collection with depth-first post-order traversal so matmuls are visited in evaluation order.
- Update operand indexing logic via a new helper.
- Reorder which quantizers are applied to torch.matmul, torch.bmm, and @ operations.

Testing
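The intended wiring can be sketched as follows. Once the matmuls are collected in evaluation order, the first matmul's operands take the q and k quantizers and the second matmul's value operand takes the v quantizer. The names here (QuantRecorder, attention_core) are hypothetical stand-ins, not ModelOpt APIs; the recorder just logs call order instead of quantizing.

```python
import torch

class QuantRecorder:
    """Identity 'quantizer' that records the order it was invoked in."""
    def __init__(self, name, log):
        self.name, self.log = name, log
    def __call__(self, x):
        self.log.append(self.name)
        return x

def attention_core(q, k, v, quantizers):
    # First matmul consumes the q/k quantizers, second consumes v.
    q_quant, k_quant, v_quant = quantizers
    scores = torch.matmul(q_quant(q), k_quant(k).transpose(-1, -2))
    attn = torch.softmax(scores, dim=-1)
    return torch.matmul(attn, v_quant(v))

log = []
quantizers = [QuantRecorder(n, log) for n in ("q", "k", "v")]
q = k = v = torch.randn(2, 4, 8)
out = attention_core(q, k, v, quantizers)
print(log)  # ['q', 'k', 'v'] — quantizers hit in evaluation order
```

The tests in this PR validate the analogous property on the real patched wrappers: each quantizer is invoked on the operand it was registered for.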
Run with:

python -m pytest tests/unit/torch/quantization/plugins/test_attention_quant.py
python -m pytest tests/unit/torch/quantization/test_quantize_replace.py
pre-commit run --all-files

Checklist
Additional information:
Closes #1064.