Conversation
Birch-san pushed a commit to Birch-san/coremltools that referenced this pull request on Nov 27, 2022:
+ cosine schedule and unet config
john-rocky added a commit to john-rocky/coremltools that referenced this pull request on May 3, 2026:
…ANE (macOS 26)
When OpPalettizerConfig is configured with enable_per_channel_scale=True,
palettize_weights wraps the constexpr_lut_to_dense output in a
constexpr_blockwise_shift_scale op (data=<dense fp16 weight>, scale=<per-channel
fp16>). On macOS 26, the MPSGraph backend lowering for that constexpr op fails
verification when targeting the Apple Neural Engine:
    'mps.dequantize' op operand #2 must be tensor of quantized values,
    but got 'tensor<1xf16>'
    ... failed assertion `original module failed verification'
The MPSGraph lowering of constexpr_blockwise_shift_scale assumes the data
operand is a quantized integer tensor (it lowers to mps.dequantize); with
enable_per_channel_scale=True, the data is the dense fp16 weight, which fails
that assumption. CPU and GPU compute units accept the wrapper and predict
correctly; only the ANE-targeted MIL -> MPSGraph dispatch is broken.
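For reference, a minimal repro sketch using the public coremltools palettization API; the model handle, save path, and compute-unit choice below are illustrative, not taken from the commit:

```python
import coremltools as ct
from coremltools.optimize.coreml import (
    OpPalettizerConfig,
    OptimizationConfig,
    palettize_weights,
)

# Assumes `mlmodel` is an fp16 mlprogram MLModel converted earlier.
op_config = OpPalettizerConfig(
    mode="kmeans", nbits=4, enable_per_channel_scale=True
)
compressed = palettize_weights(
    mlmodel, OptimizationConfig(global_config=op_config)
)
compressed.save("palettized.mlpackage")

# Re-loading with the ANE in the compute-unit set is what trips the
# MPSGraph verification failure on macOS 26; CPU_ONLY and CPU_AND_GPU
# load and predict fine.
model = ct.models.MLModel(
    "palettized.mlpackage", compute_units=ct.ComputeUnit.ALL
)
```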
Fix: bake per_channel_scale into the LUT entries at compile time and re-emit
constexpr_lut_to_dense, instead of leaving the scale as a runtime constexpr.
Both data and scale are fp16 and the wrapper's only effect is data * scale, so
the fold is mathematically identical. The failing MPSGraph dispatch is
eliminated entirely, and CPU / GPU numerics stay bit-identical with the prior
behavior. The resulting graph also has one fewer runtime constexpr per
palettized const.
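The shape of the fold, as a hedged numpy sketch (shapes and names are illustrative; the actual pass operates on the MIL constexpr ops themselves):

```python
import numpy as np

# Illustrative shapes: C output channels, K = 2**nbits palette entries,
# scalar (vector_size=1) LUT entries shared across the whole weight.
C, K = 64, 16
lut = np.random.rand(1, 1, K, 1).astype(np.float16)          # shared LUT
per_channel_scale = np.random.rand(C, 1).astype(np.float16)  # fp16 scales

# Folding a per-channel scale into LUT entries needs one LUT block per
# output channel: broadcast the shared LUT along the channel axis, then
# multiply each channel's palette by that channel's scale.
lut_per_channel = np.broadcast_to(lut, (C, 1, K, 1)).astype(np.float16)
folded = lut_per_channel * per_channel_scale.reshape(C, 1, 1, 1)

# Dequantization is lookup-then-scale, so looking indices up in `folded`
# equals looking them up in `lut` and multiplying by the scale; the
# constexpr_blockwise_shift_scale wrapper becomes redundant.
```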
Test updated: TestPalettizeWeights::test_palettization_pcs previously asserted
that the constexpr_blockwise_shift_scale wrapper was emitted; it now asserts
the wrapper is absent (the LUT is pre-scaled). Numerical equivalence vs the
unpalettized model is verified by the existing verify_model_outputs call on
macOS 15+.
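Roughly, the updated assertion looks like this (a sketch; it assumes the coremltools MIL testing helper get_op_types_in_program and an in-memory _mil_program handle, which may differ from the actual test wiring):

```python
from coremltools.converters.mil.testing_utils import get_op_types_in_program

ops = get_op_types_in_program(compressed._mil_program)
assert "constexpr_lut_to_dense" in ops
# Scale is pre-folded into the LUT, so the wrapper must be gone:
assert "constexpr_blockwise_shift_scale" not in ops
```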
Tested:
- test_palettization_pcs: PASS
- All 155 TestPalettizeWeights / TestJointCompressWeights: PASS
- Manual: Qwen3-VL 2B stateful chunk on macOS 26 + M4 ANE:
MPSGraph verification crash gone (was reproducible at every load).
john-rocky added a commit to john-rocky/coremltools that referenced this pull request on May 6, 2026:
…ANE (macOS 26)