
Bump version number and fix doc link#2

Merged
znation merged 1 commit into master from bump_version
Aug 5, 2017
Conversation

@znation
Contributor

@znation znation commented Aug 5, 2017

No description provided.

@znation znation merged commit 2064aad into master Aug 5, 2017
@znation znation deleted the bump_version branch August 5, 2017 00:49
Birch-san pushed a commit to Birch-san/coremltools that referenced this pull request Nov 27, 2022
+ cosine schedule and unet config
john-rocky added a commit to john-rocky/coremltools that referenced this pull request May 3, 2026
…ANE (macOS 26)

When OpPalettizerConfig is configured with enable_per_channel_scale=True,
palettize_weights wraps the constexpr_lut_to_dense output in a
constexpr_blockwise_shift_scale op (data=<dense fp16 weight>, scale=<per-channel
fp16>). On macOS 26, the MPSGraph backend lowering for that constexpr op fails
verification when targeting the Apple Neural Engine:

    'mps.dequantize' op operand apple#2 must be tensor of quantized values,
    but got 'tensor<1xf16>'
    ... failed assertion `original module failed verification'

The MPSGraph lowering of constexpr_blockwise_shift_scale assumes the data
operand is a quantized integer tensor (it lowers to mps.dequantize); with
enable_per_channel_scale=True, the data is the dense fp16 weight, which fails
that assumption. CPU and GPU compute units accept the wrapper and predict
correctly; only the ANE-targeted MIL -> MPSGraph dispatch is broken.

Fix: bake per_channel_scale into the LUT entries at compile time and re-emit
constexpr_lut_to_dense, instead of leaving the scale as a runtime constexpr.
Both data and scale are fp16 and the wrapper's only effect is data * scale, so
the fold is mathematically identical. The failing MPSGraph dispatch is
eliminated entirely, and CPU / GPU numerics stay bit-identical with the prior
behavior. The resulting graph also has one fewer runtime constexpr op per
palettized constant.
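The fold can be sketched in NumPy. This is illustrative only, not the coremltools pass itself: the shapes, the 2-bit palette, and the `lut_to_dense` helper are simplifying assumptions.

```python
import numpy as np

# Illustrative sketch of the fold described above, not the actual coremltools
# pass. Shapes are simplified: one 4-entry (2-bit) palette per output channel.
rng = np.random.default_rng(0)
n_channels, n_cols, n_palette = 8, 16, 4

lut = rng.standard_normal((n_channels, n_palette)).astype(np.float16)   # palette entries
indices = rng.integers(0, n_palette, size=(n_channels, n_cols))         # palettized codes
scale = rng.uniform(0.5, 2.0, size=(n_channels, 1)).astype(np.float16)  # per-channel scale

def lut_to_dense(lut, indices):
    # Stand-in for constexpr_lut_to_dense: gather palette entries by index.
    return np.take_along_axis(lut, indices, axis=1)

# Before the fix: decode to a dense fp16 weight, then apply the per-channel
# scale at runtime (the multiply the blockwise shift/scale wrapper performs
# when no shift is present).
before = lut_to_dense(lut, indices) * scale

# After the fix: pre-scale the LUT entries at compile time, decode directly.
after = lut_to_dense(lut * scale, indices)

# The gather does no arithmetic, so both paths compute lut[c, k] * scale[c]
# in fp16 exactly once; the results are bit-identical.
print(np.array_equal(before, after))
```

Because the multiply happens exactly once on either path, pre-scaling the (small) palette table rather than the decoded weight changes nothing numerically while removing the runtime wrapper op.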

Test updated: TestPalettizeWeights::test_palettization_pcs previously asserted
that the constexpr_blockwise_shift_scale wrapper was emitted; it now asserts
the wrapper is absent (the LUT is pre-scaled). Numerical equivalence vs. the
unpalettized model is verified by the existing verify_model_outputs call on
macOS 15+.

Tested:
  - test_palettization_pcs:                                    PASS
  - All 155 TestPalettizeWeights / TestJointCompressWeights:   PASS
  - Manual: Qwen3-VL 2B stateful chunk on macOS 26 + M4 ANE:
    MPSGraph verification crash gone (was reproducible at every load).
john-rocky added a commit to john-rocky/coremltools that referenced this pull request May 6, 2026
…ANE (macOS 26)

When OpPalettizerConfig is configured with enable_per_channel_scale=True,
palettize_weights wraps the constexpr_lut_to_dense output in a
constexpr_blockwise_shift_scale op (data=<dense fp16 weight>, scale=<per-channel
fp16>). On macOS 26, the MPSGraph backend lowering for that constexpr op fails
verification when targeting the Apple Neural Engine:

    'mps.dequantize' op operand apple#2 must be tensor of quantized values,
    but got 'tensor<1xf16>'
    ... failed assertion `original module failed verification'

The MPSGraph lowering of constexpr_blockwise_shift_scale assumes the data
operand is a quantized integer tensor (it lowers to mps.dequantize); with
enable_per_channel_scale=True, the data is the dense fp16 weight, which fails
that assumption. CPU and GPU compute units accept the wrapper and predict
correctly; only the ANE-targeted MIL -> MPSGraph dispatch is broken.

Fix: bake per_channel_scale into the LUT entries at compile time and re-emit
constexpr_lut_to_dense, instead of leaving the scale as a runtime constexpr.
Both data and scale are fp16 and the wrapper's only effect is data * scale, so
the fold is mathematically identical. The failing MPSGraph dispatch is
eliminated entirely, and CPU / GPU numerics stay bit-identical with the prior
behavior. Resulting graph also has one fewer runtime constexpr per palettized
const.

Test updated: TestPalettizeWeights::test_palettization_pcs previously asserted
that the constexpr_blockwise_shift_scale wrapper was emitted; it now asserts
the wrapper is absent (the LUT is pre-scaled). Numerical equivalence vs the
unpalettized model is verified by the existing verify_model_outputs call on
macOS 15+.

Tested:
  - test_palettization_pcs:                                    PASS
  - All 155 TestPalettizeWeights / TestJointCompressWeights:   PASS
  - Manual: Qwen3-VL 2B stateful chunk on macOS 26 + M4 ANE:
    MPSGraph verification crash gone (was reproducible at every load).
