Promote scalar float32 to float64 in AOT signature #153

Merged
voltjia merged 2 commits into master from promote-fp32-to-fp64-in-aot-signature
May 9, 2026

@voltjia voltjia commented May 9, 2026

Summary

_build_variant now promotes a scalar fp32 argument to fp64 in the
Triton AOT signature. Without this, a scalar Tensor(0, dtype=ninetoothed.float32)
can silently arrive as 0.0f inside the kernel on Triton 3.4.0+:
triton.tools.compile declares the wrapper parameter as a C double,
but the cubin still reads 4 bytes as a float, so the kernel receives
only the low half of the 8-byte double (all zero bits for small values
like 0.125).
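
The byte-level effect can be reproduced on the host without Triton. The sketch below assumes a little-endian layout and that the kernel reads the first 4 bytes at the argument's address; `low_half_as_float32` is a hypothetical helper written for illustration, not part of ninetoothed or Triton.

```python
import struct


def low_half_as_float32(value: float) -> float:
    """Reinterpret the low 4 bytes of a little-endian double as a float32."""
    packed = struct.pack("<d", value)  # 8 bytes, as the C wrapper passes them
    (misread,) = struct.unpack("<f", packed[:4])  # 4 bytes, as the cubin reads them
    return misread


# 0.125 is 0x3FC0000000000000 as a double, so its low 4 bytes are all zero.
print(low_half_as_float32(0.125))  # 0.0
```

Values whose double representation has a non-zero low half, such as 0.1, do not become zero but are still misread as an unrelated float.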

Triton ABI sweep

Compiled kernel(scale, out_ptr, n: tl.constexpr) with signature
'fp32, *fp32:16, 16' on every tested version:

| Triton | wrapper sig for scalar fp32 | matches cubin |
| --- | --- | --- |
| 3.1.0 / 3.2.0 / 3.3.0 | float scale | ✓ |
| 3.4.0 / 3.5.0 / 3.6.0 / 3.7.0 | double scale | ✗ (cubin still reads 4 bytes) |

Re-running the sweep with signature 'fp64, ...' gives double scale
consistently, so the wrapper, args[], and cubin agree on 8 bytes
across 3.1.0 – 3.7.0 (latest). The promotion is therefore harmless on
3.1 – 3.3 (no precision lost since the underlying Python value is
already 64-bit) and fixes the silent corruption on 3.4+.
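
A minimal sketch of the promotion, assuming the signature is a comma-separated string like the one used in the sweep. `promote_scalar_fp32` is a hypothetical helper for illustration and is not the actual `_build_variant` code; it only rewrites entries that are exactly `fp32`, so pointer entries such as `*fp32:16` and constexpr values are left alone.

```python
def promote_scalar_fp32(signature: str) -> str:
    """Replace scalar fp32 entries with fp64 in a comma-separated signature.

    Hypothetical helper: pointer entries (e.g. "*fp32:16") and constexpr
    values (e.g. "16") do not match "fp32" exactly and pass through unchanged.
    """
    entries = [entry.strip() for entry in signature.split(",")]
    promoted = ["fp64" if entry == "fp32" else entry for entry in entries]
    return ", ".join(promoted)


print(promote_scalar_fp32("fp32, *fp32:16, 16"))  # fp64, *fp32:16, 16
```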

Testing

pytest output:

============================= test session starts ==============================
platform linux -- Python 3.10.16, pytest-9.0.2, pluggy-1.6.0
rootdir: /home/huangjiacheng/ninetoothed
configfile: pyproject.toml
plugins: anyio-4.12.1, xdist-3.8.0, cov-7.0.0, typeguard-4.4.4
collected 214 items

tests/test_add.py .                                                      [  0%]
tests/test_addmm.py ..                                                   [  1%]
tests/test_aot.py ..........                                             [  6%]
tests/test_aot_auto_tuning.py ....                                       [  7%]
tests/test_attention.py ........                                         [ 11%]
tests/test_auto_tuner.py ....                                            [ 13%]
tests/test_clone.py ....                                                 [ 15%]
tests/test_conv2d.py ....                                                [ 17%]
tests/test_data_ptr.py .                                                 [ 17%]
tests/test_debugging.py .                                                [ 18%]
tests/test_dropout.py .                                                  [ 18%]
tests/test_eval.py ........                                              [ 22%]
tests/test_expand.py .                                                   [ 22%]
tests/test_generation.py ............................................... [ 44%]
.............................                                            [ 58%]
tests/test_getitem.py ..........                                         [ 63%]
tests/test_ipynb.py .                                                    [ 63%]
tests/test_jagged.py ................                                    [ 71%]
tests/test_matmul.py ..                                                  [ 71%]
tests/test_max_pool2d.py ..                                              [ 72%]
tests/test_naming.py .......                                             [ 76%]
tests/test_pad.py ................................................       [ 98%]
tests/test_pow.py .                                                      [ 99%]
tests/test_softmax.py .                                                  [ 99%]
tests/test_unsqueeze.py .                                                [100%]

======================= 214 passed in 3311.29s (0:55:11) =======================

@voltjia voltjia merged commit dc51d41 into master May 9, 2026
8 checks passed
@voltjia voltjia deleted the promote-fp32-to-fp64-in-aot-signature branch May 9, 2026 06:04