Skip to content

test/relaxed_dot: add i16-intermediate overflow boundary cases#164

Open
matthargett wants to merge 1 commit into
WebAssembly:mainfrom
matthargett:relaxed-dot-i16-overflow-test
Open

test/relaxed_dot: add i16-intermediate overflow boundary cases#164
matthargett wants to merge 1 commit into
WebAssembly:mainfrom
matthargett:relaxed-dot-i16-overflow-test

Conversation

@matthargett
Copy link
Copy Markdown

Summary

Adds two new assert_return cases to
test/core/relaxed-simd/relaxed_dot_product.wast that exercise
the i16-intermediate truncation point with inputs that overflow
i16.

Both new cases use a = b = -128 for all 16 lanes:

  • i16x8.relaxed_dot_i8x16_i7x16_s — each i16 lane's pair
    sum is -128*-128 + -128*-128 = 32768, which overflows
    signed-i16. Spec-allowed set: {-32768, 32767} (per-pair wrap
    or saturate).

  • i32x4.relaxed_dot_i8x16_i7x16_add_s with c = 0 — every
    pair sum overflows the same way. After extadd_pairwise_i16x8_s
    and the final +c, the spec-allowed set is
    {-65536, -1, 65534} (the wrap/sat combinations of the two
    pair sums).

A direct-byte-sum implementation (one that skips the i16
truncation point and sums all four byte products directly into
i32) produces 65536 per lane, which is outside that
spec-allowed set. The existing assertions in the file don't
catch that shape because their chosen byte values keep the pair
sum within signed-i16 range — e.g. -128 * -127 * 2 = 32512,
no overflow.

How this came up

While adding relaxed-SIMD support to WAMR (the wasm-micro-
runtime) fast-interp at
rebeckerspecialties/wasm-micro-runtime#3,
a chatgpt-codex-connector code-review bot flagged that our
i32x4.relaxed_dot_i8x16_i7x16_add_s implementation skipped the
i16 truncation step and summed all four byte products directly
into i32. We verified the report, fixed the implementation, and
added a unit test pinned at the chosen wrap value.

Then, as a sanity check, we ran the upstream relaxed-SIMD spec
testsuite against the WAMR build with the fix reverted
expecting it to catch the bug we'd just fixed. It didn't.
All 69 of the relaxed-SIMD assert_returns passed with the
buggy implementation, because every existing test input keeps
the i16 pair sum within range. The bug shape is invisible
to the current test set.

This PR closes that gap.

Verification

  • WAT well-formed: wast2json --enable-relaxed-simd accepts
    the modified file; assertion count goes from 11 → 13.
  • With WAMR fast-interp's fixed implementation in place, both
    new assertions pass (WAMR's result is in the spec-allowed
    set).
  • With the fix reverted, the new add_s assertion fails:
    relaxed_dot_product.wast:118 i32x4.relaxed_dot_i8x16_i7x16_add_s(
      a = v128 -128*16, b = v128 -128*16, c = v128 0*4):
      got      v128 i32x4 65536 65536 65536 65536
      expected one of { -65536, -1, 65534 } per lane
    
    — exactly the bug the original PR fixed, now visible at the
    spec-test layer.

Notes on the chosen either sets

  • For the _s (no-add) variant I list only the two
    homogeneous-across-lanes outcomes {[-32768]*8, [32767]*8},
    matching the existing convention in the file (impls are
    uniform across lanes; per-lane mixed wrap/sat is in principle
    spec-allowed but not enumerated). Happy to expand to the full
    combinatorial set if reviewers prefer.

  • For the _add_s variant the either includes the mixed
    -1 outcome (one pair wraps, one saturates) because that's
    the within-a-single-lane spec-allowed combination, not a
    cross-lane mixing.

  • "Force-i7-to-zero" interpretation (which would yield 0 for
    this input since b has the top bit set) is omitted, matching
    the convention of the existing test at lines 63-71 in the same
    file that also doesn't enumerate that case.

The existing assertions for both `i16x8.relaxed_dot_i8x16_i7x16_s`
and `i32x4.relaxed_dot_i8x16_i7x16_add_s` exercise byte values
where the i16 pair sum fits in i16 (e.g. `-128 * -127 * 2 =
32512`, within range). On those inputs an implementation that
skips the i16 truncation point and sums byte products directly
into the wider lane still produces a spec-allowed answer, so the
bug stays hidden.

Adding two assertions with `a = b = -128` for all 16 lanes. The
i16 pair sum is `-128*-128 + -128*-128 = 32768`, which overflows
signed-i16, so the spec-defined `wrap` and `saturate` paths
diverge:

  - `i16x8.relaxed_dot_i8x16_i7x16_s`:    {-32768, 32767}
  - `i32x4.relaxed_dot_i8x16_i7x16_add_s` (c=0): {-65536, -1, 65534}

A direct-byte-sum implementation produces 32768 / 65536
respectively, both OUTSIDE the spec-allowed set.

How this came up
----------------
While adding relaxed-SIMD support to WAMR (the wasm-micro-
runtime) fast-interp at rebeckerspecialties/wasm-micro-runtime#3,
a `chatgpt-codex-connector` code-review bot flagged that our
`i32x4.relaxed_dot_i8x16_i7x16_add_s` implementation skipped the
i16 truncation step and summed all four byte products directly
into i32. We verified the report, fixed the implementation, and
added a unit test pinned at the chosen wrap value.

Then, as a sanity check, we ran the upstream relaxed-SIMD spec
testsuite against the WAMR build with the fix REVERTED — to
confirm the spec suite would have caught the bug. It didn't:
the existing test inputs all stay within the i16 pair-sum
range, so the missing i16 truncation didn't change any result.
This commit closes that gap.

Verified locally: with the fix in place, WAMR produces results
in the spec-allowed set. With the fix reverted, the new
assertion fails as expected, surfacing the bug.

Signed-off-by: Matt Hargett <plaztiksyke@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant