cranelift(x64): lower bare ctz/clz boolean tests via test+CC #13334

Open
ggreif wants to merge 1 commit into bytecodealliance:main from ggreif:gabor/ctz-clz-brif-lowering
@ggreif ggreif commented May 11, 2026

Summary

Follow-up to #13332. That PR added egraph rules collapsing (eq (ctz X) 0) / (ne (ctz X) 0) / (eq (clz X) 0) / (ne (clz X) 0) to direct LSB / sign-bit tests — but only when the comparison is mediated by an explicit icmp. The wasm front-end translates wasm if (ctz X) to brif (ireduce.i32 (ctz.i64 X)) directly (no icmp), so the egraph rules don't fire on the wasm-natural shape.

This PR closes the gap by specialising is_nonzero in the x64 backend — the helper that all brif/select/trapif lowerings funnel through.

Rules

In cranelift/codegen/src/isa/x64/inst.isle:

;; ctz(val) != 0 iff bit 0 of val is clear: test val against 1, condition Z.
(rule 3 (is_nonzero (ctz (ty_32_or_64 ty) val))
      (CondResult.CC (x64_test ty val (RegMemImm.Imm 1)) (CC.Z)))
(rule 3 (is_nonzero (ireduce _ (ctz (ty_32_or_64 ty) val)))
      (CondResult.CC (x64_test ty val (RegMemImm.Imm 1)) (CC.Z)))
;; clz(val) != 0 iff the sign bit of val is clear: test val against itself, condition NS.
(rule 3 (is_nonzero (clz (ty_32_or_64 ty) val))
      (let ((gpr Gpr val)) (CondResult.CC (x64_test ty gpr gpr) (CC.NS))))
(rule 3 (is_nonzero (ireduce _ (clz (ty_32_or_64 ty) val)))
      (let ((gpr Gpr val)) (CondResult.CC (x64_test ty gpr gpr) (CC.NS))))

The ireduce variant catches the wasm front-end's i32.wrap_i64 over a 64-bit ctz/clz — a no-op on values in [0, bitwidth].
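The equivalences the four rules rely on can be checked in plain Rust. This is a minimal sketch, not code from the PR: the function names are hypothetical, and `trailing_zeros` / `leading_zeros` stand in for CLIF `ctz` / `clz` (both return the bit width for a zero input, matching the `[0, bitwidth]` range mentioned above).

```rust
/// Reference semantics of the branch condition `ctz(x) != 0`.
fn ctz_is_nonzero(x: u64) -> bool {
    x.trailing_zeros() != 0
}

/// What `test x, 1` followed by condition Z computes: bit 0 is clear.
fn lsb_is_clear(x: u64) -> bool {
    x & 1 == 0
}

/// Reference semantics of the branch condition `clz(x) != 0`.
fn clz_is_nonzero(x: u64) -> bool {
    x.leading_zeros() != 0
}

/// What `test x, x` followed by condition NS computes: sign bit is clear.
fn sign_bit_is_clear(x: u64) -> bool {
    (x as i64) >= 0
}

/// Models `ireduce.i32` over a 64-bit count: the count is at most 64,
/// so truncating it to 32 bits never changes its value.
fn ireduce_preserves_count(x: u64) -> bool {
    let count = x.trailing_zeros() as u64; // in 0..=64
    (count as u32) as u64 == count
}
```

For every input, `ctz_is_nonzero(x) == lsb_is_clear(x)` and `clz_is_nonzero(x) == sign_bit_is_clear(x)`, including `x == 0`, where both counts equal the bit width and both tested bits are clear.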

Test deltas (tests/disas/ctz-clz-bool-condition.wat)

| consumer | before | after |
| --- | --- | --- |
| if_ctz_bare_i32 | 5 insns (bsfl + cmovel + test + jne) | 2 (testl $1, %edx; je) |
| if_ctz_bare_i64 | 5 insns (bsfq + cmovq + test + jne) | 2 (testq $1, %rdx; je) |
| if_clz_bare_i32 | 7 insns (bsr + cmov + sub + test + jne) | 2 (testl + jns) |

The icmp-mediated cases (collapsed by #13332's egraph rules) are unchanged. The numeric-comparison negative test ((ctz X) == 4) stays untouched.
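One way to see why a numeric comparison such as `(ctz X) == 4` falls outside these rules: it depends on five low bits of `X` rather than on one. The sketch below illustrates that arithmetic only; the function names are hypothetical and nothing here is taken from the PR's test file.

```rust
/// Reference semantics of the numeric comparison `ctz(x) == 4`.
fn ctz_equals_4(x: u64) -> bool {
    x.trailing_zeros() == 4
}

/// Equivalent bit pattern: bits 0..=3 are clear and bit 4 is set,
/// i.e. the low five bits are exactly 0b10000.
fn low_five_bits_are_10000(x: u64) -> bool {
    x & 0x1F == 0x10
}
```

The two functions agree on every input, but the second needs a mask and a compare, so a single `test` plus condition code does not express it.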

Motivation

Motoko's moc codegen emits i64.ctz X; i32.wrap_i64; if for compactness/sign tests in the EOP backend (see caffeinelabs/motoko#6103). Before this PR, that lowers to 5 native instructions per dispatch; after, 2.

A concrete idiomatic example: in Motoko, the let-else pattern over Result

let #ok payload = queryProp(...) else return defaultValue;

desugars to a 2-arm refutable variant match (#ok vs #err). The variant-tag hashes are hash("ok") = 0x611C (LSB 0) and hash("err") = 0x4D0765 (LSB 1), so the LSB alone distinguishes the two tags. The planned variant-switch BitTest dispatch (caffeinelabs/motoko's gabor/variant-switch) recognizes this and emits a single LSB test for the dispatch; combined with this PR, the entire let-else lowers to load hash; testq $1, ...; jcc on x64: three instructions for a pattern match. Every Result-returning API and every let-else-style early return collapses to this shape.
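The two-arm dispatch can be sketched in Rust using the hash values quoted above. This is an illustration under assumptions, not Motoko's actual codegen: the `Arm` enum, the function names, and the 64-bit width of the loaded hash are all hypothetical.

```rust
// Variant-tag hashes as quoted in the description.
const HASH_OK: u64 = 0x611C; // LSB 0
const HASH_ERR: u64 = 0x4D0765; // LSB 1

#[derive(Debug, PartialEq)]
enum Arm {
    Ok,
    Err,
}

/// Dispatch in the wasm-natural shape `i64.ctz; i32.wrap_i64; if`:
/// a nonzero trailing-zero count means bit 0 is clear, which selects #ok.
fn dispatch_by_ctz(tag_hash: u64) -> Arm {
    let wrapped = tag_hash.trailing_zeros() as u64 as u32; // models i32.wrap_i64
    if wrapped != 0 { Arm::Ok } else { Arm::Err }
}

/// The same decision as a direct bit test, mirroring `testq $1, ...; jcc`.
fn dispatch_by_lsb(tag_hash: u64) -> Arm {
    if tag_hash & 1 == 0 { Arm::Ok } else { Arm::Err }
}
```

Both functions pick the same arm for `HASH_OK` and `HASH_ERR`; the second is the form the lowering in this PR produces from the first.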

Aggregated across hot paths (variant-switch dispatch, GC compact/heap discriminator, sign tests, …) this is meaningful.

Follow-ups (not in this PR)

  • aarch64, riscv64, s390x analogues — separate PRs once x64 reviewer feedback lands.
  • select-consumer variant: select already routes through is_nonzero_cmp → is_nonzero, so this PR's rules cover it too without extra work.

Follow-up to bytecodealliance#13332. That PR added egraph rules collapsing
`(eq (ctz X) 0)` / `(ne (ctz X) 0)` / clz analogues to direct
LSB / sign-bit tests — but only when the comparison is mediated by an
explicit `icmp`. The wasm front-end translates `wasm if (ctz X)` to
`brif (ireduce.i32 (ctz.i64 X))` directly (no `icmp`), so the egraph
rules don't fire on the wasm-natural shape.

This commit closes the gap by specialising `is_nonzero` in the x64
backend — the helper that all `brif`/`select`/`trapif` lowerings
funnel through. Four rules: `ctz`/`clz` × bare/`ireduce`-wrapped.

The `ireduce` variant catches the wasm front-end's `i32.wrap_i64`
over a 64-bit `ctz`/`clz` — a no-op on values in [0, bitwidth].

Test deltas (tests/disas/ctz-clz-bool-condition.wat):

  if_ctz_bare_i32:   5 insns -> 2 (testl $1, %edx; je)
  if_ctz_bare_i64:   5 insns -> 2 (testq $1, %rdx; je)
  if_clz_bare_i32:   7 insns -> 2 (testl %edx, %edx; jns)

The icmp-mediated cases (collapsed by bytecodealliance#13332's egraph rules) are
unchanged. The numeric-comparison negative test stays untouched.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@ggreif ggreif changed the title cranelift(x64): lower bare ctz/clz boolean tests via test+CC cranelift(x64): lower bare ctz/clz boolean tests via test+CC May 11, 2026
@ggreif ggreif marked this pull request as ready for review May 11, 2026 16:17
@ggreif ggreif requested review from a team as code owners May 11, 2026 16:17
@ggreif ggreif requested review from pchickey and uweigand and removed request for a team May 11, 2026 16:17
@github-actions github-actions Bot added the cranelift and cranelift:area:x64 labels May 11, 2026