
Recompute IR flags from efunc when the rewriter changes opcodes. #211

Merged
maleadt merged 2 commits into main from tb/rewrite_effects on Apr 30, 2026

@maleadt
Member

@maleadt maleadt commented Apr 30, 2026

The `canonicalize!` pass rewrites `Core.Intrinsics.*` to cuTile's `Intrinsics.*` (`sub_int → subi`, etc.). The rewrite driver previously cleared `IR_FLAG_EFFECT_FREE` on opcode change because the inferred flag described the OLD op. Without the flag, downstream CSE and LICM gate-fail on every rewritten call.
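To make the gating problem concrete, here is a minimal toy sketch (not cuTile's IR or its CSE pass). `ToyStmt`, `TOY_FLAG_EFFECT_FREE`, and `toy_cse_survivors` are hypothetical names invented for illustration; the only idea taken from the description above is that a CSE-style pass refuses to merge calls whose effect-free flag is not set.

```julia
# Toy model only: a statement is a callee name, an argument tuple, and a flag word.
const TOY_FLAG_EFFECT_FREE = UInt32(1) << 0   # hypothetical bit standing in for IR_FLAG_EFFECT_FREE

struct ToyStmt
    func::Symbol
    args::Tuple
    flag::UInt32
end

is_effect_free(s::ToyStmt) = (s.flag & TOY_FLAG_EFFECT_FREE) != 0

# Count the statements that survive a naive CSE which only merges effect-free calls.
function toy_cse_survivors(stmts::Vector{ToyStmt})
    seen = Set{Any}()
    survivors = 0
    for s in stmts
        key = (s.func, s.args)
        if is_effect_free(s) && key in seen
            continue                      # duplicate of an earlier pure call: dropped
        end
        is_effect_free(s) && push!(seen, key)
        survivors += 1
    end
    return survivors
end

# Three identical calls, once with the flag cleared and once with it set.
cleared = [ToyStmt(:subi, (:token_id, 1), UInt32(0)) for _ in 1:3]
flagged = [ToyStmt(:subi, (:token_id, 1), TOY_FLAG_EFFECT_FREE) for _ in 1:3]
```

In this toy, `toy_cse_survivors(cleared)` keeps all three calls, while `toy_cse_survivors(flagged)` keeps one. That is the kind of gate failure the cleared flag causes after a rewrite.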

Plumb effect propagation through the rewriter itself: when the driver changes an opcode or inserts a new sub-call, recompute the IR flags from the new func's declared effects, mirroring inference's `flags_for_effects`. The single source of truth for per-intrinsic effects is the `efunc(...)` overrides: the same hooks that adjust inference results during method-instance compilation (e.g., RNG ops setting `effect_free=ALWAYS_FALSE`). No parallel impurity list to maintain; no post-hoc walk over the SCI.
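A rough sketch of the "declared effects → flags" idea, under loud assumptions: `DemoEffects`, `demo_efunc`, `demo_flags_for_effects`, `demo_inferred_flags`, and the two flag bits are hypothetical stand-ins, not the compiler's `Effects` type or cuTile's actual `efunc` signature. The `nothrow` values are illustrative guesses. It only shows how one set of per-function overrides can drive the flag word.

```julia
# Hypothetical stand-ins for compiler flag bits.
const DEMO_EFFECT_FREE = UInt32(1) << 0
const DEMO_NOTHROW     = UInt32(1) << 1

struct DemoEffects
    effect_free::Bool
    nothrow::Bool
end

# Toy intrinsics (assumed bodies, for illustration only).
demo_subi(a, b) = a - b
demo_rand_tile() = rand()

# Default: assume nothing about an unknown callee.
demo_efunc(::Any) = DemoEffects(false, false)

# Per-intrinsic overrides: the single place where effects are declared.
demo_efunc(::typeof(demo_subi)) = DemoEffects(true, true)
demo_efunc(::typeof(demo_rand_tile)) = DemoEffects(false, true)   # RNG: never effect-free

# Translate declared effects into a statement flag word.
function demo_flags_for_effects(e::DemoEffects)
    flag = UInt32(0)
    e.effect_free && (flag |= DEMO_EFFECT_FREE)
    e.nothrow && (flag |= DEMO_NOTHROW)
    return flag
end

demo_inferred_flags(func) = demo_flags_for_effects(demo_efunc(func))
```

Because the flags are derived from the same overrides, no second list of impure functions has to be kept in sync in this sketch.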

The new `inferred_flags(func)` helper lives next to `classify_memory_op` in `analysis/effects.jl`. It's wired into the three sites that produce new instructions: the substitution and inplace branches of `apply_rewrite!`, the `resolve_rhs` insertion of synthetic sub-calls, and the imperative `commute_arith_transparent` rewriter.
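As an illustration of what "wired into a rewrite site" could look like, the following toy driver refreshes the flag whenever it changes the opcode. This is a sketch under stated assumptions: `SketchStmt`, `sketch_rewrite!`, `sketch_inferred_flags`, and the lookup table are all hypothetical and do not reproduce the real `apply_rewrite!` or `inferred_flags`.

```julia
const SKETCH_EFFECT_FREE = UInt32(1) << 0   # hypothetical flag bit

mutable struct SketchStmt
    func::Symbol
    flag::UInt32
end

# Hypothetical declared-effects table keyed by the new opcode.
const SKETCH_DECLARED_EFFECT_FREE = Dict(:subi => true, :rand_tile => false)

# Unknown opcodes conservatively get no flags.
sketch_inferred_flags(func::Symbol) =
    get(SKETCH_DECLARED_EFFECT_FREE, func, false) ? SKETCH_EFFECT_FREE : UInt32(0)

# Rewrite the opcode in place and recompute the flags from the new callee,
# instead of clearing them or keeping the stale ones from the old op.
function sketch_rewrite!(stmt::SketchStmt, new_func::Symbol)
    if stmt.func != new_func
        stmt.func = new_func
        stmt.flag = sketch_inferred_flags(new_func)
    end
    return stmt
end

pure_stmt = sketch_rewrite!(SketchStmt(:sub_int, SKETCH_EFFECT_FREE), :subi)
rng_stmt  = sketch_rewrite!(SketchStmt(:some_op, SKETCH_EFFECT_FREE), :rand_tile)
```

Here `pure_stmt` keeps an effect-free flag because the new opcode declares it, while `rng_stmt` loses the stale flag inherited from the old op.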

Concrete IR impact on the matmul inner loop: 7 → 4 surviving `subi(token_id, 1)` after CSE, and LICM hoists invariant `cmpi`/`bcast`/`reshape` out of the k-loop. matmul moves from 43.4 → 47.5 TFLOPS (+9.4% over cuTile Python).

@AntonOresten
Contributor

> matmul moves from 43.4 → 47.5 TFLOPS (+9.4% over cuTile Python)

Based on the current README it seems it's already roughly +8-9% over cuTile Python, at 46.9 TFLOPS. Is this patching a regression since then, or timing noise?

@maleadt
Member Author

maleadt commented Apr 30, 2026

> Is this patching a regression since then, or timing noise?

Yeah that's wrong; it doesn't by itself improve performance, but unlocks additional rewrites I'm going to push in a next PR. The goal is to enable vectorization for the MoE example, which is missing right now (and will improve performance of that example).

maleadt added 2 commits April 30, 2026 10:37
@AntonOresten
Contributor

Neat!

@maleadt maleadt force-pushed the tb/rewrite_effects branch from fb0756a to b77d031 April 30, 2026 08:44
@maleadt maleadt merged commit d4e3c76 into main Apr 30, 2026
13 checks passed
@maleadt maleadt deleted the tb/rewrite_effects branch April 30, 2026 09:01