
Make examples slightly more idiomatic and aligned#228

Merged: maleadt merged 4 commits into main from tb/examples on May 17, 2026
maleadt (Member) commented May 17, 2026

No description provided.

maleadt added 4 commits May 17, 2026 09:52
Matches the PermutedDimsArray override so a view of a CuArray can be
passed directly to a Tile kernel without Adapt trying to wrap a
non-AbstractArray TileArray back into a SubArray.
- moe: drop the unsafe_copy2d! workaround that physically split AB into
  two halves and use two views instead. This matches MoE.py's
  AB.chunk(2, dim=-1), and it is the cleaner expression now that
  @cuda backend=cuTile accepts SubArrays. Side effect: wall time on the
  benchmark drops 2.3% (0.944 -> 0.922 ms) because the copies are gone.
- layernorm: remove layer_norm_bwd_dx, an early-development simplified
  backward that has no counterpart in LayerNorm.py and isn't invoked by
  run().
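
The difference between a copy-based and a view-based split can be sketched with plain Julia arrays. This is only an illustration, not the code from the moe example: the array sizes, the names `A_copy`, `B_copy`, `A_view`, `B_view`, and the choice to split along the second dimension are all assumptions, and a CPU `Array` stands in for the CuArray.

```julia
# Plain-Array stand-in for the AB buffer; the 3x8 size is illustrative only.
AB = reshape(collect(Float32, 1:24), 3, 8)
half = size(AB, 2) ÷ 2

# Copy-based split: range indexing allocates two new arrays.
A_copy = AB[:, 1:half]
B_copy = AB[:, half+1:end]

# View-based split: two SubArrays that alias AB's memory, so nothing is copied.
A_view = view(AB, :, 1:half)
B_view = view(AB, :, half+1:size(AB, 2))

# A write through a view is visible in the parent; the earlier copy is unaffected.
A_view[1, 1] = -1f0
```

Both views report `AB` as their `parent`, which is why no extra memory traffic is needed before the kernel launch.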
Each kernel was subtracting 1 from a 1-indexed bid to do modular
arithmetic, then adding 1 back at every load/index site. Push the
conversion inside the swizzle helper (matmul, moe) so the kernel body
stays 1-indexed throughout, and use cld/mod1 for fmha's batch/head
split which gives 1-indexed results directly from a 1-indexed bid_y.
Also collapse moe's (token_ids - 1) ÷ replicas + 1 to cld.(token_ids,
replicas).
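
The index identities this commit relies on can be checked in a few lines of plain Julia. This is a sketch under assumptions: `num_heads`, `replicas`, `split_old`, and `split_new` are hypothetical names, and treating `cld` as the batch index and `mod1` as the head index is a guess at the layout, not something taken from the fmha kernel.

```julia
# Hypothetical sizes, chosen only for illustration.
num_heads = 4
replicas = 2

# Old style: shift to 0-indexed, divide and take the remainder, then add 1 back.
split_old(bid_y) = ((bid_y - 1) ÷ num_heads + 1, (bid_y - 1) % num_heads + 1)

# 1-indexed throughout: cld and mod1 return 1-indexed results from a 1-indexed input.
split_new(bid_y) = (cld(bid_y, num_heads), mod1(bid_y, num_heads))

# The two formulations agree for every positive block id in this range.
same_split = all(split_old(b) == split_new(b) for b in 1:3num_heads)

# The same identity, broadcast over 1-indexed token ids.
token_ids = collect(1:8)
old_ids = (token_ids .- 1) .÷ replicas .+ 1
new_ids = cld.(token_ids, replicas)
```

The identities hold for positive integers only, which is the case for 1-indexed block and token ids.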
Re-measured on RTX 5080, tileiras 13.2.51. Most kernels are within ±2% of
the previous reading. MoE improved from 27.0 to 27.7 TFLOPS thanks to
the view()-based AB split. Each kernel was recorded solo because in-suite
runs are sensitive to GPU clock ramp-up. FFT in particular reads
~10% slower in-suite than solo; the kernel itself is unchanged, so this is
a boost-clock interaction.
@maleadt maleadt merged commit 9c3e09f into main May 17, 2026
1 check passed
@maleadt maleadt deleted the tb/examples branch May 17, 2026 09:04