
Add reinterpret as interface to bitcast, pack, and unpack #238

Open
AntonOresten wants to merge 2 commits into JuliaGPU:main from AntonOresten:reinterpret

@AntonOresten (Collaborator)

Adds `Base.reinterpret(::Type{T}, ::Tile)` and `Base.reinterpret(reshape, ::Type{T}, ::Tile)`, giving tiles the same whole-array `reinterpret` semantics as `AbstractArray`: the bits are viewed as one contiguous column-major block, and the leading dimension is rescaled by the element-width ratio (for the `reshape` variant, it is instead added or dropped when the widths differ):

```julia
julia> arr = rand(UInt8, 2, 2)
2×2 Matrix{UInt8}:
 0x11  0x13
 0xfd  0x20

julia> reinterpret(UInt16, arr)
1×2 reinterpret(UInt16, ::Matrix{UInt8}):
 0xfd11  0x2013

julia> reinterpret(reshape, UInt16, arr)
2-element reinterpret(reshape, UInt16, ::Matrix{UInt8}) with eltype UInt16:
 0xfd11
 0x2013
```

This is distinct from the element-wise `reinterpret.(T, x)` broadcast, which preserves shape and already lowers to `bitcast`.
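The same distinction can be seen on ordinary host `Array`s, with no tiles involved. This small sketch uses only Base; the byte order noted in the last comment assumes a little-endian host.

```julia
x = UInt16[0x2211, 0x4433]     # 2-element Vector{UInt16}

# Element-wise broadcast: each element is bitcast on its own, so source and
# target widths must match and the shape is preserved.
y = reinterpret.(Int16, x)     # 2-element Vector{Int16}

# Whole-array reinterpret: the 32 bits are viewed as one block and the
# leading dimension is rescaled by 16 // 8.
b = reinterpret(UInt8, x)      # 4 elements: 0x11, 0x22, 0x33, 0x44 on a little-endian host
```

The broadcast form can never change the element count, which is why it maps directly onto `bitcast`, while the whole-array form needs `pack`/`unpack` once the widths differ.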

Here's FP4, loaded two-per-byte:

```julia
bytes = ct.load(a, pid, (8,))             # Tile{UInt8,(8,)}
fp4   = reinterpret(Float4_E2M1FN, bytes) # Tile{Float4_E2M1FN,(16,)}
vals  = convert(ct.Tile{Float32}, fp4)    # Tile{Float32,(16,)}
```

I think this is justified since `Tile` lacks a byte-level memory representation, and Tile IR treats sub-byte types like `Float4_E2M1FN` as first-class tile elements. Their sub-byte-ness only surfaces at the memory boundary, which `pack`/`unpack` bridge.
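For intuition, here is a host-side sketch of what the FP4 example computes: each byte yields two 4-bit codes (the `8 // 4` scale), and each code decodes to one of the 16 E2M1 values (1 sign, 2 exponent, 1 mantissa bit, bias 1, no Inf/NaN). The nibble order is an assumption: this PR does not state which half of a byte `unpack` emits first, so the sketch takes the low nibble first. `unpack_nibbles` and `decode_e2m1` are hypothetical helpers, not cuTile or Microfloats functions.

```julia
# E2M1 code points, indexed by code + 1 (codes 0x8-0xf are the negatives).
const E2M1_TABLE = Float32[
     0.0,  0.5,  1.0,  1.5,  2.0,  3.0,  4.0,  6.0,
    -0.0, -0.5, -1.0, -1.5, -2.0, -3.0, -4.0, -6.0,
]

# Split n bytes into 2n 4-bit codes.
# ASSUMPTION: the low nibble holds the earlier element.
function unpack_nibbles(bytes::AbstractVector{UInt8})
    codes = Vector{UInt8}(undef, 2 * length(bytes))
    for (i, byte) in enumerate(bytes)
        codes[2i - 1] = byte & 0x0f   # low nibble first (assumed)
        codes[2i]     = byte >> 4     # then the high nibble
    end
    return codes
end

# Map one 4-bit code to its Float32 value.
decode_e2m1(code::UInt8) = E2M1_TABLE[code + 1]

bytes = UInt8[0x21, 0xf7]                     # 2 bytes -> 4 FP4 values
vals  = decode_e2m1.(unpack_nibbles(bytes))   # 0.5, 1.0, 6.0, -6.0 under the assumed order
```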

Lowering

With `q = bitwidth(eltype(x))` and `p = bitwidth(T)`, the leading dimension always scales by `q // p`; the op is chosen by which side, if either, is the byte type (`i8`):

| condition | op |
|---|---|
| `p == q` | `bitcast` (shape preserved) |
| target is a byte (`p == 8`) | `pack` (to bytes) |
| source is a byte (`q == 8`) | `unpack` (from bytes) |
| neither | `pack` then `unpack` (pivot through `i8`) |
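The rule above fits in a few lines. This is a hypothetical sketch, not cuTile's implementation: `plan_reinterpret` takes the source width `q`, the target width `p`, and a leading dimension `n`, and returns the op sequence from the table together with the new leading dimension `n * q // p`. It assumes the rows are checked top to bottom, and it uses `ArgumentError` for an indivisible width.

```julia
# Hypothetical helper: op sequence and new leading dimension for reinterpret.
function plan_reinterpret(q::Integer, p::Integer, n::Integer)
    lead = n * (q // p)                  # exact rational scale
    isinteger(lead) ||
        throw(ArgumentError("$n elements of $q bits do not fill whole $p-bit elements"))
    ops = if q == p
        (:bitcast,)                      # shape preserved
    elseif p == 8
        (:pack,)                         # target is the byte type
    elseif q == 8
        (:unpack,)                       # source is the byte type
    else
        (:pack, :unpack)                 # pivot through i8
    end
    return ops, Int(lead)
end

plan_reinterpret(8, 4, 8)    # UInt8 -> FP4: unpack, leading dim 8 -> 16
plan_reinterpret(16, 8, 3)   # UInt16 -> UInt8: pack, leading dim 3 -> 6
```

Because `q == p` is tested first in this sketch, `UInt8 → Int8` becomes a plain `bitcast` rather than a `pack`, even though both sides are bytes.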

`pack`/`unpack` (new intrinsics over `cuda_tile.pack`/`cuda_tile.unpack`, v13.3+) are rank-1 ops between a numeric tile and a byte tile. They are named for their direction relative to bytes, not for element-count growth, which the `q // p` scale carries: a sub-byte `unpack` expands the dimension, but so does a super-byte `pack` like `UInt16 → UInt8`.

`reinterpret` flattens to rank 1 via `reshape`, converts, and reshapes back, so it works at any rank; identity reshapes fold away, leaving a single `pack`/`unpack` in the common case. This mirrors cutile-python's `pack_to_bytes`/`unpack_from_bytes`.
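The flatten, convert, reshape-back strategy can be checked against Base on host arrays. In this sketch, `reinterpret_via_rank1` is a hypothetical helper that always pivots through bytes, unlike the lowering, which skips the pivot when one side is already `i8` or the widths match. It assumes whole-byte element types, because Base's `reinterpret` cannot address sub-byte elements.

```julia
# Hypothetical host-side analogue of the rank-1 strategy (not the cuTile code).
function reinterpret_via_rank1(::Type{T}, A::AbstractArray{S}) where {T,S}
    flat  = vec(A)                              # rank-1 view of the same bits
    bytes = reinterpret(UInt8, flat)            # like `pack`: S -> bytes
    out   = reinterpret(T, bytes)               # like `unpack`: bytes -> T
    lead  = size(A, 1) * sizeof(S) ÷ sizeof(T)  # leading dim scaled by q // p
    return reshape(out, lead, Base.tail(size(A))...)
end

A = reshape(UInt16.(1:12), 4, 3)       # 4×3 Matrix{UInt16}
B = reinterpret_via_rank1(UInt32, A)   # 2×3, elementwise equal to reinterpret(UInt32, A)
C = reinterpret_via_rank1(UInt8, A)    # 8×3: a super-byte "pack" that expands the leading dim
```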

bitwidth trait

`reinterpret` on sub-byte types needs a true element width, since `sizeof` rounds up to whole bytes (`Float4_E2M1FN` is 4 bits, but `sizeof == 1`). Adds `cuTile.bitwidth(::Type)`, defaulting to `8 * sizeof(T)`; the Microfloats extension forwards to `Microfloats.bitwidth`.
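A trait like this amounts to a default method plus per-type overrides. In this minimal sketch, `bitwidth` and `MyFP4` are local stand-ins rather than the cuTile or Microfloats definitions. Julia only supports primitive types whose size is a multiple of 8 bits, so a 4-bit float has to be declared as 8 bits, and that is exactly why `sizeof` alone overstates its width.

```julia
# Default: whole-byte width derived from sizeof.
bitwidth(::Type{T}) where {T} = 8 * sizeof(T)

# Hypothetical 4-bit float stored in one byte (sizeof(MyFP4) == 1).
primitive type MyFP4 8 end
bitwidth(::Type{MyFP4}) = 4            # override with the true element width

scale_true   = 8 // bitwidth(MyFP4)         # 2//1: one byte unpacks to two elements
scale_sizeof = 8 // (8 * sizeof(MyFP4))     # 1//1: what sizeof alone would give
```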

AntonOresten and others added 2 commits May 29, 2026 15:12
The shape helpers and pack/unpack tfuncs ran inside the kernel-inferred
path, where two failure modes produced confusing errors:

- A tfunc returning `nothing` on an indivisible width left the result
  untypable, surfacing downstream as `internal error: invalid terminators`.
- A `throw(ArgumentError(...))` in a shape helper became an unsupported
  `String` in kernel IR (`format_string`/`unsupported String` error),
  masking the intended message.

Make both layers total: pack/unpack tfuncs always return a concrete type
(via `fld`), and the shape helpers are pure arithmetic. Validation now
lives solely in the pack/unpack/reshape emit, which throws a clear
`IRError` (e.g. "unpack: 1 bytes do not evenly divide into Float32").
Valid reinterprets are unchanged.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
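The split described in this commit, with a total shape function for inference and validation only at emit time, can be sketched as follows. The names are hypothetical, and `ArgumentError` stands in for cuTile's `IRError`. The shape function always returns an `Int` via `fld`, even for an indivisible width, while the separate check produces the readable message.

```julia
# Total: always returns an Int, so inference never sees `nothing`.
unpack_len(nbytes::Int, bits::Int) = fld(8 * nbytes, bits)

# Validation at emit time, with a readable message (ArgumentError stands in for IRError).
function check_unpack(nbytes::Int, ::Type{T}, bits::Int) where {T}
    rem(8 * nbytes, bits) == 0 ||
        throw(ArgumentError("unpack: $nbytes bytes do not evenly divide into $T"))
    return unpack_len(nbytes, bits)
end

unpack_len(1, 32)              # 0: still a concrete Int, no error during inference
check_unpack(8, Float32, 32)   # 2
```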