v0.5.0: Phase 6 — Deployment, Portability & Testing Infrastructure
Multi-Backend KIR Foundation (M47a)
- Kernel IR — SSA-form intermediate representation with 40+ instructions
- PTX Backend — KIR to PTX lowering with typed register allocation
- GpuTarget — CUDA/ROCm/Metal/WebGPU targets with per-backend capability tables
- GpuBackend trait — alloc/free/copy/launch/sync interface for all backends
- target(backend) — conditional compilation per GPU target
- --target — CLI flag for backend selection
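For readers who want a feel for the shape of this layer, here is a minimal sketch of a target enum with a capability lookup and an alloc/free/copy/launch/sync trait. Only the names GpuTarget and GpuBackend and the five operation names come from these notes. The method signatures, the Capabilities struct and its fp64 flag, the BufferId type, the HostMock backend, and the "add_one" kernel are all hypothetical and chosen for illustration; the project's real definitions may differ.

```rust
use std::collections::HashMap;

/// The four targets listed in the release notes.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum GpuTarget { Cuda, Rocm, Metal, WebGpu }

/// Hypothetical capability record; the real table is not shown in the notes.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct Capabilities { pub fp64: bool }

impl GpuTarget {
    /// Illustrative values only, not the project's actual capability table.
    pub fn capabilities(self) -> Capabilities {
        match self {
            GpuTarget::Cuda | GpuTarget::Rocm => Capabilities { fp64: true },
            GpuTarget::Metal | GpuTarget::WebGpu => Capabilities { fp64: false },
        }
    }
}

/// Hypothetical buffer handle.
pub type BufferId = usize;

/// Assumed shape of an alloc/free/copy/launch/sync interface.
pub trait GpuBackend {
    fn alloc(&mut self, bytes: usize) -> BufferId;
    fn free(&mut self, buf: BufferId);
    fn copy_to_device(&mut self, buf: BufferId, data: &[u8]);
    fn copy_to_host(&self, buf: BufferId) -> Vec<u8>;
    fn launch(&mut self, kernel: &str, args: &[BufferId]);
    fn sync(&mut self);
}

/// Host-memory stand-in that queues launches and runs them on sync,
/// mimicking asynchronous kernel execution.
pub struct HostMock {
    buffers: HashMap<BufferId, Vec<u8>>,
    next_id: BufferId,
    pending: Vec<(String, Vec<BufferId>)>,
}

impl HostMock {
    pub fn new() -> Self {
        HostMock { buffers: HashMap::new(), next_id: 0, pending: Vec::new() }
    }
}

impl GpuBackend for HostMock {
    fn alloc(&mut self, bytes: usize) -> BufferId {
        let id = self.next_id;
        self.next_id += 1;
        self.buffers.insert(id, vec![0u8; bytes]);
        id
    }
    fn free(&mut self, buf: BufferId) {
        self.buffers.remove(&buf);
    }
    fn copy_to_device(&mut self, buf: BufferId, data: &[u8]) {
        // Panics if the buffer is unknown or smaller than `data`.
        let dst = self.buffers.get_mut(&buf).expect("unknown buffer");
        dst[..data.len()].copy_from_slice(data);
    }
    fn copy_to_host(&self, buf: BufferId) -> Vec<u8> {
        self.buffers.get(&buf).expect("unknown buffer").clone()
    }
    fn launch(&mut self, kernel: &str, args: &[BufferId]) {
        // Nothing runs yet; the work is only recorded.
        self.pending.push((kernel.to_string(), args.to_vec()));
    }
    fn sync(&mut self) {
        for (kernel, args) in std::mem::take(&mut self.pending) {
            // "add_one" is a made-up kernel: wrapping increment of every byte.
            if kernel == "add_one" {
                for id in args {
                    if let Some(buf) = self.buffers.get_mut(&id) {
                        for b in buf.iter_mut() { *b = b.wrapping_add(1); }
                    }
                }
            }
        }
    }
}
```

In this sketch a launch has no visible effect until sync is called, which is the main reason to keep sync as a separate operation on the trait.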
vmap AST Transform (M39b)
- VmapTransformer — FnDef-to-FnDef AST rewriting producing _batched variants
- Matmul/reduction/transpose rewriting with batch status propagation
- nsl_vmap_check_batch runtime FFI
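The rewriting idea can be sketched on a toy expression tree. This is not the VmapTransformer itself: the Expr type, the vmap function, the choice of a leading batch axis, and the rule that a matmul becomes batched when either operand is batched are assumptions made for illustration. It shows one plausible reading of "batch status propagation": each rewritten node reports whether it carries a batch dimension, and parents use that to decide how to rewrite themselves.

```rust
use std::collections::HashSet;

/// Toy expression tree (hypothetical; the real AST is not shown in the notes).
#[derive(Debug, Clone, PartialEq)]
pub enum Expr {
    Var(String),
    Matmul(Box<Expr>, Box<Expr>),
    BatchedMatmul(Box<Expr>, Box<Expr>),
    /// Swap two axes.
    Transpose(Box<Expr>, usize, usize),
    /// Reduce over one axis.
    Sum(Box<Expr>, usize),
}

/// Rewrites `expr` assuming variables in `batched_vars` gain a leading
/// batch axis. Returns the rewritten expression and its batch status.
pub fn vmap(expr: &Expr, batched_vars: &HashSet<String>) -> (Expr, bool) {
    match expr {
        Expr::Var(name) => (expr.clone(), batched_vars.contains(name)),
        Expr::Matmul(a, b) => {
            let (a2, a_batched) = vmap(a, batched_vars);
            let (b2, b_batched) = vmap(b, batched_vars);
            if a_batched || b_batched {
                // A batched operand forces the batched form.
                (Expr::BatchedMatmul(Box::new(a2), Box::new(b2)), true)
            } else {
                (Expr::Matmul(Box::new(a2), Box::new(b2)), false)
            }
        }
        Expr::BatchedMatmul(a, b) => {
            let (a2, _) = vmap(a, batched_vars);
            let (b2, _) = vmap(b, batched_vars);
            (Expr::BatchedMatmul(Box::new(a2), Box::new(b2)), true)
        }
        Expr::Transpose(x, i, j) => {
            let (x2, batched) = vmap(x, batched_vars);
            // The new leading axis shifts every original axis by one.
            let shift = if batched { 1 } else { 0 };
            (Expr::Transpose(Box::new(x2), *i + shift, *j + shift), batched)
        }
        Expr::Sum(x, axis) => {
            let (x2, batched) = vmap(x, batched_vars);
            let shift = if batched { 1 } else { 0 };
            (Expr::Sum(Box::new(x2), *axis + shift), batched)
        }
    }
}
```

Under these assumptions, reducing `x @ w` over axis 0 with only `x` batched becomes a reduction over axis 1 of a batched matmul, so the sum still runs over the original axis rather than the batch axis.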
Testing Infrastructure
- Snapshot testing (insta) — 7 PTX/KIR/fusion snapshots that catch silent codegen regressions
- Differential oracle testing — runs the same script with and without --disable-fusion and asserts numerical equivalence
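The comparison step of a differential oracle can be sketched as follows. The notes do not say how "numerical equivalence" is defined, so this assumes a combined relative/absolute tolerance on f32 outputs; the function name first_mismatch, its parameters, and the NaN and infinity handling are hypothetical. How the two output vectors are produced (running the script with and without --disable-fusion) is left out.

```rust
/// Returns the index of the first element where the two outputs disagree,
/// or None if they are equivalent under the given tolerances.
/// A length mismatch is reported at the end of the shorter slice.
pub fn first_mismatch(fused: &[f32], unfused: &[f32], rtol: f32, atol: f32) -> Option<usize> {
    if fused.len() != unfused.len() {
        return Some(fused.len().min(unfused.len()));
    }
    fused.iter().zip(unfused).position(|(x, y)| {
        if x == y {
            return false; // exact match, including equal infinities
        }
        if x.is_nan() && y.is_nan() {
            return false; // assumption: NaN in both runs counts as agreement
        }
        if !x.is_finite() || !y.is_finite() {
            return true; // one side overflowed or went NaN, the other did not
        }
        // Unfused output is treated as the reference value.
        (x - y).abs() > atol + rtol * y.abs()
    })
}
```

A test harness would then assert that first_mismatch returns None; reporting the index makes it easier to find which output element a fusion pass changed.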
Full Changelog: v0.4.0...v0.5.0