Borsalino v0.2.1
Hotfix release — removes accidentally published early-research content.
No other changes from v0.2.0.
Borsalino v0.2.0
New Features
- Async dispatch: `dispatch_async()` returns a `Pulse` handle for non-blocking GPU execution. VkFence (Vulkan), MTLCommandBuffer (Metal). Drop performs an implicit join.
- Persistent buffers: `create_device_buffer()` keeps data on the GPU across dispatches. VRAM on discrete GPUs, zero-copy on unified memory.
- GPU timestamps: `gpu.timestamp()` for profiling. Vulkan: vkCmdWriteTimestamp query pool.
- 2D/3D tiled dispatch: WGSL shared memory + barriers for tiled matmul.
- Candle integration: custom element-wise GPU kernel pattern for complementing ML frameworks.
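
The "drop performs an implicit join" behavior of the async handle can be illustrated with a CPU-only sketch. This is not Borsalino's implementation: `PulseSketch`, its `wait` method, and `finished_after_drop` are hypothetical names, and an OS thread stands in for the GPU fence or command buffer. It only shows the general pattern of a handle that blocks in `Drop` until outstanding work has finished.

```rust
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::Arc;
use std::thread::{self, JoinHandle};
use std::time::Duration;

/// Hypothetical stand-in for a `Pulse`-style handle (not the library's type).
struct PulseSketch {
    worker: Option<JoinHandle<()>>,
}

impl PulseSketch {
    /// Start `work` on a background thread and return immediately.
    fn dispatch_async<F>(work: F) -> Self
    where
        F: FnOnce() + Send + 'static,
    {
        PulseSketch { worker: Some(thread::spawn(work)) }
    }

    /// Explicit join: block until the work has finished.
    fn wait(mut self) {
        self.join_inner();
    }

    fn join_inner(&mut self) {
        if let Some(handle) = self.worker.take() {
            // A panicked worker is ignored here; a real API would report it.
            let _ = handle.join();
        }
    }
}

impl Drop for PulseSketch {
    /// Implicit join: dropping the handle waits for outstanding work.
    fn drop(&mut self) {
        self.join_inner();
    }
}

/// Starts a job, drops the handle without calling `wait`,
/// and reports whether the job had completed by then.
fn finished_after_drop() -> bool {
    let done = Arc::new(AtomicBool::new(false));
    let flag = Arc::clone(&done);
    {
        let _pulse = PulseSketch::dispatch_async(move || {
            thread::sleep(Duration::from_millis(20));
            flag.store(true, Ordering::SeqCst);
        });
        // `_pulse` goes out of scope here, so `Drop` joins the worker.
    }
    done.load(Ordering::SeqCst)
}

fn main() {
    // Explicit join.
    PulseSketch::dispatch_async(|| {}).wait();
    // Implicit join on drop.
    assert!(finished_after_drop());
    println!("drop joined the worker");
}
```

Joining in `Drop` means a handle that is never waited on cannot outlive the resources its work depends on, at the cost of a potentially blocking scope exit.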
Benchmarks
| Platform | Tiled Matmul 8192 | Batched SAXPY 1M | Dispatch |
|---|---|---|---|
| GB10 (RTX Spark) | 1,403 GFLOPS | 372 GFLOPS | 0.4 µs |
| RTX 5080 | 523 GFLOPS | 477 GFLOPS | 0.5 µs |
| M3 Pro | 186 GFLOPS | 42 GFLOPS | 142 µs |
Breaking Changes
None. All additions are backward-compatible trait methods with default implementations.
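
As a rough illustration of why a trait method with a default implementation is backward-compatible, consider the sketch below. The `Backend` trait, its `timestamp_ns` method, and both implementor types are hypothetical and are not taken from Borsalino's API. An implementor written before the method existed still compiles and picks up the default, while newer implementors can override it.

```rust
/// Hypothetical backend trait; names are illustrative only.
trait Backend {
    /// Method that already existed in the earlier version.
    fn dispatch(&self, input: &[f32]) -> Vec<f32>;

    /// Method added later. Because it has a default body,
    /// existing implementors compile without changes.
    fn timestamp_ns(&self) -> Option<u64> {
        None
    }
}

/// Written against the older trait: defines only `dispatch`.
struct LegacyBackend;

impl Backend for LegacyBackend {
    fn dispatch(&self, input: &[f32]) -> Vec<f32> {
        input.iter().map(|x| x * 2.0).collect()
    }
}

/// Opts in to the new method by overriding the default.
struct NewBackend;

impl Backend for NewBackend {
    fn dispatch(&self, input: &[f32]) -> Vec<f32> {
        input.iter().map(|x| x * 2.0).collect()
    }

    fn timestamp_ns(&self) -> Option<u64> {
        Some(42) // placeholder value, not a real GPU timestamp
    }
}

fn main() {
    assert_eq!(LegacyBackend.dispatch(&[1.0, 2.0]), vec![2.0, 4.0]);
    assert_eq!(LegacyBackend.timestamp_ns(), None);
    assert_eq!(NewBackend.timestamp_ns(), Some(42));
}
```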
Full details: CHANGELOG.md