Borsalino v0.2.1

Released by github-actions on 11 Jun 20:35 · commit 54b84b0 · 3 commits to main since this release

Borsalino v0.2.1

Hotfix release — removes accidentally published early-research content.
No other changes from v0.2.0.

Borsalino v0.2.0

New Features

  • Async dispatch: dispatch_async() returns a Pulse handle for non-blocking GPU execution, using VkFence on Vulkan and MTLCommandBuffer on Metal. Dropping the handle performs an implicit join.
  • Persistent buffers: create_device_buffer() keeps data on the GPU across dispatches, in VRAM on discrete GPUs and zero-copy on unified memory.
  • GPU timestamps: gpu.timestamp() for profiling. On Vulkan this uses a vkCmdWriteTimestamp query pool.
  • 2D/3D tiled dispatch: WGSL shared memory and barriers for tiled matmul.
  • Candle integration: a custom element-wise GPU kernel pattern for complementing ML frameworks.
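
The "Drop performs implicit join" behaviour of the async handle can be illustrated with a CPU-only sketch. This is not Borsalino's implementation: PulseSketch and dispatch_async_sketch are hypothetical names, and a std thread stands in for the GPU fence or command buffer. It only shows the ownership pattern, where work is guaranteed to have finished once the handle goes out of scope.

```rust
use std::sync::{Arc, Mutex};
use std::thread::{self, JoinHandle};

/// Hypothetical stand-in for a `Pulse`-style handle (not the real type).
/// It owns the in-flight work; a thread plays the role of the GPU here.
struct PulseSketch {
    worker: Option<JoinHandle<()>>,
}

impl PulseSketch {
    /// Explicit join: block until the work has finished.
    fn wait(mut self) {
        if let Some(handle) = self.worker.take() {
            handle.join().expect("worker panicked");
        }
    }
}

impl Drop for PulseSketch {
    /// Implicit join: if the caller never called `wait`, block here so the
    /// work cannot outlive the handle.
    fn drop(&mut self) {
        if let Some(handle) = self.worker.take() {
            let _ = handle.join();
        }
    }
}

/// Hypothetical non-blocking dispatch: scales every element in the background.
fn dispatch_async_sketch(data: Arc<Mutex<Vec<f32>>>, scale: f32) -> PulseSketch {
    let worker = thread::spawn(move || {
        let mut guard = data.lock().unwrap();
        for x in guard.iter_mut() {
            *x *= scale;
        }
    });
    PulseSketch { worker: Some(worker) }
}

fn main() {
    let data = Arc::new(Mutex::new(vec![1.0_f32, 2.0, 3.0]));
    {
        let _pulse = dispatch_async_sketch(Arc::clone(&data), 2.0);
        // The caller is free to do other CPU work here.
    } // `_pulse` is dropped here, which joins the worker.
    assert_eq!(*data.lock().unwrap(), vec![2.0_f32, 4.0, 6.0]);
}
```

Binding the handle to a named variable such as `_pulse` matters in this sketch: `let _ = ...` would drop it immediately and turn the call into a blocking one.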

Benchmarks

Platform          Tiled Matmul 8192   Batched SAXPY 1M   Dispatch
GB10 (RTX Spark)  1,403 GFLOPS        372 GFLOPS         0.4 µs
RTX 5080          523 GFLOPS          477 GFLOPS         0.5 µs
M3 Pro            186 GFLOPS          42 GFLOPS          142 µs

Breaking Changes

None. All additions are backward-compatible trait methods with default implementations.
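
To show why default implementations keep an addition non-breaking, here is a minimal sketch. The trait and method names (GpuBackendSketch, dispatch_blocking, dispatch_async_fallback, timestamp_ns) are hypothetical and do not reflect Borsalino's actual trait; the point is only that an implementor written before the new methods existed still compiles unchanged.

```rust
/// Hypothetical backend trait, for illustration only.
trait GpuBackendSketch {
    /// Method that existed before the release; every backend implements it.
    fn dispatch_blocking(&self, input: &[f32]) -> Vec<f32>;

    /// Newly added method with a default body: falls back to the blocking
    /// path, so existing implementors need no changes.
    fn dispatch_async_fallback(&self, input: &[f32]) -> Vec<f32> {
        self.dispatch_blocking(input)
    }

    /// Newly added optional capability: backends without timestamp support
    /// simply report `None`.
    fn timestamp_ns(&self) -> Option<u64> {
        None
    }
}

/// An implementor written against the old trait: it defines only the
/// original required method.
struct LegacyBackend;

impl GpuBackendSketch for LegacyBackend {
    fn dispatch_blocking(&self, input: &[f32]) -> Vec<f32> {
        input.iter().map(|x| x * 2.0).collect()
    }
}

fn main() {
    let backend = LegacyBackend;
    // Both new methods are available through their defaults.
    assert_eq!(backend.dispatch_async_fallback(&[1.0, 2.0]), vec![2.0_f32, 4.0]);
    assert_eq!(backend.timestamp_ns(), None);
}
```

A backend that supports the new capability can override the defaults, while older ones keep working as before.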

Full details: CHANGELOG.md