v0.1.0

@elibol released this 16 May 05:49 · 11 commits to main since this release
Tag v0.1.0, commit f79740b

cuTile Rust v0.1.0 release

cuTile Rust v0.1.0 is now available on crates.io.

We've finalized our host-side and device-side APIs. We are not planning any
further breaking changes to the kernel authoring model, tensor launch API,
DeviceOp execution model, or core device operation surface.

It is also much easier to try cuTile Rust in a normal Rust project:

cargo add cutile@0.1.0

or add it directly to Cargo.toml:

[dependencies]
cutile = "0.1.0"

That should be enough for normal kernel authoring; you should not need to
depend on the internal workspace crates directly.

Highlights in v0.1.0:

  • Finalized host and device APIs for tensor kernels, DeviceOps, async
    execution, CUDA graphs, and CUDA interop.
  • Device-side operations now closely track Tile IR, including tensor views,
    partition views, atomics, memory ordering, shape operations, and tile math.
  • The JIT compiler now has stronger type inference, static dispatch lowering,
    type aliases, global constants, Global, else if, and better diagnostics.
  • Mapped partitions support safe persistent scheduling patterns, including a
    persistent GEMM example.
  • Dynamic-shape and read-only tile-like loads generate faster code in important
    cases.
  • Runtime ergonomics improved with dynamic CUDA bindings, tileiras override,
    custom memory pools, and memory accounting.

The README and book have also been updated with a shorter quick-start example
and current API docs.

Please keep sending feedback, especially on the v0.1 API surface as you start
using it from crates.io.