v0.1.0
cuTile Rust v0.1.0 release
cuTile Rust v0.1.0 is now available on crates.io.
We've finalized our host-side and device-side APIs. We
are not planning any further breaking changes to the kernel authoring model, tensor
launch API, DeviceOp execution model, or core device operation surface.
It is also much easier to try cuTile Rust in a normal Rust project:
    cargo add cutile@0.1.0

or add it directly to Cargo.toml:

    [dependencies]
    cutile = "0.1.0"

That should be enough for normal kernel authoring; you should not need to
depend on the internal workspace crates directly.
Highlights in v0.1.0:
- Finalized host and device APIs for tensor kernels, DeviceOps, async
  execution, CUDA graphs, and CUDA interop.
- Device-side operations now closely track Tile IR, including tensor views,
  partition views, atomics, memory ordering, shape operations, and tile math.
- The JIT compiler now has stronger type inference, static dispatch lowering,
  type aliases, global constants, `Global`, `else if`, and better diagnostics.
- Mapped partitions support safe persistent scheduling patterns, including a
  persistent GEMM example.
- Dynamic-shape and read-only tile-like loads generate faster code in important
  cases.
- Runtime ergonomics improved with dynamic CUDA bindings, `tileiras` override,
  custom memory pools, and memory accounting.
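
For readers new to the term, here is a minimal sketch of what "persistent scheduling" usually means: a fixed number of workers each loop over several tiles, rather than one worker being launched per tile. This is plain host-side Rust and does not use the cuTile Rust API. The names `tiles_for_worker` and `covers_each_tile_once` are hypothetical and exist only for illustration, and the strided assignment shown is one common choice, not necessarily the one mapped partitions use.

```rust
/// Tile indices handled by worker `worker_id` when `num_workers` persistent
/// workers share `num_tiles` tiles with a strided assignment:
/// worker w takes tiles w, w + num_workers, w + 2 * num_workers, ...
///
/// Assumes `num_workers > 0` (a zero stride would panic in `step_by`).
fn tiles_for_worker(worker_id: usize, num_workers: usize, num_tiles: usize) -> Vec<usize> {
    (worker_id..num_tiles).step_by(num_workers).collect()
}

/// Returns true if the strided assignment gives every tile to exactly one
/// worker. Disjoint, complete coverage is what lets each worker write its
/// own tiles without racing with the others.
fn covers_each_tile_once(num_workers: usize, num_tiles: usize) -> bool {
    let mut hits = vec![0u32; num_tiles];
    for worker_id in 0..num_workers {
        for tile in tiles_for_worker(worker_id, num_workers, num_tiles) {
            hits[tile] += 1;
        }
    }
    hits.iter().all(|&count| count == 1)
}
```

With 4 workers and 10 tiles, worker 0 would take tiles 0, 4, and 8, and worker 3 would take tiles 3 and 7. See the persistent GEMM example in the repository for how this is actually expressed with mapped partitions.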
The README and book have also been updated with a shorter quick-start example
and current API docs:
- Repository: https://github.com/NVlabs/cutile-rs
- Book: https://nvlabs.github.io/cutile-rs/
- Crates.io: https://crates.io/crates/cutile
Please keep sending feedback, especially on the v0.1 API surface as you start
using it from crates.io.