cuda-oxide v0.2.1

@nihalpasham nihalpasham released this 10 Jun 06:11
· 245 commits to main since this release
4514af2

Patch release: correctness fixes in the device pipeline, plus a licensing cleanup in cuda-bindings that supersedes v0.2.0 for anyone consuming that crate.

Correctness

  • Device functions and call sites now carry LLVM's convergent attribute, preventing illegal transforms around warp-synchronous operations (#145).
  • The detected GPU architecture is treated as a hint, not a hard target, so explicitly requested --arch builds behave as asked (#145).
  • NaN float literals are emitted as hex bit patterns instead of bare nan, which llc rejects (#116, ported from #63).
  • Loads, stores, and allocas now carry Rust-side ABI alignment instead of LLVM defaults (#122, ported from #113).
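
To illustrate the NaN literal fix, here is a minimal sketch of formatting a float constant as a hex bit pattern for textual LLVM IR. It assumes the usual IR convention that float and double constants may be written as a 0x-prefixed, 16-digit double bit pattern. The helper names f32_bits_as_f64_bits and llvm_hex_literal are hypothetical and are not taken from the cuda-oxide source, and the actual emitter may be structured differently.

```rust
/// Widen an f32 to the bit pattern of the equivalent f64.
/// NaNs are widened by hand so the sign and payload bits are carried over
/// explicitly instead of relying on what a float cast does with a NaN.
/// (Hypothetical helper, not part of cuda-oxide.)
pub fn f32_bits_as_f64_bits(v: f32) -> u64 {
    if v.is_nan() {
        let bits = v.to_bits() as u64;
        let sign = (bits >> 31) << 63;
        // The f32 mantissa is 23 bits and the f64 mantissa is 52 bits,
        // so the payload moves up by 29 bits.
        let payload = (bits & 0x007f_ffff) << 29;
        sign | 0x7ff0_0000_0000_0000 | payload
    } else {
        // Finite values and infinities widen exactly.
        (v as f64).to_bits()
    }
}

/// Format a 64-bit pattern as a hex float literal: "0x" plus 16 hex digits.
/// (Hypothetical helper, not part of cuda-oxide.)
pub fn llvm_hex_literal(bits: u64) -> String {
    format!("0x{:016X}", bits)
}
```

With these helpers, a quiet f32 NaN built with f32::from_bits(0x7fc0_0000) would be emitted as 0x7FF8000000000000 rather than the bare token nan.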

cuda-bindings licensing cleanup

  • crates/cuda-bindings ships under the NVIDIA Software License and accepts no external contributions. Two externally-authored patches that briefly landed there have been removed and their functionality re-implemented under NVIDIA authorship (#152): the CUDA_HOME toolkit fallback and the CUDA 12.8 cuEventElapsedTime_v2 compatibility shim.
  • The FFI bindings themselves were never affected: they are generated at build time by bindgen from your local CUDA headers and are not checked in.
  • The crate's SPDX headers now match its declared license, and CI now rejects non-NVIDIA-authored changes to the crate (#152).
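
For readers unfamiliar with toolkit discovery, the following is a rough sketch of what an environment-variable fallback can look like in a build script. Everything in it is an assumption made for illustration: the function name resolve_toolkit_root, the lookup order, and the /usr/local/cuda default in the usage note are hypothetical, and the re-implemented logic in cuda-bindings may differ.

```rust
use std::path::PathBuf;

/// Return the first non-empty value among `candidates` (environment variable
/// names, checked in order), or `default_root` if none is set.
/// `lookup` is injected so the logic can be exercised without touching the
/// real process environment. (Hypothetical helper, not the crate's code.)
pub fn resolve_toolkit_root<F>(lookup: F, candidates: &[&str], default_root: &str) -> PathBuf
where
    F: Fn(&str) -> Option<String>,
{
    for &name in candidates {
        if let Some(value) = lookup(name) {
            if !value.trim().is_empty() {
                return PathBuf::from(value);
            }
        }
    }
    PathBuf::from(default_root)
}
```

In a real build script the lookup would typically be |k| std::env::var(k).ok(), for example resolve_toolkit_root(|k| std::env::var(k).ok(), &["CUDA_HOME"], "/usr/local/cuda").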

Testing, CI, and docs

  • 14 new mir-lower control-flow unit tests (#111, thanks @goog00).
  • Error-demo examples are now classified in STATUS.md and enforced in CI (#86, thanks @ronakv).
  • gemm_sol gained a live cublasLt speed-of-light baseline for perf work.
  • thread::index_1d / index_2d uniqueness is documented as launch-conditional (#127).
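
One way to read "launch-conditional" is that a 1D index computed only from the x dimension stays unique only when the launch itself is 1D. The host-side model below sketches that reading. It assumes the conventional formula block_idx.x * block_dim.x + thread_idx.x, and it is a plain-Rust simulation with hypothetical names (Launch, model_index_1d, index_1d_is_unique), not the cuda-oxide thread API.

```rust
use std::collections::HashSet;

/// Grid and block extents as (x, y). Hypothetical type for this model only.
#[derive(Clone, Copy)]
pub struct Launch {
    pub grid: (u32, u32),
    pub block: (u32, u32),
}

/// Conventional 1D global index; it uses the x components only.
pub fn model_index_1d(block_idx_x: u32, block_dim_x: u32, thread_idx_x: u32) -> u32 {
    block_idx_x * block_dim_x + thread_idx_x
}

/// Enumerate every thread of the launch and report whether the modelled
/// 1D index is distinct for all of them.
pub fn index_1d_is_unique(launch: Launch) -> bool {
    let mut seen = HashSet::new();
    for _block_y in 0..launch.grid.1 {
        for block_x in 0..launch.grid.0 {
            for _thread_y in 0..launch.block.1 {
                for thread_x in 0..launch.block.0 {
                    let idx = model_index_1d(block_x, launch.block.0, thread_x);
                    // insert returns false when the index was already seen.
                    if !seen.insert(idx) {
                        return false;
                    }
                }
            }
        }
    }
    true
}
```

Under this model a launch with grid (4, 1) and block (32, 1) yields unique indices, while any launch with a y extent greater than 1 produces duplicates, because threads that differ only in y map to the same value.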