Release cuda-oxide v0.2.1 · NVlabs/cuda-oxide

Patch release: correctness fixes in the device pipeline, plus a licensing cleanup in cuda-bindings that supersedes v0.2.0 for anyone consuming that crate.

Correctness

Device functions and call sites now carry LLVM's convergent attribute, preventing illegal transforms around warp-synchronous operations (#145).
The detected GPU architecture is treated as a hint, not a hard target, so explicitly requested --arch builds behave as asked (#145).
NaN float literals are now emitted as hex bit patterns; the bare nan token they replace is rejected by llc (#116, ported from #63).
Loads, stores, and allocas now carry Rust-side ABI alignment instead of LLVM defaults (#122, ported from #113).
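
For readers unfamiliar with the NaN fix, here is a minimal sketch of what "hex bit pattern" means in LLVM IR. It is not the cuda-oxide implementation; the function names are hypothetical and the helpers take raw bits so the result does not depend on how a platform canonicalizes NaN values. LLVM IR writes float constants in the 16-digit double-precision hex form, so an f32 NaN has to be widened before printing.

```rust
/// Hypothetical helper: widen the raw bits of an f32 NaN into the 64-bit
/// pattern LLVM IR uses for a `float` constant. LLVM prints `float`
/// constants in the double-precision hex form, so the 23-bit mantissa is
/// shifted into the top of the 52-bit mantissa field.
fn llvm_hex_from_f32_nan_bits(bits: u32) -> String {
    let sign = ((bits >> 31) as u64) << 63;
    let exponent = 0x7FFu64 << 52; // all-ones exponent marks NaN
    let mantissa = ((bits & 0x007F_FFFF) as u64) << 29; // 23 -> 52 bits
    format!("0x{:016X}", sign | exponent | mantissa)
}

/// Hypothetical helper: a `double` constant is its own bit pattern.
fn llvm_hex_from_f64_bits(bits: u64) -> String {
    format!("0x{:016X}", bits)
}
```

Under these assumptions, the quiet NaN 0x7FC00000 becomes `float 0x7FF8000000000000`, which llc accepts where a bare `nan` token would not parse.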

cuda-bindings licensing cleanup

crates/cuda-bindings ships under the NVIDIA Software License and accepts no external contributions. Two externally-authored patches that briefly landed there have been removed and their functionality re-implemented under NVIDIA authorship (#152): the CUDA_HOME toolkit fallback and the CUDA 12.8 cuEventElapsedTime_v2 compatibility shim.
The FFI bindings themselves were never affected: they are generated at build time by bindgen from your local CUDA headers and are not checked in.
The crate's SPDX headers now match its declared license, and CI now rejects non-NVIDIA-authored changes to the crate (#152).

Testing, CI, and docs

Added 14 new mir-lower control-flow unit tests (#111, thanks @goog00).
Error-demo examples are now classified in STATUS.md and enforced in CI (#86, thanks @ronakv).
gemm_sol gained a live cublasLt speed-of-light baseline for perf work.
thread::index_1d / index_2d uniqueness is documented as launch-conditional (#127).
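
To illustrate why index uniqueness can depend on the launch shape, the host-side sketch below enumerates every thread of a launch and checks whether a 1D index is distinct per thread. It assumes the conventional formula block_x * block_dim_x + thread_x; these notes do not state how cuda-oxide actually computes thread::index_1d or which launch conditions it documents, so the functions here are hypothetical stand-ins.

```rust
use std::collections::HashSet;

/// Hypothetical stand-in for a 1D global index (assumed formula, not
/// taken from cuda-oxide): block_x * block_dim_x + thread_x.
fn index_1d(block_x: u32, block_dim_x: u32, thread_x: u32) -> u32 {
    block_x * block_dim_x + thread_x
}

/// Enumerate every thread of a (grid_x, grid_y) by (block_x, block_y)
/// launch and report whether each thread received a distinct 1D index.
fn index_1d_is_unique(grid: (u32, u32), block: (u32, u32)) -> bool {
    let mut seen = HashSet::new();
    for _block_y in 0..grid.1 {
        for block_x in 0..grid.0 {
            for _thread_y in 0..block.1 {
                for thread_x in 0..block.0 {
                    // The y coordinates never enter the index, so any
                    // launch with a y extent above 1 repeats values.
                    if !seen.insert(index_1d(block_x, block.0, thread_x)) {
                        return false;
                    }
                }
            }
        }
    }
    true
}
```

With this assumed formula, a purely 1D launch such as grid (4, 1) with block (32, 1) yields unique indices, while adding a second row of blocks or threads makes distinct threads share an index.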
