Release v0.10.0 · NVIDIA/cuEquivariance

Python 3.14 support finalized, including a fix for stale tuple hashes in SegmentedTensorProduct after in-place operand mutation, and updated CI matrix (#272)
[Torch/JAX] cuet.triangle_attention/cuex.triangle_attention: new, faster sm100f (CC 10.0/10.3) kernel, covering hidden_dim ≤ 256 in the forward pass and hidden_dim ≤ 128 in the backward pass. Under sm100f, bias is cast to the q/k/v dtype instead of always float32. Non-contiguous input tensors are handled internally, so no manual contiguity assertion is required as long as shape requirements are met. Docstrings updated. Only available on cu13 builds (#260)
[JAX] MACE flax.nnx example restructured to use nnx.split + @jax.jit on (graphdef, state) instead of @nnx.jit on the module, removing the Python-side nnx graph traversal overhead from each training/inference step (#261)
[JAX] NVTX markers added to the MACE examples to make step boundaries visible in nsys profiles (#266)

[Torch] SegmentedPolynomial checkpoint portability: GPU-saved models now load correctly on CPU. Implemented via __reduce__ on SegmentedPolynomialFromUniform1dJit, SegmentedPolynomialFusedTP, SegmentedPolynomialIndexedLinear, and SegmentedPolynomial, plus graceful fallback when specific cuequivariance_ops_torch extensions (e.g. uniform_1d) are unavailable (#270)
[Torch] Replaced deprecated is_fx_tracing with is_fx_symbolic_tracing (#270)
[JAX] Restrict PTX 88 to sm_121 for CUDA 12.9+, avoiding breakage on other architectures (addresses the known issue noted in the 0.9.0 release) (#250)
[Torch/JAX] cuet.attention_pair_bias/cuex.attention_pair_bias: fixed incorrect results when the hidden dimension is not a multiple of 32; the previous torch fallback for these cases is removed as the kernel now handles them correctly
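
The checkpoint portability fix relies on Python's __reduce__ protocol. The sketch below shows the general idea under stated assumptions and is not the library's implementation: PortableOp, its degree argument, and the backend labels are hypothetical, and only the package name cuequivariance_ops_torch comes from the notes above. An object that reports just its constructor arguments from __reduce__ is rebuilt by running __init__ again at load time, which lets it pick whichever backend exists on the loading machine.

```python
import importlib.util

# Package name taken from the release notes; everything else is hypothetical.
OPS_PACKAGE = "cuequivariance_ops_torch"


def ops_available() -> bool:
    """Return True if the optional accelerated extension is installed."""
    return importlib.util.find_spec(OPS_PACKAGE) is not None


class PortableOp:
    """Hypothetical op that selects its backend when it is constructed."""

    def __init__(self, degree, use_ops=None):
        self.degree = degree
        if use_ops is None:
            use_ops = ops_available()
        self.backend = "accelerated" if use_ops else "fallback"

    def __reduce__(self):
        # Record only the constructor arguments, not the selected backend,
        # so loading re-runs __init__ in the new environment.
        return (PortableOp, (self.degree,))


# Pretend this object was created where the extension is present.
saved = PortableOp(3, use_ops=True)

factory, args = saved.__reduce__()  # what pickle records at save time
restored = factory(*args)           # what pickle calls at load time
```

A round trip through pickle.dumps and pickle.loads follows the same path: pickle stores the (callable, args) pair returned by __reduce__ and calls it when loading.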

[Torch] The CUEQ_TORCH_COMPILE environment variable (experimental) enables torch.compile for cuet.triangle_attention; useful for non-contiguous tensor inputs on Ampere/Hopper architectures

What's Changed

Fix __eq__ / __lt__ on segmented polynomial types for Python and JAX compatibility by @mariogeiger in #258
[merge after release] Restrict PTX 88 to sm_121 for CUDA 12.9+ by @hsadasivan in #250
doc string and api update for triattn by @hsadasivan in #260
api: remove dim_order from triangle_attention by @hsadasivan in #262
Removed dim_order from triAttn by @phiandark in #263
nnx.split by @mariogeiger in #261
add nvtx marker by @paulz-nv in #266
Fixing some torch segmented_polynomial support by @phiandark in #270
Add Python 3.14 support and fix CI setup by @mariogeiger in #272
Fix tutorials doc format issues by @LiamZhang100 in #274
Add skill.md files by @mariogeiger in #269
Release 0.10.0 by @mariogeiger in #275

Full Changelog: v0.9.0...v0.10.0