v0.10.0

@phiandark phiandark released this 22 Apr 01:20
· 7 commits to main since this release
5196d21

Added

  • Python 3.14 support finalized, including a fix for stale tuple hashes in SegmentedTensorProduct after in-place operand mutation, and updated CI matrix (#272)
  • [Torch/JAX] cuet.triangle_attention/cuex.triangle_attention: new faster sm100f (CC 10.0/10.3) forward kernel for hidden_dim ≤ 256 (backward: hidden_dim ≤ 128). Under sm100f, bias is cast to the q/k/v dtype instead of always float32. Non-contiguous input tensors are handled internally, so no manual contiguity assertion is required as long as shape requirements are met. Docstrings updated. Only available on cu13 builds (#260)
  • [JAX] MACE flax.nnx example restructured to use nnx.split + @jax.jit on (graphdef, state) instead of @nnx.jit on the module, removing the Python-side nnx graph traversal overhead from each training/inference step (#261)
  • [JAX] NVTX markers added to the MACE examples to make step boundaries visible in nsys profiles (#266)

Bug fixes

  • [Torch] SegmentedPolynomial checkpoint portability: GPU-saved models now load correctly on CPU. Implemented via __reduce__ on SegmentedPolynomialFromUniform1dJit, SegmentedPolynomialFusedTP, SegmentedPolynomialIndexedLinear, and SegmentedPolynomial, plus graceful fallback when specific cuequivariance_ops_torch extensions (e.g. uniform_1d) are unavailable (#270)
  • [Torch] Replaced deprecated is_fx_tracing with is_fx_symbolic_tracing (#270)
  • [JAX] Restrict PTX 88 to sm_121 for CUDA 12.9+, avoiding breakage on other architectures (addresses the known issue noted in the 0.9.0 release) (#250)
  • [Torch/JAX] cuet.attention_pair_bias/cuex.attention_pair_bias: fixed incorrect results when the hidden dimension is not a multiple of 32; the previous torch fallback for these cases is removed as the kernel now handles them correctly
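
The __reduce__-based portability fix can be pictured with a small stdlib-only sketch. All names here (FastOp, FallbackOp, build_op, FAST_BACKEND_AVAILABLE) are hypothetical and do not reflect the library's actual classes. The idea shown is only the general technique: pickle a call to a factory instead of the backend-specific object, so the loading machine picks whichever implementation it supports.

```python
import pickle

# Hypothetical flag standing in for "the compiled GPU extension is importable".
FAST_BACKEND_AVAILABLE = True


class FallbackOp:
    """Portable implementation that works everywhere."""

    def __init__(self, scale):
        self.scale = scale

    def __call__(self, x):
        return self.scale * x


class FastOp(FallbackOp):
    """Stand-in for a backend-specific implementation."""

    def __reduce__(self):
        # Pickle as "call build_op(scale)" rather than as a FastOp instance,
        # so unpickling re-selects the implementation on the loading machine.
        return (build_op, (self.scale,))


def build_op(scale):
    # Factory with graceful fallback when the fast backend is missing.
    cls = FastOp if FAST_BACKEND_AVAILABLE else FallbackOp
    return cls(scale)


op = build_op(2.0)                 # fast backend present at save time
payload = pickle.dumps(op)

FAST_BACKEND_AVAILABLE = False     # simulate loading on a machine without it
restored = pickle.loads(payload)   # pickle calls build_op(2.0) again
```

After loading, restored is a FallbackOp that carries the same scale, which mirrors the "GPU-saved, CPU-loaded" scenario in spirit.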

Notes

  • [Torch] The CUEQ_TORCH_COMPILE environment variable (experimental) enables torch.compile for cuet.triangle_attention; useful for non-contiguous tensor inputs on Ampere/Hopper architectures

Documentation

  • Fixed tutorial format issues (#274)

Full Changelog: v0.9.0...v0.10.0