v1.2.0

@arjkesh released this 22 Apr 05:43
e0c1602

What's Changed

  • Add PyPI install instructions to README by @hannahli-nv in #96
  • Integrate Qwen3.5 with TileGym cuTile Kernels — 2.68x Speedup & other updates by @hannahli-nv in #92
  • cleanup: Remove dead mask variable and make bounds checking explicit in GELU kernel by @hannahli-nv in #99
  • docs: Update ROADMAP.md statuses & Add more unsloth kernels by @hannahli-nv in #100
  • Update TileGym Julia kernels to cuTile 0.2 by @maleadt in #102
  • Update translated READMEs to match latest English README & other updates by @hannahli-nv in #103
  • perf: gemma_attention CuTile — use approx tanh (rounding_mode=APPROX) for soft cap & other updates by @hannahli-nv in #105
  • Integrate TileGym Kernels for allenai/Olmo-3-1025-7B & [skill] Add cutile auto research by @hannahli-nv in #106
  • fix(unsloth): fix 6 correctness and performance issues in CuTile RoPE kernels & other updates by @hannahli-nv in #108
  • [skill] fix test func for perf improvement skills & perf(sm80): tune cuTile kernels for A100 with README updated by @hannahli-nv in #110
  • fix(ci): render all benchmark columns in summary, not just allowlisted backends by @hannahli-nv in #109
  • Bump version from 1.1.0 to 1.2.0 by @hannahli-nv in #112

New Contributors

Full Changelog: v1.1.0...v1.2.0