v1.2.0
What's Changed
- Add PyPI install instructions to README by @hannahli-nv in #96
- Integrate Qwen3.5 with TileGym cuTile Kernels — 2.68x Speedup & other updates by @hannahli-nv in #92
- cleanup: Remove dead mask variable and make bounds checking explicit in GELU kernel by @hannahli-nv in #99
- docs: Update ROADMAP.md statuses & Add more unsloth kernels by @hannahli-nv in #100
- Update TileGym Julia kernels to cuTile 0.2 by @maleadt in #102
- Update translated READMEs to match latest English README & other updates by @hannahli-nv in #103
- perf: gemma_attention CuTile — use approx tanh (rounding_mode=APPROX) for soft cap & other updates by @hannahli-nv in #105
- Integrate TileGym Kernels for allenai/Olmo-3-1025-7B & [skill] Add cuTile auto research by @hannahli-nv in #106
- fix(unsloth): fix 6 correctness and performance issues in cuTile RoPE kernels & other updates by @hannahli-nv in #108
- [skill] fix test func for perf improvement skills & perf(sm80): tune cuTile kernels for A100 with README updated by @hannahli-nv in #110
- fix(ci): render all benchmark columns in summary, not just allowlisted backends by @hannahli-nv in #109
- Bump version from 1.1.0 to 1.2.0 by @hannahli-nv in #112
New Contributors
Full Changelog: v1.1.0...v1.2.0