What's Changed
Features & Enhancements
- Add GR inference example by @cb521 in #414
- benchmark: update HSTU benchmark fixes and docs by @JacoCheung in #411
- feat(hstu): enable HSTU + DynamicEmb E2E training on Blackwell (sm_100) by @JacoCheung in #399
- [Fea] beam search sm8x support by @z52527 in #407
- Segmented unique optimization by @jiashuy in #417
- Add an example for disabling contextual mask in training by @geoffreyQiu in #395
- Recsys-FlexKV cpu breakdown benchmark by @Clebrate in #416
Bug Fixes
- update doc by @shijieliu in #402
- Update beam search benchmark RESULTS.md by @z52527 in #406
- Refactor dynamicemb benchmark by @jiashuy in #396
- Refactor _print_memory_consume to report real per-tier value bytes by @jiashuy in #408
- Refactor/prefetch select cleanup by @jiashuy in #409
- [dynamicemb] Add spawn-based multi-worker DataLoader option to MovieL… by @jiashuy in #412
- fix offload_try_wait by @Clebrate in #415
Full Changelog: v26.04...v26.05