v26.04

@shijieliu released this 20 May 06:35
· 16 commits to main since this release
a4d3b01

What's Changed

Features & Enhancements

  • Add hash_roundrobin routing mode to mitigate modulo-aliasing imbalance by @ShaobinChen-AH in #367
  • Jagged Arbitrary Masked Self Attention support by @z52527 in #339
  • fix segmented_unique_cuda: replace table_ids with segmented_range by @jiashuy in #377
  • perf(hstu): restore eager-mode .item() in preprocessor; drop duplicate triton_jagged.py by @JacoCheung in #389
  • Improve AOTI compilation of hstu model by @geoffreyQiu in #380
  • Recsys KVCache Manager refactored into standalone package by @geoffreyQiu in #387
  • Add inference aoti benchmark results by @geoffreyQiu in #394
  • [FEA] Beam search by @z52527 in #379
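
For readers unfamiliar with the modulo-aliasing imbalance mentioned in #367, the sketch below shows the general problem and one generic remedy. It is not the library's `hash_roundrobin` implementation: the function names `modulo_route`, `mix64`, and `hashed_route` are hypothetical, and the mixer is the public splitmix64 finalizer, chosen only for illustration. When keys share a stride that is a multiple of the shard count, plain `key % num_shards` sends every key to one shard; mixing the key first spreads the load.

```python
from collections import Counter

MASK64 = (1 << 64) - 1


def modulo_route(key: int, num_shards: int) -> int:
    """Plain modulo routing: aliases badly when keys share a stride."""
    return key % num_shards


def mix64(key: int) -> int:
    """splitmix64 finalizer, used here only as an example bit mixer."""
    z = (key + 0x9E3779B97F4A7C15) & MASK64
    z = ((z ^ (z >> 30)) * 0xBF58476D1CE4E5B9) & MASK64
    z = ((z ^ (z >> 27)) * 0x94D049BB133111EB) & MASK64
    return z ^ (z >> 31)


def hashed_route(key: int, num_shards: int) -> int:
    """Hypothetical remedy: scramble the key before taking the modulo."""
    return mix64(key) % num_shards


NUM_SHARDS = 8
# Keys with stride 8 are all congruent to 0 modulo 8.
keys = [i * NUM_SHARDS for i in range(10_000)]

modulo_load = Counter(modulo_route(k, NUM_SHARDS) for k in keys)
hashed_load = Counter(hashed_route(k, NUM_SHARDS) for k in keys)

print("modulo:", dict(modulo_load))
print("hashed:", dict(sorted(hashed_load.items())))
```

With plain modulo, all 10,000 keys land on shard 0; with the mixed key, every shard receives a share of the load.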

Bug Fixes

  • fix: unify dense tensor padding convention (dim-0 == batch_size) by @JacoCheung in #362
  • fix(dynamicemb): traverse nn.Module children in check_emb_collection_modules by @JacoCheung in #355
  • fix(ddp): bucket_size=True silently disables grad bucketing by @JacoCheung in #374
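
The `bucket_size=True` entry (#374) touches a general Python pitfall: `bool` is a subclass of `int`, so `True` passes an `isinstance(x, int)` check and behaves as `1`. Whether this is the exact mechanism fixed in #374 is an assumption; the sketch below only illustrates the pitfall and one defensive check. The function `validate_bucket_size` is hypothetical and is not part of the project's API.

```python
from typing import Optional


def validate_bucket_size(bucket_size: Optional[int]) -> Optional[int]:
    """Hypothetical validator: reject bools before accepting ints.

    None is passed through so the caller can fall back to a default.
    """
    # bool must be checked first because isinstance(True, int) is True.
    if isinstance(bucket_size, bool):
        raise TypeError("bucket_size must be an int or None, not bool")
    if bucket_size is None:
        return None
    if not isinstance(bucket_size, int):
        raise TypeError("bucket_size must be an int or None")
    if bucket_size <= 0:
        raise ValueError("bucket_size must be positive")
    return bucket_size


# The underlying language behavior that makes the pitfall possible.
bool_is_int = isinstance(True, int)
true_equals_one = (True == 1)

try:
    validate_bucket_size(True)
    rejected = False
except TypeError:
    rejected = True

print(bool_is_int, true_equals_one, rejected)
```

Checking for `bool` explicitly turns a silent misconfiguration into an immediate error at construction time.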

Misc

Full Changelog: v26.03...v26.04