
Development Roadmap (H3) #162

@shijieliu

Description

Here is the development roadmap for 2025 H3 (Aug to Oct).

Focus

  • Enhance the performance and core functionality of Dynamicemb.
  • In HSTU example training, support larger models with increased parallelism, leveraging Dynamicemb's improved functionality and performance.
  • For HSTU example inference, focus on kernel optimization and integration with Triton to enable real-world deployment.
  • Update HSTU attention to support the latest GPU architecture (Blackwell).
  • Develop a proof of concept for the Semantic ID example.

Roadmap

Items are grouped by component; within each component, work is scheduled across the Aug, Sep, and Oct releases, with the remaining items tracked as long-term.

Dynamicemb

  • GPU cache and hot embedding migration (milestone 1) [FEA] dynamic embedding training cache #63
  • Refactor load/dump to support distributed dumping [FEA] Distributed embedding dumping for dynamicemb #108
  • GPU cache and hot embedding migration (milestone 2) [FEA] dynamic embedding training cache #63
  • LFU bug fix [BUG] dynamicemb's LFU mode only counts the frequency of unique keys. #143
  • Embedding admission [FEA] Embedding adimission #111
  • LRU dumping score [FEA] dynamicemb LRU support dumping score #158
  • NVEmbedding backend integration
  • InputDist upstream
  • Dynamic growing
  • Fuse multiple tables with the same dimension

HSTU attention

  • Arbitrary mask support [FEA] Support arbitrary HSTU mask #118
  • Blackwell support

HSTU example training

  • Dynamicemb prefetch pipeline integration [FEA] hstu example support dynamicemb prefetch #159
  • HSTU + FFN support [FEA] Harness TransformerLayer from megatron to enable flexible structure #133
  • Activation offloading [FEA] activation offloading #48
  • Context parallelism [FEA] Support HSTU Context parallelism in training #7
  • Sequence parallelism [FEA] Enable Sequence Parallelism #130

HSTU example inference

  • NVEmbedding integration [FEA] HSTU ranking training and inference e2e example #109
  • E2E example [FEA] HSTU ranking training and inference e2e example #109
  • HSTU layer kernel optimization and fusing [FEA] HSTU layer inference kernel optimization #160
  • NVIDIA Triton HSTU model support [FEA] HSTU inference TritonServer support #161
  • Multi-stream KVCache manager support
  • Model serialization & torch cpp runtime reference
  • KVCache manager upstream

Semantic ID example training & inference

  • PoC
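The LFU bug listed above (#143) reports that frequencies are counted per unique key rather than per occurrence. The difference can be illustrated with a toy counter in plain Python; this is a hypothetical sketch of the counting behavior, not dynamicemb's actual implementation:

```python
from collections import Counter

def update_lfu(scores: Counter, batch: list, unique_only: bool) -> None:
    """Update LFU scores for one batch of embedding keys.

    unique_only=True mimics counting each key at most once per batch,
    which undercounts keys that repeat within the same batch.
    """
    keys = set(batch) if unique_only else batch
    for k in keys:
        scores[k] += 1

# Key 7 appears three times in one batch; key 42 appears once.
batch = [7, 7, 7, 42]

per_occurrence = Counter()
update_lfu(per_occurrence, batch, unique_only=False)

per_unique = Counter()
update_lfu(per_unique, batch, unique_only=True)

print(per_occurrence[7])  # 3: true access frequency of key 7
print(per_unique[7])      # 1: undercounted, same score as the cold key 42
```

With unique-only counting, a hot key that repeats inside a batch ends up with the same LFU score as a key seen once, so eviction decisions cannot distinguish them.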