
Development Roadmap (H3) #162

@shijieliu

Description

Here is the development roadmap for 2025 H3 (Aug to Oct).

Focus

  • Enhance the performance and core functionality of Dynamicemb.
  • In HSTU example training, support larger models with increased parallelism, leveraging Dynamicemb's improved functionality and performance.
  • For HSTU example inference, focus on kernel optimization and integration with Triton to enable real-world deployment.
  • Update HSTU attention to support the latest GPU architecture (Blackwell).
  • Develop a proof of concept for the Semantic ID example.

Roadmap

Items are grouped by component; within each component, work is scheduled across the Aug, Sep, and Oct releases, with the remaining items tracked as long-term.

Dynamicemb

  • GPU cache and hot embedding migration (milestone 1) [FEA] dynamic embedding training cache #63
  • Refactor load/dump to support distributed dumping [FEA] Distributed embedding dumping for dynamicemb #108
  • GPU cache and hot embedding migration (milestone 2) [FEA] dynamic embedding training cache #63
  • LFU bug fix [BUG] dynamicemb's LFU mode only counts the frequency of unique keys. #143
  • Embedding admission [FEA] Embedding adimission #111
  • LRU dumping score [FEA] dynamicemb LRU support dumping score #158
  • NVEmbedding backend integration
  • InputDist upstream
  • Dynamic growing
  • Fuse multiple tables with the same dimension

HSTU attention

  • Arbitrary mask support [FEA] Support arbitrary HSTU mask #118
  • Blackwell support

HSTU example training

  • Dynamicemb prefetch pipeline integration [FEA] hstu example support dynamicemb prefetch #159
  • HSTU + FFN support [FEA] Harness TransformerLayer from megatron to enable flexible structure #133
  • Activation offloading [FEA] activation offloading #48
  • Context parallelism [FEA] Support HSTU Context parallelism in training #7
  • Sequence parallelism [FEA] Enable Sequence Parallelism #130

HSTU example inference

  • NVEmbedding integration [FEA] HSTU ranking training and inference e2e example #109
  • E2E example [FEA] HSTU ranking training and inference e2e example #109
  • HSTU layer kernel optimization and fusing [FEA] HSTU layer inference kernel optimization #160
  • NVIDIA Triton HSTU model support [FEA] HSTU inference TritonServer support #161
  • Multi-stream KVCache manager support
  • Model serialization & torch cpp runtime reference
  • KVCache manager upstream

Semantic ID example training & inference

  • PoC
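The LFU bug listed above (#143) reports that frequencies are counted per unique key rather than per occurrence. The difference can be illustrated with a toy counter in plain Python; this is a hypothetical sketch of the counting behavior, not dynamicemb's actual implementation:

```python
from collections import Counter

def update_lfu(scores: Counter, batch: list, unique_only: bool) -> None:
    """Update LFU scores for one batch of embedding keys.

    unique_only=True mimics counting each key at most once per batch,
    which undercounts keys that repeat within the same batch.
    """
    keys = set(batch) if unique_only else batch
    for k in keys:
        scores[k] += 1

# Key 7 appears three times in one batch; key 42 appears once.
batch = [7, 7, 7, 42]

per_occurrence = Counter()
update_lfu(per_occurrence, batch, unique_only=False)

per_unique = Counter()
update_lfu(per_unique, batch, unique_only=True)

print(per_occurrence[7])  # 3: true access frequency of key 7
print(per_unique[7])      # 1: undercounted, same score as the cold key 42
```

With unique-only counting, a hot key that repeats inside a batch ends up with the same LFU score as a key seen once, so eviction decisions cannot distinguish them.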