Enable Indexer cache for DS v3.2 decoding #3529
Conversation
Thanks for enabling decoding for DeepSeek Sparse Attention and adding the specialized encoding!
Really appreciate the comprehensive testing for prefill and generation (across unit tests, standalone decoding, and the API server). It is impressive that you reproduced a reasonable MMLU-Pro result at large scale with limited resources.
This pull request enables the Indexer cache for DeepSeek V3.2 decoding, which is a critical component for sparse attention bringup in MaxText. It also introduces necessary encoding/decoding logic for DeepSeek-V3.2's specialized output format, including thinking/reasoning blocks, and updates the API server to handle these new fields.
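To make the "thinking/reasoning blocks" handling concrete, here is a minimal, hypothetical sketch of how an API server might split a model response into `reasoning_content` and `content`. It assumes DeepSeek-style output wraps the reasoning trace in `<think>...</think>` tags; the function name and tag format are illustrative assumptions, not the actual MaxText code.

```python
import re

def split_reasoning(text: str):
    # Hypothetical post-processing step: if the model output begins with a
    # <think>...</think> block, split it into (reasoning_content, content);
    # otherwise the whole text is the final answer and reasoning is None.
    match = re.match(r"\s*<think>(.*?)</think>\s*(.*)", text, re.DOTALL)
    if match:
        return match.group(1).strip(), match.group(2).strip()
    return None, text.strip()
```

A server could then populate a `ChatMessage` with both fields, leaving `reasoning_content` unset when no thinking block is present.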
🔍 General Feedback
- Reasoning Content: The addition of `reasoning_content` to `ChatMessage` is a great improvement for supporting modern LLMs that use separate thinking blocks.
- Cache Implementation: The Indexer KV cache implementation follows the existing MLA patterns well and correctly handles masking for uninitialized slots during decoding.
- Model Compatibility: The model name checks in the API server might be too restrictive, potentially missing the official Hugging Face model names. Consider using more robust string matching as suggested.
- Testing: The new unit tests effectively verify both prefill and autoregressive modes for the indexer, covering different sequence length scenarios.
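The "more robust string matching" suggested for the model name checks could look like the following sketch. The helper name, the normalization scheme, and the example repo id are illustrative assumptions, not the API server's actual code:

```python
def is_deepseek_v32(model_name: str) -> bool:
    # Hypothetical matcher: normalize the name so short aliases
    # ("deepseek3.2") and official Hugging Face repo ids
    # (e.g. "deepseek-ai/DeepSeek-V3.2-Exp") are both recognized,
    # instead of comparing against one exact string.
    normalized = (
        model_name.lower().replace("-", "").replace("_", "").replace(".", "")
    )
    return "deepseekv32" in normalized or "deepseek32" in normalized
```

Matching on a normalized substring keeps the check stable across naming variants (case, hyphens, version punctuation) without hard-coding every alias.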
Rohan-Bierneni left a comment
Thank you for making this change! As we discussed in the previous PR, there are some other optimizations that could be made to the cache, but for functionality, reusing the KV cache for the indexer was a good choice.
Also, thank you for adding tests and getting MMLU-Pro to work with the api_server.
LGTM!
Description
This is a clean version of the previous PR after rebase.
Enable the Indexer cache for DS v3.2 decoding, to unblock the eval benchmark for the DS v3.2 model with sparse attention bringup.
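A minimal numpy sketch of the indexer-cache idea described above: a per-slot validity mask lets decoding exclude uninitialized cache entries from the indexer's scoring. This is illustrative only, not the MaxText implementation; the function signatures and shapes are assumptions.

```python
import numpy as np

def init_indexer_cache(batch, max_len, num_heads, head_dim):
    # Zero-filled indexer key cache plus a per-slot validity mask so
    # uninitialized entries can be masked out during decode.
    cache = np.zeros((batch, max_len, num_heads, head_dim), dtype=np.float32)
    valid = np.zeros((batch, max_len), dtype=bool)
    return cache, valid

def update_indexer_cache(cache, valid, key, pos):
    # Write this decode step's indexer key at position `pos` and mark
    # the slot as initialized.
    cache[:, pos] = key
    valid[:, pos] = True
    return cache, valid

def indexer_scores(cache, valid, query):
    # Dot-product score per cached position and head; uninitialized
    # slots are set to -inf so they never enter the sparse top-k
    # selection of attended positions.
    scores = np.einsum("blhd,bhd->blh", cache, query)
    return np.where(valid[:, :, None], scores, -np.inf)
```

In the real model the cache would be a sharded JAX array updated functionally inside the decode step, but the masking logic is the same: only slots written so far contribute to indexer scores.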
- `init_indexer_cache` & `update_indexer_cache` for the indexer cache
- `encoding_dsv32.py` from HF files to enable specific encoding for V3.2

Tests
max_prefill_predict_length=3072 max_target_length=4096: link

Checklist
Before submitting this PR, please make sure (put X in square brackets):
`gemini-review` label.