
Enable shuffled KV cache layout for MiniMax vLLM #1199

Open
jiacao-amd wants to merge 1 commit into SemiAnalysisAI:main from jiacao-amd:add-minimax-shuffle-kv-layout

Conversation

@jiacao-amd

Summary

  • Export VLLM_ROCM_SHUFFLE_KV_CACHE_LAYOUT=1 in the MiniMax-M2.5 FP8 MI355X vLLM benchmark.

Why

vLLM's ROCm AITER attention backend expects the shuffled KV cache layout for its fast attention path. Without this environment variable, the benchmark can run with the default KV cache layout and miss the intended AITER attention kernels, leaving MI355X MiniMax-M2.5 FP8 throughput below the optimized path.

This script already enables VLLM_ROCM_USE_AITER=1 and launches vLLM with --attention-backend ROCM_AITER_FA; setting VLLM_ROCM_SHUFFLE_KV_CACHE_LAYOUT=1 makes the KV cache layout match that backend.
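For reference, a minimal sketch of how these settings fit together in a launch script. The flag and variable names come from the PR description; the model path and remaining serve options are illustrative placeholders, not copied from the benchmark script:

```shell
#!/usr/bin/env bash
# Sketch only: variable names are from the PR text; serve options are placeholders.
export VLLM_ROCM_USE_AITER=1                 # already set by the script
export VLLM_ROCM_SHUFFLE_KV_CACHE_LAYOUT=1   # the one-line addition in this PR

# The script then launches vLLM with the AITER flash-attention backend,
# per the PR description (model path and other flags elided here):
# vllm serve <model-path> --attention-backend ROCM_AITER_FA ...

echo "AITER=$VLLM_ROCM_USE_AITER SHUFFLE_KV=$VLLM_ROCM_SHUFFLE_KV_CACHE_LAYOUT"
```

The key point is that both variables must be exported before the `vllm serve` process starts, so the backend selects the shuffled layout at KV-cache allocation time.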

Testing

  • bash -n benchmarks/single_node/minimaxm2.5_fp8_mi355x.sh
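Note that `bash -n` only parses the script for syntax errors without executing it, so it validates the edit's well-formedness but not runtime behavior. A quick self-contained demonstration (the temp-file script here is illustrative, not the actual benchmark):

```shell
#!/usr/bin/env bash
# Write a tiny script containing the new export, then syntax-check it.
cat > /tmp/kv_layout_check.sh <<'EOF'
export VLLM_ROCM_SHUFFLE_KV_CACHE_LAYOUT=1
EOF

# -n: read commands and check syntax, but do not execute them.
bash -n /tmp/kv_layout_check.sh && echo "syntax OK"
```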

Contributor

@claude (bot) left a comment


Claude Code Review

This pull request is from a fork — automated review is disabled. A repository maintainer can comment @claude review to run a one-time review.

@cquil11 cquil11 requested a review from chunfangamd April 27, 2026 20:23
Collaborator

@chunfangamd left a comment


LGTM

Hi @haic0, could you please help update the corresponding vllm-project/recipes?
