[None][feat] Enable EPLB for DeepSeek-V4 #13595

Merged: lfr-0531 merged 2 commits into NVIDIA:feat/deepseek_v4 from Barry-Delaney:user/jinshik/deepseek_v4_eplb on Apr 29, 2026
@Barry-Delaney
Collaborator

@coderabbitai summary

Description

Enables Expert Parallel Load Balancing (EPLB) for DeepSeek-V4 on top of the existing DeepSeek-V4 base support.

  • Registers DeepseekV4ForCausalLM in moe_model_arch_list so the MoE load balancer recognizes the V4 architecture (tensorrt_llm/_torch/modules/fused_moe/moe_load_balancer.py).
  • Adds accuracy-harness coverage for DeepSeek-V4-Flash (NVFP4) and DeepSeek-V4-Flash-Base (FP8) under both static and online EPLB on 8 GPUs (Blackwell).
  • Introduces _make_deepseekv4_eplb_config(...), a small helper that builds MoeLoadBalancerConfig from the HF config. DeepSeek-V4 has no first_k_dense_replace prefix, so every layer in 0..num_hidden_layers-1 is MoE; num_slots = n_routed_experts + 16 * EP matches the redundancy used by TestNemotronV3Super.
  • Stages B200 pre-merge entries in l0_dgx_b200.yml, commented out until the DeepSeek-V4-Flash / DeepSeek-V4-Flash-Base checkpoints are published under llm_models_root(). They can be uncommented in a follow-up once the weights are staged.
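
As a rough illustration of the helper described above, the following is a minimal sketch rather than the code in this PR. `EplbConfigSketch` is a hypothetical stand-in for `MoeLoadBalancerConfig`, and its field names, the round-robin initial placement, and the online update rate of 2 are all assumptions. Only the `num_slots = n_routed_experts + 16 * EP` formula and the "every layer is MoE" range come from the description. The HF config values in the usage lines are illustrative, not the real DeepSeek-V4 values.

```python
from dataclasses import dataclass, field
from types import SimpleNamespace


@dataclass
class EplbConfigSketch:
    """Hypothetical stand-in for MoeLoadBalancerConfig (field names assumed)."""
    num_slots: int
    layer_updates_per_iter: int
    initial_global_assignments: dict = field(default_factory=dict)


def make_deepseekv4_eplb_config_sketch(hf_config, ep_size, online):
    """Build an EPLB config from HF-style config attributes."""
    num_experts = hf_config.n_routed_experts
    # From the PR description: 16 redundant slots per EP rank.
    num_slots = num_experts + 16 * ep_size
    # No first_k_dense_replace prefix, so layers 0..num_hidden_layers-1 are all MoE.
    moe_layers = range(hf_config.num_hidden_layers)

    if online:
        # Assumption: online EPLB starts without a fixed placement and
        # rebalances a couple of layers per iteration.
        return EplbConfigSketch(num_slots=num_slots, layer_updates_per_iter=2)

    # Assumption: static EPLB uses a simple round-robin placement per layer.
    assignments = {
        layer: [(layer + slot) % num_experts for slot in range(num_slots)]
        for layer in moe_layers
    }
    return EplbConfigSketch(
        num_slots=num_slots,
        layer_updates_per_iter=0,
        initial_global_assignments=assignments,
    )


# Illustrative values only; not read from a real checkpoint.
demo_hf_config = SimpleNamespace(n_routed_experts=256, num_hidden_layers=4)
static_cfg = make_deepseekv4_eplb_config_sketch(demo_hf_config, ep_size=8, online=False)
online_cfg = make_deepseekv4_eplb_config_sketch(demo_hf_config, ep_size=8, online=True)
```

With these illustrative inputs, each of the 8 EP ranks would hold 16 slots beyond an even split of the routed experts, which is the redundancy the description attributes to TestNemotronV3Super.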

No runtime behavior changes for existing models.

Test Coverage

New tests in tests/integration/defs/accuracy/test_llm_api_pytorch.py:

  • TestDeepSeekV4Flash::test_nvfp4_8gpus_static_eplb[moe_backend=WIDEEP|CUTLASS]
  • TestDeepSeekV4Flash::test_nvfp4_8gpus_online_eplb[moe_backend=WIDEEP|CUTLASS|TRTLLM][mtp_nextn=0|1]
  • TestDeepSeekV4FlashBase::test_fp8_8gpus_static_eplb[moe_backend=WIDEEP|CUTLASS]
  • TestDeepSeekV4FlashBase::test_fp8_8gpus_online_eplb[moe_backend=WIDEEP|CUTLASS]
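
The bracket shorthand in the list above can be expanded mechanically. The sketch below assumes each bracket is an independent parameter and that the tests cover the full cross-product; the strings it builds mirror the notation used here and may not match the exact pytest node IDs.

```python
from itertools import product

# Parameter values copied from the test list in this PR description.
TEST_MATRIX = {
    "TestDeepSeekV4Flash::test_nvfp4_8gpus_static_eplb": {
        "moe_backend": ["WIDEEP", "CUTLASS"],
    },
    "TestDeepSeekV4Flash::test_nvfp4_8gpus_online_eplb": {
        "moe_backend": ["WIDEEP", "CUTLASS", "TRTLLM"],
        "mtp_nextn": [0, 1],
    },
    "TestDeepSeekV4FlashBase::test_fp8_8gpus_static_eplb": {
        "moe_backend": ["WIDEEP", "CUTLASS"],
    },
    "TestDeepSeekV4FlashBase::test_fp8_8gpus_online_eplb": {
        "moe_backend": ["WIDEEP", "CUTLASS"],
    },
}


def expand(matrix):
    """Expand every test into one string per parameter combination."""
    cases = []
    for test_name, params in matrix.items():
        names = list(params)
        for values in product(*(params[name] for name in names)):
            suffix = "".join(f"[{n}={v}]" for n, v in zip(names, values))
            cases.append(test_name + suffix)
    return cases


CASES = expand(TEST_MATRIX)

if __name__ == "__main__":
    for case in CASES:
        print(case)
```

Under that cross-product assumption the list covers 12 cases in total: 2 + 6 for DeepSeek-V4-Flash and 2 + 2 for DeepSeek-V4-Flash-Base.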

All tests are gated by @skip_pre_blackwell and require 8 GPUs with ≥140 GB memory each, plus LLM_MODELS_ROOT containing the V4 checkpoints. They run GSM8K through the LLM API with TP=8 / EP=8 and enable_attention_dp=True.

The l0_dgx_b200 pre-merge entries for the static-EPLB WIDEEP case are checked in but commented out; please uncomment them after the checkpoints land.

@Barry-Delaney marked this pull request as ready for review April 29, 2026 05:04
@Barry-Delaney requested review from a team as code owners April 29, 2026 05:04
@Barry-Delaney requested review from lfr-0531 and xxi-nv and removed request for a team April 29, 2026 05:04
@Barry-Delaney force-pushed the user/jinshik/deepseek_v4_eplb branch 2 times, most recently from 5fe98b0 to 63fb6fa April 29, 2026 06:12
@Barry-Delaney requested a review from a team as a code owner April 29, 2026 06:12
@Barry-Delaney requested review from symphonylyh and removed request for a team April 29, 2026 06:12
@Barry-Delaney force-pushed the user/jinshik/deepseek_v4_eplb branch from 63fb6fa to c3ac084 April 29, 2026 06:12
Signed-off-by: Barry Kang <43644113+Barry-Delaney@users.noreply.github.com>
Bandit's hardcoded_password_string heuristic flags the DeepSeek tokenizer
special tokens (BOS/EOS/USER/ASSISTANT/THINKING_END) as potential
hardcoded passwords. They are tokenizer markers, not credentials. Mark
each line with `# nosec B105` so the release_check CI step (which fails
on any `Issue:` in bandit output) stops blocking on these false
positives.

Signed-off-by: Barry Kang <43644113+Barry-Delaney@users.noreply.github.com>
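
For readers unfamiliar with the annotation, the suppression described in the commit message above looks roughly like the sketch below. The constant names and placeholder values are hypothetical; the real DeepSeek tokenizer strings are not reproduced here. The `# nosec B105` comment asks Bandit to skip only the hardcoded-password-string check on that line.

```python
# Hypothetical names and placeholder values, for illustration only.
# These are tokenizer markers, not credentials, so the B105 finding
# is suppressed on each assignment.
BOS_TOKEN = "<bos-placeholder>"  # nosec B105
EOS_TOKEN = "<eos-placeholder>"  # nosec B105
USER_TOKEN = "<user-placeholder>"  # nosec B105
ASSISTANT_TOKEN = "<assistant-placeholder>"  # nosec B105
THINKING_END_TOKEN = "<thinking-end-placeholder>"  # nosec B105

SPECIAL_TOKENS = (
    BOS_TOKEN,
    EOS_TOKEN,
    USER_TOKEN,
    ASSISTANT_TOKEN,
    THINKING_END_TOKEN,
)
```

Scoping the suppression to B105 on each line, rather than using a bare `# nosec`, keeps other Bandit checks active for the same lines.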
@Barry-Delaney force-pushed the user/jinshik/deepseek_v4_eplb branch from 0b41533 to ad2da18 April 29, 2026 07:23
@lfr-0531 merged commit 3957bb9 into NVIDIA:feat/deepseek_v4 on Apr 29, 2026
4 checks passed
lfr-0531 pushed a commit that referenced this pull request May 7, 2026
Signed-off-by: Barry Kang <43644113+Barry-Delaney@users.noreply.github.com>
lfr-0531 pushed a commit that referenced this pull request May 14, 2026
Signed-off-by: Barry Kang <43644113+Barry-Delaney@users.noreply.github.com>
Signed-off-by: Fanrong Li <lfr-0531@users.noreply.github.com>
lfr-0531 pushed a commit to lfr-0531/TensorRT-LLM that referenced this pull request May 29, 2026
Signed-off-by: Barry Kang <43644113+Barry-Delaney@users.noreply.github.com>
Signed-off-by: Fanrong Li <lfr-0531@users.noreply.github.com>
(cherry picked from commit eb85528)
Signed-off-by: Fanrong Li <lfr-0531@users.noreply.github.com>