[None][feat] Enable EPLB for DeepSeek-V4 by Barry-Delaney · Pull Request #13595 · NVIDIA/TensorRT-LLM

Barry-Delaney · 2026-04-29T05:03:08Z

@coderabbitai summary

Description

Enables Expert Parallel Load Balancing (EPLB) for DeepSeek-V4 on top of the existing DeepSeek-V4 base support.

Registers DeepseekV4ForCausalLM in moe_model_arch_list so the MoE load balancer recognizes the V4 architecture (tensorrt_llm/_torch/modules/fused_moe/moe_load_balancer.py).
Adds accuracy-harness coverage for DeepSeek-V4-Flash (NVFP4) and DeepSeek-V4-Flash-Base (FP8) under both static and online EPLB on 8 GPUs (Blackwell).
Introduces _make_deepseekv4_eplb_config(...), a small helper that builds MoeLoadBalancerConfig from the HF config. DeepSeek-V4 has no first_k_dense_replace dense-layer prefix, so every layer in 0..num_hidden_layers-1 is MoE. The slot count, num_slots = n_routed_experts + 16 * EP, matches the redundancy used by TestNemotronV3Super.
Stages B200 pre-merge entries in l0_dgx_b200.yml, commented out until the DeepSeek-V4-Flash / DeepSeek-V4-Flash-Base checkpoints are published under llm_models_root(). They can be uncommented in a follow-up once the weights are staged.
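To make the slot arithmetic in the description concrete, here is a minimal sketch of what such a helper might compute. It is not the PR's actual _make_deepseekv4_eplb_config: the function name sketch_deepseekv4_eplb_settings, the plain-dict return value, the round-robin initial placement for static EPLB, and the demo config values are all assumptions made for illustration. Only the formula num_slots = n_routed_experts + 16 * EP and the "every layer is MoE" rule come from the description.

```python
from types import SimpleNamespace

# From the PR description: num_slots = n_routed_experts + 16 * EP.
REDUNDANT_SLOTS_PER_EP_RANK = 16


def sketch_deepseekv4_eplb_settings(hf_config, ep_size, static=True):
    """Hypothetical stand-in for the PR's helper; returns a plain dict."""
    num_experts = hf_config.n_routed_experts
    num_slots = num_experts + REDUNDANT_SLOTS_PER_EP_RANK * ep_size
    # No dense prefix: every layer index 0..num_hidden_layers-1 is MoE.
    moe_layers = list(range(hf_config.num_hidden_layers))
    settings = {"num_slots": num_slots, "moe_layers": moe_layers}
    if static:
        # Assumed placement: round-robin over experts, offset by layer index,
        # so every expert gets at least one slot when num_slots >= num_experts.
        settings["initial_global_assignments"] = {
            layer: [(layer + slot) % num_experts for slot in range(num_slots)]
            for layer in moe_layers
        }
    return settings


# Illustrative values only, not the real DeepSeek-V4 configuration.
demo_config = SimpleNamespace(n_routed_experts=256, num_hidden_layers=4)
demo = sketch_deepseekv4_eplb_settings(demo_config, ep_size=8)
```

With the illustrative values above and EP=8, the sketch yields 256 + 16 * 8 = 384 slots per layer. In the real test harness these values would presumably be passed to MoeLoadBalancerConfig rather than kept in a dict.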

No runtime behavior changes for existing models.

Test Coverage

New tests in tests/integration/defs/accuracy/test_llm_api_pytorch.py:

TestDeepSeekV4Flash::test_nvfp4_8gpus_static_eplb[moe_backend=WIDEEP|CUTLASS]
TestDeepSeekV4Flash::test_nvfp4_8gpus_online_eplb[moe_backend=WIDEEP|CUTLASS|TRTLLM][mtp_nextn=0|1]
TestDeepSeekV4FlashBase::test_fp8_8gpus_static_eplb[moe_backend=WIDEEP|CUTLASS]
TestDeepSeekV4FlashBase::test_fp8_8gpus_online_eplb[moe_backend=WIDEEP|CUTLASS]
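The bracketed parameters in the list above can be expanded into individual cases. The sketch below assumes that stacked brackets combine as a Cartesian product (as stacked pytest parametrizations normally do) and that a case ID is formed by appending one [name=value] group per parameter. The exact ID format pytest generates for these tests is not shown on this page, so treat the strings as illustrative.

```python
from itertools import product

# Parameter lists copied from the test names in the PR description.
TEST_MATRIX = {
    "TestDeepSeekV4Flash::test_nvfp4_8gpus_static_eplb": {
        "moe_backend": ["WIDEEP", "CUTLASS"],
    },
    "TestDeepSeekV4Flash::test_nvfp4_8gpus_online_eplb": {
        "moe_backend": ["WIDEEP", "CUTLASS", "TRTLLM"],
        "mtp_nextn": [0, 1],
    },
    "TestDeepSeekV4FlashBase::test_fp8_8gpus_static_eplb": {
        "moe_backend": ["WIDEEP", "CUTLASS"],
    },
    "TestDeepSeekV4FlashBase::test_fp8_8gpus_online_eplb": {
        "moe_backend": ["WIDEEP", "CUTLASS"],
    },
}


def expand_cases(matrix):
    """Expand each test into one entry per parameter combination."""
    cases = []
    for test_name, params in matrix.items():
        names = list(params)
        for values in product(*(params[name] for name in names)):
            suffix = "".join(f"[{n}={v}]" for n, v in zip(names, values))
            cases.append(test_name + suffix)
    return cases


CASES = expand_cases(TEST_MATRIX)
```

Under these assumptions the matrix expands to 2 + 6 + 2 + 2 = 12 cases, which gives a rough sense of the 8-GPU runtime the new coverage adds.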

All tests are gated by @skip_pre_blackwell and require 8 GPUs with ≥140 GB memory each, plus LLM_MODELS_ROOT containing the V4 checkpoints. They run GSM8K through the LLM API with TP=8 / EP=8 and enable_attention_dp=True.

The l0_dgx_b200 pre-merge entries for the static-EPLB WIDEEP case are checked in but commented out; please uncomment them after the checkpoints land.

Signed-off-by: Barry Kang <43644113+Barry-Delaney@users.noreply.github.com>

Bandit's hardcoded_password_string heuristic flags the DeepSeek tokenizer special tokens (BOS/EOS/USER/ASSISTANT/THINKING_END) as potential hardcoded passwords. They are tokenizer markers, not credentials. Mark each line with `# nosec B105` so the release_check CI step (which fails on any `Issue:` in bandit output) stops blocking on these false positives.
Signed-off-by: Barry Kang <43644113+Barry-Delaney@users.noreply.github.com>
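For readers unfamiliar with the suppression, a minimal sketch of the pattern described in this commit message follows. The constant names and the placeholder strings are hypothetical; the real DeepSeek tokenizer marker values are not shown on this page. B105 appears to key on string literals assigned to names that look credential-like (for example, names containing "token"), which is why plain tokenizer markers can trip it.

```python
# Placeholder marker strings, NOT the real DeepSeek tokenizer values.
# Each assignment carries an inline suppression scoped to check B105 only,
# so other bandit checks still apply to these lines.
BOS_TOKEN = "<bos-placeholder>"  # nosec B105
EOS_TOKEN = "<eos-placeholder>"  # nosec B105
THINKING_END_TOKEN = "<thinking-end-placeholder>"  # nosec B105

SPECIAL_TOKENS = (BOS_TOKEN, EOS_TOKEN, THINKING_END_TOKEN)
```

Scoping the comment to B105 rather than using a bare `# nosec` keeps the suppression narrow, which matters when the CI step fails on any reported issue.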

Signed-off-by: Barry Kang <43644113+Barry-Delaney@users.noreply.github.com>

Signed-off-by: Barry Kang <43644113+Barry-Delaney@users.noreply.github.com>
Signed-off-by: Fanrong Li <lfr-0531@users.noreply.github.com>

Signed-off-by: Barry Kang <43644113+Barry-Delaney@users.noreply.github.com>
Signed-off-by: Fanrong Li <lfr-0531@users.noreply.github.com>
(cherry picked from commit eb85528)
Signed-off-by: Fanrong Li <lfr-0531@users.noreply.github.com>

github-actions Bot assigned Barry-Delaney Apr 29, 2026

Barry-Delaney added the deepseek-v4 label Apr 29, 2026

Barry-Delaney marked this pull request as ready for review April 29, 2026 05:04

Barry-Delaney requested review from a team as code owners April 29, 2026 05:04

Barry-Delaney requested review from lfr-0531 and xxi-nv and removed request for a team April 29, 2026 05:04

Barry-Delaney force-pushed the user/jinshik/deepseek_v4_eplb branch 2 times, most recently from 5fe98b0 to 63fb6fa Compare April 29, 2026 06:12

Barry-Delaney requested a review from a team as a code owner April 29, 2026 06:12

Barry-Delaney requested review from symphonylyh and removed request for a team April 29, 2026 06:12

Barry-Delaney force-pushed the user/jinshik/deepseek_v4_eplb branch from 63fb6fa to c3ac084 Compare April 29, 2026 06:12

Barry-Delaney added 2 commits April 29, 2026 15:23

Support EPLB for DeepSeek-V4

58a36f8

Signed-off-by: Barry Kang <43644113+Barry-Delaney@users.noreply.github.com>

Barry-Delaney force-pushed the user/jinshik/deepseek_v4_eplb branch from 0b41533 to ad2da18 Compare April 29, 2026 07:23

lfr-0531 merged commit 3957bb9 into NVIDIA:feat/deepseek_v4 Apr 29, 2026
4 checks passed

lfr-0531 pushed a commit that referenced this pull request May 7, 2026

[None][feat] Enable EPLB for DeepSeek-V4 (#13595)

befd340

Signed-off-by: Barry Kang <43644113+Barry-Delaney@users.noreply.github.com>

lfr-0531 pushed a commit that referenced this pull request May 14, 2026

[None][feat] Enable EPLB for DeepSeek-V4 (#13595)

eb85528

Signed-off-by: Barry Kang <43644113+Barry-Delaney@users.noreply.github.com>
Signed-off-by: Fanrong Li <lfr-0531@users.noreply.github.com>

lfr-0531 merged 2 commits into NVIDIA:feat/deepseek_v4 from Barry-Delaney:user/jinshik/deepseek_v4_eplb
