
[None][fix] Enable MoE load balancer setup for Kimi-K2.5 #13386

Closed
qiaoxj07 wants to merge 1 commit into NVIDIA:main from qiaoxj07:fix/eplb-kimi-k25-arch

Conversation

qiaoxj07 (Collaborator) commented Apr 23, 2026

Summary

  • Add KimiK25ForConditionalGeneration to moe_model_arch_list in tensorrt_llm/_torch/modules/fused_moe/moe_load_balancer.py so maybe_create_moe_load_balancer calls moe_load_balancer.setup(ep_rank, ep_size) for Kimi-K2.5.
  • Without this entry, enabling EPLB (e.g. `moe_load_balancer.num_slots=288`) for Kimi-K2.5 crashes during executor init with `ValueError: Cannot calculate num_local_slots. num_slots must be set and setup() must be called.`

Root cause

Kimi-K2.5 registers as `KimiK25ForConditionalGeneration` (see `tensorrt_llm/_torch/models/modeling_deepseekv3.py`), deriving from `DeepseekV3ForCausalLM`. `maybe_create_moe_load_balancer` whitelists architectures by name before calling `MoeLoadBalancerConfig.setup(ep_rank, ep_size)`. Kimi-K2.5 was missing from the list, so `setup()` was never called and `_ep_size` stayed `None`. Later, during model init, `ConfigurableMoE._init_load_balancer` reads `moe_load_balancer_config.num_local_slots`, which then raises.

Traceback (abbreviated, from a 16-rank GB200 run with EPLB enabled):

File ".../fused_moe/create_moe.py", line 389, in create_moe
    return ConfigurableMoE(...)
File ".../fused_moe/configurable_moe.py", line 146, in __init__
    super().__init__(...)
File ".../fused_moe/interface.py", line 293, in __init__
    self._init_load_balancer(model_config, aux_stream_dict)
File ".../fused_moe/interface.py", line 357, in _init_load_balancer
    moe_load_balancer_config.num_local_slots
File ".../llmapi/llm_args.py", line 497, in num_local_slots
    raise ValueError(
ValueError: Cannot calculate `num_local_slots`. `num_slots` must be set and setup() must be called.
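The failure is easy to reproduce in isolation. The sketch below uses a simplified, hypothetical stand-in for `MoeLoadBalancerConfig` (the real class in `llmapi/llm_args.py` has more fields); only the behavior relevant to the bug is modeled.

```python
class MoeLoadBalancerConfig:
    """Simplified stand-in; only the fields relevant to the bug."""

    def __init__(self, num_slots=None):
        self.num_slots = num_slots
        self._ep_size = None  # runtime state, populated only by setup()

    def setup(self, ep_rank, ep_size):
        self._ep_rank = ep_rank
        self._ep_size = ep_size

    @property
    def num_local_slots(self):
        if self.num_slots is None or self._ep_size is None:
            raise ValueError(
                "Cannot calculate `num_local_slots`. `num_slots` must be "
                "set and setup() must be called.")
        return self.num_slots // self._ep_size


cfg = MoeLoadBalancerConfig(num_slots=288)
try:
    cfg.num_local_slots  # arch not allowlisted -> setup() never ran
except ValueError as exc:
    print(exc)

cfg.setup(ep_rank=0, ep_size=16)  # what the new allowlist entry enables
print(cfg.num_local_slots)        # 288 slots / 16 ranks = 18 per rank
```

Note that `num_slots=288` alone is not enough: the property also needs `_ep_size`, which only `setup()` provides, which is exactly why the missing allowlist entry surfaced here.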

Test plan

  • Rerun the Kimi-K2.5 disaggregated benchmark (ctx2_gen1_dep16, moe_load_balancer.num_slots=288) and confirm executor init succeeds without the num_local_slots error.
  • Confirm existing MoE models (DeepseekV3/V32, Qwen3MoE, etc.) still initialize correctly — the change only adds a new whitelist entry.
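The second bullet can be spot-checked with a trivial membership test. The list contents below are illustrative, not a copy of the real `moe_model_arch_list` in `moe_load_balancer.py`:

```python
# Hypothetical regression check: the change must be purely additive, i.e.
# existing entries remain and the new entry appears exactly once.

moe_model_arch_list = [
    "DeepseekV3ForCausalLM",
    "Qwen3MoeForCausalLM",
    "KimiK25ForConditionalGeneration",  # added by this PR
]


def check_allowlist(arch_list):
    # Pre-existing entries assumed for illustration
    expected_existing = {"DeepseekV3ForCausalLM", "Qwen3MoeForCausalLM"}
    assert expected_existing <= set(arch_list)
    assert arch_list.count("KimiK25ForConditionalGeneration") == 1
    return True


print(check_allowlist(moe_model_arch_list))  # True
```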

Summary by CodeRabbit

  • New Features
    • Added support for the KimiK25 model with optimized expert-parallel load balancing capabilities.

Add `KimiK25ForConditionalGeneration` to `moe_model_arch_list` so
`maybe_create_moe_load_balancer` invokes `moe_load_balancer.setup(ep_rank,
ep_size)` for Kimi-K2.5 as it already does for DeepseekV3/V32. Without this,
the model instantiation path (ConfigurableMoE -> _init_load_balancer) raises
`ValueError: Cannot calculate num_local_slots. num_slots must be set and
setup() must be called.` because the runtime state (`_ep_size`) on
`MoeLoadBalancerConfig` is never populated.

Kimi-K2.5 is already a registered auto-model derived from
`DeepseekV3ForCausalLM` (modeling_deepseekv3.py), so it follows the same EPLB
code path and needs the same whitelist entry.

Signed-off-by: Xianjie <5410381+qiaoxj07@users.noreply.github.com>
@qiaoxj07 qiaoxj07 requested a review from a team as a code owner April 23, 2026 15:20
@qiaoxj07 qiaoxj07 requested a review from xxi-nv April 23, 2026 15:20
coderabbitai bot (Contributor) commented Apr 23, 2026

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro Plus

Run ID: 8e72b096-b26b-41f0-ae41-622f68fef96f

📥 Commits

Reviewing files that changed from the base of the PR and between a5244ae and 98f9cf8.

📒 Files selected for processing (1)
  • tensorrt_llm/_torch/modules/fused_moe/moe_load_balancer.py

📝 Walkthrough

A new model architecture (KimiK25ForConditionalGeneration) is added to the MoE load balancer's architecture allowlist, enabling expert-parallel load balancing support for this model when other prerequisites are met.

Changes

Cohort / File(s): MoE Load Balancer Configuration (tensorrt_llm/_torch/modules/fused_moe/moe_load_balancer.py)
Summary: Added KimiK25ForConditionalGeneration to the moe_model_arch_list allowlist, expanding architecture support for MoE load balancing.

Estimated code review effort

🎯 1 (Trivial) | ⏱️ ~2 minutes

🚥 Pre-merge checks: 5 passed
  • Title check (Passed): The title clearly and specifically identifies the fix, enabling MoE load balancer setup for Kimi-K2.5, which directly matches the changeset's main purpose.
  • Description check (Passed): The description explains what was changed, why it was needed, the root cause, and a test plan covering both the new functionality and regression testing of existing models.
  • Docstring Coverage (Passed): No functions found in the changed files to evaluate; docstring coverage check skipped.
  • Linked Issues check (Passed): Skipped because no linked issues were found for this pull request.
  • Out of Scope Changes check (Passed): Skipped because no linked issues were found for this pull request.



qiaoxj07 (Collaborator, Author) commented:

/bot run --disable-fail-fast

tensorrt-cicd (Collaborator) commented:

PR_Github #45206 [ run ] triggered by Bot. Commit: 98f9cf8

tensorrt-cicd (Collaborator) commented:

PR_Github #45206 [ run ] completed with state SUCCESS. Commit: 98f9cf8
/LLM/main/L0_MergeRequest_PR pipeline #35473 completed with status: 'FAILURE'

CI Report

⚠️ Action Required:

  • Please check the failed tests and fix your PR
  • If you cannot view the failures, ask the CI triggerer to share details
  • Once fixed, request an NVIDIA team member to trigger CI again


@qiaoxj07 qiaoxj07 closed this Apr 28, 2026