[None][fix] Enable MoE load balancer setup for Kimi-K2.5 #13386
qiaoxj07 wants to merge 1 commit into NVIDIA:main from
Conversation
Add `KimiK25ForConditionalGeneration` to `moe_model_arch_list` so `maybe_create_moe_load_balancer` invokes `moe_load_balancer.setup(ep_rank, ep_size)` for Kimi-K2.5, as it already does for DeepseekV3/V32.

Without this, the model instantiation path (ConfigurableMoE -> _init_load_balancer) raises `ValueError: Cannot calculate num_local_slots. num_slots must be set and setup() must be called.` because the runtime state (`_ep_size`) on `MoeLoadBalancerConfig` is never populated.

Kimi-K2.5 is already a registered auto-model derived from `DeepseekV3ForCausalLM` (modeling_deepseekv3.py), so it follows the same EPLB code path and needs the same whitelist entry.

Signed-off-by: Xianjie <5410381+qiaoxj07@users.noreply.github.com>
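For readers skimming the diff, the change amounts to one more entry in an architecture whitelist. A rough sketch of the pattern (only `moe_model_arch_list` and `setup(ep_rank, ep_size)` are names from this PR; the helper and surrounding plumbing are simplified stand-ins, not the literal TensorRT-LLM code):

```python
# Simplified sketch of the whitelist gate described above; helper name and
# plumbing are illustrative, not the real maybe_create_moe_load_balancer.
moe_model_arch_list = [
    "DeepseekV3ForCausalLM",
    "DeepseekV32ForCausalLM",           # assumed spelling of the V3.2 entry
    "KimiK25ForConditionalGeneration",  # entry added by this PR
]

def maybe_setup_load_balancer(model_arch: str, moe_load_balancer,
                              ep_rank: int, ep_size: int) -> None:
    # Only whitelisted architectures get the EPLB runtime state populated;
    # without setup(), _ep_size stays None and num_local_slots later raises.
    if moe_load_balancer is not None and model_arch in moe_model_arch_list:
        moe_load_balancer.setup(ep_rank, ep_size)
```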
/bot run --disable-fail-fast
PR_Github #45206 [ run ] triggered by Bot. Commit:
PR_Github #45206 [ run ] completed with state
Summary
- Add `KimiK25ForConditionalGeneration` to `moe_model_arch_list` in `tensorrt_llm/_torch/modules/fused_moe/moe_load_balancer.py` so `maybe_create_moe_load_balancer` calls `moe_load_balancer.setup(ep_rank, ep_size)` for Kimi-K2.5.
- Without this, enabling EPLB (`moe_load_balancer.num_slots=288`) for Kimi-K2.5 crashes during executor init with `ValueError: Cannot calculate num_local_slots. num_slots must be set and setup() must be called.`

Root cause
Kimi-K2.5 registers as `KimiK25ForConditionalGeneration` (see `tensorrt_llm/_torch/models/modeling_deepseekv3.py`), deriving from `DeepseekV3ForCausalLM`.

`maybe_create_moe_load_balancer` whitelists architectures by name before calling `MoeLoadBalancerConfig.setup(ep_rank, ep_size)`. Kimi-K2.5 was missing from the list, so `setup()` was never called and `_ep_size` stayed `None`. Later, during model init, `ConfigurableMoE._init_load_balancer` reads `moe_load_balancer_config.num_local_slots`, which then raises; a simplified sketch of this pattern follows the traceback note below.

Traceback (abbreviated, from a 16-rank GB200 run with EPLB enabled):
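As a minimal standalone illustration of that failure mechanism (a simplified stand-in, not the actual TensorRT-LLM `MoeLoadBalancerConfig` class):

```python
# Toy reconstruction of the behavior described above: num_local_slots needs
# both num_slots and the ep_size that only setup() provides.
class LoadBalancerConfigSketch:
    def __init__(self, num_slots=None):
        self.num_slots = num_slots
        self._ep_size = None  # runtime state, only populated by setup()

    def setup(self, ep_rank, ep_size):
        self._ep_rank = ep_rank
        self._ep_size = ep_size

    @property
    def num_local_slots(self):
        if self.num_slots is None or self._ep_size is None:
            raise ValueError("Cannot calculate num_local_slots. num_slots "
                             "must be set and setup() must be called.")
        return self.num_slots // self._ep_size


cfg = LoadBalancerConfigSketch(num_slots=288)
# With the whitelist entry missing, setup() never runs:
# cfg.num_local_slots  # would raise the ValueError quoted above

# With the entry present, setup() runs first and the property resolves:
cfg.setup(ep_rank=0, ep_size=16)
assert cfg.num_local_slots == 18
```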
Test plan
- Run Kimi-K2.5 with EPLB enabled (`moe_load_balancer.num_slots=288`) and confirm executor init succeeds without the `num_local_slots` error.
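For a quicker check than a full 16-rank run, a sanity snippet along these lines could exercise the same code path; it assumes `MoeLoadBalancerConfig` exposes the `num_slots` field, `setup(ep_rank, ep_size)` method, and `num_local_slots` property named in this PR, and that `num_slots` is accepted as a constructor argument (an assumption, adjust to the actual signature):

```python
# Hedged sanity check: exercises setup() -> num_local_slots directly.
# Assumes num_local_slots is derived as num_slots // ep_size after setup().
from tensorrt_llm._torch.modules.fused_moe.moe_load_balancer import (
    MoeLoadBalancerConfig)

cfg = MoeLoadBalancerConfig(num_slots=288)
cfg.setup(ep_rank=0, ep_size=16)
print(cfg.num_local_slots)  # expected 18 once setup() has populated _ep_size
```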