
Add haic0 patch for AMD kimi k2.5 MTP support #1108

Draft

haic0 wants to merge 2 commits into SemiAnalysisAI:main from haic0:hc/kimi-k2.5-int4-mtp

Conversation

@haic0 (Collaborator) commented Apr 21, 2026

No description provided.

Signed-off-by: haic0 <haichzha@gbt350-odcdh5-wbb3.png-odc.dcgpu>
@claude bot (Contributor) left a comment


Claude Code Review

This pull request is from a fork — automated review is disabled. A repository maintainer can comment @claude review to run a one-time review.

@chunfangamd (Collaborator) left a comment


@haic0 could you please branch against the InferenceX repo directly and create the PR?

search-space:
  - { tp: 8, conc-start: 4, conc-end: 64 }
  - { tp: 4, conc-start: 4, conc-end: 64 }
  - { tp: 8, conc-start: 4, conc-end: 64}
Collaborator

It's better to leave the unrelated lines unmodified.

@functionstackx functionstackx marked this pull request as draft April 21, 2026 13:06
@functionstackx (Contributor) commented Apr 21, 2026

thanks for the PR @haic0

We haven't thought much about the guidelines for MTP on models that don't natively ship with it, since we didn't think we would include it for InferenceX v3 and have previously rejected submissions for it (#1026 (review)). Do you think we should include it in InferenceX v3? The questions that need some thought are:

  1. What speculator weights and what speculator architecture (number of layers) should be used?
  2. Can different vendors submit different speculator weights?
  3. Can different vendors submit different speculators? If so, how do you maintain a similar acceptance rate between vendors? e.g. [Model Runner V2] Enable forcing a specific acceptance rate during rejection sampling (vllm-project/vllm#38045): --speculative-config '{"method": "eagle", "model": <eagle_model>, "num_speculative_tokens": 3, "rejection_sample_method": "synthetic", "synthetic_acceptance_rate": <test-value>}'. If you are doing this, how do we ensure that the specific AR distribution implementation is similar across different frameworks?
  4. Should we be using lightseekorg/kimi-k2.6-eagle3 (https://github.com/vllm-project/recipes/pull/347/changes)?

We already have 3 models that natively ship with MTP (deepseek, glm5, qwen3.5). Is it worth spending time thinking about MTP for models that don't?
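For reference, the rejection-sampling configuration quoted in point 3 would be passed to vLLM roughly like this. This is a sketch only: the serve invocation and the 0.8 acceptance rate are illustrative, <eagle_model> is kept as the placeholder from the comment above, and the rejection_sample_method / synthetic_acceptance_rate keys assume the vllm-project/vllm#38045 proposal lands as described.

```shell
# Sketch, not a tested invocation. Assumes the acceptance-rate forcing
# proposed in vllm-project/vllm#38045 ("[Model Runner V2] Enable forcing
# a specific acceptance rate during rejection sampling") is available in
# the installed vLLM build.
vllm serve moonshotai/Kimi-K2.5 \
  --speculative-config '{
    "method": "eagle",
    "model": "<eagle_model>",
    "num_speculative_tokens": 3,
    "rejection_sample_method": "synthetic",
    "synthetic_acceptance_rate": 0.8
  }'
```

Pinning a synthetic acceptance rate would let submissions with different speculator weights be compared at the same AR, which is the equal-footing concern raised in point 3.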

Comment on lines +432 to +450
kimik2.5-int4-mi300x-vllm-mtp:
  image: vllm/vllm-openai-rocm:v0.19.0
  model: moonshotai/Kimi-K2.5
  model-prefix: kimik2.5
  runner: mi300x
  precision: int4
  framework: vllm
  multinode: false
  seq-len-configs:
    - isl: 1024
      osl: 1024
      search-space:
        - { tp: 4, conc-start: 4, conc-end: 64, spec-decoding: mtp }
    - isl: 8192
      osl: 1024
      search-space:
        - { tp: 4, conc-start: 4, conc-end: 64, spec-decoding: mtp }

kimik2.5-int4-mi325x-vllm-mtp:
Collaborator

The corresponding file benchmarks/single_node/kimik2.5_int4_mi325x_mtp.sh is missing.

search-space:
  - { tp: 4, conc-start: 4, conc-end: 64, spec-decoding: mtp }

kimik2.5-int4-mi300x-vllm-mtp:
Collaborator

The corresponding file benchmarks/single_node/kimik2.5_int4_mi300x_mtp.sh is missing.


3 participants