
Add haic0 patch for AMD kimi k2.5 MTP support #1108

Draft

haic0 wants to merge 2 commits into SemiAnalysisAI:main from haic0:hc/kimi-k2.5-int4-mtp

Conversation

@haic0 (Collaborator) commented Apr 21, 2026

No description provided.

Signed-off-by: haic0 <haichzha@gbt350-odcdh5-wbb3.png-odc.dcgpu>
@claude bot (Contributor) left a comment


Claude Code Review

This pull request is from a fork — automated review is disabled. A repository maintainer can comment @claude review to run a one-time review.

@chunfangamd (Collaborator) left a comment


@haic0 could you please branch against the InferenceX repo directly and create the PR?

search-space:
  - { tp: 8, conc-start: 4, conc-end: 64 }
  - { tp: 4, conc-start: 4, conc-end: 64 }
  - { tp: 8, conc-start: 4, conc-end: 64}
Collaborator

It's better to leave the unrelated lines unmodified.

@functionstackx functionstackx marked this pull request as draft April 21, 2026 13:06
@functionstackx (Contributor) commented Apr 21, 2026

thanks for the PR @haic0

We haven't thought much about the guidelines for MTP on models that don't natively ship with it, since we didn't think we would include it for InferenceX v3 and have previously rejected submissions for it (#1026 (review)). Do you think we should include it in InferenceX v3? The questions that need some thought are:

  1. What speculator weights and what speculator architecture (number of layers) should be used?
  2. Can different vendors submit different speculator weights?
  3. Can different vendors submit different speculators? If so, how do you maintain a similar acceptance rate between vendors? e.g. [Model Runner V2] Enable forcing a specific acceptance rate during rejection sampling (vllm-project/vllm#38045): --speculative-config '{"method": "eagle", "model": <eagle_model>, "num_speculative_tokens": 3, "rejection_sample_method": "synthetic", "synthetic_acceptance_rate": <test-value>}'. If you are doing this, how do we ensure that the specific AR distribution implementation is similar across different frameworks?
  4. Should we be using lightseekorg/kimi-k2.6-eagle3 (https://github.com/vllm-project/recipes/pull/347/changes)?

We already have 3 models that natively ship with MTP (deepseek, glm5, qwen3.5). Is it worth spending time thinking about MTP for models that don't?
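For reference, the rejection-sampling configuration quoted in point 3 would be passed to vLLM roughly like this. This is a sketch only: the serve invocation and the 0.8 acceptance rate are illustrative, <eagle_model> is kept as the placeholder from the comment above, and the rejection_sample_method / synthetic_acceptance_rate keys assume the vllm-project/vllm#38045 proposal lands as described.

```shell
# Sketch, not a tested invocation. Assumes the acceptance-rate forcing
# proposed in vllm-project/vllm#38045 ("[Model Runner V2] Enable forcing
# a specific acceptance rate during rejection sampling") is available in
# the installed vLLM build.
vllm serve moonshotai/Kimi-K2.5 \
  --speculative-config '{
    "method": "eagle",
    "model": "<eagle_model>",
    "num_speculative_tokens": 3,
    "rejection_sample_method": "synthetic",
    "synthetic_acceptance_rate": 0.8
  }'
```

Pinning a synthetic acceptance rate would let submissions with different speculator weights be compared at the same AR, which is the equal-footing concern raised in point 3.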

Comment on lines +432 to +450
kimik2.5-int4-mi300x-vllm-mtp:
  image: vllm/vllm-openai-rocm:v0.19.0
  model: moonshotai/Kimi-K2.5
  model-prefix: kimik2.5
  runner: mi300x
  precision: int4
  framework: vllm
  multinode: false
  seq-len-configs:
    - isl: 1024
      osl: 1024
      search-space:
        - { tp: 4, conc-start: 4, conc-end: 64, spec-decoding: mtp }
    - isl: 8192
      osl: 1024
      search-space:
        - { tp: 4, conc-start: 4, conc-end: 64, spec-decoding: mtp }

kimik2.5-int4-mi325x-vllm-mtp:
Collaborator

The corresponding file benchmarks/single_node/kimik2.5_int4_mi325x_mtp.sh is missing.

search-space:
  - { tp: 4, conc-start: 4, conc-end: 64, spec-decoding: mtp }

kimik2.5-int4-mi300x-vllm-mtp:
Collaborator

The corresponding file benchmarks/single_node/kimik2.5_int4_mi300x_mtp.sh is missing.


3 participants