Docs: require --use-chat-template in MTP scripts by functionstackx · Pull Request #1079 · SemiAnalysisAI/InferenceX

functionstackx · 2026-04-18T00:59:31Z

Summary

Adds a one-liner to AGENTS.md under Code Conventions → Bash stating that every MTP (*_mtp.sh) script MUST pass --use-chat-template to run_benchmark_serving, no exceptions.

Why

EAGLE-style speculative decoding is trained against chat-formatted inputs. When the benchmark client sends raw prompts, the draft model's acceptance rate silently collapses and the numbers no longer reflect realistic serving. Several recent MTP PRs shipped without the flag (#1074, #1075, #1076, #1077), so codifying this in AGENTS.md keeps future MTP additions from hitting the same foot-gun.
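The omission described above could also be caught mechanically. The following is a minimal sketch, not part of this PR: `find_mtp_scripts_missing_flag` is a hypothetical helper name, and the check only looks for the literal flag text in files matching `*_mtp.sh`, so it would not see scripts that follow a different naming pattern or that build the flag dynamically.

```shell
#!/usr/bin/env bash
# Hypothetical lint (not part of the repository): list *_mtp.sh scripts under
# a directory that never mention --use-chat-template.
find_mtp_scripts_missing_flag() {
    local dir="$1"
    # grep -L prints the names of files with no matching line.
    # -e marks the next argument as the pattern, so its leading dashes
    # are not parsed as grep options.
    find "$dir" -type f -name '*_mtp.sh' \
        -exec grep -L -e '--use-chat-template' {} +
}
```

A CI step could call the function on the benchmarks directory and fail when the output is non-empty; which directory to scan is left to the caller.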

Test plan

Markdown-only change, no code or config touched.

🤖 Generated with Claude Code

MTP (EAGLE-style spec decoding) is trained on chat-formatted inputs; omitting --use-chat-template silently regresses acceptance rate and produces misleading benchmark numbers. Several recent MTP configs were missing the flag, so make the rule explicit in AGENTS.md.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

claude

LGTM — straightforward documentation rule, nit about multi-node AMD MTP pattern is already captured inline.

Extended reasoning...

Overview

Single-line addition to AGENTS.md under the Bash conventions section, codifying that every MTP script must pass --use-chat-template to run_benchmark_serving. No code, config, or logic changes.

Security risks

None. Documentation-only change.

Level of scrutiny

Minimal scrutiny required. This is a docs-only addition that captures an existing best practice; it neither modifies nor introduces executable logic.

Other factors

The one bug found is a nit: the rule's scope (*_mtp.sh naming convention) doesn't explicitly call out the multi-node AMD pattern where IS_MTP=true is set via env var instead. That pattern's bench.sh already injects --use-chat-template automatically today, so no current script is broken — the gap is only in guidance for future custom wrappers. The inline comment already surfaces this. The core intent of the rule is sound and useful.

claude · 2026-04-18T01:06:18Z

 - Source shared utilities: `source benchmark_lib.sh`
 - Functions: `check_env_vars()`, `wait_for_server_ready()`, `run_benchmark_serving()`, `run_eval()`, `append_lm_eval_summary()`
 - Parameters passed via environment variables
+- **MTP scripts MUST pass `--use-chat-template` to `run_benchmark_serving` — no exceptions.** EAGLE-style speculative decoding is trained against chat-formatted inputs, so benchmarking against raw prompts silently regresses acceptance rate and produces misleading numbers. This applies to every `*_mtp.sh` script regardless of model, precision, or runner.


🟡 The new MTP rule only references the *_mtp.sh naming convention, but multi-node AMD MTP benchmarks use a different pattern: server.sh exports IS_MTP=true when DECODE_MTP_SIZE > 0, and bench.sh conditionally adds --use-chat-template based on that variable. A developer adding a new multi-node AMD MTP configuration (not named *_mtp.sh) would read this rule and likely conclude it doesn't apply. Consider expanding the rule to mention both patterns.

Extended reasoning...

Two distinct MTP patterns exist in the codebase, but the rule only documents one.

The newly added rule states: "This applies to every *_mtp.sh script regardless of model, precision, or runner." This accurately describes the single-node benchmark pattern (e.g., dsr1_fp8_b200_mtp.sh) where scripts are explicitly named with the _mtp suffix and must call run_benchmark_serving with --use-chat-template directly.

However, multi-node AMD MTP benchmarks follow a completely different mechanism. In benchmarks/multi_node/amd_utils/server.sh (lines 459–463), IS_MTP=true is exported when DECODE_MTP_SIZE > 0. Then in bench.sh (line 60), --use-chat-template is conditionally injected via $( [ "$IS_MTP" = "true" ] && echo "--use-chat-template" ). These scripts are named things like dsr1_fp8_mi355x_sglang-disagg.sh — they do not match *_mtp.sh.
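The conditional injection described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the actual contents of server.sh or bench.sh: `run_benchmark_serving` is replaced by a stub that only echoes its arguments, and `chat_template_flag` is a hypothetical helper standing in for the inline `$( [ "$IS_MTP" = "true" ] && echo "--use-chat-template" )` expression.

```shell
#!/usr/bin/env bash
# Stub standing in for the real run_benchmark_serving from benchmark_lib.sh;
# it only echoes the arguments it receives.
run_benchmark_serving() {
    echo "run_benchmark_serving $*"
}

# Hypothetical helper: prints the flag only when IS_MTP is "true".
chat_template_flag() {
    if [ "${IS_MTP:-false}" = "true" ]; then
        echo "--use-chat-template"
    fi
}

# Mirrors the server.sh behavior described above:
# MTP is considered enabled when DECODE_MTP_SIZE > 0.
DECODE_MTP_SIZE="${DECODE_MTP_SIZE:-0}"
if [ "$DECODE_MTP_SIZE" -gt 0 ]; then
    export IS_MTP=true
else
    export IS_MTP=false
fi

# The substitution is left unquoted on purpose so that an empty result
# contributes no argument at all.
run_benchmark_serving $(chat_template_flag)
```

The point of the sketch is that the flag depends entirely on `IS_MTP` reaching the line that invokes `run_benchmark_serving`; a script that sets `DECODE_MTP_SIZE` but calls the benchmark through a different path gets no flag.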

Why the refutation is partially valid but incomplete: The refutation correctly observes that the existing bench.sh infrastructure already handles --use-chat-template automatically for multi-node AMD MTP configurations, meaning no current script is broken. However, this only holds for scripts that reuse bench.sh. A developer writing a new multi-node AMD MTP script that doesn't route through bench.sh, or who is setting up the IS_MTP=true dispatch chain for the first time, could read AGENTS.md, see their script isn't named *_mtp.sh, and not realize the requirement applies.

Concrete scenario: Developer adds a new multi-node AMD MTP configuration and writes a custom launch wrapper that calls run_benchmark_serving directly (bypassing bench.sh). They check AGENTS.md and see: "This applies to every *_mtp.sh script" — their script is not named *_mtp.sh, so they move on without adding --use-chat-template. The acceptance rate silently collapses, producing misleading benchmark numbers — exactly the failure mode the rule intends to prevent.
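One way to make this scenario fail loudly instead of silently is a runtime guard that a custom wrapper calls before benchmarking. The sketch below is hypothetical: `require_chat_template_for_mtp` does not exist in the repository, and it assumes the wrapper has `IS_MTP` in its environment and forwards the same argument list it intends to pass to `run_benchmark_serving`.

```shell
#!/usr/bin/env bash
# Hypothetical guard (not part of benchmark_lib.sh): refuse to proceed with an
# MTP run unless --use-chat-template is among the arguments.
require_chat_template_for_mtp() {
    # Non-MTP runs are not affected by the rule.
    [ "${IS_MTP:-false}" = "true" ] || return 0

    local arg
    for arg in "$@"; do
        if [ "$arg" = "--use-chat-template" ]; then
            return 0
        fi
    done

    echo "error: IS_MTP=true but --use-chat-template was not passed" >&2
    return 1
}
```

A wrapper would call the guard with its benchmark arguments and abort on a non-zero return, which turns a misleading result into an immediate error.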

Suggested fix: Expand the final sentence to cover both patterns, e.g.: "This applies to every *_mtp.sh script AND to any script where IS_MTP=true is set (the multi-node AMD MTP pattern), regardless of model, precision, or runner."

functionstackx requested a review from a team April 18, 2026 00:59

github-project-automation bot added this to InferenceMAX Board Apr 18, 2026

functionstackx merged commit 27f958d into main Apr 18, 2026
4 checks passed

functionstackx deleted the claude/agents-md-mtp-chat-template branch April 18, 2026 00:59

github-project-automation bot moved this to Done in InferenceMAX Board Apr 18, 2026

claude bot reviewed Apr 18, 2026

functionstackx merged 1 commit into main from claude/agents-md-mtp-chat-template
