Docs: require --use-chat-template in MTP scripts #1079
MTP (EAGLE-style spec decoding) is trained on chat-formatted inputs; omitting --use-chat-template silently regresses acceptance rate and produces misleading benchmark numbers. Several recent MTP configs were missing the flag, so make the rule explicit in AGENTS.md.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
LGTM. Straightforward documentation rule; the nit about the multi-node AMD MTP pattern is already captured inline.
Overview
Single-line addition to AGENTS.md under the Bash conventions section, codifying that every MTP script must pass --use-chat-template to run_benchmark_serving. No code, config, or logic changes.
Security risks
None. Documentation-only change.
Level of scrutiny
Minimal scrutiny required. This is a docs-only addition that captures an existing best practice; it neither modifies nor introduces executable logic.
Other factors
The one bug found is a nit: the rule's scope (*_mtp.sh naming convention) doesn't explicitly call out the multi-node AMD pattern where IS_MTP=true is set via env var instead. That pattern's bench.sh already injects --use-chat-template automatically today, so no current script is broken — the gap is only in guidance for future custom wrappers. The inline comment already surfaces this. The core intent of the rule is sound and useful.
- Source shared utilities: `source benchmark_lib.sh`
- Functions: `check_env_vars()`, `wait_for_server_ready()`, `run_benchmark_serving()`, `run_eval()`, `append_lm_eval_summary()`
- Parameters passed via environment variables
- **MTP scripts MUST pass `--use-chat-template` to `run_benchmark_serving` — no exceptions.** EAGLE-style speculative decoding is trained against chat-formatted inputs, so benchmarking against raw prompts silently regresses acceptance rate and produces misleading numbers. This applies to every `*_mtp.sh` script regardless of model, precision, or runner.
🟡 The new MTP rule only references the *_mtp.sh naming convention, but multi-node AMD MTP benchmarks use a different pattern: server.sh exports IS_MTP=true when DECODE_MTP_SIZE > 0, and bench.sh conditionally adds --use-chat-template based on that variable. A developer adding a new multi-node AMD MTP configuration (not named *_mtp.sh) would read this rule and likely conclude it doesn't apply. Consider expanding the rule to mention both patterns.
Two distinct MTP patterns exist in the codebase, but the rule only documents one.
The newly added rule states: "This applies to every *_mtp.sh script regardless of model, precision, or runner." This accurately describes the single-node benchmark pattern (e.g., dsr1_fp8_b200_mtp.sh) where scripts are explicitly named with the _mtp suffix and must call run_benchmark_serving with --use-chat-template directly.
However, multi-node AMD MTP benchmarks follow a completely different mechanism. In benchmarks/multi_node/amd_utils/server.sh (lines 459–463), IS_MTP=true is exported when DECODE_MTP_SIZE > 0. Then in bench.sh (line 60), --use-chat-template is conditionally injected via $( [ "$IS_MTP" = "true" ] && echo "--use-chat-template" ). These scripts are named things like dsr1_fp8_mi355x_sglang-disagg.sh — they do not match *_mtp.sh.
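A minimal Bash sketch of that two-step dispatch, assuming only what is stated above: IS_MTP=true is exported when DECODE_MTP_SIZE > 0, and the flag is injected with the `$( [ "$IS_MTP" = "true" ] && echo "--use-chat-template" )` idiom. The names `mark_mtp`, `fake_benchmark_client`, and `run_bench` are hypothetical stand-ins, not the real server.sh/bench.sh code, and no real benchmark client is invoked.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of the multi-node AMD MTP dispatch; it mirrors the
# logic described in the review comment, NOT the actual server.sh/bench.sh.

# server.sh-style step: mark the run as MTP when DECODE_MTP_SIZE > 0.
mark_mtp() {
  if [ "${DECODE_MTP_SIZE:-0}" -gt 0 ]; then
    export IS_MTP=true
  fi
}

# Stand-in for the real benchmark client: prints each argument on its own line.
fake_benchmark_client() {
  printf '%s\n' "$@"
}

# bench.sh-style step: the unquoted $( ... ) expands to --use-chat-template
# when IS_MTP is "true" and to no argument at all otherwise.
run_bench() {
  fake_benchmark_client $( [ "${IS_MTP:-}" = "true" ] && echo "--use-chat-template" )
}

# Demo: with DECODE_MTP_SIZE > 0, the flag reaches the client.
unset IS_MTP
DECODE_MTP_SIZE=2
mark_mtp
run_bench   # prints: --use-chat-template
```

The substitution is left unquoted on purpose: when IS_MTP is not "true" it disappears entirely instead of becoming an empty-string argument. A wrapper that never goes through this step never sees the flag, which is the gap the comment points out.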
Why the refutation is partially valid but incomplete: The refutation correctly observes that the existing bench.sh infrastructure already handles --use-chat-template automatically for multi-node AMD MTP configurations, meaning no current script is broken. However, this only holds for scripts that reuse bench.sh. A developer writing a new multi-node AMD MTP script that doesn't route through bench.sh, or who is setting up the IS_MTP=true dispatch chain for the first time, could read AGENTS.md, see their script isn't named *_mtp.sh, and not realize the requirement applies.
Concrete scenario: Developer adds a new multi-node AMD MTP configuration and writes a custom launch wrapper that calls run_benchmark_serving directly (bypassing bench.sh). They check AGENTS.md and see: "This applies to every *_mtp.sh script" — their script is not named *_mtp.sh, so they move on without adding --use-chat-template. The acceptance rate silently collapses, producing misleading benchmark numbers — exactly the failure mode the rule intends to prevent.
Suggested fix: Expand the final sentence to cover both patterns, e.g.: "This applies to every *_mtp.sh script AND to any script where IS_MTP=true is set (the multi-node AMD MTP pattern), regardless of model, precision, or runner."
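One way a custom wrapper could honor the expanded wording is a pre-flight check that treats a run as MTP under either pattern. This is a hypothetical Bash sketch: `is_mtp_run`, `has_chat_template_flag`, and `check_mtp_args` are illustrative names that do not exist in benchmark_lib.sh, and the check only inspects an argument list rather than calling run_benchmark_serving.

```shell
#!/usr/bin/env bash
# Hypothetical pre-flight check illustrating the expanded rule; these helpers
# are NOT part of benchmark_lib.sh.

# Returns 0 when the rule applies: the script follows the *_mtp.sh naming
# convention, or the environment marks the run as MTP via IS_MTP=true.
is_mtp_run() {
  local script_name
  script_name="$(basename "$1")"
  case "$script_name" in
    *_mtp.sh) return 0 ;;
  esac
  [ "${IS_MTP:-}" = "true" ]
}

# Returns 0 when --use-chat-template appears among the given arguments.
has_chat_template_flag() {
  local arg
  for arg in "$@"; do
    [ "$arg" = "--use-chat-template" ] && return 0
  done
  return 1
}

# Usage: check_mtp_args <script-name> [benchmark args...]
# Fails (return 1) if an MTP run is about to benchmark without the flag.
check_mtp_args() {
  local script="$1"
  shift
  if is_mtp_run "$script" && ! has_chat_template_flag "$@"; then
    echo "error: $script is an MTP run but lacks --use-chat-template" >&2
    return 1
  fi
  return 0
}

# Demo: a disagg-style name counts as MTP once IS_MTP=true is set.
IS_MTP=true
check_mtp_args dsr1_fp8_mi355x_sglang-disagg.sh --use-chat-template && echo "ok"
```

Running such a check on the wrapper's own name and argument list before launch would make a missing flag fail fast instead of silently lowering acceptance rate.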
Summary
Adds a one-liner to AGENTS.md under Code Conventions → Bash stating that every MTP (`*_mtp.sh`) script MUST pass `--use-chat-template` to `run_benchmark_serving`, no exceptions.
Why
EAGLE-style speculative decoding is trained against chat-formatted inputs. When the benchmark client sends raw prompts, the draft model's acceptance rate silently collapses and the numbers no longer reflect realistic serving. Several recent MTP PRs shipped without the flag (#1074, #1075, #1076, #1077), so codifying this in AGENTS.md keeps future MTP additions from hitting the same foot-gun.
Test plan
🤖 Generated with Claude Code