
fix(skill/review): enforce parallel agent dispatch for weaker models#3276

Closed
wenshao wants to merge 4 commits into QwenLM:main from wenshao:fix/review-parallel-dispatch

Conversation


@wenshao wenshao commented Apr 14, 2026

Summary

  • Strengthens the /review skill's Step 4 instructions so that qwen3.6-plus (and other models that were previously serializing the review agents) now reliably launches all of them in a single assistant turn.
  • Replaces the one-sentence dispatch instruction with a prominent callout containing a rationale, ✅ CORRECT / ❌ WRONG ASCII examples, an explicit self-check before ending the turn, and a pattern-break ("STOP") for the common failure mode (see the sketch after this list).
  • Surfaces the rule in the top-level "Critical rules" list at the top of the skill, so the model sees it before it reaches Step 4.
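
The PR body doesn't quote the new callout verbatim; a rough sketch of the shape described above (illustrative wording and placeholder agent names, not the actual SKILL.md text) might look like:

```markdown
> **CRITICAL: dispatch ALL 5 review agents in ONE assistant turn.**
> None of the agents depend on another's output; serializing them
> multiplies review latency ~5x for no benefit.
>
> ✅ CORRECT (one turn, five parallel task calls):
>   turn 1: task(A) task(B) task(C) task(D) task(E)
>
> ❌ WRONG (one agent per turn):
>   turn 1: task(A)
>   turn 2: task(B)   <- STOP. If you are about to end a turn with fewer
>                        than 5 task calls, batch the remaining ones now.
>
> Self-check: before ending the turn, count the task tool calls in this
> response. There must be exactly 5.
```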

Why

On qwen3.6-plus, Step 4 of /review was launching the 5 review agents sequentially (one per assistant turn) instead of dispatching them all in a single turn. The original phrasing ("invoking all tools in a single response") was too abstract for the model to consistently follow. Sequential dispatch multiplies review latency ~5× for no benefit, since none of the agents depend on another's output.

The new phrasing uses the concrete-example + anti-pattern + self-check style, which weaker models respond to more reliably, without changing any actual review logic.

Scope

Prompt-only change. This PR edits a single Markdown file (packages/core/src/skills/bundled/review/SKILL.md) that is loaded as a runtime prompt for the /review skill. No TypeScript, no build changes, no tests to run — the change takes effect the next time the skill is loaded.

Test plan

  • Manually run /review <pr-number> on qwen3.6-plus after this lands and confirm all 5 agents launch in the same assistant turn
  • Sanity-check that models which already parallelized correctly (e.g. Claude, GPT-4 class) are unaffected

Strengthen the /review skill's Step 4 instructions so models that
previously serialized the review agents now reliably launch all of them
in a single assistant turn. Add a prominent callout with CORRECT/WRONG
ASCII examples and an explicit self-check, and surface the rule in the
top-level "Critical rules" list so it is seen before Step 4.
@github-actions

📋 Review Summary

This PR strengthens the /review skill's Step 4 instructions to enforce parallel agent dispatch across all models, particularly weaker ones that previously serialized agent launches. The changes are well-targeted, using concrete examples and anti-patterns to guide model behavior. Overall assessment: solid documentation improvement with clear rationale and actionable guidance.

🔍 General Feedback

  • Positive approach: The use of ASCII diagrams (✅ CORRECT / ❌ WRONG) is an excellent pattern for guiding model behavior — concrete examples are far more effective than abstract instructions.
  • Good structure: Moving the parallel dispatch rule to the "Critical rules" section at the top ensures models encounter it before reaching Step 4, which is a smart intervention point.
  • Clear rationale: The "Why it matters" callout explaining the 5× latency multiplication provides necessary context for models to understand the importance.
  • Self-check mechanism: The "verify exactly 5 task tool calls" self-check is a practical guard against the sequential dispatch bug.
  • Single-file change: Focused modification to only the skill documentation — no code changes, minimal risk.

🎯 Specific Feedback

🔵 Low

  • Line 21 (new rule #2): The rule numbering shift (old rule 2 → rule 3) is correct, but consider whether the Step 4 parallel dispatch rule should also be referenced in Step 9's "Create Review API" rule for consistency. Currently Step 9 focuses on the API format but doesn't reinforce the parallel dispatch requirement.

  • Line 141 (CRITICAL callout): The phrase "extremely rare" regarding runtime rejection of parallel tool calls may be unnecessary hedging. Consider simplifying to: "If the runtime genuinely does not support parallel tool calls, fall back to sequential dispatch..." — this removes the speculative frequency assessment.

  • Line 167 (agent prompt guidance): The revised text mentions "under 200 words" but doesn't explain why this matters (to fit all tool calls in one response without hitting output-length limits). The original diff shows this was added, but consider whether weaker models would benefit from an explicit connection: "Keep each agent's prompt short (under 200 words) so all tool calls fit in one assistant turn without hitting output-length limits."

✅ Highlights

  • Excellent use of visual patterns: The ✅/❌ ASCII examples with explicit "STOP" callouts directly address the common failure mode of sequential dispatch.
  • Strong justification: The PR summary clearly explains why the original phrasing failed ("too abstract for weaker models") and how the new approach addresses it ("concrete-example + anti-pattern + self-check style").
  • Comprehensive coverage: The callout addresses multiple failure modes (single-agent sequential, multi-batch sequential) and provides a clear self-check mechanism.
  • Good test plan: The test plan covers all three review scenarios (local diff, PR number, cross-repo URL) and explicitly calls out verification of parallel dispatch behavior.

wenshao added 3 commits April 15, 2026 10:35
Two small changes that together make /review-style parallel multi-agent
flows more reliable on OpenAI-compatible endpoints (primarily DashScope
Qwen code models):

1. Set parallel_tool_calls=true whenever tools are present. Some
   providers and models, notably Qwen code models, default to
   sequential tool dispatch when this flag is unset, which serializes
   multi-agent flows (e.g. the /review skill's Step 4 launching 5 review
   agents in one turn) into one-agent-per-turn round trips. Providers
   that do not recognize this parameter simply ignore it. (See the
   first sketch after this list.)

2. Retry on DashScope's server-side "function.arguments must be in JSON
   format" 400 error. This rejection happens when the model generates a
   tool call whose arguments string cannot be parsed as JSON, a
   non-deterministic model output defect rather than a user error, so
   the same request usually succeeds on retry. Previously 400s were
   never retried; now this specific class flows through the existing
   retryWithBackoff path while all other 400s still fail fast. (See the
   second sketch after this list.)
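
Neither diff is shown inline here; as a rough illustration of change 1, the request construction might look like the sketch below. `ChatRequest` and `withParallelToolCalls` are hypothetical names, not qwen-code's actual API:

```ts
// Illustrative sketch of change 1, not the actual qwen-code client code.
interface ChatRequest {
  model: string;
  messages: unknown[];
  tools?: unknown[];
  parallel_tool_calls?: boolean;
}

// Opt into parallel tool dispatch whenever tools are present. Providers
// that do not recognize the flag simply ignore it, so this is safe to
// set unconditionally for tool-bearing requests.
function withParallelToolCalls(req: ChatRequest): ChatRequest {
  return req.tools && req.tools.length > 0
    ? { ...req, parallel_tool_calls: true }
    : req;
}
```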
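And for change 2, the retry classification could be sketched as follows. `isRetryableDashScopeError` is a hypothetical name, and the 429/5xx cases are assumed pre-existing behavior; the real change plugs into the existing retryWithBackoff helper:

```ts
// Illustrative sketch of change 2, not the actual qwen-code retry code.
function isRetryableDashScopeError(status: number, body: string): boolean {
  // Transient rate-limit and server errors (assumed already retried).
  if (status === 429 || status >= 500) return true;
  // DashScope returns a 400 when the model emits a tool call whose
  // arguments string is not valid JSON. This is a non-deterministic
  // model output defect, so the same request usually succeeds on retry.
  if (
    status === 400 &&
    body.includes('function.arguments must be in JSON format')
  ) {
    return true;
  }
  // All other 400s are genuine client errors and still fail fast.
  return false;
}
```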
@wenshao wenshao marked this pull request as draft April 18, 2026 20:00
@wenshao wenshao closed this Apr 23, 2026
