
feat: idd-verify orchestration playbook (5 Agents + Recovery Protocol)#73

Merged: kiki830621 merged 3 commits into main from idd/52-verify-orchestration-playbook on May 11, 2026


@kiki830621
Contributor

Refs #52
Refs #70

Summary

Refactor of the /idd-verify Step 2 spawn mechanism: from TeamCreate (5 teammates with Read/Grep/Glob/Bash tools, no Write) to 5 parallel Agent(subagent_type=general-purpose) calls + 1 background Bash codex exec. A NEW Step 2.5 Recovery Protocol section handles the file-existence check, retry with context re-paste, and the coordinator self-review fallback.

Why

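When /idd-verify ran on #47, it spawned 5 Claude reviewer agents, yet none of them produced findings; verify degraded from the expected 6-AI ensemble to 1-AI (Codex only). The root cause has three layers (per the #52 issue body):

  1. subagent_type=Explore has no Write tool → the agent cannot write its findings file
  2. After an idle/wake cycle the agent no longer holds the prompt context → a SendMessage retry stays silent unless the prompt is explicitly re-pasted
  3. The prompt did not state the output mechanism (file vs SendMessage reply) → the agent idles and never replies on its own

In addition, several verify cycles in this session hit #70 (TeamDelete fails on idle teammates, leaving the team behind).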

Side effect — #70 structurally dissolves

Switching from TeamCreate to standalone Agent calls structurally removes the TeamDelete failure surface: no team exists to clean up. #70 will be closed in a separate /idd-close cycle citing this Plan.

Changes (1 commit)

Files Changed

  • plugins/issue-driven-dev/skills/idd-verify/SKILL.md (+259/-104) — Step 0 bootstrap rename + Step 2 engine note + Step 2 spawn restructure + NEW Step 2.5 Recovery Protocol + Step 3 prose update + Step 4 templates sweep + frontmatter allowed-tools cleanup

Plan tier decisions (D1-D5 from approved plan)

| D# | Decision | Why |
|----|----------|-----|
| D1 | TeamCreate teammates → 5 standalone Agent(subagent_type=general-purpose) calls | Existing teammate tools field lacks Write: same failure mode as Explore. Switching to general-purpose Agent gives full tool set + dissolves #70 |
| D2 | NEW Step 2.5 Recovery Protocol section (not inline Step 3 preamble) | Recovery is a per-reviewer state check, merge is a per-finding scan; different granularity, deserves a dedicated section |
| D3 | Inline in idd-verify SKILL.md (not separate references/verify-orchestration.md) | ~80 lines of additions, below the extraction threshold (~150); verify orchestration is intrinsic to idd-verify |
| D4 | ALWAYS re-paste original full prompt on retry | #47 confirmed context is lost across idle/wake; re-paste ~500 tokens ≪ second-idle coordinator-fallback cost |
| D5 | Devil's Advocate polling loop on sibling findings files (max 30 iter × 5s = 2.5min) | Standalone Agent calls lack TeamCreate's wait_for_idle primitive; polling-on-file gives deterministic semantics |
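
The polling loop in D5 might look roughly like the sketch below. The function name `wait_for_siblings`, the `MAX_ITER` and `INTERVAL` variables, and the example file paths are illustrative assumptions, not taken from SKILL.md; only the 30 × 5s budget comes from the plan.

```shell
# Sketch only: wait until every sibling findings file exists and is non-empty.
# MAX_ITER and INTERVAL are hypothetical knobs; the defaults mirror the
# plan's 30 x 5s budget (2.5 min).
MAX_ITER="${MAX_ITER:-30}"
INTERVAL="${INTERVAL:-5}"

wait_for_siblings() {
  local i=0 f missing
  while [ "$i" -lt "$MAX_ITER" ]; do
    missing=0
    for f in "$@"; do
      [ -s "$f" ] || missing=1   # -s: file exists and has size > 0
    done
    if [ "$missing" -eq 0 ]; then
      return 0                   # all sibling files are present
    fi
    sleep "$INTERVAL"
    i=$((i + 1))
  done
  return 1                       # budget exhausted: caller handles the timeout
}
```

A caller could run `wait_for_siblings /tmp/findings_logic.md /tmp/findings_security.md` (hypothetical paths) and treat a non-zero return as the timeout case.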

Verification — first-real-use validation track

There is no automated test harness for orchestration; it can only be validated by the first real /idd-verify invocation after merge. Manual smoke matrix from the Plan:

| # | Setup | Expected |
|---|-------|----------|
| 1 | /idd-verify #X first invocation post-merge | 5 Agent calls return + 5 findings files non-empty + Bash codex completes + 6-source merged master |
| 2 | Force one Agent idle (if inducible) | Recovery Protocol fires: retry + second-idle → coordinator self-review + process gap noted |
| 3 | All 5 siblings present, Devil's Advocate polling succeeds | DA writes findings citing 4 sibling files |
| 4 | TeamCreate/TeamDelete absent from session log post-restructure | confirms #70 dissolved |
| 5 | Cluster-PR mode (multi-issue) verify | per-issue findings sections appear correctly |

Checklist


🤖 Generated by /idd-implement on PR path (Plan tier approved). Do NOT add Closes #52 — IDD discipline requires manual /idd-close after merge.

…otocol (Refs #52, resolves #70 structurally)

Per #52 Plan tier:

- Step 0 TaskList bootstrap: rename launch_parallel_reviewers description
  to reflect new 6 tool call pattern (5 Agent + 1 Bash codex, no TeamCreate).
  Add NEW recovery_protocol task entry.
- Step 2 engine note preamble: explicit warning that subagent_type=Explore is
  unsuitable for verify (read-only, no Write tool; #47 incident proof). Document
  why TeamCreate (pre-v2.59.0 model) is rejected: Write-missing teammate tools +
  wait_for_idle context-loss + #70 cleanup gap.
- Step 2 spawn: replace TeamCreate teammates with 5 parallel
  Agent(subagent_type=general-purpose) calls. Each prompt now mandatorily
  contains (a) explicit findings file output path, (b) 'DO NOT idle without
  producing output' rule, (c) retry-context-re-paste hint.
- Step 2 Devil's Advocate sequencing: bash polling loop on sibling findings
  files (max 30 iter × 5s = 2.5min timeout) replaces TeamCreate
  wait_for_idle primitive. Timeout fallback writes 'skipped: timeout' findings.
- NEW Step 2.5 Recovery Protocol section between Step 2 spawn and Step 3
  merge: per-role file existence check + SendMessage/spawn-fresh retry with
  FULL context re-paste (never assume context survived idle/wake) +
  second-idle coordinator self-review fallback + explicit 'Process Gaps'
  section in master report (no silent engine degradation).
- Step 3 prose updates: 'Agent Team and Codex' → '5 reviewer Agents and
  Codex'; source tag '[team:logic+codex]' → '[agents:logic+codex]'.
- Step 4 master report templates (both local/branch and PR mode): sweep
  'Agent Team (5 Claude reviewers)' → '5 general-purpose Agents (Claude
  reviewers, file-based output)'.
- Frontmatter: remove TeamCreate from allowed-tools (no longer used).
- Verification architecture (驗證架構) ASCII tree: update 'Agent Team' label.
- Engine: team alias: keep CLI alias for backward compat, document underlying
  as 5 standalone Agent calls.
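
The per-role file existence check that opens the Recovery Protocol could be sketched as follows. This is a minimal illustration, not the SKILL.md text: the function name `missing_roles` and the `findings_<role>.md` naming are hypothetical.

```shell
# Sketch only: print every reviewer role whose findings file is missing
# or empty. The findings_<role>.md layout is an assumed naming scheme.
missing_roles() {
  local dir="$1" role
  shift
  for role in "$@"; do
    [ -s "$dir/findings_${role}.md" ] || printf '%s\n' "$role"
  done
}
```

Each role printed by, say, `missing_roles /tmp/verify_52 logic security regression devils-advocate` would then get a retry with the full prompt re-pasted; a second miss would fall through to coordinator self-review and a 'Process Gaps' entry.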

Side effect: #70 (TeamDelete cleanup gap on idle teammates) is structurally
dissolved by this Plan — no team to delete since reviewer Agents are
standalone calls that return to coordinator after completion. #70 will be
closed in a separate /idd-close cycle citing this Plan.

Refs #52
Refs #70
…stence, team→agents sweep (Refs #52)

Per /idd-verify --pr 73 round 1 codex findings:

- P1.1 — Devil's Advocate timeout was silently treated as valid review:
  DA polling loop wrote non-empty timeout content + exit 0; Step 2.5a only
  checked file non-empty (-s test). Fix: DA writes sentinel header
  '[STAGE 2.5 RECOVERY: DEVILS_ADVOCATE_TIMEOUT_<n>/4]' on timeout;
  Step 2.5a detects sentinel via head -1 + grep, routes to retry/fallback
  same as missing file case.
- P1.2 — Recovery Protocol Step 2.5b retry used
  '/tmp/verify_<N>_prompt_<role>.md' but Step 2 never instructed coordinator
  to save those prompts. Added explicit pre-spawn prompt persistence note
  in Step 2a with cat > tmpfile <<EOF pattern. Coordinator now saves all 5
  prompts before invoking 5 parallel Agent calls.
- P1.3 — Stale 'team:' source tags in 3 master report templates
  (line ~628-630, 658, 668, 688-690) contradicted Step 3 prose update.
  Swept all 'team:logic+codex' / 'team:security' / 'team:regression' /
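  'team:devils-advocate' → 'agents:*'. Also updated the architecture ASCII
  tree ('看不到 team 的討論', "cannot see the team's discussion" →
  '看不到其他 reviewer Agents 的 findings 檔', "cannot see the other reviewer
  Agents' findings files") and the iron rule (鐵律) ('看不到 team 的討論' →
  '看不到 5 reviewer Agents 的 findings 檔', "cannot see the 5 reviewer
  Agents' findings files").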
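
The sentinel check from P1.1 can be illustrated with a small classifier. This is a sketch under assumptions: `findings_state` is a hypothetical helper, and it uses `head -n 1` where the commit message writes `head -1`.

```shell
# Sketch only: classify a findings file as ok / missing / timeout.
# A non-empty file whose first line carries the timeout sentinel is not a
# real review, so a plain -s test is not enough.
findings_state() {
  local file="$1"
  if [ ! -s "$file" ]; then
    echo missing
  elif head -n 1 "$file" | grep -q '^\[STAGE 2\.5 RECOVERY: DEVILS_ADVOCATE_TIMEOUT_'; then
    echo timeout
  else
    echo ok
  fi
}
```

Both `missing` and `timeout` would be routed to the same retry/fallback path.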

Refs #52
…ting (Refs #52)

Per /idd-verify --pr 73 round 2 codex findings:

- P1.1 — sentinel file persists past Step 2.5a, downstream retry (-s) and
  fallback (! -s) pass without action: sentinel content IS non-empty, so
  retry polling would see it and exit; fallback ! -s test fails. Fix: after
  detecting sentinel in 2.5a, rm -f the file. Now retry/fallback correctly
  see role as missing and proceed.
- P1.2 — Devil's Advocate timeout writer had bash quoting syntax error:
  'Devil\'s' inside single-quoted printf format cannot escape apostrophe
  (bash single quotes don't honor backslash). Rewrote to printf '%s\n\n%s\n'
  with double-quoted format-string args; apostrophe lives inside double quotes
  where no escaping is needed. Empirical bash smoke: PASS (sentinel writes
  + grep detects).
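
Both round-2 fixes can be sketched together. The function names, the message wording, and the reading of `<n>` as a caller-supplied count are assumptions for illustration; only the sentinel header format, the `printf '%s\n\n%s\n'` shape, and the `rm -f` step come from the commit message.

```shell
# Sketch only: write the timeout sentinel with a quoting-safe printf, then
# clear it during recovery so later -s checks see the role as missing.
write_timeout_sentinel() {
  # $1 = findings file, $2 = value for <n> (hypothetical parameter)
  # The format string is single-quoted; the arguments holding an apostrophe
  # are double-quoted, so no backslash escaping is needed.
  printf '%s\n\n%s\n' \
    "[STAGE 2.5 RECOVERY: DEVILS_ADVOCATE_TIMEOUT_$2/4]" \
    "Devil's Advocate skipped: timeout waiting for sibling findings." \
    > "$1"
}

clear_if_sentinel() {
  # Remove the file only when its first line is the sentinel header.
  if head -n 1 "$1" 2>/dev/null | grep -q 'DEVILS_ADVOCATE_TIMEOUT_'; then
    rm -f "$1"
  fi
}
```

After `clear_if_sentinel`, a timed-out role looks identical to a role that never wrote a file, which is what the retry and fallback tests expect.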

Refs #52
@kiki830621
Contributor Author

Verify Report — PR #73

Engine

Codex-only degraded mode (Anthropic API rate-limit blocked Claude reviewer ensemble throughout this session). Engine: Codex CLI (gpt-5.5, xhigh reasoning), 3 verify rounds.

Process Gap (per new Step 2.5d, dogfooded immediately on first-real-use): 5 Claude reviewer Agents were NOT spawned this round — Anthropic API limit persisted from #45/#51/#55 verify cycles into #52 verify. Master report still produced (codex carries), but the new Step 2.5 Recovery Protocol's coordinator-self-review-fallback pathway was NOT exercised in production. First-real-use validation deferred until API capacity restored OR fresh session retries /idd-verify --pr 73.

Aggregate

PASS — 0 blocking, 0 follow-up after 3 codex verify rounds.

Scope coverage

  • PR refs: #47 (historical incident reference), #52 (primary), #70 (structurally resolved)
  • Verified scope: #52 + #70 cross-link

Verify history

| Round | Verdict | New P1 |
|-------|---------|--------|
| R1 | FAIL | 3 (DA timeout silent / prompt file missing / stale team: source tags) |
| R2 | FAIL | 2 (sentinel file persists past retry, only -s checked / DA printf bash quoting 'Devil\'s' syntax error) |
| R3 | PASS | 0 |

#52 — idd-verify orchestration playbook

Requirements coverage: 6/6 addressed.

Plan tier decisions (D1-D5) all implemented:

  • D1: TeamCreate → 5 standalone Agent(subagent_type=general-purpose) calls ✓
  • D2: NEW Step 2.5 Recovery Protocol section ✓
  • D3: Inline in idd-verify SKILL.md (no separate reference doc) ✓
  • D4: ALWAYS re-paste full prompt on retry ✓ (per-role prompt persistence in /tmp/verify_${NUMBER}_prompt_<role>.md)
  • D5: Devil's Advocate polling-on-files (max 30 iter × 5s) ✓ + sentinel marker on timeout
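
The per-role prompt persistence behind D4 might be sketched as below. `save_prompt`, the `PROMPT_DIR` override, and the placeholder prompt wording are assumptions; the `/tmp/verify_<N>_prompt_<role>.md` path and the `cat > file <<EOF` pattern follow the report.

```shell
# Sketch only: persist a reviewer prompt before spawning so that a retry can
# re-paste the full text. PROMPT_DIR is a hypothetical override; it defaults
# to /tmp as in the report. The prompt body is a placeholder.
save_prompt() {
  local number="$1" role="$2" file
  file="${PROMPT_DIR:-/tmp}/verify_${number}_prompt_${role}.md"
  cat > "$file" <<EOF
You are the ${role} reviewer for #${number}.
Write your findings to a file; DO NOT idle without producing output.
EOF
  printf '%s\n' "$file"
}
```

On retry, the coordinator would read the saved file back and send its full contents again instead of assuming the agent still holds the original context.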

Cumulative changes:

  • 63d2474 feat(idd-verify): Step 2 spawn restructure + NEW Step 2.5 Recovery Protocol
  • d149b81 fix: round-1 P1 fixes — DA timeout sentinel, prompt persistence, team→agents sweep
  • c3fcfb4 fix: round-2 P1 fixes — sentinel deletion + DA printf quoting

#70 — TeamDelete cleanup gap (structurally resolved)

Resolution mechanism: PR #73 removes TeamCreate from idd-verify Step 2. No team is created → no TeamDelete failure surface exists. The bug surface that #70 reports is now structurally impossible in idd-verify (the originating skill where #70 surfaced this session via verify-pr58 attempt).

Recommendation: /idd-close #70 after this PR merges, citing PR #73's c342aa2-c3fcfb4 commits as resolution. The closing summary should note "resolved as side effect by #52 Plan implementation — no team to delete = no cleanup gap to fix".

Recommendation

Ready to merge + invoke /idd-close #52 after merge + /idd-close #70 separately.

Process Gap Caveat

This PR's verify was conducted under codex-only degraded mode. The NEW 5-Agent + Codex orchestration the PR introduces has NOT been exercised in production yet, because the verify itself could not use the new mechanism under the Anthropic API limit. First-real-use validation will occur on the NEXT /idd-verify invocation in a fresh session with restored API capacity. This follows the Plan's "first-real-use validation track" disclaimer (orchestration cannot be tested without an actual verify to run).

Should the next real-use surface new bugs, they will be filed as standard /idd-issue follow-ups citing PR #73 as origin.


Cumulative findings: 5 P1 caught + fixed across 3 codex rounds (zero P0 throughout, on a structural change of this magnitude).

@kiki830621 kiki830621 merged commit c31b097 into main May 11, 2026
@kiki830621 kiki830621 deleted the idd/52-verify-orchestration-playbook branch May 11, 2026 22:53
kiki830621 added a commit that referenced this pull request May 19, 2026
* fix: drop literal 'Closes #N' from pr-flow.md canonical PR-body template (Refs #87 #74)

The canonical PR-body template embedded `Closes #${N}` in the anti-trailer
warning. Skills (idd-implement, idd-all) copied this pattern; on heredoc
substitution it becomes a real `Closes #<num>` that GitHub auto-close matches
context-blind, bypassing /idd-close. Reword to a digit-free form so no
`<keyword> #<number>` substring can exist.

Refs #87 #74

* fix: drop literal 'Closes #N' from idd-implement/idd-all PR-body templates (Refs #74 #87)

idd-implement Step 5.5 and idd-all Phase 5 PR-body heredocs substituted
${NUMBER}/${N} into the anti-trailer warning, producing a real `Closes #<num>`
that GitHub auto-close matched on merge — bypassing /idd-close's gate. Confirmed
incidents: #559, che-apple-mail-mcp#99, #73, #56. Reword all to the digit-free
form. idd-all-chain already used literal '#N' (safe) but is aligned to identical
wording so the safe pattern is uniform and cannot be re-parameterized.

Refs #74 #87

* feat: add idd-verify Step 0.8 PR-body auto-close-trailer scan gate (Refs #74 #87)

Defence-in-depth gate complementing the #87/#74 template rewords. In PR mode,
scans the rendered PR body for GitHub auto-close trailers
(closes|fixes|resolves #<digit>) and warns — these would auto-close the issue
at merge, bypassing /idd-close's checklist gate. Warn-only (a PR body may
legitimately quote the keywords in prose). Adds matching scan_pr_body_trailers
entry to the Step 0 bootstrap TaskList.

Refs #74 #87

* fix: idd-verify Step 0.8 regex — exclude /idd-close skill-invocation false positive (Refs #74 #87)

The initial Step 0.8 regex used \b word boundaries, which matched the 'close'
inside '/idd-close #N' — an IDD skill invocation instruction, not a GitHub
auto-close trailer. GitHub treats 'idd-close' as one hyphenated token and does
NOT auto-close it (verified: PR #94 closingIssuesReferences is empty despite
its body containing '/idd-close #87 #74'). Replace \b prefix with
(^|[^-/[:alnum:]]) so the keyword must not be preceded by '-', '/', or an
alphanumeric — precisely matching GitHub's behavior while still catching
context-blind hits like (Closes #N) / **Closes #N**.
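
The effect of the `(^|[^-/[:alnum:]])` prefix can be checked with a short sketch. The function name `scan_trailers` and the exact keyword alternation are assumptions (the keywords listed are GitHub's documented closing keywords); the actual Step 0.8 regex is not shown in this commit message.

```shell
# Sketch only: warn-only scan of a PR body (read from stdin) for text that
# looks like a GitHub auto-close trailer. The prefix group rejects keywords
# preceded by '-', '/', or an alphanumeric, so '/idd-close #87' is skipped.
scan_trailers() {
  grep -inE '(^|[^-/[:alnum:]])(close[sd]?|fix(e[sd])?|resolve[sd]?) #[0-9]+'
}
```

For example, `printf '%s\n' '(Closes #12)' | scan_trailers` reports a hit, while `printf '%s\n' 'Run /idd-close #87' | scan_trailers` reports nothing. A pattern of this shape still misses the colon form such as 'Closes: #87'.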

Found by dogfooding the new gate against this PR's own body.

Refs #74 #87

* fix: idd-verify Step 0.8 — use closingIssuesReferences instead of hand-rolled regex (Refs #74 #87)

R1 verify (/idd-verify --pr 94) found the Step 0.8 regex missed the colon
form 'Closes: #87' — a GitHub-documented auto-close form. Reimplementing
GitHub's keyword parser by regex is inherently fragile (colon form, cross-repo
form, issue-URL form all need separate handling, and cross-repo behavior is
itself disputed).

Replace the regex with GitHub's own authoritative parse: gh pr view --json
closingIssuesReferences lists exactly which issues the PR will auto-close on
merge — covering every trailer form GitHub honors, with zero false positive
(it IS GitHub's determination) and zero parser-reimplementation risk. This
supersedes the regex from 3ba7435 + the regex false-positive fix from 5831128;
/idd-close #N skill invocations naturally never appear in closingIssuesReferences
so no special exclusion is needed.

Refs #74 #87

* fix: idd-verify Step 0.8 — drop orphan regex doc, surface gh-failure, use .url (Refs #74 #87)

R2 verify (/idd-verify --pr 94 round 2) found 3 cleanup items, all in Step 0.8:

- DEF-1 (6-AI unanimous, blocking): the R2 regex->closingIssuesReferences
  rewrite deleted the regex code but left the 'Regex 設計' (Regex design)
  paragraph documenting it: orphan dead doc contradicting the new prose. Deleted.
- L-1: '2>/dev/null || true' conflated a gh failure with a clean PR (silent
  fail-open). Switched to 'if CMD; then ... else ...' so a gh failure prints
  a 'Step 0.8 skipped' note instead of silently passing.
- L-2: softened the 'zero false negative by definition' overclaim — note that
  closingIssuesReferences is eventually-consistent (settled by verify time).
- L-3 (Codex polish): query .url instead of bare .number so a cross-repo
  close ref is unambiguous in the warning output.
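
The resulting Step 0.8 shape (authoritative `closingIssuesReferences` query, explicit handling of a `gh` failure, `.url` output) might look like the sketch below. It assumes the `gh` CLI is installed and authenticated; `scan_closing_refs` and the message wording other than 'Step 0.8 skipped' are hypothetical.

```shell
# Sketch only: ask GitHub which issues a PR would auto-close on merge.
# 'if CMD; then ... else ...' keeps a gh failure distinct from a clean PR.
scan_closing_refs() {
  local pr="$1" refs
  if refs=$(gh pr view "$pr" --json closingIssuesReferences \
                 --jq '.closingIssuesReferences[].url'); then
    if [ -n "$refs" ]; then
      printf 'WARNING: merging PR #%s would auto-close:\n%s\n' "$pr" "$refs"
    else
      printf 'Step 0.8: no auto-close references on PR #%s\n' "$pr"
    fi
  else
    # Not fail-open: say the gate did not run instead of passing silently.
    printf 'Step 0.8 skipped: gh pr view failed for PR #%s\n' "$pr"
  fi
}
```

Calling `scan_closing_refs 94` would print one of the three messages; the gate stays warn-only in every branch.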

Refs #74 #87

Development

Successfully merging this pull request may close these issues.

bug: /idd-verify ends without confirming that teams were shut down; TeamDelete fails on idle-but-active reviewers + no cleanup SOP

1 participant