feat(autorun): playbook-level HITL gate markers (#232) #1015
Add support for <!-- MAESTRO:HITL reason="..." artifact="..." --> markers in playbook documents. When a marker is positioned before an unchecked task with no intervening checked task, Auto Run pauses via the existing error-resolution infrastructure: the AutoRunErrorBanner shows "Resume" and "Abort Run" buttons, and a sticky toast surfaces the reason + artifact so the user can review the work-in-progress before approving. Once the user ticks the human-approval checkbox, the marker is treated as consumed and Auto Run continues to the next gate or end-of-doc. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Walkthrough
The PR implements HITL (human-in-the-loop) gates for playbook auto-run mode. It adds markdown marker parsing (
Changes
HITL Gates for Auto-Run Mode
Sequence Diagram
sequenceDiagram
participant Processor as TaskProcessor
participant Detector as GateDetector
participant Handler as PauseHandler
participant UI as ErrorUI
Processor->>Detector: findPendingHitlGate(docContent)
Detector-->>Processor: HitlGate or null
alt Gate Found
Processor->>Processor: Create AgentError type hitl_gate
Processor->>Handler: pauseBatchOnError(error)
Handler->>UI: Show Resume Abort controls
UI-->>Handler: User resumes
Handler-->>Processor: Continue
Processor->>Processor: Re-read docContent
end
Processor->>Processor: Process next task
Estimated Code Review Effort: 3 (Moderate) | ~22 minutes
Greptile Summary
This PR adds playbook-level HITL (human-in-the-loop) gate support to Auto Run. When a
Confidence Score: 3/5
The runner integration has two correctness issues that will visibly misbehave in the intended happy path before fixes are applied.
The parser and type extension are solid and well-tested. The sticky toast is emitted on every gate loop iteration, so repeated Resume-without-tick stacks undismissed toasts. The document re-read added for HITL resume is placed in the generic error-resolution fall-through, running for all pre-existing error types and adding an unguarded async I/O call that can propagate and crash the batch runner session.
src/renderer/hooks/batch/internal/useBatchRunner.ts: the notifyToast placement and the unconditional document re-read in the error-resolution block.
Important Files Changed
Sequence Diagram
sequenceDiagram
participant L as TaskLoop
participant P as findPendingHitlGate
participant B as pauseBatchOnError
participant T as notifyToast
participant U as Human
participant R as errorResolutionRef
L->>P: docContent
alt gate pending
P-->>L: HitlGate
L->>B: pauseBatchOnError sets errorResolutionRef
L->>T: "notifyToast dismissible=true"
L->>L: continue
L->>R: await promise
alt Resume without ticking checkbox
U-->>R: resolve resume
L->>L: re-read doc all resume types
L->>P: unchanged docContent
P-->>L: same HitlGate
L->>B: pauseBatchOnError again
L->>T: NEW sticky toast accumulates
else Resume after ticking checkbox
U-->>R: resolve resume
L->>L: re-read doc checkbox now checked
P-->>L: null gate consumed
L->>L: processTask normal flow
else Abort
U-->>R: resolve abort
L->>L: break
end
else no gate
P-->>L: null
L->>L: processTask
end
Addresses two P1 review comments on PR RunMaestro#1015:
1. Stacking sticky toasts on Resume-without-tick. pauseBatchOnError re-fires on every loop iteration that re-detects the same gate, so clicking Resume without ticking the approval checkbox previously queued a fresh "Auto Run paused for review" toast each iteration. Track the active gate's line number via a per-run local; only emit the toast the first time we pause at a given line. Cleared when a new gate fires on a different line, or when no gate is detected.
2. Document re-read silently applied to ALL resume types. The re-read was inserted into the generic error-resolution fall-through, so agent_crashed / permission_denied / etc. resumes started re-reading from disk too; uncaught I/O failures would crash the run for a case that previously recovered gracefully. Now the re-read fires only when the resume came from a HITL pause (gated on `activeHitlGateLine !== null`) and is wrapped in try/catch with a warn-and-continue fallback.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Pushed On the failing CI on |
Thanks for the contribution, @chr1syy — this is a really nicely scoped PR. 🙌 Reviewed the changes and the response to Greptile's two P1s, and I'm happy with it:
You're correct that the failing Labeling |
Closes #232
Summary
Adds support for playbook-level HITL (human-in-the-loop) gates in Auto Run (Option A from the issue). Authors mark review points in playbook documents with a `<!-- MAESTRO:HITL reason="..." artifact="..." -->` marker.
When Auto Run reaches a marker that is positioned before an unchecked task (with no intervening checked task), it pauses, surfaces a sticky toast with the reason + artifact, and the user gets the existing AutoRunErrorBanner with Resume and Abort Run buttons. Once the user ticks the human-approval checkbox, the marker is treated as consumed and the run continues to the next gate or end-of-doc.
This eliminates the orchestration burden that split playbooks (`SpecFlow_1_Specify` → `SpecFlow_2_Plan` → `SpecFlow_3_Implement`) currently put on users: the workflow now manages itself between phases, while the human stays in the driver's seat at decision points.
Why
From the issue: in interactive mode, review gates happen naturally because you're watching. In Auto Run they get skipped, so 24-hour parallel sessions blow past review points and the agent has already done significant downstream work by the time you notice. Maestro should pause at natural review checkpoints (after spec, after plan, after impl, after tests) and only continue once a human approves.
Per the maintainer's question — "why not split playbooks into phases?" — the reporter's response was that split playbooks invert the relationship: the human ends up managing sequencing, context handoffs, and "which playbook is next?" instead of just making approval decisions. Integrated gates keep one workflow, one context, with human-as-decider rather than human-as-orchestrator.
How it works
The parser `findPendingHitlGate(content)` in `batchUtils.ts` walks the document line-by-line:
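A minimal sketch of what such a line-by-line walk could look like, based only on the behavior described in this PR (a marker is pending when an unchecked task follows it with no checked task in between). The `HitlGate` field names, the regexes, and the `parseAttrs` helper are assumptions for illustration, not the code in `batchUtils.ts`.

```typescript
// Hypothetical shape; the real HitlGate type in the PR may differ.
interface HitlGate {
  reason: string;
  artifact: string;
  line: number; // 1-based line number of the marker
}

// Assumed patterns for the marker and for markdown task checkboxes.
const MARKER_RE = /<!--\s*MAESTRO:HITL\b(.*?)-->/;
const CHECKED_RE = /^\s*[-*]\s+\[[xX]\]/;
const UNCHECKED_RE = /^\s*[-*]\s+\[ \]/;

// Collect key="value" pairs from the inside of a marker comment.
function parseAttrs(raw: string): Record<string, string> {
  const attrs: Record<string, string> = {};
  const re = /(\w+)="([^"]*)"/g;
  let m: RegExpExecArray | null;
  while ((m = re.exec(raw)) !== null) {
    attrs[m[1]] = m[2];
  }
  return attrs;
}

function findPendingHitlGate(content: string): HitlGate | null {
  let pending: HitlGate | null = null;
  const lines = content.split(/\r?\n/);
  for (let i = 0; i < lines.length; i++) {
    const marker = MARKER_RE.exec(lines[i]);
    if (marker) {
      const attrs = parseAttrs(marker[1]);
      pending = {
        reason: attrs.reason ?? "",
        artifact: attrs.artifact ?? "",
        line: i + 1,
      };
      continue;
    }
    if (CHECKED_RE.test(lines[i])) {
      // A checked task after the marker consumes it.
      pending = null;
    } else if (UNCHECKED_RE.test(lines[i]) && pending !== null) {
      // First unchecked task after an unconsumed marker: the gate is pending.
      return pending;
    }
  }
  return null;
}
```

In this sketch a marker with no task after it never fires, which matches the "continue to the next gate or end-of-doc" wording above.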
In `useBatchRunner.ts`, the runner calls the parser at the top of the inner task loop (after the existing error-resolution `await` block, before `processTask`). On hit it synthesizes a recoverable `AgentError { type: 'hitl_gate' }` and routes through the existing `pauseBatchOnError` — so the AutoRunErrorBanner UI, the PAUSED_ERROR state machine entry, the `errorResolutionRefs` promise chain, and the Resume/Skip/Abort actions all just work without new UI plumbing. The toast uses `dismissible: true` so the user must click it (review gates shouldn't auto-dismiss).
After Resume, the loop re-reads the doc so the next HITL check sees fresh content — if the user ticked the box, the marker is now consumed and we continue; if they didn't, we re-pause.
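The pause and re-read behavior, including the two follow-up fixes (toast shown once per gate line, re-read only after a HITL pause and guarded by try/catch), can be sketched roughly as below. This is an illustration under assumptions: `decideGatePause`, `rereadAfterResume`, `GateRef`, and `PauseDecision` are hypothetical names, and only `activeHitlGateLine` is taken from the commit message; the actual logic lives inline in `useBatchRunner.ts`.

```typescript
// Hypothetical minimal view of a detected gate.
interface GateRef {
  line: number;
  reason: string;
}

interface PauseDecision {
  pause: boolean; // route through the pause-on-error path
  showToast: boolean; // emit the sticky toast for this pause
  activeHitlGateLine: number | null; // per-run tracking value to carry forward
}

// Decide what to do at the top of a loop iteration, given the parser result
// and the line of the gate we last paused at (null if none).
function decideGatePause(
  gate: GateRef | null,
  activeHitlGateLine: number | null
): PauseDecision {
  if (gate === null) {
    // No gate detected: clear tracking and process the task normally.
    return { pause: false, showToast: false, activeHitlGateLine: null };
  }
  return {
    pause: true,
    // Only toast the first time we pause at a given marker line.
    showToast: gate.line !== activeHitlGateLine,
    activeHitlGateLine: gate.line,
  };
}

// Re-read the document only when the resume came from a HITL pause.
// On I/O failure, warn and continue with the content already in memory.
async function rereadAfterResume(
  activeHitlGateLine: number | null,
  currentContent: string,
  readDoc: () => Promise<string>
): Promise<string> {
  if (activeHitlGateLine === null) {
    return currentContent; // other error types keep their previous behavior
  }
  try {
    return await readDoc();
  } catch (err) {
    console.warn("HITL re-read failed; continuing with cached content", err);
    return currentContent;
  }
}
```

Keeping the decision as a pure function of the parser result and the tracked line makes the Resume-without-tick case easy to reason about: the same gate line pauses again but does not produce a second toast.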
Changes
No new UI components. No new IPC. No main-process changes.
Out of scope (deferred)
Parser test coverage
Test plan