feat(autorun): playbook-level HITL gate markers (#232) by chr1syy · Pull Request #1015 · RunMaestro/Maestro

chr1syy · 2026-05-18T18:55:54Z

Closes #232

Summary

Adds support for playbook-level HITL (human-in-the-loop) gates in Auto Run — Option A from the issue. Authors mark review points in playbook documents with:

## Step 5: Review Specification

<!-- MAESTRO:HITL reason="Spec ready for review" artifact=".maestro/outputs/SPEC.md" -->

- [ ] Human has reviewed and approved the specification

When Auto Run reaches a marker that is positioned before an unchecked task (with no intervening checked task), it pauses, surfaces a sticky toast with the reason + artifact, and the user gets the existing AutoRunErrorBanner with Resume and Abort Run buttons. Once the user ticks the human-approval checkbox, the marker is treated as consumed and the run continues to the next gate or end-of-doc.

This eliminates the orchestration burden that split playbooks (`SpecFlow_1_Specify` → `SpecFlow_2_Plan` → `SpecFlow_3_Implement`) currently put on users — the workflow now manages itself between phases, while the human stays in the driver's seat at decision points.

Why

From the issue: in interactive mode, review gates happen naturally because you're watching. In Auto Run they get skipped, so 24-hour parallel sessions blow past review points and the agent has already done significant downstream work by the time you notice. Maestro should pause at natural review checkpoints (after spec, after plan, after impl, after tests) and only continue once a human approves.

Per the maintainer's question — "why not split playbooks into phases?" — the reporter's response was that split playbooks invert the relationship: the human ends up managing sequencing, context handoffs, and "which playbook is next?" instead of just making approval decisions. Integrated gates keep one workflow, one context, with human-as-decider rather than human-as-orchestrator.

How it works

The parser `findPendingHitlGate(content)` in `batchUtils.ts` walks the document line-by-line:

Skips fenced code blocks (so docs can include the marker syntax as examples)
Treats checked tasks as "consuming" any pending marker above them — the user already approved
Returns the first marker in a pending chain when the next unchecked task is reached
Returns null when an unchecked task appears before any marker (no gate above it)

In `useBatchRunner.ts`, the runner calls the parser at the top of the inner task loop (after the existing error-resolution `await` block, before `processTask`). On hit it synthesizes a recoverable `AgentError { type: 'hitl_gate' }` and routes through the existing `pauseBatchOnError` — so the AutoRunErrorBanner UI, the PAUSED_ERROR state machine entry, the `errorResolutionRefs` promise chain, and the Resume/Skip/Abort actions all just work without new UI plumbing. The toast uses `dismissible: true` so the user must click it (review gates shouldn't auto-dismiss).

After Resume, the loop re-reads the doc so the next HITL check sees fresh content — if the user ticked the box, the marker is now consumed and we continue; if they didn't, we re-pause.

Changes

`src/shared/types.ts` — add `'hitl_gate'` to the `AgentErrorType` union (1 line).
`src/renderer/hooks/batch/batchUtils.ts` — add `HitlGate` type and `findPendingHitlGate(content)` parser. Reuses the existing fence-aware iteration pattern from `countMarkdownTasks`.
`src/renderer/hooks/batch/internal/useBatchRunner.ts` — call the parser between iterations, synthesize the `AgentError`, route through `pauseBatchOnError`, emit a sticky toast. Re-read doc on resume.
`src/renderer/hooks/batch/useBatchProcessor.ts` — thread `pauseBatchOnError` into the runner's deps.
`src/tests/renderer/hooks/batch/batchUtilsHitl.test.ts` — 13 parser tests.

No new UI components. No new IPC. No main-process changes.

Out of scope (deferred)

Option B from the issue (CLI-based `maestro signal --pause` or file-watching). Option A covers the SpecFlow case cleanly; we can revisit if external workflow tools need it.
Dedicated HITL banner styling (currently reuses the orange/red error banner). The text says "Auto Run Paused" + the reason, which reads correctly for HITL, but a dedicated "Review requested" theme could land in a follow-up.
Marker validation/lint inside the playbook editor.
Telemetry for HITL gate usage.

Parser test coverage

no marker present
no tasks at all
marker precedes the first unchecked task ✅
unchecked task appears before any marker
checked task consumes the marker above it
re-pause on a fresh marker after a previously consumed one
multiple markers in a chain → returns the first
missing artifact attribute
missing reason attribute (falls back to default)
markers inside fenced code blocks are ignored
real marker outside a fenced block is detected when an example exists inside one
records 0-indexed line number
CRLF line endings

Test plan

`npm run lint` — TypeScript across all configs
`npm run lint:eslint` — ESLint on the changed files
`npx prettier --check` — all changed files
`npx vitest run src/tests/renderer/hooks/batch/ src/tests/cli/services/batch-processor.test.ts` — 213/213 passing, including 13 new HITL parser tests
Manual: write a playbook with a HITL marker, start Auto Run, verify it pauses with the AutoRunErrorBanner and sticky toast, tick the human-approval box, click Resume, verify it continues to the next gate
Manual: verify Abort Run from the HITL paused state cleanly stops the run
Manual: verify markers inside fenced code blocks in the playbook docs don't trigger pauses

Summary by CodeRabbit

New Features
- Added human-in-the-loop (HITL) review gate support: batch/auto-run pauses at HITL markers and shows resume/abort controls with a sticky info toast.
- Pauses are deduplicated per marker and re-check state when resuming so only relevant tasks are affected.
Tests
- Added test coverage validating HITL marker detection and handling across common edge cases (line endings, code blocks, consumed vs. pending markers).

Add support for  markers in playbook documents. When a marker is positioned before an unchecked task with no intervening checked task, Auto Run pauses via the existing error-resolution infrastructure: the AutoRunErrorBanner shows "Resume" and "Abort Run" buttons, and a sticky toast surfaces the reason + artifact so the user can review the work-in-progress before approving. Once the user ticks the human-approval checkbox, the marker is treated as consumed and Auto Run continues to the next gate or end-of-doc. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

coderabbitai · 2026-05-18T18:56:12Z

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 86944d0c-3dbb-4aff-a054-a0f5856d735b

📥 Commits

Reviewing files that changed from the base of the PR and between b4c8a18 and f63f371.

📒 Files selected for processing (1)

src/renderer/hooks/batch/internal/useBatchRunner.ts

📝 Walkthrough

Walkthrough

The PR implements HITL (human-in-the-loop) gates for playbook auto-run mode. It adds markdown marker parsing () to detect review points, integrates gate detection into the task processing loop, and pauses auto-run with a recoverable error when a gate is encountered. Upon resume, the runner re-reads document state and continues.

Changes

HITL Gates for Auto-Run Mode

Layer / File(s)	Summary
Gate error type `src/shared/types.ts`	`AgentErrorType` union adds `'hitl_gate'` discriminator for HITL pause errors.
HITL gate detection logic `src/renderer/hooks/batch/batchUtils.ts`, `src/__tests__/renderer/hooks/batch/batchUtilsHitl.test.ts`	Regex, `HitlGate` interface, and `findPendingHitlGate()` scan markdown while skipping code fences, track pending markers across task boundaries, extract `reason`/`artifact` attributes, and return line numbers. Test suite validates marker/task pairing rules, attribute defaults, fence ignoring, and CRLF handling.
Batch runner HITL integration `src/renderer/hooks/batch/internal/useBatchRunner.ts`	`UseBatchRunnerDeps` adds `pauseBatchOnError` callback. Task loop checks for pending gates before each task; when found, creates recoverable `hitl_gate` error, calls pause handler to surface resume/abort UI, logs encounter, posts notification (deduped per marker line), and re-reads document state on HITL-scoped resume.
Processor wiring `src/renderer/hooks/batch/useBatchProcessor.ts`	Passes `pauseBatchOnError` into `useBatchRunner` to complete orchestration layer connectivity.

Sequence Diagram

sequenceDiagram
  participant Processor as TaskProcessor
  participant Detector as GateDetector
  participant Handler as PauseHandler
  participant UI as ErrorUI
  Processor->>Detector: findPendingHitlGate(docContent)
  Detector-->>Processor: HitlGate or null
  alt Gate Found
    Processor->>Processor: Create AgentError type hitl_gate
    Processor->>Handler: pauseBatchOnError(error)
    Handler->>UI: Show Resume Abort controls
    UI-->>Handler: User resumes
    Handler-->>Processor: Continue
    Processor->>Processor: Re-read docContent
  end
  Processor->>Processor: Process next task

Estimated Code Review Effort

🎯 3 (Moderate) | ⏱️ ~22 minutes

Possibly Related PRs

RunMaestro/Maestro#726: Fixes errorPaused session resolution in useBatchHandlers to correctly handle the same pause/resume/abort error UI flow that this PR now triggers for HITL gates.

Poem

🐰 I sniff the markdown line by line,
A tiny gate that says "Pause, be kind."
I tap the human for a look and nod,
They click resume—the program hops abroad,
Hooray! The run continues, all aligned.

🚥 Pre-merge checks | ✅ 5

✅ Passed checks (5 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Title check	✅ Passed	The PR title clearly identifies the main feature: playbook-level HITL gate markers for auto-run mode, matching the substantial changes in batchUtils, useBatchRunner, and related files.
Linked Issues check	✅ Passed	The implementation fulfills all coding objectives from issue `#232`: adds HITL gate detection via findPendingHitlGate, integrates with pauseBatchOnError for pause/resume flow, handles marker syntax with reason/artifact, ignores fenced code blocks, and tracks checked tasks as consuming markers.
Out of Scope Changes check	✅ Passed	All changes are directly scoped to implementing HITL gates: adding type definitions, parser logic, integration with existing error handling, and test coverage. No unrelated modifications detected.
Docstring Coverage	✅ Passed	Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%.

_{✏️ Tip: You can configure your own custom pre-merge checks in the settings.}

✨ Finishing Touches

🧪 Generate unit tests (beta)

Create PR with unit tests

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

greptile-apps · 2026-05-18T19:02:15Z

Greptile Summary

This PR adds playbook-level HITL (human-in-the-loop) gate support to Auto Run. When a  marker precedes an unchecked approval task, the runner pauses via the existing pauseBatchOnError / AutoRunErrorBanner infrastructure and emits a sticky toast. Resuming re-reads the document; if the user checked the approval box, the gate is consumed and the run continues.

batchUtils.ts: adds findPendingHitlGate(content) — a fence-aware, line-by-line parser that returns the first pending gate or null.
useBatchRunner.ts: calls the parser before each processTask, synthesizes a hitl_gate AgentError, routes through pauseBatchOnError, and emits a sticky toast; also re-reads the document after any error resume.
types.ts / useBatchProcessor.ts: extend AgentErrorType with 'hitl_gate' and thread pauseBatchOnError into the runner's deps.

Confidence Score: 3/5

The runner integration has two correctness issues that will visibly misbehave in the intended happy path before fixes are applied.

The parser and type extension are solid and well-tested. The sticky toast is emitted on every gate loop iteration, so repeated Resume-without-tick stacks undismissed toasts. The document re-read added for HITL resume is placed in the generic error-resolution fall-through, running for all pre-existing error types and adding an unguarded async I/O call that can propagate and crash the batch runner session.

src/renderer/hooks/batch/internal/useBatchRunner.ts — the notifyToast placement and the unconditional document re-read in the error-resolution block.

Important Files Changed

Filename	Overview
src/renderer/hooks/batch/internal/useBatchRunner.ts	Adds HITL gate detection before each processTask call and resumes with a document re-read; the notifyToast call inside the gate block fires on every loop iteration when the gate is still pending, producing duplicate sticky toasts on repeated Resume-without-tick.
src/renderer/hooks/batch/batchUtils.ts	Adds HitlGate type and findPendingHitlGate parser; logic is clean, well-tested, and consistent with the existing fence-aware countMarkdownTasks pattern.
src/renderer/hooks/batch/useBatchProcessor.ts	Threads pauseBatchOnError into useBatchRunner deps — a one-line mechanical change with no logic risk.
src/shared/types.ts	Adds 'hitl_gate' to AgentErrorType union — minimal and correct.
src/tests/renderer/hooks/batch/batchUtilsHitl.test.ts	13 parser tests covering all documented edge cases including fenced blocks, CRLF, missing attributes, and multi-marker chains.

Sequence Diagram

sequenceDiagram
    participant L as TaskLoop
    participant P as findPendingHitlGate
    participant B as pauseBatchOnError
    participant T as notifyToast
    participant U as Human
    participant R as errorResolutionRef

    L->>P: docContent
    alt gate pending
        P-->>L: HitlGate
        L->>B: pauseBatchOnError sets errorResolutionRef
        L->>T: "notifyToast dismissible=true"
        L->>L: continue
        L->>R: await promise
        alt Resume without ticking checkbox
            U-->>R: resolve resume
            L->>L: re-read doc all resume types
            L->>P: unchanged docContent
            P-->>L: same HitlGate
            L->>B: pauseBatchOnError again
            L->>T: NEW sticky toast accumulates
        else Resume after ticking checkbox
            U-->>R: resolve resume
            L->>L: re-read doc checkbox now checked
            P-->>L: null gate consumed
            L->>L: processTask normal flow
        else Abort
            U-->>R: resolve abort
            L->>L: break
        end
    else no gate
        P-->>L: null
        L->>L: processTask
    end

_{Reviews (1): Last reviewed commit: "feat(autorun): playbook-level HITL gate ..." | Re-trigger Greptile}

Addresses two P1 review comments on PR RunMaestro#1015: 1. Stacking sticky toasts on Resume-without-tick. pauseBatchOnError re-fires on every loop iteration that re-detects the same gate, so clicking Resume without ticking the approval checkbox previously queued a fresh "Auto Run paused for review" toast each iteration. Track the active gate's line number via a per-run local; only emit the toast the first time we pause at a given line. Cleared when a new gate fires on a different line, or when no gate is detected. 2. Document re-read silently applied to ALL resume types. The re-read was inserted into the generic error-resolution fall-through, so agent_crashed / permission_denied / etc. resumes started re-reading from disk too — uncaught I/O failures would crash the run for a case that previously recovered gracefully. Now the re-read fires only when the resume came from a HITL pause (gated on `activeHitlGateLine !== null`) and is wrapped in try/catch with a warn-and-continue fallback. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

chr1syy · 2026-05-18T19:27:12Z

Pushed f63f37161 addressing both Greptile P1 comments. Threads replied inline.

On the failing test job: the failures are all in src/__tests__/main/agents/claude-usage-startup.test.ts, which doesn't exist on this branch — that file was added by #1010 (Maestro-P) merging to rc after this PR was opened. The same test job is failing on rc itself at 85b078177:

https://github.com/RunMaestro/Maestro/actions/runs/26053802690

CI on rc passed at b96e9a07e (the commit this PR branched off) and started failing at the next merge. This is unrelated to the HITL changes here and should be fixed in rc. Happy to rebase once that's green, or leave the PR as-is.

pedramamini · 2026-05-18T20:46:42Z

Thanks for the contribution, @chr1syy — this is a really nicely scoped PR. 🙌

Reviewed the changes and the response to Greptile's two P1s, and I'm happy with it:

The activeHitlGateLine dedupe is the right call — both for the toast-stacking issue and for keeping the doc re-read HITL-scoped so other recoverable-error resume paths keep their pre-PR behavior. The try/catch around readDocAndCountTasks is the right level of paranoia for a paused runner where the SSH session or file may have changed underneath us.
Parser is fence-aware, handles CRLF, falls back gracefully on missing attributes, and reuses the existing countMarkdownTasks iteration pattern. 13 parser tests cover the edge cases I'd worry about (multi-marker chains, consumed-then-fresh, fenced examples with a real marker outside the fence).
Reusing pauseBatchOnError + AutoRunErrorBanner instead of building new UI plumbing keeps the surface area tiny — exactly the kind of integration I'd want here.
The big block comments above the HITL-resume re-read and the gate-detection block earn their weight by explaining why (re-pause-forever risk, dedupe rationale) rather than just narrating the code.

You're correct that the failing test job is unrelated — claude-usage-startup.test.ts was added on rc after this PR was opened (in #1010) and the same job is failing directly on rc at 85b078177. No need to rebase on my account; I'll get rc green separately.

Labeling approved — go ahead and merge once you're ready.

greptile-apps Bot reviewed May 18, 2026

View reviewed changes

Comment thread src/renderer/hooks/batch/internal/useBatchRunner.ts Outdated

Comment thread src/renderer/hooks/batch/internal/useBatchRunner.ts Outdated

chr1syy added the ready to merge This PR is ready to merge label May 18, 2026

pedramamini added the approved label May 18, 2026

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat(autorun): playbook-level HITL gate markers (#232)#1015

feat(autorun): playbook-level HITL gate markers (#232)#1015
chr1syy wants to merge 2 commits into
RunMaestro:rcfrom
chr1syy:feat/232-autorun-hitl-gates

chr1syy commented May 18, 2026 •

edited by coderabbitai Bot

Loading

Uh oh!

coderabbitai Bot commented May 18, 2026 •

edited

Loading

Walkthrough

Changes

Sequence Diagram

Estimated Code Review Effort

Possibly Related PRs

Poem

Uh oh!

greptile-apps Bot commented May 18, 2026

Uh oh!

Uh oh!

Uh oh!

chr1syy commented May 18, 2026

Uh oh!

pedramamini commented May 18, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

chr1syy commented May 18, 2026 • edited by coderabbitai Bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Why

How it works

Changes

Out of scope (deferred)

Parser test coverage

Test plan

Summary by CodeRabbit

Uh oh!

coderabbitai Bot commented May 18, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Walkthrough

Changes

Sequence Diagram

Estimated Code Review Effort

Possibly Related PRs

Poem

Uh oh!

greptile-apps Bot commented May 18, 2026

Greptile Summary

Confidence Score: 3/5

Important Files Changed

Sequence Diagram

Uh oh!

Uh oh!

Uh oh!

chr1syy commented May 18, 2026

Uh oh!

pedramamini commented May 18, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

chr1syy commented May 18, 2026 •

edited by coderabbitai Bot

Loading

coderabbitai Bot commented May 18, 2026 •

edited

Loading