Skip to content

feat(autorun): playbook-level HITL gate markers (#232)#1015

Open
chr1syy wants to merge 2 commits into
RunMaestro:rcfrom
chr1syy:feat/232-autorun-hitl-gates
Open

feat(autorun): playbook-level HITL gate markers (#232)#1015
chr1syy wants to merge 2 commits into
RunMaestro:rcfrom
chr1syy:feat/232-autorun-hitl-gates

Conversation

@chr1syy
Copy link
Copy Markdown
Contributor

@chr1syy chr1syy commented May 18, 2026

Closes #232

Summary

Adds support for playbook-level HITL (human-in-the-loop) gates in Auto Run — Option A from the issue. Authors mark review points in playbook documents with:

## Step 5: Review Specification

<!-- MAESTRO:HITL reason="Spec ready for review" artifact=".maestro/outputs/SPEC.md" -->

- [ ] Human has reviewed and approved the specification

When Auto Run reaches a marker that is positioned before an unchecked task (with no intervening checked task), it pauses, surfaces a sticky toast with the reason + artifact, and the user gets the existing AutoRunErrorBanner with Resume and Abort Run buttons. Once the user ticks the human-approval checkbox, the marker is treated as consumed and the run continues to the next gate or end-of-doc.

This eliminates the orchestration burden that split playbooks (`SpecFlow_1_Specify` → `SpecFlow_2_Plan` → `SpecFlow_3_Implement`) currently put on users — the workflow now manages itself between phases, while the human stays in the driver's seat at decision points.

Why

From the issue: in interactive mode, review gates happen naturally because you're watching. In Auto Run they get skipped, so 24-hour parallel sessions blow past review points and the agent has already done significant downstream work by the time you notice. Maestro should pause at natural review checkpoints (after spec, after plan, after impl, after tests) and only continue once a human approves.

Per the maintainer's question — "why not split playbooks into phases?" — the reporter's response was that split playbooks invert the relationship: the human ends up managing sequencing, context handoffs, and "which playbook is next?" instead of just making approval decisions. Integrated gates keep one workflow, one context, with human-as-decider rather than human-as-orchestrator.

How it works

The parser `findPendingHitlGate(content)` in `batchUtils.ts` walks the document line-by-line:

  • Skips fenced code blocks (so docs can include the marker syntax as examples)
  • Treats checked tasks as "consuming" any pending marker above them — the user already approved
  • Returns the first marker in a pending chain when the next unchecked task is reached
  • Returns null when an unchecked task appears before any marker (no gate above it)

In `useBatchRunner.ts`, the runner calls the parser at the top of the inner task loop (after the existing error-resolution `await` block, before `processTask`). On hit it synthesizes a recoverable `AgentError { type: 'hitl_gate' }` and routes through the existing `pauseBatchOnError` — so the AutoRunErrorBanner UI, the PAUSED_ERROR state machine entry, the `errorResolutionRefs` promise chain, and the Resume/Skip/Abort actions all just work without new UI plumbing. The toast uses `dismissible: true` so the user must click it (review gates shouldn't auto-dismiss).

After Resume, the loop re-reads the doc so the next HITL check sees fresh content — if the user ticked the box, the marker is now consumed and we continue; if they didn't, we re-pause.

Changes

  • `src/shared/types.ts` — add `'hitl_gate'` to the `AgentErrorType` union (1 line).
  • `src/renderer/hooks/batch/batchUtils.ts` — add `HitlGate` type and `findPendingHitlGate(content)` parser. Reuses the existing fence-aware iteration pattern from `countMarkdownTasks`.
  • `src/renderer/hooks/batch/internal/useBatchRunner.ts` — call the parser between iterations, synthesize the `AgentError`, route through `pauseBatchOnError`, emit a sticky toast. Re-read doc on resume.
  • `src/renderer/hooks/batch/useBatchProcessor.ts` — thread `pauseBatchOnError` into the runner's deps.
  • `src/tests/renderer/hooks/batch/batchUtilsHitl.test.ts` — 13 parser tests.

No new UI components. No new IPC. No main-process changes.

Out of scope (deferred)

  • Option B from the issue (CLI-based `maestro signal --pause` or file-watching). Option A covers the SpecFlow case cleanly; we can revisit if external workflow tools need it.
  • Dedicated HITL banner styling (currently reuses the orange/red error banner). The text says "Auto Run Paused" + the reason, which reads correctly for HITL, but a dedicated "Review requested" theme could land in a follow-up.
  • Marker validation/lint inside the playbook editor.
  • Telemetry for HITL gate usage.

Parser test coverage

  • no marker present
  • no tasks at all
  • marker precedes the first unchecked task ✅
  • unchecked task appears before any marker
  • checked task consumes the marker above it
  • re-pause on a fresh marker after a previously consumed one
  • multiple markers in a chain → returns the first
  • missing artifact attribute
  • missing reason attribute (falls back to default)
  • markers inside fenced code blocks are ignored
  • real marker outside a fenced block is detected when an example exists inside one
  • records 0-indexed line number
  • CRLF line endings

Test plan

  • `npm run lint` — TypeScript across all configs
  • `npm run lint:eslint` — ESLint on the changed files
  • `npx prettier --check` — all changed files
  • `npx vitest run src/tests/renderer/hooks/batch/ src/tests/cli/services/batch-processor.test.ts` — 213/213 passing, including 13 new HITL parser tests
  • Manual: write a playbook with a HITL marker, start Auto Run, verify it pauses with the AutoRunErrorBanner and sticky toast, tick the human-approval box, click Resume, verify it continues to the next gate
  • Manual: verify Abort Run from the HITL paused state cleanly stops the run
  • Manual: verify markers inside fenced code blocks in the playbook docs don't trigger pauses

Summary by CodeRabbit

  • New Features

    • Added human-in-the-loop (HITL) review gate support: batch/auto-run pauses at HITL markers and shows resume/abort controls with a sticky info toast.
    • Pauses are deduplicated per marker and re-check state when resuming so only relevant tasks are affected.
  • Tests

    • Added test coverage validating HITL marker detection and handling across common edge cases (line endings, code blocks, consumed vs. pending markers).

Review Change Stack

Add support for <!-- MAESTRO:HITL reason="..." artifact="..." --> markers
in playbook documents. When a marker is positioned before an unchecked
task with no intervening checked task, Auto Run pauses via the existing
error-resolution infrastructure: the AutoRunErrorBanner shows "Resume"
and "Abort Run" buttons, and a sticky toast surfaces the reason +
artifact so the user can review the work-in-progress before approving.

Once the user ticks the human-approval checkbox, the marker is treated
as consumed and Auto Run continues to the next gate or end-of-doc.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@coderabbitai
Copy link
Copy Markdown

coderabbitai Bot commented May 18, 2026

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 86944d0c-3dbb-4aff-a054-a0f5856d735b

📥 Commits

Reviewing files that changed from the base of the PR and between b4c8a18 and f63f371.

📒 Files selected for processing (1)
  • src/renderer/hooks/batch/internal/useBatchRunner.ts

📝 Walkthrough

Walkthrough

The PR implements HITL (human-in-the-loop) gates for playbook auto-run mode. It adds markdown marker parsing (<!-- MAESTRO:HITL ... -->) to detect review points, integrates gate detection into the task processing loop, and pauses auto-run with a recoverable error when a gate is encountered. Upon resume, the runner re-reads document state and continues.

Changes

HITL Gates for Auto-Run Mode

Layer / File(s) Summary
Gate error type
src/shared/types.ts
AgentErrorType union adds 'hitl_gate' discriminator for HITL pause errors.
HITL gate detection logic
src/renderer/hooks/batch/batchUtils.ts, src/__tests__/renderer/hooks/batch/batchUtilsHitl.test.ts
Regex, HitlGate interface, and findPendingHitlGate() scan markdown while skipping code fences, track pending markers across task boundaries, extract reason/artifact attributes, and return line numbers. Test suite validates marker/task pairing rules, attribute defaults, fence ignoring, and CRLF handling.
Batch runner HITL integration
src/renderer/hooks/batch/internal/useBatchRunner.ts
UseBatchRunnerDeps adds pauseBatchOnError callback. Task loop checks for pending gates before each task; when found, creates recoverable hitl_gate error, calls pause handler to surface resume/abort UI, logs encounter, posts notification (deduped per marker line), and re-reads document state on HITL-scoped resume.
Processor wiring
src/renderer/hooks/batch/useBatchProcessor.ts
Passes pauseBatchOnError into useBatchRunner to complete orchestration layer connectivity.

Sequence Diagram

sequenceDiagram
  participant Processor as TaskProcessor
  participant Detector as GateDetector
  participant Handler as PauseHandler
  participant UI as ErrorUI
  Processor->>Detector: findPendingHitlGate(docContent)
  Detector-->>Processor: HitlGate or null
  alt Gate Found
    Processor->>Processor: Create AgentError type hitl_gate
    Processor->>Handler: pauseBatchOnError(error)
    Handler->>UI: Show Resume Abort controls
    UI-->>Handler: User resumes
    Handler-->>Processor: Continue
    Processor->>Processor: Re-read docContent
  end
  Processor->>Processor: Process next task
Loading

Estimated Code Review Effort

🎯 3 (Moderate) | ⏱️ ~22 minutes

Possibly Related PRs

  • RunMaestro/Maestro#726: Fixes errorPaused session resolution in useBatchHandlers to correctly handle the same pause/resume/abort error UI flow that this PR now triggers for HITL gates.

Poem

🐰 I sniff the markdown line by line,
A tiny gate that says "Pause, be kind."
I tap the human for a look and nod,
They click resume—the program hops abroad,
Hooray! The run continues, all aligned.

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The PR title clearly identifies the main feature: playbook-level HITL gate markers for auto-run mode, matching the substantial changes in batchUtils, useBatchRunner, and related files.
Linked Issues check ✅ Passed The implementation fulfills all coding objectives from issue #232: adds HITL gate detection via findPendingHitlGate, integrates with pauseBatchOnError for pause/resume flow, handles marker syntax with reason/artifact, ignores fenced code blocks, and tracks checked tasks as consuming markers.
Out of Scope Changes check ✅ Passed All changes are directly scoped to implementing HITL gates: adding type definitions, parser logic, integration with existing error handling, and test coverage. No unrelated modifications detected.
Docstring Coverage ✅ Passed Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@greptile-apps
Copy link
Copy Markdown

greptile-apps Bot commented May 18, 2026

Greptile Summary

This PR adds playbook-level HITL (human-in-the-loop) gate support to Auto Run. When a <!-- MAESTRO:HITL --> marker precedes an unchecked approval task, the runner pauses via the existing pauseBatchOnError / AutoRunErrorBanner infrastructure and emits a sticky toast. Resuming re-reads the document; if the user checked the approval box, the gate is consumed and the run continues.

  • batchUtils.ts: adds findPendingHitlGate(content) — a fence-aware, line-by-line parser that returns the first pending gate or null.
  • useBatchRunner.ts: calls the parser before each processTask, synthesizes a hitl_gate AgentError, routes through pauseBatchOnError, and emits a sticky toast; also re-reads the document after any error resume.
  • types.ts / useBatchProcessor.ts: extend AgentErrorType with 'hitl_gate' and thread pauseBatchOnError into the runner's deps.

Confidence Score: 3/5

The runner integration has two correctness issues that will visibly misbehave in the intended happy path before fixes are applied.

The parser and type extension are solid and well-tested. The sticky toast is emitted on every gate loop iteration, so repeated Resume-without-tick stacks undismissed toasts. The document re-read added for HITL resume is placed in the generic error-resolution fall-through, running for all pre-existing error types and adding an unguarded async I/O call that can propagate and crash the batch runner session.

src/renderer/hooks/batch/internal/useBatchRunner.ts — the notifyToast placement and the unconditional document re-read in the error-resolution block.

Important Files Changed

Filename Overview
src/renderer/hooks/batch/internal/useBatchRunner.ts Adds HITL gate detection before each processTask call and resumes with a document re-read; the notifyToast call inside the gate block fires on every loop iteration when the gate is still pending, producing duplicate sticky toasts on repeated Resume-without-tick.
src/renderer/hooks/batch/batchUtils.ts Adds HitlGate type and findPendingHitlGate parser; logic is clean, well-tested, and consistent with the existing fence-aware countMarkdownTasks pattern.
src/renderer/hooks/batch/useBatchProcessor.ts Threads pauseBatchOnError into useBatchRunner deps — a one-line mechanical change with no logic risk.
src/shared/types.ts Adds 'hitl_gate' to AgentErrorType union — minimal and correct.
src/tests/renderer/hooks/batch/batchUtilsHitl.test.ts 13 parser tests covering all documented edge cases including fenced blocks, CRLF, missing attributes, and multi-marker chains.

Sequence Diagram

sequenceDiagram
    participant L as TaskLoop
    participant P as findPendingHitlGate
    participant B as pauseBatchOnError
    participant T as notifyToast
    participant U as Human
    participant R as errorResolutionRef

    L->>P: docContent
    alt gate pending
        P-->>L: HitlGate
        L->>B: pauseBatchOnError sets errorResolutionRef
        L->>T: "notifyToast dismissible=true"
        L->>L: continue
        L->>R: await promise
        alt Resume without ticking checkbox
            U-->>R: resolve resume
            L->>L: re-read doc all resume types
            L->>P: unchanged docContent
            P-->>L: same HitlGate
            L->>B: pauseBatchOnError again
            L->>T: NEW sticky toast accumulates
        else Resume after ticking checkbox
            U-->>R: resolve resume
            L->>L: re-read doc checkbox now checked
            P-->>L: null gate consumed
            L->>L: processTask normal flow
        else Abort
            U-->>R: resolve abort
            L->>L: break
        end
    else no gate
        P-->>L: null
        L->>L: processTask
    end
Loading

Reviews (1): Last reviewed commit: "feat(autorun): playbook-level HITL gate ..." | Re-trigger Greptile

Comment thread src/renderer/hooks/batch/internal/useBatchRunner.ts Outdated
Comment thread src/renderer/hooks/batch/internal/useBatchRunner.ts Outdated
Addresses two P1 review comments on PR RunMaestro#1015:

1. Stacking sticky toasts on Resume-without-tick. pauseBatchOnError
   re-fires on every loop iteration that re-detects the same gate, so
   clicking Resume without ticking the approval checkbox previously
   queued a fresh "Auto Run paused for review" toast each iteration.
   Track the active gate's line number via a per-run local; only emit
   the toast the first time we pause at a given line. Cleared when a
   new gate fires on a different line, or when no gate is detected.

2. Document re-read silently applied to ALL resume types. The re-read
   was inserted into the generic error-resolution fall-through, so
   agent_crashed / permission_denied / etc. resumes started re-reading
   from disk too — uncaught I/O failures would crash the run for a
   case that previously recovered gracefully. Now the re-read fires
   only when the resume came from a HITL pause (gated on
   `activeHitlGateLine !== null`) and is wrapped in try/catch with a
   warn-and-continue fallback.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@chr1syy
Copy link
Copy Markdown
Contributor Author

chr1syy commented May 18, 2026

Pushed f63f37161 addressing both Greptile P1 comments. Threads replied inline.

On the failing test job: the failures are all in src/__tests__/main/agents/claude-usage-startup.test.ts, which doesn't exist on this branch — that file was added by #1010 (Maestro-P) merging to rc after this PR was opened. The same test job is failing on rc itself at 85b078177:

https://github.com/RunMaestro/Maestro/actions/runs/26053802690

CI on rc passed at b96e9a07e (the commit this PR branched off) and started failing at the next merge. This is unrelated to the HITL changes here and should be fixed in rc. Happy to rebase once that's green, or leave the PR as-is.

@chr1syy chr1syy added the ready to merge This PR is ready to merge label May 18, 2026
@pedramamini
Copy link
Copy Markdown
Collaborator

Thanks for the contribution, @chr1syy — this is a really nicely scoped PR. 🙌

Reviewed the changes and the response to Greptile's two P1s, and I'm happy with it:

  • The activeHitlGateLine dedupe is the right call — both for the toast-stacking issue and for keeping the doc re-read HITL-scoped so other recoverable-error resume paths keep their pre-PR behavior. The try/catch around readDocAndCountTasks is the right level of paranoia for a paused runner where the SSH session or file may have changed underneath us.
  • Parser is fence-aware, handles CRLF, falls back gracefully on missing attributes, and reuses the existing countMarkdownTasks iteration pattern. 13 parser tests cover the edge cases I'd worry about (multi-marker chains, consumed-then-fresh, fenced examples with a real marker outside the fence).
  • Reusing pauseBatchOnError + AutoRunErrorBanner instead of building new UI plumbing keeps the surface area tiny — exactly the kind of integration I'd want here.
  • The big block comments above the HITL-resume re-read and the gate-detection block earn their weight by explaining why (re-pause-forever risk, dedupe rationale) rather than just narrating the code.

You're correct that the failing test job is unrelated — claude-usage-startup.test.ts was added on rc after this PR was opened (in #1010) and the same job is failing directly on rc at 85b078177. No need to rebase on my account; I'll get rc green separately.

Labeling approved — go ahead and merge once you're ready.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

approved ready to merge This PR is ready to merge

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants