Skip to content

fix: P0 critical safety fixes — retry budgets, context diffing, parallel capture#48

Merged
clintecker merged 24 commits intomainfrom
fix/p0-critical-safety-fixes
Apr 3, 2026
Merged

fix: P0 critical safety fixes — retry budgets, context diffing, parallel capture#48
clintecker merged 24 commits intomainfrom
fix/p0-critical-safety-fixes

Conversation

@clintecker
Copy link
Copy Markdown
Collaborator

@clintecker clintecker commented Mar 31, 2026

Summary

  • Goal-gate retry budgets: goalGateRetryTarget now checks RetryCount against maxRetries before allowing another loop. When exhausted, routes to fallback_target or signals failure with EventStageFailed.
  • Context diffing: New PipelineContext.DiffFrom() method detects side-effect writes (pctx.Set()) that handlers make directly, not just values returned via Outcome.ContextUpdates.
  • Parallel side-effect capture: The parallel handler now merges DiffFrom(snapshot) with explicit ContextUpdates, preventing silent data loss from branch handlers that write to context as a side effect.
  • Event emission: EventStageRetrying and EventStageFailed events are now emitted for goal-gate retry loops, giving the TUI and trace visibility into retry behavior.
  • Includes design spec and implementation plan docs.

Test plan

  • TestContextDiffFrom* — covers new keys, changed keys, unchanged exclusion, empty baseline/context
  • TestParallelHandlerCapturesSideEffectWrites — verifies side-effect-only writes are captured
  • TestParallelHandlerExplicitOverridesSideEffects — verifies explicit ContextUpdates win over side-effects
  • pipeline/engine_goal_gate_test.go — new file covering goal-gate retry budget and fallback routing
  • go test ./... -short — all packages pass

🤖 Generated with Claude Code

Summary by CodeRabbit

  • New Features

    • Added goal-gate fallback routing and escalation loop prevention with checkpoint-persisted tracking.
    • Enhanced parallel branch context handling to capture implicit side effects alongside explicit updates.
    • Improved interview question prompting for yes/no scenarios.
  • Bug Fixes

    • Enforce per-goal-gate retry budgets with proper exhaustion detection.
    • Treat whitespace-only interview answers as skipped.
  • Documentation

    • Added P0 critical safety fixes implementation plan and design specification covering circuit breaker, atomic checkpoints, and tool-command injection protection.

clintecker and others added 2 commits March 30, 2026 16:16
Design spec informed by 5-expert adversarial review panel.
10-task implementation plan covering #15, #16, #20, plus atomic
checkpoints and global circuit breaker.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…effect capture

- Add DiffFrom() to PipelineContext for detecting side-effect writes
- Parallel handler now captures pctx.Set() calls, not just Outcome.ContextUpdates
- Goal-gate retry checks budget before looping; routes to fallback when exhausted
- Emit EventStageRetrying/EventStageFailed events for goal-gate retries
- Add comprehensive tests for all three fixes

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@coderabbitai
Copy link
Copy Markdown

coderabbitai bot commented Mar 31, 2026

Caution

Review failed

The pull request is closed.

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 98ad7c10-a9b2-4e03-9198-a6f6be438192

📥 Commits

Reviewing files that changed from the base of the PR and between b78a4a1 and 8cd1727.

📒 Files selected for processing (8)
  • cmd/tracker/embed_test.go
  • docs/superpowers/plans/2026-03-30-p0-critical-safety-fixes.md
  • llm/anthropic/adapter.go
  • pipeline/checkpoint.go
  • pipeline/engine_checkpoint.go
  • pipeline/engine_goal_gate_test.go
  • pipeline/handlers/human.go
  • pipeline/handlers/interview_result.go

Walkthrough

This PR implements P0 critical safety fixes for the pipeline engine, including: adding context diffing to capture and merge side-effect writes from parallel branches, enforcing per-goal-gate retry budgets with fallback routing and checkpoint-persisted fallback tracking, adding corresponding tests, and refining interview handler logic.

Changes

Cohort / File(s) Summary
Documentation & Planning
docs/superpowers/plans/2026-03-30-p0-critical-safety-fixes.md, docs/superpowers/specs/2026-03-30-p0-critical-safety-fixes-design.md
Added implementation plan and design specification for four P0 pipeline-engine safety issues: parallel context correctness via DiffFrom mechanism, goal-gate retry termination with fallback tracking, tool_command interpolation safety, and atomic checkpoint persistence.
Pipeline Context & Diffing
pipeline/context.go, pipeline/context_test.go
Added exported constants ContextKeyInterviewQuestions and ContextKeyInterviewAnswers, and new DiffFrom(baseline) method to compute context changes against a snapshot; includes five unit tests covering key-present, value-changed, unchanged, empty baseline, and empty context scenarios.
Parallel Context Merging
pipeline/handlers/parallel.go, pipeline/handlers/parallel_test.go
Updated ParallelHandler.runBranch to merge explicit ContextUpdates with implicit side-effect writes captured via DiffFrom, with tests validating side-effect inclusion and explicit-update override semantics.
Goal-Gate Retry Logic
pipeline/engine_checkpoint.go, pipeline/engine_run.go, pipeline/engine_goal_gate_test.go
Expanded goalGateRetryTarget signature to return goal-gate node ID; enforced per-node retry budgets and fallback routing guarded by checkpoint-persisted FallbackTaken tracking; updated handleExitNode to emit tagged events and route to fallback targets; added comprehensive tests for retry exhaustion, node-level max_retries, fallback redirection, and escalation paths.
Checkpoint State
pipeline/checkpoint.go
Added new FallbackTaken map[string]bool field to track which goal-gate nodes have consumed their one-shot fallback route across checkpoint restore cycles.
Interview Handler
pipeline/handlers/human.go, pipeline/handlers/interview_result.go
Reordered ConsoleInterviewer.AskInterview to evaluate IsYesNo first; updated BuildMarkdownSummary to treat whitespace-only answers as empty when computing skipped-question indicators and answer counts.
Test Updates
cmd/tracker/embed_test.go
Renamed builtin workflows count test from 3 to 4, reflecting addition of a new workflow.
LLM Adapter Logging
llm/anthropic/adapter.go
Modified TRACKER_DEBUG logging to always emit a truncated (~200 char) body preview with ...(truncated) suffix and include original byte length instead of conditionally logging full response.

Sequence Diagram(s)

sequenceDiagram
    participant Engine as Engine
    participant Checkpoint as Checkpoint
    participant Handler as HandleExitNode
    participant EventBus as Event System

    loop Until retry budget exhausted or retry succeeds
        Engine->>Engine: Evaluate goal-gate node outcome
        
        Note over Engine: Check if retries exhausted<br/>cp.RetryCount(goalGateNodeID) >= maxRetries
        
        alt Retries remain
            Engine->>Checkpoint: Check FallbackTaken[goalGateNodeID]
            Checkpoint-->>Engine: false (no prior fallback)
            Engine->>Engine: Select retry target<br/>(fallback_target/retry_target/...)
            Engine->>Handler: Route to retry target
            Handler->>Checkpoint: Increment RetryCount(goalGateNodeID)
            Handler->>EventBus: Emit EventStageRetrying<br/>(with attempt/limit from retry count)
            Handler->>Checkpoint: SaveCheckpoint()
            Handler-->>Engine: Continue with new target
        else Retries exhausted
            Engine->>Checkpoint: Check FallbackTaken[goalGateNodeID]
            Checkpoint-->>Engine: false (available)
            Engine->>Engine: Attempt one-shot fallback<br/>(node/graph fallback_target)
            alt Fallback configured & valid
                Engine->>Checkpoint: Set FallbackTaken[goalGateNodeID] = true
                Engine->>Handler: Route to fallback target
                Handler->>EventBus: Emit EventStageFailed<br/>(routing to fallback)
                Handler->>Checkpoint: SaveCheckpoint()
                Handler-->>Engine: Continue with fallback
            else No fallback or already taken
                Engine->>Handler: Terminal failure
                Handler->>EventBus: Emit EventStageFailed<br/>(unsatisfied)
                Handler-->>Engine: Return OutcomeFail
            end
        end
    end
Loading

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~22 minutes

Possibly related issues

  • Expert Panel Review: .dip Format Support — Consolidated Report #39: This PR directly implements the P0 critical safety fixes outlined in the issue, including parallel context loss fixes (via DiffFrom and parallel handler merging), goal-gate retry loop prevention (via FallbackTaken checkpoint tracking), and fallback routing with retry budget enforcement.

Possibly related PRs

Poem

🐰 Contexts now diff and merge with care,
Retries respect their budget fair,
Fallbacks prevent loops in flight,
Checkpoints hold the truth so bright,
A safer pipeline, oh what a sight!

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 17.88% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (2 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title accurately summarizes the three primary changes: retry budgets (goal-gate), context diffing (DiffFrom), and parallel capture (side-effect merging). It directly addresses the main objectives.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
📝 Generate docstrings
  • Create stacked PR
  • Commit on current branch
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch fix/p0-critical-safety-fixes

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: e7edb0c598

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

continue
}
if _, ok := e.graph.Nodes[fb]; ok {
return fb, nodeID, true, true
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Badge Stop re-entering fallback loop after goal-gate retries exhaust

When retries are exhausted, this branch still returns shouldRetry=true for a fallback target, so handleExitNode treats escalation as another retry (increments retry count and clears only the fallback subtree). That leaves the failed goal-gate node completed and unsatisfied, so reaching exit again selects the same fallback repeatedly. In graphs where fallback does not re-run/satisfy the gate (for example fallback goes straight to done), the pipeline can loop indefinitely instead of terminating. This path needs a one-time fallback guard (or a non-retry return state) so subsequent exits fail once escalation has already been attempted.

Useful? React with 👍 / 👎.

Copy link
Copy Markdown

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 3

🧹 Nitpick comments (7)
docs/superpowers/plans/2026-03-30-p0-critical-safety-fixes.md (2)

1393-1404: Add language identifier to the dependency graph code block.

The ASCII art dependency graph would benefit from a language identifier (e.g., text or plaintext) to satisfy linting and improve rendering consistency.

-```
+```text
 Task 1 (commit existing work)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@docs/superpowers/plans/2026-03-30-p0-critical-safety-fixes.md` around lines
1393 - 1404, The ASCII-art dependency graph code block lacks a language
identifier; update the opening triple-backtick that begins the graph (the block
starting with "Task 1 (commit existing work)") to include a plain language tag
such as ```text or ```plaintext so the block becomes a fenced code block with a
language identifier and satisfies linters and consistent rendering.

14-15: Minor markdown structure issue.

The document jumps from h1 (#) directly to h3 (###). Consider using h2 (##) for the task sections, or adding an h2 section header before the tasks. This is a minor documentation consistency issue flagged by markdownlint.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@docs/superpowers/plans/2026-03-30-p0-critical-safety-fixes.md` around lines
14 - 15, The markdown jumps from an h1 to h3 (e.g., the existing "### Task 1:
Commit existing parallel context fix (`#20`) and goal-gate retry fix (`#15`)");
change those task headings to h2 (replace "###" with "##") or add an
intermediate h2 section header before the task list so the structure goes H1 ->
H2 -> H3 as appropriate (update the "### Task 1..." and any subsequent "### Task
n..." headings to "##" or insert a descriptive h2 preceding them to satisfy
markdownlint).
pipeline/engine_goal_gate_test.go (3)

83-92: Same issue: check error from engine.Run.

 	engine := NewEngine(g, reg)
-	result, _ := engine.Run(context.Background())
+	result, err := engine.Run(context.Background())
+	if err != nil {
+		t.Fatalf("unexpected engine.Run error: %v", err)
+	}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pipeline/engine_goal_gate_test.go` around lines 83 - 92, The test currently
ignores the error return from engine.Run; change the call to capture both result
and err (e.g., result, err := engine.Run(...)) and assert err is nil before
inspecting result.Status and workAttempts, failing the test with t.Fatalf
including err if non-nil; reference NewEngine and Engine.Run to locate the call
to update.

41-50: Check error from engine.Run to avoid masking failures.

The test ignores the error returned by engine.Run. While the test focuses on retry termination, an unexpected error would be silently ignored, potentially masking bugs.

 	engine := NewEngine(g, reg)
-	result, _ := engine.Run(context.Background())
+	result, err := engine.Run(context.Background())
+	if err != nil {
+		t.Fatalf("unexpected engine.Run error: %v", err)
+	}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pipeline/engine_goal_gate_test.go` around lines 41 - 50, The test currently
ignores the error returned by engine.Run; call engine.Run(ctx) and check the
returned error (the error from engine.Run) before asserting on result.Status and
workAttempts, failing the test (t.Fatalf) if err != nil to avoid masking
failures; update the block around NewEngine, engine.Run, result and workAttempts
so any non-nil err aborts the test with a clear message referencing engine.Run.

131-142: Test appears incomplete — result and error are ignored.

The test assigns result and err but then explicitly ignores them with _ = err and _ = result. The comment on lines 134-136 indicates uncertainty about expected behavior ("it depends on whether clearDownstream clears work").

Consider completing the test by:

  1. Determining expected final status (likely OutcomeFail since the goal-gate remains unsatisfied)
  2. Adding assertions for result.Status

This aligns with the implementation plan Task 2, Step 6 which specifies updating this test to assert result.Status == OutcomeFail.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pipeline/engine_goal_gate_test.go` around lines 131 - 142, The test currently
ignores result and err after calling engine := NewEngine(g, reg) and result, err
:= engine.Run(...); update the test to assert the final pipeline outcome: check
err is nil (or as expected) and assert result.Status == OutcomeFail
(import/reference OutcomeFail) because the goal-gate remains unsatisfied; keep
the existing escalateVisited check but replace the lines "_ = err" and "_ =
result" with concrete assertions on err and result.Status to validate the
expected failure outcome from Run.
docs/superpowers/specs/2026-03-30-p0-critical-safety-fixes-design.md (2)

168-169: Document the risk of allow_unsafe_interpolation: all.

The blanket override allow_unsafe_interpolation: all could be misused to bypass all tainted interpolation protections. While this is acceptable given the stated threat model (line 24: "pipeline author is the trust boundary"), consider documenting in CLAUDE.md that:

  1. Per-variable allowlisting is strongly preferred: allow_unsafe_interpolation: last_response
  2. The all keyword should only be used when the pipeline author has manually audited the tool_command for injection safety
  3. The DIP130 lint warning still fires to ensure conscious opt-in
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@docs/superpowers/specs/2026-03-30-p0-critical-safety-fixes-design.md` around
lines 168 - 169, Add documentation to CLAUDE.md describing the risks of using
the blanket allow_unsafe_interpolation: all override and recommended practices:
prefer per-variable allowlisting (e.g., allow_unsafe_interpolation:
last_response), only use the all keyword after manually auditing the
tool_command for injection safety, and note that the DIP130 lint warning will
still trigger to enforce conscious opt-in; update the section that references
allow_unsafe_interpolation and the threat model wording so readers see these
three points adjacent to the existing pipeline-author trust-boundary guidance.

143-177: Two expansion systems add complexity but fix is comprehensive.

The doc correctly identifies that having two independent variable expansion paths (ExpandGraphVariables in prepareExecNode and ExpandVariables in ToolHandler.Execute) creates a larger attack surface. Both must be gated to prevent bypasses.

The proposed solution addresses both systems:

  • ExpandVariables gets tainted-key checks for ${ctx.*} patterns
  • ExpandGraphVariables excludes tainted graph vars for tool_command

Line 174 correctly identifies the cross-contamination risk: an LLM node could write pctx.Set("graph.foo", malicious), so both layers need protection.

Consider adding a follow-up task to consolidate to a single expansion system with context-aware safety checks, reducing the dual-layer complexity long-term.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@docs/superpowers/specs/2026-03-30-p0-critical-safety-fixes-design.md` around
lines 143 - 177, The review points out unsafe variable expansion for
tool_command: add tainted-key checks and an allowlist to both expansion systems;
in ExpandVariables (used by ToolHandler.Execute) add an opts/wrapper (e.g.,
ExpandToolCommand) that rejects ${ctx.<tainted_key>} using the
taintedContextKeys map and returns the prescribed error message unless the
node's allow_unsafe_interpolation attribute (comma-separated or "all")
explicitly allows the key; in prepareExecNode/ExpandGraphVariables (when
building GraphVarMap) detect graph.<tainted_key> sources, build a
taintedGraphVars set and exclude those vars when expanding the "tool_command"
attribute (leave other attributes unchanged); reference functions/types:
ExpandVariables, ExpandToolCommand (new), ToolHandler.Execute, prepareExecNode,
ExpandGraphVariables, GraphVarMap, and the taintedContextKeys allowlist.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@docs/superpowers/specs/2026-03-30-p0-critical-safety-fixes-design.md`:
- Around line 190-196: Add a language specifier to the fenced code block titled
"Safe pattern for LLM output in tool commands" so markdownlint MD040 is
satisfied: change the opening triple backticks to include the shell language
(e.g., "bash") for the block that starts with "### Safe pattern for LLM output
in tool commands" and ensure the block remains properly closed with triple
backticks; this affects the code block surrounding the lines that start with "Do
NOT interpolate ${ctx.last_response}..." and the numbered steps.
- Around line 87-95: Add a FallbackTaken map[string]bool field (with
`json:"fallbackTaken"` tag) to the Checkpoint struct in pipeline/checkpoint.go,
ensure it's initialized during checkpoint creation/restore (so code that
constructs or unmarshals Checkpoint sets cp.FallbackTaken = map[string]bool{}
when nil), and update any checkpoint constructors/restore paths to preserve this
map across saves so the escalation-loop fix survives restarts; then implement
the TestGoalGateEscalationSurvivesCheckpointRestore test to create a Checkpoint,
mark a gate in cp.FallbackTaken, serialize/deserialize (or round-trip via the
existing checkpoint save/restore helpers) and assert the map value is preserved.

In `@pipeline/engine_checkpoint.go`:
- Around line 162-181: Add a FallbackTaken guard to the Checkpoint state: extend
the Checkpoint struct with a FallbackTaken map[string]bool; in the
fallback-return logic inside the retry-exhaustion branch (the loop that iterates
node.Attrs["fallback_target"], node.Attrs["fallback_retry_target"], etc.) check
cp.FallbackTaken[nodeID] before returning a fallback (skip that
fallback/escalation if true) and when you decide to return a fallback mark
cp.FallbackTaken[nodeID] = true so the same fallback won’t be re-taken
repeatedly; keep the existing behavior for returning the escalation/unsatisfied
path unchanged.

---

Nitpick comments:
In `@docs/superpowers/plans/2026-03-30-p0-critical-safety-fixes.md`:
- Around line 1393-1404: The ASCII-art dependency graph code block lacks a
language identifier; update the opening triple-backtick that begins the graph
(the block starting with "Task 1 (commit existing work)") to include a plain
language tag such as ```text or ```plaintext so the block becomes a fenced code
block with a language identifier and satisfies linters and consistent rendering.
- Around line 14-15: The markdown jumps from an h1 to h3 (e.g., the existing
"### Task 1: Commit existing parallel context fix (`#20`) and goal-gate retry fix
(`#15`)"); change those task headings to h2 (replace "###" with "##") or add an
intermediate h2 section header before the task list so the structure goes H1 ->
H2 -> H3 as appropriate (update the "### Task 1..." and any subsequent "### Task
n..." headings to "##" or insert a descriptive h2 preceding them to satisfy
markdownlint).

In `@docs/superpowers/specs/2026-03-30-p0-critical-safety-fixes-design.md`:
- Around line 168-169: Add documentation to CLAUDE.md describing the risks of
using the blanket allow_unsafe_interpolation: all override and recommended
practices: prefer per-variable allowlisting (e.g., allow_unsafe_interpolation:
last_response), only use the all keyword after manually auditing the
tool_command for injection safety, and note that the DIP130 lint warning will
still trigger to enforce conscious opt-in; update the section that references
allow_unsafe_interpolation and the threat model wording so readers see these
three points adjacent to the existing pipeline-author trust-boundary guidance.
- Around line 143-177: The review points out unsafe variable expansion for
tool_command: add tainted-key checks and an allowlist to both expansion systems;
in ExpandVariables (used by ToolHandler.Execute) add an opts/wrapper (e.g.,
ExpandToolCommand) that rejects ${ctx.<tainted_key>} using the
taintedContextKeys map and returns the prescribed error message unless the
node's allow_unsafe_interpolation attribute (comma-separated or "all")
explicitly allows the key; in prepareExecNode/ExpandGraphVariables (when
building GraphVarMap) detect graph.<tainted_key> sources, build a
taintedGraphVars set and exclude those vars when expanding the "tool_command"
attribute (leave other attributes unchanged); reference functions/types:
ExpandVariables, ExpandToolCommand (new), ToolHandler.Execute, prepareExecNode,
ExpandGraphVariables, GraphVarMap, and the taintedContextKeys allowlist.

In `@pipeline/engine_goal_gate_test.go`:
- Around line 83-92: The test currently ignores the error return from
engine.Run; change the call to capture both result and err (e.g., result, err :=
engine.Run(...)) and assert err is nil before inspecting result.Status and
workAttempts, failing the test with t.Fatalf including err if non-nil; reference
NewEngine and Engine.Run to locate the call to update.
- Around line 41-50: The test currently ignores the error returned by
engine.Run; call engine.Run(ctx) and check the returned error (the error from
engine.Run) before asserting on result.Status and workAttempts, failing the test
(t.Fatalf) if err != nil to avoid masking failures; update the block around
NewEngine, engine.Run, result and workAttempts so any non-nil err aborts the
test with a clear message referencing engine.Run.
- Around line 131-142: The test currently ignores result and err after calling
engine := NewEngine(g, reg) and result, err := engine.Run(...); update the test
to assert the final pipeline outcome: check err is nil (or as expected) and
assert result.Status == OutcomeFail (import/reference OutcomeFail) because the
goal-gate remains unsatisfied; keep the existing escalateVisited check but
replace the lines "_ = err" and "_ = result" with concrete assertions on err and
result.Status to validate the expected failure outcome from Run.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 4402dbcb-fcf7-4d6d-b95e-934e377dd785

📥 Commits

Reviewing files that changed from the base of the PR and between 0e62260 and e7edb0c.

📒 Files selected for processing (9)
  • docs/superpowers/plans/2026-03-30-p0-critical-safety-fixes.md
  • docs/superpowers/specs/2026-03-30-p0-critical-safety-fixes-design.md
  • pipeline/context.go
  • pipeline/context_test.go
  • pipeline/engine_checkpoint.go
  • pipeline/engine_goal_gate_test.go
  • pipeline/engine_run.go
  • pipeline/handlers/parallel.go
  • pipeline/handlers/parallel_test.go

Comment on lines +87 to +95
### Bug to Fix: Escalation Loop

**Problem:** When retries exhaust and the fallback path is taken, `goalGateRetryTarget` returns `retry=true`. This causes `handleExitNode` to call `IncrementRetry` and `clearDownstream` for the escalation target. If the escalation node also fails (or doesn't satisfy the goal gate), the pipeline bounces between exit and escalation forever — the retry counter goes past max but the `>=` check routes to fallback every time.

**Fix:** Track whether we've already taken the fallback path for a given gate. Add a `FallbackTaken map[string]bool` field to `Checkpoint`. When `goalGateRetryTarget` finds retries exhausted:
- If `cp.FallbackTaken[gateNodeID]` is true → return `(_, _, false, true)` (no retry, unsatisfied → pipeline fails)
- If false → set `cp.FallbackTaken[gateNodeID] = true`, return the fallback target

This guarantees at most one escalation attempt per gate. Serialized in checkpoint JSON for crash safety.
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Description: Verify FallbackTaken field exists in Checkpoint and is tested

# Check if FallbackTaken field is defined in Checkpoint struct
echo "=== Checking Checkpoint struct for FallbackTaken field ==="
ast-grep --pattern 'type Checkpoint struct {
  $$$
}'

# Check for checkpoint serialization tests
echo -e "\n=== Checking for checkpoint round-trip tests ==="
rg -n "FallbackTaken" --type=go

# Verify test mentioned in line 122 exists
echo -e "\n=== Verifying TestGoalGateEscalationSurvivesCheckpointRestore ==="
rg -n "TestGoalGateEscalationSurvivesCheckpointRestore" --type=go

Repository: 2389-research/tracker

Length of output: 1584


Add FallbackTaken field to Checkpoint struct — escalation loop fix will not survive restarts without it.

The design document claims at line 95 that FallbackTaken is "serialized in checkpoint JSON for crash safety," but verification reveals the field is missing from the Checkpoint struct in pipeline/checkpoint.go. Without persisting this map across checkpoints, the escalation loop prevention resets on restart and infinite bouncing can resume.

Required actions:

  • Add FallbackTaken map[string]bool field to the Checkpoint struct with JSON serialization tag
  • Implement the test TestGoalGateEscalationSurvivesCheckpointRestore (referenced at design line 122) to verify the map persists through checkpoint round-trips
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@docs/superpowers/specs/2026-03-30-p0-critical-safety-fixes-design.md` around
lines 87 - 95, Add a FallbackTaken map[string]bool field (with
`json:"fallbackTaken"` tag) to the Checkpoint struct in pipeline/checkpoint.go,
ensure it's initialized during checkpoint creation/restore (so code that
constructs or unmarshals Checkpoint sets cp.FallbackTaken = map[string]bool{}
when nil), and update any checkpoint constructors/restore paths to preserve this
map across saves so the escalation-loop fix survives restarts; then implement
the TestGoalGateEscalationSurvivesCheckpointRestore test to create a Checkpoint,
mark a gate in cp.FallbackTaken, serialize/deserialize (or round-trip via the
existing checkpoint save/restore helpers) and assert the map value is preserved.

Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR targets P0 safety issues in the pipeline engine by preventing silent loss of parallel branch context writes and adding a retry budget + event visibility for goal-gate retry loops.

Changes:

  • Add PipelineContext.DiffFrom() and use it in the parallel handler to capture side-effect pctx.Set() writes, merging them with explicit Outcome.ContextUpdates.
  • Introduce goal-gate retry budgeting / fallback routing in goalGateRetryTarget, and emit EventStageRetrying / EventStageFailed events for goal-gate retry behavior.
  • Add new/updated tests covering context diffing, parallel side-effect capture, and goal-gate retry budget behavior; add design/spec docs.

Reviewed changes

Copilot reviewed 9 out of 9 changed files in this pull request and generated 4 comments.

Show a summary per file
File Description
pipeline/handlers/parallel.go Captures side-effect context writes via DiffFrom(snapshot) and merges with explicit ContextUpdates.
pipeline/handlers/parallel_test.go Adds tests ensuring side-effect-only writes are captured and explicit updates override side effects.
pipeline/context.go Adds PipelineContext.DiffFrom() for snapshot diffing of context values.
pipeline/context_test.go Adds unit tests validating DiffFrom() behavior for new/changed/unchanged keys and empty cases.
pipeline/engine_checkpoint.go Updates goal-gate retry target selection to enforce retry budgets and attempt fallback routing.
pipeline/engine_run.go Wires new goal-gate retry tuple into exit handling; increments retry count and emits retry/failure events.
pipeline/engine_goal_gate_test.go Introduces tests for goal-gate retry termination, node-level max retries, and fallback behavior.
docs/superpowers/specs/2026-03-30-p0-critical-safety-fixes-design.md Adds design spec describing the issues and intended fixes.
docs/superpowers/plans/2026-03-30-p0-critical-safety-fixes.md Adds an implementation plan for remaining safety tasks and tests.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines +176 to +179
return fb, nodeID, true, true
}
}
// No fallback — signal unsatisfied without retry.
Copy link

Copilot AI Mar 31, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the retries-exhausted path, returning the fallback target with shouldRetry=true causes the engine to treat fallback routing as another retry. Because the goal gate remains unsatisfied, this can create an infinite loop (exit → fallback → exit → fallback …). Consider tracking whether fallback has already been taken (e.g., a FallbackTaken flag in the checkpoint) or returning a distinct "fallback" action so the engine only escalates once and then fails if still unsatisfied.

Suggested change
return fb, nodeID, true, true
}
}
// No fallback — signal unsatisfied without retry.
// Use checkpoint context to ensure we only take fallback once per node,
// and do not treat fallback routing as another generic retry.
if cp.Context == nil {
cp.Context = map[string]string{}
}
fbKey := "fallback_taken:" + nodeID
if cp.Context[fbKey] == "true" {
// Fallback already used; avoid infinite exit→fallback loops.
break
}
cp.Context[fbKey] = "true"
// Fallback is a one-time escalation, not a retry.
return fb, nodeID, false, true
}
}
// No fallback — or fallback already used — signal unsatisfied without retry.

Copilot uses AI. Check for mistakes.
Comment on lines +377 to +387
s.cp.IncrementRetry(gateNodeID)
gateNode := e.nodeOrDefault(gateNodeID)
e.emit(PipelineEvent{
Type: EventStageRetrying,
Timestamp: time.Now(),
RunID: s.runID,
NodeID: gateNodeID,
Message: fmt.Sprintf("goal-gate retry for %q → %q (attempt %d/%d)",
gateNodeID, target,
s.cp.RetryCount(gateNodeID), e.maxRetries(gateNode)),
})
Copy link

Copilot AI Mar 31, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

handleExitNode increments retry counts and emits EventStageRetrying whenever goalGateRetryTarget returns retry=true. If retry=true is also used for fallback/escalation routing, this will over-count retries and mislabel fallback as a retry attempt (and contributes to exit↔fallback looping). Suggest returning a separate signal for fallback routing (or only increment/emit when within the retry budget).

Suggested change
s.cp.IncrementRetry(gateNodeID)
gateNode := e.nodeOrDefault(gateNodeID)
e.emit(PipelineEvent{
Type: EventStageRetrying,
Timestamp: time.Now(),
RunID: s.runID,
NodeID: gateNodeID,
Message: fmt.Sprintf("goal-gate retry for %q → %q (attempt %d/%d)",
gateNodeID, target,
s.cp.RetryCount(gateNodeID), e.maxRetries(gateNode)),
})
gateNode := e.nodeOrDefault(gateNodeID)
currentRetries := s.cp.RetryCount(gateNodeID)
maxRetries := e.maxRetries(gateNode)
// Only count and emit a retry event while within the retry budget.
if currentRetries < maxRetries {
s.cp.IncrementRetry(gateNodeID)
e.emit(PipelineEvent{
Type: EventStageRetrying,
Timestamp: time.Now(),
RunID: s.runID,
NodeID: gateNodeID,
Message: fmt.Sprintf("goal-gate retry for %q → %q (attempt %d/%d)",
gateNodeID, target,
s.cp.RetryCount(gateNodeID), maxRetries),
})
}

Copilot uses AI. Check for mistakes.
Comment on lines +134 to +138
// The escalate node routes to done, so pipeline should succeed eventually.
// But done is the exit and work is still unsatisfied — so it depends on
// whether clearDownstream clears work. Let's check the result.
_ = err
_ = result
Copy link

Copilot AI Mar 31, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This test currently discards err/result (via _ = err / _ = result) and doesn't assert the expected engine status. That makes it easy for an infinite goal-gate fallback loop to slip through unnoticed, and it won't validate that fallback routing terminates correctly. Please assert err/result.Status and consider running the engine with a context timeout in this test to prevent hangs.

Copilot uses AI. Check for mistakes.

**Architecture:** Four independent fixes plus one cross-cutting circuit breaker. Each fix is self-contained — commit after each task. The tool_command fix gates both variable expansion systems (`ExpandVariables` and `ExpandGraphVariables`) with a hardcoded tainted-key set rather than runtime taint tracking.

**Tech Stack:** Go 1.24, standard library only. No new dependencies.
Copy link

Copilot AI Mar 31, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The implementation plan states "Go 1.24", but the repository's go.mod currently specifies go 1.25.5. Please update the plan to match the actual toolchain version (or vice versa) so readers don't run the wrong Go version.

Suggested change
**Tech Stack:** Go 1.24, standard library only. No new dependencies.
**Tech Stack:** Go 1.25.5, standard library only. No new dependencies.

Copilot uses AI. Check for mistakes.
clintecker and others added 20 commits April 3, 2026 10:57
…w mode

Implements ParseQuestions(markdown string) []Question with support for
numbered items (. and ) style), bullet questions, and imperative verb
prompts. Extracts inline options from trailing parentheticals and detects
yes/no questions by option values or interrogative text patterns. Fenced
code blocks are skipped. 20 tests covering all specified cases pass.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Implements InterviewAnswer and InterviewResult structs with round-trip
JSON serialization and a BuildMarkdownSummary helper for downstream agents.
All 9 tests pass covering round-trip, invalid input, skipped answers,
canceled flag, and markdown format.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Add QuestionsKey, AnswersKey, and Prompt extraction to extractHumanAttrs
so interview-mode human nodes can carry their context key config through
the IR-to-Graph adapter layer. Also upgrades dippin-lang to v0.15.0-beta.1
which defines these fields on ir.HumanConfig.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…rview mode

Adds the InterviewInterviewer interface, routes mode=interview nodes to
executeInterview(), and implements the full handler: question parsing,
last_response fallback, zero-questions freeform degradation, retry pre-fill,
custom context keys, and JSON+markdown answer storage. Includes 7 TDD tests.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Implements the InterviewInterviewer interface on AutoApproveFreeformInterviewer
so --auto-approve mode and tests can drive interview-mode human gates. Picks
first option for select questions, "yes" for yes/no, and "auto-approved" for
open-ended. Adds compile-time interface assertion and unit test coverage.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
… forms

Implements the TUI component shown when an interview-mode human gate fires.
Supports radio select, yes/no toggle, and textarea fields with pagination,
pre-fill from previous results, and JSON serialization via replyCh.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Add MsgGateInterview message type to tui/messages.go, implement
AskInterview on BubbleteaInterviewer (Mode 1 inline program + Mode 2
send-func delegation), and handle MsgGateInterview in app.go
handleModalMsg to show the InterviewContent modal.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…implementations

Implements InterviewInterviewer on AutopilotInterviewer (native LLM),
ClaudeCodeAutopilotInterviewer (claude CLI), and AutopilotTUIInterviewer
(TUI wrapper). Adds buildInterviewPrompt and parseInterviewResponse helpers
with markdown fence stripping, JSON extraction, and a one-retry-on-parse-fail
policy (provider errors hard-fail per CLAUDE.md).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Implements InterviewInterviewer on ConsoleInterviewer so interview-mode
human gates work in non-TUI (piped/testing) contexts. Adds compile-time
assertion and three unit tests covering name match, numeric index, and
blank-to-skip behavior.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Exercises the full pipeline: context → ParseQuestions → AskInterview →
context updates, including zero-questions freeform fallback and retry
pre-fill pass-through.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Change "N questions answered" to "N of M questions answered" so skipped
questions don't inflate the count.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
12 fixes from four-angle code review:

Must-fix:
1. Set Incomplete=true in collectAnswers() when any answer is empty
2. Show "Tab → Add details (optional)" hint on focused select/confirm fields
3. ConsoleInterviewer pre-fill uses ID-based lookup instead of position index
4. Esc in fields navigates back instead of hard-canceling (cancel only at Q1)
5. Cap parsed questions at 100 (maxQuestions) to prevent unbounded allocation
6. Strip markdown emphasis (*bold*, _italic_) from question text labels

Should-fix:
7. InterviewContent.Cancel() now closes channel (consistent with all other modals)
8. interviewRunner.Update discards cmd on quit (consistent with other runners)
9. Replace flaky time.Sleep test with blocking channel read
10. Promote default context keys to constants (ContextKeyInterviewQuestions/Answers)
11. Document ClaudeCodeAutopilot silent fallback asymmetry vs native hard-fail
12. Document autopilot prompt injection risk in buildInterviewPrompt

Also: replace custom contains/searchString helpers with strings.Contains.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…eview workflow

Interview mode UX:
- One question at a time with progress bar, answered summary, selection feedback
- Enter confirms and advances for select and yes/no fields
- Text wraps to terminal width (questions, context, options, summaries)
- Untouched select fields report empty answer (not first option)
- "Other" variants auto-filtered from options (UI always provides its own)
- Cancel returns OutcomeFail so pipelines can route on cancellation

Structured output:
- ParseStructuredQuestions validates JSON questions from agent output
- extractJSON uses bracket-depth counting (not greedy first/last brace)
- response_format plumbing: .dip → IR → adapter → codergen → session → LLM API
- All three providers supported (Anthropic, OpenAI, Gemini)
- dippin-lang v0.16.0 integration (ResponseFormat, ResponseSchema, Params)

Reliability:
- Empty API response detection with capped retry (2 attempts) and diagnostics
- Anthropic adapter logs raw response on 0-token replies
- Codergen empty-response guard fails node instead of silent success
- AskFreeform now hard-fails on provider errors (was silently swallowing)
- ClaudeCode autopilot retry-then-hardfail on parse failure
- Goroutine leak fix in flashDecision with done channel
- Context leak fix (explicit cancel() in retry loop)
- Mode 1 tea.Cmd propagation in all three runners

New content:
- deep_review built-in workflow (3 interview gates, parallel analysis)
- interview-loop.dip reusable subgraph
- dippin-lang structured output request doc (resolved)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
CI caught formatting issues that the pre-commit hook missed because
the hook was not installed (symlink to .git/hooks/pre-commit was missing).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- parseInterviewResponse fills missing answers with empty entries and sets
  Incomplete when LLM omits questions (Codex P1)
- Console "Other" number (len+1) now prompts for freeform text instead of
  storing the literal number string (Codex P2)
- Document unused prev param in ClaudeCodeAutopilotInterviewer.AskInterview
- Detect real terminal size in Mode 1 interview via golang.org/x/term
- Add language identifier to markdown code fence in plan doc

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…ug logging

- Pre-fill sets selected=true so retry answers survive collectAnswers
- fieldAnswered checks otherInput is non-empty for Other selections
- Cancel sends partial answers (Canceled=true + current state) before closing
- parseInterviewResponse builds results in question order, validates options
  via matchChoice, normalizes yes/no, sets Incomplete for missing answers
- Raw response body logging gated behind TRACKER_DEBUG env var

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Config validation:
- ResponseFormat must be json_object or json_schema
- json_schema requires non-empty valid JSON ResponseSchema

Session reliability:
- Exhausted empty response retries now hard-fail instead of stoppedNaturally

Workflow fixes (deep_review.dip):
- SaveGoal tool node preserves original developer goal before interviews
- Analysis agents read goal from .ai/review/goal.txt
- Empty priority questions allowed when no actionable findings
- Revise loop includes ReviewPlan feedback via ${ctx.human_response}

Interview UX:
- IsYesNo takes precedence over Options in all code paths
- Answered summary capped to viewport height
- Blank retry input preserves previous answer (console)
- Freeform fallback sets answersKey in ContextUpdates
- Autopilot includes previous answers in retry prompt

Other:
- HTML-escape Mermaid arrows in workflows.html
- interview-loop.dip prompts include ${ctx.interview_answers}
- Added ResponseFormat validation and blank-preserve tests

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…re-commit

CI failed because examples/ had stale copies of the 3 existing workflows
and deep_review.dip was missing from examples/ entirely.

- Copied embedded workflows to examples/ (source of truth for CI)
- Added deep_review.dip to Makefile sync-workflows and check-workflows targets
- Added workflow sync check to .pre-commit hook (gate #8)
- Added deep_review.dip to dippin lint list in .pre-commit

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…kflow

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Add a fallback-taken guard in goalGateRetryTarget using checkpoint
context keys to ensure fallback/escalation routing is one-shot.
Separate fallback routing from retry logic in handleExitNode so
retries aren't over-counted and events are correctly labeled.
Add timeout and assertions to TestGoalGateRetryFallsBackToFallbackTarget.
Fix Go version in plan doc (1.24→1.25) and add language specifier to
fenced code block in design spec.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 57 out of 58 changed files in this pull request and generated 3 comments.


💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines +162 to +187
// Check retry budget before allowing another loop.
maxR := e.maxRetries(node)
if cp.RetryCount(nodeID) >= maxR {
// Retries exhausted — look for a fallback/escalation target.
// Guard: only take the fallback once per gate to prevent infinite loops.
fallbackKey := "_fallback_taken:" + nodeID
if cp.Context != nil && cp.Context[fallbackKey] == "true" {
// Fallback already taken — signal unsatisfied without retry.
return "", nodeID, false, true
}
for _, fb := range []string{
node.Attrs["fallback_target"],
node.Attrs["fallback_retry_target"],
e.graph.Attrs["fallback_target"],
e.graph.Attrs["fallback_retry_target"],
} {
if fb == "" {
continue
}
if _, ok := e.graph.Nodes[fb]; ok {
// Mark fallback as taken so it won't loop.
if cp.Context == nil {
cp.Context = map[string]string{}
}
cp.Context[fallbackKey] = "true"
return fb, nodeID, false, true
Copy link

Copilot AI Apr 3, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The fallback-taken guard is stored in cp.Context (fallbackKey), but saveCheckpoint() overwrites cp.Context with pctx.Snapshot() on every save. That means the marker will be lost the next time a checkpoint is saved (including immediately after routing), so the engine can still loop between exit and fallback forever. It also pollutes the user-visible context on resume. Consider persisting this state in a dedicated Checkpoint field (e.g., FallbackTaken map[string]bool) or another checkpoint-owned structure that isn’t replaced by snapshotting the pipeline context.

Copilot uses AI. Check for mistakes.
Comment on lines +246 to +306
fmt.Fprintf(c.Writer, "\nQ%d: %s\n", q.Index, q.Text)

if len(q.Options) > 0 {
// Print numbered options
for j, opt := range q.Options {
fmt.Fprintf(c.Writer, " %d) %s\n", j+1, opt)
}
fmt.Fprintf(c.Writer, " %d) Other\n", len(q.Options)+1)

// Pre-fill hint
if prevAns.Answer != "" {
fmt.Fprintf(c.Writer, "Previous: %s\n", prevAns.Answer)
}
fmt.Fprintf(c.Writer, "Enter choice (name or number, blank to skip): ")

line, err := c.readLine()
if err != nil {
canceled = true
answers[i] = ans
// Fill remaining questions with empty answers.
for j := i + 1; j < len(questions); j++ {
answers[j] = InterviewAnswer{
ID: fmt.Sprintf("q%d", questions[j].Index),
Text: questions[j].Text,
}
}
break
}
input := strings.TrimSpace(line)
if input != "" {
// Match by name (case-insensitive) or number
matched := false
for _, opt := range q.Options {
if strings.EqualFold(input, opt) {
ans.Answer = opt
matched = true
break
}
}
if !matched {
var idx int
if _, err := fmt.Sscanf(input, "%d", &idx); err == nil && idx >= 1 && idx <= len(q.Options) {
ans.Answer = q.Options[idx-1]
} else if _, err := fmt.Sscanf(input, "%d", &idx); err == nil && idx == len(q.Options)+1 {
// User selected "Other" by number — prompt for freeform text.
fmt.Fprintf(c.Writer, "Enter your answer: ")
otherLine, otherErr := c.readLine()
if otherErr == nil {
ans.Answer = strings.TrimSpace(otherLine)
}
} else {
// Treat as "Other" freeform
ans.Answer = input
}
}
} else if prevAns.Answer != "" {
// Blank input preserves the previous answer on retry.
ans.Answer = prevAns.Answer
}
} else if q.IsYesNo {
if prevAns.Answer != "" {
Copy link

Copilot AI Apr 3, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ConsoleInterviewer.AskInterview checks len(q.Options) before q.IsYesNo, so a yes/no question that has explicit options (e.g. parsed from “(yes, no)”) will be presented as an option list and will also add an “Other” choice. This is inconsistent with the TUI behavior (which prioritizes IsYesNo) and can change semantics. Consider handling q.IsYesNo before options (or special-casing yes/no options) to keep the UX consistent.

Copilot uses AI. Check for mistakes.
Comment on lines 41 to 45
func TestListBuiltinWorkflowsReturnsThree(t *testing.T) {
workflows := listBuiltinWorkflows()
if len(workflows) != 3 {
t.Errorf("listBuiltinWorkflows returned %d workflows, want 3", len(workflows))
if len(workflows) != 4 {
t.Errorf("listBuiltinWorkflows returned %d workflows, want 4", len(workflows))
}
Copy link

Copilot AI Apr 3, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Test name no longer matches behavior: this test now expects 4 workflows but is still named TestListBuiltinWorkflowsReturnsThree. Renaming it will avoid confusion when diagnosing failures.

Copilot uses AI. Check for mistakes.
…priority, rename test

- Move fallback-taken state from cp.Context (overwritten by pctx.Snapshot)
  to dedicated Checkpoint.FallbackTaken map that survives save/restore.
- Reorder ConsoleInterviewer.AskInterview to check IsYesNo before
  len(Options), matching TUI behavior for yes/no questions with explicit
  option lists.
- Rename TestListBuiltinWorkflowsReturnsThree to ReturnsFour to match
  the current assertion.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Copy link
Copy Markdown

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 14

🧹 Nitpick comments (5)
.pre-commit (1)

94-98: Consider preserving check-workflows output on failure

Line 94 suppresses mismatch details, so developers only get a generic error. Letting output through improves fix speed.

🔧 Minimal tweak
-if make check-workflows > /dev/null 2>&1; then
+if make check-workflows; then
   pass "embedded workflows in sync"
 else
   fail "examples/*.dip out of sync with cmd/tracker/workflows/ — run 'make sync-workflows'"
 fi
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In @.pre-commit around lines 94 - 98, The current pre-commit hook hides the
output of the make check-workflows target by redirecting stdout/stderr to
/dev/null, so on failure developers only see a generic message; update the
conditional that runs check-workflows (the make check-workflows invocation in
the if/else block) to not discard its output so that mismatch details are
printed on failure—i.e., remove the ">/dev/null 2>&1" redirection or change the
logic to capture output only on success and echo the captured output when the
command fails.
docs/requests/dippin-lang-structured-output.md (1)

136-141: Add language specifier to fenced code block.

The fenced code block should have a language identifier for consistent formatting and linting compliance. Since these are validation rule identifiers, text or plaintext would be appropriate.

📝 Proposed fix
-```
+```text
 DIP_NEW_001: response_format must be "json_object" or "json_schema"
 DIP_NEW_002: response_schema requires response_format: json_schema
 DIP_NEW_003: response_schema must be valid JSON
 DIP_LINT_NEW: warning if json_schema without response_schema
+```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@docs/requests/dippin-lang-structured-output.md` around lines 136 - 141, The
fenced code block containing the validation rules (DIP_NEW_001, DIP_NEW_002,
DIP_NEW_003, DIP_LINT_NEW) lacks a language specifier; update the opening
triple-backtick to include a language such as "text" or "plaintext" (e.g.,
change ``` to ```text) so the block is consistently formatted and
linting-compliant.
pipeline/handlers/human_test.go (1)

659-747: Good console interviewer test coverage.

Tests verify:

  • Multi-question flow with different question types
  • Selection by numeric index
  • Blank input skipping behavior
  • Previous answer preservation on blank input

Consider adding a test for "Other" selection by number (entering the len(options)+1 index) to complete coverage of that code path.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pipeline/handlers/human_test.go` around lines 659 - 747, Add a test that
covers selecting "Other" by numeric index for ConsoleInterviewer.AskInterview:
create a new test (e.g., TestConsoleInterviewer_AskInterview_OtherByNumber) that
supplies input selecting the options length+1 (e.g., "3\nCustom value\n"),
constructs a ConsoleInterviewer with that Reader/Writer, calls AskInterview with
a Question that has two options, and asserts the returned InterviewResult
contains the freeform "Custom value" as the answer; this ensures the code path
handling the numeric "Other" choice (len(options)+1) is exercised and validated.
pipeline/handlers/human.go (1)

285-299: Duplicate fmt.Sscanf parsing can be consolidated.

The code parses the input twice with fmt.Sscanf (lines 287 and 289). This works but is inefficient and slightly confusing.

♻️ Suggested consolidation
 				if !matched {
 					var idx int
-					if _, err := fmt.Sscanf(input, "%d", &idx); err == nil && idx >= 1 && idx <= len(q.Options) {
-						ans.Answer = q.Options[idx-1]
-					} else if _, err := fmt.Sscanf(input, "%d", &idx); err == nil && idx == len(q.Options)+1 {
-						// User selected "Other" by number — prompt for freeform text.
-						fmt.Fprintf(c.Writer, "Enter your answer: ")
-						otherLine, otherErr := c.readLine()
-						if otherErr == nil {
-							ans.Answer = strings.TrimSpace(otherLine)
+					if _, err := fmt.Sscanf(input, "%d", &idx); err == nil {
+						if idx >= 1 && idx <= len(q.Options) {
+							ans.Answer = q.Options[idx-1]
+						} else if idx == len(q.Options)+1 {
+							// User selected "Other" by number — prompt for freeform text.
+							fmt.Fprintf(c.Writer, "Enter your answer: ")
+							otherLine, otherErr := c.readLine()
+							if otherErr == nil {
+								ans.Answer = strings.TrimSpace(otherLine)
+							}
+						} else {
+							// Invalid index — treat as freeform "Other"
+							ans.Answer = input
 						}
 					} else {
 						// Treat as "Other" freeform
 						ans.Answer = input
 					}
 				}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pipeline/handlers/human.go` around lines 285 - 299, Consolidate the duplicate
fmt.Sscanf calls by parsing input into idx exactly once and reusing the result:
in the branch where !matched, call fmt.Sscanf(input, "%d", &idx) a single time,
check err==nil and then handle three cases (1) idx between 1 and len(q.Options)
-> set ans.Answer = q.Options[idx-1], (2) idx == len(q.Options)+1 -> prompt via
fmt.Fprintf(c.Writer, "Enter your answer: "), call otherLine, otherErr :=
c.readLine() and set ans.Answer = strings.TrimSpace(otherLine) on no error, and
(3) otherwise -> fall back to treating input as freeform ans.Answer = input;
keep variable names matched, input, q.Options, ans.Answer and c.readLine as in
the diff.
tui/interview_content.go (1)

635-636: Potential mismatch between question Index and total count display.

The label uses f.question.Index (which comes from parsed question data) alongside len(ic.fields) for the total. If question indices aren't sequential starting from 1, or if some questions were filtered, this could display confusing progress like "Q5 of 3". Consider using ic.cursor + 1 instead of f.question.Index for consistent "N of M" display.

💡 Suggested fix for consistent progress display
-		qLabel := fmt.Sprintf("Q%d of %d: %s", f.question.Index, len(ic.fields), f.question.Text)
+		qLabel := fmt.Sprintf("Q%d of %d: %s", ic.cursor+1, len(ic.fields), f.question.Text)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tui/interview_content.go` around lines 635 - 636, The Q label currently uses
f.question.Index with len(ic.fields), which can mismatch if indices aren’t
sequential or some questions are filtered; change the label construction to use
the current cursor position (ic.cursor) for the displayed ordinal (use ic.cursor
+ 1) so the progress reads "Q{current} of {len(ic.fields)}" consistently—update
the fmt.Sprintf invocation that builds qLabel (and any related uses of
f.question.Index for progress display) to reference ic.cursor + 1 instead of
f.question.Index.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@agent/config.go`:
- Around line 90-102: The current validation only checks ResponseSchema when
ResponseFormat == "json_schema", allowing a non-empty ResponseSchema to be
accepted when ResponseFormat is empty or "json_object"; add a guard to reject
orphaned ResponseSchema by returning an error if c.ResponseSchema != "" and
c.ResponseFormat != "json_schema". Update the validation block (around the
existing checks for c.ResponseFormat, c.ResponseSchema and json.Valid) to first
check that if c.ResponseSchema is non-empty then c.ResponseFormat must equal
"json_schema", otherwise return a clear error message referencing both
ResponseSchema and ResponseFormat.

In `@cmd/tracker/embed_test.go`:
- Around line 41-45: Rename the test function
TestListBuiltinWorkflowsReturnsThree to TestListBuiltinWorkflowsReturnsFour to
match the assertion expecting 4 workflows; update the function identifier in the
test file so the test name and its expectation (len(workflows) != 4) are
consistent and the test runner reflects the correct behavior.

In `@cmd/tracker/workflows/deep_review.dip`:
- Around line 365-368: Add an inline comment near the `ReviewPlan ->
WriteRemediationPlan` edge (the `revise` edge with `restart: true`) explaining
that the restart will be limited by the workflow's `max_restarts: 20` default to
prevent infinite revision loops; mention that `ReviewPlan` is `mode: freeform`
with exhaustive labeled edges (approve, revise, reject) so no extra failure
handling is required.
- Around line 25-30: The SaveGoal tool's command concatenates the expanded
${ctx.human_response} and runs it via sh -c without escaping, enabling shell
injection; update the expansion/execution path (where ExpandVariables() is used
for tool commands) to shell-escape or quote expanded values before invoking sh
-c (or avoid sh -c and pass arguments directly), and/or apply input validation
on human_response at the source; specifically, modify the code handling tool
command execution (the code that calls ExpandVariables() and then executes the
command via sh -c) to use a safe escaping routine or to build argv arrays so
metacharacters in human_response cannot be interpreted, mirroring the existing
working_dir validation approach in pipeline/handlers/tool.go.

In `@docs/superpowers/plans/2026-03-30-interview-mode.md`:
- Around line 478-483: The test uses a runtime type assertion on a concrete
*AutoApproveFreeformInterviewer (ii, ok := ai.(InterviewInterviewer)), which is
invalid in Go because assertions only work on interface values; replace that
assertion with a compile-time type assignment so the compiler will check that
AutoApproveFreeformInterviewer implements InterviewInterviewer (assign ai to a
variable of type InterviewInterviewer), then call ii.AskInterview(...) once
AskInterview is implemented.

In `@docs/superpowers/plans/2026-03-30-p0-critical-safety-fixes.md`:
- Line 15: The heading "### Task 1: Commit existing parallel context fix (`#20`)
and goal-gate retry fix (`#15`)" jumps from the document-level `#` to `###`
(MD001); change this to a `##` heading or insert an intermediate `##` section
above it so headings progress properly. Update the "Task 1: Commit existing
parallel context fix (`#20`) and goal-gate retry fix (`#15`)" line to use `##` (or
add a new `##` container heading before the `###`) to resolve the MD001 rule.
- Around line 1393-1404: The fenced dependency-graph block that starts with ```
(the tree listing of Task 1 → Task 10) needs an explicit language to satisfy
markdownlint MD040; update the opening fence from ``` to ```text so the block is
fenced as a text code block (leave the block contents unchanged and keep the
closing ```).

In `@examples/subgraphs/interview-loop.dip`:
- Around line 37-38: The current Interviewer output of {"questions": []} still
routes to the Answer node and triggers freeform input; update the graph so that
when Interviewer emits an empty question list it does not continue to Answer:
either add a conditional branch that checks for an empty "questions" array and
routes to a completion/exit node (short-circuiting the normal Answer path), or
change Interviewer to emit an explicit signal (e.g., {"complete": true} plus no
"questions") and alter downstream edges (the transitions from Interviewer to
Answer) to inspect that flag and bypass Answer when present; update all
equivalent places where Interviewer returns empty questions (the other two
branches mentioned) to use the same branching or explicit-complete signal.

In `@llm/anthropic/adapter.go`:
- Around line 118-120: The code currently logs the full raw provider response
via log.Printf when TRACKER_DEBUG is set, which can leak sensitive user/tool
data; update the logging in the anthropic response handling (where respBody is
used and log.Printf is called) to avoid printing cleartext: compute a stable
fingerprint (e.g., SHA-256) of respBody and log the hex prefix plus the byte
length, and if you must include content for diagnostics log only a small
redacted/truncated snippet (e.g., first 100–200 bytes) with sensitive characters
masked or replaced, ensuring the full respBody is never emitted; keep the
TRACKER_DEBUG conditional but replace the direct string interpolation of
respBody with these hashed/truncated diagnostics.

In `@pipeline/engine_checkpoint.go`:
- Around line 195-210: Update the function comment that currently says
fallback_target is only used when retries are exhausted to state that
fallback_target (both node-level and graph-level: node.Attrs["fallback_target"]
and e.graph.Attrs["fallback_target"]) is now included in the retry candidate
list and therefore can act as a retry destination when retries remain as well as
an escalation destination when retries are exhausted; mention that this behavior
is exercised by TestGoalGateFallbackTargetAttributeRecognized and that pipelines
using both retry_target and fallback_target together remain unaffected.

In `@pipeline/engine_goal_gate_test.go`:
- Around line 153-198: The test uses "fallback_target" but should use
"retry_target" on the "work" node to represent the retry hop; update the Attrs
map in TestGoalGateFallbackTargetAttributeRecognized (node ID "work") to set
"retry_target":"repair" instead of "fallback_target", and tighten the assertion
on workAttempts (the workAttempts variable) to require the expected retry count
(e.g., assert workAttempts == 2) so the test fails if retries didn't occur
exactly as intended.
- Around line 146-150: The test currently allows a green exit by checking
result.Status against OutcomeFail OR OutcomeSuccess; change it to require only
OutcomeFail so an engine that escalates but incorrectly exits success fails the
test. Locate the assertion using result.Status and OutcomeSuccess/OutcomeFail in
the failing test (around the terminal-check after escalation), replace the
boolean OR condition with a single check for OutcomeFail, and update the
t.Fatalf message to state "want fail" (or similar) so the intent is clear.

In `@pipeline/handlers/interview_result.go`:
- Around line 74-90: The code treats whitespace-only answers as non-empty;
update both places that check q.Answer to trim whitespace before testing: use
strings.TrimSpace(q.Answer) when deciding to render "(skipped)" in the first
loop (the block that writes Answer/Elaboration via b.WriteString) and likewise
use strings.TrimSpace(q.Answer) when counting answered in the loop that computes
answered/len(r.Questions); ensure strings is imported if not already.

In `@tui/interviewer.go`:
- Around line 227-233: In askMode2Interview, when receiving from the ReplyCh
created for MsgGateInterview, treat a closed channel (ok==false) as a TUI
failure instead of a user cancel: replace the current return of
&handlers.InterviewResult{Canceled: true} with returning a non-nil error that
indicates the TUI closed before replying (for example errors.New or fmt.Errorf),
so callers can distinguish a crashed/closed TUI from an intentional cancel;
update callers of askMode2Interview to handle this error path if necessary.

---

Nitpick comments:
In @.pre-commit:
- Around line 94-98: The current pre-commit hook hides the output of the make
check-workflows target by redirecting stdout/stderr to /dev/null, so on failure
developers only see a generic message; update the conditional that runs
check-workflows (the make check-workflows invocation in the if/else block) to
not discard its output so that mismatch details are printed on failure—i.e.,
remove the ">/dev/null 2>&1" redirection or change the logic to capture output
only on success and echo the captured output when the command fails.

In `@docs/requests/dippin-lang-structured-output.md`:
- Around line 136-141: The fenced code block containing the validation rules
(DIP_NEW_001, DIP_NEW_002, DIP_NEW_003, DIP_LINT_NEW) lacks a language
specifier; update the opening triple-backtick to include a language such as
"text" or "plaintext" (e.g., change ``` to ```text) so the block is consistently
formatted and linting-compliant.

In `@pipeline/handlers/human_test.go`:
- Around line 659-747: Add a test that covers selecting "Other" by numeric index
for ConsoleInterviewer.AskInterview: create a new test (e.g.,
TestConsoleInterviewer_AskInterview_OtherByNumber) that supplies input selecting
the options length+1 (e.g., "3\nCustom value\n"), constructs a
ConsoleInterviewer with that Reader/Writer, calls AskInterview with a Question
that has two options, and asserts the returned InterviewResult contains the
freeform "Custom value" as the answer; this ensures the code path handling the
numeric "Other" choice (len(options)+1) is exercised and validated.

In `@pipeline/handlers/human.go`:
- Around line 285-299: Consolidate the duplicate fmt.Sscanf calls by parsing
input into idx exactly once and reusing the result: in the branch where
!matched, call fmt.Sscanf(input, "%d", &idx) a single time, check err==nil and
then handle three cases (1) idx between 1 and len(q.Options) -> set ans.Answer =
q.Options[idx-1], (2) idx == len(q.Options)+1 -> prompt via
fmt.Fprintf(c.Writer, "Enter your answer: "), call otherLine, otherErr :=
c.readLine() and set ans.Answer = strings.TrimSpace(otherLine) on no error, and
(3) otherwise -> fall back to treating input as freeform ans.Answer = input;
keep variable names matched, input, q.Options, ans.Answer and c.readLine as in
the diff.

In `@tui/interview_content.go`:
- Around line 635-636: The Q label currently uses f.question.Index with
len(ic.fields), which can mismatch if indices aren’t sequential or some
questions are filtered; change the label construction to use the current cursor
position (ic.cursor) for the displayed ordinal (use ic.cursor + 1) so the
progress reads "Q{current} of {len(ic.fields)}" consistently—update the
fmt.Sprintf invocation that builds qLabel (and any related uses of
f.question.Index for progress display) to reference ic.cursor + 1 instead of
f.question.Index.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 4090bf60-dd6c-47fa-b9b4-3a9912fcc729

📥 Commits

Reviewing files that changed from the base of the PR and between e7edb0c and b78a4a1.

⛔ Files ignored due to path filters (1)
  • go.sum is excluded by !**/*.sum
📒 Files selected for processing (55)
  • .pre-commit
  • CHANGELOG.md
  • CLAUDE.md
  • Makefile
  • README.md
  • agent/config.go
  • agent/config_test.go
  • agent/session.go
  • agent/session_run.go
  • cmd/tracker/embed_test.go
  • cmd/tracker/workflows/deep_review.dip
  • docs/requests/dippin-lang-structured-output.md
  • docs/site/changelog.html
  • docs/site/workflows.html
  • docs/superpowers/plans/2026-03-30-interview-mode.md
  • docs/superpowers/plans/2026-03-30-p0-critical-safety-fixes.md
  • docs/superpowers/specs/2026-03-30-p0-critical-safety-fixes-design.md
  • examples/ask_and_execute.dip
  • examples/build_product.dip
  • examples/build_product_with_superspec.dip
  • examples/deep_review.dip
  • examples/subgraphs/interview-loop.dip
  • go.mod
  • llm/anthropic/adapter.go
  • pipeline/context.go
  • pipeline/dippin_adapter.go
  • pipeline/dippin_adapter_test.go
  • pipeline/engine_checkpoint.go
  • pipeline/engine_goal_gate_test.go
  • pipeline/engine_run.go
  • pipeline/events.go
  • pipeline/events_test.go
  • pipeline/handlers/autopilot.go
  • pipeline/handlers/autopilot_claudecode.go
  • pipeline/handlers/autopilot_claudecode_test.go
  • pipeline/handlers/autopilot_test.go
  • pipeline/handlers/codergen.go
  • pipeline/handlers/human.go
  • pipeline/handlers/human_test.go
  • pipeline/handlers/interview_integration_test.go
  • pipeline/handlers/interview_parse.go
  • pipeline/handlers/interview_parse_test.go
  • pipeline/handlers/interview_result.go
  • pipeline/handlers/interview_result_test.go
  • pipeline/handlers/parallel.go
  • pipeline/handlers/tool_test.go
  • tui/app.go
  • tui/autopilot_interviewer.go
  • tui/autopilot_interviewer_test.go
  • tui/interview_content.go
  • tui/interview_content_test.go
  • tui/interviewer.go
  • tui/interviewer_test.go
  • tui/messages.go
  • tui/modal.go
✅ Files skipped from review due to trivial changes (8)
  • go.mod
  • examples/ask_and_execute.dip
  • pipeline/events_test.go
  • pipeline/events.go
  • examples/build_product_with_superspec.dip
  • examples/build_product.dip
  • agent/config_test.go
  • CLAUDE.md
🚧 Files skipped from review as they are similar to previous changes (2)
  • pipeline/context.go
  • pipeline/handlers/parallel.go

Comment on lines +90 to +102
if c.ResponseFormat != "" {
if c.ResponseFormat != "json_object" && c.ResponseFormat != "json_schema" {
return fmt.Errorf("ResponseFormat must be \"json_object\" or \"json_schema\", got %q", c.ResponseFormat)
}
if c.ResponseFormat == "json_schema" {
if c.ResponseSchema == "" {
return fmt.Errorf("ResponseSchema must be non-empty when ResponseFormat is \"json_schema\"")
}
if !json.Valid([]byte(c.ResponseSchema)) {
return fmt.Errorf("ResponseSchema must be valid JSON, got %q", c.ResponseSchema)
}
}
}
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

Reject orphaned ResponseSchema to prevent silent misconfiguration

Line 90 onward validates schema only for json_schema, but a non-empty ResponseSchema is currently accepted when ResponseFormat is unset or json_object. That silently drops intent.

🔧 Suggested validation guard
+	if c.ResponseSchema != "" && c.ResponseFormat != "json_schema" {
+		return fmt.Errorf("ResponseSchema requires ResponseFormat to be %q", "json_schema")
+	}
 	if c.ResponseFormat != "" {
 		if c.ResponseFormat != "json_object" && c.ResponseFormat != "json_schema" {
 			return fmt.Errorf("ResponseFormat must be \"json_object\" or \"json_schema\", got %q", c.ResponseFormat)
 		}
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
if c.ResponseFormat != "" {
if c.ResponseFormat != "json_object" && c.ResponseFormat != "json_schema" {
return fmt.Errorf("ResponseFormat must be \"json_object\" or \"json_schema\", got %q", c.ResponseFormat)
}
if c.ResponseFormat == "json_schema" {
if c.ResponseSchema == "" {
return fmt.Errorf("ResponseSchema must be non-empty when ResponseFormat is \"json_schema\"")
}
if !json.Valid([]byte(c.ResponseSchema)) {
return fmt.Errorf("ResponseSchema must be valid JSON, got %q", c.ResponseSchema)
}
}
}
if c.ResponseSchema != "" && c.ResponseFormat != "json_schema" {
return fmt.Errorf("ResponseSchema requires ResponseFormat to be %q", "json_schema")
}
if c.ResponseFormat != "" {
if c.ResponseFormat != "json_object" && c.ResponseFormat != "json_schema" {
return fmt.Errorf("ResponseFormat must be \"json_object\" or \"json_schema\", got %q", c.ResponseFormat)
}
if c.ResponseFormat == "json_schema" {
if c.ResponseSchema == "" {
return fmt.Errorf("ResponseSchema must be non-empty when ResponseFormat is \"json_schema\"")
}
if !json.Valid([]byte(c.ResponseSchema)) {
return fmt.Errorf("ResponseSchema must be valid JSON, got %q", c.ResponseSchema)
}
}
}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@agent/config.go` around lines 90 - 102, The current validation only checks
ResponseSchema when ResponseFormat == "json_schema", allowing a non-empty
ResponseSchema to be accepted when ResponseFormat is empty or "json_object"; add
a guard to reject orphaned ResponseSchema by returning an error if
c.ResponseSchema != "" and c.ResponseFormat != "json_schema". Update the
validation block (around the existing checks for c.ResponseFormat,
c.ResponseSchema and json.Valid) to first check that if c.ResponseSchema is
non-empty then c.ResponseFormat must equal "json_schema", otherwise return a
clear error message referencing both ResponseSchema and ResponseFormat.

Comment on lines +25 to +30
tool SaveGoal
label: "Snapshot the developer's goal"
command:
mkdir -p .ai/review
printf '%s\n' "${ctx.human_response}" > .ai/review/goal.txt
echo "Goal saved to .ai/review/goal.txt"
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Check how tool command interpolation handles escaping
rg -n -B 2 -A 10 'ExpandVariables|interpolat' pipeline/*.go | head -60

Repository: 2389-research/tracker

Length of output: 3057


🏁 Script executed:

# Get the full ExpandVariables implementation
cat -n pipeline/expand.go | head -100

Repository: 2389-research/tracker

Length of output: 3574


🏁 Script executed:

# Check how tool commands are executed and if escaping is applied
rg -n -B 5 -A 15 'command:' cmd/tracker/workflows/deep_review.dip

Repository: 2389-research/tracker

Length of output: 879


🏁 Script executed:

# Search for shell command execution logic in the pipeline
rg -n -B 3 -A 10 'exec|sh|bash|cmd' pipeline/*.go | grep -i 'command\|exec\|shell' | head -40

Repository: 2389-research/tracker

Length of output: 4340


🏁 Script executed:

# Look for any shell-specific escaping or quote handling functions
rg -n 'ShellQuote|EscapeShell|shlex|Quote' pipeline/*.go

Repository: 2389-research/tracker

Length of output: 621


🏁 Script executed:

# Search for tool handler implementation
rg -n -B 5 -A 20 'type.*Tool.*Handler|func.*Tool.*Execute' pipeline/*.go

Repository: 2389-research/tracker

Length of output: 47


🏁 Script executed:

# Search for where shell commands are executed
rg -n -B 3 -A 10 'exec.Command|os.Sh|sh\s+-c' pipeline/*.go

Repository: 2389-research/tracker

Length of output: 47


🏁 Script executed:

# Look for any recent changes or protections added to tool commands
git log --oneline --all | grep -i 'tool\|escap\|protect' | head -20

Repository: 2389-research/tracker

Length of output: 1573


🏁 Script executed:

# Show the changes in the key commits
git show 830dafc --stat

Repository: 2389-research/tracker

Length of output: 1409


🏁 Script executed:

git show 1f1c98f --stat

Repository: 2389-research/tracker

Length of output: 795


🏁 Script executed:

# Look for tool handler implementation
find . -name '*.go' -type f | xargs grep -l 'tool' | grep handler

Repository: 2389-research/tracker

Length of output: 482


🏁 Script executed:

# Search for tool execution more broadly
rg -n 'type.*Handler' pipeline/*.go | grep -i tool

Repository: 2389-research/tracker

Length of output: 47


🏁 Script executed:

# Read the tool handler implementation
cat -n pipeline/handlers/tool.go

Repository: 2389-research/tracker

Length of output: 5237


Shell command executes expanded variables without escaping — shell injection risk.

The SaveGoal tool expands ${ctx.human_response} via ExpandVariables() and passes the result directly to sh -c. The variable expansion layer applies no shell escaping (see pipeline/expand.go), so shell metacharacters in human_response (e.g., $(rm -rf /), backticks, quotes) will be interpreted by the shell. While the working_dir attribute has metacharacter validation (pipeline/handlers/tool.go lines 64-72), the expanded tool command itself has no such protection. Ensure all expanded variables in tool commands are properly shell-escaped before execution, or constrain input values at the source.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@cmd/tracker/workflows/deep_review.dip` around lines 25 - 30, The SaveGoal
tool's command concatenates the expanded ${ctx.human_response} and runs it via
sh -c without escaping, enabling shell injection; update the expansion/execution
path (where ExpandVariables() is used for tool commands) to shell-escape or
quote expanded values before invoking sh -c (or avoid sh -c and pass arguments
directly), and/or apply input validation on human_response at the source;
specifically, modify the code handling tool command execution (the code that
calls ExpandVariables() and then executes the command via sh -c) to use a safe
escaping routine or to build argv arrays so metacharacters in human_response
cannot be interpreted, mirroring the existing working_dir validation approach in
pipeline/handlers/tool.go.

Comment on lines +365 to +368
WriteRemediationPlan -> ReviewPlan
ReviewPlan -> ApplyFormat label: "approve"
ReviewPlan -> WriteRemediationPlan label: "revise" restart: true
ReviewPlan -> Done label: "reject"
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Check if there's default failure handling in the pipeline engine for nodes without fail edges
rg -n -A 5 'OutcomeFail|outcome.*fail' pipeline/engine*.go | head -50

Repository: 2389-research/tracker

Length of output: 2838


🏁 Script executed:

cat -n cmd/tracker/workflows/deep_review.dip | head -400 | tail -100

Repository: 2389-research/tracker

Length of output: 4359


🏁 Script executed:

# Check freeform mode handling and OutcomeFail scenarios
rg -n "mode.*freeform|freeform.*outcome" pipeline/ cmd/tracker/

Repository: 2389-research/tracker

Length of output: 1540


🏁 Script executed:

# Look for max_restarts defaults
rg -n "max_restarts|defaults" cmd/tracker/workflows/deep_review.dip

Repository: 2389-research/tracker

Length of output: 100


🏁 Script executed:

sed -n '380,440p' pipeline/handlers/human.go

Repository: 2389-research/tracker

Length of output: 2341


🏁 Script executed:

sed -n '440,480p' pipeline/handlers/human.go

Repository: 2389-research/tracker

Length of output: 1376


Add comment documenting the restart limit for the revision loop.

The revise edge loops back to WriteRemediationPlan with restart: true. The max_restarts: 20 default provides a safety limit to prevent infinite revision loops — consider adding a comment to document this behavior explicitly.

Note: ReviewPlan is a mode: freeform node with labeled edges (approve, revise, reject) that exhaustively route all responses, so no additional failure handling is needed.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@cmd/tracker/workflows/deep_review.dip` around lines 365 - 368, Add an inline
comment near the `ReviewPlan -> WriteRemediationPlan` edge (the `revise` edge
with `restart: true`) explaining that the restart will be limited by the
workflow's `max_restarts: 20` default to prevent infinite revision loops;
mention that `ReviewPlan` is `mode: freeform` with exhaustive labeled edges
(approve, revise, reject) so no extra failure handling is required.

Comment on lines +478 to +483
func TestAutoApproveInterviewInterviewer(t *testing.T) {
ai := &AutoApproveFreeformInterviewer{}
// Type-assert to InterviewInterviewer should work
ii, ok := ai.(InterviewInterviewer) // Will fail until implemented
result, err := ii.AskInterview(questions, nil)
// All answers empty, incomplete=false, canceled=false
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

🧩 Analysis chain

🏁 Script executed:

cat -n docs/superpowers/plans/2026-03-30-interview-mode.md | sed -n '475,490p'

Repository: 2389-research/tracker

Length of output: 652


🌐 Web query:

Go type assertion concrete type vs interface type rules

💡 Result:

In Go, type assertions x.(T) on an interface value x have different rules depending on whether the target type T is a concrete (non-interface) type or an interface type. For concrete T (not an interface): - Asserts that the dynamic type of x is identical to T. - T must implement x's static interface type (compile-time check); otherwise invalid. - If true, returns the stored value with static type T; else panics (unless comma-ok form used). For interface T: - Asserts that the dynamic type of x implements the interface T. - Succeeds if x's dynamic type satisfies T's type set. - Returns the value wrapped in interface T. Examples: Concrete: var x any = 7; i := x.(int) // succeeds, i is int Interface: type Fooer interface{Foo}; type Thing int; func(t Thing)Foo{}; var y any = Thing(3); f := y.(Fooer) // succeeds Always prefer comma-ok form v, ok := x.(T) for safety to avoid panics. Sources: Go spec (6,13,25), Boldly Go analyses (12,20), GeeksforGeeks/Rednafi (2,3).

Citations:


This interface-check example is invalid Go.

Line 481 uses a type assertion on a concrete *AutoApproveFreeformInterviewer type, but Go type assertions only work on interface values. This code will never compile, even after AskInterview is implemented. Replace with an assignment to perform the compile-time interface check idiomatically:

var ii InterviewInterviewer = ai // compile-time check once implemented
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@docs/superpowers/plans/2026-03-30-interview-mode.md` around lines 478 - 483,
The test uses a runtime type assertion on a concrete
*AutoApproveFreeformInterviewer (ii, ok := ai.(InterviewInterviewer)), which is
invalid in Go because assertions only work on interface values; replace that
assertion with a compile-time type assignment so the compiler will check that
AutoApproveFreeformInterviewer implements InterviewInterviewer (assign ai to a
variable of type InterviewInterviewer), then call ii.AskInterview(...) once
AskInterview is implemented.

Comment on lines +153 to +198
func TestGoalGateFallbackTargetAttributeRecognized(t *testing.T) {
// Verify that "fallback_target" (not just "fallback_retry_target") is recognized
// as a retry target when retries remain.
g := NewGraph("goal-gate-attr")

g.AddNode(&Node{ID: "start", Shape: "Mdiamond"})
g.AddNode(&Node{ID: "work", Shape: "box", Attrs: map[string]string{
"goal_gate": "true",
"fallback_target": "repair",
}})
g.AddNode(&Node{ID: "repair", Shape: "box"})
g.AddNode(&Node{ID: "done", Shape: "Msquare"})

g.AddEdge(&Edge{From: "start", To: "work"})
g.AddEdge(&Edge{From: "work", To: "done", Condition: "ctx.outcome = success"})
g.AddEdge(&Edge{From: "work", To: "done", Condition: "ctx.outcome = fail"})
g.AddEdge(&Edge{From: "repair", To: "work"})

reg := newTestRegistry()
workAttempts := 0
reg.Register(&testHandler{
name: "codergen",
executeFn: func(ctx context.Context, node *Node, pctx *PipelineContext) (Outcome, error) {
if node.ID == "work" {
workAttempts++
if workAttempts == 1 {
return Outcome{Status: OutcomeFail}, nil
}
return Outcome{Status: OutcomeSuccess}, nil
}
return Outcome{Status: OutcomeSuccess}, nil
},
})

engine := NewEngine(g, reg)
result, err := engine.Run(context.Background())
if err != nil {
t.Fatalf("engine.Run failed: %v", err)
}
if result.Status != OutcomeSuccess {
t.Fatalf("status = %q, want %q", result.Status, OutcomeSuccess)
}
if workAttempts < 2 {
t.Fatalf("work ran %d times, expected at least 2", workAttempts)
}
}
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Use retry_target for the repair hop in this test.

This test currently treats fallback_target as the normal retry path, but the PR contract describes fallback as the route taken after retries are exhausted. The workAttempts < 2 assertion on Line 195 is also too weak; extra loops still pass.

💡 Suggested fix
-func TestGoalGateFallbackTargetAttributeRecognized(t *testing.T) {
-	// Verify that "fallback_target" (not just "fallback_retry_target") is recognized
-	// as a retry target when retries remain.
+func TestGoalGateRetryTargetAttributeRecognized(t *testing.T) {
+	// Verify that "retry_target" is used while retries remain.
 	g := NewGraph("goal-gate-attr")
@@
 	g.AddNode(&Node{ID: "work", Shape: "box", Attrs: map[string]string{
-		"goal_gate":       "true",
-		"fallback_target": "repair",
+		"goal_gate":    "true",
+		"retry_target": "repair",
 	}})
@@
-	if workAttempts < 2 {
-		t.Fatalf("work ran %d times, expected at least 2", workAttempts)
+	if workAttempts != 2 {
+		t.Fatalf("work ran %d times, want 2", workAttempts)
 	}
 }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pipeline/engine_goal_gate_test.go` around lines 153 - 198, The test uses
"fallback_target" but should use "retry_target" on the "work" node to represent
the retry hop; update the Attrs map in
TestGoalGateFallbackTargetAttributeRecognized (node ID "work") to set
"retry_target":"repair" instead of "fallback_target", and tighten the assertion
on workAttempts (the workAttempts variable) to require the expected retry count
(e.g., assert workAttempts == 2) so the test fails if retries didn't occur
exactly as intended.

Comment on lines +227 to +233
func (b *BubbleteaInterviewer) askMode2Interview(questions []handlers.Question, prev *handlers.InterviewResult) (*handlers.InterviewResult, error) {
ch := make(chan string, 1)
b.send(MsgGateInterview{Questions: questions, Previous: prev, ReplyCh: ch})
reply, ok := <-ch
if !ok {
return &handlers.InterviewResult{Canceled: true}, nil
}
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Treat a closed interview reply channel as a TUI failure, not a cancel.

The existing choice/freeform gates return an error when the TUI program closes before replying. Returning Canceled: true here makes a crashed TUI or broken modal wiring indistinguishable from an intentional user cancel.

💡 Suggested fix
 func (b *BubbleteaInterviewer) askMode2Interview(questions []handlers.Question, prev *handlers.InterviewResult) (*handlers.InterviewResult, error) {
 	ch := make(chan string, 1)
 	b.send(MsgGateInterview{Questions: questions, Previous: prev, ReplyCh: ch})
 	reply, ok := <-ch
 	if !ok {
-		return &handlers.InterviewResult{Canceled: true}, nil
+		return nil, fmt.Errorf("TUI program closed before responding to interview gate")
 	}
 	result, err := handlers.DeserializeInterviewResult(reply)
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
func (b *BubbleteaInterviewer) askMode2Interview(questions []handlers.Question, prev *handlers.InterviewResult) (*handlers.InterviewResult, error) {
ch := make(chan string, 1)
b.send(MsgGateInterview{Questions: questions, Previous: prev, ReplyCh: ch})
reply, ok := <-ch
if !ok {
return &handlers.InterviewResult{Canceled: true}, nil
}
func (b *BubbleteaInterviewer) askMode2Interview(questions []handlers.Question, prev *handlers.InterviewResult) (*handlers.InterviewResult, error) {
ch := make(chan string, 1)
b.send(MsgGateInterview{Questions: questions, Previous: prev, ReplyCh: ch})
reply, ok := <-ch
if !ok {
return nil, fmt.Errorf("TUI program closed before responding to interview gate")
}
result, err := handlers.DeserializeInterviewResult(reply)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tui/interviewer.go` around lines 227 - 233, In askMode2Interview, when
receiving from the ReplyCh created for MsgGateInterview, treat a closed channel
(ok==false) as a TUI failure instead of a user cancel: replace the current
return of &handlers.InterviewResult{Canceled: true} with returning a non-nil
error that indicates the TUI closed before replying (for example errors.New or
fmt.Errorf), so callers can distinguish a crashed/closed TUI from an intentional
cancel; update callers of askMode2Interview to handle this error path if
necessary.

- Resolve merge conflict in human.go (keep IsYesNo-first ordering)
- Move FallbackTaken guard to dedicated Checkpoint field (persists
  through saveCheckpoint which overwrites cp.Context with pctx.Snapshot)
- Update goalGateRetryTarget doc to describe fallback_target dual role
- Tighten fallback test assertion to require OutcomeFail only
- Truncate debug response logging to 200 bytes (avoid leaking payloads)
- Treat whitespace-only interview answers as skipped
- Fix markdown heading level jump (MD001) and code fence language (MD040)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants