fix: P0 critical safety fixes — retry budgets, context diffing, parallel capture by clintecker · Pull Request #48 · 2389-research/tracker

clintecker · 2026-03-31T15:18:45Z

Summary

Goal-gate retry budgets: goalGateRetryTarget now checks RetryCount against maxRetries before allowing another loop. When exhausted, routes to fallback_target or signals failure with EventStageFailed.
Context diffing: New PipelineContext.DiffFrom() method detects side-effect writes (pctx.Set()) that handlers make directly, not just values returned via Outcome.ContextUpdates.
Parallel side-effect capture: The parallel handler now merges DiffFrom(snapshot) with explicit ContextUpdates, preventing silent data loss from branch handlers that write to context as a side effect.
Event emission: EventStageRetrying and EventStageFailed events are now emitted for goal-gate retry loops, giving the TUI and trace visibility into retry behavior.
Includes design spec and implementation plan docs.

Test plan

TestContextDiffFrom* — covers new keys, changed keys, unchanged exclusion, empty baseline/context
TestParallelHandlerCapturesSideEffectWrites — verifies side-effect-only writes are captured
TestParallelHandlerExplicitOverridesSideEffects — verifies explicit ContextUpdates win over side-effects
pipeline/engine_goal_gate_test.go — new file covering goal-gate retry budget and fallback routing
go test ./... -short — all packages pass

🤖 Generated with Claude Code

Summary by CodeRabbit

New Features
- Added goal-gate fallback routing and escalation loop prevention with checkpoint-persisted tracking.
- Enhanced parallel branch context handling to capture implicit side effects alongside explicit updates.
- Improved interview question prompting for yes/no scenarios.
Bug Fixes
- Enforce per-goal-gate retry budgets with proper exhaustion detection.
- Treat whitespace-only interview answers as skipped.
Documentation
- Added P0 critical safety fixes implementation plan and design specification covering circuit breaker, atomic checkpoints, and tool-command injection protection.

Design spec informed by 5-expert adversarial review panel. 10-task implementation plan covering #15, #16, #20, plus atomic checkpoints and global circuit breaker. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

…effect capture - Add DiffFrom() to PipelineContext for detecting side-effect writes - Parallel handler now captures pctx.Set() calls, not just Outcome.ContextUpdates - Goal-gate retry checks budget before looping; routes to fallback when exhausted - Emit EventStageRetrying/EventStageFailed events for goal-gate retries - Add comprehensive tests for all three fixes Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

coderabbitai · 2026-03-31T15:21:56Z

Caution

Review failed

The pull request is closed.

ℹ️ Recent review info

⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 98ad7c10-a9b2-4e03-9198-a6f6be438192

📥 Commits

Reviewing files that changed from the base of the PR and between b78a4a1 and 8cd1727.

📒 Files selected for processing (8)

cmd/tracker/embed_test.go
docs/superpowers/plans/2026-03-30-p0-critical-safety-fixes.md
llm/anthropic/adapter.go
pipeline/checkpoint.go
pipeline/engine_checkpoint.go
pipeline/engine_goal_gate_test.go
pipeline/handlers/human.go
pipeline/handlers/interview_result.go

Walkthrough

This PR implements P0 critical safety fixes for the pipeline engine, including: adding context diffing to capture and merge side-effect writes from parallel branches, enforcing per-goal-gate retry budgets with fallback routing and checkpoint-persisted fallback tracking, adding corresponding tests, and refining interview handler logic.

Changes

Cohort / File(s)	Summary
Documentation & Planning `docs/superpowers/plans/2026-03-30-p0-critical-safety-fixes.md`, `docs/superpowers/specs/2026-03-30-p0-critical-safety-fixes-design.md`	Added implementation plan and design specification for four P0 pipeline-engine safety issues: parallel context correctness via `DiffFrom` mechanism, goal-gate retry termination with fallback tracking, tool_command interpolation safety, and atomic checkpoint persistence.
Pipeline Context & Diffing `pipeline/context.go`, `pipeline/context_test.go`	Added exported constants `ContextKeyInterviewQuestions` and `ContextKeyInterviewAnswers`, and new `DiffFrom(baseline)` method to compute context changes against a snapshot; includes five unit tests covering key-present, value-changed, unchanged, empty baseline, and empty context scenarios.
Parallel Context Merging `pipeline/handlers/parallel.go`, `pipeline/handlers/parallel_test.go`	Updated `ParallelHandler.runBranch` to merge explicit `ContextUpdates` with implicit side-effect writes captured via `DiffFrom`, with tests validating side-effect inclusion and explicit-update override semantics.
Goal-Gate Retry Logic `pipeline/engine_checkpoint.go`, `pipeline/engine_run.go`, `pipeline/engine_goal_gate_test.go`	Expanded `goalGateRetryTarget` signature to return goal-gate node ID; enforced per-node retry budgets and fallback routing guarded by checkpoint-persisted `FallbackTaken` tracking; updated `handleExitNode` to emit tagged events and route to fallback targets; added comprehensive tests for retry exhaustion, node-level max_retries, fallback redirection, and escalation paths.
Checkpoint State `pipeline/checkpoint.go`	Added new `FallbackTaken map[string]bool` field to track which goal-gate nodes have consumed their one-shot fallback route across checkpoint restore cycles.
Interview Handler `pipeline/handlers/human.go`, `pipeline/handlers/interview_result.go`	Reordered `ConsoleInterviewer.AskInterview` to evaluate `IsYesNo` first; updated `BuildMarkdownSummary` to treat whitespace-only answers as empty when computing skipped-question indicators and answer counts.
Test Updates `cmd/tracker/embed_test.go`	Renamed builtin workflows count test from 3 to 4, reflecting addition of a new workflow.
LLM Adapter Logging `llm/anthropic/adapter.go`	Modified `TRACKER_DEBUG` logging to always emit a truncated (~200 char) body preview with `...(truncated)` suffix and include original byte length instead of conditionally logging full response.

Sequence Diagram(s)

sequenceDiagram
    participant Engine as Engine
    participant Checkpoint as Checkpoint
    participant Handler as HandleExitNode
    participant EventBus as Event System

    loop Until retry budget exhausted or retry succeeds
        Engine->>Engine: Evaluate goal-gate node outcome
        
        Note over Engine: Check if retries exhausted<br/>cp.RetryCount(goalGateNodeID) >= maxRetries
        
        alt Retries remain
            Engine->>Checkpoint: Check FallbackTaken[goalGateNodeID]
            Checkpoint-->>Engine: false (no prior fallback)
            Engine->>Engine: Select retry target<br/>(fallback_target/retry_target/...)
            Engine->>Handler: Route to retry target
            Handler->>Checkpoint: Increment RetryCount(goalGateNodeID)
            Handler->>EventBus: Emit EventStageRetrying<br/>(with attempt/limit from retry count)
            Handler->>Checkpoint: SaveCheckpoint()
            Handler-->>Engine: Continue with new target
        else Retries exhausted
            Engine->>Checkpoint: Check FallbackTaken[goalGateNodeID]
            Checkpoint-->>Engine: false (available)
            Engine->>Engine: Attempt one-shot fallback<br/>(node/graph fallback_target)
            alt Fallback configured & valid
                Engine->>Checkpoint: Set FallbackTaken[goalGateNodeID] = true
                Engine->>Handler: Route to fallback target
                Handler->>EventBus: Emit EventStageFailed<br/>(routing to fallback)
                Handler->>Checkpoint: SaveCheckpoint()
                Handler-->>Engine: Continue with fallback
            else No fallback or already taken
                Engine->>Handler: Terminal failure
                Handler->>EventBus: Emit EventStageFailed<br/>(unsatisfied)
                Handler-->>Engine: Return OutcomeFail
            end
        end
    end

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~22 minutes

Possibly related issues

Expert Panel Review: .dip Format Support — Consolidated Report #39: This PR directly implements the P0 critical safety fixes outlined in the issue, including parallel context loss fixes (via DiffFrom and parallel handler merging), goal-gate retry loop prevention (via FallbackTaken checkpoint tracking), and fallback routing with retry budget enforcement.

Possibly related PRs

feat: implement interview mode for human nodes (#46) #47: Both PRs modify the pipeline's interview handling surface (context keys, interview result formatting, and console interviewer logic) and are working toward consistent interview-related state management.

Poem

🐰 Contexts now diff and merge with care,
Retries respect their budget fair,
Fallbacks prevent loops in flight,
Checkpoints hold the truth so bright,
A safer pipeline, oh what a sight! ✨

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

Check name	Status	Explanation	Resolution
Docstring Coverage	⚠️ Warning	Docstring coverage is 17.88% which is insufficient. The required threshold is 80.00%.	Write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (2 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Title check	✅ Passed	The title accurately summarizes the three primary changes: retry budgets (goal-gate), context diffing (DiffFrom), and parallel capture (side-effect merging). It directly addresses the main objectives.

_{✏️ Tip: You can configure your own custom pre-merge checks in the settings.}

✨ Finishing Touches

📝 Generate docstrings

Create stacked PR
Commit on current branch

🧪 Generate unit tests (beta)

Create PR with unit tests
Commit unit tests in branch fix/p0-critical-safety-fixes

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: e7edb0c598

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

Open a pull request for review
Mark a draft as ready
Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

chatgpt-codex-connector · 2026-03-31T15:23:04Z

pipeline/engine_checkpoint.go

+					continue
+				}
+				if _, ok := e.graph.Nodes[fb]; ok {
+					return fb, nodeID, true, true


Stop re-entering fallback loop after goal-gate retries exhaust

When retries are exhausted, this branch still returns shouldRetry=true for a fallback target, so handleExitNode treats escalation as another retry (increments retry count and clears only the fallback subtree). That leaves the failed goal-gate node completed and unsatisfied, so reaching exit again selects the same fallback repeatedly. In graphs where fallback does not re-run/satisfy the gate (for example fallback goes straight to done), the pipeline can loop indefinitely instead of terminating. This path needs a one-time fallback guard (or a non-retry return state) so subsequent exits fail once escalation has already been attempted.

Useful? React with 👍 / 👎.

coderabbitai

Actionable comments posted: 3

🧹 Nitpick comments (7)

docs/superpowers/plans/2026-03-30-p0-critical-safety-fixes.md (2)
1393-1404: Add language identifier to the dependency graph code block.

The ASCII art dependency graph would benefit from a language identifier (e.g., text or plaintext) to satisfy linting and improve rendering consistency.
-```
+```text
 Task 1 (commit existing work)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@docs/superpowers/plans/2026-03-30-p0-critical-safety-fixes.md` around lines
1393 - 1404, The ASCII-art dependency graph code block lacks a language
identifier; update the opening triple-backtick that begins the graph (the block
starting with "Task 1 (commit existing work)") to include a plain language tag
such as ```text or ```plaintext so the block becomes a fenced code block with a
language identifier and satisfies linters and consistent rendering.
14-15: Minor markdown structure issue.

The document jumps from h1 (#) directly to h3 (###). Consider using h2 (##) for the task sections, or adding an h2 section header before the tasks. This is a minor documentation consistency issue flagged by markdownlint.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@docs/superpowers/plans/2026-03-30-p0-critical-safety-fixes.md` around lines
14 - 15, The markdown jumps from an h1 to h3 (e.g., the existing "### Task 1:
Commit existing parallel context fix (`#20`) and goal-gate retry fix (`#15`)");
change those task headings to h2 (replace "###" with "##") or add an
intermediate h2 section header before the task list so the structure goes H1 ->
H2 -> H3 as appropriate (update the "### Task 1..." and any subsequent "### Task
n..." headings to "##" or insert a descriptive h2 preceding them to satisfy
markdownlint).
pipeline/engine_goal_gate_test.go (3)
83-92: Same issue: check error from engine.Run.
 	engine := NewEngine(g, reg)
-	result, _ := engine.Run(context.Background())
+	result, err := engine.Run(context.Background())
+	if err != nil {
+		t.Fatalf("unexpected engine.Run error: %v", err)
+	}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pipeline/engine_goal_gate_test.go` around lines 83 - 92, The test currently
ignores the error return from engine.Run; change the call to capture both result
and err (e.g., result, err := engine.Run(...)) and assert err is nil before
inspecting result.Status and workAttempts, failing the test with t.Fatalf
including err if non-nil; reference NewEngine and Engine.Run to locate the call
to update.
41-50: Check error from engine.Run to avoid masking failures.

The test ignores the error returned by engine.Run. While the test focuses on retry termination, an unexpected error would be silently ignored, potentially masking bugs.
 	engine := NewEngine(g, reg)
-	result, _ := engine.Run(context.Background())
+	result, err := engine.Run(context.Background())
+	if err != nil {
+		t.Fatalf("unexpected engine.Run error: %v", err)
+	}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pipeline/engine_goal_gate_test.go` around lines 41 - 50, The test currently
ignores the error returned by engine.Run; call engine.Run(ctx) and check the
returned error (the error from engine.Run) before asserting on result.Status and
workAttempts, failing the test (t.Fatalf) if err != nil to avoid masking
failures; update the block around NewEngine, engine.Run, result and workAttempts
so any non-nil err aborts the test with a clear message referencing engine.Run.
131-142: Test appears incomplete — result and error are ignored.

The test assigns result and err but then explicitly ignores them with _ = err and _ = result. The comment on lines 134-136 indicates uncertainty about expected behavior ("it depends on whether clearDownstream clears work").

Consider completing the test by:

Determining expected final status (likely OutcomeFail since the goal-gate remains unsatisfied)

Adding assertions for result.Status

This aligns with the implementation plan Task 2, Step 6 which specifies updating this test to assert result.Status == OutcomeFail.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pipeline/engine_goal_gate_test.go` around lines 131 - 142, The test currently
ignores result and err after calling engine := NewEngine(g, reg) and result, err
:= engine.Run(...); update the test to assert the final pipeline outcome: check
err is nil (or as expected) and assert result.Status == OutcomeFail
(import/reference OutcomeFail) because the goal-gate remains unsatisfied; keep
the existing escalateVisited check but replace the lines "_ = err" and "_ =
result" with concrete assertions on err and result.Status to validate the
expected failure outcome from Run.
docs/superpowers/specs/2026-03-30-p0-critical-safety-fixes-design.md (2)
168-169: Document the risk of allow_unsafe_interpolation: all.

The blanket override allow_unsafe_interpolation: all could be misused to bypass all tainted interpolation protections. While this is acceptable given the stated threat model (line 24: "pipeline author is the trust boundary"), consider documenting in CLAUDE.md that:

Per-variable allowlisting is strongly preferred: allow_unsafe_interpolation: last_response

The all keyword should only be used when the pipeline author has manually audited the tool_command for injection safety

The DIP130 lint warning still fires to ensure conscious opt-in
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@docs/superpowers/specs/2026-03-30-p0-critical-safety-fixes-design.md` around
lines 168 - 169, Add documentation to CLAUDE.md describing the risks of using
the blanket allow_unsafe_interpolation: all override and recommended practices:
prefer per-variable allowlisting (e.g., allow_unsafe_interpolation:
last_response), only use the all keyword after manually auditing the
tool_command for injection safety, and note that the DIP130 lint warning will
still trigger to enforce conscious opt-in; update the section that references
allow_unsafe_interpolation and the threat model wording so readers see these
three points adjacent to the existing pipeline-author trust-boundary guidance.
143-177: Two expansion systems add complexity but fix is comprehensive.

The doc correctly identifies that having two independent variable expansion paths (ExpandGraphVariables in prepareExecNode and ExpandVariables in ToolHandler.Execute) creates a larger attack surface. Both must be gated to prevent bypasses.

The proposed solution addresses both systems:

ExpandVariables gets tainted-key checks for ${ctx.*} patterns

ExpandGraphVariables excludes tainted graph vars for tool_command

Line 174 correctly identifies the cross-contamination risk: an LLM node could write pctx.Set("graph.foo", malicious), so both layers need protection.

Consider adding a follow-up task to consolidate to a single expansion system with context-aware safety checks, reducing the dual-layer complexity long-term.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@docs/superpowers/specs/2026-03-30-p0-critical-safety-fixes-design.md` around
lines 143 - 177, The review points out unsafe variable expansion for
tool_command: add tainted-key checks and an allowlist to both expansion systems;
in ExpandVariables (used by ToolHandler.Execute) add an opts/wrapper (e.g.,
ExpandToolCommand) that rejects ${ctx.<tainted_key>} using the
taintedContextKeys map and returns the prescribed error message unless the
node's allow_unsafe_interpolation attribute (comma-separated or "all")
explicitly allows the key; in prepareExecNode/ExpandGraphVariables (when
building GraphVarMap) detect graph.<tainted_key> sources, build a
taintedGraphVars set and exclude those vars when expanding the "tool_command"
attribute (leave other attributes unchanged); reference functions/types:
ExpandVariables, ExpandToolCommand (new), ToolHandler.Execute, prepareExecNode,
ExpandGraphVariables, GraphVarMap, and the taintedContextKeys allowlist.

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@docs/superpowers/specs/2026-03-30-p0-critical-safety-fixes-design.md`:
- Around line 190-196: Add a language specifier to the fenced code block titled
"Safe pattern for LLM output in tool commands" so markdownlint MD040 is
satisfied: change the opening triple backticks to include the shell language
(e.g., "bash") for the block that starts with "### Safe pattern for LLM output
in tool commands" and ensure the block remains properly closed with triple
backticks; this affects the code block surrounding the lines that start with "Do
NOT interpolate ${ctx.last_response}..." and the numbered steps.
- Around line 87-95: Add a FallbackTaken map[string]bool field (with
`json:"fallbackTaken"` tag) to the Checkpoint struct in pipeline/checkpoint.go,
ensure it's initialized during checkpoint creation/restore (so code that
constructs or unmarshals Checkpoint sets cp.FallbackTaken = map[string]bool{}
when nil), and update any checkpoint constructors/restore paths to preserve this
map across saves so the escalation-loop fix survives restarts; then implement
the TestGoalGateEscalationSurvivesCheckpointRestore test to create a Checkpoint,
mark a gate in cp.FallbackTaken, serialize/deserialize (or round-trip via the
existing checkpoint save/restore helpers) and assert the map value is preserved.

In `@pipeline/engine_checkpoint.go`:
- Around line 162-181: Add a FallbackTaken guard to the Checkpoint state: extend
the Checkpoint struct with a FallbackTaken map[string]bool; in the
fallback-return logic inside the retry-exhaustion branch (the loop that iterates
node.Attrs["fallback_target"], node.Attrs["fallback_retry_target"], etc.) check
cp.FallbackTaken[nodeID] before returning a fallback (skip that
fallback/escalation if true) and when you decide to return a fallback mark
cp.FallbackTaken[nodeID] = true so the same fallback won’t be re-taken
repeatedly; keep the existing behavior for returning the escalation/unsatisfied
path unchanged.

---

Nitpick comments:
In `@docs/superpowers/plans/2026-03-30-p0-critical-safety-fixes.md`:
- Around line 1393-1404: The ASCII-art dependency graph code block lacks a
language identifier; update the opening triple-backtick that begins the graph
(the block starting with "Task 1 (commit existing work)") to include a plain
language tag such as ```text or ```plaintext so the block becomes a fenced code
block with a language identifier and satisfies linters and consistent rendering.
- Around line 14-15: The markdown jumps from an h1 to h3 (e.g., the existing
"### Task 1: Commit existing parallel context fix (`#20`) and goal-gate retry fix
(`#15`)"); change those task headings to h2 (replace "###" with "##") or add an
intermediate h2 section header before the task list so the structure goes H1 ->
H2 -> H3 as appropriate (update the "### Task 1..." and any subsequent "### Task
n..." headings to "##" or insert a descriptive h2 preceding them to satisfy
markdownlint).

In `@docs/superpowers/specs/2026-03-30-p0-critical-safety-fixes-design.md`:
- Around line 168-169: Add documentation to CLAUDE.md describing the risks of
using the blanket allow_unsafe_interpolation: all override and recommended
practices: prefer per-variable allowlisting (e.g., allow_unsafe_interpolation:
last_response), only use the all keyword after manually auditing the
tool_command for injection safety, and note that the DIP130 lint warning will
still trigger to enforce conscious opt-in; update the section that references
allow_unsafe_interpolation and the threat model wording so readers see these
three points adjacent to the existing pipeline-author trust-boundary guidance.
- Around line 143-177: The review points out unsafe variable expansion for
tool_command: add tainted-key checks and an allowlist to both expansion systems;
in ExpandVariables (used by ToolHandler.Execute) add an opts/wrapper (e.g.,
ExpandToolCommand) that rejects ${ctx.<tainted_key>} using the
taintedContextKeys map and returns the prescribed error message unless the
node's allow_unsafe_interpolation attribute (comma-separated or "all")
explicitly allows the key; in prepareExecNode/ExpandGraphVariables (when
building GraphVarMap) detect graph.<tainted_key> sources, build a
taintedGraphVars set and exclude those vars when expanding the "tool_command"
attribute (leave other attributes unchanged); reference functions/types:
ExpandVariables, ExpandToolCommand (new), ToolHandler.Execute, prepareExecNode,
ExpandGraphVariables, GraphVarMap, and the taintedContextKeys allowlist.

In `@pipeline/engine_goal_gate_test.go`:
- Around line 83-92: The test currently ignores the error return from
engine.Run; change the call to capture both result and err (e.g., result, err :=
engine.Run(...)) and assert err is nil before inspecting result.Status and
workAttempts, failing the test with t.Fatalf including err if non-nil; reference
NewEngine and Engine.Run to locate the call to update.
- Around line 41-50: The test currently ignores the error returned by
engine.Run; call engine.Run(ctx) and check the returned error (the error from
engine.Run) before asserting on result.Status and workAttempts, failing the test
(t.Fatalf) if err != nil to avoid masking failures; update the block around
NewEngine, engine.Run, result and workAttempts so any non-nil err aborts the
test with a clear message referencing engine.Run.
- Around line 131-142: The test currently ignores result and err after calling
engine := NewEngine(g, reg) and result, err := engine.Run(...); update the test
to assert the final pipeline outcome: check err is nil (or as expected) and
assert result.Status == OutcomeFail (import/reference OutcomeFail) because the
goal-gate remains unsatisfied; keep the existing escalateVisited check but
replace the lines "_ = err" and "_ = result" with concrete assertions on err and
result.Status to validate the expected failure outcome from Run.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 4402dbcb-fcf7-4d6d-b95e-934e377dd785

📥 Commits

Reviewing files that changed from the base of the PR and between 0e62260 and e7edb0c.

📒 Files selected for processing (9)

docs/superpowers/plans/2026-03-30-p0-critical-safety-fixes.md
docs/superpowers/specs/2026-03-30-p0-critical-safety-fixes-design.md
pipeline/context.go
pipeline/context_test.go
pipeline/engine_checkpoint.go
pipeline/engine_goal_gate_test.go
pipeline/engine_run.go
pipeline/handlers/parallel.go
pipeline/handlers/parallel_test.go

coderabbitai · 2026-03-31T15:27:19Z

docs/superpowers/specs/2026-03-30-p0-critical-safety-fixes-design.md

+### Bug to Fix: Escalation Loop
+
+**Problem:** When retries exhaust and the fallback path is taken, `goalGateRetryTarget` returns `retry=true`. This causes `handleExitNode` to call `IncrementRetry` and `clearDownstream` for the escalation target. If the escalation node also fails (or doesn't satisfy the goal gate), the pipeline bounces between exit and escalation forever — the retry counter goes past max but the `>=` check routes to fallback every time.
+
+**Fix:** Track whether we've already taken the fallback path for a given gate. Add a `FallbackTaken map[string]bool` field to `Checkpoint`. When `goalGateRetryTarget` finds retries exhausted:
+- If `cp.FallbackTaken[gateNodeID]` is true → return `(_, _, false, true)` (no retry, unsatisfied → pipeline fails)
+- If false → set `cp.FallbackTaken[gateNodeID] = true`, return the fallback target
+
+This guarantees at most one escalation attempt per gate. Serialized in checkpoint JSON for crash safety.


⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash # Description: Verify FallbackTaken field exists in Checkpoint and is tested # Check if FallbackTaken field is defined in Checkpoint struct echo "=== Checking Checkpoint struct for FallbackTaken field ===" ast-grep --pattern 'type Checkpoint struct { $$$ }' # Check for checkpoint serialization tests echo -e "\n=== Checking for checkpoint round-trip tests ===" rg -n "FallbackTaken" --type=go # Verify test mentioned in line 122 exists echo -e "\n=== Verifying TestGoalGateEscalationSurvivesCheckpointRestore ===" rg -n "TestGoalGateEscalationSurvivesCheckpointRestore" --type=go

Repository: 2389-research/tracker

Length of output: 1584

Add FallbackTaken field to Checkpoint struct — escalation loop fix will not survive restarts without it.

The design document claims at line 95 that FallbackTaken is "serialized in checkpoint JSON for crash safety," but verification reveals the field is missing from the Checkpoint struct in pipeline/checkpoint.go. Without persisting this map across checkpoints, the escalation loop prevention resets on restart and infinite bouncing can resume.

Required actions:

Add FallbackTaken map[string]bool field to the Checkpoint struct with JSON serialization tag

Implement the test TestGoalGateEscalationSurvivesCheckpointRestore (referenced at design line 122) to verify the map persists through checkpoint round-trips

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@docs/superpowers/specs/2026-03-30-p0-critical-safety-fixes-design.md` around lines 87 - 95, Add a FallbackTaken map[string]bool field (with `json:"fallbackTaken"` tag) to the Checkpoint struct in pipeline/checkpoint.go, ensure it's initialized during checkpoint creation/restore (so code that constructs or unmarshals Checkpoint sets cp.FallbackTaken = map[string]bool{} when nil), and update any checkpoint constructors/restore paths to preserve this map across saves so the escalation-loop fix survives restarts; then implement the TestGoalGateEscalationSurvivesCheckpointRestore test to create a Checkpoint, mark a gate in cp.FallbackTaken, serialize/deserialize (or round-trip via the existing checkpoint save/restore helpers) and assert the map value is preserved.

docs/superpowers/specs/2026-03-30-p0-critical-safety-fixes-design.md

pipeline/engine_checkpoint.go

Copilot

Pull request overview

This PR targets P0 safety issues in the pipeline engine by preventing silent loss of parallel branch context writes and adding a retry budget + event visibility for goal-gate retry loops.

Changes:

Add PipelineContext.DiffFrom() and use it in the parallel handler to capture side-effect pctx.Set() writes, merging them with explicit Outcome.ContextUpdates.
Introduce goal-gate retry budgeting / fallback routing in goalGateRetryTarget, and emit EventStageRetrying / EventStageFailed events for goal-gate retry behavior.
Add new/updated tests covering context diffing, parallel side-effect capture, and goal-gate retry budget behavior; add design/spec docs.

Reviewed changes

Copilot reviewed 9 out of 9 changed files in this pull request and generated 4 comments.

Show a summary per file

File	Description
pipeline/handlers/parallel.go	Captures side-effect context writes via `DiffFrom(snapshot)` and merges with explicit `ContextUpdates`.
pipeline/handlers/parallel_test.go	Adds tests ensuring side-effect-only writes are captured and explicit updates override side effects.
pipeline/context.go	Adds `PipelineContext.DiffFrom()` for snapshot diffing of context values.
pipeline/context_test.go	Adds unit tests validating `DiffFrom()` behavior for new/changed/unchanged keys and empty cases.
pipeline/engine_checkpoint.go	Updates goal-gate retry target selection to enforce retry budgets and attempt fallback routing.
pipeline/engine_run.go	Wires new goal-gate retry tuple into exit handling; increments retry count and emits retry/failure events.
pipeline/engine_goal_gate_test.go	Introduces tests for goal-gate retry termination, node-level max retries, and fallback behavior.
docs/superpowers/specs/2026-03-30-p0-critical-safety-fixes-design.md	Adds design spec describing the issues and intended fixes.
docs/superpowers/plans/2026-03-30-p0-critical-safety-fixes.md	Adds an implementation plan for remaining safety tasks and tests.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Copilot · 2026-03-31T15:30:05Z

pipeline/engine_checkpoint.go

+					return fb, nodeID, true, true
+				}
+			}
+			// No fallback — signal unsatisfied without retry.


In the retries-exhausted path, returning the fallback target with shouldRetry=true causes the engine to treat fallback routing as another retry. Because the goal gate remains unsatisfied, this can create an infinite loop (exit → fallback → exit → fallback …). Consider tracking whether fallback has already been taken (e.g., a FallbackTaken flag in the checkpoint) or returning a distinct "fallback" action so the engine only escalates once and then fails if still unsatisfied.

Suggested change

return fb, nodeID, true, true

}

}

// No fallback — signal unsatisfied without retry.

// Use checkpoint context to ensure we only take fallback once per node,

// and do not treat fallback routing as another generic retry.

if cp.Context == nil {

cp.Context = map[string]string{}

}

fbKey := "fallback_taken:" + nodeID

if cp.Context[fbKey] == "true" {

// Fallback already used; avoid infinite exit→fallback loops.

break

}

cp.Context[fbKey] = "true"

// Fallback is a one-time escalation, not a retry.

return fb, nodeID, false, true

}

}

// No fallback — or fallback already used — signal unsatisfied without retry.

Copilot · 2026-03-31T15:30:06Z

pipeline/engine_run.go

+		s.cp.IncrementRetry(gateNodeID)
+		gateNode := e.nodeOrDefault(gateNodeID)
+		e.emit(PipelineEvent{
+			Type:      EventStageRetrying,
+			Timestamp: time.Now(),
+			RunID:     s.runID,
+			NodeID:    gateNodeID,
+			Message: fmt.Sprintf("goal-gate retry for %q → %q (attempt %d/%d)",
+				gateNodeID, target,
+				s.cp.RetryCount(gateNodeID), e.maxRetries(gateNode)),
+		})


handleExitNode increments retry counts and emits EventStageRetrying whenever goalGateRetryTarget returns retry=true. If retry=true is also used for fallback/escalation routing, this will over-count retries and mislabel fallback as a retry attempt (and contributes to exit↔fallback looping). Suggest returning a separate signal for fallback routing (or only increment/emit when within the retry budget).

Suggested change

s.cp.IncrementRetry(gateNodeID)

gateNode := e.nodeOrDefault(gateNodeID)

e.emit(PipelineEvent{

Type: EventStageRetrying,

Timestamp: time.Now(),

RunID: s.runID,

NodeID: gateNodeID,

Message: fmt.Sprintf("goal-gate retry for %q → %q (attempt %d/%d)",

gateNodeID, target,

s.cp.RetryCount(gateNodeID), e.maxRetries(gateNode)),

})

gateNode := e.nodeOrDefault(gateNodeID)

currentRetries := s.cp.RetryCount(gateNodeID)

maxRetries := e.maxRetries(gateNode)

// Only count and emit a retry event while within the retry budget.

if currentRetries < maxRetries {

s.cp.IncrementRetry(gateNodeID)

e.emit(PipelineEvent{

Type: EventStageRetrying,

Timestamp: time.Now(),

RunID: s.runID,

NodeID: gateNodeID,

Message: fmt.Sprintf("goal-gate retry for %q → %q (attempt %d/%d)",

gateNodeID, target,

s.cp.RetryCount(gateNodeID), maxRetries),

})

}

Copilot · 2026-03-31T15:30:06Z

pipeline/engine_goal_gate_test.go

+	// The escalate node routes to done, so pipeline should succeed eventually.
+	// But done is the exit and work is still unsatisfied — so it depends on
+	// whether clearDownstream clears work. Let's check the result.
+	_ = err
+	_ = result


This test currently discards err/result (via _ = err / _ = result) and doesn't assert the expected engine status. That makes it easy for an infinite goal-gate fallback loop to slip through unnoticed, and it won't validate that fallback routing terminates correctly. Please assert err/result.Status and consider running the engine with a context timeout in this test to prevent hangs.

Copilot · 2026-03-31T15:30:07Z

docs/superpowers/plans/2026-03-30-p0-critical-safety-fixes.md

+
+**Architecture:** Four independent fixes plus one cross-cutting circuit breaker. Each fix is self-contained — commit after each task. The tool_command fix gates both variable expansion systems (`ExpandVariables` and `ExpandGraphVariables`) with a hardcoded tainted-key set rather than runtime taint tracking.
+
+**Tech Stack:** Go 1.24, standard library only. No new dependencies.


The implementation plan states "Go 1.24", but the repository's go.mod currently specifies go 1.25.5. Please update the plan to match the actual toolchain version (or vice versa) so readers don't run the wrong Go version.

Suggested change

**Tech Stack:** Go 1.24, standard library only. No new dependencies.

**Tech Stack:** Go 1.25.5, standard library only. No new dependencies.

…w mode Implements ParseQuestions(markdown string) []Question with support for numbered items (. and ) style), bullet questions, and imperative verb prompts. Extracts inline options from trailing parentheticals and detects yes/no questions by option values or interrogative text patterns. Fenced code blocks are skipped. 20 tests covering all specified cases pass. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Implements InterviewAnswer and InterviewResult structs with round-trip JSON serialization and a BuildMarkdownSummary helper for downstream agents. All 9 tests pass covering round-trip, invalid input, skipped answers, canceled flag, and markdown format. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Add QuestionsKey, AnswersKey, and Prompt extraction to extractHumanAttrs so interview-mode human nodes can carry their context key config through the IR-to-Graph adapter layer. Also upgrades dippin-lang to v0.15.0-beta.1 which defines these fields on ir.HumanConfig. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

…rview mode Adds the InterviewInterviewer interface, routes mode=interview nodes to executeInterview(), and implements the full handler: question parsing, last_response fallback, zero-questions freeform degradation, retry pre-fill, custom context keys, and JSON+markdown answer storage. Includes 7 TDD tests. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Implements the InterviewInterviewer interface on AutoApproveFreeformInterviewer so --auto-approve mode and tests can drive interview-mode human gates. Picks first option for select questions, "yes" for yes/no, and "auto-approved" for open-ended. Adds compile-time interface assertion and unit test coverage. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

… forms Implements the TUI component shown when an interview-mode human gate fires. Supports radio select, yes/no toggle, and textarea fields with pagination, pre-fill from previous results, and JSON serialization via replyCh. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Add MsgGateInterview message type to tui/messages.go, implement AskInterview on BubbleteaInterviewer (Mode 1 inline program + Mode 2 send-func delegation), and handle MsgGateInterview in app.go handleModalMsg to show the InterviewContent modal. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

…implementations Implements InterviewInterviewer on AutopilotInterviewer (native LLM), ClaudeCodeAutopilotInterviewer (claude CLI), and AutopilotTUIInterviewer (TUI wrapper). Adds buildInterviewPrompt and parseInterviewResponse helpers with markdown fence stripping, JSON extraction, and a one-retry-on-parse-fail policy (provider errors hard-fail per CLAUDE.md). Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Implements InterviewInterviewer on ConsoleInterviewer so interview-mode human gates work in non-TUI (piped/testing) contexts. Adds compile-time assertion and three unit tests covering name match, numeric index, and blank-to-skip behavior. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Exercises the full pipeline: context → ParseQuestions → AskInterview → context updates, including zero-questions freeform fallback and retry pre-fill pass-through. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Change "N questions answered" to "N of M questions answered" so skipped questions don't inflate the count. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

12 fixes from four-angle code review: Must-fix: 1. Set Incomplete=true in collectAnswers() when any answer is empty 2. Show "Tab → Add details (optional)" hint on focused select/confirm fields 3. ConsoleInterviewer pre-fill uses ID-based lookup instead of position index 4. Esc in fields navigates back instead of hard-canceling (cancel only at Q1) 5. Cap parsed questions at 100 (maxQuestions) to prevent unbounded allocation 6. Strip markdown emphasis (*bold*, _italic_) from question text labels Should-fix: 7. InterviewContent.Cancel() now closes channel (consistent with all other modals) 8. interviewRunner.Update discards cmd on quit (consistent with other runners) 9. Replace flaky time.Sleep test with blocking channel read 10. Promote default context keys to constants (ContextKeyInterviewQuestions/Answers) 11. Document ClaudeCodeAutopilot silent fallback asymmetry vs native hard-fail 12. Document autopilot prompt injection risk in buildInterviewPrompt Also: replace custom contains/searchString helpers with strings.Contains. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

…eview workflow Interview mode UX: - One question at a time with progress bar, answered summary, selection feedback - Enter confirms and advances for select and yes/no fields - Text wraps to terminal width (questions, context, options, summaries) - Untouched select fields report empty answer (not first option) - "Other" variants auto-filtered from options (UI always provides its own) - Cancel returns OutcomeFail so pipelines can route on cancellation Structured output: - ParseStructuredQuestions validates JSON questions from agent output - extractJSON uses bracket-depth counting (not greedy first/last brace) - response_format plumbing: .dip → IR → adapter → codergen → session → LLM API - All three providers supported (Anthropic, OpenAI, Gemini) - dippin-lang v0.16.0 integration (ResponseFormat, ResponseSchema, Params) Reliability: - Empty API response detection with capped retry (2 attempts) and diagnostics - Anthropic adapter logs raw response on 0-token replies - Codergen empty-response guard fails node instead of silent success - AskFreeform now hard-fails on provider errors (was silently swallowing) - ClaudeCode autopilot retry-then-hardfail on parse failure - Goroutine leak fix in flashDecision with done channel - Context leak fix (explicit cancel() in retry loop) - Mode 1 tea.Cmd propagation in all three runners New content: - deep_review built-in workflow (3 interview gates, parallel analysis) - interview-loop.dip reusable subgraph - dippin-lang structured output request doc (resolved) Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

CI caught formatting issues that the pre-commit hook missed because the hook was not installed (symlink to .git/hooks/pre-commit was missing). Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

- parseInterviewResponse fills missing answers with empty entries and sets Incomplete when LLM omits questions (Codex P1) - Console "Other" number (len+1) now prompts for freeform text instead of storing the literal number string (Codex P2) - Document unused prev param in ClaudeCodeAutopilotInterviewer.AskInterview - Detect real terminal size in Mode 1 interview via golang.org/x/term - Add language identifier to markdown code fence in plan doc Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

…ug logging - Pre-fill sets selected=true so retry answers survive collectAnswers - fieldAnswered checks otherInput is non-empty for Other selections - Cancel sends partial answers (Canceled=true + current state) before closing - parseInterviewResponse builds results in question order, validates options via matchChoice, normalizes yes/no, sets Incomplete for missing answers - Raw response body logging gated behind TRACKER_DEBUG env var Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Config validation: - ResponseFormat must be json_object or json_schema - json_schema requires non-empty valid JSON ResponseSchema Session reliability: - Exhausted empty response retries now hard-fail instead of stoppedNaturally Workflow fixes (deep_review.dip): - SaveGoal tool node preserves original developer goal before interviews - Analysis agents read goal from .ai/review/goal.txt - Empty priority questions allowed when no actionable findings - Revise loop includes ReviewPlan feedback via ${ctx.human_response} Interview UX: - IsYesNo takes precedence over Options in all code paths - Answered summary capped to viewport height - Blank retry input preserves previous answer (console) - Freeform fallback sets answersKey in ContextUpdates - Autopilot includes previous answers in retry prompt Other: - HTML-escape Mermaid arrows in workflows.html - interview-loop.dip prompts include ${ctx.interview_answers} - Added ResponseFormat validation and blank-preserve tests Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

…re-commit CI failed because examples/ had stale copies of the 3 existing workflows and deep_review.dip was missing from examples/ entirely. - Copied embedded workflows to examples/ (source of truth for CI) - Added deep_review.dip to Makefile sync-workflows and check-workflows targets - Added workflow sync check to .pre-commit hook (gate #8) - Added deep_review.dip to dippin lint list in .pre-commit Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

…kflow Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Add a fallback-taken guard in goalGateRetryTarget using checkpoint context keys to ensure fallback/escalation routing is one-shot. Separate fallback routing from retry logic in handleExitNode so retries aren't over-counted and events are correctly labeled. Add timeout and assertions to TestGoalGateRetryFallsBackToFallbackTarget. Fix Go version in plan doc (1.24→1.25) and add language specifier to fenced code block in design spec. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Copilot

Pull request overview

Copilot reviewed 57 out of 58 changed files in this pull request and generated 3 comments.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Copilot · 2026-04-03T16:07:00Z

pipeline/engine_checkpoint.go

+		// Check retry budget before allowing another loop.
+		maxR := e.maxRetries(node)
+		if cp.RetryCount(nodeID) >= maxR {
+			// Retries exhausted — look for a fallback/escalation target.
+			// Guard: only take the fallback once per gate to prevent infinite loops.
+			fallbackKey := "_fallback_taken:" + nodeID
+			if cp.Context != nil && cp.Context[fallbackKey] == "true" {
+				// Fallback already taken — signal unsatisfied without retry.
+				return "", nodeID, false, true
+			}
+			for _, fb := range []string{
+				node.Attrs["fallback_target"],
+				node.Attrs["fallback_retry_target"],
+				e.graph.Attrs["fallback_target"],
+				e.graph.Attrs["fallback_retry_target"],
+			} {
+				if fb == "" {
+					continue
+				}
+				if _, ok := e.graph.Nodes[fb]; ok {
+					// Mark fallback as taken so it won't loop.
+					if cp.Context == nil {
+						cp.Context = map[string]string{}
+					}
+					cp.Context[fallbackKey] = "true"
+					return fb, nodeID, false, true


The fallback-taken guard is stored in cp.Context (fallbackKey), but saveCheckpoint() overwrites cp.Context with pctx.Snapshot() on every save. That means the marker will be lost the next time a checkpoint is saved (including immediately after routing), so the engine can still loop between exit and fallback forever. It also pollutes the user-visible context on resume. Consider persisting this state in a dedicated Checkpoint field (e.g., FallbackTaken map[string]bool) or another checkpoint-owned structure that isn’t replaced by snapshotting the pipeline context.

Copilot · 2026-04-03T16:07:00Z

pipeline/handlers/human.go

+		fmt.Fprintf(c.Writer, "\nQ%d: %s\n", q.Index, q.Text)
+
+		if len(q.Options) > 0 {
+			// Print numbered options
+			for j, opt := range q.Options {
+				fmt.Fprintf(c.Writer, "  %d) %s\n", j+1, opt)
+			}
+			fmt.Fprintf(c.Writer, "  %d) Other\n", len(q.Options)+1)
+
+			// Pre-fill hint
+			if prevAns.Answer != "" {
+				fmt.Fprintf(c.Writer, "Previous: %s\n", prevAns.Answer)
+			}
+			fmt.Fprintf(c.Writer, "Enter choice (name or number, blank to skip): ")
+
+			line, err := c.readLine()
+			if err != nil {
+				canceled = true
+				answers[i] = ans
+				// Fill remaining questions with empty answers.
+				for j := i + 1; j < len(questions); j++ {
+					answers[j] = InterviewAnswer{
+						ID:   fmt.Sprintf("q%d", questions[j].Index),
+						Text: questions[j].Text,
+					}
+				}
+				break
+			}
+			input := strings.TrimSpace(line)
+			if input != "" {
+				// Match by name (case-insensitive) or number
+				matched := false
+				for _, opt := range q.Options {
+					if strings.EqualFold(input, opt) {
+						ans.Answer = opt
+						matched = true
+						break
+					}
+				}
+				if !matched {
+					var idx int
+					if _, err := fmt.Sscanf(input, "%d", &idx); err == nil && idx >= 1 && idx <= len(q.Options) {
+						ans.Answer = q.Options[idx-1]
+					} else if _, err := fmt.Sscanf(input, "%d", &idx); err == nil && idx == len(q.Options)+1 {
+						// User selected "Other" by number — prompt for freeform text.
+						fmt.Fprintf(c.Writer, "Enter your answer: ")
+						otherLine, otherErr := c.readLine()
+						if otherErr == nil {
+							ans.Answer = strings.TrimSpace(otherLine)
+						}
+					} else {
+						// Treat as "Other" freeform
+						ans.Answer = input
+					}
+				}
+			} else if prevAns.Answer != "" {
+				// Blank input preserves the previous answer on retry.
+				ans.Answer = prevAns.Answer
+			}
+		} else if q.IsYesNo {
+			if prevAns.Answer != "" {


ConsoleInterviewer.AskInterview checks len(q.Options) before q.IsYesNo, so a yes/no question that has explicit options (e.g. parsed from “(yes, no)”) will be presented as an option list and will also add an “Other” choice. This is inconsistent with the TUI behavior (which prioritizes IsYesNo) and can change semantics. Consider handling q.IsYesNo before options (or special-casing yes/no options) to keep the UX consistent.

Copilot · 2026-04-03T16:07:00Z

cmd/tracker/embed_test.go

 func TestListBuiltinWorkflowsReturnsThree(t *testing.T) {
 	workflows := listBuiltinWorkflows()
-	if len(workflows) != 3 {
-		t.Errorf("listBuiltinWorkflows returned %d workflows, want 3", len(workflows))
+	if len(workflows) != 4 {
+		t.Errorf("listBuiltinWorkflows returned %d workflows, want 4", len(workflows))
 	}


Test name no longer matches behavior: this test now expects 4 workflows but is still named TestListBuiltinWorkflowsReturnsThree. Renaming it will avoid confusion when diagnosing failures.

…priority, rename test - Move fallback-taken state from cp.Context (overwritten by pctx.Snapshot) to dedicated Checkpoint.FallbackTaken map that survives save/restore. - Reorder ConsoleInterviewer.AskInterview to check IsYesNo before len(Options), matching TUI behavior for yes/no questions with explicit option lists. - Rename TestListBuiltinWorkflowsReturnsThree to ReturnsFour to match the current assertion. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

coderabbitai

Actionable comments posted: 14

🧹 Nitpick comments (5)

.pre-commit (1)

94-98: Consider preserving check-workflows output on failure

Line 94 suppresses mismatch details, so developers only get a generic error. Letting output through improves fix speed.

🔧 Minimal tweak

-if make check-workflows > /dev/null 2>&1; then
+if make check-workflows; then
   pass "embedded workflows in sync"
 else
   fail "examples/*.dip out of sync with cmd/tracker/workflows/ — run 'make sync-workflows'"
 fi

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In @.pre-commit around lines 94 - 98, The current pre-commit hook hides the
output of the make check-workflows target by redirecting stdout/stderr to
/dev/null, so on failure developers only see a generic message; update the
conditional that runs check-workflows (the make check-workflows invocation in
the if/else block) to not discard its output so that mismatch details are
printed on failure—i.e., remove the ">/dev/null 2>&1" redirection or change the
logic to capture output only on success and echo the captured output when the
command fails.

docs/requests/dippin-lang-structured-output.md (1)

136-141: Add language specifier to fenced code block.

The fenced code block should have a language identifier for consistent formatting and linting compliance. Since these are validation rule identifiers, text or plaintext would be appropriate.

📝 Proposed fix

-```
+```text
 DIP_NEW_001: response_format must be "json_object" or "json_schema"
 DIP_NEW_002: response_schema requires response_format: json_schema
 DIP_NEW_003: response_schema must be valid JSON
 DIP_LINT_NEW: warning if json_schema without response_schema
+```

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@docs/requests/dippin-lang-structured-output.md` around lines 136 - 141, The
fenced code block containing the validation rules (DIP_NEW_001, DIP_NEW_002,
DIP_NEW_003, DIP_LINT_NEW) lacks a language specifier; update the opening
triple-backtick to include a language such as "text" or "plaintext" (e.g.,
change ``` to ```text) so the block is consistently formatted and
linting-compliant.

pipeline/handlers/human_test.go (1)

659-747: Good console interviewer test coverage.

Tests verify:

Multi-question flow with different question types

Selection by numeric index

Blank input skipping behavior

Previous answer preservation on blank input

Consider adding a test for "Other" selection by number (entering the len(options)+1 index) to complete coverage of that code path.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pipeline/handlers/human_test.go` around lines 659 - 747, Add a test that
covers selecting "Other" by numeric index for ConsoleInterviewer.AskInterview:
create a new test (e.g., TestConsoleInterviewer_AskInterview_OtherByNumber) that
supplies input selecting the options length+1 (e.g., "3\nCustom value\n"),
constructs a ConsoleInterviewer with that Reader/Writer, calls AskInterview with
a Question that has two options, and asserts the returned InterviewResult
contains the freeform "Custom value" as the answer; this ensures the code path
handling the numeric "Other" choice (len(options)+1) is exercised and validated.

pipeline/handlers/human.go (1)

285-299: Duplicate fmt.Sscanf parsing can be consolidated.

The code parses the input twice with fmt.Sscanf (lines 287 and 289). This works but is inefficient and slightly confusing.

♻️ Suggested consolidation

 				if !matched {
 					var idx int
-					if _, err := fmt.Sscanf(input, "%d", &idx); err == nil && idx >= 1 && idx <= len(q.Options) {
-						ans.Answer = q.Options[idx-1]
-					} else if _, err := fmt.Sscanf(input, "%d", &idx); err == nil && idx == len(q.Options)+1 {
-						// User selected "Other" by number — prompt for freeform text.
-						fmt.Fprintf(c.Writer, "Enter your answer: ")
-						otherLine, otherErr := c.readLine()
-						if otherErr == nil {
-							ans.Answer = strings.TrimSpace(otherLine)
+					if _, err := fmt.Sscanf(input, "%d", &idx); err == nil {
+						if idx >= 1 && idx <= len(q.Options) {
+							ans.Answer = q.Options[idx-1]
+						} else if idx == len(q.Options)+1 {
+							// User selected "Other" by number — prompt for freeform text.
+							fmt.Fprintf(c.Writer, "Enter your answer: ")
+							otherLine, otherErr := c.readLine()
+							if otherErr == nil {
+								ans.Answer = strings.TrimSpace(otherLine)
+							}
+						} else {
+							// Invalid index — treat as freeform "Other"
+							ans.Answer = input
 						}
 					} else {
 						// Treat as "Other" freeform
 						ans.Answer = input
 					}
 				}

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@pipeline/handlers/human.go` around lines 285 - 299, Consolidate the duplicate
fmt.Sscanf calls by parsing input into idx exactly once and reusing the result:
in the branch where !matched, call fmt.Sscanf(input, "%d", &idx) a single time,
check err==nil and then handle three cases (1) idx between 1 and len(q.Options)
-> set ans.Answer = q.Options[idx-1], (2) idx == len(q.Options)+1 -> prompt via
fmt.Fprintf(c.Writer, "Enter your answer: "), call otherLine, otherErr :=
c.readLine() and set ans.Answer = strings.TrimSpace(otherLine) on no error, and
(3) otherwise -> fall back to treating input as freeform ans.Answer = input;
keep variable names matched, input, q.Options, ans.Answer and c.readLine as in
the diff.

tui/interview_content.go (1)

635-636: Potential mismatch between question Index and total count display.

The label uses f.question.Index (which comes from parsed question data) alongside len(ic.fields) for the total. If question indices aren't sequential starting from 1, or if some questions were filtered, this could display confusing progress like "Q5 of 3". Consider using ic.cursor + 1 instead of f.question.Index for consistent "N of M" display.
💡 Suggested fix for consistent progress display
-		qLabel := fmt.Sprintf("Q%d of %d: %s", f.question.Index, len(ic.fields), f.question.Text)
+		qLabel := fmt.Sprintf("Q%d of %d: %s", ic.cursor+1, len(ic.fields), f.question.Text)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tui/interview_content.go` around lines 635 - 636, The Q label currently uses
f.question.Index with len(ic.fields), which can mismatch if indices aren’t
sequential or some questions are filtered; change the label construction to use
the current cursor position (ic.cursor) for the displayed ordinal (use ic.cursor
+ 1) so the progress reads "Q{current} of {len(ic.fields)}" consistently—update
the fmt.Sprintf invocation that builds qLabel (and any related uses of
f.question.Index for progress display) to reference ic.cursor + 1 instead of
f.question.Index.

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@agent/config.go`:
- Around line 90-102: The current validation only checks ResponseSchema when
ResponseFormat == "json_schema", allowing a non-empty ResponseSchema to be
accepted when ResponseFormat is empty or "json_object"; add a guard to reject
orphaned ResponseSchema by returning an error if c.ResponseSchema != "" and
c.ResponseFormat != "json_schema". Update the validation block (around the
existing checks for c.ResponseFormat, c.ResponseSchema and json.Valid) to first
check that if c.ResponseSchema is non-empty then c.ResponseFormat must equal
"json_schema", otherwise return a clear error message referencing both
ResponseSchema and ResponseFormat.

In `@cmd/tracker/embed_test.go`:
- Around line 41-45: Rename the test function
TestListBuiltinWorkflowsReturnsThree to TestListBuiltinWorkflowsReturnsFour to
match the assertion expecting 4 workflows; update the function identifier in the
test file so the test name and its expectation (len(workflows) != 4) are
consistent and the test runner reflects the correct behavior.

In `@cmd/tracker/workflows/deep_review.dip`:
- Around line 365-368: Add an inline comment near the `ReviewPlan ->
WriteRemediationPlan` edge (the `revise` edge with `restart: true`) explaining
that the restart will be limited by the workflow's `max_restarts: 20` default to
prevent infinite revision loops; mention that `ReviewPlan` is `mode: freeform`
with exhaustive labeled edges (approve, revise, reject) so no extra failure
handling is required.
- Around line 25-30: The SaveGoal tool's command concatenates the expanded
${ctx.human_response} and runs it via sh -c without escaping, enabling shell
injection; update the expansion/execution path (where ExpandVariables() is used
for tool commands) to shell-escape or quote expanded values before invoking sh
-c (or avoid sh -c and pass arguments directly), and/or apply input validation
on human_response at the source; specifically, modify the code handling tool
command execution (the code that calls ExpandVariables() and then executes the
command via sh -c) to use a safe escaping routine or to build argv arrays so
metacharacters in human_response cannot be interpreted, mirroring the existing
working_dir validation approach in pipeline/handlers/tool.go.

In `@docs/superpowers/plans/2026-03-30-interview-mode.md`:
- Around line 478-483: The test uses a runtime type assertion on a concrete
*AutoApproveFreeformInterviewer (ii, ok := ai.(InterviewInterviewer)), which is
invalid in Go because assertions only work on interface values; replace that
assertion with a compile-time type assignment so the compiler will check that
AutoApproveFreeformInterviewer implements InterviewInterviewer (assign ai to a
variable of type InterviewInterviewer), then call ii.AskInterview(...) once
AskInterview is implemented.

In `@docs/superpowers/plans/2026-03-30-p0-critical-safety-fixes.md`:
- Line 15: The heading "### Task 1: Commit existing parallel context fix (`#20`)
and goal-gate retry fix (`#15`)" jumps from the document-level `#` to `###`
(MD001); change this to a `##` heading or insert an intermediate `##` section
above it so headings progress properly. Update the "Task 1: Commit existing
parallel context fix (`#20`) and goal-gate retry fix (`#15`)" line to use `##` (or
add a new `##` container heading before the `###`) to resolve the MD001 rule.
- Around line 1393-1404: The fenced dependency-graph block that starts with ```
(the tree listing of Task 1 → Task 10) needs an explicit language to satisfy
markdownlint MD040; update the opening fence from ``` to ```text so the block is
fenced as a text code block (leave the block contents unchanged and keep the
closing ```).

In `@examples/subgraphs/interview-loop.dip`:
- Around line 37-38: The current Interviewer output of {"questions": []} still
routes to the Answer node and triggers freeform input; update the graph so that
when Interviewer emits an empty question list it does not continue to Answer:
either add a conditional branch that checks for an empty "questions" array and
routes to a completion/exit node (short-circuiting the normal Answer path), or
change Interviewer to emit an explicit signal (e.g., {"complete": true} plus no
"questions") and alter downstream edges (the transitions from Interviewer to
Answer) to inspect that flag and bypass Answer when present; update all
equivalent places where Interviewer returns empty questions (the other two
branches mentioned) to use the same branching or explicit-complete signal.

In `@llm/anthropic/adapter.go`:
- Around line 118-120: The code currently logs the full raw provider response
via log.Printf when TRACKER_DEBUG is set, which can leak sensitive user/tool
data; update the logging in the anthropic response handling (where respBody is
used and log.Printf is called) to avoid printing cleartext: compute a stable
fingerprint (e.g., SHA-256) of respBody and log the hex prefix plus the byte
length, and if you must include content for diagnostics log only a small
redacted/truncated snippet (e.g., first 100–200 bytes) with sensitive characters
masked or replaced, ensuring the full respBody is never emitted; keep the
TRACKER_DEBUG conditional but replace the direct string interpolation of
respBody with these hashed/truncated diagnostics.

In `@pipeline/engine_checkpoint.go`:
- Around line 195-210: Update the function comment that currently says
fallback_target is only used when retries are exhausted to state that
fallback_target (both node-level and graph-level: node.Attrs["fallback_target"]
and e.graph.Attrs["fallback_target"]) is now included in the retry candidate
list and therefore can act as a retry destination when retries remain as well as
an escalation destination when retries are exhausted; mention that this behavior
is exercised by TestGoalGateFallbackTargetAttributeRecognized and that pipelines
using both retry_target and fallback_target together remain unaffected.

In `@pipeline/engine_goal_gate_test.go`:
- Around line 153-198: The test uses "fallback_target" but should use
"retry_target" on the "work" node to represent the retry hop; update the Attrs
map in TestGoalGateFallbackTargetAttributeRecognized (node ID "work") to set
"retry_target":"repair" instead of "fallback_target", and tighten the assertion
on workAttempts (the workAttempts variable) to require the expected retry count
(e.g., assert workAttempts == 2) so the test fails if retries didn't occur
exactly as intended.
- Around line 146-150: The test currently allows a green exit by checking
result.Status against OutcomeFail OR OutcomeSuccess; change it to require only
OutcomeFail so an engine that escalates but incorrectly exits success fails the
test. Locate the assertion using result.Status and OutcomeSuccess/OutcomeFail in
the failing test (around the terminal-check after escalation), replace the
boolean OR condition with a single check for OutcomeFail, and update the
t.Fatalf message to state "want fail" (or similar) so the intent is clear.

In `@pipeline/handlers/interview_result.go`:
- Around line 74-90: The code treats whitespace-only answers as non-empty;
update both places that check q.Answer to trim whitespace before testing: use
strings.TrimSpace(q.Answer) when deciding to render "(skipped)" in the first
loop (the block that writes Answer/Elaboration via b.WriteString) and likewise
use strings.TrimSpace(q.Answer) when counting answered in the loop that computes
answered/len(r.Questions); ensure strings is imported if not already.

In `@tui/interviewer.go`:
- Around line 227-233: In askMode2Interview, when receiving from the ReplyCh
created for MsgGateInterview, treat a closed channel (ok==false) as a TUI
failure instead of a user cancel: replace the current return of
&handlers.InterviewResult{Canceled: true} with returning a non-nil error that
indicates the TUI closed before replying (for example errors.New or fmt.Errorf),
so callers can distinguish a crashed/closed TUI from an intentional cancel;
update callers of askMode2Interview to handle this error path if necessary.

---

Nitpick comments:
In @.pre-commit:
- Around line 94-98: The current pre-commit hook hides the output of the make
check-workflows target by redirecting stdout/stderr to /dev/null, so on failure
developers only see a generic message; update the conditional that runs
check-workflows (the make check-workflows invocation in the if/else block) to
not discard its output so that mismatch details are printed on failure—i.e.,
remove the ">/dev/null 2>&1" redirection or change the logic to capture output
only on success and echo the captured output when the command fails.

In `@docs/requests/dippin-lang-structured-output.md`:
- Around line 136-141: The fenced code block containing the validation rules
(DIP_NEW_001, DIP_NEW_002, DIP_NEW_003, DIP_LINT_NEW) lacks a language
specifier; update the opening triple-backtick to include a language such as
"text" or "plaintext" (e.g., change ``` to ```text) so the block is consistently
formatted and linting-compliant.

In `@pipeline/handlers/human_test.go`:
- Around line 659-747: Add a test that covers selecting "Other" by numeric index
for ConsoleInterviewer.AskInterview: create a new test (e.g.,
TestConsoleInterviewer_AskInterview_OtherByNumber) that supplies input selecting
the options length+1 (e.g., "3\nCustom value\n"), constructs a
ConsoleInterviewer with that Reader/Writer, calls AskInterview with a Question
that has two options, and asserts the returned InterviewResult contains the
freeform "Custom value" as the answer; this ensures the code path handling the
numeric "Other" choice (len(options)+1) is exercised and validated.

In `@pipeline/handlers/human.go`:
- Around line 285-299: Consolidate the duplicate fmt.Sscanf calls by parsing
input into idx exactly once and reusing the result: in the branch where
!matched, call fmt.Sscanf(input, "%d", &idx) a single time, check err==nil and
then handle three cases (1) idx between 1 and len(q.Options) -> set ans.Answer =
q.Options[idx-1], (2) idx == len(q.Options)+1 -> prompt via
fmt.Fprintf(c.Writer, "Enter your answer: "), call otherLine, otherErr :=
c.readLine() and set ans.Answer = strings.TrimSpace(otherLine) on no error, and
(3) otherwise -> fall back to treating input as freeform ans.Answer = input;
keep variable names matched, input, q.Options, ans.Answer and c.readLine as in
the diff.

In `@tui/interview_content.go`:
- Around line 635-636: The Q label currently uses f.question.Index with
len(ic.fields), which can mismatch if indices aren’t sequential or some
questions are filtered; change the label construction to use the current cursor
position (ic.cursor) for the displayed ordinal (use ic.cursor + 1) so the
progress reads "Q{current} of {len(ic.fields)}" consistently—update the
fmt.Sprintf invocation that builds qLabel (and any related uses of
f.question.Index for progress display) to reference ic.cursor + 1 instead of
f.question.Index.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 4090bf60-dd6c-47fa-b9b4-3a9912fcc729

📥 Commits

Reviewing files that changed from the base of the PR and between e7edb0c and b78a4a1.

⛔ Files ignored due to path filters (1)

go.sum is excluded by !**/*.sum

📒 Files selected for processing (55)

.pre-commit
CHANGELOG.md
CLAUDE.md
Makefile
README.md
agent/config.go
agent/config_test.go
agent/session.go
agent/session_run.go
cmd/tracker/embed_test.go
cmd/tracker/workflows/deep_review.dip
docs/requests/dippin-lang-structured-output.md
docs/site/changelog.html
docs/site/workflows.html
docs/superpowers/plans/2026-03-30-interview-mode.md
docs/superpowers/plans/2026-03-30-p0-critical-safety-fixes.md
docs/superpowers/specs/2026-03-30-p0-critical-safety-fixes-design.md
examples/ask_and_execute.dip
examples/build_product.dip
examples/build_product_with_superspec.dip
examples/deep_review.dip
examples/subgraphs/interview-loop.dip
go.mod
llm/anthropic/adapter.go
pipeline/context.go
pipeline/dippin_adapter.go
pipeline/dippin_adapter_test.go
pipeline/engine_checkpoint.go
pipeline/engine_goal_gate_test.go
pipeline/engine_run.go
pipeline/events.go
pipeline/events_test.go
pipeline/handlers/autopilot.go
pipeline/handlers/autopilot_claudecode.go
pipeline/handlers/autopilot_claudecode_test.go
pipeline/handlers/autopilot_test.go
pipeline/handlers/codergen.go
pipeline/handlers/human.go
pipeline/handlers/human_test.go
pipeline/handlers/interview_integration_test.go
pipeline/handlers/interview_parse.go
pipeline/handlers/interview_parse_test.go
pipeline/handlers/interview_result.go
pipeline/handlers/interview_result_test.go
pipeline/handlers/parallel.go
pipeline/handlers/tool_test.go
tui/app.go
tui/autopilot_interviewer.go
tui/autopilot_interviewer_test.go
tui/interview_content.go
tui/interview_content_test.go
tui/interviewer.go
tui/interviewer_test.go
tui/messages.go
tui/modal.go

✅ Files skipped from review due to trivial changes (8)

go.mod
examples/ask_and_execute.dip
pipeline/events_test.go
pipeline/events.go
examples/build_product_with_superspec.dip
examples/build_product.dip
agent/config_test.go
CLAUDE.md

🚧 Files skipped from review as they are similar to previous changes (2)

pipeline/context.go
pipeline/handlers/parallel.go

coderabbitai · 2026-04-03T16:14:17Z

agent/config.go

+	if c.ResponseFormat != "" {
+		if c.ResponseFormat != "json_object" && c.ResponseFormat != "json_schema" {
+			return fmt.Errorf("ResponseFormat must be \"json_object\" or \"json_schema\", got %q", c.ResponseFormat)
+		}
+		if c.ResponseFormat == "json_schema" {
+			if c.ResponseSchema == "" {
+				return fmt.Errorf("ResponseSchema must be non-empty when ResponseFormat is \"json_schema\"")
+			}
+			if !json.Valid([]byte(c.ResponseSchema)) {
+				return fmt.Errorf("ResponseSchema must be valid JSON, got %q", c.ResponseSchema)
+			}
+		}
+	}


⚠️ Potential issue | 🟡 Minor

Reject orphaned ResponseSchema to prevent silent misconfiguration

Line 90 onward validates schema only for json_schema, but a non-empty ResponseSchema is currently accepted when ResponseFormat is unset or json_object. That silently drops intent.

🔧 Suggested validation guard

+ if c.ResponseSchema != "" && c.ResponseFormat != "json_schema" { + return fmt.Errorf("ResponseSchema requires ResponseFormat to be %q", "json_schema") + } if c.ResponseFormat != "" { if c.ResponseFormat != "json_object" && c.ResponseFormat != "json_schema" { return fmt.Errorf("ResponseFormat must be \"json_object\" or \"json_schema\", got %q", c.ResponseFormat) }

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

if c.ResponseFormat != "" {

if c.ResponseFormat != "json_object" && c.ResponseFormat != "json_schema" {

return fmt.Errorf("ResponseFormat must be \"json_object\" or \"json_schema\", got %q", c.ResponseFormat)

}

if c.ResponseFormat == "json_schema" {

if c.ResponseSchema == "" {

return fmt.Errorf("ResponseSchema must be non-empty when ResponseFormat is \"json_schema\"")

}

if !json.Valid([]byte(c.ResponseSchema)) {

return fmt.Errorf("ResponseSchema must be valid JSON, got %q", c.ResponseSchema)

}

}

}

if c.ResponseSchema != "" && c.ResponseFormat != "json_schema" {

return fmt.Errorf("ResponseSchema requires ResponseFormat to be %q", "json_schema")

}

if c.ResponseFormat != "" {

if c.ResponseFormat != "json_object" && c.ResponseFormat != "json_schema" {

return fmt.Errorf("ResponseFormat must be \"json_object\" or \"json_schema\", got %q", c.ResponseFormat)

}

if c.ResponseFormat == "json_schema" {

if c.ResponseSchema == "" {

return fmt.Errorf("ResponseSchema must be non-empty when ResponseFormat is \"json_schema\"")

}

if !json.Valid([]byte(c.ResponseSchema)) {

return fmt.Errorf("ResponseSchema must be valid JSON, got %q", c.ResponseSchema)

}

}

}

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@agent/config.go` around lines 90 - 102, The current validation only checks ResponseSchema when ResponseFormat == "json_schema", allowing a non-empty ResponseSchema to be accepted when ResponseFormat is empty or "json_object"; add a guard to reject orphaned ResponseSchema by returning an error if c.ResponseSchema != "" and c.ResponseFormat != "json_schema". Update the validation block (around the existing checks for c.ResponseFormat, c.ResponseSchema and json.Valid) to first check that if c.ResponseSchema is non-empty then c.ResponseFormat must equal "json_schema", otherwise return a clear error message referencing both ResponseSchema and ResponseFormat.

cmd/tracker/embed_test.go

coderabbitai · 2026-04-03T16:14:17Z

cmd/tracker/workflows/deep_review.dip

+  tool SaveGoal
+    label: "Snapshot the developer's goal"
+    command:
+      mkdir -p .ai/review
+      printf '%s\n' "${ctx.human_response}" > .ai/review/goal.txt
+      echo "Goal saved to .ai/review/goal.txt"


⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash # Check how tool command interpolation handles escaping rg -n -B 2 -A 10 'ExpandVariables|interpolat' pipeline/*.go | head -60

Repository: 2389-research/tracker

Length of output: 3057

🏁 Script executed:

# Get the full ExpandVariables implementation cat -n pipeline/expand.go | head -100

Repository: 2389-research/tracker

Length of output: 3574

🏁 Script executed:

# Check how tool commands are executed and if escaping is applied rg -n -B 5 -A 15 'command:' cmd/tracker/workflows/deep_review.dip

Repository: 2389-research/tracker

Length of output: 879

🏁 Script executed:

# Search for shell command execution logic in the pipeline rg -n -B 3 -A 10 'exec|sh|bash|cmd' pipeline/*.go | grep -i 'command\|exec\|shell' | head -40

Repository: 2389-research/tracker

Length of output: 4340

🏁 Script executed:

# Look for any shell-specific escaping or quote handling functions rg -n 'ShellQuote|EscapeShell|shlex|Quote' pipeline/*.go

Repository: 2389-research/tracker

Length of output: 621

🏁 Script executed:

# Search for tool handler implementation rg -n -B 5 -A 20 'type.*Tool.*Handler|func.*Tool.*Execute' pipeline/*.go

Repository: 2389-research/tracker

Length of output: 47

🏁 Script executed:

# Search for where shell commands are executed rg -n -B 3 -A 10 'exec.Command|os.Sh|sh\s+-c' pipeline/*.go

Repository: 2389-research/tracker

Length of output: 47

🏁 Script executed:

# Look for any recent changes or protections added to tool commands git log --oneline --all | grep -i 'tool\|escap\|protect' | head -20

Repository: 2389-research/tracker

Length of output: 1573

🏁 Script executed:

# Show the changes in the key commits git show 830dafc --stat

Repository: 2389-research/tracker

Length of output: 1409

🏁 Script executed:

git show 1f1c98f --stat

Repository: 2389-research/tracker

Length of output: 795

🏁 Script executed:

# Look for tool handler implementation find . -name '*.go' -type f | xargs grep -l 'tool' | grep handler

Repository: 2389-research/tracker

Length of output: 482

🏁 Script executed:

# Search for tool execution more broadly rg -n 'type.*Handler' pipeline/*.go | grep -i tool

Repository: 2389-research/tracker

Length of output: 47

🏁 Script executed:

# Read the tool handler implementation cat -n pipeline/handlers/tool.go

Repository: 2389-research/tracker

Length of output: 5237

Shell command executes expanded variables without escaping — shell injection risk.

The SaveGoal tool expands ${ctx.human_response} via ExpandVariables() and passes the result directly to sh -c. The variable expansion layer applies no shell escaping (see pipeline/expand.go), so shell metacharacters in human_response (e.g., $(rm -rf /), backticks, quotes) will be interpreted by the shell. While the working_dir attribute has metacharacter validation (pipeline/handlers/tool.go lines 64-72), the expanded tool command itself has no such protection. Ensure all expanded variables in tool commands are properly shell-escaped before execution, or constrain input values at the source.

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@cmd/tracker/workflows/deep_review.dip` around lines 25 - 30, The SaveGoal tool's command concatenates the expanded ${ctx.human_response} and runs it via sh -c without escaping, enabling shell injection; update the expansion/execution path (where ExpandVariables() is used for tool commands) to shell-escape or quote expanded values before invoking sh -c (or avoid sh -c and pass arguments directly), and/or apply input validation on human_response at the source; specifically, modify the code handling tool command execution (the code that calls ExpandVariables() and then executes the command via sh -c) to use a safe escaping routine or to build argv arrays so metacharacters in human_response cannot be interpreted, mirroring the existing working_dir validation approach in pipeline/handlers/tool.go.

coderabbitai · 2026-04-03T16:14:17Z

cmd/tracker/workflows/deep_review.dip

+    WriteRemediationPlan -> ReviewPlan
+    ReviewPlan -> ApplyFormat                label: "approve"
+    ReviewPlan -> WriteRemediationPlan       label: "revise"  restart: true
+    ReviewPlan -> Done                       label: "reject"


⚠️ Potential issue | 🟡 Minor

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash # Check if there's default failure handling in the pipeline engine for nodes without fail edges rg -n -A 5 'OutcomeFail|outcome.*fail' pipeline/engine*.go | head -50

Repository: 2389-research/tracker

Length of output: 2838

🏁 Script executed:

cat -n cmd/tracker/workflows/deep_review.dip | head -400 | tail -100

Repository: 2389-research/tracker

Length of output: 4359

🏁 Script executed:

# Check freeform mode handling and OutcomeFail scenarios rg -n "mode.*freeform|freeform.*outcome" pipeline/ cmd/tracker/

Repository: 2389-research/tracker

Length of output: 1540

🏁 Script executed:

# Look for max_restarts defaults rg -n "max_restarts|defaults" cmd/tracker/workflows/deep_review.dip

Repository: 2389-research/tracker

Length of output: 100

🏁 Script executed:

sed -n '380,440p' pipeline/handlers/human.go

Repository: 2389-research/tracker

Length of output: 2341

🏁 Script executed:

sed -n '440,480p' pipeline/handlers/human.go

Repository: 2389-research/tracker

Length of output: 1376

Add comment documenting the restart limit for the revision loop.

The revise edge loops back to WriteRemediationPlan with restart: true. The max_restarts: 20 default provides a safety limit to prevent infinite revision loops — consider adding a comment to document this behavior explicitly.

Note: ReviewPlan is a mode: freeform node with labeled edges (approve, revise, reject) that exhaustively route all responses, so no additional failure handling is needed.

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@cmd/tracker/workflows/deep_review.dip` around lines 365 - 368, Add an inline comment near the `ReviewPlan -> WriteRemediationPlan` edge (the `revise` edge with `restart: true`) explaining that the restart will be limited by the workflow's `max_restarts: 20` default to prevent infinite revision loops; mention that `ReviewPlan` is `mode: freeform` with exhaustive labeled edges (approve, revise, reject) so no extra failure handling is required.

coderabbitai · 2026-04-03T16:14:17Z

docs/superpowers/plans/2026-03-30-interview-mode.md

+func TestAutoApproveInterviewInterviewer(t *testing.T) {
+    ai := &AutoApproveFreeformInterviewer{}
+    // Type-assert to InterviewInterviewer should work
+    ii, ok := ai.(InterviewInterviewer) // Will fail until implemented
+    result, err := ii.AskInterview(questions, nil)
+    // All answers empty, incomplete=false, canceled=false


⚠️ Potential issue | 🟡 Minor

🧩 Analysis chain

🏁 Script executed:

cat -n docs/superpowers/plans/2026-03-30-interview-mode.md | sed -n '475,490p'

Repository: 2389-research/tracker

Length of output: 652

🌐 Web query:

Go type assertion concrete type vs interface type rules

💡 Result:

In Go, type assertions x.(T) on an interface value x have different rules depending on whether the target type T is a concrete (non-interface) type or an interface type. For concrete T (not an interface): - Asserts that the dynamic type of x is identical to T. - T must implement x's static interface type (compile-time check); otherwise invalid. - If true, returns the stored value with static type T; else panics (unless comma-ok form used). For interface T: - Asserts that the dynamic type of x implements the interface T. - Succeeds if x's dynamic type satisfies T's type set. - Returns the value wrapped in interface T. Examples: Concrete: var x any = 7; i := x.(int) // succeeds, i is int Interface: type Fooer interface{Foo}; type Thing int; func(t Thing)Foo{}; var y any = Thing(3); f := y.(Fooer) // succeeds Always prefer comma-ok form v, ok := x.(T) for safety to avoid panics. Sources: Go spec (6,13,25), Boldly Go analyses (12,20), GeeksforGeeks/Rednafi (2,3).

Citations:

1: https://go.dev/ref/spec

2: https://boldlygo.tech/archive/2023-12-11-type-assertion-values

3: https://go.dev/doc/go1.17_spec

4: https://boldlygo.tech/archive/2023-12-06-type-assertions-of-non-interface-types/

5: https://www.geeksforgeeks.org/go-type-assertion-vs-type-switches/

6: https://rednafi.com/go/type-assertion-vs-type-switches/

7: https://yourbasic.org/golang/type-assertion-switch

This interface-check example is invalid Go.

Line 481 uses a type assertion on a concrete *AutoApproveFreeformInterviewer type, but Go type assertions only work on interface values. This code will never compile, even after AskInterview is implemented. Replace with an assignment to perform the compile-time interface check idiomatically:

var ii InterviewInterviewer = ai // compile-time check once implemented

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@docs/superpowers/plans/2026-03-30-interview-mode.md` around lines 478 - 483, The test uses a runtime type assertion on a concrete *AutoApproveFreeformInterviewer (ii, ok := ai.(InterviewInterviewer)), which is invalid in Go because assertions only work on interface values; replace that assertion with a compile-time type assignment so the compiler will check that AutoApproveFreeformInterviewer implements InterviewInterviewer (assign ai to a variable of type InterviewInterviewer), then call ii.AskInterview(...) once AskInterview is implemented.

pipeline/engine_checkpoint.go

pipeline/engine_goal_gate_test.go

coderabbitai · 2026-04-03T16:14:18Z

pipeline/engine_goal_gate_test.go

+func TestGoalGateFallbackTargetAttributeRecognized(t *testing.T) {
+	// Verify that "fallback_target" (not just "fallback_retry_target") is recognized
+	// as a retry target when retries remain.
+	g := NewGraph("goal-gate-attr")
+
+	g.AddNode(&Node{ID: "start", Shape: "Mdiamond"})
+	g.AddNode(&Node{ID: "work", Shape: "box", Attrs: map[string]string{
+		"goal_gate":       "true",
+		"fallback_target": "repair",
+	}})
+	g.AddNode(&Node{ID: "repair", Shape: "box"})
+	g.AddNode(&Node{ID: "done", Shape: "Msquare"})
+
+	g.AddEdge(&Edge{From: "start", To: "work"})
+	g.AddEdge(&Edge{From: "work", To: "done", Condition: "ctx.outcome = success"})
+	g.AddEdge(&Edge{From: "work", To: "done", Condition: "ctx.outcome = fail"})
+	g.AddEdge(&Edge{From: "repair", To: "work"})
+
+	reg := newTestRegistry()
+	workAttempts := 0
+	reg.Register(&testHandler{
+		name: "codergen",
+		executeFn: func(ctx context.Context, node *Node, pctx *PipelineContext) (Outcome, error) {
+			if node.ID == "work" {
+				workAttempts++
+				if workAttempts == 1 {
+					return Outcome{Status: OutcomeFail}, nil
+				}
+				return Outcome{Status: OutcomeSuccess}, nil
+			}
+			return Outcome{Status: OutcomeSuccess}, nil
+		},
+	})
+
+	engine := NewEngine(g, reg)
+	result, err := engine.Run(context.Background())
+	if err != nil {
+		t.Fatalf("engine.Run failed: %v", err)
+	}
+	if result.Status != OutcomeSuccess {
+		t.Fatalf("status = %q, want %q", result.Status, OutcomeSuccess)
+	}
+	if workAttempts < 2 {
+		t.Fatalf("work ran %d times, expected at least 2", workAttempts)
+	}
+}


⚠️ Potential issue | 🟠 Major

Use retry_target for the repair hop in this test.

This test currently treats fallback_target as the normal retry path, but the PR contract describes fallback as the route taken after retries are exhausted. The workAttempts < 2 assertion on Line 195 is also too weak; extra loops still pass.

💡 Suggested fix

-func TestGoalGateFallbackTargetAttributeRecognized(t *testing.T) { - // Verify that "fallback_target" (not just "fallback_retry_target") is recognized - // as a retry target when retries remain. +func TestGoalGateRetryTargetAttributeRecognized(t *testing.T) { + // Verify that "retry_target" is used while retries remain. g := NewGraph("goal-gate-attr") @@ g.AddNode(&Node{ID: "work", Shape: "box", Attrs: map[string]string{ - "goal_gate": "true", - "fallback_target": "repair", + "goal_gate": "true", + "retry_target": "repair", }}) @@ - if workAttempts < 2 { - t.Fatalf("work ran %d times, expected at least 2", workAttempts) + if workAttempts != 2 { + t.Fatalf("work ran %d times, want 2", workAttempts) } }

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@pipeline/engine_goal_gate_test.go` around lines 153 - 198, The test uses "fallback_target" but should use "retry_target" on the "work" node to represent the retry hop; update the Attrs map in TestGoalGateFallbackTargetAttributeRecognized (node ID "work") to set "retry_target":"repair" instead of "fallback_target", and tighten the assertion on workAttempts (the workAttempts variable) to require the expected retry count (e.g., assert workAttempts == 2) so the test fails if retries didn't occur exactly as intended.

pipeline/handlers/interview_result.go

coderabbitai · 2026-04-03T16:14:18Z

tui/interviewer.go

+func (b *BubbleteaInterviewer) askMode2Interview(questions []handlers.Question, prev *handlers.InterviewResult) (*handlers.InterviewResult, error) {
+	ch := make(chan string, 1)
+	b.send(MsgGateInterview{Questions: questions, Previous: prev, ReplyCh: ch})
+	reply, ok := <-ch
+	if !ok {
+		return &handlers.InterviewResult{Canceled: true}, nil
+	}


⚠️ Potential issue | 🟠 Major

Treat a closed interview reply channel as a TUI failure, not a cancel.

The existing choice/freeform gates return an error when the TUI program closes before replying. Returning Canceled: true here makes a crashed TUI or broken modal wiring indistinguishable from an intentional user cancel.

💡 Suggested fix

func (b *BubbleteaInterviewer) askMode2Interview(questions []handlers.Question, prev *handlers.InterviewResult) (*handlers.InterviewResult, error) { ch := make(chan string, 1) b.send(MsgGateInterview{Questions: questions, Previous: prev, ReplyCh: ch}) reply, ok := <-ch if !ok { - return &handlers.InterviewResult{Canceled: true}, nil + return nil, fmt.Errorf("TUI program closed before responding to interview gate") } result, err := handlers.DeserializeInterviewResult(reply)

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

func (b *BubbleteaInterviewer) askMode2Interview(questions []handlers.Question, prev *handlers.InterviewResult) (*handlers.InterviewResult, error) {

ch := make(chan string, 1)

b.send(MsgGateInterview{Questions: questions, Previous: prev, ReplyCh: ch})

reply, ok := <-ch

if !ok {

return &handlers.InterviewResult{Canceled: true}, nil

}

func (b *BubbleteaInterviewer) askMode2Interview(questions []handlers.Question, prev *handlers.InterviewResult) (*handlers.InterviewResult, error) {

ch := make(chan string, 1)

b.send(MsgGateInterview{Questions: questions, Previous: prev, ReplyCh: ch})

reply, ok := <-ch

if !ok {

return nil, fmt.Errorf("TUI program closed before responding to interview gate")

}

result, err := handlers.DeserializeInterviewResult(reply)

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@tui/interviewer.go` around lines 227 - 233, In askMode2Interview, when receiving from the ReplyCh created for MsgGateInterview, treat a closed channel (ok==false) as a TUI failure instead of a user cancel: replace the current return of &handlers.InterviewResult{Canceled: true} with returning a non-nil error that indicates the TUI closed before replying (for example errors.New or fmt.Errorf), so callers can distinguish a crashed/closed TUI from an intentional cancel; update callers of askMode2Interview to handle this error path if necessary.

- Resolve merge conflict in human.go (keep IsYesNo-first ordering) - Move FallbackTaken guard to dedicated Checkpoint field (persists through saveCheckpoint which overwrites cp.Context with pctx.Snapshot) - Update goalGateRetryTarget doc to describe fallback_target dual role - Tighten fallback test assertion to require OutcomeFail only - Truncate debug response logging to 200 bytes (avoid leaking payloads) - Treat whitespace-only interview answers as skipped - Fix markdown heading level jump (MD001) and code fence language (MD040) Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

clintecker and others added 2 commits March 30, 2026 16:16

chatgpt-codex-connector bot reviewed Mar 31, 2026

View reviewed changes

clintecker requested a review from Copilot March 31, 2026 15:23

Copilot started reviewing on behalf of clintecker March 31, 2026 15:23 View session

coderabbitai bot requested changes Mar 31, 2026

View reviewed changes

Copilot AI reviewed Mar 31, 2026

View reviewed changes

clintecker and others added 20 commits April 3, 2026 10:57

fix: show accurate answered-count in interview markdown summary

c503664

Change "N questions answered" to "N of M questions answered" so skipped questions don't inflate the count. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

style: gofmt formatting on 5 files

ce05aa9

CI caught formatting issues that the pre-commit hook missed because the hook was not installed (symlink to .git/hooks/pre-commit was missing). Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

release: v0.14.0 — interview mode, structured output, deep_review wor…

763f03b

…kflow Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

clintecker requested a review from Copilot April 3, 2026 16:00

Copilot started reviewing on behalf of clintecker April 3, 2026 16:00 View session

Copilot AI reviewed Apr 3, 2026

View reviewed changes

coderabbitai bot requested changes Apr 3, 2026

View reviewed changes

clintecker merged commit 24255f1 into main Apr 3, 2026
1 of 2 checks passed

This was referenced Apr 3, 2026

fix(pipeline): goal-gate retry has no termination bound #15

Closed

fix(pipeline): parallel branch context updates lost on fan-in #20

Closed

This was referenced Apr 4, 2026

feat: add ACP (Agent Client Protocol) as third backend #51

Open

security: tool_command sandboxing — 4 defense-in-depth layers #56

Merged

-					return fb, nodeID, true, true
-				}
-			}
-			// No fallback — signal unsatisfied without retry.
+					// Use checkpoint context to ensure we only take fallback once per node,
+					// and do not treat fallback routing as another generic retry.
+					if cp.Context == nil {
+						cp.Context = map[string]string{}
+					}
+					fbKey := "fallback_taken:" + nodeID
+					if cp.Context[fbKey] == "true" {
+						// Fallback already used; avoid infinite exit→fallback loops.
+						break
+					}
+					cp.Context[fbKey] = "true"
+					// Fallback is a one-time escalation, not a retry.
+					return fb, nodeID, false, true
+				}
+			}
+			// No fallback — or fallback already used — signal unsatisfied without retry.


		Architecture: Four independent fixes plus one cross-cutting circuit breaker. Each fix is self-contained — commit after each task. The tool_command fix gates both variable expansion systems (`ExpandVariables` and `ExpandGraphVariables`) with a hardcoded tainted-key set rather than runtime taint tracking.

		Tech Stack: Go 1.24, standard library only. No new dependencies.

	Tech Stack: Go 1.24, standard library only. No new dependencies.
	Tech Stack: Go 1.25.5, standard library only. No new dependencies.

Conversation

clintecker commented Mar 31, 2026 • edited by coderabbitai bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Test plan

Summary by CodeRabbit

Uh oh!

coderabbitai bot commented Mar 31, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Review failed

Walkthrough

Changes

Sequence Diagram(s)

Estimated code review effort

Possibly related issues

Possibly related PRs

Poem

❌ Failed checks (1 warning)

Uh oh!

chatgpt-codex-connector bot left a comment

Choose a reason for hiding this comment

💡 Codex Review

Uh oh!

chatgpt-codex-connector bot Mar 31, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Uh oh!

coderabbitai bot Mar 31, 2026

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Reviewed changes

Uh oh!

Copilot AI Mar 31, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Mar 31, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Mar 31, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Mar 31, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Uh oh!

Copilot AI Apr 3, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Apr 3, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Apr 3, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Uh oh!

coderabbitai bot Apr 3, 2026

Choose a reason for hiding this comment

Uh oh!

Uh oh!

coderabbitai bot Apr 3, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai bot Apr 3, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai bot Apr 3, 2026

Choose a reason for hiding this comment

Uh oh!

clintecker commented Mar 31, 2026 •

edited by coderabbitai bot

Loading

coderabbitai bot commented Mar 31, 2026 •

edited

Loading