Skip to content

feat: Claude Code execution backend for pipeline nodes#41

Merged
clintecker merged 26 commits intomainfrom
feat/claude-code-backend
Mar 27, 2026
Merged

feat: Claude Code execution backend for pipeline nodes#41
clintecker merged 26 commits intomainfrom
feat/claude-code-backend

Conversation

@harperreed
Copy link
Copy Markdown
Member

@harperreed harperreed commented Mar 27, 2026

Summary

Adds a pluggable AgentBackend abstraction to tracker's pipeline engine, with a Claude Code backend that delegates agent nodes to the claude CLI. No external SDK dependency — the backend is ~400 lines of internal Go that spawns claude as a subprocess and parses NDJSON.

Two activation paths:

  • Per-node: backend: "claude-code" in .dip files (requires upstream dippin-lang IR change)
  • Global override: tracker --backend claude-code pipeline.dip — all agent nodes run via Claude Code

Why: Pipeline nodes get access to MCP servers, Claude Code skills/hooks, CLAUDE.md awareness, and the full Claude Code tool suite — capabilities unreachable by tracker's 6 built-in tools.

Architecture

One handler (codergen), multiple backends. The handler does prompt resolution, config building, artifact writing, outcome parsing. The backend just runs the agent session.

CodergenHandler.Execute()
  ├─ ResolvePrompt() (shared)
  ├─ selectBackend() → NativeBackend or ClaudeCodeBackend
  ├─ backend.Run(ctx, config, emit)
  └─ Build Outcome from SessionResult

Key design decisions

  • AgentBackend interface emits agent.Event directly and returns agent.SessionResult — no new types, no translation layer
  • Extra any field on AgentRunConfig carries backend-specific config (*ClaudeCodeConfig or *agent.SessionConfig)
  • Zero external deps — internal NDJSON parser, subprocess management with process group isolation
  • Concurrency-safe — per-run state is local, struct is stateless
  • Config validation — malformed MCP servers, invalid permission modes, conflicting tool lists all fail fast

New files

File Purpose
pipeline/backend.go AgentBackend interface, config types, MCP parsing, tool list validation
pipeline/handlers/prompt.go Shared ResolvePrompt (extracted from codergen)
pipeline/handlers/backend_native.go Wraps existing agent.Session
pipeline/handlers/backend_claudecode.go Spawns claude CLI, parses NDJSON, emits events

Subprocess management

  • Process group isolation (Setpgid)
  • SIGTERM then SIGKILL (5s grace period for MCP server cleanup)
  • Minimal subprocess env (PATH, HOME, TERM, CLAUDE_API_KEY only)
  • Stderr-based error classification with priority ordering
  • context.WithTimeout from node config

Review notes

  • The spec went through 3 rounds of review (pedantic panel, expert panel x2) before implementation
  • Implementation went through a 7-reviewer expert panel (Systems Architect, Go Engineer, Security, Reliability, DX, Spec Compliance, Steve Yegge)
  • All critical and important findings were fixed
  • The dippin adapter passthrough (extractAgentBackendAttrs) is wired to nil — it's a placeholder pending an upstream ir.AgentConfig.Params field in dippin-lang
  • The --backend claude-code CLI path works today; per-node .dip declaration requires the upstream change

Test plan

  • All 14 packages pass (go test ./...)
  • Config type tests: PermissionMode validation, MCP server parsing (8 edge cases), tool list parsing/validation
  • NDJSON parsing tests: text, reasoning, tool_use, tool_result, result, unknown type, multi-content, JSON object input
  • Error classification tests: auth, rate limit, budget, network, OOM, success, unknown, false-positive resistance
  • CLI arg construction tests: basic, with optionals, disallowed tools, no extra config, system prompt without ClaudeCodeConfig, invalid permission mode
  • NativeBackend tests: event emission, result fields, config propagation, defaults
  • ResolvePrompt parity tests: graph variables, context injection, missing prompt, goal expansion, fidelity
  • Integration test with real claude CLI (requires CLI installed, //go:build integration)
  • Manual smoke test: tracker --backend claude-code --no-tui <pipeline>

🤖 Generated with Claude Code

Summary by CodeRabbit

Release Notes

  • New Features

    • Added --backend CLI flag to specify agent execution backend ("native" or "claude-code")
    • Implemented Claude Code backend for pipeline execution with support for MCP server attachments, permission modes, tool allow/deny lists, and budget configuration
  • Documentation

    • Added design specification and implementation plan for Claude Code backend integration

harperreed and others added 14 commits March 26, 2026 06:59
Spec for two new activation paths:
1. Per-node: provider: "claude-code" in .dip files
2. Global: --backend claude-code CLI flag

Uses partio-io/claude-agent-sdk-go to delegate agent nodes to Claude
Code CLI sessions with full MCP, tools, and ecosystem access.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Major revisions based on feedback from 5-reviewer panel:

- Drop third-party SDK dependency (Dependency Hawk): write internal
  ~300 line wrapper instead of depending on 21-day-old single-maintainer repo
- Backend platform abstraction (Yegge): AgentBackend interface so
  adding backend #3 is config, not code. One handler, multiple backends.
- No registry aliasing needed: backend selection inside codergen handler
- No build tags: zero external deps, always compile
- Concrete error mapping: exit codes, stderr patterns, outcome rules
- Subprocess lifecycle: process group kill, orphan prevention, concurrency
- Honest observability: document what's degraded, show "—" not zeros
- MCP server parsing rules with all edge cases
- Backend selection precedence: node > flag > default
- Platform constraints: Linux/macOS only, CLI version check

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Key changes from 7-reviewer panel:
- Use agent.Event and agent.SessionResult directly (no new types)
- Separate backend: attr from provider: (no semantic overload)
- Split AgentRunConfig common + Extra any for backend-specific config
- SIGTERM then SIGKILL (graceful shutdown for MCP connections)
- Minimal subprocess env (prevent credential leakage)
- Specific stderr patterns with priority order for error classification
- Explicit goroutine contract for NDJSON streaming
- Pipeline-level budget accumulator
- Document dippin-lang prerequisite for .dip activation path
- Permission mode typed enum, default fullAuto for headless use
- Lazy CLI version check to avoid startup latency

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
8 tasks with dependency graph:
1. AgentBackend interface + config types
2. Extract shared ResolvePrompt
3. NativeBackend (wraps agent.Session)
4. Refactor CodergenHandler to use backends
5. ClaudeCodeBackend (CLI subprocess, NDJSON)
6. --backend CLI flag
7. Dippin adapter support
8. Integration test + rebuild

Tasks 1-3 are parallelizable. Task 4 is the key refactoring.
Task 5 is the new capability. Tasks 6-8 wire it up.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Introduces AgentBackend interface for pluggable execution backends,
AgentRunConfig for common session config, ClaudeCodeConfig for
Claude Code-specific settings, PermissionMode validation, MCP server
parsing, and tool list validation helpers.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Extract the private resolvePrompt method from CodergenHandler into a
public ResolvePrompt function in prompt.go. The codergen handler now
delegates to the shared function. This enables other backends (e.g.
ClaudeCodeBackend) to reuse the same prompt resolution logic.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Implements the AgentBackend interface using the built-in agent.Session.
Translates AgentRunConfig fields (model, provider, max_turns, system_prompt,
working_dir) into SessionConfig and forwards events via the emit callback.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
CodergenHandler now selects and delegates to an AgentBackend for session
execution instead of creating agent.Session directly. Adds selectBackend()
with node-attr and global-default resolution, buildRunConfig() to translate
node config into AgentRunConfig, and WithDefaultBackend registry option.

NativeBackend accepts full SessionConfig via Extra field to preserve all
config fields (reasoning effort, caching, compaction, etc.).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Implements pipeline.AgentBackend by spawning the claude CLI as a
subprocess, parsing NDJSON stream output, and emitting agent.Event
objects. Includes error classification, minimal env construction,
MCP server config, and two-phase process shutdown.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Adds --backend CLI flag to select the agent execution backend.
Valid values are "" (default/native), "native", and "claude-code".
Invalid values produce a clear error message and exit 1.
Threads the backend value through to handlers.WithDefaultBackend in
both run and runTUI registry construction paths.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Add extractAgentBackendAttrs helper to dippin_adapter.go that maps
backend-selection and Claude Code-specific keys (backend, mcp_servers,
allowed_tools, disallowed_tools, max_budget_usd, permission_mode) from
a generic params map into node attrs consumed by CodergenHandler and
ClaudeCodeBackend.

The helper is called from extractAgentAttrs with nil for now. When
dippin-lang upstream adds Params map[string]string to ir.AgentConfig,
replace nil with cfg.Params to activate full .dip file passthrough.

Also bump dippin-lang to v0.9.1 (no new AgentConfig fields in this
release; confirms upstream IR change is still pending).

PARTIAL: full .dip backend attr support requires upstream dippin-lang
to add Params or dedicated fields to ir.AgentConfig.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Move toolUseIDs and lastResult from struct fields to local runState,
eliminating race conditions when Run() is called concurrently.

Also: fix system prompt gated behind ccCfg nil check, change Input to
json.RawMessage, log recovered panics, emit EventLLMRequestPreparing
for system messages, apply timeout from config, validate PermissionMode,
improve "CLI not found" error message, and pass through common env vars.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
selectBackend now returns (AgentBackend, error) and lazy-inits
ClaudeCodeBackend when backend=claude-code is set on a node or via
--backend flag. buildRunConfig builds ClaudeCodeConfig from node attrs
(mcp_servers, allowed_tools, disallowed_tools, max_budget_usd,
permission_mode) when the selected backend is claude-code.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…e, env scope

- buildClaudeCodeConfig returns error on malformed attrs (mcp_servers,
  max_budget_usd, permission_mode) instead of silently ignoring
- ValidateToolLists called to reject conflicting allow/deny lists
- Default PermissionMode set to fullAuto for headless pipeline use
- System NDJSON message emits both EventLLMRequestPreparing and
  EventTurnStart so TUI transitions from "waiting" to "thinking"
- Reverted SSH_AUTH_SOCK and proxy vars from buildEnv — keep minimal
  env, pass credentials via node-level config if needed
- buildRunConfig returns error for config validation propagation

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@harperreed harperreed requested a review from clintecker March 27, 2026 13:40
@coderabbitai
Copy link
Copy Markdown

coderabbitai bot commented Mar 27, 2026

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

Use the checkboxes below for quick actions:

  • ▶️ Resume reviews
  • 🔍 Trigger review

Walkthrough

This pull request introduces a pluggable backend abstraction for agent execution in pipeline nodes. It adds support for executing nodes via either the native agent session or an external Claude Code CLI, implements both backends, adds a --backend CLI flag to select execution mode, refactors the handler to route through the abstraction, and provides comprehensive test coverage and design documentation.

Changes

Cohort / File(s) Summary
CLI Backend Selection
cmd/tracker/commands.go, cmd/tracker/flags.go, cmd/tracker/main.go, cmd/tracker/main_test.go, cmd/tracker/run.go
Added --backend CLI flag with validation (accepts "native", "claude-code", or empty default). Updated command dependency signatures to thread backend parameter through run() and runTUI() functions. Added tests for flag parsing and backend passthrough behavior.
Backend Abstraction & Native Implementation
pipeline/backend.go, pipeline/backend_test.go, pipeline/handlers/backend_native.go, pipeline/handlers/backend_native_test.go
Introduced AgentBackend interface with Run() method and supporting structures (AgentRunConfig, ClaudeCodeConfig, PermissionMode). Implemented NativeBackend wrapper around existing agent.Session. Added parsers for MCP servers, tool lists, and validation helpers. Comprehensive test coverage for configuration parsing, validation, and backend behavior.
Claude Code Backend Implementation
pipeline/handlers/backend_claudecode.go, pipeline/handlers/backend_claudecode_test.go
Implemented ClaudeCodeBackend that spawns the claude CLI, decodes NDJSON output, translates message types to agent.Event values, and maps exit codes/stderr patterns to session outcomes. Handles tool calls, MCP servers, permission modes, and budget configuration. Extensive test suite covering argument assembly, NDJSON parsing, error classification, and integration scenarios.
Handler & Prompt Refactoring
pipeline/handlers/codergen.go, pipeline/handlers/codergen_test.go, pipeline/handlers/prompt.go, pipeline/handlers/prompt_test.go, pipeline/handlers/registry.go, pipeline/handlers/transcript.go
Refactored CodergenHandler to select and dispatch to backends via pipeline.AgentBackend abstraction. Extracted prompt resolution logic into reusable ResolvePrompt() function. Added transcriptCollector for event-to-text conversion and buildSessionStats() helper. Extended NewDefaultRegistry() to initialize backends. New registry option WithDefaultBackend() for backend selection.
Graph Processing Updates
pipeline/dippin_adapter.go, pipeline/dippin_adapter_test.go, pipeline/dippin_adapter_edges.go
Moved implicit edge synthesis and start/exit node validation into separate dippin_adapter_edges.go file. Added backend attribute extraction via extractAgentBackendAttrs() placeholder. New tests validate parallel fan-out/fan-in edge synthesis, node attribute extraction for agent backend configuration, and enforced start/exit node shapes.
Documentation
docs/superpowers/plans/2026-03-26-claude-code-backend.md, docs/superpowers/specs/2026-03-26-claude-code-backend-design.md
Added implementation plan and design specification for Claude Code backend, covering architecture, task breakdown, test strategy, CLI integration, subprocess lifecycle, error classification, and MCP/tool configuration.
Miscellaneous
.gitignore
Added .claude/settings.local.json to ignore list.

Sequence Diagram

sequenceDiagram
    participant CLI as CLI Flag Parser
    participant Cmd as executeCommand
    participant Run as run()/runTUI()
    participant Registry as Handler Registry
    participant CodergenH as CodergenHandler
    participant Selector as selectBackend
    participant Backend as AgentBackend (Native or ClaudeCode)

    CLI->>Cmd: cfg.backend = "claude-code"
    Cmd->>Run: backend="claude-code"
    Run->>Registry: NewDefaultRegistry(WithDefaultBackend("claude-code"))
    Registry->>CodergenH: defaultBackendName = "claude-code"
    Registry-->>Run: CodergenHandler ready
    
    Run->>CodergenH: Execute(ctx, node, ...)
    CodergenH->>Selector: selectBackend(node attrs, defaultBackendName)
    Selector-->>CodergenH: ClaudeCodeBackend
    CodergenH->>CodergenH: buildRunConfig(AgentRunConfig, ClaudeCodeConfig)
    CodergenH->>Backend: Run(ctx, runCfg, emitCallback)
    
    alt Backend = ClaudeCode
        Backend->>Backend: spawn claude CLI process
        Backend->>Backend: decode NDJSON from stdout
        Backend->>Backend: emit events via callback
        Backend->>Backend: classify exit code/stderr
        Backend-->>CodergenH: SessionResult + error
    else Backend = Native
        Backend->>Backend: create agent.Session
        Backend->>Backend: sess.Run(ctx, prompt)
        Backend-->>CodergenH: SessionResult + error
    end
    
    CodergenH->>CodergenH: handle result, construct outcome
    CodergenH-->>Run: Result
    Run-->>CLI: Success/Error
Loading

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~75 minutes

Poem

🐰 Two paths now shimmer on the executor's way,
A native hop, or Claude's bright CLI sway.
The handler picks its steed with backends in the fold,
Each message flows through pipelines of configurable gold!
Abstract and concrete dance hand-in-paw today. 🌟

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 38.79% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (2 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title accurately summarizes the main change: introducing a Claude Code execution backend for pipeline nodes, which is the central focus across all modified files.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch feat/claude-code-backend

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Copy Markdown

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 4

🧹 Nitpick comments (9)
pipeline/dippin_adapter_test.go (1)

885-913: Test name suggests integration, but it’s helper-level simulation.

Optional: rename this test to reflect its true scope and reduce future confusion.

📝 Suggested rename
-func TestFromDippinIR_AgentBackendAttrsViaParams(t *testing.T) {
+func TestExtractAgentBackendAttrs_SimulatedIRParams(t *testing.T) {
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pipeline/dippin_adapter_test.go` around lines 885 - 913, Rename the test to
reflect that it's a unit/helper-level check of extractAgentBackendAttrs rather
than an integration: change TestFromDippinIR_AgentBackendAttrsViaParams to a
clearer name such as TestExtractAgentBackendAttrs_ViaParams (or
TestExtractAgentBackendAttrs_SimulatedParams), update the test function name
accordingly, and adjust the top comment to mention it's directly exercising
extractAgentBackendAttrs with simulated params to avoid future confusion.
cmd/tracker/main_test.go (1)

238-245: Consider asserting backend passthrough in this run-path test.

The new argument is added, but not validated. Adding one assertion here will protect the new CLI→run wiring from regressions.

🧪 Suggested test tightening
 	err := executeCommand(runConfig{
 		mode:         modeRun,
 		pipelineFile: "pipeline.dot",
 		workdir:      "/tmp/workdir",
 		noTUI:        true,
+		backend:      "claude-code",
 	}, commandDeps{
 ...
 		run: func(pipelineFile, workdir, checkpoint, format, backend string, verbose bool, jsonOut bool) error {
 			runCalled = true
 			if pipelineFile != "pipeline.dot" {
 				t.Fatalf("pipelineFile = %q, want %q", pipelineFile, "pipeline.dot")
 			}
+			if backend != "claude-code" {
+				t.Fatalf("backend = %q, want %q", backend, "claude-code")
+			}
 			return nil
 		},
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@cmd/tracker/main_test.go` around lines 238 - 245, The test's anonymous run
handler doesn't assert the new backend parameter; update the test to validate
backend passthrough by adding an expectedBackend variable (e.g., expectedBackend
:= "some-backend") and inside the run function assert backend == expectedBackend
(use t.Fatalf to fail with actual vs expected), and mirror the same assertion in
the runTUI stub if that path should also exercise backend; reference the
anonymous run and runTUI functions and the runCalled variable when making this
change.
docs/superpowers/specs/2026-03-26-claude-code-backend-design.md (3)

148-148: Add language specifier to CLI invocation block.

-```
+```text
 claude -p "<prompt>"

As per static analysis hints: "Fenced code blocks should have a language specified (MD040)".

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@docs/superpowers/specs/2026-03-26-claude-code-backend-design.md` at line 148,
Add a language specifier to the fenced code block containing the CLI invocation
"claude -p \"<prompt>\"" so the markdown linter MD040 is satisfied; replace the
opening fence ``` with a language-tagged fence such as ```bash or ```text in the
block that wraps the claude -p "<prompt>" example.

105-105: Add language specifier to fenced code block.

The flow diagram code block lacks a language specifier. Consider using text or plaintext to satisfy markdown linting.

-```
+```text
 .dip file → node attrs → CodergenHandler.Execute()

As per static analysis hints: "Fenced code blocks should have a language specified (MD040)".

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@docs/superpowers/specs/2026-03-26-claude-code-backend-design.md` at line 105,
The fenced code block for the flow diagram is missing a language specifier which
triggers MD040; update the opening fence from ``` to include a language (e.g.,
change the opening fence to ```text or ```plaintext) so the block containing
".dip file → node attrs → CodergenHandler.Execute()" is recognized as a
plaintext/code block by the linter.

19-19: Minor: Hyphenate compound modifier.

"~400 line wrapper" should be "~400-line wrapper" when used as a compound modifier before a noun.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@docs/superpowers/specs/2026-03-26-claude-code-backend-design.md` at line 19,
Update the phrase "~400 line wrapper" to use the compound modifier form
"~400-line wrapper" so the sentence reads "Write an internal ~400-line wrapper."
Locate the exact string "~400 line wrapper" in the spec text and replace it with
"~400-line wrapper" to correct the hyphenation.
pipeline/handlers/backend_native_test.go (1)

94-124: Test exercises fallback path, not the primary production path.

This test verifies config propagation when cfg.Extra is nil, exercising the selective override logic in NativeBackend.Run (lines 39-53 of backend_native.go). However, based on the context snippet from codergen.go:168, CodergenHandler.buildRunConfig() always sets cfg.Extra = &sessionCfg for the native backend. This means:

  1. The tested code path (building config from individual AgentRunConfig fields) is effectively dead code in production
  2. The test provides value for direct NativeBackend usage outside of CodergenHandler, but doesn't test the actual production path

Consider adding a test that passes cfg.Extra with a *agent.SessionConfig to verify the primary code path used by CodergenHandler.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pipeline/handlers/backend_native_test.go` around lines 94 - 124, Add a new
unit test that exercises the production path where cfg.Extra is populated with a
*agent.SessionConfig: copy/TestNativeBackendRespectsConfig and create a
sessionCfg := agent.SessionConfig{Model: "gpt-4o", Provider: "openai",
SystemPrompt: "be concise", ...} then set cfg.Extra = &sessionCfg before calling
NativeBackend.Run (the same modelCapturingCompleter can be reused) and assert
that capturedModel equals sessionCfg.Model; this verifies the code path in
NativeBackend.Run that reads overrides from cfg.Extra (used by
CodergenHandler.buildRunConfig).
pipeline/handlers/backend_native.go (1)

28-54: The fallback config-building branch (lines 37-54) is dead code in production.

Based on codergen.go:168, CodergenHandler.buildRunConfig() always sets cfg.Extra = &sessionCfg when the backend is not ClaudeCodeBackend. This means the else branch that builds config from individual AgentRunConfig fields is never executed in the normal pipeline flow.

Options:

  1. Keep for flexibility: The fallback enables direct NativeBackend usage without CodergenHandler, which may be useful for testing or future use cases
  2. Remove: Simplify by always requiring *agent.SessionConfig in Extra
  3. Document: Add a comment clarifying when this path is used

If keeping, consider documenting this in the else branch:

 	} else {
+		// Fallback for direct NativeBackend usage without CodergenHandler.
+		// CodergenHandler always provides cfg.Extra with *agent.SessionConfig.
 		sessionCfg = agent.DefaultConfig()
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pipeline/handlers/backend_native.go` around lines 28 - 54, The else branch in
NativeBackend.Run that builds sessionCfg from individual AgentRunConfig fields
is dead in normal pipeline flow because CodergenHandler.buildRunConfig always
sets cfg.Extra to a *agent.SessionConfig; update this code by keeping the
fallback but adding a clear comment above the else branch (near
NativeBackend.Run, cfg.Extra and agent.SessionConfig references) stating that
CodergenHandler.buildRunConfig normally populates Extra, and that this branch is
only used when NativeBackend is invoked directly (e.g., tests or special
callers) — alternatively, if you prefer to remove it, enforce that cfg.Extra
must be a *agent.SessionConfig and return an error when absent to simplify
behavior.
pipeline/handlers/codergen.go (1)

94-111: Consider documenting the "codergen" alias for "native" backend.

Line 102 accepts "codergen" as an alias for "native", but this isn't mentioned in the design spec's backend selection precedence section. Either:

  1. Document this alias in the spec and user-facing docs
  2. Remove it if it's not intended behavior
 		case "claude-code":
 			return h.ensureClaudeCodeBackend()
-		case "native", "codergen":
+		case "native":
 			return h.ensureNativeBackend(), nil
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pipeline/handlers/codergen.go` around lines 94 - 111, The selectBackend
function accepts "codergen" as an alias for the "native" backend but this alias
isn't in the spec; either document the alias or remove it—update selectBackend
in CodergenHandler to remove the "codergen" case (leaving only "native") if the
alias is unintended, or add documentation to the design spec and user-facing
docs stating that "codergen" maps to ensureNativeBackend (referencing the
"selectBackend" function and the symbols "codergen", "native", and method
ensureNativeBackend()).
pipeline/handlers/backend_claudecode.go (1)

63-149: SessionResult is sparsely populated compared to NativeBackend.

Looking at agent.SessionResult (context snippet 2) and how NativeBackend populates it (context snippet 4), the ClaudeCodeBackend leaves several fields at zero values:

  • SessionID - not set
  • Duration - not tracked (easy to add with time.Since)
  • MaxTurnsUsed, LoopDetected, ContextUtilization - not derived from Claude CLI output
  • FilesModified, FilesCreated - not tracked

While some fields may not be available from the Claude CLI's NDJSON output, Duration can be easily tracked. Consider adding it for consistency with NativeBackend and for observability.

♻️ Suggested fix to track Duration
 func (b *ClaudeCodeBackend) Run(ctx context.Context, cfg pipeline.AgentRunConfig, emit func(agent.Event)) (agent.SessionResult, error) {
+	start := time.Now()
 	// Per-run state: local to this invocation, safe for concurrent use.
 	state := &runState{
 		toolUseIDs: make(map[string]string),
 	}

Then in storeResult:

 func storeResult(msg ndjsonMessage, state *runState) {
 	result := &agent.SessionResult{
 		Turns:     msg.Turns,
 		ToolCalls: make(map[string]int),
 	}

And update buildResult to set Duration:

-func buildResult(state *runState) agent.SessionResult {
+func buildResult(state *runState, duration time.Duration) agent.SessionResult {
 	if state.lastResult != nil {
+		state.lastResult.Duration = duration
 		return *state.lastResult
 	}
-	return agent.SessionResult{}
+	return agent.SessionResult{Duration: duration}
 }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pipeline/handlers/backend_claudecode.go` around lines 63 - 149, The Run
implementation doesn't populate SessionResult.Duration; record start :=
time.Now() at the top of ClaudeCodeBackend.Run (after runState creation),
compute dur := time.Since(start) after the process completes (before returning)
and pass that duration into buildResult (either by adding a duration parameter
to buildResult or by storing it on runState and reading it in buildResult).
Update buildResult to set the SessionResult.Duration field from the provided
duration/runState value so the backend matches NativeBackend's observability.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@cmd/tracker/flags.go`:
- Line 115: The flag help text for the backend flag (where
fs.StringVar(&cfg.backend, "backend", "", ... ) is defined) only mentions
"claude-code" but the validator also accepts "native"; update the help string(s)
(both the occurrence tied to cfg.backend and the other occurrence noted) to list
all accepted values (e.g., "Agent backend override: native, claude-code") or use
a generic description that enumerates both allowed backends so the help message
matches the validator.

In `@docs/superpowers/plans/2026-03-26-claude-code-backend.md`:
- Line 7: The phrase "~400 line internal package" is missing a hyphen for the
compound modifier; update the documentation text (the line describing the
"Claude Code backend" and the agent backend architecture) to read "~400-line
internal package" so the compound adjective is grammatically correct.
- Around line 615-627: The fenced code block containing the dependency graph
(the ASCII diagram starting with "Task 1 (interface + config types)" and showing
arrows through Task 8) needs a language specifier for correct rendering; update
the opening triple-backtick to include a language like "text" or "plaintext"
(e.g., replace ``` with ```text) so the diagram renders as plain text. Ensure
only the opening fence is changed and no diagram content is altered.

In `@pipeline/handlers/codergen.go`:
- Around line 113-135: The lazy-init in ensureClaudeCodeBackend and
ensureNativeBackend is racy; protect their check-then-set with sync.Once. Add
per-backend sync.Once fields (e.g., claudeCodeOnce, nativeOnce) to
CodergenHandler and use Once.Do in ensureClaudeCodeBackend (preserving its error
return) and ensureNativeBackend to perform the
NewClaudeCodeBackend/NewNativeBackend construction and assignment inside the Do
closures so the backend is only created once even under concurrent Execute()
calls.

---

Nitpick comments:
In `@cmd/tracker/main_test.go`:
- Around line 238-245: The test's anonymous run handler doesn't assert the new
backend parameter; update the test to validate backend passthrough by adding an
expectedBackend variable (e.g., expectedBackend := "some-backend") and inside
the run function assert backend == expectedBackend (use t.Fatalf to fail with
actual vs expected), and mirror the same assertion in the runTUI stub if that
path should also exercise backend; reference the anonymous run and runTUI
functions and the runCalled variable when making this change.

In `@docs/superpowers/specs/2026-03-26-claude-code-backend-design.md`:
- Line 148: Add a language specifier to the fenced code block containing the CLI
invocation "claude -p \"<prompt>\"" so the markdown linter MD040 is satisfied;
replace the opening fence ``` with a language-tagged fence such as ```bash or
```text in the block that wraps the claude -p "<prompt>" example.
- Line 105: The fenced code block for the flow diagram is missing a language
specifier which triggers MD040; update the opening fence from ``` to include a
language (e.g., change the opening fence to ```text or ```plaintext) so the
block containing ".dip file → node attrs → CodergenHandler.Execute()" is
recognized as a plaintext/code block by the linter.
- Line 19: Update the phrase "~400 line wrapper" to use the compound modifier
form "~400-line wrapper" so the sentence reads "Write an internal ~400-line
wrapper." Locate the exact string "~400 line wrapper" in the spec text and
replace it with "~400-line wrapper" to correct the hyphenation.

In `@pipeline/dippin_adapter_test.go`:
- Around line 885-913: Rename the test to reflect that it's a unit/helper-level
check of extractAgentBackendAttrs rather than an integration: change
TestFromDippinIR_AgentBackendAttrsViaParams to a clearer name such as
TestExtractAgentBackendAttrs_ViaParams (or
TestExtractAgentBackendAttrs_SimulatedParams), update the test function name
accordingly, and adjust the top comment to mention it's directly exercising
extractAgentBackendAttrs with simulated params to avoid future confusion.

In `@pipeline/handlers/backend_claudecode.go`:
- Around line 63-149: The Run implementation doesn't populate
SessionResult.Duration; record start := time.Now() at the top of
ClaudeCodeBackend.Run (after runState creation), compute dur :=
time.Since(start) after the process completes (before returning) and pass that
duration into buildResult (either by adding a duration parameter to buildResult
or by storing it on runState and reading it in buildResult). Update buildResult
to set the SessionResult.Duration field from the provided duration/runState
value so the backend matches NativeBackend's observability.

In `@pipeline/handlers/backend_native_test.go`:
- Around line 94-124: Add a new unit test that exercises the production path
where cfg.Extra is populated with a *agent.SessionConfig:
copy/TestNativeBackendRespectsConfig and create a sessionCfg :=
agent.SessionConfig{Model: "gpt-4o", Provider: "openai", SystemPrompt: "be
concise", ...} then set cfg.Extra = &sessionCfg before calling NativeBackend.Run
(the same modelCapturingCompleter can be reused) and assert that capturedModel
equals sessionCfg.Model; this verifies the code path in NativeBackend.Run that
reads overrides from cfg.Extra (used by CodergenHandler.buildRunConfig).

In `@pipeline/handlers/backend_native.go`:
- Around line 28-54: The else branch in NativeBackend.Run that builds sessionCfg
from individual AgentRunConfig fields is dead in normal pipeline flow because
CodergenHandler.buildRunConfig always sets cfg.Extra to a *agent.SessionConfig;
update this code by keeping the fallback but adding a clear comment above the
else branch (near NativeBackend.Run, cfg.Extra and agent.SessionConfig
references) stating that CodergenHandler.buildRunConfig normally populates
Extra, and that this branch is only used when NativeBackend is invoked directly
(e.g., tests or special callers) — alternatively, if you prefer to remove it,
enforce that cfg.Extra must be a *agent.SessionConfig and return an error when
absent to simplify behavior.

In `@pipeline/handlers/codergen.go`:
- Around line 94-111: The selectBackend function accepts "codergen" as an alias
for the "native" backend but this alias isn't in the spec; either document the
alias or remove it—update selectBackend in CodergenHandler to remove the
"codergen" case (leaving only "native") if the alias is unintended, or add
documentation to the design spec and user-facing docs stating that "codergen"
maps to ensureNativeBackend (referencing the "selectBackend" function and the
symbols "codergen", "native", and method ensureNativeBackend()).
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 8fcfecfe-ecd6-474e-aac6-2031415b87db

📥 Commits

Reviewing files that changed from the base of the PR and between 0d7c424 and caa7997.

⛔ Files ignored due to path filters (1)
  • go.sum is excluded by !**/*.sum
📒 Files selected for processing (20)
  • cmd/tracker/commands.go
  • cmd/tracker/flags.go
  • cmd/tracker/main.go
  • cmd/tracker/main_test.go
  • cmd/tracker/run.go
  • docs/superpowers/plans/2026-03-26-claude-code-backend.md
  • docs/superpowers/specs/2026-03-26-claude-code-backend-design.md
  • go.mod
  • pipeline/backend.go
  • pipeline/backend_test.go
  • pipeline/dippin_adapter.go
  • pipeline/dippin_adapter_test.go
  • pipeline/handlers/backend_claudecode.go
  • pipeline/handlers/backend_claudecode_test.go
  • pipeline/handlers/backend_native.go
  • pipeline/handlers/backend_native_test.go
  • pipeline/handlers/codergen.go
  • pipeline/handlers/prompt.go
  • pipeline/handlers/prompt_test.go
  • pipeline/handlers/registry.go

# Conflicts:
#	cmd/tracker/flags.go
#	cmd/tracker/main.go
#	go.mod
#	go.sum
Copy link
Copy Markdown

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick comments (1)
cmd/tracker/main_test.go (1)

1-14: Consider adding test coverage for the --backend flag.

Other CLI flags like --autopilot, --auto-approve, and --json have dedicated test cases (e.g., TestParseFlagsAutopilot, TestParseFlagsJsonFlag). Adding similar tests for --backend would verify:

  1. Valid values are accepted (native, claude-code)
  2. Invalid values produce the expected error
  3. The value is correctly threaded to the run callbacks

Example test cases to consider:

func TestParseFlagsBackendValid(t *testing.T) {
	for _, backend := range []string{"native", "claude-code"} {
		cfg, err := parseFlags([]string{"tracker", "--backend", backend, "pipeline.dip"})
		if err != nil {
			t.Fatalf("parseFlags returned error for backend %q: %v", backend, err)
		}
		if cfg.backend != backend {
			t.Fatalf("backend = %q, want %q", cfg.backend, backend)
		}
	}
}

func TestParseFlagsBackendInvalid(t *testing.T) {
	_, err := parseFlags([]string{"tracker", "--backend", "invalid", "pipeline.dip"})
	if err == nil {
		t.Fatal("expected error for invalid backend")
	}
}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@cmd/tracker/main_test.go` around lines 1 - 14, Add tests covering the
--backend flag by exercising parseFlags and the run callbacks: create one test
(e.g., TestParseFlagsBackendValid) that calls parseFlags with each valid backend
string ("native", "claude-code") and asserts no error and that cfg.backend
equals the provided value, and another test (e.g., TestParseFlagsBackendInvalid)
that calls parseFlags with an invalid backend and asserts an error is returned;
also add a test that ensures the parsed cfg.backend is propagated into the run
callbacks or the function that invokes handlers (reference parseFlags,
cfg.backend, and the run callback entry point used in tests) so the selected
backend is threaded through.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Nitpick comments:
In `@cmd/tracker/main_test.go`:
- Around line 1-14: Add tests covering the --backend flag by exercising
parseFlags and the run callbacks: create one test (e.g.,
TestParseFlagsBackendValid) that calls parseFlags with each valid backend string
("native", "claude-code") and asserts no error and that cfg.backend equals the
provided value, and another test (e.g., TestParseFlagsBackendInvalid) that calls
parseFlags with an invalid backend and asserts an error is returned; also add a
test that ensures the parsed cfg.backend is propagated into the run callbacks or
the function that invokes handlers (reference parseFlags, cfg.backend, and the
run callback entry point used in tests) so the selected backend is threaded
through.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 33f8b7a5-1860-4acf-8d2c-fef06d1f72e1

📥 Commits

Reviewing files that changed from the base of the PR and between caa7997 and 959c920.

⛔ Files ignored due to path filters (1)
  • go.sum is excluded by !**/*.sum
📒 Files selected for processing (6)
  • .claude/settings.local.json
  • cmd/tracker/commands.go
  • cmd/tracker/flags.go
  • cmd/tracker/main.go
  • cmd/tracker/main_test.go
  • cmd/tracker/run.go
✅ Files skipped from review due to trivial changes (1)
  • .claude/settings.local.json
🚧 Files skipped from review as they are similar to previous changes (2)
  • cmd/tracker/commands.go
  • cmd/tracker/main.go

CodeRabbit review findings:
- ensureClaudeCodeBackend and ensureNativeBackend now use sync.Once
  for thread-safe lazy init during parallel node execution
- Help text for --backend shows both valid values (native, claude-code)
- Minor grammar and code block fixes in plan doc

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
clintecker and others added 2 commits March 27, 2026 10:30
Extract transcriptCollector and buildSessionStats to transcript.go
(codergen.go was 536 lines). Extract implicit edge synthesis functions
to dippin_adapter_edges.go (dippin_adapter.go was 532 lines). Split
buildClaudeCodeConfig into parseClaudeCodeToolAttrs and
parseClaudeCodeBudgetAttrs (cyclomatic was 16).

All complexity gates now pass. Coverage at 78.9% (below 80% threshold)
is a pre-existing issue from the backend feature — needs a small test
to cover the extracted edge synthesis functions.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Copy link
Copy Markdown

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 3

🧹 Nitpick comments (2)
pipeline/dippin_adapter.go (1)

223-229: Placeholder call is a no-op by design.

The call extractAgentBackendAttrs(nil, attrs) will never copy any keys since iterating over a nil map yields zero iterations. The comment clearly documents this as intentional scaffolding pending the upstream cfg.Params field addition.

Consider adding a TODO comment to make this easier to find later:

 	// NOTE: cfg.Params is a placeholder for a future field on ir.AgentConfig.
 	// When dippin-lang upstream adds Params map[string]string to AgentConfig,
 	// replace nil below with cfg.Params.
+	// TODO(dippin-lang): Replace nil with cfg.Params once available.
 	extractAgentBackendAttrs(nil, attrs)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pipeline/dippin_adapter.go` around lines 223 - 229, The call
extractAgentBackendAttrs(nil, attrs) is intentionally a no-op now but should be
annotated for future work; add a clear TODO comment directly above or beside the
call mentioning that nil is a placeholder until ir.AgentConfig gains Params
(e.g., "TODO: replace nil with cfg.Params when ir.AgentConfig.Params is added —
see extractAgentBackendAttrs"), so developers can easily find and update the
call in the future.
pipeline/dippin_adapter_edges.go (1)

32-43: Last-write-wins for duplicate fan-in sources.

If multiple fan-in nodes list the same source in their cfg.Sources, the last one processed will overwrite the mapping. This is likely acceptable for valid workflows (where a source should only feed one fan-in), but worth noting.

Also, this function has the same pointer type handling gap (*ir.FanInConfig not checked) as the main switch.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pipeline/dippin_adapter_edges.go` around lines 32 - 43, buildFanInSourceMap
currently overwrites entries when multiple fan-in nodes share a source and
ignores pointer-typed configs; update buildFanInSourceMap to handle both
ir.FanInConfig and *ir.FanInConfig type assertions when iterating workflow.Nodes
(check for cfg, ok := irNode.Config.(ir.FanInConfig) and cfgPtr, ok :=
irNode.Config.(*ir.FanInConfig)), and add duplicate detection before setting
fanInBySource[source] (e.g., if an entry already exists and the existing ID
differs from irNode.ID, either log/return an error or otherwise surface the
conflict instead of silently overwriting) so duplicate sources are handled
explicitly.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@pipeline/dippin_adapter_edges.go`:
- Around line 88-97: The current logic only sets startNode.Handler and
exitNode.Handler when the Shape differs, which leaves Handler incorrect if Shape
already matches; update the block that handles startNode and exitNode so Shape
is still set conditionally (keep the if startNode.Shape != "Mdiamond" and if
exitNode.Shape != "Msquare") but always assign startNode.Handler = "start" and
exitNode.Handler = "exit" unconditionally after retrieving the nodes (i.e.,
ensure Handler is set regardless of the Shape check), referencing startNode,
exitNode, g.Nodes, g.StartNode and g.ExitNode to locate and modify the code.
- Around line 22-29: The type switch in the loop currently only matches value
types and should also handle pointer variants to mirror extractNodeAttrs'
defensive handling; update the switch in the loop that calls
synthesizeParallelEdges and synthesizeFanInEdges to include cases for
*ir.ParallelConfig and *ir.FanInConfig (dereference or pass through as your
functions expect), and likewise update buildFanInSourceMap to check for both
ir.FanInConfig and *ir.FanInConfig so pointer configs are recognized; use the
existing function names synthesizeParallelEdges, synthesizeFanInEdges,
buildFanInSourceMap and keep behavior consistent with extractNodeAttrs when
unwrapping pointer vs value.

In `@pipeline/handlers/transcript.go`:
- Around line 62-79: The doc comment for buildSessionStats is misleading (it
claims it can return nil when sessResult is nil but the parameter is a value),
so update the comment to remove the "Returns nil" claim or change the signature
to accept *agent.SessionResult if nil semantics are required; meanwhile,
defensively clone slice fields FilesModified and FilesCreated (like ToolCalls is
cloned) before assigning them into the pipeline.SessionStats (e.g. copy via
append([]string(nil), r.FilesModified...) or make+copy) to avoid leaking backing
arrays.

---

Nitpick comments:
In `@pipeline/dippin_adapter_edges.go`:
- Around line 32-43: buildFanInSourceMap currently overwrites entries when
multiple fan-in nodes share a source and ignores pointer-typed configs; update
buildFanInSourceMap to handle both ir.FanInConfig and *ir.FanInConfig type
assertions when iterating workflow.Nodes (check for cfg, ok :=
irNode.Config.(ir.FanInConfig) and cfgPtr, ok :=
irNode.Config.(*ir.FanInConfig)), and add duplicate detection before setting
fanInBySource[source] (e.g., if an entry already exists and the existing ID
differs from irNode.ID, either log/return an error or otherwise surface the
conflict instead of silently overwriting) so duplicate sources are handled
explicitly.

In `@pipeline/dippin_adapter.go`:
- Around line 223-229: The call extractAgentBackendAttrs(nil, attrs) is
intentionally a no-op now but should be annotated for future work; add a clear
TODO comment directly above or beside the call mentioning that nil is a
placeholder until ir.AgentConfig gains Params (e.g., "TODO: replace nil with
cfg.Params when ir.AgentConfig.Params is added — see extractAgentBackendAttrs"),
so developers can easily find and update the call in the future.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 74da7284-0139-47e1-8f57-19e93b008ef6

📥 Commits

Reviewing files that changed from the base of the PR and between 6864ead and 2e3d0b9.

📒 Files selected for processing (4)
  • pipeline/dippin_adapter.go
  • pipeline/dippin_adapter_edges.go
  • pipeline/handlers/codergen.go
  • pipeline/handlers/transcript.go
🚧 Files skipped from review as they are similar to previous changes (1)
  • pipeline/handlers/codergen.go

Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR introduces a pluggable pipeline.AgentBackend abstraction for codergen/agent pipeline nodes and adds a new claude-code backend that runs sessions via the claude CLI (NDJSON stream), selectable per-node or via a global --backend CLI flag.

Changes:

  • Added pipeline.AgentBackend + shared AgentRunConfig/Claude Code config parsing/validation utilities.
  • Refactored CodergenHandler to select a backend (native vs claude-code) and delegate execution while keeping shared prompt resolution/artifact/outcome logic.
  • Added CLI wiring (--backend) and supporting adapter/docs/tests.

Reviewed changes

Copilot reviewed 22 out of 23 changed files in this pull request and generated 5 comments.

Show a summary per file
File Description
pipeline/handlers/transcript.go Extracts transcript collection + session stats building from codergen.go.
pipeline/handlers/registry.go Adds WithDefaultBackend and injects default backend into codergen handler.
pipeline/handlers/prompt.go Extracts shared ResolvePrompt for prompt resolution + fidelity/context injection.
pipeline/handlers/prompt_test.go Unit tests for ResolvePrompt behavior (expansion, context injection, fidelity).
pipeline/handlers/codergen.go Refactors codergen execution to choose/run an AgentBackend; adds Claude Code config parsing.
pipeline/handlers/backend_native.go Implements AgentBackend using existing agent.Session.
pipeline/handlers/backend_native_test.go Tests for NativeBackend event emission, config propagation, defaults.
pipeline/handlers/backend_claudecode.go Implements Claude Code backend via subprocess + NDJSON parsing + error classification.
pipeline/handlers/backend_claudecode_test.go Tests for CLI arg/env building, NDJSON parsing, error classification, MCP JSON.
pipeline/backend.go Defines AgentBackend, AgentRunConfig, ClaudeCodeConfig, MCP/tool validation helpers.
pipeline/backend_test.go Tests permission mode validation, MCP parsing, tool list parsing/validation.
pipeline/dippin_adapter.go Adds placeholder passthrough helper for backend/Claude Code node attrs (pending upstream IR changes).
pipeline/dippin_adapter_edges.go Moves implicit edge synthesis/start-exit validation helpers out of adapter.
pipeline/dippin_adapter_test.go Adds unit tests for backend attr passthrough helper behavior.
cmd/tracker/run.go Threads backend selection into registry construction in console + TUI run paths.
cmd/tracker/main.go Extends runConfig and dependency signatures to include backend selection.
cmd/tracker/main_test.go Updates command dependency stubs for the new backend parameter.
cmd/tracker/flags.go Adds --backend flag + validation and usage output.
cmd/tracker/commands.go Passes backend selection to run / runTUI.
go.sum Updates dippin-lang dependency checksums (v0.10.0).
docs/superpowers/specs/2026-03-26-claude-code-backend-design.md Design spec for backend architecture + Claude Code execution model.
docs/superpowers/plans/2026-03-26-claude-code-backend.md Detailed implementation plan and testing checklist.
.claude/settings.local.json Adds Claude Code local permissions settings (see security feedback).

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines +226 to +234
// Pass through env vars needed for basic operation. Credentials
// (SSH_AUTH_SOCK) and network config (proxy vars) are intentionally
// excluded — pass them via node-level env attrs if needed.
for _, name := range []string{
"USER", "TMPDIR", "LANG",
} {
if val := os.Getenv(name); val != "" {
env = append(env, name+"="+val)
}
Copy link

Copilot AI Mar 27, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This comment says proxy/SSH vars can be passed via "node-level env attrs", but there doesn't appear to be any mechanism for agent nodes to declare extra environment variables (searching the codebase shows no env-related attrs for codergen/claude-code). Either implement an explicit env passthrough for this backend or adjust the comment to avoid implying unsupported configuration.

Copilot uses AI. Check for mistakes.
case "claude-code":
return h.ensureClaudeCodeBackend()
case "native", "codergen":
return h.ensureNativeBackend(), nil
Copy link

Copilot AI Mar 27, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If a node sets backend to an unrecognized value, this function silently falls back to the default backend. That makes typos hard to detect and can cause pipelines to run with the wrong execution engine. Consider returning an explicit error for unknown backend values (similar to the --backend flag validation) so misconfiguration fails fast.

Suggested change
return h.ensureNativeBackend(), nil
return h.ensureNativeBackend(), nil
default:
return nil, fmt.Errorf("unknown backend %q for node %q", backend, node.Name)

Copilot uses AI. Check for mistakes.
}

// buildSessionStats converts an agent.SessionResult into a pipeline.SessionStats
// for inclusion in the trace entry. Returns nil if sessResult is nil.
Copy link

Copilot AI Mar 27, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The docstring says "Returns nil if sessResult is nil", but buildSessionStats takes an agent.SessionResult value (not a pointer) so it can never be nil. Update the comment (or change the function signature if nil is a meaningful case) to avoid misleading callers.

Suggested change
// for inclusion in the trace entry. Returns nil if sessResult is nil.
// for inclusion in the trace entry.

Copilot uses AI. Check for mistakes.
Comment on lines +4 to +8
"Bash(ssh aibox01:*)",
"WebSearch"
]
}
}
Copy link

Copilot AI Mar 27, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This appears to be a machine-local Claude Code permissions file (note the filename settings.local.json) and includes an environment-specific SSH allow-rule. Committing this is risky (it can unintentionally broaden tool permissions for other developers/CI and may leak internal hostnames). Recommend removing it from the repo and adding .claude/settings.local.json to .gitignore; if shared defaults are needed, use a non-"local" settings file with safe, minimal permissions.

Suggested change
"Bash(ssh aibox01:*)",
"WebSearch"
]
}
}
"WebSearch"
]
}
}

Copilot uses AI. Check for mistakes.
Comment on lines +326 to +333
case "tool_use":
state.toolUseIDs[c.ToolUseID] = c.Name
events = append(events, agent.Event{
Type: agent.EventToolCallStart,
Timestamp: now,
ToolName: c.Name,
ToolInput: string(c.Input),
})
Copy link

Copilot AI Mar 27, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

parseAssistantContent emits tool call events but never increments SessionResult.ToolCalls. As a result sessResult.TotalToolCalls() stays 0, which can incorrectly trip codergen's "produced no output (0 tokens, 0 tool calls)" failure path for tool-only runs and also makes trace stats inaccurate. Consider incrementing counts in runState on each tool_use and copying them into SessionResult.ToolCalls (or updating state.lastResult as tool_use events arrive).

Copilot uses AI. Check for mistakes.
clintecker and others added 2 commits March 27, 2026 11:01
1. selectBackend now rejects unknown backend names with actionable error
2. buildEnv passes SSH_AUTH_SOCK and SSH_AGENT_PID for git-over-SSH
3. classifyError uses precise "throttled"/"throttling" match, excludes "unthrottled"
4. PermissionMode("") is no longer valid — callers must set an explicit mode
5. Removed .claude/settings.local.json from repo, added to .gitignore
6. extractAgentBackendAttrs(nil, attrs) documented as placeholder with warning infra
7. buildMCPServersJSON returns (string, error) instead of falling back to "{}"

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Ralph 1 (data integrity):
- storeResult populates ToolCalls from toolUseIDs (was always 0)
- NDJSON decode errors tracked + surfaced when no result received
- parseMessage unmarshal failures counted in decodeErrors
- buildResult returns error when no "result" message received
- buildSessionStats docstring corrected

Ralph 2 (safety — already on branch via direct commit):
- selectBackend returns error for unknown backend names
- buildEnv passes SSH_AUTH_SOCK + SSH_AGENT_PID
- classifyError uses precise throttle matching
- PermissionMode("") no longer valid
- buildMCPServersJSON propagates errors
- .claude/settings.local.json removed + gitignored

Ralph 3 (tests):
- CLI flag tests for --backend (native, claude-code, invalid)
- Backend passthrough assertion in executeCommand test
- Edge synthesis + ensureStartExitNodes tests for coverage

Complexity: extracted decodeNDJSON + collectResult from Run()
to bring cyclomatic/cognitive under thresholds.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Copy link
Copy Markdown

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick comments (4)
pipeline/dippin_adapter_test.go (3)

951-999: Test name should reflect what is actually validated.

This test currently verifies standard ir.AgentConfig extraction (prompt, llm_model, etc.), not backend-param passthrough. Renaming it (or expanding it once IR params exist) will reduce confusion and overlap with TestFromDippinIR_AgentConfig.

✏️ Suggested rename
-func TestExtractAgentBackendAttrs_SimulatedIRParams(t *testing.T) {
+func TestExtractAgentAttrs_SelectedFields(t *testing.T) {
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pipeline/dippin_adapter_test.go` around lines 951 - 999, Rename the test
TestExtractAgentBackendAttrs_SimulatedIRParams to accurately reflect that it
validates standard ir.AgentConfig extraction (e.g.,
TestExtractAgentConfig_SimulatedIRParams or
TestFromDippinIR_AgentConfig_Simulated); update the test function name and any
references to it so it no longer implies backend-param passthrough, and ensure
the test body (which calls FromDippinIR and asserts node.Attrs for prompt,
llm_model, llm_provider, system_prompt, reasoning_effort, max_turns) remains
unchanged.

858-907: Consider asserting all overlapping synthesized edges remain unique.

Right now only dispatch -> branch_a duplication is checked. Adding checks for branch_a -> join (and dispatch -> join) would make this duplicate-prevention test more complete.

🧪 Suggested enhancement
 	// Count edges from dispatch -> branch_a — should be exactly 1.
-	count := 0
+	countDispatchBranch := 0
+	countBranchJoin := 0
+	countDispatchJoin := 0
 	for _, e := range graph.Edges {
 		if e.From == "dispatch" && e.To == "branch_a" {
-			count++
+			countDispatchBranch++
+		}
+		if e.From == "branch_a" && e.To == "join" {
+			countBranchJoin++
+		}
+		if e.From == "dispatch" && e.To == "join" {
+			countDispatchJoin++
 		}
 	}
-	if count != 1 {
-		t.Errorf("expected 1 edge dispatch->branch_a, got %d", count)
+	if countDispatchBranch != 1 {
+		t.Errorf("expected 1 edge dispatch->branch_a, got %d", countDispatchBranch)
+	}
+	if countBranchJoin != 1 {
+		t.Errorf("expected 1 edge branch_a->join, got %d", countBranchJoin)
+	}
+	if countDispatchJoin != 1 {
+		t.Errorf("expected 1 edge dispatch->join, got %d", countDispatchJoin)
 	}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pipeline/dippin_adapter_test.go` around lines 858 - 907, The test
TestSynthesizeImplicitEdges_NoDuplicates currently only asserts the
dispatch->branch_a edge is unique; extend it to also count and assert uniqueness
for branch_a->join and dispatch->join (the overlaps created by synthesizing
fan-in/parallel edges). After calling FromDippinIR and building graph
(graph.Edges), add similar counters for edges where
e.From=="branch_a"&&e.To=="join" and e.From=="dispatch"&&e.To=="join" and assert
each count == 1 so all overlapping synthesized edges remain unique.

911-945: Test intent and fixture are slightly misaligned.

The comment says override works “regardless of original kind,” but both nodes are currently ir.NodeAgent. Consider using non-start/exit-native kinds in the fixture to validate the stronger claim.

🔧 Suggested fixture tweak
-			{ID: "start", Kind: ir.NodeAgent, Config: ir.AgentConfig{Prompt: "begin"}},
-			{ID: "exit", Kind: ir.NodeAgent, Config: ir.AgentConfig{Prompt: "end"}},
+			{ID: "start", Kind: ir.NodeTool, Config: ir.ToolConfig{Command: "echo begin"}},
+			{ID: "exit", Kind: ir.NodeHuman, Config: ir.HumanConfig{Mode: "choice", Default: "ok"}},
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pipeline/dippin_adapter_test.go` around lines 911 - 945, The test
TestEnsureStartExitNodes_OverridesShape currently uses ir.NodeAgent for both
start and exit nodes which doesn't validate the "regardless of original kind"
claim; update the fixture in the workflow passed to FromDippinIR so the start
and exit nodes use a non-native kind (e.g., ir.NodeTask or another
non-start/exit kind) instead of ir.NodeAgent, leaving the assertions on
graph.Nodes["start"].Shape == "Mdiamond" and graph.Nodes["exit"].Shape ==
"Msquare" and the Handler checks unchanged; this ensures FromDippinIR enforces
the start/exit shape override regardless of the original node Kind.
pipeline/handlers/backend_claudecode.go (1)

396-418: Consider documenting which SessionResult fields are not populated.

storeResult only populates Turns, ToolCalls, and Usage from the Claude CLI NDJSON output. The following agent.SessionResult fields remain zero-valued: SessionID, Duration, MaxTurnsUsed, LoopDetected, FilesModified, FilesCreated, ContextUtilization, ToolCacheHits, ToolCacheMisses, ToolTimings, CompactionsApplied, LongestTurn, and Error.

If this is intentional because the Claude CLI doesn't provide this data, consider adding a brief comment to document the limitation for future maintainers.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pipeline/handlers/backend_claudecode.go` around lines 396 - 418, Add a brief
comment above the storeResult function explaining that agent.SessionResult is
only partially populated from Claude CLI NDJSON (storeResult fills Turns,
ToolCalls and Usage) and explicitly list the other fields left zero-valued
(SessionID, Duration, MaxTurnsUsed, LoopDetected, FilesModified, FilesCreated,
ContextUtilization, ToolCacheHits, ToolCacheMisses, ToolTimings,
CompactionsApplied, LongestTurn, Error) so future maintainers know this is
intentional due to missing data in the Claude CLI output; reference storeResult,
agent.SessionResult, runState, and state.lastResult in the comment for clarity.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Nitpick comments:
In `@pipeline/dippin_adapter_test.go`:
- Around line 951-999: Rename the test
TestExtractAgentBackendAttrs_SimulatedIRParams to accurately reflect that it
validates standard ir.AgentConfig extraction (e.g.,
TestExtractAgentConfig_SimulatedIRParams or
TestFromDippinIR_AgentConfig_Simulated); update the test function name and any
references to it so it no longer implies backend-param passthrough, and ensure
the test body (which calls FromDippinIR and asserts node.Attrs for prompt,
llm_model, llm_provider, system_prompt, reasoning_effort, max_turns) remains
unchanged.
- Around line 858-907: The test TestSynthesizeImplicitEdges_NoDuplicates
currently only asserts the dispatch->branch_a edge is unique; extend it to also
count and assert uniqueness for branch_a->join and dispatch->join (the overlaps
created by synthesizing fan-in/parallel edges). After calling FromDippinIR and
building graph (graph.Edges), add similar counters for edges where
e.From=="branch_a"&&e.To=="join" and e.From=="dispatch"&&e.To=="join" and assert
each count == 1 so all overlapping synthesized edges remain unique.
- Around line 911-945: The test TestEnsureStartExitNodes_OverridesShape
currently uses ir.NodeAgent for both start and exit nodes which doesn't validate
the "regardless of original kind" claim; update the fixture in the workflow
passed to FromDippinIR so the start and exit nodes use a non-native kind (e.g.,
ir.NodeTask or another non-start/exit kind) instead of ir.NodeAgent, leaving the
assertions on graph.Nodes["start"].Shape == "Mdiamond" and
graph.Nodes["exit"].Shape == "Msquare" and the Handler checks unchanged; this
ensures FromDippinIR enforces the start/exit shape override regardless of the
original node Kind.

In `@pipeline/handlers/backend_claudecode.go`:
- Around line 396-418: Add a brief comment above the storeResult function
explaining that agent.SessionResult is only partially populated from Claude CLI
NDJSON (storeResult fills Turns, ToolCalls and Usage) and explicitly list the
other fields left zero-valued (SessionID, Duration, MaxTurnsUsed, LoopDetected,
FilesModified, FilesCreated, ContextUtilization, ToolCacheHits, ToolCacheMisses,
ToolTimings, CompactionsApplied, LongestTurn, Error) so future maintainers know
this is intentional due to missing data in the Claude CLI output; reference
storeResult, agent.SessionResult, runState, and state.lastResult in the comment
for clarity.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 1424eb95-dd7a-442a-925d-70e5fe63aa15

📥 Commits

Reviewing files that changed from the base of the PR and between 2e3d0b9 and 5c78c54.

📒 Files selected for processing (11)
  • .gitignore
  • cmd/tracker/main_test.go
  • pipeline/backend.go
  • pipeline/backend_test.go
  • pipeline/dippin_adapter.go
  • pipeline/dippin_adapter_test.go
  • pipeline/handlers/backend_claudecode.go
  • pipeline/handlers/backend_claudecode_test.go
  • pipeline/handlers/codergen.go
  • pipeline/handlers/codergen_test.go
  • pipeline/handlers/transcript.go
✅ Files skipped from review due to trivial changes (2)
  • .gitignore
  • pipeline/backend_test.go
🚧 Files skipped from review as they are similar to previous changes (3)
  • pipeline/dippin_adapter.go
  • pipeline/backend.go
  • pipeline/handlers/codergen.go

clintecker and others added 6 commits March 27, 2026 11:21
…ional, slice clone

- synthesizeImplicitEdges + buildFanInSourceMap handle *ir.ParallelConfig
  and *ir.FanInConfig pointer types (matching main adapter pattern)
- ensureStartExitNodes unconditionally sets Handler (was skipped when
  shape already matched — could leave Handler empty)
- buildSessionStats clones FilesModified/FilesCreated slices to prevent
  shared backing array leaks into trace stats
- Added TODO(dippin-lang) comment on placeholder call

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
1. Missing --print and --verbose flags in buildArgs. Claude Code's
   --output-format stream-json requires both flags. Without them,
   the subprocess produces no NDJSON output.

2. Wrong permission mode values. "fullAuto"/"autoEdit" don't exist
   in Claude Code CLI. Real values: plan, acceptEdits,
   bypassPermissions, default, dontAsk, auto.

3. NDJSON schema mismatch. Assistant messages nest content inside
   msg.message.content, not msg.content. Result messages use
   num_turns (not turns) and total_cost_usd at top level.

Also: system messages now only emit turn start on subtype "init"
(not on hook_started/hook_response). Added Subtype and Message
fields to ndjsonMessage struct.

Verified: end-to-end pipeline run with --backend claude-code
completes successfully (Start → Test → Done, 1 turn, 4.9s).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…ified e2e

Three critical bugs found by live smoke testing (not caught by 464 lines
of unit tests or 3 code reviewers):

1. Missing --print --verbose flags: claude CLI requires both for
   stream-json output
2. Wrong permission mode values: fullAuto/autoEdit don't exist in
   Claude Code CLI. Fixed to bypassPermissions/acceptEdits/etc.
3. NDJSON schema mismatch: assistant content is in msg.message.content
   (nested), not msg.content. Result uses num_turns, not turns.

Also: extracted NDJSON types + parser to backend_claudecode_ndjson.go
(main file was 530 lines, now 345). Added rate_limit_event as known
type. System messages only emit turn start on subtype "init".

Verified end-to-end: simple text response, tool calling (Write + Read),
autopilot + backend combo. All produce correct outcomes, turns, and
tool call counts.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
1. Token tracking: claude-code backend usage now flows through
   TokenTracker via new AddUsage() method. Run summary shows
   "Tokens by Provider: claude-code" with input/output/cost.

2. Retryable init: replaced sync.Once with mutex for claude-code
   backend initialization. Installing claude mid-run now works
   without restarting tracker.

3. Tool use ID field: Claude Code uses "id" (not "tool_use_id")
   for tool_use blocks in assistant messages. Fixed ndjsonContent
   struct to check both fields.

Verified: tool calling pipeline shows correct counts (4 turns,
3 tools) and token usage in run summary.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The guard that triggers retry on empty responses now also checks
sessResult.Turns > 0. A session with turns but no text output is
a valid pattern with the claude-code backend (agent uses tools
without narrating). Previously this incorrectly triggered retry.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
# Conflicts:
#	docs/superpowers/plans/2026-03-26-claude-code-backend.md
@clintecker clintecker merged commit 6ee12c8 into main Mar 27, 2026
1 of 2 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants