feat: Claude Code execution backend for pipeline nodes#41
Conversation
Spec for two new activation paths: 1. Per-node: provider: "claude-code" in .dip files 2. Global: --backend claude-code CLI flag Uses partio-io/claude-agent-sdk-go to delegate agent nodes to Claude Code CLI sessions with full MCP, tools, and ecosystem access. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Major revisions based on feedback from 5-reviewer panel: - Drop third-party SDK dependency (Dependency Hawk): write internal ~300 line wrapper instead of depending on 21-day-old single-maintainer repo - Backend platform abstraction (Yegge): AgentBackend interface so adding backend #3 is config, not code. One handler, multiple backends. - No registry aliasing needed: backend selection inside codergen handler - No build tags: zero external deps, always compile - Concrete error mapping: exit codes, stderr patterns, outcome rules - Subprocess lifecycle: process group kill, orphan prevention, concurrency - Honest observability: document what's degraded, show "—" not zeros - MCP server parsing rules with all edge cases - Backend selection precedence: node > flag > default - Platform constraints: Linux/macOS only, CLI version check Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Key changes from 7-reviewer panel: - Use agent.Event and agent.SessionResult directly (no new types) - Separate backend: attr from provider: (no semantic overload) - Split AgentRunConfig common + Extra any for backend-specific config - SIGTERM then SIGKILL (graceful shutdown for MCP connections) - Minimal subprocess env (prevent credential leakage) - Specific stderr patterns with priority order for error classification - Explicit goroutine contract for NDJSON streaming - Pipeline-level budget accumulator - Document dippin-lang prerequisite for .dip activation path - Permission mode typed enum, default fullAuto for headless use - Lazy CLI version check to avoid startup latency Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
8 tasks with dependency graph: 1. AgentBackend interface + config types 2. Extract shared ResolvePrompt 3. NativeBackend (wraps agent.Session) 4. Refactor CodergenHandler to use backends 5. ClaudeCodeBackend (CLI subprocess, NDJSON) 6. --backend CLI flag 7. Dippin adapter support 8. Integration test + rebuild Tasks 1-3 are parallelizable. Task 4 is the key refactoring. Task 5 is the new capability. Tasks 6-8 wire it up. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Introduces AgentBackend interface for pluggable execution backends, AgentRunConfig for common session config, ClaudeCodeConfig for Claude Code-specific settings, PermissionMode validation, MCP server parsing, and tool list validation helpers. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Extract the private resolvePrompt method from CodergenHandler into a public ResolvePrompt function in prompt.go. The codergen handler now delegates to the shared function. This enables other backends (e.g. ClaudeCodeBackend) to reuse the same prompt resolution logic. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Implements the AgentBackend interface using the built-in agent.Session. Translates AgentRunConfig fields (model, provider, max_turns, system_prompt, working_dir) into SessionConfig and forwards events via the emit callback. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
CodergenHandler now selects and delegates to an AgentBackend for session execution instead of creating agent.Session directly. Adds selectBackend() with node-attr and global-default resolution, buildRunConfig() to translate node config into AgentRunConfig, and WithDefaultBackend registry option. NativeBackend accepts full SessionConfig via Extra field to preserve all config fields (reasoning effort, caching, compaction, etc.). Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Implements pipeline.AgentBackend by spawning the claude CLI as a subprocess, parsing NDJSON stream output, and emitting agent.Event objects. Includes error classification, minimal env construction, MCP server config, and two-phase process shutdown. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Adds --backend CLI flag to select the agent execution backend. Valid values are "" (default/native), "native", and "claude-code". Invalid values produce a clear error message and exit 1. Threads the backend value through to handlers.WithDefaultBackend in both run and runTUI registry construction paths. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Add extractAgentBackendAttrs helper to dippin_adapter.go that maps backend-selection and Claude Code-specific keys (backend, mcp_servers, allowed_tools, disallowed_tools, max_budget_usd, permission_mode) from a generic params map into node attrs consumed by CodergenHandler and ClaudeCodeBackend. The helper is called from extractAgentAttrs with nil for now. When dippin-lang upstream adds Params map[string]string to ir.AgentConfig, replace nil with cfg.Params to activate full .dip file passthrough. Also bump dippin-lang to v0.9.1 (no new AgentConfig fields in this release; confirms upstream IR change is still pending). PARTIAL: full .dip backend attr support requires upstream dippin-lang to add Params or dedicated fields to ir.AgentConfig. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Move toolUseIDs and lastResult from struct fields to local runState, eliminating race conditions when Run() is called concurrently. Also: fix system prompt gated behind ccCfg nil check, change Input to json.RawMessage, log recovered panics, emit EventLLMRequestPreparing for system messages, apply timeout from config, validate PermissionMode, improve "CLI not found" error message, and pass through common env vars. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
selectBackend now returns (AgentBackend, error) and lazy-inits ClaudeCodeBackend when backend=claude-code is set on a node or via --backend flag. buildRunConfig builds ClaudeCodeConfig from node attrs (mcp_servers, allowed_tools, disallowed_tools, max_budget_usd, permission_mode) when the selected backend is claude-code. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…e, env scope - buildClaudeCodeConfig returns error on malformed attrs (mcp_servers, max_budget_usd, permission_mode) instead of silently ignoring - ValidateToolLists called to reject conflicting allow/deny lists - Default PermissionMode set to fullAuto for headless pipeline use - System NDJSON message emits both EventLLMRequestPreparing and EventTurnStart so TUI transitions from "waiting" to "thinking" - Reverted SSH_AUTH_SOCK and proxy vars from buildEnv — keep minimal env, pass credentials via node-level config if needed - buildRunConfig returns error for config validation propagation Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
|
Note Reviews pausedIt looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the Use the following commands to manage reviews:
Use the checkboxes below for quick actions:
WalkthroughThis pull request introduces a pluggable backend abstraction for agent execution in pipeline nodes. It adds support for executing nodes via either the native agent session or an external Claude Code CLI, implements both backends, adds a Changes
Sequence DiagramsequenceDiagram
participant CLI as CLI Flag Parser
participant Cmd as executeCommand
participant Run as run()/runTUI()
participant Registry as Handler Registry
participant CodergenH as CodergenHandler
participant Selector as selectBackend
participant Backend as AgentBackend (Native or ClaudeCode)
CLI->>Cmd: cfg.backend = "claude-code"
Cmd->>Run: backend="claude-code"
Run->>Registry: NewDefaultRegistry(WithDefaultBackend("claude-code"))
Registry->>CodergenH: defaultBackendName = "claude-code"
Registry-->>Run: CodergenHandler ready
Run->>CodergenH: Execute(ctx, node, ...)
CodergenH->>Selector: selectBackend(node attrs, defaultBackendName)
Selector-->>CodergenH: ClaudeCodeBackend
CodergenH->>CodergenH: buildRunConfig(AgentRunConfig, ClaudeCodeConfig)
CodergenH->>Backend: Run(ctx, runCfg, emitCallback)
alt Backend = ClaudeCode
Backend->>Backend: spawn claude CLI process
Backend->>Backend: decode NDJSON from stdout
Backend->>Backend: emit events via callback
Backend->>Backend: classify exit code/stderr
Backend-->>CodergenH: SessionResult + error
else Backend = Native
Backend->>Backend: create agent.Session
Backend->>Backend: sess.Run(ctx, prompt)
Backend-->>CodergenH: SessionResult + error
end
CodergenH->>CodergenH: handle result, construct outcome
CodergenH-->>Run: Result
Run-->>CLI: Success/Error
Estimated code review effort🎯 4 (Complex) | ⏱️ ~75 minutes Poem
🚥 Pre-merge checks | ✅ 2 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Comment |
There was a problem hiding this comment.
Actionable comments posted: 4
🧹 Nitpick comments (9)
pipeline/dippin_adapter_test.go (1)
885-913: Test name suggests integration, but it’s helper-level simulation.Optional: rename this test to reflect its true scope and reduce future confusion.
📝 Suggested rename
-func TestFromDippinIR_AgentBackendAttrsViaParams(t *testing.T) { +func TestExtractAgentBackendAttrs_SimulatedIRParams(t *testing.T) {🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@pipeline/dippin_adapter_test.go` around lines 885 - 913, Rename the test to reflect that it's a unit/helper-level check of extractAgentBackendAttrs rather than an integration: change TestFromDippinIR_AgentBackendAttrsViaParams to a clearer name such as TestExtractAgentBackendAttrs_ViaParams (or TestExtractAgentBackendAttrs_SimulatedParams), update the test function name accordingly, and adjust the top comment to mention it's directly exercising extractAgentBackendAttrs with simulated params to avoid future confusion.cmd/tracker/main_test.go (1)
238-245: Consider asserting backend passthrough in this run-path test.The new argument is added, but not validated. Adding one assertion here will protect the new CLI→run wiring from regressions.
🧪 Suggested test tightening
err := executeCommand(runConfig{ mode: modeRun, pipelineFile: "pipeline.dot", workdir: "/tmp/workdir", noTUI: true, + backend: "claude-code", }, commandDeps{ ... run: func(pipelineFile, workdir, checkpoint, format, backend string, verbose bool, jsonOut bool) error { runCalled = true if pipelineFile != "pipeline.dot" { t.Fatalf("pipelineFile = %q, want %q", pipelineFile, "pipeline.dot") } + if backend != "claude-code" { + t.Fatalf("backend = %q, want %q", backend, "claude-code") + } return nil },🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@cmd/tracker/main_test.go` around lines 238 - 245, The test's anonymous run handler doesn't assert the new backend parameter; update the test to validate backend passthrough by adding an expectedBackend variable (e.g., expectedBackend := "some-backend") and inside the run function assert backend == expectedBackend (use t.Fatalf to fail with actual vs expected), and mirror the same assertion in the runTUI stub if that path should also exercise backend; reference the anonymous run and runTUI functions and the runCalled variable when making this change.docs/superpowers/specs/2026-03-26-claude-code-backend-design.md (3)
148-148: Add language specifier to CLI invocation block.-``` +```text claude -p "<prompt>"As per static analysis hints: "Fenced code blocks should have a language specified (MD040)".
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@docs/superpowers/specs/2026-03-26-claude-code-backend-design.md` at line 148, Add a language specifier to the fenced code block containing the CLI invocation "claude -p \"<prompt>\"" so the markdown linter MD040 is satisfied; replace the opening fence ``` with a language-tagged fence such as ```bash or ```text in the block that wraps the claude -p "<prompt>" example.
105-105: Add language specifier to fenced code block.The flow diagram code block lacks a language specifier. Consider using
textorplaintextto satisfy markdown linting.-``` +```text .dip file → node attrs → CodergenHandler.Execute()As per static analysis hints: "Fenced code blocks should have a language specified (MD040)".
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@docs/superpowers/specs/2026-03-26-claude-code-backend-design.md` at line 105, The fenced code block for the flow diagram is missing a language specifier which triggers MD040; update the opening fence from ``` to include a language (e.g., change the opening fence to ```text or ```plaintext) so the block containing ".dip file → node attrs → CodergenHandler.Execute()" is recognized as a plaintext/code block by the linter.
19-19: Minor: Hyphenate compound modifier."~400 line wrapper" should be "~400-line wrapper" when used as a compound modifier before a noun.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@docs/superpowers/specs/2026-03-26-claude-code-backend-design.md` at line 19, Update the phrase "~400 line wrapper" to use the compound modifier form "~400-line wrapper" so the sentence reads "Write an internal ~400-line wrapper." Locate the exact string "~400 line wrapper" in the spec text and replace it with "~400-line wrapper" to correct the hyphenation.pipeline/handlers/backend_native_test.go (1)
94-124: Test exercises fallback path, not the primary production path.This test verifies config propagation when
cfg.Extrais nil, exercising the selective override logic inNativeBackend.Run(lines 39-53 ofbackend_native.go). However, based on the context snippet fromcodergen.go:168,CodergenHandler.buildRunConfig()always setscfg.Extra = &sessionCfgfor the native backend. This means:
- The tested code path (building config from individual
AgentRunConfigfields) is effectively dead code in production- The test provides value for direct
NativeBackendusage outside ofCodergenHandler, but doesn't test the actual production pathConsider adding a test that passes
cfg.Extrawith a*agent.SessionConfigto verify the primary code path used byCodergenHandler.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@pipeline/handlers/backend_native_test.go` around lines 94 - 124, Add a new unit test that exercises the production path where cfg.Extra is populated with a *agent.SessionConfig: copy/TestNativeBackendRespectsConfig and create a sessionCfg := agent.SessionConfig{Model: "gpt-4o", Provider: "openai", SystemPrompt: "be concise", ...} then set cfg.Extra = &sessionCfg before calling NativeBackend.Run (the same modelCapturingCompleter can be reused) and assert that capturedModel equals sessionCfg.Model; this verifies the code path in NativeBackend.Run that reads overrides from cfg.Extra (used by CodergenHandler.buildRunConfig).pipeline/handlers/backend_native.go (1)
28-54: The fallback config-building branch (lines 37-54) is dead code in production.Based on
codergen.go:168,CodergenHandler.buildRunConfig()always setscfg.Extra = &sessionCfgwhen the backend is notClaudeCodeBackend. This means theelsebranch that builds config from individualAgentRunConfigfields is never executed in the normal pipeline flow.Options:
- Keep for flexibility: The fallback enables direct
NativeBackendusage withoutCodergenHandler, which may be useful for testing or future use cases- Remove: Simplify by always requiring
*agent.SessionConfiginExtra- Document: Add a comment clarifying when this path is used
If keeping, consider documenting this in the else branch:
} else { + // Fallback for direct NativeBackend usage without CodergenHandler. + // CodergenHandler always provides cfg.Extra with *agent.SessionConfig. sessionCfg = agent.DefaultConfig()🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@pipeline/handlers/backend_native.go` around lines 28 - 54, The else branch in NativeBackend.Run that builds sessionCfg from individual AgentRunConfig fields is dead in normal pipeline flow because CodergenHandler.buildRunConfig always sets cfg.Extra to a *agent.SessionConfig; update this code by keeping the fallback but adding a clear comment above the else branch (near NativeBackend.Run, cfg.Extra and agent.SessionConfig references) stating that CodergenHandler.buildRunConfig normally populates Extra, and that this branch is only used when NativeBackend is invoked directly (e.g., tests or special callers) — alternatively, if you prefer to remove it, enforce that cfg.Extra must be a *agent.SessionConfig and return an error when absent to simplify behavior.pipeline/handlers/codergen.go (1)
94-111: Consider documenting the "codergen" alias for "native" backend.Line 102 accepts
"codergen"as an alias for"native", but this isn't mentioned in the design spec's backend selection precedence section. Either:
- Document this alias in the spec and user-facing docs
- Remove it if it's not intended behavior
case "claude-code": return h.ensureClaudeCodeBackend() - case "native", "codergen": + case "native": return h.ensureNativeBackend(), nil🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@pipeline/handlers/codergen.go` around lines 94 - 111, The selectBackend function accepts "codergen" as an alias for the "native" backend but this alias isn't in the spec; either document the alias or remove it—update selectBackend in CodergenHandler to remove the "codergen" case (leaving only "native") if the alias is unintended, or add documentation to the design spec and user-facing docs stating that "codergen" maps to ensureNativeBackend (referencing the "selectBackend" function and the symbols "codergen", "native", and method ensureNativeBackend()).pipeline/handlers/backend_claudecode.go (1)
63-149: SessionResult is sparsely populated compared to NativeBackend.Looking at
agent.SessionResult(context snippet 2) and howNativeBackendpopulates it (context snippet 4), theClaudeCodeBackendleaves several fields at zero values:
SessionID- not setDuration- not tracked (easy to add withtime.Since)MaxTurnsUsed,LoopDetected,ContextUtilization- not derived from Claude CLI outputFilesModified,FilesCreated- not trackedWhile some fields may not be available from the Claude CLI's NDJSON output,
Durationcan be easily tracked. Consider adding it for consistency withNativeBackendand for observability.♻️ Suggested fix to track Duration
func (b *ClaudeCodeBackend) Run(ctx context.Context, cfg pipeline.AgentRunConfig, emit func(agent.Event)) (agent.SessionResult, error) { + start := time.Now() // Per-run state: local to this invocation, safe for concurrent use. state := &runState{ toolUseIDs: make(map[string]string), }Then in
storeResult:func storeResult(msg ndjsonMessage, state *runState) { result := &agent.SessionResult{ Turns: msg.Turns, ToolCalls: make(map[string]int), }And update
buildResultto set Duration:-func buildResult(state *runState) agent.SessionResult { +func buildResult(state *runState, duration time.Duration) agent.SessionResult { if state.lastResult != nil { + state.lastResult.Duration = duration return *state.lastResult } - return agent.SessionResult{} + return agent.SessionResult{Duration: duration} }🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@pipeline/handlers/backend_claudecode.go` around lines 63 - 149, The Run implementation doesn't populate SessionResult.Duration; record start := time.Now() at the top of ClaudeCodeBackend.Run (after runState creation), compute dur := time.Since(start) after the process completes (before returning) and pass that duration into buildResult (either by adding a duration parameter to buildResult or by storing it on runState and reading it in buildResult). Update buildResult to set the SessionResult.Duration field from the provided duration/runState value so the backend matches NativeBackend's observability.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@cmd/tracker/flags.go`:
- Line 115: The flag help text for the backend flag (where
fs.StringVar(&cfg.backend, "backend", "", ... ) is defined) only mentions
"claude-code" but the validator also accepts "native"; update the help string(s)
(both the occurrence tied to cfg.backend and the other occurrence noted) to list
all accepted values (e.g., "Agent backend override: native, claude-code") or use
a generic description that enumerates both allowed backends so the help message
matches the validator.
In `@docs/superpowers/plans/2026-03-26-claude-code-backend.md`:
- Line 7: The phrase "~400 line internal package" is missing a hyphen for the
compound modifier; update the documentation text (the line describing the
"Claude Code backend" and the agent backend architecture) to read "~400-line
internal package" so the compound adjective is grammatically correct.
- Around line 615-627: The fenced code block containing the dependency graph
(the ASCII diagram starting with "Task 1 (interface + config types)" and showing
arrows through Task 8) needs a language specifier for correct rendering; update
the opening triple-backtick to include a language like "text" or "plaintext"
(e.g., replace ``` with ```text) so the diagram renders as plain text. Ensure
only the opening fence is changed and no diagram content is altered.
In `@pipeline/handlers/codergen.go`:
- Around line 113-135: The lazy-init in ensureClaudeCodeBackend and
ensureNativeBackend is racy; protect their check-then-set with sync.Once. Add
per-backend sync.Once fields (e.g., claudeCodeOnce, nativeOnce) to
CodergenHandler and use Once.Do in ensureClaudeCodeBackend (preserving its error
return) and ensureNativeBackend to perform the
NewClaudeCodeBackend/NewNativeBackend construction and assignment inside the Do
closures so the backend is only created once even under concurrent Execute()
calls.
---
Nitpick comments:
In `@cmd/tracker/main_test.go`:
- Around line 238-245: The test's anonymous run handler doesn't assert the new
backend parameter; update the test to validate backend passthrough by adding an
expectedBackend variable (e.g., expectedBackend := "some-backend") and inside
the run function assert backend == expectedBackend (use t.Fatalf to fail with
actual vs expected), and mirror the same assertion in the runTUI stub if that
path should also exercise backend; reference the anonymous run and runTUI
functions and the runCalled variable when making this change.
In `@docs/superpowers/specs/2026-03-26-claude-code-backend-design.md`:
- Line 148: Add a language specifier to the fenced code block containing the CLI
invocation "claude -p \"<prompt>\"" so the markdown linter MD040 is satisfied;
replace the opening fence ``` with a language-tagged fence such as ```bash or
```text in the block that wraps the claude -p "<prompt>" example.
- Line 105: The fenced code block for the flow diagram is missing a language
specifier which triggers MD040; update the opening fence from ``` to include a
language (e.g., change the opening fence to ```text or ```plaintext) so the
block containing ".dip file → node attrs → CodergenHandler.Execute()" is
recognized as a plaintext/code block by the linter.
- Line 19: Update the phrase "~400 line wrapper" to use the compound modifier
form "~400-line wrapper" so the sentence reads "Write an internal ~400-line
wrapper." Locate the exact string "~400 line wrapper" in the spec text and
replace it with "~400-line wrapper" to correct the hyphenation.
In `@pipeline/dippin_adapter_test.go`:
- Around line 885-913: Rename the test to reflect that it's a unit/helper-level
check of extractAgentBackendAttrs rather than an integration: change
TestFromDippinIR_AgentBackendAttrsViaParams to a clearer name such as
TestExtractAgentBackendAttrs_ViaParams (or
TestExtractAgentBackendAttrs_SimulatedParams), update the test function name
accordingly, and adjust the top comment to mention it's directly exercising
extractAgentBackendAttrs with simulated params to avoid future confusion.
In `@pipeline/handlers/backend_claudecode.go`:
- Around line 63-149: The Run implementation doesn't populate
SessionResult.Duration; record start := time.Now() at the top of
ClaudeCodeBackend.Run (after runState creation), compute dur :=
time.Since(start) after the process completes (before returning) and pass that
duration into buildResult (either by adding a duration parameter to buildResult
or by storing it on runState and reading it in buildResult). Update buildResult
to set the SessionResult.Duration field from the provided duration/runState
value so the backend matches NativeBackend's observability.
In `@pipeline/handlers/backend_native_test.go`:
- Around line 94-124: Add a new unit test that exercises the production path
where cfg.Extra is populated with a *agent.SessionConfig:
copy/TestNativeBackendRespectsConfig and create a sessionCfg :=
agent.SessionConfig{Model: "gpt-4o", Provider: "openai", SystemPrompt: "be
concise", ...} then set cfg.Extra = &sessionCfg before calling NativeBackend.Run
(the same modelCapturingCompleter can be reused) and assert that capturedModel
equals sessionCfg.Model; this verifies the code path in NativeBackend.Run that
reads overrides from cfg.Extra (used by CodergenHandler.buildRunConfig).
In `@pipeline/handlers/backend_native.go`:
- Around line 28-54: The else branch in NativeBackend.Run that builds sessionCfg
from individual AgentRunConfig fields is dead in normal pipeline flow because
CodergenHandler.buildRunConfig always sets cfg.Extra to a *agent.SessionConfig;
update this code by keeping the fallback but adding a clear comment above the
else branch (near NativeBackend.Run, cfg.Extra and agent.SessionConfig
references) stating that CodergenHandler.buildRunConfig normally populates
Extra, and that this branch is only used when NativeBackend is invoked directly
(e.g., tests or special callers) — alternatively, if you prefer to remove it,
enforce that cfg.Extra must be a *agent.SessionConfig and return an error when
absent to simplify behavior.
In `@pipeline/handlers/codergen.go`:
- Around line 94-111: The selectBackend function accepts "codergen" as an alias
for the "native" backend but this alias isn't in the spec; either document the
alias or remove it—update selectBackend in CodergenHandler to remove the
"codergen" case (leaving only "native") if the alias is unintended, or add
documentation to the design spec and user-facing docs stating that "codergen"
maps to ensureNativeBackend (referencing the "selectBackend" function and the
symbols "codergen", "native", and method ensureNativeBackend()).
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 8fcfecfe-ecd6-474e-aac6-2031415b87db
⛔ Files ignored due to path filters (1)
go.sumis excluded by!**/*.sum
📒 Files selected for processing (20)
cmd/tracker/commands.gocmd/tracker/flags.gocmd/tracker/main.gocmd/tracker/main_test.gocmd/tracker/run.godocs/superpowers/plans/2026-03-26-claude-code-backend.mddocs/superpowers/specs/2026-03-26-claude-code-backend-design.mdgo.modpipeline/backend.gopipeline/backend_test.gopipeline/dippin_adapter.gopipeline/dippin_adapter_test.gopipeline/handlers/backend_claudecode.gopipeline/handlers/backend_claudecode_test.gopipeline/handlers/backend_native.gopipeline/handlers/backend_native_test.gopipeline/handlers/codergen.gopipeline/handlers/prompt.gopipeline/handlers/prompt_test.gopipeline/handlers/registry.go
# Conflicts: # cmd/tracker/flags.go # cmd/tracker/main.go # go.mod # go.sum
There was a problem hiding this comment.
🧹 Nitpick comments (1)
cmd/tracker/main_test.go (1)
1-14: Consider adding test coverage for the--backendflag.Other CLI flags like
--autopilot,--auto-approve, and--jsonhave dedicated test cases (e.g.,TestParseFlagsAutopilot,TestParseFlagsJsonFlag). Adding similar tests for--backendwould verify:
- Valid values are accepted (
native,claude-code)- Invalid values produce the expected error
- The value is correctly threaded to the run callbacks
Example test cases to consider:
func TestParseFlagsBackendValid(t *testing.T) { for _, backend := range []string{"native", "claude-code"} { cfg, err := parseFlags([]string{"tracker", "--backend", backend, "pipeline.dip"}) if err != nil { t.Fatalf("parseFlags returned error for backend %q: %v", backend, err) } if cfg.backend != backend { t.Fatalf("backend = %q, want %q", cfg.backend, backend) } } } func TestParseFlagsBackendInvalid(t *testing.T) { _, err := parseFlags([]string{"tracker", "--backend", "invalid", "pipeline.dip"}) if err == nil { t.Fatal("expected error for invalid backend") } }🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@cmd/tracker/main_test.go` around lines 1 - 14, Add tests covering the --backend flag by exercising parseFlags and the run callbacks: create one test (e.g., TestParseFlagsBackendValid) that calls parseFlags with each valid backend string ("native", "claude-code") and asserts no error and that cfg.backend equals the provided value, and another test (e.g., TestParseFlagsBackendInvalid) that calls parseFlags with an invalid backend and asserts an error is returned; also add a test that ensures the parsed cfg.backend is propagated into the run callbacks or the function that invokes handlers (reference parseFlags, cfg.backend, and the run callback entry point used in tests) so the selected backend is threaded through.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Nitpick comments:
In `@cmd/tracker/main_test.go`:
- Around line 1-14: Add tests covering the --backend flag by exercising
parseFlags and the run callbacks: create one test (e.g.,
TestParseFlagsBackendValid) that calls parseFlags with each valid backend string
("native", "claude-code") and asserts no error and that cfg.backend equals the
provided value, and another test (e.g., TestParseFlagsBackendInvalid) that calls
parseFlags with an invalid backend and asserts an error is returned; also add a
test that ensures the parsed cfg.backend is propagated into the run callbacks or
the function that invokes handlers (reference parseFlags, cfg.backend, and the
run callback entry point used in tests) so the selected backend is threaded
through.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 33f8b7a5-1860-4acf-8d2c-fef06d1f72e1
⛔ Files ignored due to path filters (1)
go.sumis excluded by!**/*.sum
📒 Files selected for processing (6)
.claude/settings.local.jsoncmd/tracker/commands.gocmd/tracker/flags.gocmd/tracker/main.gocmd/tracker/main_test.gocmd/tracker/run.go
✅ Files skipped from review due to trivial changes (1)
- .claude/settings.local.json
🚧 Files skipped from review as they are similar to previous changes (2)
- cmd/tracker/commands.go
- cmd/tracker/main.go
CodeRabbit review findings: - ensureClaudeCodeBackend and ensureNativeBackend now use sync.Once for thread-safe lazy init during parallel node execution - Help text for --backend shows both valid values (native, claude-code) - Minor grammar and code block fixes in plan doc Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Extract transcriptCollector and buildSessionStats to transcript.go (codergen.go was 536 lines). Extract implicit edge synthesis functions to dippin_adapter_edges.go (dippin_adapter.go was 532 lines). Split buildClaudeCodeConfig into parseClaudeCodeToolAttrs and parseClaudeCodeBudgetAttrs (cyclomatic was 16). All complexity gates now pass. Coverage at 78.9% (below 80% threshold) is a pre-existing issue from the backend feature — needs a small test to cover the extracted edge synthesis functions. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
There was a problem hiding this comment.
Actionable comments posted: 3
🧹 Nitpick comments (2)
pipeline/dippin_adapter.go (1)
223-229: Placeholder call is a no-op by design.The call
extractAgentBackendAttrs(nil, attrs)will never copy any keys since iterating over a nil map yields zero iterations. The comment clearly documents this as intentional scaffolding pending the upstreamcfg.Paramsfield addition.Consider adding a TODO comment to make this easier to find later:
// NOTE: cfg.Params is a placeholder for a future field on ir.AgentConfig. // When dippin-lang upstream adds Params map[string]string to AgentConfig, // replace nil below with cfg.Params. + // TODO(dippin-lang): Replace nil with cfg.Params once available. extractAgentBackendAttrs(nil, attrs)🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@pipeline/dippin_adapter.go` around lines 223 - 229, The call extractAgentBackendAttrs(nil, attrs) is intentionally a no-op now but should be annotated for future work; add a clear TODO comment directly above or beside the call mentioning that nil is a placeholder until ir.AgentConfig gains Params (e.g., "TODO: replace nil with cfg.Params when ir.AgentConfig.Params is added — see extractAgentBackendAttrs"), so developers can easily find and update the call in the future.pipeline/dippin_adapter_edges.go (1)
32-43: Last-write-wins for duplicate fan-in sources.If multiple fan-in nodes list the same source in their
cfg.Sources, the last one processed will overwrite the mapping. This is likely acceptable for valid workflows (where a source should only feed one fan-in), but worth noting.Also, this function has the same pointer type handling gap (
*ir.FanInConfignot checked) as the main switch.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@pipeline/dippin_adapter_edges.go` around lines 32 - 43, buildFanInSourceMap currently overwrites entries when multiple fan-in nodes share a source and ignores pointer-typed configs; update buildFanInSourceMap to handle both ir.FanInConfig and *ir.FanInConfig type assertions when iterating workflow.Nodes (check for cfg, ok := irNode.Config.(ir.FanInConfig) and cfgPtr, ok := irNode.Config.(*ir.FanInConfig)), and add duplicate detection before setting fanInBySource[source] (e.g., if an entry already exists and the existing ID differs from irNode.ID, either log/return an error or otherwise surface the conflict instead of silently overwriting) so duplicate sources are handled explicitly.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@pipeline/dippin_adapter_edges.go`:
- Around line 88-97: The current logic only sets startNode.Handler and
exitNode.Handler when the Shape differs, which leaves Handler incorrect if Shape
already matches; update the block that handles startNode and exitNode so Shape
is still set conditionally (keep the if startNode.Shape != "Mdiamond" and if
exitNode.Shape != "Msquare") but always assign startNode.Handler = "start" and
exitNode.Handler = "exit" unconditionally after retrieving the nodes (i.e.,
ensure Handler is set regardless of the Shape check), referencing startNode,
exitNode, g.Nodes, g.StartNode and g.ExitNode to locate and modify the code.
- Around line 22-29: The type switch in the loop currently only matches value
types and should also handle pointer variants to mirror extractNodeAttrs'
defensive handling; update the switch in the loop that calls
synthesizeParallelEdges and synthesizeFanInEdges to include cases for
*ir.ParallelConfig and *ir.FanInConfig (dereference or pass through as your
functions expect), and likewise update buildFanInSourceMap to check for both
ir.FanInConfig and *ir.FanInConfig so pointer configs are recognized; use the
existing function names synthesizeParallelEdges, synthesizeFanInEdges,
buildFanInSourceMap and keep behavior consistent with extractNodeAttrs when
unwrapping pointer vs value.
In `@pipeline/handlers/transcript.go`:
- Around line 62-79: The doc comment for buildSessionStats is misleading (it
claims it can return nil when sessResult is nil but the parameter is a value),
so update the comment to remove the "Returns nil" claim or change the signature
to accept *agent.SessionResult if nil semantics are required; meanwhile,
defensively clone slice fields FilesModified and FilesCreated (like ToolCalls is
cloned) before assigning them into the pipeline.SessionStats (e.g. copy via
append([]string(nil), r.FilesModified...) or make+copy) to avoid leaking backing
arrays.
---
Nitpick comments:
In `@pipeline/dippin_adapter_edges.go`:
- Around line 32-43: buildFanInSourceMap currently overwrites entries when
multiple fan-in nodes share a source and ignores pointer-typed configs; update
buildFanInSourceMap to handle both ir.FanInConfig and *ir.FanInConfig type
assertions when iterating workflow.Nodes (check for cfg, ok :=
irNode.Config.(ir.FanInConfig) and cfgPtr, ok :=
irNode.Config.(*ir.FanInConfig)), and add duplicate detection before setting
fanInBySource[source] (e.g., if an entry already exists and the existing ID
differs from irNode.ID, either log/return an error or otherwise surface the
conflict instead of silently overwriting) so duplicate sources are handled
explicitly.
In `@pipeline/dippin_adapter.go`:
- Around line 223-229: The call extractAgentBackendAttrs(nil, attrs) is
intentionally a no-op now but should be annotated for future work; add a clear
TODO comment directly above or beside the call mentioning that nil is a
placeholder until ir.AgentConfig gains Params (e.g., "TODO: replace nil with
cfg.Params when ir.AgentConfig.Params is added — see extractAgentBackendAttrs"),
so developers can easily find and update the call in the future.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 74da7284-0139-47e1-8f57-19e93b008ef6
📒 Files selected for processing (4)
pipeline/dippin_adapter.gopipeline/dippin_adapter_edges.gopipeline/handlers/codergen.gopipeline/handlers/transcript.go
🚧 Files skipped from review as they are similar to previous changes (1)
- pipeline/handlers/codergen.go
There was a problem hiding this comment.
Pull request overview
This PR introduces a pluggable pipeline.AgentBackend abstraction for codergen/agent pipeline nodes and adds a new claude-code backend that runs sessions via the claude CLI (NDJSON stream), selectable per-node or via a global --backend CLI flag.
Changes:
- Added
pipeline.AgentBackend+ sharedAgentRunConfig/Claude Code config parsing/validation utilities. - Refactored
CodergenHandlerto select a backend (native vsclaude-code) and delegate execution while keeping shared prompt resolution/artifact/outcome logic. - Added CLI wiring (
--backend) and supporting adapter/docs/tests.
Reviewed changes
Copilot reviewed 22 out of 23 changed files in this pull request and generated 5 comments.
Show a summary per file
| File | Description |
|---|---|
pipeline/handlers/transcript.go |
Extracts transcript collection + session stats building from codergen.go. |
pipeline/handlers/registry.go |
Adds WithDefaultBackend and injects default backend into codergen handler. |
pipeline/handlers/prompt.go |
Extracts shared ResolvePrompt for prompt resolution + fidelity/context injection. |
pipeline/handlers/prompt_test.go |
Unit tests for ResolvePrompt behavior (expansion, context injection, fidelity). |
pipeline/handlers/codergen.go |
Refactors codergen execution to choose/run an AgentBackend; adds Claude Code config parsing. |
pipeline/handlers/backend_native.go |
Implements AgentBackend using existing agent.Session. |
pipeline/handlers/backend_native_test.go |
Tests for NativeBackend event emission, config propagation, defaults. |
pipeline/handlers/backend_claudecode.go |
Implements Claude Code backend via subprocess + NDJSON parsing + error classification. |
pipeline/handlers/backend_claudecode_test.go |
Tests for CLI arg/env building, NDJSON parsing, error classification, MCP JSON. |
pipeline/backend.go |
Defines AgentBackend, AgentRunConfig, ClaudeCodeConfig, MCP/tool validation helpers. |
pipeline/backend_test.go |
Tests permission mode validation, MCP parsing, tool list parsing/validation. |
pipeline/dippin_adapter.go |
Adds placeholder passthrough helper for backend/Claude Code node attrs (pending upstream IR changes). |
pipeline/dippin_adapter_edges.go |
Moves implicit edge synthesis/start-exit validation helpers out of adapter. |
pipeline/dippin_adapter_test.go |
Adds unit tests for backend attr passthrough helper behavior. |
cmd/tracker/run.go |
Threads backend selection into registry construction in console + TUI run paths. |
cmd/tracker/main.go |
Extends runConfig and dependency signatures to include backend selection. |
cmd/tracker/main_test.go |
Updates command dependency stubs for the new backend parameter. |
cmd/tracker/flags.go |
Adds --backend flag + validation and usage output. |
cmd/tracker/commands.go |
Passes backend selection to run / runTUI. |
go.sum |
Updates dippin-lang dependency checksums (v0.10.0). |
docs/superpowers/specs/2026-03-26-claude-code-backend-design.md |
Design spec for backend architecture + Claude Code execution model. |
docs/superpowers/plans/2026-03-26-claude-code-backend.md |
Detailed implementation plan and testing checklist. |
.claude/settings.local.json |
Adds Claude Code local permissions settings (see security feedback). |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| // Pass through env vars needed for basic operation. Credentials | ||
| // (SSH_AUTH_SOCK) and network config (proxy vars) are intentionally | ||
| // excluded — pass them via node-level env attrs if needed. | ||
| for _, name := range []string{ | ||
| "USER", "TMPDIR", "LANG", | ||
| } { | ||
| if val := os.Getenv(name); val != "" { | ||
| env = append(env, name+"="+val) | ||
| } |
There was a problem hiding this comment.
This comment says proxy/SSH vars can be passed via "node-level env attrs", but there doesn't appear to be any mechanism for agent nodes to declare extra environment variables (searching the codebase shows no env-related attrs for codergen/claude-code). Either implement an explicit env passthrough for this backend or adjust the comment to avoid implying unsupported configuration.
| case "claude-code": | ||
| return h.ensureClaudeCodeBackend() | ||
| case "native", "codergen": | ||
| return h.ensureNativeBackend(), nil |
There was a problem hiding this comment.
If a node sets backend to an unrecognized value, this function silently falls back to the default backend. That makes typos hard to detect and can cause pipelines to run with the wrong execution engine. Consider returning an explicit error for unknown backend values (similar to the --backend flag validation) so misconfiguration fails fast.
| return h.ensureNativeBackend(), nil | |
| return h.ensureNativeBackend(), nil | |
| default: | |
| return nil, fmt.Errorf("unknown backend %q for node %q", backend, node.Name) |
pipeline/handlers/transcript.go
Outdated
| } | ||
|
|
||
| // buildSessionStats converts an agent.SessionResult into a pipeline.SessionStats | ||
| // for inclusion in the trace entry. Returns nil if sessResult is nil. |
There was a problem hiding this comment.
The docstring says "Returns nil if sessResult is nil", but buildSessionStats takes an agent.SessionResult value (not a pointer) so it can never be nil. Update the comment (or change the function signature if nil is a meaningful case) to avoid misleading callers.
| // for inclusion in the trace entry. Returns nil if sessResult is nil. | |
| // for inclusion in the trace entry. |
.claude/settings.local.json
Outdated
| "Bash(ssh aibox01:*)", | ||
| "WebSearch" | ||
| ] | ||
| } | ||
| } |
There was a problem hiding this comment.
This appears to be a machine-local Claude Code permissions file (note the filename settings.local.json) and includes an environment-specific SSH allow-rule. Committing this is risky (it can unintentionally broaden tool permissions for other developers/CI and may leak internal hostnames). Recommend removing it from the repo and adding .claude/settings.local.json to .gitignore; if shared defaults are needed, use a non-"local" settings file with safe, minimal permissions.
| "Bash(ssh aibox01:*)", | |
| "WebSearch" | |
| ] | |
| } | |
| } | |
| "WebSearch" | |
| ] | |
| } | |
| } |
| case "tool_use": | ||
| state.toolUseIDs[c.ToolUseID] = c.Name | ||
| events = append(events, agent.Event{ | ||
| Type: agent.EventToolCallStart, | ||
| Timestamp: now, | ||
| ToolName: c.Name, | ||
| ToolInput: string(c.Input), | ||
| }) |
There was a problem hiding this comment.
parseAssistantContent emits tool call events but never increments SessionResult.ToolCalls. As a result sessResult.TotalToolCalls() stays 0, which can incorrectly trip codergen's "produced no output (0 tokens, 0 tool calls)" failure path for tool-only runs and also makes trace stats inaccurate. Consider incrementing counts in runState on each tool_use and copying them into SessionResult.ToolCalls (or updating state.lastResult as tool_use events arrive).
1. selectBackend now rejects unknown backend names with actionable error
2. buildEnv passes SSH_AUTH_SOCK and SSH_AGENT_PID for git-over-SSH
3. classifyError uses precise "throttled"/"throttling" match, excludes "unthrottled"
4. PermissionMode("") is no longer valid — callers must set an explicit mode
5. Removed .claude/settings.local.json from repo, added to .gitignore
6. extractAgentBackendAttrs(nil, attrs) documented as placeholder with warning infra
7. buildMCPServersJSON returns (string, error) instead of falling back to "{}"
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Ralph 1 (data integrity):
- storeResult populates ToolCalls from toolUseIDs (was always 0)
- NDJSON decode errors tracked + surfaced when no result received
- parseMessage unmarshal failures counted in decodeErrors
- buildResult returns error when no "result" message received
- buildSessionStats docstring corrected
Ralph 2 (safety — already on branch via direct commit):
- selectBackend returns error for unknown backend names
- buildEnv passes SSH_AUTH_SOCK + SSH_AGENT_PID
- classifyError uses precise throttle matching
- PermissionMode("") no longer valid
- buildMCPServersJSON propagates errors
- .claude/settings.local.json removed + gitignored
Ralph 3 (tests):
- CLI flag tests for --backend (native, claude-code, invalid)
- Backend passthrough assertion in executeCommand test
- Edge synthesis + ensureStartExitNodes tests for coverage
Complexity: extracted decodeNDJSON + collectResult from Run()
to bring cyclomatic/cognitive under thresholds.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
There was a problem hiding this comment.
🧹 Nitpick comments (4)
pipeline/dippin_adapter_test.go (3)
951-999: Test name should reflect what is actually validated.This test currently verifies standard
ir.AgentConfigextraction (prompt,llm_model, etc.), not backend-param passthrough. Renaming it (or expanding it once IR params exist) will reduce confusion and overlap withTestFromDippinIR_AgentConfig.✏️ Suggested rename
-func TestExtractAgentBackendAttrs_SimulatedIRParams(t *testing.T) { +func TestExtractAgentAttrs_SelectedFields(t *testing.T) {🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@pipeline/dippin_adapter_test.go` around lines 951 - 999, Rename the test TestExtractAgentBackendAttrs_SimulatedIRParams to accurately reflect that it validates standard ir.AgentConfig extraction (e.g., TestExtractAgentConfig_SimulatedIRParams or TestFromDippinIR_AgentConfig_Simulated); update the test function name and any references to it so it no longer implies backend-param passthrough, and ensure the test body (which calls FromDippinIR and asserts node.Attrs for prompt, llm_model, llm_provider, system_prompt, reasoning_effort, max_turns) remains unchanged.
858-907: Consider asserting all overlapping synthesized edges remain unique.Right now only
dispatch -> branch_aduplication is checked. Adding checks forbranch_a -> join(anddispatch -> join) would make this duplicate-prevention test more complete.🧪 Suggested enhancement
// Count edges from dispatch -> branch_a — should be exactly 1. - count := 0 + countDispatchBranch := 0 + countBranchJoin := 0 + countDispatchJoin := 0 for _, e := range graph.Edges { if e.From == "dispatch" && e.To == "branch_a" { - count++ + countDispatchBranch++ + } + if e.From == "branch_a" && e.To == "join" { + countBranchJoin++ + } + if e.From == "dispatch" && e.To == "join" { + countDispatchJoin++ } } - if count != 1 { - t.Errorf("expected 1 edge dispatch->branch_a, got %d", count) + if countDispatchBranch != 1 { + t.Errorf("expected 1 edge dispatch->branch_a, got %d", countDispatchBranch) + } + if countBranchJoin != 1 { + t.Errorf("expected 1 edge branch_a->join, got %d", countBranchJoin) + } + if countDispatchJoin != 1 { + t.Errorf("expected 1 edge dispatch->join, got %d", countDispatchJoin) }🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@pipeline/dippin_adapter_test.go` around lines 858 - 907, The test TestSynthesizeImplicitEdges_NoDuplicates currently only asserts the dispatch->branch_a edge is unique; extend it to also count and assert uniqueness for branch_a->join and dispatch->join (the overlaps created by synthesizing fan-in/parallel edges). After calling FromDippinIR and building graph (graph.Edges), add similar counters for edges where e.From=="branch_a"&&e.To=="join" and e.From=="dispatch"&&e.To=="join" and assert each count == 1 so all overlapping synthesized edges remain unique.
911-945: Test intent and fixture are slightly misaligned.The comment says override works “regardless of original kind,” but both nodes are currently
ir.NodeAgent. Consider using non-start/exit-native kinds in the fixture to validate the stronger claim.🔧 Suggested fixture tweak
- {ID: "start", Kind: ir.NodeAgent, Config: ir.AgentConfig{Prompt: "begin"}}, - {ID: "exit", Kind: ir.NodeAgent, Config: ir.AgentConfig{Prompt: "end"}}, + {ID: "start", Kind: ir.NodeTool, Config: ir.ToolConfig{Command: "echo begin"}}, + {ID: "exit", Kind: ir.NodeHuman, Config: ir.HumanConfig{Mode: "choice", Default: "ok"}},🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@pipeline/dippin_adapter_test.go` around lines 911 - 945, The test TestEnsureStartExitNodes_OverridesShape currently uses ir.NodeAgent for both start and exit nodes which doesn't validate the "regardless of original kind" claim; update the fixture in the workflow passed to FromDippinIR so the start and exit nodes use a non-native kind (e.g., ir.NodeTask or another non-start/exit kind) instead of ir.NodeAgent, leaving the assertions on graph.Nodes["start"].Shape == "Mdiamond" and graph.Nodes["exit"].Shape == "Msquare" and the Handler checks unchanged; this ensures FromDippinIR enforces the start/exit shape override regardless of the original node Kind.pipeline/handlers/backend_claudecode.go (1)
396-418: Consider documenting which SessionResult fields are not populated.
storeResultonly populatesTurns,ToolCalls, andUsagefrom the Claude CLI NDJSON output. The followingagent.SessionResultfields remain zero-valued:SessionID,Duration,MaxTurnsUsed,LoopDetected,FilesModified,FilesCreated,ContextUtilization,ToolCacheHits,ToolCacheMisses,ToolTimings,CompactionsApplied,LongestTurn, andError.If this is intentional because the Claude CLI doesn't provide this data, consider adding a brief comment to document the limitation for future maintainers.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@pipeline/handlers/backend_claudecode.go` around lines 396 - 418, Add a brief comment above the storeResult function explaining that agent.SessionResult is only partially populated from Claude CLI NDJSON (storeResult fills Turns, ToolCalls and Usage) and explicitly list the other fields left zero-valued (SessionID, Duration, MaxTurnsUsed, LoopDetected, FilesModified, FilesCreated, ContextUtilization, ToolCacheHits, ToolCacheMisses, ToolTimings, CompactionsApplied, LongestTurn, Error) so future maintainers know this is intentional due to missing data in the Claude CLI output; reference storeResult, agent.SessionResult, runState, and state.lastResult in the comment for clarity.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Nitpick comments:
In `@pipeline/dippin_adapter_test.go`:
- Around line 951-999: Rename the test
TestExtractAgentBackendAttrs_SimulatedIRParams to accurately reflect that it
validates standard ir.AgentConfig extraction (e.g.,
TestExtractAgentConfig_SimulatedIRParams or
TestFromDippinIR_AgentConfig_Simulated); update the test function name and any
references to it so it no longer implies backend-param passthrough, and ensure
the test body (which calls FromDippinIR and asserts node.Attrs for prompt,
llm_model, llm_provider, system_prompt, reasoning_effort, max_turns) remains
unchanged.
- Around line 858-907: The test TestSynthesizeImplicitEdges_NoDuplicates
currently only asserts the dispatch->branch_a edge is unique; extend it to also
count and assert uniqueness for branch_a->join and dispatch->join (the overlaps
created by synthesizing fan-in/parallel edges). After calling FromDippinIR and
building graph (graph.Edges), add similar counters for edges where
e.From=="branch_a"&&e.To=="join" and e.From=="dispatch"&&e.To=="join" and assert
each count == 1 so all overlapping synthesized edges remain unique.
- Around line 911-945: The test TestEnsureStartExitNodes_OverridesShape
currently uses ir.NodeAgent for both start and exit nodes which doesn't validate
the "regardless of original kind" claim; update the fixture in the workflow
passed to FromDippinIR so the start and exit nodes use a non-native kind (e.g.,
ir.NodeTask or another non-start/exit kind) instead of ir.NodeAgent, leaving the
assertions on graph.Nodes["start"].Shape == "Mdiamond" and
graph.Nodes["exit"].Shape == "Msquare" and the Handler checks unchanged; this
ensures FromDippinIR enforces the start/exit shape override regardless of the
original node Kind.
In `@pipeline/handlers/backend_claudecode.go`:
- Around line 396-418: Add a brief comment above the storeResult function
explaining that agent.SessionResult is only partially populated from Claude CLI
NDJSON (storeResult fills Turns, ToolCalls and Usage) and explicitly list the
other fields left zero-valued (SessionID, Duration, MaxTurnsUsed, LoopDetected,
FilesModified, FilesCreated, ContextUtilization, ToolCacheHits, ToolCacheMisses,
ToolTimings, CompactionsApplied, LongestTurn, Error) so future maintainers know
this is intentional due to missing data in the Claude CLI output; reference
storeResult, agent.SessionResult, runState, and state.lastResult in the comment
for clarity.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 1424eb95-dd7a-442a-925d-70e5fe63aa15
📒 Files selected for processing (11)
.gitignorecmd/tracker/main_test.gopipeline/backend.gopipeline/backend_test.gopipeline/dippin_adapter.gopipeline/dippin_adapter_test.gopipeline/handlers/backend_claudecode.gopipeline/handlers/backend_claudecode_test.gopipeline/handlers/codergen.gopipeline/handlers/codergen_test.gopipeline/handlers/transcript.go
✅ Files skipped from review due to trivial changes (2)
- .gitignore
- pipeline/backend_test.go
🚧 Files skipped from review as they are similar to previous changes (3)
- pipeline/dippin_adapter.go
- pipeline/backend.go
- pipeline/handlers/codergen.go
…ional, slice clone - synthesizeImplicitEdges + buildFanInSourceMap handle *ir.ParallelConfig and *ir.FanInConfig pointer types (matching main adapter pattern) - ensureStartExitNodes unconditionally sets Handler (was skipped when shape already matched — could leave Handler empty) - buildSessionStats clones FilesModified/FilesCreated slices to prevent shared backing array leaks into trace stats - Added TODO(dippin-lang) comment on placeholder call Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
1. Missing --print and --verbose flags in buildArgs. Claude Code's --output-format stream-json requires both flags. Without them, the subprocess produces no NDJSON output. 2. Wrong permission mode values. "fullAuto"/"autoEdit" don't exist in Claude Code CLI. Real values: plan, acceptEdits, bypassPermissions, default, dontAsk, auto. 3. NDJSON schema mismatch. Assistant messages nest content inside msg.message.content, not msg.content. Result messages use num_turns (not turns) and total_cost_usd at top level. Also: system messages now only emit turn start on subtype "init" (not on hook_started/hook_response). Added Subtype and Message fields to ndjsonMessage struct. Verified: end-to-end pipeline run with --backend claude-code completes successfully (Start → Test → Done, 1 turn, 4.9s). Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…ified e2e Three critical bugs found by live smoke testing (not caught by 464 lines of unit tests or 3 code reviewers): 1. Missing --print --verbose flags: claude CLI requires both for stream-json output 2. Wrong permission mode values: fullAuto/autoEdit don't exist in Claude Code CLI. Fixed to bypassPermissions/acceptEdits/etc. 3. NDJSON schema mismatch: assistant content is in msg.message.content (nested), not msg.content. Result uses num_turns, not turns. Also: extracted NDJSON types + parser to backend_claudecode_ndjson.go (main file was 530 lines, now 345). Added rate_limit_event as known type. System messages only emit turn start on subtype "init". Verified end-to-end: simple text response, tool calling (Write + Read), autopilot + backend combo. All produce correct outcomes, turns, and tool call counts. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
1. Token tracking: claude-code backend usage now flows through TokenTracker via new AddUsage() method. Run summary shows "Tokens by Provider: claude-code" with input/output/cost. 2. Retryable init: replaced sync.Once with mutex for claude-code backend initialization. Installing claude mid-run now works without restarting tracker. 3. Tool use ID field: Claude Code uses "id" (not "tool_use_id") for tool_use blocks in assistant messages. Fixed ndjsonContent struct to check both fields. Verified: tool calling pipeline shows correct counts (4 turns, 3 tools) and token usage in run summary. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The guard that triggers retry on empty responses now also checks sessResult.Turns > 0. A session with turns but no text output is a valid pattern with the claude-code backend (agent uses tools without narrating). Previously this incorrectly triggered retry. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
# Conflicts: # docs/superpowers/plans/2026-03-26-claude-code-backend.md
Summary
Adds a pluggable
AgentBackendabstraction to tracker's pipeline engine, with a Claude Code backend that delegates agent nodes to theclaudeCLI. No external SDK dependency — the backend is ~400 lines of internal Go that spawnsclaudeas a subprocess and parses NDJSON.Two activation paths:
backend: "claude-code"in .dip files (requires upstream dippin-lang IR change)tracker --backend claude-code pipeline.dip— all agent nodes run via Claude CodeWhy: Pipeline nodes get access to MCP servers, Claude Code skills/hooks, CLAUDE.md awareness, and the full Claude Code tool suite — capabilities unreachable by tracker's 6 built-in tools.
Architecture
One handler (
codergen), multiple backends. The handler does prompt resolution, config building, artifact writing, outcome parsing. The backend just runs the agent session.Key design decisions
AgentBackendinterface emitsagent.Eventdirectly and returnsagent.SessionResult— no new types, no translation layerExtra anyfield onAgentRunConfigcarries backend-specific config (*ClaudeCodeConfigor*agent.SessionConfig)New files
pipeline/backend.goAgentBackendinterface, config types, MCP parsing, tool list validationpipeline/handlers/prompt.goResolvePrompt(extracted from codergen)pipeline/handlers/backend_native.goagent.Sessionpipeline/handlers/backend_claudecode.goclaudeCLI, parses NDJSON, emits eventsSubprocess management
Setpgid)context.WithTimeoutfrom node configReview notes
extractAgentBackendAttrs) is wired tonil— it's a placeholder pending an upstreamir.AgentConfig.Paramsfield in dippin-lang--backend claude-codeCLI path works today; per-node.dipdeclaration requires the upstream changeTest plan
go test ./...)claudeCLI (requires CLI installed,//go:build integration)tracker --backend claude-code --no-tui <pipeline>🤖 Generated with Claude Code
Summary by CodeRabbit
Release Notes
New Features
--backendCLI flag to specify agent execution backend ("native" or "claude-code")Documentation