Platform reframe: layered engine, tmux agents, workflow packages#81

Merged
mattleaverton merged 86 commits into
danshapiro:mainfrom
mattleaverton:impl/platform-reframe
Apr 17, 2026

@mattleaverton
Collaborator

Summary

This PR reframes Kilroy as a layered platform: a core graph-execution engine (L0), an agent-execution layer that drives CLI tools via tmux (L1), and workflow packages that ship as self-contained directories with graphs, scripts, and manifests (L2). It adds the primitives and server infrastructure needed to run multi-agent workflows reliably from a headless process: explicit loop and concurrent split/join nodes, a SQLite run database, a per-run structured activity log, a REST API with SSE streaming, and an embedded dashboard UI. Three ready-to-use workflow packages ship with the branch: quick-launch, pr-review, and build-test.

This unblocks building agentic automation that can be launched from other agents (the quick-launch package is already used that way), observed in real time via the log API, and extended by adding new workflow packages without touching engine code.

What's new

Layered architecture (L0/L1/L2)

  • Engine package (internal/attractor/engine/) is now L0-only: graph execution, node dispatch, lifecycle hooks. No agent or git specifics inside.
  • L1 (internal/attractor/agents/) provides TmuxAgentHandler, which is registered as the agent handler at startup.
  • L2 (internal/attractor/workflows/) provides GitHook, HumanGateHandler, and the package loader — registered at startup in cmd/kilroy/main.go via newLayeredRegistry().
  • GitOps interface in the engine means git mode is fully optional; the engine runs in plain-directory mode when GitOps is nil, enabling non-git workflows.
  • codergen has been renamed to agent throughout (types, files, test names).
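The layering above can be pictured as three registration passes over one handler registry. The following is only a minimal sketch under stated assumptions: `Handler`, `Registry`, `stub`, and `composeRegistry` are hypothetical names for illustration and are not the types in `internal/attractor/engine/`; only the node type strings (`start`, `exit`, `conditional`, `tool`, `agent`, `wait.human`) come from this PR.

```go
package main

import (
	"fmt"
	"sort"
)

// Handler is a hypothetical stand-in for the engine's node handler interface.
type Handler interface {
	Execute(nodeID string) (string, error)
}

// HandlerFunc adapts a plain function to Handler.
type HandlerFunc func(nodeID string) (string, error)

func (f HandlerFunc) Execute(nodeID string) (string, error) { return f(nodeID) }

// Registry maps a node type string to the handler that executes it.
type Registry struct{ handlers map[string]Handler }

func NewRegistry() *Registry { return &Registry{handlers: map[string]Handler{}} }

func (r *Registry) Register(nodeType string, h Handler) { r.handlers[nodeType] = h }

func (r *Registry) Lookup(nodeType string) (Handler, error) {
	h, ok := r.handlers[nodeType]
	if !ok {
		return nil, fmt.Errorf("no handler registered for node type %q", nodeType)
	}
	return h, nil
}

// Types returns the registered node types in sorted order.
func (r *Registry) Types() []string {
	types := make([]string, 0, len(r.handlers))
	for t := range r.handlers {
		types = append(types, t)
	}
	sort.Strings(types)
	return types
}

// stub returns a placeholder handler that tags its output with the layer name.
func stub(layer string) Handler {
	return HandlerFunc(func(nodeID string) (string, error) {
		return layer + ":" + nodeID, nil
	})
}

// composeRegistry (hypothetical) registers L0 handlers first, then L1, then L2.
// The engine only ever sees the Registry, never the layers behind it.
func composeRegistry() *Registry {
	r := NewRegistry()
	for _, t := range []string{"start", "exit", "conditional", "tool"} {
		r.Register(t, stub("L0"))
	}
	r.Register("agent", stub("L1"))
	r.Register("wait.human", stub("L2"))
	return r
}

func main() {
	r := composeRegistry()
	fmt.Println(r.Types())
}
```

A registry built only from the L0 pass would still run tool-only graphs, which is the property the layering is meant to preserve.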

Workflow packages and CLI flags

  • --package <dir>: loads a self-contained workflow from a directory containing graph.dot, optional workflow.toml, and scripts/ / prompts/ subdirectories. Package scripts and prompts are materialized into .kilroy/package/ inside the workspace before execution.
  • --workspace <dir>: sets the execution directory independently of the source repo.
  • --input <path-or-json>: passes structured inputs as a JSON/YAML file or inline JSON. Values are injected as KILROY_INPUT_<KEY> env vars and written to .kilroy/INPUT.md.
  • --prompt-file <path>: stages a file as the node prompt.
  • --label KEY=VALUE: attaches metadata to a run; queryable via runs list and runs show.
  • --tmux: enables the tmux agent handler for the run.
  • runs show, runs wait: inspect or wait on a run by ID or ID prefix; --latest, optionally combined with --label filters, selects the most recent matching run.
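To illustrate the `--input` injection described above, here is a minimal sketch of turning input keys into `KILROY_INPUT_<KEY>` environment entries. The key normalization (upper-casing and replacing anything outside `[A-Z0-9_]` with `_`) and the string-only value type are assumptions made for the example; the PR text only states the `KILROY_INPUT_<KEY>` naming.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// inputEnv converts run inputs into KILROY_INPUT_<KEY>=value entries.
// Assumption: keys are upper-cased and every character outside [A-Z0-9_]
// is replaced with "_". Output is sorted by key for determinism.
func inputEnv(inputs map[string]string) []string {
	keys := make([]string, 0, len(inputs))
	for k := range inputs {
		keys = append(keys, k)
	}
	sort.Strings(keys)

	env := make([]string, 0, len(keys))
	for _, k := range keys {
		name := strings.Map(func(r rune) rune {
			switch {
			case r >= 'A' && r <= 'Z', r >= '0' && r <= '9', r == '_':
				return r
			}
			return '_'
		}, strings.ToUpper(k))
		env = append(env, "KILROY_INPUT_"+name+"="+inputs[k])
	}
	return env
}

func main() {
	for _, kv := range inputEnv(map[string]string{"prompt": "say hello", "pr_number": "81"}) {
		fmt.Println(kv)
	}
}
```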

Engine primitives

  • Loop (internal/attractor/engine/loop.go): explicit single-node or multi-node iteration. Termination conditions: loop_count, loop_max, loop_until_file, loop_until_file_contains, loop_while_outcome=fail. Each iteration gets its own attempt number and DB row for observability. Nested loops are rejected at validation.
  • Concurrent split/join (internal/attractor/engine/concurrent.go): runs independent node chains concurrently in the same workspace, waits for all branches. Configured via concurrent_id node attribute; allow_partial=true permits branch failures. Nested concurrent scopes are rejected.
  • Output contracts: nodes declare expected output files via outputs= attribute; the engine enforces their existence after the node completes and writes .kilroy/FEEDBACK.md on failure.
  • .kilroy/ file conventions (internal/attractor/engine/kilroy_files.go): INPUT.md, CONTEXT.md, TASK.md, FEEDBACK.md written by the engine as standard inter-node data files. KILROY_STAGE_STATUS_PATH / KILROY_STAGE_STATUS_FALLBACK_PATH env vars let scripts signal completion status.

Agent execution via tmux

  • TmuxAgentHandler (internal/attractor/agents/tmux_handler.go) spawns each agent node in a named tmux session on the kilroy socket. Sessions persist after completion for inspection; pane output is tailed in real time.
  • Session manager (internal/attractor/agents/tmux/) handles create, destroy, wait, and health checks. Process group kill (SIGTERM then SIGKILL) on context cancel — no orphan processes.
  • Invocation templates (internal/attractor/agents/templates/) define per-tool startup, env, and structured-output behavior for claude, codex, opencode, and gemini. Adding a new CLI tool requires only a new template, no handler code.
  • Auth isolation: codex sessions write a per-run auth.json into an isolated CODEX_HOME; the user's ~/.codex/config.toml is explicitly excluded so user-local settings (model, reasoning effort) cannot bleed into kilroy runs.
  • Conversation log parsers (internal/attractor/agents/agentlog/): structured AgentEvent extraction from Claude JSONL, Codex JSONL, and OpenCode JSONL. A live tailer reads the log during execution and emits events to the RunLog.

RunLog and observability

  • RunLog (internal/attractor/engine/runlog.go): newline-delimited JSON activity log written to {run_logs_root}/run.log. Every lifecycle event (run start/end, node start/end, edge decision, checkpoint, git activity, tool stdout/stderr line, agent conversation event) is emitted as a timestamped RunLogEvent.
  • GET /runs/{id}/log: HTTP endpoint that serves the run.log with optional query filters (?node=, ?source=, ?event=, ?since=, ?tail=N) and SSE streaming (?stream=true) for live tail without polling.
  • Git activity events (worktree.created, commit) appear in the RunLog so the full timeline is in one place.
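As a rough picture of how the `?node=`, `?source=`, `?event=`, and `?tail=N` filters could apply to a newline-delimited JSON log, the sketch below parses each line and keeps the matching events. The JSON field names (`ts`, `node`, `source`, `event`) and the sample event names other than `commit` are assumptions for illustration; the actual `RunLogEvent` schema is not shown in this PR description.

```go
package main

import (
	"bufio"
	"encoding/json"
	"fmt"
	"strings"
)

// RunLogEvent is a hypothetical subset of a run.log entry.
type RunLogEvent struct {
	Time   string `json:"ts"`
	Node   string `json:"node,omitempty"`
	Source string `json:"source,omitempty"`
	Event  string `json:"event"`
}

// Filter mirrors the query parameters; empty fields match everything.
type Filter struct {
	Node, Source, Event string
	Tail                int // keep only the last Tail matches when > 0
}

// filterLog parses NDJSON text and returns the events that match f.
func filterLog(ndjson string, f Filter) ([]RunLogEvent, error) {
	var out []RunLogEvent
	sc := bufio.NewScanner(strings.NewReader(ndjson))
	for sc.Scan() {
		line := strings.TrimSpace(sc.Text())
		if line == "" {
			continue
		}
		var ev RunLogEvent
		if err := json.Unmarshal([]byte(line), &ev); err != nil {
			return nil, fmt.Errorf("bad log line: %w", err)
		}
		if f.Node != "" && ev.Node != f.Node {
			continue
		}
		if f.Source != "" && ev.Source != f.Source {
			continue
		}
		if f.Event != "" && ev.Event != f.Event {
			continue
		}
		out = append(out, ev)
	}
	if err := sc.Err(); err != nil {
		return nil, err
	}
	if f.Tail > 0 && len(out) > f.Tail {
		out = out[len(out)-f.Tail:]
	}
	return out, nil
}

func main() {
	log := `{"ts":"t1","node":"build","source":"tool","event":"stdout"}
{"ts":"t2","node":"review","source":"agent","event":"tool_call"}
{"ts":"t3","node":"build","source":"git","event":"commit"}`
	evs, err := filterLog(log, Filter{Node: "build"})
	fmt.Println(len(evs), err)
}
```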

Server and REST API

  • internal/server/ now covers a full run-management API:
    • POST /runs: accepts a workflow package, inputs, workspace, and labels to start a run.
    • GET /runs, GET /runs/{id}: run list and detail from the RunDB (no filesystem-only fallback for completed runs).
    • GET /runs/{id}/log: RunLog with SSE.
    • GET /runs/{id}/nodes/{nodeId}/diff: per-node git diff.
    • GET /runs/{id}/files/{path...}, GET /runs/{id}/workspace/{path...}: file browser.
    • GET /workflows: lists available packages.
    • /pipelines/... kept as a backward-compatibility alias.
  • Embedded dashboard UI at /ui/ (internal/server/ui/index.html, viz.js, viz-render.js) — compiled into the binary via //go:embed. No separate asset server needed.
  • Prefix ID resolution: GET /runs/01KPB resolves to the unique matching run, the same way runs show 01KPB does in the CLI.
  • SQLite WAL mode and 5-second busy timeout (internal/attractor/rundb/rundb.go) for safe concurrent server writes.

Workflows shipped

  • workflows/quick-launch/: single-agent task runner. Accepts prompt, optional context_file or context. Three graph variants: graph.dot (Claude), graph.codex.dot (Codex), graph.gemini.dot (Gemini). Ships with a kilroy slash-command skill and install script.
  • workflows/pr-review/: full PR review pipeline — checkout, build/test, per-file code review (Claude), holistic review, combined report. Accepts pr_repo and pr_number inputs; emits review-report.md.
  • workflows/build-test/: build-and-test workflow for CI-style validation.
  • workflows/multi-tool-exercise/: multi-agent graph used to validate concurrent tool execution and observability.

Reliability fixes

  • Subprocess group kill on cancel: tool handler spawns each subprocess in its own process group (Setpgid: true) and sends SIGTERM to the group on context cancellation, falling back to SIGKILL if needed (internal/attractor/engine/process_group_unix.go). Prevents orphaned child processes on run cancel.
  • Validation rejects nested concurrent and loop scopes at graph load time.
  • Stall watchdog now resets from the agent log tailer's last-seen event, not just stdout — prevents false stall detection during long LLM reasoning pauses.
  • Run failure is recorded in the RunDB on engine error return (previously only terminal nodes wrote their own state).
  • Server reconciles in-flight runs as stale on startup to prevent permanently pending entries after a crash.

Breaking changes

  • codergen is renamed to agent throughout internal types and test file names. Any code outside this repo that imports internal/attractor/engine types named Codergen* will need to be updated.
  • --graph remains the primary flag; --package is additive. No existing graph-based invocations change.
  • The /pipelines/ server API is retained as an alias but the canonical path is /runs/.
  • internal/attractor/workflows/ now contains HumanGateHandler (moved out of the engine package). Callers that registered a wait.human handler directly into the engine registry will need to use the L2 package instead, or register workflows.NewHumanGateHandler() themselves.

Known gaps and follow-ups

  • The supervisor prototype (internal/attractor/workflows/supervisor.go) is present but not wired to any endpoint or scheduler yet.
  • Windows process-group kill is a no-op stub (process_group_windows.go); subprocess cleanup on cancel is not guaranteed on Windows.
  • The dashboard UI is functional for run inspection but does not yet support launching runs or answering human-gate questions from the browser.
  • The CXDB integration boundary is documented (engine/cxdb_hook.go, engine/document CXDB integration boundary) but CXDB remains embedded in the engine; extraction to a separate service is a planned follow-up.
  • Gemini support in quick-launch (graph.gemini.dot) exists but the Gemini template is minimal and untested end-to-end.

Test plan

  • go build ./... compiles cleanly.
  • go test ./... passes (unit and integration tests; integration tests skip when the external services they need are unavailable).
  • kilroy attractor run --graph demo/tmux-agent-test/graph.dot --tmux completes without error; check that the tmux session is cleaned up.
  • kilroy attractor run --package workflows/quick-launch --input '{"prompt":"say hello"}' --workspace /tmp/ql-test --tmux produces a result.md in the workspace.
  • kilroy attractor runs list shows the run; kilroy attractor runs show --latest prints its detail.
  • kilroy attractor serve starts; visit http://localhost:8080/ui/ and confirm the dashboard loads and lists the completed runs.
  • GET http://localhost:8080/runs/<id>/log returns the NDJSON log; add ?stream=true and confirm it stays open while a run is in progress.
  • Run a graph with a loop_count=3 node and verify three distinct attempt rows appear in runs show --json output.
  • Cancel a running job mid-execution and confirm no orphan tmux sessions or subprocesses remain (tmux ls -L kilroy should be empty).

mattleaverton and others added 30 commits April 1, 2026 16:15
…racts

From first real PR review workflow run:
- Phase 0.9: config/auto-detect conflict, require_clean default, missing
  env vars in tool nodes, CLI headless warning, error message hints,
  worktree file-not-found context
- Phase 3.7-3.9: run input contract (--input), output contract, node
  data passing conventions
- Future work: iteration patterns (dynamic loops over collections)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Setup/build/test tool nodes + single review agent node.
Experimental workflow for automated PR triage.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Setup script now copies build-test.sh to .ai/ before gh pr checkout
changes the branch and removes workflow files.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Raw git checkout (no gh pr checkout) with unique branch names for parallel safety
- Separate investigate (exploratory, full tool access) and decide (directive, no tools) agents
- Tighter output contract: next actions instead of follow-up tasks
- Setup script preserves all workflow scripts to .ai/ before branch switch

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Absolute path for setup script (works from any repo's worktree)
- Remove Go-specific assumptions from investigate prompt
- Agent discovers build system and runs appropriate checks
- Add freshell run config

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…platform

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Create agents/ (L1) and workflows/ (L2) package directories.
Split NewDefaultRegistry into NewCoreRegistry (L0-only) + NewDefaultRegistry
(backward compat). Export engine types and methods needed by external handler
packages: StatusSource, FallbackStatusPath, StageStatusContract, Truncate,
WarnEngine, BuildFidelityPreamble, ClassifyAPIError, etc.
Add Engine accessor methods: AppendProgress, CXDBPrompt,
CXDBInterviewStarted/Completed/Timeout, LastResolvedFidelity, SetDefault.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Add Registry field to RunOptions so cmd/kilroy/ can pass a pre-composed
handler registry. cmd/kilroy/ now creates a layered registry:
  L0: engine.NewCoreRegistry() (start, exit, conditional, tool, parallel, fan_in)
  L1: agents.AgentHandler (codergen/default)
  L2: workflows.HumanGateHandler, workflows.ManagerLoopHandler

Type aliases in agents/ and workflows/ establish the package structure and
import direction. Implementations remain in engine/ until Phases 2-3.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
TestCoreRegistry_ToolOnlyGraph demonstrates that a graph using only
tool_command nodes executes successfully with NewCoreRegistry (no L1
agent handler or L2 workflow handlers registered).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Rename Go types: CodergenBackend→AgentBackend, CodergenRouter→AgentRouter,
SimulatedCodergenBackend→SimulatedAgentBackend, etc.
Rename handler type string: "codergen"→"agent" in registry.
Rename DOT attribute: agent_mode replaces codergen_mode (with fallback).
Rename files: codergen_*.go → agent_*.go.
Update comments, diagnostics, test names.

The CodergenHandler type name is retained in engine/ as the implementation
type (aliased as agents.AgentHandler). It will be renamed when the
implementation moves to agents/ in Phase 2.1.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
New package internal/attractor/rundb backed by modernc.org/sqlite (pure Go,
no CGO). Global DB at ~/.local/state/kilroy/runs.db.

Features:
- Auto-applying numbered SQL migrations on DB open
- WAL mode for concurrent reads
- Schema: runs (with labels, inputs, timing), node_executions (with attempts),
  edge_decisions, provider_selections
- Write ops: InsertRun, CompleteRun, InsertNodeStart, CompleteNode,
  InsertEdgeDecision, InsertProviderSelection
- Read ops: GetRun, LatestRun, ListRuns (filter by status/labels/graph),
  GetNodeExecutions
- Prune ops: PruneRuns (by date, graph, labels, or orphaned logs_root)
- Cascade deletes: child records deleted with parent run
- 16 tests covering all operations

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Add RunDBWriter interface in engine/ (no rundb import needed).
Engine records to RunDB at every lifecycle point:
- run start (after worktree ready)
- node start/complete (each execution)
- edge selection decisions
- run complete (success/failure)

All RunDB calls are best-effort (warn on error, never block).
cmd/kilroy/ opens global DB and passes via RunOptions.RunDB.
Integration test proves tool graph produces correct DB entries.

Also fix RunOptions propagation: RunDB, Registry, Labels now properly
forwarded through bootstrapRunWithConfig overrides.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
runs list: queries RunDB first, falls back to filesystem scan.
Now shows duration column.
runs prune: delegates to RunDB with filter support (before, graph, labels, orphans).
status --latest: instant lookup via RunDB.LatestRun() instead of filesystem scan.

All commands gracefully fall back to filesystem-based behavior when
the RunDB is unavailable or empty.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Structured inputs for graph runs via --input flag.

Features:
- LoadInputFile (YAML/JSON) and LoadInputString (inline JSON)
- Graph declares required inputs via inputs="key1,key2" attribute
- ValidateRequiredInputs rejects runs with missing declared inputs
- Input values injected into context as input.* keys
- Input values expanded in prompts as $input.key placeholders
- Input values available as KILROY_INPUT_* env vars in tool_command nodes
- Inputs recorded in RunDB

Integration test proves tool_command nodes see KILROY_INPUT_* env vars.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
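The required-input check in the commit above can be sketched as a comparison between the graph's comma-separated `inputs` attribute and the supplied values. This is only an illustration: `checkRequiredInputs` is a hypothetical helper, not the `ValidateRequiredInputs` function the commit names, and the error wording is invented.

```go
package main

import (
	"fmt"
	"strings"
)

// checkRequiredInputs reports every key declared in a graph's
// inputs="key1,key2" attribute that is absent from the supplied inputs.
func checkRequiredInputs(declared string, inputs map[string]any) error {
	var missing []string
	for _, key := range strings.Split(declared, ",") {
		key = strings.TrimSpace(key)
		if key == "" {
			continue // tolerate empty attributes and trailing commas
		}
		if _, ok := inputs[key]; !ok {
			missing = append(missing, key)
		}
	}
	if len(missing) > 0 {
		return fmt.Errorf("missing required inputs: %s", strings.Join(missing, ", "))
	}
	return nil
}

func main() {
	inputs := map[string]any{"pr_repo": "danshapiro/kilroy"}
	fmt.Println(checkRequiredInputs("pr_repo, pr_number", inputs))
}
```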
Separates graph file location from execution location:
- --workspace /path/to/dir sets the execution directory
- --graph /path/to/graph.dot determines where prompt_file resolves
- When --workspace is omitted, defaults to current behavior
- GraphDir derived from --graph path for prompt_file resolution
- Workspace flows through RunOptions → engine → worktree creation

PrepareOptions gains GraphDir field that takes precedence over RepoPath
for prompt_file resolution, enabling cross-repo workflows where graphs
and scripts live separately from the target project.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Graphs declare expected output artifacts via outputs="file1,file2" attribute.
After run completion, the engine:
- Searches for declared outputs in the worktree
- Copies found outputs to {logs_root}/outputs/
- Writes outputs.json manifest with found/missing status and file sizes
- Emits warnings for missing declared outputs (not errors)
- Records output collection in progress events

Hooked into persistTerminalOutcome so outputs are collected on every
run completion (success or failure).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Add canonical /runs endpoints per platform-reframe plan:
- GET /runs: list runs from RunDB (with status/graph filters)
- GET /runs/{id}/outputs: list collected output artifacts

Existing /pipelines endpoints retained as backward-compat aliases.
All /runs endpoints mirror /pipelines for submit, status, events,
cancel, context, and questions.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Run lifecycle management additions:
- --label KEY=VALUE flag on attractor run (stored in RunDB)
- --older-than 7d duration-based prune filter (supports d/h/m units)
- Labels passed through to RunDB via RunOptions

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
New packages for Layer 1 agent capabilities:

agents/tmux/ — tmux session management:
  - Session creation with two-step pattern (shell → respawn-pane)
  - Input delivery with sanitization, chunking, and Enter verification
  - Output capture with NBSP normalization for prompt detection
  - Readiness/idle/exit detection via polling with busy indicators
  - Process tree cleanup (SIGTERM → grace → SIGKILL)
  - Socket isolation (kilroy-specific tmux socket)
  - Session environment variable storage for metadata
  - 11 integration tests against real tmux on isolated socket

agents/templates/ — per-tool invocation templates:
  - Template struct with per-tool config (args, env, prompt prefix,
    busy indicators, startup dialogs, exit behavior)
  - Built-in templates: claude, codex, gemini, opencode
  - Template registry for name-based lookup

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
TestSmoke_Claude_PrintMode spawns Claude Code via tmux in --print mode,
waits for exit, and captures output. Verified working: Claude returns
KILROY_SMOKE_OK, session exits with status 0.

Tests skip gracefully when API keys or CLI tools are unavailable.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
TmuxAgentHandler implements engine.Handler and orchestrates the full
agent lifecycle via tmux sessions:
1. Resolve tool template from node attributes (agent_tool or llm_provider)
2. Build command and environment from template
3. Create tmux session on isolated socket
4. Handle startup dialogs (trust prompts, permission warnings)
5. Wait for completion (exit-based or idle-detection)
6. Capture output and build outcome
7. Clean up session and process tree

Two agent handlers now available:
- AgentHandler: existing subprocess/API backend (backward compat)
- TmuxAgentHandler: tmux-based CLI sessions (new)

Also adds SendKeys method to tmux.Manager for dialog interaction.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Wire TmuxAgentHandler into cmd/kilroy/ via --tmux flag. Add exit code
detection via tmux #{pane_dead_status}. Three integration test scenarios
verified against real Claude via tmux:

1. Simple agent task: Claude creates a file, tool node verifies it exists
   → status: success, KILROY_TMUX_TEST_PASS confirmed

2. Multi-node pipeline: Claude writes calc.sh, tool node executes it,
   conditional routes on result → 42 computed, routed to success exit

3. Failure routing: Agent succeeds, tool node intentionally fails (cat
   nonexistent file), conditional routes to fail exit → correct routing

All three runs recorded in RunDB with timing. TmuxAgentHandler correctly:
- Spawns Claude in tmux sessions on isolated socket
- Passes prompt and environment variables
- Captures output and exit codes
- Detects failures via non-zero exit codes
- Cleans up sessions and process trees

Also adds 3 unit tests with fake agent scripts proving handler contract:
success path, failure detection (exit code 1), and workdir file creation.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Add event envelope canonicalization task (3.6), workflow.toml concept for
packages, run retro idea, and testing emphasis notes from Phase 2 review.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Defines the GitOps interface that encapsulates all git operations the
engine needs. When nil, the engine will operate in plain-directory mode.
Added GitOps field to RunOptions and Engine struct.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
GitHook implements the engine.GitOps interface, wrapping gitutil
functions for worktree isolation, per-node commits, and branch
management. This is the Layer 2 implementation that will replace
direct gitutil calls in the engine.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The engine package now has zero direct gitutil imports. All git
operations go through the GitOps interface, which is optional.
When GitOps is nil, the engine operates in plain-directory mode:
no worktrees, no commits, no branch management.

Key changes:
- engine.run() conditionally sets up git workspace via GitOps
- checkpoint() only commits when GitOps is set
- parallel_handlers use GitOps for branch workspace isolation,
  falling back to temp directory copy for no-git mode
- resume uses GitOps for worktree recreation
- config_defaults accepts GitOps parameter
- cmd/kilroy/ creates GitHook and wires it through

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
GitOps is now auto-detected: if the workspace (or cwd when no
workspace is specified) is a git repo, git worktrees and commits
are enabled. Otherwise, runs proceed in plain-directory mode.

DefaultRunConfig now accepts an explicit repoPath parameter so
--workspace correctly routes to the git repo.

Tested against real binary:
- Graph in git repo: worktree + commits created
- Graph in plain dir: runs successfully without git
- Graph with --workspace to git repo: worktree + commits created

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Adds a pluggable AutoDetectGitOps factory function that eng.run() and
bootstrapRunWithConfig call when GitOps is not explicitly set. This
preserves backward compatibility: existing callers that set RepoPath
to a git repo automatically get git worktree behavior.

Fixes:
- Branch engine now inherits GitOps from parent (parallel commits work)
- TestRun_FailsWhenNotAGitRepo renamed to TestRun_SucceedsInNonGitDir
  (non-git dirs are now valid — the intended Phase 3.1 behavior)
- cmd/kilroy/ registers AutoDetectGitOps at init time
- Test TestMain registers testGitOps auto-detection

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
mattleaverton and others added 29 commits April 7, 2026 14:35
Serves the canonical run.log with query params: ?node=, ?source=,
?event=, ?since=, ?tail= for filtered reads, and ?stream=true for
live SSE tailing via polling.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Each template gets a LogLocator that finds the CLI tool's conversation
log, and a parser that extracts tool_call, tool_result, text, and
thinking events. The tmux handler emits parsed events to RunLog after
agent completion.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Emits worktree.created when the run worktree is set up, and commit
events with diff stats when recordNodeDiff finds file changes.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Move rundbRecordProviderIfAgent to after executeWithRetry so provider/model
attrs are populated. Pass llm_model from node attributes through to CLI tool
--model flags. Record agent_tool as the backend in provider_selections.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Exercises input/output contracts, .kilroy/ convention files, agent_tool
routing across claude/codex/opencode, edge conditions, and run log events.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Claude: add --bare flag to skip keychain/OAuth, rely purely on
ANTHROPIC_API_KEY env var. Removes need for startup dialog handling.

Codex: use exec subcommand with --full-auto --skip-git-repo-check.
Write isolated auth.json under CODEX_HOME per session so codex uses
the API key without touching ~/.codex/.

OpenCode: add --format json --pure --dir flags. Inject provider config
via OPENCODE_CONFIG_CONTENT env var for keyless config isolation.

Add PrepareSession hook to Template for per-tool filesystem setup
before tmux session creation.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The CLI preflight probe uses the old subprocess invocation path which
doesn't match tmux template auth isolation. Skip it when the caller
knows the tools are configured.

Also fix codex node to use o3-mini model (can't probe claude model
on the openai provider).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Claude: normalize dots to dashes in model ID (claude-sonnet-4.6 →
claude-sonnet-4-6) since Claude CLI uses dash format.

Codex: auth.json auth_mode must be lowercase "apikey" not "ApiKey".

OpenCode: model format is provider/model (anthropic/claude-sonnet-4-6),
add prefix and normalize dots.

Also fix SkipPreflight not propagating through RunOptions override copy.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
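The model ID fixes in the commit above are simple string rewrites. As a sketch, with `claudeModelID` and `openCodeModelID` as hypothetical helper names:

```go
package main

import (
	"fmt"
	"strings"
)

// claudeModelID converts dotted version numbers to the dash form the
// Claude CLI expects, e.g. claude-sonnet-4.6 becomes claude-sonnet-4-6.
func claudeModelID(id string) string {
	return strings.ReplaceAll(id, ".", "-")
}

// openCodeModelID builds the provider/model form used by OpenCode,
// applying the same dot-to-dash normalization to the model part.
func openCodeModelID(provider, id string) string {
	return provider + "/" + strings.ReplaceAll(id, ".", "-")
}

func main() {
	fmt.Println(claudeModelID("claude-sonnet-4.6"))
	fmt.Println(openCodeModelID("anthropic", "claude-sonnet-4.6"))
}
```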
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
o3-mini doesn't support codex's web_search_preview tool.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
--full-auto implied web_search_preview which most models reject.
Use --sandbox workspace-write directly. Switch to gpt-5.4-nano
which supports codex's tool set.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Each CLI template now produces structured output (stream-json for
claude, --json for codex, --format json for opencode). The handler
redirects stdout to {stageDir}/agent_output.jsonl and parses it
directly — no more hunting through tool-specific log directories.

LogLocator remains as fallback for non-structured-output modes.
Response text is extracted from the JSONL for response.md.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Codex defaults web_search="cached" which sends web_search_preview
tool on every request. Most small models reject it. Disable it
since Kilroy agents don't need web search.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Parsers were written speculatively. Now matched to real output:
- Codex: item.completed/started with agent_message, command_execution
- OpenCode: tool_use with nested part.state, text events

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Fix .git detection to use -e (file or dir) for worktree support
- Convert prompt_file to inline prompts with KILROY_STAGE_STATUS_PATH
- Add output contract declarations on agent nodes

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Add TailJSONL that watches agent_output.jsonl and emits events to
RunLog as lines appear. Refactor parsers to expose per-line functions
(ParseClaudeLine, ParseCodexLine, ParseOpenCodeLine) used by both
the tailer and batch parsing.

The tailer starts when the tmux session is created and stops when the
agent exits. Events flow through RunLog to the SSE endpoint in real
time, so the UI sees agent tool calls as they happen rather than in
a batch after completion.

Falls back to batch parsing when structured output isn't available.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
When structured output is redirected to a file, the tmux pane is
empty and the stall watchdog sees no progress. The real-time tailer
now calls TickStallWatchdog on each parsed event, keeping the
watchdog alive during agent execution.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
When eng.run() returns an error (stall watchdog, context cancellation,
etc.), the run stayed as "running" in the DB forever. Now
RunWithConfig records a fail status before returning the error.

The existing ReconcileStaleRuns on server startup handles panics
and unclean exits as a safety net.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…cel fix

- SQLite: add busy_timeout(5000), synchronous(normal), SetMaxOpenConns(1)
  so concurrent detached runs don't silently fail to register in the DB
- Detach: forward --input, --workspace, --package, --tmux, --skip-preflight,
  --label flags to child process; resolve relative paths to absolute
- Invocation capture: record os.Args and run config in manifest.json and
  runs DB (new migration 004); expose in API response
- Prefix ID matching: GET /runs/{short-id} resolves to full run ID via
  DB prefix query and in-memory registry scan
- Cancel: fall back to PID-based SIGTERM for CLI-launched detached runs
  that aren't in the server's in-memory pipeline registry
- AGENTS.md: document backend:cli vs api, --tmux flag, correct run config

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Bundles the single-file Kilroy dashboard SPA (index.html + Graphviz
WASM worker) into the binary via //go:embed so `kilroy attractor serve`
exposes a working dashboard at http://localhost:9700/ui/ with no extra
processes or CORS configuration.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Artifact capture: new node_execution_artifacts table (migration 005).
  At rundbRecordNodeComplete, ingest stage files (prompt, response,
  agent_output.jsonl, events.ndjson, status, stdout/stderr, tool_timing,
  etc.) as blobs keyed by node_execution_id. Each retry and each loop
  iteration gets its own DB row + captured artifacts, fixing retry
  history loss and enabling loop iteration history.
- handleGetNodeTurns now serves from DB first with filesystem fallback
  for legacy runs. Response includes source="db"|"filesystem" so the UI
  can tell.
- Loop primitive: new trapezium (loop.begin) and invtrapezium (loop.end)
  node shapes for multi-node loops, plus loop_count/loop_until_file/
  loop_until_file_contains/loop_max attributes on any node for
  single-node loops. Termination is evaluated after each iteration;
  exceeding loop_max fails the run. Separate from the existing
  loop_restart, which only handles transient_infra failure restarts.
- Loop iterations tracked in Engine.loopIterations so each iteration
  gets a distinct attempt number in node_executions (currently on the
  loop-back target; body-node attempt numbering is a UX follow-up).
- Label filtering wired end to end: GET /runs?label=KEY=VALUE&limit=N
  and kilroy attractor runs list --label --status --graph --limit.
  Underlying DB filter already existed; just surfaced to API and CLI.
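
The loop termination check could look roughly like the sketch below. The attribute names come from the commit; everything else is assumed: the `loopSpec` struct, the order in which conditions are checked, and reading "loop_max exceeded" as "the cap was reached without another condition ending the loop".

```go
package main

import (
	"errors"
	"fmt"
	"os"
	"path/filepath"
	"strings"
)

// loopSpec mirrors the loop_* node attributes. Field semantics are assumed.
type loopSpec struct {
	Count             int    // loop_count: stop after this many iterations (0 = unset)
	UntilFile         string // loop_until_file: stop once this file exists
	UntilFileContains string // loop_until_file_contains: file must also contain this
	Max               int    // loop_max: hard cap; hitting it is an error (0 = unset)
}

var errLoopMax = errors.New("loop_max exceeded")

// shouldStop is evaluated after iteration iter (1-based) has finished.
func shouldStop(s loopSpec, iter int) (bool, error) {
	if s.Count > 0 && iter >= s.Count {
		return true, nil
	}
	if s.UntilFile != "" {
		data, err := os.ReadFile(s.UntilFile)
		// An empty UntilFileContains matches any readable file.
		if err == nil && strings.Contains(string(data), s.UntilFileContains) {
			return true, nil
		}
	}
	if s.Max > 0 && iter >= s.Max {
		return false, errLoopMax
	}
	return false, nil
}

func main() {
	dir, err := os.MkdirTemp("", "loop")
	if err != nil {
		panic(err)
	}
	defer os.RemoveAll(dir)

	spec := loopSpec{UntilFile: filepath.Join(dir, "done.txt"), UntilFileContains: "PASS", Max: 5}
	for iter := 1; ; iter++ {
		if iter == 3 { // stand-in for the loop body writing its marker file
			_ = os.WriteFile(spec.UntilFile, []byte("PASS\n"), 0o644)
		}
		stop, err := shouldStop(spec, iter)
		if stop || err != nil {
			fmt.Println("stopped after iteration", iter, "err:", err)
			return
		}
	}
}
```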

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

- activeLoopIteration tracks current iteration across an entire loop body
  so every node execution inside a multi-node loop records a distinct
  attempt number (previously only the jump target got incremented, body
  nodes all recorded attempt=1).
- captureReferencedScripts reads tool_invocation.json, tokenizes argv and
  command fields (handling bash -c "sh script.sh" pattern), and captures
  referenced script files as tool_script:<name> artifacts. The UI shows
  them alongside stdout/stderr in the Detail tab.
- New endpoint GET /runs/{id}/nodes/{nodeId}/attempts returns all attempt
  rows for a node. GET /runs/{id}/nodes/{nodeId}/turns now accepts
  ?attempt=N to load a specific iteration's captured artifacts.
- UI: sidebar shows iteration badges (↻1/5, ↻2/5, ...) when a node has
  multiple attempts. Detail tab shows "Iteration N of M" banner and
  passes n.attempt to the turns fetch so each iteration loads its own
  data. Command and captured scripts render in the Detail view.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

New process-flow-level primitive for running independent node chains in
parallel in the shared workspace. Distinct from the existing parallel
handler (shape=component) which is worktree-isolated and winner-takes-all
for LLM code-gen branching.

- Shapes: pentagon → concurrent.split, cylinder → concurrent.join.
  Paired via concurrent_id attribute (defaults to the split node's ID).
- runConcurrentRegion dispatches each outgoing edge from the split as a
  goroutine running runBranchUntilJoin. All branches share the engine's
  context, DB writer, git worktree, and progress sink. Each node executes
  through the same rundbRecordNodeStart/executeWithRetry/CompleteNode/
  CaptureArtifacts sequence as the main loop.
- Fail-fast: the first branch error cancels the parent context, and
  siblings exit at their next cancellation checkpoint. An optional
  allow_partial=true attribute on the split disables fail-fast.
- Git commits: suppressed for non-sentinel nodes while concurrentDepth > 0.
  Concurrent region is treated as one atomic checkpoint unit.
- Rejects nested concurrent regions and loops inside concurrent regions
  as runtime errors. Graph validation rule can be added later.

Known follow-up: subprocess cancellation does not kill running child
processes (sleep in a cancelled branch runs to completion) — branch
goroutines see the cancelled context but the tool handler's exec doesn't
propagate the kill. Separate lifecycle concern, not specific to the
concurrent primitive.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

…rrent/loop

Subprocess cancellation:
- ToolHandler now runs commands in their own process group via
  setProcessGroupAttr (Setpgid=true) and sets cmd.Cancel to
  forceKillProcessGroup so context cancellation kills the entire process
  tree, not just the shell. Before this fix, a cancelled `bash -c "sleep 20"`
  left sleep as an orphan with the stdout pipe open, and cmd.Wait() blocked
  for the full 20s. Verified with a fail-fast concurrent test: total run
  time dropped from 20.5s to 1.1s.
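
A minimal, Unix-only sketch of the technique, assuming Go 1.20+ (where `exec.Cmd.Cancel` exists). The helper names `runInGroup` and `timedCancel` are illustrative and are not the `setProcessGroupAttr`/`forceKillProcessGroup` functions from this PR. Stdout is attached to a buffer on purpose: that creates a pipe, which is what makes `Wait` block on orphaned children when only the shell is killed.

```go
package main

import (
	"bytes"
	"context"
	"fmt"
	"os/exec"
	"syscall"
	"time"
)

// runInGroup runs script via sh in its own process group and kills the
// whole group when ctx is cancelled.
func runInGroup(ctx context.Context, script string) (string, error) {
	var out bytes.Buffer
	cmd := exec.CommandContext(ctx, "sh", "-c", script)
	cmd.Stdout = &out // a pipe: Wait blocks until every writer has exited
	cmd.SysProcAttr = &syscall.SysProcAttr{Setpgid: true}
	cmd.Cancel = func() error {
		// A negative PID targets the process group led by the shell.
		return syscall.Kill(-cmd.Process.Pid, syscall.SIGKILL)
	}
	err := cmd.Run()
	return out.String(), err
}

// timedCancel cancels script after 200ms and reports how long Run blocked.
func timedCancel(script string) time.Duration {
	ctx, cancel := context.WithTimeout(context.Background(), 200*time.Millisecond)
	defer cancel()
	start := time.Now()
	_, _ = runInGroup(ctx, script)
	return time.Since(start)
}

func main() {
	elapsed := timedCancel("sleep 20; echo done")
	fmt.Println("returned well before the sleep finished:", elapsed < 5*time.Second)
}
```

Removing the `Setpgid` and `Cancel` lines from this sketch should reproduce the symptom described above: the default cancel kills only `sh`, `sleep` keeps the pipe open, and `Run` does not return until it exits.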

Graph validation:
- lintConcurrentSplitMinBranches: concurrent_split requires ≥2 outgoing edges
- lintConcurrentSplitHasJoin: concurrent_split must have a paired concurrent_join
- lintNoNestedConcurrentRegions: concurrent regions cannot be nested
- lintNoLoopsInConcurrentRegions: loops cannot be nested in concurrent regions
- Pairs are matched by concurrent_id attribute (falling back to node ID)
- nodesBetween walks the graph from the split forward to the join to build
  the "inside the region" set
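
One way to compute the "inside the region" set is a breadth-first walk from the split that refuses to expand past the join. The sketch below assumes an adjacency-map graph and a shape lookup table. The function name is borrowed from the commit, but the signature, the `graph` type, and `hasNestedSplit` are hypothetical.

```go
package main

import (
	"fmt"
	"sort"
)

// graph maps a node ID to the IDs of its successors.
type graph map[string][]string

// nodesBetween returns the IDs reachable from split without passing through
// join, excluding both endpoints, in sorted order.
func nodesBetween(g graph, split, join string) []string {
	seen := map[string]bool{split: true, join: true}
	queue := append([]string(nil), g[split]...)
	var inside []string
	for len(queue) > 0 {
		n := queue[0]
		queue = queue[1:]
		if seen[n] {
			continue // already visited, or one of the endpoints
		}
		seen[n] = true
		inside = append(inside, n)
		queue = append(queue, g[n]...)
	}
	sort.Strings(inside)
	return inside
}

// hasNestedSplit reports whether any node inside the region is itself a
// concurrent split (shape=pentagon, per the shape mapping in this PR).
func hasNestedSplit(g graph, shapes map[string]string, split, join string) bool {
	for _, n := range nodesBetween(g, split, join) {
		if shapes[n] == "pentagon" {
			return true
		}
	}
	return false
}

func main() {
	g := graph{
		"split": {"a", "b"},
		"a":     {"a2"},
		"a2":    {"join"},
		"b":     {"join"},
		"join":  {"exit"},
	}
	fmt.Println(nodesBetween(g, "split", "join")) // prints [a a2 b]
	fmt.Println(hasNestedSplit(g, map[string]string{"a2": "pentagon"}, "split", "join"))
}
```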

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Adds a minimal two-node workflow (stage + agent) for kicking off one-shot
investigation runs with --input '{"prompt":...,"context_file":...}'. Three
graph variants route to claude, codex, or gemini via the existing
agent_tool/model_stylesheet mechanism.

Adds `kilroy attractor runs show <id-or-prefix>` with --json, --outputs, and
--print <file> modes so an agentic caller can pull result.md (or any declared
output) back out without poking at the logs directory by hand. runs list
--json now carries worktree_dir, repo_path, run_branch, and logs_root too.

New quick-launch skill gives agents the exact invocation for firing a
one-shot delegated run: --detach --tmux + --package + --label + --input
and the follow-up runs list / runs show / runs show --print flow for
checking status and pulling result.md back out. Structured after the
trycycle subskill style: action-oriented steps, no theory.

using-kilroy was missing several current flags (--package, --tmux,
--label, --input, --workspace, --skip-cli-headless-warning) and had no
coverage of the runs subcommand family, so those gaps are filled in
alongside a pointer to quick-launch for the one-shot case.

Adds skills/quick-launch/commands/kilroy-quick.md as the canonical slash
command file, symlinked into ~/.claude/commands/ and ~/.codex/commands/
at install time. One source of truth, live-editable from the repo.

Updates SKILL.md to reference ~/.local/share/kilroy/workflows/quick-launch
(installed as a symlink) instead of an <ABS_PATH> placeholder, and drops
the --config requirement — kilroy auto-builds a default run config when
cwd is a git repo and auto-detects installed provider CLIs. Verified with
a bare git init + config-less launch.

Driven by feedback from testing /kilroy-quick in Claude. Five changes:

1. --prompt-file <path>: read a file verbatim into the "prompt" input
   key. Replaces hand-escaped multi-line JSON in --input. Strongly
   preferred for anything beyond a one-liner — no \n escapes, no
   quoting hazards.

2. Auto --no-cxdb when --config is absent. The zero-config default run
   config doesn't populate cxdb addresses, so requiring cxdb was just
   noise. Explicit --config with cxdb.binary_addr still enables it.

3. Auto-skip the interactive CLI-backend warning when stdin isn't a
   terminal. Uses mattn/go-isatty because a naive Mode&CharDevice check
   treats /dev/null as a TTY. Agent-driven invocations, CI, pipes, and
   the detach child all hit this path.

4. runs show --latest --label k=v and new runs wait subcommand. show
   returns the most recent matching run; wait polls the run DB until
   the target reaches a terminal state and exits 0/1/2 for
   success/fail/timeout. Both support the same id-or-prefix-or-latest
   target resolution.

5. launchDetached was starting the child with cmd.Dir=logs_root, so
   the detach child's cwd was the logs dir instead of the user's
   workspace — runs reported repo_path pointing at the logs dir and
   worktrees never saw the real files. Parent now forwards its own cwd
   to the child via --workspace when none was passed explicitly.
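
The poll-until-terminal behavior of `runs wait` (item 4) can be sketched as below. The 0/1/2 exit codes come from the commit. The rest is assumed for illustration: the state names "running", "success", and "fail", the `status` callback standing in for the run DB lookup, and the `fakeRun` helper.

```go
package main

import (
	"fmt"
	"time"
)

const (
	exitSuccess = 0
	exitFail    = 1
	exitTimeout = 2
)

// waitForRun polls status until it reports a terminal state or the timeout
// would be exceeded, and returns the matching exit code.
func waitForRun(status func() string, interval, timeout time.Duration) int {
	deadline := time.Now().Add(timeout)
	for {
		switch status() {
		case "success":
			return exitSuccess
		case "fail":
			return exitFail
		}
		if time.Now().Add(interval).After(deadline) {
			return exitTimeout // the next poll would land past the deadline
		}
		time.Sleep(interval)
	}
}

// fakeRun returns a status func that reports "running" n times, then final.
func fakeRun(n int, final string) func() string {
	calls := 0
	return func() string {
		calls++
		if calls <= n {
			return "running"
		}
		return final
	}
}

func main() {
	fmt.Println(waitForRun(fakeRun(2, "success"), 10*time.Millisecond, time.Second))
	fmt.Println(waitForRun(fakeRun(2, "fail"), 10*time.Millisecond, time.Second))
	fmt.Println(waitForRun(fakeRun(1000, "success"), 10*time.Millisecond, 50*time.Millisecond))
}
```

A real CLI would pass the returned code to `os.Exit`, which lets a calling agent or shell script branch on the result without parsing output.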

Quick-launch workflow package simplified to a single agent node. The
previous stage.sh wrote .kilroy/TASK.md, but the engine rewrites that
file before every node; contents got clobbered. Inputs now land in
.kilroy/INPUT.md (written once at run start) and the agent reads from
there directly.

scripts/install-skills.sh wires everything up idempotently: symlinks
for the binary, the workflows dir, and the skills/commands into
~/.claude, ~/.agents (codex's native discovery path — not
~/.codex/skills), and ~/.config/opencode.

Also rebuilds the SKILL.md to document --prompt-file as the default
path for non-trivial tasks, drops the --no-cxdb / --skip-cli-headless-warning
mentions, and points to runs wait / runs show --latest for the
check-status / retrieve-result flow.

Kilroy's isolated codex home used to copy both auth.json and config.toml
from the user's real ~/.codex/ into the kilroy-owned codex state dir. That
leaked user-scoped settings (model_reasoning_effort, personality, model)
into kilroy runs, so a setting that worked for the user's interactive
codex sessions could silently break kilroy runs for specific models —
notably `gpt-5-codex` rejecting the inherited `model_reasoning_effort =
"xhigh"` upstream with a 400 during preflight probes.

Two fixes here:

1. Drop the config.toml copy entirely. Run configuration must come from
   kilroy and the .dot graph, not by accident from whatever the user has
   in ~/.codex/config.toml. If kilroy codex runs need specific settings,
   those belong in the graph or run.yaml.

2. When OPENAI_API_KEY is available in the parent env, write a fresh
   apikey auth.json into the isolated codex home instead of copying
   whatever auth.json the user has. This matches what tmux_handler.go +
   templates/codex.go already does for non-probe runs, so the probe
   stops diverging from the real run: both paths now force apikey mode
   when a key is present.

When no OPENAI_API_KEY is set, kilroy still falls back to copying the
user's auth.json (subscription auth). Probes under that path can't
exercise apikey-only models like gpt-5-codex, but the rest of preflight
still runs against something plausible.

Tests updated: the old assertions on config.toml existence are replaced
with explicit "must-not-exist" checks, and a new test covers the apikey
auth.json write path. Verified end-to-end with a codex graph.codex.dot
quick-launch run (39s, result.md correctly produced).

@mattleaverton merged commit d8c61c0 into danshapiro:main Apr 17, 2026
1 check failed