Platform reframe: layered engine, tmux agents, workflow packages by mattleaverton · Pull Request #81 · danshapiro/kilroy

mattleaverton · 2026-04-16T17:40:22Z

Summary

This PR reframes Kilroy as a layered platform: a core graph-execution engine (L0), an agent-execution layer that drives CLI tools via tmux (L1), and workflow packages that ship as self-contained directories with graphs, scripts, and manifests (L2). It adds the primitives and server infrastructure needed to run multi-agent workflows reliably from a headless process: explicit loop and concurrent split/join nodes, a SQLite run database, a per-run structured activity log, a REST API with SSE streaming, and an embedded dashboard UI. Three ready-to-use workflow packages ship with the branch: quick-launch, pr-review, and build-test.

This unblocks building agentic automation that can be launched from other agents (the quick-launch package is already used that way), observed in real time via the log API, and extended by adding new workflow packages without touching engine code.

What's new

Layered architecture (L0/L1/L2)

Engine package (internal/attractor/engine/) is now L0-only: graph execution, node dispatch, lifecycle hooks. No agent or git specifics inside.
L1 (internal/attractor/agents/) provides TmuxAgentHandler, which is registered as the agent handler at startup.
L2 (internal/attractor/workflows/) provides GitHook, HumanGateHandler, and the package loader — registered at startup in cmd/kilroy/main.go via newLayeredRegistry().
GitOps interface in the engine means git mode is fully optional; the engine runs in plain-directory mode when GitOps is nil, enabling non-git workflows.
codergen has been renamed to agent throughout (types, files, test names).
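
The optional-GitOps behavior described above can be pictured with a small Go sketch. The interface shape and the names GitOps, Commit, Engine, checkpoint, and recordingGit are illustrative assumptions, not the definitions in internal/attractor/engine/. The only point is that a nil interface value switches the engine to plain-directory mode.

```go
package main

import "fmt"

// GitOps is a hypothetical, trimmed-down stand-in for the engine's git interface.
type GitOps interface {
	Commit(message string) error
}

// Engine holds an optional GitOps; nil means plain-directory mode.
type Engine struct {
	Git GitOps
}

// checkpoint commits only when a GitOps implementation is present and
// reports whether a commit was attempted.
func (e *Engine) checkpoint(node string) (bool, error) {
	if e.Git == nil {
		return false, nil // plain-directory mode: nothing to commit
	}
	return true, e.Git.Commit("checkpoint: " + node)
}

// recordingGit is a fake implementation that records commit messages.
type recordingGit struct{ messages []string }

func (g *recordingGit) Commit(message string) error {
	g.messages = append(g.messages, message)
	return nil
}

func main() {
	plain := &Engine{}
	committed, _ := plain.checkpoint("build")
	fmt.Println("plain-directory committed:", committed)

	git := &recordingGit{}
	withGit := &Engine{Git: git}
	committed, _ = withGit.checkpoint("build")
	fmt.Println("git mode committed:", committed, git.messages)
}
```

One caveat with this pattern in Go: a typed nil pointer stored in the interface field is not equal to nil, so callers should leave the field unset rather than assign a nil *recordingGit.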

Workflow packages and CLI flags

--package <dir>: loads a self-contained workflow from a directory containing graph.dot, optional workflow.toml, and scripts/ / prompts/ subdirectories. Package scripts and prompts are materialized into .kilroy/package/ inside the workspace before execution.
--workspace <dir>: sets the execution directory independently of the source repo.
--input <path-or-json>: passes structured inputs as a JSON/YAML file or inline JSON. Values are injected as KILROY_INPUT_<KEY> env vars and written to .kilroy/INPUT.md.
--prompt-file <path>: stages a file as the node prompt.
--label KEY=VALUE: attaches metadata to a run; queryable via runs list and runs show.
--tmux: enables the tmux agent handler for the run.
runs show, runs wait: inspect or wait on a run by ID or prefix; --latest combined with --label filters picks the most recent matching run.
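
As a rough illustration of how --input values could become KILROY_INPUT_<KEY> environment variables, here is a minimal Go sketch. The function name inputEnv and the key normalization (upper-casing, replacing '-' and '.' with '_') are assumptions; the PR text only states that values are injected under that prefix.

```go
package main

import (
	"encoding/json"
	"fmt"
	"sort"
	"strings"
)

// inputEnv converts inline JSON inputs into KILROY_INPUT_<KEY>=value pairs.
// Upper-casing the key and replacing '-' and '.' with '_' is an assumption
// made for this sketch; the real engine may normalize keys differently.
func inputEnv(inlineJSON string) ([]string, error) {
	var inputs map[string]any
	if err := json.Unmarshal([]byte(inlineJSON), &inputs); err != nil {
		return nil, fmt.Errorf("parse inputs: %w", err)
	}
	replacer := strings.NewReplacer("-", "_", ".", "_")
	env := make([]string, 0, len(inputs))
	for key, value := range inputs {
		name := "KILROY_INPUT_" + strings.ToUpper(replacer.Replace(key))
		// Strings are passed through as-is; other values are re-encoded as JSON.
		text, ok := value.(string)
		if !ok {
			encoded, err := json.Marshal(value)
			if err != nil {
				return nil, err
			}
			text = string(encoded)
		}
		env = append(env, name+"="+text)
	}
	sort.Strings(env) // deterministic order for display and tests
	return env, nil
}

func main() {
	env, err := inputEnv(`{"prompt":"say hello","pr_number":81}`)
	if err != nil {
		panic(err)
	}
	for _, kv := range env {
		fmt.Println(kv)
	}
}
```

The resulting slice has the same KEY=value shape that exec.Cmd.Env expects, so it could be appended to a tool node's environment directly.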

Engine primitives

Loop (internal/attractor/engine/loop.go): explicit single-node or multi-node iteration. Termination conditions: loop_count, loop_max, loop_until_file, loop_until_file_contains, loop_while_outcome=fail. Each iteration gets its own attempt number and DB row for observability. Nested loops are rejected at validation.
Concurrent split/join (internal/attractor/engine/concurrent.go): runs independent node chains concurrently in the same workspace, waits for all branches. Configured via concurrent_id node attribute; allow_partial=true permits branch failures. Nested concurrent scopes are rejected.
Output contracts: nodes declare expected output files via outputs= attribute; the engine enforces their existence after the node completes and writes .kilroy/FEEDBACK.md on failure.
.kilroy/ file conventions (internal/attractor/engine/kilroy_files.go): INPUT.md, CONTEXT.md, TASK.md, FEEDBACK.md written by the engine as standard inter-node data files. KILROY_STAGE_STATUS_PATH / KILROY_STAGE_STATUS_FALLBACK_PATH env vars let scripts signal completion status.
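
The output-contract check described above might look roughly like the following Go sketch. The helper names missingOutputs and writeFeedback, and the wording written to FEEDBACK.md, are invented for illustration; only the outputs= attribute and the .kilroy/FEEDBACK.md path come from the description.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
	"strings"
)

// missingOutputs returns the declared outputs that do not exist under workspace.
// declared mimics a comma-separated outputs= attribute value.
func missingOutputs(workspace, declared string) []string {
	var missing []string
	for _, name := range strings.Split(declared, ",") {
		name = strings.TrimSpace(name)
		if name == "" {
			continue
		}
		if _, err := os.Stat(filepath.Join(workspace, name)); err != nil {
			missing = append(missing, name)
		}
	}
	return missing
}

// writeFeedback records the missing files in .kilroy/FEEDBACK.md.
// The wording of the feedback file is invented for this sketch.
func writeFeedback(workspace string, missing []string) error {
	dir := filepath.Join(workspace, ".kilroy")
	if err := os.MkdirAll(dir, 0o755); err != nil {
		return err
	}
	body := "Missing declared outputs:\n"
	for _, name := range missing {
		body += "- " + name + "\n"
	}
	return os.WriteFile(filepath.Join(dir, "FEEDBACK.md"), []byte(body), 0o644)
}

func main() {
	workspace, err := os.MkdirTemp("", "kilroy-outputs-")
	if err != nil {
		panic(err)
	}
	defer os.RemoveAll(workspace)

	// Only one of the two declared outputs is produced.
	if err := os.WriteFile(filepath.Join(workspace, "result.md"), []byte("ok\n"), 0o644); err != nil {
		panic(err)
	}
	missing := missingOutputs(workspace, "result.md, review-report.md")
	if len(missing) > 0 {
		if err := writeFeedback(workspace, missing); err != nil {
			panic(err)
		}
	}
	fmt.Println("missing:", missing)
}
```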

Agent execution via tmux

TmuxAgentHandler (internal/attractor/agents/tmux_handler.go) spawns each agent node in a named tmux session on the kilroy socket. Sessions persist after completion for inspection; pane output is tailed in real time.
Session manager (internal/attractor/agents/tmux/) handles create, destroy, wait, and health checks. Process group kill (SIGTERM then SIGKILL) on context cancel — no orphan processes.
Invocation templates (internal/attractor/agents/templates/) define per-tool startup, env, and structured-output behavior for claude, codex, opencode, and gemini. Adding a new CLI tool requires only a new template, no handler code.
Auth isolation: codex sessions write a per-run auth.json into an isolated CODEX_HOME; the user's ~/.codex/config.toml is explicitly excluded so user-local settings (model, reasoning effort) cannot bleed into kilroy runs.
Conversation log parsers (internal/attractor/agents/agentlog/): structured AgentEvent extraction from Claude JSONL, Codex JSONL, and OpenCode JSONL. A live tailer reads the log during execution and emits events to the RunLog.

RunLog and observability

RunLog (internal/attractor/engine/runlog.go): newline-delimited JSON activity log written to {run_logs_root}/run.log. Every lifecycle event (run start/end, node start/end, edge decision, checkpoint, git activity, tool stdout/stderr line, agent conversation event) is emitted as a timestamped RunLogEvent.
GET /runs/{id}/log: HTTP endpoint that serves the run.log with optional query filters (?node=, ?source=, ?event=, ?since=, ?tail=N) and SSE streaming (?stream=true) for live tail without polling.
Git activity events (worktree.created, commit) appear in the RunLog so the full timeline is in one place.
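
To make the ?node= and ?tail=N filters concrete, here is a minimal Go sketch that applies them to newline-delimited JSON. The RunLogEvent fields, their JSON keys, and the sample event names are guesses for illustration; the real struct in runlog.go is not shown in this PR text.

```go
package main

import (
	"bufio"
	"encoding/json"
	"fmt"
	"strings"
)

// RunLogEvent is a guessed, minimal event shape; the real struct has more
// fields and may use different JSON keys.
type RunLogEvent struct {
	Time    string `json:"ts"`
	Node    string `json:"node"`
	Event   string `json:"event"`
	Message string `json:"message,omitempty"`
}

// filterLog parses NDJSON, keeps events for node (all nodes when node is ""),
// and returns only the last tail events (all of them when tail <= 0).
func filterLog(ndjson, node string, tail int) ([]RunLogEvent, error) {
	var kept []RunLogEvent
	scanner := bufio.NewScanner(strings.NewReader(ndjson))
	for scanner.Scan() {
		line := strings.TrimSpace(scanner.Text())
		if line == "" {
			continue
		}
		var ev RunLogEvent
		if err := json.Unmarshal([]byte(line), &ev); err != nil {
			return nil, fmt.Errorf("bad log line %q: %w", line, err)
		}
		if node == "" || ev.Node == node {
			kept = append(kept, ev)
		}
	}
	if err := scanner.Err(); err != nil {
		return nil, err
	}
	if tail > 0 && len(kept) > tail {
		kept = kept[len(kept)-tail:]
	}
	return kept, nil
}

// sampleLog uses made-up event names purely to exercise the filter.
const sampleLog = `{"ts":"2026-04-16T17:40:22Z","node":"build","event":"node.start"}
{"ts":"2026-04-16T17:40:23Z","node":"build","event":"tool.stdout","message":"ok"}
{"ts":"2026-04-16T17:40:24Z","node":"review","event":"node.start"}
{"ts":"2026-04-16T17:40:25Z","node":"build","event":"node.end"}
`

func main() {
	events, err := filterLog(sampleLog, "build", 2)
	if err != nil {
		panic(err)
	}
	for _, ev := range events {
		fmt.Println(ev.Time, ev.Node, ev.Event)
	}
}
```

Applying the node filter before the tail cut means ?tail=N counts matching events rather than raw log lines; whether the server orders the filters this way is an assumption.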

Server and REST API

internal/server/ now covers a full run-management API:
- POST /runs: accepts a workflow package, inputs, workspace, and labels to start a run.
- GET /runs, GET /runs/{id}: run list and detail from the RunDB (no filesystem-only fallback for completed runs).
- GET /runs/{id}/log: RunLog with SSE.
- GET /runs/{id}/nodes/{nodeId}/diff: per-node git diff.
- GET /runs/{id}/files/{path...}, GET /runs/{id}/workspace/{path...}: file browser.
- GET /workflows: lists available packages.
- /pipelines/... kept as a backward-compatibility alias.
Embedded dashboard UI at /ui/ (internal/server/ui/index.html, viz.js, viz-render.js) — compiled into the binary via //go:embed. No separate asset server needed.
Prefix ID resolution: GET /runs/01KPB resolves to the unique matching run, the same way runs show 01KPB does in the CLI.
SQLite WAL mode and 5-second busy timeout (internal/attractor/rundb/rundb.go) for safe concurrent server writes.
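
Prefix ID resolution can be sketched as a unique-prefix lookup. This Go example is a minimal illustration under assumptions: the function name resolvePrefix, the error values, the treatment of ambiguous prefixes as an error, and the sample run IDs are all hypothetical, and the real server resolves prefixes with a DB query rather than a slice scan.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

var (
	errNoMatch   = errors.New("no run matches prefix")
	errAmbiguous = errors.New("prefix matches more than one run")
)

// resolvePrefix returns the single run ID that starts with prefix.
func resolvePrefix(ids []string, prefix string) (string, error) {
	match := ""
	for _, id := range ids {
		if !strings.HasPrefix(id, prefix) {
			continue
		}
		if match != "" {
			return "", errAmbiguous
		}
		match = id
	}
	if match == "" {
		return "", errNoMatch
	}
	return match, nil
}

func main() {
	// Made-up run IDs for illustration only.
	ids := []string{"01KPB7XAMPLE0001", "01KPC2XAMPLE0002", "01KPC9XAMPLE0003"}
	fmt.Println(resolvePrefix(ids, "01KPB"))
	fmt.Println(resolvePrefix(ids, "01KPC"))
}
```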

Workflows shipped

workflows/quick-launch/: single-agent task runner. Accepts prompt, optional context_file or context. Three graph variants: graph.dot (Claude), graph.codex.dot (Codex), graph.gemini.dot (Gemini). Ships with a kilroy slash-command skill and install script.
workflows/pr-review/: full PR review pipeline — checkout, build/test, per-file code review (Claude), holistic review, combined report. Accepts pr_repo and pr_number inputs; emits review-report.md.
workflows/build-test/: build-and-test workflow for CI-style validation.
workflows/multi-tool-exercise/: multi-agent graph used to validate concurrent tool execution and observability.

Reliability fixes

Subprocess group kill on cancel: tool handler spawns each subprocess in its own process group (Setpgid: true) and sends SIGTERM to the group on context cancellation, falling back to SIGKILL if needed (internal/attractor/engine/process_group_unix.go). Prevents orphaned child processes on run cancel.
Validation rejects nested concurrent and loop scopes at graph load time.
Stall watchdog now resets from the agent log tailer's last-seen event, not just stdout — prevents false stall detection during long LLM reasoning pauses.
Run failure is recorded in the RunDB on engine error return (previously only terminal nodes wrote their own state).
Server reconciles in-flight runs as stale on startup to prevent permanently pending entries after a crash.

Breaking changes

codergen is renamed to agent throughout internal types and test file names. Any code outside this repo that imports internal/attractor/engine types named Codergen* will need to be updated.
--graph remains the primary flag; --package is additive. No existing graph-based invocations change.
The /pipelines/ server API is retained as an alias but the canonical path is /runs/.
internal/attractor/workflows/ now contains HumanGateHandler (moved out of the engine package). Callers that registered a wait.human handler directly into the engine registry will need to use the L2 package instead, or register workflows.NewHumanGateHandler() themselves.

Known gaps and follow-ups

The supervisor prototype (internal/attractor/workflows/supervisor.go) is present but not wired to any endpoint or scheduler yet.
Windows process-group kill is a no-op stub (process_group_windows.go); subprocess cleanup on cancel is not guaranteed on Windows.
The dashboard UI is functional for run inspection but does not yet support launching runs or answering human-gate questions from the browser.
The CXDB integration boundary is documented in engine/cxdb_hook.go, but CXDB remains embedded in the engine; extraction to a separate service is a planned follow-up.
Gemini support in quick-launch (graph.gemini.dot) exists but the Gemini template is minimal and untested end-to-end.

Test plan

go build ./... compiles cleanly.
go test ./... passes (unit and integration tests; integration tests require external services to be available or will skip).
kilroy attractor run --graph demo/tmux-agent-test/graph.dot --tmux completes without error; check that the tmux session is cleaned up.
kilroy attractor run --package workflows/quick-launch --input '{"prompt":"say hello"}' --workspace /tmp/ql-test --tmux produces a result.md in the workspace.
kilroy attractor runs list shows the run; kilroy attractor runs show --latest prints its detail.
kilroy attractor serve starts; visit http://localhost:8080/ui/ and confirm the dashboard loads and lists the completed runs.
GET http://localhost:8080/runs/<id>/log returns the NDJSON log; add ?stream=true and confirm it stays open while a run is in progress.
Run a graph with a loop_count=3 node and verify three distinct attempt rows appear in runs show --json output.
Cancel a running job mid-execution and confirm no orphan tmux sessions or subprocesses remain (tmux -L kilroy ls should list no sessions).

…racts
From first real PR review workflow run:
- Phase 0.9: config/auto-detect conflict, require_clean default, missing env vars in tool nodes, CLI headless warning, error message hints, worktree file-not-found context
- Phase 3.7-3.9: run input contract (--input), output contract, node data passing conventions
- Future work: iteration patterns (dynamic loops over collections)
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Setup/build/test tool nodes + single review agent node. Experimental workflow for automated PR triage. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Setup script now copies build-test.sh to .ai/ before gh pr checkout changes the branch and removes workflow files. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

- Raw git checkout (no gh pr checkout) with unique branch names for parallel safety
- Separate investigate (exploratory, full tool access) and decide (directive, no tools) agents
- Tighter output contract: next actions instead of follow-up tasks
- Setup script preserves all workflow scripts to .ai/ before branch switch
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

- Absolute path for setup script (works from any repo's worktree)
- Remove Go-specific assumptions from investigate prompt
- Agent discovers build system and runs appropriate checks
- Add freshell run config
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

…platform Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Create agents/ (L1) and workflows/ (L2) package directories. Split NewDefaultRegistry into NewCoreRegistry (L0-only) + NewDefaultRegistry (backward compat). Export engine types and methods needed by external handler packages: StatusSource, FallbackStatusPath, StageStatusContract, Truncate, WarnEngine, BuildFidelityPreamble, ClassifyAPIError, etc. Add Engine accessor methods: AppendProgress, CXDBPrompt, CXDBInterviewStarted/Completed/Timeout, LastResolvedFidelity, SetDefault. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Add Registry field to RunOptions so cmd/kilroy/ can pass a pre-composed handler registry. cmd/kilroy/ now creates a layered registry:
- L0: engine.NewCoreRegistry() (start, exit, conditional, tool, parallel, fan_in)
- L1: agents.AgentHandler (codergen/default)
- L2: workflows.HumanGateHandler, workflows.ManagerLoopHandler
Type aliases in agents/ and workflows/ establish the package structure and import direction. Implementations remain in engine/ until Phases 2-3.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

TestCoreRegistry_ToolOnlyGraph demonstrates that a graph using only tool_command nodes executes successfully with NewCoreRegistry (no L1 agent handler or L2 workflow handlers registered). Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Rename Go types: CodergenBackend→AgentBackend, CodergenRouter→AgentRouter, SimulatedCodergenBackend→SimulatedAgentBackend, etc. Rename handler type string: "codergen"→"agent" in registry. Rename DOT attribute: agent_mode replaces codergen_mode (with fallback). Rename files: codergen_*.go → agent_*.go. Update comments, diagnostics, test names. The CodergenHandler type name is retained in engine/ as the implementation type (aliased as agents.AgentHandler). It will be renamed when the implementation moves to agents/ in Phase 2.1. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

New package internal/attractor/rundb backed by modernc.org/sqlite (pure Go, no CGO). Global DB at ~/.local/state/kilroy/runs.db. Features:
- Auto-applying numbered SQL migrations on DB open
- WAL mode for concurrent reads
- Schema: runs (with labels, inputs, timing), node_executions (with attempts), edge_decisions, provider_selections
- Write ops: InsertRun, CompleteRun, InsertNodeStart, CompleteNode, InsertEdgeDecision, InsertProviderSelection
- Read ops: GetRun, LatestRun, ListRuns (filter by status/labels/graph), GetNodeExecutions
- Prune ops: PruneRuns (by date, graph, labels, or orphaned logs_root)
- Cascade deletes: child records deleted with parent run
- 16 tests covering all operations
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Add RunDBWriter interface in engine/ (no rundb import needed). Engine records to RunDB at every lifecycle point:
- run start (after worktree ready)
- node start/complete (each execution)
- edge selection decisions
- run complete (success/failure)
All RunDB calls are best-effort (warn on error, never block). cmd/kilroy/ opens global DB and passes via RunOptions.RunDB. Integration test proves tool graph produces correct DB entries. Also fix RunOptions propagation: RunDB, Registry, Labels now properly forwarded through bootstrapRunWithConfig overrides.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

- runs list: queries RunDB first, falls back to filesystem scan. Now shows duration column.
- runs prune: delegates to RunDB with filter support (before, graph, labels, orphans).
- status --latest: instant lookup via RunDB.LatestRun() instead of filesystem scan.
All commands gracefully fall back to filesystem-based behavior when the RunDB is unavailable or empty.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Structured inputs for graph runs via --input flag. Features:
- LoadInputFile (YAML/JSON) and LoadInputString (inline JSON)
- Graph declares required inputs via inputs="key1,key2" attribute
- ValidateRequiredInputs rejects runs with missing declared inputs
- Input values injected into context as input.* keys
- Input values expanded in prompts as $input.key placeholders
- Input values available as KILROY_INPUT_* env vars in tool_command nodes
- Inputs recorded in RunDB
Integration test proves tool_command nodes see KILROY_INPUT_* env vars.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
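
The required-input check and $input.key expansion from this commit could be approximated as below. This is a sketch, not the repository's ValidateRequiredInputs: the helper names missingInputs and expandPrompt, the allowed key characters, and the choice to leave unknown placeholders untouched are all assumptions.

```go
package main

import (
	"fmt"
	"regexp"
	"sort"
	"strings"
)

// missingInputs returns declared input keys (comma-separated, as in an
// inputs="key1,key2" attribute) that are absent from provided.
func missingInputs(declared string, provided map[string]string) []string {
	var missing []string
	for _, key := range strings.Split(declared, ",") {
		key = strings.TrimSpace(key)
		if key == "" {
			continue
		}
		if _, ok := provided[key]; !ok {
			missing = append(missing, key)
		}
	}
	sort.Strings(missing)
	return missing
}

// placeholder matches $input.<key>; the allowed key characters are an assumption.
var placeholder = regexp.MustCompile(`\$input\.([A-Za-z0-9_]+)`)

// expandPrompt substitutes known $input.key placeholders and leaves unknown
// ones untouched.
func expandPrompt(prompt string, provided map[string]string) string {
	return placeholder.ReplaceAllStringFunc(prompt, func(match string) string {
		key := strings.TrimPrefix(match, "$input.")
		if value, ok := provided[key]; ok {
			return value
		}
		return match
	})
}

func main() {
	inputs := map[string]string{"pr_repo": "danshapiro/kilroy", "pr_number": "81"}
	fmt.Println(missingInputs("pr_repo, pr_number, reviewer", inputs))
	fmt.Println(expandPrompt("Review PR #$input.pr_number in $input.pr_repo.", inputs))
}
```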

Separates graph file location from execution location:
- --workspace /path/to/dir sets the execution directory
- --graph /path/to/graph.dot determines where prompt_file resolves
- When --workspace is omitted, defaults to current behavior
- GraphDir derived from --graph path for prompt_file resolution
- Workspace flows through RunOptions → engine → worktree creation
PrepareOptions gains GraphDir field that takes precedence over RepoPath for prompt_file resolution, enabling cross-repo workflows where graphs and scripts live separately from the target project.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Graphs declare expected output artifacts via outputs="file1,file2" attribute. After run completion, the engine:
- Searches for declared outputs in the worktree
- Copies found outputs to {logs_root}/outputs/
- Writes outputs.json manifest with found/missing status and file sizes
- Emits warnings for missing declared outputs (not errors)
- Records output collection in progress events
Hooked into persistTerminalOutcome so outputs are collected on every run completion (success or failure).
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Add canonical /runs endpoints per platform-reframe plan:
- GET /runs: list runs from RunDB (with status/graph filters)
- GET /runs/{id}/outputs: list collected output artifacts
Existing /pipelines endpoints retained as backward-compat aliases. All /runs endpoints mirror /pipelines for submit, status, events, cancel, context, and questions.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Run lifecycle management additions:
- --label KEY=VALUE flag on attractor run (stored in RunDB)
- --older-than 7d duration-based prune filter (supports d/h/m units)
- Labels passed through to RunDB via RunOptions
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

New packages for Layer 1 agent capabilities:
agents/tmux/ (tmux session management):
- Session creation with two-step pattern (shell → respawn-pane)
- Input delivery with sanitization, chunking, and Enter verification
- Output capture with NBSP normalization for prompt detection
- Readiness/idle/exit detection via polling with busy indicators
- Process tree cleanup (SIGTERM → grace → SIGKILL)
- Socket isolation (kilroy-specific tmux socket)
- Session environment variable storage for metadata
- 11 integration tests against real tmux on isolated socket
agents/templates/ (per-tool invocation templates):
- Template struct with per-tool config (args, env, prompt prefix, busy indicators, startup dialogs, exit behavior)
- Built-in templates: claude, codex, gemini, opencode
- Template registry for name-based lookup
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

TestSmoke_Claude_PrintMode spawns Claude Code via tmux in --print mode, waits for exit, and captures output. Verified working: Claude returns KILROY_SMOKE_OK, session exits with status 0. Tests skip gracefully when API keys or CLI tools are unavailable. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

TmuxAgentHandler implements engine.Handler and orchestrates the full agent lifecycle via tmux sessions:
1. Resolve tool template from node attributes (agent_tool or llm_provider)
2. Build command and environment from template
3. Create tmux session on isolated socket
4. Handle startup dialogs (trust prompts, permission warnings)
5. Wait for completion (exit-based or idle-detection)
6. Capture output and build outcome
7. Clean up session and process tree
Two agent handlers now available:
- AgentHandler: existing subprocess/API backend (backward compat)
- TmuxAgentHandler: tmux-based CLI sessions (new)
Also adds SendKeys method to tmux.Manager for dialog interaction.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Wire TmuxAgentHandler into cmd/kilroy/ via --tmux flag. Add exit code detection via tmux #{pane_dead_status}. Three integration test scenarios verified against real Claude via tmux:
1. Simple agent task: Claude creates a file, tool node verifies it exists → status: success, KILROY_TMUX_TEST_PASS confirmed
2. Multi-node pipeline: Claude writes calc.sh, tool node executes it, conditional routes on result → 42 computed, routed to success exit
3. Failure routing: Agent succeeds, tool node intentionally fails (cat nonexistent file), conditional routes to fail exit → correct routing
All three runs recorded in RunDB with timing. TmuxAgentHandler correctly:
- Spawns Claude in tmux sessions on isolated socket
- Passes prompt and environment variables
- Captures output and exit codes
- Detects failures via non-zero exit codes
- Cleans up sessions and process trees
Also adds 3 unit tests with fake agent scripts proving handler contract: success path, failure detection (exit code 1), and workdir file creation.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Add event envelope canonicalization task (3.6), workflow.toml concept for packages, run retro idea, and testing emphasis notes from Phase 2 review. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Defines the GitOps interface that encapsulates all git operations the engine needs. When nil, the engine will operate in plain-directory mode. Added GitOps field to RunOptions and Engine struct. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

GitHook implements the engine.GitOps interface, wrapping gitutil functions for worktree isolation, per-node commits, and branch management. This is the Layer 2 implementation that will replace direct gitutil calls in the engine. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

The engine package now has zero direct gitutil imports. All git operations go through the GitOps interface, which is optional. When GitOps is nil, the engine operates in plain-directory mode: no worktrees, no commits, no branch management. Key changes:
- engine.run() conditionally sets up git workspace via GitOps
- checkpoint() only commits when GitOps is set
- parallel_handlers use GitOps for branch workspace isolation, falling back to temp directory copy for no-git mode
- resume uses GitOps for worktree recreation
- config_defaults accepts GitOps parameter
- cmd/kilroy/ creates GitHook and wires it through
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

GitOps is now auto-detected: if the workspace (or cwd when no workspace is specified) is a git repo, git worktrees and commits are enabled. Otherwise, runs proceed in plain-directory mode. DefaultRunConfig now accepts an explicit repoPath parameter so --workspace correctly routes to the git repo. Tested against real binary:
- Graph in git repo: worktree + commits created
- Graph in plain dir: runs successfully without git
- Graph with --workspace to git repo: worktree + commits created
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Adds a pluggable AutoDetectGitOps factory function that eng.run() and bootstrapRunWithConfig call when GitOps is not explicitly set. This preserves backward compatibility: existing callers that set RepoPath to a git repo automatically get git worktree behavior. Fixes:
- Branch engine now inherits GitOps from parent (parallel commits work)
- TestRun_FailsWhenNotAGitRepo renamed to TestRun_SucceedsInNonGitDir (non-git dirs are now valid, the intended Phase 3.1 behavior)
- cmd/kilroy/ registers AutoDetectGitOps at init time
- Test TestMain registers testGitOps auto-detection
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Serves the canonical run.log with query params: ?node=, ?source=, ?event=, ?since=, ?tail= for filtered reads, and ?stream=true for live SSE tailing via polling. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Each template gets a LogLocator that finds the CLI tool's conversation log, and a parser that extracts tool_call, tool_result, text, and thinking events. The tmux handler emits parsed events to RunLog after agent completion. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Emits worktree.created when the run worktree is set up, and commit events with diff stats when recordNodeDiff finds file changes. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Move rundbRecordProviderIfAgent to after executeWithRetry so provider/model attrs are populated. Pass llm_model from node attributes through to CLI tool --model flags. Record agent_tool as the backend in provider_selections. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Exercises input/output contracts, .kilroy/ convention files, agent_tool routing across claude/codex/opencode, edge conditions, and run log events. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

- Claude: add --bare flag to skip keychain/OAuth, rely purely on ANTHROPIC_API_KEY env var. Removes need for startup dialog handling.
- Codex: use exec subcommand with --full-auto --skip-git-repo-check. Write isolated auth.json under CODEX_HOME per session so codex uses the API key without touching ~/.codex/.
- OpenCode: add --format json --pure --dir flags. Inject provider config via OPENCODE_CONFIG_CONTENT env var for keyless config isolation.
Add PrepareSession hook to Template for per-tool filesystem setup before tmux session creation.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

The CLI preflight probe uses the old subprocess invocation path which doesn't match tmux template auth isolation. Skip it when the caller knows the tools are configured. Also fix codex node to use o3-mini model (can't probe claude model on the openai provider). Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

- Claude: normalize dots to dashes in model ID (claude-sonnet-4.6 → claude-sonnet-4-6) since Claude CLI uses dash format.
- Codex: auth.json auth_mode must be lowercase "apikey" not "ApiKey".
- OpenCode: model format is provider/model (anthropic/claude-sonnet-4-6), add prefix and normalize dots.
Also fix SkipPreflight not propagating through RunOptions override copy.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
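
The model ID normalization in this commit comes down to two small string transforms. The Go sketch below uses hypothetical helper names (claudeModelID, openCodeModelID), and the decision to leave an existing provider prefix in place is an assumption rather than something the commit states.

```go
package main

import (
	"fmt"
	"strings"
)

// claudeModelID converts dotted versions to the dash form, e.g.
// claude-sonnet-4.6 becomes claude-sonnet-4-6.
func claudeModelID(model string) string {
	return strings.ReplaceAll(model, ".", "-")
}

// openCodeModelID returns provider/model with dots normalized. A model that
// already carries a provider prefix keeps it (an assumption of this sketch).
func openCodeModelID(provider, model string) string {
	model = strings.ReplaceAll(model, ".", "-")
	if strings.Contains(model, "/") {
		return model
	}
	return provider + "/" + model
}

func main() {
	fmt.Println(claudeModelID("claude-sonnet-4.6"))
	fmt.Println(openCodeModelID("anthropic", "claude-sonnet-4.6"))
}
```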

o3-mini doesn't support codex's web_search_preview tool. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

--full-auto implied web_search_preview which most models reject. Use --sandbox workspace-write directly. Switch to gpt-5.4-nano which supports codex's tool set. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Each CLI template now produces structured output (stream-json for claude, --json for codex, --format json for opencode). The handler redirects stdout to {stageDir}/agent_output.jsonl and parses it directly — no more hunting through tool-specific log directories. LogLocator remains as fallback for non-structured-output modes. Response text is extracted from the JSONL for response.md. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Codex defaults web_search="cached" which sends web_search_preview tool on every request. Most small models reject it. Disable it since Kilroy agents don't need web search. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Parsers were written speculatively. Now matched to real output:
- Codex: item.completed/started with agent_message, command_execution
- OpenCode: tool_use with nested part.state, text events
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

- Fix .git detection to use -e (file or dir) for worktree support
- Convert prompt_file to inline prompts with KILROY_STAGE_STATUS_PATH
- Add output contract declarations on agent nodes
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Add TailJSONL that watches agent_output.jsonl and emits events to RunLog as lines appear. Refactor parsers to expose per-line functions (ParseClaudeLine, ParseCodexLine, ParseOpenCodeLine) used by both the tailer and batch parsing. The tailer starts when the tmux session is created and stops when the agent exits. Events flow through RunLog to the SSE endpoint in real time, so the UI sees agent tool calls as they happen rather than in a batch after completion. Falls back to batch parsing when structured output isn't available. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

When structured output is redirected to a file, the tmux pane is empty and the stall watchdog sees no progress. The real-time tailer now calls TickStallWatchdog on each parsed event, keeping the watchdog alive during agent execution. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

When eng.run() returns an error (stall watchdog, context cancellation, etc.), the run stayed as "running" in the DB forever. Now RunWithConfig records a fail status before returning the error. The existing ReconcileStaleRuns on server startup handles panics and unclean exits as a safety net. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

…cel fix
- SQLite: add busy_timeout(5000), synchronous(normal), SetMaxOpenConns(1) so concurrent detached runs don't silently fail to register in the DB
- Detach: forward --input, --workspace, --package, --tmux, --skip-preflight, --label flags to child process; resolve relative paths to absolute
- Invocation capture: record os.Args and run config in manifest.json and runs DB (new migration 004); expose in API response
- Prefix ID matching: GET /runs/{short-id} resolves to full run ID via DB prefix query and in-memory registry scan
- Cancel: fall back to PID-based SIGTERM for CLI-launched detached runs that aren't in the server's in-memory pipeline registry
- AGENTS.md: document backend:cli vs api, --tmux flag, correct run config
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Bundles the single-file Kilroy dashboard SPA (index.html + Graphviz WASM worker) into the binary via //go:embed so `kilroy attractor serve` exposes a working dashboard at http://localhost:9700/ui/ with no extra processes or CORS configuration. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

- Artifact capture: new node_execution_artifacts table (migration 005). At rundbRecordNodeComplete, ingest stage files (prompt, response, agent_output.jsonl, events.ndjson, status, stdout/stderr, tool_timing, etc.) as blobs keyed by node_execution_id. Each retry and each loop iteration gets its own DB row + captured artifacts, fixing retry history loss and enabling loop iteration history.
- handleGetNodeTurns now serves from DB first with filesystem fallback for legacy runs. Response includes source="db"|"filesystem" so the UI can tell.
- Loop primitive: new trapezium (loop.begin) and invtrapezium (loop.end) node shapes for multi-node loops, plus loop_count/loop_until_file/loop_until_file_contains/loop_max attributes on any node for single-node loops. Termination evaluated after each iteration; loop_max exceeded fails the run. Separate from existing loop_restart which only handles transient_infra failure restarts.
- Loop iterations tracked in Engine.loopIterations so each iteration gets a distinct attempt number in node_executions (currently on the loop-back target; body-node attempt numbering is a UX follow-up).
- Label filtering wired end to end: GET /runs?label=KEY=VALUE&limit=N and kilroy attractor runs list --label --status --graph --limit. Underlying DB filter already existed; just surfaced to API and CLI.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

- activeLoopIteration tracks current iteration across an entire loop body so every node execution inside a multi-node loop records a distinct attempt number (previously only the jump target got incremented, body nodes all recorded attempt=1).
- captureReferencedScripts reads tool_invocation.json, tokenizes argv and command fields (handling bash -c "sh script.sh" pattern), and captures referenced script files as tool_script:<name> artifacts. The UI shows them alongside stdout/stderr in the Detail tab.
- New endpoint GET /runs/{id}/nodes/{nodeId}/attempts returns all attempt rows for a node. GET /runs/{id}/nodes/{nodeId}/turns now accepts ?attempt=N to load a specific iteration's captured artifacts.
- UI: sidebar shows iteration badges (↻1/5, ↻2/5, ...) when a node has multiple attempts. Detail tab shows "Iteration N of M" banner and passes n.attempt to the turns fetch so each iteration loads its own data. Command and captured scripts render in the Detail view.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

New process-flow-level primitive for running independent node chains in parallel in the shared workspace. Distinct from the existing parallel handler (shape=component) which is worktree-isolated and winner-takes-all for LLM code-gen branching.

- Shapes: pentagon → concurrent.split, cylinder → concurrent.join. Paired via concurrent_id attribute (defaults to the split node's ID).
- runConcurrentRegion dispatches each outgoing edge from the split as a goroutine running runBranchUntilJoin. All branches share the engine's context, DB writer, git worktree, and progress sink. Each node executes through the same rundbRecordNodeStart/executeWithRetry/CompleteNode/CaptureArtifacts sequence as the main loop.
- Fail-fast: first branch error cancels the parent context, siblings exit at their next cancellation checkpoint. Optional allow_partial=true attribute on the split disables fail-fast.
- Git commits: suppressed for non-sentinel nodes while concurrentDepth > 0. Concurrent region is treated as one atomic checkpoint unit.
- Rejects nested concurrent regions and loops inside concurrent regions as runtime errors. Graph validation rule can be added later.

Known follow-up: subprocess cancellation does not kill running child processes (sleep in a cancelled branch runs to completion): branch goroutines see the cancelled context but the tool handler's exec doesn't propagate the kill. Separate lifecycle concern, not specific to the concurrent primitive.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

…rrent/loop

Subprocess cancellation:
- ToolHandler now runs commands in their own process group via setProcessGroupAttr (Setpgid=true) and sets cmd.Cancel to forceKillProcessGroup so context cancellation kills the entire process tree, not just the shell. Before this fix, a cancelled `bash -c "sleep 20"` left sleep as an orphan with the stdout pipe open, and cmd.Wait() blocked for the full 20s. Verified with a fail-fast concurrent test: total run time dropped from 20.5s to 1.1s.

Graph validation:
- lintConcurrentSplitMinBranches: concurrent_split requires ≥2 outgoing edges
- lintConcurrentSplitHasJoin: concurrent_split must have a paired concurrent_join
- lintNoNestedConcurrentRegions: concurrent regions cannot be nested
- lintNoLoopsInConcurrentRegions: loops cannot be nested in concurrent regions
- Pairs are matched by concurrent_id attribute (falling back to node ID)
- nodesBetween walks the graph from the split forward to the join to build the "inside the region" set

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Adds a minimal two-node workflow (stage + agent) for kicking off one-shot investigation runs with --input '{"prompt":...,"context_file":...}'. Three graph variants route to claude, codex, or gemini via the existing agent_tool/model_stylesheet mechanism.

Adds `kilroy attractor runs show <id-or-prefix>` with --json, --outputs, and --print <file> modes so an agentic caller can pull result.md (or any declared output) back out without poking at the logs directory by hand. runs list --json now carries worktree_dir, repo_path, run_branch, and logs_root too.

New quick-launch skill gives agents the exact invocation for firing a one-shot delegated run: --detach --tmux + --package + --label + --input and the follow-up runs list / runs show / runs show --print flow for checking status and pulling result.md back out. Structured after the trycycle subskill style: action-oriented steps, no theory.

using-kilroy was missing several current flags (--package, --tmux, --label, --input, --workspace, --skip-cli-headless-warning) and had no coverage of the runs subcommand family, so those gaps are filled in alongside a pointer to quick-launch for the one-shot case.

Adds skills/quick-launch/commands/kilroy-quick.md as the canonical slash command file, symlinked into ~/.claude/commands/ and ~/.codex/commands/ at install time. One source of truth, live-editable from the repo.

Updates SKILL.md to reference ~/.local/share/kilroy/workflows/quick-launch (installed as a symlink) instead of an <ABS_PATH> placeholder, and drops the --config requirement: kilroy auto-builds a default run config when cwd is a git repo and auto-detects installed provider CLIs. Verified with a bare git init + config-less launch.

Driven by feedback from testing /kilroy-quick in Claude. Five changes:

1. --prompt-file <path>: read a file verbatim into the "prompt" input key. Replaces hand-escaped multi-line JSON in --input. Strongly preferred for anything beyond a one-liner: no \n escapes, no quoting hazards.
2. Auto --no-cxdb when --config is absent. The zero-config default run config doesn't populate cxdb addresses, so requiring cxdb was just noise. Explicit --config with cxdb.binary_addr still enables it.
3. Auto-skip the interactive CLI-backend warning when stdin isn't a terminal. Uses mattn/go-isatty because a naive Mode&CharDevice check treats /dev/null as a TTY. Agent-driven invocations, CI, pipes, and the detach child all hit this path.
4. runs show --latest --label k=v and new runs wait subcommand. show returns the most recent matching run; wait polls the run DB until the target reaches a terminal state and exits 0/1/2 for success/fail/timeout. Both support the same id-or-prefix-or-latest target resolution.
5. launchDetached was starting the child with cmd.Dir=logs_root, so the detach child's cwd was the logs dir instead of the user's workspace. As a result, runs reported repo_path pointing at the logs dir and worktrees never saw the real files. Parent now forwards its own cwd to the child via --workspace when none was passed explicitly.

Quick-launch workflow package simplified to a single agent node. The previous stage.sh wrote .kilroy/TASK.md, but the engine rewrites that file before every node; contents got clobbered. Inputs now land in .kilroy/INPUT.md (written once at run start) and the agent reads from there directly.

scripts/install-skills.sh wires everything up idempotently: symlinks for the binary, the workflows dir, and the skills/commands into ~/.claude, ~/.agents (codex's native discovery path, not ~/.codex/skills), and ~/.config/opencode.

Also rebuilds the SKILL.md to document --prompt-file as the default path for non-trivial tasks, drops the --no-cxdb / --skip-cli-headless-warning mentions, and points to runs wait / runs show --latest for the check-status / retrieve-result flow.

Kilroy's isolated codex home used to copy both auth.json and config.toml from the user's real ~/.codex/ into the kilroy-owned codex state dir. That leaked user-scoped settings (model_reasoning_effort, personality, model) into kilroy runs, so a setting that worked for the user's interactive codex sessions could silently break kilroy runs for specific models, notably `gpt-5-codex` rejecting the inherited `model_reasoning_effort = "xhigh"` upstream with a 400 during preflight probes.

Two fixes here:

1. Drop the config.toml copy entirely. Run configuration must come from kilroy and the .dot graph, not by accident from whatever the user has in ~/.codex/config.toml. If kilroy codex runs need specific settings, those belong in the graph or run.yaml.
2. When OPENAI_API_KEY is available in the parent env, write a fresh apikey auth.json into the isolated codex home instead of copying whatever auth.json the user has. This matches what tmux_handler.go + templates/codex.go already does for non-probe runs, so the probe stops diverging from the real run: both paths now force apikey mode when a key is present.

When no OPENAI_API_KEY is set, kilroy still falls back to copying the user's auth.json (subscription auth). Probes under that path can't exercise apikey-only models like gpt-5-codex, but the rest of preflight still runs against something plausible.

Tests updated: the old assertions on config.toml existence are replaced with explicit "must-not-exist" checks, and a new test covers the apikey auth.json write path. Verified end-to-end with a codex graph.codex.dot quick-launch run (39s, result.md correctly produced).

mattleaverton and others added 30 commits April 1, 2026 16:15

wip: PR review workflow graph (v2)

6cd2824

Setup/build/test tool nodes + single review agent node. Experimental workflow for automated PR triage. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

wip: fix build-test script survival across branch switch

159c0f4

Setup script now copies build-test.sh to .ai/ before gh pr checkout changes the branch and removes workflow files. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Merge remote-tracking branch 'upstream/main'

855df38

docs(plan): platform reframe — layered architecture for software ops …

e8f4e50

…platform Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

style: gofmt formatting pass after rename

2ae3885

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

mattleaverton and others added 29 commits April 7, 2026 14:35

engine: add git activity events to RunLog (worktree.created, commit)

7b69581

Emits worktree.created when the run worktree is set up, and commit events with diff stats when recordNodeDiff finds file changes. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

workflows: add multi-tool exercise for observability testing

ff668ae

Exercises input/output contracts, .kilroy/ convention files, agent_tool routing across claude/codex/opencode, edge conditions, and run log events. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

fix: codex auth.json uses OPENAI_API_KEY field, not token

a09f2b6

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

workflows: switch codex model to codex-mini-latest

9204aa9

o3-mini doesn't support codex's web_search_preview tool. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

mattleaverton merged commit d8c61c0 into danshapiro:main Apr 17, 2026
1 check failed

mattleaverton merged 86 commits into danshapiro:main from mattleaverton:impl/platform-reframe
