KirkForge

Deterministic verification plugin for coding agents. KirkForge is not a standalone agent. It plugs into Codex, Claude Code, OpenCode, or any agent stack as a verification, correction, and routing layer.

How it works

The core insight: verification commoditizes model choice.

Role	Cost	Job
Brain	Expensive	Plans, delegates, decides next action. The host agent.
Brawn	Cheap	Generates code from a prompt. A worker model.
Verifier	Free	Lint, types, security, diff, imports. KirkForge. No model calls.

The Brain sends a task to the Brawn in JSON. The Brawn writes code. KirkForge's deterministic tools run on the output. If the Brawn messes up, KirkForge builds a compact correction prompt — not a summary, the actual errors — and the Brain decides whether to retry, switch models, or escalate. The Verifier never calls a model. The Brain never sees raw Brawn output, only the reduced state.

This is the loop: emit → verify → correct → repeat.
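
The loop can be sketched in TypeScript as follows. This is a minimal illustration under stated assumptions, not KirkForge's actual API: `Finding`, `VerifyResult`, `buildCorrectionPrompt`, and `runLoop` are hypothetical names, and the worker and verifier are passed in as plain synchronous functions to keep the example short.

```typescript
// Hypothetical types for illustration; not KirkForge's real interfaces.
interface Finding {
  tool: string; // e.g. "lint" or "types"
  file: string;
  line: number;
  message: string;
}

interface VerifyResult {
  pass: boolean;
  findings: Finding[];
}

type BrawnFn = (prompt: string) => string; // worker model: prompt in, code out
type VerifyFn = (code: string) => VerifyResult; // deterministic, no model calls

// Build a compact correction prompt that lists the actual errors, not a summary.
function buildCorrectionPrompt(task: string, findings: Finding[]): string {
  const errors = findings.map(
    (f) => `${f.file}:${f.line} [${f.tool}] ${f.message}`,
  );
  return [task, "Fix these verification errors:", ...errors].join("\n");
}

type LoopOutcome =
  | { status: "accepted"; code: string; attempts: number }
  | { status: "escalate"; findings: Finding[]; attempts: number };

// emit -> verify -> correct -> repeat, bounded by maxAttempts.
function runLoop(
  task: string,
  brawn: BrawnFn,
  verify: VerifyFn,
  maxAttempts = 3,
): LoopOutcome {
  let prompt = task;
  let findings: Finding[] = [];
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    const code = brawn(prompt); // emit
    const result = verify(code); // verify
    if (result.pass) {
      return { status: "accepted", code, attempts: attempt };
    }
    findings = result.findings;
    prompt = buildCorrectionPrompt(task, findings); // correct
  }
  // Correction failed: hand the remaining findings back to the Brain.
  return { status: "escalate", findings, attempts: maxAttempts };
}
```

In this sketch, "accepted" only means the checks passed. Whether the task itself succeeded is still the host's call.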

When correction fails, the Brain takes over — the Ferrari leaves the garage. But most tasks don't need the Ferrari. On measured tasks, mid-tier models + verification match frontier models at 2–4× lower token cost.

See ADR-005: Verification commoditizes model choice for the data.

What it does

Verify — Run lint, type-check, security, git-diff, and import-graph checks on a workspace. Deterministic, no model calls.
Prompt — Build a compact correction prompt from verification failures. Ready for the next model turn.
Observe — Record task outcomes (pass/fail/escalate) so future tasks can benefit from empirical routing.
Recall — Retrieve routing bias from past observations to recommend model and mode.
Decompose — Break complex tasks into smaller, independently verifiable subtasks.

The core invariant: verifier pass ≠ task pass. Verification checks code quality. Only the host knows whether the task succeeded. Memory stores host-reported outcomes, never verifier status.
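
To make the invariant concrete, here is a rough sketch of a routing memory that stores only host-reported outcomes and derives a per-model bias from them. The record shape loosely mirrors the `observe` flags in the quick start, but the real memory file format is not documented here; `Observation`, `Bias`, `similar`, and `recallBias` are hypothetical, and the word-overlap similarity rule is an assumption chosen for brevity.

```typescript
// Hypothetical shapes; not KirkForge's real memory format.
type Outcome = "pass" | "fail" | "escalate";

interface Observation {
  taskId: string;
  description: string;
  model: string;
  outcome: Outcome; // reported by the host, never copied from verifier status
}

interface Bias {
  model: string;
  passRate: number;
  samples: number;
}

// Assumed similarity rule: two descriptions match if they share any word.
function similar(a: string, b: string): boolean {
  const words = new Set(a.toLowerCase().split(/\s+/).filter(Boolean));
  return b.toLowerCase().split(/\s+/).some((w) => words.has(w));
}

// Rank models by host-reported pass rate on similar past tasks.
function recallBias(memory: Observation[], description: string): Bias[] {
  const stats = new Map<string, { pass: number; total: number }>();
  for (const obs of memory) {
    if (!similar(obs.description, description)) continue;
    const s = stats.get(obs.model) ?? { pass: 0, total: 0 };
    s.total += 1;
    if (obs.outcome === "pass") s.pass += 1;
    stats.set(obs.model, s);
  }
  return Array.from(stats.entries())
    .map(([model, s]) => ({ model, passRate: s.pass / s.total, samples: s.total }))
    .sort((x, y) => y.passRate - x.passRate);
}
```

Note that nothing in `Observation` records a verifier verdict, so a clean lint run cannot be mistaken for a successful task.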

Quick start

npm ci && npm run build && npm test

# Probe available tools
npx tsx apps/cli/src/index.ts doctor --pretty

# Verify a workspace (no model call)
npx tsx apps/cli/src/index.ts verify-workspace --workspace /path/to/project

# Build correction prompt from verification result
npx tsx apps/cli/src/index.ts prompt --packet result.json

# Record a task observation
npx tsx apps/cli/src/index.ts observe --memory mem.json \
  --task-id t1 --description "fix auth" --language typescript \
  --mode hard-prompt --model gpt-4 --outcome pass --duration-ms 5000

# Recall routing bias
npx tsx apps/cli/src/index.ts recall --memory mem.json --description "fix auth"

# Start daemon (health-check server on port 9090)
npx tsx apps/cli/src/index.ts serve
# Then, from another terminal: curl http://localhost:9090/healthz

CLI commands

Command	What it does
`delegate`	Task delegation with automatic mode routing
`run`	Execute task with correction loop (accept/correct/escalate)
`verify-workspace`	Deterministic verification → `ReducedStatePacket`
`decompose`	Break complex task into dependency-ordered subtrees
`recall-decomposition`	Inspect stored decompositions
`observe`	Record task outcome for routing memory
`recall`	Retrieve routing bias for similar tasks
`health`	Orchestrator health and SLO status
`serve`	Daemon mode with health-check server (port 9090)
`doctor`	Internal + external tool availability diagnostic
`tools`	List registered verification tools
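
As an illustration of what "dependency-ordered" means for `decompose`, the sketch below orders subtasks so that each one comes after everything it depends on. The `Subtask` shape and `orderSubtasks` function are hypothetical, and the layer-by-layer topological sort is one common way to do this, not necessarily what KirkForge implements.

```typescript
// Hypothetical subtask shape for illustration.
interface Subtask {
  id: string;
  dependsOn: string[];
}

// Emit subtask ids in an order where dependencies always come first.
function orderSubtasks(subtasks: Subtask[]): string[] {
  const remaining = new Map<string, Set<string>>();
  for (const t of subtasks) {
    remaining.set(t.id, new Set(t.dependsOn));
  }
  const order: string[] = [];
  while (remaining.size > 0) {
    // A subtask is ready once all of its dependencies have been emitted.
    const ready = Array.from(remaining.entries())
      .filter(([, deps]) => deps.size === 0)
      .map(([id]) => id);
    if (ready.length === 0) {
      throw new Error("dependency cycle or unknown dependency");
    }
    for (const id of ready) {
      order.push(id);
      remaining.delete(id);
    }
    // Mark the emitted subtasks as satisfied for everything still waiting.
    remaining.forEach((deps) => ready.forEach((id) => deps.delete(id)));
  }
  return order;
}
```

Because each subtask is emitted only after its dependencies, a host could verify them one at a time in this order.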

Delegation modes

Mode	How it works
`hard-prompt`	Brain sends freeform instructions, Brawn writes code blocks, Verifier checks
`schema-contract`	Brain sends a JSON schema, Brawn fills it, Verifier validates structure
`artifact`	Brawn emits JSONL file-write artifacts, Verifier checks path safety
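
For `artifact` mode, a path-safety check might look roughly like the sketch below. The artifact shape (`path` and `content` fields, one JSON object per line) is an assumption, as are the names `checkPath` and `checkArtifacts`. The sketch rejects absolute paths and anything that resolves outside the workspace, and it does not follow symlinks.

```typescript
import * as path from "node:path";

// Hypothetical artifact shape: one JSON object per JSONL line.
interface FileWriteArtifact {
  path: string;
  content: string;
}

interface PathCheck {
  path: string;
  safe: boolean;
  reason?: string;
}

// Check that a write target stays inside the workspace root.
function checkPath(workspace: string, target: string): PathCheck {
  if (path.isAbsolute(target)) {
    return { path: target, safe: false, reason: "absolute path" };
  }
  const root = path.resolve(workspace);
  const rel = path.relative(root, path.resolve(root, target));
  if (rel === "") {
    return { path: target, safe: false, reason: "targets the workspace root" };
  }
  if (rel === ".." || rel.startsWith(".." + path.sep) || path.isAbsolute(rel)) {
    return { path: target, safe: false, reason: "escapes workspace" };
  }
  return { path: target, safe: true };
}

// Parse JSONL artifacts and check every write target.
function checkArtifacts(workspace: string, jsonl: string): PathCheck[] {
  return jsonl
    .split("\n")
    .filter((line) => line.trim() !== "")
    .map((line) => {
      const artifact = JSON.parse(line) as FileWriteArtifact;
      return checkPath(workspace, artifact.path);
    });
}
```

Comparing the resolved relative path, rather than scanning the raw string for `..`, also catches targets such as `src/../../x` that only escape after normalization.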

Verifier tools

Tool	What it checks	Source
lint	8 languages, 103 rules total	internal
types	tsc (TS), pyright (Python)	external
security	Safety-category lint rules	internal
changes	git diff (via GitnexusEmitter)	internal
graph	Import graph broken-edge detection	internal

Internal tools are bundled and always available. External tools (tsc, pyright) are probed from PATH.
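
To give a sense of what broken-edge detection in the `graph` tool involves, here is a minimal sketch. It assumes import specifiers have already been resolved to workspace-relative file paths; `ImportGraph`, `BrokenEdge`, and `findBrokenEdges` are hypothetical names, and the real tool presumably does the resolution step as well.

```typescript
// Hypothetical input: each known file mapped to the files it imports.
type ImportGraph = Record<string, string[]>;

interface BrokenEdge {
  from: string;
  to: string;
}

// An edge is broken when its target is not a known file in the graph.
function findBrokenEdges(graph: ImportGraph): BrokenEdge[] {
  const known = new Set(Object.keys(graph));
  const broken: BrokenEdge[] = [];
  for (const [from, targets] of Object.entries(graph)) {
    for (const to of targets) {
      if (!known.has(to)) {
        broken.push({ from, to });
      }
    }
  }
  return broken;
}
```

The check is a pure function of the graph, so the same workspace always yields the same findings.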

Design invariants

No model calls in any verification or correction path. All five operations (verify, prompt, observe, recall, decompose) are deterministic.
stdout is data, stderr is diagnostics. Hosts parse stdout; stderr is for humans.
Verifier fail is not exit 1. The ReducedStatePacket is the product regardless of verdict.
Memory is explicit. observe and recall require --memory <path>. No ambient state.
Host decides task outcome. observe --outcome must come from the host's validator, never from verification status. Verifier pass ≠ task pass.
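
On the host side, the stdout/stderr and exit-code invariants suggest handling along these lines. This is a sketch under assumptions: the `ReducedStatePacket` schema is not shown in this README, so `PacketLike` and its `verdict` field are hypothetical, as is the choice to treat stdout as a single JSON document and a nonzero exit as a failure of the tool itself.

```typescript
// Hypothetical shapes; the real ReducedStatePacket schema is not shown here.
interface ProcessResult {
  exitCode: number;
  stdout: string;
  stderr: string;
}

interface PacketLike {
  verdict: string; // assumed field name
  [key: string]: unknown;
}

// Turn a finished CLI run into a packet, keeping data and diagnostics apart.
function readPacket(result: ProcessResult): PacketLike {
  // Assumption: a nonzero exit means the tool could not run,
  // not that verification found problems.
  if (result.exitCode !== 0) {
    throw new Error(
      `verifier did not run (exit ${result.exitCode}): ${result.stderr.trim()}`,
    );
  }
  // stdout is data and gets parsed; stderr is for humans and is never parsed.
  return JSON.parse(result.stdout) as PacketLike;
}

// A failing verdict is still a valid packet; the host decides what to do next.
function needsCorrection(packet: PacketLike): boolean {
  return packet.verdict !== "pass";
}
```

A host could fill `ProcessResult` from Node's `child_process.spawnSync`, mapping its `status`, `stdout`, and `stderr` fields.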

MCP Server

KirkForge ships a Model Context Protocol server for direct integration with MCP hosts (Claude Desktop, Codex CLI, Copilot, etc.):

{
  "mcpServers": {
    "kirkforge": {
      "command": "npx",
      "args": ["@kirkforge/mcp"]
    }
  }
}

Or run directly:

npx tsx apps/mcp/src/index.ts

See apps/mcp/README.md for the full tool list and configuration.

Project stats

34 packages (29 library + 5 lint engine + CLI)
970 tests across 66 suites
~22,500 lines production code, ~15,300 lines test code
Node.js ≥ 20, Git required for diff tracking

Architecture decisions

Deployment

Method	Command
Docker	`docker build -t kirkforge . && docker run -p 9090:9090 kirkforge`
Docker Compose	`docker-compose up -d`
GHCR	`docker pull ghcr.io/kirkforge/kirkforge:latest`
Kubernetes	`helm install kirkforge ./deploy/helm/kirkforge`

Security and multi-tenancy

KirkForge ships with security features for team and production use:

Sandbox: Docker runner for untrusted code (default), host runner with deny-by-default constraints
Auth: OIDC JWT/JWKS verification, API key bearer tokens
RBAC: Four-role deny-by-default model (admin, operator, developer, viewer)
Policy engine: Deny-by-default allowlists for commands, paths, and networks. Signed bundles (HMAC-SHA256 + Ed25519).
Multi-tenancy: Tenant registry with path isolation, per-tenant encryption keys, cross-tenant access control
Audit: Append-only WORM log with chain-hash integrity and SIEM export
Enterprise mode: Startup gate validates auth, audit, policy, and storage before daemon starts

These are guardrails, not the product. The product is deterministic verification that makes cheap models productive.
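
As an illustration of the chain-hash idea behind the audit log, the sketch below links each entry to the hash of the previous one, so editing or reordering any earlier entry breaks verification. The entry layout, the SHA-256 choice, and the names `AuditEntry`, `appendEntry`, and `verifyChain` are assumptions for the example, not KirkForge's actual log format.

```typescript
import { createHash } from "node:crypto";

// Hypothetical entry layout for illustration.
interface AuditEntry {
  index: number;
  event: string;
  prevHash: string;
  hash: string;
}

const GENESIS = "0".repeat(64); // placeholder hash before the first entry

function hashEntry(index: number, event: string, prevHash: string): string {
  return createHash("sha256")
    .update(`${index}\n${prevHash}\n${event}`)
    .digest("hex");
}

// Append-only: returns a new log and never mutates existing entries.
function appendEntry(log: readonly AuditEntry[], event: string): AuditEntry[] {
  const index = log.length;
  const prevHash = index > 0 ? log[index - 1].hash : GENESIS;
  return log.concat([{ index, event, prevHash, hash: hashEntry(index, event, prevHash) }]);
}

// Recompute every hash from the start; any tampering breaks the chain.
function verifyChain(log: readonly AuditEntry[]): boolean {
  let prev = GENESIS;
  for (let i = 0; i < log.length; i++) {
    const e = log[i];
    if (e.index !== i || e.prevHash !== prev || e.hash !== hashEntry(i, e.event, prev)) {
      return false;
    }
    prev = e.hash;
  }
  return true;
}
```

A chain like this detects modification but does not prevent it. Write-once storage is what makes the log WORM.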

Requirements

Node.js ≥ 20
Git (for gitnexus diff tracking)
Optional: ESLint, TypeScript, ruff, pyright, bandit (for language-specific verification)
Optional: Docker (for sandboxed code execution)

Clean repo validation

bash scripts/ci.sh
# or: npm run ci

Runs build, lint, and test in sequence. Exits on first failure.

