Smart-routed CI repair under a hard AI budget.
PatchPilot turns red CI into reviewed, locally verified pull requests while showing every model-routing decision, tool action, and dollar spent.
Built as a Franklin plugin. Uses USDC micropayments via x402, not subscriptions. Apache-2.0.
Other CI repair agents send every log to Claude Opus and hope for the best. PatchPilot treats CI repair as a pipeline, not a single prompt:
┌─────────────────────────────────────────────────────────────────────┐
│ PatchPilot workflow │
│ │
│ 5 of 7 steps never touch a model. Only 1 step may pay. │
└─────────────────────────────────────────────────────────────────────┘
① ingest ──────── none $0.0000 read + normalize log
② redact ──────── none $0.0000 strip secrets before any model sees them
③ classify ────── free $0.0000 pattern-match first, free-tier LLM fallback
④ policy_check ── none $0.0000 allow / deny by failure type
⑤ repair ──────── free │ pro $0.0000 – $0.0300 generate unified diff
⑥ verify ──────── none $0.0000 run the verify command
⑦ policy_paths ── none $0.0000 enforce forbidden paths on the diff
Total: $0.000 – $0.031
All-pro baseline: ~$0.09
Saved: 65%+ (often 100%)
PatchPilot uses two model tiers:
- `free`: nvidia/qwen3-coder-480b. Zero cost, no wallet required.
- `pro`: anthropic/claude-sonnet-4.6. ~$0.02/call, USDC via Franklin wallet.
Steps 1, 2, 4, 6, and 7 are pure TypeScript — regex, AST, shell, policy logic. Zero model cost.
Step 3 (classify) runs a local pattern matcher first. Only if confidence drops below 0.4 does it call the free tier for a second opinion — still $0.00.
Step 5 (repair) is the only step that might cost money. Low-risk failures (lint, format, simple config) route to the free tier and stay at $0.00. Only medium-risk code reasoning (typecheck, unit test) uses pro.
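The two-stage behaviour of the classify step can be sketched as follows. This is a minimal illustration, not PatchPilot's actual classifier: the pattern table, the confidence values, and the injected `askFreeTier` callback are hypothetical. Only the 0.4 threshold and the failure-type names come from this README.

```typescript
// Hypothetical sketch of step 3 (classify): local patterns first,
// free-tier fallback only when confidence is below the 0.4 threshold.
type FailureType = "lint" | "format" | "typecheck" | "unit_test" | "unknown";

interface PatternRule {
  re: RegExp;
  type: FailureType;
  confidence: number; // illustrative values, not PatchPilot's real scores
}

interface Classification {
  type: FailureType;
  confidence: number; // confidence of the local pattern match
  usedFreeTier: boolean;
}

const CLASSIFY_RULES: PatternRule[] = [
  { re: /error TS\d+:/, type: "typecheck", confidence: 0.9 },
  { re: /\beslint\b/i, type: "lint", confidence: 0.8 },
  { re: /\bprettier\b/i, type: "format", confidence: 0.8 },
  { re: /\d+ failed/i, type: "unit_test", confidence: 0.5 },
];

const CONFIDENCE_THRESHOLD = 0.4; // threshold stated in the text above

// Pick the highest-confidence rule that matches the log.
function classifyByPattern(log: string): { type: FailureType; confidence: number } {
  let best: { type: FailureType; confidence: number } = { type: "unknown", confidence: 0 };
  for (const rule of CLASSIFY_RULES) {
    if (rule.re.test(log) && rule.confidence > best.confidence) {
      best = { type: rule.type, confidence: rule.confidence };
    }
  }
  return best;
}

// askFreeTier stands in for a free-tier model call; injecting it keeps the sketch offline.
function classify(log: string, askFreeTier: (log: string) => FailureType): Classification {
  const local = classifyByPattern(log);
  if (local.confidence >= CONFIDENCE_THRESHOLD) {
    return { ...local, usedFreeTier: false };
  }
  return { type: askFreeTier(log), confidence: local.confidence, usedFreeTier: true };
}
```

Either path costs $0.00; the fallback only changes who produces the label.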
flowchart LR
A[ingest<br/>none · $0] --> B[redact<br/>none · $0]
B --> C{classify<br/>pattern match}
C -- confidence ≥ 0.4 --> E[policy_check<br/>none · $0]
C -- confidence < 0.4 --> D[free tier<br/>$0]
D --> E
E -- allowed --> F{repair<br/>risk-routed}
E -- forbidden --> X[abort<br/>triage only]
F -- risk: low --> G1[free tier<br/>$0]
F -- risk: medium --> G2[pro tier<br/>~$0.020]
F -- risk: high --> X
G1 --> H[verify<br/>shell · $0]
G2 --> H
H -- pass --> I[policy_check_paths<br/>none · $0]
H -- fail --> X
I -- clean --> J[✅ verified<br/>PR ready]
I -- violates --> X
classDef model fill:#fef3c7,stroke:#d97706,color:#78350f
classDef free fill:#dcfce7,stroke:#16a34a,color:#14532d
classDef terminal fill:#dbeafe,stroke:#2563eb,color:#1e3a8a
classDef abort fill:#fee2e2,stroke:#dc2626,color:#7f1d1d
class D,G1,G2 model
class A,B,E,H,I free
class J terminal
class X abort
PatchPilot has only two model tiers, chosen per failure by risk. High-risk failures skip both and go to triage only:
┌──────────────────────────┐
│ Failure classification │
│ type + risk + evidence │
└────────────┬─────────────┘
│
┌─────────────────────┼─────────────────────┐
│ │ │
risk: low risk: medium risk: high
│ │ │
▼ ▼ ▼
┌─────────────┐ ┌─────────────┐ ┌──────────────┐
│ free tier │ │ pro tier │ │ triage only │
│ │ │ │ │ no patch │
│ qwen3-coder │ │ sonnet-4.6 │ │ human review │
│ $0.00 │ │ ~$0.020 │ │ $0.00 │
└─────────────┘ └─────────────┘ └──────────────┘
e.g. lint, e.g. typecheck, e.g. auth changes,
format, simple unit test fail payment logic,
dep config env secrets
Franklin's Smart Router resolves the final model within each tier. You don't pick qwen3-coder or sonnet-4.6 yourself: PatchPilot names only the tier, and Franklin chooses the model based on your wallet, task category, and availability.
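The risk-to-tier decision in the diagram above amounts to a three-way switch. The sketch below is illustrative only: the `Risk` and `Route` types and the `routeByRisk` function are hypothetical names, not PatchPilot's API. It returns a tier and never a model name, since the model is resolved by Franklin.

```typescript
// Hypothetical sketch of risk-based routing for the repair step.
type Risk = "low" | "medium" | "high";

type Route =
  | { action: "repair"; tier: "free" | "pro" }
  | { action: "triage-only" }; // no patch, human review

function routeByRisk(risk: Risk): Route {
  switch (risk) {
    case "low": // e.g. lint, format, simple dep config
      return { action: "repair", tier: "free" };
    case "medium": // e.g. typecheck, unit test failures
      return { action: "repair", tier: "pro" };
    case "high": // e.g. auth changes, payment logic, env secrets
      return { action: "triage-only" };
  }
}
```

Modelling the high-risk branch as a separate variant without a `tier` field makes it impossible to request a model call for a failure that should only be triaged.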
A typical low-risk repair run (lint / format / simple dep config):
step tier tokens (in/out) cost model
────────────────────────────────────────────────────────────────────────
ingest none — $0.0000 —
redact none — $0.0000 —
classify free pattern match $0.0000 (skipped LLM)
policy_check none — $0.0000 —
repair free 1.8K / 0.4K $0.0000 nvidia/qwen3-coder-480b
verify none — $0.0000 —
policy_check_paths none — $0.0000 —
────────────────────────────────────────────────────────────────────────
$0.0000 total — zero wallet required
A typical medium-risk repair run (typecheck / unit test):
step tier tokens (in/out) cost model
────────────────────────────────────────────────────────────────────────
ingest none — $0.0000 —
redact none — $0.0000 —
classify free pattern match $0.0000 (skipped LLM)
policy_check none — $0.0000 —
repair pro 3.2K / 0.8K $0.0184 claude-sonnet-4.6
verify none — $0.0000 —
policy_check_paths none — $0.0000 —
────────────────────────────────────────────────────────────────────────
$0.0184 total
~$0.054 if we'd used pro everywhere
66% saved
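The 66% figure follows directly from the two totals in the run above. The snippet below reproduces the arithmetic; the `LedgerEntry` shape is an assumption made for illustration and may not match the real `ledger.json` schema.

```typescript
// Hypothetical ledger entry shape; the real ledger.json schema may differ.
interface LedgerEntry {
  step: string;
  tier: "none" | "free" | "pro";
  costUsd: number;
}

// Costs copied from the medium-risk run shown above.
const mediumRiskRun: LedgerEntry[] = [
  { step: "ingest", tier: "none", costUsd: 0 },
  { step: "redact", tier: "none", costUsd: 0 },
  { step: "classify", tier: "free", costUsd: 0 },
  { step: "policy_check", tier: "none", costUsd: 0 },
  { step: "repair", tier: "pro", costUsd: 0.0184 },
  { step: "verify", tier: "none", costUsd: 0 },
  { step: "policy_check_paths", tier: "none", costUsd: 0 },
];

function totalCost(entries: LedgerEntry[]): number {
  return entries.reduce((sum, entry) => sum + entry.costUsd, 0);
}

// Percentage saved relative to a baseline, rounded to a whole percent.
function savedPercent(actualUsd: number, baselineUsd: number): number {
  return Math.round((1 - actualUsd / baselineUsd) * 100);
}

const actualUsd = totalCost(mediumRiskRun); // 0.0184
const allProBaselineUsd = 0.054; // "if we'd used pro everywhere", from the run above
const saved = savedPercent(actualUsd, allProBaselineUsd); // 66
```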
Every dollar appears in the ledger.json artifact. Every model pick is auditable.
Real artifacts from a sample repair run:
- docs/examples/ledger.json: full per-step cost ledger
- docs/examples/report.md: the Markdown report that becomes the PR body
For the methodology behind the 65% savings claim, see docs/benchmark.md.
┌─ Local ────────────────────────────────────────────────────────────┐
│ patchpilot repair --command "npm test" --budget 0.25 │
│ │
│ Runs the failing command, diagnoses, patches, verifies locally. │
│ Nothing leaves your machine except prompts + Franklin settlement. │
└────────────────────────────────────────────────────────────────────┘
┌─ GitHub Actions ───────────────────────────────────────────────────┐
│ patchpilot repair-gh --repo owner/repo --run latest-failed \ │
│ --budget 0.50 --create-pr │
│ │
│ Fetches the failed run via gh CLI, repairs, opens a draft PR. │
└────────────────────────────────────────────────────────────────────┘
┌─ Diagnose ─────────────────────────────────────────────────────────┐
│ patchpilot diagnose --command "npm test" │
│ patchpilot diagnose --log failure.log │
│ │
│ Classification only. No code changes, no model cost for simple │
│ failures (pattern match hits). │
└────────────────────────────────────────────────────────────────────┘
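The `--budget` flag in the commands above caps what a run may spend. One way such a cap could be enforced is to check the estimated cost of a paid call before making it. The `BudgetGuard` class below is a hypothetical sketch of that idea, not PatchPilot's implementation.

```typescript
// Hypothetical budget guard: refuse any paid call that would exceed the cap.
class BudgetGuard {
  private readonly budgetUsd: number;
  private spentUsd = 0;

  constructor(budgetUsd: number) {
    this.budgetUsd = budgetUsd;
  }

  // True if a call with this estimated cost still fits within the budget.
  canAfford(estimateUsd: number): boolean {
    return this.spentUsd + estimateUsd <= this.budgetUsd;
  }

  // Record the actual cost after a call settles.
  record(costUsd: number): void {
    this.spentUsd += costUsd;
  }

  get remainingUsd(): number {
    return Math.max(0, this.budgetUsd - this.spentUsd);
  }
}

// Usage: with --budget 0.25, a ~$0.02 pro call is allowed on a fresh run.
const guard = new BudgetGuard(0.25);
const allowedFirst = guard.canAfford(0.02);
guard.record(0.24); // pretend earlier attempts already spent most of the budget
const allowedLater = guard.canAfford(0.02);
```

Free-tier calls have an estimate of $0.00, so they pass the check for as long as the budget itself has not been exceeded.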
Every repair and repair-gh invocation picks a workflow mode. Default is full.
| Mode | Runs steps | Model calls | Files changed | When to use |
|---|---|---|---|---|
| `full` | all 7 | yes (repair) | yes (if fixed) | Default. Real repair with patch + PR. |
| `triage` | first 4 | no | no | Just want a classification + root cause. |
| `dry-run` | all 7 (no-ops) | no | no | Preview the pipeline path, zero cost. |
# full repair with PR
patchpilot repair --command "npm test" --mode full
# diagnose without patching anything
patchpilot repair --command "npm test" --mode triage
# walk the pipeline, see classification + policy decisions, zero model spend
patchpilot repair --command "npm test" --mode dry-run
`triage` terminates cleanly at the repair step with run status `diagnosed`. It's not an abort; it's the intended end of the pipeline when you don't want a patch.
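The mode table can be read as a small lookup from mode to plan. The sketch below encodes exactly the rows of that table; `Mode`, `ModePlan`, and `planForMode` are hypothetical names introduced for illustration.

```typescript
// Hypothetical encoding of the workflow-mode table.
type Mode = "full" | "triage" | "dry-run";

const STEPS = [
  "ingest",
  "redact",
  "classify",
  "policy_check",
  "repair",
  "verify",
  "policy_check_paths",
] as const;

type Step = (typeof STEPS)[number];

interface ModePlan {
  steps: Step[];
  mayCallModel: boolean;
  mayChangeFiles: boolean;
}

function planForMode(mode: Mode): ModePlan {
  switch (mode) {
    case "full": // all 7 steps, real repair
      return { steps: [...STEPS], mayCallModel: true, mayChangeFiles: true };
    case "triage": // first 4 steps, stops before repair
      return { steps: STEPS.slice(0, 4), mayCallModel: false, mayChangeFiles: false };
    case "dry-run": // all 7 steps walked as no-ops
      return { steps: [...STEPS], mayCallModel: false, mayChangeFiles: false };
  }
}
```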
# Install Franklin (optional peer, but recommended)
npm install -g @blockrun/franklin
franklin setup base # or: franklin setup solana
# Install PatchPilot directly from GitHub
npm install -g github:Lexiie/PatchPilot
# Initialize repo policy
patchpilot init
# Verify your environment
patchpilot doctor
# First repair
patchpilot repair --command "npm test" --budget 0.25
PatchPilot works as both a standalone CLI tool and a Franklin plugin. Same codebase, same workflow, same artifacts; only the entry point differs.
┌─────────────────────────────────────────────────────────────────────┐
│ Shared source │
│ │
│ src/franklin/workflow.ts ← the 7-step Workflow │
│ src/cli/orchestrator.ts ← repairLocal / repairGitHub │
│ src/failure-engine/ ← classifier, redactor, log normalizer │
│ src/policy/ ← .patchpilot.yml enforcement │
└──────────────────┬───────────────────────────┬──────────────────────┘
│ │
┌────────▼──────────┐ ┌─────────▼─────────┐
│ src/cli.ts │ │ src/index.ts │
│ Commander-based │ │ Exports Plugin{} │
│ bin: patchpilot │ │ plugin entry │
└────────┬──────────┘ └─────────┬─────────┘
│ │
▼ ▼
┌────────────────┐ ┌────────────────────┐
│ Standalone CLI │ │ Franklin host │
│ npm i -g │ │ via plugin.json │
└────────────────┘ └────────────────────┘
Install directly from GitHub. Franklin is an optional peer; if it's installed, PatchPilot uses its CLI for model calls and cost tracking. If not, PatchPilot still builds and type-checks.
npm install -g github:Lexiie/PatchPilot
npm install -g @blockrun/franklin # optional but recommended
patchpilot repair --command "npm test" --budget 0.25
patchpilot repair-gh --repo acme/api --create-pr --budget 0.50
Use standalone when:
- Running in CI (GitHub Actions, GitLab CI, Buildkite)
- Scheduled cron or webhook handler
- Docker container with minimal footprint
- Org hasn't adopted Franklin yet
Drop the package into Franklin's plugin directory. Franklin discovers it via plugin.json and registers its commands + workflow with the native runner.
# Dev / local install
FRANKLIN_PLUGINS_DIR=/path/to/PatchPilot franklin
# User install
cp -r PatchPilot ~/.blockrun/plugins/patchpilot
franklin patchpilot repair --command "npm test" --budget 0.25
franklin patchpilot repair-gh --repo acme/api --create-pr
Use plugin mode when:
- Developer already uses Franklin as primary coding agent
- Want integrated `/cost`, `/insights`, session history
- Interactive debugging alongside `/fix`, `/review`, `/test`
- Team tracks all AI spend through one Franklin wallet
The runner detects Franklin at startup and picks the best path — you don't configure this:
| Runtime behavior | Standalone | Plugin |
|---|---|---|
| Model calls | Shell out: `franklin --model` | Native `ctx.callModel()` |
| Cost tracking | Parse CLI routing footer | Native session ledger |
| Wallet settlement | Franklin wallet (via CLI) | Franklin wallet (native) |
| Franklin `/cost` visibility | ❌ | ✅ |
| Franklin `/insights` | ❌ | ✅ |
| Startup overhead | Minimal | Higher (loads Franklin core) |
| PatchPilot artifacts | `.patchpilot/runs/<id>/` (same) | `.patchpilot/runs/<id>/` (same) |
Start standalone. Upgrade to plugin mode if your team adopts Franklin more broadly — no config migration needed, the same .patchpilot.yml works for both.
┌────────────────────────────────────────────────────────────────────┐
│ PatchPilot CLI │
│ │
│ repair · repair-gh · diagnose · init · doctor · runs │
└───────────────────┬────────────────────────────────────────────────┘
│
▼
┌────────────────────────────────────────────────────────────────────┐
│ Unified Workflow Runner │
│ │
│ If @blockrun/franklin installed → native Franklin WorkflowRunner │
│ Else → standalone runner that shells out to `franklin --model` │
└───────────────────┬────────────────────────────────────────────────┘
│
▼
┌────────────────────────────────────────────────────────────────────┐
│ PatchPilot Workflow (7 steps) │
├────────────────────────────────────────────────────────────────────┤
│ │
│ Adapters Failure Engine Franklin │
│ ──────── ────────────── ──────── │
│ local-command log-normalizer workflow.ts │
│ log-file secret-redactor prompt-builder │
│ github-cli classifier patch-applier │
│ standalone-runner │
│ │
│ Verification Policy Reporting │
│ ──────────── ────── ───────── │
│ verify command loadPolicy markdown-reporter │
│ collectDiff checkFailureType terminal-reporter │
│ getChangedFiles checkForbiddenPaths │
└────────────────────────────────────────────────────────────────────┘
│
▼
┌────────────────────────────────────────────────────────────────────┐
│ Artifacts │
│ │
│ .patchpilot/runs/<run-id>/ │
│ ├── run.json (full run metadata) │
│ ├── ledger.json (cost + routing, per step) │
│ ├── report.md (PR body / human-readable report) │
│ ├── patch.diff (the applied unified diff) │
│ ├── failure.log (raw CI log) │
│ ├── redacted-log.txt (secret-redacted version sent to model) │
│ └── verify-*.log (verification command output) │
└────────────────────────────────────────────────────────────────────┘
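The `redacted-log.txt` artifact is what the redact step leaves for the model to see. A minimal regex-based redactor might look like the sketch below. The rule list is an assumption chosen for illustration (GitHub token prefixes, AWS access key IDs, and secret-looking `NAME=value` assignments); PatchPilot's real secret-redactor may use different or additional rules.

```typescript
// Hypothetical redaction rules; a real redactor would likely cover more formats.
interface RedactionRule {
  re: RegExp; // must carry the g flag so every occurrence is replaced
  replacement: string;
}

const REDACTION_RULES: RedactionRule[] = [
  // GitHub tokens (ghp_, gho_, ghu_, ghs_, ghr_ prefixes)
  { re: /\bgh[pousr]_[A-Za-z0-9]{36,}/g, replacement: "[REDACTED:github-token]" },
  // AWS access key IDs
  { re: /\bAKIA[0-9A-Z]{16}\b/g, replacement: "[REDACTED:aws-key-id]" },
  // NAME=value where NAME looks secret-bearing; the name is kept, the value is dropped
  {
    re: /\b([A-Z0-9_]*(?:SECRET|TOKEN|PASSWORD|API_KEY)[A-Z0-9_]*)=\S+/g,
    replacement: "$1=[REDACTED]",
  },
];

// Apply every rule in order and count how many rule hits occurred.
function redact(log: string): { text: string; hits: number } {
  let text = log;
  let hits = 0;
  for (const rule of REDACTION_RULES) {
    const matches = text.match(rule.re);
    if (matches) {
      hits += matches.length;
      text = text.replace(rule.re, rule.replacement);
    }
  }
  return { text, hits };
}
```

Keeping the variable name while dropping the value preserves the diagnostic signal (which secret was involved) without exposing the secret itself.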
| Default behavior | Override |
|---|---|
| Refuse dirty working tree | `--allow-dirty` |
| Draft PRs only | `create_pr_default: ready` |
| Never patch forbidden paths | edit `.patchpilot.yml` |
| Never exceed `--budget` | none |
| Redact secrets before any model | none |
| Require passing verify command | none |
| Skip `environment_missing_secret` | explicit allow list |
| Skip `network_or_infra` | explicit allow list |
| Triage-only for high-risk failures | none |
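The "never patch forbidden paths" default means every file touched by a diff is checked against the policy's path patterns. The sketch below shows one simple way to do that, using the patterns from the sample `.patchpilot.yml`. It assumes repo-relative paths and a deliberately small glob dialect (`*` within a path segment, `**` across segments); `globToRegExp` and `findForbidden` are hypothetical helpers, and the real policy module may rely on a glob library with different matching semantics.

```typescript
// Convert a small glob dialect to an anchored RegExp.
// Assumption: "*" matches within one path segment, "**" matches across segments.
function globToRegExp(glob: string): RegExp {
  let out = "";
  for (let i = 0; i < glob.length; i++) {
    const c = glob.charAt(i);
    if (c === "*") {
      if (glob.charAt(i + 1) === "*") {
        out += ".*";
        i++; // consume the second "*"
      } else {
        out += "[^/]*";
      }
    } else {
      // Escape regex metacharacters so they match literally.
      out += c.replace(/[.+?^${}()|[\]\\]/g, "\\$&");
    }
  }
  return new RegExp(`^${out}$`);
}

// Return the changed files that match any forbidden pattern.
function findForbidden(changedFiles: string[], forbiddenPatterns: string[]): string[] {
  const matchers = forbiddenPatterns.map(globToRegExp);
  return changedFiles.filter((file) => matchers.some((re) => re.test(file)));
}

// Usage with the patterns from the sample policy file.
const violations = findForbidden(
  ["src/app.ts", ".env.local", "secrets/prod/key.pem"],
  [".env*", "secrets/**", "infra/prod/**"],
);
```

A non-empty result would send the run to the abort path instead of producing a PR.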
Policy lives in .patchpilot.yml:
version: 1
repair:
  max_attempts: 3
  create_pr_default: draft
  allowed_failure_types: [lint, format, typecheck, unit_test, dependency_config]
  forbidden_failure_types: [environment_missing_secret, network_or_infra]
  forbidden_paths: [".env*", "secrets/**", "infra/prod/**"]
  require_human_review_for: ["auth/**", "billing/**", "payments/**", "migrations/**"]
verification:
  required_commands: ["npm test"]
Most AI coding agents treat CI repair as "send log → get patch". That's a single pro-tier call, and it's wasteful:
- 60% of CI failures don't need LLM reasoning at all. Lint, format, snapshot updates, lockfile drift — these are pattern-level problems with pattern-level fixes.
- Classification ≠ repair. Deciding what kind of failure this is is a much simpler task than fixing the code. Free-tier or local matching handles it.
- Verification is not AI's job. Running `npm test` is deterministic and free. Shelling out is faster than asking a model to predict whether tests pass.
PatchPilot encodes this as a workflow. Franklin's Smart Router handles the rest.
src/
├── adapters/ Input adapters (local / log / GitHub)
├── failure-engine/ Log normalizer, secret redactor, classifier
├── policy/ .patchpilot.yml loader + enforcement
├── franklin/ Franklin plugin surface
│ ├── workflow.ts ← the 7-step Workflow
│ ├── prompt-builder.ts
│ ├── patch-applier.ts
│ ├── runner.ts ← unified native/standalone runner
│ └── standalone-runner.ts
├── verification/ Command execution + diff collection
├── reporting/ Markdown + terminal reporters
├── types/ Zod schemas + Franklin SDK types
├── utils/ Storage, IDs
├── cli/ Orchestrator + CLI subcommands
├── index.ts Plugin entry (exports default Plugin)
└── cli.ts Standalone CLI entry
tests/ 28 tests, 7 suites
plugin.json Franklin plugin manifest
| Feature | Status |
|---|---|
| Local Mode | ✅ implemented |
| Log Mode | ✅ implemented |
| GitHub CLI Mode (`repair-gh`) | ✅ implemented |
| Managed Mode (GitHub App + webhook) | ❌ planned |
| Franklin Plugin SDK integration | ✅ implemented |
| Native Franklin runner (auto-detect) | ✅ implemented |
| Standalone fallback runner | ✅ implemented |
| Secret redaction | ✅ implemented |
| Policy enforcement | ✅ implemented |
| Cost ledger with real model routing | ✅ implemented |
| Draft PR creation via gh | ✅ implemented |
| Sample artifacts + benchmark docs | ✅ included |
| 44 tests, TypeScript strict mode | ✅ |
Apache-2.0. See LICENSE.
Built on Franklin — the AI agent with a wallet.