PatchPilot

Smart-routed CI repair under a hard AI budget.

PatchPilot turns red CI into reviewed, locally verified pull requests while showing every model-routing decision, tool action, and dollar spent.

Built as a Franklin plugin. Uses USDC micropayments via x402, not subscriptions. Apache-2.0.

The core insight

Other CI repair agents send every log to Claude Opus and hope for the best. PatchPilot treats CI repair as a pipeline, not a single prompt:

┌─────────────────────────────────────────────────────────────────────┐
│                      PatchPilot workflow                             │
│                                                                      │
│   5 of 7 steps never touch a model. Only 1 step may pay.            │
└─────────────────────────────────────────────────────────────────────┘

  ① ingest ────────  none   $0.0000   read + normalize log
  ② redact ────────  none   $0.0000   strip secrets before any model sees them
  ③ classify ──────  free   $0.0000   pattern-match first, free-tier LLM fallback
  ④ policy_check ──  none   $0.0000   allow / deny by failure type
  ⑤ repair ────────  free │ pro  $0.0000 – $0.0300   generate unified diff
  ⑥ verify ────────  none   $0.0000   run the verify command
  ⑦ policy_paths ──  none   $0.0000   enforce forbidden paths on the diff

                                       Total: $0.000 – $0.031
                                       All-pro baseline: ~$0.09
                                       Saved: 65%+ (often 100%)

PatchPilot uses two model tiers:

free — nvidia/qwen3-coder-480b. Zero cost, no wallet required.
pro — anthropic/claude-sonnet-4.6. ~$0.02/call, USDC via Franklin wallet.

Steps 1, 2, 4, 6, and 7 are pure TypeScript — regex, AST, shell, policy logic. Zero model cost.

Step 3 (classify) runs a local pattern matcher first. Only if confidence drops below 0.4 does it call the free tier for a second opinion — still $0.00.

Step 5 (repair) is the only step that might cost money. Low-risk failures (lint, format, simple config) route to the free tier and stay at $0.00. Only medium-risk code reasoning (typecheck, unit test) uses pro.

Workflow graph (Mermaid)

flowchart LR
    A[ingest<br/>none · $0] --> B[redact<br/>none · $0]
    B --> C{classify<br/>pattern match}
    C -- confidence &ge; 0.4 --> E[policy_check<br/>none · $0]
    C -- confidence &lt; 0.4 --> D[free tier<br/>$0]
    D --> E
    E -- allowed --> F{repair<br/>risk-routed}
    E -- forbidden --> X[abort<br/>triage only]
    F -- risk: low --> G1[free tier<br/>$0]
    F -- risk: medium --> G2[pro tier<br/>~$0.020]
    F -- risk: high --> X
    G1 --> H[verify<br/>shell · $0]
    G2 --> H
    H -- pass --> I[policy_check_paths<br/>none · $0]
    H -- fail --> X
    I -- clean --> J[✅ verified<br/>PR ready]
    I -- violates --> X

    classDef model fill:#fef3c7,stroke:#d97706,color:#78350f
    classDef free fill:#dcfce7,stroke:#16a34a,color:#14532d
    classDef terminal fill:#dbeafe,stroke:#2563eb,color:#1e3a8a
    classDef abort fill:#fee2e2,stroke:#dc2626,color:#7f1d1d
    class D,G1,G2 model
    class A,B,E,H,I free
    class J terminal
    class X abort

Model tier routing

PatchPilot only has two tiers, chosen per failure by risk:

                       ┌──────────────────────────┐
                       │ Failure classification   │
                       │  type + risk + evidence  │
                       └────────────┬─────────────┘
                                    │
              ┌─────────────────────┼─────────────────────┐
              │                     │                     │
        risk: low              risk: medium          risk: high
              │                     │                     │
              ▼                     ▼                     ▼
      ┌─────────────┐        ┌─────────────┐      ┌──────────────┐
      │  free tier  │        │   pro tier  │      │ triage only  │
      │             │        │             │      │ no patch     │
      │ qwen3-coder │        │ sonnet-4.6  │      │ human review │
      │    $0.00    │        │  ~$0.020    │      │     $0.00    │
      └─────────────┘        └─────────────┘      └──────────────┘

         e.g. lint,            e.g. typecheck,       e.g. auth changes,
         format, simple        unit test fail        payment logic,
         dep config                                  env secrets

Franklin's Smart Router resolves the final model within each tier. You don't pick qwen3-coder or sonnet-4.6 — PatchPilot only names the tier, Franklin decides the model based on your wallet, task category, and availability.

Where the money goes (real cost breakdown)

A typical low-risk repair run (lint / format / simple dep config):

step                 tier   tokens (in/out)   cost      model
────────────────────────────────────────────────────────────────────────
ingest               none   —                 $0.0000   —
redact               none   —                 $0.0000   —
classify             free   pattern match     $0.0000   (skipped LLM)
policy_check         none   —                 $0.0000   —
repair               free   1.8K / 0.4K       $0.0000   nvidia/qwen3-coder-480b
verify               none   —                 $0.0000   —
policy_check_paths   none   —                 $0.0000   —
────────────────────────────────────────────────────────────────────────
                                              $0.0000 total — zero wallet required

A typical medium-risk repair run (typecheck / unit test):

step                 tier   tokens (in/out)   cost      model
────────────────────────────────────────────────────────────────────────
ingest               none   —                 $0.0000   —
redact               none   —                 $0.0000   —
classify             free   pattern match     $0.0000   (skipped LLM)
policy_check         none   —                 $0.0000   —
repair               pro    3.2K / 0.8K       $0.0184   claude-sonnet-4.6
verify               none   —                 $0.0000   —
policy_check_paths   none   —                 $0.0000   —
────────────────────────────────────────────────────────────────────────
                                              $0.0184 total
                                              ~$0.054 if we'd used pro everywhere
                                              66% saved

Every dollar appears in the ledger.json artifact. Every model pick is auditable.

See it in action

Real artifacts from a sample repair run:

docs/examples/ledger.json — full per-step cost ledger
docs/examples/report.md — the Markdown report that becomes the PR body

For the methodology behind the 65% savings claim, see docs/benchmark.md.

Usage

Three modes, one command style

┌─ Local ────────────────────────────────────────────────────────────┐
│  patchpilot repair --command "npm test" --budget 0.25              │
│                                                                    │
│  Runs the failing command, diagnoses, patches, verifies locally.   │
│  Nothing leaves your machine except prompts + Franklin settlement. │
└────────────────────────────────────────────────────────────────────┘

┌─ GitHub Actions ───────────────────────────────────────────────────┐
│  patchpilot repair-gh --repo owner/repo --run latest-failed \      │
│                       --budget 0.50 --create-pr                    │
│                                                                    │
│  Fetches the failed run via gh CLI, repairs, opens a draft PR.     │
└────────────────────────────────────────────────────────────────────┘

┌─ Diagnose ─────────────────────────────────────────────────────────┐
│  patchpilot diagnose --command "npm test"                          │
│  patchpilot diagnose --log failure.log                             │
│                                                                    │
│  Classification only. No code changes, no model cost for simple    │
│  failures (pattern match hits).                                    │
└────────────────────────────────────────────────────────────────────┘

Workflow modes — `--mode full | triage | dry-run`

Every repair and repair-gh invocation picks a workflow mode. Default is full.

Mode	Runs steps	Model calls	Files changed	When to use
`full`	all 7	yes (`repair`)	yes (if fixed)	Default. Real repair with patch + PR.
`triage`	first 4	no	no	Just want a classification + root cause.
`dry-run`	all 7 (no-ops)	no	no	Preview the pipeline path, zero cost.

# full repair with PR
patchpilot repair --command "npm test" --mode full

# diagnose without patching anything
patchpilot repair --command "npm test" --mode triage

# walk the pipeline, see classification + policy decisions, zero model spend
patchpilot repair --command "npm test" --mode dry-run

triage terminates cleanly at the repair step with run status diagnosed. It's not an abort — it's the intended end of the pipeline when you don't want a patch.

Quick start

# Install Franklin (optional peer, but recommended)
npm install -g @blockrun/franklin
franklin setup base          # or: franklin setup solana

# Install PatchPilot directly from GitHub
npm install -g github:Lexiie/PatchPilot

# Initialize repo policy
patchpilot init

# Verify your environment
patchpilot doctor

# First repair
patchpilot repair --command "npm test" --budget 0.25

Two ways to run it

PatchPilot works as both a standalone CLI tool and a Franklin plugin. Same codebase, same workflow, same artifacts — only the entry point differs.

┌─────────────────────────────────────────────────────────────────────┐
│                         Shared source                               │
│                                                                     │
│   src/franklin/workflow.ts   ← the 7-step Workflow                  │
│   src/cli/orchestrator.ts    ← repairLocal / repairGitHub           │
│   src/failure-engine/        ← classifier, redactor, log normalizer │
│   src/policy/                ← .patchpilot.yml enforcement          │
└──────────────────┬───────────────────────────┬──────────────────────┘
                   │                           │
          ┌────────▼──────────┐      ┌─────────▼─────────┐
          │ src/cli.ts        │      │ src/index.ts      │
          │ Commander-based   │      │ Exports Plugin{}  │
          │ bin: patchpilot   │      │ plugin entry      │
          └────────┬──────────┘      └─────────┬─────────┘
                   │                           │
                   ▼                           ▼
          ┌────────────────┐          ┌────────────────────┐
          │ Standalone CLI │          │ Franklin host      │
          │ npm i -g       │          │ via plugin.json    │
          └────────────────┘          └────────────────────┘

Mode A — Standalone CLI

Install directly from GitHub. Franklin is an optional peer; if it's installed, PatchPilot uses its CLI for model calls and cost tracking. If not, PatchPilot still builds and type-checks.

npm install -g github:Lexiie/PatchPilot
npm install -g @blockrun/franklin    # optional but recommended

patchpilot repair --command "npm test" --budget 0.25
patchpilot repair-gh --repo acme/api --create-pr --budget 0.50

Use standalone when:

Running in CI (GitHub Actions, GitLab CI, buildkite)
Scheduled cron or webhook handler
Docker container with minimal footprint
Org hasn't adopted Franklin yet

Mode B — Franklin plugin

Drop the package into Franklin's plugin directory. Franklin discovers it via plugin.json and registers its commands + workflow with the native runner.

# Dev / local install
FRANKLIN_PLUGINS_DIR=/path/to/PatchPilot franklin

# User install
cp -r PatchPilot ~/.blockrun/plugins/patchpilot
franklin patchpilot repair --command "npm test" --budget 0.25
franklin patchpilot repair-gh --repo acme/api --create-pr

Use plugin mode when:

Developer already uses Franklin as primary coding agent
Want integrated /cost, /insights, session history
Interactive debugging alongside /fix, /review, /test
Team tracks all AI spend through one Franklin wallet

What auto-switches between modes

The runner detects Franklin at startup and picks the best path — you don't configure this:

Runtime behavior	Standalone	Plugin
Model calls	Shell out: `franklin --model`	Native `ctx.callModel()`
Cost tracking	Parse CLI routing footer	Native session ledger
Wallet settlement	Franklin wallet (via CLI)	Franklin wallet (native)
Franklin `/cost` visibility	❌	✅
Franklin `/insights`	❌	✅
Startup overhead	Minimal	Higher (loads Franklin core)
PatchPilot artifacts	`.patchpilot/runs/<id>/` (same)	`.patchpilot/runs/<id>/` (same)

Not sure which to use?

Start standalone. Upgrade to plugin mode if your team adopts Franklin more broadly — no config migration needed, the same .patchpilot.yml works for both.

Architecture

┌────────────────────────────────────────────────────────────────────┐
│                        PatchPilot CLI                              │
│                                                                    │
│   repair · repair-gh · diagnose · init · doctor · runs             │
└───────────────────┬────────────────────────────────────────────────┘
                    │
                    ▼
┌────────────────────────────────────────────────────────────────────┐
│                     Unified Workflow Runner                        │
│                                                                    │
│   If @blockrun/franklin installed → native Franklin WorkflowRunner │
│   Else → standalone runner that shells out to `franklin --model`   │
└───────────────────┬────────────────────────────────────────────────┘
                    │
                    ▼
┌────────────────────────────────────────────────────────────────────┐
│                    PatchPilot Workflow (7 steps)                   │
├────────────────────────────────────────────────────────────────────┤
│                                                                    │
│   Adapters              Failure Engine          Franklin           │
│   ────────              ──────────────          ────────           │
│   local-command         log-normalizer          workflow.ts        │
│   log-file              secret-redactor         prompt-builder     │
│   github-cli            classifier              patch-applier      │
│                                                 standalone-runner  │
│                                                                    │
│   Verification          Policy                  Reporting          │
│   ────────────          ──────                  ─────────          │
│   verify command        loadPolicy              markdown-reporter  │
│   collectDiff           checkFailureType        terminal-reporter  │
│   getChangedFiles       checkForbiddenPaths                        │
└────────────────────────────────────────────────────────────────────┘
                    │
                    ▼
┌────────────────────────────────────────────────────────────────────┐
│                         Artifacts                                  │
│                                                                    │
│   .patchpilot/runs/<run-id>/                                       │
│   ├── run.json          (full run metadata)                        │
│   ├── ledger.json       (cost + routing, per step)                 │
│   ├── report.md         (PR body / human-readable report)          │
│   ├── patch.diff        (the applied unified diff)                 │
│   ├── failure.log       (raw CI log)                               │
│   ├── redacted-log.txt  (secret-redacted version sent to model)    │
│   └── verify-*.log      (verification command output)              │
└────────────────────────────────────────────────────────────────────┘

Safety rails

Default behavior	Override
Refuse dirty working tree	`--allow-dirty`
Draft PRs only	`createPrDefault: ready`
Never patch forbidden paths	edit `.patchpilot.yml`
Never exceed `--budget`	—
Redact secrets before any model	—
Require passing verify command	—
Skip `environment_missing_secret`	explicit allow list
Skip `network_or_infra`	explicit allow list
Triage-only for high-risk failures	—

Policy lives in .patchpilot.yml:

version: 1
repair:
  max_attempts: 3
  create_pr_default: draft
  allowed_failure_types: [lint, format, typecheck, unit_test, dependency_config]
  forbidden_failure_types: [environment_missing_secret, network_or_infra]
  forbidden_paths: [".env*", "secrets/**", "infra/prod/**"]
  require_human_review_for: ["auth/**", "billing/**", "payments/**", "migrations/**"]
verification:
  required_commands: ["npm test"]

Why this works

Most AI coding agents treat CI repair as "send log → get patch". That's a single pro-tier call, and it's wasteful:

60% of CI failures don't need LLM reasoning at all. Lint, format, snapshot updates, lockfile drift — these are pattern-level problems with pattern-level fixes.
Classification ≠ repair. Deciding what kind of failure this is is a much simpler task than fixing the code. Free-tier or local matching handles it.
Verification is not AI's job. Running npm test is deterministic and free. Shelling out is faster than asking a model to predict whether tests pass.

PatchPilot encodes this as a workflow. Franklin's Smart Router handles the rest.

Project layout

src/
├── adapters/              Input adapters (local / log / GitHub)
├── failure-engine/        Log normalizer, secret redactor, classifier
├── policy/                .patchpilot.yml loader + enforcement
├── franklin/              Franklin plugin surface
│   ├── workflow.ts        ← the 7-step Workflow
│   ├── prompt-builder.ts
│   ├── patch-applier.ts
│   ├── runner.ts          ← unified native/standalone runner
│   └── standalone-runner.ts
├── verification/          Command execution + diff collection
├── reporting/             Markdown + terminal reporters
├── types/                 Zod schemas + Franklin SDK types
├── utils/                 Storage, IDs
├── cli/                   Orchestrator + CLI subcommands
├── index.ts               Plugin entry (exports default Plugin)
└── cli.ts                 Standalone CLI entry

tests/                     28 tests, 7 suites
plugin.json                Franklin plugin manifest

Status

Feature	Status
Local Mode	✅ implemented
Log Mode	✅ implemented
GitHub CLI Mode (`repair-gh`)	✅ implemented
Managed Mode (GitHub App + webhook)	❌ planned
Franklin Plugin SDK integration	✅ implemented
Native Franklin runner (auto-detect)	✅ implemented
Standalone fallback runner	✅ implemented
Secret redaction	✅ implemented
Policy enforcement	✅ implemented
Cost ledger with real model routing	✅ implemented
Draft PR creation via gh	✅ implemented
Sample artifacts + benchmark docs	✅ included
44 tests, TypeScript strict mode	✅

License

Apache-2.0. See LICENSE.

Built on Franklin — the AI agent with a wallet.

Name		Name	Last commit message	Last commit date
Latest commit History 5 Commits
docs		docs
src		src
tests		tests
.gitignore		.gitignore
LICENSE		LICENSE
README.md		README.md
package-lock.json		package-lock.json
package.json		package.json
plugin.json		plugin.json
tsconfig.json		tsconfig.json
vitest.config.ts		vitest.config.ts

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

PatchPilot

The core insight

Workflow graph (Mermaid)

Model tier routing

Where the money goes (real cost breakdown)

See it in action

Usage

Three modes, one command style

Workflow modes — `--mode full | triage | dry-run`

Quick start

Two ways to run it

Mode A — Standalone CLI

Mode B — Franklin plugin

What auto-switches between modes

Not sure which to use?

Architecture

Safety rails

Why this works

Project layout

Status

License

About

Uh oh!

Releases 1

Packages

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

PatchPilot

The core insight

Workflow graph (Mermaid)

Model tier routing

Where the money goes (real cost breakdown)

See it in action

Usage

Three modes, one command style

Workflow modes — --mode full | triage | dry-run

Quick start

Two ways to run it

Mode A — Standalone CLI

Mode B — Franklin plugin

What auto-switches between modes

Not sure which to use?

Architecture

Safety rails

Why this works

Project layout

Status

License

About

Topics

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases 1

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Workflow modes — `--mode full | triage | dry-run`

Packages