Skip to content

Lexiie/PatchPilot

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

5 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

PatchPilot

Smart-routed CI repair under a hard AI budget.

License: Apache 2.0 TypeScript: strict Node: >=20 Franklin: plugin Payments: x402 Settlement: USDC

PatchPilot turns red CI into reviewed, locally verified pull requests while showing every model-routing decision, tool action, and dollar spent.

Built as a Franklin plugin. Uses USDC micropayments via x402, not subscriptions. Apache-2.0.


The core insight

Other CI repair agents send every log to Claude Opus and hope for the best. PatchPilot treats CI repair as a pipeline, not a single prompt:

┌─────────────────────────────────────────────────────────────────────┐
│                      PatchPilot workflow                             │
│                                                                      │
│   5 of 7 steps never touch a model. Only 1 step may pay.            │
└─────────────────────────────────────────────────────────────────────┘

  ① ingest ────────  none   $0.0000   read + normalize log
  ② redact ────────  none   $0.0000   strip secrets before any model sees them
  ③ classify ──────  free   $0.0000   pattern-match first, free-tier LLM fallback
  ④ policy_check ──  none   $0.0000   allow / deny by failure type
  ⑤ repair ────────  free │ pro  $0.0000 – $0.0300   generate unified diff
  ⑥ verify ────────  none   $0.0000   run the verify command
  ⑦ policy_paths ──  none   $0.0000   enforce forbidden paths on the diff

                                       Total: $0.000 – $0.031
                                       All-pro baseline: ~$0.09
                                       Saved: 65%+ (often 100%)

PatchPilot uses two model tiers:

  • freenvidia/qwen3-coder-480b. Zero cost, no wallet required.
  • proanthropic/claude-sonnet-4.6. ~$0.02/call, USDC via Franklin wallet.

Steps 1, 2, 4, 6, and 7 are pure TypeScript — regex, AST, shell, policy logic. Zero model cost.

Step 3 (classify) runs a local pattern matcher first. Only if confidence drops below 0.4 does it call the free tier for a second opinion — still $0.00.

Step 5 (repair) is the only step that might cost money. Low-risk failures (lint, format, simple config) route to the free tier and stay at $0.00. Only medium-risk code reasoning (typecheck, unit test) uses pro.

Workflow graph (Mermaid)

flowchart LR
    A[ingest<br/>none · $0] --> B[redact<br/>none · $0]
    B --> C{classify<br/>pattern match}
    C -- confidence &ge; 0.4 --> E[policy_check<br/>none · $0]
    C -- confidence &lt; 0.4 --> D[free tier<br/>$0]
    D --> E
    E -- allowed --> F{repair<br/>risk-routed}
    E -- forbidden --> X[abort<br/>triage only]
    F -- risk: low --> G1[free tier<br/>$0]
    F -- risk: medium --> G2[pro tier<br/>~$0.020]
    F -- risk: high --> X
    G1 --> H[verify<br/>shell · $0]
    G2 --> H
    H -- pass --> I[policy_check_paths<br/>none · $0]
    H -- fail --> X
    I -- clean --> J[✅ verified<br/>PR ready]
    I -- violates --> X

    classDef model fill:#fef3c7,stroke:#d97706,color:#78350f
    classDef free fill:#dcfce7,stroke:#16a34a,color:#14532d
    classDef terminal fill:#dbeafe,stroke:#2563eb,color:#1e3a8a
    classDef abort fill:#fee2e2,stroke:#dc2626,color:#7f1d1d
    class D,G1,G2 model
    class A,B,E,H,I free
    class J terminal
    class X abort
Loading

Model tier routing

PatchPilot only has two tiers, chosen per failure by risk:

                       ┌──────────────────────────┐
                       │ Failure classification   │
                       │  type + risk + evidence  │
                       └────────────┬─────────────┘
                                    │
              ┌─────────────────────┼─────────────────────┐
              │                     │                     │
        risk: low              risk: medium          risk: high
              │                     │                     │
              ▼                     ▼                     ▼
      ┌─────────────┐        ┌─────────────┐      ┌──────────────┐
      │  free tier  │        │   pro tier  │      │ triage only  │
      │             │        │             │      │ no patch     │
      │ qwen3-coder │        │ sonnet-4.6  │      │ human review │
      │    $0.00    │        │  ~$0.020    │      │     $0.00    │
      └─────────────┘        └─────────────┘      └──────────────┘

         e.g. lint,            e.g. typecheck,       e.g. auth changes,
         format, simple        unit test fail        payment logic,
         dep config                                  env secrets

Franklin's Smart Router resolves the final model within each tier. You don't pick qwen3-coder or sonnet-4.6 — PatchPilot only names the tier, Franklin decides the model based on your wallet, task category, and availability.


Where the money goes (real cost breakdown)

A typical low-risk repair run (lint / format / simple dep config):

step                 tier   tokens (in/out)   cost      model
────────────────────────────────────────────────────────────────────────
ingest               none   —                 $0.0000   —
redact               none   —                 $0.0000   —
classify             free   pattern match     $0.0000   (skipped LLM)
policy_check         none   —                 $0.0000   —
repair               free   1.8K / 0.4K       $0.0000   nvidia/qwen3-coder-480b
verify               none   —                 $0.0000   —
policy_check_paths   none   —                 $0.0000   —
────────────────────────────────────────────────────────────────────────
                                              $0.0000 total — zero wallet required

A typical medium-risk repair run (typecheck / unit test):

step                 tier   tokens (in/out)   cost      model
────────────────────────────────────────────────────────────────────────
ingest               none   —                 $0.0000   —
redact               none   —                 $0.0000   —
classify             free   pattern match     $0.0000   (skipped LLM)
policy_check         none   —                 $0.0000   —
repair               pro    3.2K / 0.8K       $0.0184   claude-sonnet-4.6
verify               none   —                 $0.0000   —
policy_check_paths   none   —                 $0.0000   —
────────────────────────────────────────────────────────────────────────
                                              $0.0184 total
                                              ~$0.054 if we'd used pro everywhere
                                              66% saved

Every dollar appears in the ledger.json artifact. Every model pick is auditable.

See it in action

Real artifacts from a sample repair run:

For the methodology behind the 65% savings claim, see docs/benchmark.md.


Usage

Three modes, one command style

┌─ Local ────────────────────────────────────────────────────────────┐
│  patchpilot repair --command "npm test" --budget 0.25              │
│                                                                    │
│  Runs the failing command, diagnoses, patches, verifies locally.   │
│  Nothing leaves your machine except prompts + Franklin settlement. │
└────────────────────────────────────────────────────────────────────┘

┌─ GitHub Actions ───────────────────────────────────────────────────┐
│  patchpilot repair-gh --repo owner/repo --run latest-failed \      │
│                       --budget 0.50 --create-pr                    │
│                                                                    │
│  Fetches the failed run via gh CLI, repairs, opens a draft PR.     │
└────────────────────────────────────────────────────────────────────┘

┌─ Diagnose ─────────────────────────────────────────────────────────┐
│  patchpilot diagnose --command "npm test"                          │
│  patchpilot diagnose --log failure.log                             │
│                                                                    │
│  Classification only. No code changes, no model cost for simple    │
│  failures (pattern match hits).                                    │
└────────────────────────────────────────────────────────────────────┘

Workflow modes — --mode full | triage | dry-run

Every repair and repair-gh invocation picks a workflow mode. Default is full.

Mode Runs steps Model calls Files changed When to use
full all 7 yes (repair) yes (if fixed) Default. Real repair with patch + PR.
triage first 4 no no Just want a classification + root cause.
dry-run all 7 (no-ops) no no Preview the pipeline path, zero cost.
# full repair with PR
patchpilot repair --command "npm test" --mode full

# diagnose without patching anything
patchpilot repair --command "npm test" --mode triage

# walk the pipeline, see classification + policy decisions, zero model spend
patchpilot repair --command "npm test" --mode dry-run

triage terminates cleanly at the repair step with run status diagnosed. It's not an abort — it's the intended end of the pipeline when you don't want a patch.

Quick start

# Install Franklin (optional peer, but recommended)
npm install -g @blockrun/franklin
franklin setup base          # or: franklin setup solana

# Install PatchPilot directly from GitHub
npm install -g github:Lexiie/PatchPilot

# Initialize repo policy
patchpilot init

# Verify your environment
patchpilot doctor

# First repair
patchpilot repair --command "npm test" --budget 0.25

Two ways to run it

PatchPilot works as both a standalone CLI tool and a Franklin plugin. Same codebase, same workflow, same artifacts — only the entry point differs.

┌─────────────────────────────────────────────────────────────────────┐
│                         Shared source                               │
│                                                                     │
│   src/franklin/workflow.ts   ← the 7-step Workflow                  │
│   src/cli/orchestrator.ts    ← repairLocal / repairGitHub           │
│   src/failure-engine/        ← classifier, redactor, log normalizer │
│   src/policy/                ← .patchpilot.yml enforcement          │
└──────────────────┬───────────────────────────┬──────────────────────┘
                   │                           │
          ┌────────▼──────────┐      ┌─────────▼─────────┐
          │ src/cli.ts        │      │ src/index.ts      │
          │ Commander-based   │      │ Exports Plugin{}  │
          │ bin: patchpilot   │      │ plugin entry      │
          └────────┬──────────┘      └─────────┬─────────┘
                   │                           │
                   ▼                           ▼
          ┌────────────────┐          ┌────────────────────┐
          │ Standalone CLI │          │ Franklin host      │
          │ npm i -g       │          │ via plugin.json    │
          └────────────────┘          └────────────────────┘

Mode A — Standalone CLI

Install directly from GitHub. Franklin is an optional peer; if it's installed, PatchPilot uses its CLI for model calls and cost tracking. If not, PatchPilot still builds and type-checks.

npm install -g github:Lexiie/PatchPilot
npm install -g @blockrun/franklin    # optional but recommended

patchpilot repair --command "npm test" --budget 0.25
patchpilot repair-gh --repo acme/api --create-pr --budget 0.50

Use standalone when:

  • Running in CI (GitHub Actions, GitLab CI, buildkite)
  • Scheduled cron or webhook handler
  • Docker container with minimal footprint
  • Org hasn't adopted Franklin yet

Mode B — Franklin plugin

Drop the package into Franklin's plugin directory. Franklin discovers it via plugin.json and registers its commands + workflow with the native runner.

# Dev / local install
FRANKLIN_PLUGINS_DIR=/path/to/PatchPilot franklin

# User install
cp -r PatchPilot ~/.blockrun/plugins/patchpilot
franklin patchpilot repair --command "npm test" --budget 0.25
franklin patchpilot repair-gh --repo acme/api --create-pr

Use plugin mode when:

  • Developer already uses Franklin as primary coding agent
  • Want integrated /cost, /insights, session history
  • Interactive debugging alongside /fix, /review, /test
  • Team tracks all AI spend through one Franklin wallet

What auto-switches between modes

The runner detects Franklin at startup and picks the best path — you don't configure this:

Runtime behavior Standalone Plugin
Model calls Shell out: franklin --model Native ctx.callModel()
Cost tracking Parse CLI routing footer Native session ledger
Wallet settlement Franklin wallet (via CLI) Franklin wallet (native)
Franklin /cost visibility
Franklin /insights
Startup overhead Minimal Higher (loads Franklin core)
PatchPilot artifacts .patchpilot/runs/<id>/ (same) .patchpilot/runs/<id>/ (same)

Not sure which to use?

Start standalone. Upgrade to plugin mode if your team adopts Franklin more broadly — no config migration needed, the same .patchpilot.yml works for both.


Architecture

┌────────────────────────────────────────────────────────────────────┐
│                        PatchPilot CLI                              │
│                                                                    │
│   repair · repair-gh · diagnose · init · doctor · runs             │
└───────────────────┬────────────────────────────────────────────────┘
                    │
                    ▼
┌────────────────────────────────────────────────────────────────────┐
│                     Unified Workflow Runner                        │
│                                                                    │
│   If @blockrun/franklin installed → native Franklin WorkflowRunner │
│   Else → standalone runner that shells out to `franklin --model`   │
└───────────────────┬────────────────────────────────────────────────┘
                    │
                    ▼
┌────────────────────────────────────────────────────────────────────┐
│                    PatchPilot Workflow (7 steps)                   │
├────────────────────────────────────────────────────────────────────┤
│                                                                    │
│   Adapters              Failure Engine          Franklin           │
│   ────────              ──────────────          ────────           │
│   local-command         log-normalizer          workflow.ts        │
│   log-file              secret-redactor         prompt-builder     │
│   github-cli            classifier              patch-applier      │
│                                                 standalone-runner  │
│                                                                    │
│   Verification          Policy                  Reporting          │
│   ────────────          ──────                  ─────────          │
│   verify command        loadPolicy              markdown-reporter  │
│   collectDiff           checkFailureType        terminal-reporter  │
│   getChangedFiles       checkForbiddenPaths                        │
└────────────────────────────────────────────────────────────────────┘
                    │
                    ▼
┌────────────────────────────────────────────────────────────────────┐
│                         Artifacts                                  │
│                                                                    │
│   .patchpilot/runs/<run-id>/                                       │
│   ├── run.json          (full run metadata)                        │
│   ├── ledger.json       (cost + routing, per step)                 │
│   ├── report.md         (PR body / human-readable report)          │
│   ├── patch.diff        (the applied unified diff)                 │
│   ├── failure.log       (raw CI log)                               │
│   ├── redacted-log.txt  (secret-redacted version sent to model)    │
│   └── verify-*.log      (verification command output)              │
└────────────────────────────────────────────────────────────────────┘

Safety rails

Default behavior Override
Refuse dirty working tree --allow-dirty
Draft PRs only createPrDefault: ready
Never patch forbidden paths edit .patchpilot.yml
Never exceed --budget
Redact secrets before any model
Require passing verify command
Skip environment_missing_secret explicit allow list
Skip network_or_infra explicit allow list
Triage-only for high-risk failures

Policy lives in .patchpilot.yml:

version: 1
repair:
  max_attempts: 3
  create_pr_default: draft
  allowed_failure_types: [lint, format, typecheck, unit_test, dependency_config]
  forbidden_failure_types: [environment_missing_secret, network_or_infra]
  forbidden_paths: [".env*", "secrets/**", "infra/prod/**"]
  require_human_review_for: ["auth/**", "billing/**", "payments/**", "migrations/**"]
verification:
  required_commands: ["npm test"]

Why this works

Most AI coding agents treat CI repair as "send log → get patch". That's a single pro-tier call, and it's wasteful:

  • 60% of CI failures don't need LLM reasoning at all. Lint, format, snapshot updates, lockfile drift — these are pattern-level problems with pattern-level fixes.
  • Classification ≠ repair. Deciding what kind of failure this is is a much simpler task than fixing the code. Free-tier or local matching handles it.
  • Verification is not AI's job. Running npm test is deterministic and free. Shelling out is faster than asking a model to predict whether tests pass.

PatchPilot encodes this as a workflow. Franklin's Smart Router handles the rest.


Project layout

src/
├── adapters/              Input adapters (local / log / GitHub)
├── failure-engine/        Log normalizer, secret redactor, classifier
├── policy/                .patchpilot.yml loader + enforcement
├── franklin/              Franklin plugin surface
│   ├── workflow.ts        ← the 7-step Workflow
│   ├── prompt-builder.ts
│   ├── patch-applier.ts
│   ├── runner.ts          ← unified native/standalone runner
│   └── standalone-runner.ts
├── verification/          Command execution + diff collection
├── reporting/             Markdown + terminal reporters
├── types/                 Zod schemas + Franklin SDK types
├── utils/                 Storage, IDs
├── cli/                   Orchestrator + CLI subcommands
├── index.ts               Plugin entry (exports default Plugin)
└── cli.ts                 Standalone CLI entry

tests/                     28 tests, 7 suites
plugin.json                Franklin plugin manifest

Status

Feature Status
Local Mode ✅ implemented
Log Mode ✅ implemented
GitHub CLI Mode (repair-gh) ✅ implemented
Managed Mode (GitHub App + webhook) ❌ planned
Franklin Plugin SDK integration ✅ implemented
Native Franklin runner (auto-detect) ✅ implemented
Standalone fallback runner ✅ implemented
Secret redaction ✅ implemented
Policy enforcement ✅ implemented
Cost ledger with real model routing ✅ implemented
Draft PR creation via gh ✅ implemented
Sample artifacts + benchmark docs ✅ included
44 tests, TypeScript strict mode

License

Apache-2.0. See LICENSE.

Built on Franklin — the AI agent with a wallet.

About

Smart-routed CI repair under a hard AI budget. Franklin plugin that turns red CI into reviewed, locally verified pull requests.

Topics

Resources

License

Stars

Watchers

Forks

Packages

 
 
 

Contributors