AgentProof

Social preview path: check fit, scope checkout, then send the public target. No calls, no secrets, no private repo access.

AgentProof is a free local proof harness for AI-built apps, repos, and public PRs.

It answers the question every agentic coding demo eventually runs into:

Did this app actually work, or did it just look good in the chat transcript?

Run AgentProof against a repo and it detects the project, runs safe local checks, starts the app when possible, crawls configured routes in Playwright, captures screenshots and console errors, checks basic accessibility and broken links, scores launch readiness, and writes a polished static report to agentproof-report/index.html.

No paid API. No API key. No hosting. No database. No telemetry. No source upload.

GitHub Overview Fast Path

Most people landing here need one of three things:

Use the free local run when it answers the question. Buy only when an outside written proof packet would save review churn or launch risk. Keep the same AP-... audit reference from checkout through intake, payment match, fulfillment, and delivery proof.

If You Cloned This

There are two clean paths:

  • Run the free local proof pass yourself:

    npm install
    npm run build
    node dist/src/cli.js audit . \
      --no-browser
    open agentproof-report/index.html
  • If you want the proof packet done for one public repo, demo, or PR, buy the fixed $149 async mini-audit, then send the public target through the handoff page. No calls, no secrets, no private repo access.

Do not buy if the free report answers the question. Buy only when you need an outside written proof packet for launch, handoff, posting, or PR review. The $149 pass should be cheaper than the review churn or launch risk it replaces: it breaks even after 3 hours saved at $50/hour, or 1.5 hours at $100/hour.
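The break-even figures above are the fixed price divided by the reviewer's hourly rate. The following is a minimal TypeScript sketch of that arithmetic. The helper name `breakEvenHours` is illustrative only and is not part of AgentProof.

```typescript
// Hours of review time the fixed-price audit must save to pay for itself.
// Illustrative helper only; AgentProof does not ship this function.
function breakEvenHours(priceUsd: number, hourlyRateUsd: number): number {
  if (hourlyRateUsd <= 0) {
    throw new RangeError("hourly rate must be positive");
  }
  return priceUsd / hourlyRateUsd;
}

const AUDIT_PRICE_USD = 149;
for (const rate of [50, 100]) {
  // Rounded to one decimal place for display: 2.98 -> "3.0", 1.49 -> "1.5"
  const hours = breakEvenHours(AUDIT_PRICE_USD, rate).toFixed(1);
  console.log(`$${rate}/hour: ${hours} hours`);
}
```

At $50/hour the exact figure is 2.98 hours, which the README rounds to 3.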

Paid path:

Paid Async Audit

Want the report generated and interpreted for one public AI-built app, repo, or PR?

The fixed-price AgentProof Mini Audit is $149, written, async, and no-call. Send a public repo/demo or public PR with safe inputs; get an AgentProof report, score, blocking issues, maintainer/review risk signals, first fixes, and stop rules.

Use the landing-page target router, intake builder, or buyer handoff when you want a prefilled, safe email or GitHub brief before sending the target. The landing router accepts owner/repo, a public GitHub PR URL, or a public demo URL. It assigns a stable AP-... audit reference and carries that public target and reference into checkout, intake, and handoff without asking for secrets.

Repo-specific links support ?repo=owner/repo on the precheck, checkout, buyer handoff, and intake handoff pages. Targeted checkout pages also show the exact post-payment packet to send after Gumroad. This packet includes any source report score from an AgentProof report and the audit reference used to match the Gumroad payment to the intake.

Do not send secrets, production credentials, private customer data, payment data, or private repo access.
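The router accepts three public target shapes. The following is a hypothetical TypeScript sketch of how such input might be classified. The `classifyTarget` name, the `Target` type, and the parsing rules are assumptions for illustration only, because the real router's rules are not documented here.

```typescript
// Hypothetical classifier for the three public target shapes the landing
// router accepts: owner/repo, a public GitHub PR URL, or a public demo URL.
type Target =
  | { kind: "repo"; owner: string; repo: string }
  | { kind: "pr"; owner: string; repo: string; pr: number }
  | { kind: "demo"; url: string };

const REPO_RE = /^([A-Za-z0-9-]+)\/([A-Za-z0-9._-]+)$/;

function classifyTarget(input: string): Target | null {
  const text = input.trim();
  const repoMatch = REPO_RE.exec(text);
  if (repoMatch) {
    return { kind: "repo", owner: repoMatch[1], repo: repoMatch[2] };
  }
  let url: URL;
  try {
    url = new URL(text);
  } catch {
    return null; // neither owner/repo nor an absolute URL
  }
  if (url.protocol !== "https:" && url.protocol !== "http:") return null;
  if (url.hostname === "github.com") {
    const parts = url.pathname.split("/").filter(Boolean);
    // Expected PR shape: /owner/repo/pull/123
    if (parts.length >= 4 && parts[2] === "pull" && /^\d+$/.test(parts[3])) {
      return { kind: "pr", owner: parts[0], repo: parts[1], pr: Number(parts[3]) };
    }
    if (parts.length === 2) {
      return { kind: "repo", owner: parts[0], repo: parts[1] };
    }
    return null; // other GitHub URLs are not an accepted shape in this sketch
  }
  return { kind: "demo", url: url.toString() };
}
```

Rejecting anything that isn't one of the three shapes keeps the router from accepting private or ambiguous inputs by accident.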

Why It Exists

AI agents can build fast. The bottleneck is proof.

AgentProof is the local evidence layer you run after Codex, Claude Code, Cursor, Replit, Windsurf, or another coding agent says the work is done. It turns "trust me, it passed" into a local artifact with commands, screenshots, route status, console output, accessibility checks, broken links, and a deterministic score.

Quickstart

From source:

git clone https://github.com/dicnunz/agentproof.git
cd agentproof
npm install
npm run build
node dist/src/cli.js audit .

Short form after build:

node dist/src/cli.js .

Install the current release tarball:

npm install -g https://github.com/dicnunz/agentproof/releases/download/v0.1.7/nicdunz-agentproof-0.1.7.tgz
agentproof audit .

Hosted quickstart:

The install quickstart takes owner/repo once, then prints the local audit commands, CI badge command, and repo-scoped checkout/intake/handoff links with the same AP-... audit reference.

The npm package name is prepared for a future publish, but this release is distributed through the GitHub release tarball.

Open the report:

open agentproof-report/index.html

Serve it locally:

agentproof report latest --serve

Demo

This repo includes a tiny demo app with one intentional issue and one fixed route.

npm install
npm run demo
open agentproof-report/index.html

The demo report should show:

  • passing local command checks
  • route screenshots
  • a deliberate console error on /issues
  • a broken internal link from /
  • a clean /fixed route
  • a generated proof-card.svg

Refresh the hosted demo report:

npm run service:demo-report

Run AgentProof in GitHub Actions:

agentproof init-action .

Print a copyable README badge/proof block for the same workflow:

agentproof init-action . --badge --repo owner/repo

Or generate the workflow in the browser:

Release proof:

- uses: actions/checkout@v6
- uses: dicnunz/agentproof@v0.1.7
  with:
    path: .
    output: agentproof-report
    write-summary: "true"
    min-score: "75"

This repo dogfoods the action in .github/workflows/agentproof.yml. The action writes score/blockers, the report's source-preserving reader URL, and the repo-targeted no-call outside-proof handoff to the GitHub Step Summary, uploads the report artifact, then enforces min-score. Direct hosted-reader preload requires a public results.json URL; private GitHub artifacts stay private.

For a public GitHub PR proof-triage sample:

npm run service:pr-demo
open services/sample-pr-proof-report.md

For a public-repo DX audit sample:

npm run service:dx-demo
open services/sample-dx-audit-report.md

For a paid DX audit intake or a public PR service-intake issue, after private payment verification:

npm run service:fulfill -- --issue <issue-number-or-url> --payment-verified

Public app/repo intakes run AgentProof against the safe local clone supplied with --local-repo (or against a fresh public clone when no local clone is supplied) and return an app-report-ready customer report. For a paid public app/repo intake or a paid email intake body:

npm run service:fulfill -- --email-file services/sample-email-intake.txt --email-from buyer@example.com --email-subject "AgentProof Mini Audit intake" --payment-verified

After a customer report is generated, build the ready-send delivery draft and log template:

npm run service:delivery-draft -- --report services/customer-reports/agentproof-mini-audit-intake.md --buyer-email buyer@example.com
npm run service:delivery-log -- --buyer-email buyer@example.com --report services/customer-reports/agentproof-mini-audit-intake.md --sent-at 2026-05-10T20:10:00Z --sent-via Gmail --message-url https://mail.google.com/mail/#all/real-message --output services/private/delivery-log.md
npm run service:delivery-log -- --verify services/private/delivery-log.md

If the customer report came from an existing AgentProof score, the delivery draft, log template, and real delivery log preserve that source-report line so a paid handoff stays tied to the exact proof that opened the service path.

Before paid fulfillment, match the Gumroad sale email to the intake without committing raw receipt data. If the buyer includes an AP-... audit reference from checkout, service:payment-check records it in the private payment report, blocks mismatched sale/intake references, and the paid-fulfillment path carries the same reference through the customer report, delivery draft, delivery log template, and completion audit.

npm run service:payment-check -- --sale-email-file services/private/gumroad-sale.txt --intake-email-file services/private/intake.txt --buyer-email buyer@example.com

Use --payment-verified on service:fulfill only after service:payment-check returns payment-verified. Text inside a public issue or email that says the buyer paid is treated as context only; it never unlocks fulfillment without the private payment-check result.
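The following is a hypothetical TypeScript sketch of the two gates described above: matching AP references between the sale and the intake, and unlocking fulfillment only on the private payment-check result. The README elides the AP reference format, so the `AP-[A-Za-z0-9-]+` pattern is an assumption. The function names are illustrative.

```typescript
// Hypothetical sketch; the AP- reference pattern is an assumption because the
// README writes it as "AP-...".
const AP_REF_RE = /\bAP-[A-Za-z0-9-]+/;

type RefMatch = "match" | "mismatch" | "missing";

function matchAuditReference(saleText: string, intakeText: string): RefMatch {
  const sale = AP_REF_RE.exec(saleText)?.[0];
  const intake = AP_REF_RE.exec(intakeText)?.[0];
  if (sale === undefined || intake === undefined) return "missing";
  return sale === intake ? "match" : "mismatch";
}

// Only the private payment-check result unlocks fulfillment. A buyer's
// "I paid" text is never an input here, so it cannot unlock anything.
function mayPassPaymentVerified(paymentCheckStatus: string, ref: RefMatch): boolean {
  return paymentCheckStatus === "payment-verified" && ref !== "mismatch";
}
```

The sketch deliberately ignores free text claims of payment. The decision depends only on the payment-check status and the reference comparison.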

When the first real sale arrives, prepare the ignored private capture files before pasting receipt or intake data:

npm run service:capture-sale

This writes services/first-customer-capture-latest.md plus template files under the git-ignored services/private/ directory. The templates are not proof and do not unlock fulfillment. Copy or rename them to services/private/gumroad-sale.txt and services/private/intake.txt only after a real buyer signal exists, then run service:first-customer-status and service:payment-check.

For the first paid customer, run the paid-fulfillment pipeline from ignored private receipt/intake files. It runs payment-check first, then fulfillment, then creates a private delivery draft only when a customer report is ready:

npm run service:paid-fulfillment -- --sale-email-file services/private/gumroad-sale.txt --intake-email-file services/private/intake.txt --buyer-email buyer@example.com

Add --issue <issue-number-or-url> for a public GitHub intake issue, or let the pipeline use the intake email body as the fulfillment source. Keep real delivery drafts and logs under services/private/.

To see exactly what the first paid-customer path is missing without exposing buyer data:

npm run service:first-customer-status

This writes services/first-customer-status-latest.md with file-presence checks, status markers, the allowed next action, and the remaining proof boundary. It does not print raw sale emails, buyer addresses, or intake bodies.

To rehearse the full paid path with fake fixture data before a real buyer arrives:

npm run service:paid-rehearsal

This refreshes the hosted paid fulfillment rehearsal, sample payment verification, sample customer report, sample delivery draft, and sample delivery log template. It is not revenue proof.

Before any revenue-loop action:

npm run service:loop

Before marking the money-making loop complete:

npm run service:completion-audit

service:completion-audit writes services/completion-audit-latest.md and exits nonzero until there is real non-sample evidence for collected payment, customer fulfillment, and delivery. The delivery log must pass service:delivery-log -- --verify <delivery-log> with Status: sent, a real sent timestamp, and a message URL; a draft or template is not enough. If payment or fulfillment includes a source-report score, the sent-delivery log must preserve the same source report before completion can pass.
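The following is a hypothetical TypeScript sketch of the sent-delivery rule. This README names only the `Status: sent` line, so the `Sent at:` and `Message URL:` labels are assumptions about the log format. The real `service:delivery-log -- --verify` check may parse it differently.

```typescript
// Hypothetical sketch of the sent-delivery rule; field labels other than
// "Status" are assumptions about the delivery-log format.
function readField(log: string, label: string): string | undefined {
  const prefix = `${label}:`;
  for (const line of log.split(/\r?\n/)) {
    const trimmed = line.trim();
    if (trimmed.startsWith(prefix)) {
      return trimmed.slice(prefix.length).trim();
    }
  }
  return undefined;
}

function isSentDelivery(log: string): boolean {
  const status = readField(log, "Status");
  const sentAt = readField(log, "Sent at");
  const messageUrl = readField(log, "Message URL");
  // Drafts and templates fail here: they are not "sent" or lack a field.
  if (status !== "sent" || !sentAt || !messageUrl) return false;
  if (Number.isNaN(Date.parse(sentAt))) return false; // placeholder timestamps fail
  try {
    return new URL(messageUrl).protocol === "https:";
  } catch {
    return false; // not a real URL
  }
}
```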

If the loop says contact-one-reply-eligible, verify the exact recipient before any email, X reply, GitHub comment, or DM:

npm run service:leads
npm run service:outreach-guard -- --recipient sha256:<hash> --channel email

For a recipient already in services/outreach-ledger.json, prefer --signal-at <ISO timestamp> for the new reply, paid sale, or intake. The guard only allows the reply when that timestamp is newer than the latest ledger entry. Use --fresh-buyer-signal only as a manual fallback when the event timestamp is unavailable but freshness has been verified.
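The following is a hypothetical TypeScript sketch of the freshness rule: a reply is allowed only when the new signal is newer than the latest ledger entry for that recipient. Two details are assumptions. The ledger entry shape `{ recipient, at }` may not match services/outreach-ledger.json, and the recipient normalization (trim and lowercase before hashing) may not match what the guard actually hashes.

```typescript
import { createHash } from "node:crypto";

// Assumed ledger entry shape; the real outreach ledger may differ.
interface LedgerEntry {
  recipient: string; // "sha256:<hex>"
  at: string; // ISO 8601 timestamp of the last contact
}

// Assumed normalization: trim and lowercase before hashing.
function recipientKey(address: string): string {
  const digest = createHash("sha256").update(address.trim().toLowerCase()).digest("hex");
  return `sha256:${digest}`;
}

function replyAllowed(ledger: readonly LedgerEntry[], recipient: string, signalAt: string): boolean {
  const signal = Date.parse(signalAt);
  if (Number.isNaN(signal)) return false; // unparseable timestamps never unlock a reply
  const previous = ledger
    .filter((entry) => entry.recipient === recipient)
    .map((entry) => Date.parse(entry.at))
    .filter((t) => !Number.isNaN(t));
  if (previous.length === 0) return true; // not in the ledger: the freshness rule does not apply
  return signal > Math.max(...previous); // strictly newer than the latest contact
}
```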

Lead radar searches both AI-PR pain and buyer-intent terms, then marks public candidates as reply-eligible, watch-only, or do-not-contact. Treat anything but reply-eligible as research only. Each service script owns one part of the loop:

  • service:market-proof turns the latest watch-only research into docs/market-proof/index.html, docs/review-load-sample/index.html, services/market-proof-latest.md, and services/review-load-sample-latest.md, so each loop leaves reusable owned conversion and fulfillment assets.
  • service:capture-sale prepares ignored first-customer receipt/intake templates and a public-safe capture report, so the next run starts from the exact private handoff instead of rediscovering the same blocker.
  • service:first-customer-status keeps the first real buyer path from being miscounted by showing whether the blocker is the sale receipt, intake, payment check, fulfillment, sent delivery, or completion audit.
  • service:paid-rehearsal proves the sample receipt-to-delivery path reaches delivery-draft-ready without counting fixture data as revenue.
  • service:buyer-trust verifies that the buyer-facing pages and service docs still show the free-first fit gate, AP reference continuity, the no-secrets/no-receipts boundary, the private payment-match rule, and no revenue or outcome overclaims.
  • service:loop refreshes the revenue radar, lead radar, market-proof page, review-load sample, conversion snapshot, and buyer-trust report, then emits the allowed next action: fulfill a paid intake, contact one reply-eligible buyer-shaped signal, improve conversion, or ship one owned-surface improvement.
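The "contact one reply-eligible signal" rule reduces to a filter and a single pick. The following is a minimal TypeScript sketch of that rule. The `Lead` shape and the `pickContact` name are assumptions, and picking the first eligible lead is an arbitrary ordering choice for illustration.

```typescript
// Hypothetical sketch: only reply-eligible leads may be contacted, one per loop.
type LeadStatus = "reply-eligible" | "watch-only" | "do-not-contact";

interface Lead {
  id: string;
  status: LeadStatus;
}

function pickContact(leads: readonly Lead[]): Lead | undefined {
  // watch-only and do-not-contact candidates stay research only
  return leads.find((lead) => lead.status === "reply-eligible");
}
```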

CLI

agentproof audit [path]
agentproof demo
agentproof report [latest]

Agent Skill

Agents can use the included AgentProof skill as a runbook for verifying AI-built apps and generating service-ready proof reports.

Useful options:

agentproof audit . --output proof
agentproof audit . --port 5173
agentproof audit . --no-browser
agentproof audit . --install
agentproof audit . --min-score 85

Exit codes:

  • 0: audit completed
  • 1: AgentProof runtime failure
  • 2: audit completed but --min-score was not met
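Because the exit codes are documented, a wrapper script can tell a failed score gate apart from a broken run. The following is a minimal TypeScript sketch using Node's `spawnSync`. It assumes the `agentproof` binary from the release tarball is on PATH. The outcome names are illustrative.

```typescript
import { spawnSync } from "node:child_process";

type AuditOutcome =
  | "completed"
  | "runtime-failure"
  | "below-min-score"
  | "could-not-start"
  | "unknown";

function describeExit(code: number | null): AuditOutcome {
  switch (code) {
    case 0:
      return "completed"; // audit finished (and met --min-score, if one was set)
    case 1:
      return "runtime-failure"; // AgentProof itself failed
    case 2:
      return "below-min-score"; // audit finished but the score gate failed
    default:
      return "unknown"; // any other code, or null when killed by a signal
  }
}

// Assumes the agentproof binary is installed and on PATH.
function runAudit(repoPath: string, minScore: number): AuditOutcome {
  const result = spawnSync(
    "agentproof",
    ["audit", repoPath, "--min-score", String(minScore)],
    { stdio: "inherit" },
  );
  if (result.error) return "could-not-start"; // e.g. binary missing
  return describeExit(result.status);
}
```

A CI script might fail the job on `below-min-score` but retry on `runtime-failure`. That split is a design choice for the wrapper, not AgentProof behavior.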

What AgentProof Detects

Project detection:

  • package manager: npm, pnpm, yarn, bun
  • likely framework: Vite, Next.js, React, Vue, Svelte, Astro, Express, static/node, unknown
  • package scripts: build, test, lint, typecheck, dev, start, preview
  • README, license, contributing file
  • common local app ports
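Package manager detection is commonly done by looking for lockfiles. The following is an illustrative TypeScript sketch of that approach. AgentProof's real heuristics may differ; for example, it might also read `package.json`. The lockfile precedence and the npm fallback are assumptions. The `exists` parameter is injectable so the logic can be checked without touching the disk.

```typescript
import { existsSync } from "node:fs";
import { join } from "node:path";

type PackageManager = "npm" | "pnpm" | "yarn" | "bun";

// Checked in order; the first lockfile found wins (assumed precedence).
const LOCKFILES: ReadonlyArray<[string, PackageManager]> = [
  ["pnpm-lock.yaml", "pnpm"],
  ["yarn.lock", "yarn"],
  ["bun.lockb", "bun"],
  ["bun.lock", "bun"],
  ["package-lock.json", "npm"],
];

function detectPackageManager(
  repoDir: string,
  exists: (path: string) => boolean = existsSync,
): PackageManager {
  for (const [file, manager] of LOCKFILES) {
    if (exists(join(repoDir, file))) return manager;
  }
  return "npm"; // assumption: fall back to npm when no lockfile is present
}
```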

Command checks:

  • runs detected build, test, lint, and typecheck scripts
  • supports explicit config overrides
  • captures command, cwd, timestamps, duration, exit code, stdout, stderr
  • truncates long output so reports stay readable
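The following is a minimal TypeScript sketch of a command check that records the fields listed above. The `CommandRecord` shape and the 4,000-character truncation limit are assumptions, not AgentProof's actual report format. The 120,000 ms default mirrors `commandTimeoutMs` in the example config.

```typescript
import { spawnSync } from "node:child_process";

interface CommandRecord {
  command: string;
  cwd: string;
  startedAt: string;
  finishedAt: string;
  durationMs: number;
  exitCode: number | null; // null if the process was killed (e.g. timeout)
  stdout: string;
  stderr: string;
}

const MAX_OUTPUT_CHARS = 4000; // assumed limit

function truncate(text: string, max = MAX_OUTPUT_CHARS): string {
  if (text.length <= max) return text;
  return `${text.slice(0, max)}\n[... ${text.length - max} characters truncated]`;
}

function runCheck(command: string, cwd: string, timeoutMs = 120_000): CommandRecord {
  const start = Date.now();
  const result = spawnSync(command, { cwd, shell: true, encoding: "utf8", timeout: timeoutMs });
  const end = Date.now();
  return {
    command,
    cwd,
    startedAt: new Date(start).toISOString(),
    finishedAt: new Date(end).toISOString(),
    durationMs: end - start,
    exitCode: result.status,
    stdout: truncate(result.stdout ?? ""),
    stderr: truncate(result.stderr ?? ""),
  };
}
```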

Browser proof:

  • starts a local server from preview, dev, start, or agentproof.config.json
  • crawls configured routes with Playwright
  • captures full-page screenshots
  • records severe console messages and page errors
  • detects visible runtime error text
  • checks missing alt text, nameless controls, unlabeled inputs, missing html lang, and h1 structure
  • checks internal links for 4xx/5xx failures
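The following is a minimal TypeScript sketch of the internal-link rule. Under this sketch, a link is internal when it resolves to the same origin as the page, and it counts as broken on a 4xx or 5xx status. How AgentProof actually collects and fetches links is not shown. The `maxLinks` default of 30 mirrors the example config.

```typescript
function isInternalLink(href: string, pageUrl: string): boolean {
  try {
    return new URL(href, pageUrl).origin === new URL(pageUrl).origin;
  } catch {
    return false; // malformed hrefs are skipped in this sketch
  }
}

function isBrokenStatus(status: number): boolean {
  return status >= 400 && status <= 599;
}

// De-duplicated internal targets, capped at maxLinks.
function internalTargets(hrefs: readonly string[], pageUrl: string, maxLinks = 30): string[] {
  const seen = new Set<string>();
  for (const href of hrefs) {
    if (seen.size >= maxLinks) break;
    if (!isInternalLink(href, pageUrl)) continue;
    const url = new URL(href, pageUrl);
    url.hash = ""; // fragments point at the same document
    seen.add(url.toString());
  }
  return Array.from(seen);
}
```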

Configuration

Create agentproof.config.json in the target repo:

{
  "appName": "My App",
  "routes": ["/", "/dashboard", "/settings"],
  "port": 5173,
  "startCommand": "npm run dev -- --host 127.0.0.1 --port 5173",
  "commandTimeoutMs": 120000,
  "serverTimeoutMs": 30000,
  "maxLinks": 30,
  "checks": {
    "build": true,
    "test": true,
    "lint": true,
    "typecheck": "npm run check-types"
  }
}

To skip a noisy check:

{
  "checks": {
    "lint": false
  }
}
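The `checks` entries accept `true`, `false`, or a command string. The following is a hypothetical TypeScript sketch of one way to resolve them. In this sketch, `false` skips the check, a string runs as an explicit override, and `true` or unset runs the package script only if the repo defines it. The `npm run` default is an assumption, since a real run would use the detected package manager. The config type is inferred from the example above.

```typescript
type CheckSetting = boolean | string;

// Inferred from the example agentproof.config.json; not an official schema.
interface AgentProofConfig {
  appName?: string;
  routes?: string[];
  port?: number;
  startCommand?: string;
  commandTimeoutMs?: number;
  serverTimeoutMs?: number;
  maxLinks?: number;
  allowInstall?: boolean;
  checks?: Partial<Record<"build" | "test" | "lint" | "typecheck", CheckSetting>>;
}

function resolveCheck(
  name: string,
  setting: CheckSetting | undefined,
  scripts: Record<string, string>,
): string | null {
  if (setting === false) return null; // explicitly skipped
  if (typeof setting === "string") return setting; // explicit override command
  // true or unset: run the package script only if the repo defines one
  return Object.prototype.hasOwnProperty.call(scripts, name) ? `npm run ${name}` : null;
}
```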

AgentProof does not install dependencies by default. Use --install or "allowInstall": true when you explicitly want it to run a safe install command. The default install mode uses --ignore-scripts.

Scoring

The AgentProof Score is deterministic. It does not call an LLM judge.

Categories:

  • Reliability
  • UI proof
  • Accessibility
  • Docs/readme quality
  • Test/build health
  • Launch readiness

Every point deduction includes a reason and a suggested fix in the report.
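The following is an illustrative TypeScript sketch of a deterministic scorer in this spirit: the same deductions always produce the same score, and each deduction carries a reason and a fix. The category names come from this README. The equal weighting, the 100-point scale per category, and the point values are assumptions, not AgentProof's scoring formula.

```typescript
type Category =
  | "Reliability"
  | "UI proof"
  | "Accessibility"
  | "Docs/readme quality"
  | "Test/build health"
  | "Launch readiness";

interface Deduction {
  category: Category;
  points: number;
  reason: string;
  fix: string;
}

const CATEGORIES: Category[] = [
  "Reliability",
  "UI proof",
  "Accessibility",
  "Docs/readme quality",
  "Test/build health",
  "Launch readiness",
];

function score(deductions: readonly Deduction[]): number {
  const perCategory = CATEGORIES.map((category) => {
    const lost = deductions
      .filter((d) => d.category === category)
      .reduce((sum, d) => sum + d.points, 0);
    return Math.max(0, 100 - lost); // a category never goes below zero
  });
  const total = perCategory.reduce((sum, s) => sum + s, 0);
  return Math.round(total / CATEGORIES.length); // equal weights (assumed)
}
```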

Report Artifacts

AgentProof writes:

agentproof-report/
  index.html
  results.json
  report-handoff.md
  proof-card.svg
  screenshots/
  report-qa.png

index.html is the human report. results.json is the machine-readable audit output. report-handoff.md is the buyer/user delivery packet with the same AP-... reference, proof contract, no-secrets boundary, and next action. proof-card.svg is a shareable card for docs or manual launch posts. report-qa.png is created by npm run verify:report.

Privacy

AgentProof is local-first:

  • no telemetry
  • no source upload
  • no API key
  • no remote database
  • no hosted dashboard
  • no paid service dependency

It runs local commands in the target repo and opens local browser routes. Public internet access is not required to audit a local app.

FAQ

Is this a replacement for tests?

No. It is a proof bundle around the checks and UI paths your repo already exposes. It makes missing proof visible.

Does it prove correctness?

No harness can prove full correctness. AgentProof establishes a stronger minimum than a chat transcript: local checks ran, routes opened, screenshots exist, severe browser failures were captured, and every deduction is visible.

Will it run unsafe scripts?

It runs local package scripts that already exist in the repo or commands you explicitly configure. It does not install by default, and the built-in install path uses --ignore-scripts.

Does it use OpenAI, Anthropic, or any other model API?

No. AgentProof is deterministic and local. It works after any coding agent, but it does not depend on one.

What if Playwright browsers are missing?

Install the free local browser runtime:

npx playwright install chromium

Roadmap

  • optional trace/video capture
  • richer accessibility rules
  • config presets for common frameworks
  • changed-file risk signals
  • optional coverage ingestion
  • report comparison between two runs

Development

npm install
npm run typecheck
npm run lint
npm test
npm run build
npm run demo
npm run verify:report
npm run score:launch

License

MIT