Coverify is a command-line harness for using LLMs and other proof backends on a mathematical knowledge base without losing source grounding, verification, or durable project state. It is designed to work with Cosheaf, and uses Coflat as the mathematical document format.
In practice, Coverify does four things:
- Takes a user question plus a bounded source bundle from a Cosheaf workspace or local directory.
- Prepares the relevant context, preferably with an agentic gatherer that can inspect the allowed files directly.
- Calls a reasoner or verifier backend and records what was asked, what came back, and which sources were used.
- Returns a checked answer, or publishes it as a Cosheaf issue comment, review, PR artifact, or knowledge-base update.
The useful loop is:
question + thread + source bundle
-> prepare relevant context
-> call a reasoner
-> verify the candidate answer
-> return or publish the checked result
Coverify is for workflows such as:
- answering a branch-scoped chat question using only the current project files
- asking a strong model for a proof attempt while recording an audit trail
- verifying whether a proposed argument is source-backed and gap-free enough to publish
- turning useful model output into a reviewed Cosheaf PR or comment
- evaluating whether context preparation found the passages a task actually needed
Coverify deliberately does not:
- prove theorems by itself
- replace a formal verifier or human mathematical review
- own the wiki, chat UI, issue tracker, or PR system
- keep hidden long-term project memory outside Cosheaf
- browse the web or read unrelated local files by default
- ship project-specific research tools in the public CLI
- Durable mathematical state lives in Cosheaf.
- Local files are tools, prompts, tests, and reproducibility scripts.
- Python exposes stable mechanisms: source bundles, backend calls, audit bundles, schema checks, path/range validation, and verifier gates.
- Agents and oracles handle judgment: context selection, route choice, mathematical reasoning, and correctness review.
- Deterministic code is added only when the behavior is stable, replayable, and simpler than asking an agent to inspect the allowed context.
python3 -m venv .venv
. .venv/bin/activate
python -m pip install -e .
coverify --helpFrom a checkout without installing:
PYTHONPATH=src python3 -m coverify --helpask-oracle: send one prompt to a configured backend and record an audit bundle.repo-oracle ask: answer against a local source bundle.repo-oracle gather: inspect which source passages the gatherer selects.repo-oracle eval-gather: run JSONL checks for context-preparation quality.chat ask: create or append to a branch-scoped chat issue and answer it.chat-reply: read a Cosheaf issue thread, run the oracle, and post a reply.run-eval: run JSONL eval cases against a fixture, script, or enabled Codex backend.seed-research-evals: seed selected research eval candidates into a Cosheaf workspace.- Cosheaf primitives:
tree,read-file,write-file,delete-file,create-branch, issue commands, PR commands, reviews, and merge.
Direct oracle call:
PYTHONPATH=src python3 -m coverify ask-oracle \
--prompt "Give a concise proof attempt for the current lemma." \
--jsonSelf-verifying fixture smoke:
PYTHONPATH=src python3 -m coverify ask-oracle \
--backend verifying \
--verify-inner-backend fixture \
--prompt "Prove that there are infinitely many prime numbers." \
--jsonRun against a local source bundle:
PYTHONPATH=src python3 -m coverify repo-oracle ask \
--source-bundle /path/to/source-bundle \
--message "What does the current branch prove?" \
--jsonEvaluate gather quality:
PYTHONPATH=src python3 -m coverify repo-oracle eval-gather \
--source-bundle /path/to/source-bundle \
--cases evals/gather/sample-math-workspace.jsonlReply to a Cosheaf issue chat:
PYTHONPATH=src python3 -m coverify chat-reply \
--workspace my-workspace \
--issue 23 \
--backend verifyingCommon environment variables:
COSHEAF_API_URL=http://localhost:3030/api/v1
COSHEAF_TOKEN=<token>
COSHEAF_USERNAME=<username>
COSHEAF_PASSWORD=<password>
COSHEAF_REVIEW_TOKEN=<review-token>
COVERIFY_BACKEND=codex
COVERIFY_CODEX_MODEL=gpt-5.5
COVERIFY_CODEX_REASONING_EFFORT=xhigh
CHATGPT_CLI=chatgpt-cli
COVERIFY_CHATGPT_TIMEOUT_SECONDS=6000Backends share the same basic contract: prompt in, answer out, with audit
metadata recorded locally. The current backend surfaces include fixture, script,
Codex, ChatGPT CLI adapter, QED adapter, and the composite verifying backend.
Coverify ships repo-owned Codex skills in skills. They are the preferred operational interface for longer runs:
- context building
- exploration planning
- proof attempts
- KB writing
- PR review
- KB management
- lightweight run loop
Install or verify the skills:
python3 scripts/link_skills.py
python3 scripts/link_skills.py --check- Design: canonical architecture and workflow contract.
- Philosophy: durable-state and knowledge-base principles.
- Repo-Grounded Chat: focused chat/source-bundle design.
- Experiments: evaluation strategy.
- Eval Problem Selection: criteria for promoting math tasks into evals.
- Coflat Primer: context for Coflat documents.
- References: paper-inspired design lessons.
- Research-Agent Deep Dives: detailed summaries of related systems.
- LLM Math Failure Modes: prover, verifier, and harness failure taxonomy.
- Prover-Side Failure Modes: shareable prover failure summary and mitigations.
Prompt files under docs/prompts are compatibility references for older flows. Prefer the skills for new operational work.
python3 scripts/check_skills.py
python3 scripts/link_skills.py --check
PYTHONPATH=src python3 -m coverify --help
PYTHONPATH=src python3 -m unittest discover -s testsGitHub Actions runs the skill manifest check, installed CLI help, Python syntax compile, and the unit test suite.