Coverify

Coverify is a command-line harness for using LLMs and other proof backends on a mathematical knowledge base without losing source grounding, verification, or durable project state. It is designed to work with Cosheaf, and uses Coflat as the mathematical document format.

In practice, Coverify does four things:

Takes a user question plus a bounded source bundle from a Cosheaf workspace or local directory.
Prepares the relevant context, preferably with an agentic gatherer that can inspect the allowed files directly.
Calls a reasoner or verifier backend and records what was asked, what came back, and which sources were used.
Returns a checked answer, or publishes it as a Cosheaf issue comment, review, PR artifact, or knowledge-base update.

The useful loop is:

question + thread + source bundle
  -> prepare relevant context
  -> call a reasoner
  -> verify the candidate answer
  -> return or publish the checked result

Coverify is for workflows such as:

answering a branch-scoped chat question using only the current project files
asking a strong model for a proof attempt while recording an audit trail
verifying whether a proposed argument is source-backed and gap-free enough to publish
turning useful model output into a reviewed Cosheaf PR or comment
evaluating whether context preparation found the passages a task actually needed

Coverify deliberately does not:

prove theorems by itself
replace a formal verifier or human mathematical review
own the wiki, chat UI, issue tracker, or PR system
keep hidden long-term project memory outside Cosheaf
browse the web or read unrelated local files by default
ship project-specific research tools in the public CLI

Principles

Durable mathematical state lives in Cosheaf.
Local files are tools, prompts, tests, and reproducibility scripts.
Python exposes stable mechanisms: source bundles, backend calls, audit bundles, schema checks, path/range validation, and verifier gates.
Agents and oracles handle judgment: context selection, route choice, mathematical reasoning, and correctness review.
Deterministic code is added only when the behavior is stable, replayable, and simpler than asking an agent to inspect the allowed context.

Install

python3 -m venv .venv
. .venv/bin/activate
python -m pip install -e .
coverify --help

From a checkout without installing:

PYTHONPATH=src python3 -m coverify --help

Main Commands

ask-oracle: send one prompt to a configured backend and record an audit bundle.
repo-oracle ask: answer against a local source bundle.
repo-oracle gather: inspect which source passages the gatherer selects.
repo-oracle eval-gather: run JSONL checks for context-preparation quality.
chat ask: create or append to a branch-scoped chat issue and answer it.
chat-reply: read a Cosheaf issue thread, run the oracle, and post a reply.
run-eval: run JSONL eval cases against a fixture, script, or enabled Codex backend.
seed-research-evals: seed selected research eval candidates into a Cosheaf workspace.
Cosheaf primitives: tree, read-file, write-file, delete-file, create-branch, issue commands, PR commands, reviews, and merge.

Examples

Direct oracle call:

PYTHONPATH=src python3 -m coverify ask-oracle \
  --prompt "Give a concise proof attempt for the current lemma." \
  --json

Self-verifying fixture smoke:

PYTHONPATH=src python3 -m coverify ask-oracle \
  --backend verifying \
  --verify-inner-backend fixture \
  --prompt "Prove that there are infinitely many prime numbers." \
  --json

Run against a local source bundle:

PYTHONPATH=src python3 -m coverify repo-oracle ask \
  --source-bundle /path/to/source-bundle \
  --message "What does the current branch prove?" \
  --json

Evaluate gather quality:

PYTHONPATH=src python3 -m coverify repo-oracle eval-gather \
  --source-bundle /path/to/source-bundle \
  --cases evals/gather/sample-math-workspace.jsonl

Reply to a Cosheaf issue chat:

PYTHONPATH=src python3 -m coverify chat-reply \
  --workspace my-workspace \
  --issue 23 \
  --backend verifying

Backend Configuration

Common environment variables:

COSHEAF_API_URL=http://localhost:3030/api/v1
COSHEAF_TOKEN=<token>
COSHEAF_USERNAME=<username>
COSHEAF_PASSWORD=<password>
COSHEAF_REVIEW_TOKEN=<review-token>
COVERIFY_BACKEND=codex
COVERIFY_CODEX_MODEL=gpt-5.5
COVERIFY_CODEX_REASONING_EFFORT=xhigh
CHATGPT_CLI=chatgpt-cli
COVERIFY_CHATGPT_TIMEOUT_SECONDS=6000

Backends share the same basic contract: prompt in, answer out, with audit metadata recorded locally. The current backend surfaces include fixture, script, Codex, ChatGPT CLI adapter, QED adapter, and the composite verifying backend.

Skills

Coverify ships repo-owned Codex skills in skills. They are the preferred operational interface for longer runs:

context building
exploration planning
proof attempts
KB writing
PR review
KB management
lightweight run loop

Install or verify the skills:

python3 scripts/link_skills.py
python3 scripts/link_skills.py --check

Documents

Design: canonical architecture and workflow contract.
Philosophy: durable-state and knowledge-base principles.
Repo-Grounded Chat: focused chat/source-bundle design.
Experiments: evaluation strategy.
Eval Problem Selection: criteria for promoting math tasks into evals.
Coflat Primer: context for Coflat documents.
References: paper-inspired design lessons.
Research-Agent Deep Dives: detailed summaries of related systems.
LLM Math Failure Modes: prover, verifier, and harness failure taxonomy.
Prover-Side Failure Modes: shareable prover failure summary and mitigations.

Prompt files under docs/prompts are compatibility references for older flows. Prefer the skills for new operational work.

Checks

python3 scripts/check_skills.py
python3 scripts/link_skills.py --check
PYTHONPATH=src python3 -m coverify --help
PYTHONPATH=src python3 -m unittest discover -s tests

GitHub Actions runs the skill manifest check, installed CLI help, Python syntax compile, and the unit test suite.

Name		Name	Last commit message	Last commit date
Latest commit History 1 Commit
.github/workflows		.github/workflows
docs		docs
evals		evals
scripts		scripts
skills		skills
src/coverify		src/coverify
tests		tests
.gitignore		.gitignore
AGENTS.md		AGENTS.md
CLAUDE.md		CLAUDE.md
README.md		README.md
pyproject.toml		pyproject.toml

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Coverify

Principles

Install

Main Commands

Examples

Backend Configuration

Skills

Documents

Checks

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

Coverify

Principles

Install

Main Commands

Examples

Backend Configuration

Skills

Documents

Checks

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages