veritas

veritas is a Tree-sitter testing oracle for AI-written and AI-modified software.

It is a CLI harness for mutation testing, property testing, fuzzing, coverage feedback, corpus replay, differential behavior checks, and evolutionary analysis across Rust, Go, Python, TypeScript/JavaScript, and future Tree-sitter language plugins.

It answers the question ordinary test runs often miss:

Would the current tests catch the subtle mistakes an AI coding agent is likely to make?

veritas maps changed code to verification targets, generates reviewable harnesses, runs scoped tests under budgets, and writes CI-friendly reports plus AI-ready repair prompts.

The default path is deterministic and does not call an LLM. An optional external planner hook can be enabled for AI-assisted planning while veritas still owns execution scope, budgets, and artifact writes.

Project site: Jacobious52.github.io/veritas

Why It Feels Different

  • It gives an AI agent a concrete next-test queue instead of a vague "add more tests" warning.
  • It keeps generated tests reviewable and removable through .veritas/ artifacts and veritas cleanup.
  • It is built around a generic plugin contract: Rust, Go, Python, and TypeScript/JavaScript work today, and future languages can reuse the same reports through Tree-sitter symbols, line ranges, command budgets, mutation campaigns, replay, and scoring.
  • It is designed for bigger repos: changed-target selection, package/workspace awareness, command budgets, optional Rust cgroup/systemd limits, phase timing telemetry, CI profiles, benchmark fixtures, and external canaries.

Install

Prebuilt Linux and macOS binaries:

curl -fsSL https://github.com/Jacobious52/veritas/releases/latest/download/install.sh | sh

Install a specific release:

curl -fsSL https://github.com/Jacobious52/veritas/releases/latest/download/install.sh | VERSION=v0.1.1 sh

Cargo fallback:

cargo install veritas-cli --locked

From the Git repository:

cargo install --git https://github.com/Jacobious52/veritas veritas-cli --locked

For local development:

git clone https://github.com/Jacobious52/veritas.git
cd veritas
cargo build --workspace
cargo run -p veritas-cli -- scan

Optional tools:

# Go verification
go version

# Python verification
python3 --version
python3 -m coverage --version

# TypeScript/JavaScript verification
bun --version

# Rust coverage, only used when coverage_enabled = true
cargo install cargo-llvm-cov

Quick Start

Bootstrap a repo:

veritas init --ci --agent-instructions

Use veritas on a changed branch:

veritas review-ai
veritas verify --changed --profile ci
veritas score
veritas repair-prompt
veritas report --format markdown

Verify a specific target:

veritas verify --lang rust --target src/lib.rs
veritas verify --lang go --target ./pkg/invoice
veritas verify --lang python --target invoice.py
veritas verify --lang typescript --target src/invoice.ts

Explain and promote findings:

veritas explain <finding-id>
veritas promote-repro --dry-run
veritas evolve --dry-run
veritas evolve --index 0
veritas evolve --index 0 --evaluate
veritas replay-corpus --dry-run
veritas accept-quality-baseline
veritas accept-baseline --id <finding-id>
veritas cleanup

What a useful run looks like:

mutation survived: refund_cents <= available_cents -> refund_cents < available_cents
fuzz seed saved: " 12.34 " reproduced parser drift
replay drift: AuthorizeRefund("support", 500) changed behavior
next agent step: promote assertion candidate, rerun, keep only if the mutant dies
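To make the first line of that output concrete, here is a minimal Python sketch of a boundary mutant that survives a weak test and dies once a boundary assertion is added. The function names and amounts are hypothetical; only the `<=` to `<` mutation comes from the sample output above, and this is not veritas's own code.

```python
def can_refund(refund_cents: int, available_cents: int) -> bool:
    """Original guard: a refund may use the full available balance."""
    return refund_cents <= available_cents


def can_refund_mutant(refund_cents: int, available_cents: int) -> bool:
    """Mutant: `<=` replaced by `<`, so an exact-balance refund is rejected."""
    return refund_cents < available_cents


def weak_test(fn) -> bool:
    """Checks only values far from the boundary."""
    return fn(100, 500) is True and fn(900, 500) is False


def boundary_test(fn) -> bool:
    """Adds the exact-balance case, which distinguishes `<=` from `<`."""
    return weak_test(fn) and fn(500, 500) is True


# The weak test passes for both versions, so the mutant survives.
print(weak_test(can_refund), weak_test(can_refund_mutant))          # True True
# The boundary test fails on the mutant, so the mutant is killed.
print(boundary_test(can_refund), boundary_test(can_refund_mutant))  # True False
```

An assertion candidate is worth keeping only when it behaves like `boundary_test` here: it passes on the original and fails on the mutant.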

Documentation

  • AI Agent Guide: copy-paste instructions and review loop for coding agents.
  • Install Guide: release binary, cargo, git, and GitHub Actions setup.
  • AI Verification Loops: tangible Rust, Go, Python, TypeScript/JavaScript, and agent-loop examples.
  • Project Site: GitHub Pages landing page and public overview.
  • Evolution Demo: real before/candidate/after loop from the Go evolution fixture.
  • Production Guide: large-repo Go/Rust operation, budgets, CI policy, and host safety.
  • Architecture: workspace layout, plugin contract, artifacts, and planner model.
  • Plugin SDK: language plugin contract and the Python plugin path.
  • Confidence Guide: fixture tiers, seeded examples, and external canaries.
  • Releasing: crates.io publishing through GitHub Actions.

CLI Surface

veritas scan
veritas init --ci --agent-instructions
veritas review-ai
veritas review-packet
veritas verify --changed
veritas verify --changed --profile ci
veritas verify --lang rust --target path/to/file.rs
veritas verify --lang go --target ./pkg/foo
veritas verify --lang python --target path/to/file.py
veritas verify --lang typescript --target path/to/file.ts
veritas generate --kind property --target path
veritas generate --kind fuzz --target path
veritas run
veritas report --format markdown
veritas report --format sarif
veritas report --format junit
veritas mutants list --lang rust --target src/lib.rs --diffs
veritas mutants list --lang rust --target . --format json --shard-index 0 --shard-count 4
veritas mutants list --lang go --target . --format json --domain database
veritas mutants run --lang rust --target src/lib.rs --from-campaign .veritas/mutations/rust_campaign.json --status lived
veritas mutants merge .veritas/mutations/shard-*/rust_campaign.json --output .veritas/mutations/rust_merged.json
veritas next --explain
veritas score
veritas score --mode all
veritas badge
veritas accept-quality-baseline
veritas replay-corpus
veritas repair-prompt
veritas agent-instructions --agent codex
veritas explain <finding-id>
veritas promote-repro
veritas promote-repro --index 0
veritas promote-regression
veritas promote-regression --index 0
veritas evolve --dry-run
veritas evolve --index 0
veritas evolve --all-selected
veritas evolve --all-selected --evaluate
veritas conformance
veritas accept-baseline --id <finding-id>
veritas accept-baseline --all
veritas bench --root examples
veritas bench --root examples --format json
veritas bench --root examples --suite veritas-confidence-suite.toml --format json
veritas cleanup
veritas cleanup --dry-run
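The `--shard-index` / `--shard-count` flags and `veritas mutants merge` imply that mutants can be split across workers and recombined. The Python sketch below shows one common way to do that deterministically: hash a stable mutant ID and take it modulo the shard count. The hashing scheme and the mutant ID format are assumptions for illustration; the README does not say how veritas assigns mutants to shards.

```python
import hashlib


def shard_of(mutant_id: str, shard_count: int) -> int:
    """Map a stable mutant ID to a shard index in [0, shard_count)."""
    digest = hashlib.sha256(mutant_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % shard_count


def select_shard(mutant_ids, shard_index, shard_count):
    """Keep only the mutants that belong to the given shard."""
    return [m for m in mutant_ids if shard_of(m, shard_count) == shard_index]


# Hypothetical mutant IDs; the real ID format is not specified here.
mutants = [f"rust:src/lib.rs:refund:{n}" for n in range(10)]
shards = [select_shard(mutants, i, 4) for i in range(4)]

# Merging all shards yields every mutant exactly once.
merged = sorted(m for shard in shards for m in shard)
print(merged == sorted(mutants))  # True
```

Because the assignment depends only on the ID, each CI worker can compute its own slice without coordinating with the others.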

Capabilities

Language and plugin model:

  • Rust, Go, Python, and TypeScript/JavaScript plugins are available today
  • Tree-sitter discovery provides symbols, methods, line ranges, and risk surfaces where grammars support them
  • each plugin owns language-specific discovery, generated artifacts, command execution, coverage, replay compilation, and mutation operators
  • the core owns shared scoring, policy, replay manifests/results, baselines, corpus entries, mutation campaign records, evolution suites, SARIF/JUnit/Markdown rendering, and AI repair prompts
  • future language plugins can add their own Tree-sitter grammar and map into the same target/report/artifact contract

Changed-target verification:

  • reads git diffs, staged changes, and untracked files
  • maps changed lines to discovered Rust/Go/Python/TypeScript/JavaScript symbols when line ranges are available
  • scopes package commands to changed packages and selected reverse dependencies where graph data exists
  • writes AI review artifacts with change digests and verification guidance
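
Mapping changed lines to symbols reduces to an interval-overlap check once discovery has produced line ranges. The following Python sketch illustrates the idea under simple assumptions: symbols carry 1-based inclusive line ranges, and the diff has already been reduced to `(start, end)` hunks. The `Symbol` type and the sample data are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Symbol:
    name: str
    start_line: int  # 1-based, inclusive
    end_line: int    # 1-based, inclusive


def changed_symbols(symbols, changed_ranges):
    """Return names of symbols whose line range overlaps any changed range."""
    hits = []
    for sym in symbols:
        for start, end in changed_ranges:
            if start <= sym.end_line and sym.start_line <= end:
                hits.append(sym.name)
                break
    return hits


symbols = [
    Symbol("parse_total", 10, 24),
    Symbol("authorize_refund", 30, 52),
    Symbol("format_invoice", 60, 75),
]
# Two hunks from a hypothetical diff: lines 20-22 and 76-80.
print(changed_symbols(symbols, [(20, 22), (76, 80)]))  # ['parse_total']
```

The second hunk falls outside every symbol range, so it selects nothing. A real tool would need a fallback for such lines, for example file-level targeting.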

Rust verification:

  • detects packages and virtual workspaces through Cargo.toml
  • discovers public free functions and public methods with Tree-sitter
  • writes package-local proptest integration harnesses for supported public free functions, including no-panic and deterministic-output properties where signatures allow them
  • runs cargo test --all-targets with configurable jobs, test threads, command timeouts, and optional systemd scope limits
  • runs AST-scoped mutation probes, including comparison, boundary, async/task, synchronization, database, retry, testability, and brittleness domains, then reports correctness survivors separately from behavior-preserving brittleness probes
  • collects cargo llvm-cov --summary-only when enabled
  • writes Rust symbol graph artifacts under .veritas/symbol_graph/
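
The Rust plugin expresses no-panic and deterministic-output properties through proptest. As a language-neutral illustration, the Python sketch below checks the same two properties with only the standard library: the function must not raise, and calling it twice on the same input must give the same result. The helper names, edge-case list, and sample functions are hypothetical and do not reflect the generated harness format.

```python
import random


def check_properties(fn, edge_cases, gen, runs=200, seed=0):
    """Return None if fn never raises and is deterministic, else a message."""
    rng = random.Random(seed)
    # Edge cases run first so boundary inputs are always exercised.
    inputs = list(edge_cases) + [gen(rng) for _ in range(runs)]
    for value in inputs:
        try:
            first = fn(value)
        except Exception as exc:
            return f"raised {type(exc).__name__} for {value!r}"
        if fn(value) != first:
            return f"non-deterministic for {value!r}"
    return None


def gen_cents(rng):
    return rng.randint(-1_000_000, 1_000_000)


def to_dollars(cents):
    return cents // 100


def per_cent_rate(cents):
    return 100 // cents  # raises ZeroDivisionError when cents == 0


print(check_properties(to_dollars, [0, 1, -1], gen_cents))     # None
print(check_properties(per_cent_rate, [0, 1, -1], gen_cents))  # raised ZeroDivisionError for 0
```

Seeding the generator keeps the run reproducible, which matters when a failing input has to be saved and replayed later.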

Go verification:

  • detects one or more go.mod roots
  • discovers exported functions and methods with Tree-sitter
  • builds package graphs with go list -json ./...
  • runs scoped go test commands for selected packages plus configurable reverse dependencies
  • discovers handwritten and generated fuzz targets
  • writes testing.F fuzz harnesses for exported free functions with supported Go fuzz parameter types and edge-case seed rows
  • runs relevant go test -run=^$ -fuzz=... targets through a bounded scheduler within caps and timeouts
  • applies build tags to Go list, test, fuzz, coverage, and mutation commands
  • runs AST-scoped mutation probes for comparisons, nil/error branches, return defaults, boolean connectors, arithmetic and bitwise operators, assignment operators, increment/decrement statements, unary negation, loop control, literal flips, self-assignments, goroutine/defer/context lifecycle, locks, transactions, tenant/idempotency strings, retry/backoff seams, and domain-labeled risk surfaces
  • writes package graph, package-awareness, and symbol graph artifacts

Python verification:

  • detects Python projects through pyproject.toml or Python source roots
  • discovers functions with Tree-sitter and emits symbol graph artifacts
  • runs python3 -m pytest -q when the project prefers pytest and it is installed, otherwise falls back to python3 -m unittest discover
  • writes reviewable Hypothesis property candidates and executes them when both hypothesis and pytest are installed, otherwise records a skipped command
  • collects coverage through coverage.py when enabled
  • runs executable source-range mutation checks for supported comparisons, boolean connectors, default returns, database strings, async/testability seams, and brittleness probes
  • supports replay cases for primitive single-argument and multi-argument public functions

TypeScript/JavaScript verification:

  • detects projects through package.json, tsconfig.json, jsconfig.json, or JS/TS source roots
  • discovers functions, class methods, and arrow/function-expression exports with Tree-sitter grammars for TypeScript, TSX, and JavaScript
  • emits symbol graph artifacts with signatures, line ranges, call hints, params, and risk labels
  • writes executable Bun property checks for supported exported free functions, with deterministic/no-throw markers for property quality scoring
  • runs source-range mutation checks for comparisons, strict equality, boolean guards, default returns, and string normalization
  • executes batched differential replay for supported primitive exported free functions
  • runs the project test command through Bun, npm, pnpm, or Yarn; Veritas-generated TS/JS probes use Bun when available and record skipped commands when the runtime is missing
  • collects optional Bun lcov coverage and turns uncovered TS/JS ranges into AI-readable assertion focus
  • mutates Tree-sitter-backed TS/JS constructs including comparisons, equality, boolean guards, optional chaining, nullish coalescing, async/await, object spread, env/config reads, array bounds, HTTP methods, defaults, and string normalization
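
Differential replay, mentioned in the lists above, is language-agnostic: run recorded inputs through a baseline and a candidate, then report where the outputs diverge. The Python sketch below shows the idea with two hypothetical versions of a parser. The function names and the record format are assumptions, not veritas's replay manifest schema.

```python
def replay(cases, baseline, candidate):
    """Run each recorded input through both versions and collect drift."""
    drift = []
    for args in cases:
        before, after = baseline(*args), candidate(*args)
        if before != after:
            drift.append({"args": args, "before": before, "after": after})
    return drift


def parse_amount_v1(text):
    """Baseline: tolerates surrounding whitespace."""
    return round(float(text.strip()) * 100)


def parse_amount_v2(text):
    """Hypothetical regression: padded input is now rejected."""
    if text != text.strip():
        return None
    return round(float(text) * 100)


cases = [("12.34",), (" 12.34 ",), ("0",)]
for entry in replay(cases, parse_amount_v1, parse_amount_v2):
    print(entry)  # {'args': (' 12.34 ',), 'before': 1234, 'after': None}
```

Only the padded input drifts, which is the kind of behavior change a handwritten test suite can miss when it covers clean inputs only.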

Reports and artifacts:

  • renders Markdown, JSON, SARIF 2.1.0, and compact JUnit XML
  • saves the latest report to .veritas/report.json
  • lists and previews candidate mutants without executing tests through veritas mutants list, including JSON output, byte-range spans, diff previews, shard/filter controls, risk notes, and suggested tests
  • runs benchmark suites from veritas-bench.toml in temporary project copies and scores expected findings, commands, thresholds, and metrics
  • reports mutation score attribution/trends, per-mutant campaign records, per-run survivor diffs/logs, assertion candidates, corpus entries/replay, differential replay cases, budget skips/timeouts, property-test strength, fuzz execution, and persisted repro counts in .veritas/report.json
  • summarizes current confidence and baseline deltas with veritas score
  • writes API signature baselines and accepted finding baselines
  • writes coverage feedback, mutation feedback, assertion candidates, corpus entries, replay manifests/results, budget plans, mutation trend JSON, mutation campaign JSON, tail-able mutation run directories under .veritas/mutations/runs/, evolutionary candidate suites and generation outcomes with fitness/selection signals, repro notes, candidate verification patches, regression notes, evolution plans, promoted regression scaffolds, and promotion notes
  • veritas evolve --index <n> --evaluate and veritas evolve --all-selected --evaluate emit before/after proof artifacts and remove generated candidates that regress or fail evaluation
  • veritas conformance checks the plugin contract for stable IDs, source-relative paths, function symbols, line ranges, and existing target files
  • cleans generated artifacts with veritas cleanup

Scale and performance posture:

  • changed branches are verified before full-repo sweeps; --changed is the default CI profile path
  • Go package graphs and Rust workspace discovery keep command scope close to the edited surface
  • command budgets, fuzz caps, mutation caps, package caps, and policy filters are configurable per repo
  • Rust test and coverage commands can run inside systemd scopes with CPU and memory limits on shared hosts
  • target discovery writes .veritas/cache/<language>_targets.json and reports cache hits as target_cache artifacts so stable large-repo scans can avoid repeated Tree-sitter discovery
  • every report records phase timings for discovery, generation, test execution, coverage, replay, synthesis, and total runtime
  • benchmark suites and external canaries track whether Veritas still works beyond tiny fixtures
  • near-term performance goals are plugin-safe concurrency, adaptive mutation sampling, and reusable corpus/baseline data across runs

CI behavior:

  • .github/workflows/ci.yml runs format, workspace tests, clippy, and Rust/Go/Python/TypeScript fixture scan/verify smoke checks on pull requests and pushes to main
  • CI also runs veritas conformance across the Rust, Go, Python, and TypeScript fixtures
  • veritas verify --profile ci implies --changed
  • the CI profile disables full coverage, tightens package/fuzz/mutation/time caps, and enables policy-based failure on error severity by default
  • policy filters can select severity, language, artifact kind, and target risk
  • accepted finding IDs support new-findings-only CI behavior
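
New-findings-only behavior combines a severity threshold with a set of accepted finding IDs. The Python sketch below shows that combination. The severity names other than `error`, their ordering, and the finding IDs are assumptions for illustration; the README confirms only `fail_on_severity = "error"` and the existence of accepted IDs.

```python
# Assumed ordering; only "error" appears in the README.
SEVERITY_ORDER = {"note": 0, "warning": 1, "error": 2}


def failing_findings(findings, fail_on_severity, accepted_ids):
    """Return findings that meet the threshold and are not already accepted."""
    threshold = SEVERITY_ORDER[fail_on_severity]
    return [
        f for f in findings
        if SEVERITY_ORDER[f["severity"]] >= threshold
        and f["id"] not in accepted_ids
    ]


findings = [
    {"id": "F-1", "severity": "error"},    # accepted earlier
    {"id": "F-2", "severity": "error"},    # new
    {"id": "F-3", "severity": "warning"},  # below the threshold
]
blocking = failing_findings(findings, "error", accepted_ids={"F-1"})
print([f["id"] for f in blocking])  # ['F-2']
```

A CI job would exit non-zero only when the returned list is non-empty, so previously accepted findings stop blocking unrelated pull requests.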

Consumer GitHub Actions starter:

name: Veritas
on: [pull_request]
jobs:
  verify:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v5
        with:
          fetch-depth: 0
      - run: curl -fsSL https://github.com/Jacobious52/veritas/releases/latest/download/install.sh | sh
      - run: veritas verify --changed --profile ci
      - run: veritas repair-prompt --github-step-summary
        if: always()

Config

Create veritas.toml or .veritas.toml in the target repo:

[veritas]
budget_seconds = 120
write_generated_tests = true
fail_on_generated_test_failure = true
fail_on_findings = false

[planner]
mode = "deterministic"
# mode = "external_llm"
# command = "my-veritas-planner"
# fail_on_error = false

[policy]
fail_on_severity = "error"
fail_on_languages = []
fail_on_artifact_kinds = []
fail_on_target_risks = []
min_mutation_score = 70
min_mutation_efficacy = 70
min_mutant_coverage = 80

[mutation]
# Shared by language plugins. Operator names are intentionally generic so
# future Tree-sitter plugins can map their own AST mutations onto the same
# campaign/report model.
enabled_operators = []
disabled_operators = []
enabled_domains = []
disabled_domains = []
include_paths = []
exclude_paths = []
include_symbols = []
exclude_symbols = []
include_target_ids = []
exclude_target_ids = []
include_mutant_ids = []
exclude_mutant_ids = []
report_filtered = false
dry_run = false
max_mutants = 8
disable_test_selection = false # set true to run the broader verification package set for every mutant
baseline_timing = false # set true to derive mutation timeout metadata from the baseline test duration
workers = 1 # Rust/Go use isolated temp roots when workers > 1; keep small repos serial by default
isolation_exclude_paths = [] # extra names or relative paths to skip in isolated mutation copies
test_cpu = 1
timeout_coefficient = 1
timeout_min_seconds = 10
timeout_max_seconds = 120
shard_index = 0
shard_count = 1
output_statuses = [] # e.g. ["lived", "not_covered", "timed_out"]

[plugins.rust]
property_framework = "proptest"
command_timeout_seconds = 120
coverage_enabled = false
coverage_timeout_seconds = 120
cargo_jobs = 1
test_threads = 1
systemd_scope = false
memory_max = "4G"
cpu_quota = "200%"

[plugins.go]
fuzz_seconds = 10
fuzz_existing = true
fuzz_concurrency = 2
coverage_enabled = false
reverse_dependency_depth = 1
max_fuzz_targets = 20
command_timeout_seconds = 120
max_packages = 64
max_mutants = 8
build_tags = []
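
One plausible reading of `timeout_coefficient`, `timeout_min_seconds`, and `timeout_max_seconds` is that a per-mutant timeout is derived from the baseline test duration, scaled, and clamped. The Python sketch below encodes that reading. The formula is an assumption inferred from the key names and the `baseline_timing` comment, not documented behavior.

```python
def mutation_timeout(baseline_seconds, coefficient=1, min_seconds=10, max_seconds=120):
    """Scale the baseline test duration, then clamp it into [min, max]."""
    scaled = baseline_seconds * coefficient
    return max(min_seconds, min(max_seconds, scaled))


# Defaults mirror the values in the config block above.
print(mutation_timeout(2))    # 10
print(mutation_timeout(45))   # 45
print(mutation_timeout(300))  # 120
```

Under this reading, the floor avoids spurious timeouts on very fast suites, and the ceiling caps the cost of a mutant that hangs.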

By default mutation runs select the narrowest package-level test commands the plugin can justify. Rust uses symbol/package ownership; Go uses the package graph plus reverse dependencies. Set disable_test_selection = true when a repo has global integration fixtures, hidden build tags, or cross-package side effects that make broad mutation commands safer than local selection.
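
Selecting changed packages plus reverse dependencies up to a depth is a bounded breadth-first walk over the reverse dependency graph. The Python sketch below shows that walk. The graph and package names are hypothetical, and the `depth` parameter is assumed to behave like `reverse_dependency_depth`.

```python
from collections import deque


def packages_to_test(changed, reverse_deps, depth):
    """Return changed packages plus dependents within `depth` hops."""
    selected = set(changed)
    queue = deque((pkg, 0) for pkg in changed)
    while queue:
        pkg, dist = queue.popleft()
        if dist == depth:
            continue  # do not expand beyond the configured depth
        for dependent in reverse_deps.get(pkg, ()):
            if dependent not in selected:
                selected.add(dependent)
                queue.append((dependent, dist + 1))
    return sorted(selected)


# reverse_deps maps a package to the packages that import it.
reverse_deps = {
    "pkg/money": ["pkg/invoice"],
    "pkg/invoice": ["cmd/api"],
}
print(packages_to_test(["pkg/money"], reverse_deps, depth=1))
# ['pkg/invoice', 'pkg/money']
print(packages_to_test(["pkg/money"], reverse_deps, depth=2))
# ['cmd/api', 'pkg/invoice', 'pkg/money']
```

A small depth keeps mutation runs cheap. Raising it trades runtime for a better chance of catching breakage in indirect consumers.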

Mutation filters apply include filters first, then exclude filters. A pattern can be an exact:... match, a glob:... match or a bare pattern containing * wildcards, or a regex:... match where the active plugin supports regex matching; legacy unprefixed patterns keep substring matching. Use include_target_ids / exclude_target_ids for lang:path:symbol targets and include_mutant_ids / exclude_mutant_ids for stable per-mutant IDs. Add veritas:skip-mutation inside a Rust, Go, Python, or TypeScript/JavaScript function to suppress local mutants, and set report_filtered = true when filtered candidates should appear as skipped records.
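
The include-then-exclude evaluation and the pattern prefixes can be sketched as follows. This is a Python illustration under stated assumptions: an empty include list admits everything, globs use shell-style matching where `*` also crosses `/`, and regex patterns match anywhere in the value. The README does not specify any of these details.

```python
import fnmatch
import re


def matches(pattern, value):
    """Match one filter pattern against a path, symbol, or ID."""
    if pattern.startswith("exact:"):
        return value == pattern[len("exact:"):]
    if pattern.startswith("glob:"):
        return fnmatch.fnmatchcase(value, pattern[len("glob:"):])
    if pattern.startswith("regex:"):
        return re.search(pattern[len("regex:"):], value) is not None
    if "*" in pattern:
        return fnmatch.fnmatchcase(value, pattern)
    return pattern in value  # legacy unprefixed pattern: substring match


def keep(value, include, exclude):
    """Apply include filters first, then exclude filters."""
    included = not include or any(matches(p, value) for p in include)
    return included and not any(matches(p, value) for p in exclude)


paths = ["src/invoice.rs", "src/generated/schema.rs", "tests/invoice_test.rs"]
kept = [p for p in paths if keep(p, include=["glob:src/*"], exclude=["generated"])]
print(kept)  # ['src/invoice.rs']
```

Here the glob admits both files under `src/`, and the legacy substring pattern `generated` then drops the generated schema.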

For shared machines, keep Rust coverage disabled unless needed and enable systemd scope limits:

[plugins.rust]
coverage_enabled = false
systemd_scope = true
cargo_jobs = 1
test_threads = 1
memory_max = "4G"
cpu_quota = "200%"

Development

Run the workspace checks:

cargo fmt --all
cargo test --workspace
cargo clippy --workspace --all-targets -- -D warnings

Run fixture checks:

cargo run -p veritas-cli -- scan --root fixtures/sample-rust
cargo run -p veritas-cli -- verify --root fixtures/sample-rust --lang rust --target src/lib.rs
cargo run -p veritas-cli -- cleanup --root fixtures/sample-rust --dry-run
cargo run -p veritas-cli -- verify --root fixtures/rust-workspace --lang rust --target .
cargo run -p veritas-cli -- scan --root fixtures/sample-go
cargo run -p veritas-cli -- verify --root fixtures/sample-go --lang go --target .
cargo run -p veritas-cli -- verify --root fixtures/go-multimodule --lang go --target services/billing/pkg/invoice

Run the richer example beds:

cargo test --manifest-path examples/rust-invoice/Cargo.toml
cargo run -p veritas-cli -- verify --root examples/rust-invoice --lang rust --target src/lib.rs
(cd examples/go-invoice && go test ./...)
cargo run -p veritas-cli -- verify --root examples/go-invoice --lang go --target .
cargo test --manifest-path examples/rust-commerce/Cargo.toml
cargo run -p veritas-cli -- verify --root examples/rust-commerce --lang rust --target src/lib.rs
(cd examples/go-api-service && go test ./...)
cargo run -p veritas-cli -- verify --root examples/go-api-service --lang go --target .
cargo test --manifest-path examples/rust-mutation-score/Cargo.toml
cargo run -p veritas-cli -- verify --root examples/rust-mutation-score --lang rust --target src/lib.rs
(cd examples/go-mutation-score && go test ./...)
cargo run -p veritas-cli -- verify --root examples/go-mutation-score --lang go --target .
cargo test --manifest-path examples/rust-risk-suite/Cargo.toml
cargo run -p veritas-cli -- verify --root examples/rust-risk-suite --lang rust --target src/lib.rs
(cd examples/go-risk-suite && go test ./...)
cargo run -p veritas-cli -- verify --root examples/go-risk-suite --lang go --target .
cargo run -p veritas-cli -- --root examples bench
cargo run -p veritas-cli -- --root examples bench --format json

The example projects intentionally contain hidden assumptions while their handwritten tests pass, so they are useful for validating generated property/fuzz artifacts and report output.

Run the concrete evolution demo:

cargo run -p veritas-cli -- --root examples/go-evolution-loop verify --lang go --target .
cargo run -p veritas-cli -- --root examples/go-evolution-loop score
cargo run -p veritas-cli -- --root examples/go-evolution-loop evolve --dry-run

The seeded fixture starts with 14 evolution candidates, 12 selected candidates, 4 surviving mutants, and a 55 confidence score. Promoting the top ParseInvoiceTotal candidate into owned assertions raises the mutation score from 58% to 91%, removes the surviving mutants, and raises the confidence score to 98. See docs/evolution.md for the exact before/candidate/after commands and artifact paths.

Run external canary smoke checks when you want confidence against real pinned repositories:

./scripts/run-canaries.sh smoke
./scripts/run-canaries.sh large-smoke
./scripts/run-canaries.sh verify-fast
./scripts/run-canaries.sh verify

The same canaries run weekly in GitHub Actions and can be started manually from the External Canaries workflow. large-smoke adds pinned larger Rust, Go, and Python repositories from canaries/pinned-repos.json while keeping them scan-only by default. Each run writes target/external-fixtures/reports/canary-dashboard.md with scan/verify tiers and trend deltas. Set VERITAS_CANARY_MIN_TIER, VERITAS_CANARY_MIN_CONFIDENCE, or VERITAS_CANARY_MAX_FINDINGS when a canary dashboard should fail CI on a missed threshold.

Run large-repo benchmarks when you want scale/performance signal:

./scripts/run-large-repo-benchmarks.py --manifest benchmarks/large-repos.toml --mode scan
./scripts/run-large-repo-benchmarks.py --manifest benchmarks/large-repos.toml --mode mutation-list
./scripts/run-large-repo-benchmarks.py --manifest benchmarks/large-repos.toml --mode mutation-inventory
./scripts/run-large-repo-benchmarks.py --manifest benchmarks/large-repos.toml --mode changed-only

This lane pins real Rust, Go, Python, and TypeScript/JavaScript repositories by SHA, measures Tree-sitter discovery, capped mutation preview where supported, file-level mutation inventory, and changed-only AI-agent verification, then writes target/large-repo-benchmarks/reports/large-repo-dashboard.md plus JSON trend artifacts. The TypeScript lane currently includes axios/axios for scan and mutation-inventory scale. Use mutation-list for a quick bounded sample; use mutation-inventory when you want the repo-level count of unique mutation opportunities, cap-hit paths, and domain/operator distribution.

About

CLI-first adversarial verification harness for AI-written and AI-modified software
