LeanEcon v2 is a formal verification API for mathematical economics claims. It combines deterministic Lean 4 checking with provider-backed formalization, proving, and benchmark reporting.
```bash
git clone https://github.com/Bonorinoa/leanecon_v2.git
cd leanecon_v2
python3 -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt
uvicorn src.api:app --reload
```

Lean lives under `lean_workspace/`, not the repository root. If you need to verify the Lean environment explicitly:
```bash
cd lean_workspace
lake exe cache get
lake build
```

The repo pins its Lean toolchain through `lean_workspace/lean-toolchain` and the Lake project through `lean_workspace/lakefile.toml`.
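To sanity-check the pinned toolchain independently of the service, a throwaway Lean file (hypothetical, not part of the repo) with trivial proofs should build cleanly inside the workspace:

```lean
-- Minimal smoke test: if this elaborates, the toolchain is functional.
example : 2 + 2 = 4 := rfl

-- A slightly less trivial check exercising a core Nat lemma.
example (n : Nat) : n + 0 = n := Nat.add_zero n
```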
LeanEcon currently supports:
- `FORMALIZER_DRIVER=gemini` with `GEMINI_FORMALIZER_MODEL=gemini-2.5-flash`
- `FORMALIZER_DRIVER=leanstral` pinned to `labs-leanstral-2603`
- `FORMALIZER_DRIVER=mistral` with `FORMALIZER_MODEL=mistral-large-2411` (or another pinned Mistral formalizer model)
- `LEANECON_DRIVER=mistral` with `MISTRAL_MODEL=labs-leanstral-2603`
- `GOOGLE_API_KEY=...` for the Gemini formalizer and optional planner experiments
- `MISTRAL_API_KEY=...` for the prover and sub-agent helpers
`GEMINI_FORMALIZER_MODEL` controls the Gemini-backed planner/formalizer path. `LEANECON_PLANNER_MODE=deterministic` remains the safe default and bypasses the LLM planner entirely; `LEANECON_PLANNER_MODE=llm` enables the two-node pipeline where Gemini plans and the selected formalizer driver generates the Lean stub. `MISTRAL_MODEL` controls proving and the Mistral-backed runtime paths, including `/api/v2/verify`.
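Putting the variables above together, a deterministic-by-default configuration might look like the following `.env` sketch (model pins taken from the supported list; substitute your own keys):

```shell
# Safe default: bypass the LLM planner entirely
LEANECON_PLANNER_MODE=deterministic

# Gemini-backed formalizer path
FORMALIZER_DRIVER=gemini
GEMINI_FORMALIZER_MODEL=gemini-2.5-flash
GOOGLE_API_KEY=...

# Mistral-backed proving and runtime paths (including /api/v2/verify)
LEANECON_DRIVER=mistral
MISTRAL_MODEL=labs-leanstral-2603
MISTRAL_API_KEY=...
```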
- `benchmark_baselines/local_gate/` - tracked local-gate benchmark baseline used by reporting and regression gating
- `evals/claim_sets/` - benchmark and canary source inputs; see `evals/claim_sets/README.md` for the canonical taxonomy
- `.cache/` - benchmark artifacts, experimental planner logs, cache entries, and other disposable runtime data
- `lean_workspace/` - Lean project, toolchain pin, and economics preamble library
| Workflow | Trigger | Purpose |
|---|---|---|
| `ci.yml` | push to `main`, PRs to `main` | Build the CI/runtime image from the published Lean base and run deterministic tests |
| `build-lean-base.yml` | push to `main` when `lean_workspace/**` changes, manual | Publish the reusable Lean base image to GHCR |
| `benchmark-local.yml` | manual | Run the full local-gate benchmark matrix and upload canonical artifacts |
| `benchmark-railway.yml` | daily, manual | Non-gating Railway production canary |
| `benchmark-lovable.yml` | manual | Lovable browser smoke check while the UI flow is unstable |
- The tracked local-gate baseline under `benchmark_baselines/local_gate/` remains the main durable benchmark source of truth.
- Local-gate runners write to `.cache/evals/canonical/local_gate/` by default; hydrated artifacts are operational evidence, not durable truth.
- The canonical local-gate tiers remain `tier0_smoke`, `tier1_core`, and `tier2_frontier`; `agentic_harness` and `production_canary` are specialized eval surfaces, not canonical local-gate promotion tiers.
- The tracked full-matrix local-gate baseline was generated on 2026-04-10 and is still the source for `docs/MASTER_BENCHMARK.md`.
- Merge validation on branch `leanecon_v4` on 2026-04-16 restored the known-good deterministic Gemini formalizer posture: `formalizer_only tier1_core` 23/23 = 1.000; `formalizer_only tier2_frontier` 9/13 = 0.692.
- Planner mode remains opt-in; the production-safe default is still `LEANECON_PLANNER_MODE=deterministic`.
- Railway success rate and outcome-match remain separate signals: success rate measures completion, while outcome-match measures whether each canary landed in its expected bucket.
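The distinction between the two canary signals can be made concrete with a small sketch (the record fields here are illustrative, not the actual artifact schema):

```python
# Hypothetical canary records: "completed" marks run success; "bucket"
# and "expected" capture where the claim landed vs. where it should land.
canaries = [
    {"completed": True,  "bucket": "proved",    "expected": "proved"},
    {"completed": True,  "bucket": "disproved", "expected": "proved"},
    {"completed": False, "bucket": None,        "expected": "proved"},
]

def success_rate(runs):
    """Fraction of canaries that finished, regardless of outcome."""
    return sum(r["completed"] for r in runs) / len(runs)

def outcome_match_rate(runs):
    """Fraction that both finished and landed in the expected bucket."""
    return sum(r["completed"] and r["bucket"] == r["expected"] for r in runs) / len(runs)

print(success_rate(canaries))        # higher: two of three runs completed
print(outcome_match_rate(canaries))  # lower: only one landed as expected
```

A canary set can therefore show a healthy success rate while outcome-match is degrading, which is why the two are tracked separately.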
- `MISTRAL_API_KEY` - provider-backed formalization/proving and benchmark workflows
- `GOOGLE_API_KEY` - Gemini formalizer and optional planner experiments
Railway is the production deployment target for the FastAPI service.
- `Dockerfile` - Railway/deployment image; builds the Lean workspace inside the image and starts `uvicorn`
- `Dockerfile.ci` - CI image; starts from the published GHCR Lean base image and layers Python/app code on top
Operationally, `main` is the production deploy branch. Pushes to `main` run `ci.yml`, and successful `main` updates are what Railway redeploys from. Treat merges to `main` as production-affecting even when the Railway deploy step itself is managed outside the repository workflows.
LeanInteract-backed REPL proving is integrated into the prover path, but it is still treated conservatively:
- final trust remains with deterministic Lean compilation/verification
- REPL validation is explicitly disabled in benchmark workflows unless deliberately enabled
- workflow health must never depend on REPL-only behavior
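The trust ordering above can be sketched as a tiny decision rule (the function and parameter names are illustrative; the real prover path is more involved):

```python
from typing import Optional

def final_verdict(repl_result: Optional[bool], deterministic_compile_ok: bool) -> bool:
    """Trusted verdict for a proof attempt.

    repl_result: outcome of the optional LeanInteract REPL check, or None
    when REPL validation is disabled (the benchmark-workflow default).
    deterministic_compile_ok: outcome of the deterministic Lean build.

    The REPL result is advisory only: it can inform retries or logging
    upstream, but the verdict always follows deterministic compilation,
    so workflow health never depends on REPL-only behavior.
    """
    return deterministic_compile_ok

# A REPL disagreement never overrides deterministic verification...
assert final_verdict(repl_result=False, deterministic_compile_ok=True) is True
# ...and a passing REPL check cannot rescue a failed deterministic build.
assert final_verdict(repl_result=True, deterministic_compile_ok=False) is False
```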
The repository now keeps the operator-facing docs compact:
- `README.md` - setup, runtime model config, workflow topology, deployment shape
- `docs/API_CONTRACT.md` - request/response behavior and endpoint guarantees
- `docs/ARCHITECTURE.md` - system layout and artifact durability
- `docs/CACHE_POLICY.md` - what is durable, ephemeral, and safe to purge
- `docs/CODING_AGENT_BEST_PRACTICES.md` - coding-agent operating guidance
- `docs/DOCS_INDEX.md` - short pointer map for the canonical doc set