LeanEcon v2

LeanEcon v2 is a formal verification API for mathematical economics claims. It combines deterministic Lean 4 checking with provider-backed formalization, proving, and benchmark reporting.

Quick Start

git clone https://github.com/Bonorinoa/leanecon_v2.git
cd leanecon_v2
python3 -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt
uvicorn src.api:app --reload

Lean Workspace

Lean lives under lean_workspace/, not the repository root. If you need to verify the Lean environment explicitly:

cd lean_workspace
lake exe cache get
lake build

The repo pins its Lean toolchain through lean_workspace/lean-toolchain and the Lake project through lean_workspace/lakefile.toml.

Runtime Models

LeanEcon currently supports:

FORMALIZER_DRIVER=gemini with GEMINI_FORMALIZER_MODEL=gemini-2.5-flash
FORMALIZER_DRIVER=leanstral pinned to labs-leanstral-2603
FORMALIZER_DRIVER=mistral with FORMALIZER_MODEL=mistral-large-2411 (or another pinned Mistral formalizer model)
LEANECON_DRIVER=mistral with MISTRAL_MODEL=labs-leanstral-2603
GOOGLE_API_KEY=... for the Gemini formalizer and optional planner experiments
MISTRAL_API_KEY=... for the prover and sub-agent helpers

GEMINI_FORMALIZER_MODEL controls the Gemini-backed planner/formalizer path. LEANECON_PLANNER_MODE=deterministic remains the safe default and bypasses the LLM planner entirely; LEANECON_PLANNER_MODE=llm enables the two-node pipeline where Gemini plans and the selected formalizer driver generates the Lean stub. MISTRAL_MODEL controls proving and the Mistral-backed runtime paths, including /api/v2/verify.

Key Paths

benchmark_baselines/local_gate/
- tracked local-gate benchmark baseline used by reporting and regression gating
evals/claim_sets/
- benchmark and canary source inputs; see evals/claim_sets/README.md for the canonical taxonomy
.cache/
- benchmark artifacts, experimental planner logs, cache entries, and other disposable runtime data
lean_workspace/
- Lean project, toolchain pin, and economics preamble library

Workflow Topology

Workflow	Trigger	Purpose
`ci.yml`	push to `main`, PRs to `main`	Build the CI/runtime image from the published Lean base and run deterministic tests
`build-lean-base.yml`	push to `main` when `lean_workspace/**` changes, manual	Publish the reusable Lean base image to GHCR
`benchmark-local.yml`	manual	Run the full local-gate benchmark matrix and upload canonical artifacts
`benchmark-railway.yml`	daily, manual	Non-gating Railway production canary
`benchmark-lovable.yml`	manual	Lovable browser smoke check while the UI flow is unstable

Current Benchmark Reality

The tracked local-gate baseline under benchmark_baselines/local_gate/ remains the main durable benchmark source of truth.
Local-gate runners write to .cache/evals/canonical/local_gate/ by default; hydrated artifacts are operational evidence, not durable truth.
The canonical local-gate tiers remain tier0_smoke, tier1_core, and tier2_frontier.
agentic_harness and production_canary are specialized eval surfaces, not canonical local-gate promotion tiers.
The tracked full-matrix local-gate baseline was generated on 2026-04-10 and is still the source for docs/MASTER_BENCHMARK.md.
Merge validation on branch leanecon_v4 on 2026-04-16 restored the known-good deterministic Gemini formalizer posture:
- formalizer_only tier1_core: 23/23 = 1.000
- formalizer_only tier2_frontier: 9/13 = 0.692
Planner mode remains opt-in; production-safe default is still LEANECON_PLANNER_MODE=deterministic.
Railway success rate and outcome-match remain separate signals: success rate measures completion, while outcome-match measures whether each canary landed in its expected bucket.

Secrets

MISTRAL_API_KEY
- provider-backed formalization/proving and benchmark workflows
GOOGLE_API_KEY
- Gemini formalizer and optional planner experiments

Deployment

Railway is the production deployment target for the FastAPI service.

Dockerfile
- Railway/deployment image
- builds the Lean workspace inside the image and starts uvicorn
Dockerfile.ci
- CI image
- starts from the published GHCR Lean base image and layers Python/app code on top

Operationally, main is the production deploy branch. Pushes to main run ci.yml, and successful main updates are what Railway redeploys from. Treat merges to main as production-affecting even when the Railway deploy step itself is managed outside the repository workflows.

REPL Status

LeanInteract-backed REPL proving is integrated into the prover path, but it is still treated conservatively:

final trust remains with deterministic Lean compilation/verification
REPL validation is explicitly disabled in benchmark workflows unless enabled on purpose
workflow health must never depend on REPL-only behavior

Docs

The repository now keeps the operator-facing docs compact:

README.md
- setup, runtime model config, workflow topology, deployment shape
docs/API_CONTRACT.md
- request/response behavior and endpoint guarantees
docs/ARCHITECTURE.md
- system layout and artifact durability
docs/CACHE_POLICY.md
- what is durable, ephemeral, and safe to purge
docs/CODING_AGENT_BEST_PRACTICES.md
- coding-agent operating guidance
docs/DOCS_INDEX.md
- short pointer map for the canonical doc set

Name		Name	Last commit message	Last commit date
Latest commit History 196 Commits
.claude/skills/lean4		.claude/skills/lean4
.github/workflows		.github/workflows
benchmark_baselines/local_gate		benchmark_baselines/local_gate
docs		docs
evals		evals
lean_workspace		lean_workspace
scripts		scripts
src		src
tests		tests
.dockerignore		.dockerignore
.env.example		.env.example
.gitignore		.gitignore
CONTRIBUTING.md		CONTRIBUTING.md
Dockerfile		Dockerfile
Dockerfile.ci		Dockerfile.ci
LICENSE		LICENSE
NOTICE		NOTICE
README.md		README.md
docker-compose.yml		docker-compose.yml
pyproject.toml		pyproject.toml
railway.toml		railway.toml
requirements.txt		requirements.txt
test_docker_caching.sh		test_docker_caching.sh

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

LeanEcon v2

Quick Start

Lean Workspace

Runtime Models

Key Paths

Workflow Topology

Current Benchmark Reality

Secrets

Deployment

REPL Status

Docs

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

LeanEcon v2

Quick Start

Lean Workspace

Runtime Models

Key Paths

Workflow Topology

Current Benchmark Reality

Secrets

Deployment

REPL Status

Docs

About

Topics

Resources

License

Contributing

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages