Skip to content

Bonorinoa/leanecon_v2

Repository files navigation

LeanEcon v2

LeanEcon v2 is a formal verification API for mathematical economics claims. It combines deterministic Lean 4 checking with provider-backed formalization, proving, and benchmark reporting.

Quick Start

git clone https://github.com/Bonorinoa/leanecon_v2.git
cd leanecon_v2
python3 -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt
uvicorn src.api:app --reload

Lean Workspace

Lean lives under lean_workspace/, not the repository root. If you need to verify the Lean environment explicitly:

cd lean_workspace
lake exe cache get
lake build

The repo pins its Lean toolchain through lean_workspace/lean-toolchain and the Lake project through lean_workspace/lakefile.toml.

Runtime Models

LeanEcon currently supports:

  • FORMALIZER_DRIVER=gemini with GEMINI_FORMALIZER_MODEL=gemini-2.5-flash
  • FORMALIZER_DRIVER=leanstral pinned to labs-leanstral-2603
  • FORMALIZER_DRIVER=mistral with FORMALIZER_MODEL=mistral-large-2411 (or another pinned Mistral formalizer model)
  • LEANECON_DRIVER=mistral with MISTRAL_MODEL=labs-leanstral-2603
  • GOOGLE_API_KEY=... for the Gemini formalizer and optional planner experiments
  • MISTRAL_API_KEY=... for the prover and sub-agent helpers

GEMINI_FORMALIZER_MODEL controls the Gemini-backed planner/formalizer path. LEANECON_PLANNER_MODE=deterministic remains the safe default and bypasses the LLM planner entirely; LEANECON_PLANNER_MODE=llm enables the two-node pipeline where Gemini plans and the selected formalizer driver generates the Lean stub. MISTRAL_MODEL controls proving and the Mistral-backed runtime paths, including /api/v2/verify.

Key Paths

  • benchmark_baselines/local_gate/
    • tracked local-gate benchmark baseline used by reporting and regression gating
  • evals/claim_sets/
    • benchmark and canary source inputs; see evals/claim_sets/README.md for the canonical taxonomy
  • .cache/
    • benchmark artifacts, experimental planner logs, cache entries, and other disposable runtime data
  • lean_workspace/
    • Lean project, toolchain pin, and economics preamble library

Workflow Topology

Workflow Trigger Purpose
ci.yml push to main, PRs to main Build the CI/runtime image from the published Lean base and run deterministic tests
build-lean-base.yml push to main when lean_workspace/** changes, manual Publish the reusable Lean base image to GHCR
benchmark-local.yml manual Run the full local-gate benchmark matrix and upload canonical artifacts
benchmark-railway.yml daily, manual Non-gating Railway production canary
benchmark-lovable.yml manual Lovable browser smoke check while the UI flow is unstable

Current Benchmark Reality

  • The tracked local-gate baseline under benchmark_baselines/local_gate/ remains the main durable benchmark source of truth.
  • Local-gate runners write to .cache/evals/canonical/local_gate/ by default; hydrated artifacts are operational evidence, not durable truth.
  • The canonical local-gate tiers remain tier0_smoke, tier1_core, and tier2_frontier.
  • agentic_harness and production_canary are specialized eval surfaces, not canonical local-gate promotion tiers.
  • The tracked full-matrix local-gate baseline was generated on 2026-04-10 and is still the source for docs/MASTER_BENCHMARK.md.
  • Merge validation on branch leanecon_v4 on 2026-04-16 restored the known-good deterministic Gemini formalizer posture:
    • formalizer_only tier1_core: 23/23 = 1.000
    • formalizer_only tier2_frontier: 9/13 = 0.692
  • Planner mode remains opt-in; production-safe default is still LEANECON_PLANNER_MODE=deterministic.
  • Railway success rate and outcome-match remain separate signals: success rate measures completion, while outcome-match measures whether each canary landed in its expected bucket.

Secrets

  • MISTRAL_API_KEY
    • provider-backed formalization/proving and benchmark workflows
  • GOOGLE_API_KEY
    • Gemini formalizer and optional planner experiments

Deployment

Railway is the production deployment target for the FastAPI service.

  • Dockerfile
    • Railway/deployment image
    • builds the Lean workspace inside the image and starts uvicorn
  • Dockerfile.ci
    • CI image
    • starts from the published GHCR Lean base image and layers Python/app code on top

Operationally, main is the production deploy branch. Pushes to main run ci.yml, and successful main updates are what Railway redeploys from. Treat merges to main as production-affecting even when the Railway deploy step itself is managed outside the repository workflows.

REPL Status

LeanInteract-backed REPL proving is integrated into the prover path, but it is still treated conservatively:

  • final trust remains with deterministic Lean compilation/verification
  • REPL validation is explicitly disabled in benchmark workflows unless enabled on purpose
  • workflow health must never depend on REPL-only behavior

Docs

The repository now keeps the operator-facing docs compact:

  • README.md
    • setup, runtime model config, workflow topology, deployment shape
  • docs/API_CONTRACT.md
    • request/response behavior and endpoint guarantees
  • docs/ARCHITECTURE.md
    • system layout and artifact durability
  • docs/CACHE_POLICY.md
    • what is durable, ephemeral, and safe to purge
  • docs/CODING_AGENT_BEST_PRACTICES.md
    • coding-agent operating guidance
  • docs/DOCS_INDEX.md
    • short pointer map for the canonical doc set

About

An attempt at auto formalization applied to mathematical economics to support pair researcher and tutoring roles.

Topics

Resources

License

Contributing

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors

Languages