A production-oriented bilingual (English/Persian) platform for invoice extraction, human review, and governance-safe automation.
InvoiceMind demonstrates how to build a production-style financial document workflow around local LLMs with:
- explicit quality gates instead of blind automation
- human-in-the-loop review and quarantine
- replayable lifecycle and audit-ready event logs
- evaluation-aware and policy-driven decisioning
Most invoice AI systems fail in production not because of OCR quality alone, but because automation decisions are hard to trust, hard to explain, and hard to control.
InvoiceMind is built to solve this operational gap:
- Decision traceability: why was a run auto-approved or escalated?
- Controlled automation: when should the system refuse auto-approval?
- Operational safety: replay, quarantine, and explicit reason codes
- Bilingual operations: practical workflows for FA/EN teams
Invoice processing with LLMs breaks at the governance layer before it breaks at raw extraction quality.
Common production failure patterns:
- untraceable approval decisions
- confidence-only routing without policy checks
- missing containment for low-quality inputs
- no replayability for post-incident analysis
InvoiceMind addresses these as first-class design requirements, not afterthoughts.
- Local-first inference (privacy before scale)
- Evidence before confidence
- Policy-driven automation
- Replayable and auditable runs
- Safe defaults over aggressive automation
- Not an OCR benchmark project
- Not a generic prompt demo
- Not an ERP replacement
- Not designed for blind fully automated posting
| Area | Status |
|---|---|
| Core extraction pipeline | ✅ Implemented |
| Review UI and quarantine flow | ✅ Implemented |
| Audit trail and replay controls | ✅ Implemented |
| Quality gates and policy routing | ✅ Implemented |
| Evaluation and Gold Set workflow | 🧪 Sample implementation |
| Calibration and risk scoring | 🟡 Baseline implemented, tuning ongoing |
| Capacity and change governance | 🟡 Baseline implemented, expansion planned |
Status legend:
- ✅ Implemented in current codebase
- 🧪 Implemented as sample/evaluation workflow
- 🟡 Implemented foundation with planned hardening
InvoiceMind design choices are aligned with current public guidance on trustworthy AI operations and secure LLM deployment:
- NIST AI RMF 1.0 (Govern, Map, Measure, Manage): https://www.nist.gov/publications/artificial-intelligence-risk-management-framework-ai-rmf-10
- NIST AI RMF Generative AI Profile: https://www.nist.gov/publications/artificial-intelligence-risk-management-framework-generative-artificial-intelligence
- OWASP Top 10 for LLM Applications: https://owasp.org/www-project-top-10-for-large-language-model-applications/
- GitHub README guidance (structure, discoverability, relative assets): https://docs.github.com/en/repositories/managing-your-repositorys-settings-and-features/customizing-your-repository/about-readmes
InvoiceMind takes inspiration from adjacent OSS ecosystems while focusing on financial-governance-safe workflows:
- docTR (OCR stack and document parsing patterns): https://github.com/mindee/doctr
- InvoiceNet / invoice2data (invoice extraction baselines): https://github.com/naiveHobo/InvoiceNet, https://github.com/invoice-x/invoice2data
- Langfuse (LLM observability mindset for production systems): https://github.com/langfuse/langfuse
Core flow:
Ingestion -> Validation -> OCR/Layout -> LLM Extraction -> Postprocess -> Routing -> Review -> Export
Architecture highlights:
- storage isolation (`raw`, `runs/artifacts`, `audit`, `quarantine`)
- local runtime with versioned config bundles
- governance endpoints for runtime/version/risk visibility
Primary states:
RECEIVED -> VALIDATED -> EXTRACTED -> GATED -> (AUTO_APPROVED | NEEDS_REVIEW) -> FINALIZED
Control points include cancel/replay and explicit escalation paths.
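A minimal sketch of the lifecycle as a transition table, using the state names above (the cancel/quarantine paths are simplified for illustration and may not match the real implementation):

```python
# Allowed transitions for the primary run lifecycle (simplified sketch).
TRANSITIONS = {
    "RECEIVED": {"VALIDATED", "QUARANTINED", "CANCELLED"},
    "VALIDATED": {"EXTRACTED", "CANCELLED"},
    "EXTRACTED": {"GATED", "CANCELLED"},
    "GATED": {"AUTO_APPROVED", "NEEDS_REVIEW"},
    "AUTO_APPROVED": {"FINALIZED"},
    "NEEDS_REVIEW": {"FINALIZED", "QUARANTINED"},
    "FINALIZED": set(),
}

def can_transition(src: str, dst: str) -> bool:
    """True if `dst` is a legal next state from `src`."""
    return dst in TRANSITIONS.get(src, set())
```

Encoding transitions explicitly is what makes replay and post-incident analysis tractable: an illegal transition is a bug, not an ambiguity.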
Review UX supports:
- queue-based triage
- field-level confidence and evidence context
- correction and comparison workflows
Gate categories include:
- required fields
- critical field validation
- evidence coverage
- consistency checks
- risk thresholds
Quarantine is reason-code-driven and reprocessable, so failures are contained rather than silently accepted.
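The gate categories above can be sketched as a single evaluation pass that collects explicit reason codes. Field names, thresholds, and code strings here are hypothetical placeholders, not the project's actual policy values:

```python
from dataclasses import dataclass, field

@dataclass
class GateResult:
    passed: bool
    reason_codes: list[str] = field(default_factory=list)

def evaluate_gates(doc: dict, risk_threshold: float = 0.3) -> GateResult:
    """Run illustrative quality gates and collect explicit reason codes."""
    reasons: list[str] = []
    # Required-field gate (field names are assumptions).
    for f in ("invoice_number", "total_amount", "issue_date"):
        if not doc.get(f):
            reasons.append(f"MISSING_REQUIRED_FIELD:{f}")
    # Evidence-coverage gate (threshold is a placeholder).
    if doc.get("evidence_coverage", 0.0) < 0.8:
        reasons.append("LOW_EVIDENCE_COVERAGE")
    # Risk-threshold gate: unknown risk defaults to the unsafe side.
    if doc.get("risk_score", 1.0) > risk_threshold:
        reasons.append("RISK_THRESHOLD_EXCEEDED")
    return GateResult(passed=not reasons, reason_codes=reasons)
```

Because every failure produces a reason code rather than a bare boolean, a quarantined run can explain itself later.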
Evaluation outputs support field-level and document-level analysis, including tradeoffs between review pressure and critical error risk.
InvoiceMind explicitly detects and contains high-risk failure classes such as:
- low-quality scans and weak OCR confidence
- missing required or critical fields
- unsupported currency or consistency violations
- schema-breaking or evidence-poor outputs
These cases are routed to quarantine or review, never silently auto-approved.
Automation is allowed only when:
- required fields are present and valid
- critical checks pass
- evidence coverage is sufficient
- consistency rules hold
- risk thresholds are not exceeded
Otherwise, the system defaults to human review.
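The routing rule above reduces to a conjunction with a safe default, which a sketch makes concrete (the signature and threshold are illustrative, not the real API):

```python
def route(gates_passed: bool, consistency_ok: bool, risk_score: float,
          risk_threshold: float = 0.3) -> str:
    """Default to human review unless every automation condition holds."""
    if gates_passed and consistency_ok and risk_score <= risk_threshold:
        return "AUTO_APPROVED"
    return "NEEDS_REVIEW"
```

Note the shape of the rule: auto-approval must be earned by passing every check; any single failure falls through to review.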
- Every decision is traceable to gate results and reason codes
- Every run can be replayed for deterministic investigation
- Every runtime behavior is version-aware (model/prompt/policy/routing/template)
- Every failure path is explicit (review or quarantine)
| Typical LLM Invoice Tool | InvoiceMind |
|---|---|
| Confidence-only automation | Policy and gate-based routing |
| Blind extraction output | Evidence-aware decisioning |
| Weak operational replay | Explicit replay/cancel lifecycle |
| Limited auditability | Audit verify/events endpoints |
| Cloud-first assumptions | Local-first deployment strategy |
- Upload a low-quality invoice sample.
- Observe validation outcome and quarantine reason code.
- Reprocess after configuration adjustment.
- Review extracted fields with evidence and gate outcomes.
- Finalize and export with audit traceability.
- Python 3.11+
- FastAPI
- SQLAlchemy + Alembic
- SQLite (default)
- Next.js 16 (App Router)
- React 19
- TypeScript (strict)
- versioned runtime bundles in `config/`
- model registry in `models.yaml`
- local storage roots for raw, runs, quarantine, and audit artifacts
- Python `3.11+`
- Node.js `22+`
- npm

Start everything with `run.bat`.

Services:
- Backend: `http://127.0.0.1:8000`
- Frontend: `http://127.0.0.1:3000`
Backend:

```bash
python -m pip install --upgrade pip
python -m pip install -r requirements.txt
python scripts/migrate.py
python -m uvicorn app.main:app --host 127.0.0.1 --port 8000
```

Frontend:

```bash
cd frontend
npm install
npm run dev -- --hostname 127.0.0.1 --port 3000
```

InvoiceMind is built for local inference workflows.
- Create backend env from template: `Copy-Item .env.example .env`
- Configure runtime versions and model settings:
  - `INVOICEMIND_MODEL_VERSION`
  - `INVOICEMIND_MODEL_RUNTIME`
  - `INVOICEMIND_MODEL_QUANTIZATION`
  - `INVOICEMIND_PROMPT_VERSION`
  - `INVOICEMIND_TEMPLATE_VERSION`
  - `INVOICEMIND_ROUTING_VERSION`
  - `INVOICEMIND_POLICY_VERSION`
- Keep large model weights outside Git; store metadata in `models.yaml`.
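An example `.env` fragment showing the shape of these settings (all values are placeholders; adjust to your runtime):

```ini
# Example backend .env (placeholder values)
INVOICEMIND_MODEL_VERSION=v1
INVOICEMIND_MODEL_RUNTIME=ollama
INVOICEMIND_MODEL_QUANTIZATION=q4_k_m
INVOICEMIND_PROMPT_VERSION=v1
INVOICEMIND_TEMPLATE_VERSION=v1
INVOICEMIND_ROUTING_VERSION=v1
INVOICEMIND_POLICY_VERSION=v1
```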
Optional GGUF runtime example (Ollama builds a model from GGUF weights via a `Modelfile` containing `FROM <path-to-gguf>`, and `ollama run` takes the prompt as a positional argument):

```bash
ollama create invoicemind-extractor -f Modelfile
ollama run invoicemind-extractor "extract invoice fields"
```

- Configure frontend API bridge: `Copy-Item frontend/.env.local.example frontend/.env.local`
- Set:
  - `INVOICEMIND_API_BASE_URL`
  - `INVOICEMIND_API_USERNAME`
  - `INVOICEMIND_API_PASSWORD`
- `GET /health`
- `GET /ready`
- `GET /metrics`
- `POST /v1/auth/token`
- `POST /v1/documents`
- `GET /v1/documents/{document_id}`
- `POST /v1/documents/{document_id}/runs`
- `GET /v1/runs/{run_id}`
- `POST /v1/runs/{run_id}/cancel`
- `POST /v1/runs/{run_id}/replay`
- `GET /v1/runs/{run_id}/export`
- `GET /v1/quarantine`
- `GET /v1/quarantine/{item_id}`
- `POST /v1/quarantine/{item_id}/reprocess`
- `GET /v1/audit/verify`
- `GET /v1/audit/events`
- `GET /v1/governance/runtime-versions`
- `POST /v1/governance/change-risk`
- `POST /v1/governance/capacity-estimate`
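As an illustration of calling these endpoints from a client, the sketch below builds (but does not send) a replay request with Python's standard library. The bearer-token auth scheme is an assumption based on the `POST /v1/auth/token` endpoint:

```python
import json
import urllib.request

API_BASE = "http://127.0.0.1:8000"  # default local backend address

def replay_request(run_id: str, token: str) -> urllib.request.Request:
    """Build a POST request for /v1/runs/{run_id}/replay (auth scheme assumed)."""
    return urllib.request.Request(
        url=f"{API_BASE}/v1/runs/{run_id}/replay",
        method="POST",
        headers={
            "Authorization": f"Bearer {token}",  # assumption: bearer auth
            "Content-Type": "application/json",
        },
        data=json.dumps({}).encode(),
    )
```

Send it with `urllib.request.urlopen(replay_request(run_id, token))` once the backend is running.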
Backend:

```bash
pytest
```

Frontend:

```bash
cd frontend
npm run typecheck
npm run test
npm run build
```

The following decisions define current system direction:
- ADR-001: Local-first inference for financial documents
- ADR-002: Evidence-first extraction over confidence-only routing
- ADR-003: Policy-driven gates over implicit model thresholds
```
InvoiceMind/
├─ app/            # backend services, orchestration, API
├─ frontend/       # Next.js bilingual operations UI
├─ config/         # versioned model/prompt/template/policy/routing bundles
├─ scripts/        # migration and utility scripts
├─ tests/          # backend test suites
├─ tools/          # performance and benchmark tools
├─ assets/readme/  # README visual assets
├─ Docs/README.md  # unified technical documentation
├─ run.bat         # Windows one-click startup
├─ LICENSE         # AGPL-3.0
└─ SECURITY.md     # vulnerability reporting policy
```
- stronger calibration with larger and more diverse Gold Sets
- tighter reviewer-feedback loop into evaluation cycles
- deeper multi-tenant isolation and governance hardening
- richer observability export and dashboarding
- License: `AGPL-3.0` (see `LICENSE`)
- Security Policy: see `SECURITY.md`
This project reflects how production financial LLM workflows should be engineered, not just prototyped.





