InvoiceFlow is an MVP for automated invoice ingestion and review. Users upload invoice PDFs; the backend extracts structured fields with deterministic parsing (plus an OCR fallback), stores the results in PostgreSQL, and exposes a polished server-rendered UI for search, review, and manual correction.
- PDF invoice upload from web UI
- Deterministic parser pipeline:
  - Direct PDF text extraction (pdfplumber, pypdf)
  - Text normalization
  - Regex/heuristic field extraction
  - OCR fallback (pdf2image + pytesseract) when text extraction is weak
- Extracts these fields when available:
  - vendor_name, invoice_number, invoice_date, due_date
  - currency, net_amount, tax_amount, total_amount
  - iban, vat_id
- Stores original file on disk under structured upload folders (uploads/YYYY/MM/DD/...)
- Stores parse metadata:
  - parse_status, parse_confidence, parsing_notes, raw_text_excerpt, extracted_data
- Invoice list with search/filter/sorting (HTMX-powered)
- Invoice detail and edit pages (manual correction workflow)
- Duplicate flagging based on invoice_number + vendor_name + total_amount
- CSV export endpoint
- Dashboard statistics cards
- Alembic migrations and seed script
- Dockerized setup with PostgreSQL
- Pytest coverage for parser behavior, invoice creation flow, and basic routes
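The parser pipeline listed above (direct extraction, normalization, OCR fallback) could be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the function names, the min_chars threshold, and the idea that "weak" extraction means "too few characters" are assumptions. Only pdfplumber.open, page.extract_text, pdf2image.convert_from_path, and pytesseract.image_to_string are real library calls.

```python
import re
from typing import Callable


def normalize_text(text: str) -> str:
    """Collapse runs of spaces/tabs and drop empty lines."""
    lines = (re.sub(r"[ \t]+", " ", line).strip() for line in text.splitlines())
    return "\n".join(line for line in lines if line)


def extract_direct(path: str) -> str:
    """Read the embedded text layer with pdfplumber."""
    import pdfplumber  # imported lazily so the sketch loads without it

    with pdfplumber.open(path) as pdf:
        # extract_text() can return None for pages without a text layer
        return "\n".join(page.extract_text() or "" for page in pdf.pages)


def extract_ocr(path: str) -> str:
    """Rasterize pages with pdf2image and OCR them with pytesseract."""
    from pdf2image import convert_from_path
    import pytesseract

    return "\n".join(pytesseract.image_to_string(img) for img in convert_from_path(path))


def extract_text(
    path: str,
    direct: Callable[[str], str] = extract_direct,
    ocr: Callable[[str], str] = extract_ocr,
    min_chars: int = 50,  # assumed threshold for "weak" extraction
) -> tuple[str, str]:
    """Return (normalized_text, method), falling back to OCR on weak text."""
    text = normalize_text(direct(path))
    if len(text) >= min_chars:
        return text, "direct"
    return normalize_text(ocr(path)), "ocr"
```

Passing the extractors as parameters keeps the fallback decision testable without real PDFs or the OCR system tools installed.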
- Python 3.12
- FastAPI
- SQLAlchemy 2.x
- Alembic
- PostgreSQL
- Jinja2 + HTMX + Tailwind CDN
- pdfplumber + pypdf
- OCR fallback: pytesseract + pdf2image
- Docker + docker-compose
- pytest
app/
  core/          # settings + logging
  db/            # base + session factory
  models/        # SQLAlchemy models
  repositories/  # DB data access
  routes/        # web routes
  schemas/       # Pydantic schemas
  services/      # invoice + parsing service logic
  templates/     # Jinja templates
  static/        # CSS
alembic/         # migration environment + versions
scripts/         # seed script
tests/           # pytest suite
- Create env file:
    cp .env.example .env
- Install dependencies:
    python3.12 -m venv .venv
    source .venv/bin/activate
    pip install -r requirements.txt
  Conda works as well:
    conda create -n invoiceflow python=3.12 -y
    conda activate invoiceflow
    pip install -r requirements.txt
- Start PostgreSQL (Docker):
    docker compose up -d db
- Run migrations:
    alembic upgrade head
- Run app:
    uvicorn app.main:app --reload
  Open http://localhost:8000.
Local development defaults:
- DATABASE_URL may point to localhost for non-Docker local setups
- UPLOADS_DIR should be uploads
Docker/VPS deployments should use the internal service host db and /app/uploads.
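As a rough illustration of how these two values might be consumed, here is a small sketch that reads them from environment variables and builds the dated upload folder. The Settings class, the load_settings and dated_upload_dir helpers, and the fallback connection string are hypothetical; the real app keeps its settings under app/core/ and may load them differently.

```python
import os
from dataclasses import dataclass
from datetime import date
from pathlib import Path

# Hypothetical local default; the real value belongs in .env.
DEFAULT_DB_URL = "postgresql://user:password@localhost:5432/invoiceflow"


@dataclass(frozen=True)
class Settings:
    database_url: str
    uploads_dir: Path


def load_settings(env=None) -> Settings:
    """Read DATABASE_URL and UPLOADS_DIR, falling back to local defaults."""
    env = os.environ if env is None else env
    return Settings(
        database_url=env.get("DATABASE_URL", DEFAULT_DB_URL),
        uploads_dir=Path(env.get("UPLOADS_DIR", "uploads")),
    )


def dated_upload_dir(settings: Settings, day: date) -> Path:
    """Build the uploads/YYYY/MM/DD folder for a given upload date."""
    return settings.uploads_dir / f"{day:%Y}" / f"{day:%m}" / f"{day:%d}"
```

In a Docker deployment the same code would resolve to paths under /app/uploads once UPLOADS_DIR is set accordingly.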
    docker compose up --build

    alembic upgrade head
    alembic downgrade -1
    alembic revision --autogenerate -m "your message"

If you want sample entries without uploading PDFs:

    python scripts/seed_data.py

OCR fallback requires system tools:
- tesseract-ocr
- poppler-utils (for pdf2image)
These are installed in the Dockerfile. For local non-Docker runs, install them via your OS package manager.
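For local runs, a quick way to confirm the system tools are reachable is to look for their binaries on PATH. The helper below is a hypothetical sketch, not part of the project; it assumes the usual binary names, tesseract (from tesseract-ocr) and pdftoppm (from poppler-utils, which pdf2image calls).

```python
import shutil


def missing_ocr_tools(binaries=("tesseract", "pdftoppm")) -> list[str]:
    """Return the OCR-related binaries that are not found on PATH."""
    return [name for name in binaries if shutil.which(name) is None]


if __name__ == "__main__":
    missing = missing_ocr_tools()
    if missing:
        print("OCR fallback unavailable, missing:", ", ".join(missing))
    else:
        print("OCR system tools found.")
```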
    pytest

- docs/screenshots/dashboard.png
- docs/screenshots/upload.png
- docs/screenshots/invoice-list.png
- docs/screenshots/invoice-detail.png
- Parsing is deterministic and intentionally transparent for maintainability.
- You can extend parser rules in app/services/pdf_parser.py for new invoice formats.
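To give a feel for what a new rule might look like, here is a minimal sketch of regex-based field extraction. The RULES table, the patterns, and the extract_fields and parse_amount functions are hypothetical and are not taken from app/services/pdf_parser.py; they only show the general shape of a deterministic, easy-to-read rule.

```python
import re
from decimal import Decimal

# Hypothetical rules: field name -> pattern with exactly one capture group.
RULES = {
    "invoice_number": re.compile(r"Invoice\s*(?:No\.?|Number|#)\s*:?\s*([A-Z0-9-]+)", re.I),
    "invoice_date": re.compile(r"\bInvoice\s+Date\s*:?\s*(\d{4}-\d{2}-\d{2})", re.I),
    "total_amount": re.compile(r"\bTotal\s*:?\s*([0-9]+(?:[.,][0-9]+)*)", re.I),
}


def parse_amount(raw: str) -> Decimal:
    """Convert '1.234,56' or '1,234.56' style strings to a Decimal."""
    if raw.rfind(",") > raw.rfind("."):
        # The last separator is a comma, so treat it as the decimal mark
        raw = raw.replace(".", "").replace(",", ".")
    else:
        raw = raw.replace(",", "")
    return Decimal(raw)


def extract_fields(text: str) -> dict:
    """Apply each rule once; fields without a match are simply omitted."""
    found = {}
    for name, pattern in RULES.items():
        match = pattern.search(text)
        if match:
            value = match.group(1).strip()
            found[name] = parse_amount(value) if name.endswith("_amount") else value
    return found
```

Supporting a new invoice layout would then mostly mean adding or loosening a pattern in the table, which keeps the parsing behavior easy to inspect and test.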