InvoiceFlow

InvoiceFlow is an MVP for automated invoice ingestion and review. Users upload invoice PDFs; the backend extracts structured fields with deterministic parsing (plus an OCR fallback), stores the results in PostgreSQL, and exposes a polished server-rendered UI for search, review, and manual correction.

Features

  • PDF invoice upload from web UI
  • Deterministic parser pipeline:
    • Direct PDF text extraction (pdfplumber, pypdf)
    • Text normalization
    • Regex/heuristic field extraction
    • OCR fallback (pdf2image + pytesseract) when text extraction is weak
  • Extracts these fields when available:
    • vendor_name, invoice_number, invoice_date, due_date
    • currency, net_amount, tax_amount, total_amount
    • iban, vat_id
  • Stores original file on disk under structured upload folders (uploads/YYYY/MM/DD/...)
  • Stores parse metadata:
    • parse_status, parse_confidence, parsing_notes, raw_text_excerpt, extracted_data
  • Invoice list with search/filter/sorting (HTMX-powered)
  • Invoice detail and edit pages (manual correction workflow)
  • Duplicate flagging based on invoice_number + vendor_name + total_amount
  • CSV export endpoint
  • Dashboard statistics cards
  • Alembic migrations and seed script
  • Dockerized setup with PostgreSQL
  • Pytest coverage for parser behavior, invoice creation flow, and basic routes
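The duplicate flagging rule above (invoice_number + vendor_name + total_amount) can be illustrated with a minimal sketch. The function names, the normalization choices (case-insensitive, whitespace-collapsed, amounts rounded to two decimals), and the dict-shaped invoices are assumptions for illustration, not the project's actual implementation.

```python
from decimal import Decimal, InvalidOperation
from typing import Optional, Tuple

DuplicateKey = Tuple[str, str, Decimal]


def duplicate_key(invoice_number, vendor_name, total_amount) -> Optional[DuplicateKey]:
    """Return a comparable key, or None when any of the three fields is unusable."""
    if not invoice_number or not vendor_name or total_amount is None:
        return None
    try:
        # Compare amounts at cent precision so 119.0 and "119.00" match.
        amount = Decimal(str(total_amount)).quantize(Decimal("0.01"))
    except InvalidOperation:
        return None
    number = "".join(invoice_number.split()).upper()   # drop whitespace, ignore case
    vendor = " ".join(vendor_name.split()).casefold()  # collapse whitespace, ignore case
    return (number, vendor, amount)


def flag_duplicates(invoices):
    """Return the indexes of invoices whose key already appeared earlier in the list."""
    seen = set()
    flagged = []
    for index, invoice in enumerate(invoices):
        key = duplicate_key(
            invoice.get("invoice_number"),
            invoice.get("vendor_name"),
            invoice.get("total_amount"),
        )
        if key is None:
            continue  # incomplete invoices are never flagged
        if key in seen:
            flagged.append(index)
        else:
            seen.add(key)
    return flagged
```

Invoices missing any of the three fields are skipped rather than flagged, since a partial key would produce false positives after a weak parse.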

Tech Stack

  • Python 3.12
  • FastAPI
  • SQLAlchemy 2.x
  • Alembic
  • PostgreSQL
  • Jinja2 + HTMX + Tailwind CDN
  • pdfplumber + pypdf
  • OCR fallback: pytesseract + pdf2image
  • Docker + docker-compose
  • pytest

Project Structure

app/
  core/              # settings + logging
  db/                # base + session factory
  models/            # SQLAlchemy models
  repositories/      # DB data access
  routes/            # web routes
  schemas/           # Pydantic schemas
  services/          # invoice + parsing service logic
  templates/         # Jinja templates
  static/            # CSS
alembic/             # migration environment + versions
scripts/             # seed script
tests/               # pytest suite

Setup (Local)

  1. Create env file:
cp .env.example .env
  2. Install dependencies:
python3.12 -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt

Conda works as well:

conda create -n invoiceflow python=3.12 -y
conda activate invoiceflow
pip install -r requirements.txt
  3. Start PostgreSQL (Docker):
docker compose up -d db
  4. Run migrations:
alembic upgrade head
  5. Run app:
uvicorn app.main:app --reload

Open http://localhost:8000.

Local development defaults:

  • DATABASE_URL may point to localhost for non-Docker local setups
  • UPLOADS_DIR should be uploads

Docker/VPS deployments should instead point DATABASE_URL at the internal service host db and set UPLOADS_DIR to /app/uploads.
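To show how UPLOADS_DIR relates to the dated upload folders (uploads/YYYY/MM/DD/...), here is a small sketch of building a storage path. The helper name build_upload_path and the random UUID file name are assumptions; the project may name stored files differently.

```python
from datetime import date
from pathlib import Path
from uuid import uuid4


def build_upload_path(uploads_dir, original_name, today=None):
    """Build <uploads_dir>/YYYY/MM/DD/<random-name><ext> for an uploaded file."""
    day = today or date.today()
    # Keep the original extension, defaulting to .pdf (assumption: PDF uploads only).
    suffix = Path(original_name).suffix.lower() or ".pdf"
    # A random name avoids collisions and keeps user-supplied names off the disk.
    stored_name = f"{uuid4().hex}{suffix}"
    return Path(uploads_dir) / f"{day:%Y}" / f"{day:%m}" / f"{day:%d}" / stored_name


if __name__ == "__main__":
    target = build_upload_path("uploads", "Invoice 42.PDF")
    target.parent.mkdir(parents=True, exist_ok=True)  # create the dated folders
    print(target)
```

With UPLOADS_DIR=uploads this resolves relative to the working directory; with /app/uploads the same helper yields an absolute path inside the container.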

Full Docker Run

docker compose up --build

Database Migration Commands

alembic upgrade head
alembic downgrade -1
alembic revision --autogenerate -m "your message"

Seed Data

If you want sample entries without uploading PDFs:

python scripts/seed_data.py

OCR Dependencies

OCR fallback requires system tools:

  • tesseract-ocr
  • poppler-utils (for pdf2image)

These are installed in Dockerfile. For local non-Docker runs, install them via your OS package manager.
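For local runs, a quick check that the OCR system tools are on PATH can save a confusing failure later. This is a hedged sketch: the function name is hypothetical, and it assumes the usual binary names (tesseract from tesseract-ocr, pdftoppm from poppler-utils, which pdf2image calls).

```python
import shutil

# Binary on PATH -> package that usually provides it (Debian/Ubuntu package names).
REQUIRED_TOOLS = {
    "tesseract": "tesseract-ocr",
    "pdftoppm": "poppler-utils",
}


def missing_ocr_tools(which=shutil.which):
    """Return the package names whose binaries cannot be found on PATH."""
    return [
        package
        for binary, package in REQUIRED_TOOLS.items()
        if which(binary) is None
    ]


if __name__ == "__main__":
    missing = missing_ocr_tools()
    if missing:
        print("OCR fallback unavailable, install:", ", ".join(missing))
    else:
        print("OCR dependencies found")
```

The which parameter exists only so the lookup can be swapped out in tests.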

Test

pytest

Example UI Screenshot Placeholders

  • docs/screenshots/dashboard.png
  • docs/screenshots/upload.png
  • docs/screenshots/invoice-list.png
  • docs/screenshots/invoice-detail.png

Notes

  • Parsing is deterministic and intentionally transparent for maintainability.
  • You can extend parser rules in app/services/pdf_parser.py for new invoice formats.
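As an illustration of what a regex-based rule might look like, the sketch below pairs each field with a pattern and a converter. The RULES table, the patterns, and the function names are hypothetical and do not reflect the actual contents of app/services/pdf_parser.py.

```python
import re
from decimal import Decimal, InvalidOperation


def parse_amount(raw):
    """Convert '1.190,00' or '1,190.00' to Decimal; the last separator is the decimal mark."""
    cleaned = raw.strip().rstrip(".,")
    if cleaned.rfind(",") > cleaned.rfind("."):
        cleaned = cleaned.replace(".", "").replace(",", ".")  # European style
    else:
        cleaned = cleaned.replace(",", "")  # US style
    return Decimal(cleaned)


# field name -> (pattern with one capture group, converter for the captured text)
RULES = {
    "invoice_number": (
        re.compile(r"invoice\s*(?:no\.?|number|#)\s*[:\-]?\s*([A-Z0-9][A-Z0-9\-/]*)", re.IGNORECASE),
        str.strip,
    ),
    "total_amount": (
        re.compile(r"\btotal(?:\s+amount)?\s*[:\-]?\s*(\d[\d.,]*)", re.IGNORECASE),
        parse_amount,
    ),
    "iban": (
        re.compile(r"\b([A-Z]{2}\d{2}(?: ?[A-Z0-9]{4}){3,7}(?: ?[A-Z0-9]{1,3})?)\b"),
        lambda value: value.replace(" ", ""),
    ),
}


def extract_fields(text):
    """Apply every rule to the text; fields without a usable match are left out."""
    fields = {}
    for name, (pattern, convert) in RULES.items():
        match = pattern.search(text)
        if not match:
            continue
        try:
            fields[name] = convert(match.group(1))
        except (InvalidOperation, ValueError):
            continue  # a malformed value counts as "not extracted"
    return fields
```

In a layout like this, supporting a new invoice format means adding or loosening one entry in the table, which keeps each rule easy to test in isolation.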
