AI-Powered Financial Document Forensics Platform β Built at DevSoc '26
Aegis is an autonomous forensic auditing system built during the DevSoc '26 hackathon for the Hotfoot sponsor track (with Backboard as the AI/RAG sponsor). It ingests financial documents (bank statements, invoices, payslips, images), detects fraud using forensic mathematics (Benford's Law, balance integrity verification), and visualizes entity relationships through an interactive knowledge graph.
Hotfoot provided the challenge track and the financial document datasets. When the official dataset was changed mid-hackathon from images to Excel spreadsheets, we adapted our pipeline to handle both β our system processes PDFs, images (via OCR), and Excel files through the same unified analysis engine.
- Metadata Integrity Check β Compares header/summary balances against calculated transaction balances. Catches "lying headers" where the closing balance has been tampered with (e.g. injecting a -61M balance into a statement that actually closes at βΉ61).
- Benford's Law Analysis β Flags unnatural leading-digit distributions in transaction amounts.
- Structuring / Smurfing Detection β Identifies clusters of transactions just below reporting thresholds.
- Round-Number Syndrome β Detects suspiciously high ratios of round-number transactions.
- Date-Sequence Violations β Catches out-of-order or impossible date progressions.
- Balance Continuity Checks β Verifies running balance consistency across every row.
- Dynamic integrity display on the Review page β shows INTEGRITY FAILURE (red) or INTEGRITY VERIFIED (green) based on real-time comparison of reported vs calculated balances.
- 3-priority fallback: backend
metadata_discrepancyβ anomaly structured fields β local transaction comparison. - Zero hardcoded values β all thresholds computed from actual data.
- Parses messy bank statement spreadsheets from any bank layout.
- Auto-detects header rows, column mappings, date formats.
- Repairs OCR artifacts, skips junk rows, handles merged cells.
- Extracts opening/closing balances from summary sections.
- Infers transaction types (debit/credit) from signed amounts.
- Interactive 3D Graph β Navigate documents, entities, and risk nodes.
- Conflict Hunter β Detects shared addresses/phones between vendors and employees.
- Entity Resolution β Links accounts, names, and references across documents.
- Smart Match β Auto-links invoices to bank transactions (exact & fuzzy).
- Ghost Detection β Flags invoices with no matching payment.
- Human-in-the-Loop β Force-match decisions feed the learning loop.
Original Vision: FinShield was initially built as "The Financial Flight Recorder" β an autonomous, real-time defense system for mobile users. We attempted to solve two Hotfoot sponsor track challenges simultaneously at DevSoc '26:
- Call Shield (Hotfoot Audio track) β Real-time analysis of phone calls to detect urgency manipulation, fear tactics, and pressure techniques used by scammers. Used WebSocket streaming with a live risk meter.
- Contract Scanner (Hotfoot Docs + Backboard track) β Instant detection of predatory clauses, hidden fees, and exploitative terms in financial documents, with cross-referencing via Backboard RAG.
β οΈ Note: Hotfoot's rules required teams to pick only one track. We initially tried both but pivoted to document intelligence after Review 1 (see below).
Tech at this stage:
- Frontend: Flutter with Riverpod state management, cybersecurity dark theme with neon accents, glassmorphism cards, animated shield logo, and a
call_shield_screen.dartwith a real-time risk meter - Backend: FastAPI with mock service implementations for Hotfoot Audio (
audio_analyzer.py), Hotfoot Docs (document_scanner.py), and Backboard RAG (context_engine.py) - Real-time pipeline: WebSocket-based streaming (
sockets.py+ConnectionManager) withaudio_processor.pystub anddocument_engine.pystub - Architecture: Monorepo with
/frontend(Flutter) and/backend(FastAPI), Docker Compose for full-stack dev
Key commits:
| Commit | Date | Description |
|---|---|---|
8ef22ce |
Feb 8 | Initial project setup β Flutter + FastAPI + mock Hotfoot Audio/Docs/Backboard |
c1e1a27 |
Feb 8 | Add Flutter lib source files (home screen, glassmorphism widgets) |
f3616fb |
Feb 8 | Phase 2: WebSocket Pipeline β audio_processor stub, call_shield_screen with risk meter |
The Pivot: After Review 1 feedback and Hotfoot's requirement to choose only one sponsor track, we made a strategic decision to drop the audio/call analysis track entirely and go all-in on document forensic intelligence. The project evolved from a dual-challenge attempt into a focused, deep financial document auditing platform.
Dataset curveball: Midway through, Hotfoot changed the official dataset from images (scanned documents) to Excel spreadsheets. Rather than just switching, we built a pipeline that handles both β PDFs and images go through OCR + LLM extraction, while Excel files go through our custom normalization engine. This dual-format capability became one of our strongest differentiators.
What changed and why:
| What Changed | Why |
|---|---|
Dropped Hotfoot Audio (deleted audio_analyzer.py, audio_processor.py, call_shield_screen.dart) |
Hotfoot required picking one track. Document intelligence had deeper forensic potential. |
Flutter β React webapp (new /webapp with React 18 + TypeScript + Vite + Chakra UI) |
Flutter was great for mobile but the use case shifted to a desktop-first investigator dashboard. React + Chakra UI enabled faster iteration on complex data-heavy UIs (tables, charts, graphs). |
| Mock services β Real AI pipeline (Backboard Assistant API integration with GPT-4o) | Replaced mock implementations with actual LLM-powered document classification, field extraction, and validation. |
| Added forensic analysis suite | Benford's Law, balance integrity checks, structuring detection, metadata fraud detection β none of this existed pre-review. |
| Added Knowledge Graph | Cross-document entity resolution with interactive force-directed visualization. Links accounts, vendors, and counterparties across uploaded documents. |
| Added Excel Normalization Engine | Parses messy bank statement spreadsheets from any bank layout β auto-detects headers, columns, date formats. |
| Added Self-Learning Loop | Human corrections feed back into the system. Error clustering, retraining triggers, continuous improvement. |
| Rebranded to Aegis | Reflected the evolved mission β from a "shield" to a full forensic auditing platform. |
Key commits in this phase:
| Commit | Date | Description |
|---|---|---|
05fffb1 |
Feb 9 | THE PIVOT β Finsight Knowledge Graph & Reconciliation |
e89bb45 |
Feb 9 | Backboard-only document intelligence system (classification, validation, entity resolution, learning loop) |
d03a5f5 |
Feb 9 | Universal Pipeline Dashboard with Bulk Ingestion and Review Queue |
7b5b646 |
Feb 9 | Complete frontend rebuild β audio services deleted, industry-grade Flutter redesign |
0e5c3e2 |
Feb 10 | Fix 10 diagnosed bugs + Excel viewer |
2852da7 |
Feb 10 | Metadata integrity fraud detection (Lie Detector panel) |
ecf29f1 |
Feb 10 | Enrich knowledge graph with accounts, balances & counterparties |
089f663 |
Feb 10 | Rebrand: Finsight/FinShield β Aegis everywhere |
6b59238 |
Feb 10 | UI: Dramatically improve DocumentReviewPage inspector UI |
The Flutter frontend still exists in /frontend as a legacy reference, but the primary interface is now the React webapp in /webapp.
| Layer | Technology |
|---|---|
| Backend | Python 3.13, FastAPI, SQLModel, SQLAlchemy, SQLite |
| AI/LLM | OpenAI GPT-4o (via Backboard client) |
| Excel Parsing | openpyxl |
| Frontend (Web) | React 18, TypeScript, Vite, Chakra UI |
| Visualization | react-force-graph-3d, Recharts, react-pdf |
| Frontend (Mobile) | Flutter (legacy, in /frontend) |
cd backend
python -m venv ../.venv
../.venv/Scripts/activate # Windows
pip install -e ".[dev]"
uvicorn app.main:app --reload --port 8000cd webapp
npm install
npm run dev # β http://localhost:5173cd frontend
flutter run -d chromeFinShield/
βββ backend/ # FastAPI server + AI pipeline
β βββ app/
β β βββ api/ # REST endpoints (ingestion, documents, forensics, review, dashboard)
β β βββ services/ # Excel normalizer, validation engine, backboard client
β β βββ db/ # SQLModel models + session management
β β βββ core/ # Settings, knowledge graph store
β βββ storage/ # Uploaded document files
βββ webapp/ # React + Vite frontend
β βββ src/
β βββ pages/ # Dashboard, DocumentReview, Upload
β βββ components/
βββ frontend/ # Flutter frontend (legacy)
βββ docker-compose.yml
- Document ingestion pipeline with AI classification
- Excel bank statement normalization engine
- PDF + image ingestion with OCR support
- Forensic validation suite (Benford, structuring, balance checks)
- Metadata integrity fraud detection (Lie Detector)
- React web dashboard with document review
- Knowledge graph entity resolution
- Cross-document entity linking
- Self-learning loop with human corrections
- Multi-currency support (dynamic round-number thresholds)
- Automated report generation
- Batch re-analysis on rule updates
DevSoc '26 β Hotfoot Sponsor Track (Document Intelligence)
| Event | DevSoc '26 |
| Sponsors | Hotfoot (challenge track + datasets), Backboard (AI/RAG platform) |
| Track | Financial Document Intelligence |
| Dataset | Changed mid-hackathon from images β Excel; we support both |
| Team | Built in ~48 hours |