A modern React + TypeScript web application for extracting information from documents using LlamaParse AI. This app specializes in processing government IDs and invoices with a FastAPI backend and MongoDB database.
New in v2.0: Migrated to React + TypeScript (Lovable stack), real-time WebSocket sync, Statistics dashboard, field-level copy buttons, and enhanced validation!
- Upload documents - Support for images (JPG, PNG) and PDF files
- Government ID extraction - Extract information from passports, driver's licenses, national IDs, etc.
- Invoice extraction - Extract details from Indian GST invoices with complete line items
- Edit with validation - Review and edit extracted data with inline validation
- Copy to clipboard - One-click copy for any field
- Real-time sync - WebSocket-based instant updates with connection status
- MongoDB storage - Fast, scalable document database
- Self-hosted - Full control over your data and infrastructure
- Statistics dashboard - Visual insights into document processing and revenue
- History view - Browse, search, and manage all extracted documents
- Modern UI - Built with React, TailwindCSS, and shadcn/ui components
- Responsive design - Works seamlessly on desktop and mobile browsers
Backend:
- FastAPI - Modern, fast Python web framework
- MongoDB - NoSQL document database
- LlamaParse - AI-powered document extraction (server-side)
- WebSocket - Real-time bidirectional communication
- Docker - Containerized deployment
Frontend:
- React 18 - Modern UI library
- TypeScript - Type-safe JavaScript
- Vite - Lightning-fast build tool
- TailwindCSS - Utility-first CSS framework
- shadcn/ui - High-quality React components
- React Query - Server state management
- WebSocket - Real-time updates
- Docker & Docker Compose (recommended)
- Python 3.11+ (for local development)
- MongoDB 7.0+ (MongoDB Atlas recommended)
- LlamaCloud API key
- Node.js 18+ and npm
- Modern web browser (Chrome, Firefox, Safari, Edge)
git clone https://github.com/yourusername/DocExtract.git
cd DocExtract
# Navigate to backend directory
cd backend
# Copy environment template
cp .env.example .env
# Edit .env with your credentials
nano .env
Configure these variables:
# The password in MONGODB_URL must match MONGO_PASSWORD
MONGODB_URL=mongodb://admin:your_secure_password@mongodb:27017
MONGO_USER=admin
MONGO_PASSWORD=your_secure_password
LLAMA_CLOUD_API_KEY=llx-your-api-key
ALLOWED_ORIGINS=http://localhost:3000,https://your-domain.com
Start the backend:
docker-compose up -d
Verify it's running:
curl http://localhost:8000/health
# Should return: {"status":"healthy","database":"connected"}
# Navigate to frontend directory
cd frontend
# Install dependencies
npm install
# Copy environment template
cp .env.example .env
# Edit .env with backend URL (default: http://localhost:8000)
nano .env
# Start development server
npm run dev
The frontend will be available at http://localhost:5173
# Start both backend and frontend together
./scripts/dev.sh
This script will:
- Set up environment files if needed
- Start the backend on port 8000
- Start the frontend on port 5173
- Display all access URLs
For local development without Docker:
cd backend
# Create virtual environment
python -m venv venv
source venv/bin/activate # On Windows: venv\Scripts\activate
# Install dependencies
pip install -r requirements.txt
# Run MongoDB locally or use Docker
docker run -d -p 27017:27017 \
-e MONGO_INITDB_ROOT_USERNAME=admin \
-e MONGO_INITDB_ROOT_PASSWORD=password \
mongo:7.0
# Start the backend
uvicorn app.main:app --reload --host 0.0.0.0 --port 8000
Configure the API endpoint in frontend/.env:
# Development
VITE_API_BASE_URL=http://localhost:8000/api/v1
VITE_WS_URL=ws://localhost:8000/ws
VITE_API_PORT=8000
# Production (Railway)
# VITE_API_BASE_URL=https://your-backend.railway.app/api/v1
# VITE_WS_URL=wss://your-backend.railway.app/ws
Build the Docker images:
cd backend
docker-compose build
Start the backend services:
docker-compose up
Start in detached mode (background):
docker-compose up -d
View logs:
docker-compose logs -f
Stop services:
docker-compose down
Rebuild and restart (after code changes):
docker-compose down
docker-compose build
docker-compose up
Clean reset (removes volumes):
docker-compose down -v
docker-compose build
docker-compose up
Backend will be available at:
- API: http://localhost:8000
- Health check: http://localhost:8000/health
- API docs: http://localhost:8000/docs
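After starting the containers, it can help to wait until the health endpoint actually reports a connected database before opening the frontend. The following is a minimal sketch using only the Python standard library. It assumes the health response has the shape shown earlier ({"status":"healthy","database":"connected"}). The function names, retry count, and delay are illustrative and not part of the project.

```python
import json
import time
import urllib.error
import urllib.request


def is_healthy(payload: object) -> bool:
    """Return True when the payload matches the documented healthy response."""
    return (
        isinstance(payload, dict)
        and payload.get("status") == "healthy"
        and payload.get("database") == "connected"
    )


def wait_for_backend(url: str = "http://localhost:8000/health",
                     attempts: int = 10, delay: float = 2.0) -> bool:
    """Poll the health endpoint until it reports healthy or attempts run out."""
    for _ in range(attempts):
        try:
            with urllib.request.urlopen(url, timeout=5) as response:
                if is_healthy(json.load(response)):
                    return True
        except (urllib.error.URLError, OSError, ValueError):
            pass  # backend not reachable yet, or the body was not valid JSON
        time.sleep(delay)
    return False
```

Calling wait_for_backend() from a setup script returns True once the backend is ready, and False if it never becomes healthy within the configured attempts.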
Run React frontend with hot reload:
cd frontend
# Development mode (with HMR)
npm run dev
# Build for production
npm run build
# Preview production build
npm run preview
Frontend will be available at:
- Development: http://localhost:5173
- Preview: http://localhost:4173
# Build and start all services
docker-compose up --build
# Access the application
- Frontend: http://localhost:8080
- Backend: http://localhost:8000
- API Docs: http://localhost:8000/docs
Use the helper script for easy deployment:
./scripts/deploy-railway.sh
Or deploy manually:
- Install Railway CLI: npm install -g @railway/cli
- Log in: railway login
- Link project: railway link
- Set environment variables in the Railway dashboard
- Deploy: railway up
See LOVABLE_MIGRATION.md for the detailed migration guide.
This project was migrated from Flutter to React + TypeScript in November 2025. The Flutter code has been archived in archive/flutter-app/ for reference.
Key changes:
- Migrated to React 18 + TypeScript + Vite
- Enhanced UI with TailwindCSS and shadcn/ui
- Added Statistics dashboard
- Implemented field-level copy buttons
- Added inline validation
- Improved WebSocket integration
- Removed raw JSON from the UI
For complete migration details, see LOVABLE_MIGRATION.md.
# Backend tests
cd backend
pytest
# Frontend tests (if configured)
cd frontend
npm test
- Start both services:
./scripts/dev.sh
- Test the flow:
  - Upload a document at http://localhost:5173
  - Verify extraction results display correctly
  - Edit and save the document
  - Check WebSocket live indicator
  - View document in History page
  - Check Statistics dashboard
- Monitor logs:
# Backend logs
docker-compose logs -f backend
# Or, if running locally, check the terminal where uvicorn is running
- API Documentation - Complete API reference
- Migration Guide - Flutter to React migration
- Railway Deployment - Production deployment
- MongoDB Setup - Database configuration
Quick deploy script:
./deploy.sh
# Copy Nginx config
sudo cp backend/nginx.conf /etc/nginx/sites-available/docextract
sudo ln -s /etc/nginx/sites-available/docextract /etc/nginx/sites-enabled/
# Get SSL certificate
sudo certbot --nginx -d your-domain.com
# Test the configuration, then reload Nginx
sudo nginx -t
sudo systemctl reload nginx
Once the backend is running, visit:
- Swagger UI: http://localhost:8000/docs
- ReDoc: http://localhost:8000/redoc
POST /api/v1/extract # Extract data from document
POST /api/v1/documents # Save document
GET /api/v1/documents # List documents
GET /api/v1/documents/{id} # Get document by ID
DELETE /api/v1/documents/{id} # Delete document
GET /api/v1/stats # Get statistics
WS /ws/documents # WebSocket for real-time updates
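The endpoints above can be called from any HTTP client. As a rough illustration, the sketch below builds (but does not send) requests for the list and delete routes with Python's standard library. The base URL is the development default from frontend/.env. The helper names are hypothetical. Request and response bodies are left out because their shapes are not documented here; Swagger UI remains the authoritative reference.

```python
import urllib.parse
import urllib.request

API_BASE_URL = "http://localhost:8000/api/v1"  # development default from frontend/.env


def build_request(method: str, path: str,
                  base_url: str = API_BASE_URL) -> urllib.request.Request:
    """Build (but do not send) a request for one of the listed endpoints."""
    url = f"{base_url.rstrip('/')}/{path.lstrip('/')}"
    return urllib.request.Request(url, method=method,
                                  headers={"Accept": "application/json"})


def list_documents_request() -> urllib.request.Request:
    """GET /api/v1/documents"""
    return build_request("GET", "/documents")


def delete_document_request(document_id: str) -> urllib.request.Request:
    """DELETE /api/v1/documents/{id}"""
    # quote() guards against IDs containing characters that are unsafe in a path
    safe_id = urllib.parse.quote(document_id, safe="")
    return build_request("DELETE", f"/documents/{safe_id}")
```

Passing one of these requests to urllib.request.urlopen() sends it. Keeping construction separate from sending makes the URLs easy to inspect before anything touches the backend.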
DocExtract/
├── backend/                    # FastAPI backend
│   ├── app/
│   │   ├── main.py             # FastAPI application
│   │   ├── config.py           # Configuration
│   │   ├── models/             # Pydantic models
│   │   ├── schemas/            # LlamaParse schemas
│   │   ├── routes/             # API endpoints
│   │   ├── services/           # Business logic
│   │   │   ├── database.py     # MongoDB operations
│   │   │   ├── llamaparse.py   # LlamaParse integration
│   │   │   └── websocket_manager.py
│   │   └── utils/              # Utilities
│   ├── tests/                  # Backend tests
│   ├── Dockerfile
│   ├── docker-compose.yml
│   └── requirements.txt
│
├── lib/                        # Flutter app
│   ├── main.dart
│   ├── config/
│   │   └── api_config.dart     # API configuration
│   ├── models/
│   │   └── extracted_document.dart
│   ├── providers/
│   │   └── document_provider.dart  # State + WebSocket
│   ├── screens/
│   │   ├── home_screen.dart
│   │   ├── document_type_selection_screen.dart
│   │   ├── edit_extraction_screen.dart
│   │   ├── extraction_result_screen.dart
│   │   └── history_screen.dart
│   └── services/
│       ├── api_service.dart        # REST API client
│       └── websocket_service.dart  # WebSocket client
│
├── deploy.sh                   # Deployment script
├── MIGRATION_GUIDE.md          # Migration from v1.0
└── README.md                   # This file
cd backend
pip install pytest
pytest tests/ -v
flutter test
- Full Name
- ID Number
- Date of Birth
- Gender
- Address
- Issue Date
- Expiry Date
- Nationality
- Document Type
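To make the field list concrete, here is a hedged sketch of how an extracted ID record could be represented and sanity-checked in Python. The class name, snake_case field names, and validation rules are assumptions for illustration only. The backend's actual Pydantic models and LlamaParse schemas may differ.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class GovernmentId:
    # Field names are illustrative; the backend's real schema may differ.
    full_name: str
    id_number: str
    document_type: str
    date_of_birth: Optional[date] = None
    gender: Optional[str] = None
    address: Optional[str] = None
    issue_date: Optional[date] = None
    expiry_date: Optional[date] = None
    nationality: Optional[str] = None

    def validation_errors(self, today: date) -> list[str]:
        """Return readable problems; an empty list means the record looks consistent."""
        errors = []
        if not self.full_name.strip():
            errors.append("full name is empty")
        if self.issue_date and self.expiry_date and self.expiry_date <= self.issue_date:
            errors.append("expiry date is not after issue date")
        if self.expiry_date and self.expiry_date < today:
            errors.append("document has expired")
        return errors
```

Checks like these are the kind of rule an inline validation step might apply before a reviewed document is saved.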
7 Main Sections:
- Seller Information - Name, GSTIN, Contact Numbers
- Customer Information - Name, Address, Contact, GSTIN
- Invoice Details - Date, Bill Number, Gold Price
- Line Items - Description, HSN Code, Weight, Rate, Amount, etc.
- Financial Summary - Subtotal, Taxes (SGST/CGST), Grand Total
- Payment Details - Cash, UPI, Card
- Total in Words - Amount in text format
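One useful cross-check on an extracted invoice is that the financial summary adds up. The sketch below assumes the grand total equals subtotal + SGST + CGST, within a small tolerance for round-off. Real invoices may include discounts or other charges, so treat a mismatch as a prompt for manual review, not as proof of an extraction error. The function name and the default tolerance are illustrative.

```python
from decimal import Decimal


def totals_are_consistent(subtotal: str, sgst: str, cgst: str, grand_total: str,
                          tolerance: str = "1.00") -> bool:
    """Check subtotal + SGST + CGST against the grand total within a tolerance."""
    # Decimal avoids binary floating-point surprises when comparing currency amounts
    expected = Decimal(subtotal) + Decimal(sgst) + Decimal(cgst)
    return abs(expected - Decimal(grand_total)) <= Decimal(tolerance)
```

For example, a subtotal of 10000.00 with 150.00 SGST and 150.00 CGST is consistent with a grand total of 10300.00, but not with 10500.00.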
MongoDB Connection Error:
# Reset MongoDB
cd backend
docker-compose down -v
docker-compose up -d
LlamaParse API Errors:
- Verify the API key in .env
- Check quota at https://cloud.llamaindex.ai
- Test API key:
curl -H "Authorization: Bearer YOUR_KEY" \
https://api.cloud.llamaindex.ai/api/v1/extraction/run
WebSocket Not Connecting:
- Check API endpoint configuration
- Verify backend is running: curl http://localhost:8000/health
- Check firewall rules
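A common cause of WebSocket failures is a mismatch between the REST and WebSocket URLs, for example ws:// paired with an https:// backend. The sketch below derives the expected WebSocket URL from the API base URL, following the pattern in the frontend/.env examples (http with ws, https with wss, path /ws). The function name is hypothetical, and the default path is an assumption taken from those examples.

```python
from urllib.parse import urlsplit, urlunsplit


def derive_ws_url(api_base_url: str, ws_path: str = "/ws") -> str:
    """Map an http(s) API base URL to the matching ws(s) URL on the same host."""
    parts = urlsplit(api_base_url)
    if parts.scheme not in ("http", "https"):
        raise ValueError(f"expected an http(s) URL, got: {api_base_url!r}")
    # Secure pages must use wss; browsers block ws:// from https:// origins
    scheme = "wss" if parts.scheme == "https" else "ws"
    return urlunsplit((scheme, parts.netloc, ws_path, "", ""))
```

Comparing the result with the configured VITE_WS_URL is a quick way to spot a wrong scheme, host, or port.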
Camera Not Working (Android):
- Grant camera permissions in Settings > Apps > DocExtract > Permissions
- Extraction Time: 10-30 seconds (depends on document complexity)
- WebSocket Latency: < 100ms for real-time updates
- Concurrent Users: Tested with 100+ simultaneous connections
- Database: MongoDB indexes optimize query performance
- LlamaParse: ~$0.003 per page (see the LlamaParse pricing page for current rates)
- VPS Hosting: $5-20/month (DigitalOcean, Linode, etc.)
- Domain: ~$10-15/year
- SSL Certificate: Free (Let's Encrypt)
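These figures can be combined into a rough monthly estimate. The sketch below is back-of-the-envelope arithmetic only: it uses the approximate per-page price listed above as a default and spreads the yearly domain fee over twelve months. The function name and the sample volumes are illustrative.

```python
def estimate_monthly_cost(pages_per_month: int, vps_per_month: float,
                          domain_per_year: float,
                          price_per_page: float = 0.003) -> float:
    """Estimate monthly cost in USD: extraction + hosting + domain share."""
    extraction = pages_per_month * price_per_page
    domain_share = domain_per_year / 12  # spread the yearly fee over twelve months
    return round(extraction + vps_per_month + domain_share, 2)
```

For instance, 5,000 pages per month on a $10 VPS with a $12/year domain comes to about $26 per month, of which $15 is extraction.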
If you're upgrading from Supabase-based v1.0, see MIGRATION_GUIDE.md.
- User authentication (OAuth, JWT)
- Multi-user support with teams
- Export to PDF/Excel
- Bulk upload and processing
- iOS app release
- Advanced search and filtering
- Custom document types (user-defined schemas)
- API for third-party integrations
Contributions are welcome! Please:
- Fork the repository
- Create a feature branch
- Commit your changes
- Push to the branch
- Open a Pull Request
See the LICENSE file for details.
- Flutter - UI framework
- FastAPI - Web framework
- LlamaIndex - Document extraction
- MongoDB - Database
- Docker - Containerization
- Issues: GitHub Issues
- Discussions: GitHub Discussions
- Email: support@your-domain.com
Version: 2.0.0
Last Updated: 2025-11-11
Status: Production Ready