Welcome to ParseFlow.ai, a developer-first document intelligence API that converts PDFs to Markdown/JSON with intelligent OCR capabilities.
ParseFlow.ai is a comprehensive document processing platform built with:
- Hono for API/UI (running on Cloudflare Workers)
- Cloudflare D1 for database storage
- Cloudflare R2 for document storage
- Cloudflare Queues for job processing
- Modal for GPU-powered OCR processing
- Stripe for billing
Key features:
- High-accuracy OCR: Powered by Docling (primary) and DeepSeek-OCR (fallback)
- Layout preservation: Maintains document structure and formatting
- Table and figure extraction: Accurate parsing of complex elements
- Webhook delivery: Real-time notifications when processing completes
- Financial document mode: Specialized processing for financial documents
- API-first design: Easy integration with your applications
- Scalable architecture: Built to handle high-volume processing
Architecture overview:
```
┌─────────────────┐    ┌──────────────────┐    ┌─────────────────┐
│   API Layer     │    │   Processing     │    │   Storage &     │
│   (Hono/CF)     │    │   Engine (Modal) │    │   Queues (CF)   │
│                 │    │                  │    │                 │
│ • /v1/extract   │───▶│ • Docling        │    │ • D1 (SQLite)   │
│ • /v1/uploads   │    │ • DeepSeek-OCR   │◀───│ • R2 (S3)       │
│ • /v1/jobs      │    │ • vLLM           │    │ • Queues        │
│ • Webhooks      │    │                  │    │                 │
└─────────────────┘    └──────────────────┘    └─────────────────┘
```
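To make this flow concrete, here is a minimal sketch (not the actual ParseFlow source) of how a Hono route on Workers could record a job in D1 and hand it to a Cloudflare Queues consumer. The binding names, the jobs table columns, and the queue message shape are assumptions.

```ts
// Illustrative sketch only: binding names, table columns, and message shape are assumptions.
// Types like D1Database, Queue, and MessageBatch come from @cloudflare/workers-types.
import { Hono } from "hono";

type JobMessage = { jobId: string; sourceUrl?: string; mode: string };

type Env = {
  DB: D1Database;                // hypothetical D1 binding
  JOBS_QUEUE: Queue<JobMessage>; // hypothetical Queues binding
};

const app = new Hono<{ Bindings: Env }>();

// API layer: record the job in D1, then enqueue it for the processing engine.
app.post("/v1/extract", async (c) => {
  const { url, webhook_url, mode = "general" } = await c.req.json();
  const jobId = crypto.randomUUID();

  await c.env.DB.prepare(
    "INSERT INTO jobs (id, status, mode, webhook_url) VALUES (?, 'queued', ?, ?)"
  ).bind(jobId, mode, webhook_url ?? null).run();

  await c.env.JOBS_QUEUE.send({ jobId, sourceUrl: url, mode });
  return c.json({ id: jobId, status: "queued" }, 202);
});

export default {
  fetch: app.fetch,
  // Queue consumer: in the real system this is where the sync worker calls the Modal engine.
  async queue(batch: MessageBatch<JobMessage>, env: Env) {
    for (const msg of batch.messages) {
      // ...invoke OCR, write the result to R2, update the job row in D1...
      msg.ack();
    }
  },
};
```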
Prerequisites:
- Node.js 18+
- pnpm
- Python 3.10+
- Cloudflare account
- Modal account
To set up and deploy the project:
- Install Node.js dependencies:
```
pnpm install
```
- Install Python dependencies:
```
cd engine
python -m venv venv              # create the virtualenv if it does not exist yet
source venv/bin/activate         # On Windows: venv\Scripts\activate
pip install -r requirements.txt
```
- Set up environment variables:
```
# For Cloudflare Workers
wrangler secret put R2_ACCESS_KEY_ID
wrangler secret put R2_SECRET_ACCESS_KEY
wrangler secret put WORKER_API_SECRET
wrangler secret put STRIPE_SECRET_KEY
wrangler secret put STRIPE_WEBHOOK_SECRET
```
- Deploy to Cloudflare:
```
# Deploy the main API
cd pages && wrangler deploy

# Deploy workers
cd ../workers/email && wrangler deploy
cd ../sync && wrangler deploy
cd ../billing && wrangler deploy
```

Create a .dev.vars file with the following:
```
# Cloudflare
CF_ACCOUNT_ID=your_account_id
R2_PUBLIC_URL=your_r2_public_url

# API Secrets
ENGINE_SECRET=your_engine_secret
WORKER_API_SECRET=your_worker_api_secret

# Stripe
STRIPE_SECRET_KEY=your_stripe_secret_key
STRIPE_WEBHOOK_SECRET=your_stripe_webhook_secret
STRIPE_STARTER_PRICE_ID=your_starter_price_id
STRIPE_PRO_PRICE_ID=your_pro_price_id
APP_URL=your_app_url

# R2
R2_ACCESS_KEY_ID=your_r2_access_key
R2_SECRET_ACCESS_KEY=your_r2_secret_key
```

All API requests require an API key in the Authorization header:
```
Authorization: Bearer pf_live_...
```
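For example, a small client helper can attach the key to every request. This is a sketch: the base URL is an assumption, and only the header format shown above comes from the docs.

```ts
// Hypothetical client helper; only the Bearer header format ("pf_live_...") is documented above.
const BASE_URL = "https://api.parseflow.ai";    // assumed base URL
const API_KEY = process.env.PARSEFLOW_API_KEY!; // your "pf_live_..." key

async function parseflow(path: string, init: RequestInit = {}): Promise<Response> {
  const headers = new Headers(init.headers);
  headers.set("Authorization", `Bearer ${API_KEY}`);
  return fetch(`${BASE_URL}${path}`, { ...init, headers });
}

// Usage: const res = await parseflow("/v1/jobs/job_123...");
```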
First, get a presigned URL to upload your document directly to our storage:
```
POST /v1/uploads/init
{
  "content_type": "application/pdf",
  "file_name": "document.pdf"
}
```
Then upload your file to the returned presigned URL.
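A sketch of this upload step in TypeScript (Node 18+). The base URL and the `upload_url` field in the init response are assumptions; only the request body shown above is documented.

```ts
// Sketch of the presigned-upload step. Assumptions: the API base URL and the
// shape of the /v1/uploads/init response (here assumed to contain "upload_url").
import { readFile } from "node:fs/promises";

const initRes = await fetch("https://api.parseflow.ai/v1/uploads/init", {
  method: "POST",
  headers: {
    Authorization: `Bearer ${process.env.PARSEFLOW_API_KEY}`,
    "Content-Type": "application/json",
  },
  body: JSON.stringify({ content_type: "application/pdf", file_name: "document.pdf" }),
});
const { upload_url } = await initRes.json() as { upload_url: string }; // assumed field name

// Upload the PDF bytes directly to the presigned URL (R2 behind the scenes).
const pdf = await readFile("document.pdf");
await fetch(upload_url, {
  method: "PUT",
  headers: { "Content-Type": "application/pdf" },
  body: pdf,
});
```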
Next, create a processing job (if you host the file yourself, pass its URL instead of uploading):
```
POST /v1/extract
{
  "url": "https://your-storage.com/file.pdf",   // Optional, if you host the file
  "webhook_url": "https://your-app.com/webhook",
  "mode": "general"                             // or "financial" for high-accuracy financial document processing
}
```
Check the status of a job at any time:
```
GET /v1/jobs/{job_id}
```
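Putting the two endpoints together, a client might create a job and poll its status like this. This is a sketch: the base URL and any response fields beyond those shown in this README are assumptions.

```ts
// Sketch: create a processing job, then poll until it reaches a terminal status.
const headers = {
  Authorization: `Bearer ${process.env.PARSEFLOW_API_KEY}`,
  "Content-Type": "application/json",
};

const createRes = await fetch("https://api.parseflow.ai/v1/extract", {
  method: "POST",
  headers,
  body: JSON.stringify({
    url: "https://your-storage.com/file.pdf",
    webhook_url: "https://your-app.com/webhook",
    mode: "general",
  }),
});
const { id } = await createRes.json() as { id: string };

// Polling is a fallback if you prefer not to rely on the webhook.
let job: { status: string; result_url?: string };
do {
  await new Promise((r) => setTimeout(r, 5000));  // wait 5 s between polls
  const res = await fetch(`https://api.parseflow.ai/v1/jobs/${id}`, { headers });
  job = await res.json() as { status: string; result_url?: string };
} while (job.status !== "completed" && job.status !== "failed"); // "failed" is an assumed terminal status
```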
When processing is complete, we'll send a POST request to your webhook URL with the job result:
```
{
  "id": "job_123...",
  "status": "completed",
  "result_url": "https://storage-url-to-result",
  "trust_score": 0.95
}
```
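On your side, the webhook endpoint just needs to accept that POST body. Below is a minimal Hono sketch; delivery retries and signature verification are not documented here, so they are left out.

```ts
// Minimal webhook receiver sketch (Hono). The fields match the payload shown above.
import { Hono } from "hono";

const app = new Hono();

app.post("/webhook", async (c) => {
  const job = await c.req.json<{
    id: string;
    status: string;
    result_url?: string;
    trust_score?: number;
  }>();

  if (job.status === "completed" && job.result_url) {
    // Fetch the extracted Markdown/JSON from the result URL.
    const result = await fetch(job.result_url);
    console.log(`job ${job.id} finished, trust_score=${job.trust_score}`, await result.text());
  }

  // Acknowledge receipt with a 2xx response.
  return c.json({ received: true });
});

export default app;
```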
Run the test suite:
```
pnpm test
# or
npx vitest
```
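As a starting point, a test might assert that unauthenticated requests are rejected. This sketch assumes the Hono app is exported from src/ and that a missing API key yields a 401; both are assumptions.

```ts
// Hypothetical vitest example: the import path and the expected 401 behavior are assumptions.
import { describe, expect, it } from "vitest";
import app from "../src"; // assumed entry point exporting the Hono app

describe("auth", () => {
  it("rejects requests without an API key", async () => {
    const res = await app.request("/v1/jobs/job_123", { method: "GET" });
    expect(res.status).toBe(401);
  });
});
```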
The system is designed for deployment on Cloudflare Workers:
- Set up your D1 database:
```
wrangler d1 create parseflow-db
wrangler d1 execute parseflow-db --file=db/schema.sql
```
- Set up your R2 bucket:
```
wrangler r2 bucket create parseflow-storage
```
- Deploy the application:
```
# Deploy the main application
cd pages && wrangler deploy

# Deploy the workers
cd ../workers/email && wrangler deploy
cd ../sync && wrangler deploy
cd ../billing && wrangler deploy
```

This project was transformed from FreightStructurize (a freight auditing system) into ParseFlow.ai (a general document intelligence API), as specified in the PRD. Key changes include:
- Database: Migrated from freight-specific schema to ParseFlow schema with accounts, api_keys, and jobs
- API: Implemented full REST API with authentication, upload endpoints, and job management
- Processing: Updated from freight-specific extraction to general document processing with Docling and DeepSeek-OCR
- Frontend: Redesigned from freight dashboard to general API management UI
- Billing: Migrated from Lemon Squeezy to Stripe integration
Project structure:
```
parseflow/
├── db/              # Database schemas
├── engine/          # Python processing engine
├── modal/           # Modal GPU workers
├── pages/           # Frontend (Cloudflare Pages)
├── src/             # API layer (Hono)
├── workers/         # Cloudflare Workers
│   ├── email/       # Email processing worker
│   ├── sync/        # Job processing worker
│   └── billing/     # Stripe billing worker
├── README.md
├── package.json
└── prd.md           # Original PRD
```
This project is licensed under the MIT License.