Upload any document → get structured, machine-readable data back.
DocIntel is an end-to-end document intelligence pipeline that takes PDFs and images through layout detection, OCR, and named entity extraction — returning clean, structured JSON. Built for real-world use cases like digitizing land deeds, processing research papers, and extracting data from scanned forms.
Drag-and-drop document upload with supported format indicators and processing capabilities overview.
Extracted text blocks with semantic type classification (Title, Paragraph, List, Table, Footer), confidence scores, and bounding box coordinates.
Named entities detected across pages — people, locations, dates, monetary values, and organizations — with type-coded badges and character offsets.
Full API response with syntax highlighting — ready for downstream integration.
Organizations worldwide, from smallholder farmers registering land deeds to NGOs digitizing health records to researchers processing paper archives, need to convert unstructured documents into machine-readable data. Existing tools are either expensive cloud APIs or fragmented open-source libraries that require significant glue code.
DocIntel provides a single API endpoint that handles the entire pipeline: upload a document, get structured JSON back with text blocks, bounding boxes, and extracted entities.
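
As a rough illustration of consuming that JSON, the sketch below groups entities by label and keeps only text blocks above a confidence threshold. It assumes the response shape shown in this README's API section; the helper names `entities_by_label` and `confident_blocks` and the 0.8 threshold are illustrative and not part of DocIntel.

```python
from collections import defaultdict


def entities_by_label(response):
    """Group entity texts by label across all pages of a response."""
    grouped = defaultdict(list)
    for page in response.get("pages", []):
        for entity in page.get("entities", []):
            grouped[entity["label"]].append(entity["text"])
    return dict(grouped)


def confident_blocks(response, min_confidence=0.8):
    """Return (page_number, block_type, text) for blocks at or above the threshold."""
    return [
        (page["page_number"], block["block_type"], block["text"])
        for page in response.get("pages", [])
        for block in page.get("blocks", [])
        if block["confidence"] >= min_confidence
    ]


# Trimmed copy of the sample response documented in the API section.
sample = {
    "filename": "land_deed.pdf",
    "num_pages": 2,
    "pages": [
        {
            "page_number": 1,
            "blocks": [
                {
                    "text": "CERTIFICATE OF TITLE",
                    "confidence": 0.95,
                    "bbox": {"x": 120, "y": 50, "width": 800, "height": 60},
                    "block_type": "title",
                }
            ],
            "entities": [
                {"text": "January 15, 2024", "label": "DATE", "start": 45, "end": 61},
                {"text": "$150,000", "label": "MONEY", "start": 120, "end": 128},
            ],
        }
    ],
}

print(entities_by_label(sample))
print(confident_blocks(sample))
```

Filtering on `confidence` is one simple way to keep low-quality OCR output out of downstream systems; the right threshold depends on the scan quality of the documents involved.
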
- Multi-format support — PDF, PNG, JPG, JPEG, TIFF, BMP
- Layout detection — Identifies titles, paragraphs, tables, lists, and figures using heuristic-based connected component analysis
- OCR extraction — Tesseract-powered text extraction with per-block confidence scores
- Named entity recognition — Extracts dates, monetary amounts, percentages, emails, phone numbers, and addresses via regex patterns; optional spaCy integration for PERSON, ORG, GPE entities
- Structured output — Clean JSON with bounding boxes, block types, and page-level organization
- Async processing — Submit large documents for background processing with status polling
- React dashboard — Upload documents and explore results with an interactive UI
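
To illustrate how regex-based extraction can yield labelled spans with character offsets, here is a minimal sketch. The patterns are simplified assumptions written for this example, not the actual patterns in DocIntel's `entities.py`, and the `extract_entities` name is hypothetical.

```python
import re

# Simplified, illustrative patterns; production patterns need many more cases.
PATTERNS = {
    "DATE": re.compile(
        r"\b(?:January|February|March|April|May|June|July|August|"
        r"September|October|November|December) \d{1,2}, \d{4}\b"
    ),
    "MONEY": re.compile(r"\$\d+(?:,\d{3})*(?:\.\d{2})?"),
    "PERCENT": re.compile(r"\b\d+(?:\.\d+)?%"),
    "EMAIL": re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b"),
}


def extract_entities(text):
    """Return entity dicts with text, label, and start/end character offsets."""
    entities = []
    for label, pattern in PATTERNS.items():
        for match in pattern.finditer(text):
            entities.append(
                {
                    "text": match.group(),
                    "label": label,
                    "start": match.start(),
                    "end": match.end(),
                }
            )
    # Sort by position so the output follows reading order.
    return sorted(entities, key=lambda entity: entity["start"])


text = "Signed on January 15, 2024 for $150,000 (5% deposit)."
for entity in extract_entities(text):
    print(entity)
```

Because `start` and `end` come straight from the match object, `text[start:end]` always reproduces the entity text, which makes the offsets easy to verify downstream.
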
| Layer | Technology |
|---|---|
| API | FastAPI, Uvicorn, Pydantic v2 |
| OCR | Tesseract (via pytesseract) |
| PDF | pdf2image (Poppler), pypdf |
| NLP | Regex patterns + spaCy (optional) |
| Image | Pillow, NumPy |
| Frontend | React 18, Vite, Tailwind CSS |
| Container | Docker, Docker Compose |
- Python 3.11+
- Node.js 18+
- Tesseract OCR (brew install tesseract / apt install tesseract-ocr)
- Poppler (brew install poppler / apt install poppler-utils)
git clone https://github.com/Jonathan-321/docintel.git
cd docintel
docker-compose up --build

The API will be available at http://localhost:8000 and the frontend at http://localhost:3000.
Backend:
cd backend
python -m venv venv
source venv/bin/activate # Windows: venv\Scripts\activate
pip install -r requirements.txt
# Optional: install spaCy model for enhanced NER
python -m spacy download en_core_web_sm
uvicorn app.main:app --reload --port 8000

Frontend:
cd frontend
npm install
npm run dev

POST /api/v1/process
Content-Type: multipart/form-data

| Parameter | Type | Description |
|---|---|---|
| file | File | Document to process |
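
A request to this endpoint can be sketched with only the Python standard library. The `build_multipart` and `process_document` helpers are hypothetical names for this example; the `file` field name and the `http://localhost:8000` base URL come from this README, and the endpoint is assumed to need no authentication.

```python
import json
import os
import urllib.request
import uuid


def build_multipart(field_name, filename, data, content_type="application/octet-stream"):
    """Encode one file as a multipart/form-data body; returns (body, header value)."""
    boundary = uuid.uuid4().hex
    head = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="{field_name}"; filename="{filename}"\r\n'
        f"Content-Type: {content_type}\r\n"
        "\r\n"
    ).encode("utf-8")
    tail = f"\r\n--{boundary}--\r\n".encode("utf-8")
    return head + data + tail, f"multipart/form-data; boundary={boundary}"


def process_document(path, base_url="http://localhost:8000"):
    """POST a local file to /api/v1/process and return the parsed JSON response."""
    with open(path, "rb") as fh:
        body, content_type = build_multipart("file", os.path.basename(path), fh.read())
    request = urllib.request.Request(
        f"{base_url}/api/v1/process",
        data=body,
        headers={"Content-Type": content_type},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


# Offline check of the encoder; no server is contacted here.
body, content_type = build_multipart("file", "land_deed.pdf", b"%PDF-1.7 ...")
print(content_type.split(";")[0])
print(len(body), "bytes")

# With the API running locally:
# result = process_document("land_deed.pdf")
# print(result["num_pages"])
```
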
Response:
{
"filename": "land_deed.pdf",
"num_pages": 2,
"pages": [
{
"page_number": 1,
"width": 2550,
"height": 3300,
"blocks": [
{
"text": "CERTIFICATE OF TITLE",
"confidence": 0.95,
"bbox": { "x": 120, "y": 50, "width": 800, "height": 60 },
"block_type": "title"
}
],
"entities": [
{
"text": "January 15, 2024",
"label": "DATE",
"start": 45,
"end": 61
},
{
"text": "$150,000",
"label": "MONEY",
"start": 120,
"end": 128
}
]
}
],
"metadata": {
"ocr_engine": "tesseract",
"language": "eng",
"spacy_available": true
},
"processing_time_ms": 1234.56
}

POST /api/v1/process/async
Content-Type: multipart/form-data

Returns a job_id for polling:
{
"job_id": "abc123",
"status": "processing",
"progress": 0.0
}

GET /api/v1/status/{job_id}

GET /api/v1/formats

FastAPI auto-generates interactive API documentation:
- Swagger UI: http://localhost:8000/docs
- ReDoc: http://localhost:8000/redoc
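
The async flow described above (submit to `/api/v1/process/async`, then poll `/api/v1/status/{job_id}`) can be sketched as a polling loop. This README only shows the `processing` status, so the terminal values `completed` and `failed` used below are assumptions, as are the `fetch_status` and `wait_for_job` helper names. The status fetcher is injected so the loop can be exercised without a running server.

```python
import json
import time
import urllib.request

# Assumed terminal status values; check the interactive API docs for the real ones.
TERMINAL_STATUSES = {"completed", "failed"}


def fetch_status(job_id, base_url="http://localhost:8000"):
    """GET the status document for a job from the status endpoint."""
    with urllib.request.urlopen(f"{base_url}/api/v1/status/{job_id}") as response:
        return json.load(response)


def wait_for_job(job_id, fetch=fetch_status, interval=1.0, timeout=60.0, sleep=time.sleep):
    """Poll until the job reaches a terminal status or the timeout elapses."""
    deadline = time.monotonic() + timeout
    while True:
        status = fetch(job_id)
        if status.get("status") in TERMINAL_STATUSES:
            return status
        if time.monotonic() >= deadline:
            raise TimeoutError(f"job {job_id} not finished after {timeout} seconds")
        sleep(interval)


# Offline demonstration with a fake fetcher that finishes on the third poll.
_fake_responses = iter(
    [
        {"job_id": "abc123", "status": "processing", "progress": 0.0},
        {"job_id": "abc123", "status": "processing", "progress": 0.5},
        {"job_id": "abc123", "status": "completed", "progress": 1.0},
    ]
)
result = wait_for_job(
    "abc123",
    fetch=lambda _job_id: next(_fake_responses),
    sleep=lambda _seconds: None,
)
print(result["status"], result["progress"])
```

A fixed interval keeps the sketch short; for long-running jobs, a growing delay between polls would reduce load on the API.
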
docintel/
├── backend/
│ ├── app/
│ │ ├── api/
│ │ │ └── routes.py # API endpoints
│ │ ├── core/
│ │ │ └── config.py # Settings & configuration
│ │ ├── models/
│ │ │ └── schemas.py # Pydantic data models
│ │ ├── pipeline/
│ │ │ ├── processor.py # Main document processor
│ │ │ ├── layout.py # Layout detection
│ │ │ ├── ocr.py # OCR engine
│ │ │ └── entities.py # Entity extraction
│ │ ├── utils/
│ │ │ └── file_utils.py # File handling utilities
│ │ └── main.py # FastAPI app entry point
│ ├── tests/
│ │ ├── test_health.py
│ │ ├── test_process.py
│ │ └── test_entities.py
│ ├── Dockerfile
│ └── requirements.txt
├── frontend/
│ ├── src/
│ │ ├── components/
│ │ │ ├── FileUpload.jsx
│ │ │ ├── ResultsView.jsx
│ │ │ ├── ProcessingStatus.jsx
│ │ │ ├── Header.jsx
│ │ │ └── Sidebar.jsx
│ │ ├── api/
│ │ │ └── client.js
│ │ ├── App.jsx
│ │ └── main.jsx
│ ├── package.json
│ └── vite.config.js
├── docker-compose.yml
├── LICENSE
└── README.md
cd backend
pip install pytest pytest-asyncio httpx
pytest -v

Copy .env.example to .env in the backend directory:
| Variable | Default | Description |
|---|---|---|
| DEBUG | false | Enable debug mode |
| OCR_LANGUAGE | eng | Tesseract language pack |
| TESSERACT_CMD | tesseract | Path to tesseract binary |
| MAX_FILE_SIZE | 20971520 | Max upload size in bytes (20 MB) |
| UPLOAD_DIR | /tmp/docintel | Temporary upload directory |
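
For illustration, these variables and their defaults could be read as follows. This is a standard-library sketch only: the project's own `config.py` may load settings differently (for example through Pydantic), and the `Settings` class, `load_settings`, and `is_upload_allowed` names are hypothetical.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class Settings:
    """Defaults mirror the configuration table in this README."""

    debug: bool = False
    ocr_language: str = "eng"
    tesseract_cmd: str = "tesseract"
    max_file_size: int = 20 * 1024 * 1024  # 20971520 bytes
    upload_dir: str = "/tmp/docintel"


def load_settings(env=None):
    """Build Settings from a mapping of environment variables (os.environ by default)."""
    env = os.environ if env is None else env
    defaults = Settings()
    return Settings(
        debug=env.get("DEBUG", "false").strip().lower() in {"1", "true", "yes"},
        ocr_language=env.get("OCR_LANGUAGE", defaults.ocr_language),
        tesseract_cmd=env.get("TESSERACT_CMD", defaults.tesseract_cmd),
        max_file_size=int(env.get("MAX_FILE_SIZE", defaults.max_file_size)),
        upload_dir=env.get("UPLOAD_DIR", defaults.upload_dir),
    )


def is_upload_allowed(num_bytes, settings):
    """Check an upload size against the configured limit."""
    return num_bytes <= settings.max_file_size


settings = load_settings({"OCR_LANGUAGE": "fra", "MAX_FILE_SIZE": "1048576"})
print(settings)
print(is_upload_allowed(2_000_000, settings))
```
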
- Land administration — Digitize property deeds and extract parcel numbers, dates, and monetary values
- Healthcare — Process scanned medical records and extract patient information, dates, and diagnoses
- Research — Bulk-process academic papers and extract titles, authors, citations, and key findings
- Finance — Extract transaction data from scanned invoices, receipts, and bank statements
- Government — Digitize census forms, birth certificates, and other civil documents
- Table structure recognition (row/column detection)
- Handwriting recognition support
- Multi-language OCR (Arabic, Kinyarwanda, French)
- Document classification (invoice vs. letter vs. form)
- Batch processing endpoint
- Webhook notifications for async jobs
- Fine-tuned layout detection model (YOLO-based)
MIT License — see LICENSE for details.




