An end-to-end Document Intelligence Platform that enhances, extracts, and enables intelligent interaction with documents using Deep Learning and Large Language Models.
This system allows users to:
- 1 Upload PDF or image documents
- 2 Enhance document quality (Deblur + Super Resolution)
- 3 Extract text using AI-powered Vision OCR
- 4 Store metadata and extracted text in MySQL
- 5 Select single or multiple documents
- 6 Chat with documents using real-time streaming LLM
- 7 Track document processing status
The platform combines Computer Vision, Deep Learning, OCR, and LLM-based reasoning into a unified, scalable architecture.
User Upload
↓
Store File Locally
↓
Save Metadata in MySQL
↓
Preprocess (DE-GAN + RealESRGAN)
↓
Save Enhanced File
↓
OCR (OpenAI Vision)
↓
Save Extracted Text
↓
User Selects Documents
↓
Streaming LLM Chat (LLAMA 3.2(open source))
| Layer | Technology |
|---|---|
| Backend | FastAPI |
| Database | MySQL |
| Enhancement | DE-GAN + RealESRGAN |
| OCR | OpenAI GPT-4o Vision |
| Chat LLM | LLAMA 3.2 |
| Streaming | FastAPI StreamingResponse |
| PDF Handling | PyMuPDF |
| Image Processing | ocr |
| Server | Uvicorn |
DE-GAN/
│
├── main.py
├── db.py
├── ocr_helper.py
├── degain_esrgan_pipeline.py
├── enhance.py
│
├── routers/
│ ├── upload_api.py
│ ├── preprocess_api.py
│ ├── ocr_api.py
│ ├── chat_api.py
│ ├── file_retriver.py
│ └── list_documents.py
│
├── image_for_enhancement/
├── realesrgan_output/
├── temp_pages/
├── templates/
└── weights/
Stores document metadata and processing results.
CREATE TABLE common (
id INT AUTO_INCREMENT PRIMARY KEY,
preproc_path VARCHAR(255) NOT NULL,
extract_text TEXT NOT NULL,
usrdoc_name VARCHAR(255) NOT NULL,
doc_name VARCHAR(255) NOT NULL,
size INT NOT NULL,
doc_status VARCHAR(255) NOT NULL,
file_path VARCHAR(255) NOT NULL,
time_stamp VARCHAR(255) NOT NULL
);Stores chat history per document.
CREATE TABLE querries (
doc_id INT NOT NULL,
q_id VARCHAR(255) NOT NULL PRIMARY KEY,
upload_time DATETIME NOT NULL,
ext_time DATETIME NOT NULL,
querry VARCHAR(255) NOT NULL,
response VARCHAR(255) NOT NULL,
FOREIGN KEY (doc_id) REFERENCES common(id)
);POST /upload
- Saves file locally
- Inserts metadata in DB
- Sets
status = uploaded
POST /preprocess/{doc_id}
- Convert PDF → Images
- Run DE-GAN (Deblur)
- Run RealESRGAN (Super Resolution)
- Rebuild multi-page PDF
- Update
preproc_path - Set
status = preprocessed
POST /ocr/{doc_id}
- Load enhanced file
- Extract text using OpenAI Vision
- Save text in DB
- Set
status = extracted
GET /documents
GET /retrieve/{doc_id}/file?file_type=output
GET /retrieve/{doc_id}/file?file_type=preproc
POST /chat
Request Body:
{
"doc_ids": [1, 2],
"question": "Summarize the key findings."
}How It Works:
- Fetch extracted text from selected documents
- Combine into context
- Send to GPT-4o
- Stream response using
StreamingResponse - Save query & response in DB
return StreamingResponse(generator(), media_type="text/event-stream")Frontend receives incremental tokens in real time.
- ✔ Multi-page PDF support
- ✔ Parallel document enhancement
- ✔ AI Vision OCR
- ✔ Real-time streaming LLM chat
- ✔ Multi-document reasoning
- ✔ Modular API architecture
- ✔ MySQL integration
- ✔ Status tracking
pip install fastapi uvicorn mysql-connector-python
pip install opencv-python torch torchvision
pip install pymupdf pillow
pip install openaiuvicorn main:app --reloadSwagger UI available at: http://127.0.0.1:8000/docs
- Python 3.12.12+
- MySQL running with
doc_manschema - RealESRGAN weights inside
/weights - OpenAI API key configured( I have used open source model name LLAMA 3.2)
- Background processing queue
- User authentication & role-based access
- Cloud deployment
Talib Hussain Ansari
AI Engineer | Document Intelligence Systems
This project integrates Computer Vision, Deep Learning Enhancement, AI OCR, and LLM Streaming Chat into a scalable Document Intelligence platform designed for intelligent automation and interactive document analysis.