Skip to content

Talib2021/DocMan

Repository files navigation

AI Document Enhancement & Intelligent Chat System

An end-to-end Document Intelligence Platform that enhances, extracts, and enables intelligent interaction with documents using Deep Learning and Large Language Models.


Overview

This system allows users to:

  • 1 Upload PDF or image documents
  • 2 Enhance document quality (Deblur + Super Resolution)
  • 3 Extract text using AI-powered Vision OCR
  • 4 Store metadata and extracted text in MySQL
  • 5 Select single or multiple documents
  • 6 Chat with documents using real-time streaming LLM
  • 7 Track document processing status

The platform combines Computer Vision, Deep Learning, OCR, and LLM-based reasoning into a unified, scalable architecture.


System Architecture

User Upload
    ↓
Store File Locally
    ↓
Save Metadata in MySQL
    ↓
Preprocess (DE-GAN + RealESRGAN)
    ↓
Save Enhanced File
    ↓
OCR (OpenAI Vision)
    ↓
Save Extracted Text
    ↓
User Selects Documents
    ↓
Streaming LLM Chat (LLAMA 3.2(open source))

Tech Stack

Layer Technology
Backend FastAPI
Database MySQL
Enhancement DE-GAN + RealESRGAN
OCR OpenAI GPT-4o Vision
Chat LLM LLAMA 3.2
Streaming FastAPI StreamingResponse
PDF Handling PyMuPDF
Image Processing ocr
Server Uvicorn

Project Structure

DE-GAN/
│
├── main.py
├── db.py
├── ocr_helper.py
├── degain_esrgan_pipeline.py
├── enhance.py
│
├── routers/
│   ├── upload_api.py
│   ├── preprocess_api.py
│   ├── ocr_api.py
│   ├── chat_api.py
│   ├── file_retriver.py
│   └── list_documents.py
│
├── image_for_enhancement/
├── realesrgan_output/
├── temp_pages/
├── templates/
└── weights/

Database Schema

Table: common

Stores document metadata and processing results.

CREATE TABLE common (
    id INT AUTO_INCREMENT PRIMARY KEY,
    preproc_path VARCHAR(255) NOT NULL,
    extract_text TEXT NOT NULL,
    usrdoc_name VARCHAR(255) NOT NULL,
    doc_name VARCHAR(255) NOT NULL,
    size INT NOT NULL,
    doc_status VARCHAR(255) NOT NULL,
    file_path VARCHAR(255) NOT NULL,
    time_stamp VARCHAR(255) NOT NULL
);

Table: querries

Stores chat history per document.

CREATE TABLE querries (
    doc_id INT NOT NULL,
    q_id VARCHAR(255) NOT NULL PRIMARY KEY,
    upload_time DATETIME NOT NULL,
    ext_time DATETIME NOT NULL,
    querry VARCHAR(255) NOT NULL,
    response VARCHAR(255) NOT NULL,
    FOREIGN KEY (doc_id) REFERENCES common(id)
);

API Endpoints

Upload Document

POST /upload

  • Saves file locally
  • Inserts metadata in DB
  • Sets status = uploaded

Preprocess Document

POST /preprocess/{doc_id}

  • Convert PDF → Images
  • Run DE-GAN (Deblur)
  • Run RealESRGAN (Super Resolution)
  • Rebuild multi-page PDF
  • Update preproc_path
  • Set status = preprocessed

OCR Extraction

POST /ocr/{doc_id}

  • Load enhanced file
  • Extract text using OpenAI Vision
  • Save text in DB
  • Set status = extracted

List Documents

GET /documents

Retrieve File

GET /retrieve/{doc_id}/file?file_type=output
GET /retrieve/{doc_id}/file?file_type=preproc

Chat with Documents

POST /chat

Request Body:

{
  "doc_ids": [1, 2],
  "question": "Summarize the key findings."
}

How It Works:

  1. Fetch extracted text from selected documents
  2. Combine into context
  3. Send to GPT-4o
  4. Stream response using StreamingResponse
  5. Save query & response in DB

Real-Time Streaming

return StreamingResponse(generator(), media_type="text/event-stream")

Frontend receives incremental tokens in real time.


Features

  • ✔ Multi-page PDF support
  • ✔ Parallel document enhancement
  • ✔ AI Vision OCR
  • ✔ Real-time streaming LLM chat
  • ✔ Multi-document reasoning
  • ✔ Modular API architecture
  • ✔ MySQL integration
  • ✔ Status tracking

Installation

1. Install Dependencies

pip install fastapi uvicorn mysql-connector-python
pip install opencv-python torch torchvision
pip install pymupdf pillow
pip install openai

2. Run Server

uvicorn main:app --reload

Swagger UI available at: http://127.0.0.1:8000/docs


Requirements

  • Python 3.12.12+
  • MySQL running with doc_man schema
  • RealESRGAN weights inside /weights
  • OpenAI API key configured( I have used open source model name LLAMA 3.2)

Future Improvements

  • Background processing queue
  • User authentication & role-based access
  • Cloud deployment

Author

Talib Hussain Ansari
AI Engineer | Document Intelligence Systems


🏁 Summary

This project integrates Computer Vision, Deep Learning Enhancement, AI OCR, and LLM Streaming Chat into a scalable Document Intelligence platform designed for intelligent automation and interactive document analysis.

About

AI Document Enhancement & Intelligent Chat System is a FastAPI-based platform that enhances PDFs/images using DE-GAN and RealESRGAN, extracts text with OpenAI Vision OCR, and enables real-time multi-document chat via GPT-4o. It supports status tracking, MySQL storage, and scalable document intelligence workflows.

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors