AI Document Enhancement & Intelligent Chat System

An end-to-end Document Intelligence Platform that enhances, extracts, and enables intelligent interaction with documents using Deep Learning and Large Language Models.

Overview

This system allows users to:

1 Upload PDF or image documents
2 Enhance document quality (Deblur + Super Resolution)
3 Extract text using AI-powered Vision OCR
4 Store metadata and extracted text in MySQL
5 Select single or multiple documents
6 Chat with documents using real-time streaming LLM
7 Track document processing status

The platform combines Computer Vision, Deep Learning, OCR, and LLM-based reasoning into a unified, scalable architecture.

System Architecture

User Upload
    ↓
Store File Locally
    ↓
Save Metadata in MySQL
    ↓
Preprocess (DE-GAN + RealESRGAN)
    ↓
Save Enhanced File
    ↓
OCR (OpenAI Vision)
    ↓
Save Extracted Text
    ↓
User Selects Documents
    ↓
Streaming LLM Chat (LLAMA 3.2(open source))

Tech Stack

Layer	Technology
Backend	FastAPI
Database	MySQL
Enhancement	DE-GAN + RealESRGAN
OCR	OpenAI GPT-4o Vision
Chat LLM	LLAMA 3.2
Streaming	FastAPI StreamingResponse
PDF Handling	PyMuPDF
Image Processing	ocr
Server	Uvicorn

Project Structure

DE-GAN/
│
├── main.py
├── db.py
├── ocr_helper.py
├── degain_esrgan_pipeline.py
├── enhance.py
│
├── routers/
│   ├── upload_api.py
│   ├── preprocess_api.py
│   ├── ocr_api.py
│   ├── chat_api.py
│   ├── file_retriver.py
│   └── list_documents.py
│
├── image_for_enhancement/
├── realesrgan_output/
├── temp_pages/
├── templates/
└── weights/

Database Schema

Table: `common`

Stores document metadata and processing results.

CREATE TABLE common (
    id INT AUTO_INCREMENT PRIMARY KEY,
    preproc_path VARCHAR(255) NOT NULL,
    extract_text TEXT NOT NULL,
    usrdoc_name VARCHAR(255) NOT NULL,
    doc_name VARCHAR(255) NOT NULL,
    size INT NOT NULL,
    doc_status VARCHAR(255) NOT NULL,
    file_path VARCHAR(255) NOT NULL,
    time_stamp VARCHAR(255) NOT NULL
);

Table: `querries`

Stores chat history per document.

CREATE TABLE querries (
    doc_id INT NOT NULL,
    q_id VARCHAR(255) NOT NULL PRIMARY KEY,
    upload_time DATETIME NOT NULL,
    ext_time DATETIME NOT NULL,
    querry VARCHAR(255) NOT NULL,
    response VARCHAR(255) NOT NULL,
    FOREIGN KEY (doc_id) REFERENCES common(id)
);

API Endpoints

Upload Document

POST /upload

Saves file locally
Inserts metadata in DB
Sets status = uploaded

Preprocess Document

POST /preprocess/{doc_id}

Convert PDF → Images
Run DE-GAN (Deblur)
Run RealESRGAN (Super Resolution)
Rebuild multi-page PDF
Update preproc_path
Set status = preprocessed

OCR Extraction

POST /ocr/{doc_id}

Load enhanced file
Extract text using OpenAI Vision
Save text in DB
Set status = extracted

List Documents

GET /documents

Retrieve File

GET /retrieve/{doc_id}/file?file_type=output
GET /retrieve/{doc_id}/file?file_type=preproc

Chat with Documents

POST /chat

Request Body:

{
  "doc_ids": [1, 2],
  "question": "Summarize the key findings."
}

How It Works:

Fetch extracted text from selected documents
Combine into context
Send to GPT-4o
Stream response using StreamingResponse
Save query & response in DB

Real-Time Streaming

return StreamingResponse(generator(), media_type="text/event-stream")

Frontend receives incremental tokens in real time.

Features

✔ Multi-page PDF support
✔ Parallel document enhancement
✔ AI Vision OCR
✔ Real-time streaming LLM chat
✔ Multi-document reasoning
✔ Modular API architecture
✔ MySQL integration
✔ Status tracking

Installation

1. Install Dependencies

pip install fastapi uvicorn mysql-connector-python
pip install opencv-python torch torchvision
pip install pymupdf pillow
pip install openai

2. Run Server

uvicorn main:app --reload

Swagger UI available at: http://127.0.0.1:8000/docs

Requirements

Python 3.12.12+
MySQL running with doc_man schema
RealESRGAN weights inside /weights
OpenAI API key configured( I have used open source model name LLAMA 3.2)

Future Improvements

Background processing queue
User authentication & role-based access
Cloud deployment

Author

Talib Hussain Ansari
AI Engineer | Document Intelligence Systems

🏁 Summary

This project integrates Computer Vision, Deep Learning Enhancement, AI OCR, and LLM Streaming Chat into a scalable Document Intelligence platform designed for intelligent automation and interactive document analysis.

Name		Name	Last commit message	Last commit date
Latest commit History 3 Commits
directory_to_enhancement_image		directory_to_enhancement_image
docman		docman
models		models
realesrgan_output		realesrgan_output
results		results
routers		routers
static/js		static/js
templates		templates
vectorstores		vectorstores
README.md		README.md
db.py		db.py
degain_esrgan_pipeline.py		degain_esrgan_pipeline.py
enhance.py		enhance.py
llm_client.py		llm_client.py
load_vector.py		load_vector.py
main.py		main.py
ocr_helper.py		ocr_helper.py
ocr_prompts.py		ocr_prompts.py
requirements.txt		requirements.txt
search_doc.py		search_doc.py
train.py		train.py
utils.py		utils.py

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

AI Document Enhancement & Intelligent Chat System

Overview

System Architecture

Tech Stack

Project Structure

Database Schema

Table: `common`

Table: `querries`

API Endpoints

Upload Document

Preprocess Document

OCR Extraction

List Documents

Retrieve File

Chat with Documents

Real-Time Streaming

Features

Installation

1. Install Dependencies

2. Run Server

Requirements

Future Improvements

Author

🏁 Summary

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

AI Document Enhancement & Intelligent Chat System

Overview

System Architecture

Tech Stack

Project Structure

Database Schema

Table: common

Table: querries

API Endpoints

Upload Document

Preprocess Document

OCR Extraction

List Documents

Retrieve File

Chat with Documents

Real-Time Streaming

Features

Installation

1. Install Dependencies

2. Run Server

Requirements

Future Improvements

Author

🏁 Summary

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Table: `common`

Table: `querries`

Packages