Progress-infinitely/DocNest

DocNest

A RAG (Retrieval-Augmented Generation) knowledge base system for document Q&A. Upload PDF or Markdown files, then ask questions in natural language -- answers are grounded in your uploaded documents.

Architecture

Document Import Pipeline

Upload PDF/Markdown
    |
    v
PDF -> Markdown (MinerU VLM)
    |
    v
Image Processing (VLM descriptions)
    |
    v
Document Chunking
    |
    v
Product Name Recognition (LLM)
    |
    v
BGE-M3 Embedding
    |
    v
Store in Milvus
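
The "Document Chunking" step above can be illustrated with a minimal sketch. It assumes heading-based splitting with a character cap, which may differ from what DocNest actually does; the function name `chunk_markdown` and the `max_chars` parameter are hypothetical and not taken from the repository.

```python
import re


def chunk_markdown(text: str, max_chars: int = 800) -> list[dict]:
    """Split Markdown at headings, then cap each chunk near max_chars.

    Illustrative only: the real DocNest chunker is not shown in the README.
    Limitations: a single paragraph longer than max_chars is kept whole, and
    lines starting with '#' inside code blocks are treated as headings.
    """
    chunks: list[dict] = []
    title = ""
    buffer: list[str] = []

    def flush() -> None:
        # Emit the buffered section, splitting on blank lines if it is too long.
        body = "\n".join(buffer).strip()
        buffer.clear()
        current = ""
        for para in re.split(r"\n\s*\n", body):
            para = para.strip()
            if not para:
                continue
            if current and len(current) + len(para) + 2 > max_chars:
                chunks.append({"title": title, "text": current})
                current = para
            else:
                current = f"{current}\n\n{para}" if current else para
        if current:
            chunks.append({"title": title, "text": current})

    for line in text.splitlines():
        match = re.match(r"^#{1,6}\s+(.*)$", line)
        if match:
            flush()  # close the previous section under its own title
            title = match.group(1).strip()
        else:
            buffer.append(line)
    flush()
    return chunks
```

Keeping the section title alongside each chunk gives later steps (such as name recognition or embedding) some context about where the text came from.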

Query Pipeline

User Question
    |
    v
Item Name Confirmation (LLM)
    |
    v
+------------------+  +---------------+  +------------------+
|  Vector Search   |  |  HyDE Search  |  |  Web Search (MCP)|
+------------------+  +---------------+  +------------------+
    |
    v
RRF (Reciprocal Rank Fusion)
    |
    v
Reranking (BGE Reranker)
    |
    v
Answer Generation (LLM) -> SSE Stream
    |
    v
Save to MongoDB History
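
The RRF step merges the ranked lists from the three retrieval branches into one list. The following is a minimal sketch of standard Reciprocal Rank Fusion, where each document scores the sum of 1 / (k + rank) over the lists it appears in. The function name and the constant `k = 60` (a commonly used default) are assumptions, not values taken from the DocNest code.

```python
from collections import defaultdict


def reciprocal_rank_fusion(
    rankings: list[list[str]], k: int = 60
) -> list[tuple[str, float]]:
    """Fuse several ranked lists of document IDs into one scored list.

    Each input list is ordered best-first. k dampens the influence of
    top ranks; 60 is an assumed default, not DocNest's confirmed setting.
    """
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest score first; ties are broken by document ID for determinism.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


# Hypothetical results from vector search, HyDE search, and web search.
fused = reciprocal_rank_fusion([["a", "b", "c"], ["b", "a", "d"], ["b", "e"]])
print([doc_id for doc_id, _ in fused])
```

Because RRF uses only ranks, it can combine branches whose raw scores are not comparable, which is why it sits before the reranker in the pipeline.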

Tech Stack

Layer                     Technology
------------------------  ----------------------------------------------
Backend                   Python 3.13, FastAPI (async)
Orchestration             LangGraph, LangChain
Vector DB                 Milvus
Object Storage            MinIO
Chat History              MongoDB
PDF Parsing               MinerU (VLM mode, remote API)
LLM / Embedding / Rerank  OpenAI-compatible API (DashScope / DeepSeek)
Frontend                  Plain HTML + vanilla JS

Project Structure

nest/
+-- api/                # FastAPI routes (import + query)
+-- core/               # Config, dependency injection, path resolution
+-- graph/
|   +-- import_graph/   # LangGraph document import pipeline
|   |   +-- nodes/      # PDF->MD, image, chunking, embedding, Milvus import
|   +-- query_graph/    # LangGraph query pipeline
|       +-- nodes/      # HyDE, vector search, rerank, RRF, answer output
+-- prompts/            # LLM prompt templates (Jinja2)
+-- schema/             # Pydantic models (request/response)
+-- services/           # Business logic layer
+-- utils/              # Clients (Milvus, MinIO, MongoDB), providers, SSE helpers

Getting Started

Prerequisites

  • Python 3.13+
  • Milvus instance
  • MongoDB instance
  • MinIO instance
  • MinerU API access (for PDF parsing)
  • LLM API key (OpenAI-compatible)

Install

pip install -r requirements.txt

Configure

cp .env.example .env
# Edit .env with your API keys and service endpoints

Run

# Import service (document upload & processing)
uvicorn nest.api.main:app --host 0.0.0.0 --port 8000

# Query service (chat & Q&A)
uvicorn nest.api.main:app --host 0.0.0.0 --port 8001

Open http://localhost:8000 for document import, http://localhost:8001 for chat.
