# RAG Knowledge Base

A private document RAG (Retrieval-Augmented Generation) system that ingests PDFs and exposes a search tool via an MCP server. Retrieval combines vector search (pgvector) and BM25 keyword search with cross-encoder reranking, and answers are generated by an AWS Bedrock LLM.

## Architecture

```mermaid
flowchart LR
    subgraph Ingestion
        direction TB
        A[📄 data/ PDFs] --> B[Docling<br/>PDF Parser]
        B --> C[HybridChunker<br/>BAAI/bge-m3 tokenizer]
        C --> D[HuggingFace Embeddings<br/>BAAI/bge-m3 · 1024-dim]
        D --> E[(pgvector<br/>PostgreSQL)]
        C --> F[(Redis<br/>BM25 Docstore)]
    end

    subgraph Query["Query  —  mcp_server.py"]
        direction TB
        G[search_knowledge<br/>tool call] --> H[Vector Retriever<br/>pgvector]
        G --> I[BM25 Retriever<br/>Redis]
        H & I --> J[QueryFusionRetriever<br/>relative_score fusion]
        J --> K[Cross-encoder Reranker<br/>BAAI/bge-reranker-large]
        K --> L[BedrockConverse LLM]
        L --> M[Answer + Sources]
    end

    E --> H
    F --> I
```
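The `relative_score` fusion step min-max normalizes each retriever's scores before combining them, so raw BM25 scores (unbounded) and cosine similarities (roughly 0–1) can be summed on equal footing. A simplified standalone sketch of the idea — not the library's actual implementation:

```python
def relative_score_fusion(result_lists):
    """Fuse ranked results from several retrievers.

    result_lists: list of {doc_id: raw_score} dicts, one per retriever.
    Each retriever's scores are min-max normalized to [0, 1], then summed
    per document, so no single retriever's score scale dominates.
    """
    fused = {}
    for results in result_lists:
        scores = list(results.values())
        lo, hi = min(scores), max(scores)
        span = (hi - lo) or 1.0  # avoid division by zero when all scores tie
        for doc_id, score in results.items():
            fused[doc_id] = fused.get(doc_id, 0.0) + (score - lo) / span
    # Highest fused score first.
    return sorted(fused.items(), key=lambda kv: kv[1], reverse=True)

# Hypothetical scores: cosine similarities vs. raw BM25 scores.
vector_hits = {"doc_a": 0.92, "doc_b": 0.80, "doc_c": 0.75}
bm25_hits = {"doc_b": 12.4, "doc_d": 9.1, "doc_a": 3.0}
ranked = relative_score_fusion([vector_hits, bm25_hits])
```

Here `doc_b` wins because it scores well in both lists, even though it tops neither raw ranking.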

## Prerequisites

- Python 3.11+
- uv
- Docker & Docker Compose
- AWS credentials with Bedrock access

## Setup

### 1. Start infrastructure

```sh
docker compose up pgvector redis -d
```

### 2. Create a `.env` file

```
DATABASE_URL=postgresql://chat-app:admin@localhost:5432/chat_app
BEDROCK_API_KEY=<your-aws-bearer-token>
AWS_REGION=eu-central-1

# Optional overrides (defaults shown)
EMBED_MODEL=BAAI/bge-m3
EMBED_DIM=1024
TABLE_NAME=documents
LLM_MODEL=openai.gpt-oss-20b-1:0
REDIS_HOST=localhost
REDIS_PORT=6379
REDIS_NAMESPACE=rag
RERANK_MODEL=BAAI/bge-reranker-large
RERANK_TOP_N=5
SIMILARITY_TOP_K=10
```
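These variables are read by the Pydantic settings in `ingestion/config.py`. As a rough illustration of the load-with-defaults behavior (field names and the `load_settings` helper are assumptions, not the actual `config.py` code), a stdlib-only equivalent looks like:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Settings:
    # Required connection settings (no safe defaults).
    database_url: str
    bedrock_api_key: str
    # Optional overrides, mirroring the defaults in the .env example above.
    aws_region: str = "eu-central-1"
    embed_model: str = "BAAI/bge-m3"
    embed_dim: int = 1024
    rerank_top_n: int = 5
    similarity_top_k: int = 10

def load_settings(env):
    """Read required values from an environment mapping; fall back to
    defaults for anything not set (hypothetical helper)."""
    return Settings(
        database_url=env["DATABASE_URL"],        # raises KeyError if missing
        bedrock_api_key=env["BEDROCK_API_KEY"],  # raises KeyError if missing
        aws_region=env.get("AWS_REGION", "eu-central-1"),
        embed_model=env.get("EMBED_MODEL", "BAAI/bge-m3"),
        embed_dim=int(env.get("EMBED_DIM", "1024")),
        rerank_top_n=int(env.get("RERANK_TOP_N", "5")),
        similarity_top_k=int(env.get("SIMILARITY_TOP_K", "10")),
    )

settings = load_settings({
    "DATABASE_URL": "postgresql://chat-app:admin@localhost:5432/chat_app",
    "BEDROCK_API_KEY": "example-token",
})
```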

### 3. Install dependencies

```sh
uv sync
```

## Ingestion

Drop PDF files into the data/ directory, then run:

```sh
uv run python ingest.py
```

This parses each PDF with Docling, chunks and embeds the content, stores vectors in PostgreSQL, and persists nodes to Redis for BM25 retrieval. Already-ingested documents are upserted (not duplicated).
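The upsert behavior implies that each chunk gets a stable, deterministic ID, so re-running ingestion overwrites existing rows instead of inserting duplicates. One common scheme — a sketch of the idea, not the actual `pipeline.py` logic — derives the ID from the source file, chunk position, and chunk content:

```python
import hashlib

def stable_chunk_id(file_path, chunk_index, chunk_text):
    """Deterministic chunk ID: the same file, position, and content always
    map to the same ID, so an INSERT ... ON CONFLICT (id) DO UPDATE (or the
    vector store's upsert) replaces the row rather than duplicating it."""
    digest = hashlib.sha256(f"{file_path}:{chunk_index}:{chunk_text}".encode())
    return digest.hexdigest()[:32]

# Re-running ingestion on an unchanged PDF yields identical IDs:
a = stable_chunk_id("data/report.pdf", 0, "First chunk text")
b = stable_chunk_id("data/report.pdf", 0, "First chunk text")
# A different chunk gets a different ID:
c = stable_chunk_id("data/report.pdf", 1, "Second chunk text")
```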

## MCP Server

Run the MCP server locally:

```sh
uv run python mcp_server.py
```

The server starts on http://localhost:8000 using SSE transport and exposes a single tool:

| Tool | Description |
| --- | --- |
| `search_knowledge` | Searches the knowledge base and returns an answer with source file citations |

## Docker

To run the full stack, including the MCP server, in Docker:

```sh
docker compose up -d
```
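The commands above assume `docker-compose.yml` defines the three services referenced in this README. A minimal sketch of what such a file might contain — the image tags, ports, and credentials here are assumptions (the credentials merely match the example `DATABASE_URL`), not the repository's actual compose file:

```yaml
services:
  pgvector:
    image: pgvector/pgvector:pg16   # assumed image/tag
    environment:
      POSTGRES_USER: chat-app
      POSTGRES_PASSWORD: admin
      POSTGRES_DB: chat_app
    ports:
      - "5432:5432"
  redis:
    image: redis:7                  # assumed tag
    ports:
      - "6379:6379"
  mcp-server:
    build: .
    env_file: .env
    ports:
      - "8000:8000"
    depends_on: [pgvector, redis]
```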

## Project Structure

```text
.
├── data/                  # PDF documents to ingest
├── ingestion/
│   ├── config.py          # Pydantic settings (loaded from .env)
│   └── pipeline.py        # Docling parsing, embedding, pgvector + Redis ingestion
├── query/
│   └── engine.py          # Hybrid retriever + reranker + Bedrock LLM query engine
├── ingest.py              # Ingestion entry point
├── mcp_server.py          # FastMCP server exposing search_knowledge tool
├── Dockerfile
└── docker-compose.yml
```
