# RAG Knowledge Base

A private document RAG (Retrieval-Augmented Generation) system that ingests PDFs and exposes a search tool via an MCP server. Retrieval combines vector search (pgvector) and BM25 keyword search with cross-encoder reranking, and answers are generated by an AWS Bedrock LLM.

## Architecture

```mermaid
flowchart LR
    subgraph Ingestion
        direction TB
        A[📄 data/ PDFs] --> B[Docling<br/>PDF Parser]
        B --> C[HybridChunker<br/>BAAI/bge-m3 tokenizer]
        C --> D[HuggingFace Embeddings<br/>BAAI/bge-m3 · 1024-dim]
        D --> E[(pgvector<br/>PostgreSQL)]
        C --> F[(Redis<br/>BM25 Docstore)]
    end

    subgraph Query["Query  —  mcp_server.py"]
        direction TB
        G[search_knowledge<br/>tool call] --> H[Vector Retriever<br/>pgvector]
        G --> I[BM25 Retriever<br/>Redis]
        H & I --> J[QueryFusionRetriever<br/>relative_score fusion]
        J --> K[Cross-encoder Reranker<br/>BAAI/bge-reranker-large]
        K --> L[BedrockConverse LLM]
        L --> M[Answer + Sources]
    end

    E --> H
    F --> I
```
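The `relative_score` fusion step min-max normalizes each retriever's scores before combining them, so raw BM25 scores (unbounded) and cosine similarities (roughly 0–1) can be summed on equal footing. A simplified standalone sketch of the idea — not the library's actual implementation:

```python
def relative_score_fusion(result_lists):
    """Fuse ranked results from several retrievers.

    result_lists: list of {doc_id: raw_score} dicts, one per retriever.
    Each retriever's scores are min-max normalized to [0, 1], then summed
    per document, so no single retriever's score scale dominates.
    """
    fused = {}
    for results in result_lists:
        scores = list(results.values())
        lo, hi = min(scores), max(scores)
        span = (hi - lo) or 1.0  # avoid division by zero when all scores tie
        for doc_id, score in results.items():
            fused[doc_id] = fused.get(doc_id, 0.0) + (score - lo) / span
    # Highest fused score first.
    return sorted(fused.items(), key=lambda kv: kv[1], reverse=True)

# Hypothetical scores: cosine similarities vs. raw BM25 scores.
vector_hits = {"doc_a": 0.92, "doc_b": 0.80, "doc_c": 0.75}
bm25_hits = {"doc_b": 12.4, "doc_d": 9.1, "doc_a": 3.0}
ranked = relative_score_fusion([vector_hits, bm25_hits])
```

Here `doc_b` wins because it scores well in both lists, even though it tops neither raw ranking.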

## Prerequisites

- Python 3.11+
- uv
- Docker & Docker Compose
- AWS credentials with Bedrock access

## Setup

### 1. Start infrastructure

```sh
docker compose up pgvector redis -d
```

### 2. Create a `.env` file

```
DATABASE_URL=postgresql://chat-app:admin@localhost:5432/chat_app
BEDROCK_API_KEY=<your-aws-bearer-token>
AWS_REGION=eu-central-1

# Optional overrides (defaults shown)
EMBED_MODEL=BAAI/bge-m3
EMBED_DIM=1024
TABLE_NAME=documents
LLM_MODEL=openai.gpt-oss-20b-1:0
REDIS_HOST=localhost
REDIS_PORT=6379
REDIS_NAMESPACE=rag
RERANK_MODEL=BAAI/bge-reranker-large
RERANK_TOP_N=5
SIMILARITY_TOP_K=10
```
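These variables are read by the Pydantic settings in `ingestion/config.py`. As a rough illustration of the load-with-defaults behavior (field names and the `load_settings` helper are assumptions, not the actual `config.py` code), a stdlib-only equivalent looks like:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Settings:
    # Required connection settings (no safe defaults).
    database_url: str
    bedrock_api_key: str
    # Optional overrides, mirroring the defaults in the .env example above.
    aws_region: str = "eu-central-1"
    embed_model: str = "BAAI/bge-m3"
    embed_dim: int = 1024
    rerank_top_n: int = 5
    similarity_top_k: int = 10

def load_settings(env):
    """Read required values from an environment mapping; fall back to
    defaults for anything not set (hypothetical helper)."""
    return Settings(
        database_url=env["DATABASE_URL"],        # raises KeyError if missing
        bedrock_api_key=env["BEDROCK_API_KEY"],  # raises KeyError if missing
        aws_region=env.get("AWS_REGION", "eu-central-1"),
        embed_model=env.get("EMBED_MODEL", "BAAI/bge-m3"),
        embed_dim=int(env.get("EMBED_DIM", "1024")),
        rerank_top_n=int(env.get("RERANK_TOP_N", "5")),
        similarity_top_k=int(env.get("SIMILARITY_TOP_K", "10")),
    )

settings = load_settings({
    "DATABASE_URL": "postgresql://chat-app:admin@localhost:5432/chat_app",
    "BEDROCK_API_KEY": "example-token",
})
```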

### 3. Install dependencies

```sh
uv sync
```

## Ingestion

Drop PDF files into the data/ directory, then run:

```sh
uv run python ingest.py
```

This parses each PDF with Docling, chunks and embeds the content, stores vectors in PostgreSQL, and persists nodes to Redis for BM25 retrieval. Already-ingested documents are upserted (not duplicated).
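The upsert behavior implies that each chunk gets a stable, deterministic ID, so re-running ingestion overwrites existing rows instead of inserting duplicates. One common scheme — a sketch of the idea, not the actual `pipeline.py` logic — derives the ID from the source file, chunk position, and chunk content:

```python
import hashlib

def stable_chunk_id(file_path, chunk_index, chunk_text):
    """Deterministic chunk ID: the same file, position, and content always
    map to the same ID, so an INSERT ... ON CONFLICT (id) DO UPDATE (or the
    vector store's upsert) replaces the row rather than duplicating it."""
    digest = hashlib.sha256(f"{file_path}:{chunk_index}:{chunk_text}".encode())
    return digest.hexdigest()[:32]

# Re-running ingestion on an unchanged PDF yields identical IDs:
a = stable_chunk_id("data/report.pdf", 0, "First chunk text")
b = stable_chunk_id("data/report.pdf", 0, "First chunk text")
# A different chunk gets a different ID:
c = stable_chunk_id("data/report.pdf", 1, "Second chunk text")
```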

## MCP Server

Run the MCP server locally:

```sh
uv run python mcp_server.py
```

The server starts on http://localhost:8000 using SSE transport and exposes a single tool:

| Tool | Description |
| --- | --- |
| `search_knowledge` | Searches the knowledge base and returns an answer with source file citations |

## Docker

To run the full stack, including the MCP server, in Docker:

```sh
docker compose up -d
```
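The commands above assume `docker-compose.yml` defines the three services referenced in this README. A minimal sketch of what such a file might contain — the image tags, ports, and credentials here are assumptions (the credentials merely match the example `DATABASE_URL`), not the repository's actual compose file:

```yaml
services:
  pgvector:
    image: pgvector/pgvector:pg16   # assumed image/tag
    environment:
      POSTGRES_USER: chat-app
      POSTGRES_PASSWORD: admin
      POSTGRES_DB: chat_app
    ports:
      - "5432:5432"
  redis:
    image: redis:7                  # assumed tag
    ports:
      - "6379:6379"
  mcp-server:
    build: .
    env_file: .env
    ports:
      - "8000:8000"
    depends_on: [pgvector, redis]
```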

## Project Structure

```text
.
├── data/                  # PDF documents to ingest
├── ingestion/
│   ├── config.py          # Pydantic settings (loaded from .env)
│   └── pipeline.py        # Docling parsing, embedding, pgvector + Redis ingestion
├── query/
│   └── engine.py          # Hybrid retriever + reranker + Bedrock LLM query engine
├── ingest.py              # Ingestion entry point
├── mcp_server.py          # FastMCP server exposing search_knowledge tool
├── Dockerfile
└── docker-compose.yml
```
