Knowledge RAG Assistant

A production-ready RAG-powered knowledge assistant demonstrating modern LLM backend development practices: custom RAG pipeline, MCP server, PydanticAI tool-calling agent, LangGraph Corrective RAG, Langfuse observability, and Redis response caching.

Example domain: Digimon knowledge base (DAPI)

Tech Stack

Layer	Technology
API	FastAPI (Python 3.11+)
LLM	Anthropic `claude-haiku-4-5`
Embeddings	Voyage AI `voyage-3` (1024 dims)
Vector DB	Qdrant
Relational DB	PostgreSQL + SQLAlchemy
Caching	Redis
Observability	Langfuse
Agent Protocol	MCP (Model Context Protocol)
Typed Agent	PydanticAI (tool-calling agent)
Agentic RAG	LangGraph (Corrective RAG workflow)
Infrastructure	Docker + docker-compose

Architecture

  ┌─────────────────────────────────────────────────────┐
  │                    Client (HTTP)                    │
  └─────────────────────────┬───────────────────────────┘
                            │
             ┌──────────────┴─────────────┐
             │                            │
  ┌──────────▼──────────────┐  ┌──────────▼──────────────────────┐
  │   POST /api/v1/chat     │  │   POST /api/v1/agent            │
  │      (Custom RAG)       │  │   POST /api/v1/agent/graph      │
  └──────────┬──────────────┘  └──────────┬──────────────────────┘
             │                            │
             └──────────────┬─────────────┘
                            │
  ┌─────────────────────────▼───────────────────────────┐
  │                      RAG Core                       │
  │      Retriever → PromptBuilder → LLMClient          │
  └───────────────────┬─────────────────┬───────────────┘
                      │                 │
           ┌──────────▼──┐     ┌────────▼──────┐
           │   Qdrant    │     │     DAPI      │
           │  (vectors)  │     │  (live API)   │
           └─────────────┘     └───────────────┘

  LangGraph Corrective RAG flow:
  retrieve → grade → generate
                └──→ rewrite → retrieve (max 2 loops)

  Observability: Langfuse traces all LLM calls and retrievals
  Caching:       Redis caches /chat responses by query + filters
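The Corrective RAG flow above is a bounded loop. The sketch below is a framework-free Python approximation, not the project's `workflow.py`. `retrieve`, `grade`, `generate`, and `rewrite` are hypothetical stand-ins for the nodes in `app/agents/graph/nodes.py`, and `max_rewrites=2` mirrors the "max 2 loops" limit. It also assumes that once the rewrite budget is spent, the answer is generated from whatever documents passed grading.

```python
from dataclasses import dataclass
from typing import Callable, List

# Hypothetical node signatures; the real nodes operate on a LangGraph GraphState.
Retrieve = Callable[[str], List[str]]
Grade = Callable[[str, List[str]], List[str]]  # returns only the relevant docs
Generate = Callable[[str, List[str]], str]
Rewrite = Callable[[str], str]


@dataclass
class CragResult:
    answer: str
    iterations: int           # number of query rewrites performed
    documents_used: List[str]


def corrective_rag(query: str, retrieve: Retrieve, grade: Grade,
                   generate: Generate, rewrite: Rewrite,
                   max_rewrites: int = 2) -> CragResult:
    iterations = 0
    while True:
        docs = retrieve(query)
        relevant = grade(query, docs)
        # Generate as soon as something relevant is found, or once the rewrite
        # budget is exhausted (assumption: answer with whatever passed grading).
        if relevant or iterations >= max_rewrites:
            return CragResult(generate(query, relevant), iterations, relevant)
        query = rewrite(query)
        iterations += 1
```

The `iterations` count plays the same role as the `iterations` field returned by `/api/v1/agent/graph`: 0 means the first retrieval was good enough, 1 or more means the query was rewritten.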

Features

Custom RAG Pipeline: Retriever → Prompt Builder → LLM, built from scratch without LangChain
PydanticAI Agent (/api/v1/agent): type-safe agent with 4 tools (RAG search, by name, by level, skills) and automatic tool selection
LangGraph Corrective RAG (/api/v1/agent/graph): agentic retrieve → grade → generate workflow with query rewriting when docs aren't relevant
MCP Server: 4 tools exposing Digimon data (by name, level, ID, skills)
Observability: Full tracing with Langfuse (LLM calls, retrievals, tool invocations)
Response Caching: Redis caches /chat responses keyed by query + filters to avoid repeat LLM calls and reduce API costs
Data Ingestion: Async pipeline from DAPI → Voyage AI embeddings → Qdrant + PostgreSQL
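Without LangChain, the Retriever → Prompt Builder → LLM chain reduces to three plain functions. The sketch below is illustrative only: the in-memory `store` dict, the cosine-similarity `retrieve`, the prompt wording, and the `llm` callable are hypothetical stand-ins for the project's Qdrant-backed retriever, `PromptBuilder`, and Anthropic `LLMClient`.

```python
import math
from typing import Callable, Dict, List, Sequence

Vector = Sequence[float]


def cosine(a: Vector, b: Vector) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def retrieve(query_vec: Vector, store: Dict[str, Vector], top_k: int = 3) -> List[str]:
    """Return the top_k chunk texts most similar to the query embedding."""
    ranked = sorted(store, key=lambda text: cosine(query_vec, store[text]), reverse=True)
    return ranked[:top_k]


def build_prompt(question: str, chunks: List[str]) -> str:
    """Number each chunk so the model can refer to its sources."""
    context = "\n".join(f"[{i}] {c}" for i, c in enumerate(chunks, start=1))
    return (
        "Answer using only the context below. If the answer is not there, say so.\n\n"
        f"Context:\n{context}\n\nQuestion: {question}"
    )


def answer(question: str, query_vec: Vector, store: Dict[str, Vector],
           llm: Callable[[str], str], top_k: int = 3) -> str:
    # Retrieval -> prompt -> generation, with no framework in between.
    chunks = retrieve(query_vec, store, top_k)
    return llm(build_prompt(question, chunks))
```

Keeping each stage a plain function makes it easy to trace individually (for example with Langfuse) and to swap the vector store or model without touching the other stages.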

Setup

Prerequisites

Docker and Docker Compose
Python 3.11+
Anthropic API key → console.anthropic.com
Voyage AI API key → dash.voyageai.com

Installation

Clone the repository:

git clone https://github.com/JaimeRam/knowledge-rag-assistant.git
cd knowledge-rag-assistant

Create and activate a virtual environment:

python3 -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate

Install Python dependencies:

pip install -r requirements.txt

Copy and configure the environment file:

cp .env.example .env
# Edit .env — set ANTHROPIC_API_KEY and VOYAGE_API_KEY at minimum

Start infrastructure services (Qdrant, Redis, PostgreSQL, Langfuse):

make services

Run data ingestion:

make ingest

Voyage AI free tier note: without a payment method the limit is 3 RPM. The default INGEST_LIMIT=100 in .env keeps ingestion under ~5 minutes. Set INGEST_LIMIT=0 to ingest all ~1450 Digimon once you add a payment method (200M free tokens still apply).
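The ~5 minute figure follows from the defaults. The arithmetic below assumes three chunks per Digimon (basic info, description, skills), batches of `VOYAGE_EMBED_BATCH_SIZE=20`, and `VOYAGE_RPM=3`; it ignores request latency and any token-based limits, and the helper name is hypothetical.

```python
import math


def estimate_ingest_minutes(digimon: int, chunks_per_digimon: int = 3,
                            batch_size: int = 20, rpm: int = 3) -> float:
    """Approximate embedding time when requests are rate-limited to `rpm`."""
    chunks = digimon * chunks_per_digimon
    requests = math.ceil(chunks / batch_size)
    return requests / rpm


print(estimate_ingest_minutes(100))   # 300 chunks -> 15 requests -> 5.0 minutes
print(estimate_ingest_minutes(1450))  # 4350 chunks -> 218 requests -> ~72.7 minutes at 3 RPM
```

At 3 RPM a full ingest takes over an hour, which is why INGEST_LIMIT=0 is only practical once the higher paid-tier rate limit applies.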

Start the API:

make run

The API will be available at http://localhost:8000. Interactive docs at http://localhost:8000/docs.

Docker (full stack)

docker build -t digimon-rag-assistant .
docker run -p 8000:8000 --env-file .env digimon-rag-assistant

API Endpoints

Chat with RAG

POST /api/v1/chat
Content-Type: application/json

{
  "query": "What is Agumon?",
  "level_filter": "Rookie",
  "type_filter": "Reptile"
}
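For scripted calls, a minimal client needs only the Python standard library. This sketch assumes the API is running locally on port 8000 and that both filters are optional (they are left out of the body when `None`); it assumes nothing about the response beyond it being JSON.

```python
import json
import urllib.request
from typing import Any, Dict, Optional

BASE_URL = "http://localhost:8000/api/v1"


def build_chat_request(query: str, level_filter: Optional[str] = None,
                       type_filter: Optional[str] = None,
                       base_url: str = BASE_URL) -> urllib.request.Request:
    body: Dict[str, Any] = {"query": query}
    # Only send filters that were given (assumption: the endpoint treats them as optional).
    if level_filter is not None:
        body["level_filter"] = level_filter
    if type_filter is not None:
        body["type_filter"] = type_filter
    return urllib.request.Request(
        f"{base_url}/chat",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def chat(query: str, **filters: Optional[str]) -> Dict[str, Any]:
    """Send the request and decode the JSON response."""
    with urllib.request.urlopen(build_chat_request(query, **filters), timeout=60) as resp:
        return json.load(resp)
```

With the API running: `chat("What is Agumon?", level_filter="Rookie", type_filter="Reptile")`.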

PydanticAI Agent (tool-calling)

POST /api/v1/agent
Content-Type: application/json

{"query": "What are the skills of Agumon?"}
# → agent selects get_digimon_by_name + get_digimon_skills automatically

{"query": "Which Rookie Digimon are best for beginners?"}
# → agent selects rag_search + get_digimon_by_level automatically

Response includes tool_calls (list of tools used) and token usage.

LangGraph Corrective RAG

POST /api/v1/agent/graph
Content-Type: application/json

{"query": "Tell me about fire-type Rookie Digimon"}

Response includes iterations (0 = direct, 1+ = query was rewritten) and documents_used.

Get Digimon by ID

GET /api/v1/digimon/{id}

List Digimon

GET /api/v1/digimon?limit=10&offset=0
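`limit` and `offset` allow paging through the whole collection. The generator below is a sketch that assumes each page comes back as a JSON list and that a page shorter than `limit` is the last one; if the endpoint wraps results in an object, adapt `http_fetch_page`. Both function names are hypothetical, and the page fetcher is injected so the loop can be tested without a server.

```python
import json
import urllib.parse
import urllib.request
from typing import Any, Callable, Dict, Iterator, List

BASE_URL = "http://localhost:8000/api/v1"


def http_fetch_page(limit: int, offset: int) -> List[Dict[str, Any]]:
    """Fetch one page from GET /digimon (assumes the body is a JSON list)."""
    qs = urllib.parse.urlencode({"limit": limit, "offset": offset})
    with urllib.request.urlopen(f"{BASE_URL}/digimon?{qs}", timeout=30) as resp:
        return json.load(resp)


def iter_digimon(fetch_page: Callable[[int, int], List[Dict[str, Any]]] = http_fetch_page,
                 limit: int = 10) -> Iterator[Dict[str, Any]]:
    offset = 0
    while True:
        page = fetch_page(limit, offset)
        yield from page
        if len(page) < limit:  # short (or empty) page: nothing left to fetch
            return
        offset += limit
```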

Health Check

GET /api/v1/health

Data Ingestion

The ingestion pipeline (make ingest) runs these steps:

Fetches Digimon data from DAPI (respects INGEST_LIMIT in .env)
Prepares text chunks per Digimon (basic info, description, skills)
Generates vector embeddings with Voyage AI (voyage-3, 1024 dims) in rate-limited batches
Stores embeddings in Qdrant and metadata in PostgreSQL
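The chunking and rate-limited batching steps can be sketched as follows. Assumptions: `prepare_chunks` takes an already-flattened record with hypothetical keys (`name`, `level`, `type`, `description`, `skills`) rather than the raw DAPI payload, and `embed` is any async callable that returns one vector per text (in the real pipeline, a wrapper around the Voyage AI client, not shown). Requests are spaced `60 / rpm` seconds apart.

```python
import asyncio
from typing import Any, Awaitable, Callable, Dict, List

EmbedFn = Callable[[List[str]], Awaitable[List[List[float]]]]


def prepare_chunks(d: Dict[str, Any]) -> List[str]:
    """Split one flattened (hypothetical) Digimon record into three chunks."""
    basic = f"{d['name']} | Level: {d.get('level', 'unknown')} | Type: {d.get('type', 'unknown')}"
    description = f"{d['name']}: {d.get('description', '')}"
    skills = f"{d['name']} skills: " + ", ".join(d.get("skills", []))
    return [basic, description, skills]


async def embed_in_batches(texts: List[str], embed: EmbedFn,
                           batch_size: int = 20, rpm: int = 3) -> List[List[float]]:
    """Call `embed` once per batch, sending at most `rpm` requests per minute."""
    interval = 60.0 / rpm
    vectors: List[List[float]] = []
    for start in range(0, len(texts), batch_size):
        if start:  # wait between requests, not before the first one
            await asyncio.sleep(interval)
        vectors.extend(await embed(texts[start:start + batch_size]))
    return vectors
```

Sleeping after each response (rather than on a fixed schedule) keeps the request rate at or below `rpm` even when individual calls are slow.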

Key .env variables for ingestion:

Variable	Default	Description
`INGEST_LIMIT`	`100`	Max Digimon to ingest (0 = all ~1450)
`VOYAGE_EMBED_BATCH_SIZE`	`20`	Chunks per embedding API call
`VOYAGE_RPM`	`3`	Voyage AI requests per minute (free tier = 3)
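The project reads these values through pydantic-settings (`app/core/`). The stdlib-only sketch below mirrors the three variables and their defaults to show how they might be parsed and how `INGEST_LIMIT=0` is interpreted; the class and method names are hypothetical.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class IngestSettings:
    ingest_limit: int = 100            # 0 = ingest all (~1450)
    voyage_embed_batch_size: int = 20  # chunks per embedding API call
    voyage_rpm: int = 3                # free tier without a payment method

    @classmethod
    def from_env(cls, env: Optional[Mapping[str, str]] = None) -> "IngestSettings":
        env = os.environ if env is None else env
        return cls(
            ingest_limit=int(env.get("INGEST_LIMIT", cls.ingest_limit)),
            voyage_embed_batch_size=int(env.get("VOYAGE_EMBED_BATCH_SIZE", cls.voyage_embed_batch_size)),
            voyage_rpm=int(env.get("VOYAGE_RPM", cls.voyage_rpm)),
        )

    def apply_limit(self, items: list) -> list:
        """INGEST_LIMIT=0 means no limit; otherwise keep the first N items."""
        return items if self.ingest_limit == 0 else items[: self.ingest_limit]
```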

To re-run ingestion:

make ingest

Project Structure

knowledge-rag-assistant/
├── app/
│   ├── api/              # FastAPI routes and app setup
│   │   ├── routes/
│   │   │   ├── chat.py       # /chat — Custom RAG + Redis caching
│   │   │   ├── agent.py      # /agent — PydanticAI + /agent/graph — LangGraph
│   │   │   ├── digimon.py    # Digimon metadata endpoints
│   │   │   └── health.py     # Health check
│   │   └── main.py
│   ├── core/             # Config (pydantic-settings) and logging
│   ├── ingestion/        # DAPI client, embedder, ingest pipeline
│   ├── rag/              # Retriever, prompt builder, LLM client
│   ├── mcp/              # MCP server with 4 Digimon tools
│   ├── agents/
│   │   ├── pydantic_agent.py # PydanticAI agent with 4 typed tools
│   │   └── graph/            # LangGraph Corrective RAG
│   │       ├── state.py      # GraphState TypedDict
│   │       ├── nodes.py      # retrieve / grade / generate / rewrite nodes
│   │       └── workflow.py   # Compiled StateGraph
│   ├── observability/    # Langfuse tracing integration
│   └── db/               # Qdrant, PostgreSQL, Redis managers
├── docker/
│   └── docker-compose.yml
├── tests/
├── requirements.txt
├── Dockerfile
├── CONTRIBUTING.md
└── README.md

Technical Highlights

This project demonstrates key skills for LLM Backend Developer roles:

Custom RAG without LangChain: Built the full pipeline (retrieval → prompt → generation) from first principles — shows deep understanding of how RAG works, not just framework usage.
Three agent patterns side by side: Custom RAG (/chat), PydanticAI typed agent (/agent), and LangGraph Corrective RAG (/agent/graph) — each demonstrates a different architectural trade-off.
PydanticAI typed tools: Type-safe agent with @agent.tool_plain decorators, AnthropicModel, and automatic tool selection. Zero boilerplate.
LangGraph Corrective RAG: Stateful retrieve → grade → generate loop with query rewriting when retrieved docs aren't relevant — demonstrates iterative reasoning beyond single-shot generation.
MCP (Model Context Protocol): Implements the emerging standard for tool integration between LLMs and external systems.
Production Observability: Full tracing with Langfuse covering LLM calls, retrieval steps, and tool invocations — critical for debugging and cost monitoring.
Cost Optimization: Redis caching avoids redundant LLM calls for identical queries, with proper cache-key design to prevent collisions across different filter combinations.
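One way to get collision-free cache keys across filter combinations is to hash a canonical serialization of the query and every filter, as in the sketch below. The key prefix and the light query normalization (lowercasing, collapsing whitespace) are assumptions, not the project's confirmed scheme.

```python
import hashlib
import json
from typing import Optional


def chat_cache_key(query: str, level_filter: Optional[str] = None,
                   type_filter: Optional[str] = None, prefix: str = "chat:v1") -> str:
    """Deterministic Redis key covering the query and every filter."""
    payload = {
        # Light normalization so trivially different spellings share a key (assumption).
        "query": " ".join(query.split()).lower(),
        "level_filter": level_filter,
        "type_filter": type_filter,
    }
    # sort_keys makes the serialization independent of dict insertion order.
    digest = hashlib.sha256(json.dumps(payload, sort_keys=True).encode("utf-8")).hexdigest()
    return f"{prefix}:{digest}"
```

Hashing JSON rather than concatenating strings (e.g. `f"{query}:{level}:{type}"`) avoids collisions when a value itself contains the separator, and a missing filter (`null`) stays distinct from an empty one. The key can then be stored with an expiry, for example redis-py's `set(key, value, ex=ttl_seconds)`.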
