A production-ready RAG-powered knowledge assistant demonstrating modern LLM backend development practices: custom RAG pipeline, MCP server, PydanticAI tool-calling agent, LangGraph Corrective RAG, Langfuse observability, and semantic caching.
Example domain: Digimon knowledge base (DAPI)
| Layer | Technology |
|---|---|
| API | FastAPI (Python 3.11+) |
| LLM | Anthropic claude-haiku-4-5 |
| Embeddings | Voyage AI voyage-3 (1024 dims) |
| Vector DB | Qdrant |
| Relational DB | PostgreSQL + SQLAlchemy |
| Caching | Redis |
| Observability | Langfuse |
| Agent Protocol | MCP (Model Context Protocol) |
| Typed Agent | PydanticAI (tool-calling agent) |
| Agentic RAG | LangGraph (Corrective RAG workflow) |
| Infrastructure | Docker + docker-compose |

    ┌─────────────────────────────────────────────────────┐
    │                    Client (HTTP)                    │
    └─────────────────────────┬───────────────────────────┘
                              │
               ┌──────────────┴─────────────┐
               │                            │
    ┌──────────▼──────────────┐  ┌──────────▼──────────────────────┐
    │  POST /api/v1/chat      │  │  POST /api/v1/agent             │
    │  (Custom RAG)           │  │  POST /api/v1/agent/graph       │
    └──────────┬──────────────┘  └──────────┬──────────────────────┘
               │                            │
               └──────────────┬─────────────┘
                              │
    ┌─────────────────────────▼───────────────────────────┐
    │                      RAG Core                       │
    │        Retriever → PromptBuilder → LLMClient        │
    └───────────────────┬─────────────────┬───────────────┘
                        │                 │
             ┌──────────▼──┐     ┌────────▼──────┐
             │   Qdrant    │     │     DAPI      │
             │  (vectors)  │     │  (live API)   │
             └─────────────┘     └───────────────┘

LangGraph Corrective RAG flow:

    retrieve → grade → generate
                 └──→ rewrite → retrieve (max 2 loops)

Observability: Langfuse traces all LLM calls and retrievals
Caching: Redis caches /chat responses by query + filters
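
The Corrective RAG flow above can be sketched in plain Python. This is a minimal illustration, not the project's LangGraph code: the function name `corrective_rag`, the four callables it takes, and the shape of the returned dict are assumptions made for the example, and `documents_used` is shown here as a simple count.

```python
from typing import Callable

MAX_REWRITES = 2  # mirrors the "max 2 loops" limit in the flow above


def corrective_rag(
    query: str,
    retrieve: Callable[[str], list[str]],
    grade: Callable[[str, list[str]], bool],
    rewrite: Callable[[str], str],
    generate: Callable[[str, list[str]], str],
) -> dict:
    """Run retrieve -> grade -> generate, rewriting the query when grading fails."""
    current_query = query
    iterations = 0
    docs = retrieve(current_query)
    # Rewrite and retry only while the docs are judged irrelevant and budget remains.
    while not grade(current_query, docs) and iterations < MAX_REWRITES:
        current_query = rewrite(current_query)
        iterations += 1
        docs = retrieve(current_query)
    # Generate from whatever was retrieved last, even if grading never passed.
    return {
        "answer": generate(query, docs),
        "iterations": iterations,
        "documents_used": len(docs),
    }


# Toy stand-ins so the sketch runs without any external service.
corpus = {"agumon": ["Agumon is a Rookie-level Reptile Digimon."]}

result = corrective_rag(
    "tell me about the orange dinosaur",
    retrieve=lambda q: corpus.get(q, []),
    grade=lambda q, docs: bool(docs),
    rewrite=lambda q: "agumon",
    generate=lambda q, docs: " ".join(docs),
)
print(result["iterations"], result["answer"])
```

In this toy run the first retrieval finds nothing, so the query is rewritten once and the second retrieval succeeds. Capping the loop keeps worst-case latency and LLM cost bounded when no rewrite helps.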
- Custom RAG Pipeline: Retriever → Prompt Builder → LLM, built from scratch without LangChain
- PydanticAI Agent (`/api/v1/agent`): type-safe agent with 4 tools (RAG search, by name, by level, skills) and automatic tool selection
- LangGraph Corrective RAG (`/api/v1/agent/graph`): agentic retrieve → grade → generate workflow with query rewriting when docs aren't relevant
- MCP Server: 4 tools exposing Digimon data (by name, level, ID, skills)
- Observability: Full tracing with Langfuse (LLM calls, retrievals, tool invocations)
- Semantic Caching: Redis-based response caching, keyed on query and filters, to reduce LLM API costs
- Data Ingestion: Async pipeline from DAPI → Voyage AI embeddings → Qdrant + PostgreSQL
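
To make the Retriever → Prompt Builder → LLM shape concrete, here is a small self-contained sketch. It is not the project's implementation: word-count vectors stand in for Voyage AI embeddings and Qdrant search, `EchoLLMClient` is a stub in place of a real model call, and the method names (`search`, `build`, `complete`) are hypothetical.

```python
import math
import re
from collections import Counter
from dataclasses import dataclass


def embed(text: str) -> Counter:
    """Toy embedding: lowercase word counts (stand-in for a real embedding model)."""
    return Counter(re.findall(r"\w+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


@dataclass
class Retriever:
    documents: list[str]

    def search(self, query: str, top_k: int = 2) -> list[str]:
        # Rank every document by similarity to the query and keep the best top_k.
        q = embed(query)
        ranked = sorted(self.documents, key=lambda d: cosine(q, embed(d)), reverse=True)
        return ranked[:top_k]


class PromptBuilder:
    def build(self, query: str, docs: list[str]) -> str:
        context = "\n".join(f"- {d}" for d in docs)
        return f"Answer using only this context:\n{context}\n\nQuestion: {query}"


class EchoLLMClient:
    """Stub LLM: returns the top-ranked context line instead of calling an API."""

    def complete(self, prompt: str) -> str:
        return prompt.splitlines()[1].removeprefix("- ")


def answer(query: str, retriever: Retriever, builder: PromptBuilder, llm: EchoLLMClient) -> str:
    docs = retriever.search(query)
    return llm.complete(builder.build(query, docs))


retriever = Retriever([
    "Agumon is a Rookie level Digimon.",
    "Greymon is a Champion level Digimon.",
])
print(answer("What level is Agumon?", retriever, PromptBuilder(), EchoLLMClient()))
```

Keeping the three stages as separate objects means each one can be swapped or tested in isolation, which is the main appeal of building the pipeline without a framework.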
- Docker and Docker Compose
- Python 3.11+
- Anthropic API key → console.anthropic.com
- Voyage AI API key → dash.voyageai.com
- Clone the repository:

      git clone https://github.com/JaimeRam/knowledge-rag-assistant.git
      cd knowledge-rag-assistant

- Create and activate a virtual environment:

      python3 -m venv venv
      source venv/bin/activate  # On Windows: venv\Scripts\activate

- Install Python dependencies:

      pip install -r requirements.txt

- Copy and configure the environment file:

      cp .env.example .env
      # Edit .env: set ANTHROPIC_API_KEY and VOYAGE_API_KEY at minimum

- Start infrastructure services (Qdrant, Redis, PostgreSQL, Langfuse):

      make services

- Run data ingestion:

      make ingest

  Voyage AI free tier note: without a payment method the limit is 3 RPM. The default `INGEST_LIMIT=100` in `.env` keeps ingestion under ~5 minutes. Set `INGEST_LIMIT=0` to ingest all ~1450 Digimon once you add a payment method (200M free tokens still apply).

- Start the API:

      make run

  The API will be available at http://localhost:8000. Interactive docs at http://localhost:8000/docs.

To build and run the API with Docker:

    docker build -t digimon-rag-assistant .
    docker run -p 8000:8000 --env-file .env digimon-rag-assistant

Custom RAG chat request:

    POST /api/v1/chat
    Content-Type: application/json

    {
      "query": "What is Agumon?",
      "level_filter": "Rookie",
      "type_filter": "Reptile"
    }

PydanticAI agent requests:

    POST /api/v1/agent
    Content-Type: application/json

    {"query": "What are the skills of Agumon?"}
    # → agent selects get_digimon_by_name + get_digimon_skills automatically

    {"query": "Which Rookie Digimon are best for beginners?"}
    # → agent selects rag_search + get_digimon_by_level automatically

Response includes `tool_calls` (list of tools used) and token usage.

LangGraph Corrective RAG request:

    POST /api/v1/agent/graph
    Content-Type: application/json

    {"query": "Tell me about fire-type Rookie Digimon"}

Response includes `iterations` (0 = direct, 1+ = query was rewritten) and `documents_used`.

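
For calling these endpoints from Python, a standard-library client might look like the sketch below. The helper names `build_request` and `post_json` are made up for this example, and the base URL assumes the default local address from the setup steps. Only the request is built here, so the snippet runs without the API being up.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8000"  # assumed default local address


def build_request(path: str, payload: dict) -> urllib.request.Request:
    """Build a JSON POST request for one of the endpoints above."""
    return urllib.request.Request(
        url=f"{BASE_URL}{path}",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def post_json(path: str, payload: dict, timeout: float = 30.0) -> dict:
    """Send the request and decode the JSON body (requires the API to be running)."""
    with urllib.request.urlopen(build_request(path, payload), timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


graph_request = build_request(
    "/api/v1/agent/graph", {"query": "Tell me about fire-type Rookie Digimon"}
)
print(graph_request.get_method(), graph_request.full_url)

# With the API running, something like this should work:
# result = post_json("/api/v1/agent/graph", {"query": "Tell me about fire-type Rookie Digimon"})
# A result["iterations"] value above 0 would mean the query was rewritten at least once.
```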
Digimon metadata and health endpoints:

    GET /api/v1/digimon/{id}
    GET /api/v1/digimon?limit=10&offset=0
    GET /api/v1/health

The ingestion pipeline fetches data from DAPI:
- Fetches Digimon data (respects `INGEST_LIMIT` in `.env`)
- Prepares text chunks per Digimon (basic info, description, skills)
- Generates vector embeddings with Voyage AI (`voyage-3`, 1024 dims) in rate-limited batches
- Stores embeddings in Qdrant and metadata in PostgreSQL
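
The chunk-preparation step could look roughly like this. The record shape is a simplified assumption (the real DAPI payload is richer), `prepare_chunks` is a hypothetical name, and the sample values are placeholders rather than real data.

```python
def prepare_chunks(digimon: dict) -> list[dict]:
    """Split one Digimon record into basic-info, description, and skills chunks."""
    name = digimon["name"]
    level = digimon.get("level", "unknown")
    dtype = digimon.get("type", "unknown")
    # The same metadata travels with every chunk so searches can filter by level or type.
    metadata = {"name": name, "level": level, "type": dtype}

    chunks = [("basic_info", f"{name} is a {level}-level, {dtype}-type Digimon.")]
    if digimon.get("description"):
        chunks.append(("description", f"{name}: {digimon['description']}"))
    if digimon.get("skills"):
        chunks.append(("skills", f"{name} skills: " + ", ".join(digimon["skills"])))

    return [{"kind": kind, "text": text, "metadata": metadata} for kind, text in chunks]


# Placeholder record for illustration only.
sample = {
    "name": "Agumon",
    "level": "Rookie",
    "type": "Reptile",
    "description": "Placeholder description text.",
    "skills": ["Example Skill A", "Example Skill B"],
}

for chunk in prepare_chunks(sample):
    print(chunk["kind"], "->", chunk["text"])
```

Repeating the Digimon's name in every chunk keeps each piece meaningful on its own after retrieval, since chunks are embedded and returned independently.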
Key `.env` variables for ingestion:
| Variable | Default | Description |
|---|---|---|
| `INGEST_LIMIT` | `100` | Max Digimon to ingest (0 = all ~1450) |
| `VOYAGE_EMBED_BATCH_SIZE` | `20` | Chunks per embedding API call |
| `VOYAGE_RPM` | `3` | Voyage AI requests per minute (free tier = 3) |
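
The batch size and RPM settings interact as in the sketch below. It is an illustration rather than the project's embedder: `embed_batch` is a placeholder for the real Voyage AI call, and the helper names are hypothetical. Assuming about three chunks per Digimon (basic info, description, skills), the defaults give 300 chunks, 15 requests, and roughly 280 seconds of rate-limit waiting.

```python
import math
import time
from typing import Callable, Iterator, Sequence


def batched(items: Sequence, batch_size: int) -> Iterator[Sequence]:
    """Yield consecutive slices of at most batch_size items."""
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]


def wait_seconds(num_chunks: int, batch_size: int, rpm: int) -> float:
    """Total time spent pausing between requests for a given workload."""
    requests = math.ceil(num_chunks / batch_size)
    return max(requests - 1, 0) * 60.0 / rpm


def embed_all(
    chunks: Sequence[str],
    embed_batch: Callable[[Sequence[str]], list[list[float]]],
    batch_size: int = 20,   # VOYAGE_EMBED_BATCH_SIZE
    rpm: int = 3,           # VOYAGE_RPM
    sleep: Callable[[float], None] = time.sleep,
) -> list[list[float]]:
    """Embed chunks batch by batch, pausing between calls to stay under rpm."""
    interval = 60.0 / rpm
    vectors: list[list[float]] = []
    for i, batch in enumerate(batched(chunks, batch_size)):
        if i > 0:
            sleep(interval)  # no pause is needed before the first request
        vectors.extend(embed_batch(batch))
    return vectors


# Demo with a fake embedder and a recording "sleep" so nothing actually waits.
waits: list[float] = []
vectors = embed_all(
    [f"chunk {i}" for i in range(45)],
    embed_batch=lambda batch: [[float(len(text))] for text in batch],
    sleep=waits.append,
)
print(len(vectors), waits)
print(wait_seconds(num_chunks=300, batch_size=20, rpm=3))
```

Passing `sleep` as a parameter keeps the rate limiter testable without real delays.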
To re-run ingestion:

    make ingest

Project structure:

    knowledge-rag-assistant/
    ├── app/
    │   ├── api/                    # FastAPI routes and app setup
    │   │   ├── routes/
    │   │   │   ├── chat.py         # /chat: Custom RAG + Redis caching
    │   │   │   ├── agent.py        # /agent: PydanticAI; /agent/graph: LangGraph
    │   │   │   ├── digimon.py      # Digimon metadata endpoints
    │   │   │   └── health.py       # Health check
    │   │   └── main.py
    │   ├── core/                   # Config (pydantic-settings) and logging
    │   ├── ingestion/              # DAPI client, embedder, ingest pipeline
    │   ├── rag/                    # Retriever, prompt builder, LLM client
    │   ├── mcp/                    # MCP server with 4 Digimon tools
    │   ├── agents/
    │   │   ├── pydantic_agent.py   # PydanticAI agent with 4 typed tools
    │   │   └── graph/              # LangGraph Corrective RAG
    │   │       ├── state.py        # GraphState TypedDict
    │   │       ├── nodes.py        # retrieve / grade / generate / rewrite nodes
    │   │       └── workflow.py     # Compiled StateGraph
    │   ├── observability/          # Langfuse tracing integration
    │   └── db/                     # Qdrant, PostgreSQL, Redis managers
    ├── docker/
    │   └── docker-compose.yml
    ├── tests/
    ├── requirements.txt
    ├── Dockerfile
    ├── CONTRIBUTING.md
    └── README.md

This project demonstrates key skills for LLM Backend Developer roles:
- Custom RAG without LangChain: Built the full pipeline (retrieval → prompt → generation) from first principles, which shows an understanding of how RAG works rather than just framework usage.
- Three agent patterns side by side: Custom RAG (`/chat`), PydanticAI typed agent (`/agent`), and LangGraph Corrective RAG (`/agent/graph`). Each demonstrates a different architectural trade-off.
- PydanticAI typed tools: Type-safe agent with `@agent.tool_plain` decorators, `AnthropicModel`, and automatic tool selection, with minimal boilerplate.
- LangGraph Corrective RAG: Stateful `retrieve → grade → generate` loop with query rewriting when retrieved docs aren't relevant, demonstrating iterative reasoning beyond single-shot generation.
- MCP (Model Context Protocol): Implements the emerging standard for tool integration between LLMs and external systems.
- Production Observability: Full tracing with Langfuse covering LLM calls, retrieval steps, and tool invocations, which is critical for debugging and cost monitoring.
- Cost Optimization: Redis caching avoids redundant LLM calls for identical queries, with cache keys that include the filters so that different filter combinations never collide.
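
One way to get collision-free cache keys is to hash a canonical JSON encoding of the query together with its named filters. The sketch below is an assumption about how such a key could be built, not the project's actual scheme: the `chat:` prefix, the function name, and the whitespace and case normalization are all illustrative choices.

```python
import hashlib
import json
from typing import Optional


def chat_cache_key(
    query: str,
    level_filter: Optional[str] = None,
    type_filter: Optional[str] = None,
) -> str:
    """Build a deterministic Redis key from the query and its filters."""
    payload = {
        # Collapse whitespace and case so trivially different spellings share an entry.
        "query": " ".join(query.lower().split()),
        "level_filter": level_filter,
        "type_filter": type_filter,
    }
    # Named fields plus sorted keys give one canonical encoding per combination.
    blob = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    return "chat:" + hashlib.sha256(blob.encode("utf-8")).hexdigest()


same_a = chat_cache_key("What is Agumon?", level_filter="Rookie")
same_b = chat_cache_key("  what is   AGUMON? ", level_filter="Rookie")
other = chat_cache_key("What is Agumon?", type_filter="Rookie")
print(same_a == same_b, same_a == other)
```

Encoding the filters as named fields matters: plain string concatenation would map `level_filter="Rookie"` and `type_filter="Rookie"` to the same key and serve the wrong cached answer.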
- Custom RAG pipeline (Retriever → PromptBuilder → LLMClient)
- MCP server with 4 tools
- Anthropic Claude Haiku + Voyage AI voyage-3 integration
- Langfuse observability
- Redis caching with proper cache-key design
- Docker infrastructure
- PydanticAI typed agent with automatic tool selection
- LangGraph Corrective RAG with query rewriting
- Comprehensive test suite with pytest
- RAG evaluation pipeline (precision, recall, faithfulness)
- Guardrails for output validation
- Rate limiting and authentication
See CONTRIBUTING.md for development setup and guidelines.
MIT — see LICENSE for details.