# AskGraph

A RAG knowledge base platform with custom GraphRAG on PostgreSQL.
AskGraph is a self-hosted retrieval-augmented generation platform that lets you upload documents, build a queryable knowledge base, and get grounded answers from a language model of your choice. Unlike systems that rely on dedicated graph databases or closed-source embedding APIs, AskGraph implements its full GraphRAG pipeline — entity extraction, knowledge graph construction, and multi-hop traversal — entirely inside PostgreSQL using pgvector and recursive CTEs. The result is a single-database architecture that is straightforward to operate, easy to back up, and capable of three distinct retrieval strategies (vector similarity, graph traversal, and a weighted hybrid of both) selectable per query.
## Architecture

```
Browser
   |
   v
Next.js (port 3000)
   | REST + WebSocket
   v
FastAPI (port 8000)
   |
   +---> PostgreSQL 17 + pgvector
   |        |
   |        +-- collections
   |        +-- documents
   |        +-- chunks (embeddings via HNSW index)
   |        +-- kg_entities (entity nodes, embeddings via HNSW index)
   |        +-- kg_entity_chunks
   |        +-- kg_relations (edges with weight, recursive CTE traversal)
   |
   +---> LLM provider (Ollama / OpenAI / any LLMWire-compatible)
   |
   +---> Embedding provider (local sentence-transformers / OpenAI)
```
## Features

- Document ingestion for PDF, DOCX, and plain text files, with configurable chunk size and overlap
- Three retrieval modes per query: pure vector similarity search, knowledge graph traversal, and a weighted hybrid of both
- Knowledge graph built automatically during ingestion: entities and typed relations are extracted by the LLM from every chunk and stored as nodes and edges with pgvector embeddings
- Graph traversal using a `WITH RECURSIVE` CTE over `kg_relations`, following edges bidirectionally up to a configurable hop depth, with relation-weight decay per hop
- Streaming chat responses over WebSocket, with per-token delivery and citation metadata
- Collection-scoped search so documents are isolated by project or tenant
- Knowledge graph visualisation endpoint returning all entities and relations for a collection
- Alembic-managed database schema with HNSW indexes on both chunk and entity embedding columns
- Fully containerised: a single `docker compose up -d` starts the database, API, and frontend
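The chunking behaviour described above (fixed size with overlap) amounts to a sliding character window. A minimal sketch of that idea; `split_text` is an illustrative helper, not the project's actual ingestion code:

```python
def split_text(text: str, chunk_size: int = 512, overlap: int = 50) -> list[str]:
    """Split text into fixed-size character chunks that overlap their neighbours."""
    if chunk_size <= overlap:
        raise ValueError("chunk_size must exceed overlap")
    step = chunk_size - overlap  # each new chunk starts this far after the previous one
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the final chunk already reaches the end of the text
    return chunks
```

With the defaults, consecutive chunks share their last/first 50 characters, which helps sentences that straddle a chunk boundary remain retrievable.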
## Quick start

```bash
# Clone the repository
git clone https://github.com/your-org/askgraph.git
cd askgraph

# Start all services
docker compose up -d

# Open the application
open http://localhost:3000
```

The interactive API docs are available at http://localhost:8000/docs.

To use OpenAI instead of Ollama:

```bash
OPENAI_API_KEY=sk-... LLM_PROVIDER=openai LLM_MODEL=gpt-4o docker compose up -d
```

## API

| Method | Path | Description |
|---|---|---|
| GET | `/health` | Service liveness check |
| POST | `/collections` | Create a new collection |
| GET | `/collections` | List all collections |
| DELETE | `/collections/{id}` | Delete a collection and all its documents |
| POST | `/documents` | Upload and ingest a document (multipart/form-data) |
| GET | `/documents?collection_id={id}` | List documents in a collection |
| DELETE | `/documents/{id}` | Delete a document |
| POST | `/chat` | Run a RAG query, returns answer + citations |
| WS | `/chat/stream` | Stream a RAG answer token-by-token over WebSocket |
| GET | `/kg?collection_id={id}` | Return all KG entities and relations for a collection |
### POST /chat request body

```json
{
  "query": "What are the main themes in the uploaded papers?",
  "collection_id": "uuid",
  "retrieval_mode": "hybrid",
  "top_k": 5
}
```

### WS /chat/stream

Send a JSON message after connecting to `/chat/stream`:

```json
{
  "query": "Explain the architecture",
  "collection_id": "uuid",
  "retrieval_mode": "graph",
  "top_k": 5
}
```

The server sends `{"type": "token", "content": "..."}` events during generation, followed by a single `{"type": "citations", "data": [...]}` event.
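A client consuming this stream accumulates token events and stops at the citations event. The sketch below shows only that event handling; `handle_events` and the sample payloads are illustrative assumptions, not part of the API:

```python
import json

def handle_events(raw_events: list[str]) -> tuple[str, list]:
    """Assemble the streamed answer from token events; capture the final citations event."""
    answer_parts: list[str] = []
    citations: list = []
    for raw in raw_events:
        event = json.loads(raw)
        if event["type"] == "token":
            answer_parts.append(event["content"])
        elif event["type"] == "citations":
            citations = event["data"]
            break  # citations is documented as the last event in the stream
    return "".join(answer_parts), citations
```

A real client would wrap this loop around a WebSocket receive call instead of a Python list.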
## Retrieval modes

### Vector

Standard dense retrieval. The query is embedded with the configured embedding model, then the `chunks` table is searched by cosine distance via the HNSW index on `chunks.embedding`. Returns the top-k most similar chunks.
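In pure Python terms, this step is cosine similarity followed by a sort. A toy sketch for intuition only; in AskGraph the equivalent query runs inside PostgreSQL via pgvector:

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two dense vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

def top_k_chunks(query_emb: list[float],
                 chunk_embs: dict[int, list[float]],
                 k: int = 5) -> list[tuple[int, float]]:
    """Rank chunk ids by cosine similarity to the query embedding."""
    scored = [(cid, cosine_similarity(query_emb, emb)) for cid, emb in chunk_embs.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]
```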
### Graph

Knowledge-graph traversal. The query embedding is first used to find the top-3 most similar entity nodes in `kg_entities` (seed entities). A `WITH RECURSIVE` CTE then walks `kg_relations` bidirectionally up to 2 hops. Each hop multiplies the previous relation weight by 1/(depth+1) to decay scores with distance. The best score per chunk is kept and the top-k chunks are returned. This mode is particularly effective for multi-hop questions that require connecting related concepts across documents.
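The traversal and decay logic can be mirrored in plain Python as a breadth-first walk. A sketch under one reading of the 1/(depth+1) rule (the real traversal is a recursive CTE inside PostgreSQL, and the entity names here are illustrative):

```python
def traverse(edges: list[tuple[str, str, float]],
             seeds: dict[str, float],
             max_hops: int = 2) -> dict[str, float]:
    """Walk weighted edges bidirectionally from seed entities, decaying scores per hop.

    Returns the best score reached for every visited entity.
    """
    # Build a bidirectional adjacency list, since edges are followed both ways.
    adj: dict[str, list[tuple[str, float]]] = {}
    for src, dst, weight in edges:
        adj.setdefault(src, []).append((dst, weight))
        adj.setdefault(dst, []).append((src, weight))

    best = dict(seeds)
    frontier = dict(seeds)
    for hop in range(1, max_hops + 1):
        nxt: dict[str, float] = {}
        for node, score in frontier.items():
            for neighbour, weight in adj.get(node, []):
                candidate = score * weight / (hop + 1)  # decay with hop depth
                if candidate > best.get(neighbour, 0.0):
                    best[neighbour] = candidate  # keep only the best score per node
                    nxt[neighbour] = candidate
        frontier = nxt
    return best
```

In the real pipeline the scores landing on entities are then propagated to their linked chunks via `kg_entity_chunks`.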
### Hybrid

Runs both the vector and graph retrievers, then fuses their result sets. Each chunk receives a merged score of `0.5 * vector_score + 0.5 * graph_score`. Chunks that appear in only one result set receive 0.0 for the missing signal. The fused list is sorted by merged score and truncated to top-k. Weights are configurable in the `HybridRetriever` constructor.
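The fusion step reduces to a weighted merge of two score maps. A minimal sketch assuming the default 0.5/0.5 weights described above; `fuse_scores` is an illustrative name, not the actual method:

```python
def fuse_scores(vector_scores: dict[int, float],
                graph_scores: dict[int, float],
                w_vector: float = 0.5,
                w_graph: float = 0.5,
                top_k: int = 5) -> list[tuple[int, float]]:
    """Merge two per-chunk score maps; a missing signal counts as 0.0."""
    merged = {}
    for chunk_id in vector_scores.keys() | graph_scores.keys():
        merged[chunk_id] = (w_vector * vector_scores.get(chunk_id, 0.0)
                           + w_graph * graph_scores.get(chunk_id, 0.0))
    # Sort by fused score, highest first, and truncate to top-k.
    return sorted(merged.items(), key=lambda pair: pair[1], reverse=True)[:top_k]
```

Note that a chunk found by both retrievers outranks one found by only a single retriever with the same raw score, which is the point of the hybrid mode.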
## Configuration

All settings are read from environment variables (or a `.env` file in the project root).
| Variable | Default | Description |
|---|---|---|
| `DATABASE_URL` | `postgresql+asyncpg://askgraph:askgraph@localhost:5432/askgraph` | Async SQLAlchemy database URL |
| `EMBEDDING_PROVIDER` | `local` | Embedding backend: `local` or `openai` |
| `EMBEDDING_MODEL` | `all-MiniLM-L6-v2` | Model name for the embedding provider |
| `LLM_PROVIDER` | `ollama` | LLM backend: `ollama`, `openai`, or others |
| `LLM_MODEL` | `llama3` | Model name for the LLM provider |
| `LLM_API_KEY` | (empty) | API key for the LLM provider (if required) |
| `OPENAI_API_KEY` | (empty) | OpenAI API key (used when provider is `openai`) |
| `CHUNK_SIZE` | `512` | Maximum characters per chunk |
| `CHUNK_OVERLAP` | `50` | Character overlap between adjacent chunks |
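Reading these variables with their table defaults can be sketched as below; `Settings` is a hypothetical helper mirroring the table, not the project's actual config module:

```python
import os
from dataclasses import dataclass, field

@dataclass
class Settings:
    """Illustrative settings loader: each field falls back to the documented default."""
    database_url: str = field(default_factory=lambda: os.getenv(
        "DATABASE_URL",
        "postgresql+asyncpg://askgraph:askgraph@localhost:5432/askgraph"))
    embedding_provider: str = field(default_factory=lambda: os.getenv("EMBEDDING_PROVIDER", "local"))
    embedding_model: str = field(default_factory=lambda: os.getenv("EMBEDDING_MODEL", "all-MiniLM-L6-v2"))
    llm_provider: str = field(default_factory=lambda: os.getenv("LLM_PROVIDER", "ollama"))
    llm_model: str = field(default_factory=lambda: os.getenv("LLM_MODEL", "llama3"))
    chunk_size: int = field(default_factory=lambda: int(os.getenv("CHUNK_SIZE", "512")))
    chunk_overlap: int = field(default_factory=lambda: int(os.getenv("CHUNK_OVERLAP", "50")))
```

Using `default_factory` means the environment is read at instantiation time, so variables set after import are still picked up.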
## Benchmarks

Run these after ingesting a representative document set. Replace the placeholder values once measured.
| Mode | Precision@5 | Recall@5 | Latency (p50) | Latency (p95) |
|---|---|---|---|---|
| vector | — | — | — | — |
| graph | — | — | — | — |
| hybrid | — | — | — | — |
Measure each mode by comparing its retrieval results against a ground-truth QA set for your corpus.
## References

The design of AskGraph draws from the following research:

- LightRAG: Simple and Fast Retrieval-Augmented Generation (https://arxiv.org/abs/2410.05779)
- Microsoft GraphRAG: From Local to Global: A Graph RAG Approach to Query-Focused Summarization (https://arxiv.org/abs/2404.16130)
- Original RAG: Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks (https://arxiv.org/abs/2005.11401)
## License

MIT