Propagation & Retrieval via Informed Semantic Mapping
Epistemic Graph RAG with Spreading Activation
PRISM is a retrieval library that layers a typed epistemic knowledge graph over your existing vector store, then uses spreading activation to surface knowledge structured by how it relates — not just how similar it is.
Standard RAG returns a flat ranked list. Every chunk is treated the same — a similarity score and nothing else.
query → embed → similarity → [chunk, chunk, chunk, ...] ← no structure
This loses the relational fabric of your knowledge. Chunk B may refute Chunk A. Chunk C may specialize a principle in Chunk D. An older document may have been superseded by a newer one. Standard RAG can't express any of this — and neither can the LLM reasoning over it.
PRISM builds a graph where edges carry epistemic type:
Doc A ──[supports]──▶ Doc B
Doc C ──[refutes]───▶ Doc D
Doc E ──[supersedes]▶ Doc F
Retrieval then uses spreading activation: a query fires seed nodes via vector search, activation propagates through typed edges, and nodes reached by multiple independent paths (convergence) rank highest.
The result is a structured epistemic answer — not a ranked list:
| Bucket | Contents |
|---|---|
| PRIMARY | Core relevant chunks, highest convergence |
| SUPPORTING | Chunks that reinforce or extend the primary answer |
| CONTRASTING | Chunks that challenge or take a different position |
| QUALIFYING | Chunks that add conditions, exceptions, nuances |
| SUPERSEDED | Historically relevant context now replaced by newer work |
Three stages:
- Seed — embed the query, vector-search your corpus, get top-K scored chunks as activation seeds
- Activate — propagate activation through the epistemic graph; track which seeds independently reach each node (convergence)
- Structure — bucket results by epistemic role based on dominant edge valence
| System | Vector Search | Knowledge Graph | Epistemic Edge Typing | Spreading Activation |
|---|---|---|---|---|
| Standard RAG | ✅ | ❌ | ❌ | ❌ |
| GraphRAG (Microsoft, 2024) | ✅ | ✅ | ❌ | ❌ |
| SYNAPSE (2026) | ✅ | ✅ | ❌ | ✅ |
| PRISM | ✅ | ✅ | ✅ | ✅ |
To our knowledge, no open-source retrieval library combines all four of these signals. If you know of one, please open an issue — we'd genuinely like to know.
supports — A provides evidence reinforcing B
refutes — A directly contradicts B
supersedes — A replaces or updates B (A is newer / more authoritative)
derives_from — A is logically or conceptually derived from B
specializes — A is a specific instance of the general principle in B
contrasts_with — A and B take different but non-exclusive positions
implements — A is a concrete method that puts the abstract concept of B into practice
generalizes — A is a broader abstraction of which B is a specific case
exemplifies — A is a concrete example illustrating the concept in B
qualifies — A adds conditions, exceptions, or nuances to B
Each edge has:
- A propagation weight — how strongly it carries activation (0.40–0.90)
- A valence — determines which result bucket the target lands in (positive / qualifying / dialectical / temporal)
pip install prism-ragFrom source:
git clone https://github.com/MadMando/prism
cd prism
pip install -e .Requirements: Python 3.11+, an existing LanceDB vector store, and an embedding provider (Ollama local or any OpenAI-compatible API).
from prism import PRISM
# Using Ollama for embeddings (local)
p = PRISM(
lancedb_path = "/path/to/your/lancedb",
graph_path = "/path/to/prism_graph.json.gz",
ollama_url = "http://localhost:11434",
embed_model = "nomic-embed-text",
llm_base_url = "https://api.openai.com",
llm_model = "gpt-4o-mini",
llm_api_key = "sk-...",
)
# Using an API for embeddings instead
p = PRISM(
lancedb_path = "/path/to/your/lancedb",
graph_path = "/path/to/prism_graph.json.gz",
embed_api_url = "https://api.openai.com/v1/embeddings",
embed_api_key = "sk-...",
embed_model = "text-embedding-3-small",
llm_base_url = "https://api.openai.com",
llm_model = "gpt-4o-mini",
llm_api_key = "sk-...",
)
p.build(
k_neighbors = 8, # semantic neighbours per chunk to examine
cross_source_only = False, # False recommended — see note below
max_pairs = 50_000 # cap for testing; omit for full build
)Or via CLI:
prism-build \
--lancedb-path /path/to/lancedb \
--graph-path /path/to/prism_graph.json.gz \
--llm-api-key $OPENAI_API_KEY \
--all-sources
cross_source_onlyvsall-sourcesThe default (
cross_source_only=True/ no--all-sourcesflag) only extracts edges between different source documents. This sounds conservative but in practice leaves most intra-book relationships unextracted — a 5,000-chunk book like DMBOK has hundreds of internalsupports/derives_from/specializesconnections that the graph will never see.Use
cross_source_only=False(pass--all-sources) for richer graphs. On a 30k-chunk corpus, this roughly doubles candidate pairs and produces 3–5× more edges, making the epistemic buckets (SUPPORTING, QUALIFYING, CONTRASTING) fire on far more queries.
p.load_graph()
result = p.retrieve("your question here", top_k=5)
print(result.format_for_llm())Output:
PRISM retrieval for: "your question here"
────────────────────────────────────────────────────────────
## PRIMARY
[1] source-a p.14 § 2.1 (score: 0.923)
The core relevant passage from your corpus...
[2] source-b p.67 § 5.0 (score: 0.891)
Another highly activated chunk...
## SUPPORTING EVIDENCE
[1] source-c p.201 § 8.2 (score: 0.841 [via: specializes])
A passage that specializes or extends the primary answer...
## QUALIFICATIONS & NUANCES
[1] source-d p.38 § 3.1 (score: 0.712 [via: qualifies])
A passage adding conditions or exceptions...
─ 2 primary · 1 supporting · 0 contrasting · 1 qualifying · 0 superseded ─
# Access buckets directly
for chunk in result.primary:
print(chunk.source, chunk.page, chunk.final_score)
print(chunk.text)
for chunk in result.contrasting:
print("Contrasting view:", chunk.text[:200])
# Feed into your LLM
context = result.format_for_llm()
# ... pass `context` to your LLM system prompt
# Stats
print(f"Seeds: {result.n_seeds}")
print(f"Graph nodes reached: {result.n_graph_nodes}")
print(f"Graph used: {result.graph_was_used}")PRISM supports two embedding modes — use whichever matches how your corpus was built.
PRISM(
ollama_url = "http://localhost:11434", # your Ollama instance
embed_model = "nomic-embed-text", # any model loaded in Ollama
...
)Popular Ollama embedding models: nomic-embed-text, mxbai-embed-large, all-minilm
PRISM(
embed_api_url = "https://api.openai.com/v1/embeddings", # or any compatible endpoint
embed_api_key = "sk-...",
embed_model = "text-embedding-3-small",
...
)Works with OpenAI, Azure OpenAI, Together AI, Jina, Cohere, Mistral, or any endpoint that accepts {"model": ..., "input": ...} and returns {"data": [{"embedding": [...]}]}.
Important: The embedding model at retrieval time must match the model used when your LanceDB corpus was originally indexed. Dimensions must be identical.
PRISM works on top of your existing vector store. If you already have a LanceDB corpus with embeddings, you do not need to re-index anything.
- Existing vectors → used as-is for seed activation
- Epistemic graph → built from text via LLM, stored separately as a
.json.gzfile - Build is a one-time offline step
The build phase uses a two-stage pipeline to extract epistemic relationships efficiently:
Stage 1 — Ollama pre-filter (fast, free)
An Ollama model screens every candidate pair with a binary question: does any epistemic relationship exist here at all? About half of all candidate pairs are topically similar but not epistemically related — Stage 1 discards these before any API call is made. Runs via Ollama, costs nothing.
Stage 2 — Async LLM classification
Surviving pairs are sent to an OpenAI-compatible LLM in batches of 20, with 20 concurrent requests running in parallel. For each pair that passes Stage 1, the LLM determines the relationship type, direction, and confidence score. The async architecture means 20 batches complete in the time v1 took to complete one.
Stage 3 — Graph construction
Confirmed relationships (above confidence threshold) become typed, weighted edges saved to prism_graph.json.gz.
Build time comparison (30k-chunk corpus, ~50k candidate pairs):
| Pipeline | Batches | Wall Time |
|---|---|---|
| v1 — sync, batch=5 | ~10,000 | ~40 hours |
| v2 — async, batch=20, no filter | ~2,500 | ~30 minutes |
| v2 — async + stage-1 filter (fast model) | ~1,250 | ~15–20 minutes |
Cost estimate (30k-chunk corpus, v2 pipeline):
| Item | Estimate |
|---|---|
| Stage 1 Ollama calls (free) | ~5,000 |
| Stage 2 LLM calls after filter (batch=20) | ~1,250 |
| Input tokens | ~12M–18M |
| Output tokens | ~2M–4M |
Cost with deepseek-chat |
~$2–4 |
Cost with gpt-4o-mini |
~$1–2 |
| Total wall time | ~15–20 min |
Checkpoint & resume — if the build is interrupted, PRISM saves a .partial.json.gz checkpoint and resumes automatically from where it left off.
Stage 1 model check — PRISM verifies the filter model is available in Ollama before starting. If it isn't, it prints a clear warning listing available models and skips Stage 1 rather than silently doing nothing. Pass --filter-model <name> or filter_model= to override.
Stage 1 only pays off if the filter model is fast. The goal is a binary yes/no decision per pair in well under a second — if your model is slower than the API you're filtering for, skip Stage 1 entirely with --no-filter.
Rule of thumb: use a model under ~10 GB. On typical hardware these complete in 0.2–0.8 seconds per call. Models above 20–30 GB (especially cloud-streamed ones) can take 2–4 seconds per call, making Stage 1 slower than it saves.
Good filter model choices:
| Model | Size | Notes |
|---|---|---|
llama3.1:8b |
4.9 GB | Recommended default — fast, widely available, strong instruction following |
llama3.2:3b |
2.0 GB | Faster still, good for binary yes/no |
qwen3.5:latest |
6.6 GB | Strong instruction following |
gemma3:4b |
3.3 GB | Lightweight, accurate |
Avoid as filter models: models above ~6 GB — at that size, per-call latency often exceeds the API savings, especially over a network connection. Cloud-streamed models (*:cloud, *:1t) are never suitable.
Stage 1 requires true parallel GPU inference. OLLAMA_NUM_PARALLEL only helps if your GPU has enough VRAM to hold multiple simultaneous model instances. If the GPU is already near capacity running a single 8B model, Ollama will queue extra requests and concurrency has no effect. If Stage 1 isn't reducing your wall time, use --no-filter — Stage 2 alone typically finishes in ~30 minutes on a 50k-pair corpus.
If your Ollama server is remote (not localhost), point ollama_url at it:
p = PRISM(
ollama_url = "http://your-ollama-host:11434", # remote Ollama
embed_model = "qwen3-embedding:4b", # must match your ingest-time model
filter_model = "gemma4:latest", # fast model for Stage 1
...
)Or via CLI:
prism-build \
--ollama-url http://your-ollama-host:11434 \
--filter-model gemma4:latest \
...No suitable model? Skip Stage 1:
prism-build --no-filter ... # async Stage 2 only, ~30 min for 50k pairsIf no graph file exists (or graph loading fails), PRISM automatically falls back to pure vector search and still returns a valid EpistemicResult — just without epistemic bucketing. All chunks land in primary.
This means PRISM is a safe drop-in replacement for any standard vector retriever from day one.
PRISM(
# ── Storage ───────────────────────────────────────────────────
lancedb_path = "/path/to/lancedb",
graph_path = "/path/to/prism_graph.json.gz",
table_name = "knowledge", # LanceDB table name
# ── Embedding: Ollama (default) ───────────────────────────────
ollama_url = "http://localhost:11434",
embed_model = "nomic-embed-text",
# ── Embedding: API (set embed_api_key to activate) ────────────
embed_api_url = "https://api.openai.com/v1/embeddings",
embed_api_key = None, # set to switch from Ollama
# ── LLM for graph building ────────────────────────────────────
llm_base_url = "https://api.openai.com",
llm_model = "gpt-4o-mini",
llm_api_key = "sk-...",
min_confidence = 0.65, # edge confidence threshold
batch_size = 5, # pairs per LLM call
# ── Retrieval tuning ──────────────────────────────────────────
hops = 3, # spreading activation depth
decay = 0.7, # per-hop decay factor
seed_top_k = 20, # vector search seed count
convergence_weight = 0.4, # bonus weight for convergence
)prism/
├── prism/
│ ├── __init__.py public API
│ ├── prism.py PRISM — main entry point
│ ├── edges.py epistemic edge taxonomy + propagation weights
│ ├── graph.py EpistemicGraph (networkx MultiDiGraph + JSON serialisation)
│ ├── extractor.py LLM-based triple extraction (batch mode)
│ ├── activation.py SpreadingActivation engine + convergence scoring
│ ├── retriever.py PRISMRetriever — the 5-step pipeline
│ ├── result.py EpistemicResult + EpistemicChunk dataclasses
│ ├── cli.py prism-build CLI
│ └── adapters/
│ └── lancedb.py LanceDB adapter (Ollama + API embedding)
├── scripts/
│ └── build_graph.py standalone build script
├── examples/
│ └── governance_search.py
├── docs/
│ └── architecture.svg
└── pyproject.toml
PRISM is alpha software. These are known constraints you should understand before using it in production:
Graph quality depends on your corpus and LLM.
The epistemic graph is built by asking an LLM to classify relationships between chunk pairs. The LLM can misclassify — a chunk may be tagged supports when it actually only tangentially relates, or refutes when it merely offers a different framing. Graph quality is correlated with chunk quality, chunking strategy, and the capability of the extraction model. Review extracted edges before trusting them.
Confidence scores are not ground truth.
The confidence values returned by the LLM are self-reported estimates, not calibrated probabilities. They are used as a filter (default threshold: 0.65) and as edge weights, but should not be treated as precise measures of relationship strength.
Retrieval quality can degrade with noisy edges. If the graph contains many false-positive edges, spreading activation will propagate to irrelevant nodes. This produces worse results than pure vector search. Monitor your graph's edge yield rate and edge type distribution after building.
cross_source_only=True produces sparse graphs on multi-book corpora.
Setting cross_source_only=True skips all chunk pairs from the same source document. This seems conservative and clean, but in practice it misses the majority of valuable epistemic relationships. A large book like DMBOK (5,940 chunks) contains hundreds of intra-chapter supports, derives_from, and specializes relationships that are never extracted when same-source pairs are excluded.
In a real-world test on a 30k-chunk governance corpus (DMBOK + 2 governance books + erwin docs):
cross_source_only=True, k=8: 3,571 edges, ~51% of chunks unconnected — graph rarely fires on queriescross_source_only=False, k=8: 9,989 edges, same corpus — graph activates supporting/qualifying buckets on most queries
Recommendation: use cross_source_only=False for most corpora. Only use True if your sources are genuinely independent and you specifically want to suppress intra-document structure.
Build is slow and LLM-dependent. The one-time graph build requires thousands of LLM API calls and takes hours for large corpora. There is currently no incremental update — adding new documents requires a rebuild (or manual edge addition).
No evaluation benchmark yet.
We do not currently publish retrieval quality metrics comparing PRISM against standard RAG on standardised QA datasets. The benchmarks/ directory contains a structural demonstration only.
- Async LLM extraction (parallel API calls → 80× faster build)
- Two-stage pipeline: local pre-filter + async classification
- Checkpoint / resume for interrupted builds
- Incremental graph updates (add new docs without full rebuild)
- Additional vector store adapters (Chroma, Qdrant, Weaviate, pgvector)
- Graph visualisation (
prism-vizCLI — exports to Gephi / D3) - Export to Neo4j / NetworkX formats
- Collins, A.M. & Loftus, E.F. (1975). A spreading-activation theory of semantic processing. Psychological Review, 82(6), 407–428.
- Edge, D. et al. (2024). From Local to Global: A Graph RAG Approach to Query-Focused Summarization. Microsoft Research.
MIT