A conceptual knowledge graph engine for document systems.
Bridges flat filesystem search with semantic relationship discovery.
Inspired by arXiv:2512.05470 — Everything is Context: Agentic File System Abstraction for Context Engineering
Traditional documentation systems treat documents as isolated files in a flat directory. While keyword search works for single-topic queries, it fails for cross-cutting questions:
Example: "How does a subscription renewal create a charge and trigger a webhook?"
- ❌ Flat search: finds 3 isolated files, no connections
- ✓ Graph: discovers the causal chain → Subscription → Invoice → Charge → Event
This engine layers a semantic knowledge graph on top of the flat filesystem, enabling agents to navigate by concepts rather than folders.
┌────────────────────────────────────────────────────────────┐
│ HETEROGENEOUS CONTEXT SOURCES │
│ ┌──────────────┐ ┌──────────────┐ ┌──────────────┐ │
│ │ Docs (flat) │ │ Knowledge │ │ External │ │
│ │ filesystem │ │ Graph (graph)│ │ APIs │ │
│ └──────────────┘ └──────────────┘ └──────────────┘ │
└────────┬──────────────────────────────────────┬────────────┘
│ │
↓ [Resolver Adapters - Pluggable] │
┌────────────────────────────────────────────────────────────┐
│ SYSTEMFS — VIRTUAL FILE SYSTEM (Unified Hierarchy) │
│ ┌──────────────────────────────────────────────────┐ │
│ │ /docs/ [DocsResolver] │ │
│ │ /graph/nodes/ [GraphResolver] │ │
│ │ /graph/edges/ [GraphResolver] │ │
│ │ /context/memory/ [MemoryResolver] │ │
│ │ /context/history/ [History Layer] │ │
│ │ /context/scratchpad/ [Ephemeral] │ │
│ │ /modules/ [ModuleResolver] │ │
│ └──────────────────────────────────────────────────┘ │
└────────┬──────────────────────────────────────────────────┘
│
↓ [Persistent Context Layers]
┌─────────────────────────────────────┐
│ History: append-only JSONL log │
│ Memory: fact/episodic/proc JSON │
│ Scratchpad: ephemeral workspace │
└─────────────────────────────────────┘
│
↓ [Context Engineering Pipeline]
┌──────────────────────────────────────┐
│ Constructor: query, rank, select │
│ Updater: inject into LLM window │
│ Evaluator: validate → write memory │
└──────────────────────────────────────┘
│
↓ [Agent Tools]
┌──────────────────────────────────────┐
│ Traditional: navigate, inspect, ... │
│ VFS-native: vfs_read, vfs_list, .. │
└──────────────────────────────────────┘
Five semantic node types extracted from documentation:
| Type | Description | Example |
|---|---|---|
| Document | Source markdown file | Create A Charge, Webhooks |
| APIObject | Domain entity extracted from docs | Charge, Customer, Invoice |
| Endpoint | REST endpoint | POST /v1/charges, GET /v1/customers/:id |
| Event | Webhook/system event | charge.succeeded, invoice.paid |
| Concept | Abstract concept spanning multiple docs | Authentication, Pagination, Idempotency |
Seven directed relationship types model how concepts connect:
| Edge Type | Semantics | Example |
|---|---|---|
| `contains` | Document → entity (hierarchical) | Document "Create A Charge" contains Endpoint POST /v1/charges |
| `requires` | APIObject → APIObject (dependency) | Charge requires Customer |
| `returns` | Endpoint → APIObject (output) | POST /v1/charges returns Charge |
| `triggers` | APIObject → Event (causality) | Charge triggers charge.succeeded event |
| `belongs_to` | APIObject → APIObject (ownership) | PaymentMethod belongs_to Customer |
| `generates` | APIObject → APIObject (production) | Subscription generates Invoice |
| `references` | Cross-document generic link | Generic mention/relationship |
Each node carries metadata:
- `type` — one of the five NodeType enums
- `source_file` — which markdown file it came from
- `description` — extracted or provided text
- `properties` — arbitrary key-value attributes
Each edge carries:
- `edge_type` — one of the seven relationship types
- `weight` — float (1.0 default, increases with co-occurrence)
- `source_file` — which document contributed this relationship
Raw markdown files in docs/ contain:
- Headers (entity names)
- REST endpoint patterns
- Backtick-quoted event names
- Bold-referenced entities
- Explicit "Relationships" sections
Sample section:
## Relationships
- A **Charge** requires a **Customer**
- A **Charge** requires a **PaymentMethod**
- On success, triggers `charge.succeeded` event

Heuristic Extractor (`kgraph/extractor_heuristic.py`):
- Regex patterns for REST endpoints: `(GET|POST|PUT|DELETE) /v1/\w+`
- Bold entity detection: `\*\*(\w+)\*\*`
- Backtick events: `` `([\w.]+)` ``
- Relationship line parsing: `**X** requires **Y**`
- Pros: Zero dependencies, instant, works without API key
- Cons: Misses implicit relationships, lower semantic accuracy
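A minimal sketch of the heuristic pass — the pattern set and the `extract` helper are illustrative, not the module's actual API:

```python
import re

ENDPOINT_RE = re.compile(r"\b(GET|POST|PUT|DELETE)\s+(/v1/[\w/:]+)")
BOLD_RE = re.compile(r"\*\*(\w+)\*\*")
EVENT_RE = re.compile(r"`(\w+\.[\w.]+)`")  # backticked dotted names like charge.succeeded
REQUIRES_RE = re.compile(r"\*\*(\w+)\*\* requires a? ?\*\*(\w+)\*\*")

def extract(markdown: str) -> dict:
    """Single-pass regex extraction over one markdown document."""
    return {
        "endpoints": [" ".join(m) for m in ENDPOINT_RE.findall(markdown)],
        "entities": sorted(set(BOLD_RE.findall(markdown))),
        "events": EVENT_RE.findall(markdown),
        "requires": REQUIRES_RE.findall(markdown),
    }

doc = """## Create A Charge
POST /v1/charges

## Relationships
- A **Charge** requires a **Customer**
- On success, triggers `charge.succeeded` event
"""
result = extract(doc)
# endpoints: ["POST /v1/charges"]; requires: [("Charge", "Customer")]
```

This is why the heuristic path costs nothing: four compiled patterns per document, no network calls.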
LLM Extractor (kgraph/extractor_llm.py):
- Sends document + structured prompt to OpenRouter (Claude, GPT-4, etc.)
- Returns JSON: `{nodes: [{name, type, description}], edges: [{source, target, edge_type}]}`
- Falls back to heuristic on API error
- Pros: Understands semantics, captures implicit relationships
- Cons: API costs, latency, requires credentials
Both extractors return ExtractionResult (Pydantic model) with consistent interface.
kgraph/builder.py orchestrates:
1. Walk docs/ directory
2. For each .md file:
a. Extract using heuristic OR LLM
b. Collect nodes and edges
3. Merge into single NetworkX DiGraph:
a. Normalize node names (title case)
b. Deduplicate nodes (merge by name)
c. Strengthen edges (co-occurrence → weight boost)
4. Serialize to `data/graph.json` using `node_link_data()`

Graph stats from sample docs:
Total nodes: 59
Document: 17
Endpoint: 13
Event: 20
APIObject: 8
Concept: 1
Total edges: 167
contains: 51
references: 99
returns: 13
requires: 1
generates: 1
belongs_to: 2
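The builder's merge step (3a–3c above) can be sketched with plain dictionaries; the real builder operates on a NetworkX DiGraph, and the input shape here is an assumption:

```python
def merge(extractions):
    """Merge per-document extraction results into one graph:
    normalize names, dedupe nodes, strengthen repeated edges."""
    nodes, edges = {}, {}
    for result in extractions:
        for name, data in result["nodes"]:
            key = name.title()                 # normalize: "charge" -> "Charge"
            nodes.setdefault(key, data)        # dedupe: merge by normalized name
        for src, dst, etype in result["edges"]:
            key = (src.title(), dst.title(), etype)
            edges[key] = min(edges.get(key, 0.0) + 1.0, 5.0)  # co-occurrence boost, capped
    return nodes, edges

docs = [
    {"nodes": [("charge", {}), ("customer", {})],
     "edges": [("charge", "customer", "requires")]},
    {"nodes": [("Charge", {})],
     "edges": [("Charge", "Customer", "requires")]},
]
nodes, edges = merge(docs)
# one "Charge" node survives; the repeated edge now carries weight 2.0
```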
Thin NetworkX wrapper with semantic operations:
Mutation:
- `add_node(NodeModel)` — insert/merge nodes by name
- `add_edge(EdgeModel)` — insert/strengthen edges

Query:
- `get_node(name)` → full node data
- `get_neighbors(name, edge_type=None, direction='both')` → list of `{node, edge_type, weight}`
- `find_path(source, target)` → shortest undirected path with edge types
- `subgraph(center, depth)` → BFS neighborhood within depth hops
- `nodes_by_type(NodeType)` → all nodes of a given type
- `search(query)` → fuzzy name match across all nodes

Serialization:
- `to_json()` → NetworkX node_link_data format (JSON-serializable dict)
- `save(path)` → write to file
- `load(path)` → class method, deserialize from file

Normalization:
- Node names normalized to title case: `"charge"` → `"Charge"`
- Bidirectional edges: `find_path()` uses undirected graph for discovery
- Weighted edges: repeated relationships boost weight (up to 5.0)
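Undirected discovery amounts to a BFS that follows edges in both directions. A stdlib sketch — the flat edge-list representation is a simplification, not the wrapper's internals:

```python
from collections import deque

def find_path(edges, source, target):
    """Shortest undirected path over directed (src, dst, edge_type) triples."""
    adj = {}
    for src, dst, _etype in edges:
        adj.setdefault(src, []).append(dst)   # forward direction
        adj.setdefault(dst, []).append(src)   # reverse: discovery ignores direction
    queue, seen = deque([[source]]), {source}
    while queue:
        path = queue.popleft()
        if path[-1] == target:
            return path
        for nxt in adj.get(path[-1], []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(path + [nxt])
    return None  # no connection

edges = [("Subscription", "Invoice", "generates"),
         ("Invoice", "Charge", "generates"),
         ("Charge", "charge.succeeded", "triggers")]
path = find_path(edges, "Subscription", "charge.succeeded")
# → ["Subscription", "Invoice", "Charge", "charge.succeeded"]
```

Because the adjacency is built both ways, the same call answers "what leads to a Charge?" as easily as "what does a Charge lead to?".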
High-level semantic query layer on top of ConceptGraph:
# Get all related entities
related = gq.related_concepts("Charge")
# → {entity, type, description, relationships: [{node, edge_type, direction, weight}]}
# Find how two concepts connect
path = gq.find_connection("Subscription", "Customer")
# → {source, target, path_length, path: [{node, edge_to_next}]}
# Full node context
details = gq.explain_node("Invoice")
# → {entity, type, description, outgoing_relationships, incoming_relationships}
# List all of a type
objects = gq.concepts_by_type("APIObject")
# → {type, count, entities: [{name, description, source_file}]}
# Fuzzy search
results = gq.search_graph("payment")
# → {query, count, results: [{name, type, description, score}]}
# Local neighborhood for context windows
sub = gq.subgraph_context("Charge", depth=2)
# → {center, depth, stats, nodes}

Wraps graph + filesystem as LLM tools. Implements the "filesystem-metaphor" pattern from arXiv:2512.05470:
Tools (file-system-like commands):
| Tool | Semantics | Example |
|---|---|---|
| `navigate(concept)` | cd + ls — explore neighbors | `navigate("Charge")` → related entities |
| `inspect(concept)` | cat — full details | `inspect("Customer")` → all relationships |
| `connect(a, b)` | find — discover paths | `connect("Subscription", "Event")` → path |
| `search_graph(query)` | Graph search | `search_graph("payment")` → fuzzy matches |
| `search_files(query)` | Flat keyword search | `search_files("idempotency")` → file list |
| `read(file_path)` | cat file — doc content | `read("charges/create-charge.md")` → markdown |
Tool Calling Loop:
User question
↓
LLM sees tools + question
↓
LLM calls tools in sequence (e.g., navigate → connect → read)
↓
Collect results, record traversal log
↓
Send results back to LLM
↓
LLM reasons and composes answer
↓
Return: {answer, traversal_log, sources}
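The loop's bookkeeping can be sketched without a live LLM by fixing the tool plan in advance; the stub tool bodies and the `run_agent` helper are hypothetical:

```python
# Stub tool implementations standing in for the real graph/filesystem tools.
TOOLS = {
    "navigate": lambda concept: f"{concept}: 3 relationships found",
    "read": lambda file_path: f"(contents of {file_path})",
}

def run_agent(tool_calls):
    """Execute a sequence of tool calls, recording the traversal log.
    In the real loop the LLM chooses each call; here the plan is fixed."""
    log = []
    for name, args in tool_calls:
        result = TOOLS[name](**args)
        log.append({"tool": name, "args": args, "result_summary": result})
    return log

log = run_agent([
    ("navigate", {"concept": "Subscription"}),
    ("read", {"file_path": "subscriptions/create-subscription.md"}),
])
# log mirrors the traversal_log shape shown below
```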
Traversal Log Example:
{
"answer": "When a Subscription cycles, it generates an Invoice...",
"traversal_log": [
{"tool": "navigate", "args": {"concept": "Subscription"}, "result_summary": "3 relationships found"},
{"tool": "connect", "args": {"source": "Subscription", "target": "Event"}, "result_summary": "Subscription → Invoice → Charge → Event"},
{"tool": "read", "args": {"file_path": "subscriptions/create-subscription.md"}, "result_summary": "read file (1200 chars)"}
],
"sources": ["subscriptions/create-subscription.md", "invoices/pay-invoice.md"]
}

SystemFS projects all heterogeneous context sources (documentation, knowledge graph, memory, external APIs) into a unified hierarchical directory structure. This lets agents navigate, query, and persist knowledge using standard file operations (read, write, list, search, exec) across disparate backends.
Key design principle: Adapters (resolvers) translate complex external data structures into standard VFS nodes without modifying underlying sources.
Each resolver implements a pluggable adapter that maps a data source into the VFS hierarchy:
class BaseResolver(ABC):
name: str # resolver identifier
readonly: bool # read-only or writable
read(path: str) → VFSResult # fetch node content
write(path, content, metadata) # persist (if writable)
list(path: str) → VFSResult # list children
search(query, path, max_results) # keyword search
exec(path, args) → VFSResult # special operations

Mounted Resolvers:
| Mount | Resolver | Backend | Writable | Purpose |
|---|---|---|---|---|
| `/docs/` | DocsResolver | `kgraph.filesystem.FileSystem` | No | Markdown documentation |
| `/graph/` | GraphResolver | ConceptGraph + GraphQuery | No | Knowledge graph nodes/edges/stats |
| `/context/memory/` | MemoryResolver | JSON files (fact/episodic/procedural) | Yes | Persistent memory entries |
| `/modules/` | ModuleResolver | Pluggable handlers | No | External API mounts |
Resolver dispatch: Longest-prefix match determines which resolver handles a path. E.g., /docs/charges/create.md → DocsResolver, /graph/nodes/Charge → GraphResolver.
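Longest-prefix dispatch is a small loop; a sketch, with string names standing in for resolver instances:

```python
def resolve(mounts, path):
    """Longest-prefix match: the most specific mount wins."""
    best = None
    for prefix in mounts:
        if path.startswith(prefix) and (best is None or len(prefix) > len(best)):
            best = prefix
    return mounts[best] if best else None

mounts = {"/docs/": "DocsResolver",
          "/graph/": "GraphResolver",
          "/context/memory/": "MemoryResolver"}
resolve(mounts, "/docs/charges/create.md")   # → "DocsResolver"
resolve(mounts, "/context/memory/fact/x")    # → "MemoryResolver"
```

Longest-prefix (rather than first-match) ordering is what allows nested mounts like `/context/` and `/context/memory/` to coexist.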
Addressing the statelessness problem of LLMs via three-layer storage built into the filesystem:
Append-only JSON-lines log of all interactions and reasoning steps.
- Format: One file per session: `data/history/{date}_{session_id}.jsonl`
- Entry model: `HistoryEntry(timestamp, event_type, actor, path, data, session_id)`
- Events logged: `read`, `write`, `search`, `context_injection`, `tool_call`, state transitions
- Query API: `query_history(session_id=None, event_type=None, limit=100)` — filtered retrieval
Use case: Audit trail, session replay, reasoning reconstruction.
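The append-only log and filtered retrieval can be sketched with stdlib `json`, shown here against an in-memory buffer; `log_event` is a hypothetical helper, not the layer's actual API:

```python
import io
import json
from datetime import datetime, timezone

def log_event(fp, event_type, actor, path, data, session_id):
    """Append one history entry as a JSON line (append-only; never rewritten)."""
    entry = {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "event_type": event_type, "actor": actor,
        "path": path, "data": data, "session_id": session_id,
    }
    fp.write(json.dumps(entry) + "\n")

def query_history(fp, event_type=None, limit=100):
    """Filtered retrieval: scan the log, keep matching entries."""
    entries = [json.loads(line) for line in fp if line.strip()]
    if event_type:
        entries = [e for e in entries if e["event_type"] == event_type]
    return entries[-limit:]

buf = io.StringIO()
log_event(buf, "read", "agent", "/docs/webhooks.md", {}, "s1")
log_event(buf, "tool_call", "agent", "/graph/nodes/Charge", {}, "s1")
buf.seek(0)
reads = query_history(buf, event_type="read")
# → one entry, for /docs/webhooks.md
```

One JSON object per line keeps writes atomic-ish and makes session replay a simple linear scan.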
Indexed structured knowledge in three categories.
Structure:
data/memory/
├── fact/ # Verified facts (high confidence)
├── episodic/ # Interaction memories (events, outcomes)
└── procedural/ # Procedural knowledge (how-to patterns)
Entry model: MemoryEntry(key, content, memory_type, confidence, source_paths, access_count, tags)
Write API: resolver.write("/context/memory/fact/key", content, metadata={confidence, source_paths, tags})
Query API: resolver.list("/context/memory/fact/"), search(query, "/context/memory/")
Use case: Long-term learning, fact persistence, procedural replay.
Ephemeral workspace for intermediate computations during active reasoning.
- Writable directory
- Automatically cleared between sessions
- Used by agents for drafting, exploration, temporary results
Solves the token-window constraint via dynamic context selection and injection:
Queries the VFS, scores artifacts by composite metrics, selects within a token budget.
constructor = ContextConstructor(vfs, max_tokens=8000)
# Build context manifest for a query
manifest = constructor.build_context("How do subscriptions work?")
# → ContextManifest(selected_paths=[...], freshness_scores={...},
# similarity_scores={...}, trust_scores={...})
# Materialize as a string for LLM injection
context_str = constructor.materialize(manifest)
# → markdown text with headers, ~8000 tokens

Scoring metrics:
- Freshness: nodes updated recently score higher (decay over 24h)
- Similarity: keyword overlap with query (TF-IDF approximation)
- Trust: provenance confidence; docs default 0.8, memory 0.6
Token estimation: ~4 characters per token (simple heuristic).
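A sketch of budget-constrained selection; the 0.4/0.3/0.3 weights are illustrative assumptions, not the constructor's actual coefficients:

```python
def select_within_budget(candidates, max_tokens=8000):
    """Greedy selection: rank by composite score, add artifacts until
    the ~4-chars-per-token budget is exhausted."""
    def score(c):
        # Illustrative weights; the real constructor's mix may differ.
        return 0.4 * c["similarity"] + 0.3 * c["freshness"] + 0.3 * c["trust"]

    selected, used = [], 0
    for cand in sorted(candidates, key=score, reverse=True):
        tokens = len(cand["content"]) // 4      # ~4 chars/token heuristic
        if used + tokens <= max_tokens:
            selected.append(cand["path"])
            used += tokens
    return selected, used

candidates = [
    {"path": "/docs/subs.md", "content": "x" * 4000,
     "similarity": 0.9, "freshness": 0.5, "trust": 0.8},
    {"path": "/docs/errors.md", "content": "x" * 40000,
     "similarity": 0.2, "freshness": 0.5, "trust": 0.8},
]
paths, used = select_within_budget(candidates, max_tokens=2000)
# → only /docs/subs.md fits; 1000 tokens used
```

Greedy by score is simple and predictable; an oversized low-score artifact is skipped rather than truncated.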
Injects materialized context into LLM message window and logs injections.
updater = ContextUpdater(history_layer)
# Inject context as a system message
messages = updater.inject_context(messages, manifest, context_str)
# Or rebuild context for a new query
messages, manifest = updater.refresh_context(messages, query, constructor)

Behavior:
- Inserts or replaces a system-role context message
- Logs injection as a history event with source metadata
- Maintains coherence across multiple context refreshes
Closes the loop by validating model outputs against VFS sources.
evaluator = ContextEvaluator(vfs)
# Extract claims from output, score confidence against sources
memory_entries = evaluator.evaluate(llm_output, manifest)
# → MemoryEntry objects written to /context/memory/

Workflow:
- Extract factual claims from LLM output (sentence-level)
- Score confidence via keyword overlap against source material
- High confidence (>0.7): write as `fact` entry (verified)
- Low confidence (<0.7): write as `episodic` entry with `needs_review` tag
- Return created `MemoryEntry` objects for auditing
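The overlap-based confidence scoring and routing might look like the following; the word-length cutoff and scoring details are assumptions:

```python
def score_claim(claim, sources):
    """Confidence = fraction of the claim's content words found in sources."""
    words = {w.lower().strip(".,") for w in claim.split() if len(w) > 3}
    if not words:
        return 0.0
    source_text = " ".join(sources).lower()
    return sum(1 for w in words if w in source_text) / len(words)

def route(claim, sources, threshold=0.7):
    """Route a claim to fact or episodic memory based on confidence."""
    conf = score_claim(claim, sources)
    verified = conf > threshold
    return {
        "memory_type": "fact" if verified else "episodic",
        "confidence": conf,
        "tags": [] if verified else ["needs_review"],
    }

entry = route("Subscription generates Invoice each billing cycle",
              ["A Subscription generates an Invoice at the start of each billing cycle."])
# fully grounded claim → fact entry, no review tag
```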
| Tool | Purpose |
|---|---|
| `navigate(concept)` | Get related concepts from graph |
| `inspect(concept)` | Full node details and relationships |
| `connect(source, target)` | Find shortest path |
| `search_graph(query)` | Semantic graph search |
| `search_files(query)` | Flat keyword search |
| `read(file_path)` | Read doc content |
| Tool | Purpose | Example |
|---|---|---|
| `vfs_read(path)` | Read any VFS node | `vfs_read("/graph/nodes/Charge")` → JSON details |
| `vfs_list(path)` | List VFS directory | `vfs_list("/context/memory/fact/")` → fact entries |
| `vfs_search(query, path)` | Search within scope | `vfs_search("payment", "/docs/")` → matching docs |
| `vfs_write(path, content)` | Write to memory/scratchpad | `vfs_write("/context/memory/fact/charge-rule", "...")` |
System prompt now describes the VFS hierarchy, enabling agents to navigate proactively.
`kgraph/schema.py`:
- `NodeType` enum: Document, APIObject, Endpoint, Event, Concept
- `EdgeType` enum: contains, requires, returns, triggers, belongs_to, generates, references
- `NodeModel`, `EdgeModel`, `ExtractionResult` Pydantic classes for type safety
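The schema's shape, sketched with stdlib dataclasses and enums for illustration (the project itself uses Pydantic models):

```python
from dataclasses import dataclass, field
from enum import Enum

class NodeType(str, Enum):
    DOCUMENT = "Document"
    API_OBJECT = "APIObject"
    ENDPOINT = "Endpoint"
    EVENT = "Event"
    CONCEPT = "Concept"

class EdgeType(str, Enum):
    CONTAINS = "contains"
    REQUIRES = "requires"
    RETURNS = "returns"
    TRIGGERS = "triggers"
    BELONGS_TO = "belongs_to"
    GENERATES = "generates"
    REFERENCES = "references"

@dataclass
class NodeModel:
    name: str
    type: NodeType
    description: str = ""
    source_file: str = ""
    properties: dict = field(default_factory=dict)

node = NodeModel(name="Charge", type=NodeType.API_OBJECT)
```

Subclassing `str` in the enums keeps serialized values plain strings, so graph JSON round-trips without custom encoders.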
kgraph/extractor_heuristic.py:
- Regex patterns for common structures
- Parses markdown headings, endpoints, bold refs, backtick events
- Returns ExtractionResult with nodes and edges
kgraph/extractor_llm.py:
- OpenRouter-compatible OpenAI SDK client
- Structured prompt requesting JSON output
- Fallback to heuristic on error
- Configurable model (default: anthropic/claude-sonnet-4-5)
- Node names: `.title()` case (e.g., "charge" → "Charge")
- Document nodes: extracted from markdown filename or H1 heading
- Deduplication: nodes merged by normalized name
- Edge strengthening: repeated relationships increase weight
Graph stored as NetworkX node_link_data() JSON:
{
"directed": true,
"multigraph": false,
"graph": {},
"nodes": [
{"id": "Charge", "type": "APIObject", "description": "...", "source_file": "..."},
...
],
"links": [
{"source": "Charge", "target": "Customer", "edge_type": "requires", "weight": 1.0},
...
]
}

Handles flat document operations:
fs = FileSystem(Path("docs"))
# Recursive file listing
files = fs.list_files() # → [{path, name, title}]
# Keyword search with snippets
results = fs.search("payment method")
# → [{path, title, match_count, snippets: [context_snippets]}]
# Read file content
content = fs.read_file("charges/create-charge.md") # → markdown string

Used by:
- Agent
- Agent `search_files` tool (for flat search fallback)
- Agent `read` tool (to fetch source documents)
Single entry point for all graph, VFS, and context operations:
# Build graph from docs
python run.py build [--no-llm]
# Query graph
python run.py query <concept> # related concepts
python run.py connect <source> <target> # shortest path
python run.py nodes <type> # list by type
python run.py search <query> # keyword search
python run.py stats # graph statistics
# Export
python run.py export # graph JSON to stdout

# Explore the unified hierarchy
python run.py vfs mounts # list mounted resolvers
python run.py vfs list / # browse hierarchy
python run.py vfs list /docs/ # list docs
python run.py vfs list /graph/nodes/ # list graph nodes
python run.py vfs list /context/memory/ # list memory entries
# Read content
python run.py vfs read /docs/authentication.md
python run.py vfs read /graph/nodes/Charge
python run.py vfs read /context/memory/fact/my-key
# Search
python run.py vfs search "payment" # global search
python run.py vfs search "auth" /docs/ # scoped to docs

# History
python run.py context history # recent interactions
python run.py context history --limit 50 # custom limit
# Memory
python run.py context memory # all entries
python run.py context memory fact # verified facts only
python run.py context memory episodic # memories only
python run.py context memory procedural # procedures only

# LLM agent with full VFS/context access
python run.py chat # requires OPENROUTER_API_KEY

86 unit tests covering:
- Node add/get/normalization
- Neighbor queries (in/out/both directions)
- Path finding and subgraph extraction
- Edge strengthening
- Serialization roundtrip
- Heuristic/LLM extraction
- Query interface (related concepts, type listing, fuzzy search)
Core VFS (tests/test_vfs.py):
- Mount/unmount, routing, root listing
- Longest-prefix resolver dispatch
- Read-only enforcement
- Writable resolver support
- Cross-resolver search merging
Sandbox (tests/test_sandbox.py):
- Path normalization, traversal prevention
- Boundary validation, relative path computation
DocsResolver (tests/test_resolver_docs.py):
- Read/list/search on markdown files
- Missing file errors, write rejection
GraphResolver (tests/test_resolver_graph.py):
- Node/edge/stats reads
- Virtual path layout
- exec() operations (find_path, subgraph, related_concepts)
MemoryResolver (tests/test_resolver_memory.py):
- Write/read roundtrip for all memory types
- Persistence across instances
- Access count tracking, search, type filtering
History Layer (tests/test_context_history.py):
- Session startup, JSON-lines format
- Log entry, query by event type/session ID
- Session isolation
Context Constructor (tests/test_context_constructor.py):
- Query ranking by freshness/similarity/trust
- Token budget enforcement
- Hint inclusion, materialization
pytest tests/ -v # all 86 tests
pytest tests/test_vfs.py -v # VFS only
pytest tests/test_*.py -v # specific suite

Included: 18 markdown files modeling a Stripe-like payment API:
- Core objects: Charge, Customer, PaymentMethod, Invoice, Subscription
- Operations: create, retrieve, update, list, attach, detach, pay, cancel
- Events & lifecycle: charge.succeeded, invoice.paid, subscription.deleted, etc.
- Cross-cutting: Authentication, Errors, Pagination, Webhooks
Files are structured with explicit "Relationships" sections to maximize extraction accuracy.
networkx>=3.0 # Graph data structure
openai>=1.0.0 # OpenRouter API client (OpenAI-compatible)
pydantic>=2.0.0 # Type validation
python-dotenv>=1.0.0 # Environment config
pytest>=7.0.0 # Testing
NetworkX over Neo4j:
- Zero infrastructure overhead
- In-memory for fast queries
- Serializes to standard JSON
- Sufficient for this scale (59 nodes, 167 edges)
OpenRouter for LLM:
- OpenAI-compatible API (swap models easily)
- No vendor lock-in
- User's choice of model/provider
Heuristic fallback:
- Works without credentials
- Fast for simple docs
- Graceful degradation
File-system metaphor (tools):
- Intuitive for LLMs: `navigate`, `inspect`, `read`
- Aligns with the arXiv:2512.05470 design
- Reduces cognitive overhead for reasoning
Bidirectional traversal:
- Answers "what points to X?" (incoming edges)
- Answers "what does X point to?" (outgoing edges)
- Undirected pathfinding for discovery
The Virtual File System answers the core challenge from arXiv:2512.05470: how to project all heterogeneous context into a unified, programmable abstraction that doesn't require agents to learn multiple APIs.
Key innovations:
- Unified hierarchy: All context (docs, graph, memory, APIs) at `/` with standard file semantics
- Pluggable adapters: Resolvers add new data sources without touching existing code
- Persistent layers: History for auditability, memory for learning, scratchpad for workspace
- Context engineering: Automatic selection & injection with quality metrics (freshness, similarity, trust)
- Closed feedback loop: Evaluation validates outputs and routes high-confidence findings back to memory
The result is a context-centric agent architecture in which knowledge flows bidirectionally: agents read context, reason, and write findings back to persistent memory.
- ModuleResolver handlers: Integrate external APIs (GitHub, Jira, web search) as mounted directories
- Confidence propagation: Surface extraction confidence through the VFS
- Context refresh heuristics: Automatic re-querying when scratchpad changes
- Memory decay: Temporal weighting in constructor (recent facts score higher)
- Vector embeddings: Semantic similarity for fuzzy entity linking and memory retrieval
- Multi-hop reasoning: Agent plans multi-step journeys through graph and memory
- Human-in-the-loop: Review loop for `needs_review`-tagged outputs before memory commit
- Temporal versioning: Track how concepts evolve across doc versions
- Distributed memory: Shared memory pool across multiple agents
- Collaborative refinement: Agents update each other's memory; consensus scoring
- Knowledge compilation: Compress repeated reasoning patterns into new nodes/edges
- Adaptive extraction: Extraction weights learned from evaluator feedback