Research Question: Does graph-structured retrieval (GraphRAG) improve answer quality over standard vector-based RAG for network engineering knowledge domains?
Retrieval-Augmented Generation (RAG) systems typically chunk documents, embed them as vectors, and retrieve the top-k most similar chunks for a given query. This approach works well for factual lookups within a single document but struggles with:
- Multi-hop reasoning -- questions that require connecting information across multiple documents (e.g., "How does MPLS L3VPN use both BGP and LDP?")
- Structural relationships -- protocol dependencies, device hierarchies, and troubleshooting chains that are implicit in text but explicit in a graph
- Comparison queries -- questions requiring parallel retrieval of related but distinct concepts (e.g., "Compare OSPF and IS-IS")
GraphRAG augments standard retrieval with entity extraction, relationship graphs, community detection, and graph-aware search strategies. This project evaluates whether these additions measurably improve retrieval quality for networking domains.

                 +-------------------+
                 | Knowledge Corpus  |
                 |  (52 documents)   |
                 +---------+---------+
                           |
             +-------------+-------------+
             |                           |
    +--------v--------+         +--------v---------+
    | Standard RAG    |         | Entity Extractor |
    | (Chunk + TF-IDF)|         | (Pattern-based)  |
    +--------+--------+         +--------+---------+
             |                           |
             |                  +--------v---------+
             |                  | Graph Builder    |
             |                  | (networkx)       |
             |                  +--------+---------+
             |                           |
             |                  +--------v---------+
             |                  | Community        |
             |                  | Detection        |
             |                  +--------+---------+
             |                           |
             |                  +--------v---------+
             |                  | GraphRAG         |
             |                  | (4 strategies)   |
             |                  +--------+---------+
             |                           |
             +-------------+-------------+
                           |
                 +---------v---------+
                 | Evaluation Engine |
                 | (31 questions,    |
                 |  3 metrics)       |
                 +-------------------+

52 synthetic documents covering network engineering:
| Category | Count | Topics |
|---|---|---|
| RFC Summaries | 8 | BGP, OSPF, MPLS, VXLAN, IS-IS specifications |
| Vendor Guides | 5 | Cisco IOS, Junos, NX-OS configurations |
| Troubleshooting | 8 | BGP, OSPF, MPLS, VXLAN, MTU, convergence |
| Design Guides | 8 | Data center, WAN, campus, HA, SD-WAN, ZTA |
| Concept Overviews | 23 | QoS, STP, ACLs, DNS, NAT, automation, etc. |
Pattern-based extraction identifies 9 entity types:
- Protocols (BGP, OSPF, MPLS, VXLAN, ...)
- Devices (router, PE-router, VTEP, route-reflector, ...)
- Concepts (ECMP, VRF, traffic-engineering, ...)
- Standards (RFC-4271, RFC-7348, ...)
- Commands (show ip bgp summary, ...)
- Vendors (Cisco, Juniper, ...)
- IP Ranges, Ports, Metrics
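
Pattern-based extraction of this kind can be sketched with a handful of regular expressions. The patterns, the `ENTITY_PATTERNS` table, and the `extract_entities` helper below are hypothetical and cover only four of the nine types; the project's actual pattern set lives in `entities.py` and may look quite different.

```python
import re

# Hypothetical patterns for illustration only; not the project's real pattern set.
ENTITY_PATTERNS = {
    "protocol": re.compile(r"\b(BGP|OSPF|MPLS|VXLAN|IS-IS|LDP)\b"),
    "standard": re.compile(r"\bRFC[- ]?(\d{3,5})\b"),
    "vendor": re.compile(r"\b(Cisco|Juniper)\b"),
    "ip_range": re.compile(r"\b\d{1,3}(?:\.\d{1,3}){3}/\d{1,2}\b"),
}


def extract_entities(text: str) -> dict[str, set[str]]:
    """Return the entities found in `text`, grouped by entity type."""
    found: dict[str, set[str]] = {}
    for entity_type, pattern in ENTITY_PATTERNS.items():
        for match in pattern.finditer(text):
            if entity_type == "standard":
                # Normalize "RFC 4271" and "RFC4271" to the "RFC-4271" form.
                value = f"RFC-{match.group(1)}"
            else:
                value = match.group(0)
            found.setdefault(entity_type, set()).add(value)
    return found


if __name__ == "__main__":
    sample = "BGP (RFC 4271) runs on Cisco routers and advertises 10.0.0.0/8."
    print(extract_entities(sample))
```

Normalizing standards to a single spelling keeps the same RFC from becoming two separate graph nodes.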
Entity relationships are inferred through:
- Domain heuristics -- Known protocol dependencies (BGP depends-on TCP, VXLAN depends-on UDP)
- Device-protocol mapping -- PE-router uses MPLS, VTEP uses VXLAN
- Co-occurrence analysis -- Entities appearing together in 2+ documents
- Troubleshooting links -- MTU troubleshoots VXLAN/MPLS/IPsec
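
As a rough illustration of the co-occurrence rule above, the sketch below counts how many documents each entity pair shares and keeps pairs that meet the 2-document threshold. The `cooccurrence_edges` function and its input shape (document ID mapped to a set of entity names) are assumptions made for this example, not the project's API.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_edges(
    doc_entities: dict[str, set[str]], min_docs: int = 2
) -> dict[tuple[str, str], int]:
    """Return {(entity_a, entity_b): shared_doc_count} for pairs seen in >= min_docs documents."""
    pair_counts: Counter = Counter()
    for entities in doc_entities.values():
        # Sort so each unordered pair has one canonical key.
        for pair in combinations(sorted(entities), 2):
            pair_counts[pair] += 1
    return {pair: count for pair, count in pair_counts.items() if count >= min_docs}


if __name__ == "__main__":
    docs = {
        "doc-1": {"BGP", "TCP"},
        "doc-2": {"BGP", "TCP", "OSPF"},
        "doc-3": {"OSPF", "VXLAN"},
    }
    print(cooccurrence_edges(docs))
```

In this toy input only the BGP/TCP pair appears in two documents, so it is the only edge kept; the shared-document count could serve as an edge weight.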
Standard RAG (baseline):
- Chunk documents into overlapping segments
- Build TF-IDF vectors (scikit-learn, no external APIs)
- Retrieve by cosine similarity
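
The three baseline steps can be illustrated without any dependencies. The project builds its vectors with scikit-learn; the sketch below instead uses only the standard library to show the underlying arithmetic, so its scores will not match the real baseline exactly. The function names, the word-based chunking, and the chunk sizes are all assumptions for illustration.

```python
import math
import re
from collections import Counter


def chunk_words(text: str, size: int = 50, overlap: int = 10) -> list[str]:
    """Split text into windows of `size` words; consecutive windows share `overlap` words."""
    if overlap >= size:
        raise ValueError("overlap must be smaller than size")
    words = text.split()
    if not words:
        return []
    step = size - overlap
    last_start = max(len(words) - overlap, 1)
    return [" ".join(words[i:i + size]) for i in range(0, last_start, step)]


def tokenize(text: str) -> list[str]:
    return re.findall(r"[a-z0-9]+", text.lower())


def tfidf_vectors(chunks: list[str]):
    """Return one sparse {term: weight} vector per chunk, plus the IDF table."""
    counts = [Counter(tokenize(chunk)) for chunk in chunks]
    n = len(chunks)
    doc_freq = Counter(term for c in counts for term in c)
    # Smoothed IDF, so a term present in every chunk still gets a nonzero weight.
    idf = {t: math.log((1 + n) / (1 + df)) + 1.0 for t, df in doc_freq.items()}
    vectors = [{t: tf * idf[t] for t, tf in c.items()} for c in counts]
    return vectors, idf


def cosine(a: dict[str, float], b: dict[str, float]) -> float:
    dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query: str, chunks: list[str], k: int = 3) -> list[tuple[str, float]]:
    """Return the k chunks most similar to the query, best first."""
    vectors, idf = tfidf_vectors(chunks)
    query_counts = Counter(tokenize(query))
    # Query terms never seen in the corpus carry no weight.
    query_vec = {t: tf * idf[t] for t, tf in query_counts.items() if t in idf}
    scored = sorted(((cosine(query_vec, v), i) for i, v in enumerate(vectors)), reverse=True)
    return [(chunks[i], score) for score, i in scored[:k]]
```

Because each chunk is scored independently against the query, nothing in this pipeline links a chunk about BGP to a chunk about LDP unless both happen to share terms with the query, which is the gap the graph strategies below target.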
GraphRAG (4 strategies, merged):
- Entity-centric -- Find query entities in graph, retrieve their 1-hop neighborhood documents
- Community-based -- Match query to detected communities, return community summaries
- Subgraph retrieval -- BFS to depth 2 from query entities, return structural context
- Text fallback -- TF-IDF over full documents for coverage
Results from all four strategies are deduplicated and re-ranked.
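
Two of these steps lend themselves to a short sketch: the depth-limited BFS behind subgraph retrieval, and the final deduplication and re-ranking. The project builds its graph with networkx; the example below uses a plain adjacency dictionary so it runs standalone. The function names, the toy graph edges, and the "keep each document's best score" merge rule are assumptions, since the text does not specify how re-ranking is actually done.

```python
from collections import deque


def bfs_subgraph(
    adjacency: dict[str, set[str]], seeds: list[str], max_depth: int = 2
) -> dict[str, int]:
    """Return {entity: hop distance} for every entity within max_depth hops of a seed."""
    depth = {seed: 0 for seed in seeds if seed in adjacency}
    queue = deque(depth)
    while queue:
        node = queue.popleft()
        if depth[node] == max_depth:
            continue  # Do not expand past the depth limit.
        for neighbor in adjacency.get(node, ()):
            if neighbor not in depth:
                depth[neighbor] = depth[node] + 1
                queue.append(neighbor)
    return depth


def merge_results(*result_lists: list[tuple[str, float]]) -> list[tuple[str, float]]:
    """Deduplicate (doc_id, score) hits across strategies, keeping each document's best score."""
    best: dict[str, float] = {}
    for hits in result_lists:
        for doc_id, score in hits:
            if score > best.get(doc_id, float("-inf")):
                best[doc_id] = score
    return sorted(best.items(), key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    # Toy undirected graph; edges are illustrative, not taken from the real corpus.
    graph = {
        "PE-router": {"MPLS", "BGP"},
        "MPLS": {"PE-router", "LDP"},
        "BGP": {"PE-router", "TCP"},
        "LDP": {"MPLS"},
        "TCP": {"BGP"},
    }
    print(bfs_subgraph(graph, ["MPLS"]))
    print(merge_results([("d1", 0.9), ("d2", 0.4)], [("d2", 0.7), ("d3", 0.1)]))
```

Starting from MPLS, the depth-2 walk reaches BGP through PE-router but stops before TCP, which is three hops away. Taking the maximum score per document assumes the strategies produce comparable scores; a real merge would likely normalize or weight them first.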
31 questions across four query types, each with:
- Ground truth document IDs
- Expected entities
- Expected answer keywords
Metrics:
- Relevance -- Fraction of ground truth documents retrieved
- Completeness -- Fraction of expected answer keywords found in retrieved text
- Accuracy -- Fraction of expected entities found in retrieved text
- Combined -- Weighted average: 0.4 * relevance + 0.3 * completeness + 0.3 * accuracy
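
The metric definitions translate directly into a few small functions. The helper names and the case-insensitive substring matching below are assumptions made for this sketch; only the weights (0.4, 0.3, 0.3) come from the definition above.

```python
def relevance(retrieved_ids: list[str], truth_ids: list[str]) -> float:
    """Fraction of ground truth documents that were retrieved."""
    truth = set(truth_ids)
    if not truth:
        return 0.0
    return len(truth & set(retrieved_ids)) / len(truth)


def fraction_found(expected_terms: list[str], retrieved_text: str) -> float:
    """Fraction of expected terms appearing in the text (case-insensitive substring match).

    Used here for both completeness (keywords) and accuracy (entities).
    """
    if not expected_terms:
        return 0.0
    text = retrieved_text.lower()
    hits = sum(term.lower() in text for term in expected_terms)
    return hits / len(expected_terms)


def combined_score(rel: float, completeness: float, accuracy: float) -> float:
    """Weighted average: 0.4 * relevance + 0.3 * completeness + 0.3 * accuracy."""
    return 0.4 * rel + 0.3 * completeness + 0.3 * accuracy


if __name__ == "__main__":
    rel = relevance(["d1", "d3"], ["d1", "d2"])                        # 1 of 2 documents
    comp = fraction_found(["TCP", "179"], "BGP uses tcp port 179")     # both keywords
    acc = fraction_found(["BGP", "LDP"], "BGP uses tcp port 179")      # 1 of 2 entities
    print(round(combined_score(rel, comp, acc), 2))
```

With half the ground truth documents retrieved, all keywords present, and half the entities present, the combined score is 0.4 * 0.5 + 0.3 * 1.0 + 0.3 * 0.5 = 0.65.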
GraphRAG outperforms Standard RAG across most query types, with the largest gains on multi-hop and comparison queries where graph structure provides cross-document connectivity that pure text similarity misses.
| Query Type | Std RAG Combined | GraphRAG Combined | Advantage |
|---|---|---|---|
| Factual | Moderate | Moderate-High | Small |
| Multi-hop | Low-Moderate | Moderate-High | Large |
| Troubleshooting | Moderate | Moderate-High | Moderate |
| Comparison | Low-Moderate | Moderate | Moderate |
Key findings:
- Multi-hop queries benefit most from graph structure. Entity-centric retrieval follows relationship edges to pull context from multiple documents that standard vector search treats independently.
- Troubleshooting queries gain from troubleshooting relationship edges. The graph explicitly encodes diagnostic relationships (MTU troubleshoots VXLAN, convergence relates to BFD) that text similarity may miss.
- Factual queries show the smallest improvement. Direct lookups are well-served by text similarity; the graph adds marginal value.
- Community detection helps comparison queries. Related protocols cluster into communities, making it easier to retrieve parallel context for "compare X and Y" questions.
Run `graphrag-net evaluate` to see detailed numeric results in your own Python environment.

    python3 -m venv .venv
    source .venv/bin/activate
    pip install -e ".[dev]"

    # Build the knowledge graph and indices
    graphrag-net build

    # Query both systems
    graphrag-net query "What port does BGP use?"
    graphrag-net query "How does MPLS L3VPN use BGP and LDP together?"

    # Run the full evaluation benchmark
    graphrag-net evaluate
    graphrag-net evaluate --json-output

    # Lint
    ruff check src/ tests/

    # Test
    pytest -v

    # Build
    pip install build
    python -m build

    graphrag-network-knowledge/
      src/graphrag_network_knowledge/
        __init__.py            # Package metadata
        corpus.py              # 52 synthetic network documents
        entities.py            # Entity extraction (9 types)
        graph.py               # Knowledge graph construction
        standard_rag.py        # TF-IDF baseline RAG
        graphrag.py            # Graph-enhanced RAG (4 strategies)
        evaluation.py          # 31-question benchmark
        cli.py                 # Click CLI
      tests/
        test_corpus.py         # Corpus integrity tests
        test_entities.py       # Entity extraction tests
        test_graph.py          # Graph construction tests
        test_standard_rag.py   # Standard RAG tests
        test_graphrag.py       # GraphRAG tests
        test_evaluation.py     # Evaluation framework tests
        test_cli.py            # CLI integration tests
      pyproject.toml           # Hatchling build config
      README.md                # This file
      LICENSE                  # MIT
      .github/workflows/ci.yml

MIT License. Copyright 2026 Corey Wade.