GraphRAG over Network Knowledge Base

Research Question: Does graph-structured retrieval (GraphRAG) improve answer quality over standard vector-based RAG for network engineering knowledge domains?

Problem Statement

Retrieval-Augmented Generation (RAG) systems typically chunk documents, embed them as vectors, and retrieve the top-k most similar chunks for a given query. This approach works well for factual lookups within a single document but struggles with:

Multi-hop reasoning -- questions that require connecting information across multiple documents (e.g., "How does MPLS L3VPN use both BGP and LDP?")
Structural relationships -- protocol dependencies, device hierarchies, and troubleshooting chains that are implicit in text but explicit in a graph
Comparison queries -- questions requiring parallel retrieval of related but distinct concepts (e.g., "Compare OSPF and IS-IS")

GraphRAG augments standard retrieval with entity extraction, relationship graphs, community detection, and graph-aware search strategies. This project evaluates whether these additions measurably improve retrieval quality for networking domains.

Architecture

                         +-------------------+
                         |  Knowledge Corpus |
                         |  (52 documents)   |
                         +--------+----------+
                                  |
                    +-------------+-------------+
                    |                           |
           +--------v--------+        +--------v--------+
           | Standard RAG    |        | Entity Extractor |
           | (Chunk + TF-IDF)|        | (Pattern-based)  |
           +--------+--------+        +--------+--------+
                    |                           |
                    |                  +--------v--------+
                    |                  | Graph Builder    |
                    |                  | (networkx)       |
                    |                  +--------+--------+
                    |                           |
                    |                  +--------v--------+
                    |                  | Community        |
                    |                  | Detection        |
                    |                  +--------+--------+
                    |                           |
                    |                  +--------v--------+
                    |                  | GraphRAG         |
                    |                  | (4 strategies)   |
                    |                  +--------+--------+
                    |                           |
                    +-------------+-------------+
                                  |
                         +--------v----------+
                         | Evaluation Engine |
                         | (31 questions,    |
                         |  3 metrics)       |
                         +-------------------+

Methodology

1. Knowledge Corpus

52 synthetic documents covering network engineering topics:

Category	Count	Topics
RFC Summaries	8	BGP, OSPF, MPLS, VXLAN, IS-IS specifications
Vendor Guides	5	Cisco IOS, Junos, NX-OS configurations
Troubleshooting	8	BGP, OSPF, MPLS, VXLAN, MTU, convergence
Design Guides	8	Data center, WAN, campus, HA, SD-WAN, ZTA
Concept Overviews	23	QoS, STP, ACLs, DNS, NAT, automation, etc.

2. Entity Extraction

Pattern-based extraction identifies 9 entity types:

Protocols (BGP, OSPF, MPLS, VXLAN, ...)
Devices (router, PE-router, VTEP, route-reflector, ...)
Concepts (ECMP, VRF, traffic-engineering, ...)
Standards (RFC-4271, RFC-7348, ...)
Commands (show ip bgp summary, ...)
Vendors (Cisco, Juniper, ...)
IP Ranges
Ports
Metrics
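A minimal sketch of pattern-based extraction, assuming one regular expression per entity type. ENTITY_PATTERNS, normalize, and extract_entities are hypothetical names. Only five of the nine types are shown, and the project's actual patterns in entities.py are not reproduced here.

```python
import re

# Hypothetical patterns for 5 of the 9 entity types; the project's real
# patterns in entities.py are not reproduced here.
ENTITY_PATTERNS = {
    "protocol": re.compile(r"\b(BGP|OSPF|IS-IS|MPLS|VXLAN|LDP|BFD|TCP|UDP)\b"),
    # Longer names come first so "PE-router" wins over the bare "router".
    "device": re.compile(r"\b(PE-router|route-reflector|VTEP|router)\b"),
    "standard": re.compile(r"\bRFC[- ]?(\d{3,5})\b"),
    "command": re.compile(r"\b(show ip bgp summary|show ip ospf neighbor)\b"),
    "port": re.compile(r"\b(?:TCP|UDP) port (\d{1,5})\b"),
}


def normalize(entity_type: str, value: str) -> str:
    """Map surface forms to canonical node names, e.g. "4271" -> "RFC-4271"."""
    return f"RFC-{value}" if entity_type == "standard" else value


def extract_entities(text: str) -> dict[str, set[str]]:
    """Return {entity_type: {canonical names}} for every pattern that matches."""
    found: dict[str, set[str]] = {}
    for entity_type, pattern in ENTITY_PATTERNS.items():
        for match in pattern.finditer(text):
            found.setdefault(entity_type, set()).add(normalize(entity_type, match.group(1)))
    return found


text = ("A PE-router peers with a route-reflector over BGP (TCP port 179, RFC 4271); "
        "verify with show ip bgp summary.")
print(extract_entities(text))
```

Normalizing "RFC 4271" to "RFC-4271" matters because extracted names become graph nodes: two spellings of the same standard would otherwise create two disconnected nodes.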

3. Graph Construction

Entity relationships are inferred through:

Domain heuristics -- Known protocol dependencies (BGP depends-on TCP, VXLAN depends-on UDP)
Device-protocol mapping -- PE-router uses MPLS, VTEP uses VXLAN
Co-occurrence analysis -- Entities appearing together in 2+ documents
Troubleshooting links -- MTU troubleshoots VXLAN/MPLS/IPsec
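The four rule families can be sketched as follows. The project builds the graph with networkx. To stay dependency-free, this hypothetical build_graph returns a plain {(source, target): relation} dict instead. The rule tables are illustrative assumptions drawn from the examples above, and the document IDs in the usage example are made up.

```python
from collections import Counter
from itertools import combinations

# Illustrative rule tables (assumed), one per curated heuristic family above.
DEPENDS_ON = {("BGP", "TCP"), ("LDP", "TCP"), ("VXLAN", "UDP")}
USES = {("PE-router", "MPLS"), ("VTEP", "VXLAN")}
TROUBLESHOOTS = {("MTU", "VXLAN"), ("MTU", "MPLS"), ("MTU", "IPsec")}


def build_graph(doc_entities: dict[str, set[str]], min_docs: int = 2) -> dict[tuple[str, str], str]:
    """Return {(source, target): relation}, keeping only entities seen in the corpus."""
    present = set().union(*doc_entities.values())
    edges: dict[tuple[str, str], str] = {}

    # Curated rules: domain heuristics, device-protocol mapping, troubleshooting links.
    for relation, rules in (("depends-on", DEPENDS_ON), ("uses", USES),
                            ("troubleshoots", TROUBLESHOOTS)):
        for source, target in rules:
            if source in present and target in present:
                edges[(source, target)] = relation

    # Co-occurrence: count each entity pair once per document.
    pair_counts = Counter()
    for entities in doc_entities.values():
        pair_counts.update(combinations(sorted(entities), 2))
    for (a, b), count in pair_counts.items():
        already_linked = (a, b) in edges or (b, a) in edges
        if count >= min_docs and not already_linked:
            edges[(a, b)] = "co-occurs"
    return edges


docs = {  # hypothetical document IDs and their extracted entities
    "rfc-bgp": {"BGP", "TCP", "RFC-4271"},
    "ts-bgp": {"BGP", "TCP", "BFD"},
    "design-ha": {"BGP", "BFD", "ECMP"},
    "dc-vxlan": {"VXLAN", "UDP", "VTEP", "MTU"},
    "ts-mtu": {"MTU", "VXLAN", "MPLS"},
}
for edge, relation in sorted(build_graph(docs).items()):
    print(edge, relation)
```

Curated relations take precedence over co-occurrence, so a pair like BGP/TCP keeps its more informative "depends-on" label. With networkx, each entry would become G.add_edge(source, target, relation=relation) on an nx.DiGraph.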

4. Retrieval Strategies

Standard RAG (baseline):

Chunk documents into overlapping segments
Build TF-IDF vectors (scikit-learn, no external APIs)
Retrieve by cosine similarity
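The baseline pipeline can be sketched without scikit-learn. This hypothetical TfidfIndex applies the same smoothed IDF, idf(t) = ln((1 + n) / (1 + df(t))) + 1, and the same L2 normalization that scikit-learn's TfidfVectorizer uses by default. Because the vectors are unit length, cosine similarity reduces to a dot product. The tokenizer, the 40-word chunk size, and the 10-word overlap are assumptions, not the project's settings.

```python
import math
import re
from collections import Counter


def chunk(text: str, size: int = 40, overlap: int = 10) -> list[str]:
    """Split text into windows of `size` words; consecutive windows share `overlap` words.
    Assumes size > overlap."""
    words = text.split()
    step = size - overlap
    return [" ".join(words[i:i + size]) for i in range(0, max(len(words) - overlap, 1), step)]


def tokenize(text: str) -> list[str]:
    """Lowercased alphanumeric tokens; hyphenated terms like link-state stay whole."""
    return re.findall(r"[a-z0-9]+(?:-[a-z0-9]+)*", text.lower())


class TfidfIndex:
    """Hypothetical stand-in for the scikit-learn TF-IDF index."""

    def __init__(self, chunks: list[str]) -> None:
        self.chunks = chunks
        token_lists = [tokenize(c) for c in chunks]
        n = len(chunks)
        doc_freq = Counter(term for tokens in token_lists for term in set(tokens))
        # Smoothed IDF, matching TfidfVectorizer(smooth_idf=True).
        self.idf = {t: math.log((1 + n) / (1 + df)) + 1 for t, df in doc_freq.items()}
        self.vectors = [self._vectorize(tokens) for tokens in token_lists]

    def _vectorize(self, tokens: list[str]) -> dict[str, float]:
        """Raw term counts times IDF, L2-normalized; unseen query terms are dropped."""
        weights = {t: c * self.idf[t] for t, c in Counter(tokens).items() if t in self.idf}
        norm = math.sqrt(sum(w * w for w in weights.values())) or 1.0
        return {t: w / norm for t, w in weights.items()}

    def search(self, query: str, k: int = 3) -> list[tuple[str, float]]:
        """Top-k chunks by cosine similarity (a dot product of unit vectors)."""
        q = self._vectorize(tokenize(query))
        scored = [(sum(w * vec.get(t, 0.0) for t, w in q.items()), i)
                  for i, vec in enumerate(self.vectors)]
        scored.sort(key=lambda pair: (-pair[0], pair[1]))
        return [(self.chunks[i], score) for score, i in scored[:k]]


corpus = [
    "BGP runs over TCP port 179 and exchanges routes between autonomous systems.",
    "OSPF floods link-state advertisements inside an area.",
    "VXLAN encapsulates Ethernet frames in UDP port 4789.",
]
index = TfidfIndex([c for doc in corpus for c in chunk(doc)])
print(index.search("What port does BGP use?", k=1))
```

Both the BGP and VXLAN chunks mention "port", but "BGP" appears in only one chunk and so carries a higher IDF. That pulls the BGP chunk to the top, which is the behavior the baseline relies on for factual lookups.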

GraphRAG (4 strategies, merged):

Entity-centric -- Find query entities in the graph and retrieve documents from their 1-hop neighborhood
Community-based -- Match the query to detected communities and return their community summaries
Subgraph retrieval -- Run BFS to depth 2 from the query entities and return the resulting structural context
Text fallback -- Run TF-IDF over full documents to ensure coverage
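The two graph-traversal strategies differ only in depth and output. This minimal sketch assumes an undirected adjacency dict and an entity-to-document index; the function names and document IDs are hypothetical. Entity-centric retrieval collects documents one hop out, while subgraph retrieval walks two hops and returns the edges as text. The community-based and text-fallback strategies are omitted.

```python
from collections import deque


def neighborhood(graph: dict[str, set[str]], seeds: set[str], max_depth: int) -> dict[str, int]:
    """Multi-source BFS: {entity: hop distance} for every entity within max_depth."""
    dist = {s: 0 for s in seeds if s in graph}
    queue = deque(dist)
    while queue:
        node = queue.popleft()
        if dist[node] == max_depth:
            continue  # do not expand past the depth limit
        for neighbor in graph.get(node, ()):
            if neighbor not in dist:
                dist[neighbor] = dist[node] + 1
                queue.append(neighbor)
    return dist


def entity_centric(graph: dict[str, set[str]], entity_docs: dict[str, set[str]],
                   seeds: set[str]) -> list[str]:
    """Documents mentioning a query entity or any of its 1-hop neighbors."""
    reached = neighborhood(graph, seeds, max_depth=1)
    return sorted({doc for entity in reached for doc in entity_docs.get(entity, ())})


def subgraph_context(graph: dict[str, set[str]], seeds: set[str], depth: int = 2) -> list[str]:
    """Edges among entities within `depth` hops, rendered as text lines.
    a < b keeps one copy of each edge, assuming the adjacency is symmetric."""
    nodes = neighborhood(graph, seeds, depth)
    return sorted({f"{a} -- {b}" for a in nodes for b in graph.get(a, ())
                   if b in nodes and a < b})


graph = {  # hypothetical undirected adjacency
    "L3VPN": {"BGP", "MPLS", "VRF"},
    "BGP": {"L3VPN", "TCP"},
    "MPLS": {"L3VPN", "LDP"},
    "LDP": {"MPLS"},
    "VRF": {"L3VPN"},
    "TCP": {"BGP"},
}
entity_docs = {"L3VPN": {"design-wan"}, "BGP": {"rfc-bgp"}, "MPLS": {"rfc-mpls"},
               "LDP": {"rfc-ldp"}, "VRF": {"concept-vrf"}, "TCP": {"concept-tcp"}}
seeds = {"L3VPN"}
print(entity_centric(graph, entity_docs, seeds))
print(subgraph_context(graph, seeds))
```

When the traversal is seeded from L3VPN alone, LDP sits two hops away (L3VPN -> MPLS -> LDP). Only the depth-2 subgraph reaches it, which is the kind of multi-hop connectivity this strategy is meant to capture.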

Results from all four strategies are deduplicated and re-ranked.
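The README does not say how results are re-ranked. One common option is reciprocal rank fusion (RRF), sketched below as an assumption rather than the project's method. Each document scores sum(1 / (k + rank)) over the strategies that returned it, with k = 60 as in the original RRF formulation, and a document returned by several strategies collapses into a single entry.

```python
from collections import defaultdict


def fuse(ranked_lists: dict[str, list[str]], k: int = 60, top_n: int = 5) -> list[str]:
    """Reciprocal rank fusion: doc score = sum over strategies of 1 / (k + rank)."""
    scores: dict[str, float] = defaultdict(float)
    for results in ranked_lists.values():
        # dict.fromkeys drops repeats within one list while keeping first-seen order.
        for rank, doc_id in enumerate(dict.fromkeys(results), start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Ties are broken by doc ID so the output is deterministic.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))[:top_n]


results = {  # hypothetical per-strategy rankings
    "entity-centric": ["rfc-bgp", "rfc-mpls", "design-wan"],
    "community": ["design-wan", "rfc-mpls"],
    "subgraph": ["rfc-ldp", "rfc-mpls"],
    "text-fallback": ["rfc-bgp", "concept-vrf"],
}
print(fuse(results, top_n=3))
```

Here rfc-mpls ranks first even though no single strategy put it at the top, because three strategies agree on it. Rank-based fusion also sidesteps the problem that TF-IDF scores and graph-traversal results are not on comparable scales.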

5. Evaluation

31 questions across four query types (factual, multi-hop, troubleshooting, comparison), each with:

Ground truth document IDs
Expected entities
Expected answer keywords

Metrics:

Relevance -- Fraction of ground truth documents retrieved
Completeness -- Fraction of expected answer keywords found in retrieved text
Accuracy -- Fraction of expected entities found in retrieved text
Combined -- Weighted average: 0.4 * relevance + 0.3 * completeness + 0.3 * accuracy
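The four metrics can be written directly from their definitions. This sketch assumes case-insensitive substring matching for keywords and entities and scores an empty expectation list as 0.0. The function name score, its signature, and the example question are hypothetical.

```python
def _fraction(hits: int, total: int) -> float:
    """hits / total, treating an empty expectation list as 0.0 (an assumption)."""
    return hits / total if total else 0.0


def score(retrieved_ids: list[str], retrieved_text: str, ground_truth_ids: set[str],
          expected_keywords: list[str], expected_entities: list[str]) -> dict[str, float]:
    """Relevance, completeness, accuracy, and their 0.4/0.3/0.3 weighted combination."""
    text = retrieved_text.lower()
    relevance = _fraction(len(ground_truth_ids & set(retrieved_ids)), len(ground_truth_ids))
    completeness = _fraction(sum(kw.lower() in text for kw in expected_keywords),
                             len(expected_keywords))
    accuracy = _fraction(sum(ent.lower() in text for ent in expected_entities),
                         len(expected_entities))
    combined = 0.4 * relevance + 0.3 * completeness + 0.3 * accuracy
    return {"relevance": relevance, "completeness": completeness,
            "accuracy": accuracy, "combined": combined}


result = score(
    retrieved_ids=["rfc-bgp", "rfc-mpls"],
    retrieved_text="BGP carries VPNv4 routes; LDP distributes labels.",
    ground_truth_ids={"rfc-bgp", "design-wan"},
    expected_keywords=["VPNv4", "label", "route target"],
    expected_entities=["BGP", "LDP", "MPLS"],
)
print(result)
```

In this example, half of the ground-truth documents are retrieved, and two of the three keywords and two of the three entities appear in the retrieved text. The combined score is therefore 0.4(0.5) + 0.3(2/3) + 0.3(2/3) = 0.6. Substring matching is lenient ("label" matches "labels"), which is worth keeping in mind when comparing the two systems.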

Results Summary

GraphRAG outperforms Standard RAG on all four query types. The largest gain is on multi-hop queries, where graph structure provides cross-document connectivity that pure text similarity misses. Troubleshooting and comparison queries show moderate gains, and factual queries show a small one.

Query Type	Standard RAG (combined)	GraphRAG (combined)	GraphRAG advantage
Factual	Moderate	Moderate-High	Small
Multi-hop	Low-Moderate	Moderate-High	Large
Troubleshooting	Moderate	Moderate-High	Moderate
Comparison	Low-Moderate	Moderate	Moderate

Key findings:

Multi-hop queries benefit most from graph structure. Entity-centric retrieval follows relationship edges to pull context from multiple documents that standard vector search treats independently.
Troubleshooting queries benefit from explicit troubleshooting edges. The graph encodes diagnostic relationships (MTU troubleshoots VXLAN, convergence relates to BFD) that text similarity may miss.
Factual queries show the smallest improvement. Direct lookups are well-served by text similarity; the graph adds marginal value.
Community detection helps comparison queries. Related protocols cluster into communities, making it easier to retrieve parallel context for "compare X and Y" questions.

The ratings above are qualitative. Run graphrag-net evaluate to see the detailed numeric results in your own Python environment.

Installation

python3 -m venv .venv
source .venv/bin/activate
pip install -e ".[dev]"

Usage

# Build the knowledge graph and indices
graphrag-net build

# Query both systems
graphrag-net query "What port does BGP use?"
graphrag-net query "How does MPLS L3VPN use BGP and LDP together?"

# Run the full evaluation benchmark
graphrag-net evaluate
graphrag-net evaluate --json-output

Development

# Lint
ruff check src/ tests/

# Test
pytest -v

# Build
pip install build
python -m build

Project Structure

graphrag-network-knowledge/
  src/graphrag_network_knowledge/
    __init__.py          # Package metadata
    corpus.py            # 52 synthetic network documents
    entities.py          # Entity extraction (9 types)
    graph.py             # Knowledge graph construction
    standard_rag.py      # TF-IDF baseline RAG
    graphrag.py          # Graph-enhanced RAG (4 strategies)
    evaluation.py        # 31-question benchmark
    cli.py               # Click CLI
  tests/
    test_corpus.py       # Corpus integrity tests
    test_entities.py     # Entity extraction tests
    test_graph.py        # Graph construction tests
    test_standard_rag.py # Standard RAG tests
    test_graphrag.py     # GraphRAG tests
    test_evaluation.py   # Evaluation framework tests
    test_cli.py          # CLI integration tests
  pyproject.toml         # Hatchling build config
  README.md              # This file
  LICENSE                # MIT
  .github/workflows/ci.yml
