cwccie/graphrag-network-knowledge

GraphRAG over Network Knowledge Base

Research Question: Does graph-structured retrieval (GraphRAG) improve answer quality over standard vector-based RAG for network engineering knowledge domains?

Problem Statement

Retrieval-Augmented Generation (RAG) systems typically chunk documents, embed them as vectors, and retrieve the top-k most similar chunks for a given query. This approach works well for factual lookups within a single document but struggles with:

  • Multi-hop reasoning -- questions that require connecting information across multiple documents (e.g., "How does MPLS L3VPN use both BGP and LDP?")
  • Structural relationships -- protocol dependencies, device hierarchies, and troubleshooting chains that are implicit in text but explicit in a graph
  • Comparison queries -- questions requiring parallel retrieval of related but distinct concepts (e.g., "Compare OSPF and IS-IS")

GraphRAG augments standard retrieval with entity extraction, relationship graphs, community detection, and graph-aware search strategies. This project evaluates whether these additions measurably improve retrieval quality for networking domains.

Architecture

                         +-------------------+
                         |  Knowledge Corpus |
                         |  (52 documents)   |
                         +--------+----------+
                                  |
                    +-------------+-------------+
                    |                           |
           +--------v--------+        +--------v--------+
           | Standard RAG    |        | Entity Extractor |
           | (Chunk + TF-IDF)|        | (Pattern-based)  |
           +--------+--------+        +--------+--------+
                    |                           |
                    |                  +--------v--------+
                    |                  | Graph Builder    |
                    |                  | (networkx)       |
                    |                  +--------+--------+
                    |                           |
                    |                  +--------v--------+
                    |                  | Community        |
                    |                  | Detection        |
                    |                  +--------+--------+
                    |                           |
                    |                  +--------v--------+
                    |                  | GraphRAG         |
                    |                  | (4 strategies)   |
                    |                  +--------+--------+
                    |                           |
                    +-------------+-------------+
                                  |
                         +--------v----------+
                         | Evaluation Engine |
                         | (31 questions,    |
                         |  3 metrics)       |
                         +-------------------+

Methodology

1. Knowledge Corpus

52 synthetic documents covering network engineering:

| Category          | Count | Topics                                        |
|-------------------|-------|-----------------------------------------------|
| RFC Summaries     | 8     | BGP, OSPF, MPLS, VXLAN, IS-IS specifications  |
| Vendor Guides     | 5     | Cisco IOS, Junos, NX-OS configurations        |
| Troubleshooting   | 8     | BGP, OSPF, MPLS, VXLAN, MTU, convergence      |
| Design Guides     | 8     | Data center, WAN, campus, HA, SD-WAN, ZTA     |
| Concept Overviews | 23    | QoS, STP, ACLs, DNS, NAT, automation, etc.    |

2. Entity Extraction

Pattern-based extraction identifies 9 entity types:

  • Protocols (BGP, OSPF, MPLS, VXLAN, ...)
  • Devices (router, PE-router, VTEP, route-reflector, ...)
  • Concepts (ECMP, VRF, traffic-engineering, ...)
  • Standards (RFC-4271, RFC-7348, ...)
  • Commands (show ip bgp summary, ...)
  • Vendors (Cisco, Juniper, ...)
  • IP Ranges, Ports, Metrics
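
As a rough illustration of pattern-based extraction, the sketch below runs a small table of regular expressions over a sentence and groups matches by entity type. The pattern table, the type keys, and the `extract_entities` name are assumptions made for this example; the project's actual patterns live in entities.py and are not reproduced here.

```python
import re
from collections import defaultdict

# Hypothetical pattern table covering a few of the entity types listed above.
PATTERNS = {
    "protocol": re.compile(r"\b(BGP|OSPF|MPLS|VXLAN|IS-IS|LDP|TCP|UDP)\b"),
    "standard": re.compile(r"\bRFC[- ]?(\d{3,5})\b"),
    "vendor": re.compile(r"\b(Cisco|Juniper)\b"),
    "ip_range": re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}/\d{1,2}\b"),
    "port": re.compile(r"\b(?:TCP|UDP) port (\d{1,5})\b"),
}


def extract_entities(text):
    """Return {entity_type: sorted list of distinct matches} for one text."""
    found = defaultdict(set)
    for etype, pattern in PATTERNS.items():
        for match in pattern.finditer(text):
            if etype == "standard":
                # Normalise "RFC 4271" and "RFC4271" to the "RFC-4271" form.
                found[etype].add(f"RFC-{match.group(1)}")
            elif etype == "port":
                found[etype].add(match.group(1))
            else:
                found[etype].add(match.group(0))
    return {etype: sorted(values) for etype, values in found.items()}


sample = "BGP (RFC 4271) runs over TCP port 179. Cisco routers advertise 10.0.0.0/8."
entities = extract_entities(sample)
print(entities)
```

Normalising standards to one spelling keeps "RFC 4271" and "RFC-4271" from becoming two separate graph nodes later on.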

3. Graph Construction

Entity relationships are inferred through:

  • Domain heuristics -- Known protocol dependencies (BGP depends-on TCP, VXLAN depends-on UDP)
  • Device-protocol mapping -- PE-router uses MPLS, VTEP uses VXLAN
  • Co-occurrence analysis -- Entities appearing together in 2+ documents
  • Troubleshooting links -- MTU troubleshoots VXLAN/MPLS/IPsec
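
The inference rules above can be sketched as follows. The project builds its graph with networkx; this example uses plain dictionaries so that it runs without dependencies. The seed edge list, the `build_graph` name, and the "co-occurs-with" relation label are assumptions for illustration.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical seed edges, taken from the examples in the list above.
HEURISTIC_EDGES = [
    ("BGP", "depends-on", "TCP"),
    ("VXLAN", "depends-on", "UDP"),
    ("PE-router", "uses", "MPLS"),
    ("VTEP", "uses", "VXLAN"),
    ("MTU", "troubleshoots", "VXLAN"),
]


def build_graph(doc_entities, min_docs=2):
    """Build an edge map from heuristics plus co-occurrence.

    doc_entities maps a document id to the set of entity names found in it.
    Returns {(source, target): {"relation": str, "docs": [doc ids]}}.
    """
    edges = {
        (src, dst): {"relation": rel, "docs": []}
        for src, rel, dst in HEURISTIC_EDGES
    }

    # Count, for every entity pair, the documents that mention both.
    pair_docs = defaultdict(set)
    for doc_id, entities in doc_entities.items():
        for a, b in combinations(sorted(entities), 2):
            pair_docs[(a, b)].add(doc_id)

    for (a, b), docs in pair_docs.items():
        if len(docs) < min_docs:
            continue  # a single shared document is treated as noise
        if (a, b) in edges:
            edges[(a, b)]["docs"] = sorted(docs)
        elif (b, a) in edges:
            edges[(b, a)]["docs"] = sorted(docs)
        else:
            edges[(a, b)] = {"relation": "co-occurs-with", "docs": sorted(docs)}
    return edges


doc_entities = {
    "d1": {"BGP", "TCP", "OSPF"},
    "d2": {"BGP", "OSPF"},
    "d3": {"BGP", "TCP"},
}
graph_edges = build_graph(doc_entities)
```

In this toy input, BGP and OSPF share two documents and gain a co-occurrence edge, while OSPF and TCP share only one and stay unconnected.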

4. Retrieval Strategies

Standard RAG (baseline):

  • Chunk documents into overlapping segments
  • Build TF-IDF vectors (scikit-learn, no external APIs)
  • Retrieve by cosine similarity
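
A minimal, dependency-free sketch of the baseline pipeline follows. The project itself uses scikit-learn for TF-IDF; this stand-in reimplements the idea in plain Python so the three steps are visible. The chunk sizes, function names, and tokenizer are assumptions for this example, and the smoothed IDF formula is, to the best of my knowledge, the same form scikit-learn applies by default.

```python
import math
import re
from collections import Counter


def chunk_words(text, size=50, overlap=10):
    """Split text into overlapping word windows (sizes are illustrative)."""
    words = text.split()
    if not words:
        return []
    step = size - overlap
    stop = max(len(words) - overlap, 1)
    return [" ".join(words[i:i + size]) for i in range(0, stop, step)]


def tokenize(text):
    return re.findall(r"[a-z0-9]+", text.lower())


def normalize(vec):
    """Scale a sparse vector to unit length so dot product equals cosine."""
    norm = math.sqrt(sum(w * w for w in vec.values()))
    return {t: w / norm for t, w in vec.items()} if norm else vec


def build_index(chunks):
    """Return (idf table, list of unit-length TF-IDF vectors)."""
    counts = [Counter(tokenize(c)) for c in chunks]
    df = Counter(term for c in counts for term in c)
    n = len(counts)
    idf = {t: math.log((1 + n) / (1 + d)) + 1.0 for t, d in df.items()}
    vectors = [normalize({t: tf * idf[t] for t, tf in c.items()}) for c in counts]
    return idf, vectors


def retrieve(query, chunks, idf, vectors, k=3):
    """Rank chunks by cosine similarity to the query; drop zero scores."""
    q_counts = Counter(t for t in tokenize(query) if t in idf)
    q = normalize({t: tf * idf[t] for t, tf in q_counts.items()})
    scored = [
        (sum(w * vec.get(t, 0.0) for t, w in q.items()), i)
        for i, vec in enumerate(vectors)
    ]
    scored.sort(key=lambda s: (-s[0], s[1]))
    return [(chunks[i], round(score, 3)) for score, i in scored[:k] if score > 0]


chunks = [
    "BGP uses TCP port 179 for sessions",
    "OSPF uses link state advertisements",
    "VXLAN encapsulates frames in UDP",
]
idf, vectors = build_index(chunks)
hits = retrieve("What port does BGP use?", chunks, idf, vectors)
```

Because each chunk is scored on its own, a question whose answer spans two chunks from different documents gets no credit for the connection between them. That is the gap the graph strategies try to close.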

GraphRAG (4 strategies, merged):

  1. Entity-centric -- Find query entities in graph, retrieve their 1-hop neighborhood documents
  2. Community-based -- Match query to detected communities, return community summaries
  3. Subgraph retrieval -- BFS to depth 2 from query entities, return structural context
  4. Text fallback -- TF-IDF over full documents for coverage
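
Strategies 1 and 3 both amount to a depth-limited breadth-first search from the entities found in the query: depth 1 gives the 1-hop neighborhood, depth 2 gives the subgraph. The sketch below shows that traversal over a plain adjacency dictionary. The `bfs_subgraph` name and the toy edges are assumptions for illustration, not the project's graph.

```python
from collections import deque


def bfs_subgraph(adjacency, seeds, max_depth=2):
    """Return {node: hop distance} for nodes within max_depth of any seed."""
    depth = {s: 0 for s in seeds if s in adjacency}
    queue = deque(depth)
    while queue:
        node = queue.popleft()
        if depth[node] == max_depth:
            continue  # do not expand past the depth limit
        for neighbor in adjacency.get(node, ()):
            if neighbor not in depth:
                depth[neighbor] = depth[node] + 1
                queue.append(neighbor)
    return depth


# Hypothetical undirected toy graph around an L3VPN question.
adjacency = {
    "MPLS-L3VPN": ["BGP", "MPLS"],
    "BGP": ["MPLS-L3VPN", "TCP"],
    "MPLS": ["MPLS-L3VPN", "LDP"],
    "TCP": ["BGP"],
    "LDP": ["MPLS", "PE-router"],
    "PE-router": ["LDP"],
}

one_hop = bfs_subgraph(adjacency, ["MPLS-L3VPN"], max_depth=1)
two_hop = bfs_subgraph(adjacency, ["MPLS-L3VPN"], max_depth=2)
```

Documents attached to the returned nodes would then be collected as retrieval candidates. The hop distance is kept so that closer nodes can be weighted higher.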

Results from all four strategies are deduplicated and re-ranked.
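
The README does not spell out the re-ranking rule, so the following is only one plausible way to merge: sum each document's per-strategy scores, optionally weighted, and sort. The `merge_results` name, the weighting scheme, and the sample scores are all assumptions.

```python
def merge_results(results_by_strategy, weights=None, top_k=5):
    """Deduplicate hits by document id and rank by summed weighted score.

    results_by_strategy maps a strategy name to a list of (doc_id, score).
    Returns a list of (doc_id, total score, contributing strategies).
    """
    weights = weights or {}
    merged = {}
    for strategy, hits in results_by_strategy.items():
        weight = weights.get(strategy, 1.0)
        for doc_id, score in hits:
            entry = merged.setdefault(doc_id, {"score": 0.0, "strategies": []})
            entry["score"] += weight * score
            entry["strategies"].append(strategy)
    # Highest score first; ties broken by document id for stable output.
    ranked = sorted(merged.items(), key=lambda kv: (-kv[1]["score"], kv[0]))
    return [
        (doc_id, round(entry["score"], 3), entry["strategies"])
        for doc_id, entry in ranked[:top_k]
    ]


results = {
    "entity": [("doc-bgp", 0.9), ("doc-mpls", 0.6)],
    "subgraph": [("doc-mpls", 0.5), ("doc-ldp", 0.4)],
    "text": [("doc-bgp", 0.3)],
}
merged = merge_results(results)
```

Summing rewards documents that several strategies agree on. Taking the maximum score instead would favor a single strong signal.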

5. Evaluation

31 questions across four query types (factual, multi-hop, troubleshooting, comparison), each with:

  • Ground truth document IDs
  • Expected entities
  • Expected answer keywords

Metrics:

  • Relevance -- Fraction of ground truth documents retrieved
  • Completeness -- Fraction of expected answer keywords found in retrieved text
  • Accuracy -- Fraction of expected entities found in retrieved text
  • Combined -- Weighted average: 0.4 * relevance + 0.3 * completeness + 0.3 * accuracy
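
The four metrics can be computed for a single question as sketched below. The weights come from the formula above. The `score_retrieval` name and the use of case-insensitive substring matching for keywords and entities are assumptions made for this example.

```python
def score_retrieval(retrieved_ids, retrieved_text, truth_ids, keywords, entities):
    """Score one question; every metric is a fraction in [0, 1]."""
    text = retrieved_text.lower()

    def fraction(found, total):
        return found / total if total else 0.0

    truth = set(truth_ids)
    relevance = fraction(len(truth & set(retrieved_ids)), len(truth))
    completeness = fraction(sum(k.lower() in text for k in keywords), len(keywords))
    accuracy = fraction(sum(e.lower() in text for e in entities), len(entities))
    combined = 0.4 * relevance + 0.3 * completeness + 0.3 * accuracy
    return {
        "relevance": relevance,
        "completeness": completeness,
        "accuracy": accuracy,
        "combined": combined,
    }


scores = score_retrieval(
    retrieved_ids=["d1", "d3"],
    retrieved_text="BGP runs over TCP port 179",
    truth_ids=["d1", "d2"],
    keywords=["tcp", "179", "keepalive"],
    entities=["BGP", "TCP"],
)
```

Here one of two ground truth documents is retrieved (relevance 0.5), two of three keywords appear (completeness about 0.67), and both entities appear (accuracy 1.0), so the combined score is 0.4 * 0.5 + 0.3 * 0.667 + 0.3 * 1.0, or about 0.7.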

Results Summary

GraphRAG outperforms Standard RAG across most query types, with the largest gains on multi-hop and comparison queries where graph structure provides cross-document connectivity that pure text similarity misses.

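| Query Type      | Std RAG Combined | GraphRAG Combined | Advantage |
|-----------------|------------------|-------------------|-----------|
| Factual         | Moderate         | Moderate-High     | Small     |
| Multi-hop       | Low-Moderate     | Moderate-High     | Large     |
| Troubleshooting | Moderate         | Moderate-High     | Moderate  |
| Comparison      | Low-Moderate     | Moderate          | Moderate  |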

Key findings:

  1. Multi-hop queries benefit most from graph structure. Entity-centric retrieval follows relationship edges to pull context from multiple documents that standard vector search treats independently.

  2. Troubleshooting queries gain from explicit diagnostic edges. The graph encodes diagnostic relationships (MTU troubleshoots VXLAN, convergence relates to BFD) that text similarity may miss.

  3. Factual queries show the smallest improvement. Direct lookups are well-served by text similarity; the graph adds marginal value.

  4. Community detection helps comparison queries. Related protocols cluster into communities, making it easier to retrieve parallel context for "compare X and Y" questions.

The table reports qualitative bands only; run graphrag-net evaluate to generate the detailed results in your own Python environment.

Installation

python3 -m venv .venv
source .venv/bin/activate
pip install -e ".[dev]"

Usage

# Build the knowledge graph and indices
graphrag-net build

# Query both systems
graphrag-net query "What port does BGP use?"
graphrag-net query "How does MPLS L3VPN use BGP and LDP together?"

# Run the full evaluation benchmark
graphrag-net evaluate
graphrag-net evaluate --json-output

Development

# Lint
ruff check src/ tests/

# Test
pytest -v

# Build
pip install build
python -m build

Project Structure

graphrag-network-knowledge/
  src/graphrag_network_knowledge/
    __init__.py          # Package metadata
    corpus.py            # 52 synthetic network documents
    entities.py          # Entity extraction (9 types)
    graph.py             # Knowledge graph construction
    standard_rag.py      # TF-IDF baseline RAG
    graphrag.py          # Graph-enhanced RAG (4 strategies)
    evaluation.py        # 31-question benchmark
    cli.py               # Click CLI
  tests/
    test_corpus.py       # Corpus integrity tests
    test_entities.py     # Entity extraction tests
    test_graph.py        # Graph construction tests
    test_standard_rag.py # Standard RAG tests
    test_graphrag.py     # GraphRAG tests
    test_evaluation.py   # Evaluation framework tests
    test_cli.py          # CLI integration tests
  pyproject.toml         # Hatchling build config
  README.md              # This file
  LICENSE                # MIT
  .github/workflows/ci.yml

License

MIT License. Copyright 2026 Corey Wade.
