feat: add NeuG graph database as optional backend with native Cypher support #681

@BingqingLyu

Problem

CodeGraph stores its code knowledge graph in SQLite — two flat tables (nodes, edges) with B-tree indexes. This works, but has two inherent limitations:

1. Multi-hop traversal = N rounds of SQL

GraphTraverser.traverseBFS() does application-level BFS: each layer calls getOutgoingEdges(nodeId) → SELECT * FROM edges WHERE source = ?. An N-hop path therefore requires at least N separate SQL queries plus application-level queue management. SQLite has no native variable-length path operator.
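The round-trip cost described above can be sketched with Python's built-in sqlite3 module. This is an illustration only, not CodeGraph's actual code: the edges(source, target, kind) schema, the function names, and the choice to issue one SELECT per visited node are all assumptions made for the example.

```python
import sqlite3
from collections import deque


def build_demo_db():
    """Create an in-memory database with a tiny call graph (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE edges (source TEXT, target TEXT, kind TEXT)")
    conn.execute("CREATE INDEX idx_edges_source ON edges (source)")
    conn.executemany(
        "INSERT INTO edges VALUES (?, ?, ?)",
        [
            ("a", "b", "calls"),
            ("b", "c", "calls"),
            ("c", "d", "calls"),
            ("a", "c", "calls"),
        ],
    )
    return conn


def traverse_bfs(conn, start, max_depth):
    """Application-level BFS: one SELECT per expanded node, queue kept in Python."""
    visited = {start: 0}  # node id -> hop distance from start
    queue = deque([start])
    query_count = 0
    while queue:
        node = queue.popleft()
        depth = visited[node]
        if depth == max_depth:
            continue  # do not expand beyond the hop limit
        rows = conn.execute(
            "SELECT target FROM edges WHERE source = ?", (node,)
        ).fetchall()
        query_count += 1
        for (target,) in rows:
            if target not in visited:
                visited[target] = depth + 1
                queue.append(target)
    return visited, query_count


if __name__ == "__main__":
    conn = build_demo_db()
    reached, queries = traverse_bfs(conn, "a", max_depth=3)
    print(reached, queries)
```

Even on this four-node graph the traversal issues one query per expanded node, so the number of SQL round trips grows with the size of the explored neighborhood rather than staying constant.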

2. No graph query language

Questions like "all paths from A to B", "all nodes within 3 hops of X", or "all classes implementing interface Y with their methods" have no concise single-statement form in SQL. In practice they require multiple queries and application-level assembly (or hand-written recursive CTEs). The MCP tool set (search/callers/callees/impact/explore) covers the common cases but cannot expose arbitrary structural queries.
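As a rough picture of the multi-query assembly mentioned above, the sketch below answers "all classes implementing interface Y with their methods" against SQLite. The nodes(id, name, kind) and edges(source, target, kind) schema, the sample data, and the function names are assumptions for illustration and may not match CodeGraph's real tables.

```python
import sqlite3


def build_demo_db():
    """In-memory database with one interface, one implementing class, and members."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE nodes (id INTEGER PRIMARY KEY, name TEXT, kind TEXT);
        CREATE TABLE edges (source INTEGER, target INTEGER, kind TEXT);
        INSERT INTO nodes VALUES
            (1, 'Repository', 'interface'),
            (2, 'SqlRepository', 'class'),
            (3, 'save', 'method'),
            (4, 'load', 'method'),
            (5, 'MAX_ROWS', 'constant');
        INSERT INTO edges VALUES
            (2, 1, 'implements'),
            (2, 3, 'contains'),
            (2, 4, 'contains'),
            (2, 5, 'contains');
        """
    )
    return conn


def implementers_with_methods(conn, interface_name):
    """Assemble {class name: [method names]} from several separate queries."""
    result = {}
    # Round 1: classes with an 'implements' edge pointing at the interface.
    classes = conn.execute(
        "SELECT c.id, c.name FROM edges e "
        "JOIN nodes c ON c.id = e.source "
        "JOIN nodes i ON i.id = e.target "
        "WHERE e.kind = 'implements' AND i.name = ?",
        (interface_name,),
    ).fetchall()
    # Following rounds: one more query per class to collect its methods.
    for class_id, class_name in classes:
        methods = conn.execute(
            "SELECT m.name FROM edges e "
            "JOIN nodes m ON m.id = e.target "
            "WHERE e.source = ? AND e.kind = 'contains' AND m.kind = 'method' "
            "ORDER BY m.name",
            (class_id,),
        ).fetchall()
        result[class_name] = [name for (name,) in methods]
    return result


if __name__ == "__main__":
    print(implementers_with_methods(build_demo_db(), "Repository"))
```

The grouping of methods under each class happens in application code here, which is the assembly step a graph pattern query would express declaratively.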


Proposed Solution: NeuG graph database backend

Add an optional NeuG graph database backend, enabled with codegraph init --backend neug. SQLite remains the default, so the change is fully backward compatible with zero breaking changes.

4 key advantages:

  1. High-performance graph storage — CSR (Compressed Sparse Row) optimized adjacency traversal. NeuG is built on GraphScope Flex, which set the world record on the LDBC SNB Interactive benchmark — the industry's gold standard for graph database performance — achieving 80,000+ QPS using purely declarative Cypher queries.

  2. Industry-standard Cypher — Complex multi-hop traversals become single declarative queries. Exposed via codegraph cypher CLI and executeCypher() API, enabling users and agents to run arbitrary graph pattern matching.

  3. Lightweight & embeddable — No external server process. Supports incremental updates, fitting CodeGraph's local-first architecture.

  4. Extensible via native C++ extensions — Graph algorithms (Connected Components, PageRank, Louvain community detection, etc.) are planned for upcoming NeuG releases, enabling advanced code analysis like community detection and influence ranking.
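To make the CSR layout named in item 1 concrete, here is a generic sketch of Compressed Sparse Row adjacency storage. It illustrates the textbook data structure only and says nothing about how NeuG implements its storage internally; the function names and the integer node ids are assumptions for the example.

```python
def build_csr(num_nodes, edges):
    """Build CSR arrays (offsets, targets) from a list of (source, target) pairs.

    Node ids are assumed to be integers in the range [0, num_nodes).
    """
    offsets = [0] * (num_nodes + 1)
    # Count outgoing edges per source node, shifted by one slot.
    for source, _ in edges:
        offsets[source + 1] += 1
    # Prefix sum turns per-node counts into start positions.
    for i in range(num_nodes):
        offsets[i + 1] += offsets[i]
    targets = [0] * len(edges)
    cursor = offsets[:-1].copy()  # next free slot for each source node
    for source, target in edges:
        targets[cursor[source]] = target
        cursor[source] += 1
    return offsets, targets


def neighbors(offsets, targets, node):
    """Outgoing neighbors of `node`, read as one contiguous slice."""
    return targets[offsets[node]:offsets[node + 1]]


if __name__ == "__main__":
    demo_edges = [(0, 1), (0, 2), (1, 2), (2, 3)]
    offsets, targets = build_csr(4, demo_edges)
    print(offsets, targets, neighbors(offsets, targets, 0))
```

Because each node's neighbors sit in one contiguous slice, expanding a node is an array read rather than an index lookup per edge, which is the property that makes CSR attractive for traversal-heavy workloads.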

Cypher query examples

All verified running on NeuG:

// Find call paths (SQLite requires application-level BFS with N rounds of queries)
MATCH (a:CodeNode {name: 'handleRequest'})-[:CodeEdge*1..5]->(b:CodeNode {name: 'query'})
RETURN a.name, b.name

// Find all classes implementing an interface and their methods
MATCH (i:CodeNode {name: 'Repository'})<-[:CodeEdge {kind: 'implements'}]-(c:CodeNode)
      -[:CodeEdge {kind: 'contains'}]->(m:CodeNode {kind: 'method'})
RETURN c.name, m.name

// Graph-level statistics
MATCH (n:CodeNode)-[e:CodeEdge]->()
RETURN n.kind, e.kind, count(e) ORDER BY count(e) DESC

We're the NeuG team and happy to own this integration end-to-end — implementation, tests, and ongoing maintenance. We already have a working branch with 67 integration tests passing and all CLI/MCP functionality verified. Happy to discuss the approach.
