Problem
CodeGraph stores its code knowledge graph in SQLite — two flat tables (nodes, edges) with B-tree indexes. This works, but has two inherent limitations:
1. Multi-hop traversal = N rounds of SQL
GraphTraverser.traverseBFS() does application-level BFS: each layer calls getOutgoingEdges(nodeId) → SELECT * FROM edges WHERE source = ?. An N-hop path requires N separate SQL queries plus application-level queue management. SQLite has no native variable-length path operator.
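
The query pattern described above can be sketched in Python against an in-memory SQLite database. This is only an illustration, not CodeGraph's actual traverseBFS(): the edges columns other than source, the sample rows, and the traverse_bfs helper are assumptions made for the example. It counts how many SELECT statements a single traversal issues.

```python
import sqlite3
from collections import deque


def build_demo_db():
    """Create a tiny in-memory edges table (hypothetical schema and data)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE edges (source TEXT, target TEXT, kind TEXT)")
    conn.execute("CREATE INDEX idx_edges_source ON edges(source)")
    conn.executemany(
        "INSERT INTO edges VALUES (?, ?, ?)",
        [
            ("handleRequest", "validate", "calls"),
            ("handleRequest", "dispatch", "calls"),
            ("dispatch", "loadUser", "calls"),
            ("loadUser", "query", "calls"),
        ],
    )
    return conn


def traverse_bfs(conn, start, max_depth):
    """Application-level BFS: one SQL round trip per expanded node."""
    depth = {start: 0}       # node -> hop distance from start
    queue = deque([start])   # the queue lives in the application, not in SQL
    queries = 0
    while queue:
        node = queue.popleft()
        if depth[node] == max_depth:
            continue  # depth limit reached, do not expand further
        rows = conn.execute(
            "SELECT * FROM edges WHERE source = ?", (node,)
        ).fetchall()
        queries += 1
        for _source, target, _kind in rows:
            if target not in depth:
                depth[target] = depth[node] + 1
                queue.append(target)
    return depth, queries


if __name__ == "__main__":
    depth, queries = traverse_bfs(build_demo_db(), "handleRequest", 5)
    print(depth["query"], queries)
```

Even on this five-node graph, reaching a node three hops away takes several separate statements, and the count grows with the number of nodes expanded rather than staying constant.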
2. No graph query language
Questions like "all paths from A to B", "all nodes within 3 hops of X", or "all classes implementing interface Y with their methods" have no concise SQL form. SQLite's recursive CTEs (WITH RECURSIVE) can express bounded reachability, but path enumeration and multi-step typed patterns quickly become verbose and hard to maintain, so today they are answered with multiple queries and application-level assembly. The MCP tool set (search/callers/callees/impact/explore) covers the common cases but cannot expose arbitrary structural queries.
Proposed Solution: NeuG graph database backend
An optional NeuG graph database backend, gated behind codegraph init --backend neug. SQLite remains the default — full backward compatibility, zero breaking changes.
4 key advantages:
- High-performance graph storage: CSR (Compressed Sparse Row) optimized adjacency traversal. NeuG is built on GraphScope Flex, which set the world record on the LDBC SNB Interactive benchmark (the industry's gold standard for graph database performance), achieving 80,000+ QPS using purely declarative Cypher queries.
- Industry-standard Cypher: Complex multi-hop traversals become single declarative queries. Exposed via codegraph cypher CLI and executeCypher() API, enabling users and agents to run arbitrary graph pattern matching.
- Lightweight & embeddable: No external server process. Supports incremental updates, fitting CodeGraph's local-first architecture.
- Extensible via native C++ extensions: Graph algorithms (Connected Components, PageRank, Louvain community detection, etc.) are planned for upcoming NeuG releases, enabling advanced code analysis like community detection and influence ranking.
Cypher query examples
All of the following have been verified to run on NeuG:
-- Find call paths (SQLite requires application-level BFS with N rounds of queries)
MATCH (a:CodeNode {name: 'handleRequest'})-[:CodeEdge*1..5]->(b:CodeNode {name: 'query'})
RETURN a.name, b.name
-- Find all classes implementing an interface and their methods
MATCH (i:CodeNode {name: 'Repository'})<-[:CodeEdge {kind: 'implements'}]-(c:CodeNode)
-[:CodeEdge {kind: 'contains'}]->(m:CodeNode {kind: 'method'})
RETURN c.name, m.name
-- Graph-level statistics
MATCH (n:CodeNode)-[e:CodeEdge]->()
RETURN n.kind, e.kind, count(e) ORDER BY count(e) DESC
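
To make the `*1..5` bound in the first query concrete, the sketch below enumerates the paths such a pattern ranges over, using plain Python on an in-memory edge list. It is not NeuG code: the paths_within helper and the sample edges are hypothetical, and it assumes the openCypher convention that an edge is not repeated within one matched path, which NeuG's implementation may or may not follow exactly.

```python
def paths_within(edges, start, goal, min_hops=1, max_hops=5):
    """Enumerate paths from start to goal that use min_hops..max_hops edges.

    Each edge is used at most once per path (assumed trail semantics).
    """
    adjacency = {}
    for index, (source, target) in enumerate(edges):
        adjacency.setdefault(source, []).append((index, target))

    found = []

    def extend(node, path, used):
        hops = len(path) - 1
        if node == goal and hops >= min_hops:
            found.append(list(path))
        if hops == max_hops:
            return  # upper bound of the hop range reached
        for index, target in adjacency.get(node, []):
            if index not in used:
                used.add(index)
                path.append(target)
                extend(target, path, used)
                path.pop()
                used.discard(index)

    extend(start, [start], set())
    return found


# Hypothetical call graph for illustration only.
DEMO_EDGES = [
    ("handleRequest", "dispatch"),
    ("handleRequest", "validate"),
    ("dispatch", "loadUser"),
    ("validate", "loadUser"),
    ("loadUser", "query"),
]

if __name__ == "__main__":
    for path in paths_within(DEMO_EDGES, "handleRequest", "query"):
        print(" -> ".join(path))
```

With a graph backend, this enumeration is what the engine performs internally for the single MATCH clause, instead of the application driving it.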
We're the NeuG team and are happy to own this integration end-to-end: implementation, tests, and ongoing maintenance. We already have a working branch with 67 integration tests passing and all CLI/MCP functionality verified. Happy to discuss the approach.