Four bugs in graph DSL handlers (pgwire/ddl/graph_ops.rs, match_ops.rs). Two are silent-wrong-result bugs (GRAPH PATH returns BFS frontier instead of an actual path; MATCH leaks tenant-scoped node IDs to the client). Two are parser correctness bugs (unbounded numeric params, substring-based keyword extraction broken on data containing those keywords).
1. GRAPH PATH FROM '<src>' TO '<dst>' returns BFS discovery set, not a path
File: nodedb/src/control/server/pgwire/ddl/graph_ops.rs:208-257
match dispatch_utils::cross_core_bfs(state, tenant_id, vec![src.clone()], label, dir, max_depth).await {
Ok(resp) => {
let json_text = ...decode_payload_to_json(&resp.payload);
if let Ok(nodes) = sonic_rs::from_str::<Vec<String>>(&json_text)
&& nodes.contains(&dst)
{
// Path exists — return the discovered nodes as the path.
let payload = sonic_rs::to_vec(&nodes)...
return payload_to_query_response(&path_resp.payload);
}
// dst not found — empty result.
...
}
}
The handler does a cross_core_bfs from src up to MAX_DEPTH, then if dst appears anywhere in the discovery set, returns the entire discovery set as the "path". The inline comment at line 237 says "Path exists — return the discovered nodes as the path" — there is no parent-tracking, no path reconstruction, no depth association. On a connected graph with MAX_DEPTH 10, callers expecting [src, hop1, hop2, dst] get every node reachable within 10 hops. Any client that depends on the returned shape being an actual path (e.g. for visualisation, RAG ranking, hop counting) gets misleading data with no error signal.
Repro:
GRAPH INSERT EDGE FROM 'a' TO 'b' TYPE 'l';
GRAPH INSERT EDGE FROM 'b' TO 'c' TYPE 'l';
GRAPH INSERT EDGE FROM 'a' TO 'x' TYPE 'l';
GRAPH INSERT EDGE FROM 'a' TO 'y' TYPE 'l';
GRAPH PATH FROM 'a' TO 'c' MAX_DEPTH 5 LABEL 'l';
-- Returns: ["a", "b", "c", "x", "y", ...] (whole BFS frontier)
-- Expected: ["a", "b", "c"] (actual shortest path)
2. MATCH returns tenant-scoped node IDs ({tenant_id}:name) to the client without unscoped()
Files: nodedb/src/control/server/pgwire/ddl/match_ops.rs:52-90 + nodedb/src/data/executor/handlers/graph.rs:22-23, 75-76
Data Plane writes node names as scoped_src = scoped_node(tid, src_id) (handlers/graph.rs:22), prefixing every CSR node name with the numeric tenant id ("{tid}:{src_id}"). The MATCH response builder in match_ops.rs::match_payload_to_response passes the raw column strings straight to the client:
let val = row.get(col_name).and_then(|v| v.as_str()).unwrap_or("NULL");
encoder.encode_field(&val.to_string())?;
It does not call unscoped(). The pgwire client receives "5:alice" instead of "alice". Two consequences:
- Information leak — exposes the internal numeric tenant id to clients; combined with cross-tenant comparison this can be used to enumerate tenant numbering.
- Cross-core dedup correctness — the merge relies on the scoped form being consistent. If any code path returns the unscoped form (
edge_store direct scans vs. CSR traversal — they take different paths), the row dedup treats them as different rows and double-counts.
Repro:
GRAPH INSERT EDGE FROM 'a' TO 'b' TYPE 'l';
MATCH (x)-[:l]->(y) RETURN x, y;
-- Returns "<tenant_id>:a" / "<tenant_id>:b" instead of "a" / "b"
3. Unbounded numeric DSL parameters across every graph handler — single-statement DoS
Files: nodedb/src/control/server/pgwire/ddl/graph_ops.rs:152, 219, 317, nodedb/src/control/server/pgwire/ddl/tree_ops.rs:200, 320, nodedb/src/control/server/pgwire/ddl/dsl/search_fusion.rs:60-62, nodedb/src/engine/graph/algo/params.rs:172-184
// graph_ops.rs:152, 219
let depth = extract_number_after(&upper, "DEPTH").unwrap_or(2);
let max_depth = extract_number_after(&upper, "MAX_DEPTH").unwrap_or(10);
// graph_ops.rs:317
max_iterations: extract_number_after(&upper, "ITERATIONS"),
// search_fusion.rs:60-62
let vector_top_k = extract_param(&upper, "VECTOR_TOP_K").unwrap_or(20);
let expansion_depth = extract_param(&upper, "DEPTH").unwrap_or(2);
// params.rs:172-174
pub fn iterations(&self, default: usize) -> usize {
self.max_iterations.unwrap_or(default).max(1) // floor only, no ceiling
}
extract_number_after / extract_param return Option<usize> with no upper-bound clamp. Values propagate unchanged into:
cross_core_bfs(..., max_depth) — every hop fans out via broadcast_to_all_cores
- PageRank
for iter in 1..=max_iter
- Louvain / Label-Propagation iteration loops
- RAG-FUSION
expansion_depth
A single statement with ITERATIONS 1000000000 or DEPTH 4294967295 saturates the Control Plane handler. AlgoParams::iterations has a .max(1) floor but no .min(cap) ceiling. There is no per-tenant concurrency limit on graph DSL in the reviewed paths. This is the same DoS class as #41 sub-item 1 (ef_search) but for graph operators.
Repro:
GRAPH ALGO PAGERANK ON c ITERATIONS 1000000000 TOLERANCE 1e-30;
GRAPH TRAVERSE FROM 'n' DEPTH 4294967295 LABEL 'l' DIRECTION both;
SEARCH c USING FUSION(ARRAY[0.1] VECTOR_TOP_K 100000 DEPTH 50 TOP 10);
4. extract_*_after helpers use plain substring match — node IDs / labels containing keywords parsed wrong; PROPERTIES { ... } swallows the entire statement tail
File: nodedb/src/control/server/pgwire/ddl/graph_ops.rs:22-55, 424-491
fn extract_properties(upper: &str, original: &str) -> PgWireResult<String> {
let Some(kw_pos) = upper.find("PROPERTIES") else { return Ok(String::new()); };
let after = original[kw_pos + "PROPERTIES".len()..].trim_start();
if after.starts_with('{') {
let obj_str = after.trim_end_matches(';').trim_end(); // ← whole tail
match nodedb_sql::parser::object_literal::parse_object_literal(obj_str) {
...
}
} else {
Ok(extract_quoted_after(upper, original, "PROPERTIES").unwrap_or_default())
}
}
Two problems:
PROPERTIES { ... } extraction grabs whole tail. Object-literal parsing receives everything from { to end of statement minus a trailing ;. There is no brace-balancing or quote-aware scanning — PROPERTIES { k: '}' } or any value containing } or ; is mis-terminated.
- All
extract_*_after helpers use plain upper.find("KEYWORD") — DEPTH, TO, FROM, TYPE, LABEL, DIRECTION, MAX_DEPTH, DAMPING, TOLERANCE, ITERATIONS, RESOLUTION, SAMPLE, MODE, PROPERTIES. A node id, label, or property string containing the literal keyword (FROM 'TO', label 'DEPTH', type 'PROPERTIES') short-circuits extraction at the wrong byte position — handlers parse arguments out of user data instead of statement structure.
Affects insert_edge, delete_edge, traverse, neighbors, shortest_path, algo.
Repro:
GRAPH INSERT EDGE FROM 'TO' TO 'FROM' TYPE 'LABEL' PROPERTIES { "x": 1 };
GRAPH INSERT EDGE FROM 'a' TO 'b' TYPE 'l' PROPERTIES { note: '} DEPTH 999' };
GRAPH ALGO PAGERANK ON c ITERATIONS 5 DAMPING 0.85; SELECT 'ITERATIONS 99';
GRAPH TRAVERSE FROM 'node_with_DEPTH_in_name' DEPTH 2;
Checklist
All items independently reproducible.
Four bugs in graph DSL handlers (
pgwire/ddl/graph_ops.rs,match_ops.rs). Two are silent-wrong-result bugs (GRAPH PATHreturns BFS frontier instead of an actual path;MATCHleaks tenant-scoped node IDs to the client). Two are parser correctness bugs (unbounded numeric params, substring-based keyword extraction broken on data containing those keywords).1.
GRAPH PATH FROM '<src>' TO '<dst>'returns BFS discovery set, not a pathFile:
nodedb/src/control/server/pgwire/ddl/graph_ops.rs:208-257The handler does a
cross_core_bfsfromsrcup toMAX_DEPTH, then ifdstappears anywhere in the discovery set, returns the entire discovery set as the "path". The inline comment at line 237 says "Path exists — return the discovered nodes as the path" — there is no parent-tracking, no path reconstruction, no depth association. On a connected graph withMAX_DEPTH 10, callers expecting[src, hop1, hop2, dst]get every node reachable within 10 hops. Any client that depends on the returned shape being an actual path (e.g. for visualisation, RAG ranking, hop counting) gets misleading data with no error signal.Repro:
2.
MATCHreturns tenant-scoped node IDs ({tenant_id}:name) to the client withoutunscoped()Files:
nodedb/src/control/server/pgwire/ddl/match_ops.rs:52-90+nodedb/src/data/executor/handlers/graph.rs:22-23, 75-76Data Plane writes node names as
scoped_src = scoped_node(tid, src_id)(handlers/graph.rs:22), prefixing every CSR node name with the numeric tenant id ("{tid}:{src_id}"). The MATCH response builder inmatch_ops.rs::match_payload_to_responsepasses the raw column strings straight to the client:It does not call
unscoped(). The pgwire client receives"5:alice"instead of"alice". Two consequences:edge_storedirect scans vs. CSR traversal — they take different paths), the row dedup treats them as different rows and double-counts.Repro:
3. Unbounded numeric DSL parameters across every graph handler — single-statement DoS
Files:
nodedb/src/control/server/pgwire/ddl/graph_ops.rs:152, 219, 317,nodedb/src/control/server/pgwire/ddl/tree_ops.rs:200, 320,nodedb/src/control/server/pgwire/ddl/dsl/search_fusion.rs:60-62,nodedb/src/engine/graph/algo/params.rs:172-184extract_number_after/extract_paramreturnOption<usize>with no upper-bound clamp. Values propagate unchanged into:cross_core_bfs(..., max_depth)— every hop fans out viabroadcast_to_all_coresfor iter in 1..=max_iterexpansion_depthA single statement with
ITERATIONS 1000000000orDEPTH 4294967295saturates the Control Plane handler.AlgoParams::iterationshas a.max(1)floor but no.min(cap)ceiling. There is no per-tenant concurrency limit on graph DSL in the reviewed paths. This is the same DoS class as #41 sub-item 1 (ef_search) but for graph operators.Repro:
4.
extract_*_afterhelpers use plain substring match — node IDs / labels containing keywords parsed wrong;PROPERTIES { ... }swallows the entire statement tailFile:
nodedb/src/control/server/pgwire/ddl/graph_ops.rs:22-55, 424-491Two problems:
PROPERTIES { ... }extraction grabs whole tail. Object-literal parsing receives everything from{to end of statement minus a trailing;. There is no brace-balancing or quote-aware scanning —PROPERTIES { k: '}' }or any value containing}or;is mis-terminated.extract_*_afterhelpers use plainupper.find("KEYWORD")—DEPTH,TO,FROM,TYPE,LABEL,DIRECTION,MAX_DEPTH,DAMPING,TOLERANCE,ITERATIONS,RESOLUTION,SAMPLE,MODE,PROPERTIES. A node id, label, or property string containing the literal keyword (FROM 'TO', label'DEPTH', type'PROPERTIES') short-circuits extraction at the wrong byte position — handlers parse arguments out of user data instead of statement structure.Affects
insert_edge,delete_edge,traverse,neighbors,shortest_path,algo.Repro:
Checklist
GRAPH PATHshould track parents during BFS and reconstruct the actual path; return[src, …, dst]in order, not the BFS closureMATCHresponse builder must callunscoped()on every node-id column before encoding to client; audit every graph response path for the sameDEPTH/MAX_DEPTH/ITERATIONS/EXPANSION_DEPTH/VECTOR_TOP_Kat configured ceilings before dispatch; reject excess as 22023extract_*_aftersubstring helpers with a tokenising parser that respects quoted strings and{ … }brace-balanced regions; or migrate graph DSL to the typed AST module being introduced (nodedb_sql::ddl_ast)All items independently reproducible.