Graph DSL handlers — parser correctness, result correctness, tenant-scope leak (4 sub-items)

Four bugs in graph DSL handlers (`pgwire/ddl/graph_ops.rs`, `match_ops.rs`). Two are silent-wrong-result bugs (`GRAPH PATH` returns BFS frontier instead of an actual path; `MATCH` leaks tenant-scoped node IDs to the client). Two are parser correctness bugs (unbounded numeric params, substring-based keyword extraction broken on data containing those keywords).

---

## 1. `GRAPH PATH FROM '<src>' TO '<dst>'` returns BFS discovery set, not a path

**File:** `nodedb/src/control/server/pgwire/ddl/graph_ops.rs:208-257`

```rust
match dispatch_utils::cross_core_bfs(state, tenant_id, vec![src.clone()], label, dir, max_depth).await {
    Ok(resp) => {
        let json_text = ...decode_payload_to_json(&resp.payload);
        if let Ok(nodes) = sonic_rs::from_str::<Vec<String>>(&json_text)
            && nodes.contains(&dst)
        {
            // Path exists — return the discovered nodes as the path.
            let payload = sonic_rs::to_vec(&nodes)...
            return payload_to_query_response(&path_resp.payload);
        }
        // dst not found — empty result.
        ...
    }
}
```

The handler does a `cross_core_bfs` from `src` up to `MAX_DEPTH`, then if `dst` appears anywhere in the discovery set, returns the entire discovery set as the "path". The inline comment at line 237 says "Path exists — return the discovered nodes as the path" — there is no parent-tracking, no path reconstruction, no depth association. On a connected graph with `MAX_DEPTH 10`, callers expecting `[src, hop1, hop2, dst]` get every node reachable within 10 hops. Any client that depends on the returned shape being an actual path (e.g. for visualisation, RAG ranking, hop counting) gets misleading data with no error signal.

**Repro:**
```sql
GRAPH INSERT EDGE FROM 'a' TO 'b' TYPE 'l';
GRAPH INSERT EDGE FROM 'b' TO 'c' TYPE 'l';
GRAPH INSERT EDGE FROM 'a' TO 'x' TYPE 'l';
GRAPH INSERT EDGE FROM 'a' TO 'y' TYPE 'l';
GRAPH PATH FROM 'a' TO 'c' MAX_DEPTH 5 LABEL 'l';
-- Returns: ["a", "b", "c", "x", "y", ...]   (whole BFS frontier)
-- Expected: ["a", "b", "c"]                  (actual shortest path)
```

---

## 2. `MATCH` returns tenant-scoped node IDs (`{tenant_id}:name`) to the client without `unscoped()`

**Files:** `nodedb/src/control/server/pgwire/ddl/match_ops.rs:52-90` + `nodedb/src/data/executor/handlers/graph.rs:22-23, 75-76`

Data Plane writes node names as `scoped_src = scoped_node(tid, src_id)` (handlers/graph.rs:22), prefixing every CSR node name with the numeric tenant id (`"{tid}:{src_id}"`). The MATCH response builder in `match_ops.rs::match_payload_to_response` passes the raw column strings straight to the client:

```rust
let val = row.get(col_name).and_then(|v| v.as_str()).unwrap_or("NULL");
encoder.encode_field(&val.to_string())?;
```

It does not call `unscoped()`. The pgwire client receives `"5:alice"` instead of `"alice"`. Two consequences:

1. **Information leak** — exposes the internal numeric tenant id to clients; combined with cross-tenant comparison this can be used to enumerate tenant numbering.
2. **Cross-core dedup correctness** — the merge relies on the scoped form being consistent. If any code path returns the unscoped form (`edge_store` direct scans vs. CSR traversal — they take different paths), the row dedup treats them as different rows and double-counts.

**Repro:**
```sql
GRAPH INSERT EDGE FROM 'a' TO 'b' TYPE 'l';
MATCH (x)-[:l]->(y) RETURN x, y;
-- Returns "<tenant_id>:a" / "<tenant_id>:b" instead of "a" / "b"
```

---

## 3. Unbounded numeric DSL parameters across every graph handler — single-statement DoS

**Files:** `nodedb/src/control/server/pgwire/ddl/graph_ops.rs:152, 219, 317`, `nodedb/src/control/server/pgwire/ddl/tree_ops.rs:200, 320`, `nodedb/src/control/server/pgwire/ddl/dsl/search_fusion.rs:60-62`, `nodedb/src/engine/graph/algo/params.rs:172-184`

```rust
// graph_ops.rs:152, 219
let depth = extract_number_after(&upper, "DEPTH").unwrap_or(2);
let max_depth = extract_number_after(&upper, "MAX_DEPTH").unwrap_or(10);
// graph_ops.rs:317
max_iterations: extract_number_after(&upper, "ITERATIONS"),
// search_fusion.rs:60-62
let vector_top_k = extract_param(&upper, "VECTOR_TOP_K").unwrap_or(20);
let expansion_depth = extract_param(&upper, "DEPTH").unwrap_or(2);
// params.rs:172-174
pub fn iterations(&self, default: usize) -> usize {
    self.max_iterations.unwrap_or(default).max(1)         // floor only, no ceiling
}
```

`extract_number_after` / `extract_param` return `Option<usize>` with no upper-bound clamp. Values propagate unchanged into:

- `cross_core_bfs(..., max_depth)` — every hop fans out via `broadcast_to_all_cores`
- PageRank `for iter in 1..=max_iter`
- Louvain / Label-Propagation iteration loops
- RAG-FUSION `expansion_depth`

A single statement with `ITERATIONS 1000000000` or `DEPTH 4294967295` saturates the Control Plane handler. `AlgoParams::iterations` has a `.max(1)` floor but no `.min(cap)` ceiling. There is no per-tenant concurrency limit on graph DSL in the reviewed paths. This is the same DoS class as #41 sub-item 1 (ef_search) but for graph operators.

**Repro:**
```sql
GRAPH ALGO PAGERANK ON c ITERATIONS 1000000000 TOLERANCE 1e-30;
GRAPH TRAVERSE FROM 'n' DEPTH 4294967295 LABEL 'l' DIRECTION both;
SEARCH c USING FUSION(ARRAY[0.1] VECTOR_TOP_K 100000 DEPTH 50 TOP 10);
```

---

## 4. `extract_*_after` helpers use plain substring match — node IDs / labels containing keywords parsed wrong; `PROPERTIES { ... }` swallows the entire statement tail

**File:** `nodedb/src/control/server/pgwire/ddl/graph_ops.rs:22-55, 424-491`

```rust
fn extract_properties(upper: &str, original: &str) -> PgWireResult<String> {
    let Some(kw_pos) = upper.find("PROPERTIES") else { return Ok(String::new()); };
    let after = original[kw_pos + "PROPERTIES".len()..].trim_start();
    if after.starts_with('{') {
        let obj_str = after.trim_end_matches(';').trim_end();   // ← whole tail
        match nodedb_sql::parser::object_literal::parse_object_literal(obj_str) {
            ...
        }
    } else {
        Ok(extract_quoted_after(upper, original, "PROPERTIES").unwrap_or_default())
    }
}
```

Two problems:

1. **`PROPERTIES { ... }` extraction grabs whole tail.** Object-literal parsing receives everything from `{` to end of statement minus a trailing `;`. There is no brace-balancing or quote-aware scanning — `PROPERTIES { k: '}' }` or any value containing `}` or `;` is mis-terminated.
2. **All `extract_*_after` helpers use plain `upper.find("KEYWORD")`** — `DEPTH`, `TO`, `FROM`, `TYPE`, `LABEL`, `DIRECTION`, `MAX_DEPTH`, `DAMPING`, `TOLERANCE`, `ITERATIONS`, `RESOLUTION`, `SAMPLE`, `MODE`, `PROPERTIES`. A node id, label, or property string containing the literal keyword (`FROM 'TO'`, label `'DEPTH'`, type `'PROPERTIES'`) short-circuits extraction at the wrong byte position — handlers parse arguments out of user data instead of statement structure.

Affects `insert_edge`, `delete_edge`, `traverse`, `neighbors`, `shortest_path`, `algo`.

**Repro:**
```sql
GRAPH INSERT EDGE FROM 'TO' TO 'FROM' TYPE 'LABEL' PROPERTIES { "x": 1 };
GRAPH INSERT EDGE FROM 'a' TO 'b' TYPE 'l' PROPERTIES { note: '} DEPTH 999' };
GRAPH ALGO PAGERANK ON c ITERATIONS 5 DAMPING 0.85; SELECT 'ITERATIONS 99';
GRAPH TRAVERSE FROM 'node_with_DEPTH_in_name' DEPTH 2;
```

---

## Checklist

- [ ] 1. `GRAPH PATH` should track parents during BFS and reconstruct the actual path; return `[src, …, dst]` in order, not the BFS closure
- [ ] 2. `MATCH` response builder must call `unscoped()` on every node-id column before encoding to client; audit every graph response path for the same
- [ ] 3. Clamp `DEPTH` / `MAX_DEPTH` / `ITERATIONS` / `EXPANSION_DEPTH` / `VECTOR_TOP_K` at configured ceilings before dispatch; reject excess as 22023
- [ ] 4. Replace all `extract_*_after` substring helpers with a tokenising parser that respects quoted strings and `{ … }` brace-balanced regions; or migrate graph DSL to the typed AST module being introduced (`nodedb_sql::ddl_ast`)

All items independently reproducible.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Graph DSL handlers — parser correctness, result correctness, tenant-scope leak (4 sub-items) #52

1. `GRAPH PATH FROM '<src>' TO '<dst>'` returns BFS discovery set, not a path

2. `MATCH` returns tenant-scoped node IDs (`{tenant_id}:name`) to the client without `unscoped()`

3. Unbounded numeric DSL parameters across every graph handler — single-statement DoS

4. `extract_*_after` helpers use plain substring match — node IDs / labels containing keywords parsed wrong; `PROPERTIES { ... }` swallows the entire statement tail

Checklist

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Graph DSL handlers — parser correctness, result correctness, tenant-scope leak (4 sub-items) #52

Description

1. GRAPH PATH FROM '<src>' TO '<dst>' returns BFS discovery set, not a path

2. MATCH returns tenant-scoped node IDs ({tenant_id}:name) to the client without unscoped()

3. Unbounded numeric DSL parameters across every graph handler — single-statement DoS

4. extract_*_after helpers use plain substring match — node IDs / labels containing keywords parsed wrong; PROPERTIES { ... } swallows the entire statement tail

Checklist

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions

1. `GRAPH PATH FROM '<src>' TO '<dst>'` returns BFS discovery set, not a path

2. `MATCH` returns tenant-scoped node IDs (`{tenant_id}:name`) to the client without `unscoped()`

4. `extract_*_after` helpers use plain substring match — node IDs / labels containing keywords parsed wrong; `PROPERTIES { ... }` swallows the entire statement tail