Unbounded-input DoS hardening — ef_search, TLS handshake, ILP line, SQL expr depth #41

@hollanf

Description

Four distinct code paths accept attacker-controlled input without an upper bound. Each is independently exploitable to either OOM the process or wedge/crash a server thread from a single connection. Grouped as an epic because all share the same root cause (missing resource limit) and a single hardening sweep can cover them.


1. ef_search parameter has no upper bound — single-query OOM

File: nodedb/src/data/executor/handlers/vector_search.rs:354-361, effective_ef:

fn effective_ef(ef_search: usize, top_k: usize) -> usize {
    if ef_search > 0 {
        ef_search.max(top_k)          // ← only floor, no ceiling
    } else {
        top_k.saturating_mul(4).max(64)
    }
}

and the HNSW consumer at nodedb-vector/src/hnsw/search.rs:18-48:

pub fn search(&self, query: &[f32], k: usize, ef: usize) -> Vec<SearchResult> {
    ...
    let ef = ef.max(k);               // ← only floor again
    ...
    let results = search_layer(self, query, current_ep, ef, 0, None);

ef_search propagates from user SQL (SET ef_search = N), from the protocol struct (nodedb-types/src/protocol.rs:391 pub ef_search: Option<u64>), and from the SQL planner (nodedb-sql/src/planner/select.rs:568, 654 sets ef_search: limit * 2) straight into search_layer, which allocates a BinaryHeap of up to ef candidates plus a HashSet<u32> that grows until the heap is drained.

A single authenticated client issuing SET ef_search = 1_000_000_000 causes immediate multi-GB allocation. Also exploitable via a huge LIMIT because planner/select.rs sets ef_search = limit * 2.

Repo-wide grep for MAX_EF / ef.min returns zero matches — no ceiling exists anywhere.
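A minimal hardening sketch for the first checklist item below — the `MAX_EF_SEARCH` name and its value are assumptions here; the real ceiling should come from server config and be enforced again at the protocol boundary:

```rust
/// Hypothetical ceiling; the real value would be a server config knob.
const MAX_EF_SEARCH: usize = 4096;

/// Hardened `effective_ef`: same floor logic as the original,
/// plus a ceiling so user-supplied values (SET ef_search = N, or
/// planner-derived limit * 2) cannot trigger multi-GB heap allocations.
fn effective_ef(ef_search: usize, top_k: usize) -> usize {
    let ef = if ef_search > 0 {
        ef_search.max(top_k)
    } else {
        top_k.saturating_mul(4).max(64)
    };
    // Clamp *after* the floor, so an oversized top_k still yields a
    // bounded (if degraded) search rather than an OOM.
    ef.min(MAX_EF_SEARCH)
}

fn main() {
    assert_eq!(effective_ef(0, 10), 64);
    assert_eq!(effective_ef(1_000_000_000, 10), MAX_EF_SEARCH);
    println!("ok");
}
```

The same `min` clamp belongs in `HnswIndex::search` as defense in depth, since that function is also reachable from non-SQL callers.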


2. TLS handshake has no deadline — slow-loris wedges connection semaphore across native / RESP / ILP listeners

Files:

Representative pattern (native, listener.rs:120-138):

if let Some(ref acceptor) = tls_acceptor {
    let acceptor = acceptor.clone();
    connections.spawn(async move {
        match acceptor.accept(stream).await {   // ← no tokio::time::timeout
            Ok(tls_stream) => { /* session.run() */ }
            Err(e) => { warn!(...); }
        }
        drop(permit);
    });
}

The accept loop acquires a semaphore permit, spawns a task, then awaits the TLS handshake with no deadline. tokio_rustls::TlsAcceptor::accept only makes progress when the client sends data, so a client who opens TCP, sends one byte of ClientHello, and then goes silent pins the permit indefinitely. The session-level idle timeout in native/session.rs:88 only runs after a successful handshake.

N slow clients pin N permits; once the semaphore is drained, every legitimate TLS client is RST'd at accept (listener.rs:102-113 — try_acquire_owned + continue with dropped socket).

All three listeners share the same pattern.
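The async fix is the `tokio::time::timeout(tls_handshake_timeout, acceptor.accept(stream))` wrapper from the checklist. As a self-contained illustration of the deadline semantics, here is a std-only sketch (the `TLS_HANDSHAKE_TIMEOUT` name and value are assumptions) showing a stalled pre-handshake read failing at the deadline instead of blocking forever:

```rust
use std::io::Read;
use std::net::{TcpListener, TcpStream};
use std::time::Duration;

// Hypothetical deadline; in the async listeners the equivalent fix is
// tokio::time::timeout(TLS_HANDSHAKE_TIMEOUT, acceptor.accept(stream)).
const TLS_HANDSHAKE_TIMEOUT: Duration = Duration::from_millis(200);

/// Returns true when a read on `stream` gives up after `deadline`
/// instead of blocking forever on a silent peer.
fn read_times_out(mut stream: TcpStream, deadline: Duration) -> std::io::Result<bool> {
    stream.set_read_timeout(Some(deadline))?;
    let mut buf = [0u8; 1];
    match stream.read(&mut buf) {
        // Platforms report the expiry as either WouldBlock or TimedOut.
        Err(e) => Ok(matches!(
            e.kind(),
            std::io::ErrorKind::WouldBlock | std::io::ErrorKind::TimedOut
        )),
        Ok(_) => Ok(false),
    }
}

fn main() -> std::io::Result<()> {
    // Simulate the slow-loris client: open TCP, send nothing.
    let listener = TcpListener::bind("127.0.0.1:0")?;
    let _client = TcpStream::connect(listener.local_addr()?)?;
    let (server_side, _) = listener.accept()?;
    assert!(read_times_out(server_side, TLS_HANDSHAKE_TIMEOUT)?);
    println!("stalled read hit the deadline instead of hanging");
    Ok(())
}
```

With the timeout in place, a deadline expiry drops the stream and releases the permit, so N slow clients can no longer drain the semaphore.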


3. ILP plaintext listener reads unbounded line length → OOM from one connection

File: nodedb/src/control/server/ilp_listener.rs:154-200

async fn handle_ilp_connection(stream: ConnStream, peer: SocketAddr, state: &SharedState) -> crate::Result<()> {
    ...
    let reader = BufReader::new(stream);
    let mut lines = reader.lines();
    ...
    loop {
        tokio::select! {
            result = lines.next_line() => {
                match result {
                    Ok(Some(line)) => {
                        ...
                        batch.push_str(&line);

tokio::io::AsyncBufReadExt::lines grows the returned String until it hits \n. No maximum length.

ILP is plaintext (port 9009 by default, used by telegraf / vector / InfluxDB clients) and per-tenant quota checks happen after the read. An attacker connects and streams bytes forever without ever sending \n — the String reallocates until OOM. The semaphore permit stays held the entire time; the task never yields to any idle-based cancellation at the line level.

Slow-drip variant (one byte per second) is also effective because there's no per-read deadline.
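A std-only sketch of the capped read — the `MAX_ILP_LINE_LEN` constant is an assumption, and the async version would apply the same cap via `LinesCodec::new_with_max_length` or an equivalent bounded loop over `fill_buf`:

```rust
use std::io::{self, BufRead, ErrorKind, Read};

/// Hypothetical cap; ILP lines from telegraf/vector are typically well
/// under a few KB, so 1 MiB leaves generous headroom.
const MAX_ILP_LINE_LEN: u64 = 1024 * 1024;

/// Like `read_until(b'\n')`, but fails with InvalidData once the line
/// exceeds `max_len` instead of growing the buffer until OOM. On error
/// the oversized tail is still in the stream; the caller should drop
/// the connection rather than resynchronize.
fn read_bounded_line<R: BufRead>(reader: &mut R, max_len: u64) -> io::Result<Option<Vec<u8>>> {
    let mut line = Vec::new();
    // `take` caps how many bytes this read_until may consume
    // (+1 leaves room for the trailing newline itself).
    let n = reader.by_ref().take(max_len + 1).read_until(b'\n', &mut line)?;
    if n == 0 {
        return Ok(None); // clean EOF
    }
    if line.last() == Some(&b'\n') {
        line.pop();
    }
    if line.len() as u64 > max_len {
        return Err(io::Error::new(ErrorKind::InvalidData, "ILP line exceeds maximum length"));
    }
    Ok(Some(line))
}

fn main() {
    let mut ok = io::Cursor::new(b"cpu,host=a usage=0.5\n".to_vec());
    assert_eq!(read_bounded_line(&mut ok, 64).unwrap(), Some(b"cpu,host=a usage=0.5".to_vec()));

    // Attacker stream with no '\n': rejected at the cap, not at OOM.
    let mut oversized = io::Cursor::new(vec![b'a'; 1000]);
    assert!(read_bounded_line(&mut oversized, 64).is_err());
    println!("ok");
}
```

The cap also defeats the slow-drip variant: the buffer can never exceed `max_len + 1` bytes regardless of how slowly they arrive, and a separate per-read deadline handles the wall-clock side.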


4. SQL expression parser + resolver have no recursion depth limit → stack overflow DoS

Files:

Grep for MAX_DEPTH / recursion_limit / depth across nodedb-query/src/expr_parse.rs and nodedb-sql/src/resolver/expr.rs returns zero matches. No depth guard anywhere in the pipeline.

A WHERE ((((...((x))...)))) with tens of thousands of parentheses (or a deeply nested generated-column expression) recurses through parse_expr → parse_or → parse_and → parse_comparison → parse_additive → parse_multiplicative → parse_unary → parse_primary (≈ 8 stack frames per paren pair), stack-overflowing the server thread. On Linux with the default 8 MB stack that's ~10–20 k parens; on macOS non-main threads (512 KB stack) it's ~1–2 k.

A single SQL statement from a single authenticated client crashes the thread (and in some handler paths, the node).

Reproduction:

SELECT ( ( ( ( ... x ... ) ) ) ) FROM t;        -- 10 000 nested parens
-- or:
CREATE TABLE t (x INT GENERATED ALWAYS AS (( … x … )) STORED);
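A minimal sketch of the depth-counter fix on a toy parenthesis grammar — the `MAX_EXPR_DEPTH` value and the error type are assumptions; the real change threads the same counter through parse_expr → parse_or → … → parse_primary and through the resolver:

```rust
/// Hypothetical limit; size it well below the smallest thread stack the
/// server runs on (512 KB on macOS non-main threads).
const MAX_EXPR_DEPTH: usize = 128;

#[derive(Debug, PartialEq)]
enum Expr {
    Ident(String),
    Paren(Box<Expr>),
}

#[derive(Debug, PartialEq)]
enum ParseError {
    TooDeep,
    Unexpected,
}

/// Every recursive call passes `depth + 1` and bails out with a typed
/// error long before the stack can overflow.
fn parse_primary(input: &[u8], pos: &mut usize, depth: usize) -> Result<Expr, ParseError> {
    if depth > MAX_EXPR_DEPTH {
        return Err(ParseError::TooDeep);
    }
    match input.get(*pos) {
        Some(b'(') => {
            *pos += 1;
            let inner = parse_primary(input, pos, depth + 1)?;
            if input.get(*pos) != Some(&b')') {
                return Err(ParseError::Unexpected);
            }
            *pos += 1;
            Ok(Expr::Paren(Box::new(inner)))
        }
        Some(c) if c.is_ascii_alphabetic() => {
            *pos += 1;
            Ok(Expr::Ident((*c as char).to_string()))
        }
        _ => Err(ParseError::Unexpected),
    }
}

fn parse(input: &str) -> Result<Expr, ParseError> {
    parse_primary(input.as_bytes(), &mut 0, 0)
}

fn main() {
    assert!(parse("((x))").is_ok());
    // 10 000 nested parens: rejected with a typed error, no stack overflow.
    let deep = format!("{}x{}", "(".repeat(10_000), ")".repeat(10_000));
    assert_eq!(parse(&deep), Err(ParseError::TooDeep));
    println!("ok");
}
```

The typed error surfaces to the client as a normal SQL error instead of a crashed thread; converting the hottest productions to an explicit-stack loop is the complementary option named in the checklist.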

Checklist

  • 1. Clamp ef_search to a configured max in effective_ef and HnswIndex::search; reject excessive values at the protocol boundary.
  • 2. Wrap acceptor.accept(stream) in tokio::time::timeout(tls_handshake_timeout, …) for all three listeners (native, RESP, ILP). Also consider a pre-handshake read deadline for the plaintext branches.
  • 3. Replace BufReader::lines in ilp_listener.rs with a length-bounded reader (e.g. LinesCodec::new_with_max_length, or manual read_until(b'\n') with a cap).
  • 4. Thread a depth counter through parse_expr, convert_expr, and eval_scope (or convert hot cases to iterative-with-explicit-stack). Return a typed error on exceed.

Notes

  • Found during a CPU/memory + DoS audit sweep. Each item is independently verifiable; checkboxes let PRs close them one-by-one.
