Unbounded-input DoS hardening — ef_search, TLS handshake, ILP line, SQL expr depth #41

@hollanf

Description

Four distinct code paths accept attacker-controlled input without an upper bound. Each is independently exploitable to either OOM the process or wedge/crash a server thread from a single connection. Grouped as an epic because all share the same root cause (missing resource limit) and a single hardening sweep can cover them.


1. ef_search parameter has no upper bound — single-query OOM

File: nodedb/src/data/executor/handlers/vector_search.rs:354-361, effective_ef:

fn effective_ef(ef_search: usize, top_k: usize) -> usize {
    if ef_search > 0 {
        ef_search.max(top_k)          // ← only floor, no ceiling
    } else {
        top_k.saturating_mul(4).max(64)
    }
}

and the HNSW consumer at nodedb-vector/src/hnsw/search.rs:18-48:

pub fn search(&self, query: &[f32], k: usize, ef: usize) -> Vec<SearchResult> {
    ...
    let ef = ef.max(k);               // ← only floor again
    ...
    let results = search_layer(self, query, current_ep, ef, 0, None);

ef_search propagates from user SQL (SET ef_search = N), from the protocol struct (nodedb-types/src/protocol.rs:391 pub ef_search: Option<u64>), and from the SQL planner (nodedb-sql/src/planner/select.rs:568, 654 sets ef_search: limit * 2) straight into search_layer, which allocates a BinaryHeap of up to ef candidates plus a HashSet<u32> that grows until the heap is drained.

A single authenticated client issuing SET ef_search = 1_000_000_000 causes immediate multi-GB allocation. Also exploitable via a huge LIMIT because planner/select.rs sets ef_search = limit * 2.

Repo-wide grep for MAX_EF / ef.min returns zero matches — no ceiling exists anywhere.
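A minimal hardening sketch for the first checklist item below — the `MAX_EF_SEARCH` name and its value are assumptions here; the real ceiling should come from server config and be enforced again at the protocol boundary:

```rust
/// Hypothetical ceiling; the real value would be a server config knob.
const MAX_EF_SEARCH: usize = 4096;

/// Hardened `effective_ef`: same floor logic as the original,
/// plus a ceiling so user-supplied values (SET ef_search = N, or
/// planner-derived limit * 2) cannot trigger multi-GB heap allocations.
fn effective_ef(ef_search: usize, top_k: usize) -> usize {
    let ef = if ef_search > 0 {
        ef_search.max(top_k)
    } else {
        top_k.saturating_mul(4).max(64)
    };
    // Clamp *after* the floor, so an oversized top_k still yields a
    // bounded (if degraded) search rather than an OOM.
    ef.min(MAX_EF_SEARCH)
}

fn main() {
    assert_eq!(effective_ef(0, 10), 64);
    assert_eq!(effective_ef(1_000_000_000, 10), MAX_EF_SEARCH);
    println!("ok");
}
```

The same `min` clamp belongs in `HnswIndex::search` as defense in depth, since that function is also reachable from non-SQL callers.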


2. TLS handshake has no deadline — slow-loris wedges connection semaphore across native / RESP / ILP listeners

Files:

Representative pattern (native, listener.rs:120-138):

if let Some(ref acceptor) = tls_acceptor {
    let acceptor = acceptor.clone();
    connections.spawn(async move {
        match acceptor.accept(stream).await {   // ← no tokio::time::timeout
            Ok(tls_stream) => { /* session.run() */ }
            Err(e) => { warn!(...); }
        }
        drop(permit);
    });
}

The accept loop acquires a semaphore permit, spawns a task, then awaits the TLS handshake with no deadline. tokio_rustls::TlsAcceptor::accept only makes progress when the client sends data, so a client who opens TCP, sends one byte of ClientHello, and then goes silent pins the permit indefinitely. The session-level idle timeout in native/session.rs:88 only runs after a successful handshake.

N slow clients pin N permits; once the semaphore is drained, every legitimate TLS client is RST'd at accept (listener.rs:102-113 — try_acquire_owned + continue with dropped socket).

All three listeners share the same pattern.
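The async fix is the `tokio::time::timeout(tls_handshake_timeout, acceptor.accept(stream))` wrapper from the checklist. As a self-contained illustration of the deadline semantics, here is a std-only sketch (the `TLS_HANDSHAKE_TIMEOUT` name and value are assumptions) showing a stalled pre-handshake read failing at the deadline instead of blocking forever:

```rust
use std::io::Read;
use std::net::{TcpListener, TcpStream};
use std::time::Duration;

// Hypothetical deadline; in the async listeners the equivalent fix is
// tokio::time::timeout(TLS_HANDSHAKE_TIMEOUT, acceptor.accept(stream)).
const TLS_HANDSHAKE_TIMEOUT: Duration = Duration::from_millis(200);

/// Returns true when a read on `stream` gives up after `deadline`
/// instead of blocking forever on a silent peer.
fn read_times_out(mut stream: TcpStream, deadline: Duration) -> std::io::Result<bool> {
    stream.set_read_timeout(Some(deadline))?;
    let mut buf = [0u8; 1];
    match stream.read(&mut buf) {
        // Platforms report the expiry as either WouldBlock or TimedOut.
        Err(e) => Ok(matches!(
            e.kind(),
            std::io::ErrorKind::WouldBlock | std::io::ErrorKind::TimedOut
        )),
        Ok(_) => Ok(false),
    }
}

fn main() -> std::io::Result<()> {
    // Simulate the slow-loris client: open TCP, send nothing.
    let listener = TcpListener::bind("127.0.0.1:0")?;
    let _client = TcpStream::connect(listener.local_addr()?)?;
    let (server_side, _) = listener.accept()?;
    assert!(read_times_out(server_side, TLS_HANDSHAKE_TIMEOUT)?);
    println!("stalled read hit the deadline instead of hanging");
    Ok(())
}
```

With the timeout in place, a deadline expiry drops the stream and releases the permit, so N slow clients can no longer drain the semaphore.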


3. ILP plaintext listener reads unbounded line length → OOM from one connection

File: nodedb/src/control/server/ilp_listener.rs:154-200

async fn handle_ilp_connection(stream: ConnStream, peer: SocketAddr, state: &SharedState) -> crate::Result<()> {
    ...
    let reader = BufReader::new(stream);
    let mut lines = reader.lines();
    ...
    loop {
        tokio::select! {
            result = lines.next_line() => {
                match result {
                    Ok(Some(line)) => {
                        ...
                        batch.push_str(&line);

tokio::io::AsyncBufReadExt::lines grows the returned String until it hits \n. No maximum length.

ILP is plaintext (port 9009 by default, used by telegraf / vector / InfluxDB clients) and per-tenant quota checks happen after the read. An attacker connects and streams bytes forever without ever sending \n — the String reallocates until OOM. The semaphore permit stays held the entire time; the task never yields to any idle-based cancellation at the line level.

Slow-drip variant (one byte per second) is also effective because there's no per-read deadline.
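A std-only sketch of the capped read — the `MAX_ILP_LINE_LEN` constant is an assumption, and the async version would apply the same cap via `LinesCodec::new_with_max_length` or an equivalent bounded loop over `fill_buf`:

```rust
use std::io::{self, BufRead, ErrorKind, Read};

/// Hypothetical cap; ILP lines from telegraf/vector are typically well
/// under a few KB, so 1 MiB leaves generous headroom.
const MAX_ILP_LINE_LEN: u64 = 1024 * 1024;

/// Like `read_until(b'\n')`, but fails with InvalidData once the line
/// exceeds `max_len` instead of growing the buffer until OOM. On error
/// the oversized tail is still in the stream; the caller should drop
/// the connection rather than resynchronize.
fn read_bounded_line<R: BufRead>(reader: &mut R, max_len: u64) -> io::Result<Option<Vec<u8>>> {
    let mut line = Vec::new();
    // `take` caps how many bytes this read_until may consume
    // (+1 leaves room for the trailing newline itself).
    let n = reader.by_ref().take(max_len + 1).read_until(b'\n', &mut line)?;
    if n == 0 {
        return Ok(None); // clean EOF
    }
    if line.last() == Some(&b'\n') {
        line.pop();
    }
    if line.len() as u64 > max_len {
        return Err(io::Error::new(ErrorKind::InvalidData, "ILP line exceeds maximum length"));
    }
    Ok(Some(line))
}

fn main() {
    let mut ok = io::Cursor::new(b"cpu,host=a usage=0.5\n".to_vec());
    assert_eq!(read_bounded_line(&mut ok, 64).unwrap(), Some(b"cpu,host=a usage=0.5".to_vec()));

    // Attacker stream with no '\n': rejected at the cap, not at OOM.
    let mut oversized = io::Cursor::new(vec![b'a'; 1000]);
    assert!(read_bounded_line(&mut oversized, 64).is_err());
    println!("ok");
}
```

The cap also defeats the slow-drip variant: the buffer can never exceed `max_len + 1` bytes regardless of how slowly they arrive, and a separate per-read deadline handles the wall-clock side.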


4. SQL expression parser + resolver have no recursion depth limit → stack overflow DoS

Files:

Grep for MAX_DEPTH / recursion_limit / depth across nodedb-query/src/expr_parse.rs and nodedb-sql/src/resolver/expr.rs returns zero matches. No depth guard anywhere in the pipeline.

A WHERE ((((...((x))...)))) with tens of thousands of parentheses (or a deeply nested generated-column expression) recurses through parse_expr → parse_or → parse_and → parse_comparison → parse_additive → parse_multiplicative → parse_unary → parse_primary (≈ 8 stack frames per paren pair), stack-overflowing the server thread. On Linux with the default 8 MB stack that's ~10–20 k parens; on macOS non-main threads (512 KB stack) it's ~1–2 k.

A single SQL statement from a single authenticated client crashes the thread (and in some handler paths, the node).

Reproduction:

SELECT ( ( ( ( ... x ... ) ) ) ) FROM t;        -- 10 000 nested parens
-- or:
CREATE TABLE t (x INT GENERATED ALWAYS AS (( … x … )) STORED);
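A minimal sketch of the depth-counter fix on a toy parenthesis grammar — the `MAX_EXPR_DEPTH` value and the error type are assumptions; the real change threads the same counter through parse_expr → parse_or → … → parse_primary and through the resolver:

```rust
/// Hypothetical limit; size it well below the smallest thread stack the
/// server runs on (512 KB on macOS non-main threads).
const MAX_EXPR_DEPTH: usize = 128;

#[derive(Debug, PartialEq)]
enum Expr {
    Ident(String),
    Paren(Box<Expr>),
}

#[derive(Debug, PartialEq)]
enum ParseError {
    TooDeep,
    Unexpected,
}

/// Every recursive call passes `depth + 1` and bails out with a typed
/// error long before the stack can overflow.
fn parse_primary(input: &[u8], pos: &mut usize, depth: usize) -> Result<Expr, ParseError> {
    if depth > MAX_EXPR_DEPTH {
        return Err(ParseError::TooDeep);
    }
    match input.get(*pos) {
        Some(b'(') => {
            *pos += 1;
            let inner = parse_primary(input, pos, depth + 1)?;
            if input.get(*pos) != Some(&b')') {
                return Err(ParseError::Unexpected);
            }
            *pos += 1;
            Ok(Expr::Paren(Box::new(inner)))
        }
        Some(c) if c.is_ascii_alphabetic() => {
            *pos += 1;
            Ok(Expr::Ident((*c as char).to_string()))
        }
        _ => Err(ParseError::Unexpected),
    }
}

fn parse(input: &str) -> Result<Expr, ParseError> {
    parse_primary(input.as_bytes(), &mut 0, 0)
}

fn main() {
    assert!(parse("((x))").is_ok());
    // 10 000 nested parens: rejected with a typed error, no stack overflow.
    let deep = format!("{}x{}", "(".repeat(10_000), ")".repeat(10_000));
    assert_eq!(parse(&deep), Err(ParseError::TooDeep));
    println!("ok");
}
```

The typed error surfaces to the client as a normal SQL error instead of a crashed thread; converting the hottest productions to an explicit-stack loop is the complementary option named in the checklist.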

Checklist

  • 1. Clamp ef_search to a configured max in effective_ef and HnswIndex::search; reject excessive values at the protocol boundary.
  • 2. Wrap acceptor.accept(stream) in tokio::time::timeout(tls_handshake_timeout, …) for all three listeners (native, RESP, ILP). Also consider a pre-handshake read deadline for the plaintext branches.
  • 3. Replace BufReader::lines in ilp_listener.rs with a length-bounded reader (e.g. LinesCodec::new_with_max_length, or manual read_until(b'\n') with a cap).
  • 4. Thread a depth counter through parse_expr, convert_expr, and eval_scope (or convert hot cases to iterative-with-explicit-stack). Return a typed error on exceed.

Notes

  • Found during a CPU/memory + DoS audit sweep. Each item is independently verifiable; checkboxes let PRs close them one-by-one.
