
Columnar memtable never flushes — unbounded memory growth on INSERT #38

@hollanf

Description


Every row INSERTed into a columnar collection accumulates in the in-memory memtable indefinitely; nothing ever drains it to disk. A steady write workload eventually OOMs the process.

Summary

ColumnarMemtable exposes should_flush() (returns true at DEFAULT_FLUSH_THRESHOLD = 65_536 rows) and MutationEngine exposes on_memtable_flushed(segment_id) to finalize a flush — but no caller on the write path ever invokes either. The columnar write handler inserts rows in a loop and returns; rows stay resident in ColumnarMemtable::columns indefinitely.

Current code

nodedb/src/data/executor/handlers/columnar_write.rs:75-99 — the insert loop:

for row in &ndb_rows {
    let obj = match row {
        nodedb_types::Value::Object(m) => m,
        _ => continue,
    };
    let values: Vec<Value> = schema
        .columns
        .iter()
        .map(|col| ndb_field_to_value(obj.get(&col.name), &col.column_type))
        .collect();

    match engine.insert(&values) {
        Ok(_) => accepted += 1,
        Err(e) => {
            return self.response_error(task, ErrorCode::Internal {
                detail: format!("columnar insert failed: {e}"),
            });
        }
    }
}
// ↑ No should_flush() / drain_optimized() / on_memtable_flushed() call anywhere.

Memtable + engine APIs that exist but are unused from the write path:

  • nodedb-columnar/src/memtable/mod.rs:85 — pub fn should_flush(&self) -> bool
  • nodedb-columnar/src/memtable/mod.rs:152 — pub fn drain_optimized(&mut self)
  • nodedb-columnar/src/mutation.rs:173 — pub fn on_memtable_flushed(&mut self, new_segment_id: u32)
  • nodedb-columnar/src/mutation.rs:279 — pub fn should_flush(&self) -> bool

Repo-wide grep confirms the only callers of on_memtable_flushed are the columnar crate's own tests (mutation.rs:465, 519); should_flush is called from nodedb-fts and nodedb/src/engine/timeseries/memtable.rs but never for the columnar engine in nodedb/src/**.

Why it's broken

  • Every inserted row stays in ColumnarMemtable::columns (Vec<ColumnData>) forever.
  • PkIndex::upsert also clones the PK bytes (Vec<u8>) into the in-memory index — a second permanent allocation per row.
  • No segment is ever written to disk → WAL is also never checkpoint-truncated because MemtableFlushed records are never emitted → the WAL itself grows without bound alongside the memtable.
  • No backpressure: insert() always returns Ok. A client ingesting at 50 k rows/s at ~100 B/row adds ~5 MB/s to RSS with no ceiling until OOM.

Reproduction

CREATE COLLECTION c (id INT PRIMARY KEY, v TEXT) USING COLUMNAR;
-- Loop INSERT ~100 k rows in a script.
-- Observe: RSS grows monotonically, no *.seg files appear under $DATA_DIR.
-- Process OOMs long before any row hits disk.

Notes

  • Found during a CPU/memory audit sweep of the columnar engine and its wire-up in nodedb/src/data/executor/handlers/.
