Summary
When ingesting large volumes of data into a timeseries collection via ILP, queries only return results from the active memtable. Once data is flushed to TS partitions on disk or exists in WAL segments, it becomes completely invisible to all query operations — SELECT, COUNT(*), WHERE, and GROUP BY all only see the unflushed memtable.
This means at scale, the query engine is blind to 99%+ of ingested data.
Environment
- NodeDB version: v0.0.0-beta.1 (branch `main`, commit `fd8fb8f`)
- OS: Linux 6.17.0 / 8GB RAM / 4 CPU
- Ingest method: ILP over TCP on port 8086
- Collection type: Timeseries (`CREATE TIMESERIES dns_mega`)
Reproduction steps
- Start NodeDB with `NODEDB_PORT_ILP=8086 ./target/release/nodedb`
- Create a timeseries collection: `CREATE TIMESERIES dns_mega`
- Ingest 100M rows via ILP (high-cardinality DNS telemetry data with tags `client_ip`, `qname`, `qtype`, `rcode`, `upstream`, `cached`, `interface`, `interface_ip` and field `elapsed_ms`)
- Wait for memtable flushes to occur (they happen automatically as data accumulates)
- Run: `SELECT COUNT(*) FROM dns_mega`
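For reference, the ingest step can be sketched as a small ILP-over-TCP client. This is a minimal sketch, not the harness actually used for the 100M-row run: it assumes a local NodeDB listening on port 8086, the tag/field names match those listed above, and the generated values are illustrative.

```python
import socket

HOST, PORT = "127.0.0.1", 8086  # ILP over TCP, per the environment above

def ilp_line(i: int, ts_ns: int) -> str:
    # InfluxDB line protocol: measurement,tag_set field_set timestamp
    return (
        f"dns_mega,client_ip=10.0.0.{i % 255},qname=host{i}.example.com,"
        f"qtype=A,rcode=NOERROR,upstream=8.8.8.8,cached=false,"
        f"interface=eth0,interface_ip=192.168.1.1 "
        f"elapsed_ms={i % 100} {ts_ns}"
    )

def ingest(n: int = 1000) -> None:
    # Streams n rows in one batch; scale n up for a real reproduction.
    base_ns = 1_774_916_767_000_000_000
    with socket.create_connection((HOST, PORT)) as sock:
        batch = "\n".join(ilp_line(i, base_ns + i * 1_000_000) for i in range(n))
        sock.sendall((batch + "\n").encode())

# ingest(1000)  # uncomment with NodeDB running locally
```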
Expected behavior
COUNT(*) should return 100,000,000 (or close to it, accounting for any in-flight data).
Actual behavior
COUNT(*) returns 931,455 — which is exactly the number of rows in the active memtable. The other ~99M rows exist on disk but are not queried.
Evidence — data is on disk, just not queried
Flushed TS partitions (18 partitions, 18.8M rows)
$ ls -d ~/.local/share/nodedb/ts/dns_mega/ts-* | wc -l
18
$ # Row counts from partition.meta files:
ts-1774916767625_1774921964441: 1,048,782 rows
ts-1774921967625_1774927398621: 1,049,009 rows
ts-1774927398625_1774932638623: 1,049,040 rows
ts-1774932638625_1774937958623: 1,048,729 rows
...
(18 partitions, each ~1.05M rows)
TOTAL in flushed partitions: 18,880,226 rows
Each partition is ~15MB with proper columnar files (`.col`, `.sym`, `schema.json`, `partition.meta`, `sparse_index.bin`).
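The per-partition tally above can be reproduced with a short script. The exact `partition.meta` format is not documented, so this sketch assumes a JSON object with a `"rows"` field; adjust the parsing to whatever NodeDB actually writes.

```python
import glob
import json
import os

def sum_partition_rows(collection_dir: str) -> int:
    """Sum row counts across ts-* partitions under a collection directory.

    Assumes partition.meta is JSON with a "rows" field (unverified guess).
    """
    total = 0
    pattern = os.path.join(collection_dir, "ts-*", "partition.meta")
    for meta_path in glob.glob(pattern):
        with open(meta_path) as f:
            total += json.load(f)["rows"]
    return total

# Example:
# sum_partition_rows(os.path.expanduser("~/.local/share/nodedb/ts/dns_mega"))
```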
WAL segments (73 segments, ~81M rows)
$ ls ~/.local/share/nodedb/wal/*.seg | wc -l
73
$ ls -lh ~/.local/share/nodedb/wal/*.seg | head -3
-rw-rw-r-- 1 system system 65M ... wal-00000000000000021981.seg
-rw-rw-r-- 1 system system 65M ... wal-00000000000000033197.seg
-rw-rw-r-- 1 system system 65M ... wal-00000000000000040883.seg
73 WAL segments × 65MB each ≈ 4.7GB in WAL. Each segment also has a corresponding `.dwb` file.
The math
| Location | Rows | Size |
|---|---|---|
| Flushed TS partitions | 18,880,226 | 256MB |
| WAL segments | ~81,000,000 | 4.7GB |
| Active memtable | 931,455 | (in memory) |
| Total | ~100,000,000 | ~5GB |
Only the memtable (931,455 rows) is visible to queries.
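The ~81M WAL figure is inferred rather than counted directly: it is the remainder after subtracting flushed-partition and memtable rows from the 100M ingested. A quick arithmetic check:

```python
TOTAL_INGESTED = 100_000_000
FLUSHED_ROWS = 18_880_226   # sum of partition.meta row counts
MEMTABLE_ROWS = 931_455     # what COUNT(*) actually returns

# Rows that must still be sitting in WAL segments only
wal_rows = TOTAL_INGESTED - FLUSHED_ROWS - MEMTABLE_ROWS
print(wal_rows)  # 80188319, i.e. ~80.2M, consistent with the ~81M estimate

# Fraction of ingested data that queries can actually see
visible = MEMTABLE_ROWS / TOTAL_INGESTED
print(f"{visible:.2%}")  # under 1% of the dataset is queryable
```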
Additional observations
WHERE predicates are completely ignored on COUNT
Every WHERE clause returns the same 931,455 count, regardless of the predicate — even logically impossible ones:
SELECT COUNT(*) FROM dns_mega WHERE rcode = 'SERVFAIL' → 931,455
SELECT COUNT(*) FROM dns_mega WHERE rcode = 'NONEXISTENT' → 931,455 -- impossible value
SELECT COUNT(*) FROM dns_mega WHERE elapsed_ms > 99999 → 931,455
SELECT COUNT(*) FROM dns_mega WHERE elapsed_ms < 0 → 931,455 -- no negative values exist
SELECT COUNT(*) FROM dns_mega WHERE qtype = 'A' → 931,455 -- should be ~40% of total
This suggests the timeseries COUNT(*) code path bypasses predicate evaluation entirely and just returns the memtable row count.
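The observed behavior is consistent with a COUNT(*) fast path like the following toy model. This is hypothetical Python, not NodeDB source: it just shows how returning the memtable length directly reproduces the identical-count-for-every-predicate symptom, versus a path that scans all tiers and evaluates the predicate.

```python
from typing import Callable, Optional

Row = dict
Pred = Callable[[Row], bool]

class TimeseriesCollection:
    def __init__(self, memtable: list[Row], partitions: list[list[Row]]):
        self.memtable = memtable        # rows not yet flushed
        self.partitions = partitions    # flushed on-disk partitions

    def count_buggy(self, pred: Optional[Pred] = None) -> int:
        # Suspected fast path: predicate and on-disk data both ignored,
        # so every query returns the same memtable row count.
        return len(self.memtable)

    def count_fixed(self, pred: Optional[Pred] = None) -> int:
        # Correct path: scan memtable + all partitions, apply the predicate.
        pred = pred or (lambda row: True)
        rows = self.memtable + [r for part in self.partitions for r in part]
        return sum(1 for r in rows if pred(r))
```

With `count_buggy`, even `pred = lambda r: False` returns the memtable size, mirroring the 931,455-for-every-WHERE-clause results above.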
GROUP BY works but only on memtable data
GROUP BY queries do function correctly against the memtable — they return proper aggregations with correct distributions. But they only see the ~931K memtable rows, not the full 100M.
WAL segments not reclaimed after TS flush
The 73 WAL segments (4.7GB) persist on disk even though 18 TS partitions have been flushed from them. This appears to be a separate WAL garbage collection issue, but is related — the WAL contains the "missing" ~81M rows that are neither in TS partitions nor in the memtable.
Impact
- Critical for production use: Any dataset that exceeds memtable capacity becomes mostly unqueryable
- The 100M row ingest completed at 684K rows/sec with 0 errors, RSS stayed under 220MB — the ingest path is solid, but the query path doesn't follow the data to disk
- Users will see counts decrease over time as data flushes out of the memtable