feat(timeseries): propagate ILP-inferred schema to catalog for SQL column resolution by emanzx · Pull Request #3 · NodeDB-Lab/nodedb

emanzx · 2026-03-30T07:01:21Z

Summary

Propagate ILP-inferred schema to the catalog: On first ILP ingest, the Data Plane infers a columnar schema (tags → Symbol/VARCHAR, fields → Float64/Int64, timestamp). Previously this schema only existed in the in-memory memtable, invisible to the SQL planner which still saw the default (timestamp, value) fields from CREATE TIMESERIES. Now the inferred schema is returned in the ingest response and the ILP listener updates StoredCollection.fields in the catalog.
Fix Arrow schema for timeseries collections: Omit the synthetic id column (timeseries rows don't have document IDs) and map TIMESTAMP to DataType::Int64 (internal millis representation) instead of Utf8.

Unblocks queries like:

SELECT qtype, COUNT(*) FROM metrics GROUP BY qtype;
SELECT client, AVG(elapsed_ms) FROM metrics GROUP BY client;
SELECT MIN(elapsed_ms), MAX(elapsed_ms) FROM metrics;

Test plan

ILP ingest 1M rows from AdGuard Home DNS query logs at ~180K rows/sec
DESCRIBE collection shows all ILP-inferred columns after first ingest
GROUP BY on tag columns (qtype, client, status) returns correct counts
AVG/MIN/MAX on numeric fields (elapsed_ms) returns correct values
WHERE + GROUP BY works for aggregate queries
Non-timeseries collections unaffected (schema still includes id prefix)
Schema only sent on first ingest batch (no overhead on subsequent writes)
cargo clippy clean, cargo fmt --all applied

…lumn resolution On first ILP ingest the Data Plane infers a columnar schema from the batch (tags → Symbol, fields → Float64/Int64, timestamp). This schema was only stored in the in-memory memtable, invisible to the SQL planner which still saw the default (timestamp, value) fields from CREATE TIMESERIES. Propagate the inferred schema back to the Control Plane via the ingest response payload. The ILP listener updates StoredCollection.fields in the catalog so DataFusion resolves the actual column names and types. Additionally, adjust the Arrow schema builder for timeseries collections: omit the synthetic 'id' column (timeseries rows have no document ID) and map TIMESTAMP to Int64 (internal millis representation).

emanzx mentioned this pull request Mar 30, 2026

Timeseries: WHERE filters on scans not routed through TimeseriesOp::Scan #4

Closed

farhan-syah merged commit 9e37ac2 into NodeDB-Lab:main Mar 30, 2026

farhan-syah mentioned this pull request Apr 2, 2026

fix: WAL catch-up regression + response frame size limit #13

Merged

7 tasks

This was referenced Apr 16, 2026

SQL planner correctness & procedural execution safety (8 sub-items) #47

Closed

HTTP and WebSocket endpoints accept unauthenticated requests (WS RPC / stream poll / stream SSE) (3 sub-items) #55

Closed

This was referenced Apr 17, 2026

Graph DSL handlers — parser correctness, result correctness, tenant-scope leak (4 sub-items) #52

Closed

Graph engine internals — variable-length expansion, label u16 wrap, BFS RPC amplification, non-atomic graph index build (4 sub-items) #53

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat(timeseries): propagate ILP-inferred schema to catalog for SQL column resolution#3

feat(timeseries): propagate ILP-inferred schema to catalog for SQL column resolution#3
farhan-syah merged 1 commit intoNodeDB-Lab:mainfrom
emanzx:main

emanzx commented Mar 30, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

emanzx commented Mar 30, 2026

Summary

Test plan

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants