Skip to content

feat(timeseries): propagate ILP-inferred schema to catalog for SQL column resolution#3

Merged
farhan-syah merged 1 commit intoNodeDB-Lab:mainfrom
emanzx:main
Mar 30, 2026
Merged

feat(timeseries): propagate ILP-inferred schema to catalog for SQL column resolution#3
farhan-syah merged 1 commit intoNodeDB-Lab:mainfrom
emanzx:main

Conversation

@emanzx
Copy link
Copy Markdown
Contributor

@emanzx emanzx commented Mar 30, 2026

Summary

  • Propagate ILP-inferred schema to the catalog: On first ILP ingest, the Data Plane infers a columnar schema (tags → Symbol/VARCHAR, fields → Float64/Int64, timestamp). Previously this schema only existed in the in-memory memtable, invisible to the SQL planner which still saw the default (timestamp, value) fields from CREATE TIMESERIES. Now the inferred schema is returned in the ingest response and the ILP listener updates StoredCollection.fields in the catalog.

  • Fix Arrow schema for timeseries collections: Omit the synthetic id column (timeseries rows don't have document IDs) and map TIMESTAMP to DataType::Int64 (internal millis representation) instead of Utf8.

Unblocks queries like:

SELECT qtype, COUNT(*) FROM metrics GROUP BY qtype;
SELECT client, AVG(elapsed_ms) FROM metrics GROUP BY client;
SELECT MIN(elapsed_ms), MAX(elapsed_ms) FROM metrics;

Test plan

  • ILP ingest 1M rows from AdGuard Home DNS query logs at ~180K rows/sec
  • DESCRIBE collection shows all ILP-inferred columns after first ingest
  • GROUP BY on tag columns (qtype, client, status) returns correct counts
  • AVG/MIN/MAX on numeric fields (elapsed_ms) returns correct values
  • WHERE + GROUP BY works for aggregate queries
  • Non-timeseries collections unaffected (schema still includes id prefix)
  • Schema only sent on first ingest batch (no overhead on subsequent writes)
  • cargo clippy clean, cargo fmt --all applied

…lumn resolution

On first ILP ingest the Data Plane infers a columnar schema from the
batch (tags → Symbol, fields → Float64/Int64, timestamp). This schema
was only stored in the in-memory memtable, invisible to the SQL
planner which still saw the default (timestamp, value) fields from
CREATE TIMESERIES.

Propagate the inferred schema back to the Control Plane via the ingest
response payload. The ILP listener updates StoredCollection.fields in
the catalog so DataFusion resolves the actual column names and types.

Additionally, adjust the Arrow schema builder for timeseries
collections: omit the synthetic 'id' column (timeseries rows have no
document ID) and map TIMESTAMP to Int64 (internal millis
representation).
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants