Problem
SELECT * FROM ts_collection WHERE field = 'value' LIMIT N returns empty results for timeseries collections, while SELECT * FROM ts_collection LIMIT N (no WHERE) and GROUP BY aggregates work correctly.
Root Cause
The PlanConverter::convert() Filter handler in converter.rs does not check is_timeseries() for the Filter(TableScan) case. When DataFusion produces a Filter(predicate, TableScan) plan for a WHERE query, the converter falls through to the document scan path (which finds nothing in the document store since ILP data lives in the columnar memtable).
The bare TableScan arm (line ~343) correctly routes to TimeseriesOp::Scan, and the Aggregate path works because it reads from columnar_memtables directly. Only the Filter(TableScan) path lacks the timeseries check.
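The routing gap described above can be reproduced in a few lines. This is a minimal sketch, not the real converter: `Logical`, `Physical`, `Converter`, `route`, and the `check_in_filter` switch are hypothetical stand-ins, and predicates are plain strings rather than DataFusion expressions.

```rust
// Hypothetical, simplified stand-ins for the planner types (assumptions, not the real API).
#[derive(Debug, PartialEq)]
enum Logical {
    TableScan { collection: String },
    FilterOverScan { predicate: String, collection: String },
}

#[derive(Debug, PartialEq)]
enum Physical {
    DocumentScan { collection: String, filters: Vec<String> },
    TimeseriesScan { collection: String, filters: Vec<String> },
}

struct Converter {
    // Names of collections that are backed by the columnar memtable.
    timeseries: Vec<String>,
}

impl Converter {
    fn is_timeseries(&self, collection: &str) -> bool {
        self.timeseries.iter().any(|c| c == collection)
    }

    // Picks the scan path; the timeseries check only runs when `check` is true.
    fn route(&self, collection: &str, filters: Vec<String>, check: bool) -> Physical {
        if check && self.is_timeseries(collection) {
            Physical::TimeseriesScan { collection: collection.to_string(), filters }
        } else {
            Physical::DocumentScan { collection: collection.to_string(), filters }
        }
    }

    // `check_in_filter = false` models the current behaviour, `true` the proposed fix.
    fn convert(&self, plan: &Logical, check_in_filter: bool) -> Physical {
        match plan {
            // The bare scan arm already performs the timeseries check.
            Logical::TableScan { collection } => self.route(collection, Vec::new(), true),
            // The filter arm is where the check is missing.
            Logical::FilterOverScan { predicate, collection } => {
                self.route(collection, vec![predicate.clone()], check_in_filter)
            }
        }
    }
}
```

With `check_in_filter` set to `false`, a filtered query on a timeseries collection lands on the document scan path, which matches the empty results reported here. Setting it to `true` sends the same plan to the timeseries scan with the predicate attached.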
Affected Queries
-- These return empty:
SELECT * FROM metrics WHERE qtype = 'AAAA' LIMIT 10;
SELECT * FROM metrics WHERE elapsed_ms > 1000 LIMIT 5;
SELECT * FROM metrics WHERE client = '10.11.12.103' LIMIT 3;
-- These work fine:
SELECT * FROM metrics LIMIT 10;
SELECT qtype, COUNT(*) FROM metrics GROUP BY qtype;
SELECT qname, COUNT(*) FROM metrics WHERE elapsed_ms > 5000 GROUP BY qname;
Suggested Fix
Three changes in converter.rs:
1. Filter → TableScan: add timeseries check
In the Filter handler, before the KV routing check (~line 190), add:
    // Timeseries routing: extract time-range and filters.
    if self.is_timeseries(tenant_id, &collection) {
        // Combine the Filter predicate with any filters already attached to the scan.
        let mut all_filters = vec![filter.predicate.clone()];
        all_filters.extend(scan.filters.iter().cloned());
        let (time_range, filter_bytes) =
            super::converter_helpers::extract_timeseries_filters(&all_filters)?;
        // Fall back to 10,000 rows when the scan carries no fetch limit.
        let limit = scan.fetch.unwrap_or(10_000);
        return Ok(vec![PhysicalTask {
            tenant_id,
            vshard_id: vshard,
            plan: PhysicalPlan::Timeseries(TimeseriesOp::Scan {
                collection,
                time_range,
                projection: Vec::new(),
                limit,
                filters: filter_bytes,
                bucket_interval_ms: 0,
                rls_filters: Vec::new(),
            }),
        }]);
    }
2. Limit handler: propagate to TimeseriesOp::Scan
DataFusion wraps the plan as Limit(Filter(TableScan)), so the LIMIT sits on an outer node that the Filter short-circuit never sees. In the Limit handler (~line 541), add a match arm:
PhysicalPlan::Timeseries(TimeseriesOp::Scan { limit, .. }) => *limit = n,
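To show where that arm sits, here is a small self-contained sketch of limit propagation. The enum shapes are trimmed-down assumptions (the real `TimeseriesOp::Scan` has more fields), and `apply_limit` and the `DocumentScan` variant are hypothetical names used only for illustration.

```rust
// Trimmed-down, hypothetical versions of the physical plan types.
#[derive(Debug, PartialEq)]
enum TimeseriesOp {
    Scan { collection: String, limit: usize },
}

#[derive(Debug, PartialEq)]
enum PhysicalPlan {
    DocumentScan { collection: String, limit: usize },
    Timeseries(TimeseriesOp),
}

// Pushes the outer LIMIT value `n` into the already-converted inner plan.
fn apply_limit(plan: &mut PhysicalPlan, n: usize) {
    match plan {
        PhysicalPlan::DocumentScan { limit, .. } => *limit = n,
        // The arm proposed above: without it, the scan keeps its default limit.
        PhysicalPlan::Timeseries(TimeseriesOp::Scan { limit, .. }) => *limit = n,
    }
}
```

In this sketch a scan built with the 10,000-row default ends up with the limit from the outer node once `apply_limit` runs, which is the behaviour the LIMIT 3 and LIMIT 5 test queries depend on.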
3. Schema propagation (PR #3)
The catalog must expose the ILP-inferred columns to DataFusion — otherwise WHERE qtype = 'AAAA' fails at planning with "No field named qtype". PR #3 addresses this with schema propagation from the ILP ingest response.
Testing
With these changes, tested against 1M real DNS query rows from AdGuard Home:
| Query | Before | After |
| --- | --- | --- |
| WHERE qtype='AAAA' LIMIT 3 | empty | 3 rows, 96ms |
| WHERE status='SERVFAIL' LIMIT 1000 | empty | 437 rows, 96ms |
| WHERE elapsed_ms > 1000 LIMIT 5 | empty | 5 rows, works |
| All other queries | unchanged | unchanged |
The columnar_filter module already handles the actual row-level evaluation — this change only wires the planner routing so the filters reach the scan handler.