
Timeseries: WHERE filters on scans not routed through TimeseriesOp::Scan #4

@emanzx

Problem

SELECT * FROM ts_collection WHERE field = 'value' LIMIT N returns empty results for timeseries collections, while SELECT * FROM ts_collection LIMIT N (no WHERE) and GROUP BY aggregates work correctly.

Root Cause

The PlanConverter::convert() Filter handler in converter.rs does not check is_timeseries() for the Filter(TableScan) case. When DataFusion produces a Filter(predicate, TableScan) plan for a WHERE query, the converter falls through to the document scan path. That path finds nothing, because ILP data lives in the columnar memtable rather than the document store.

The bare TableScan arm (line ~343) correctly routes to TimeseriesOp::Scan, and the Aggregate path works because it reads from columnar_memtables directly. But the Filter(TableScan) path is missing.
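The fall-through can be illustrated with a toy model. This is only a sketch: Plan, Route, route_buggy, route_fixed and the hard-coded is_timeseries below are hypothetical stand-ins, not the types or functions in converter.rs.

```rust
// Toy model of the routing described above. None of these types are the
// project's real ones; they only mirror the shape of the bug.
#[derive(Debug, PartialEq)]
enum Route {
    TimeseriesScan,
    DocumentScan,
}

enum Plan {
    TableScan { collection: String },
    Filter { input: Box<Plan> },
}

// Hypothetical catalog lookup: here only "metrics" is a timeseries collection.
fn is_timeseries(collection: &str) -> bool {
    collection == "metrics"
}

/// Routing as the issue describes it: the bare TableScan arm checks
/// is_timeseries, the Filter arm never does.
fn route_buggy(plan: &Plan) -> Route {
    match plan {
        Plan::TableScan { collection } if is_timeseries(collection) => Route::TimeseriesScan,
        Plan::TableScan { .. } => Route::DocumentScan,
        Plan::Filter { .. } => Route::DocumentScan,
    }
}

/// Routing with the missing check added to the Filter(TableScan) case.
fn route_fixed(plan: &Plan) -> Route {
    match plan {
        Plan::TableScan { collection } if is_timeseries(collection) => Route::TimeseriesScan,
        Plan::TableScan { .. } => Route::DocumentScan,
        Plan::Filter { input } => match input.as_ref() {
            Plan::TableScan { collection } if is_timeseries(collection) => Route::TimeseriesScan,
            _ => Route::DocumentScan,
        },
    }
}

fn main() {
    let where_query = Plan::Filter {
        input: Box::new(Plan::TableScan { collection: "metrics".to_string() }),
    };
    println!("buggy: {:?}", route_buggy(&where_query));
    println!("fixed: {:?}", route_fixed(&where_query));
}
```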

Affected Queries

-- These return empty:
SELECT * FROM metrics WHERE qtype = 'AAAA' LIMIT 10;
SELECT * FROM metrics WHERE elapsed_ms > 1000 LIMIT 5;
SELECT * FROM metrics WHERE client = '10.11.12.103' LIMIT 3;

-- These work fine:
SELECT * FROM metrics LIMIT 10;
SELECT qtype, COUNT(*) FROM metrics GROUP BY qtype;
SELECT qname, COUNT(*) FROM metrics WHERE elapsed_ms > 5000 GROUP BY qname;

Suggested Fix

Three changes are needed. The first two are in converter.rs; the third is the catalog change from PR #3:

1. Filter → TableScan: add timeseries check

In the Filter handler, before the KV routing check (~line 190), add:

// Timeseries routing: extract time-range and filters.
if self.is_timeseries(tenant_id, &collection) {
    let mut all_filters = vec![filter.predicate.clone()];
    all_filters.extend(scan.filters.iter().cloned());
    let (time_range, filter_bytes) =
        super::converter_helpers::extract_timeseries_filters(&all_filters)?;
    let limit = scan.fetch.unwrap_or(10_000);
    return Ok(vec![PhysicalTask {
        tenant_id,
        vshard_id: vshard,
        plan: PhysicalPlan::Timeseries(TimeseriesOp::Scan {
            collection,
            time_range,
            projection: Vec::new(),
            limit,
            filters: filter_bytes,
            bucket_interval_ms: 0,
            rls_filters: Vec::new(),
        }),
    }]);
}
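The snippet above merges the outer predicate with any filters already pushed into the scan, then splits them into a time range and the remaining filters. A rough sketch of that split follows, under loud assumptions: Pred and split_filters are hypothetical, and the real extract_timeseries_filters works on DataFusion expressions and returns serialized filter bytes, which this toy does not attempt.

```rust
// Hypothetical predicate model, for illustration only.
#[derive(Debug, Clone, PartialEq)]
enum Pred {
    TimeGe(i64),   // time column >= bound, in milliseconds
    TimeLe(i64),   // time column <= bound, in milliseconds
    Other(String), // any non-time predicate, kept opaque here
}

/// Split predicates into an inclusive (min, max) time range plus the rest.
/// Multiple time bounds are intersected; with no time bounds the range is
/// left fully open.
fn split_filters(preds: &[Pred]) -> ((i64, i64), Vec<Pred>) {
    let mut range = (i64::MIN, i64::MAX);
    let mut rest = Vec::new();
    for p in preds {
        match p {
            Pred::TimeGe(lo) => range.0 = range.0.max(*lo),
            Pred::TimeLe(hi) => range.1 = range.1.min(*hi),
            other => rest.push(other.clone()),
        }
    }
    (range, rest)
}

fn main() {
    // Outer WHERE predicate first, then filters already attached to the scan,
    // mirroring how all_filters is built in the suggested fix.
    let all_filters = vec![
        Pred::Other("qtype = 'AAAA'".to_string()),
        Pred::TimeGe(1_000),
        Pred::TimeLe(5_000),
    ];
    let (time_range, residual) = split_filters(&all_filters);
    println!("{:?} {:?}", time_range, residual);
}
```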

2. Limit handler: propagate to TimeseriesOp::Scan

For a WHERE ... LIMIT query, DataFusion produces Limit(Filter(TableScan)), so the LIMIT sits on an outer node that the Filter short-circuit never sees. In the Limit handler (~line 541), add a match arm:

PhysicalPlan::Timeseries(TimeseriesOp::Scan { limit, .. }) => *limit = n,
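A minimal sketch of how such an arm could carry the outer LIMIT into an already-built scan task. PhysicalPlan and apply_limit here are simplified, hypothetical stand-ins; the real Limit handler in converter.rs matches on more variants.

```rust
// Toy stand-in for the project's PhysicalPlan; only the limit field matters here.
#[derive(Debug, PartialEq)]
enum PhysicalPlan {
    TimeseriesScan { limit: usize },
    Other,
}

/// Apply an outer LIMIT n to the tasks produced by the inner plan.
/// Plans without a limit field are left untouched in this sketch.
fn apply_limit(tasks: &mut [PhysicalPlan], n: usize) {
    for task in tasks.iter_mut() {
        match task {
            PhysicalPlan::TimeseriesScan { limit } => *limit = n,
            PhysicalPlan::Other => {}
        }
    }
}

fn main() {
    // The Filter short-circuit built the scan with the default limit,
    // because scan.fetch was None at that point.
    let mut tasks = vec![PhysicalPlan::TimeseriesScan { limit: 10_000 }];
    apply_limit(&mut tasks, 10);
    println!("{:?}", tasks);
}
```

Without this arm, the scan built by the Filter short-circuit would keep the 10_000 default from scan.fetch.unwrap_or(10_000) regardless of the LIMIT in the query.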

3. Schema propagation (PR #3)

The catalog must expose the ILP-inferred columns to DataFusion — otherwise WHERE qtype = 'AAAA' fails at planning with "No field named qtype". PR #3 addresses this with schema propagation from the ILP ingest response.

Testing

With these changes, tested against 1M real DNS query rows from AdGuard Home:

| Query | Before | After |
| --- | --- | --- |
| WHERE qtype='AAAA' LIMIT 3 | empty | 3 rows, 96ms |
| WHERE status='SERVFAIL' LIMIT 1000 | empty | 437 rows, 96ms |
| WHERE elapsed_ms > 1000 LIMIT 5 | empty | 5 rows, works |
| All other queries | unchanged | unchanged |

The columnar_filter module already handles the actual row-level evaluation — this change only wires the planner routing so the filters reach the scan handler.

Labels: bug
