Range Predicates Unable to Avoid Filter Computation Due to Null Column Values #14694

@ankitsultana

Say we have this column: event_timestamp, which may have null values. These values are expected to be in "mostly" non-decreasing order over time.

If we have queries with predicates such as event_timestamp IS NOT NULL AND event_timestamp BETWEEN x AND y, then because the default null value can practically only be an extremal value (0 or MAX_VALUE), we end up unnecessarily computing the range predicate even for segments whose non-null values all lie in the range [x, y].
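To make the problem concrete, here is a minimal, self-contained sketch (not Pinot code) of a min/max-based "all docs match" check. The class name, the helper, and the placeholder value 0 are assumptions for illustration only. It shows how a single null stored as an extremal placeholder stretches the segment's [min, max] so that the shortcut no longer applies, even though IS NOT NULL would exclude that row anyway.

```java
import java.util.Arrays;

// Illustrative only: a simplified model of a min/max shortcut, not Pinot's implementation.
class NullPlaceholderMinMax {
    // Hypothetical placeholder stored for null rows (the issue mentions 0 or MAX_VALUE).
    static final long NULL_PLACEHOLDER = 0L;

    /** Returns true if, judging only by min/max of the stored values, every doc falls in [lo, hi]. */
    static boolean allInRangeByMinMax(long[] storedValues, long lo, long hi) {
        long min = Arrays.stream(storedValues).min().getAsLong();
        long max = Arrays.stream(storedValues).max().getAsLong();
        return lo <= min && max <= hi;
    }

    public static void main(String[] args) {
        long[] withoutNulls = {1000L, 1010L, 1020L};
        long[] withOneNull = {1000L, NULL_PLACEHOLDER, 1020L};

        // All non-null values lie in [900, 1100], so the range predicate could be skipped.
        System.out.println(allInRangeByMinMax(withoutNulls, 900L, 1100L));
        // The placeholder drags min down to 0, so the shortcut is lost for this segment.
        System.out.println(allInRangeByMinMax(withOneNull, 900L, 1100L));
    }
}
```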

At high QPS, this becomes a bottleneck even with a range index. For queries where the other filters are selective, I am planning to try disabling the range index altogether for one of our tables so that we can rely on the scan-based filter, which only runs on the docs that pass the other filters.

Another related optimization is to terminate the AndFilterOperator early if any of the other filters turns out to be empty. To do this, BlockDocIdSet could add a new method that returns the cardinality of the underlying docs. This may not always be possible, so the method could also return -1 to indicate that the cardinality is not known at that point.

Edit: I may be missing something since I didn't get a chance to take a deeper look into this.

protected BlockDocIdSet getTrues() {
  Tracing.activeRecording().setNumChildren(_filterOperators.size());
  List<BlockDocIdSet> blockDocIdSets = new ArrayList<>(_filterOperators.size());
  for (BaseFilterOperator filterOperator : _filterOperators) {
    blockDocIdSets.add(filterOperator.getTrues());
  }
  return new AndDocIdSet(blockDocIdSets, _queryOptions);
}
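
A rough sketch of the early-termination idea, using stand-in types rather than Pinot's real classes: the DocIdSet interface, its cardinality() method, the UNKNOWN_CARDINALITY constant, and evaluateAnd are all hypothetical names for the proposal above, not existing APIs. The loop stops asking further children for their doc ids as soon as one child reports a cardinality of 0, and treats -1 as "unknown, keep going".

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Optional;
import java.util.function.Supplier;

// Illustrative only: hypothetical stand-ins for BlockDocIdSet and AndFilterOperator.
class AndEarlyTerminationSketch {
    /** Stand-in for BlockDocIdSet with the proposed (hypothetical) cardinality method. */
    interface DocIdSet {
        int UNKNOWN_CARDINALITY = -1;

        /** Number of matching docs, or UNKNOWN_CARDINALITY if it cannot be determined cheaply. */
        int cardinality();
    }

    static DocIdSet known(int numDocs) {
        return () -> numDocs;
    }

    static DocIdSet unknown() {
        return () -> DocIdSet.UNKNOWN_CARDINALITY;
    }

    /**
     * Evaluates child filters in order. Returns Optional.empty() as soon as a child is
     * known to match nothing; otherwise returns the child results to be intersected.
     */
    static Optional<List<DocIdSet>> evaluateAnd(List<Supplier<DocIdSet>> children) {
        List<DocIdSet> results = new ArrayList<>(children.size());
        for (Supplier<DocIdSet> child : children) {
            DocIdSet docIdSet = child.get();
            if (docIdSet.cardinality() == 0) {
                // The AND can only be empty, so the remaining children are never evaluated.
                return Optional.empty();
            }
            results.add(docIdSet);
        }
        return Optional.of(results);
    }

    public static void main(String[] args) {
        Supplier<DocIdSet> emptyChild = () -> known(0);
        Supplier<DocIdSet> expensiveChild = () -> {
            throw new IllegalStateException("should not be evaluated");
        };
        List<Supplier<DocIdSet>> children = Arrays.asList(emptyChild, expensiveChild);
        System.out.println(evaluateAnd(children).isPresent());
    }
}
```

Under this sketch the benefit depends on child ordering: the short-circuit only helps if a cheap child that can report its cardinality runs before the expensive range filter. How much work it would save in practice also depends on how lazily each operator builds its doc id set.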
