Support IN predicate in ColumnValue SegmentPruner #6756

Closed
GSharayu opened this issue Apr 8, 2021 · 1 comment

Comments

@GSharayu
Contributor

GSharayu commented Apr 8, 2021

Server-side segment pruning is currently supported for the = and RANGE filter operators using min-max value stats (segment metadata). Bloom filters are also used to prune segments for the = filter.

For the IN filter operator, we should add support for min-max value based pruning if the number of values in the IN clause is below a certain threshold.

Adding this support for a large number of values in the IN clause won't help: the values are likely to be spread across several segments, so few segments would be pruned, and the time spent on pruning may itself negate the benefits. So, let's start with a configurable threshold, with the default being less than 10.
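
To make the proposal concrete, here is a minimal sketch of the check, not the actual pruner code. The class name `InPredicateMinMaxPruner`, the method `canPrune`, and its parameters are hypothetical, and the threshold is passed in by the caller because the exact default is not settled here.

```java
import java.util.List;

// Hypothetical helper for illustration only; not part of Pinot's API.
final class InPredicateMinMaxPruner {

    private InPredicateMinMaxPruner() {
    }

    /**
     * Returns true when the segment can be skipped for "column IN (inValues)",
     * given the column's min and max values from the segment metadata.
     *
     * threshold is an assumed, caller-supplied limit: the check only runs when
     * the IN clause has fewer values than this.
     */
    static <T extends Comparable<T>> boolean canPrune(
            List<T> inValues, T segmentMin, T segmentMax, int threshold) {
        // Too many values: pruning is unlikely to succeed, so keep the segment
        // without spending time on the comparison loop.
        if (inValues.size() >= threshold) {
            return false;
        }
        for (T value : inValues) {
            boolean withinRange =
                    value.compareTo(segmentMin) >= 0 && value.compareTo(segmentMax) <= 0;
            if (withinRange) {
                // At least one value may exist in this segment, so it must be scanned.
                return false;
            }
        }
        // Every value lies outside [segmentMin, segmentMax]: no row can match.
        return true;
    }
}
```

For a segment whose column ranges over [10, 20], `canPrune(List.of(1, 5), 10, 20, 10)` prunes the segment, while `canPrune(List.of(1, 15), 10, 20, 10)` keeps it because 15 falls inside the range. The early exit on the threshold is what keeps the cost bounded for long IN lists.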

GSharayu pushed a commit to GSharayu/pinot that referenced this issue Apr 8, 2021
GSharayu pushed a commit to GSharayu/pinot that referenced this issue Apr 8, 2021
GSharayu pushed a commit to GSharayu/pinot that referenced this issue Apr 12, 2021
GSharayu pushed a commit to GSharayu/pinot that referenced this issue Apr 15, 2021
Jackie-Jiang pushed a commit that referenced this issue Apr 15, 2021
@siddharthteotia
Contributor

Support added in PR #6776.
