Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

use PeekableIntIterator for OR filter "partial index" value matchers #16300

Merged
merged 1 commit into from Apr 17, 2024

Conversation

clintropolis
Copy link
Member

Description

This PR switches the OR filter "partial index" value matchers to use PeekableIntIterator instead of IntIterator which can dramatically improve performance when these are used on top of an index offset which has a small number of set bits (since we can use advanceIfNeeded instead of looping to check next until we seek the correct offset). These matchers are used when some sub-filters support indexes we make a synthetic value matcher that checks the index instead of actually evaluating the matchers, which can be quite beneficial, but in certain scenarios can also be rather expensive.

The added benchmark is one such expensive scenario, where the OR filter is nested under an AND filter. The equality clause of the AND is rather selective, but the first clause of the OR matches all rows, so the while loop of the previous code needs to call hasNext/next quite a lot to seek to the next offset.

before:

Benchmark              (query)  (rowsPerSegment)  (schema)  (storageType)  (stringEncoding)  (vectorize)  Mode  Cnt   Score    Error  Units
SqlBenchmark.querySql       41           5000000  explicit           mmap              none        false  avgt    5  46.568 ± 18.771  ms/op
SqlBenchmark.querySql       41           5000000  explicit           mmap              none        force  avgt    5  13.940 ±  1.099  ms/op

after:

Benchmark              (query)  (rowsPerSegment)  (schema)  (storageType)  (stringEncoding)  (vectorize)  Mode  Cnt  Score   Error  Units
SqlBenchmark.querySql       41           5000000  explicit           mmap              none        false  avgt    5  2.877 ± 0.347  ms/op
SqlBenchmark.querySql       41           5000000  explicit           mmap              none        force  avgt    5  3.841 ± 1.413  ms/op

I should've known to do this in the first place given #8822 and done this as part of #15838, but the second best time is today or something.


This PR has:

  • been self-reviewed.
  • a release note entry in the PR description.
  • added unit tests or modified existing tests to cover new code paths, ensuring the threshold for code coverage is met.
  • been tested in a test Druid cluster.

@gianm gianm merged commit aa23064 into apache:master Apr 17, 2024
85 checks passed
@clintropolis clintropolis deleted the better-or-index-matcher branch April 17, 2024 20:04
@adarshsanjeev adarshsanjeev added this to the 30.0.0 milestone May 6, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants