@romseygeek
Contributor

Numeric sorts against a field with DocValuesSkippers enabled currently use
DocValuesRangeIterator to implement competitive iterators. This has a number
of disadvantages:

  • DocValuesRangeIterator (DVRI) cannot efficiently implement docIDRunEnd() or
    intoBitSet(), so bulk conjunction filtering may fall into slower code paths.
  • For field value distributions that are essentially random, DVRI falls back to
    doc-by-doc value checking, so no skipping happens at all and the checks only
    add overhead.

This commit adds a new SkipBlockRangeIterator that only skips whole blocks
where no document will be competitive, avoiding any individual doc-by-doc value
checks. The docIDRunEnd() and intoBitSet() implementations are very fast and
mean that bulk conjunction filtering will be efficient. The overheads as a whole
are very low, so randomly distributed values are much less adversarial, while
queries against indexes where the document order is roughly correlated with the
query sort get significant boosts.
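
To illustrate the block-level approach described above, here is a minimal, self-contained sketch. It does not use Lucene's classes: `Block`, `advance` and `runEnd` are hypothetical names, and the model assumes contiguous blocks that each record only a doc ID range and the min/max value inside it. The iterator returns every doc in any block that might hold a competitive value, on the assumption that the comparator still checks the actual values afterwards.

```java
import java.util.List;

/** Hypothetical, simplified model of block-level skipping; not Lucene's actual classes. */
class SkipBlockSketch {

  /** One skip block: an inclusive doc ID range plus the min/max field value inside it. */
  record Block(int minDoc, int maxDoc, long minValue, long maxValue) {
    boolean overlaps(long lo, long hi) {
      return maxValue >= lo && minValue <= hi;
    }
  }

  static final int NO_MORE_DOCS = Integer.MAX_VALUE;

  /** Returns the first doc >= target that lies in a block overlapping [lo, hi]. */
  static int advance(List<Block> blocks, int target, long lo, long hi) {
    for (Block b : blocks) {
      if (b.maxDoc() < target) continue;    // block ends before the target
      if (!b.overlaps(lo, hi)) continue;    // whole block is non-competitive: skip it
      return Math.max(target, b.minDoc());  // no per-document value check
    }
    return NO_MORE_DOCS;
  }

  /** Returns one past the last doc of the run of overlapping blocks that contains doc. */
  static int runEnd(List<Block> blocks, int doc, long lo, long hi) {
    int end = doc + 1;
    for (Block b : blocks) {
      if (b.maxDoc() < doc) continue;
      if (b.minDoc() > end || !b.overlaps(lo, hi)) break;
      end = b.maxDoc() + 1;                 // extend the run to the end of this block
    }
    return end;
  }

  /** Four blocks of 64 docs each, where each doc's value equals its doc ID. */
  static List<Block> demoBlocks() {
    return List.of(
        new Block(0, 63, 0, 63),
        new Block(64, 127, 64, 127),
        new Block(128, 191, 128, 191),
        new Block(192, 255, 192, 255));
  }

  public static void main(String[] args) {
    List<Block> blocks = demoBlocks();
    // Only values >= 130 are competitive, so the first two blocks are skipped whole.
    int doc = advance(blocks, 0, 130, Long.MAX_VALUE);
    System.out.println("first candidate doc: " + doc);
    System.out.println("run ends at: " + runEnd(blocks, doc, 130, Long.MAX_VALUE));
  }
}
```

Because the run end is derived purely from block boundaries, a caller could mark the whole range in one step rather than visiting each document, which is the property that bulk conjunction filtering benefits from.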

This commit introduces a SkipBlockRangeIterator that performs better
than DocValuesRangeIterator as a competitive iterator due to more
useful docIDRunEnd() and intoBitSet() implementations.
@romseygeek
Contributor Author

NB: I initially tried to extend this to TermOrdValComparator as well, but that caused test failures. This is because the existing tests use randomly distributed data, so the SkipBlockRangeIterator can't actually filter out any documents. This is probably still a better implementation than the existing one, however, because the current competitive iterator ends up checking the value of every document in turn. I think we can address this in a follow-up by adjusting the test to use an index sort.
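
As a rough illustration of why document order matters here, the sketch below counts how many fixed-size blocks can be skipped for the same set of values in sorted versus shuffled doc order. Everything in it is made up for the example (the numeric values, the block size of 64, the competitive bound of 960, and the names `countSkippable` and `sortedValues`); it is not taken from the Lucene tests.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

/** Hypothetical illustration; block size and data are invented for the example. */
class SkippableBlocksDemo {

  /** Counts blocks whose maximum value is below the competitive lower bound. */
  static int countSkippable(List<Long> valuesInDocOrder, int blockSize, long minCompetitive) {
    int skippable = 0;
    for (int start = 0; start < valuesInDocOrder.size(); start += blockSize) {
      int end = Math.min(start + blockSize, valuesInDocOrder.size());
      long blockMax = Collections.max(valuesInDocOrder.subList(start, end));
      if (blockMax < minCompetitive) skippable++;  // no doc in this block can compete
    }
    return skippable;
  }

  /** Values 0..n-1 in ascending order, standing in for an index sorted on the field. */
  static List<Long> sortedValues(int n) {
    List<Long> values = new ArrayList<>();
    for (long v = 0; v < n; v++) values.add(v);
    return values;
  }

  public static void main(String[] args) {
    List<Long> sorted = sortedValues(1024);
    List<Long> shuffled = new ArrayList<>(sorted);
    Collections.shuffle(shuffled, new Random(42));
    System.out.println("sorted:   " + countSkippable(sorted, 64, 960));
    System.out.println("shuffled: " + countSkippable(shuffled, 64, 960));
  }
}
```

With sorted input, every block except the last falls entirely below the bound. With shuffled input, competitive values are scattered across the blocks, so few or none of them can be ruled out from their min/max alone.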

@romseygeek
Contributor Author

Internal Elasticsearch benchmarks show that switching to this implementation doubles the performance of wholly adversarial sorts (e.g. sort by descending timestamp against an index sorted by ascending timestamp) without regressions elsewhere.

@github-actions github-actions bot added this to the 10.4.0 milestone Jan 29, 2026