LUCENE-10396: Add capability to jump to the next document with different ord in SortedDocValues #979
This PR proposes adding a new method to SortedDocValues that lets users advance an iterator to the next document containing a different term than the current document. This can be especially useful when the index is sorted by this field.
The method has a default implementation, and this PR also provides a fast implementation for the case where the index is sorted by this field and the field has low cardinality. In that case we write a jump table to disk that allows quickly skipping documents instead of manually iterating through them.
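To illustrate the two strategies described above, here is a minimal in-memory sketch. It does not use Lucene's classes: `OrdIteratorSketch`, `advanceToDifferentOrd`, `buildJumpTable` and `jump` are hypothetical names, and the jump table here is a plain array rather than the on-disk structure the PR writes. It assumes every document has a value and, for the jump table, that ordinals are non-decreasing in doc ID order (index sorted by the field) with no gaps.

```java
import java.util.Arrays;

/** Hypothetical stand-in for a SortedDocValues-like iterator; not Lucene's actual API. */
final class OrdIteratorSketch {
  static final int NO_MORE_DOCS = Integer.MAX_VALUE;

  private final int[] ords; // ords[doc] = ordinal of the term for that doc (assumed dense)
  private int doc = -1;

  OrdIteratorSketch(int[] ords) {
    this.ords = ords;
  }

  int docID() {
    return doc;
  }

  int nextDoc() {
    doc = (doc + 1 < ords.length) ? doc + 1 : NO_MORE_DOCS;
    return doc;
  }

  /**
   * Default strategy: linear scan until the ordinal changes.
   * The iterator must already be positioned on a document.
   */
  int advanceToDifferentOrd() {
    int current = ords[doc];
    int d = nextDoc();
    while (d != NO_MORE_DOCS && ords[d] == current) {
      d = nextDoc();
    }
    return d;
  }

  /**
   * Fast strategy for a sorted, low-cardinality field: record the first doc of each ordinal.
   * Entry valueCount is a sentinel meaning "no more docs".
   */
  static int[] buildJumpTable(int[] sortedOrds, int valueCount) {
    int[] firstDoc = new int[valueCount + 1];
    Arrays.fill(firstDoc, NO_MORE_DOCS);
    // Iterate backwards so the smallest doc ID for each ordinal is the one kept.
    for (int d = sortedOrds.length - 1; d >= 0; d--) {
      firstDoc[sortedOrds[d]] = d;
    }
    return firstDoc;
  }

  /** With the table, the next doc with a different ordinal is a single lookup. */
  static int jump(int[] firstDoc, int currentOrd) {
    return firstDoc[currentOrd + 1];
  }
}
```

The linear scan costs time proportional to the length of the run of equal ordinals, while the table lookup is constant time at the cost of one entry per distinct value, which is why low cardinality matters.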
https://issues.apache.org/jira/browse/LUCENE-10396 discusses some of the use cases for this method, for example computing the number of unique values for documents that match a query. This approach diverges from the sparse index approach, but since it is less intrusive, it seems appealing.
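As a rough illustration of the unique-value counting use case, the sketch below counts distinct ordinals among matching documents by skipping each run of equal ordinals once one match in it has been seen. It is not code from this PR: `UniqueCountSketch` and its methods are hypothetical, the query is modelled as a sorted array of matching doc IDs, and ordinals are assumed non-decreasing in doc ID order (index sorted by the field).

```java
/** Hypothetical sketch of counting unique values for matching docs; not Lucene's API. */
final class UniqueCountSketch {
  static final int NO_MORE_DOCS = Integer.MAX_VALUE;

  /** Linear-scan stand-in for "advance to the next doc with a different ord". */
  static int nextDocWithDifferentOrd(int[] sortedOrds, int doc) {
    int current = sortedOrds[doc];
    for (int d = doc + 1; d < sortedOrds.length; d++) {
      if (sortedOrds[d] != current) {
        return d;
      }
    }
    return NO_MORE_DOCS;
  }

  /** matchingDocs must be sorted ascending and contain valid doc IDs. */
  static int countUnique(int[] sortedOrds, int[] matchingDocs) {
    int count = 0;
    int i = 0;
    while (i < matchingDocs.length) {
      count++; // first match seen for this ordinal
      int next = nextDocWithDifferentOrd(sortedOrds, matchingDocs[i]);
      // Skip the remaining matches that share the same ordinal.
      while (i < matchingDocs.length && matchingDocs[i] < next) {
        i++;
      }
    }
    return count;
  }
}
```

Because the index is sorted by the field, all documents with the same ordinal are contiguous, so each ordinal is counted at most once without tracking a set of seen values.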
Note that, to handle backwards compatibility, I have increased the version of the codec instead of creating a new one.
#11432