-
Notifications
You must be signed in to change notification settings - Fork 2.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
LUCENE-9484: Allow sorting an index after the fact #1789
Conversation
Today we need to decide on an index sorting before we create the index. In some situations it might make a lot of sense to sort an index afterwards when the index doesn't change anymore or to compress older indices. This comit adds the ability to wrap readers from an unsorted index and merge it into a sorted index by using IW#addIndices.
It's nice that this change is so straightforward. It makes me realize I don't know what happens today if we specify an index Sort and then open an existing index that is not sorted. Do we throw an error? Should we instead provide a sorted view on the index that can be used to rewrite it? |
yes we fail if you do that. I don't think we should do any magic here. rewriting can be very costly and taking lots of space. I think failing is the right thing to do. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I didn't remember we had kept this SortingLeafReader, which was the old way we implemented index sorting by wrapping readers with it at merge time. I can think of a couple enhancements like making it a SortingCodecReader rather than a SortingLeafReader but this looks like a nice incremental step and looks simple enough. So +1.
@jpountz I pushed some improvements and inherited from CodecReader, can you take another look |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we should make sure to delegate to the points/dv/fields/... reader of the wrapped reader rather than to the wrapped reader itself (see some inline comments, I didn't tag all instances of this).
lucene/core/src/java/org/apache/lucene/index/SortingCodecReader.java
Outdated
Show resolved
Hide resolved
lucene/core/src/java/org/apache/lucene/index/SortingCodecReader.java
Outdated
Show resolved
Hide resolved
lucene/core/src/java/org/apache/lucene/index/SortingCodecReader.java
Outdated
Show resolved
Hide resolved
@jpountz I applied your comments and moved the inner classes from the reader to the classes where they are mainly used. We should have done that earlier then we would have not kept this unused class. Now that reader only holds private inner classes. |
lucene/core/src/java/org/apache/lucene/index/BinaryDocValuesWriter.java
Outdated
Show resolved
Hide resolved
lucene/core/src/java/org/apache/lucene/index/BinaryDocValuesWriter.java
Outdated
Show resolved
Hide resolved
lucene/core/src/java/org/apache/lucene/index/SortingCodecReader.java
Outdated
Show resolved
Hide resolved
lucene/core/src/java/org/apache/lucene/index/NumericDocValuesWriter.java
Outdated
Show resolved
Hide resolved
|
||
@Override | ||
public boolean advanceExact(int target) throws IOException { | ||
throw new UnsupportedOperationException("use nextDoc instead"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I just realized reading other comments that this method might be required when configuring index sorting on a binary field.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
many others are needed but afaik if you wanna sort on a binary field it needs to be SortedDocValues
and not BinaryDocValues
- at least I wasn't able to make any test fail here..
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍
Today we need to decide on an index sorting before we create the index. In some situations it might make a lot of sense to sort an index afterwards when the index doesn't change anymore or to compress older indices. This change adds the ability to wrap readers from an unsorted index and merge it into a sorted index by using IW#addIndices.
Today we need to decide on an index sorting before we create the index. In some situations it might make a lot of sense to sort an index afterwards when the index doesn't change anymore or to compress older indices. This change adds the ability to wrap readers from an unsorted index and merge it into a sorted index by using IW#addIndices.
Today we need to decide on an index sorting before we create the index.
In some situations it might make a lot of sense to sort an index afterwards
when the index doesn't change anymore or to compress older indices.
This commit adds the ability to wrap readers from an unsorted index and merge it
into a sorted index by using IW#addIndices.