Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

LUCENE-9484: Allow sorting an index after the fact #1789

Merged
merged 16 commits into from
Sep 3, 2020

Conversation

s1monw
Copy link
Member

@s1monw s1monw commented Aug 26, 2020

Today we need to decide on an index sorting before we create the index.
In some situations it might make a lot of sense to sort an index afterwards
when the index doesn't change anymore or to compress older indices.
This commit adds the ability to wrap readers from an unsorted index and merge it
into a sorted index by using IW#addIndices.

Today we need to decide on an index sorting before we create the index.
In some situations it might make a lot of sense to sort an index afterwards
when the index doesn't change anymore or to compress older indices.
This comit adds the ability to wrap readers from an unsorted index and merge it
into a sorted index by using IW#addIndices.
@msokolov
Copy link
Contributor

It's nice that this change is so straightforward. It makes me realize I don't know what happens today if we specify an index Sort and then open an existing index that is not sorted. Do we throw an error? Should we instead provide a sorted view on the index that can be used to rewrite it?

@s1monw
Copy link
Member Author

s1monw commented Aug 27, 2020

It's nice that this change is so straightforward. It makes me realize I don't know what happens today if we specify an index Sort and then open an existing index that is not sorted. Do we throw an error? Should we instead provide a sorted view on the index that can be used to rewrite it?

yes we fail if you do that. I don't think we should do any magic here. rewriting can be very costly and taking lots of space. I think failing is the right thing to do.

Copy link
Contributor

@jpountz jpountz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I didn't remember we had kept this SortingLeafReader, which was the old way we implemented index sorting by wrapping readers with it at merge time. I can think of a couple enhancements like making it a SortingCodecReader rather than a SortingLeafReader but this looks like a nice incremental step and looks simple enough. So +1.

@s1monw
Copy link
Member Author

s1monw commented Aug 31, 2020

@jpountz I pushed some improvements and inherited from CodecReader, can you take another look

Copy link
Contributor

@jpountz jpountz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should make sure to delegate to the points/dv/fields/... reader of the wrapped reader rather than to the wrapped reader itself (see some inline comments, I didn't tag all instances of this).

@s1monw
Copy link
Member Author

s1monw commented Sep 1, 2020

@jpountz I applied your comments and moved the inner classes from the reader to the classes where they are mainly used. We should have done that earlier then we would have not kept this unused class. Now that reader only holds private inner classes.

@s1monw s1monw requested a review from jpountz September 2, 2020 06:56
@s1monw s1monw requested a review from jpountz September 2, 2020 13:31

@Override
public boolean advanceExact(int target) throws IOException {
throw new UnsupportedOperationException("use nextDoc instead");
Copy link
Contributor

@jpountz jpountz Sep 2, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I just realized reading other comments that this method might be required when configuring index sorting on a binary field.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

many others are needed but afaik if you wanna sort on a binary field it needs to be SortedDocValues and not BinaryDocValues - at least I wasn't able to make any test fail here..

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

@s1monw s1monw merged commit 430a435 into apache:master Sep 3, 2020
@s1monw s1monw deleted the LUCENE-9484 branch September 3, 2020 10:55
s1monw added a commit that referenced this pull request Sep 3, 2020
Today we need to decide on an index sorting before we create the index.
In some situations it might make a lot of sense to sort an index afterwards
when the index doesn't change anymore or to compress older indices.
This change adds the ability to wrap readers from an unsorted index and merge it
into a sorted index by using IW#addIndices.
gus-asf pushed a commit to gus-asf/lucene-solr that referenced this pull request Sep 4, 2020
Today we need to decide on an index sorting before we create the index.
In some situations it might make a lot of sense to sort an index afterwards
when the index doesn't change anymore or to compress older indices.
This change adds the ability to wrap readers from an unsorted index and merge it
into a sorted index by using IW#addIndices.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
3 participants