Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

LUCENE-8962: Merge small segments on commit #1617

Merged
merged 47 commits into from Jun 27, 2020
Merged

LUCENE-8962: Merge small segments on commit #1617

merged 47 commits into from Jun 27, 2020

Conversation

s1monw
Copy link
Member

@s1monw s1monw commented Jun 26, 2020

This PR revisits the merge-on-commit patch submitted by @msfroh a little while ago. The only change from that earlier PR is a fix for failures uncovered by TestIndexWriter.testRandomOperations, some whitespace cleanups, and a rebase on the current master branch. The problem was that updateSegmentInfosOnMergeFinish would incorrectly decRef a merged segments' files if that segment was modified by deletions (or updates) while it was being merged.

With this fix, I ran the failing test case several thousands of times with no failures, whereas before it would routinely fail after a few hundred test runs.

Co-authored-by: Michael Froh msfroh@apache.org
Co-authored-by: Michael Sokolov sokolov@falutin.net

This is the third try after the patch itself and #1552

Michael Sokolov and others added 30 commits June 3, 2020 15:17
This reverts commit 4501b3d.

This reverts commit 075adac.
IW might drop segments that are merged into a fully deleted segment on the floor
and deletes the newly created files right away. We should not include these segments
in a commit since we can't guarantee valid ref-counts on these files.
@s1monw s1monw requested review from dnhatn and msokolov and removed request for dnhatn June 26, 2020 14:05
@s1monw
Copy link
Member Author

s1monw commented Jun 26, 2020

@mikemccand can you take a look

Copy link
Member

@mikemccand mikemccand left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks really close! I left a couple questions.

I've been beasting this on 128 core Ryzen Threadripper ... it is up to 135 iterations and no failures, yay!

I have some small fixes to comments/javadocs/whitespace; I'll push them shortly after passing tests/precommit.

}
} finally {
// ensure we assign this to close them in the case of an exception
this.mergeReaders = List.copyOf(readers);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm is there some (sneaky) reason why we couldn't just do this.mergeReaders = new ArrayList<>(segments.size()); above and then append to that list? Instead of appending to private ArrayList and then making a copy in the end?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I left a comment. I use immutable lists everywhere here it would be fatal if we modify this list outside of OneMerge. I think that justifies the copy.

}

/** We only allow sorting on these types */
private static final EnumSet<SortField.Type> ALLOWED_INDEX_SORT_TYPES = EnumSet.of(SortField.Type.STRING,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm how did this sneak in?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK I see this was removed earlier in mainline for LUCENE-9330 -- it must've snuck back in as a merge conflict.

I'll re-remove now.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

@msfroh
Copy link
Contributor

msfroh commented Jun 26, 2020

Wow! I didn't realize that capturing/cloning live docs at a point in time would be so complicated. I learned a lot from this PR about how deletes are handled. Thanks, Simon!

* Note: This settings has no effect unless {@link MergePolicy#findFullFlushMerges(MergeTrigger, SegmentInfos, MergePolicy.MergeContext)}
* has an implementation that actually returns merges which by default doesn't return any merges.
*/
public IndexWriterConfig setMaxCommitMergeWaitMillis(long maxCommitMergeWaitMillis) {
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am torn if we'd use a double or long here. I think we used double in other places but I do like a discrete value better...

@s1monw
Copy link
Member Author

s1monw commented Jun 26, 2020

Wow! I didn't realize that capturing/cloning live docs at a point in time would be so complicated. I learned a lot from this PR about how deletes are handled. Thanks, Simon!

@msfroh I didn't either :) it's been a while that I digged into merging code and I think the same is true for @mikemccand - I am impressed how far you got in this pretty complex class. I'd love to see you digging more and coming up with more stuff like this. Thank you for contributing this. I am always happy to help. This entire process from the first patch to the feature being integrated is a great example of how amazing OSS is.

@s1monw s1monw merged commit 7f352a9 into master Jun 27, 2020
@s1monw
Copy link
Member Author

s1monw commented Jun 27, 2020

merged into master. Thanks everybody involved!!

s1monw added a commit that referenced this pull request Jun 27, 2020
Add IndexWriter merge-on-commit feature to selectively merge small segments on commit,
subject to a configurable timeout, to improve search performance by reducing the number of small
segments for searching.

Co-authored-by: Michael Froh <msfroh@apache.org>
Co-authored-by: Michael Sokolov <sokolov@falutin.net>
Co-authored-by: Mike McCandless <mikemccand@apache.org>
@s1monw s1monw deleted the jira/lucene-8962 branch June 27, 2020 22:07
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
4 participants