New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
LUCENE-8962: Merge small segments on commit #1617
Conversation
…he presence of deletions
… work accordingly
IW might drop segments that are merged into a fully deleted segment on the floor and deletes the newly created files right away. We should not include these segments in a commit since we can't guarantee valid ref-counts on these files.
@mikemccand can you take a look |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks really close! I left a couple questions.
I've been beasting this on 128 core Ryzen Threadripper ... it is up to 135 iterations and no failures, yay!
I have some small fixes to comments/javadocs/whitespace; I'll push them shortly after passing tests/precommit.
} | ||
} finally { | ||
// ensure we assign this to close them in the case of an exception | ||
this.mergeReaders = List.copyOf(readers); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm is there some (sneaky) reason why we couldn't just do this.mergeReaders = new ArrayList<>(segments.size());
above and then append to that list? Instead of appending to private ArrayList
and then making a copy in the end?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I left a comment. I use immutable lists everywhere here it would be fatal if we modify this list outside of OneMerge. I think that justifies the copy.
} | ||
|
||
/** We only allow sorting on these types */ | ||
private static final EnumSet<SortField.Type> ALLOWED_INDEX_SORT_TYPES = EnumSet.of(SortField.Type.STRING, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm how did this sneak in?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK I see this was removed earlier in mainline for LUCENE-9330 -- it must've snuck back in as a merge conflict.
I'll re-remove now.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Wow! I didn't realize that capturing/cloning live docs at a point in time would be so complicated. I learned a lot from this PR about how deletes are handled. Thanks, Simon! |
* Note: This settings has no effect unless {@link MergePolicy#findFullFlushMerges(MergeTrigger, SegmentInfos, MergePolicy.MergeContext)} | ||
* has an implementation that actually returns merges which by default doesn't return any merges. | ||
*/ | ||
public IndexWriterConfig setMaxCommitMergeWaitMillis(long maxCommitMergeWaitMillis) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am torn if we'd use a double or long here. I think we used double in other places but I do like a discrete value better...
@msfroh I didn't either :) it's been a while that I digged into merging code and I think the same is true for @mikemccand - I am impressed how far you got in this pretty complex class. I'd love to see you digging more and coming up with more stuff like this. Thank you for contributing this. I am always happy to help. This entire process from the first patch to the feature being integrated is a great example of how amazing OSS is. |
…e is doing; fix test case to use deterministic merging
merged into master. Thanks everybody involved!! |
Add IndexWriter merge-on-commit feature to selectively merge small segments on commit, subject to a configurable timeout, to improve search performance by reducing the number of small segments for searching. Co-authored-by: Michael Froh <msfroh@apache.org> Co-authored-by: Michael Sokolov <sokolov@falutin.net> Co-authored-by: Mike McCandless <mikemccand@apache.org>
This PR revisits the merge-on-commit patch submitted by @msfroh a little while ago. The only change from that earlier PR is a fix for failures uncovered by TestIndexWriter.testRandomOperations, some whitespace cleanups, and a rebase on the current master branch. The problem was that updateSegmentInfosOnMergeFinish would incorrectly decRef a merged segments' files if that segment was modified by deletions (or updates) while it was being merged.
With this fix, I ran the failing test case several thousands of times with no failures, whereas before it would routinely fail after a few hundred test runs.
Co-authored-by: Michael Froh msfroh@apache.org
Co-authored-by: Michael Sokolov sokolov@falutin.net
This is the third try after the patch itself and #1552