LUCENE-8962: merge small segments on commit #1552
Conversation
…he presence of deletions
/**
 * Expert: sets the amount of time to wait for merges returned by MergePolicy.findFullFlushMerges(...).
 * If this time is reached, we proceed with the commit based on segments merged up to that point.
 * The merges are not cancelled, and may still run to completion independent of the commit.
Maybe for the last sentence: "The merges are not cancelled, and will still run to completion independent of the commit like normal segment merges"?
Maybe also state that this setting has no effect unless the MergePolicy actually returns merges from findFullFlushMerges, which the default merge policy does not?
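For illustration, here is a rough sketch of what opting in might look like: a wrapper MergePolicy that implements findFullFlushMerges to return the small, not-already-merging segments when the trigger is the new COMMIT trigger. The class name, the 16 MB threshold, and the selection logic are invented for this example, and the exact signatures follow this PR rather than any released API.

import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import org.apache.lucene.index.FilterMergePolicy;
import org.apache.lucene.index.MergePolicy;
import org.apache.lucene.index.MergeTrigger;
import org.apache.lucene.index.SegmentCommitInfo;
import org.apache.lucene.index.SegmentInfos;

// Hypothetical example policy: delegate everything to the wrapped policy, but on
// commit return one merge over the "small" segments that are not already merging.
public class SmallSegmentsOnCommitMergePolicy extends FilterMergePolicy {

  private static final long SMALL_SEGMENT_BYTES = 16 * 1024 * 1024; // arbitrary threshold

  public SmallSegmentsOnCommitMergePolicy(MergePolicy in) {
    super(in);
  }

  @Override
  public MergeSpecification findFullFlushMerges(MergeTrigger mergeTrigger,
                                                SegmentInfos segmentInfos,
                                                MergeContext mergeContext) throws IOException {
    if (mergeTrigger != MergeTrigger.COMMIT) {
      return null; // only act on the commit trigger added by this PR
    }
    List<SegmentCommitInfo> small = new ArrayList<>();
    for (SegmentCommitInfo info : segmentInfos) {
      if (mergeContext.getMergingSegments().contains(info) == false
          && info.sizeInBytes() < SMALL_SEGMENT_BYTES) {
        small.add(info);
      }
    }
    if (small.size() < 2) {
      return null; // nothing worth merging before the commit
    }
    MergeSpecification spec = new MergeSpecification();
    spec.add(new OneMerge(small));
    return spec;
  }
}

With a policy like this plugged into IndexWriterConfig.setMergePolicy, the wait setting discussed in this thread actually has something to wait for.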
lucene/CHANGES.txt
@@ -376,6 +376,8 @@ Improvements

* LUCENE-9253: KoreanTokenizer now supports custom dictionaries(system, unknown). (Namgyu Kim)

* LUCENE-8962: Add ability to selectively merge on commit (Michael Froh)
Maybe: "Add IndexWriter merge-on-commit feature to selectively merge small segments on commit, subject to a configurable timeout, to improve search performance by reducing the number of small segments for searching"?
I'll beast this PR on my 128-core AMD Ryzen box :)
I ran this overnight, consuming a lot of electricity and generating heat in my already too hot basement ;0 It finished all Lucene (core + modules) tests 889 times, and hit a few interesting failures! I have not checked 1) whether they reproduce, nor 2) whether these failures might also occur on clean master: 1st:
2nd:
3rd:
4th:
5th:
CC @bruno-roustant see TestPhraseWildcardQuery above
The 2nd failure (TestPhraseWildcardQuery) should be ignored; it's not a real issue. I'll fix that fragile test.
Great, thanks @bruno-roustant!
I left some comments. I am still not a fan of the way this is implemented. There is too much magic happening that could hit folks by surprise. I personally think we should rethink whether it's necessary to do this all behind the scenes, or whether we can simplify a lot of things by adding a single new method to IW. I also want to ask for more reuse of existing infrastructure in IW instead of special cases. We know how to register merges and trigger them; maybe we can first work on how to track their lifecycle generally, and then reuse what we do elsewhere?
@@ -4483,6 +4593,7 @@ public int length() {
// Merge would produce a 0-doc segment, so we do nothing except commit the merge to remove all the 0-doc segments that we "merged":
assert merge.info.info.maxDoc() == 0;
commitMerge(merge, mergeState);
success = true;
this seems unnecessary?
I think this was a small pre-existing bug.
I.e. the merge has in fact succeeded on this path. Before this change we are calling closeMergeReaders twice (once in the line above this, then again on line 4720 below). Maybe that is harmless, but code-wise I think this path did succeed.
If necessary, we could pull this out into its own PR? But I think it's a good, if subtle, catch. The merge did succeed in this path.
Can we fix it in a dedicated PR with a dedicated test? That would also help if we look at the history of the bugfix.
Phew, I did some git archaeology (thanks @msokolov for the pointers!) and uncovered the branch commit for this "merge small segments on commit" feature where we added this controversial success = true: cab5ef5
+1 to pull the bugfix out into its own issue; I will open one.
The above commit has a dedicated test case, but the problem is that the test case (in the above commit) relies on this new feature (it uses the new MergePolicy.findFullFlushMerges). So we would need a new test case based on a clean master branch showing the bug ... it looks like a test that merges 100%-deleted segments ought to then incorrectly double-call closeMergeReaders (first with suppressExceptions = false, then again with true) due to this missing success = true, so it really should be easy to reproduce. Though, actually, I'm surprised none of our random testing uncovered this. Not sure I fully understand the bug yet :) I will open an issue!
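To make the shape of the suspected bug concrete, here is a heavily compressed paraphrase of that code path; ZeroDocMergeOps and the method below are hypothetical stand-ins for the IndexWriter internals, not real Lucene APIs.

import java.io.IOException;
import org.apache.lucene.index.MergePolicy;

// Compressed paraphrase of the code path under discussion; not the actual IndexWriter code.
public final class ZeroDocMergeSketch {

  // Hypothetical stand-in for the relevant IndexWriter internals.
  interface ZeroDocMergeOps {
    void closeMergeReaders(MergePolicy.OneMerge merge, boolean suppressExceptions) throws IOException;
    void commitMerge(MergePolicy.OneMerge merge) throws IOException;
    int runNormalMerge(MergePolicy.OneMerge merge) throws IOException;
  }

  static int mergeMiddleSketch(ZeroDocMergeOps ops, MergePolicy.OneMerge merge, int mergedMaxDoc)
      throws IOException {
    boolean success = false;
    try {
      if (mergedMaxDoc == 0) {
        // 100% of the merged docs were deleted: close readers ("the line above"),
        // then commit the merge just to drop the now-empty segments.
        ops.closeMergeReaders(merge, false);   // suppressExceptions = false
        ops.commitMerge(merge);
        success = true;                        // <-- the line this PR adds
        return 0;
      }
      int docCount = ops.runNormalMerge(merge); // normal merge path
      success = true;
      return docCount;
    } finally {
      if (success == false) {
        // Without "success = true" on the 0-doc path above, this would run even
        // though that path succeeded, closing the merge readers a second time,
        // now with suppressExceptions = true.
        ops.closeMergeReaders(merge, true);
      }
    }
  }
}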
/**
 * Set the callback that gets invoked when IndexWriter performs various actions.
 */
public IndexWriterConfig setIndexWriterEvents(IndexWriterEvents indexWriterEvents) {
This method is never called in this entire PR. Also, the interface seems to be unused, or rather never implemented. I think it should be removed.
Yeah for sure if we include it in this PR it should be tested by the PR.
See my comment above about this.
 * If this time is reached, we proceed with the commit based on segments merged up to that point.
 * The merges are not cancelled, and may still run to completion independent of the commit.
 */
public IndexWriterConfig setMaxCommitMergeWaitSeconds(double maxCommitMergeWaitSeconds) {
I really don't like this setting. While I am convinced we should be very, very careful adding more settings here, we should if possible use a parameter on a method to pass information like this. I personally would feel much better if we had a new method on IW called prepareCommit(double maxCommitMergeWaitSeconds). Maybe we can even go further and also pass a function to select the merges, such that we don't need to add more stuff to MergePolicy?
I think @msfroh had considered a separate IndexWriter method before but something went wrong with that approach?
I don't think this should be a separate method, actually.
We have a MergePolicy that governs which merges should happen upon which events/triggers, and what this change is adding is a new trigger (on commit) at which merging could conceivably occur. If we added this method, the implication to fresh eyes would be that the existing prepareCommit will also wait for merges with some default parameter, while this new method lets you change the default.
Anyway, let's hear from @msfroh if there was some wrinkle on making a dedicated method for this, but I still think that's a messy API. We should rather use our existing MergePolicy API correctly.
return;
}
if (committed) {
deleter.incRef(this.info.files());
I am trying to understand why we need to do any extra work here. What makes this special compared to any other merge, such that we need to do all this work? If this needs to be done only if we include this merged segment in the commit, can't we do it outside of this mergeFinished and only use mergeFinished to signal which merge finished in time, etc.? Then we also might not need the latch construct and could use a simple callback that we can ignore on the commit end.
I think @msfroh might remember some context on why he settled on this approach.
This is inherently a complex problem: we want to let MergePolicy pick the "right" merges to do on commit (the smallish ones). But it may pick poorly, or maybe the box is IO starved currently, and so we want to 1) ask MergeScheduler to kick off the merges it had requested, but 2) those merges that complete in time will make it into the commit point, while those that do not will still be allowed to run to completion and be switched into the live SegmentInfos (no wasted work), but will not be in the current commit point.
Anyway, +1 if we can re-use existing code that "merges the merged segments" into the live or the pending-commit SegmentInfos. Maybe we can factor out the common code to reduce the added complexity here?
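A rough, paraphrased sketch of that mechanism (simplified names; registerWrappedMerge is a hypothetical hook standing in for the IndexWriter-internal bookkeeping, and signatures such as mergeFinished(boolean committed) and MergeTrigger.COMMIT follow this PR rather than any released API):

import java.io.IOException;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.function.Consumer;
import org.apache.lucene.index.MergePolicy;
import org.apache.lucene.index.MergeScheduler;
import org.apache.lucene.index.MergeTrigger;

// Paraphrased sketch of the commit-time wait; not the real prepareCommitInternal code.
public final class CommitMergeWaitSketch {

  private CommitMergeWaitSketch() {}

  public static void runAndAwaitCommitMerges(
      MergePolicy.MergeSpecification spec,
      Consumer<MergePolicy.OneMerge> registerWrappedMerge, // hypothetical IndexWriter-internal hook
      MergeScheduler mergeScheduler,
      MergeScheduler.MergeSource mergeSource,
      double maxCommitMergeWaitSeconds) throws IOException, InterruptedException {
    if (spec == null) {
      return; // the MergePolicy opted out of merge-on-commit
    }
    CountDownLatch pending = new CountDownLatch(spec.merges.size());
    for (MergePolicy.OneMerge toWrap : spec.merges) {
      // Wrap each selected merge so that finishing it -- whether or not it made the
      // commit point -- counts the latch down.
      registerWrappedMerge.accept(new MergePolicy.OneMerge(toWrap.segments) {
        @Override
        public void mergeFinished(boolean committed) throws IOException {
          toWrap.mergeFinished(committed);
          pending.countDown();
        }
      });
    }
    // Ask the scheduler to run them under the new COMMIT trigger ...
    mergeScheduler.merge(mergeSource, MergeTrigger.COMMIT);
    // ... then wait up to the configured limit: merges that finish in time are
    // reflected in the commit point; the rest keep running and land in a later commit.
    pending.await((long) (maxCommitMergeWaitSeconds * 1000), TimeUnit.MILLISECONDS);
  }
}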
@@ -3228,15 +3268,38 @@ private long prepareCommitInternal() throws IOException {
// sneak into the commit point:
toCommit = segmentInfos.clone();

if (anyChanges) {
// Find any merges that can execute on commit (per MergePolicy).
MergePolicy.MergeSpecification mergeSpec =
I do wonder if we need all these changes here in such a fragile part of the code. Wouldn't it be possible to simply call maybeMerge(MergePolicy mergePolicy, MergeTrigger trigger, int maxNumSegments) with a MergePolicy wrapper that does all the magic like wrapping segments etc.? Then we could pick up the callback idea from above and just wait here until all merges have called back. I think we should try to reuse most of the current infrastructure in IW instead of special casing. There was a lot of work put into this to reduce special casing; I think we should try hard to reduce it more, and try harder not to add any.
I think what makes this tricky is that it combines the purposes of MergePolicy (to pick the small merges) and MergeScheduler (to run them and await their completion, subject to a time limit).
I do not think you can achieve this by just wrapping in MergePolicy, but I agree it would be better if we could find a simpler way to achieve it.
I tried to showcase this here master...s1monw:LUCENE-8962
Thanks @s1monw. I'll review your new PR.
}
}
if (abandonedCount > 0) {
config.getIndexWriterEvents().abandonedMergesOnCommit(abandonedCount);
what is this used for? Can we use testPoints for it if it's necessary?
The idea with this is to be able to track, externally to IndexWriter, how many merges completed during the commit window, how many did not, how long the commit waited for small merges to finish, etc. This is useful telemetry for understanding how the feature is actually working in your production cluster. I don't think we can get the equivalent telemetry by e.g. wrapping MergePolicy or MergeScheduler, because it is IndexWriter that knows how long it will wait and knows which merges made it and which did not.
I think testPoints only run under assertions, and are really designed for unit tests to do interesting things. But maybe we could somehow abuse them for this use case?
I agree it should be tested better if we want to include it here.
If it is really controversial then +1 to remove the IndexWriterEvents entirely here and pursue it as a separate issue. It was useful for us (Amazon product search) to understand how this feature impacts our production clusters, and to help us tune a reasonable timeout value.
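A hypothetical sketch of how such a callback could feed metrics (assuming, for compilation, that abandonedMergesOnCommit(int) is the interface's only method and that it lives in org.apache.lucene.index; the AtomicLong sink and analyzer choice are arbitrary):

import java.util.concurrent.atomic.AtomicLong;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.index.IndexWriterEvents;

public class CommitMergeTelemetry {
  // stand-in for a real metrics registry
  static final AtomicLong ABANDONED_MERGES_ON_COMMIT = new AtomicLong();

  static IndexWriterConfig newConfig() {
    IndexWriterConfig iwc = new IndexWriterConfig(new StandardAnalyzer());
    iwc.setIndexWriterEvents(new IndexWriterEvents() {
      @Override
      public void abandonedMergesOnCommit(int abandonedCount) {
        // count merges that did not complete within the commit wait window
        ABANDONED_MERGES_ON_COMMIT.addAndGet(abandonedCount);
      }
    });
    return iwc;
  }
}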
I think we should detach this discussion about whether we need metrics on IW from this PR. It distracts from its actual core change IMO, and if we'd add metrics then we'd need some more, or could consolidate them too. I'd rather have a stats object than a callback here, to be honest, but again that's a different discussion.
OK let's remove this part and leave it for another day. I'll open a separate issue.
Thank you for the review @s1monw!
I realize this is adding more tricky code into what is already tricky code! And I am all for simplifying the approach taken here, if we can find a way. @msfroh had tried various challenging alternative approaches before settling on this one; I am hoping he can shed some light on that history for context.
Please remember that this feature is disabled here by default. It will not impact existing users who do not enable the feature.
Commits are already typically costly operations (writing lots of new segment files, waiting for all of them to fsync, etc.), and so incurring a few seconds to give small segments a chance to merge is likely a good tradeoff for most applications, especially if they will push the resulting commit point to many replicas with high-ish search traffic.
We have been running this across Amazon's full product search production fleet for a few months now and have not hit any obvious issues.
Finally, please also remember the sizable impact of this change! We saw a ~25% decrease in segment counts in our production views due to this, and a big drop in the variance (see this chart in LUCENE-8962). It makes Lucene's segmented index more predictable.
I think we should offer a similar feature for refresh (not in this PR! In a follow-on issue/PR...). It would be amazing if, in pursuing simplifications to this approach here, it would become easier / more obvious how to handle the refresh case in the future too.
@@ -109,6 +110,9 @@

/** Default value for whether calls to {@link IndexWriter#close()} include a commit. */
public final static boolean DEFAULT_COMMIT_ON_CLOSE = true;

/** Default value for time to wait for merges on commit (when using a {@link MergePolicy} that implements findFullFlushMerges). */
public static final double DEFAULT_MAX_COMMIT_MERGE_WAIT_SECONDS = 30.0;
Maybe 5.0 or 10.0 seconds instead? A "typical" commit, writing new Lucene segments and fsyncing many files, can often take several seconds, but 30 seems high.
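For reference, an explicit opt-in would look roughly like the sketch below. setMaxCommitMergeWaitSeconds is the setter as proposed in this PR (it was later reworked into a millisecond-based long), SmallSegmentsOnCommitMergePolicy refers to the hypothetical wrapper sketched earlier in this conversation, and the path and the 5-second value are arbitrary.

import java.nio.file.Paths;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.index.TieredMergePolicy;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;

public class MergeOnCommitExample {
  public static void main(String[] args) throws Exception {
    try (Directory dir = FSDirectory.open(Paths.get("/tmp/merge-on-commit-demo"))) {
      IndexWriterConfig iwc = new IndexWriterConfig(new StandardAnalyzer());
      // Opt in: a policy that returns merges from findFullFlushMerges ...
      iwc.setMergePolicy(new SmallSegmentsOnCommitMergePolicy(new TieredMergePolicy()));
      // ... plus an explicit bound on how long commit will wait for those merges.
      iwc.setMaxCommitMergeWaitSeconds(5.0);
      try (IndexWriter writer = new IndexWriter(dir, iwc)) {
        // index documents; commit() now gives small segments a chance to merge first
        writer.commit();
      }
    }
  }
}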
maybe 0 as a default, and if somebody wants to wait they can set it?
The success=true added above was needed in order to fix a test failure caught by @dnhatn's new unit test (testRandomOperations), so they belong together.
maybe 0 as a default, and if somebody wants to wait they can set it?

+1
The feature is already disabled by default anyway (until you implement findFullFlushMerges in your MergePolicy), but making this 0 by default would make it even clearer that the feature is off by default.

The success=true added above was needed in order to fix a test failure caught by @dnhatn's new unit test (testRandomOperations), so they belong together.

Ahh, thanks for the context @msokolov! However, staring at the code (maybe for not long enough!), it looks like it really ought to be a pre-existing bug, and should be unit-testable without this new feature. But I am confused why none of our random tests have tripped up on this yet. I will open a separate Jira issue for this.
Test case #1 above, at least, does reproduce (and #4 looks like a similar stack trace; I did not try it):
My experience with #3 (testRandomOperations) is that it doesn't tend to reproduce with a given seed. It is indeed quite random. And these did not reproduce for me. #5 reproduces:
@mikemccand thanks for replying to all these comments. I do understand that this change has an impact and I agree we should add this functionality. I just disagree with how it's done and how much code is used. I will go and reply to some of your comments directly; in the meantime I went ahead and prototyped some ideas for how this can be less intrusive and reuse code. I pushed one commit here master...s1monw:LUCENE-8962 to showcase what I mean. I even think we can get away without a new method on MergePolicy, but that's too much for the prototype. I'd be ok with adding a setting to IWC if we can't agree on a different way.
Thanks @s1monw! I would love it if we could find a simple way to implement this feature, as long as it keeps the "no wasted work" property (a merge either finishes in time, and is reflected in the commit point, or does not, but still runs to completion and is reflected later). I will review your prototype soon ... I'm mostly offline this weekend but will try to look soon.
Re those test failures: I was able to fix them by checking for an empty merge and not submitting it.
Thanks @mikemccand, I pushed another commit to my prototype to make it almost the same as this change but with a bit more code reuse, I think. Please take a look at it here: master...s1monw:LUCENE-8962
OK I opened this issue to explore how to know what specifically
@msokolov @mikemccand @msfroh I merged #1585 and updated this PR to use it. I also went ahead and removed the IndexWriterEvents interface, cut over to use long instead of double as a config value for the time to wait, and set the default to 0. I will let you folks look at it again. I am happy to help further.
if (onCommitMerges != null) {
mergeScheduler.merge(mergeSource, MergeTrigger.COMMIT);
the last thing that I am afraid about is what if we have a MergeScheduler configured that blocks on this call, like SerialMergeScheduler? I think there are multiple options, like: documentation, skipping COMMIT merge triggers in SMS, adding a mergeAsync method to MS that has no impl in SMS... I think we should make sure that this is not trappy.
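Of the options listed above, "skipping COMMIT merge triggers in SMS" could look roughly like the sketch below (assuming SerialMergeScheduler is subclassable, and using the MergeSource-based merge signature and the COMMIT trigger from this PR). One caveat: the skipped merges would then just wait for a later trigger, and a configured commit wait would likely just expire.

import java.io.IOException;
import org.apache.lucene.index.MergeScheduler;
import org.apache.lucene.index.MergeTrigger;
import org.apache.lucene.index.SerialMergeScheduler;

// Sketch of "skipping COMMIT merge triggers in SMS": commit-time merges are left
// for a later, regular trigger instead of being run synchronously here.
public class NonBlockingOnCommitSerialMergeScheduler extends SerialMergeScheduler {
  @Override
  public synchronized void merge(MergeScheduler.MergeSource mergeSource, MergeTrigger trigger) throws IOException {
    if (trigger == MergeTrigger.COMMIT) {
      return; // don't run commit-time merges inline on the calling (commit) thread
    }
    super.merge(mergeSource, trigger);
  }
}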
Would it be sufficient to document the behavior in the Javadoc for findFullFlushMerges?
I was assuming that any implementation of findFullFlushMerges would try to return merges that are very likely to complete within whatever timeout someone would reasonably set (e.g. a few seconds). The timeout was intended just as an extra safeguard in case a merge takes longer.
Given that lots of IndexWriter operations can have pauses with SerialMergeScheduler (judging by the number of calls to maybeMerge, especially the one from processEvents, in IndexWriter), blocking on this particular merge call doesn't feel like it introduces more risk (especially since it needs to be used in conjunction with a MergePolicy that implements findFullFlushMerges).
yeah, I mean we don't have to do that, and I think it's rather a rare combination. My problem is that this entire configuration of max wait time is nonsense if SerialMS is used, since we block until it has merged them all (and potentially a bunch of other merges too), so a commit / refresh could take quite a long time. On the other hand, as you stated, we will call maybeMerge anyway in the commit, so it's not really making any difference, and the same is true for getReader, so I think we are fine as it is.
LGTM
Thanks for organizing this so nicely, @s1monw -- I'll wait a bit, maybe @mikemccand has some comment, and then push
@msokolov you are more than welcome. I think it's a great example of how OSS works, or should work. Thanks for being so patient with me :)
new MergePolicy.OneMerge(toWrap.segments) {
  @Override
  public void mergeFinished(boolean committed) throws IOException {
Oh -- I guess one minor complaint about moving this into prepareCommitInternal is that we won't be able to reuse it (without moving it) if we decide to apply the same logic to IndexWriter.getReader().
That said, moving it if/when someone gets around to applying the logic there isn't a big deal. (I think the real work there is reconciling logic from StandardDirectoryReader.open() with logic in IndexWriter.prepareCommitInternal(), since the functionality is kind of duplicated.)
I like to move stuff once it's necessary. I think we need to adjust it there anyway, so we can move it in a follow-up. OK?
Co-authored-by: Michael Froh <msfroh@apache.org>
Co-authored-by: Simon Willnauer <simonw@apache.org>
* upstream/master: (218 commits)
  LUCENE-9412 Do not validate jenkins HTTPS cert
  LUCENE-8962: add ability to selectively merge on commit (apache#1552)
  Replace DWPT.DocState with simple method parameters (apache#1594)
  LUCENE-9402: Let MultiCollector handle minCompetitiveScore (apache#1567)
  SOLR-14574: Fix or suppress warnings in solr/core/src/test (part 2)
  SOLR-14561 CoreAdminAPI's parameters instanceDir and dataDir are now validated (apache#1572)
  SOLR-14532: Add *.iml files to gitignore
  SOLR-14577: Return BAD REQUEST when field is missing in terms QP (apache#1588)
  SOLR-14574: Fix or suppress warnings in solr/core/src/test (part 1)
  remove debug code
  LUCENE-9408: roll back only called once enforcement
  LUCENE-8962: Allow waiting for all merges in a merge spec (apache#1585)
  SOLR-14572 document missing SearchComponents (apache#1581)
  LUCENE-9359: Avoid test failures when the extra file is a dir.
  SOLR-14573: Fix or suppress warnings in solrj/src/test
  LUCENE-9353: Move terms metadata to its own file. (apache#1473)
  Cleanup TermsHashPerField (apache#1573)
  SOLR-14558: Record all log lines in SolrLogPostTool (apache#1570)
  LUCENE-9404: simplify checksum calculation of ByteBuffersIndexOutput
  LUCENE-9403: tune BufferedChecksum.DEFAULT_BUFFERSIZE
  ...
LUCENE-8962: This PR revisits the merge-on-commit patch submitted by @msfroh a little while ago. The only changes from that earlier PR are a fix for failures uncovered by TestIndexWriter.testRandomOperations, some whitespace cleanups, and a rebase on the current master branch. The problem was that updateSegmentInfosOnMergeFinish would incorrectly decRef a merged segment's files if that segment was modified by deletions (or updates) while it was being merged.
With this fix, I ran the failing test case several thousand times with no failures, whereas before it would routinely fail after a few hundred test runs.