Alternative approach to LUCENE-8962 #1576

s1monw · 2020-06-13T11:17:26Z

This is just a prototype, intended only for discussion.

s1monw · 2020-06-13T11:18:45Z

See #1552 for reference.

dsmiley · 2020-06-15T16:46:11Z

lucene/core/src/java/org/apache/lucene/index/MergePolicy.java

+
+    boolean await(long timeout, TimeUnit unit) {
+      for (OneMerge merge : merges) {
+        if (merge.await(timeout, unit) == false) {


This looks suspicious when there is more than one merge. Shouldn't the timeout decrease as time is used up by earlier merges?

In practice, when is there more than one? I've been confused on this matter when I developed a custom MP/MS.

In a real change, that's correct. In a prototype like this, it's really just there to visualize the idea. I deliberately left this out to avoid discussing impl details; that's not the point here. This is really just a PR to make commenting simpler.

I fixed this in the followup PR #1585
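For readers following the timeout question above, here is a minimal sketch of the "shrinking timeout" idea: compute one deadline up front and give each merge only the time that is left. This is not the code from #1585; `DeadlineAwait`, `Awaitable`, and `awaitAll` are hypothetical names invented for illustration, and `Awaitable` merely stands in for `OneMerge`.

```java
import java.util.List;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

/** Illustrative only: a hypothetical helper, not Lucene's MergeSpecification. */
public class DeadlineAwait {

  /** Hypothetical stand-in for OneMerge: something that can be awaited with a timeout. */
  interface Awaitable {
    boolean await(long timeout, TimeUnit unit) throws InterruptedException;
  }

  /**
   * Waits for every merge under one shared time budget. Each merge gets only
   * the time remaining until the deadline, so the total wait is bounded by
   * roughly {@code timeout} rather than {@code timeout * merges.size()}.
   */
  static boolean awaitAll(List<? extends Awaitable> merges, long timeout, TimeUnit unit)
      throws InterruptedException {
    final long deadline = System.nanoTime() + unit.toNanos(timeout);
    for (Awaitable merge : merges) {
      // Clamp at zero: a merge that has already finished can still report success.
      final long remaining = Math.max(0L, deadline - System.nanoTime());
      if (merge.await(remaining, TimeUnit.NANOSECONDS) == false) {
        return false;
      }
    }
    return true;
  }

  public static void main(String[] args) throws InterruptedException {
    // CountDownLatch plays the role of a merge's completion signal here.
    Awaitable finished = new CountDownLatch(0)::await;
    Awaitable pending = new CountDownLatch(1)::await;
    System.out.println(awaitAll(List.of(finished, finished), 1, TimeUnit.SECONDS));
    System.out.println(awaitAll(List.of(finished, pending), 50, TimeUnit.MILLISECONDS));
  }
}
```

Tracking a deadline with `System.nanoTime()` rather than subtracting measured durations keeps the loop simple and is unaffected by wall-clock adjustments.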

msfroh · 2020-06-16T17:19:09Z

This approach makes sense to me.

I like how much simpler the addition of MergeSpecification.await() makes things, versus the CountDownLatch hackery of the previous approach. Also, updatePendingMerges returning the MergeSpecification is much cleaner than explicitly computing a merge from within prepareCommitInternal.

s1monw · 2020-06-16T20:34:52Z

@msfroh I opened #1585 to make it easier to do this.

s1monw · 2020-06-17T14:23:23Z

superseded by #1585

dsmiley · 2020-06-21T23:20:39Z

lucene/core/src/java/org/apache/lucene/index/IndexWriter.java

        final int numMerges = spec.merges.size();
        for(int i=0;i<numMerges;i++) {
          final MergePolicy.OneMerge merge = spec.merges.get(i);
          merge.maxNumSegments = maxNumSegments;
        }
      }
    } else {
-      spec = mergePolicy.findMerges(trigger, segmentInfos, this);
+      switch (trigger) {
+        case COMMIT:


There is an inconsistency here that suggests something is wrong, or at least confusing enough to deserve a comment. For case COMMIT, we call findFullFlushMerges. Shouldn't it be on case FULL_FLUSH to be consistent with the method we are calling? Or should findFullFlushMerges be called findCommitMerges?

Full flush happens for refresh and commit.

But we have only implemented "merge on commit" so far.

I would love to also add "merge on refresh", but until we do so, I think it's correct for IndexWriter to separate out the COMMIT case here like this.
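To make the dispatch under discussion easier to picture, here is a minimal sketch with stand-in types. `TriggerDispatch`, `Trigger`, `Policy`, and `selectMerges` are hypothetical names, not the actual IndexWriter or MergePolicy code, and the merge lists are plain strings. The diff above is truncated after `case COMMIT:`, so the `default` branch is an assumption: it simply keeps the behavior of the line the diff replaces.

```java
import java.util.List;

/** Illustrative only: hypothetical stand-ins for the trigger dispatch in the diff. */
public class TriggerDispatch {

  /** Hypothetical subset of merge triggers. */
  enum Trigger { SEGMENT_FLUSH, FULL_FLUSH, COMMIT }

  /** Hypothetical stand-in for a merge policy with the two entry points discussed above. */
  interface Policy {
    List<String> findMerges(Trigger trigger);
    List<String> findFullFlushMerges(Trigger trigger);
  }

  static List<String> selectMerges(Policy policy, Trigger trigger) {
    switch (trigger) {
      case COMMIT:
        // A full flush happens for both refresh and commit, but only
        // "merge on commit" is implemented so far, so only COMMIT is routed here.
        return policy.findFullFlushMerges(trigger);
      default:
        // Assumption: every other trigger keeps going through findMerges.
        return policy.findMerges(trigger);
    }
  }

  public static void main(String[] args) {
    Policy policy = new Policy() {
      @Override
      public List<String> findMerges(Trigger trigger) {
        return List.of("findMerges:" + trigger);
      }

      @Override
      public List<String> findFullFlushMerges(Trigger trigger) {
        return List.of("findFullFlushMerges:" + trigger);
      }
    };
    for (Trigger trigger : Trigger.values()) {
      System.out.println(selectMerges(policy, trigger));
    }
  }
}
```

If "merge on refresh" were added later, the refresh trigger could presumably join the `COMMIT` case, at which point the method name and the case labels would line up.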

alternative approach to LUCENE-8962

3f1e6fa

s1monw changed the title ~~lternative approach to LUCENE-8962~~ Alternative approach to LUCENE-8962 Jun 13, 2020

s1monw marked this pull request as draft June 13, 2020 11:19

mikemccand mentioned this pull request Jun 15, 2020

LUCENE-8962: merge small segments on commit #1552

Merged

dsmiley reviewed Jun 15, 2020


s1monw closed this Jun 17, 2020

dsmiley reviewed Jun 21, 2020



s1monw commented Jun 13, 2020 •

edited

s1monw commented Jun 13, 2020

dsmiley Jun 15, 2020

s1monw Jun 15, 2020

s1monw Jun 16, 2020

msfroh commented Jun 16, 2020

s1monw commented Jun 16, 2020

s1monw commented Jun 17, 2020

dsmiley Jun 21, 2020

mikemccand Jun 22, 2020
