Move to use serial merge schedule by default #5447

kimchy · 2014-03-17T17:23:44Z

Today we use ConcurrentMergeScheduler, which can be painful because it is concurrent at the shard level, with up to 3 threads merging concurrently per shard. If several shards are being indexed, there is a minor explosion of threads trying to merge, all of them throttled by our merge throttling.
Moving to the serial merge scheduler still keeps merges concurrent across shards, since we have the merge thread pool that schedules those merges. Merging just becomes serial within a specific shard.
Also, the serial merge scheduler now limits how many merges it does in one go, so other shards get a fair chance to merge. We use the pending merges on the IndexWriter (IW) to check whether more merges are needed.
Note that if a merge is in progress, indexing will not block on the synchronized maybeMerge call at indexing (flush) time, because we wrap our merge scheduler with the EnabledMergeScheduler, where maybeMerge is not activated during indexing, only on explicit calls to IW#maybeMerge (see Merges).
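
To make the "limit of how many merges it will do at one go" idea concrete, here is a minimal, self-contained sketch. It is not the code from this PR and it does not use Lucene: `BoundedSerialMerger`, `addPending`, `hasPendingMerges` and the string-named merges are all hypothetical stand-ins, and only the `maxMergeAtOnce` name is borrowed from the review discussion (`max_merge_at_once`). The queue plays the role of the pending merges on the IndexWriter.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;

// Hypothetical stand-in for a per-shard serial scheduler; not an Elasticsearch or Lucene class.
final class BoundedSerialMerger {
    // Stand-in for the pending merges tracked by the IndexWriter.
    private final Queue<String> pendingMerges = new ArrayDeque<>();
    private final List<String> completed = new ArrayList<>();
    private final int maxMergeAtOnce;

    BoundedSerialMerger(int maxMergeAtOnce) {
        if (maxMergeAtOnce < 1) {
            throw new IllegalArgumentException("maxMergeAtOnce must be >= 1");
        }
        this.maxMergeAtOnce = maxMergeAtOnce;
    }

    synchronized void addPending(String mergeName) {
        pendingMerges.add(mergeName);
    }

    // Runs pending merges one at a time on the calling thread and stops after
    // maxMergeAtOnce of them, even if more are pending. Returns how many ran.
    synchronized int merge() {
        int cycles = 0;
        while (cycles < maxMergeAtOnce) {
            String next = pendingMerges.poll();
            if (next == null) {
                break; // nothing left to merge
            }
            completed.add(next); // a real scheduler would perform the merge here
            cycles++;
        }
        return cycles;
    }

    // Lets the caller decide whether to schedule another round for this shard.
    synchronized boolean hasPendingMerges() {
        return !pendingMerges.isEmpty();
    }

    synchronized List<String> completedMerges() {
        return new ArrayList<>(completed);
    }
}
```

Under these assumptions, a caller running on a shared merge thread pool would invoke `merge()` once per shard, then re-submit the shard only if `hasPendingMerges()` is still true, which is what gives the other shards their turn in between.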

s1monw · 2014-03-17T20:17:31Z

src/main/java/org/apache/lucene/index/TrackingSerialMergeScheduler.java

@@ -46,8 +46,11 @@
    private final Set<OnGoingMerge> onGoingMerges = ConcurrentCollections.newConcurrentSet();
    private final Set<OnGoingMerge> readOnlyOnGoingMerges = Collections.unmodifiableSet(onGoingMerges);

-    public TrackingSerialMergeScheduler(ESLogger logger) {
+    private final int maxMergeCycles;


I don't like the name, to be honest. What about merge_batch_size, max_merge_at_once, or break_after_merges?

I like max_merge_at_once, will change

s1monw · 2014-03-17T20:18:28Z

I like the change; the only thing we need to figure out is the naming. I also think we need a unit test for this.

- renamed the variable name
- added a unit test for the serial merge scheduler limiting

kimchy · 2014-03-17T22:36:51Z

renamed + added unit test.

s1monw · 2014-03-18T10:39:20Z

LGTM

s1monw reviewed Mar 17, 2014
first round of review

5df09f5

kimchy closed this in 0ef3b03 Mar 18, 2014

kimchy deleted the serial_merge_scheduler branch March 18, 2014 12:32

spinscale added v1.1.0 labels Mar 19, 2014

clintongormley added the enhancement label Mar 21, 2014

clintongormley added the :Core/Infra/Core label Jun 7, 2015
