Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Move to use serial merge schedule by default #5447

Closed
wants to merge 2 commits into from

Conversation

kimchy
Copy link
Member

@kimchy kimchy commented Mar 17, 2014

Today, we use ConcurrentMergeScheduler, and this can be painful since it is concurrent on a shard level, with a max of 3 threads doing concurrent merges. If there are several shards being indexed, then there will be a minor explosion of threads trying to do merges, all being throttled by our merge throttling.
Moving to serial merge scheduler will still maintain concurrency of merges across shards, as we have the merge thread pool that schedules those merges. It will just be a serial one on a specific shard.
Also, on serial merge scheduler, we now have a limit of how many merges it will do at one go, so it will let other shards get their fair chance of merging. We use the pending merges on IW to check if merges are needed or not for it.
Note, that if a merge is happening, it will not block due to a sync on the maybeMerge call at indexing (flush) time, since we wrap our merge scheduler with the EnabledMergeScheduler, where maybeMerge is not activated during indexing, only with explicit calls to IW#maybeMerge (see Merges).

Today, we use ConcurrentMergeScheduler, and this can be painful since it is concurrent on a shard level, with a max of 3 threads doing concurrent merges. If there are several shards being indexed, then there will be a minor explosion of threads trying to do merges, all being throttled by our merge throttling.
Moving to serial merge scheduler will still maintain concurrency of merges across shards, as we have the merge thread pool that schedules those merges. It will just be a serial one on a specific shard.
Also, on serial merge scheduler, we now have a limit of how many merges it will do at one go, so it will let other shards get their fair chance of merging. We use the pending merges on IW to check if merges are needed or not for it.
Note, that if a merge is happening, it will not block due to a sync on the maybeMerge call at indexing (flush) time, since we wrap our merge scheduler with the EnabledMergeScheduler, where maybeMerge is not activated during indexing, only with explicit calls to IW#maybeMerge (see Merges).
@@ -46,8 +46,11 @@
private final Set<OnGoingMerge> onGoingMerges = ConcurrentCollections.newConcurrentSet();
private final Set<OnGoingMerge> readOnlyOnGoingMerges = Collections.unmodifiableSet(onGoingMerges);

public TrackingSerialMergeScheduler(ESLogger logger) {
private final int maxMergeCycles;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't like the the name to be honest what about merge_batch_size, max_merge_at_once, break_after_merges?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like max_merge_at_once, will change

@s1monw
Copy link
Contributor

s1monw commented Mar 17, 2014

I like the change the only thing that we need to figure out is the nameing. I also think we need a unittest for this.

- renamed the variable name
- added a unit test for the serial merge scheduler limiting
@kimchy
Copy link
Member Author

kimchy commented Mar 17, 2014

renamed + added unit test.

@s1monw
Copy link
Contributor

s1monw commented Mar 18, 2014

LGTM

@kimchy kimchy closed this in 0ef3b03 Mar 18, 2014
kimchy added a commit that referenced this pull request Mar 18, 2014
Today, we use ConcurrentMergeScheduler, and this can be painful since it is concurrent on a shard level, with a max of 3 threads doing concurrent merges. If there are several shards being indexed, then there will be a minor explosion of threads trying to do merges, all being throttled by our merge throttling.
Moving to serial merge scheduler will still maintain concurrency of merges across shards, as we have the merge thread pool that schedules those merges. It will just be a serial one on a specific shard.
Also, on serial merge scheduler, we now have a limit of how many merges it will do at one go, so it will let other shards get their fair chance of merging. We use the pending merges on IW to check if merges are needed or not for it.
Note, that if a merge is happening, it will not block due to a sync on the maybeMerge call at indexing (flush) time, since we wrap our merge scheduler with the EnabledMergeScheduler, where maybeMerge is not activated during indexing, only with explicit calls to IW#maybeMerge (see Merges).
closes #5447
@kimchy kimchy deleted the serial_merge_scheduler branch March 18, 2014 12:32
@clintongormley clintongormley added the :Core/Infra/Core Core issues without another label label Jun 7, 2015
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants