
IndexingMemoryController should only update buffer settings of fully recovered shards #6667

Closed
wants to merge 3 commits

Conversation


@bleskes bleskes commented Jul 1, 2014

At the moment the IndexingMemoryController can try to update the index buffer memory of shards at any given moment. This update involves a flush, which may cause a FlushNotAllowedEngineException to be thrown by a concurrently finalizing recovery.

Closes #6642
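The idea behind the fix can be sketched as follows. This is a hypothetical simplification, not the actual elasticsearch code: the class name, the `ShardState` enum, and `updateBuffers` are invented for illustration; the real controller consults the shard's lifecycle state before pushing a new buffer size.

```java
import java.util.ArrayList;
import java.util.List;

public class BufferUpdateSketch {
    // Hypothetical, simplified shard states; the real engine has more.
    enum ShardState { RECOVERING, POST_RECOVERY, STARTED }

    static class Shard {
        final String id;
        final ShardState state;
        long indexingBufferBytes;
        Shard(String id, ShardState state) { this.id = id; this.state = state; }
    }

    // Only shards that have fully recovered get their buffer updated;
    // updating a recovering shard would trigger a flush that can throw
    // FlushNotAllowedEngineException while recovery is finalizing.
    static List<String> updateBuffers(List<Shard> shards, long perShardBytes) {
        List<String> updated = new ArrayList<>();
        for (Shard s : shards) {
            if (s.state != ShardState.STARTED) {
                continue; // skip shards that are still recovering
            }
            s.indexingBufferBytes = perShardBytes;
            updated.add(s.id);
        }
        return updated;
    }

    public static void main(String[] args) {
        List<Shard> shards = List.of(
            new Shard("s0", ShardState.STARTED),
            new Shard("s1", ShardState.RECOVERING),
            new Shard("s2", ShardState.STARTED));
        System.out.println(updateBuffers(shards, 64L << 20)); // prints [s0, s2]
    }
}
```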

…red shards
@@ -314,6 +310,11 @@ public TimeValue defaultRefreshInterval() {
return new TimeValue(1, TimeUnit.SECONDS);
}


public ByteSizeValue indexingBufferSize() {
If it is a test-only setting, please make it package-private and move the test if necessary. If not, we should add javadocs and pull it up to the interface.

@bleskes (Contributor, author):

Making it package-private is tricky because the test that needs it does indeed live in another package, and I think it is in the right place. On the other hand, this feels like an internal/implementation-detail method, and I don't think it should go on the interface. We call the class InternalEngine, so I think it's OK for it to have public methods that are not part of its official interface.


s1monw commented Jul 2, 2014

Left a bunch of comments… the functionality makes lots of sense IMO.

@s1monw s1monw removed the review label Jul 2, 2014
@bleskes bleskes added the review label Jul 2, 2014
* returns the current budget for the total amount of indexing buffers of
* active shards on this node
*/
public ByteSizeValue indexingBufferSize() {

this can be package private?


s1monw commented Jul 2, 2014

2 minor nitpicks -- LGTM

@s1monw s1monw removed the review label Jul 2, 2014
@bleskes bleskes closed this in 7119ffa Jul 2, 2014
bleskes added a commit that referenced this pull request Jul 2, 2014
…red shards

Closes #6642, closes #6667
@bleskes bleskes deleted the memory_update_active_shards branch July 2, 2014 10:25
@bleskes bleskes changed the title IndexingMemoryController should only update buffer settings of fully recovered shards [Infrastructure] IndexingMemoryController should only update buffer settings of fully recovered shards Jul 2, 2014
@clintongormley clintongormley changed the title [Infrastructure] IndexingMemoryController should only update buffer settings of fully recovered shards Internal: IndexingMemoryController should only update buffer settings of fully recovered shards Jul 16, 2014
bleskes added a commit to bleskes/elasticsearch that referenced this pull request Mar 10, 2015
…g version map size.

To support real-time gets, the engine keeps an in-memory map of recently indexed docs and their location in the translog. This is needed until documents become visible in Lucene. With 1.3.0, we improved this map and made it tightly integrated with refresh cycles in Lucene in order to keep the memory signature to a bare minimum. On top of that, if the version map grows above 25% of the index buffer size, we proactively refresh in order to be able to trim the version map back to 0 (see elastic#6363). In the same release, we fixed an issue where an update to the indexing buffer could result in an unwanted exception during recovery (elastic#6667). We solved this by waiting to update the size of the index buffer until the shard was fully recovered. Sadly, these two together can have a negative impact on the speed of translog recovery.

During the second phase of recovery we replay all operations that happened on the shard during the first phase of copying files. In parallel, we start indexing new documents into the newly created shard. At some point (phase 3 of the recovery), the translog replay starts to send operations which have already been indexed into the shard. The version map is crucial for quickly detecting this and skipping the replayed operations without hitting Lucene. Sadly, elastic#6667 (only updating the index memory buffer once the shard is started) means that a recovering shard is using the default 64MB for its index buffer, and thus only 16MB (25%) for the version map. This is much less than the default index buffer size of 10% of machine memory (shared between shards).

Since we don't flush anymore when updating the memory buffer, we can remove elastic#6667 and update recovering shards as well. Also, we make the version map max size configurable, with the same default of 25% of the current index buffer.
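The arithmetic in that message can be checked with a quick sketch. The class and helper below are hypothetical, not elasticsearch code; only the 25% fraction and the 64MB default come from the commit message above:

```java
public class VersionMapBudget {
    // Fraction of the indexing buffer the version map may occupy before a
    // proactive refresh is triggered (25%, per the commit message).
    static final double VERSION_MAP_FRACTION = 0.25;

    static long versionMapBudget(long indexingBufferBytes) {
        return (long) (indexingBufferBytes * VERSION_MAP_FRACTION);
    }

    public static void main(String[] args) {
        long mb = 1L << 20;
        // A recovering shard stuck on the per-shard default of 64 MB
        // gets only a 16 MB version map, which slows duplicate detection
        // during translog replay.
        System.out.println(versionMapBudget(64 * mb) / mb); // prints 16
    }
}
```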
bleskes added a commit to bleskes/elasticsearch that referenced this pull request Mar 15, 2015
…g version map size.

Closes elastic#10046
bleskes added a commit to bleskes/elasticsearch that referenced this pull request Mar 15, 2015
…g version map size.

Closes elastic#10046
@clintongormley clintongormley changed the title Internal: IndexingMemoryController should only update buffer settings of fully recovered shards IndexingMemoryController should only update buffer settings of fully recovered shards Jun 7, 2015

Successfully merging this pull request may close these issues.

Failed to start shard when restarting elasticsearch
3 participants