An inactive shard is activated by triggered synced flush #13802
Conversation
When a shard becomes inactive we trigger a sync flush in order to speed up future recoveries. The sync flush causes a new translog generation to be made, which in turn confuses the IndexingMemoryController, making it think that the shard is active. If no documents come along in the next 5m, the shard is made inactive again, triggering a sync flush, and so forth.

To avoid this, the IndexingMemoryController is changed to ignore empty translogs when checking whether a shard became active. This comes at the price of potentially missing indexing operations that are immediately followed by a flush. This is acceptable: if no more indexing operations come in, it is OK to leave the shard inactive.

A new unit test is introduced and the comparable integration tests are removed.
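The feedback loop described above can be illustrated with a small, hypothetical sketch: an activity check that treats a new-but-empty translog generation (which is what a sync flush on an idle shard produces) as "no real indexing happened". The class and field names (`TranslogStatus`, `ShardActivityChecker`) are illustrative only, not the actual Elasticsearch types.

```java
// Hypothetical, simplified sketch of the fix described above: when deciding
// whether a shard became active again, ignore translog generations that
// contain no operations (e.g. generations created only by a sync flush).
// Names are illustrative, not the actual Elasticsearch classes.
final class TranslogStatus {
    final long generation;   // translog generation id
    final int numOps;        // operations recorded in this generation
    TranslogStatus(long generation, int numOps) {
        this.generation = generation;
        this.numOps = numOps;
    }
}

final class ShardActivityChecker {
    private long lastSeenGeneration = -1;

    /** Returns true only if the shard saw real indexing work. */
    boolean becameActive(TranslogStatus current) {
        boolean newGeneration = current.generation != lastSeenGeneration;
        lastSeenGeneration = current.generation;
        // A new generation with zero ops is what a sync flush on an idle
        // shard produces; treat it as "still inactive" to break the
        // inactive -> sync flush -> "active" -> inactive loop.
        return newGeneration && current.numOps > 0;
    }
}

class Demo {
    public static void main(String[] args) {
        ShardActivityChecker checker = new ShardActivityChecker();
        // real indexing: new generation with ops -> shard is active
        System.out.println(checker.becameActive(new TranslogStatus(1, 5)));
        // sync flush on idle shard: new but empty generation -> not active
        System.out.println(checker.becameActive(new TranslogStatus(2, 0)));
    }
}
```

Under these assumptions, only the generation-plus-ops combination flips the shard to active, which is exactly what stops the inactive/flush cycle from re-triggering itself.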
@@ -144,7 +151,7 @@ public IndexingMemoryController(Settings settings, ThreadPool threadPool, Indice
                 translogBuffer = maxTranslogBuffer;
             }
         } else {
-            translogBuffer = ByteSizeValue.parseBytesSizeValue(translogBufferSetting, null);
+            translogBuffer = ByteSizeValue.parseBytesSizeValue(translogBufferSetting, TRANSLOG_BUFFER_SIZE_SETTING);
Nice catch.
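The point of the change in the diff above is that passing the setting's name instead of null lets parse failures report which setting was malformed. A minimal illustration of that pattern follows; `parseBytes` is a hypothetical helper, not the actual `ByteSizeValue` API.

```java
// Minimal illustration of why the diff above passes the setting name instead
// of null: the error message can then name the offending setting.
// parseBytes is a hypothetical helper, not the actual Elasticsearch API.
class SettingParseDemo {
    static long parseBytes(String value, String settingName) {
        if (value == null || !value.endsWith("mb")) {
            throw new IllegalArgumentException(
                "failed to parse setting [" + settingName + "] with value [" + value + "]");
        }
        // strip the "mb" suffix and convert to bytes
        return Long.parseLong(value.substring(0, value.length() - 2)) * 1024 * 1024;
    }

    public static void main(String[] args) {
        System.out.println(parseBytes("64mb", "indices.memory.translog_buffer_size"));
        try {
            parseBytes("sixty-four", "indices.memory.translog_buffer_size");
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage()); // message names the bad setting
        }
    }
}
```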
This was a great catch @bleskes, and the cutover from IT to unit test is wonderful. I left a bunch of small comments, but otherwise LGTM. I still find that massive if statement in
@mikemccand thx for the review. I pushed another round. Tried to simplify the 'big if'
// index into both shards, move the clock and see that they are still active
controller.setTranslog(shard1, randomInt(2), randomInt(2) + 1);
controller.setTranslog(shard2, randomInt(2) + 1, randomInt(2));
// nthe controller doesn't know when the ops happened, so even if this is more
nthe -> the
LGTM, thanks @bleskes!
Pushed this to master. Will give CI some time before pushing it all the way back to 1.7...
I was thinking about this some more (morning run, he he) - I think we should fold all the decisions and state about being active or not into IndexShard. Then it will be easy to reset on every index operation (immediately). We also already have the notion of an inactivity listener - we can extend it to have an active event as well. We can also add a checkIfInactive method on IndexShard which checks whether there were any indexing ops (we already have stats) since the last time it was checked. To avoid making IndexShard more complex, we can push this functionality into ShardIndexingService - which is already in charge of stats.
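The design proposed above can be sketched roughly as follows: activity state lives next to the shard's indexing stats, is reset immediately on every index operation, and a listener is notified of both activation and deactivation. All names here (`ActivityListener`, `ShardIndexingActivity`) are illustrative assumptions, not the classes that were actually added.

```java
// Hypothetical sketch of the proposed design: activity tracking folded into
// the shard's indexing-stats component, with an active/inactive listener and
// a periodic checkIfInactive(). Names are illustrative, not actual classes.
import java.util.concurrent.atomic.AtomicLong;

interface ActivityListener {
    void onShardActive();
    void onShardInactive();
}

final class ShardIndexingActivity {
    private final AtomicLong opsSinceLastCheck = new AtomicLong();
    private final ActivityListener listener;
    private boolean active = false;

    ShardIndexingActivity(ActivityListener listener) {
        this.listener = listener;
    }

    /** Called on every index operation: mark the shard active immediately. */
    void onIndexOperation() {
        opsSinceLastCheck.incrementAndGet();
        synchronized (this) {
            if (!active) {
                active = true;
                listener.onShardActive();
            }
        }
    }

    /** Called periodically: deactivate if no ops arrived since the last check. */
    synchronized boolean checkIfInactive() {
        if (active && opsSinceLastCheck.getAndSet(0) == 0) {
            active = false;
            listener.onShardInactive();
        }
        return !active;
    }
}
```

The appeal of this shape is exactly what the comment argues: the shard (not the memory controller) owns the active/inactive decision, so no translog-generation heuristic is needed at all.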
+1, this would be cleaner. E.g. I don't like how

I can try to tackle this.

Or, why not absorb
I personally don't mind folding it into IndexShard, though having a dedicated stats/active component has benefits (clearer testing suite). I know Simon has an opinion on this. On 27 Sep 2015 4:39 PM +0200, Michael McCandless notifications@github.com wrote:
Closes #13802

Relates elastic#13802. Includes a backport of elastic#13784. Closes elastic#13824.
I think we need to push this all the way to 1.7.3 ...