New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Refactor RecoveryTarget state management #8092
Conversation
At the moment, we leave around temporary files if a peer (replica) recovery is canceled. Those files will normally be cleaned up once the shard is started else but in case of errors this can lead to trouble. If recovery are started and canceled often, we may cause nodes to run out of disk space. Closes #7893
…ere cleaned These are now background processes..
import org.elasticsearch.index.shard.service.InternalIndexShard; | ||
|
||
import java.util.Map; | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can we get a javadoc string what this class does?
private void closeInternal() { | ||
try { | ||
// clean open index outputs | ||
Iterator<Entry<String, IndexOutput>> iterator = openIndexOutputs.entrySet().iterator(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think you can just do:
IOUtils.closeWhileHandlingException(openIndexOutputs.values());
openIndexOutputs.clear();
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's strictly speaking thread unsafe. I think we can do this here in practice, but part of the refactoring was not to have to worry about these kind of things. I think we should keep it as is?
I left a bunch of comments - I really like that change btw :) 👍 |
@s1monw I pushed some commits to address your comments. Can we do another round? |
LGTM |
recoveryState.setSourceNode(sourceNode); | ||
recoveryState.setTargetNode(clusterService.localNode()); | ||
recoveryState.setPrimary(indexShard.routingEntry().primary()); | ||
final long recoveryId = onGoingRecoveries.startRecovery(indexShard, sourceNode, recoveryState, listener); | ||
threadPool.generic().execute(new Runnable() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I actually think we should use AbstractRunnable here
threadPool.generic().execute(new AbstractRunnable() {
@Override
public void onFailure(Throwable t) {
// call listener
}
@Override
protected void doRun() throws Exception {
RecoveriesCollection.StatusRef statusRef = onGoingRecoveries.getStatus(recoveryId);
if (statusRef == null) {
return;
}
try {
doRecovery(statusRef.status());
} finally {
// make sure we never interrupt the thread after we have released it back to the pool
statusRef.status().clearWaitingRecoveryThread(Thread.currentThread());
statusRef.close();
}
}
WDYT?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
++
@s1monw I pushed another commit based on the latest feedback. I'm giving CI an hour or two to chew this and will then merge. |
This commit rewrites the state controls in the RecoveryTarget family classes to make it easier to guarantee that: - recovery resources are only cleared once there are no ongoing requests - recovery is automatically canceled when the target shard is closed/removed - canceled recoveries do not leave temp files behind when canceled. Highlights of the change: 1) All temporary files are cleared upon failure/cancel (see #7315 ) 2) All newly created files are always temporary 3) Doesn't list local files on the cluster state update thread (which throw unwanted exception) 4) Recoveries are canceled by a listener to IndicesLifecycle.beforeIndexShardClosed, so we don't need to explicitly call it. 5) Simplifies RecoveryListener to only notify when a recovery is done or failed. Removed subtleties like ignore and retry (they are dealt with internally) Closes #8092 , Closes #7315
merged. Thx @s1monw ! |
we current check that the recovery is not finished when people access the status local variables. This is wrong and we should check for the refcount being > 0 as it is OK to use the status after it has marked as finished but there are still on going, in-flight reference to it. Relates to #8092 Closes #8271
we current check that the recovery is not finished when people access the status local variables. This is wrong and we should check for the refcount being > 0 as it is OK to use the status after it has marked as finished but there are still on going, in-flight reference to it. Relates to #8092 Closes #8271
The PR rewrites the state controls in the RecoveryTarget family classes to make it easier to guarantee that:
Highlights of the change:
Relates to #7893