New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Engine: close snapshots before recovery counter #9760
Conversation
When we clean up after recoveries, we currently close the recovery counter first, followed up by the different snapshots. Since the recovery counter may issue a flush (introduced in elastic#9439) , the snapshot references prevent the flush from cleaning up the current translog file. This commit changes the order of the close command. Note: this is not a problem on master, as we moved the translog delete logic, making it kick in if needed when the ref counter goes to 0.
can we do next to the code the reason for the importance of close ordering? |
@kimchy comments added. |
phase2Snapshot, phase3Snapshot); // hmm why can't we use try-with here? | ||
// close the snapshots first to release the reference to the translog file, so a flush post recovery can delete it | ||
Releasables.close(success, phase1Snapshot, phase2Snapshot, phase3Snapshot, | ||
onGoingRecoveries, writeLock); // hmm why can't we use try-with here? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we can do try-with but we need to have 2 try blocks. since the write lock needs to be released last. but in a try / with blokc it's released before the finally block is executed
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it in-fact might make sense to have the write lock in an outer try catch to ensure it's released last?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The problem with try with resources clause is that we release the phase1 and phase2 snapshots which are defined earlier in the scope. I can make the write lock use a try-with but this requires another catch to convert the exception the close method throws into a RecoveryEngineException. I'm not sure it's worth it.
left a minor comment |
@s1monw ping? |
ok fair enough LGTM then |
When we clean up after recoveries, we currently close the recovery counter first, followed up by the different snapshots. Since the recovery counter may issue a flush (introduced in elastic#9439) , the snapshot references prevent the flush from cleaning up the current translog file. This commit changes the order of the close command. Note: this is not a problem on master, as we moved the translog delete logic, making it kick in if needed when the ref counter goes to 0. Closes elastic#9760
When we clean up after recoveries, we currently close the recovery counter first, followed up by the different snapshots. Since the recovery counter may issue a flush (introduced in elastic#9439) , the snapshot references prevent the flush from cleaning up the current translog file. This commit changes the order of the close command. Note: this is not a problem on master, as we moved the translog delete logic, making it kick in if needed when the ref counter goes to 0. Closes elastic#9760
When we clean up after recoveries, we currently close the recovery counter first, followed up by the different snapshots. Since the recovery counter may issue a flush (introduced in elastic#9439) , the snapshot references prevent the flush from cleaning up the current translog file. This commit changes the order of the close command. Note: this is not a problem on master, as we moved the translog delete logic, making it kick in if needed when the ref counter goes to 0. Closes elastic#9760
When we clean up after recoveries, we currently close the recovery counter first, followed up by the different snapshots. Since the recovery counter may issue a flush (introduced in #9439) , the snapshot references prevent the flush from deleting the current translog file once it's no longer needed.
Note: this is not a problem on master, as we moved the translog delete logic, making it kick in if needed
when the ref counter goes to 0.
relates to #9226 (comment)