Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Engine: close snapshots before recovery counter #9760

Closed
wants to merge 2 commits into from

Conversation

bleskes
Copy link
Contributor

@bleskes bleskes commented Feb 19, 2015

When we clean up after recoveries, we currently close the recovery counter first, followed up by the different snapshots. Since the recovery counter may issue a flush (introduced in #9439) , the snapshot references prevent the flush from deleting the current translog file once it's no longer needed.

Note: this is not a problem on master, as we moved the translog delete logic, making it kick in if needed
when the ref counter goes to 0.

relates to #9226 (comment)

When we clean up after recoveries, we currently close the recovery counter first, followed up by the different snapshots. Since the recovery counter may issue a flush (introduced in elastic#9439) , the snapshot references prevent the flush from cleaning up the current translog file. This commit changes the order of the close command.

Note: this is not a problem on master, as we moved the translog delete logic, making it kick in if needed when the ref counter goes to 0.
@kimchy
Copy link
Member

kimchy commented Feb 19, 2015

can we do next to the code the reason for the importance of close ordering?

@bleskes
Copy link
Contributor Author

bleskes commented Feb 19, 2015

@kimchy comments added.

phase2Snapshot, phase3Snapshot); // hmm why can't we use try-with here?
// close the snapshots first to release the reference to the translog file, so a flush post recovery can delete it
Releasables.close(success, phase1Snapshot, phase2Snapshot, phase3Snapshot,
onGoingRecoveries, writeLock); // hmm why can't we use try-with here?
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we can do try-with but we need to have 2 try blocks. since the write lock needs to be released last. but in a try / with blokc it's released before the finally block is executed

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it in-fact might make sense to have the write lock in an outer try catch to ensure it's released last?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The problem with try with resources clause is that we release the phase1 and phase2 snapshots which are defined earlier in the scope. I can make the write lock use a try-with but this requires another catch to convert the exception the close method throws into a RecoveryEngineException. I'm not sure it's worth it.

@s1monw
Copy link
Contributor

s1monw commented Feb 24, 2015

left a minor comment

@bleskes
Copy link
Contributor Author

bleskes commented Feb 27, 2015

@s1monw ping?

@s1monw
Copy link
Contributor

s1monw commented Feb 27, 2015

ok fair enough LGTM then

bleskes added a commit to bleskes/elasticsearch that referenced this pull request Feb 27, 2015
When we clean up after recoveries, we currently close the recovery counter first, followed up by the different snapshots. Since the recovery counter may issue a flush (introduced in elastic#9439) , the snapshot references prevent the flush from cleaning up the current translog file. This commit changes the order of the close command.

Note: this is not a problem on master, as we moved the translog delete logic, making it kick in if needed when the ref counter goes to 0.

Closes elastic#9760
bleskes added a commit to bleskes/elasticsearch that referenced this pull request Feb 27, 2015
When we clean up after recoveries, we currently close the recovery counter first, followed up by the different snapshots. Since the recovery counter may issue a flush (introduced in elastic#9439) , the snapshot references prevent the flush from cleaning up the current translog file. This commit changes the order of the close command.

Note: this is not a problem on master, as we moved the translog delete logic, making it kick in if needed when the ref counter goes to 0.

Closes elastic#9760
@bleskes bleskes closed this in 0f1c779 Feb 27, 2015
@bleskes bleskes deleted the leftover_translog branch February 27, 2015 19:25
@clintongormley clintongormley added :Distributed/Recovery Anything around constructing a new shard, either from a local or a remote source. and removed review labels Mar 19, 2015
mute pushed a commit to mute/elasticsearch that referenced this pull request Jul 29, 2015
When we clean up after recoveries, we currently close the recovery counter first, followed up by the different snapshots. Since the recovery counter may issue a flush (introduced in elastic#9439) , the snapshot references prevent the flush from cleaning up the current translog file. This commit changes the order of the close command.

Note: this is not a problem on master, as we moved the translog delete logic, making it kick in if needed when the ref counter goes to 0.

Closes elastic#9760
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
>bug :Distributed/Recovery Anything around constructing a new shard, either from a local or a remote source. v1.4.5 v1.5.0
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants