Engine: close snapshots before recovery counter #9760

bleskes · 2015-02-19T12:25:42Z

When we clean up after recoveries, we currently close the recovery counter first, followed by the different snapshots. Since closing the recovery counter may issue a flush (introduced in #9439), the snapshot references prevent that flush from deleting the current translog file once it is no longer needed.

Note: this is not a problem on master, as we moved the translog delete logic so that it kicks in, if needed, when the ref counter goes to 0.
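
To illustrate the ordering issue described above, here is a minimal, self-contained sketch. It is not the Elasticsearch code: `TranslogFile`, `Snapshot`, `recoverThenCleanUp`, and the `deleteWhenUnreferenced` flag are hypothetical names invented for this example. The flag loosely models the difference the note draws between this branch (a flush can only delete an unreferenced file) and master (deletion is deferred until the ref counter reaches 0).

```java
import java.util.concurrent.atomic.AtomicInteger;

class TranslogRefSketch {

    /** Hypothetical snapshot handle; closing it releases one reference. */
    interface Snapshot extends AutoCloseable {
        @Override
        void close();
    }

    /** Hypothetical stand-in for a translog file; not an Elasticsearch class. */
    static final class TranslogFile {
        private final AtomicInteger snapshotRefs = new AtomicInteger();
        private final boolean deleteWhenUnreferenced;
        private boolean obsolete;
        private boolean deleted;

        TranslogFile(boolean deleteWhenUnreferenced) {
            this.deleteWhenUnreferenced = deleteWhenUnreferenced;
        }

        /** Takes a snapshot, which holds a reference to the file until it is closed. */
        Snapshot snapshot() {
            snapshotRefs.incrementAndGet();
            return () -> {
                boolean lastRef = snapshotRefs.decrementAndGet() == 0;
                // Deferred delete: only happens when the flag is set (the "master" model).
                if (lastRef && obsolete && deleteWhenUnreferenced) {
                    deleted = true;
                }
            };
        }

        /** A flush makes the file obsolete, but it can only delete an unreferenced file. */
        void flush() {
            obsolete = true;
            if (snapshotRefs.get() == 0) {
                deleted = true;
            }
        }

        boolean isDeleted() {
            return deleted;
        }
    }

    /** Runs the clean-up in the given order and reports whether the file was deleted. */
    static boolean recoverThenCleanUp(boolean snapshotsFirst, boolean deleteWhenUnreferenced) {
        TranslogFile file = new TranslogFile(deleteWhenUnreferenced);
        Snapshot snapshot = file.snapshot();
        if (snapshotsFirst) {
            snapshot.close();
            file.flush(); // assumed: closing the recovery counter triggers this flush
        } else {
            file.flush();
            snapshot.close();
        }
        return file.isDeleted();
    }

    public static void main(String[] args) {
        System.out.println("counter first, no deferred delete: " + recoverThenCleanUp(false, false));
        System.out.println("snapshots first, no deferred delete: " + recoverThenCleanUp(true, false));
        System.out.println("counter first, deferred delete: " + recoverThenCleanUp(false, true));
    }
}
```

In this model the file is left behind only in the first case, which is the situation this change avoids by closing the snapshots first.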

relates to #9226 (comment)

When we clean up after recoveries, we currently close the recovery counter first, followed by the different snapshots. Since closing the recovery counter may issue a flush (introduced in elastic#9439), the snapshot references prevent that flush from cleaning up the current translog file. This commit changes the order of the close calls. Note: this is not a problem on master, as we moved the translog delete logic so that it kicks in, if needed, when the ref counter goes to 0.

kimchy · 2015-02-19T12:36:43Z

Can we document, next to the code, why the close ordering is important?

bleskes · 2015-02-19T12:45:10Z

@kimchy comments added.

s1monw · 2015-02-24T08:51:38Z

src/main/java/org/elasticsearch/index/engine/InternalEngine.java

-                    phase2Snapshot, phase3Snapshot); // hmm why can't we use try-with here?
+            // close the snapshots first to release the reference to the translog file, so a flush post recovery can delete it
+            Releasables.close(success, phase1Snapshot, phase2Snapshot, phase3Snapshot,
+                    onGoingRecoveries, writeLock); // hmm why can't we use try-with here?


We can use try-with, but we would need two try blocks, since the write lock needs to be released last. In a try-with block it is released before the finally block is executed.

In fact, it might make sense to have the write lock in an outer try/catch to ensure it's released last?

bleskes · 2015-02-24

The problem with a try-with-resources clause is that we also have to release the phase1 and phase2 snapshots, which are defined earlier in the scope. I can make the write lock use a try-with, but this requires another catch to convert the exception that the close method throws into a RecoveryEngineException. I'm not sure it's worth it.
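
For readers following the try-with discussion above, here is a small sketch of the language behavior being debated. It is not the InternalEngine code: the `Named` resource and both methods are hypothetical, and the resource names are borrowed from the diff only as labels. It shows that try-with-resources closes its resources in reverse declaration order and before the `finally` block runs, and how an outer try can make the lock the last thing released.

```java
import java.util.ArrayList;
import java.util.List;

class CloseOrderSketch {

    /** Hypothetical resource that records when it is closed. */
    static final class Named implements AutoCloseable {
        private final String name;
        private final List<String> log;

        Named(String name, List<String> log) {
            this.name = name;
            this.log = log;
        }

        @Override
        public void close() {
            log.add("close " + name);
        }
    }

    /** One try block: the lock is released before the finally block executes. */
    static List<String> singleTry() {
        List<String> log = new ArrayList<>();
        try (Named lock = new Named("writeLock", log);
             Named snapshot = new Named("phase3Snapshot", log)) {
            log.add("body");
        } finally {
            log.add("finally");
        }
        return log;
    }

    /** Two try blocks: the outer one owns the lock, so it is released last. */
    static List<String> nestedTry() {
        List<String> log = new ArrayList<>();
        // Stands in for a snapshot defined earlier in the scope, outside any try-with.
        Named phase1Snapshot = new Named("phase1Snapshot", log);
        try (Named lock = new Named("writeLock", log)) {
            try (Named snapshot = new Named("phase3Snapshot", log)) {
                log.add("body");
            } finally {
                phase1Snapshot.close();
                log.add("finally");
            }
        }
        return log;
    }

    public static void main(String[] args) {
        System.out.println(singleTry());
        System.out.println(nestedTry());
    }
}
```

In `singleTry` the recorded order is body, phase3Snapshot, writeLock, finally; in `nestedTry` the lock close comes after the inner `finally`. The sketch leaves out the exception translation mentioned above, which is the extra cost of the nested form.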

s1monw · 2015-02-24T08:53:37Z

left a minor comment

bleskes · 2015-02-27T14:17:11Z

@s1monw ping?

s1monw · 2015-02-27T16:24:38Z

ok fair enough LGTM then

bleskes added v1.5.0 v1.4.4 >bug v1.4.5 and removed v1.4.4 labels Feb 19, 2015

add comments

1bac4ea

bleskes mentioned this pull request Feb 19, 2015

Shard stuck in relocating state with recovery stage=translog #9226

Closed

bleskes added the review label Feb 19, 2015

cbuescher mentioned this pull request Feb 20, 2015

date_histogram issue when using "pre_zone_adjust_large_interval" and a timezone with DST #9491

Closed

s1monw reviewed Feb 24, 2015

bleskes added the v2.0.0-beta1 label Feb 24, 2015

bleskes closed this in 0f1c779 Feb 27, 2015

bleskes deleted the leftover_translog branch February 27, 2015 19:25

bleskes removed the v2.0.0-beta1 label Feb 27, 2015

clintongormley added the :Distributed/Recovery label and removed the review label Mar 19, 2015