ZOOKEEPER-2872: Interrupted snapshot sync causes data loss #333
Does AtomicFileOutputStream (or FilterOutputStream) have fsync semantics? It is not obvious to me.
AtomicFileOutputStream performs an fsync when the stream is closed.
I am now wondering why we should not fsync the snapshot in all cases. It seems to be a useful property to have for snapshot serialization, and it would make the code simpler. Were there performance considerations that led to applying fsync only when the snapshot comes from a SNAP sync?
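As a rough illustration of the fsync-on-close pattern described here, the sketch below shows a FilterOutputStream subclass that writes to a temporary file, forces it to disk in close(), and only then renames it into place. This is not the ZooKeeper source: the class name SyncOnCloseOutputStream and the ".tmp" suffix are assumptions made for the example.

```java
import java.io.File;
import java.io.FileOutputStream;
import java.io.FilterOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.StandardCopyOption;

// Hypothetical illustration of fsync-on-close; not ZooKeeper's AtomicFileOutputStream.
class SyncOnCloseOutputStream extends FilterOutputStream {
    private final File target;
    private final File tmp;
    private boolean closed = false;

    SyncOnCloseOutputStream(File target) throws IOException {
        super(new FileOutputStream(tmpFor(target)));
        this.target = target.getAbsoluteFile();
        this.tmp = tmpFor(target);
    }

    // Assumed naming scheme: the temporary file sits next to the target.
    private static File tmpFor(File f) {
        return new File(f.getAbsolutePath() + ".tmp");
    }

    @Override
    public void write(byte[] b, int off, int len) throws IOException {
        // FilterOutputStream's default writes one byte at a time; delegate in bulk.
        out.write(b, off, len);
    }

    @Override
    public void close() throws IOException {
        if (closed) {
            return;
        }
        closed = true;
        boolean synced = false;
        try {
            flush();
            // force(true) asks the OS to push file data and metadata to the device.
            ((FileOutputStream) out).getChannel().force(true);
            synced = true;
        } finally {
            out.close();
            if (synced) {
                // Publish the file under its final name only after the fsync succeeded.
                Files.move(tmp.toPath(), target.toPath(), StandardCopyOption.ATOMIC_MOVE);
            } else {
                Files.deleteIfExists(tmp.toPath());
            }
        }
    }

    public static void main(String[] args) throws IOException {
        File dir = Files.createTempDirectory("snap-demo").toFile();
        File snapshot = new File(dir, "snapshot.0");
        try (SyncOnCloseOutputStream s = new SyncOnCloseOutputStream(snapshot)) {
            s.write("state".getBytes(StandardCharsets.UTF_8));
        }
        System.out.println(snapshot.length() + " bytes written to " + snapshot);
    }
}
```

The point of the ordering is that a reader never sees the final file name until its contents have been forced to disk, so an interruption leaves either the old file or nothing, never a partial snapshot.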
@@ -364,6 +364,7 @@ protected void syncWithLeader(long newLeaderZxid) throws Exception{
    readPacket(qp);
    LinkedList<Long> packetsCommitted = new LinkedList<Long>();
    LinkedList<PacketInFlight> packetsNotCommitted = new LinkedList<PacketInFlight>();
    boolean syncSnapshot = false;
We can move this variable definition up so it's grouped with the snapshotNeeded boolean.
Another possibility is to get rid of this variable and use the existing snapshotNeeded, but that would fsync the snapshot for a TRUNC sync, which the existing patch does not.
Another possibility, as I just commented, is to get rid of this variable and always fsync the snapshot serialization.
We contemplated doing an fsync for every snapshot and decided against it. You're taking a guaranteed I/O spike each time. That's fine when you're just syncing with the quorum, but during normal operation it seems best to keep snapshot taking a lighter-weight operation.
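To make the trade-off concrete, here is a minimal sketch of a snapshot writer that takes an fsync flag and sets it only for a snapshot received through a SNAP sync. The names SnapshotWriter, Reason, shouldFsync, and save are hypothetical and only illustrate the decision; they are not the identifiers used in the patch.

```java
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;

// Hypothetical sketch: pay for the fsync only where losing the snapshot would lose data.
class SnapshotWriter {
    // Illustrative reasons for writing a snapshot; not ZooKeeper's own types.
    enum Reason { PERIODIC, TRUNC_SYNC, SNAP_SYNC }

    // Mirrors the choice discussed in this thread: fsync only for a SNAP sync.
    static boolean shouldFsync(Reason reason) {
        return reason == Reason.SNAP_SYNC;
    }

    // Writes the bytes and, if requested, forces them to stable storage before closing.
    static void save(File file, byte[] data, boolean fsync) throws IOException {
        try (FileOutputStream fos = new FileOutputStream(file)) {
            fos.write(data);
            if (fsync) {
                // The forced flush is the guaranteed I/O spike mentioned above.
                fos.getChannel().force(true);
            }
        }
    }

    public static void main(String[] args) throws IOException {
        byte[] state = "state".getBytes(StandardCharsets.UTF_8);
        for (Reason reason : Reason.values()) {
            File f = File.createTempFile("snapshot-" + reason, ".bin");
            save(f, state, shouldFsync(reason));
            System.out.println(reason + " fsync=" + shouldFsync(reason));
        }
    }
}
```

Keeping the flag explicit, rather than reusing a single "snapshot needed" condition, is what lets the TRUNC and periodic paths skip the forced flush.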
I am unable to reproduce the test failure in Zab1_0Test.
Sounds reasonable.
I think it's a flaky test. Filed ZOOKEEPER-2877 for this.
LGTM, will merge.
Committed to master: 0706b40. JIRA resolution is pending until the merge conflicts are fixed and the change is committed to branch-3.4 and branch-3.5.
This requires the fix in ZOOKEEPER-2870: Improve the efficiency of AtomicFileOutputStream
Author: Brian Nixon <nixon@fb.com>
Reviewers: Michael Han <hanm@apache.org>
Closes apache#333 from enixon/snap-sync
Add metric server to docker builds.