Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ZOOKEEPER-2872: Interrupted snapshot sync causes data loss #333

Closed
wants to merge 4 commits into from

Conversation

enixon
Copy link

@enixon enixon commented Aug 10, 2017

This requires the fix in ZOOKEEPER-2870: Improve the efficiency of AtomicFileOutputStream

Copy link
Contributor

@hanm hanm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does AtomicFileOutputStream (or FilterOutputStream) have fsync semantics? It is not obvious to me..

@enixon
Copy link
Author

enixon commented Aug 11, 2017

AtomicFileOutputStream performs an fsync when the stream is closed with the following.
"((FileOutputStream) out).getFD().sync();"

@hanm
Copy link
Contributor

hanm commented Aug 13, 2017

I am now wondering why we should not fsync snapshot taking at all cases. It seems to be a useful property to have for snapshot serialization, and will make code simpler. Any performance considerations that lead to the conclusion of only applying fsync snapshot when it's a SNAP sync?

@@ -364,6 +364,7 @@ protected void syncWithLeader(long newLeaderZxid) throws Exception{
readPacket(qp);
LinkedList<Long> packetsCommitted = new LinkedList<Long>();
LinkedList<PacketInFlight> packetsNotCommitted = new LinkedList<PacketInFlight>();
boolean syncSnapshot = false;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can level this variable definition up so it's clustered with snapshotNeed boolean.

Another possibility is to get ride of this variable and use existing snapshotNeeded - but that will do fysnc snapshot for TRUNC sync, which the existing patch will not.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Another possibility as I just commented is to get rid of this variable and always Fsync snapshot serialization.

@enixon
Copy link
Author

enixon commented Aug 17, 2017

We contemplated doing an fsync for every snapshot and decided against. You're taking a guaranteed io spike each time. That's fine when you're just syncing with the quorum but during normal operation, it seems best to keep snapshot taking a lighter weight operation.

@enixon
Copy link
Author

enixon commented Aug 17, 2017

I am unable to reproduce the test failure in Zab1_0Test

@hanm
Copy link
Contributor

hanm commented Aug 18, 2017

it seems best to keep snapshot taking a lighter weight operation.

Sounds reasonable.

I am unable to reproduce the test failure in Zab1_0Test

I think it's a flaky test. Filed ZOOKEEPER-2877 for this.

Copy link
Contributor

@hanm hanm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, will merge.

@asfgit asfgit closed this in 0706b40 Aug 18, 2017
@hanm
Copy link
Contributor

hanm commented Aug 18, 2017

Committed to master: 0706b40

Pending JIRA resolve after fixing merge conflicts and commit into branch-3.4 and 3.5.

lvfangmin pushed a commit to lvfangmin/zookeeper that referenced this pull request Jun 17, 2018
This requires the fix in ZOOKEEPER-2870: Improve the efficiency of AtomicFileOutputStream

Author: Brian Nixon <nixon@fb.com>

Reviewers: Michael Han <hanm@apache.org>

Closes apache#333 from enixon/snap-sync
@enixon enixon deleted the snap-sync branch March 19, 2019 22:08
smikes pushed a commit to smikes/zookeeper that referenced this pull request Feb 12, 2021
Add metric server to docker builds.
RokLenarcic pushed a commit to RokLenarcic/zookeeper that referenced this pull request Sep 3, 2022
This requires the fix in ZOOKEEPER-2870: Improve the efficiency of AtomicFileOutputStream

Author: Brian Nixon <nixon@fb.com>

Reviewers: Michael Han <hanm@apache.org>

Closes apache#333 from enixon/snap-sync
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants