Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ZOOKEEPER-4646: Committed txns may still be lost if followers crash after replying ACK of NEWLEADER but before writing txns to disk #1993

Open
wants to merge 3 commits into
base: master
Choose a base branch
from

Conversation

AlphaCanisMajoris
Copy link

@AlphaCanisMajoris AlphaCanisMajoris commented Mar 26, 2023

In brief, committed logs might be lost due to the follower's asynchronous transaction logging when replying ACK of NEWLEADER during the SYNC phase.

See ZOOKEEPER-4646 for details on the symptom, example trace, diagnostic, and possible fix idea.

Actually, this problem had first been raised in ZOOKEEPER-3911. However, the fixing patch of ZOOKEEPER-3911 does not solve the problem at the root. Besides, ZOOKEEPER-4685 is also caused with similar root cause, i.e., non-deterministic execution orders between the QuorumPeer thread and the SyncThread when the follower replies ACK of NEWLEADER.

Possible Fix

Considering ZOOKEEPER-4646 and ZOOKEEPER-4685, one possible fix is to guarantee the following execution orders to be satisfied:

  • A follower replies ACK of NEWLEADER only after it has persisted the uncommitted transactions to disk in the SYNC phase (to avoid ZOOKEEPER-4646 & ZOOKEEPER-3911). In fact, in the ZAB protocol, it is clearly required that follower should sync its history before it replies ACK of NEWLEADER.
  • A follower replies ACK of PROPOSAL only after it has replied ACK of NEWLEADER (to avoid ZOOKEEPER-4685).

There are several ways of implementation.

Option 1 : Multi-threaded collaboration between the follower's QuorumPeer thread and the SyncThread

The follower's QuorumPeer thread will not reply ACK of NEWLEADER until it is notified that the SyncThread has persisted the uncommitted logs to the disk. Use a CountDownLatch ( named newleaderLatch, for example) to record the number of uncommitted transactions that should be logged before the follower replies ACK of NEWLEADER. Before replying ACK of NEWLEADER, the follower's QuorumPeer thread will be blocked at newleaderLatch.await(..) if the count of the newleaderLatch is still non-zero. Whenever the SyncThread logs a transaction and finds that newleaderLatch.getCount() > 0, it will call newleaderLatch.countDown() . Only after the count of the newleaderLatch turns to zero will the follower be able to reply ACK of NEWLEADER. The ACKs of above transaction proposals will be replied after the ACK of NEWLEADER is replied.

- Pros & Cons

  • (+) SyncThread is responsible to process the uncommitted logs in Sync phase. Reuse the logic of the chain of responsibility in implementation.
  • (-) Involve multi-threaded collaboration and synchronization between the QuorumPeer thread and the SyncThread. Need a semaphore like a CountDownLatch, and extra shared variable like a queue to store the queued ACKs that are generated before replying ACK of NEWLEADER. Thread-safety issues need to be carefully considered.
  • (-) Checking the status of learner.newleaderLatch in SendAckRequestProcessor.processRequest(..) may affect the performance of follower's SyncThread replying ACK of PROPOSAL during BROADCAST phase.

Option 2 (Selected) : All done by the follower's QuorumPeer thread

The uncommitted logs will be persisted to disk by the follower's QuorumPeer thread rather than SyncThread. That is, after receiving NEWLEADER, the follower's QuorumPeer thread will persist the uncommitted logs to disk, then reply ACK of NEWLEADER, and then reply pending ACKs of PROPOSALs.

- Pros & Cons

  • (+) All above actions are done by single thread. Don't need multi-threaded collaboration and synchronization between the follower's QuorumPeer thread and the SyncThread during SYNC phase. Don't need extra shared variables or semaphore. No thread-safety issues. Code logic is simplified compared to Option 1.
  • (+) Will not affect the performance of the follower during BROADCAST phase at all!
  • (-) The QuorumPeer thread simplifies the SyncThread's transaction logging logic during SYNC phase, which may slightly affect the statistics of server metrics.

The solution in this patch implements Option 2 mainly for the performance consideration. With little performance penalty, it promises the safety property that all committed logs will not be lost. The solution is able to avoid the issues of ZOOKEEPER-4646, ZOOKEEPER-3911 and ZOOKEEPER-4685 at the same time.

…fter replying ACK of NEWLEADER but before writing txns to disk
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
1 participant