PipeConsensus: Fix replication block when leader restart. by Pengzna · Pull Request #13028 · apache/iotdb

Pengzna · 2024-07-25T08:13:14Z

In PipeConsensus, a tsfileWriterPool mechanism was introduced so that tsfiles can be received in parallel. Each time a tsfilePiece is written, the receiver first tries to get a tsfileWriter from the pool. If none is available, it blocks and keeps polling until it gets one. The number of tsfileWriters in the tsfileWriterPool equals the pipeline size.
Under normal circumstances this mechanism works: as soon as a tsfile finishes sealing, its writer is returned to the pool. This guarantees that tsfiles queued for a writer will eventually get one and will not block forever.
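To make the borrow/return cycle concrete, here is a minimal Java sketch of a fixed-size writer pool whose `borrow()` polls until a writer is free and whose `release()` is called once a tsfile is sealed. All names (`WriterPoolSketch`, `TsFileWriterPool`, `borrow`, `release`) are hypothetical and are not IoTDB's actual API; the 10 ms back-off between polls is likewise an assumption.

```java
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.LockSupport;

public class WriterPoolSketch {

  /** Hypothetical stand-in for a tsfileWriter. */
  static final class TsFileWriter {
    final int index;
    TsFileWriter(int index) { this.index = index; }
  }

  /** Hypothetical fixed-size pool: one writer per pipeline slot. */
  static final class TsFileWriterPool {
    private final ConcurrentLinkedQueue<TsFileWriter> idle = new ConcurrentLinkedQueue<>();

    TsFileWriterPool(int pipelineSize) {
      for (int i = 0; i < pipelineSize; i++) idle.add(new TsFileWriter(i));
    }

    /** Polls until a writer is free and never gives up, as described above. */
    TsFileWriter borrow() {
      TsFileWriter writer;
      while ((writer = idle.poll()) == null) {
        LockSupport.parkNanos(TimeUnit.MILLISECONDS.toNanos(10)); // back off, then poll again
      }
      return writer;
    }

    /** Called once the tsfile using this writer has been sealed. */
    void release(TsFileWriter writer) { idle.add(writer); }

    int idleCount() { return idle.size(); }
  }

  public static void main(String[] args) throws InterruptedException {
    TsFileWriterPool pool = new TsFileWriterPool(2);
    TsFileWriter first = pool.borrow(); // event 1
    pool.borrow();                      // event 2; the pool is now empty

    // Event 1's tsfile is sealed a little later on another thread.
    Thread sealer = new Thread(() -> {
      LockSupport.parkNanos(TimeUnit.MILLISECONDS.toNanos(50));
      pool.release(first);
    });
    sealer.start();

    TsFileWriter third = pool.borrow(); // event 3 polls until event 1 is sealed
    sealer.join();
    System.out.println("event 3 reused writer " + third.index); // prints 0
  }
}
```

The sketch only terminates because some earlier tsfile eventually seals; if no writer is ever released, `borrow()` spins forever, which is exactly the weakness the scenarios below exploit.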
However, two scenarios can cause problems:

A physical fault on the Follower machine prevents tsfile sealing from ever succeeding. The tsfileWriter is then never returned, and subsequent tsfiles block waiting for one. This is a machine problem rather than a code problem and can be resolved by, for example, restarting the machine.
The leader restarts. Consider an example with a pipeline size of 5:

Before the restart, the leader sent three tsFileEvents: 1, 2, and 3. Events 1 and 2 have been fully synchronized, and event 3 is still transmitting tsfilePieces.
The leader then restarts. From the Follower's perspective, the seal request for event 3 is lost: the leader does retransmit event 3 after the restart, but it assigns the event a new sequence number, so the Follower never receives a seal request for the original event 3. That event therefore waits indefinitely and never releases its tsfileWriter, leaving a "zombie" event (event 3) in the Follower's pipeline.
Next, the leader transmits events a, b, c, and d. These four events write their tsfilePieces normally and, together with event 3, occupy every tsFileWriter in the tsFileWriterPool.
If another tsfile event e now arrives, it blocks while trying to get a tsFileWriter. Fatally, event e holds the big lock of PipeConsensusReceiver while it waits, which blocks the subsequent seal requests of events a, b, c, and d.
Event e waits for a, b, c, or d to release a tsfileWriter, while a, b, c, and d wait for e to release the big lock. This forms a deadlock that blocks the entire process.
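The lock/pool cycle in this scenario can be reproduced in miniature. The hypothetical Java sketch below assumes the receiver's big lock behaves like a `ReentrantLock` and models the pool as a queue of writer indices; none of these names come from `PipeConsensusReceiver`. Because a true deadlock would hang the program, event e is given an artificial escape flag once the stall has been observed, and the seal request uses a 200 ms `tryLock` so the outcome can be reported.

```java
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.locks.LockSupport;
import java.util.concurrent.locks.ReentrantLock;

public class ReceiverDeadlockDemo {

  /** Returns whether a seal request gets the big lock within 200 ms while e waits for a writer. */
  static boolean sealCanProceed(int pipelineSize, int occupiedWriters) {
    ReentrantLock bigLock = new ReentrantLock(); // stand-in for the receiver's big lock
    ConcurrentLinkedQueue<Integer> idleWriters = new ConcurrentLinkedQueue<>();
    for (int i = 0; i < pipelineSize; i++) idleWriters.add(i);
    for (int i = 0; i < occupiedWriters; i++) idleWriters.poll(); // zombie event 3 plus a..d

    CountDownLatch eHoldsLock = new CountDownLatch(1);
    AtomicBoolean stopDemo = new AtomicBoolean(false); // escape hatch the real system lacks

    Thread eventE = new Thread(() -> {
      bigLock.lock(); // e enters the receiver...
      try {
        eHoldsLock.countDown();
        // ...and polls for a writer without releasing the big lock.
        while (idleWriters.poll() == null && !stopDemo.get()) {
          LockSupport.parkNanos(TimeUnit.MILLISECONDS.toNanos(10));
        }
      } finally {
        bigLock.unlock();
      }
    });

    eventE.start();
    try {
      eHoldsLock.await();
      // Seal request for event a: it needs the big lock to return a's writer.
      boolean acquired = bigLock.tryLock(200, TimeUnit.MILLISECONDS);
      if (acquired) bigLock.unlock();
      stopDemo.set(true);
      eventE.join();
      return acquired;
    } catch (InterruptedException ex) {
      stopDemo.set(true);
      Thread.currentThread().interrupt();
      throw new IllegalStateException(ex);
    }
  }

  public static void main(String[] args) {
    // Pipeline size 5: zombie event 3 and events a..d hold all five writers.
    System.out.println("pool exhausted, seal proceeds: " + sealCanProceed(5, 5)); // false
    // With one writer free, e gets it, leaves the lock, and sealing continues.
    System.out.println("one writer free, seal proceeds: " + sealCanProceed(5, 4)); // true
  }
}
```

With the pool exhausted, the seal request for event a cannot obtain the big lock, so it can never return the writer that e is waiting for; a single free writer is enough to break the cycle.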
Fix: when the leader restarts, the Follower manually releases all tsfileWriters, ensuring that no "zombie" events remain in the tsfileWriterPool.
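A minimal sketch of this fix follows. It assumes the Follower can tell that the leader has restarted from a reboot counter carried with each request; `leaderRebootTimes` is a hypothetical stand-in, since the description above does not say how the real receiver detects the restart. On detecting a restart, the receiver returns every borrowed writer to the pool, so the zombie event no longer holds one. The acquire call is non-blocking to keep the example short, and all class and method names are illustrative rather than IoTDB's.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.Map;

public class ReleaseOnLeaderRestartSketch {

  /** Hypothetical fixed-size pool that remembers which event holds each writer. */
  static final class WriterPool {
    private final Deque<Integer> idle = new ArrayDeque<>();
    private final Map<Long, Integer> heldByEvent = new HashMap<>();

    WriterPool(int pipelineSize) {
      for (int i = 0; i < pipelineSize; i++) idle.add(i);
    }

    /** Non-blocking for the sketch; an event keeps one writer across all its pieces. */
    boolean tryAcquire(long eventId) {
      if (heldByEvent.containsKey(eventId)) return true;
      Integer writer = idle.poll();
      if (writer == null) return false;
      heldByEvent.put(eventId, writer);
      return true;
    }

    /** Normal path: the event's tsfile was sealed. */
    void release(long eventId) {
      Integer writer = heldByEvent.remove(eventId);
      if (writer != null) idle.add(writer);
    }

    /** The fix: return every writer, including those held by zombie events. */
    void releaseAll() {
      idle.addAll(heldByEvent.values());
      heldByEvent.clear();
    }

    int idleCount() { return idle.size(); }
  }

  /** Hypothetical receiver; its synchronized methods stand in for the big lock. */
  static final class Receiver {
    private final WriterPool pool;
    private long knownLeaderRebootTimes = -1; // assumed restart counter, not a real IoTDB field

    Receiver(int pipelineSize) { pool = new WriterPool(pipelineSize); }

    private void detectLeaderRestart(long leaderRebootTimes) {
      if (knownLeaderRebootTimes >= 0 && leaderRebootTimes > knownLeaderRebootTimes) {
        pool.releaseAll(); // pre-restart events will never receive a seal request
      }
      knownLeaderRebootTimes = Math.max(knownLeaderRebootTimes, leaderRebootTimes);
    }

    synchronized boolean onTsFilePiece(long leaderRebootTimes, long eventId) {
      detectLeaderRestart(leaderRebootTimes);
      return pool.tryAcquire(eventId);
    }

    synchronized void onTsFileSeal(long leaderRebootTimes, long eventId) {
      detectLeaderRestart(leaderRebootTimes);
      pool.release(eventId);
    }

    synchronized int idleWriters() { return pool.idleCount(); }
  }

  public static void main(String[] args) {
    Receiver receiver = new Receiver(5);
    // Before the restart: events 1, 2, 3 start; 1 and 2 are sealed, 3 never is.
    for (long id = 1; id <= 3; id++) receiver.onTsFilePiece(0, id);
    receiver.onTsFileSeal(0, 1);
    receiver.onTsFileSeal(0, 2);
    System.out.println("idle before restart: " + receiver.idleWriters()); // prints 4

    // After the restart the leader renumbers events; a..e become 101..105 here (illustrative).
    boolean allGotWriters = true;
    for (long id = 101; id <= 105; id++) allGotWriters &= receiver.onTsFilePiece(1, id);
    System.out.println("a..e all got writers: " + allGotWriters); // prints true
  }
}
```

Without the `releaseAll()` call, event 3 would keep its writer and the fifth post-restart event (e) would be left without one.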


...ain/java/org/apache/iotdb/db/pipe/receiver/protocol/pipeconsensus/PipeConsensusReceiver.java

OneSizeFitsQuorum

LGTM

* fix 833
* fix review(rename typo)
(cherry picked from commit 2a06d48)

fix 833

178a90b

Caideyipi approved these changes Jul 25, 2024


...ain/java/org/apache/iotdb/db/pipe/receiver/protocol/pipeconsensus/PipeConsensusReceiver.java (outdated)

fix review(rename typo)

3422ef7

OneSizeFitsQuorum approved these changes Jul 26, 2024


OneSizeFitsQuorum merged commit 2a06d48 into apache:master Jul 26, 2024

JackieTien97 pushed a commit that referenced this pull request Jul 29, 2024

PipeConsensus: Fix replication block when leader restart. (#13028)

ec40085

* fix 833
* fix review(rename typo)
(cherry picked from commit 2a06d48)

JackieTien97 pushed a commit that referenced this pull request Jul 29, 2024

PipeConsensus: Fix replication block when leader restart. (#13028)

891c09c

* fix 833
* fix review(rename typo)
(cherry picked from commit 2a06d48)

OneSizeFitsQuorum merged 2 commits into apache:master from Pengzna:TIMECHO/874
