HBASE-28037 Replication stuck after switching to new WAL but the queue is empty #5375
🎊 +1 overall
This message was automatically generated.
IIRC it is impossible for a normal replication source to have an empty queue, since it always has a WAL file being written. Can you reproduce this problem with a UT? Or at least explain the sequence of events that reproduces it? Thanks.
Thanks, @Apache9. Replication has gotten stuck on our production clusters, and it recovers after the stuck regionserver is restarted. I dug into the issue and found that the jstack output showed no active source stream readers even though the replication queue was not empty.
If the replication queue can be empty even for a very short time window, there could be other serious problems, since we do not expect a non-recovery replication queue to ever be empty... Thanks for reporting.
I checked the code. On branch-2.x, preLogRoll only records the WAL file on zk, so that the WAL is not lost after a restart, but it does not enqueue the file. The enqueuing is done in postLogRoll. So it is possible for the replication queue to be empty for a very short time window. On master and branch-3, preLogRoll is not even implemented any more; the log is only enqueued in postLogRoll. So this is a problem. I think we can apply this PR to branch-2.5 and branch-2.4. I will open an issue to handle this problem on the other branches, as the code has been refactored a lot...
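To make the window described above concrete, here is a minimal sketch of the ordering, not the actual HBase code. `EmptyQueueWindowSketch`, `Source`, `Action`, and `finishedReading` are hypothetical names invented for illustration; only `preLogRoll` and `postLogRoll` are names taken from the discussion. The retry-on-empty behaviour for a normal (non-recovered) source is an assumption about how a reader could tolerate the window, not a description of this PR's diff.

```java
import java.util.ArrayDeque;
import java.util.Queue;

// Hypothetical, simplified model of the ordering discussed above.
// None of these types are real HBase classes.
final class EmptyQueueWindowSketch {

    enum Action { READ, RETRY_LATER, FINISH }

    static final class Source {
        final boolean recovered;                 // true for a recovered queue
        final Queue<String> queue = new ArrayDeque<>();
        String recordedOnly;                     // WAL recorded but not yet enqueued

        Source(boolean recovered) {
            this.recovered = recovered;
        }

        // Models preLogRoll on branch-2.x: the new WAL is recorded, not enqueued.
        void preLogRoll(String wal) {
            recordedOnly = wal;
        }

        // Models postLogRoll: the new WAL finally becomes visible to the reader.
        void postLogRoll(String wal) {
            queue.add(wal);
            recordedOnly = null;
        }

        // The reader is done with the WAL at the head of the queue.
        void finishedReading() {
            queue.poll();
        }

        // Assumed reader policy: an empty queue ends a recovered source,
        // but a normal source waits because a new WAL may be about to arrive.
        Action next() {
            if (!queue.isEmpty()) {
                return Action.READ;
            }
            return recovered ? Action.FINISH : Action.RETRY_LATER;
        }
    }

    public static void main(String[] args) {
        Source normal = new Source(false);
        normal.postLogRoll("wal.1");
        System.out.println(normal.next());       // queue holds wal.1

        normal.preLogRoll("wal.2");              // recorded only
        normal.finishedReading();                // wal.1 drained: queue is empty
        System.out.println(normal.next());       // inside the short window

        normal.postLogRoll("wal.2");             // window closes
        System.out.println(normal.next());
    }
}
```

If the reader instead treated the empty queue of a normal source as "finished", it would exit during the window and nothing would restart it, which is consistent with the jstack observation of no active source stream readers.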
Oh, and please change the comments: there is no sync replication on branch-2.x.
bae61ec to ce6d55a
🎊 +1 overall
This message was automatically generated.
…e is empty (#5375) Signed-off-by: Duo Zhang <zhangduo@apache.org>
…e is empty (apache#5375) Signed-off-by: Duo Zhang <zhangduo@apache.org> (cherry picked from commit 4c3bffe) Change-Id: I4d8b6168ec533c7a5821f4be9d625b1e4f92b21e