HDFS-16557. BootstrapStandby failed because of checking gap for inprogress EditLogInputStream #4219
💔 -1 overall
This message was automatically generated.
Thanks @ashutoshcipher for your review.
Hi @ayushtkn @Hexiaoqiao @ferhui, could you please also take a look? Thanks a lot.
xkrogen left a comment:
This seems right to me, but I don't fully understand what went wrong to cause the error. Can you explain more fully? Why did we previously make the assumption that INVALID_TXID meant in-progress, and what has changed to make that not true / what happened in your specific scenario to cause that not to be true?
...hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSEditLog.java
Thank you very much for your review, @xkrogen. After introducing [SBN READ], we updated the configuration: dfs.ha.tail-edits.in-progress=true. Then when we …
The …
Hi @xkrogen, to make the change safer, we could change the condition from: … Do you think that is necessary?
💔 -1 overall
This message was automatically generated.
Hi @xkrogen, please take a look if you have enough bandwidth. Thanks a lot.
💔 -1 overall
This message was automatically generated.
Thanks @tomscut for your report. Is this similar to HDFS-14806?
Thanks @ZanderXu for your comments. Setting …
Hi @ayushtkn, could you please also take a look at this? Thanks.

Hi @jojochuang @tasanuma @Hexiaoqiao, could you please also take a look? Thanks.

Hi @xkrogen, if you have enough bandwidth, please take a look. Thank you.
Thanks @tomscut. After tracing the code, I think we cannot add … I will explain my reasoning through questions and answers.
Question two: What is the difference between INVALID_TXID and isInProgress()?
Please correct me if anything is wrong.
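To make Question two concrete, here is a minimal Java sketch built around a hypothetical StreamModel class (not Hadoop's EditLogInputStream). It assumes, consistent with the scenario in this PR, that an in-progress segment read from a local file reports an unknown last txid, while an in-progress segment fetched as a bounded RPC batch (with dfs.ha.tail-edits.in-progress=true) reports a concrete last txid. The constant -12345 is assumed to mirror HdfsServerConstants.INVALID_TXID.

```java
// Hypothetical, simplified model of an edit log stream. It is NOT Hadoop's
// EditLogInputStream; it only keeps the fields this discussion is about.
final class StreamModel {
    // Assumed to mirror HdfsServerConstants.INVALID_TXID.
    static final long INVALID_TXID = -12345L;

    final long firstTxId;
    final long lastTxId;
    final boolean inProgress;

    StreamModel(long firstTxId, long lastTxId, boolean inProgress) {
        this.firstTxId = firstTxId;
        this.lastTxId = lastTxId;
        this.inProgress = inProgress;
    }

    // Old heuristic: an unknown last txid is taken to mean "in progress".
    boolean inProgressByInvalidTxId() {
        return lastTxId == INVALID_TXID;
    }

    // Proposed check: ask the stream for its status directly.
    boolean isInProgress() {
        return inProgress;
    }

    public static void main(String[] args) {
        // In-progress segment read from a local file: its end is not known yet.
        StreamModel fileSegment = new StreamModel(100L, INVALID_TXID, true);
        // In-progress segment served as a bounded RPC batch (assumed behavior
        // with in-progress tailing): the batch has a concrete last txid.
        StreamModel rpcBatch = new StreamModel(1049837442L, 1049842441L, true);

        // Both predicates agree for the file-based segment: "true true".
        System.out.println(fileSegment.inProgressByInvalidTxId() + " " + fileSegment.isInProgress());
        // They disagree for the RPC batch: "false true".
        System.out.println(rpcBatch.inProgressByInvalidTxId() + " " + rpcBatch.isInProgress());
    }
}
```

Under these assumptions, a check written as lastTxId == INVALID_TXID silently treats the RPC batch as a finalized segment.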
Thanks @ZanderXu for your comment. Please refer to the stack. When we set …
OK, back to the BootstrapStandby gap. Please correct me if anything is wrong.
Please refer to the discussion with @xkrogen above. The root cause is the …
Oh, I see: the root cause is that getJournaledEdits returns at most 5000 txids by default, and 1049842441 - 1049837441 = 5000. The result cannot reach txid 1050196644, so checkForGaps fails.
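The arithmetic in the comment above can be checked with a few lines of Java. This is a sketch under two assumptions: the 5000 cap is the default of dfs.ha.tail-edits.qjm.rpc.max-txns, and 1049837441 is the last txid before the batch, so one capped batch covers 1049837442 through 1049842441. The class and method names are hypothetical.

```java
// Back-of-the-envelope check of the txids quoted in the comment.
// MAX_TXNS_PER_BATCH is assumed to be the default of
// dfs.ha.tail-edits.qjm.rpc.max-txns.
final class BatchCapArithmetic {
    static final long MAX_TXNS_PER_BATCH = 5000L;

    // Last txid a single capped batch can return when the data before it
    // ended at prevLastTxId (the batch starts at prevLastTxId + 1).
    static long lastTxIdOfBatch(long prevLastTxId) {
        return prevLastTxId + MAX_TXNS_PER_BATCH;
    }

    // How many txids lie between the end of the batch and the required txid.
    static long shortfall(long batchLastTxId, long requiredTxId) {
        return Math.max(0L, requiredTxId - batchLastTxId);
    }

    public static void main(String[] args) {
        long batchLast = lastTxIdOfBatch(1049837441L);
        long required = 1050196644L;
        System.out.println("batch ends at " + batchLast);                        // 1049842441
        System.out.println("missing txids: " + shortfall(batchLast, required)); // 354203
        System.out.println("first missing txid: " + (batchLast + 1));           // 1049842442
    }
}
```

Under these assumptions the batch stops 354203 txids short of 1050196644, so a gap check that treats the batch as a finished segment cannot succeed.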
So in this case, we should change the bootstrap logic.
As I explained above, change to … Do you have any questions about my explanation? 😁 Let's discuss it together and get more familiar with the relevant logic.
I think there is a gap here because bootstrap expects to get txid 1050196644 but cannot find it in the result, so throwing the gap exception is OK.
We're closing this stale PR because it has been open for 100 days with no activity. This isn't a judgement on the merit of the PR in any way. It's just a way of keeping the PR queue manageable. |

JIRA: HDFS-16557.
The lastTxId of an in-progress EditLogInputStream is not necessarily HdfsServerConstants.INVALID_TXID. We can determine its status directly with EditLogInputStream#isInProgress.
We introduced [SBN READ] and set dfs.ha.tail-edits.in-progress=true. Then, when we ran bootstrapStandby (hdfs namenode -bootstrapStandby) for a new NameNode, the in-progress EditLogInputStream was misjudged, resulting in a gap-check failure that caused bootstrapStandby to fail.
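The effect of the proposed predicate can be illustrated with a simplified gap check. This Java sketch is loosely modeled on the walk that FSEditLog#checkForGaps performs over the selected streams, but it is not the Hadoop source: GapCheckSketch, Stream, and firstMissingTxId are hypothetical names, the real method reports a gap by throwing rather than returning a txid, and the starting txid 1049837442 is assumed. The flag useIsInProgress switches between the old test (lastTxId == INVALID_TXID) and the proposed one (isInProgress()).

```java
import java.util.List;

// Simplified, hypothetical model of a gap check over edit log streams.
// Not the Hadoop source; names and structure are illustrative only.
final class GapCheckSketch {
    // Assumed to mirror HdfsServerConstants.INVALID_TXID.
    static final long INVALID_TXID = -12345L;

    static final class Stream {
        final long firstTxId;
        final long lastTxId;
        final boolean inProgress;

        Stream(long firstTxId, long lastTxId, boolean inProgress) {
            this.firstTxId = firstTxId;
            this.lastTxId = lastTxId;
            this.inProgress = inProgress;
        }
    }

    /**
     * Returns the first txid in [fromTxId, toAtLeastTxId] that no stream
     * covers, or -1 if the range is covered. A stream judged "in progress"
     * is assumed to reach the target, so the walk stops there.
     * useIsInProgress: true = inProgress flag (proposed),
     * false = lastTxId == INVALID_TXID (old).
     */
    static long firstMissingTxId(List<Stream> streams, long fromTxId,
                                 long toAtLeastTxId, boolean useIsInProgress) {
        long txId = fromTxId;
        for (Stream s : streams) {
            if (txId > toAtLeastTxId) {
                return -1L;            // required range already covered
            }
            if (s.firstTxId > txId) {
                return txId;           // hole before this stream starts
            }
            boolean treatAsInProgress = useIsInProgress
                    ? s.inProgress
                    : s.lastTxId == INVALID_TXID;
            if (treatAsInProgress) {
                return -1L;            // open-ended: assume it reaches the target
            }
            txId = s.lastTxId + 1;     // continue right after this stream
        }
        return txId > toAtLeastTxId ? -1L : txId;
    }

    public static void main(String[] args) {
        // The scenario from this PR: an in-progress RPC batch that ends at
        // 1049842441 while bootstrap needs txid 1050196644.
        List<Stream> streams = List.of(new Stream(1049837442L, 1049842441L, true));
        long from = 1049837442L;
        long target = 1050196644L;

        System.out.println("old check: " + firstMissingTxId(streams, from, target, false)); // 1049842442
        System.out.println("proposed:  " + firstMissingTxId(streams, from, target, true));  // -1
    }
}
```

With the old predicate the in-progress batch is treated as finalized, so the walk continues at 1049842442 and reports a gap; with the proposed predicate the walk stops at the in-progress stream and assumes it reaches the target. Whether that early return is safe when the batch is capped is the point ZanderXu raises in the discussion above.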