HDFS-16689. Standby NameNode crashes when transitioning to Active with in-progress tailer #4744
Conversation
It looks like the current diff is disallowing in-progress edits during … I'm also a little confused, because your last comment says you solved the issue via …
@xkrogen Thanks for your review and comment. Sorry for the late reply. Before … Unfortunately, …
The same processing idea also appeared in HDFS-14806. Here …
If the active crashed, then the segment won't be finalized, right? Actually my recollection is a little hazy here. I guess the new active (former standby) will finalize the old segment, since the old active (which crashed) didn't do it. If the finalization happens before …
This is different. Bootstrap standby doesn't need to get all transactions; it's more of a best effort to load most of the transactions. Later, when the standby transitions to active, it will call … Though generally I agree that the idea is similar. Perhaps we should add a way for callers of …
If the active crashed, then while the standby is starting active services, it will recover unclosed streams via …
Nice suggestion, I will fix this patch in that way. Thanks, Master!
@xkrogen Master, I have updated this patch to enable or disable inProgressTailing in …
I don't think this will always be true. In … (Lines 1660 to 1670 in 63db1a8):
So if we want to disable in-progress edits during … But I don't think this is the right approach. I think you might have also misunderstood what I was saying in my last comment here:
Looking at the current diff, I don't see a reason to add … What I was trying to say is that what we really want to do, both in this Jira and in HDFS-14806, is to disable the RPC mechanism and use the streaming mechanism. In HDFS-14806, to do this, we turned off in-progress edits, which was okay. But in this Jira, it's not okay -- we don't want the RPC mechanism, but we still need in-progress edits. So basically I am saying we need something like the … I am thinking we can add a new parameter to …
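A rough sketch of that idea (the class, the EditStream type and the preferStreaming parameter are hypothetical names for illustration, not the actual Hadoop API): the new parameter would let the failover path bypass the RPC-based in-progress tailing while still allowing in-progress segments to be read through the streaming mechanism.

```java
import java.io.IOException;
import java.util.Collection;

/**
 * Illustrative sketch only -- StreamSelector, EditStream and preferStreaming
 * are hypothetical names, not the real Hadoop API. The point: keep
 * in-progress edits enabled, but give the failover path a way to bypass the
 * RPC mechanism and use the streaming mechanism instead.
 */
public class StreamSelector {

  /** Minimal stand-in for an edit log input stream. */
  public interface EditStream {
    long getFirstTxId();
  }

  private final boolean rpcTailingEnabled;

  public StreamSelector(boolean rpcTailingEnabled) {
    this.rpcTailingEnabled = rpcTailingEnabled;
  }

  public void selectInputStreams(Collection<EditStream> out, long fromTxnId,
      boolean inProgressOk, boolean preferStreaming) throws IOException {
    if (rpcTailingEnabled && !preferStreaming) {
      // Normal standby tailing: cheap RPC path (may serve fewer transactions).
      selectRpcInputStreams(out, fromTxnId);
    } else {
      // Failover path: streaming mechanism, still allowed to read the
      // in-progress segment so no transaction is missed.
      selectStreamingInputStreams(out, fromTxnId, inProgressOk);
    }
  }

  private void selectRpcInputStreams(Collection<EditStream> out,
      long fromTxnId) throws IOException {
    // RPC-based selection (journal cache) would go here.
  }

  private void selectStreamingInputStreams(Collection<EditStream> out,
      long fromTxnId, boolean inProgressOk) throws IOException {
    // Streaming (HTTP segment) selection would go here.
  }
}
```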
Thinking more on this, maybe it does make sense to just modify …
@xkrogen Thanks.
Yes, I got it.
This is a good idea; I will fix this patch accordingly.
Sorry, I didn't notice this. But I think it's crazy.
I'm sorry, I just found this comment, but I didn't find the related code that finalizes the previous in-progress segment. Can you share the related code? Thanks.
Yes, I agree. The standby should crash or fail to become active if it cannot finalize the old segment. Regarding this case, how about fixing it in a new PR?
Force-pushed from a5ade31 to 342e712.
When will this be merged to trunk?
@abhishekkarigar Thanks for your attention to this issue. @xkrogen and I will solve this problem as soon as possible. @xkrogen Sir, please review the latest patch. Thanks.
I'm referring to this: Lines 574 to 583 in 62c86ea
But a little more digging made me realize that I don't think what I described will actually happen, since in … (Lines 338 to 347 in 63db1a8):
So it will actually throw an exception, rather than finalizing the old segment as I said previously. But this is after … I briefly looked at the other usages of … So are we agreed that the best way forward is to modify …?
The …
Yes, I totally agree with this, and I will modify this patch with this idea.
Force-pushed from 342e712 to 7d8b497.
  } catch (IOException ie) {
    GenericTestUtils.assertExceptionContains(
-       "Unable to start log segment 1: too few journals", ee);
-   assertTrue("Didn't terminate properly ", ExitUtil.terminateCalled());
+       "recoverUnfinalizedSegments failed for too many journals", ie);
I wonder if we should modify the caller to catch the IOException and rethrow as ExitException to match previous behavior?
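For illustration, a minimal sketch of that suggestion (the wrapper class, its name and the message text are hypothetical; it assumes recoverUnclosedStreams() now propagates the IOException, as in this diff):

```java
import java.io.IOException;
import org.apache.hadoop.util.ExitUtil;

// Minimal sketch only. 'EditLogRecovery' is a hypothetical stand-in for
// FSEditLog#recoverUnclosedStreams() as changed by this diff (i.e. it now
// propagates the IOException instead of terminating itself).
class RecoverOrDie {
  interface EditLogRecovery {
    void recoverUnclosedStreams() throws IOException;
  }

  static void recoverOrDie(EditLogRecovery editLog) {
    try {
      editLog.recoverUnclosedStreams();
    } catch (IOException ioe) {
      // ExitUtil.terminate throws ExitException (or exits the JVM), which
      // restores the previous fail-fast behavior for this particular caller.
      ExitUtil.terminate(1, "recoverUnfinalizedSegments failed: " + ioe);
    }
  }
}
```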
I see you made this change by catching it in FSNamesystem#loadFSImage(), but now I'm having second thoughts about it. The old behavior comes from this block in FSEditLog#startLogSegment() (Lines 1437 to 1447 in 63db1a8):

try {
  editLogStream = journalSet.startLogSegment(segmentTxId, layoutVersion);
} catch (IOException ex) {
  final String msg = "Unable to start log segment " + segmentTxId
      + ": too few journals successfully started.";
  LOG.error(msg, ex);
  synchronized (journalSetLock) {
    IOUtils.cleanupWithLogger(LOG, journalSet);
  }
  terminate(1, msg);
}

Here we catch the IOException within FSEditLog itself and add some useful contextual information, plus we clean up the journalSet. Seems useful. Given that recoverUnclosedStreams() is called from a variety of places, it seems that we can't just copy this approach by adding the terminate(1) logic within FSEditLog#recoverUnclosedStreams() -- e.g. Checkpointer#doCheckpoint() seems to assume it can catch an IOException from this phase and try again, and NameNode#initializeSharedEdits() / NameNode#copyEditLogSegmentsToSharedDir() have their own termination logic that we should probably respect. But it also doesn't really feel correct to me to add this special exception. What I am thinking now is to add a new flag to recoverUnclosedStreams(), like boolean terminateOnFailure, that we would set true in FSNamesystem#startActiveServices() and FSImage#initEditLog() and false for the three other callers I mentioned above. If true, we use logic similar to startLogSegment() to add some additional context and call the termination; otherwise, we just rethrow the original IOException. WDYT?
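A sketch of what that flag could look like inside FSEditLog, reusing only the journalSet / journalSetLock / LOG / terminate pieces shown in the quoted startLogSegment() block above; this illustrates the proposal, not the committed code:

```java
// Sketch of the proposed flag, not the committed code. journalSet,
// journalSetLock, LOG and terminate() are the FSEditLog members shown in the
// quoted startLogSegment() block above.
public synchronized void recoverUnclosedStreams(boolean terminateOnFailure)
    throws IOException {
  try {
    journalSet.recoverUnfinalizedSegments();
  } catch (IOException ex) {
    if (!terminateOnFailure) {
      // Callers such as Checkpointer#doCheckpoint() catch this and retry,
      // or apply their own termination logic.
      throw ex;
    }
    // Match the old fail-fast behavior: add context, clean up, then exit.
    final String msg =
        "Unable to recover unclosed streams: too few journals successfully recovered.";
    LOG.error(msg, ex);
    synchronized (journalSetLock) {
      IOUtils.cleanupWithLogger(LOG, journalSet);
    }
    terminate(1, msg);
  }
}
```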
Force-pushed from d45f42f to 8237276.
Force-pushed from 58002d0 to 43c73b2.
Force-pushed from 43c73b2 to b3fb76f.
Force-pushed from b3fb76f to e821c5e.
Force-pushed from e821c5e to 8cd18a1.
@xkrogen Sir, can you help with a final review? @ashutoshcipher @tomscut @ayushtkn @Hexiaoqiao Can you help double-review it when you are available?
Thanks @ZanderXu for involving me. I will look into it.
Sorry for the delay in getting back to you on this @ZanderXu. I think the current diff looks really good, just not sure about the exception handling. I will be happy to wrap that up and then help you merge this.
Force-pushed from c5743d6 to 9cc99df.
Force-pushed from 39df927 to 0d25fbe.
Merged, thanks @ZanderXu!
…itioning to Active with in-progress tailer (apache#194)
* HDFS-16642. Moving the selecting inputstream from journalnode in EditLogTailer outof FSNLock (apache#4497)
* HDFS-16689. Standby NameNode crashes when transitioning to Active with in-progress tailer (apache#4744)
* remove annotation introduced in merge wrongly
Description of PR
The Standby NameNode may crash when transitioning to Active with an in-progress tailer if there are some abnormal JournalNodes.
The crash error message is as below:
After tracing, I found there is a critical bug in EditlogTailer#catchupDuringFailover() with DFS_HA_TAILEDITS_INPROGRESS_KEY=true. catchupDuringFailover() may not be able to replay all edits from the JournalNodes with onlyDurableTxns=true if there are some abnormal JournalNodes, because a JournalNode may return an empty response, which causes maxAllowedTxns=0 in QuorumJournalManager#selectRpcInputStreams.

Reproduce method, suppose:
* The standby selects RPC input streams with fromTxnId=3 and onlyDurableTxns=true; the txn count of the responses it receives is 0 and 3 respectively, and JN2 is abnormal, such as GC, bad network or restarted.
* The standby therefore cannot replay any edits starting from fromTxnId=3 from the JournalNodes, because maxAllowedTxns is 0 (see the toy calculation below).
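The following toy calculation (plain Java, not the actual QuorumJournalManager code) shows why maxAllowedTxns collapses to 0 in this scenario:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

/**
 * Toy illustration, not Hadoop code: with onlyDurableTxns=true the tailer may
 * only accept transactions already persisted on a majority of JournalNodes,
 * so the allowed count is bounded by the majority-th largest response.
 */
public class DurableTxnCount {

  static long maxAllowedTxns(List<Long> txnCountsFromResponses, int totalJns) {
    int majority = totalJns / 2 + 1;              // 2 out of 3 JournalNodes
    if (txnCountsFromResponses.size() < majority) {
      return 0;                                   // no quorum responded at all
    }
    List<Long> sorted = new ArrayList<>(txnCountsFromResponses);
    sorted.sort(Collections.reverseOrder());      // largest counts first
    // The majority-th largest count is durable on at least 'majority' JNs.
    return sorted.get(majority - 1);
  }

  public static void main(String[] args) {
    // JN0 returns 0 txns, JN1 returns 3 txns, JN2 (abnormal) never responds:
    // nothing is provably durable on a majority, so the standby gets 0 txns.
    System.out.println(maxAllowedTxns(Arrays.asList(0L, 3L), 3)); // -> 0
  }
}
```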
So I think the Standby NameNode should call catchupDuringFailover() with onlyDurableTxns=false, so that it can replay all missed edits from the JournalNodes.
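A minimal sketch of this proposal (the EditTailer interface and the way onlyDurableTxns is plumbed through are assumptions for illustration, not the committed patch):

```java
import java.io.IOException;

// Hypothetical sketch of the proposal above, not the committed change.
// 'tailEdits(boolean onlyDurableTxns)' stands in for however EditLogTailer
// passes the flag down to QuorumJournalManager#selectRpcInputStreams.
class FailoverCatchUpSketch {

  /** Minimal stand-in for the real tailer. */
  interface EditTailer {
    void tailEdits(boolean onlyDurableTxns) throws IOException;
  }

  private final EditTailer tailer;

  FailoverCatchUpSketch(EditTailer tailer) {
    this.tailer = tailer;
  }

  /** Called while the standby is transitioning to active. */
  void catchupDuringFailover() throws IOException {
    // During failover we must not miss any edits, so do not require the
    // transactions to already be durable on a majority of JournalNodes
    // before replaying them.
    tailer.tailEdits(/* onlyDurableTxns= */ false);
  }
}
```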