HDFS-17267. Client send the same packet multiple times when method markSlowNode throws IOException. #6311
The failed unit tests pass locally. They seem unrelated to this PR.
@Hexiaoqiao @tomscut @zhangshuyan0 Could you please review this code when you have time? Thanks a lot.
We're closing this stale PR because it has been open for 100 days with no activity. This isn't a judgement on the merit of the PR in any way. It's just a way of keeping the PR queue manageable.
Description of PR
Since HDFS-16348, the client can kick a SLOW node out of the pipeline while writing data.
I think this introduced a problem: the same packet is sent two or more times when a SLOW node is kicked out.
The flow is as follows:
1. DFSPacket p1 is pushed into dataQueue.
2. DataStreamer takes DFSPacket p1 from dataQueue.
3. p1 is removed from dataQueue and pushed into ackQueue.
4. sendPacket(p1) is called.
5. In ResponseProcessor#run, the pipelineAck for p1 is read.
6. A SLOW node is detected, so markSlowNode throws an IOException and ackQueue.removeFirst(); is never executed.
7. In the next loop of DataStreamer#run, processDatanodeOrExternalError is entered and executes dataQueue.addAll(0, ackQueue);.
8. p1 is sent again.
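The steps above can be reproduced with a small standalone model of the two queues. This is only a sketch, not the Hadoop code: the class ResendSketch, the method simulate, and the use of plain integers in place of DFSPacket are all hypothetical, and the ackQueue.clear() call after the re-queue is an assumption about what the recovery path does.

```java
import java.util.ArrayList;
import java.util.LinkedList;
import java.util.List;

/** Hypothetical model of the client-side queues; integers stand in for DFSPacket seqnos. */
public class ResendSketch {

    /** Runs one packet through the flow and returns every seqno that was "sent", in order. */
    static List<Integer> simulate(boolean slowNodeThrows) {
        LinkedList<Integer> dataQueue = new LinkedList<>();
        LinkedList<Integer> ackQueue = new LinkedList<>();
        List<Integer> sent = new ArrayList<>();

        // Step 1: p1 is pushed into dataQueue.
        dataQueue.addLast(1);

        // Steps 2-4: the streamer takes p1, moves it to ackQueue, and sends it.
        Integer one = dataQueue.removeFirst();
        ackQueue.addLast(one);
        sent.add(one);

        // Steps 5-6: the ack for p1 arrives. If the slow-node check throws,
        // the packet is never removed from ackQueue.
        boolean errorState = false;
        if (slowNodeThrows) {
            errorState = true;          // stands in for the IOException from markSlowNode
        } else {
            ackQueue.removeFirst();     // normal path: the acked packet is dropped
        }

        // Step 7: error recovery moves everything in ackQueue back to the head of dataQueue.
        if (errorState) {
            dataQueue.addAll(0, ackQueue);
            ackQueue.clear();           // assumption: the recovery path empties ackQueue
        }

        // Step 8: the streamer loop drains dataQueue again, so p1 goes out a second time.
        while (!dataQueue.isEmpty()) {
            Integer p = dataQueue.removeFirst();
            ackQueue.addLast(p);
            sent.add(p);
        }
        return sent;
    }

    public static void main(String[] args) {
        System.out.println(simulate(false)); // prints [1]
        System.out.println(simulate(true));  // prints [1, 1]
    }
}
```

In this model the already-acknowledged packet is re-sent only because it was still in ackQueue when the error path ran.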
To verify this PR, debug the unit test method testPipelineRecoveryWithSlowNode.
Set a breakpoint in DataStreamer#run at:
LOG.debug("{} sending {}", this, one);. The DFSPacket with seq=3 is sent twice.
BTW, on the datanode side the packet data is not written twice, because receivePacket() compares onDiskLen with offsetInBlock. If onDiskLen >= offsetInBlock, no data is written.
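The datanode-side guard can be illustrated with a minimal sketch. The class DuplicatePacketSketch and its fields are hypothetical and only mirror the comparison described above. It assumes offsetInBlock denotes the block offset at the end of the incoming packet, and it does not reproduce the real receivePacket() logic.

```java
/** Hypothetical model of the onDiskLen / offsetInBlock comparison on the datanode. */
public class DuplicatePacketSketch {
    private long onDiskLen = 0; // bytes of the block already written to disk
    private int writes = 0;     // number of packets that actually caused a write

    /**
     * Models receiving one packet that starts at firstByteInBlock and carries len bytes.
     * Returns true if data was written, false if the packet was skipped as already on disk.
     */
    boolean receivePacket(long firstByteInBlock, int len) {
        // Assumption: offsetInBlock is the offset just past the last byte of this packet.
        long offsetInBlock = firstByteInBlock + len;
        if (onDiskLen >= offsetInBlock) {
            return false;       // everything in this packet is already on disk
        }
        onDiskLen = offsetInBlock;
        writes++;
        return true;
    }

    int writes() {
        return writes;
    }

    public static void main(String[] args) {
        DuplicatePacketSketch receiver = new DuplicatePacketSketch();
        System.out.println(receiver.receivePacket(0, 512)); // prints true
        System.out.println(receiver.receivePacket(0, 512)); // prints false (duplicate)
        System.out.println(receiver.writes());              // prints 1
    }
}
```

Under these assumptions the duplicate costs network traffic but does not cause a second write to the block.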