
HDFS-16601. DataTransfer should throw IOException to client #4369

Open · wants to merge 1 commit into base: trunk
Conversation

ZanderXu
Contributor

For details, please refer to HDFS-16601.

Bug stack like:

java.io.IOException: Failed to replace a bad datanode on the existing pipeline due to no more good datanodes being available to try. (Nodes: current=[DatanodeInfoWithStorage[127.0.0.1:59687,DS-b803febc-7b22-4144-9b39-7bf521cdaa8d,DISK], DatanodeInfoWithStorage[127.0.0.1:59670,DS-0d652bc2-1784-430d-961f-750f80a290f1,DISK]], original=[DatanodeInfoWithStorage[127.0.0.1:59670,DS-0d652bc2-1784-430d-961f-750f80a290f1,DISK], DatanodeInfoWithStorage[127.0.0.1:59687,DS-b803febc-7b22-4144-9b39-7bf521cdaa8d,DISK]]). The current failed datanode replacement policy is DEFAULT, and a client may configure this via 'dfs.client.block.write.replace-datanode-on-failure.policy' in its configuration.
	at org.apache.hadoop.hdfs.DataStreamer.findNewDatanode(DataStreamer.java:1418)
	at org.apache.hadoop.hdfs.DataStreamer.addDatanode2ExistingPipeline(DataStreamer.java:1478)
	at org.apache.hadoop.hdfs.DataStreamer.handleDatanodeReplacement(DataStreamer.java:1704)
	at org.apache.hadoop.hdfs.DataStreamer.setupPipelineInternal(DataStreamer.java:1605)
	at org.apache.hadoop.hdfs.DataStreamer.setupPipelineForAppendOrRecovery(DataStreamer.java:1587)
	at org.apache.hadoop.hdfs.DataStreamer.processDatanodeOrExternalError(DataStreamer.java:1371)
	at org.apache.hadoop.hdfs.DataStreamer.run(DataStreamer.java:674)

@hadoop-yetus

🎊 +1 overall

Vote Subsystem Runtime Logfile Comment
+0 🆗 reexec 0m 39s Docker mode activated.
_ Prechecks _
+1 💚 dupname 0m 0s No case conflicting files found.
+0 🆗 codespell 0m 0s codespell was not available.
+0 🆗 detsecrets 0m 0s detect-secrets was not available.
+1 💚 @author 0m 0s The patch does not contain any @author tags.
+1 💚 test4tests 0m 0s The patch appears to include 1 new or modified test files.
_ trunk Compile Tests _
+1 💚 mvninstall 54m 30s trunk passed
+1 💚 compile 1m 46s trunk passed with JDK Private Build-11.0.15+10-Ubuntu-0ubuntu0.20.04.1
+1 💚 compile 1m 37s trunk passed with JDK Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07
+1 💚 checkstyle 1m 25s trunk passed
+1 💚 mvnsite 1m 48s trunk passed
+1 💚 javadoc 1m 26s trunk passed with JDK Private Build-11.0.15+10-Ubuntu-0ubuntu0.20.04.1
+1 💚 javadoc 1m 49s trunk passed with JDK Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07
+1 💚 spotbugs 3m 39s trunk passed
+1 💚 shadedclient 22m 53s branch has no errors when building and testing our client artifacts.
_ Patch Compile Tests _
+1 💚 mvninstall 1m 25s the patch passed
+1 💚 compile 1m 26s the patch passed with JDK Private Build-11.0.15+10-Ubuntu-0ubuntu0.20.04.1
+1 💚 javac 1m 26s the patch passed
+1 💚 compile 1m 21s the patch passed with JDK Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07
+1 💚 javac 1m 21s the patch passed
+1 💚 blanks 0m 0s The patch has no blanks issues.
+1 💚 checkstyle 1m 2s the patch passed
+1 💚 mvnsite 1m 26s the patch passed
+1 💚 javadoc 0m 57s the patch passed with JDK Private Build-11.0.15+10-Ubuntu-0ubuntu0.20.04.1
+1 💚 javadoc 1m 32s the patch passed with JDK Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07
+1 💚 spotbugs 3m 22s the patch passed
+1 💚 shadedclient 22m 32s patch has no errors when building and testing our client artifacts.
_ Other Tests _
+1 💚 unit 243m 7s hadoop-hdfs in the patch passed.
+1 💚 asflicense 1m 15s The patch does not generate ASF License warnings.
369m 21s
Subsystem Report/Notes
Docker ClientAPI=1.41 ServerAPI=1.41 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-4369/1/artifact/out/Dockerfile
GITHUB PR #4369
Optional Tests dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient spotbugs checkstyle codespell detsecrets
uname Linux 2a881ab226da 4.15.0-58-generic #64-Ubuntu SMP Tue Aug 6 11:12:41 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux
Build tool maven
Personality dev-support/bin/hadoop.sh
git revision trunk / ea78ff70fe4e1e527ebca5486eba4fd67203fa37
Default Java Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07
Multi-JDK versions /usr/lib/jvm/java-11-openjdk-amd64:Private Build-11.0.15+10-Ubuntu-0ubuntu0.20.04.1 /usr/lib/jvm/java-8-openjdk-amd64:Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07
Test Results https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-4369/1/testReport/
Max. process+thread count 3466 (vs. ulimit of 5500)
modules C: hadoop-hdfs-project/hadoop-hdfs U: hadoop-hdfs-project/hadoop-hdfs
Console output https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-4369/1/console
versions git=2.25.1 maven=3.6.3 spotbugs=4.2.2
Powered by Apache Yetus 0.14.0 https://yetus.apache.org

This message was automatically generated.

@ZanderXu
Contributor Author

ZanderXu commented Jun 5, 2022

@Hexiaoqiao @MingXiangLi can you help me review this patch? thanks~

@Hexiaoqiao
Contributor

Hexiaoqiao commented Jun 7, 2022

@ZanderXu Thanks for the report and contribution. Sorry, I don't understand what scenario leads to this issue. Could you offer more information, such as the deployed version and how to reproduce it? Thanks.

@ZanderXu
Contributor Author

ZanderXu commented Jun 7, 2022

Thanks @Hexiaoqiao .
When the client is recovering a pipeline, the source DataNode selected to transfer the block to the new DataNode may itself be abnormal, so the transfer fails; but because the failure is not returned to the client, the client assumes the transfer completed successfully. Since the new DataNode does not actually contain the block, the client then fails to build the pipeline and marks the new DataNode as bad. The client adds the new DataNode to the exclude list and requests another DataNode for the next round of pipeline recovery. That round will again choose the same abnormal DataNode as the transfer source, and will fail again.

So the DataNode should return the transfer failure to the client, so the client can choose another existing DataNode as the source to transfer the block to the new DataNode.
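The failure mode described above can be sketched with a minimal standalone model. This is a hypothetical simplification, not the real org.apache.hadoop.hdfs code: it only contrasts a transfer that swallows the IOException (the caller believes the copy succeeded) with one that propagates it (the client can react and pick a new source).

```java
import java.io.IOException;
import java.util.List;

// Hypothetical, simplified model -- not the real DataNode transfer code.
class TransferSketch {

    // Simulates copying a block from a source DataNode; the bad source fails.
    static void copyBlock(String sourceDn) throws IOException {
        if (sourceDn.equals("badSource")) {
            throw new IOException("checksum file corrupted on " + sourceDn);
        }
    }

    // Old behavior: the exception is only logged, so the client believes the
    // transfer succeeded and keeps choosing the same abnormal source node.
    static boolean transferSwallowing(String sourceDn, List<String> log) {
        try {
            copyBlock(sourceDn);
        } catch (IOException e) {
            log.add("WARN: " + e.getMessage()); // swallowed
        }
        return true; // always reports success
    }

    // Fixed behavior: the IOException reaches the client, which can then
    // exclude the node and choose another source for the next attempt.
    static boolean transferPropagating(String sourceDn) throws IOException {
        copyBlock(sourceDn);
        return true;
    }
}
```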

@Hexiaoqiao
Contributor

Thanks for starting this proposal. In my experience there are still many issues with data transfer during pipeline recovery, covering both basic function and performance. IIRC, there is only a timeout exception and one exception with no explicit meaning, so the client has no helpful information (such as whether the source node or the target node hit the issue, or other exceptions) to make a decision.
Back to this PR: I fully agree with throwing the exception from the DataNode to the client first (though I am not sure this PR alone is enough; we may need more information), and then adding more fault-tolerance logic on the client side.
IMO, we should file a new JIRA to design/refactor the fault tolerance of data transfer during pipeline recovery. Just my own suggestion, not a blocker.

@ZanderXu
Contributor Author

ZanderXu commented Jun 9, 2022

Thanks @Hexiaoqiao for your suggestion. You are right: the client needs more failure information, such as whether the transfer source or the transfer target failed. With that information it could accurately and efficiently remove the abnormal node, but that would be a big feature.

Fortunately, at present, as long as the failure exception is thrown to the client, the client assumes the new DataNode is abnormal, excludes it, and retries the transfer. During the retry, the client chooses a new source DataNode and a new target DataNode, so both nodes from the previous failed round are replaced.
If the target DataNode caused the failure, excluding it is enough.
If the source DataNode caused the failure, it will be dropped when building the new pipeline.

So I think the simple approach is to just throw the failure exception to the client, and let the client find and remove the truly abnormal DataNode.

@Hexiaoqiao
Contributor

Fortunately, at present, as long as the failure exception is thrown to the client, the client assumes the new DataNode is abnormal, excludes it, and retries the transfer. During the retry, the client chooses a new source DataNode and a new target DataNode.

Thanks for the further comment. I agree this will improve fault tolerance for the transfer. However, we have to accept the truth that if the source DataNode has the issue and the retry chooses the same one, the failure cannot be avoided. I am not sure whether there is any way to expose exceptions that distinguish a source-node failure from a target-node failure? If so, it would help the follow-up fault-tolerance improvements on the client side.

@ZanderXu
Contributor Author

if the source DataNode has the issue and the retry chooses the same one

It will choose the next DataNode as the source DataNode on retry.

The code is below; `tried` is incremented by 1 on each retry.

      final DatanodeInfo src = original[tried % original.length];
      final DatanodeInfo[] targets = {nodes[d]};
      final StorageType[] targetStorageTypes = {storageTypes[d]};

      try {
        transfer(src, targets, targetStorageTypes, lb.getBlockToken());
      } catch (IOException ioe) {
        DFSClient.LOG.warn("Error transferring data from " + src + " to " +
            nodes[d] + ": " + ioe.getMessage());
        caughtException = ioe;
        // add the allocated node to the exclude list.
        exclude.add(nodes[d]);
        setPipeline(original, originalTypes, originalIDs);
        tried++;
        continue;
      }
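The round-robin source selection in `original[tried % original.length]` can be checked with a tiny standalone sketch (the node names here are made up for illustration):

```java
// Standalone sketch of the round-robin source selection quoted above.
class RoundRobinDemo {
    // Mirrors the expression original[tried % original.length] in DataStreamer.
    static String pickSource(int tried, String[] original) {
        return original[tried % original.length];
    }
}
```

With two original nodes, `tried` = 0, 1, 2 selects the first node, the second node, then the first node again: the retry moves off a bad source after one failed round, but cycles back to it on a later round.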

@Hexiaoqiao
Contributor

Sorry for my unclear comment. I know the source node is picked in a round-robin way, so on the third round it will pick the original node again (regardless of whether it is a bad/slow node), though that is a small probability. What I mean is that it would help the client make many fault-tolerance improvements later if we could differentiate the transfer exceptions. Once more, this is not a blocking comment. Thanks again.

@ZanderXu
Contributor Author

Got it, thanks @Hexiaoqiao.

it would help the client make many fault-tolerance improvements later if we could differentiate the transfer exceptions

I will try to work on it.

Contributor

@jojochuang jojochuang left a comment


The patch makes sense, and it's great that you dug so deep and fixed it. But I wish you would state the bug report in the JIRA more clearly.

Basically you reported a bug where, if a replica is corrupt, the DataNode should not attempt to transfer from that replica when recovering from a write failure, because the transfer will always fail. HDFS-4660 (the file offset bug) plus this one together caused the client to fail to recover.

// At this condition, transferBlock that happens during
// pipeline recovery would transfer extra bytes to make up to the
// end of the chunk. And this is when the block corruption
// described in HDFS-4660 would occur.
Contributor


Oh HDFS-4660 brought back my worst nightmare when I spent a month chasing this bug.

@ZanderXu
Contributor Author

@jojochuang Thanks for your review. We encountered this bug in production because the block's checksum file on the source DataNode was corrupted. That caused the transfer to fail, and the client tried all DataNodes and failed.

So the client should be aware of the transfer's status. But it is difficult to distinguish an exception caused by the source node from one caused by the target node. Maybe we can first throw the failure exception to the client, and let the client try the next DataNode as the source to transfer the block.
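One possible shape for "distinguishing the exception", purely as an illustration: a hypothetical exception type (it does not exist in Hadoop) that tags the failure with which side caused it, so the client could exclude the right node instead of always excluding the target.

```java
import java.io.IOException;

// Hypothetical exception type -- not part of Hadoop. It sketches how a
// transfer failure could carry which side (source or target) failed.
class TransferFailedException extends IOException {
    enum Role { SOURCE, TARGET }

    private final Role role;
    private final String node;

    TransferFailedException(Role role, String node, String msg) {
        super(msg + " (failed " + role + ": " + node + ")");
        this.role = role;
        this.node = node;
    }

    Role getRole() { return role; }
    String getNode() { return node; }
}
```

A client catching this could call `getRole()` and exclude the source DataNode directly rather than relying on the round-robin retry to skip it.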

cc @Hexiaoqiao

Contributor

@Hexiaoqiao Hexiaoqiao left a comment


LGTM. +1 from my side.

@ZanderXu ZanderXu changed the title HDFS-16601. Failed to replace a bad datanode on the existing pipeline… HDFS-16601. DataTransfer should throw IOException to client Aug 24, 2022
@ZanderXu
Contributor Author

@jojochuang I have rebased this patch on the latest trunk. I'm looking forward to your thoughts on this issue.

BTW, the remote copy in HDFS-2139 (FastCopy) will also use DataTransfer, and it likewise requires DataTransfer to throw IOException to the client so it can retry.

@hadoop-yetus

🎊 +1 overall

Vote Subsystem Runtime Logfile Comment
+0 🆗 reexec 1m 4s Docker mode activated.
_ Prechecks _
+1 💚 dupname 0m 0s No case conflicting files found.
+0 🆗 codespell 0m 0s codespell was not available.
+0 🆗 detsecrets 0m 0s detect-secrets was not available.
+1 💚 @author 0m 0s The patch does not contain any @author tags.
+1 💚 test4tests 0m 0s The patch appears to include 1 new or modified test files.
_ trunk Compile Tests _
+1 💚 mvninstall 39m 0s trunk passed
+1 💚 compile 1m 45s trunk passed with JDK Private Build-11.0.15+10-Ubuntu-0ubuntu0.20.04.1
+1 💚 compile 1m 33s trunk passed with JDK Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07
+1 💚 checkstyle 1m 25s trunk passed
+1 💚 mvnsite 1m 44s trunk passed
+1 💚 javadoc 1m 25s trunk passed with JDK Private Build-11.0.15+10-Ubuntu-0ubuntu0.20.04.1
+1 💚 javadoc 1m 48s trunk passed with JDK Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07
+1 💚 spotbugs 3m 41s trunk passed
+1 💚 shadedclient 23m 2s branch has no errors when building and testing our client artifacts.
_ Patch Compile Tests _
+1 💚 mvninstall 1m 22s the patch passed
+1 💚 compile 1m 20s the patch passed with JDK Private Build-11.0.15+10-Ubuntu-0ubuntu0.20.04.1
+1 💚 javac 1m 20s the patch passed
+1 💚 compile 1m 18s the patch passed with JDK Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07
+1 💚 javac 1m 18s the patch passed
+1 💚 blanks 0m 0s The patch has no blanks issues.
+1 💚 checkstyle 0m 56s the patch passed
+1 💚 mvnsite 1m 24s the patch passed
+1 💚 javadoc 0m 57s the patch passed with JDK Private Build-11.0.15+10-Ubuntu-0ubuntu0.20.04.1
+1 💚 javadoc 1m 24s the patch passed with JDK Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07
+1 💚 spotbugs 3m 17s the patch passed
+1 💚 shadedclient 22m 29s patch has no errors when building and testing our client artifacts.
_ Other Tests _
+1 💚 unit 241m 12s hadoop-hdfs in the patch passed.
+1 💚 asflicense 1m 1s The patch does not generate ASF License warnings.
350m 52s
Subsystem Report/Notes
Docker ClientAPI=1.41 ServerAPI=1.41 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-4369/3/artifact/out/Dockerfile
GITHUB PR #4369
Optional Tests dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient spotbugs checkstyle codespell detsecrets
uname Linux 64a77da0c200 4.15.0-191-generic #202-Ubuntu SMP Thu Aug 4 01:49:29 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux
Build tool maven
Personality dev-support/bin/hadoop.sh
git revision trunk / 6eb9db3
Default Java Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07
Multi-JDK versions /usr/lib/jvm/java-11-openjdk-amd64:Private Build-11.0.15+10-Ubuntu-0ubuntu0.20.04.1 /usr/lib/jvm/java-8-openjdk-amd64:Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07
Test Results https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-4369/3/testReport/
Max. process+thread count 3105 (vs. ulimit of 5500)
modules C: hadoop-hdfs-project/hadoop-hdfs U: hadoop-hdfs-project/hadoop-hdfs
Console output https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-4369/3/console
versions git=2.25.1 maven=3.6.3 spotbugs=4.2.2
Powered by Apache Yetus 0.14.0 https://yetus.apache.org

This message was automatically generated.
