Conversation


@pkumarsinha pkumarsinha commented Sep 20, 2021

Description of PR

The exception handling in the DistCp xattr support check is improper. A FileSystem implementation that does not override getXAttrs() throws UnsupportedOperationException. However, DistCp catches the broad Exception class at this point, so an ordinary IOException is also treated as if the feature were unsupported. This leads to wrong behavior.
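The mismatch described above can be sketched with a small, self-contained example (the helper names are hypothetical, not the actual DistCp code): a broad catch of Exception reports any failure as lack of xattr support, while a narrow catch of UnsupportedOperationException lets real IOExceptions surface to the caller.

```java
import java.io.IOException;
import java.util.concurrent.Callable;

// Sketch: why catching Exception conflates "unsupported" with real failures.
// Names are hypothetical; this is not the actual DistCp code.
public class XAttrProbeSketch {

    // Broad catch, as in the behavior described above: an IOException from a
    // network or auth problem is also reported as "xattrs not supported".
    static boolean supportedBroad(Callable<?> getXAttrs) {
        try {
            getXAttrs.call();
            return true;
        } catch (Exception e) {   // swallows IOException too
            return false;
        }
    }

    // Narrow catch: only the documented "not supported" signal means lack of
    // support; an IOException propagates unchanged to the caller.
    static boolean supportedNarrow(Callable<?> getXAttrs) throws Exception {
        try {
            getXAttrs.call();
            return true;
        } catch (UnsupportedOperationException e) {
            return false;
        }
    }
}
```

With the broad catch, a transient network IOException is silently reported as "xattrs not supported"; the narrow variant rethrows it so the real cause is visible.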

How was this patch tested?

👀

For code changes:

  • Does the title of this PR start with the corresponding JIRA issue id (e.g. 'HADOOP-17799. Your PR title ...')?
  • Object storage: have the integration tests been executed and the endpoint declared according to the connector-specific documentation?
  • If adding new dependencies to the code, are these dependencies licensed in a way that is compatible for inclusion under ASF 2.0?
  • If applicable, have you updated the LICENSE, LICENSE-binary, NOTICE-binary files?

@steveloughran
Contributor

  1. Is the current behaviour actually causing a problem?
  2. Have you looked at all the implementations of fs.getXAttrs() and verified that they always throw UnsupportedOperationException if unsupported.

I can see from a purism perspective this is the correct thing to do. But I worry that changing it now will break things.

After all, if an IOE is raised here for some reason (network, auth etc) distcp will go on to fail shortly afterwards

@pkumarsinha
Author

> 1. Is the current behaviour actually causing a problem?
>
> 2. Have you looked at all the implementations of fs.getXAttrs() and verified that they always throw UnsupportedOperationException if unsupported.
>
> I can see from a purism perspective this is the correct thing to do. But I worry that changing it now will break things.
>
> After all, if an IOE is raised here for some reason (network, auth etc) distcp will go on to fail shortly afterwards

Thanks for the review @steveloughran !!

  1. Yes, I recently experienced this exception:

2021-08-02 10:05:17,150 ERROR org.apache.hadoop.tools.DistCp: [Thread-1973153]: XAttrs not supported on at least one file system:
org.apache.hadoop.tools.CopyListing$XAttrsNotSupportedException: XAttrs not supported for file system: hdfs://someNN_host.tests.cloudera.com:8020
at org.apache.hadoop.tools.util.DistCpUtils.checkFileSystemXAttrSupport(DistCpUtils.java:513) ~[hadoop-distcp-3.1.1.7.1.7.0-392.jar:?]
at org.apache.hadoop.tools.CopyListing.validateFinalListing(CopyListing.java:202) ~[hadoop-distcp-3.1.1.7.1.7.0-392.jar:?]
at org.apache.hadoop.tools.CopyListing.buildListing(CopyListing.java:93) ~[hadoop-distcp-3.1.1.7.1.7.0-392.jar:?]
at org.apache.hadoop.tools.DistCp.createInputFileListing(DistCp.java:379) ~[hadoop-distcp-3.1.1.7.1.7.0-392.jar:?]
at org.apache.hadoop.tools.DistCp.prepareFileListing(DistCp.java:96) ~[hadoop-distcp-3.1.1.7.1.7.0-392.jar:?]
at org.apache.hadoop.tools.DistCp.createAndSubmitJob(DistCp.java:216) ~[hadoop-distcp-3.1.1.7.1.7.0-392.jar:?]
at org.apache.hadoop.tools.DistCp.execute(DistCp.java:193) ~[hadoop-distcp-3.1.1.7.1.7.0-392.jar:?]
at org.apache.hadoop.tools.DistCp.run(DistCp.java:155) [hadoop-distcp-3.1.1.7.1.7.0-392.jar:?]

I didn't expect this call to fail for HDFS, and the only explanation I could find is the improper exception handling. The underlying cause was probably something intermittent, such as a network issue.

  2. The default behavior in FileSystem.java is to throw UnsupportedOperationException. I couldn't find any override where it is silently ignored.

/**
 * Get all of the xattr name/value pairs for a file or directory.
 * Only those xattrs which the logged-in user has permissions to view
 * are returned.
 * Refer to the HDFS extended attributes user documentation for details.
 * @param path Path to get extended attributes
 * @return Map describing the XAttrs of the file or directory
 * @throws IOException IO failure
 * @throws UnsupportedOperationException if the operation is unsupported
 *     (default outcome).
 */
public Map<String, byte[]> getXAttrs(Path path) throws IOException {
  throw new UnsupportedOperationException(getClass().getSimpleName()
      + " doesn't support getXAttrs");
}

@hadoop-yetus

💔 -1 overall

Vote Subsystem Runtime Logfile Comment
+0 🆗 reexec 1m 43s Docker mode activated.
_ Prechecks _
+1 💚 dupname 0m 0s No case conflicting files found.
+0 🆗 codespell 0m 0s codespell was not available.
+1 💚 @author 0m 0s The patch does not contain any @author tags.
-1 ❌ test4tests 0m 0s The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch.
_ trunk Compile Tests _
+1 💚 mvninstall 37m 59s trunk passed
+1 💚 compile 0m 36s trunk passed with JDK Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04
+1 💚 compile 0m 33s trunk passed with JDK Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10
+1 💚 checkstyle 0m 27s trunk passed
+1 💚 mvnsite 0m 38s trunk passed
+1 💚 javadoc 0m 31s trunk passed with JDK Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04
+1 💚 javadoc 0m 28s trunk passed with JDK Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10
+1 💚 spotbugs 0m 57s trunk passed
+1 💚 shadedclient 24m 50s branch has no errors when building and testing our client artifacts.
_ Patch Compile Tests _
+1 💚 mvninstall 0m 26s the patch passed
+1 💚 compile 0m 26s the patch passed with JDK Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04
+1 💚 javac 0m 26s the patch passed
+1 💚 compile 0m 21s the patch passed with JDK Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10
+1 💚 javac 0m 21s the patch passed
+1 💚 blanks 0m 0s The patch has no blanks issues.
+1 💚 checkstyle 0m 15s the patch passed
+1 💚 mvnsite 0m 24s the patch passed
-1 ❌ javadoc 0m 21s /results-javadoc-javadoc-hadoop-tools_hadoop-distcp-jdkUbuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04.txt hadoop-tools_hadoop-distcp-jdkUbuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04 with JDK Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04 generated 1 new + 41 unchanged - 0 fixed = 42 total (was 41)
-1 ❌ javadoc 0m 19s /results-javadoc-javadoc-hadoop-tools_hadoop-distcp-jdkPrivateBuild-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10.txt hadoop-tools_hadoop-distcp-jdkPrivateBuild-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10 with JDK Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10 generated 1 new + 41 unchanged - 0 fixed = 42 total (was 41)
+1 💚 spotbugs 0m 53s the patch passed
+1 💚 shadedclient 24m 20s patch has no errors when building and testing our client artifacts.
_ Other Tests _
-1 ❌ unit 72m 26s /patch-unit-hadoop-tools_hadoop-distcp.txt hadoop-distcp in the patch passed.
+1 💚 asflicense 0m 33s The patch does not generate ASF License warnings.
171m 1s
Reason Tests
Failed junit tests hadoop.tools.TestDistCpWithXAttrs
Subsystem Report/Notes
Docker ClientAPI=1.41 ServerAPI=1.41 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-3463/1/artifact/out/Dockerfile
GITHUB PR #3463
Optional Tests dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient spotbugs checkstyle codespell
uname Linux 8633cd315b3f 4.15.0-112-generic #113-Ubuntu SMP Thu Jul 9 23:41:39 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux
Build tool maven
Personality dev-support/bin/hadoop.sh
git revision trunk / 850a926
Default Java Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10
Multi-JDK versions /usr/lib/jvm/java-11-openjdk-amd64:Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04 /usr/lib/jvm/java-8-openjdk-amd64:Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10
Test Results https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-3463/1/testReport/
Max. process+thread count 722 (vs. ulimit of 5500)
modules C: hadoop-tools/hadoop-distcp U: hadoop-tools/hadoop-distcp
Console output https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-3463/1/console
versions git=2.25.1 maven=3.6.3 spotbugs=4.2.2
Powered by Apache Yetus 0.14.0-SNAPSHOT https://yetus.apache.org

This message was automatically generated.

@cnauroth
Contributor

One other consideration is the behavior as it stands now for HDFS clusters that have disabled xattrs with dfs.namenode.xattrs.enabled=false. (The default for this property is true in recent versions, and with a lot of people wanting at-rest encryption, it's become less common to set to false.)

In current behavior, the NameNode will throw an IOException (not a more specific subclass) that propagates to the client. DistCp will catch that and throw its own XAttrsNotSupportedException.

With this patch, the NameNode still throws IOException, but now DistCp propagates that straight to the caller instead of throwing XAttrsNotSupportedException. There is a risk that this could cause subtle behavioral differences for higher-layer tooling (Oozie? Airflow?) that coordinates DistCp jobs.

I'm not certain there is a problem, but I agree with @steveloughran that there is reason to be cautious. It's also difficult to anticipate if custom FileSystem implementers have overridden and thrown a different exception, despite the base class providing UnsupportedOperationException out of the box.

@pkumarsinha
Author

> One other consideration is the behavior as it stands now for HDFS clusters that have disabled xattrs with dfs.namenode.xattrs.enabled=false. (The default for this property is true in recent versions, and with a lot of people wanting at-rest encryption, it's become less common to set to false.)
>
> In current behavior, the NameNode will throw an IOException (not a more specific subclass) that propagates to the client. DistCp will catch that and throw its own XAttrsNotSupportedException.
>
> With this patch, the NameNode still throws IOException, but now DistCp propagates that straight to the caller instead of throwing XAttrsNotSupportedException. There is a risk that this could cause subtle behavioral differences for higher-layer tooling (Oozie? Airflow?) that coordinates DistCp jobs.
>
> I'm not certain there is a problem, but I agree with @steveloughran that there is reason to be cautious. It's also difficult to anticipate if custom FileSystem implementers have overridden and thrown a different exception, despite the base class providing UnsupportedOperationException out of the box.

Thanks for the review @cnauroth. I agree with your observation about the behavioral difference when dfs.namenode.xattrs.enabled is 'false'.
@steveloughran / @cnauroth Is there any recommendation for the patch? My concern, and the case I actually hit, is that even with dfs.namenode.xattrs.enabled='true' and an FS implementation that supports xattrs (DistributedFileSystem.java), I still got the error that xattrs are not supported. This hides the actual cause (probably an intermittent network issue or similar).

@steveloughran
Contributor

  1. What was the FS playing up? HDFS?
  2. Given Chris's feedback, I think we need to leave things as is.

We have to consider exceptions to be part of that public API. Worse, sometimes the text message is part of that API. Distcp is sometimes wrapped by applications (hive,...) so they may be looking for the explicit exception type.

@cnauroth
Contributor

cnauroth commented Oct 1, 2021

My only other thought is that this sort of logic is the kind of thing that should eventually make its way over to using PathCapabilities. For complete accuracy though, the HDFS client implementation would need its hasPathCapability implementation to have knowledge of whether or not the remote NameNode has xattrs enabled. Considering federation, you might even have different namespaces enabled or disabled, so the final answer is dependent on the path presented by the caller. We could avoid that complexity if we removed the configuration property for disabling xattrs. (It's a stable feature at this point.) That would be a backward-incompatible change though.
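The PathCapabilities idea above can be sketched with a hypothetical mini-interface, so the example stays self-contained (in Hadoop 3.3+ the real call is fs.hasPathCapability(path, capability), with a constant for xattrs in CommonPathCapabilities; the exact string value used below is an assumption):

```java
import java.io.IOException;
import java.util.Collections;
import java.util.Map;

// Sketch of a PathCapabilities-style probe. MiniFs is a hypothetical
// stand-in for the real FileSystem API, not Hadoop code.
public class PathCapabilitySketch {

    interface MiniFs {
        // Mirrors FileSystem.hasPathCapability(Path, String): the answer may
        // depend on the path (e.g. different namespaces under federation).
        boolean hasPathCapability(String path, String capability) throws IOException;

        Map<String, byte[]> getXAttrs(String path) throws IOException;
    }

    // Assumed capability name, modeled on CommonPathCapabilities.FS_XATTRS.
    static final String FS_XATTRS = "fs.capability.paths.xattrs";

    // Skip xattrs when the store declares no support, without interpreting
    // exceptions; a real IOException from getXAttrs propagates unchanged.
    static Map<String, byte[]> xattrsIfSupported(MiniFs fs, String path)
            throws IOException {
        if (!fs.hasPathCapability(path, FS_XATTRS)) {
            return Collections.emptyMap();
        }
        return fs.getXAttrs(path);
    }
}
```

The design point is the one raised above: support is declared up front per path, so DistCp would never need to guess "unsupported" from an exception, and any IOException would unambiguously mean a real failure.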

I've tried more brainstorming on this, and unfortunately, I can't think of a solution with high confidence in guaranteeing backward compatibility.

We've focused discussion on xattrs so far, but there is similar logic for ACLs too that is susceptible to the same problems.

@github-actions
Contributor

We're closing this stale PR because it has been open for 100 days with no activity. This isn't a judgement on the merit of the PR in any way. It's just a way of keeping the PR queue manageable.
If you feel like this was a mistake, or you would like to continue working on it, please feel free to re-open it and ask for a committer to remove the stale tag and review again.
Thanks all for your contribution.

@github-actions github-actions bot added the Stale label Nov 23, 2025
@github-actions github-actions bot closed this Nov 24, 2025