Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

HADOOP-18869: [ABFS] Fixing Behavior of a File System APIs on root path #6003

Merged
merged 5 commits into from Oct 9, 2023

Conversation

anujmodi2021
Copy link
Contributor

@anujmodi2021 anujmodi2021 commented Aug 30, 2023

Description of PR

Jira Ticket: https://issues.apache.org/jira/browse/HADOOP-18869

Following changes are included in this PR:

  1. HDFS.create() was failing on root path with Invalid URL. As a fix a check is added to avoid network call if root path is passed. Instead, an Exception will be thrown with relevant message.
  2. HDFS.createNonRecursive() was failing on root path with NullPointerException. As a fix a check is added to avoid network call if root path is passed. Instead, an Exception will be thrown with relevant message.
  3. HDFS.getXAttr() was failing for root path with Invalid URL. As a fix support for setting File System Properties added if root path is passed.
  4. HDFS.setXAttr() was failing for root path with Invalid URL. As a fix support for getting File System Properties added if root path is passed.
  5. Tests for getsetXAttr() were trying to set Unicode characters in attribute value encoded in ISO_8859-1 which does not support unicode, hence the value set by setXAttr() were going as undefined character. getXAttr() also retrieved them as undefined characters only. As a fix, assertions were changed to compare the both encoded and decoded values.
    Also, ABFS used DFS endpoint to interact with Azure services. As per the official documentation, these operations on DFS endpoint supports only ASCII characters. As a ix tests were modified to work only with ASCII characters.

How was this patch tested?

Existing tests were modified and more tests were added.
Whole test suite was ran to verify the changes. Test results pasted below.

For code changes:

  • Does the title or this PR starts with the corresponding JIRA issue id (e.g. 'HADOOP-17799. Your PR title ...')?
  • Object storage: have the integration tests been executed and the endpoint declared according to the connector-specific documentation?
  • If adding new dependencies to the code, are these dependencies licensed in a way that is compatible for inclusion under ASF 2.0?
  • If applicable, have you updated the LICENSE, LICENSE-binary, NOTICE-binary files?

@hadoop-yetus
Copy link

💔 -1 overall

Vote Subsystem Runtime Logfile Comment
+0 🆗 reexec 13m 22s Docker mode activated.
_ Prechecks _
+1 💚 dupname 0m 0s No case conflicting files found.
+0 🆗 codespell 0m 1s codespell was not available.
+0 🆗 detsecrets 0m 1s detect-secrets was not available.
+1 💚 @author 0m 0s The patch does not contain any @author tags.
+1 💚 test4tests 0m 0s The patch appears to include 2 new or modified test files.
_ trunk Compile Tests _
+1 💚 mvninstall 42m 53s trunk passed
+1 💚 compile 0m 42s trunk passed with JDK Ubuntu-11.0.20.1+1-post-Ubuntu-0ubuntu120.04
+1 💚 compile 0m 39s trunk passed with JDK Private Build-1.8.0_382-8u382-ga-1~20.04.1-b05
+1 💚 checkstyle 0m 36s trunk passed
+1 💚 mvnsite 0m 45s trunk passed
+1 💚 javadoc 0m 42s trunk passed with JDK Ubuntu-11.0.20.1+1-post-Ubuntu-0ubuntu120.04
+1 💚 javadoc 0m 38s trunk passed with JDK Private Build-1.8.0_382-8u382-ga-1~20.04.1-b05
+1 💚 spotbugs 1m 9s trunk passed
-1 ❌ shadedclient 32m 43s branch has errors when building and testing our client artifacts.
_ Patch Compile Tests _
+1 💚 mvninstall 0m 31s the patch passed
+1 💚 compile 0m 30s the patch passed with JDK Ubuntu-11.0.20.1+1-post-Ubuntu-0ubuntu120.04
+1 💚 javac 0m 30s the patch passed
+1 💚 compile 0m 28s the patch passed with JDK Private Build-1.8.0_382-8u382-ga-1~20.04.1-b05
+1 💚 javac 0m 28s the patch passed
+1 💚 blanks 0m 0s The patch has no blanks issues.
-0 ⚠️ checkstyle 0m 21s /results-checkstyle-hadoop-tools_hadoop-azure.txt hadoop-tools/hadoop-azure: The patch generated 1 new + 1 unchanged - 0 fixed = 2 total (was 1)
+1 💚 mvnsite 0m 31s the patch passed
+1 💚 javadoc 0m 27s the patch passed with JDK Ubuntu-11.0.20.1+1-post-Ubuntu-0ubuntu120.04
+1 💚 javadoc 0m 27s the patch passed with JDK Private Build-1.8.0_382-8u382-ga-1~20.04.1-b05
+1 💚 spotbugs 1m 2s the patch passed
-1 ❌ shadedclient 33m 7s patch has errors when building and testing our client artifacts.
_ Other Tests _
+1 💚 unit 2m 27s hadoop-azure in the patch passed.
+1 💚 asflicense 0m 38s The patch does not generate ASF License warnings.
138m 38s
Subsystem Report/Notes
Docker ClientAPI=1.43 ServerAPI=1.43 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6003/1/artifact/out/Dockerfile
GITHUB PR #6003
Optional Tests dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient spotbugs checkstyle codespell detsecrets
uname Linux e1b72d335f63 4.15.0-212-generic #223-Ubuntu SMP Tue May 23 13:09:22 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux
Build tool maven
Personality dev-support/bin/hadoop.sh
git revision trunk / 12dab81
Default Java Private Build-1.8.0_382-8u382-ga-1~20.04.1-b05
Multi-JDK versions /usr/lib/jvm/java-11-openjdk-amd64:Ubuntu-11.0.20.1+1-post-Ubuntu-0ubuntu120.04 /usr/lib/jvm/java-8-openjdk-amd64:Private Build-1.8.0_382-8u382-ga-1~20.04.1-b05
Test Results https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6003/1/testReport/
Max. process+thread count 412 (vs. ulimit of 5500)
modules C: hadoop-tools/hadoop-azure U: hadoop-tools/hadoop-azure
Console output https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6003/1/console
versions git=2.25.1 maven=3.6.3 spotbugs=4.2.2
Powered by Apache Yetus 0.14.0 https://yetus.apache.org

This message was automatically generated.

Copy link
Contributor

@steveloughran steveloughran left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

reviewed; all comments cover the tests. production code LGTM

final AzureBlobFileSystem fs = getFileSystem();
Path testFile = path(AbfsHttpConstants.ROOT_PATH);
try {
fs.create(testFile, true);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

use LambaTestUtils.intercept.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Taken

@@ -146,6 +149,26 @@ public void testCreateNonRecursive2() throws Exception {
assertIsFile(fs, testFile);
}

@Test
public void testCreateOnRoot() throws Exception {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

AbstractContractRootDirectoryTest.testCreateFileOverRoot() shoud be doing some of this.

  1. are those tests running? that's ITestAbfsFileSystemContractRootDirectory
  2. can you add the createNonRecursive test there?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I checked and for some reason these tests are getting skipped even after adding the required configuration. Will debug more on this and get back.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So, I checked further and found the tests were blocked because of the setting here: https://github.com/apache/hadoop/blob/trunk/hadoop-tools/hadoop-azure/src/test/resources/abfs.xml#L20

We have to set the required config fs.contract.test.root-tests-enabled as true here for these tests to run

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

aah. well, do that then. it's off by default to stop people accidentally deleting their local disk and then complaining. (this has never happened, but...)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Makes sense..
Set the config to true.

Also, I was trying to add createNonRecursive in AbstractContractRootDirectoryTest.
Looks like ContractTestUtils does not have support for this HDFS API.
Directly getting File System instance and calling createNonRecursive could cause failure when this test wil run with file systems other than ABFS.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

good point. but at the same time, adding it will see what breaks...if you touch a file in the hdfs source tree (say its contract xml file...it can be left out the final commit) then we see what it does, and for s3a we will find out later and fix.

// Check if the attribute is retrievable
fs.setListenerOperation(FSOperationType.GET_ATTR);
byte[] rv = fs.getXAttr(testPath, attributeName1);
assertTrue(Arrays.equals(rv, attributeValue1));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

AssertJ assertThat() supports array equality, so use it. and add a description

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Taken

fs.setListenerOperation(FSOperationType.GET_ATTR);
byte[] rv = fs.getXAttr(testPath, attributeName1);
assertTrue(Arrays.equals(rv, attributeValue1));
assertEquals(decodedAttributeValue1, fs.getAbfsStore().decodeAttribute(rv));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

add description

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added


// Check all the attributes present and previous Attribute not overridden
rv = fs.getXAttr(testPath, attributeName1);
assertTrue(Arrays.equals(rv, attributeValue1));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same comments as above. remember i always insist on a description, and want assertJ for anything where it makes sense. save time by doing this from the start

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry for missing this out...
Will remember this before creating a PR in future

if (f.isRoot()) {
throw new AbfsRestOperationException(HTTP_CONFLICT,
AzureServiceErrorCode.PATH_CONFLICT.getErrorCode(),
"Cannot create file on the root path",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

make a constant string used in both places

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Taken

@hadoop-yetus
Copy link

💔 -1 overall

Vote Subsystem Runtime Logfile Comment
+0 🆗 reexec 0m 40s Docker mode activated.
_ Prechecks _
+1 💚 dupname 0m 0s No case conflicting files found.
+0 🆗 codespell 0m 0s codespell was not available.
+0 🆗 detsecrets 0m 0s detect-secrets was not available.
+1 💚 @author 0m 0s The patch does not contain any @author tags.
+1 💚 test4tests 0m 0s The patch appears to include 2 new or modified test files.
_ trunk Compile Tests _
-1 ❌ mvninstall 56m 4s /branch-mvninstall-root.txt root in trunk failed.
+1 💚 compile 0m 43s trunk passed with JDK Ubuntu-11.0.20.1+1-post-Ubuntu-0ubuntu120.04
+1 💚 compile 0m 39s trunk passed with JDK Private Build-1.8.0_382-8u382-ga-1~20.04.1-b05
+1 💚 checkstyle 0m 35s trunk passed
+1 💚 mvnsite 0m 42s trunk passed
+1 💚 javadoc 0m 42s trunk passed with JDK Ubuntu-11.0.20.1+1-post-Ubuntu-0ubuntu120.04
+1 💚 javadoc 0m 37s trunk passed with JDK Private Build-1.8.0_382-8u382-ga-1~20.04.1-b05
+1 💚 spotbugs 1m 10s trunk passed
+1 💚 shadedclient 33m 54s branch has no errors when building and testing our client artifacts.
_ Patch Compile Tests _
+1 💚 mvninstall 0m 30s the patch passed
+1 💚 compile 0m 32s the patch passed with JDK Ubuntu-11.0.20.1+1-post-Ubuntu-0ubuntu120.04
+1 💚 javac 0m 32s the patch passed
+1 💚 compile 0m 28s the patch passed with JDK Private Build-1.8.0_382-8u382-ga-1~20.04.1-b05
+1 💚 javac 0m 28s the patch passed
+1 💚 blanks 0m 0s The patch has no blanks issues.
+1 💚 checkstyle 0m 21s the patch passed
+1 💚 mvnsite 0m 31s the patch passed
+1 💚 javadoc 0m 28s the patch passed with JDK Ubuntu-11.0.20.1+1-post-Ubuntu-0ubuntu120.04
+1 💚 javadoc 0m 26s the patch passed with JDK Private Build-1.8.0_382-8u382-ga-1~20.04.1-b05
+1 💚 spotbugs 1m 4s the patch passed
+1 💚 shadedclient 34m 4s patch has no errors when building and testing our client artifacts.
_ Other Tests _
+1 💚 unit 2m 19s hadoop-azure in the patch passed.
+1 💚 asflicense 0m 38s The patch does not generate ASF License warnings.
140m 59s
Subsystem Report/Notes
Docker ClientAPI=1.43 ServerAPI=1.43 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6003/2/artifact/out/Dockerfile
GITHUB PR #6003
Optional Tests dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient spotbugs checkstyle codespell detsecrets
uname Linux 4f3f529c3498 4.15.0-212-generic #223-Ubuntu SMP Tue May 23 13:09:22 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux
Build tool maven
Personality dev-support/bin/hadoop.sh
git revision trunk / b9c2eac
Default Java Private Build-1.8.0_382-8u382-ga-1~20.04.1-b05
Multi-JDK versions /usr/lib/jvm/java-11-openjdk-amd64:Ubuntu-11.0.20.1+1-post-Ubuntu-0ubuntu120.04 /usr/lib/jvm/java-8-openjdk-amd64:Private Build-1.8.0_382-8u382-ga-1~20.04.1-b05
Test Results https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6003/2/testReport/
Max. process+thread count 672 (vs. ulimit of 5500)
modules C: hadoop-tools/hadoop-azure U: hadoop-tools/hadoop-azure
Console output https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6003/2/console
versions git=2.25.1 maven=3.6.3 spotbugs=4.2.2
Powered by Apache Yetus 0.14.0 https://yetus.apache.org

This message was automatically generated.

Copy link
Contributor

@steveloughran steveloughran left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

suggested test changes

@@ -146,6 +149,26 @@ public void testCreateNonRecursive2() throws Exception {
assertIsFile(fs, testFile);
}

@Test
public void testCreateOnRoot() throws Exception {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

aah. well, do that then. it's off by default to stop people accidentally deleting their local disk and then complaining. (this has never happened, but...)

public void testCreateOnRoot() throws Exception {
final AzureBlobFileSystem fs = getFileSystem();
Path testFile = path(AbfsHttpConstants.ROOT_PATH);
try {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you shouldn't need the double catch. intercept() will return the caught exception, type case to the class of arg 1, so can be asserted on. and if an exception is not the one expected, the stack trace is too important to lose. so rethrow it or use as the cause of an assertion error

AbfsRestOperationException e = intercept(...)
if (e.getStatusCode()!=HTTP_CONFLICT) {
 // rethrow if its not the expected one.
 throw e;
}

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Taken

@hadoop-yetus
Copy link

🎊 +1 overall

Vote Subsystem Runtime Logfile Comment
+0 🆗 reexec 0m 38s Docker mode activated.
_ Prechecks _
+1 💚 dupname 0m 0s No case conflicting files found.
+0 🆗 codespell 0m 0s codespell was not available.
+0 🆗 detsecrets 0m 0s detect-secrets was not available.
+0 🆗 xmllint 0m 0s xmllint was not available.
+1 💚 @author 0m 0s The patch does not contain any @author tags.
+1 💚 test4tests 0m 0s The patch appears to include 3 new or modified test files.
_ trunk Compile Tests _
+1 💚 mvninstall 44m 19s trunk passed
+1 💚 compile 0m 42s trunk passed with JDK Ubuntu-11.0.20.1+1-post-Ubuntu-0ubuntu120.04
+1 💚 compile 0m 39s trunk passed with JDK Private Build-1.8.0_382-8u382-ga-1~20.04.1-b05
+1 💚 checkstyle 0m 36s trunk passed
+1 💚 mvnsite 0m 44s trunk passed
+1 💚 javadoc 0m 43s trunk passed with JDK Ubuntu-11.0.20.1+1-post-Ubuntu-0ubuntu120.04
+1 💚 javadoc 0m 38s trunk passed with JDK Private Build-1.8.0_382-8u382-ga-1~20.04.1-b05
+1 💚 spotbugs 1m 13s trunk passed
+1 💚 shadedclient 34m 2s branch has no errors when building and testing our client artifacts.
_ Patch Compile Tests _
+1 💚 mvninstall 0m 31s the patch passed
+1 💚 compile 0m 31s the patch passed with JDK Ubuntu-11.0.20.1+1-post-Ubuntu-0ubuntu120.04
+1 💚 javac 0m 31s the patch passed
+1 💚 compile 0m 29s the patch passed with JDK Private Build-1.8.0_382-8u382-ga-1~20.04.1-b05
+1 💚 javac 0m 29s the patch passed
+1 💚 blanks 0m 0s The patch has no blanks issues.
+1 💚 checkstyle 0m 21s the patch passed
+1 💚 mvnsite 0m 31s the patch passed
+1 💚 javadoc 0m 28s the patch passed with JDK Ubuntu-11.0.20.1+1-post-Ubuntu-0ubuntu120.04
+1 💚 javadoc 0m 26s the patch passed with JDK Private Build-1.8.0_382-8u382-ga-1~20.04.1-b05
+1 💚 spotbugs 1m 4s the patch passed
+1 💚 shadedclient 33m 52s patch has no errors when building and testing our client artifacts.
_ Other Tests _
+1 💚 unit 2m 18s hadoop-azure in the patch passed.
+1 💚 asflicense 0m 41s The patch does not generate ASF License warnings.
129m 33s
Subsystem Report/Notes
Docker ClientAPI=1.43 ServerAPI=1.43 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6003/3/artifact/out/Dockerfile
GITHUB PR #6003
Optional Tests dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient spotbugs checkstyle codespell detsecrets xmllint
uname Linux f667ee7c797e 4.15.0-212-generic #223-Ubuntu SMP Tue May 23 13:09:22 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux
Build tool maven
Personality dev-support/bin/hadoop.sh
git revision trunk / acf18d3
Default Java Private Build-1.8.0_382-8u382-ga-1~20.04.1-b05
Multi-JDK versions /usr/lib/jvm/java-11-openjdk-amd64:Ubuntu-11.0.20.1+1-post-Ubuntu-0ubuntu120.04 /usr/lib/jvm/java-8-openjdk-amd64:Private Build-1.8.0_382-8u382-ga-1~20.04.1-b05
Test Results https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6003/3/testReport/
Max. process+thread count 680 (vs. ulimit of 5500)
modules C: hadoop-tools/hadoop-azure U: hadoop-tools/hadoop-azure
Console output https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6003/3/console
versions git=2.25.1 maven=3.6.3 spotbugs=4.2.2
Powered by Apache Yetus 0.14.0 https://yetus.apache.org

This message was automatically generated.

Copy link
Contributor

@steveloughran steveloughran left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM
+1

@@ -146,6 +149,26 @@ public void testCreateNonRecursive2() throws Exception {
assertIsFile(fs, testFile);
}

@Test
public void testCreateOnRoot() throws Exception {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

good point. but at the same time, adding it will see what breaks...if you touch a file in the hdfs source tree (say its contract xml file...it can be left out the final commit) then we see what it does, and for s3a we will find out later and fix.

@steveloughran steveloughran merged commit 6c6df40 into apache:trunk Oct 9, 2023
4 checks passed
@steveloughran
Copy link
Contributor

merge to trunk..as usual do a pr and retest against branch-3.3 for a merge there

@steveloughran
Copy link
Contributor

ok, i just gave the wrong credit. will revert and reappy with the correct author. sorry, its just I had Anmol on clipboard

asfgit pushed a commit that referenced this pull request Oct 9, 2023
…ot path (#6003)"

This reverts commit 6c6df40.

...so as to give the correct credit
asfgit pushed a commit that referenced this pull request Oct 9, 2023
anujmodi2021 added a commit to anujmodi2021/hadoop that referenced this pull request Nov 15, 2023
jiajunmao pushed a commit to jiajunmao/hadoop-MLEC that referenced this pull request Feb 6, 2024
jiajunmao pushed a commit to jiajunmao/hadoop-MLEC that referenced this pull request Feb 6, 2024
…ot path (apache#6003)"

This reverts commit 6c6df40.

...so as to give the correct credit
jiajunmao pushed a commit to jiajunmao/hadoop-MLEC that referenced this pull request Feb 6, 2024
steveloughran pushed a commit that referenced this pull request Mar 25, 2024
…r() on root path (#6592)


This reverts most of
HADOOP-18869: [ABFS] Fix behavior of a File System APIs on root path (#6003).

Calling getXAttr("/") or setXAttr("/") on an abfs container will fail with

`Operation failed: "The request URI is invalid.", HTTP 400 Bad Request`

 
This change is to ensure:
* Consistency across ADLS clients
* Consistency across authentication mechanisms.

Contributed by Anuj Modi
steveloughran pushed a commit that referenced this pull request Mar 25, 2024
…r() on root path (#6592)

This reverts most of
HADOOP-18869: [ABFS] Fix behavior of a File System APIs on root path (#6003).

Calling getXAttr("/") or setXAttr("/") on an abfs container will fail with

`Operation failed: "The request URI is invalid.", HTTP 400 Bad Request`

This change is to ensure:
* Consistency across ADLS clients
* Consistency across authentication mechanisms.

Contributed by Anuj Modi
@anujmodi2021 anujmodi2021 deleted the HADOOP-18869-FixRootOp branch April 17, 2024 10:19
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
3 participants