HADOOP-18656: [Backport to 3.4] [ABFS] Adding Support for Paginated Delete for Large Directories in HNS Account #6718

Merged


anujmodi2021
Contributor

@anujmodi2021 anujmodi2021 commented Apr 10, 2024

Description of PR
Jira Ticket: https://issues.apache.org/jira/browse/HADOOP-18656
Commit from trunk: 6ed7389

Today, when a recursive delete is issued for a large directory in an ADLS Gen2 (HNS) account, the directory deletion itself happens in O(1), but the backend performs ACL checks recursively for each object inside that directory. For a large directory, this can cause the request to time out. Pagination has been introduced in the Azure Storage backend for these ACL checks.

More information on how pagination works can be found in the public documentation of the Azure Delete Path API.

This PR contains the client-side changes to support this. To trigger pagination, the client needs to add a new query parameter, "paginated", and set it to true, along with recursive set to true. If the directory is large, the server may return a continuation token to the caller. If the caller receives a continuation token, it has to call the delete API again with that token, keeping recursive and paginated set to true. This is similar to directory delete on an FNS account.
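
The request loop described above can be sketched as follows. This is a minimal illustration and not the driver's actual code: the `DeleteCall` interface, the class and method names, and the fake server in `main` are all hypothetical, and the `continuation` query parameter name is an assumption. Only `recursive` and `paginated` are named in this description.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Optional;

public class PaginatedDeleteSketch {

    /** Hypothetical stand-in for the HTTP call to the Delete Path API. */
    interface DeleteCall {
        /** Sends one delete request; returns a continuation token if the server has more work to do. */
        Optional<String> send(Map<String, String> queryParams);
    }

    /** Builds the query parameters for a single delete request. */
    static Map<String, String> buildQuery(boolean recursive, boolean paginated, String continuation) {
        Map<String, String> query = new LinkedHashMap<>();
        query.put("recursive", String.valueOf(recursive));
        // "paginated" is only meaningful together with recursive=true.
        if (recursive && paginated) {
            query.put("paginated", "true");
        }
        // Assumed parameter name for passing the token back to the server.
        if (continuation != null) {
            query.put("continuation", continuation);
        }
        return query;
    }

    /** Repeats the delete call until no continuation token comes back; returns the number of requests sent. */
    static int deleteRecursively(DeleteCall call) {
        String token = null;
        int requests = 0;
        do {
            token = call.send(buildQuery(true, true, token)).orElse(null);
            requests++;
        } while (token != null);
        return requests;
    }

    public static void main(String[] args) {
        // Fake server: hands out two continuation tokens, then reports completion.
        List<String> pending = new ArrayList<>(Arrays.asList("token-1", "token-2"));
        DeleteCall fakeServer = query ->
            pending.isEmpty() ? Optional.<String>empty() : Optional.of(pending.remove(0));
        System.out.println("requests sent: " + deleteRecursively(fakeServer));
    }
}
```

For a small directory the server returns no token and the loop exits after a single request, so the same code path covers both the paginated and non-paginated cases.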

Pagination is available only from version "2023-08-03" onwards.
The PR also contains functional tests to verify that the driver works with different combinations of the recursive and pagination features for both HNS and FNS accounts.
Full E2E testing of pagination requires a large dataset to be created, so it has not been added to the driver test suite. However, extensive E2E testing has been performed.
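
The version constraint above suggests a client-side guard before the "paginated" parameter is sent. A minimal sketch, assuming the service version is a date-formatted string; the class, method, and parameter names are hypothetical, and whether the driver performs the check this way is not stated in this PR:

```java
import java.time.LocalDate;

public class PaginationVersionGate {

    /** First service version that supports paginated delete, as stated in the PR description. */
    static final LocalDate MIN_PAGINATION_VERSION = LocalDate.parse("2023-08-03");

    /** True if the given date-formatted service version is 2023-08-03 or later. */
    static boolean supportsPagination(String apiVersion) {
        return !LocalDate.parse(apiVersion).isBefore(MIN_PAGINATION_VERSION);
    }

    /** The "paginated" parameter is sent only for recursive deletes on a new enough version. */
    static boolean shouldSendPaginated(boolean recursive, boolean paginationEnabled, String apiVersion) {
        return recursive && paginationEnabled && supportsPagination(apiVersion);
    }

    public static void main(String[] args) {
        System.out.println(shouldSendPaginated(true, true, "2023-08-03"));
        System.out.println(shouldSendPaginated(true, true, "2021-01-01"));
        System.out.println(shouldSendPaginated(false, true, "2023-08-03"));
    }
}
```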

@hadoop-yetus

🎊 +1 overall

Vote Subsystem Runtime Logfile Comment
+0 🆗 reexec 0m 32s Docker mode activated.
_ Prechecks _
+1 💚 dupname 0m 0s No case conflicting files found.
+0 🆗 codespell 0m 1s codespell was not available.
+0 🆗 detsecrets 0m 1s detect-secrets was not available.
+1 💚 @author 0m 0s The patch does not contain any @author tags.
+1 💚 test4tests 0m 0s The patch appears to include 6 new or modified test files.
_ branch-3.4 Compile Tests _
+1 💚 mvninstall 44m 29s branch-3.4 passed
+1 💚 compile 0m 38s branch-3.4 passed with JDK Ubuntu-11.0.22+7-post-Ubuntu-0ubuntu220.04.1
+1 💚 compile 0m 36s branch-3.4 passed with JDK Private Build-1.8.0_402-8u402-ga-2ubuntu1~20.04-b06
+1 💚 checkstyle 0m 32s branch-3.4 passed
+1 💚 mvnsite 0m 41s branch-3.4 passed
+1 💚 javadoc 0m 39s branch-3.4 passed with JDK Ubuntu-11.0.22+7-post-Ubuntu-0ubuntu220.04.1
+1 💚 javadoc 0m 36s branch-3.4 passed with JDK Private Build-1.8.0_402-8u402-ga-2ubuntu1~20.04-b06
+1 💚 spotbugs 1m 7s branch-3.4 passed
+1 💚 shadedclient 33m 11s branch has no errors when building and testing our client artifacts.
_ Patch Compile Tests _
+1 💚 mvninstall 0m 28s the patch passed
+1 💚 compile 0m 29s the patch passed with JDK Ubuntu-11.0.22+7-post-Ubuntu-0ubuntu220.04.1
+1 💚 javac 0m 29s the patch passed
+1 💚 compile 0m 27s the patch passed with JDK Private Build-1.8.0_402-8u402-ga-2ubuntu1~20.04-b06
+1 💚 javac 0m 27s the patch passed
+1 💚 blanks 0m 0s The patch has no blanks issues.
+1 💚 checkstyle 0m 20s the patch passed
+1 💚 mvnsite 0m 30s the patch passed
+1 💚 javadoc 0m 26s the patch passed with JDK Ubuntu-11.0.22+7-post-Ubuntu-0ubuntu220.04.1
+1 💚 javadoc 0m 25s the patch passed with JDK Private Build-1.8.0_402-8u402-ga-2ubuntu1~20.04-b06
+1 💚 spotbugs 1m 5s the patch passed
+1 💚 shadedclient 33m 19s patch has no errors when building and testing our client artifacts.
_ Other Tests _
+1 💚 unit 2m 30s hadoop-azure in the patch passed.
+1 💚 asflicense 0m 50s The patch does not generate ASF License warnings.
128m 24s
Subsystem Report/Notes
Docker ClientAPI=1.45 ServerAPI=1.45 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6718/1/artifact/out/Dockerfile
GITHUB PR #6718
JIRA Issue HADOOP-18656
Optional Tests dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient spotbugs checkstyle codespell detsecrets
uname Linux 9920691b15e4 5.15.0-101-generic #111-Ubuntu SMP Tue Mar 5 20:16:58 UTC 2024 x86_64 x86_64 x86_64 GNU/Linux
Build tool maven
Personality dev-support/bin/hadoop.sh
git revision branch-3.4 / b96fbd7
Default Java Private Build-1.8.0_402-8u402-ga-2ubuntu1~20.04-b06
Multi-JDK versions /usr/lib/jvm/java-11-openjdk-amd64:Ubuntu-11.0.22+7-post-Ubuntu-0ubuntu220.04.1 /usr/lib/jvm/java-8-openjdk-amd64:Private Build-1.8.0_402-8u402-ga-2ubuntu1~20.04-b06
Test Results https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6718/1/testReport/
Max. process+thread count 629 (vs. ulimit of 5500)
modules C: hadoop-tools/hadoop-azure U: hadoop-tools/hadoop-azure
Console output https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6718/1/console
versions git=2.25.1 maven=3.6.3 spotbugs=4.2.2
Powered by Apache Yetus 0.14.0 https://yetus.apache.org

This message was automatically generated.

@anujmodi2021
Contributor Author


:::: AGGREGATED TEST RESULT ::::

============================================================
HNS-OAuth

[ERROR] testListPathWithValueGreaterThanServerMaximum(org.apache.hadoop.fs.azurebfs.ITestAbfsClient) Time elapsed: 290.912 s <<< FAILURE!

[ERROR] test_120_terasort(org.apache.hadoop.fs.azurebfs.commit.ITestAbfsTerasort) Time elapsed: 4.531 s <<< ERROR!

[WARNING] Tests run: 137, Failures: 0, Errors: 0, Skipped: 2
[ERROR] Tests run: 623, Failures: 1, Errors: 0, Skipped: 73
[ERROR] Tests run: 380, Failures: 0, Errors: 1, Skipped: 55

============================================================
HNS-SharedKey

[ERROR] testListPathWithValueGreaterThanServerMaximum(org.apache.hadoop.fs.azurebfs.ITestAbfsClient) Time elapsed: 237.663 s <<< FAILURE!

[WARNING] Tests run: 137, Failures: 0, Errors: 0, Skipped: 3
[ERROR] Tests run: 623, Failures: 1, Errors: 0, Skipped: 28
[WARNING] Tests run: 380, Failures: 0, Errors: 0, Skipped: 41

============================================================
NonHNS-SharedKey

[WARNING] Tests run: 137, Failures: 0, Errors: 0, Skipped: 9
[WARNING] Tests run: 607, Failures: 0, Errors: 0, Skipped: 269
[WARNING] Tests run: 380, Failures: 0, Errors: 0, Skipped: 44

============================================================
AppendBlob-HNS-OAuth

[ERROR] testCloseOfDataBlockOnAppendComplete(org.apache.hadoop.fs.azurebfs.ITestAzureBlobFileSystemAppend) Time elapsed: 9.219 s <<< FAILURE!
[ERROR] testListPathWithValueGreaterThanServerMaximum(org.apache.hadoop.fs.azurebfs.ITestAbfsClient) Time elapsed: 226.54 s <<< FAILURE!
[ERROR] testAbfsStreamOps(org.apache.hadoop.fs.azurebfs.ITestAbfsStreamStatistics) Time elapsed: 5.942 s <<< FAILURE!

[ERROR] testExpect100ContinueFailureInAppend(org.apache.hadoop.fs.azurebfs.services.ITestAbfsOutputStream) Time elapsed: 5.002 s <<< ERROR!
[ERROR] testAppendWithChecksumAtDifferentOffsets(org.apache.hadoop.fs.azurebfs.ITestAzureBlobFileSystemChecksum) Time elapsed: 6.037 s <<< ERROR!
[ERROR] testTwoWritersCreateAppendNoInfiniteLease(org.apache.hadoop.fs.azurebfs.ITestAzureBlobFileSystemLease) Time elapsed: 3.717 s <<< ERROR!
[ERROR] test_120_terasort(org.apache.hadoop.fs.azurebfs.commit.ITestAbfsTerasort) Time elapsed: 4.503 s <<< ERROR!

[WARNING] Tests run: 137, Failures: 0, Errors: 0, Skipped: 2
[ERROR] Tests run: 623, Failures: 2, Errors: 3, Skipped: 73
[ERROR] Tests run: 380, Failures: 1, Errors: 1, Skipped: 79

Time taken: 60 mins 25 secs.

@anujmodi2021
Contributor Author

@steveloughran
This is good to merge.
The test failures here are known and are fixed in PR #6676.

@anujmodi2021 anujmodi2021 marked this pull request as ready for review April 11, 2024 08:53
@anujmodi2021
Contributor Author

@steveloughran, @mukund-thakur...
Requesting you to please get this merged.

@mukund-thakur
Contributor

There are conflicts here after the test patch has been merged.

@anujmodi2021
Contributor Author


:::: AGGREGATED TEST RESULT ::::

============================================================
HNS-OAuth

[WARNING] Tests run: 137, Failures: 0, Errors: 0, Skipped: 2
[WARNING] Tests run: 623, Failures: 0, Errors: 0, Skipped: 73
[WARNING] Tests run: 380, Failures: 0, Errors: 0, Skipped: 54

============================================================
HNS-SharedKey

[WARNING] Tests run: 137, Failures: 0, Errors: 0, Skipped: 3
[WARNING] Tests run: 623, Failures: 0, Errors: 0, Skipped: 28
[WARNING] Tests run: 380, Failures: 0, Errors: 0, Skipped: 41

============================================================
NonHNS-SharedKey

[WARNING] Tests run: 137, Failures: 0, Errors: 0, Skipped: 9
[WARNING] Tests run: 607, Failures: 0, Errors: 0, Skipped: 268
[WARNING] Tests run: 380, Failures: 0, Errors: 0, Skipped: 44

============================================================
AppendBlob-HNS-OAuth

[WARNING] Tests run: 137, Failures: 0, Errors: 0, Skipped: 2
[WARNING] Tests run: 623, Failures: 0, Errors: 0, Skipped: 75
[WARNING] Tests run: 380, Failures: 0, Errors: 0, Skipped: 78

Time taken: 57 mins 2 secs.

@anujmodi2021
Contributor Author

> there are conflicts here after the test patch has been merged.

Resolved conflicts.
Would be good to have it merged.

Thanks a lot.

@hadoop-yetus

🎊 +1 overall

Vote Subsystem Runtime Logfile Comment
+0 🆗 reexec 11m 45s Docker mode activated.
_ Prechecks _
+1 💚 dupname 0m 0s No case conflicting files found.
+0 🆗 codespell 0m 0s codespell was not available.
+0 🆗 detsecrets 0m 0s detect-secrets was not available.
+1 💚 @author 0m 0s The patch does not contain any @author tags.
+1 💚 test4tests 0m 0s The patch appears to include 5 new or modified test files.
_ branch-3.4 Compile Tests _
+1 💚 mvninstall 44m 11s branch-3.4 passed
+1 💚 compile 0m 38s branch-3.4 passed with JDK Ubuntu-11.0.22+7-post-Ubuntu-0ubuntu220.04.1
+1 💚 compile 0m 35s branch-3.4 passed with JDK Private Build-1.8.0_402-8u402-ga-2ubuntu1~20.04-b06
+1 💚 checkstyle 0m 32s branch-3.4 passed
+1 💚 mvnsite 0m 42s branch-3.4 passed
+1 💚 javadoc 0m 40s branch-3.4 passed with JDK Ubuntu-11.0.22+7-post-Ubuntu-0ubuntu220.04.1
+1 💚 javadoc 0m 34s branch-3.4 passed with JDK Private Build-1.8.0_402-8u402-ga-2ubuntu1~20.04-b06
+1 💚 spotbugs 1m 7s branch-3.4 passed
+1 💚 shadedclient 34m 31s branch has no errors when building and testing our client artifacts.
-0 ⚠️ patch 34m 52s Used diff version of patch file. Binary files and potentially other changes not applied. Please rebase and squash commits if necessary.
_ Patch Compile Tests _
+1 💚 mvninstall 0m 30s the patch passed
+1 💚 compile 0m 30s the patch passed with JDK Ubuntu-11.0.22+7-post-Ubuntu-0ubuntu220.04.1
+1 💚 javac 0m 30s the patch passed
+1 💚 compile 0m 27s the patch passed with JDK Private Build-1.8.0_402-8u402-ga-2ubuntu1~20.04-b06
+1 💚 javac 0m 27s the patch passed
+1 💚 blanks 0m 0s The patch has no blanks issues.
+1 💚 checkstyle 0m 20s the patch passed
+1 💚 mvnsite 0m 30s the patch passed
+1 💚 javadoc 0m 26s the patch passed with JDK Ubuntu-11.0.22+7-post-Ubuntu-0ubuntu220.04.1
+1 💚 javadoc 0m 25s the patch passed with JDK Private Build-1.8.0_402-8u402-ga-2ubuntu1~20.04-b06
+1 💚 spotbugs 1m 5s the patch passed
+1 💚 shadedclient 33m 12s patch has no errors when building and testing our client artifacts.
_ Other Tests _
+1 💚 unit 2m 30s hadoop-azure in the patch passed.
+1 💚 asflicense 0m 37s The patch does not generate ASF License warnings.
140m 37s
Subsystem Report/Notes
Docker ClientAPI=1.45 ServerAPI=1.45 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6718/2/artifact/out/Dockerfile
GITHUB PR #6718
JIRA Issue HADOOP-18656
Optional Tests dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient spotbugs checkstyle codespell detsecrets
uname Linux 1d45262834f9 5.15.0-101-generic #111-Ubuntu SMP Tue Mar 5 20:16:58 UTC 2024 x86_64 x86_64 x86_64 GNU/Linux
Build tool maven
Personality dev-support/bin/hadoop.sh
git revision branch-3.4 / a9f4e10
Default Java Private Build-1.8.0_402-8u402-ga-2ubuntu1~20.04-b06
Multi-JDK versions /usr/lib/jvm/java-11-openjdk-amd64:Ubuntu-11.0.22+7-post-Ubuntu-0ubuntu220.04.1 /usr/lib/jvm/java-8-openjdk-amd64:Private Build-1.8.0_402-8u402-ga-2ubuntu1~20.04-b06
Test Results https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6718/2/testReport/
Max. process+thread count 705 (vs. ulimit of 5500)
modules C: hadoop-tools/hadoop-azure U: hadoop-tools/hadoop-azure
Console output https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6718/2/console
versions git=2.25.1 maven=3.6.3 spotbugs=4.2.2
Powered by Apache Yetus 0.14.0 https://yetus.apache.org

This message was automatically generated.

@mukund-thakur mukund-thakur merged commit 4e96b8e into apache:branch-3.4 Apr 22, 2024
3 checks passed