HADOOP-18679. Add API for bulk/paged object deletion #6738

@steveloughran steveloughran commented Apr 16, 2024

This is #6726 with an extra patch

For code changes:

  • Does the title of this PR start with the corresponding JIRA issue id (e.g. 'HADOOP-17799. Your PR title ...')?
  • Object storage: have the integration tests been executed and the endpoint declared according to the connector-specific documentation?
  • If adding new dependencies to the code, are these dependencies licensed in a way that is compatible for inclusion under ASF 2.0?
  • If applicable, have you updated the LICENSE, LICENSE-binary, NOTICE-binary files?

steveloughran and others added 7 commits April 15, 2024 12:36
A more minimal design that is easier to use and implement.

The caller creates a BulkOperation, asks it for its page size,
then submits batches of paths to delete, each no larger than that size.

The outcome of each call contains a list of failures.

S3A implementation to show how straightforward it is.

Even with a single-entry page size, it is still more
efficient to use this, as it doesn't try to recreate a
parent dir or probe to see whether the path is a directory:
it maps straight to a DELETE call.

Change-Id: Ibe8737e7933fe03d39070e7138a710b50c3d60c2
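
The paging flow described in this commit can be sketched as follows. This is a minimal illustration, not the code in the patch: `BulkDeleteSketch`, `PagedDeleter` and the use of plain strings for paths are all hypothetical names chosen here, and the sketch assumes a batch may hold at most `pageSize()` entries.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for the bulk delete operation; not the Hadoop interface. */
interface BulkDeleteSketch {
  /** Largest number of paths accepted by a single bulkDelete() call. */
  int pageSize();

  /** Delete one page of paths and return those which could not be deleted. */
  List<String> bulkDelete(List<String> page);
}

final class PagedDeleter {
  private PagedDeleter() {
  }

  /** Submit paths one page at a time and gather the failures of every call. */
  static List<String> deleteAll(BulkDeleteSketch op, List<String> paths) {
    int pageSize = op.pageSize();
    if (pageSize < 1) {
      throw new IllegalStateException("page size must be at least 1: " + pageSize);
    }
    List<String> failures = new ArrayList<>();
    for (int start = 0; start < paths.size(); start += pageSize) {
      // Each page is a view of at most pageSize entries.
      int end = Math.min(start + pageSize, paths.size());
      failures.addAll(op.bulkDelete(paths.subList(start, end)));
    }
    return failures;
  }
}
```

With a page size of one, the loop degenerates to one call per path, which matches the single-entry case mentioned above.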
Add methods in FileUtil to take an FS, cast to a BulkDeleteSource
then perform the pageSize/bulkDelete operations.

This is to make reflection based access straightforward: no new interfaces
or types to work with, just two new methods with type-erased lists.

Change-Id: I2d7b1bf8198422de635253fc087ca551a8bc6609
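
One way to picture the reflection-friendly entry points is sketched below. It is an assumption-laden illustration: `BulkDeleteSourceSketch` and `BulkDeleteBridge` are hypothetical names, the filesystem is passed as a plain `Object` rather than a Hadoop `FileSystem`, and paths and failures are plain strings. The point it shows is that both static methods use only JDK types in their signatures, so a caller can look them up by name without compiling against any new interface.

```java
import java.util.Collection;
import java.util.List;

/** Hypothetical interface a filesystem implements to offer bulk delete. */
interface BulkDeleteSourceSketch {
  int pageSize(String base);

  List<String> bulkDelete(String base, Collection<String> paths);
}

/** Static entry points whose signatures only use JDK types. */
final class BulkDeleteBridge {
  private BulkDeleteBridge() {
  }

  /** Page size of the filesystem under the given base path. */
  public static int bulkDeletePageSize(Object fs, String base) {
    return source(fs).pageSize(base);
  }

  /** Delete one page of paths; returns the paths which failed. */
  public static List<String> bulkDelete(Object fs, String base, Collection<String> paths) {
    return source(fs).bulkDelete(base, paths);
  }

  /** Cast to the bulk delete interface, rejecting filesystems without support. */
  private static BulkDeleteSourceSketch source(Object fs) {
    if (!(fs instanceof BulkDeleteSourceSketch)) {
      throw new UnsupportedOperationException("no bulk delete support in " + fs);
    }
    return (BulkDeleteSourceSketch) fs;
  }
}
```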
Change-Id: Ib098c07cc1f7747ed1a3131b252656c96c520a75
Using this PR to start with the initial design, implementation
and services offered by having lower-level interaction with S3
pushed down into an S3AStore class, with interface/impl split.

The bulk delete callbacks now talk to the store, *not* s3afs,
with some minor changes in behaviour (IllegalArgumentException is
raised if the root path / is to be deleted)

Mock tests are failing; I expected that: they are always brittle.

What next? Get this in, then move the lower-level fs ops over,
one s3client-calling method at a time or in groups, as appropriate.

The metrics of success are:
* all callback classes created in S3A FS can work through the store
* no s3client direct invocation in S3AFS

Change-Id: Ib5bc58991533fd5d13e78426e88b884b6ae5205c
Change the results of method calls, using Tuples.pair() to
return Map.Entry instances as immutable tuples.

Change-Id: Ibdd5a5b11fe0a57b293b9cb623272e272c8bab69
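
The commit names `Tuples.pair()` but this page does not show how it is implemented. The sketch below is one plausible shape, assuming the JDK's `AbstractMap.SimpleImmutableEntry` as the backing type; `TuplesSketch` is a hypothetical name used so it is not mistaken for the class in the patch.

```java
import java.util.AbstractMap;
import java.util.Map;

/** Hypothetical helper; the real Tuples class in the patch may differ. */
final class TuplesSketch {
  private TuplesSketch() {
  }

  /** Build an immutable key/value pair exposed through the Map.Entry interface. */
  static <K, V> Map.Entry<K, V> pair(K key, V value) {
    // SimpleImmutableEntry rejects setValue() with UnsupportedOperationException.
    return new AbstractMap.SimpleImmutableEntry<>(key, value);
  }
}
```

Returning `Map.Entry` keeps the result type inside the JDK, which fits the earlier goal of adding no new types for callers to work with.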
We are going to need a default FS implementation which just
invokes delete(path, false) and maps any IOException to a failure.

Change-Id: If56bca7cb8529ccbfbb1dfa29cedc8287ec980d4
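
A default implementation along those lines might look like the sketch below. The names `NonRecursiveDeleter` and `DefaultBulkDeleteSketch` are hypothetical, paths are plain strings, and each failure is recorded as a path paired with the exception text; none of this is taken from the patch itself.

```java
import java.io.IOException;
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;
import java.util.Map;

/** Hypothetical stand-in for the filesystem's delete(path, recursive) call. */
interface NonRecursiveDeleter {
  boolean delete(String path, boolean recursive) throws IOException;
}

final class DefaultBulkDeleteSketch {
  private DefaultBulkDeleteSketch() {
  }

  /** Delete each path non-recursively; an IOException becomes a (path, error) failure. */
  static List<Map.Entry<String, String>> bulkDelete(
      NonRecursiveDeleter fs, Collection<String> paths) {
    List<Map.Entry<String, String>> failures = new ArrayList<>();
    for (String path : paths) {
      try {
        // The boolean result is ignored here: only exceptions count as failures.
        fs.delete(path, false);
      } catch (IOException e) {
        failures.add(new AbstractMap.SimpleImmutableEntry<>(path, e.toString()));
      }
    }
    return failures;
  }
}
```

Because one failed path does not abort the loop, the caller still gets a complete list of failures for the page.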
@steveloughran

This is #6726 with another commit

@hadoop-yetus

💔 -1 overall

Vote Subsystem Runtime Logfile Comment
+0 🆗 reexec 0m 20s Docker mode activated.
_ Prechecks _
+1 💚 dupname 0m 1s No case conflicting files found.
+0 🆗 codespell 0m 0s codespell was not available.
+0 🆗 detsecrets 0m 0s detect-secrets was not available.
+0 🆗 xmllint 0m 0s xmllint was not available.
+0 🆗 markdownlint 0m 0s markdownlint was not available.
+1 💚 @author 0m 0s The patch does not contain any @author tags.
+1 💚 test4tests 0m 0s The patch appears to include 6 new or modified test files.
_ trunk Compile Tests _
+0 🆗 mvndep 14m 14s Maven dependency ordering for branch
+1 💚 mvninstall 20m 17s trunk passed
+1 💚 compile 9m 0s trunk passed with JDK Ubuntu-11.0.22+7-post-Ubuntu-0ubuntu220.04.1
+1 💚 compile 8m 13s trunk passed with JDK Private Build-1.8.0_402-8u402-ga-2ubuntu1~20.04-b06
+1 💚 checkstyle 2m 6s trunk passed
+1 💚 mvnsite 2m 8s trunk passed
+1 💚 javadoc 1m 43s trunk passed with JDK Ubuntu-11.0.22+7-post-Ubuntu-0ubuntu220.04.1
+1 💚 javadoc 1m 33s trunk passed with JDK Private Build-1.8.0_402-8u402-ga-2ubuntu1~20.04-b06
+1 💚 spotbugs 3m 2s trunk passed
+1 💚 shadedclient 20m 49s branch has no errors when building and testing our client artifacts.
_ Patch Compile Tests _
+0 🆗 mvndep 0m 27s Maven dependency ordering for patch
+1 💚 mvninstall 1m 10s the patch passed
+1 💚 compile 8m 42s the patch passed with JDK Ubuntu-11.0.22+7-post-Ubuntu-0ubuntu220.04.1
+1 💚 javac 8m 42s the patch passed
+1 💚 compile 8m 17s the patch passed with JDK Private Build-1.8.0_402-8u402-ga-2ubuntu1~20.04-b06
+1 💚 javac 8m 17s the patch passed
-1 ❌ blanks 0m 0s /blanks-eol.txt The patch has 2 line(s) that end in blanks. Use git apply --whitespace=fix <<patch_file>>. Refer https://git-scm.com/docs/git-apply
-0 ⚠️ checkstyle 2m 2s /results-checkstyle-root.txt root: The patch generated 206 new + 41 unchanged - 0 fixed = 247 total (was 41)
+1 💚 mvnsite 2m 9s the patch passed
-1 ❌ javadoc 0m 44s /results-javadoc-javadoc-hadoop-common-project_hadoop-common-jdkUbuntu-11.0.22+7-post-Ubuntu-0ubuntu220.04.1.txt hadoop-common-project_hadoop-common-jdkUbuntu-11.0.22+7-post-Ubuntu-0ubuntu220.04.1 with JDK Ubuntu-11.0.22+7-post-Ubuntu-0ubuntu220.04.1 generated 3 new + 0 unchanged - 0 fixed = 3 total (was 0)
-1 ❌ javadoc 0m 29s /results-javadoc-javadoc-hadoop-tools_hadoop-aws-jdkPrivateBuild-1.8.0_402-8u402-ga-2ubuntu1~20.04-b06.txt hadoop-tools_hadoop-aws-jdkPrivateBuild-1.8.0_402-8u402-ga-2ubuntu1~20.04-b06 with JDK Private Build-1.8.0_402-8u402-ga-2ubuntu1~20.04-b06 generated 3 new + 0 unchanged - 0 fixed = 3 total (was 0)
+1 💚 spotbugs 3m 24s the patch passed
+1 💚 shadedclient 20m 48s patch has no errors when building and testing our client artifacts.
_ Other Tests _
+1 💚 unit 4m 20s hadoop-common in the patch passed.
-1 ❌ unit 2m 39s /patch-unit-hadoop-tools_hadoop-aws.txt hadoop-aws in the patch passed.
+1 💚 unit 2m 4s hadoop-azure in the patch passed.
-1 ❌ asflicense 0m 40s /results-asflicense.txt The patch generated 1 ASF License warnings.
148m 19s
Reason Tests
Failed junit tests hadoop.fs.s3a.commit.staging.TestStagingCommitter
Subsystem Report/Notes
Docker ClientAPI=1.45 ServerAPI=1.45 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6738/1/artifact/out/Dockerfile
GITHUB PR #6738
Optional Tests dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient codespell detsecrets xmllint spotbugs checkstyle markdownlint
uname Linux 17a9c3d6b31b 5.15.0-94-generic #104-Ubuntu SMP Tue Jan 9 15:25:40 UTC 2024 x86_64 x86_64 x86_64 GNU/Linux
Build tool maven
Personality dev-support/bin/hadoop.sh
git revision trunk / 744a643
Default Java Private Build-1.8.0_402-8u402-ga-2ubuntu1~20.04-b06
Multi-JDK versions /usr/lib/jvm/java-11-openjdk-amd64:Ubuntu-11.0.22+7-post-Ubuntu-0ubuntu220.04.1 /usr/lib/jvm/java-8-openjdk-amd64:Private Build-1.8.0_402-8u402-ga-2ubuntu1~20.04-b06
Test Results https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6738/1/testReport/
Max. process+thread count 558 (vs. ulimit of 5500)
modules C: hadoop-common-project/hadoop-common hadoop-tools/hadoop-aws hadoop-tools/hadoop-azure U: .
Console output https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6738/1/console
versions git=2.25.1 maven=3.6.3 spotbugs=4.2.2
Powered by Apache Yetus 0.14.0 https://yetus.apache.org

This message was automatically generated.

@steveloughran steveloughran marked this pull request as draft April 16, 2024 15:58
@hadoop-yetus

💔 -1 overall

Vote Subsystem Runtime Logfile Comment
_ Prechecks _
+1 💚 dupname 0m 05s No case conflicting files found.
+0 🆗 codespell 0m 00s codespell was not available.
+0 🆗 detsecrets 0m 00s detect-secrets was not available.
+0 🆗 xmllint 0m 00s xmllint was not available.
+0 🆗 spotbugs 0m 00s spotbugs executables are not available.
+0 🆗 markdownlint 0m 00s markdownlint was not available.
+1 💚 @author 0m 00s The patch does not contain any @author tags.
+1 💚 test4tests 0m 00s The patch appears to include 6 new or modified test files.
_ trunk Compile Tests _
+0 🆗 mvndep 3m 11s Maven dependency ordering for branch
+1 💚 mvninstall 90m 04s trunk passed
+1 💚 compile 39m 11s trunk passed
+1 💚 checkstyle 5m 51s trunk passed
-1 ❌ mvnsite 4m 20s /branch-mvnsite-hadoop-common-project_hadoop-common.txt hadoop-common in trunk failed.
+1 💚 javadoc 13m 35s trunk passed
+1 💚 shadedclient 167m 43s branch has no errors when building and testing our client artifacts.
_ Patch Compile Tests _
+0 🆗 mvndep 2m 18s Maven dependency ordering for patch
+1 💚 mvninstall 10m 40s the patch passed
+1 💚 compile 37m 05s the patch passed
+1 💚 javac 37m 05s the patch passed
-1 ❌ blanks 0m 00s /blanks-eol.txt The patch has 2 line(s) that end in blanks. Use git apply --whitespace=fix <<patch_file>>. Refer https://git-scm.com/docs/git-apply
+1 💚 checkstyle 6m 04s the patch passed
-1 ❌ mvnsite 4m 25s /patch-mvnsite-hadoop-common-project_hadoop-common.txt hadoop-common in the patch failed.
+1 💚 javadoc 13m 59s the patch passed
+1 💚 shadedclient 177m 47s patch has no errors when building and testing our client artifacts.
_ Other Tests _
-1 ❌ asflicense 5m 31s /results-asflicense.txt The patch generated 1 ASF License warnings.
540m 54s
Subsystem Report/Notes
GITHUB PR #6738
Optional Tests dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient codespell detsecrets xmllint spotbugs checkstyle markdownlint
uname MINGW64_NT-10.0-17763 b4a02a5f9adc 3.4.10-87d57229.x86_64 2024-02-14 20:17 UTC x86_64 Msys
Build tool maven
Personality /c/hadoop/dev-support/bin/hadoop.sh
git revision trunk / 744a643
Default Java Azul Systems, Inc.-1.8.0_332-b09
Test Results https://ci-hadoop.apache.org/job/hadoop-multibranch-windows-10/job/PR-6738/1/testReport/
modules C: hadoop-common-project/hadoop-common hadoop-tools/hadoop-aws hadoop-tools/hadoop-azure U: .
Console output https://ci-hadoop.apache.org/job/hadoop-multibranch-windows-10/job/PR-6738/1/console
versions git=2.44.0.windows.1
Powered by Apache Yetus 0.14.0 https://yetus.apache.org

This message was automatically generated.

@hadoop-yetus

💔 -1 overall

Vote Subsystem Runtime Logfile Comment
_ Prechecks _
+1 💚 dupname 0m 05s No case conflicting files found.
+0 🆗 codespell 0m 00s codespell was not available.
+0 🆗 detsecrets 0m 00s detect-secrets was not available.
+0 🆗 xmllint 0m 00s xmllint was not available.
+0 🆗 spotbugs 0m 01s spotbugs executables are not available.
+0 🆗 markdownlint 0m 01s markdownlint was not available.
+1 💚 @author 0m 00s The patch does not contain any @author tags.
+1 💚 test4tests 0m 00s The patch appears to include 6 new or modified test files.
_ trunk Compile Tests _
+0 🆗 mvndep 2m 17s Maven dependency ordering for branch
+1 💚 mvninstall 90m 51s trunk passed
+1 💚 compile 40m 09s trunk passed
+1 💚 checkstyle 5m 58s trunk passed
-1 ❌ mvnsite 4m 31s /branch-mvnsite-hadoop-common-project_hadoop-common.txt hadoop-common in trunk failed.
+1 💚 javadoc 13m 57s trunk passed
+1 💚 shadedclient 168m 48s branch has no errors when building and testing our client artifacts.
_ Patch Compile Tests _
+0 🆗 mvndep 2m 17s Maven dependency ordering for patch
+1 💚 mvninstall 10m 38s the patch passed
+1 💚 compile 38m 07s the patch passed
+1 💚 javac 38m 07s the patch passed
-1 ❌ blanks 0m 01s /blanks-eol.txt The patch has 2 line(s) that end in blanks. Use git apply --whitespace=fix <<patch_file>>. Refer https://git-scm.com/docs/git-apply
+1 💚 checkstyle 5m 56s the patch passed
-1 ❌ mvnsite 4m 27s /patch-mvnsite-hadoop-common-project_hadoop-common.txt hadoop-common in the patch failed.
+1 💚 javadoc 13m 49s the patch passed
+1 💚 shadedclient 182m 29s patch has no errors when building and testing our client artifacts.
_ Other Tests _
-1 ❌ asflicense 7m 44s /results-asflicense.txt The patch generated 1 ASF License warnings.
550m 15s
Subsystem Report/Notes
GITHUB PR #6738
Optional Tests dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient codespell detsecrets xmllint spotbugs checkstyle markdownlint
uname MINGW64_NT-10.0-17763 374a372225c9 3.4.10-87d57229.x86_64 2024-02-14 20:17 UTC x86_64 Msys
Build tool maven
Personality /c/hadoop/dev-support/bin/hadoop.sh
git revision trunk / 744a643
Default Java Azul Systems, Inc.-1.8.0_332-b09
Test Results https://ci-hadoop.apache.org/job/hadoop-multibranch-windows-10/job/PR-6738/2/testReport/
modules C: hadoop-common-project/hadoop-common hadoop-tools/hadoop-aws hadoop-tools/hadoop-azure U: .
Console output https://ci-hadoop.apache.org/job/hadoop-multibranch-windows-10/job/PR-6738/2/console
versions git=2.44.0.windows.1
Powered by Apache Yetus 0.14.0 https://yetus.apache.org

This message was automatically generated.
