
HADOOP-18013. ABFS: add cloud trash policy with per-schema policy selection #4729

Draft: wants to merge 4 commits into trunk from azure/HADOOP-18013-resilient-trash-policy
Conversation

@steveloughran (Contributor) commented Aug 10, 2022

New trash policies, and a schema-specific trash policy set by `fs.SCHEMA.trash.policy`.

This lets clusters declare different policies for different stores
in the same cluster.

  • CloudTrashPolicy: for abfs with rename failure resilience and auto cleanup of old checkpoints.
  • DeleteFilesTrashPolicy: for versioned s3 buckets; delete the files.
  • Maybe for s3 we should get an enumeration of all the files + versions (i.e. a deep list) and save that to trash, so we know what to restore. list() gives version info if you cast; we could build and save a manifest (Avro?) so that restore is a matter of using the relevant recovery API or explicitly copying somewhere else.
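The per-schema selection described above can be sketched in plain Java. This is a minimal illustration of the `fs.SCHEMA.trash.policy` key shape only; the class and method names here are hypothetical and are not the patch's actual API, and the real patch resolves policies through Hadoop's Configuration rather than a bare Map.

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Hypothetical sketch of per-schema trash policy selection,
 * assuming the fs.SCHEMA.trash.policy key format from the PR text.
 */
public class TrashPolicySelector {

  /**
   * Pick the policy class name for a filesystem schema, falling back
   * to a cluster-wide default when no schema-specific key is set.
   */
  static String policyFor(Map<String, String> conf, String schema,
      String defaultPolicy) {
    // e.g. schema "abfs" -> key "fs.abfs.trash.policy"
    String key = "fs." + schema + ".trash.policy";
    return conf.getOrDefault(key, defaultPolicy);
  }

  public static void main(String[] args) {
    Map<String, String> conf = new HashMap<>();
    conf.put("fs.abfs.trash.policy", "CloudTrashPolicy");
    conf.put("fs.s3a.trash.policy", "DeleteFilesTrashPolicy");

    // Different stores in the same cluster get different policies.
    System.out.println(policyFor(conf, "abfs", "TrashPolicyDefault"));
    System.out.println(policyFor(conf, "s3a", "TrashPolicyDefault"));
    // No schema-specific key for hdfs, so the default applies.
    System.out.println(policyFor(conf, "hdfs", "TrashPolicyDefault"));
  }
}
```

This is what lets one cluster declare, say, rename-resilient trash for abfs and plain deletion for versioned s3 buckets, while hdfs keeps the default policy.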

How was this patch tested?

what do you mean, tested?

For code changes:

  • Does the title of this PR start with the corresponding JIRA issue id (e.g. 'HADOOP-17799. Your PR title ...')?
  • Object storage: have the integration tests been executed and the endpoint declared according to the connector-specific documentation?
  • If adding new dependencies to the code, are these dependencies licensed in a way that is compatible for inclusion under ASF 2.0?
  • If applicable, have you updated the LICENSE, LICENSE-binary, NOTICE-binary files?

@steveloughran steveloughran marked this pull request as draft August 10, 2022 18:45
New trash policies, and a schema specific trash policy set by
fs.SCHEMA.trash.policy.

This lets clusters declare different policies for different stores
in the same cluster.

Change-Id: I8f4c478ca4d7b763a4499e80b2fe76f4777af054
ResilientTrashPolicy: for abfs with rename failure resilience
DeleteFilesTrashPolicy: for versioned s3 buckets; delete the files.
...as RawLocalFS doesn't have a schema

Change-Id: I2f0983e7ea67f6cef71e24ebe666e0ff652a85b4
@steveloughran steveloughran force-pushed the azure/HADOOP-18013-resilient-trash-policy branch from 054a651 to 4cd392b Compare September 7, 2022 16:48
Change-Id: Ie3e676ed023f573162c2de9dd9518d1273ab1215
Stats collection; option to cleanup.
ABFS configured to collect the matching stats.

no tests/docs

Change-Id: I324bf687da1841354748a2d479c287594486dc58
@steveloughran steveloughran changed the title HADOOP-18013. ABFS: add resilient trash policy. HADOOP-18013. ABFS: add cloud trash policy with per-schema policy selection Sep 8, 2022
@steveloughran (Contributor, Author) commented:

Still a WiP but should interest people working with abfs/gcs and to a lesser degree s3a

@hadoop-yetus

💔 -1 overall

Vote Subsystem Runtime Logfile Comment
+0 🆗 reexec 0m 45s Docker mode activated.
_ Prechecks _
+1 💚 dupname 0m 1s No case conflicting files found.
+0 🆗 codespell 0m 0s codespell was not available.
+0 🆗 detsecrets 0m 0s detect-secrets was not available.
+1 💚 @author 0m 0s The patch does not contain any @author tags.
-1 ❌ test4tests 0m 0s The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch.
_ trunk Compile Tests _
+0 🆗 mvndep 15m 15s Maven dependency ordering for branch
+1 💚 mvninstall 29m 18s trunk passed
+1 💚 compile 25m 25s trunk passed with JDK Ubuntu-11.0.16+8-post-Ubuntu-0ubuntu120.04
+1 💚 compile 21m 37s trunk passed with JDK Private Build-1.8.0_342-8u342-b07-0ubuntu1~20.04-b07
+1 💚 checkstyle 4m 14s trunk passed
+1 💚 mvnsite 3m 17s trunk passed
+1 💚 javadoc 2m 44s trunk passed with JDK Ubuntu-11.0.16+8-post-Ubuntu-0ubuntu120.04
+1 💚 javadoc 2m 6s trunk passed with JDK Private Build-1.8.0_342-8u342-b07-0ubuntu1~20.04-b07
+1 💚 spotbugs 4m 55s trunk passed
+1 💚 shadedclient 22m 7s branch has no errors when building and testing our client artifacts.
_ Patch Compile Tests _
+0 🆗 mvndep 0m 31s Maven dependency ordering for patch
+1 💚 mvninstall 1m 44s the patch passed
+1 💚 compile 24m 9s the patch passed with JDK Ubuntu-11.0.16+8-post-Ubuntu-0ubuntu120.04
-1 ❌ javac 24m 9s /results-compile-javac-root-jdkUbuntu-11.0.16+8-post-Ubuntu-0ubuntu120.04.txt root-jdkUbuntu-11.0.16+8-post-Ubuntu-0ubuntu120.04 with JDK Ubuntu-11.0.16+8-post-Ubuntu-0ubuntu120.04 generated 5 new + 2850 unchanged - 4 fixed = 2855 total (was 2854)
+1 💚 compile 21m 23s the patch passed with JDK Private Build-1.8.0_342-8u342-b07-0ubuntu1~20.04-b07
-1 ❌ javac 21m 23s /results-compile-javac-root-jdkPrivateBuild-1.8.0_342-8u342-b07-0ubuntu1~20.04-b07.txt root-jdkPrivateBuild-1.8.0_342-8u342-b07-0ubuntu120.04-b07 with JDK Private Build-1.8.0_342-8u342-b07-0ubuntu120.04-b07 generated 4 new + 2649 unchanged - 3 fixed = 2653 total (was 2652)
+1 💚 blanks 0m 0s The patch has no blanks issues.
-0 ⚠️ checkstyle 4m 6s /results-checkstyle-root.txt root: The patch generated 5 new + 73 unchanged - 4 fixed = 78 total (was 77)
+1 💚 mvnsite 3m 8s the patch passed
-1 ❌ javadoc 1m 23s /results-javadoc-javadoc-hadoop-common-project_hadoop-common-jdkUbuntu-11.0.16+8-post-Ubuntu-0ubuntu120.04.txt hadoop-common-project_hadoop-common-jdkUbuntu-11.0.16+8-post-Ubuntu-0ubuntu120.04 with JDK Ubuntu-11.0.16+8-post-Ubuntu-0ubuntu120.04 generated 3 new + 0 unchanged - 0 fixed = 3 total (was 0)
+1 💚 javadoc 2m 8s the patch passed with JDK Private Build-1.8.0_342-8u342-b07-0ubuntu1~20.04-b07
-1 ❌ spotbugs 3m 3s /new-spotbugs-hadoop-common-project_hadoop-common.html hadoop-common-project/hadoop-common generated 1 new + 0 unchanged - 0 fixed = 1 total (was 0)
+1 💚 shadedclient 22m 20s patch has no errors when building and testing our client artifacts.
_ Other Tests _
+1 💚 unit 19m 12s hadoop-common in the patch passed.
+1 💚 unit 2m 34s hadoop-azure in the patch passed.
+1 💚 asflicense 1m 24s The patch does not generate ASF License warnings.
246m 20s
Reason Tests
SpotBugs module:hadoop-common-project/hadoop-common
Dead store to dir in org.apache.hadoop.fs.TrashPolicyDefault.deleteCheckpoint(Path) At TrashPolicyDefault.java:org.apache.hadoop.fs.TrashPolicyDefault.deleteCheckpoint(Path) At TrashPolicyDefault.java:[line 412]
Subsystem Report/Notes
Docker ClientAPI=1.41 ServerAPI=1.41 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-4729/3/artifact/out/Dockerfile
GITHUB PR #4729
Optional Tests dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient spotbugs checkstyle codespell detsecrets
uname Linux 0cebf7eb06c2 4.15.0-191-generic #202-Ubuntu SMP Thu Aug 4 01:49:29 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux
Build tool maven
Personality dev-support/bin/hadoop.sh
git revision trunk / 5a16dea
Default Java Private Build-1.8.0_342-8u342-b07-0ubuntu1~20.04-b07
Multi-JDK versions /usr/lib/jvm/java-11-openjdk-amd64:Ubuntu-11.0.16+8-post-Ubuntu-0ubuntu120.04 /usr/lib/jvm/java-8-openjdk-amd64:Private Build-1.8.0_342-8u342-b07-0ubuntu1~20.04-b07
Test Results https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-4729/3/testReport/
Max. process+thread count 2928 (vs. ulimit of 5500)
modules C: hadoop-common-project/hadoop-common hadoop-tools/hadoop-azure U: .
Console output https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-4729/3/console
versions git=2.25.1 maven=3.6.3 spotbugs=4.2.2
Powered by Apache Yetus 0.14.0 https://yetus.apache.org

This message was automatically generated.
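The SpotBugs finding above ("Dead store to dir" at TrashPolicyDefault.java line 412) is the DLS_DEAD_LOCAL_STORE pattern: a local variable is assigned a value that is never read. A minimal, hypothetical illustration of what SpotBugs is flagging (the method and values here are invented, not the patch's code):

```java
/**
 * Hypothetical demo of the dead-store pattern SpotBugs reports.
 */
public class DeadStoreDemo {

  static int checkpointAgeDays(String name) {
    int age = 0;           // dead store: overwritten before it is ever read
    age = name.length();   // SpotBugs would flag the assignment above
    return age;
  }

  public static void main(String[] args) {
    System.out.println(checkpointAgeDays("2022-09-07"));
  }
}
```

The usual fix is to delete the unused assignment (here, initialize `age` directly from `name.length()`), which is why such findings are typically cheap to clear before merge.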

```java
trashRoot -> {
  try {
    count.addAndGet(deleteCheckpoint(trashRoot.getPath(), false));
    createCheckpoint(trashRoot.getPath(), new Date(now));
```
A Contributor commented:
moveToTrash() will be called by thousands of clients. IIRC, a new snapshot will be created as long as the CURRENT dir exists, and super.moveToTrash() will create the CURRENT dir if it does not exist. So I'd imagine every moveToTrash() would create a new checkpoint, which is probably not ideal.

Each client will also try to delete the same set of snapshots. I'd imagine some clients will fail with a FILE_NOT_FOUND exception, because a checkpoint dir has been removed by another client.

Cleaning is something we need to handle for trash, and if we can make this approach work, I think that would be great.

@steveloughran (Author) replied:

Hmmm, good explanation of the problems you see.

That auto cleanup was added because of the reported problem of user home dirs being full. Maybe we need to think of better strategies here, even if it is just hadoop fs -expunge updated to work better in this world.
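The race the reviewer describes, where many clients delete the same checkpoints and the losers hit FILE_NOT_FOUND, is usually handled by making the delete tolerate an already-gone path. A self-contained sketch of that idea using java.nio rather than the Hadoop FileSystem API; all names here are hypothetical, not the patch's code:

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

/**
 * Hypothetical sketch: checkpoint deletion that tolerates another
 * client having already removed the checkpoint.
 */
public class CheckpointCleaner {

  /**
   * Delete one checkpoint entry; return true if this client did the
   * delete, false if it was already gone (some other client won).
   */
  static boolean deleteCheckpointQuietly(Path checkpoint) throws IOException {
    // deleteIfExists absorbs the missing-file case instead of
    // throwing NoSuchFileException when another client raced us.
    return Files.deleteIfExists(checkpoint);
  }

  public static void main(String[] args) throws IOException {
    Path dir = Files.createTempDirectory("trash-checkpoints");
    Path cp = Files.createFile(dir.resolve("2022-09-07-1600"));

    System.out.println(deleteCheckpointQuietly(cp)); // true: we deleted it
    System.out.println(deleteCheckpointQuietly(cp)); // false: already gone
    Files.delete(dir);
  }
}
```

This doesn't solve the other problem in the thread (every moveToTrash() creating a fresh checkpoint); that needs a policy-level decision about when checkpoint rotation runs, such as leaving it to an explicit expunge.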
