HADOOP-18013. ABFS: add cloud trash policy with per-schema policy selection #4729
base: trunk
Conversation
New trash policies, and a schema-specific trash policy set by `fs.SCHEMA.trash.policy`. This lets clusters declare different policies for different stores in the same cluster.

- ResilientTrashPolicy: for abfs, with rename failure resilience.
- DeleteFilesTrashPolicy: for versioned s3 buckets; delete the files.

Change-Id: I8f4c478ca4d7b763a4499e80b2fe76f4777af054
...as RawLocalFS doesn't have a schema Change-Id: I2f0983e7ea67f6cef71e24ebe666e0ff652a85b4
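The per-schema selection described above can be sketched as a key lookup. This is a minimal illustration, not code from the patch: the helper name `resolveTrashPolicyKey` and the fallback key `fs.trash.policy` are assumptions, chosen to show how a scheme-less filesystem such as RawLocalFS would fall back to a global default.

```java
// Hypothetical sketch of per-scheme trash-policy key resolution,
// assuming the "fs.SCHEMA.trash.policy" pattern from the PR description.
public class TrashPolicyKeys {

  // Assumed global fallback key for filesystems without a scheme.
  static final String FALLBACK_KEY = "fs.trash.policy";

  static String resolveTrashPolicyKey(String scheme) {
    if (scheme == null || scheme.isEmpty()) {
      // RawLocalFS has no scheme, so use the global key.
      return FALLBACK_KEY;
    }
    return "fs." + scheme + ".trash.policy";
  }

  public static void main(String[] args) {
    System.out.println(resolveTrashPolicyKey("abfs")); // fs.abfs.trash.policy
    System.out.println(resolveTrashPolicyKey(null));   // fs.trash.policy
  }
}
```

A cluster could then set, for example, a resilient policy for `abfs` and a delete-files policy for `s3a`, with each filesystem instance resolving its own key.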
Force-pushed from 054a651 to 4cd392b.
Change-Id: Ie3e676ed023f573162c2de9dd9518d1273ab1215
Stats collection; option to clean up. ABFS configured to collect the matching stats. No tests/docs yet.

Change-Id: I324bf687da1841354748a2d479c287594486dc58
Still a WIP, but should interest people working with abfs/gcs and, to a lesser degree, s3a.
💔 -1 overall

This message was automatically generated.
```java
trashRoot -> {
  try {
    count.addAndGet(deleteCheckpoint(trashRoot.getPath(), false));
    createCheckpoint(trashRoot.getPath(), new Date(now));
```
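The cleanup loop excerpted above has to decide which old checkpoints to delete. The following is a minimal sketch of that expiry decision, assuming Hadoop's checkpoint-directory timestamp format (`yyMMddHHmmss`, as used by `TrashPolicyDefault`); the `isExpired` helper is illustrative and not code from this patch.

```java
// Illustrative expiry check for trash checkpoint directories, assuming
// directory names encode their creation time as "yyMMddHHmmss".
import java.text.ParseException;
import java.text.SimpleDateFormat;

public class CheckpointExpiry {

  static final SimpleDateFormat CHECKPOINT =
      new SimpleDateFormat("yyMMddHHmmss");

  // A checkpoint is expired once its encoded timestamp plus the deletion
  // interval (in minutes) falls before "now".
  static boolean isExpired(String dirName, long nowMs, long intervalMinutes)
      throws ParseException {
    long stamp = CHECKPOINT.parse(dirName).getTime();
    return stamp + intervalMinutes * 60_000L < nowMs;
  }

  public static void main(String[] args) throws ParseException {
    long now = CHECKPOINT.parse("250101120000").getTime(); // 2025-01-01 12:00
    System.out.println(isExpired("250101100000", now, 60)); // 2h old -> true
    System.out.println(isExpired("250101113000", now, 60)); // 30m old -> false
  }
}
```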
moveToTrash() will be called by thousands of clients. IIRC, a new snapshot will be created as long as the CURRENT dir exists, and super.moveToTrash() will create the CURRENT dir if it does not exist. So I'd imagine every moveToTrash() would create a new checkpoint, which is probably not ideal.
Each client will also try to delete the same set of snapshots. I'd imagine some of the clients will fail with a FILE_NOT_FOUND exception, because a checkpoint dir is removed by other clients.
Cleaning is something we need to handle for trash and if we could make this approach work, I would think that would be great.
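One way to tolerate the race described above, where another client deletes a checkpoint first, is to treat a not-found error during cleanup as success. This is a hypothetical sketch, not the patch's implementation; the `Store` interface stands in for a Hadoop `FileSystem`, and `deleteIgnoringNotFound` is an invented name.

```java
// Illustrative handling of concurrent checkpoint cleanup: if another
// client already removed the path, FileNotFoundException means "done".
import java.io.FileNotFoundException;
import java.io.IOException;

public class SafeCleanup {

  interface Store {  // stand-in for a filesystem's delete operation
    boolean delete(String path) throws IOException;
  }

  // Returns true if the checkpoint is gone, whether this client or a
  // concurrent one removed it.
  static boolean deleteIgnoringNotFound(Store fs, String path)
      throws IOException {
    try {
      return fs.delete(path);
    } catch (FileNotFoundException e) {
      return true; // already removed by a concurrent client
    }
  }

  public static void main(String[] args) throws IOException {
    Store alreadyGone = p -> { throw new FileNotFoundException(p); };
    System.out.println(deleteIgnoringNotFound(alreadyGone, "/trash/2501")); // true
  }
}
```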
hmmm. good explanation of the problems you see.
that auto cleanup was added because of the reported problem of user home dirs being full. maybe we need to think of better strategies here, even if it's just `hadoop fs -expunge` updated to work better in this world.
New trash policies, and a schema-specific trash policy set by `fs.SCHEMA.trash.policy`. This lets clusters declare different policies for different stores in the same cluster.

- CloudTrashPolicy: for abfs with rename failure resilience and auto cleanup of old checkpoints.
- DeleteFilesTrashPolicy: for versioned s3 buckets; delete the files.

How was this patch tested?
what do you mean, tested?
For code changes: are the LICENSE, LICENSE-binary, NOTICE-binary files updated?