-
Notifications
You must be signed in to change notification settings - Fork 8.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
HADOOP-18656: [ABFS] Adding Support for Paginated Delete for Large Directories in HNS Account #6409
HADOOP-18656: [ABFS] Adding Support for Paginated Delete for Large Directories in HNS Account #6409
Conversation
💔 -1 overall
This message was automatically generated. |
🎊 +1 overall
This message was automatically generated. |
...oop-azure/src/test/java/org/apache/hadoop/fs/azurebfs/services/ITestAbfsPaginatedDelete.java
Outdated
Show resolved
Hide resolved
...oop-azure/src/test/java/org/apache/hadoop/fs/azurebfs/services/ITestAbfsPaginatedDelete.java
Outdated
Show resolved
Hide resolved
...oop-azure/src/test/java/org/apache/hadoop/fs/azurebfs/services/ITestAbfsPaginatedDelete.java
Outdated
Show resolved
Hide resolved
hadoop-tools/hadoop-azure/src/main/java/org/apache/hadoop/fs/azurebfs/services/AbfsClient.java
Outdated
Show resolved
Hide resolved
hadoop-tools/hadoop-azure/src/main/java/org/apache/hadoop/fs/azurebfs/services/AbfsClient.java
Outdated
Show resolved
Hide resolved
hadoop-tools/hadoop-azure/src/main/java/org/apache/hadoop/fs/azurebfs/services/AbfsClient.java
Outdated
Show resolved
Hide resolved
hadoop-tools/hadoop-azure/src/main/java/org/apache/hadoop/fs/azurebfs/AbfsConfiguration.java
Outdated
Show resolved
Hide resolved
hadoop-tools/hadoop-azure/src/main/java/org/apache/hadoop/fs/azurebfs/services/AbfsClient.java
Outdated
Show resolved
Hide resolved
...oop-azure/src/test/java/org/apache/hadoop/fs/azurebfs/services/ITestAbfsPaginatedDelete.java
Outdated
Show resolved
Hide resolved
testRecursiveDeleteWithPaginationInternal(false, true, AUGUST_2023_API_VERSION); | ||
testRecursiveDeleteWithPaginationInternal(false, false, DECEMBER_2019_API_VERSION); | ||
testRecursiveDeleteWithPaginationInternal(false, false, AUGUST_2023_API_VERSION); | ||
testRecursiveDeleteWithPaginationInternal(true, true, DECEMBER_2019_API_VERSION); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is the true true combination missing for AUGUST_2023 and true false for DECEMBER 2019 or not added intentionally ? false false as well for AUGUST_2019
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
They are not added intentionally. These combinations are enough to test if xMsVersion is set properly or not.
...oop-azure/src/test/java/org/apache/hadoop/fs/azurebfs/services/ITestAbfsPaginatedDelete.java
Outdated
Show resolved
Hide resolved
🎊 +1 overall
This message was automatically generated. |
hadoop-tools/hadoop-azure/src/main/java/org/apache/hadoop/fs/azurebfs/AbfsConfiguration.java
Outdated
Show resolved
Hide resolved
hadoop-tools/hadoop-azure/src/main/java/org/apache/hadoop/fs/azurebfs/services/AbfsClient.java
Outdated
Show resolved
Hide resolved
hadoop-tools/hadoop-azure/src/main/java/org/apache/hadoop/fs/azurebfs/services/AbfsClient.java
Outdated
Show resolved
Hide resolved
...oop-azure/src/test/java/org/apache/hadoop/fs/azurebfs/services/ITestAbfsPaginatedDelete.java
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for taking the comments. LGTM!
🎊 +1 overall
This message was automatically generated. |
🎊 +1 overall
This message was automatically generated. |
💔 -1 overall
This message was automatically generated. |
🎊 +1 overall
This message was automatically generated. |
🎊 +1 overall
This message was automatically generated. |
🎊 +1 overall
This message was automatically generated. |
Hi @steveloughran @mukund-thakur Thanks |
hey, in #6494 i'm drafting a bulk delete API where the caller (iceberg etc) can give a list of file paths for deletion with no guarantees about safety checks, parent dirs existing afterwards etc. Would this work here too? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
how does the pagination work here? do repeated calls need to be made? if so, where is this done? it wasn't immediately obvious to me.
...ls/hadoop-azure/src/main/java/org/apache/hadoop/fs/azurebfs/constants/AbfsHttpConstants.java
Show resolved
Hide resolved
@@ -259,9 +259,9 @@ AbfsThrottlingIntercept getIntercept() { | |||
return intercept; | |||
} | |||
|
|||
List<AbfsHttpHeader> createDefaultHeaders() { | |||
List<AbfsHttpHeader> createDefaultHeaders(ApiVersion xMsVersion) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- Can you add a Javadoc here?
- as most invocations use the xMsVersion value, why not retain the original signature as an overloaded invocation -avoids having to change so much of the code.
- is this package private just for testing? if so, let's mark it as @VisibleForTesting
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Taken all
TracingContext tracingContext) | ||
throws AzureBlobFileSystemException { | ||
final List<AbfsHttpHeader> requestHeaders = createDefaultHeaders(); | ||
|
||
final List<AbfsHttpHeader> requestHeaders |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
add a comment explaining what is happening, and move the = sign to this line, leaving
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added a comment
final AbfsUriQueryBuilder abfsUriQueryBuilder = createDefaultUriQueryBuilder(); | ||
|
||
if (isPaginatedDelete(tracingContext, recursive)) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what if this is a paginated delete but the L1126 api version test doesn't hold? Is that ok?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If this is a paginated delete, then API version change condition will only fail if current version is greater than AUG_03_2023.
In that case we can go ahead with Current API Version only as Azure APIs are backward compatible,
@@ -31,4 +33,13 @@ public static void setIsNamespaceEnabled(final AbfsClient abfsClient, final Bool | |||
public static void setEncryptionContextProvider(final AbfsClient abfsClient, final EncryptionContextProvider provider) { | |||
abfsClient.setEncryptionContextProvider(provider); | |||
} | |||
|
|||
public static String getHeaderValue(List<AbfsHttpHeader> reqHeaders, String headerName) { | |||
for (AbfsHttpHeader header : reqHeaders) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you could probably do something involving java 8 streaming/filtering here if you wanted to...
...oop-azure/src/test/java/org/apache/hadoop/fs/azurebfs/services/ITestAbfsPaginatedDelete.java
Show resolved
Hide resolved
...oop-azure/src/test/java/org/apache/hadoop/fs/azurebfs/services/ITestAbfsPaginatedDelete.java
Show resolved
Hide resolved
AbfsRestOperationException e = intercept(AbfsRestOperationException.class, () -> | ||
spiedClient.getPathStatus(testPath.toString(), false, testTracingContext, null)); | ||
Assertions.assertThat(e.getStatusCode()) | ||
.describedAs("Path should have been deleted").isEqualTo(HTTP_NOT_FOUND); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- Factor this assertion out into a method here and below, something like
assertStatusCode(AbfsRestOperationException, int)
- include e.toString() in message, or just rethrow the exception with its full stack trace.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Makes sense.
Factored out to new method.
Added the e.toString() in message so that it will be a part of assertion error in case assertion fails.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Taken
...oop-azure/src/test/java/org/apache/hadoop/fs/azurebfs/services/ITestAbfsPaginatedDelete.java
Show resolved
Hide resolved
...ls/hadoop-azure/src/main/java/org/apache/hadoop/fs/azurebfs/constants/AbfsHttpConstants.java
Show resolved
Hide resolved
Paginated delete will work somehow similar to how recursive delete works for FNS accounts. For HNS Accounts, recursive delete is supposed to be a O(1) operation i.e deleting the folder itself. But before deleting the whole folder, server needs to do ACL checks on all the children of that folder which is not O(1). If the directory is large, this ACL check can take some time and request can timeout. To avoid this, server will return a continuation token if ACL check is still pending. Client need to loop delete on this continuation token similar to how it loop delete for FNS account recursive delete. Difference is that for FNS every call does delete some objects, in HNS it only performs ACL checks and actual delete of directory happens on last delete call of loop. Hope that explains. Repeated calls are made in abfsStore.deletePath() where continuation token check is present. This is common code for FNS and HNS. |
🎊 +1 overall
This message was automatically generated. |
🎊 +1 overall
This message was automatically generated. |
From what I understood from your PR, this seems a bit different from bulk delete. Paginated delete will be supported here but caller won't be able to specify a list of paths that they want to delete. Only one path that too a directory path can be passes and everything inside that directory will be deleted. Pagination here is only for performing ACL checks and not actual delete. Delete will still be a single operation performed after ACL check is completed on whole directory listing. In case ACL checks fails in between after a few pages, whole delete operation will fail it won't delete any object. This way this is an atomic delete. |
🎊 +1 overall
This message was automatically generated. |
🎊 +1 overall
This message was automatically generated. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm happy made a couple of comments, but not enough to block the merge
@@ -1240,8 +1244,8 @@ public boolean getRenameResilience() { | |||
return renameResilience; | |||
} | |||
|
|||
void setRenameResilience(boolean actualResilience) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
any reason to cut this? I presume it means no tests are using it...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes this code was not used any where so removed it
Assertions.assertThat(urlUsed) | ||
.describedAs("Url must have paginated = true as query param") | ||
.contains(QUERY_PARAM_PAGINATED); | ||
if (xMsVersion.compareTo(AbfsHttpConstants.ApiVersion.AUG_03_2023) < 0) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is going to need revision in future. just be aware
merged to trunk. @anujmodi2021 can you do a PR and retest with branch -then I will merge it there too. |
…tories in HNS Account (apache#6409) Contributed by Anuj Modi
Description of PR
Jira Ticket: https://issues.apache.org/jira/browse/HADOOP-18656
Today, when a recursive delete is issued for a large directory in ADLS Gen2 (HNS) account, the directory deletion happens in O(1) but in backend ACL Checks are done recursively for each object inside that directory which in case of large directory could lead to request time out. Pagination is introduced in the Azure Storage Backend for these ACL checks.
More information on how pagination works can be found on public documentation of Azure Delete Path API.
This PR contains changes to support this from client side. To trigger pagination, client needs to add a new query parameter "paginated" and set it to true along with recursive set to true. In return if the directory is large, server might return a continuation token back to the caller. If caller gets back a continuation token, it has to call the delete API again with continuation token along with recursive and pagination set to true. This is similar to directory delete of FNS account.
Pagination is available only in versions "2023-08-03" onwards.
PR also contains functional tests to verify driver works well with different combinations of recursive and pagination features for both HNS and FNS account.
Full E2E testing of pagination requires large dataset to be created and hence not added as part of driver test suite. But extensive E2E testing has been performed.
How was this patch tested?
New tests added and existing tests ran.
:::: AGGREGATED TEST RESULT ::::
HNS-OAuth
[INFO] Results:
[INFO]
[WARNING] Tests run: 141, Failures: 0, Errors: 0, Skipped: 5
[INFO] Results:
[INFO]
[WARNING] Tests run: 340, Failures: 0, Errors: 0, Skipped: 41
HNS-SharedKey
[INFO] Results:
[INFO]
[WARNING] Tests run: 141, Failures: 0, Errors: 0, Skipped: 5
[INFO] Results:
[INFO]
[WARNING] Tests run: 340, Failures: 0, Errors: 0, Skipped: 41
NonHNS-SharedKey
[INFO] Results:
[INFO]
[WARNING] Tests run: 141, Failures: 0, Errors: 0, Skipped: 11
[INFO] Results:
[INFO]
[WARNING] Tests run: 589, Failures: 0, Errors: 0, Skipped: 266
[INFO] Results:
[INFO]
[WARNING] Tests run: 340, Failures: 0, Errors: 0, Skipped: 44
AppendBlob-HNS-OAuth
[INFO] Results:
[INFO]
[WARNING] Tests run: 141, Failures: 0, Errors: 0, Skipped: 5
[INFO] Results:
[INFO]
[WARNING] Tests run: 340, Failures: 0, Errors: 0, Skipped: 41
Time taken: 25 mins 38 secs.