@manika137
Contributor

Description of PR

JIRA: https://issues.apache.org/jira/browse/HADOOP-19795

Opening a file for read currently issues a getPathStatus call, primarily to fetch the file’s metadata before the actual read begins.
This PR introduces an optional, config-driven read flow that skips the getPathStatus call during open and instead derives the required metadata from the read response itself.
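
To illustrate what deriving metadata from the read response can look like, here is a minimal sketch, assuming the response headers are available as a plain map. The class and method names are hypothetical and are not the helpers used in the patch. The parsing follows the standard Content-Range form `bytes <start>-<end>/<total>`.

```java
import java.util.Map;
import java.util.OptionalLong;

// Hypothetical helper for illustration only; not part of the patch.
// It pulls file metadata out of the headers of a ranged read response.
final class ReadResponseMetadata {
  private ReadResponseMetadata() { }

  // Parses the total size from a Content-Range value such as
  // "bytes 0-1023/146515". Returns empty when the header is missing,
  // malformed, or reports an unknown size ("*").
  static OptionalLong totalLength(String contentRange) {
    if (contentRange == null) {
      return OptionalLong.empty();
    }
    int slash = contentRange.lastIndexOf('/');
    if (slash < 0 || slash == contentRange.length() - 1) {
      return OptionalLong.empty();
    }
    String total = contentRange.substring(slash + 1).trim();
    try {
      long value = Long.parseLong(total);
      return value >= 0 ? OptionalLong.of(value) : OptionalLong.empty();
    } catch (NumberFormatException e) {
      return OptionalLong.empty();
    }
  }

  // Assumption: the map is keyed by the exact header name "ETag".
  // Real HTTP header names are case-insensitive.
  static String eTag(Map<String, String> headers) {
    return headers.get("ETag");
  }
}
```

Returning an empty value rather than throwing lets the caller decide whether a missing or malformed header should fall back to a status call or fail the read.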

How was this patch tested?

New tests were added and the test suite was run. The results are in the comment below.

@manika137
Contributor Author

Test Results

============================================================
HNS-OAuth-DFS

[WARNING] Tests run: 250, Failures: 0, Errors: 0, Skipped: 3
[WARNING] Tests run: 908, Failures: 0, Errors: 0, Skipped: 222
[WARNING] Tests run: 158, Failures: 0, Errors: 0, Skipped: 8
[WARNING] Tests run: 271, Failures: 0, Errors: 0, Skipped: 23

============================================================
HNS-SharedKey-DFS

[WARNING] Tests run: 250, Failures: 0, Errors: 0, Skipped: 4
[WARNING] Tests run: 911, Failures: 0, Errors: 0, Skipped: 168
[WARNING] Tests run: 158, Failures: 0, Errors: 0, Skipped: 8
[WARNING] Tests run: 271, Failures: 0, Errors: 0, Skipped: 10

============================================================
AppendBlob-HNS-OAuth-DFS

[WARNING] Tests run: 250, Failures: 0, Errors: 0, Skipped: 3
[WARNING] Tests run: 908, Failures: 0, Errors: 0, Skipped: 233
[WARNING] Tests run: 135, Failures: 0, Errors: 0, Skipped: 9
[WARNING] Tests run: 271, Failures: 0, Errors: 0, Skipped: 23

============================================================
NonHNS-SharedKey-Blob

[WARNING] Tests run: 250, Failures: 0, Errors: 0, Skipped: 10
[WARNING] Tests run: 758, Failures: 0, Errors: 0, Skipped: 155
[WARNING] Tests run: 158, Failures: 0, Errors: 0, Skipped: 3
[WARNING] Tests run: 271, Failures: 0, Errors: 0, Skipped: 11

============================================================
NonHNS-OAuth-Blob

[ERROR] Tests run: 250, Failures: 0, Errors: 0, Skipped: 10
[WARNING] Tests run: 755, Failures: 0, Errors: 0, Skipped: 156
[WARNING] Tests run: 158, Failures: 0, Errors: 0, Skipped: 3
[WARNING] Tests run: 271, Failures: 0, Errors: 0, Skipped: 24

============================================================
AppendBlob-NonHNS-OAuth-Blob

[WARNING] Tests run: 250, Failures: 0, Errors: 0, Skipped: 10
[WARNING] Tests run: 750, Failures: 0, Errors: 0, Skipped: 202
[WARNING] Tests run: 135, Failures: 0, Errors: 0, Skipped: 4
[WARNING] Tests run: 271, Failures: 0, Errors: 0, Skipped: 24

============================================================
HNS-Oauth-DFS-IngressBlob

[WARNING] Tests run: 250, Failures: 0, Errors: 0, Skipped: 3
[WARNING] Tests run: 782, Failures: 0, Errors: 0, Skipped: 231
[WARNING] Tests run: 158, Failures: 0, Errors: 0, Skipped: 8
[WARNING] Tests run: 271, Failures: 0, Errors: 0, Skipped: 23

@hadoop-yetus

💔 -1 overall

Vote Subsystem Runtime Logfile Comment
+0 🆗 reexec 5m 41s Docker mode activated.
_ Prechecks _
+1 💚 dupname 0m 0s No case conflicting files found.
+0 🆗 codespell 0m 1s codespell was not available.
+0 🆗 detsecrets 0m 1s detect-secrets was not available.
+1 💚 @author 0m 0s The patch does not contain any @author tags.
+1 💚 test4tests 0m 0s The patch appears to include 1 new or modified test files.
_ trunk Compile Tests _
+1 💚 mvninstall 54m 33s trunk passed
+1 💚 compile 1m 6s trunk passed with JDK Ubuntu-21.0.7+6-Ubuntu-0ubuntu120.04
+1 💚 compile 1m 4s trunk passed with JDK Ubuntu-17.0.15+6-Ubuntu-0ubuntu120.04
+1 💚 checkstyle 0m 54s trunk passed
+1 💚 mvnsite 1m 12s trunk passed
-1 ❌ javadoc 1m 2s /branch-javadoc-hadoop-tools_hadoop-azure-jdkUbuntu-21.0.7+6-Ubuntu-0ubuntu120.04.txt hadoop-azure in trunk failed with JDK Ubuntu-21.0.7+6-Ubuntu-0ubuntu120.04.
-1 ❌ javadoc 0m 58s /branch-javadoc-hadoop-tools_hadoop-azure-jdkUbuntu-17.0.15+6-Ubuntu-0ubuntu120.04.txt hadoop-azure in trunk failed with JDK Ubuntu-17.0.15+6-Ubuntu-0ubuntu120.04.
+1 💚 spotbugs 1m 46s trunk passed
+1 💚 shadedclient 33m 55s branch has no errors when building and testing our client artifacts.
-0 ⚠️ patch 34m 29s Used diff version of patch file. Binary files and potentially other changes not applied. Please rebase and squash commits if necessary.
_ Patch Compile Tests _
+1 💚 mvninstall 0m 38s the patch passed
+1 💚 compile 0m 36s the patch passed with JDK Ubuntu-21.0.7+6-Ubuntu-0ubuntu120.04
+1 💚 javac 0m 36s the patch passed
+1 💚 compile 0m 38s the patch passed with JDK Ubuntu-17.0.15+6-Ubuntu-0ubuntu120.04
+1 💚 javac 0m 38s the patch passed
+1 💚 blanks 0m 0s The patch has no blanks issues.
-0 ⚠️ checkstyle 0m 23s /results-checkstyle-hadoop-tools_hadoop-azure.txt hadoop-tools/hadoop-azure: The patch generated 16 new + 2 unchanged - 0 fixed = 18 total (was 2)
+1 💚 mvnsite 0m 42s the patch passed
-1 ❌ javadoc 0m 30s /patch-javadoc-hadoop-tools_hadoop-azure-jdkUbuntu-21.0.7+6-Ubuntu-0ubuntu120.04.txt hadoop-azure in the patch failed with JDK Ubuntu-21.0.7+6-Ubuntu-0ubuntu120.04.
-1 ❌ javadoc 0m 30s /patch-javadoc-hadoop-tools_hadoop-azure-jdkUbuntu-17.0.15+6-Ubuntu-0ubuntu120.04.txt hadoop-azure in the patch failed with JDK Ubuntu-17.0.15+6-Ubuntu-0ubuntu120.04.
-1 ❌ spotbugs 1m 28s /new-spotbugs-hadoop-tools_hadoop-azure.html hadoop-tools/hadoop-azure generated 1 new + 0 unchanged - 0 fixed = 1 total (was 0)
+1 💚 shadedclient 32m 42s patch has no errors when building and testing our client artifacts.
_ Other Tests _
+1 💚 unit 3m 22s hadoop-azure in the patch passed.
+1 💚 asflicense 0m 34s The patch does not generate ASF License warnings.
146m 41s
Reason Tests
SpotBugs module:hadoop-tools/hadoop-azure
Inconsistent synchronization of org.apache.hadoop.fs.azurebfs.services.AbfsInputStream.contentLength; locked 80% of time. Unsynchronized access at AbfsInputStream.java:[line 629]
Subsystem Report/Notes
Docker ClientAPI=1.53 ServerAPI=1.53 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-8212/1/artifact/out/Dockerfile
GITHUB PR #8212
JIRA Issue HADOOP-19795
Optional Tests dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient spotbugs checkstyle codespell detsecrets
uname Linux a1c89b292bca 5.15.0-164-generic #174-Ubuntu SMP Fri Nov 14 20:25:16 UTC 2025 x86_64 x86_64 x86_64 GNU/Linux
Build tool maven
Personality dev-support/bin/hadoop.sh
git revision trunk / e13fd56
Default Java Ubuntu-17.0.15+6-Ubuntu-0ubuntu120.04
Multi-JDK versions /usr/lib/jvm/java-21-openjdk-amd64:Ubuntu-21.0.7+6-Ubuntu-0ubuntu120.04 /usr/lib/jvm/java-17-openjdk-amd64:Ubuntu-17.0.15+6-Ubuntu-0ubuntu120.04
Test Results https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-8212/1/testReport/
Max. process+thread count 570 (vs. ulimit of 5500)
modules C: hadoop-tools/hadoop-azure U: hadoop-tools/hadoop-azure
Console output https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-8212/1/console
versions git=2.25.1 maven=3.9.11 spotbugs=4.9.7
Powered by Apache Yetus 0.14.1 https://yetus.apache.org

This message was automatically generated.
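
The SpotBugs finding in the report above (inconsistent synchronization of contentLength) typically means a field is written under a lock in some places and read without it in others. A minimal sketch of one common remedy, routing every access through the same lock, is shown below. The class is hypothetical and only illustrates the pattern; it is not AbfsInputStream, and the patch may resolve the warning differently.

```java
// Hypothetical class illustrating the pattern only.
final class LengthHolder {
  private final Object lock = new Object();
  private long contentLength = -1; // -1 means "not yet known"

  // Called once the first read response supplies the real length.
  void updateFromReadResponse(long newLength) {
    synchronized (lock) {
      contentLength = newLength;
    }
  }

  // Every read goes through the same lock as the write above,
  // so no access is left unsynchronized.
  long getContentLength() {
    synchronized (lock) {
      return contentLength;
    }
  }

  boolean isLengthKnown() {
    return getContentLength() >= 0;
  }
}
```

Declaring the field volatile is another option when each access is a single read or write with no compound update.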

getClient().getEncryptionType() != EncryptionType.ENCRYPTION_CONTEXT
|| ((VersionedFileStatus) fileStatus).getEncryptionContext()
!= null)) {
getClient().getEncryptionType() != EncryptionType.ENCRYPTION_CONTEXT
Contributor

additional space changes can be reverted

contextEncryptionAdapter = new ContextProviderEncryptionAdapter(
getClient().getEncryptionContextProvider(), getRelativePath(path),
fileEncryptionContext.getBytes(StandardCharsets.UTF_8));
getClient().getEncryptionContextProvider(), getRelativePath(path),
Contributor

same as above

encryptionContext.getBytes(StandardCharsets.UTF_8));
}
} else {
if (parseIsDirectory(resourceType)) {
Contributor

This can be moved to the common part, as it is checked in both cases.

* - restrictGpsOnOpenFile config is enabled with null FileStatus and encryptionType not as ENCRYPTION_CONTEXT
* In this case, we don't need to call GetPathStatus API.
*/
else {
Contributor

Won't this lead to opening the stream without any checks? Do we fail later in this case?

INVALID_APPEND_OPERATION("InvalidAppendOperation", HttpURLConnection.HTTP_CONFLICT, null),
UNAUTHORIZED_BLOB_OVERWRITE("UnauthorizedBlobOverwrite", HttpURLConnection.HTTP_FORBIDDEN,
"This request is not authorized to perform blob overwrites."),
INVALID_RANGE("InvalidRange", 416,
Contributor

416 should come from a constant defined in HttpURLConnection class
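
One note on the suggestion above: to the best of my knowledge, java.net.HttpURLConnection defines constants such as HTTP_CONFLICT and HTTP_FORBIDDEN but none for 416, so the magic number would have to be replaced by a constant declared elsewhere. A minimal sketch, with a hypothetical holder class and constant name:

```java
import java.net.HttpURLConnection;

// Hypothetical holder class; the name and location are assumptions.
final class HttpStatusSketch {
  private HttpStatusSketch() { }

  // Declared locally because HttpURLConnection offers no 416 constant.
  static final int HTTP_RANGE_NOT_SATISFIABLE = 416;

  static boolean isInvalidRange(int statusCode) {
    return statusCode == HTTP_RANGE_NOT_SATISFIABLE;
  }

  // For comparison: a status that HttpURLConnection does define.
  static boolean isConflict(int statusCode) {
    return statusCode == HttpURLConnection.HTTP_CONFLICT;
  }
}
```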

// Reset Read Type back to normal and set again based on code flow.
getTracingContext().setReadType(ReadType.NORMAL_READ);
if (shouldAlwaysReadBufferSize()) {
if(shouldRestrictGpsOnOpenFile() && isFirstRead()) {
Contributor

nit: space after if

// Reset Read Type back to normal and set again based on code flow.
getTracingContext().setReadType(ReadType.NORMAL_READ);
if (shouldAlwaysReadBufferSize()) {
if(shouldRestrictGpsOnOpenFile() && isFirstRead()) {
Contributor

add a comment for this condition as well

private final int footerReadSize; // default buffer size to read when reading footer
private final int readAheadQueueDepth; // initialized in constructor
private final String eTag; // eTag of the path when InputStream are created
private String eTag; // eTag of the path when InputStream are created
Contributor

nit: InputStream is created

}
}

String getRelativePath(final Path path) {
Contributor

javadoc missing

tracingContext,
contextEncryptionAdapter).getResult();

String resourceType =
Contributor

@anmolanmol1234 Jan 29, 2026

use client.checkIsDir method which handles null case as well

}
contentLength = Long.parseLong(op.getResult().getResponseHeader(HttpHeaderConfigurations.CONTENT_RANGE).
split(AbfsHttpConstants.FORWARD_SLASH)[1]);
eTag = op.getResult().getResponseHeader("ETag");
Contributor

use extractEtagHeader method

if (Objects.equals(resourceType, DIRECTORY)) {
throw directoryReadException();
}
contentLength = Long.parseLong(op.getResult().getResponseHeader(HttpHeaderConfigurations.CONTENT_RANGE).
Contributor

Same here: use the extractContentLen method.

if (ere.getStatusCode() == HttpURLConnection.HTTP_NOT_FOUND) {
throw new FileNotFoundException(ere.getMessage());
int status = ere.getStatusCode();
if(ere.getErrorMessage().contains(readOnDirectoryErrorMsg)){
Contributor

nit: space after if


} catch (AzureBlobFileSystemException gpsEx) {
AbfsRestOperationException gpsEre = (AbfsRestOperationException) gpsEx;
if(gpsEre.getErrorMessage().contains(readOnDirectoryErrorMsg)){
Contributor

nit: space after if

}

// Default: propagate original error
throw new IOException(ex);
Contributor

should be done once only on line 715

This is required since contentLength is not available yet to determine prefetch block size.
*/
bytesRead = readInternal(getFCursor(), getBuffer(), 0, getBufferSize(), false);
if(shouldRestrictGpsOnOpenFile() && isFirstRead()) {
Contributor

nit: space after if

@steveloughran
Contributor

  • Document that with this option enabled, a missing file does not cause a failure until the first read.
  • Make sure the openFile() builder code does this too, automatically, if no status (or the wrong status type) is passed in. The filesystem API spec says "existence checks may be delayed".
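
The delayed existence check described in the points above can be sketched as follows. This is a toy model under stated assumptions: the class is hypothetical, the set of paths stands in for the remote store, and it does not use the Hadoop FileSystem API. It only shows that open succeeds without a remote call and a missing file surfaces as FileNotFoundException on the first read.

```java
import java.io.FileNotFoundException;
import java.io.IOException;
import java.util.Set;

// Hypothetical stand-in for a stream opened without an up-front status call.
final class LazyOpenSketch {
  private final Set<String> store; // stand-in for the paths in the remote store
  private final String path;
  private boolean firstReadDone;

  // "Open" makes no remote call, so it succeeds even for a missing path.
  LazyOpenSketch(Set<String> store, String path) {
    this.store = store;
    this.path = path;
  }

  // The existence check happens here, on the first read only.
  int read() throws IOException {
    if (!firstReadDone) {
      if (!store.contains(path)) {
        throw new FileNotFoundException(path);
      }
      firstReadDone = true;
    }
    return 0; // placeholder for the number of bytes read
  }

  boolean isFirstReadDone() {
    return firstReadDone;
  }
}
```

Callers that relied on open failing fast for a missing path would need to handle the exception at read time instead.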
