
HADOOP-18873. ABFS: AbfsOutputStream doesnt close DataBlocks object. #6010

Merged
9 commits merged into apache:trunk on Sep 20, 2023

Conversation

@saxenapranav (Contributor) commented Aug 31, 2023

AbfsOutputStream doesn't close the dataBlock object created for the upload.

What is the implication of not doing that?
DataBlocks has three implementations:

ByteArrayBlock
  1. This creates a DataBlockByteArrayOutputStream (a child of ByteArrayOutputStream: a wrapper around a byte array for populating and reading the array).
  2. This simply gets GCed.

ByteBufferBlock
  1. There is a defined DirectBufferPool from which it requests a direct buffer.
  2. If there is nothing in the pool, a new direct buffer is created.
  3. The close() method on this object is responsible for returning the buffer to the pool so it can be reused (see the sketch after this list).
  4. Since we are not calling close():
    1. The pool is of little use, since each request creates a new direct buffer from memory.
    2. The objects can still be GCed and the direct memory may be returned on GC; but if the process crashes, the memory never goes back and can cause memory issues on the machine.

DiskBlock
  1. This creates a file on disk to which the data-to-upload is written. This file gets deleted in startUpload().close().
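
The sketch below illustrates the pool contract that ByteBufferBlock relies on; it assumes the getBuffer/returnBuffer API of org.apache.hadoop.util.DirectBufferPool and is illustrative only, not the ABFS code:

import java.nio.ByteBuffer;
import org.apache.hadoop.util.DirectBufferPool;

// Illustrative only: shows why DataBlock.close() matters for ByteBufferBlock.
public class DirectBufferPoolSketch {
  public static void main(String[] args) {
    DirectBufferPool pool = new DirectBufferPool();
    // Allocates fresh direct memory if the pool has nothing to hand out.
    ByteBuffer buffer = pool.getBuffer(4 * 1024 * 1024);
    try {
      buffer.put(new byte[1024]); // fill the block with data to upload
    } finally {
      // This is the step that is skipped when close() is never called:
      // the buffer is only reusable by later blocks once it is returned.
      pool.returnBuffer(buffer);
    }
  }
}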

startUpload() returns a BlockUploadData object, whose toByteArray() method is used in AbfsOutputStream to get the byte array held in the dataBlock.

Method which uses the DataBlock object:

private void uploadBlockAsync(DataBlocks.DataBlock blockToUpload,
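
A simplified sketch of the lifecycle this PR fixes, reusing the names quoted above; it is not the exact AbfsOutputStream code (the real method also handles tracing and uses an IOUtils helper for the null-safe close):

import java.io.IOException;
import org.apache.hadoop.fs.store.DataBlocks;

// Simplified sketch; the real logic lives in AbfsOutputStream#uploadBlockAsync.
final class BlockUploadLifecycleSketch {
  static void uploadAndRelease(DataBlocks.DataBlock blockToUpload) throws IOException {
    DataBlocks.BlockUploadData blockUploadData = null;
    try {
      blockUploadData = blockToUpload.startUpload();  // wraps the block's data for upload
      byte[] bytes = blockUploadData.toByteArray();   // bytes handed to the remote append call
      // ... issue the append/flush with bytes ...
    } finally {
      if (blockUploadData != null) {
        blockUploadData.close();  // pre-existing cleanup of the upload wrapper
      }
      blockToUpload.close();      // the fix: return the pooled buffer / delete the temp file
    }
  }
}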

jira: https://issues.apache.org/jira/browse/HADOOP-18873

:::: AGGREGATED TEST RESULT ::::

HNS-OAuth

[INFO] Results:
[INFO]
[ERROR] Failures:
[ERROR] TestAccountConfiguration.testConfigPropNotFound:386->testMissingConfigKey:399 Expected a org.apache.hadoop.fs.azurebfs.contracts.exceptions.TokenAccessProviderException to be thrown, but got the result: : "org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider"
[INFO]
[ERROR] Tests run: 141, Failures: 1, Errors: 0, Skipped: 5
[INFO] Results:
[INFO]
[ERROR] Errors:
[ERROR] ITestAzureBlobFileSystemLease.testAcquireRetry:344->lambda$testAcquireRetry$6:345 » TestTimedOut
[INFO]
[ERROR] Tests run: 572, Failures: 0, Errors: 1, Skipped: 54
[INFO] Results:
[INFO]
[WARNING] Tests run: 339, Failures: 0, Errors: 0, Skipped: 41

HNS-SharedKey

[INFO] Results:
[INFO]
[ERROR] Failures:
[ERROR] TestAccountConfiguration.testConfigPropNotFound:386->testMissingConfigKey:399 Expected a org.apache.hadoop.fs.azurebfs.contracts.exceptions.TokenAccessProviderException to be thrown, but got the result: : "org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider"
[INFO]
[ERROR] Tests run: 141, Failures: 1, Errors: 0, Skipped: 5
[INFO] Results:
[INFO]
[ERROR] Errors:
[ERROR] ITestAzureBlobFileSystemLease.testAcquireRetry:336 » TestTimedOut test timed o...
[ERROR] ITestAzureBlobFileSystemLease.testTwoWritersCreateAppendWithInfiniteLeaseEnabled:186->twoWriters:154 » TestTimedOut
[INFO]
[ERROR] Tests run: 572, Failures: 0, Errors: 2, Skipped: 54
[INFO] Results:
[INFO]
[WARNING] Tests run: 339, Failures: 0, Errors: 0, Skipped: 41

NonHNS-SharedKey

[INFO] Results:
[INFO]
[ERROR] Failures:
[ERROR] TestAccountConfiguration.testConfigPropNotFound:386->testMissingConfigKey:399 Expected a org.apache.hadoop.fs.azurebfs.contracts.exceptions.TokenAccessProviderException to be thrown, but got the result: : "org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider"
[INFO]
[ERROR] Tests run: 141, Failures: 1, Errors: 0, Skipped: 11
[INFO] Results:
[INFO]
[WARNING] Tests run: 589, Failures: 0, Errors: 0, Skipped: 277
[INFO] Results:
[INFO]
[WARNING] Tests run: 339, Failures: 0, Errors: 0, Skipped: 44

AppendBlob-HNS-OAuth

[INFO] Results:
[INFO]
[ERROR] Failures:
[ERROR] TestAccountConfiguration.testConfigPropNotFound:386->testMissingConfigKey:399 Expected a org.apache.hadoop.fs.azurebfs.contracts.exceptions.TokenAccessProviderException to be thrown, but got the result: : "org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider"
[INFO]
[ERROR] Tests run: 141, Failures: 1, Errors: 0, Skipped: 5
[INFO] Results:
[INFO]
[WARNING] Tests run: 572, Failures: 0, Errors: 0, Skipped: 54
[INFO] Results:
[INFO]
[WARNING] Tests run: 339, Failures: 0, Errors: 0, Skipped: 41

Time taken: 45 mins 31 secs.
azureuser@Hadoop-VM-EAST2:~/hadoop/hadoop-tools/hadoop-azure$ git log
commit 16455f8 (HEAD -> HADOOP-18873, origin/HADOOP-18873)
Author: Pranav Saxena <>
Date: Thu Aug 31 21:25:51 2023 -0700

getBlockFactory to be package-protected

@hadoop-yetus

💔 -1 overall

Vote Subsystem Runtime Logfile Comment
+0 🆗 reexec 0m 31s Docker mode activated.
_ Prechecks _
+1 💚 dupname 0m 0s No case conflicting files found.
+0 🆗 codespell 0m 0s codespell was not available.
+0 🆗 detsecrets 0m 0s detect-secrets was not available.
+1 💚 @author 0m 0s The patch does not contain any @author tags.
+1 💚 test4tests 0m 0s The patch appears to include 1 new or modified test files.
_ trunk Compile Tests _
+1 💚 mvninstall 30m 48s trunk passed
+1 💚 compile 0m 29s trunk passed with JDK Ubuntu-11.0.20+8-post-Ubuntu-1ubuntu120.04
+1 💚 compile 0m 28s trunk passed with JDK Private Build-1.8.0_382-8u382-ga-1~20.04.1-b05
+1 💚 checkstyle 0m 25s trunk passed
+1 💚 mvnsite 0m 31s trunk passed
+1 💚 javadoc 0m 30s trunk passed with JDK Ubuntu-11.0.20+8-post-Ubuntu-1ubuntu120.04
+1 💚 javadoc 0m 28s trunk passed with JDK Private Build-1.8.0_382-8u382-ga-1~20.04.1-b05
+1 💚 spotbugs 0m 49s trunk passed
-1 ❌ shadedclient 19m 53s branch has errors when building and testing our client artifacts.
_ Patch Compile Tests _
+1 💚 mvninstall 0m 22s the patch passed
+1 💚 compile 0m 23s the patch passed with JDK Ubuntu-11.0.20+8-post-Ubuntu-1ubuntu120.04
+1 💚 javac 0m 23s the patch passed
+1 💚 compile 0m 20s the patch passed with JDK Private Build-1.8.0_382-8u382-ga-1~20.04.1-b05
+1 💚 javac 0m 20s the patch passed
+1 💚 blanks 0m 0s The patch has no blanks issues.
+1 💚 checkstyle 0m 15s the patch passed
+1 💚 mvnsite 0m 23s the patch passed
+1 💚 javadoc 0m 20s the patch passed with JDK Ubuntu-11.0.20+8-post-Ubuntu-1ubuntu120.04
+1 💚 javadoc 0m 18s the patch passed with JDK Private Build-1.8.0_382-8u382-ga-1~20.04.1-b05
+1 💚 spotbugs 0m 43s the patch passed
-1 ❌ shadedclient 20m 8s patch has errors when building and testing our client artifacts.
_ Other Tests _
+1 💚 unit 1m 55s hadoop-azure in the patch passed.
+1 💚 asflicense 0m 28s The patch does not generate ASF License warnings.
83m 12s
Subsystem Report/Notes
Docker ClientAPI=1.43 ServerAPI=1.43 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6010/1/artifact/out/Dockerfile
GITHUB PR #6010
Optional Tests dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient spotbugs checkstyle codespell detsecrets
uname Linux dff4508a923d 4.15.0-213-generic #224-Ubuntu SMP Mon Jun 19 13:30:12 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux
Build tool maven
Personality dev-support/bin/hadoop.sh
git revision trunk / 7770dfa
Default Java Private Build-1.8.0_382-8u382-ga-1~20.04.1-b05
Multi-JDK versions /usr/lib/jvm/java-11-openjdk-amd64:Ubuntu-11.0.20+8-post-Ubuntu-1ubuntu120.04 /usr/lib/jvm/java-8-openjdk-amd64:Private Build-1.8.0_382-8u382-ga-1~20.04.1-b05
Test Results https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6010/1/testReport/
Max. process+thread count 412 (vs. ulimit of 5500)
modules C: hadoop-tools/hadoop-azure U: hadoop-tools/hadoop-azure
Console output https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6010/1/console
versions git=2.25.1 maven=3.6.3 spotbugs=4.2.2
Powered by Apache Yetus 0.14.0 https://yetus.apache.org

This message was automatically generated.

@hadoop-yetus

💔 -1 overall

Vote Subsystem Runtime Logfile Comment
+0 🆗 reexec 0m 51s Docker mode activated.
_ Prechecks _
+1 💚 dupname 0m 0s No case conflicting files found.
+0 🆗 codespell 0m 1s codespell was not available.
+0 🆗 detsecrets 0m 1s detect-secrets was not available.
+1 💚 @author 0m 0s The patch does not contain any @author tags.
+1 💚 test4tests 0m 0s The patch appears to include 1 new or modified test files.
_ trunk Compile Tests _
+1 💚 mvninstall 46m 45s trunk passed
+1 💚 compile 0m 38s trunk passed with JDK Ubuntu-11.0.20+8-post-Ubuntu-1ubuntu120.04
+1 💚 compile 0m 33s trunk passed with JDK Private Build-1.8.0_382-8u382-ga-1~20.04.1-b05
+1 💚 checkstyle 0m 29s trunk passed
+1 💚 mvnsite 0m 38s trunk passed
+1 💚 javadoc 0m 37s trunk passed with JDK Ubuntu-11.0.20+8-post-Ubuntu-1ubuntu120.04
+1 💚 javadoc 0m 31s trunk passed with JDK Private Build-1.8.0_382-8u382-ga-1~20.04.1-b05
+1 💚 spotbugs 1m 4s trunk passed
-1 ❌ shadedclient 37m 16s branch has errors when building and testing our client artifacts.
_ Patch Compile Tests _
+1 💚 mvninstall 0m 28s the patch passed
+1 💚 compile 0m 31s the patch passed with JDK Ubuntu-11.0.20+8-post-Ubuntu-1ubuntu120.04
+1 💚 javac 0m 31s the patch passed
+1 💚 compile 0m 26s the patch passed with JDK Private Build-1.8.0_382-8u382-ga-1~20.04.1-b05
+1 💚 javac 0m 26s the patch passed
+1 💚 blanks 0m 0s The patch has no blanks issues.
+1 💚 checkstyle 0m 18s the patch passed
+1 💚 mvnsite 0m 28s the patch passed
+1 💚 javadoc 0m 25s the patch passed with JDK Ubuntu-11.0.20+8-post-Ubuntu-1ubuntu120.04
+1 💚 javadoc 0m 24s the patch passed with JDK Private Build-1.8.0_382-8u382-ga-1~20.04.1-b05
+1 💚 spotbugs 1m 3s the patch passed
-1 ❌ shadedclient 36m 56s patch has errors when building and testing our client artifacts.
_ Other Tests _
+1 💚 unit 1m 55s hadoop-azure in the patch passed.
+1 💚 asflicense 0m 34s The patch does not generate ASF License warnings.
136m 28s
Subsystem Report/Notes
Docker ClientAPI=1.43 ServerAPI=1.43 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6010/2/artifact/out/Dockerfile
GITHUB PR #6010
Optional Tests dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient spotbugs checkstyle codespell detsecrets
uname Linux fefcb2fc141c 4.15.0-212-generic #223-Ubuntu SMP Tue May 23 13:09:22 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux
Build tool maven
Personality dev-support/bin/hadoop.sh
git revision trunk / 7770dfa
Default Java Private Build-1.8.0_382-8u382-ga-1~20.04.1-b05
Multi-JDK versions /usr/lib/jvm/java-11-openjdk-amd64:Ubuntu-11.0.20+8-post-Ubuntu-1ubuntu120.04 /usr/lib/jvm/java-8-openjdk-amd64:Private Build-1.8.0_382-8u382-ga-1~20.04.1-b05
Test Results https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6010/2/testReport/
Max. process+thread count 328 (vs. ulimit of 5500)
modules C: hadoop-tools/hadoop-azure U: hadoop-tools/hadoop-azure
Console output https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6010/2/console
versions git=2.25.1 maven=3.6.3 spotbugs=4.2.2
Powered by Apache Yetus 0.14.0 https://yetus.apache.org

This message was automatically generated.

@hadoop-yetus

🎊 +1 overall

Vote Subsystem Runtime Logfile Comment
+0 🆗 reexec 0m 29s Docker mode activated.
_ Prechecks _
+1 💚 dupname 0m 0s No case conflicting files found.
+0 🆗 codespell 0m 1s codespell was not available.
+0 🆗 detsecrets 0m 1s detect-secrets was not available.
+1 💚 @author 0m 0s The patch does not contain any @author tags.
+1 💚 test4tests 0m 0s The patch appears to include 1 new or modified test files.
_ trunk Compile Tests _
+1 💚 mvninstall 32m 16s trunk passed
+1 💚 compile 0m 30s trunk passed with JDK Ubuntu-11.0.20+8-post-Ubuntu-1ubuntu120.04
+1 💚 compile 0m 28s trunk passed with JDK Private Build-1.8.0_382-8u382-ga-1~20.04.1-b05
+1 💚 checkstyle 0m 26s trunk passed
+1 💚 mvnsite 0m 31s trunk passed
+1 💚 javadoc 0m 31s trunk passed with JDK Ubuntu-11.0.20+8-post-Ubuntu-1ubuntu120.04
+1 💚 javadoc 0m 27s trunk passed with JDK Private Build-1.8.0_382-8u382-ga-1~20.04.1-b05
+1 💚 spotbugs 0m 51s trunk passed
+1 💚 shadedclient 21m 7s branch has no errors when building and testing our client artifacts.
_ Patch Compile Tests _
+1 💚 mvninstall 0m 22s the patch passed
+1 💚 compile 0m 22s the patch passed with JDK Ubuntu-11.0.20+8-post-Ubuntu-1ubuntu120.04
+1 💚 javac 0m 22s the patch passed
+1 💚 compile 0m 20s the patch passed with JDK Private Build-1.8.0_382-8u382-ga-1~20.04.1-b05
+1 💚 javac 0m 20s the patch passed
+1 💚 blanks 0m 0s The patch has no blanks issues.
+1 💚 checkstyle 0m 14s the patch passed
+1 💚 mvnsite 0m 22s the patch passed
+1 💚 javadoc 0m 20s the patch passed with JDK Ubuntu-11.0.20+8-post-Ubuntu-1ubuntu120.04
+1 💚 javadoc 0m 19s the patch passed with JDK Private Build-1.8.0_382-8u382-ga-1~20.04.1-b05
+1 💚 spotbugs 0m 44s the patch passed
+1 💚 shadedclient 20m 58s patch has no errors when building and testing our client artifacts.
_ Other Tests _
+1 💚 unit 1m 55s hadoop-azure in the patch passed.
+1 💚 asflicense 0m 29s The patch does not generate ASF License warnings.
86m 52s
Subsystem Report/Notes
Docker ClientAPI=1.43 ServerAPI=1.43 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6010/3/artifact/out/Dockerfile
GITHUB PR #6010
Optional Tests dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient spotbugs checkstyle codespell detsecrets
uname Linux cc695cae12dc 4.15.0-213-generic #224-Ubuntu SMP Mon Jun 19 13:30:12 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux
Build tool maven
Personality dev-support/bin/hadoop.sh
git revision trunk / 306cac2
Default Java Private Build-1.8.0_382-8u382-ga-1~20.04.1-b05
Multi-JDK versions /usr/lib/jvm/java-11-openjdk-amd64:Ubuntu-11.0.20+8-post-Ubuntu-1ubuntu120.04 /usr/lib/jvm/java-8-openjdk-amd64:Private Build-1.8.0_382-8u382-ga-1~20.04.1-b05
Test Results https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6010/3/testReport/
Max. process+thread count 558 (vs. ulimit of 5500)
modules C: hadoop-tools/hadoop-azure U: hadoop-tools/hadoop-azure
Console output https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6010/3/console
versions git=2.25.1 maven=3.6.3 spotbugs=4.2.2
Powered by Apache Yetus 0.14.0 https://yetus.apache.org

This message was automatically generated.

@hadoop-yetus

🎊 +1 overall

Vote Subsystem Runtime Logfile Comment
+0 🆗 reexec 0m 49s Docker mode activated.
_ Prechecks _
+1 💚 dupname 0m 0s No case conflicting files found.
+0 🆗 codespell 0m 1s codespell was not available.
+0 🆗 detsecrets 0m 1s detect-secrets was not available.
+1 💚 @author 0m 0s The patch does not contain any @author tags.
+1 💚 test4tests 0m 0s The patch appears to include 1 new or modified test files.
_ trunk Compile Tests _
+1 💚 mvninstall 48m 15s trunk passed
+1 💚 compile 0m 38s trunk passed with JDK Ubuntu-11.0.20+8-post-Ubuntu-1ubuntu120.04
+1 💚 compile 0m 33s trunk passed with JDK Private Build-1.8.0_382-8u382-ga-1~20.04.1-b05
+1 💚 checkstyle 0m 29s trunk passed
+1 💚 mvnsite 0m 38s trunk passed
+1 💚 javadoc 0m 36s trunk passed with JDK Ubuntu-11.0.20+8-post-Ubuntu-1ubuntu120.04
+1 💚 javadoc 0m 30s trunk passed with JDK Private Build-1.8.0_382-8u382-ga-1~20.04.1-b05
+1 💚 spotbugs 1m 4s trunk passed
+1 💚 shadedclient 38m 46s branch has no errors when building and testing our client artifacts.
_ Patch Compile Tests _
+1 💚 mvninstall 0m 28s the patch passed
+1 💚 compile 0m 31s the patch passed with JDK Ubuntu-11.0.20+8-post-Ubuntu-1ubuntu120.04
+1 💚 javac 0m 32s the patch passed
+1 💚 compile 0m 25s the patch passed with JDK Private Build-1.8.0_382-8u382-ga-1~20.04.1-b05
+1 💚 javac 0m 25s the patch passed
+1 💚 blanks 0m 0s The patch has no blanks issues.
+1 💚 checkstyle 0m 18s the patch passed
+1 💚 mvnsite 0m 29s the patch passed
+1 💚 javadoc 0m 25s the patch passed with JDK Ubuntu-11.0.20+8-post-Ubuntu-1ubuntu120.04
+1 💚 javadoc 0m 23s the patch passed with JDK Private Build-1.8.0_382-8u382-ga-1~20.04.1-b05
+1 💚 spotbugs 1m 3s the patch passed
+1 💚 shadedclient 39m 2s patch has no errors when building and testing our client artifacts.
_ Other Tests _
+1 💚 unit 1m 55s hadoop-azure in the patch passed.
+1 💚 asflicense 0m 34s The patch does not generate ASF License warnings.
141m 30s
Subsystem Report/Notes
Docker ClientAPI=1.43 ServerAPI=1.43 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6010/4/artifact/out/Dockerfile
GITHUB PR #6010
Optional Tests dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient spotbugs checkstyle codespell detsecrets
uname Linux 9153814070df 4.15.0-212-generic #223-Ubuntu SMP Tue May 23 13:09:22 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux
Build tool maven
Personality dev-support/bin/hadoop.sh
git revision trunk / 16455f8
Default Java Private Build-1.8.0_382-8u382-ga-1~20.04.1-b05
Multi-JDK versions /usr/lib/jvm/java-11-openjdk-amd64:Ubuntu-11.0.20+8-post-Ubuntu-1ubuntu120.04 /usr/lib/jvm/java-8-openjdk-amd64:Private Build-1.8.0_382-8u382-ga-1~20.04.1-b05
Test Results https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6010/4/testReport/
Max. process+thread count 530 (vs. ulimit of 5500)
modules C: hadoop-tools/hadoop-azure U: hadoop-tools/hadoop-azure
Console output https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6010/4/console
versions git=2.25.1 maven=3.6.3 spotbugs=4.2.2
Powered by Apache Yetus 0.14.0 https://yetus.apache.org

This message was automatically generated.

@saxenapranav marked this pull request as ready for review September 1, 2023 09:56
@saxenapranav changed the title Hadoop 18873. ABFS: AbfsOutputStream doesnt close DataBlocks object. Hadoop-18873. ABFS: AbfsOutputStream doesnt close DataBlocks object. Sep 1, 2023
@saxenapranav changed the title Hadoop-18873. ABFS: AbfsOutputStream doesnt close DataBlocks object. HADOOP-18873. ABFS: AbfsOutputStream doesnt close DataBlocks object. Sep 1, 2023
@steveloughran (Contributor) left a comment

Code looks good, though I want that review to make sure all is good (it should be).

@@ -345,6 +345,7 @@ private void uploadBlockAsync(DataBlocks.DataBlock blockToUpload,
return null;
} finally {
IOUtils.close(blockUploadData);
blockToUpload.close();
Contributor

Right, because the line above calls BlockUploadData.close(), a lot of the cleanup should take place already; that .startUpload() call on L315 sets the blockToUpload.buffer ref to null, so that reference doesn't retain a hold on the bytebuffer.

This is why we haven't seen problems like OOM/disk storage yet... cleanup was happening.

Can you review the code to make sure the sequence of L348 always does the right thing and does not fail due to some attempted double delete of the file...

Contributor Author

Thanks a lot for the comment.

can you review the code to make sure the sequence of L348 always does the right thing and not fail due to some attempted double delete of the file...:

  1. disk:
    1. BlockUploadData.close() deletes the file:
      LOG.debug("File deleted in BlockUploadData close: {}", file.delete());
    2. blockToUpload.close() -> innerClose() -> closeBlock() tries to delete the file again (java.io.File#delete returns true if the path was deleted successfully, false otherwise, ref: https://docs.oracle.com/javase/8/docs/api/java/io/File.html#delete--).
  2. bytebuffer:
    1. What BlockUploadData contains: a ByteBufferInputStream over blockBuffer.
    2. BlockUploadData.close() closes the ByteBufferInputStream, which unsets the bytebuffer variable in the inputStream.
    3. blockToUpload.close() releases the buffer and adds it back to the pool.
  3. ByteArrayBlock:
    1. BlockUploadData.close() closes the ByteArrayInputStream (a wrapper around the buffer array): does nothing beyond that.
    2. blockToUpload.close() only nulls the buffer and marks the block released:
      protected void innerClose() {
        buffer = null;
        blockReleased();
      }

To check that this flow works correctly for all the data-buffer types, I have made a change in testCloseOfDataBlockOnAppendComplete. This will also help keep a sanity check in place for any future change to the data-buffer code.

Contributor Author

It is safe in the case of the file delete, as java.io.File#delete() simply returns false if the file is not there.
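
A tiny standalone illustration of that point, using plain java.io.File semantics rather than the ABFS code: deleting an already-deleted file just returns false and never throws.

import java.io.File;
import java.io.IOException;

public class DoubleDeleteSketch {
  public static void main(String[] args) throws IOException {
    File file = File.createTempFile("datablock", ".tmp");
    System.out.println(file.delete()); // true: the file existed and was removed
    System.out.println(file.delete()); // false: already gone, no exception thrown
  }
}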

Contributor

Add blockToUpload in L347 alongside blockUploadData, for consistency; that way it also gets the null check.

@saxenapranav
Contributor Author

:::: AGGREGATED TEST RESULT ::::

HNS-OAuth

[INFO] Results:
[INFO]
[ERROR] Failures:
[ERROR] TestAccountConfiguration.testConfigPropNotFound:386->testMissingConfigKey:399 Expected a org.apache.hadoop.fs.azurebfs.contracts.exceptions.TokenAccessProviderException to be thrown, but got the result: : "org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider"
[INFO]
[ERROR] Tests run: 141, Failures: 1, Errors: 0, Skipped: 5
[INFO] Results:
[INFO]
[ERROR] Errors:
[ERROR] ITestAzureBlobFileSystemLease.testAcquireRetry:329 » TestTimedOut test timed o...
[INFO]
[ERROR] Tests run: 572, Failures: 0, Errors: 1, Skipped: 54
[INFO] Results:
[INFO]
[WARNING] Tests run: 339, Failures: 0, Errors: 0, Skipped: 41

HNS-SharedKey

[INFO] Results:
[INFO]
[ERROR] Failures:
[ERROR] TestAccountConfiguration.testConfigPropNotFound:386->testMissingConfigKey:399 Expected a org.apache.hadoop.fs.azurebfs.contracts.exceptions.TokenAccessProviderException to be thrown, but got the result: : "org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider"
[INFO]
[ERROR] Tests run: 141, Failures: 1, Errors: 0, Skipped: 5
[INFO] Results:
[INFO]
[ERROR] Errors:
[ERROR] ITestAzureBlobFileSystemLease.testAcquireRetry:329 » TestTimedOut test timed o...
[INFO]
[ERROR] Tests run: 572, Failures: 0, Errors: 1, Skipped: 54
[INFO] Results:
[INFO]
[WARNING] Tests run: 339, Failures: 0, Errors: 0, Skipped: 41

NonHNS-SharedKey

[INFO] Results:
[INFO]
[ERROR] Failures:
[ERROR] TestAccountConfiguration.testConfigPropNotFound:386->testMissingConfigKey:399 Expected a org.apache.hadoop.fs.azurebfs.contracts.exceptions.TokenAccessProviderException to be thrown, but got the result: : "org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider"
[INFO]
[ERROR] Tests run: 141, Failures: 1, Errors: 0, Skipped: 11
[INFO] Results:
[INFO]
[ERROR] Errors:
[ERROR] ITestAzureBlobFileSystemLease.testAcquireRetry:344->lambda$testAcquireRetry$6:345 » TestTimedOut
[INFO]
[ERROR] Tests run: 589, Failures: 0, Errors: 1, Skipped: 277
[INFO] Results:
[INFO]
[WARNING] Tests run: 339, Failures: 0, Errors: 0, Skipped: 44

AppendBlob-HNS-OAuth

[INFO] Results:
[INFO]
[ERROR] Failures:
[ERROR] TestAccountConfiguration.testConfigPropNotFound:386->testMissingConfigKey:399 Expected a org.apache.hadoop.fs.azurebfs.contracts.exceptions.TokenAccessProviderException to be thrown, but got the result: : "org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider"
[INFO]
[ERROR] Tests run: 141, Failures: 1, Errors: 0, Skipped: 5
[INFO] Results:
[INFO]
[ERROR] Failures:
[ERROR] ITestAzureBlobFileSystemRandomRead.testSkipBounds:218->Assert.assertTrue:42->Assert.fail:89 There should not be any network I/O (elapsedTimeMs=123).
[ERROR] Errors:
[ERROR] ITestAzureBlobFileSystemLease.testAcquireRetry:329 » TestTimedOut test timed o...
[INFO]
[ERROR] Tests run: 572, Failures: 1, Errors: 1, Skipped: 54
[INFO] Results:
[INFO]
[WARNING] Tests run: 339, Failures: 0, Errors: 0, Skipped: 41

Time taken: 46 mins 36 secs.
azureuser@Hadoop-VM-EAST2:~/hadoop/hadoop-tools/hadoop-azure$ git log
commit 1f8b7d8 (HEAD -> HADOOP-18873, origin/HADOOP-18873)
Author: Pranav Saxena <>
Date: Fri Sep 1 05:25:21 2023 -0700

made testCloseOfDataBlockOnAppendComplete exhaustive of datablock types

@hadoop-yetus

🎊 +1 overall

Vote Subsystem Runtime Logfile Comment
+0 🆗 reexec 0m 31s Docker mode activated.
_ Prechecks _
+1 💚 dupname 0m 0s No case conflicting files found.
+0 🆗 codespell 0m 1s codespell was not available.
+0 🆗 detsecrets 0m 1s detect-secrets was not available.
+1 💚 @author 0m 0s The patch does not contain any @author tags.
+1 💚 test4tests 0m 0s The patch appears to include 1 new or modified test files.
_ trunk Compile Tests _
+1 💚 mvninstall 32m 1s trunk passed
+1 💚 compile 0m 30s trunk passed with JDK Ubuntu-11.0.20+8-post-Ubuntu-1ubuntu120.04
+1 💚 compile 0m 27s trunk passed with JDK Private Build-1.8.0_382-8u382-ga-1~20.04.1-b05
+1 💚 checkstyle 0m 26s trunk passed
+1 💚 mvnsite 0m 31s trunk passed
+1 💚 javadoc 0m 31s trunk passed with JDK Ubuntu-11.0.20+8-post-Ubuntu-1ubuntu120.04
+1 💚 javadoc 0m 27s trunk passed with JDK Private Build-1.8.0_382-8u382-ga-1~20.04.1-b05
+1 💚 spotbugs 0m 51s trunk passed
+1 💚 shadedclient 21m 10s branch has no errors when building and testing our client artifacts.
_ Patch Compile Tests _
+1 💚 mvninstall 0m 22s the patch passed
+1 💚 compile 0m 23s the patch passed with JDK Ubuntu-11.0.20+8-post-Ubuntu-1ubuntu120.04
+1 💚 javac 0m 23s the patch passed
+1 💚 compile 0m 20s the patch passed with JDK Private Build-1.8.0_382-8u382-ga-1~20.04.1-b05
+1 💚 javac 0m 20s the patch passed
+1 💚 blanks 0m 0s The patch has no blanks issues.
+1 💚 checkstyle 0m 15s the patch passed
+1 💚 mvnsite 0m 23s the patch passed
+1 💚 javadoc 0m 20s the patch passed with JDK Ubuntu-11.0.20+8-post-Ubuntu-1ubuntu120.04
+1 💚 javadoc 0m 19s the patch passed with JDK Private Build-1.8.0_382-8u382-ga-1~20.04.1-b05
+1 💚 spotbugs 0m 42s the patch passed
+1 💚 shadedclient 20m 46s patch has no errors when building and testing our client artifacts.
_ Other Tests _
+1 💚 unit 1m 54s hadoop-azure in the patch passed.
+1 💚 asflicense 0m 29s The patch does not generate ASF License warnings.
86m 30s
Subsystem Report/Notes
Docker ClientAPI=1.43 ServerAPI=1.43 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6010/5/artifact/out/Dockerfile
GITHUB PR #6010
Optional Tests dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient spotbugs checkstyle codespell detsecrets
uname Linux 9e73c5082de5 4.15.0-213-generic #224-Ubuntu SMP Mon Jun 19 13:30:12 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux
Build tool maven
Personality dev-support/bin/hadoop.sh
git revision trunk / 1f8b7d8
Default Java Private Build-1.8.0_382-8u382-ga-1~20.04.1-b05
Multi-JDK versions /usr/lib/jvm/java-11-openjdk-amd64:Ubuntu-11.0.20+8-post-Ubuntu-1ubuntu120.04 /usr/lib/jvm/java-8-openjdk-amd64:Private Build-1.8.0_382-8u382-ga-1~20.04.1-b05
Test Results https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6010/5/testReport/
Max. process+thread count 553 (vs. ulimit of 5500)
modules C: hadoop-tools/hadoop-azure U: hadoop-tools/hadoop-azure
Console output https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6010/5/console
versions git=2.25.1 maven=3.6.3 spotbugs=4.2.2
Powered by Apache Yetus 0.14.0 https://yetus.apache.org

This message was automatically generated.

@mehakmeet (Contributor) left a comment

Really good catch! It would be interesting to know whether this was the source of any problems you saw in prod (sorry for that 😅).
As Steve has mentioned, we did do some cleanup for uploadBlock that may have saved us in a way (but could have hidden this issue).

Mostly happy with the change, just small nits.

@@ -345,6 +345,7 @@ private void uploadBlockAsync(DataBlocks.DataBlock blockToUpload,
return null;
} finally {
IOUtils.close(blockUploadData);
blockToUpload.close();
Contributor

Add blockToUpload in L347 alongside blockUploadData, for consistency; that way it also gets the null check.
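
A sketch of what that suggestion amounts to, with the null-safety spelled out by hand; this helper is hypothetical and illustrative only, and the production code would keep using the existing IOUtils call rather than a hand-rolled one:

import java.io.Closeable;
import java.io.IOException;

// Hypothetical helper showing the intent of the suggestion: close both the
// BlockUploadData wrapper and the DataBlock in one null-safe, best-effort pass.
final class CloseBothSketch {
  static void closeQuietly(Closeable... closeables) {
    for (Closeable c : closeables) {
      if (c == null) {
        continue; // null-safe, as the comment asks for
      }
      try {
        c.close();
      } catch (IOException e) {
        // best-effort cleanup inside a finally block; real code would log this
      }
    }
  }
}
// intended usage in the finally clause: closeQuietly(blockUploadData, blockToUpload);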

BlockUploadStatistics.class));
return factory;
}).when(store).getBlockFactory();
OutputStream os = fs.create(new Path("/file_" + blockBufferType));
Contributor

Try-with-resources or try-finally to close the stream (on the off chance we see a failure after creating it, we clean up).
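
A sketch of this suggestion, reusing the names visible in the quoted test snippet (fs and blockBufferType are assumed from that context):

import java.io.IOException;
import java.io.OutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

// Sketch only: try-with-resources closes the stream even if the write throws,
// so a failed test run does not leak a half-written block.
final class TryWithResourcesSketch {
  static void writeOneByte(FileSystem fs, String blockBufferType) throws IOException {
    try (OutputStream os = fs.create(new Path("/file_" + blockBufferType))) {
      os.write(new byte[1]);
    } // the close() performed here is what should trigger DataBlock.close()
  }
}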

BlockUploadStatistics.class));
return factory;
}).when(store).getBlockFactory();
OutputStream os = fs.create(new Path("/file_" + blockBufferType));
Contributor

Use "getMethodName()" instead of hardcoding "/file"
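
For example, assuming the test base class exposes a getMethodName() helper as the comment implies (a hypothetical fragment of the quoted test, not the merged code):

// Hypothetical rewrite of the hardcoded path in the snippet above:
Path testPath = new Path("/" + getMethodName() + "_" + blockBufferType);
OutputStream os = fs.create(testPath);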

OutputStream os = fs.create(new Path("/file_" + blockBufferType));
os.write(new byte[1]);
os.close();
Mockito.verify(dataBlock[0], Mockito.times(1)).close();
Contributor

Add an assertion here that dataBlock[0] is closed.
Suggestion: getState() can be used to verify that the block is in the Closed state (may need to make the getter public). We can even add an assertion after L135 to assert it's in the Writing state as well.
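
A sketch of what those assertions could look like, assuming AssertJ is available and that DataBlocks.DataBlock exposes its DestState enum via a getState() getter visible to the test (as noted, the getter's visibility may need to change):

// Hypothetical assertions following the suggestion; a fragment of the quoted test.
Assertions.assertThat(dataBlock[0].getState())
    .describedAs("DataBlock state after the stream is closed")
    .isEqualTo(DataBlocks.DataBlock.DestState.Closed);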

@saxenapranav
Contributor Author


:::: AGGREGATED TEST RESULT ::::

HNS-OAuth

[INFO] Results:
[INFO]
[ERROR] Failures:
[ERROR] TestAccountConfiguration.testConfigPropNotFound:386->testMissingConfigKey:399 Expected a org.apache.hadoop.fs.azurebfs.contracts.exceptions.TokenAccessProviderException to be thrown, but got the result: : "org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider"
[INFO]
[ERROR] Tests run: 141, Failures: 1, Errors: 0, Skipped: 5
[INFO] Results:
[INFO]
[ERROR] Errors:
[ERROR] ITestAzureBlobFileSystemLease.testAcquireRetry:329 » TestTimedOut test timed o...
[INFO]
[ERROR] Tests run: 589, Failures: 0, Errors: 1, Skipped: 54
[INFO] Results:
[INFO]
[WARNING] Tests run: 339, Failures: 0, Errors: 0, Skipped: 41

HNS-SharedKey

[INFO] Results:
[INFO]
[ERROR] Failures:
[ERROR] TestAccountConfiguration.testConfigPropNotFound:386->testMissingConfigKey:399 Expected a org.apache.hadoop.fs.azurebfs.contracts.exceptions.TokenAccessProviderException to be thrown, but got the result: : "org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider"
[INFO]
[ERROR] Tests run: 141, Failures: 1, Errors: 0, Skipped: 5
[INFO] Results:
[INFO]
[ERROR] Failures:
[ERROR] ITestAzureBlobFileSystemRandomRead.testSkipBounds:218->Assert.assertTrue:42->Assert.fail:89 There should not be any network I/O (elapsedTimeMs=24).
[ERROR] Errors:
[ERROR] ITestAzureBlobFileSystemLease.testAcquireRetry:336 » TestTimedOut test timed o...
[INFO]
[ERROR] Tests run: 589, Failures: 1, Errors: 1, Skipped: 54
[INFO] Results:
[INFO]
[WARNING] Tests run: 339, Failures: 0, Errors: 0, Skipped: 41

NonHNS-SharedKey

[INFO] Results:
[INFO]
[ERROR] Failures:
[ERROR] TestAccountConfiguration.testConfigPropNotFound:386->testMissingConfigKey:399 Expected a org.apache.hadoop.fs.azurebfs.contracts.exceptions.TokenAccessProviderException to be thrown, but got the result: : "org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider"
[INFO]
[ERROR] Tests run: 141, Failures: 1, Errors: 0, Skipped: 11
[INFO] Results:
[INFO]
[ERROR] Errors:
[ERROR] ITestAzureBlobFileSystemLease.testAcquireRetry:344->lambda$testAcquireRetry$6:345 » TestTimedOut
[INFO]
[ERROR] Tests run: 589, Failures: 0, Errors: 1, Skipped: 277
[INFO] Results:
[INFO]
[WARNING] Tests run: 339, Failures: 0, Errors: 0, Skipped: 44

AppendBlob-HNS-OAuth

[INFO] Results:
[INFO]
[ERROR] Failures:
[ERROR] TestAccountConfiguration.testConfigPropNotFound:386->testMissingConfigKey:399 Expected a org.apache.hadoop.fs.azurebfs.contracts.exceptions.TokenAccessProviderException to be thrown, but got the result: : "org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider"
[INFO]
[ERROR] Tests run: 141, Failures: 1, Errors: 0, Skipped: 5
[INFO] Results:
[INFO]
[ERROR] Errors:
[ERROR] ITestAzureBlobFileSystemLease.testAcquireRetry:344->lambda$testAcquireRetry$6:345 » TestTimedOut
[INFO]
[ERROR] Tests run: 589, Failures: 0, Errors: 1, Skipped: 54
[INFO] Results:
[INFO]
[WARNING] Tests run: 339, Failures: 0, Errors: 0, Skipped: 41

Time taken: 48 mins 26 secs.
azureuser@Hadoop-VM-EAST2:~/hadoop/hadoop-tools/hadoop-azure$ git log
commit d9da57b (HEAD -> HADOOP-18873, origin/HADOOP-18873)
Author: Pranav Saxena <>
Date: Wed Sep 13 06:02:03 2023 -0700

close with IoUtils; test refactors;

@saxenapranav
Contributor Author

Thanks @mehakmeet for the review. I have addressed the comments. Thank you.

@hadoop-yetus

🎊 +1 overall

Vote Subsystem Runtime Logfile Comment
+0 🆗 reexec 0m 29s Docker mode activated.
_ Prechecks _
+1 💚 dupname 0m 0s No case conflicting files found.
+0 🆗 codespell 0m 0s codespell was not available.
+0 🆗 detsecrets 0m 0s detect-secrets was not available.
+1 💚 @author 0m 0s The patch does not contain any @author tags.
+1 💚 test4tests 0m 0s The patch appears to include 1 new or modified test files.
_ trunk Compile Tests _
+0 🆗 mvndep 14m 34s Maven dependency ordering for branch
+1 💚 mvninstall 21m 0s trunk passed
+1 💚 compile 11m 2s trunk passed with JDK Ubuntu-11.0.20+8-post-Ubuntu-1ubuntu120.04
+1 💚 compile 9m 33s trunk passed with JDK Private Build-1.8.0_382-8u382-ga-1~20.04.1-b05
+1 💚 checkstyle 2m 26s trunk passed
+1 💚 mvnsite 1m 52s trunk passed
+1 💚 javadoc 1m 37s trunk passed with JDK Ubuntu-11.0.20+8-post-Ubuntu-1ubuntu120.04
+1 💚 javadoc 1m 19s trunk passed with JDK Private Build-1.8.0_382-8u382-ga-1~20.04.1-b05
+1 💚 spotbugs 2m 30s trunk passed
+1 💚 shadedclient 23m 21s branch has no errors when building and testing our client artifacts.
_ Patch Compile Tests _
+0 🆗 mvndep 0m 26s Maven dependency ordering for patch
+1 💚 mvninstall 0m 56s the patch passed
+1 💚 compile 11m 18s the patch passed with JDK Ubuntu-11.0.20+8-post-Ubuntu-1ubuntu120.04
+1 💚 javac 11m 18s the patch passed
+1 💚 compile 9m 46s the patch passed with JDK Private Build-1.8.0_382-8u382-ga-1~20.04.1-b05
+1 💚 javac 9m 46s the patch passed
+1 💚 blanks 0m 1s The patch has no blanks issues.
+1 💚 checkstyle 2m 24s the patch passed
+1 💚 mvnsite 1m 49s the patch passed
+1 💚 javadoc 1m 30s the patch passed with JDK Ubuntu-11.0.20+8-post-Ubuntu-1ubuntu120.04
+1 💚 javadoc 1m 21s the patch passed with JDK Private Build-1.8.0_382-8u382-ga-1~20.04.1-b05
+1 💚 spotbugs 2m 55s the patch passed
+1 💚 shadedclient 22m 27s patch has no errors when building and testing our client artifacts.
_ Other Tests _
+1 💚 unit 16m 34s hadoop-common in the patch passed.
+1 💚 unit 2m 11s hadoop-azure in the patch passed.
+1 💚 asflicense 0m 49s The patch does not generate ASF License warnings.
169m 27s
Subsystem Report/Notes
Docker ClientAPI=1.43 ServerAPI=1.43 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6010/6/artifact/out/Dockerfile
GITHUB PR #6010
Optional Tests dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient spotbugs checkstyle codespell detsecrets
uname Linux 4090ed52a0ae 4.15.0-213-generic #224-Ubuntu SMP Mon Jun 19 13:30:12 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux
Build tool maven
Personality dev-support/bin/hadoop.sh
git revision trunk / d9da57b
Default Java Private Build-1.8.0_382-8u382-ga-1~20.04.1-b05
Multi-JDK versions /usr/lib/jvm/java-11-openjdk-amd64:Ubuntu-11.0.20+8-post-Ubuntu-1ubuntu120.04 /usr/lib/jvm/java-8-openjdk-amd64:Private Build-1.8.0_382-8u382-ga-1~20.04.1-b05
Test Results https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6010/6/testReport/
Max. process+thread count 1738 (vs. ulimit of 5500)
modules C: hadoop-common-project/hadoop-common hadoop-tools/hadoop-azure U: .
Console output https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6010/6/console
versions git=2.25.1 maven=3.6.3 spotbugs=4.2.2
Powered by Apache Yetus 0.14.0 https://yetus.apache.org

This message was automatically generated.

@steveloughran
Contributor

...handing off review to @mehakmeet; if he's happy it's good

@mehakmeet (Contributor) left a comment

LGTM.

@mehakmeet merged commit f24b73e into apache:trunk Sep 20, 2023
4 checks passed
@mehakmeet
Contributor

Thanks for this @saxenapranav; can you please cherry-pick the same into branch-3.3 and run the whole test suite again, then we'll merge it there as well.

saxenapranav added a commit to saxenapranav/hadoop that referenced this pull request Sep 20, 2023
…pache#6010)


AbfsOutputStream to close the dataBlock object created for the upload.

Contributed By: Pranav Saxena
@saxenapranav
Contributor Author

Thanks @mehakmeet for the review. I have cherry-picked the same into branch-3.3 in PR #6105.

Thanks.

jiajunmao pushed a commit to jiajunmao/hadoop-MLEC that referenced this pull request Feb 6, 2024
…pache#6010)


AbfsOutputStream to close the dataBlock object created for the upload.

Contributed By: Pranav Saxena