HADOOP-18873. ABFS: AbfsOutputStream doesnt close DataBlocks object. #6010
Conversation
💔 -1 overall (automated Yetus report)
💔 -1 overall (automated Yetus report)
🎊 +1 overall (automated Yetus report)
🎊 +1 overall (automated Yetus report)
Code looks good, though I want that review to make sure all is good (it should be).
@@ -345,6 +345,7 @@ private void uploadBlockAsync(DataBlocks.DataBlock blockToUpload,
         return null;
       } finally {
         IOUtils.close(blockUploadData);
+        blockToUpload.close();
Right, because the line above calls BlockUploadData.close(), a lot of the cleanup should take place already; that .startUpload() call on L315 sets the blockToUpload.buffer ref to null, so that reference doesn't retain a hold on the bytebuffer.
This is why we haven't seen problems yet like OOM or disk-space exhaustion: cleanup was happening.
Can you review the code to make sure the sequence at L348 always does the right thing and doesn't fail due to some attempted double delete of the file?
Thanks a lot for the comment.

> can you review the code to make sure the sequence of L348 always does the right thing and not fail due to some attempted double delete of the file...

- DiskBlock:
  - BlockUploadData.close() -> deletes the file (DataBlocks.java, line 266 in f6fa5bd): `LOG.debug("File deleted in BlockUploadData close: {}", file.delete());`
  - blockToUpload.close() -> innerClose() -> closeBlock() -> tries to delete the file again. `java.io.File#delete` returns true if the path was deleted successfully and false otherwise (ref: https://docs.oracle.com/javase/8/docs/api/java/io/File.html#delete--), so the second attempt is harmless (DataBlocks.java, line 1118 in f6fa5bd): `if (!bufferFile.delete() && bufferFile.exists()) {`
- ByteBufferBlock:
  - BlockUploadData contains a ByteBufferInputStream over blockBuffer (DataBlocks.java, line 732 in f6fa5bd): `new ByteBufferInputStream(dataSize, blockBuffer));`
  - BlockUploadData.close() -> closes the ByteBufferInputStream (line 263: `cleanupWithLogger(LOG, uploadStream);`), which nulls the byteBuffer variable in the input stream (line 806: `byteBuffer = null;`).
  - blockToUpload.close() -> releases the buffer back to the pool (line 764: `releaseBuffer(blockBuffer);`; see `private void releaseBuffer(ByteBuffer buffer)` at line 667).
- ByteArrayBlock:
  - BlockUploadData.close() -> closes the ByteArrayInputStream (a wrapper around the buffer array): does nothing.
  - blockToUpload.close() -> innerClose() (DataBlocks.java, lines 623 to 626 in f6fa5bd): `protected void innerClose() { buffer = null; blockReleased(); }`

For checking that this flow works correctly on all the types of data buffers, I have made a change in testCloseOfDataBlockOnAppendComplete. This will also help keep a sanity check in future on any change in the data-buffer code.
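On the double-delete question specifically: `java.io.File#delete` simply returns `false` on a second call rather than throwing, which is what makes the close sequence safe. A minimal standalone sketch (class and file names are illustrative, not Hadoop code):

```java
import java.io.File;
import java.io.IOException;

public class DoubleDeleteDemo {

    // Creates a temp file standing in for a DiskBlock buffer file,
    // deletes it twice, and reports both results.
    static boolean[] deleteTwice() throws IOException {
        File bufferFile = File.createTempFile("datablock", ".buf");
        boolean first = bufferFile.delete();   // as in BlockUploadData.close()
        boolean second = bufferFile.delete();  // as in blockToUpload.close()
        return new boolean[] { first, second };
    }

    public static void main(String[] args) throws IOException {
        boolean[] r = deleteTwice();
        // first=true second=false: the second delete is a harmless no-op
        System.out.println("first=" + r[0] + " second=" + r[1]);
    }
}
```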
It is safe in the case of file delete, as java.io.File#delete just returns false if the file is not there.
Add blockToUpload in L347 alongside blockUploadData, for consistency; it also checks for null.
:::: AGGREGATED TEST RESULT ::::
Configurations run: HNS-OAuth, HNS-SharedKey, NonHNS-SharedKey, AppendBlob-HNS-OAuth.
Time taken: 46 mins 36 secs.
🎊 +1 overall (automated Yetus report)
Really good catch! Interesting to know if this was a source of any problems you saw in prod (sorry for that 😅).
As Steve mentioned, we did do some cleanup for uploadBlock that may have saved us in a way (but could also have hidden this issue).
Mostly happy with the change, just small nits.
        BlockUploadStatistics.class));
    return factory;
  }).when(store).getBlockFactory();
  OutputStream os = fs.create(new Path("/file_" + blockBufferType));
Use try-with-resources or try-finally to close the stream (on the off chance we see a failure after creating it, we clean up).
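A standalone sketch of what try-with-resources buys here, with plain `java.nio` streams standing in for the ABFS filesystem (names are illustrative, not the committed test code):

```java
import java.io.IOException;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;

public class TryWithResourcesDemo {

    // Writes one byte the way the test does, but inside try-with-resources
    // so the stream is closed even if write() throws; returns bytes written.
    static long writeOneByte() throws IOException {
        Path file = Files.createTempFile("file_bytebuffer", ".bin");
        try (OutputStream os = Files.newOutputStream(file)) {
            os.write(new byte[1]);
        }
        long size = Files.size(file);
        Files.delete(file);
        return size;
    }

    public static void main(String[] args) throws IOException {
        System.out.println(writeOneByte()); // 1
    }
}
```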
Use "getMethodName()" instead of hardcoding "/file"
  OutputStream os = fs.create(new Path("/file_" + blockBufferType));
  os.write(new byte[1]);
  os.close();
  Mockito.verify(dataBlock[0], Mockito.times(1)).close();
Add an assertion here that dataBlock[0] is closed.
Suggestion: getState() can be used to verify the block is in Closed state (may need to make the getter public). We can even add an assertion after L135 that it's in Writing state as well.
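A plain-Java sketch of the state-assertion idea: the mini DataBlock below is a stand-in for Hadoop's `DataBlocks.DataBlock`, with a `DestState` enum mirroring its Writing/Closed states (the names follow the review comment; this is not the committed test):

```java
public class BlockStateDemo {
    enum DestState { Writing, Upload, Closed }

    // Minimal stand-in for DataBlocks.DataBlock's lifecycle.
    static class DataBlock {
        private DestState state = DestState.Writing;

        // The getter the review suggests making public.
        DestState getState() { return state; }

        void write(byte[] b) {
            if (state != DestState.Writing) {
                throw new IllegalStateException("block not writable");
            }
        }

        void close() { state = DestState.Closed; }
    }

    public static void main(String[] args) {
        DataBlock block = new DataBlock();
        // Right after creation the block should be in Writing state...
        System.out.println(block.getState());
        block.write(new byte[1]);
        block.close();
        // ...and after close() it should be Closed.
        System.out.println(block.getState());
    }
}
```

An assertion on `getState()` checks the block's own lifecycle directly, rather than only verifying that `close()` was invoked via the mock.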
:::: AGGREGATED TEST RESULT ::::
Configurations run: HNS-OAuth, HNS-SharedKey, NonHNS-SharedKey, AppendBlob-HNS-OAuth.
Time taken: 48 mins 26 secs.
Thanks @mehakmeet for the review. I have addressed the comments. Thank you.
🎊 +1 overall (automated Yetus report)
...handing off review to @mehakmeet; if he's happy it's good
LGTM.
Thanks for this @saxenapranav. Can you please cherry-pick the same into branch-3.3 and run the whole test suite again? Then we'll merge it there as well.
…pache#6010) AbfsOutputStream to close the dataBlock object created for the upload. Contributed By: Pranav Saxena
Thanks @mehakmeet for the review. I have cherry-picked the same into branch-3.3 in PR #6105. Thanks.
AbfsOutputStream doesn't close the dataBlock object created for the upload.

What is the implication of not doing that? DataBlocks has three implementations:

- ByteArrayBlock
- ByteBufferBlock: the close method on this object has the responsibility of returning the buffer back to the pool so it can be reused.
- DiskBlock: close deletes the buffer file on disk.

startUpload() gives an object of BlockUploadData, which provides the method toByteArray() used in AbfsOutputStream to get the byteArray in the dataBlock.

Method which uses the DataBlock object: hadoop/hadoop-tools/hadoop-azure/src/main/java/org/apache/hadoop/fs/azurebfs/services/AbfsOutputStream.java, line 298 in fac7d26.

jira: https://issues.apache.org/jira/browse/HADOOP-18873
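The shape of the fix can be sketched with stand-in classes: the real `finally` clause lives in `uploadBlockAsync` in AbfsOutputStream.java, and the classes below are placeholders for the Hadoop types, not their actual implementations:

```java
import java.io.Closeable;

public class CloseBothDemo {
    static final StringBuilder LOG = new StringBuilder();

    // Stand-in for the upload wrapper returned by startUpload().
    static class BlockUploadData implements Closeable {
        @Override public void close() { LOG.append("uploadData;"); }
    }

    // Stand-in for DataBlocks.DataBlock.
    static class DataBlock implements Closeable {
        @Override public void close() { LOG.append("block;"); }
    }

    // Shape of the patched finally clause: both the upload-data wrapper
    // and the block itself are closed, even if the upload throws.
    static String upload(DataBlock blockToUpload, BlockUploadData data) {
        LOG.setLength(0);
        try {
            // ... perform the append/PUT with data ...
            return null;
        } finally {
            data.close();           // existing IOUtils.close(blockUploadData)
            blockToUpload.close();  // the line this PR adds
        }
    }

    public static void main(String[] args) {
        upload(new DataBlock(), new BlockUploadData());
        System.out.println(LOG); // uploadData;block;
    }
}
```

Because each implementation's close() frees a different resource (file on disk, pooled ByteBuffer, or heap array), skipping the second close leaks whichever resource BlockUploadData.close() did not already release.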
:::: AGGREGATED TEST RESULT ::::
HNS-OAuth
[INFO] Results:
[INFO]
[ERROR] Failures:
[ERROR] TestAccountConfiguration.testConfigPropNotFound:386->testMissingConfigKey:399 Expected a org.apache.hadoop.fs.azurebfs.contracts.exceptions.TokenAccessProviderException to be thrown, but got the result: : "org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider"
[INFO]
[ERROR] Tests run: 141, Failures: 1, Errors: 0, Skipped: 5
[INFO] Results:
[INFO]
[ERROR] Errors:
[ERROR] ITestAzureBlobFileSystemLease.testAcquireRetry:344->lambda$testAcquireRetry$6:345 » TestTimedOut
[INFO]
[ERROR] Tests run: 572, Failures: 0, Errors: 1, Skipped: 54
[INFO] Results:
[INFO]
[WARNING] Tests run: 339, Failures: 0, Errors: 0, Skipped: 41
HNS-SharedKey
[INFO] Results:
[INFO]
[ERROR] Failures:
[ERROR] TestAccountConfiguration.testConfigPropNotFound:386->testMissingConfigKey:399 Expected a org.apache.hadoop.fs.azurebfs.contracts.exceptions.TokenAccessProviderException to be thrown, but got the result: : "org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider"
[INFO]
[ERROR] Tests run: 141, Failures: 1, Errors: 0, Skipped: 5
[INFO] Results:
[INFO]
[ERROR] Errors:
[ERROR] ITestAzureBlobFileSystemLease.testAcquireRetry:336 » TestTimedOut test timed o...
[ERROR] ITestAzureBlobFileSystemLease.testTwoWritersCreateAppendWithInfiniteLeaseEnabled:186->twoWriters:154 » TestTimedOut
[INFO]
[ERROR] Tests run: 572, Failures: 0, Errors: 2, Skipped: 54
[INFO] Results:
[INFO]
[WARNING] Tests run: 339, Failures: 0, Errors: 0, Skipped: 41
NonHNS-SharedKey
[INFO] Results:
[INFO]
[ERROR] Failures:
[ERROR] TestAccountConfiguration.testConfigPropNotFound:386->testMissingConfigKey:399 Expected a org.apache.hadoop.fs.azurebfs.contracts.exceptions.TokenAccessProviderException to be thrown, but got the result: : "org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider"
[INFO]
[ERROR] Tests run: 141, Failures: 1, Errors: 0, Skipped: 11
[INFO] Results:
[INFO]
[WARNING] Tests run: 589, Failures: 0, Errors: 0, Skipped: 277
[INFO] Results:
[INFO]
[WARNING] Tests run: 339, Failures: 0, Errors: 0, Skipped: 44
AppendBlob-HNS-OAuth
[INFO] Results:
[INFO]
[ERROR] Failures:
[ERROR] TestAccountConfiguration.testConfigPropNotFound:386->testMissingConfigKey:399 Expected a org.apache.hadoop.fs.azurebfs.contracts.exceptions.TokenAccessProviderException to be thrown, but got the result: : "org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider"
[INFO]
[ERROR] Tests run: 141, Failures: 1, Errors: 0, Skipped: 5
[INFO] Results:
[INFO]
[WARNING] Tests run: 572, Failures: 0, Errors: 0, Skipped: 54
[INFO] Results:
[INFO]
[WARNING] Tests run: 339, Failures: 0, Errors: 0, Skipped: 41
Time taken: 45 mins 31 secs.
azureuser@Hadoop-VM-EAST2:~/hadoop/hadoop-tools/hadoop-azure$ git log
commit 16455f8 (HEAD -> HADOOP-18873, origin/HADOOP-18873)
Author: Pranav Saxena <>
Date: Thu Aug 31 21:25:51 2023 -0700