HADOOP-19102. FooterReadBufferSize should not be greater than readBufferSize #6617
Conversation
🎊 +1 overall
This message was automatically generated.
AGGREGATED TEST RESULT: HNS-OAuth, HNS-SharedKey, NonHNS-SharedKey, AppendBlob-HNS-OAuth. Time taken: 46 mins 15 secs.
Production code looks good.
Some thoughts around tests added.
byte[] fileContent = getRandomBytesArray(fileSize);
Path testFilePath = createFileWithContent(spiedFs, fileName,
    fileContent);
for (int i = 0; i <= 4; i++) {
I was wondering, since we are now changing the read buffer size as well: do we really need file size and read buffer size to take five different values?
The objective here is to test all possible scenarios of the relative values of the three sizes (file size, read buffer size, footer read buffer size), so permutations of three values for the three sizes are enough to achieve that.
All three sizes can range over the set [256KB, 512KB, 1MB] and we should be good to go.
Earlier, file sizes went up to 4MB because the read buffer size was kept at its default of 4MB.
This will reduce the test runtime as well.
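The combination space this comment describes can be sketched as a simple nested loop. This is a hypothetical stand-alone illustration, not the actual test code; the class name and the SIZES array are made up for the example:

```java
public class SizeCombinations {
    static final int ONE_KB = 1024;
    // The three candidate sizes the comment suggests: 256 KB, 512 KB, 1 MB.
    public static final int[] SIZES = {256 * ONE_KB, 512 * ONE_KB, 1024 * ONE_KB};

    public static void main(String[] args) {
        int combinations = 0;
        // One scenario per (fileSize, readBufferSize, footerReadBufferSize) triple.
        for (int fileSize : SIZES) {
            for (int readBufferSize : SIZES) {
                for (int footerReadBufferSize : SIZES) {
                    combinations++;
                }
            }
        }
        System.out.println(combinations); // 3^3 = 27 scenarios
    }
}
```

With three values per dimension, the full cross product is 27 scenarios, which still covers every relative ordering of the three sizes.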
ReadBufferSize defaults to 4 MB, so I will keep that in the range, though I have removed 2 MB from readBufferSize and fileSize. The ranges are now:
fileSize: 256 KB, 512 KB, 1 MB, 4 MB
readBufferSize: 256 KB, 512 KB, 1 MB, 4 MB
footerReadBufferSize: 256 KB, 512 KB, 1 MB
private void changeFooterConfigs(final AzureBlobFileSystem spiedFs,
    final boolean optimizeFooterRead, final int fileSize,
    final int footerReadBufferSize, final int readBufferSize) throws IOException {
nit: IOException is never thrown in the method body, so it can be removed from the signature.
Fixed it.
final int fileId = fileIdx++;
Future future = executorService.submit(() -> {
  try {
    try (AzureBlobFileSystem spiedFs = createSpiedFs(
Why do we need a try inside a try?
Since we are catching a general exception, can this be a single try block with one catch?
Fixed it.
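For reference, the flattened pattern being asked for looks like this. It is a generic, self-contained sketch with made-up names, not the PR's actual code: a single try-with-resources statement can both manage the resource and carry the catch clause, so no inner try is needed.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;

public class SingleTryDemo {

    // One try statement manages the resource AND catches the exception;
    // no nested try block is needed.
    public static String firstLine(String text) {
        try (BufferedReader reader = new BufferedReader(new StringReader(text))) {
            return reader.readLine();
        } catch (IOException e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(firstLine("hello\nworld"));
    }
}
```

The same shape applies to the test code above: the try-with-resources on the spied filesystem and the catch for the general exception can live on one try statement.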
final int fileId = fileIdx++;
futureList.add(executorService.submit(() -> {
  try {
    try (AzureBlobFileSystem spiedFs = createSpiedFs(
Nested try blocks here as well.
Fixed it.
private final int SIZE_256_KB = 256 * ONE_KB;

private final Integer[] FILE_SIZES = {
    SIZE_256_KB,
The naming format seems incorrect, as the variables are not static; UPPER_SNAKE_CASE is conventionally reserved for static final constants.
Made them static.
AGGREGATED TEST RESULT: HNS-OAuth, HNS-SharedKey, NonHNS-SharedKey, AppendBlob-HNS-OAuth. Time taken: 45 mins 30 secs.
AGGREGATED TEST RESULT: HNS-OAuth, HNS-SharedKey, NonHNS-SharedKey, AppendBlob-HNS-OAuth. Time taken: 45 mins 0 secs.
🎊 +1 overall
This message was automatically generated.
Thanks for taking the comments.
LGTM!
@steveloughran @mehakmeet @mukund-thakur, requesting you to kindly review. Thank you!
Production code is all good. For the tests, I'm worried this is now a larger test suite and will have to be tagged as a scale test, which means that people like me running remotely won't run it as often, and when we do, it is at risk of timing out. Would it be possible to test with smaller sizes?
private static final int SIZE_256_KB = 256 * ONE_KB;

private static final Integer[] FILE_SIZES = {
This is going to make the test slower on remote runs. Does it really have to be this big, or is it possible to tune things so that they work with smaller files? Because if this size is a requirement, it will have to become a scale test, which will not be run as often.
On trunk, file size has the range 256KB, 512KB, 1MB, 2MB, 4MB.
As part of this PR, fileSize has the range 256KB, 512KB, 1MB, 4MB, and a readBufferSize dimension [256KB, 512KB, 1MB, 4MB] is added. With this PR, for a given fileSize the file is created only once, and all combinations of readBufferSize and footerReadBufferSize are tested against that file.
On trunk, running all the tests sequentially takes ~8min47sec; on this PR (including the readBufferSize dimension) it takes only ~7min (these test runs were done outside the Azure network). So with this PR the time to run this class actually decreases.
The 4MB fileSize is included because the default readBufferSize is 4MB. Kindly advise if we should remove the 4MB fileSize from the fileSizes.
Thank you!
Having a smaller file size is good, but at the same time, so is the broader coverage.
Making it a scale test means it will run with the -Dscale option. You can also set up your IDE or auth-keys to always run scale tests, if you haven't noticed.
As suggested, have made the class ITestAbfsInputStreamFooter extend AbstractAbfsScaleTest.
      testPartialReadWithSomeData(spiedFs, testFilePath,
          fileSize - AbfsInputStream.FOOTER_SIZE,
          AbfsInputStream.FOOTER_SIZE,
          fileContent, footerReadBufferSize, readBufferSize);
    }
  }
}

private void testPartialReadWithSomeData(final FileSystem fs,
If this is a JUnit entry point, add @Test and leave the name alone. Otherwise, to avoid confusion, give it a different name such as validatePartialReadWithSomeData.
Taken. Refactored the names of non-test-entry methods.
💔 -1 overall
This message was automatically generated.
…llFileReads; fixed futureAwait API use; javadocs
AGGREGATED TEST RESULT: HNS-OAuth, HNS-SharedKey, NonHNS-SharedKey, AppendBlob-HNS-OAuth. Time taken: 48 mins 20 secs.
Hi @steveloughran, thank you very much for the suggestion; I have taken it and made the changes accordingly.
Thank you very much for the suggestions, requesting your kind review. Thank you a lot!
🎊 +1 overall
This message was automatically generated.
OK, all changes are looking good; I've just been rigorously reviewing the new test utils methods. Afraid they'll need javadocs.
If this were new code I'd say "embrace AssertJ", but as these methods already existed, it's not so important; I've just suggested some improvements in hardening and reporting.
Do that and this is good to merge, though I'll need to do a local build and test first.
}

public void verifyBeforeSeek(AbfsInputStream abfsInputStream) {
  assertEquals(0, abfsInputStream.getFCursor());
Use AssertJ, or at least add a description of what each field is being checked for, so that when there's a failure the error message is more meaningful.
Changed to AssertJ. Added descriptions.
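The point about failure messages can be illustrated with a plain-Java stand-in. AssertJ's describedAs() provides the same facility; this hypothetical helper just avoids the dependency, and the field name in main() is only an example:

```java
public class DescribedAssert {

    // An assertion that carries a description of WHAT is being checked,
    // so a failure message is meaningful on its own.
    public static void assertEquals(String description, long expected, long actual) {
        if (expected != actual) {
            throw new AssertionError(String.format(
                "%s: expected %d but was %d", description, expected, actual));
        }
    }

    public static void main(String[] args) {
        long fCursor = 0; // e.g. the stream's file cursor before any seek
        assertEquals("fCursor should be 0 before seek", 0, fCursor);
        System.out.println("ok");
    }
}
```

With AssertJ the equivalent reads roughly as assertThat(abfsInputStream.getFCursor()).describedAs("fCursor should be 0 before seek").isEqualTo(0).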
verifyAfterSeek(abfsInputStream, seekPos);
}

public void verifyBeforeSeek(AbfsInputStream abfsInputStream) {
Verify "what"? Verifying that the stream has not yet done any seek? Any read?
Refactored the method name to verifyAbfsInputStreamBaseStateBeforeSeek and added javadocs for the method.
private static final int SIZE_256_KB = 256 * ONE_KB;

private static final Integer[] FILE_SIZES = {
🎊 +1 overall
This message was automatically generated.
Hi @steveloughran, thank you a lot for the review. Have taken the comments, and have used AssertJ in the new test util class.
+1 pending the minor changes; we will get to see what surprises surface, but I am happy.
@@ -31,6 +32,8 @@
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;
import org.slf4j.Logger;
This shouldn't be in the same import block as the java.* imports.
Tip: you can set your IDE up to enforce these rules.
@@ -141,9 +145,11 @@ public static <T> List<T> awaitFuture(final Collection<Future<T>> collection)
  }
  return results;
} catch (InterruptedException e) {
  LOG.error("Execution of future interrupted ", e);
Let's make these debug() and let the caller handle the rest.
throw (InterruptedIOException) new InterruptedIOException(e.toString())
    .initCause(e);
} catch (ExecutionException e) {
  LOG.error("Execution of future failed with exception", e.getCause());
Log this at debug; handlers up the stack can choose what to do, as it may be harmless.
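The exception-translation pattern under discussion can be sketched as below. The names and the exact unwrapping details are illustrative, not the real FutureIO.awaitFuture: the key point is that failures are rethrown for callers to handle, with no error-level logging at this layer.

```java
import java.io.IOException;
import java.io.InterruptedIOException;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class AwaitFutureSketch {

    // Wait for a future, translating failures into IOExceptions.
    // No error logging here: callers up the stack decide how to report
    // the problem (per the review comment, at most debug at this level).
    public static <T> T awaitFuture(Future<T> future) throws IOException {
        try {
            return future.get();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw (InterruptedIOException)
                new InterruptedIOException(e.toString()).initCause(e);
        } catch (ExecutionException e) {
            Throwable cause = e.getCause();
            if (cause instanceof IOException) {
                throw (IOException) cause; // preserve the original IOException
            }
            throw new IOException(cause);
        }
    }

    public static void main(String[] args) throws Exception {
        ExecutorService pool = Executors.newSingleThreadExecutor();
        try {
            System.out.println(awaitFuture(pool.submit(() -> 42)));
        } finally {
            pool.shutdown();
        }
    }
}
```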
Thanks @steveloughran for the review. Have refactored the import order, and have refactored the logging as suggested.
🎊 +1 overall
This message was automatically generated.
+1. Merging to trunk; do a PR with this cherrypicked to 3.4.x and I will merge there too.
…readBufferSize (apache#6617) Contributed by Pranav Saxena
Thank you @steveloughran very much! Have opened a PR against branch-3.4: #6763. Thank you!
…readBufferSize (#6617) Contributed by Pranav Saxena
JIRA: https://issues.apache.org/jira/browse/HADOOP-19102
The method optimisedRead creates a buffer array of size readBufferSize. If footerReadBufferSize is greater than readBufferSize, ABFS will attempt to read more data than the buffer array can hold, which causes an exception.

Change: to avoid this, we keep footerBufferSize = min(readBufferSizeConfig, footerBufferSizeConfig).
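The clamping described above amounts to a one-line min(); the method and parameter names below are illustrative, not the actual fields in AbfsInputStream:

```java
public class FooterBufferClamp {
    static final int ONE_KB = 1024;

    // Never let the footer read buffer exceed the read buffer that backs
    // the actual array allocation, avoiding the out-of-bounds read.
    public static int effectiveFooterReadBufferSize(int readBufferSize,
            int footerReadBufferSizeConfig) {
        return Math.min(readBufferSize, footerReadBufferSizeConfig);
    }

    public static void main(String[] args) {
        // A 1 MB footer config gets clamped down to a 256 KB read buffer.
        System.out.println(
            effectiveFooterReadBufferSize(256 * ONE_KB, 1024 * ONE_KB));
    }
}
```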
Test change: ITestAbfsInputStreamReadFooter tests different scenarios with different combinations of fileSize and footerBufferReadSize. A readBufferSize dimension has been added to the test cases, so each case is now a combination of fileSize, readBufferSize, and footerBufferReadSize.

Also as part of this PR, the tests within ITestAbfsInputStreamReadFooter have been improved. Some tests cover multiple combinations, and previously a file was created for every combination, even though a new file is only needed per distinct fileSize.

The change: we spin up one thread per fileSize, and within each thread all the combinations for that particular fileSize run. The file is created once per fileSize, and the assertions for different fileSizes run in parallel, making use of the hardware's capability.
Improvement: on a 6-processor VM outside the Azure network, running all the tests of ITestAbfsInputStreamReadFooter took 8min47sec on trunk and 7min on the PR branch. (These were IDE runs, where each test method runs one after another, unlike the maven-surefire command used in the runTest script, which can run tests in parallel.)