NIFI-7830: Support large files in PutAzureDataLakeStorage #4556
public int available() {
    // com.azure.storage.common.Utility.convertStreamToByteBuffer() throws an exception
    // if there are more available bytes in the stream after reading the chunk
    return 0;
@MuazmaZ Do you happen to know why Utility.convertStreamToByteBuffer() throws an exception when available() > 0?
https://github.com/Azure/azure-sdk-for-java/blob/0345889402425191b7003e73b7b3d6ea3c0a5175/sdk/storage/azure-storage-common/src/main/java/com/azure/storage/common/Utility.java#L268
Due to this, it is not possible to process a longer input stream in portions / chunks.
As a workaround, I added a fake available() method that claims there is no more data in the input stream. This is not really nice, but it works.
Another option would be to read the chunks in a loop into a byte array on our side and pass a stream on the byte array to the Azure client lib. But I would rather avoid this extra copy and extra memory for the buffer.
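For reference, the extra-copy alternative described above might look roughly like the following. This is only a sketch using java.io: `ChunkUploader` and `copyInChunks` are hypothetical names standing in for the Azure client call and are not part of NiFi or the Azure SDK. Each chunk is wrapped in its own `ByteArrayInputStream`, so `available()` never reports more than the chunk itself, at the cost of one buffer of `chunkSize` bytes.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;

// Hypothetical sketch; none of these names come from the PR.
class ChunkedCopySketch {

    /** Hypothetical stand-in for the client call that uploads one chunk. */
    interface ChunkUploader {
        void upload(InputStream chunk, long offset, int length) throws IOException;
    }

    /**
     * Reads the source in chunks of at most chunkSize bytes and passes each
     * chunk to the uploader as a separate in-memory stream.
     * Returns the total number of bytes read.
     */
    static long copyInChunks(InputStream source, int chunkSize, ChunkUploader uploader)
            throws IOException {
        if (chunkSize <= 0) {
            throw new IllegalArgumentException("chunkSize must be positive");
        }
        byte[] buffer = new byte[chunkSize];
        long offset = 0;
        while (true) {
            // readNBytes (Java 9+) keeps reading until the buffer is full or EOF is reached
            int read = source.readNBytes(buffer, 0, chunkSize);
            if (read == 0) {
                break; // end of stream
            }
            // The wrapped stream exposes exactly 'read' bytes and nothing beyond the chunk
            uploader.upload(new ByteArrayInputStream(buffer, 0, read), offset, read);
            offset += read;
        }
        return offset;
    }

    public static void main(String[] args) throws IOException {
        byte[] data = new byte[10];
        long total = copyInChunks(new ByteArrayInputStream(data), 4,
                (chunk, offset, length) ->
                        System.out.println("chunk at " + offset + ", length " + length));
        System.out.println("total bytes: " + total);
    }
}
```

The buffer is reused between iterations, so this assumes the uploader has consumed the chunk before `upload` returns.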
@turcsanyip, the patch looks good to me, but maybe you could use BoundedInputStream from the Apache Commons IO library [1] instead of the workaround. What do you think?
[1] https://commons.apache.org/proper/commons-io/javadocs/api-2.5/org/apache/commons/io/input/BoundedInputStream.html
@adenes Thanks for the idea. BoundedInputStream works properly here.
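To illustrate the idea behind a bounded wrapper without pulling in a dependency, here is a minimal stand-in written against java.io only. It is not the Commons IO class and makes no claim about how that class is implemented; `BoundedViewSketch` is a hypothetical name. The wrapper exposes at most `maxBytes` of the underlying stream and reports nothing available once that limit is reached, so a consumer that checks `available()` after reading one chunk sees 0 while the source still holds the rest. Leaving the source open on `close()` is a choice made for this sketch so the next chunk can be read from it.

```java
import java.io.ByteArrayInputStream;
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;

// Hypothetical stand-in for illustration; not org.apache.commons.io.input.BoundedInputStream.
class BoundedViewSketch extends FilterInputStream {
    private long remaining;

    BoundedViewSketch(InputStream in, long maxBytes) {
        super(in);
        this.remaining = maxBytes;
    }

    @Override
    public int read() throws IOException {
        if (remaining <= 0) {
            return -1; // limit reached: behave like end of stream
        }
        int b = in.read();
        if (b >= 0) {
            remaining--;
        }
        return b;
    }

    @Override
    public int read(byte[] buf, int off, int len) throws IOException {
        if (len == 0) {
            return 0;
        }
        if (remaining <= 0) {
            return -1;
        }
        int n = in.read(buf, off, (int) Math.min(len, remaining));
        if (n > 0) {
            remaining -= n;
        }
        return n;
    }

    @Override
    public long skip(long n) throws IOException {
        long skipped = in.skip(Math.min(n, remaining));
        remaining -= skipped;
        return skipped;
    }

    @Override
    public int available() throws IOException {
        // Never report more than what is left inside the bound
        return (int) Math.min(in.available(), remaining);
    }

    @Override
    public boolean markSupported() {
        return false;
    }

    @Override
    public void close() {
        // Intentionally leaves the underlying stream open for the next chunk
    }

    public static void main(String[] args) throws IOException {
        InputStream source = new ByteArrayInputStream(new byte[10]);
        BoundedViewSketch chunk = new BoundedViewSketch(source, 4);
        int n = chunk.read(new byte[8], 0, 8);
        System.out.println("read " + n + ", chunk.available() = " + chunk.available()
                + ", source.available() = " + source.available());
    }
}
```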
@turcsanyip I am looking into this and I will respond by tomorrow based on the internal team's response.
@MuazmaZ Thanks. The BoundedInputStream approach is much better than my original workaround, so it is not so critical anymore. However, I'm still wondering why it is not possible to pass a longer stream to Utility.convertStreamToByteBuffer().
@@ -216,13 +216,15 @@ public void testFetchNonExistentFile() {
    testFailedFetch(fileSystemName, directory, filename, inputFlowFileContent, inputFlowFileContent, 404);
}

@Ignore("Takes some time, only recommended for manual testing.")
//@Ignore("Takes some time, only recommended for manual testing.")
Does it no longer take "some time"? :)
… it was ignored originally.
Checked and tested, LGTM. +1
Merged to main, thanks @turcsanyip and everyone who reviewed.
Signed-off-by: Pierre Villard <pierre.villard.fr@gmail.com> This closes apache#4556.
https://issues.apache.org/jira/browse/NIFI-7830
In order to streamline the review of the contribution we ask you
to ensure the following steps have been taken:
For all changes:
Is there a JIRA ticket associated with this PR? Is it referenced
in the commit message?
Does your PR title start with NIFI-XXXX where XXXX is the JIRA number you are trying to resolve? Pay particular attention to the hyphen "-" character.
Has your PR been rebased against the latest commit within the target branch (typically main)?
Is your initial contribution a single, squashed commit? Additional commits in response to PR reviewer feedback should be made on this branch and pushed to allow change tracking. Do not squash or use --force when pushing to allow for clean monitoring of changes.
For code changes:
mvn -Pcontrib-check clean install at the root nifi folder?
LICENSE file, including the main LICENSE file under nifi-assembly?
NOTICE file, including the main NOTICE file found under nifi-assembly?
.displayName in addition to .name (programmatic access) for each of the new properties?
For documentation related changes:
Note:
Please ensure that once the PR is submitted, you check GitHub Actions CI for build issues and submit an update to your PR as soon as possible.