-
Notifications
You must be signed in to change notification settings - Fork 2.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[NIFI-10792] Fixed bug to allow for processing files larger than 10MB… #7016
Conversation
…. Also refactored to reduce the cognitive complexity in the onTrigger method.
@exceptionfactory I did not end up including the unit test I had as it was a unit which tested with a 20MB file. I would have thought there should be a unit test to exercise the change I made. Please advise. |
Anecdotally, the NiFi mock framework will read the entire FF contents into memory (I discovered this in #6369) so I think you're correct in not including unit tests testing larger FFs. Here's my except from that PR
|
Thanks for the comment @mh013370, this issue is similar to the size limits for Tar files resolved in #6369. @dan-s1, Although unit tests are the preferred way to confirm expected behavior, unit tests are not optimal for testing these kinds of scenarios. For this particular situation, confirming existing functionality is good, and runtime verification is better than introducing large files or long-running tests into the repository. |
@exceptionfactory When you get a chance can you please look over this PR? I was hoping I could reuse the Excel part of this code for NIFI-11167. I noticed a drawback with fastexcel-reader that it
therefore I would like to use excel-streaming-reader as I did in this PR as I believes it preserves these things or at least a subset of them. |
Thanks @dan-s1, I plan to take a closer look at this pull request soon. The |
@exceptionfactory Thanks! That is a good point regarding the formatting. Though I would think you would want to have an option to preserve formatting for use with an Excel record writer. |
Good point @dan-s1, I'm not sure if the formatting information could be carried through to an Excel Writer, but it would be useful if it can be supported. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hey, I was able to verify that I was able to convert an excel spreadsheet > 10 MB successfully into CSV, I just have a few questions
...-poi-processors/src/main/java/org/apache/nifi/processors/poi/ConvertExcelToCSVProcessor.java
Show resolved
Hide resolved
...-poi-processors/src/main/java/org/apache/nifi/processors/poi/ConvertExcelToCSVProcessor.java
Show resolved
Hide resolved
@emiliosetiadarma @exceptionfactory I believe I addressed the issues brought up. Is there anything else needed? |
- Removed unnecessary graphics2d dependency
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for your work on this improvement @dan-s1, and thanks for the testing @emiliosetiadarma.
The code changes for streaming look good, existing unit tests pass and I was able to convert a 20 MB XLSX file at runtime with these changes.
I pushed a commit that adds the log4j-to-slf4j
dependency, which was something missing after recent Apache POI upgrades, supporting log routing from Log4j 2 to the NiFi standard SLF4J. I also removed the explicit dependency on graphics2d
, which has a holdover from earlier Apache POI versions and no longer used. Lastly, I reverted the reformatting of the properties and relationships, since it is easier to maintain the list of one per line versus wrapping lines.
With these changes, planning to merge following a successful automated build.
...-poi-processors/src/main/java/org/apache/nifi/processors/poi/ConvertExcelToCSVProcessor.java
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Automated builds look good. +1 merging
- Refactored to reduce the cognitive complexity in the onTrigger method - Added log4j-to-slf4j library and adjusted formatting - Removed unnecessary graphics2d dependency This closes #7016 Signed-off-by: David Handermann <exceptionfactory@apache.org>
- Refactored to reduce the cognitive complexity in the onTrigger method - Added log4j-to-slf4j library and adjusted formatting - Removed unnecessary graphics2d dependency This closes apache#7016 Signed-off-by: David Handermann <exceptionfactory@apache.org>
…eGroovyScript This closes #7016 Signed-off-by: Mike Thomsen <mthomsen@apache.org>
…eGroovyScript This closes #7016 Signed-off-by: Mike Thomsen <mthomsen@apache.org>
…. Also refactored to reduce the cognitive complexity in the onTrigger method.
Summary
NIFI-10792
Tracking
Please complete the following tracking steps prior to pull request creation.
Issue Tracking
Pull Request Tracking
NIFI-00000
NIFI-00000
Pull Request Formatting
main
branchVerification
Please indicate the verification steps performed prior to pull request creation.
Build
mvn clean install -P contrib-check
Licensing
LICENSE
andNOTICE
filesDocumentation