MINIFICPP-1177 Improvements to the TailFile processor #791
Reviewed the processor code itself, will proceed to tests and changes in session.
1st round.
Later, I'm planning to:
- come up with a solution to the CRC in ProcessSession problem
- review tests in more detail
- check backwards compatibility
Added some minor comments, overall looks good to me, I like the amount of tests added.
@fgerlits: one of the tests failed in CI:
Noticed a few format specifier issues, but looks good in general.
My tests showed incorrect handling of rotation while the minifi c++ agent is stopped. Instead of finding the file that we were last working with, under a potentially different name, the processor iterated over all rotated files and consumed everything again. Could you please check that?
cca8f37 fixes this bug according to my tests -- please redo your tests, too.
The rollover detection logic is still primitive: it only checks for rollover if the current file length is less than the last read position. So if
I am planning to fix that in a separate pull request, if that is OK.
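The length-based rollover check described in this comment can be sketched roughly as follows. This is only an illustration: `looksRolledOver` and its parameter names are hypothetical and are not taken from the TailFile source.

```cpp
#include <cstdint>

// Hypothetical helper, not the actual TailFile code.
// Reports a rollover (or truncation) only when the file is now shorter
// than the position that had already been read.
bool looksRolledOver(std::uint64_t current_file_size, std::uint64_t last_read_position) {
  return current_file_size < last_read_position;
}
```

A check of this shape returns false whenever the current file is at least as long as the last read position, whatever happened to the file in the meantime.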
My manual tests showed no other problems with the fixed version. I didn't test smarter rollover handling while the agent is shut down, since that feature is not implemented yet.
Thanks for the numerous improvements and your patience. This is a really significant improvement to the minifi c++ agent: a major use case of the project is log collection, and TailFile is consuming one of the most common log formats. 👍
Also implemented some API methods where the inherited version was incorrect.
Instead of the dodgy logic of doing one file at a time in onTrigger and trying to keep track using the persisted state, we follow the same logic as NiFi now: only look at rolled files if the input file got truncated, only look at new rotated files matching the pattern, and stream all new content in one onTrigger call.
Also fixed some bugs in Multiple file mode and Yield detection logic, as well as corrected some unit tests (when we checked the log output, we previously included output from earlier stages of the test).
Support some previously unsupported NiFi properties:
- Recursive lookup
- Lookup frequency
- Rolling Filename Pattern
Change the default delimiter to \n (the previous default was to always read to the current end of the file, even if it is in the middle of a line). Also include the delimiter in the flow file, as NiFi does.
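To illustrate the delimiter change, here is a minimal sketch. It assumes that only content up to and including the last delimiter is consumed and that a trailing partial line is left for a later read. `completeLinesLength` is a hypothetical name, not a function from the processor.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper, not the actual TailFile code.
// Returns the number of bytes of `new_content` that end in a delimiter,
// i.e. everything up to and including the last delimiter.
// Returns 0 if no delimiter has been written yet.
std::size_t completeLinesLength(const std::string& new_content, char delimiter = '\n') {
  const auto last = new_content.rfind(delimiter);
  return last == std::string::npos ? 0 : last + 1;
}
```

For example, for the content `"a\nb\nc"` the helper returns 4: the first two lines (delimiters included) are ready to be emitted and the trailing `c` stays unread.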
- make parseDelimiter more readable
- use type{foo} instead of (type)foo
- move replaceOne to StringUtils
- take the argument of globToRegex by value, since we always copy it
- do not indent namespace contents
- better workaround for the Windows minmax macro issue
- remove an unnecessary explicit specifier
- use chrono types instead of integers for time points
- move mtime out of the TailState object
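As an illustration of the "chrono types instead of integers" item, the sketch below compares time points with a configured interval without any manual unit conversion. The function `lookupIsDue` and its parameters are hypothetical and only loosely modelled on the Lookup frequency property; the real code may look quite different.

```cpp
#include <chrono>

using Clock = std::chrono::steady_clock;

// Hypothetical helper: decides whether a periodic lookup should run again.
// The subtraction yields a duration, and the comparison converts units
// automatically, so no raw integer arithmetic is needed.
bool lookupIsDue(Clock::time_point last_lookup,
                 Clock::time_point now,
                 std::chrono::milliseconds lookup_frequency) {
  return now - last_lookup >= lookup_frequency;
}
```

With integer timestamps the caller would have to remember whether each value is in seconds, milliseconds or nanoseconds; with chrono types a mismatch is either converted correctly or rejected at compile time.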
These used to fail in a small percentage of cases due to timing issues. I have reordered some operations and added some sleeps, and the failures seem to be gone now.
This version was only used in TailFile, it is no longer used, and it does too many things.
Also fix variable naming, remove C-style casts, change virtual to override, add missing overrides, and fix a bug in TestRepository::DeSerialize.
- use gsl::narrow instead of an ad-hoc replacement
- use the correct format specifiers in log lines
- make some type conversions explicit
- change the buffer type from uint8_t to char
- add a trailing underscore to private field names
- reserve space in a vector to minimize allocations
- remove some unnecessary #includes
- fix a typo in a test
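For readers unfamiliar with `gsl::narrow`: it performs a narrowing cast and throws if the value does not survive the conversion. The stand-in below, `checkedNarrow`, is a hypothetical helper written only so that the sketch compiles without the GSL; real code would call `gsl::narrow` directly, which is the point of the first item above.

```cpp
#include <cstdint>
#include <stdexcept>
#include <type_traits>

// Illustrative stand-in for gsl::narrow (an assumption for this sketch,
// not the GSL implementation): cast, then verify that nothing was lost.
template <typename To, typename From>
To checkedNarrow(From value) {
  const To result = static_cast<To>(value);
  // The round trip must give back the original value...
  const bool round_trip_ok = static_cast<From>(result) == value;
  // ...and a signed/unsigned conversion must not flip the sign.
  const bool sign_ok = std::is_signed<To>::value == std::is_signed<From>::value ||
                       ((result < To{}) == (value < From{}));
  if (!round_trip_ok || !sign_ok) {
    throw std::runtime_error("narrowing conversion changed the value");
  }
  return result;
}
```

Here `checkedNarrow<std::uint8_t>(200)` succeeds, while `checkedNarrow<std::uint8_t>(300)` and `checkedNarrow<std::uint32_t>(-1)` throw instead of silently wrapping around.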
Also remove some unused arguments and clean up some comments in the header.
If a flow file was added in this session and immediately removed, then it is not in the FlowFileRepository, yet, so there is no need to Delete it. (And trying to delete it will cause an error later in FlowFileRepository::flush.)
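The idea in this commit message can be sketched with a toy model. `ToySession` and all of its members are hypothetical; this is not the MiNiFi `ProcessSession` API, only an illustration of skipping the repository delete for flow files that were never persisted.

```cpp
#include <set>
#include <string>
#include <vector>

// Toy model with hypothetical names, not the real ProcessSession.
class ToySession {
 public:
  // Records a flow file created in this session (not yet persisted).
  void add(const std::string& id) { added_.insert(id); }

  // Removes a flow file. If it was created in this session, it never
  // reached the repository, so it is simply forgotten; otherwise it is
  // queued for deletion from the repository.
  void remove(const std::string& id) {
    if (added_.erase(id) == 0) {
      pending_deletes_.push_back(id);
    }
  }

  // Ids that would need a repository delete when the session commits.
  const std::vector<std::string>& pendingDeletes() const { return pending_deletes_; }

 private:
  std::set<std::string> added_;
  std::vector<std::string> pending_deletes_;
};
```

In this model, adding and then removing `"new"` leaves `pendingDeletes()` empty, while removing a pre-existing `"old"` flow file queues exactly one delete.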
LGTM, thanks! Will merge soon.
Thank you for submitting a contribution to Apache NiFi - MiNiFi C++.
Description of PR
Fixed some bugs and implemented missing features in the TailFile processor:
Not done as part of this PR, as it was already very large; I have created the Jira MINIFICPP-1244 for this:
In order to streamline the review of the contribution we ask you
to ensure the following steps have been taken:
For all changes:
Is there a JIRA ticket associated with this PR? Is it referenced
in the commit message?
Does your PR title start with MINIFICPP-XXXX where XXXX is the JIRA number you are trying to resolve? Pay particular attention to the hyphen "-" character.
Has your PR been rebased against the latest commit within the target branch (typically master)?
Is your initial contribution a single, squashed commit?
For code changes:
For documentation related changes:
Note:
Please ensure that once the PR is submitted, you check travis-ci for build issues and submit an update to your PR as soon as possible.