Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

MINIFICPP-1177 Improvements to the TailFile processor #791

Closed
wants to merge 14 commits into from

Conversation

fgerlits
Copy link
Contributor

@fgerlits fgerlits commented May 21, 2020

Thank you for submitting a contribution to Apache NiFi - MiNiFi C++.

Description of PR

Fixed some bugs and implemented missing features in the TailFile processor:

  • handle rotation of log files correctly -- instead of the dodgy logic of doing one file at a time in onTrigger and trying to keep track using the persisted state, we follow the same logic as NiFi: only look at rolled files if the input file got truncated, only look at new rotated files matching the pattern, and stream all new content in one onTrigger call
  • find and pick up new files in Multiple file mode
  • yield when no flow files were generated, and only then (used to fail in multiple and rotation cases)
  • include the delimiter at the end of the flow file, as this is what NiFi does and it would work better on Windows
  • make '\n' the default delimiter (currently the default is no delimiter)
  • support Recursive lookup in Multiple files mode
  • support Lookup frequency in Multiple files mode (we do a lookup in every onTrigger at the moment)
  • lots of minor cleanup, unit test fixes, and added more unit tests

Not done as part of this PR, as it was already very large; I have created the Jira MINIFICPP-1244 for this:

  • support NiFi's Initial Start Position setting (Beginning of Time | Beginning of File | Current Time); we always use Beginning of File at the moment

In order to streamline the review of the contribution we ask you
to ensure the following steps have been taken:

For all changes:

  • Is there a JIRA ticket associated with this PR? Is it referenced
    in the commit message?

  • Does your PR title start with MINIFICPP-XXXX where XXXX is the JIRA number you are trying to resolve? Pay particular attention to the hyphen "-" character.

  • Has your PR been rebased against the latest commit within the target branch (typically master)?

  • Is your initial contribution a single, squashed commit?

For code changes:

  • If adding new dependencies to the code, are these dependencies licensed in a way that is compatible for inclusion under ASF 2.0?
  • If applicable, have you updated the LICENSE file?
  • If applicable, have you updated the NOTICE file?

For documentation related changes:

  • Have you ensured that format looks appropriate for the output in which it is rendered?

Note:

Please ensure that once the PR is submitted, you check travis-ci for build issues and submit an update to your PR as soon as possible.

Copy link
Contributor

@arpadboda arpadboda left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed the processor code itself, will proceed to tests and changes in session.

extensions/standard-processors/processors/TailFile.cpp Outdated Show resolved Hide resolved
extensions/standard-processors/processors/TailFile.cpp Outdated Show resolved Hide resolved
extensions/standard-processors/processors/TailFile.cpp Outdated Show resolved Hide resolved
extensions/standard-processors/processors/TailFile.cpp Outdated Show resolved Hide resolved
extensions/standard-processors/processors/TailFile.cpp Outdated Show resolved Hide resolved
@fgerlits fgerlits force-pushed the MINIFICPP-1177-PR branch 6 times, most recently from 39a8fef to c98c5a2 Compare May 27, 2020 18:20
@fgerlits fgerlits force-pushed the MINIFICPP-1177-PR branch 2 times, most recently from febefb9 to 5ad851b Compare June 3, 2020 14:02
Copy link
Member

@szaszm szaszm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

1st round.

Later, I'm planning to:

  • come up with a solution to the CRC in ProcessSession problem
  • review tests in more detail
  • check backwards compatibility

extensions/standard-processors/processors/TailFile.cpp Outdated Show resolved Hide resolved
extensions/standard-processors/processors/TailFile.cpp Outdated Show resolved Hide resolved
extensions/standard-processors/processors/TailFile.cpp Outdated Show resolved Hide resolved
libminifi/src/utils/file/PathUtils.cpp Outdated Show resolved Hide resolved
libminifi/src/utils/file/PathUtils.cpp Outdated Show resolved Hide resolved
libminifi/src/utils/RegexUtils.cpp Show resolved Hide resolved
libminifi/src/utils/file/FileUtils.cpp Outdated Show resolved Hide resolved
libminifi/src/utils/file/FileUtils.cpp Outdated Show resolved Hide resolved
libminifi/test/unit/FileUtilsTests.cpp Outdated Show resolved Hide resolved
@fgerlits fgerlits force-pushed the MINIFICPP-1177-PR branch 4 times, most recently from a7e4ee7 to eee2028 Compare June 8, 2020 09:04
@fgerlits fgerlits force-pushed the MINIFICPP-1177-PR branch 2 times, most recently from ce2f71c to 43de540 Compare June 11, 2020 13:08
Copy link
Contributor

@arpadboda arpadboda left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added some minor comments, overall looks good to me, I like the amount of tests added.

extensions/standard-processors/processors/TailFile.cpp Outdated Show resolved Hide resolved
extensions/standard-processors/processors/TailFile.cpp Outdated Show resolved Hide resolved
extensions/standard-processors/processors/TailFile.cpp Outdated Show resolved Hide resolved
@arpadboda
Copy link
Contributor

@fgerlits : one of the tests failed in CI:
https://travis-ci.org/github/apache/nifi-minifi-cpp/jobs/697214111

Copy link
Member

@szaszm szaszm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Noticed a few format specifier issues, but looks good in general.

libminifi/test/unit/StringUtilsTests.cpp Show resolved Hide resolved
extensions/standard-processors/processors/TailFile.cpp Outdated Show resolved Hide resolved
extensions/standard-processors/processors/TailFile.cpp Outdated Show resolved Hide resolved
extensions/standard-processors/processors/TailFile.cpp Outdated Show resolved Hide resolved
extensions/standard-processors/processors/TailFile.cpp Outdated Show resolved Hide resolved
extensions/standard-processors/processors/TailFile.cpp Outdated Show resolved Hide resolved
extensions/standard-processors/processors/TailFile.cpp Outdated Show resolved Hide resolved
extensions/standard-processors/processors/TailFile.cpp Outdated Show resolved Hide resolved
@fgerlits fgerlits force-pushed the MINIFICPP-1177-PR branch 7 times, most recently from ffdac00 to 7306e43 Compare June 17, 2020 15:49
Copy link
Member

@szaszm szaszm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My tests showed incorrect handling of rotation while the minifi c++ agent is stopped. Instead of finding the file that we were last working with, under a potentially different name, I observed an iteration over all rotated files and consuming everything again. Could you please check that?

@fgerlits fgerlits marked this pull request as draft June 24, 2020 15:14
@fgerlits
Copy link
Contributor Author

My tests showed incorrect handling of rotation while the minifi c++ agent is stopped. Instead of finding the file that we were last working with, under a potentially different name, I observed an iteration over all rotated files and consuming everything again. Could you please check that?

cca8f37 fixes this bug according to my tests -- please redo your tests, too

The rollover detection logic is still primitive: it only checks for rollover if the current file length is less than the last read position. So if

  • minifi is stopped after reading 100 bytes from test.log
  • 900 more bytes are written to test.log, and it is renamed to test.log.1
  • 300 bytes are written to a new test.log file
  • and minifi is restarted, then it will happily continue from byte 101 in the new file, and skip the 900 + 100 bytes written in between.

I am planning to fix that in a separate pull request, if that is OK.

@fgerlits fgerlits marked this pull request as ready for review June 25, 2020 16:16
Copy link
Member

@szaszm szaszm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My manual tests showed no other problems with the fixed version. I didn't test the not implemented feature of smarter rollover handling while the agent is shut down.

Thanks for the numerous improvements and your patience. This is a really significant improvement to the minifi c++ agent: a major use case of the project is log collection, and TailFile is consuming one of the most common log formats. 👍

Also implemented some API methods where the inherited version was incorrect.
Instead of the dodgy logic of doing one file at a time in onTrigger
and trying to keep track using the persisted state, we follow the same
logic as NiFi now: only look at rolled files if the input file got
truncated, only look at new rotated files matching the pattern, and
stream all new content in one onTrigger call.

Also fixed some bugs in Multiple file mode and Yield detection logic,
as well as corrected some unit tests (when we checked the log output,
we previously included output from earlier stages of the test).

Support some previously unsupported NiFi properties:
- Recursive lookup
- Lookup frequency
- Rolling Filename Pattern

Change the default delimiter to \n (the previous default was to always
read to the current end of the file, even if it is in the middle of a
line).  Also include the delimiter in the flow file.  (As NiFi does.)
- make parseDelimiter more readable
- use type{foo} instead of of (type)foo
- move replaceOne to StringUtils
- take the argument of globToRegex by value, since we always copy it
- do not indent namespace contents
- better workaround for the Windows minmax macro issue
- remove an unnecessary explicit specifier
- use chrono types instead of integers for time points
- move mtime out of the TailState object
These used to fail in a small percentage of cases due to timing issues.
I have reordered some operations and added some sleeps, and the failures
seem to be gone now.
This version was only used in TailFile, it is no longer used, and
it does too many things.
Also fix variable naming, remove C-style casts, change virtual to override,
add missing overrides, and fix a bug in TestRepository::DeSerialize.
- use gsl::narrow instead of an ad-hoc replacement
- use the correct format specifiers in log lines
- make some type conversions explicit
- change the buffer type from uint8_t to char
- add a trailing underscore to private field names
- reserve space in a vector to minimize allocations
- remove some unnecessary #includes
- fix a typo in a test
Also remove some unused arguments and clean up some comments in the header.
If a flow file was added in this session and immediately removed, then it is
not in the FlowFileRepository, yet, so there is no need to Delete it.  (And
trying to delete it will cause an error later in FlowFileRepository::flush.)
@fgerlits fgerlits force-pushed the MINIFICPP-1177-PR branch 2 times, most recently from 87ee9ed to e4146cc Compare June 30, 2020 15:10
Copy link
Contributor

@arpadboda arpadboda left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thanks! Will merge soon.

@arpadboda arpadboda closed this in 6ef4dd5 Jun 30, 2020
@fgerlits fgerlits deleted the MINIFICPP-1177-PR branch July 13, 2020 15:03
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
4 participants