Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

MINIFICPP-621 Nanofi Tailfile example #590

Closed
wants to merge 1 commit into from

Conversation

msharee9
Copy link
Contributor

Thank you for submitting a contribution to Apache NiFi - MiNiFi C++.

In order to streamline the review of the contribution we ask you
to ensure the following steps have been taken:

For all changes:

  • Is there a JIRA ticket associated with this PR? Is it referenced
    in the commit message?

  • Does your PR title start with MINIFICPP-XXXX where XXXX is the JIRA number you are trying to resolve? Pay particular attention to the hyphen "-" character.

  • Has your PR been rebased against the latest commit within the target branch (typically master)?

  • Is your initial contribution a single, squashed commit?

For code changes:

  • If adding new dependencies to the code, are these dependencies licensed in a way that is compatible for inclusion under ASF 2.0?
  • If applicable, have you updated the LICENSE file?
  • If applicable, have you updated the NOTICE file?

For documentation related changes:

  • Have you ensured that format looks appropriate for the output in which it is rendered?

Note:

Please ensure that once the PR is submitted, you check travis-ci for build issues and submit an update to your PR as soon as possible.

libminifi/include/ResourceClaim.h Show resolved Hide resolved
nanofi/examples/tail_file.c Outdated Show resolved Hide resolved
nanofi/examples/tail_file.c Outdated Show resolved Hide resolved
nanofi/examples/tail_file.c Outdated Show resolved Hide resolved
nanofi/examples/tail_file.c Outdated Show resolved Hide resolved
nanofi/examples/tail_file.c Outdated Show resolved Hide resolved
nanofi/examples/tail_file.c Outdated Show resolved Hide resolved
nanofi/examples/tail_file.c Outdated Show resolved Hide resolved
nanofi/examples/tail_file.c Outdated Show resolved Hide resolved
nanofi/examples/tail_file.c Outdated Show resolved Hide resolved
nanofi/examples/tail_file.c Outdated Show resolved Hide resolved
nanofi/examples/tail_file.c Outdated Show resolved Hide resolved
nanofi/examples/tail_file.c Outdated Show resolved Hide resolved
nanofi/examples/tail_file.c Outdated Show resolved Hide resolved
nanofi/examples/tail_file.c Outdated Show resolved Hide resolved
nanofi/examples/tail_file.c Outdated Show resolved Hide resolved
nanofi/examples/tail_file.c Outdated Show resolved Hide resolved
@msharee9 msharee9 force-pushed the master branch 3 times, most recently from 98e2d96 to 335ae96 Compare June 13, 2019 04:06
nanofi/examples/tail_file.c Outdated Show resolved Hide resolved
nanofi/examples/tail_file.c Outdated Show resolved Hide resolved
nanofi/src/core/string_utils.c Outdated Show resolved Hide resolved
nanofi/src/core/string_utils.c Outdated Show resolved Hide resolved
nanofi/include/core/string_utils.h Outdated Show resolved Hide resolved
nanofi/include/core/cstructs.h Show resolved Hide resolved
nanofi/src/core/string_utils.c Outdated Show resolved Hide resolved
nanofi/src/core/string_utils.c Outdated Show resolved Hide resolved
@arpadboda
Copy link
Contributor

Please also fix the build error in Travis:

/home/travis/build/apache/nifi-minifi-cpp/nanofi/src/core/string_utils.c: In function ‘tokenize_string’:
/home/travis/build/apache/nifi-minifi-cpp/nanofi/src/core/string_utils.c:66:5: error: ‘for’ loop initial declarations are only allowed in C99 mode
     for (int i = 0; i < num_strings; ++i) {
     ^
/home/travis/build/apache/nifi-minifi-cpp/nanofi/src/core/string_utils.c:66:5: note: use option -std=c99 or -std=gnu99 to compile your code
make[2]: *** [nanofi/CMakeFiles/nanofi.dir/src/core/string_utils.c.o] Error 1

Copy link
Contributor

@bakaid bakaid left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please also create some tests if possible.

nanofi/src/core/string_utils.c Show resolved Hide resolved
nanofi/src/core/string_utils.c Outdated Show resolved Hide resolved
nanofi/include/core/cstructs.h Outdated Show resolved Hide resolved
nanofi/examples/tail_file.c Outdated Show resolved Hide resolved
nanofi/examples/tail_file.c Outdated Show resolved Hide resolved
@msharee9 msharee9 force-pushed the master branch 2 times, most recently from ea46206 to e0837ca Compare June 13, 2019 20:45
@msharee9 msharee9 force-pushed the master branch 2 times, most recently from 69e2e9a to b681714 Compare June 17, 2019 02:20

endif()

if (WIN32)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Doesn't belong to the scope of this ticket, but copy-pasting this flag setting to all cmake files is something we should stop doing.

transmit_flow_files(instance);
free_flow_file_list(&ff_list);
free_flowfile(new_ff);
sleep(intrvl);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nitpicking, but sleeping after receiving a signal is not so nice. :)

nanofi/src/core/flowfiles.c Show resolved Hide resolved
nanofi/src/core/string_utils.c Outdated Show resolved Hide resolved
Copy link
Contributor

@arpadboda arpadboda left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please add tests for newly created utils!

nanofi/src/core/string_utils.c Show resolved Hide resolved
nanofi/src/core/file_utils.c Outdated Show resolved Hide resolved
@msharee9 msharee9 force-pushed the master branch 2 times, most recently from be869dc to 80ddd46 Compare June 18, 2019 05:08

// Although there is no delimiter within the string that is equal to 4096 bytes or longer
// tail_file creates a flow file for each subsequent 4096 byte chunk. It leaves the last chunk
// if it is smaller than 4096 bytes and not delimited
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Both NiFi (https://github.com/apache/nifi/blob/ea9b0db2f620526c8dd0db595cf8b44c3ef835be/nifi-nar-bundles/nifi-standard-bundle/nifi-standard-processors/src/main/java/org/apache/nifi/processors/standard/TailFile.java#L832) and MiNiFi (https://github.com/apache/nifi-minifi-cpp/blob/master/libminifi/src/core/ProcessSession.cpp#L560) implement the TailFile processor to only ingest data (create flow files) when it encounters a delimiter. It is also in the specification of the processor (https://nifi.apache.org/docs/nifi-docs/components/org.apache.nifi/nifi-standard-nar/1.9.2/org.apache.nifi.processors.standard.TailFile/).
I think that in nanofi we can deviate from the behaviour of NiFi or MiNiFi when it is necessary or beneficial.
In this case, however, I do not see how this behaviour (creating flow files from the data if it is larger than or equal to 4096 bytes even if there was no delimiter) would be beneficial. On the contrary, I think it is an unexpected behaviour, and makes the processor really hard to use, as it can potentially chunk your data into arbitrary pieces which are hard to parse and manage later on.
I am willing to be convinced that this behaviour is better (I do not think it is necessary), but so far I have not seen a reason to believe that.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am sorry, I have misread how MiNiFi's TailFile works. It has two modes: with delimiter and without delimiter. If a delimiter is not set, it will read until EOF and create a flow file from all of that data.
If a delimiter is set, it will indeed only try to read at max 4095+1 bytes to find a delimiter.
Unfortunately that implementation is wrong: std::ifstream's getline will set the failbit, if it doesn't find the delimiter in the given size. We will check for this failbit, and stop reading the file without creating any FlowFiles.
And because we only set TailFile's current offset in the file based on the generated FlowFiles, this means that if TailFile encounters a line longer than 4095 bytes without a delimiter, it will stop working indefinitely (this was a bug in a previous version of this PR too).
I have just tried this out, and it indeed fails. :(
This means that MiNiFi's TailFile will have to be fixed: at a minimum, to fix the implementation, but I propose to change it to match NiFi's behaviour, and only ingest data when a delimiter is found.

Copy link
Contributor

@arpadboda arpadboda left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please fix the line that most probably generates warning.

Otherwise okay with some topics left to follow-up.


char * end = NULL;

while ((end = strchr(begin, delim))) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Doesn't it produce warning?
const char * is assigned to char * (end should be const, too)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

strchr returns const char*, so I agree, end should be const too.

}
}

std::string join_strings(const std::vector<std::string>& strings, const std::string& token) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

#585

One more reason to move that PR forward (which provides a much more generic and efficient way for doing the same).

Copy link
Contributor

@bakaid bakaid left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Blocking tickets for the discussed refactor has been created, this is a review with that in mind.
Please see the inline comments.
As the unit tests test the tokenizer, and not the entire ECU, there are some untested functionality (creating and sending flow files, restoring offset). Have you run manual tests to test these?

char file_path[4096];
char delimiter[2];

if (get_property(ctx, "file_path", file_path, 50) != 0) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The buffer has been extended to 4096 bytes but here the size has been left 50.
Perhaps using sizeof(file_path) (and sizeof(delimiter) below) as the sizes would be less error-prone.

printf("Delimiter not specified or it is empty\n");
return;
}
char delim = '\0';
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This entire section until checking whether the delimiter is an escape sequence is unnecessary.
You have already returned if strlen(delimiter) == 0, so strlen(delimiter) > 0 is true.
As strlen(delimiter) > 0 is true, the first character of the string cannot be a terminating null character, so the if (delim == '\0') check is unnecessary as well.
The entire section can be substituted with char delim = delimiter[0];

case '\\':
delim = '\\';
break;
default:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Returning an error here instead of silently using \ as the delimiter would be better, but I can live with it.

struct stat stats;
int ret = stat(file, &stats);

errno = 0;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As you set errno here to 0, strerror(errno) will not return with stat's error.

while (head) {
free_flowfile(head->ff_record);
free(head);
head = head->next;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a use-after-free.
You have to save the next pointer before freeing the head structure.

free(node->data);
}
free(node);
memset(node, 0, sizeof(struct token_node));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a use-after-free, you can't null a struct after you have freed it.

token_node * head = tokens->head;
int i = 0;
while (head) {
printf("Token %d : %s Length = %d\n", i, head->data, strlen(head->data));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

strlen returns a size_t, the printf format specifier for it should be %zu, not %d.


char * end = NULL;

while ((end = strchr(begin, delim))) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

strchr returns const char*, so I agree, end should be const too.

Copy link
Contributor

@phrocker phrocker left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Based on comments and recent change to resolve a potential warning we can merge this with follow-ons. Thanks @msharee9

Anyone else feel free to create blockers if need be for any follow ons.

@bakaid
Copy link
Contributor

bakaid commented Jun 18, 2019

@phrocker please see my last review. There are - among other things - multiple use-after-free issues in the code. It should not be merged until those are fixed.

@phrocker
Copy link
Contributor

@msharee9 per @bakaid there are some use after frees that probably should be addressed. Thanks!

token_node * head = tokens->head;
int i = 0;
while (head) {
printf("Token %d : %s Length = %lu\n", i, head->data, strlen(head->data));
Copy link
Contributor

@bakaid bakaid Jun 18, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

l is the length sub-specifier for long, strlen returns a size_t, and the sub-specifier for size_t is z, so this should be %zu not %lu

@@ -103,4 +105,5 @@ endif(ENABLE_PYTHON AND NOT STATIC_BUILD)

if (NOT DISABLE_CURL)
add_subdirectory(examples)
add_subdirectory(ecu)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seeing this gets me super excited!

test_lists_equal(&tokens, std::vector<std::string>{std::move(s1), std::move(s2)});
}

TEST_CASE("Test tail file having more than 4096 bytes with delimiter and second chunk less than 4096", "[testTailFileWithDelimitedString]") {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Taking another casual look to make it look like I'm useful. I like the breadth of tests! Nice!

@msharee9 msharee9 force-pushed the master branch 3 times, most recently from 7709fda to e9f123d Compare June 18, 2019 16:25
void on_trigger_callback(processor_session * ps, processor_context * ctx) {

char file_path[4096];
char delimiter[2];
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This includes the terminating null character, meaning that only 1 character of the parameter will be read, so escape sequences won't work.

Copy link
Contributor

@arpadboda arpadboda left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are multiple things to follow-up here, but let's merge for now as this is getting quite big already, hence approved.

Let me apologize for not mentioning earlier, but we have UT containers added as a third party ( https://github.com/troydhanson/uthash/tree/master/src ), so we don't need to implement containers like linked list.
It's fine to be cleaned up in a follow-up, just a notice to avoid implementing collections in the future.

@phrocker
Copy link
Contributor

@bakaid Are you good with merging?

@bakaid
Copy link
Contributor

bakaid commented Jun 24, 2019

@phrocker I think the serious issues preventing merge has been fixed, but if I gather correctly @msharee9 is still working on this PR.

@asfgit asfgit closed this in e7ac27d Jun 24, 2019
nghiaxlee pushed a commit to nghiaxlee/nifi-minifi-cpp that referenced this pull request Jul 8, 2019
This closes apache#590.

Signed-off-by: Marc Parisi <phrocker@apache.org>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
4 participants