NIFI-2565: add Grok parser #1108

selim-namsi · 2016-10-05T17:18:32Z

Thank you for submitting a contribution to Apache NiFi.

In order to streamline the review of the contribution we ask you
to ensure the following steps have been taken:

For all changes:

Is there a JIRA ticket associated with this PR? Is it referenced in the commit message?
Does your PR title start with NIFI-XXXX where XXXX is the JIRA number you are trying to resolve? Pay particular attention to the hyphen "-" character.
Has your PR been rebased against the latest commit within the target branch (typically master)?
Is your initial contribution a single, squashed commit?

For code changes:

Have you ensured that the full suite of tests is executed via mvn -Pcontrib-check clean install at the root nifi folder?
Have you written or updated unit tests to verify your changes?
If adding new dependencies to the code, are these dependencies licensed in a way that is compatible for inclusion under ASF 2.0?
If applicable, have you updated the LICENSE file, including the main LICENSE file under nifi-assembly?
If applicable, have you updated the NOTICE file, including the main NOTICE file found under nifi-assembly?
If adding new Properties, have you added .displayName in addition to .name (programmatic access) for each of the new properties?

For documentation related changes:

Have you ensured that format looks appropriate for the output in which it is rendered?

Note:

Please ensure that once the PR is submitted, you check travis-ci for build issues and submit an update to your PR as soon as possible.

trixpan · 2016-10-05T22:26:59Z

@selim-namsi thanks for putting this together. I was coding this processor myself, but I'm happy to review yours.

A few comments:

Your code is failing -Pcontrib-check. Can you please fix this?
Ideally, I believe this processor should allow the user to choose between content replacement (replacing the original log line with its JSON representation) and adding attributes (what you already did). This should give an idea of what I mean: NIFI-2341 - Introduce ParseCEF processor #785
Please don't hard-code the patterns; instead, let the user configure the pattern files.
The java-grok version has some bugs; you may want to upgrade it (this is the reason I haven't submitted the code previously... 😃)

Thank you again, looking forward to your modifications.

selim-namsi · 2016-10-06T22:53:12Z

@trixpan Thank you for all this useful feedback, I'll start working on these modifications.
As for the patterns, I hard-coded them because I was thinking of shipping some useful patterns by default while also letting the user add custom patterns. What do you think?

Thanks

trixpan · 2016-10-06T22:58:06Z

You can certainly include a file with the default patterns; however, you should not hard-code them. By hard-coding them you prevent the user from optimising for speed by removing unused patterns from the pattern files. (I realise they could remove them from the default packaged patterns, but that would mean changing packaged files after an install, something you should always avoid.)
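To illustrate the point about keeping pattern definitions in a user-editable file, here is a minimal sketch of reading such a file into a name-to-regex map. It assumes the common Grok layout of one `NAME regex` definition per line with `#` comments; `PatternFileParser` and its method are hypothetical names for illustration and are not part of NiFi or java-grok.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical helper for illustration; not part of NiFi or java-grok. */
final class PatternFileParser {

    /** Parses "NAME regex" lines, skipping blank lines and '#' comments. */
    static Map<String, String> parse(List<String> lines) {
        Map<String, String> patterns = new LinkedHashMap<>();
        for (String line : lines) {
            String trimmed = line.trim();
            if (trimmed.isEmpty() || trimmed.startsWith("#")) {
                continue;
            }
            // The name runs up to the first whitespace; the remainder is the regex.
            String[] parts = trimmed.split("\\s+", 2);
            if (parts.length == 2) {
                patterns.put(parts[0], parts[1]);
            }
        }
        return patterns;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList(
                "# sample definitions",
                "INT [+-]?[0-9]+",
                "WORD \\b\\w+\\b");
        System.out.println(parse(lines).keySet());
    }
}
```

In a processor, the lines could come from `Files.readAllLines(Paths.get(configuredPath))`, where `configuredPath` is whatever the user set for the pattern file property. Trimming the file then shrinks the set of loaded patterns without touching packaged code.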

trixpan · 2016-10-06T23:01:09Z

By the way, most of these modifications are already available here:

https://github.com/trixpan/nifi/commits/NIFI-2565

trixpan · 2016-10-08T21:58:27Z

Please rebase this PR so it only includes your changes.

Cheers

selim-namsi · 2016-10-08T23:00:48Z

@trixpan could you please tell me how to rebase a PR? I rebased my branch but couldn't find how to rebase the PR.
Sorry for the inconvenience, this is my first PR.

Cheers

selim-namsi · 2016-10-10T23:22:46Z

@trixpan I figured out how to rebase the pull request :)

Cheers

markap14 · 2016-10-13T14:16:17Z

@selim-namsi Thanks for contributing this! I have actually been very interested in using NiFi to do some log parsing but hadn't really dug in very much to understand the best way to go about it. This looks like it could be very powerful!

Before we get this merged into the codebase, though, it looks like there is some work that needs to be done to the PR. The concern stems, I think, from you not yet being overly familiar with the API, as there are empty @ReadsAttributes, @WritesAttributes annotations, etc. But the great news is that the NiFi community tends to be very inclusive and will help to get everything in great shape!

One thing that I did notice is that you updated the Licensing information, which is one of the most commonly overlooked issues. So very glad that's there. I'll leave some inline feedback on things that I notice, but very much looking forward to this getting in!

markap14

I did a fairly thorough code review here. There are a lot of comments, but most of them should be very easy to address. Looking forward to seeing a revised version and merging things in!

markap14 · 2016-10-13T14:18:00Z

...e/nifi-standard-processors/src/main/java/org/apache/nifi/processors/standard/GrokParser.java

+@SeeAlso({})
+@ReadsAttributes({@ReadsAttribute(attribute="", description="")})
+@WritesAttributes({@WritesAttribute(attribute="", description="")})
+public class GrokParser extends AbstractProcessor {


The naming convention that we try to stick with for Processors is <Verb><Noun>. While this may be counter-intuitive for a Java Developer, it results in making the flow much more readable for users. So we should consider ParseLog or GrokLog.

@markap14 - this is not a parser but an extractor (Grok is a hyper regex), so I suggest the name ExtractGrok (after ExtractText).

To this point, similar nomenclature has been used in other places:

https://github.com/DhruvKumar/nifi-grok-processor-bundle/tree/master/nifi-grok-processors/src/main/java/dhruv/nifi/processors

So the pattern of naming is 'Verb Subject'. It appears the point of this processor, from a user's point of view (not the developer's), is to evaluate Grok expressions against flow file content, either to replace that content with the result or to update a flow file attribute with that result. If that is the case, we could take the approach of 'EvaluateGrok' or 'GrokEvaluateText'; 'ExtractGrok' is also fair game, I think.

markap14 · 2016-10-13T14:18:33Z

...e/nifi-standard-processors/src/main/java/org/apache/nifi/processors/standard/GrokParser.java

+
+
+@Tags({"Grok Processor"})
+@CapabilityDescription("Use Grok expression ,a la logstash, to parse data.")


We should probably expand on this a bit more. Many users will not know what Grok is.

@selim-namsi Perhaps you could use the description from my WIP?

"Evaluates one or more Grok Expressions against the content of a FlowFile, adding the results as attributes or replacing the content of the FlowFile with a JSON notation of the matched content"
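For readers who do not know Grok, the following sketch shows roughly what "evaluating a Grok expression" means: each `%{NAME:field}` reference is replaced by the regex defined for `NAME`, wrapped in a named capturing group, and the captures become the extracted fields. This is a simplified stand-in built on `java.util.regex`, not java-grok's implementation; `MiniGrok`, the two pattern definitions, and the sample expression are assumptions made for illustration.

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Simplified illustration of Grok-style matching; not the java-grok API. */
final class MiniGrok {

    // %{NAME:field}: group 1 is the pattern name, group 2 the field name.
    // Field names are limited to letters and digits, as Java named groups require.
    private static final Pattern REFERENCE =
            Pattern.compile("%\\{(\\w+):([A-Za-z][A-Za-z0-9]*)\\}");

    /** Expands each %{NAME:field} into a named group (?<field>regex). */
    static String expand(String expression, Map<String, String> definitions) {
        Matcher m = REFERENCE.matcher(expression);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String regex = definitions.get(m.group(1));
            if (regex == null) {
                throw new IllegalArgumentException("Unknown pattern: " + m.group(1));
            }
            String group = "(?<" + m.group(2) + ">" + regex + ")";
            m.appendReplacement(out, Matcher.quoteReplacement(group));
        }
        m.appendTail(out);
        return out.toString();
    }

    /** Returns field name to captured text, or an empty map when nothing matches. */
    static Map<String, String> extract(String expression,
                                       Map<String, String> definitions,
                                       String input) {
        Map<String, String> fields = new LinkedHashMap<>();
        Matcher match = Pattern.compile(expand(expression, definitions)).matcher(input);
        if (match.find()) {
            Matcher ref = REFERENCE.matcher(expression);
            while (ref.find()) {
                fields.put(ref.group(2), match.group(ref.group(2)));
            }
        }
        return fields;
    }

    public static void main(String[] args) {
        Map<String, String> defs = new HashMap<>();
        defs.put("IP", "\\d{1,3}(?:\\.\\d{1,3}){3}");
        defs.put("INT", "\\d+");
        String line = "64.242.88.10 - - \"GET /index.html HTTP/1.1\" 401 12846";
        System.out.println(extract("%{IP:client} .* %{INT:status} %{INT:bytes}", defs, line));
    }
}
```

Real Grok also supports nested pattern references and references without a field name, which this sketch leaves out.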

markap14 · 2016-10-13T14:20:13Z

...e/nifi-standard-processors/src/main/java/org/apache/nifi/processors/standard/GrokParser.java

+import java.util.Collections;
+
+
+@Tags({"Grok Processor"})


We should consider several more tags: grok, log, text, parse, delimit, extract

markap14 · 2016-10-13T14:20:39Z

...e/nifi-standard-processors/src/main/java/org/apache/nifi/processors/standard/GrokParser.java

+
+@Tags({"Grok Processor"})
+@CapabilityDescription("Use Grok expression ,a la logstash, to parse data.")
+@SeeAlso({})


No need for the @SeeAlso, @ReadsAttributes, and @WritesAttributes annotations if they are not being used.

markap14 · 2016-10-13T14:21:29Z

...e/nifi-standard-processors/src/main/java/org/apache/nifi/processors/standard/GrokParser.java

+    public static final PropertyDescriptor GROK_PATTERN_FILE = new PropertyDescriptor
+            .Builder().name("Grok Pattern file")
+            .description("Grok Pattern file definition")
+            .required(false)


If this is not required, how will the processor work if not set?

@markap14 In the first version of the code I was loading a few useful pattern files by default, so the user's custom pattern file was not required. After removing that part I forgot to update the required attribute. I'll fix it.

markap14 · 2016-10-13T14:29:34Z

...fi-standard-processors/src/test/java/org/apache/nifi/processors/standard/TestGrokParser.java

+import java.nio.file.Paths;
+
+/**
+ * Created by snamsi on 05/10/16.


We should not have usernames here, as Git will provide this information for us.

markap14 · 2016-10-13T14:30:40Z

...fi-standard-processors/src/test/java/org/apache/nifi/processors/standard/TestGrokParser.java

+
+    }
+
+    @Test(expected = java.lang.AssertionError.class)


Rather than expecting an AssertionError, we should avoid calling testRunner.run() and instead just use testRunner.assertNotValid()

markap14 · 2016-10-13T14:30:44Z

...fi-standard-processors/src/test/java/org/apache/nifi/processors/standard/TestGrokParser.java

+    }
+
+
+    @Test(expected = java.lang.AssertionError.class)


Rather than expecting an AssertionError, we should avoid calling testRunner.run() and instead just use testRunner.assertNotValid()

For this method, "testGrokParserWithBadGrokExpression", although the processor is throwing GrokException, when I use assertNotValid the test fails with the following message: "java.lang.AssertionError: Processor appears to be valid but expected it to be invalid".

After adding a custom validator, the test passes using assertNotValid.
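The exchange above comes down to when a bad expression is detected: if it is only compiled while processing, the configuration still looks valid and assertNotValid() fails, whereas a validator that compiles it up front reports the problem at configuration time. A minimal sketch of that idea follows. It uses `java.util.regex` as a stand-in for Grok compilation, and `ExpressionValidator` is a hypothetical name; in the processor the same check would sit in a custom validator attached to the property.

```java
import java.util.Optional;
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

/** Hypothetical configuration-time check; java.util.regex stands in for Grok. */
final class ExpressionValidator {

    /** Returns an explanation when the expression is unusable, or empty when it is valid. */
    static Optional<String> validate(String expression) {
        if (expression == null || expression.trim().isEmpty()) {
            return Optional.of("expression must not be empty");
        }
        try {
            // Compiling up front surfaces syntax errors before any data is processed.
            Pattern.compile(expression);
            return Optional.empty();
        } catch (PatternSyntaxException e) {
            return Optional.of("not a valid expression: " + e.getDescription());
        }
    }

    public static void main(String[] args) {
        System.out.println(validate("(\\d+").isPresent());
        System.out.println(validate("\\d+").isPresent());
    }
}
```

With a check like this wired into the property, a test can assert invalid configuration directly instead of running the processor and expecting an AssertionError.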

markap14 · 2016-10-13T14:31:47Z

...les/nifi-standard-bundle/nifi-standard-processors/src/test/resources/TestGrokParser/patterns

@@ -0,0 +1,108 @@
+# Forked from https://github.com/elasticsearch/logstash/tree/v1.4.0/patterns


We have to ensure that we have proper licensing for these test files.

markap14 · 2016-10-13T14:32:35Z

...s/nifi-standard-bundle/nifi-standard-processors/src/test/resources/TestGrokParser/apache.log

@@ -0,0 +1 @@
+64.242.88.10 - - [07/Mar/2004:16:05:49 -0800] "GET /twiki/bin/edit/Main/Double_bounce_sender?topicparent=Main.ConfigurationVariables HTTP/1.1" 401 12846


We have to ensure that we have proper licensing for these test files. This one may be one that you created yourself? If not, we need to ensure that its license is properly accounted for - or just mock out a new one.

selim-namsi · 2016-10-17T20:00:33Z

@markap14 Thanks for all these suggestions, I'll update the code ASAP and push the changes!

trixpan · 2016-10-21T13:56:48Z

@selim-namsi - good work. We are getting there.

selim-namsi · 2016-10-25T23:47:20Z

@trixpan @markap14 I pushed the new changes.
Could you please check them?

Thanks!

mattyb149 · 2016-10-26T14:54:16Z

Looks like there are new merge issues, do you mind rebasing against the latest master?

trixpan · 2016-11-14T22:39:44Z

...ifi-processor-utilities/src/main/java/org/apache/nifi/processor/util/StandardValidators.java

@@ -26,6 +26,8 @@
 import java.util.concurrent.TimeUnit;
 import java.util.regex.Pattern;

+import oi.thekraken.grok.api.Grok;


This validation routine should not be added to standard validators in order to avoid importing grok into the standard validator

Agreed. I would just pull it into the processor class itself.

joewitt · 2016-11-15T02:27:58Z

nifi-assembly/LICENSE

This whole license section can be removed. This is the assembly license, which is to cover all binary artifacts and source in the build of nifi itself. The dependency on java-grok is binary only (not source) and is ASLv2, so nothing needs to be in this license for it. There should be an entry for it in the NOTICE, similar to the many ASLv2 examples in there. The only thing that needs mentioning then is the copyright line from the project's license file https://github.com/thekrakken/java-grok/blob/master/LICENSE. Also, the same change made to nifi-assembly/NOTICE will need to be made in the NOTICE of the nifi-standard-nar as well.

Lots of words above but the short version is "No license change needed. Just add a small section to the nar NOTICE and assembly NOTICE to reflect this ASLv2 dependency specifically because it has a copyright reference in the license."

trixpan · 2016-11-15T09:38:51Z

nifi-commons/nifi-processor-utilities/pom.xml

+        <dependency>
+            <groupId>io.thekraken</groupId>
+            <artifactId>grok</artifactId>
+            <version>0.1.4</version>


A new version has been released today and contains important fixes (reduced dependencies, better feature parity with logstash, etc). May I suggest we upgrade the dependency?

selim-namsi · 2016-12-14T02:17:03Z

@trixpan @joewitt Sorry for the long delay. I applied the changes that you suggested: adding the custom validator to the processor class, using the new version of grok, and removing the license section from the assembly license.

trixpan

@selim-namsi thanks for putting this together. I have left some feedback, but the points are minor, so I don't think they should block the merge.

@joewitt @markap14 anything I may be missing?

trixpan · 2017-01-16T14:10:28Z

.../nifi-standard-processors/src/main/java/org/apache/nifi/processors/standard/ExtractGrok.java

+import java.util.concurrent.TimeUnit;
+
+
+@Tags({"Grok Processor", "grok", "log", "text", "parse", "delimit", "extract"})


"Grok Processor" looks a bit out of place but should not prevent merge.

trixpan · 2017-01-16T14:11:32Z

.../nifi-standard-processors/src/main/java/org/apache/nifi/processors/standard/ExtractGrok.java

+        "adding the results as attributes or replacing the content of the FlowFile with a JSON " +
+        "notation of the matched content")
+@WritesAttributes({
+        @WritesAttribute(attribute = "grok.XXX", description = "Each of the Grok identifier that is matched in the flowfile will be added as an attribute, prefixed with \"grok.\" For example," +


Isn't this applicable only when using flowfile-attribute as the destination?
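The question above turns on the two output modes discussed in this PR: captures written as `grok.`-prefixed attributes, or captures serialized to JSON that replaces the content. A small sketch of the difference is below. `DestinationSketch` and its method names are hypothetical, and the JSON writer escapes only backslashes and double quotes, so it is an illustration rather than a general-purpose serializer.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.stream.Collectors;

/** Hypothetical illustration of the two destinations; not the processor's code. */
final class DestinationSketch {

    /** Attribute destination: every capture becomes an attribute prefixed with "grok.". */
    static Map<String, String> toAttributes(Map<String, String> captures) {
        Map<String, String> attributes = new LinkedHashMap<>();
        captures.forEach((name, value) -> attributes.put("grok." + name, value));
        return attributes;
    }

    /** Content destination: captures become one flat JSON object, with no attributes added. */
    static String toJson(Map<String, String> captures) {
        return captures.entrySet().stream()
                .map(e -> quote(e.getKey()) + ":" + quote(e.getValue()))
                .collect(Collectors.joining(",", "{", "}"));
    }

    // Minimal escaping only (backslash and double quote); control characters are not handled.
    private static String quote(String s) {
        return "\"" + s.replace("\\", "\\\\").replace("\"", "\\\"") + "\"";
    }

    public static void main(String[] args) {
        Map<String, String> captures = new LinkedHashMap<>();
        captures.put("status", "401");
        captures.put("verb", "GET");
        System.out.println(toAttributes(captures));
        System.out.println(toJson(captures));
    }
}
```

Seen this way, a `grok.XXX` entry in @WritesAttributes describes only the first mode, which is the point the review comment raises.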

trixpan · 2017-02-06T15:04:22Z

@joewitt when you have time, can you have a final look at this one? LGTM and I am happy to address the two remaining cosmetic comments as part of a separate PR.

This closes apache#1108. Signed-off-by: Andre F de Miranda <trixpan@users.noreply.github.com>

selim-namsi closed this Oct 8, 2016

selim-namsi reopened this Oct 8, 2016

selim-namsi force-pushed the nifi-2565 branch from 5cb2a31 to 632df23 on October 10, 2016 22:44

selim-namsi force-pushed the nifi-2565 branch from 632df23 to 94f8a96 on October 10, 2016 23:35

markap14 requested changes Oct 13, 2016

selim-namsi force-pushed the nifi-2565 branch from e7a2833 to 44e6b64 on October 25, 2016 23:41

selim-namsi force-pushed the nifi-2565 branch 3 times, most recently from 4a4d733 to 3227372 on October 27, 2016 01:27

selim-namsi force-pushed the nifi-2565 branch from 341243b to 7c0f6a8 on November 14, 2016 21:52

trixpan suggested changes Nov 14, 2016

joewitt reviewed Nov 15, 2016

trixpan reviewed Nov 15, 2016

selim-namsi force-pushed the nifi-2565 branch 5 times, most recently from 5eccbc5 to 291e855 on December 14, 2016 02:12

nifi-2565: add Grok parser

c09d2cc

selim-namsi force-pushed the nifi-2565 branch from 291e855 to c09d2cc on December 14, 2016 02:15

trixpan reviewed Jan 16, 2017

asfgit closed this in 2ef7c15 Feb 14, 2017

aperepel pushed a commit to aperepel/nifi that referenced this pull request Mar 29, 2017

NIFI-2565: add Grok parser

6ec70dd

This closes apache#1108. Signed-off-by: Andre F de Miranda <trixpan@users.noreply.github.com>


