NIFI-1705 - Adding AttributesToCSV processor #2711

joetrite · 2018-05-16T19:31:31Z

New pull request replacing
#1589

commit 9c31e45 Author: Joe Trite <joetrite@gmail.com> Date: Mon Mar 13 07:55:19 2017 -0400 NIFI-1705 Adding AttributesToCSV processor commit 5e9afa9 Merge: 3177eb1 74cbfc4 Author: Joe Trite <joetrite@gmail.com> Date: Sat Mar 4 08:12:39 2017 -0500 Merge remote-tracking branch 'origin/master' # Conflicts: # nifi-nar-bundles/nifi-standard-bundle/nifi-standard-processors/src/main/java/org/apache/nifi/processors/standard/ScanAttribute.java commit 3177eb1 Author: Joe Trite <joetrite@gmail.com> Date: Sat Mar 4 08:09:06 2017 -0500 NIFI-3497 Converted properties to use displayName, changed validator on demarcator property and created a method for duplicate code. commit 74cbfc4 Merge: a974f78 f8cad0f Author: Joe Trite <joetrite@gmail.com> Date: Sat Mar 4 07:47:46 2017 -0500 Merge branch 'master' into master commit a974f78 Author: Joe Trite <joetrite@gmail.com> Date: Sat Mar 4 07:43:02 2017 -0500 NIFI-3497 Converted properties to use displayName, changed validator on demarcator property and created a method for duplicate code. commit 1bfaef8 Merge: 65ed46d 89ec68d Author: Joe Trite <joetrite@gmail.com> Date: Fri Mar 3 08:01:59 2017 -0500 Merge branch 'master' of https://github.com/joetrite/nifi commit 65ed46d Author: Joe Trite <joetrite@gmail.com> Date: Fri Feb 24 18:09:36 2017 -0500 NIFI-3497 - fixing Pcontrib issues commit c5d52cf Author: Joe Trite <joetrite@gmail.com> Date: Thu Feb 23 10:19:01 2017 -0500 NIFI-3497 - excluding test files Adding new test data files to exclude list. commit b195934 Author: Joe Trite <joetrite@gmail.com> Date: Wed Feb 22 16:48:10 2017 -0500 NIFI-3497 - New dictionary files for test Adding new dictionary files to support metadata dictionary option. commit e296268 Author: Joe Trite <joetrite@gmail.com> Date: Wed Feb 22 16:46:13 2017 -0500 NIFI-3497 test cases for metadata updates Adding test cases to support metadata option update. commit de7e348 Author: Joe Trite <joetrite@gmail.com> Date: Wed Feb 22 16:36:08 2017 -0500 NIFI-3497 - Added metadata option Added optional to post additional metadata as new attributed if a match is found in the dictionary. commit 89ec68d Author: Joe Trite <joetrite@gmail.com> Date: Fri Feb 24 18:09:36 2017 -0500 NIFI-3497 - fixing Pcontrib issues commit d714260 Author: Joe Trite <joetrite@gmail.com> Date: Thu Feb 23 10:19:01 2017 -0500 NIFI-3497 - excluding test files Adding new test data files to exclude list. commit a7a7b6a Author: Joe Trite <joetrite@gmail.com> Date: Wed Feb 22 16:48:10 2017 -0500 NIFI-3497 - New dictionary files for test Adding new dictionary files to support metadata dictionary option. commit 8eb54a5 Author: Joe Trite <joetrite@gmail.com> Date: Wed Feb 22 16:46:13 2017 -0500 NIFI-3497 test cases for metadata updates Adding test cases to support metadata option update. commit f52e1f2 Author: Joe Trite <joetrite@gmail.com> Date: Wed Feb 22 16:36:08 2017 -0500 NIFI-3497 - Added metadata option Added optional to post additional metadata as new attributed if a match is found in the dictionary.

MikeThomsen

Overall looking solid. Needs a little cleanup.

MikeThomsen · 2018-05-23T12:33:28Z

...i-standard-processors/src/main/java/org/apache/nifi/processors/standard/AttributesToCSV.java

+            .build();
+
+    public static final PropertyDescriptor NULL_VALUE_FOR_EMPTY_STRING = new PropertyDescriptor.Builder()
+            .name(("null-value"))


You have an extra set of parenthesis here.

MikeThomsen · 2018-05-23T12:34:19Z

...i-standard-processors/src/main/java/org/apache/nifi/processors/standard/AttributesToCSV.java

+                    "string will be placed in the CSV")
+            .required(true)
+            .allowableValues("true", "false")
+            .defaultValue("false")


Should add StandardValidators.BOOLEAN_VALIDATOR here.

MikeThomsen · 2018-05-23T12:34:36Z

...i-standard-processors/src/main/java/org/apache/nifi/processors/standard/AttributesToCSV.java

+                    "contained in every FlowFile, should be included in the final CSV value generated.  The Attribute List property " +
+                    "overrides this setting.")
+            .required(true)
+            .allowableValues("true", "false")


Should add StandardValidators.BOOLEAN_VALIDATOR

MikeThomsen · 2018-05-23T12:43:49Z

...i-standard-processors/src/main/java/org/apache/nifi/processors/standard/AttributesToCSV.java

+            if (ats != null) {
+                for (String str : ats) {
+                    String trim = str.trim();
+                        result.add(trim);


Extra indentation.

MikeThomsen · 2018-05-23T12:48:54Z

...andard-processors/src/test/java/org/apache/nifi/processors/standard/TestAttributesToCSV.java

+        testRunner.setProperty(AttributesToCSV.NULL_VALUE_FOR_EMPTY_STRING, "false");
+
+
+        ProcessSession session = testRunner.getProcessSessionFactory().createSession();


You don't need this. You can just do this:

Map<String, String> attrs = new HashMap<String, String>(){{ put("x", "1"); put("y", "2"); }}; testRunner.enqueue("some-data", attrs);

MikeThomsen · 2018-05-23T12:49:36Z

...andard-processors/src/test/java/org/apache/nifi/processors/standard/TestAttributesToCSV.java

+        testRunner.setProperty(AttributesToCSV.NULL_VALUE_FOR_EMPTY_STRING, "false");
+
+        ProcessSession session = testRunner.getProcessSessionFactory().createSession();
+        FlowFile ff = session.create();


See other comment about this. You can wipe out these references and use the cleaner map-driven approach.

ottobackwards · 2018-05-23T14:36:26Z

...i-standard-processors/src/main/java/org/apache/nifi/processors/standard/AttributesToCSV.java

+            .addValidator(StandardValidators.NON_EMPTY_VALIDATOR)
+            .expressionLanguageSupported(ExpressionLanguageScope.FLOWFILE_ATTRIBUTES)
+            .build();
+


I don't know if destination is the name that should be used here. It almost always is use with relationships and connections.

I know there are other processors that do attributes -or- flow file output, I just can't find one right now. But the name here should be similar to that.
@MikeThomsen do you know of one?

I copied the AttributesToJSON code so i just inherited these names. I'll change it to whatever you guys recommend.

You know what, I am wrong, we are consistent in using it inconsistently, please disregard. Sorry for the noise.

ottobackwards · 2018-05-23T14:40:44Z

...i-standard-processors/src/main/java/org/apache/nifi/processors/standard/AttributesToCSV.java

+            .displayName("Attribute List")
+            .description("Comma separated list of attributes to be included in the resulting CSV. If this value " +
+                    "is left empty then all existing Attributes will be included. This list of attributes is " +
+                    "case sensitive and does not support attribute names that contain commas. If an attribute specified in the list is not found it will be emitted " +


why not use http://commons.apache.org/proper/commons-lang/javadocs/api-release/org/apache/commons/lang3/StringEscapeUtils.html as you do with values for the attributes and headers?

will do, thanks for the feedback.

bdesert · 2018-05-23T15:34:46Z

@joetrite , I would like to mention it under the current PR - there was a discussion about having "record-aware" output. This will need to be addressed, but I believe is not part of this PR, as there are use cases when having schema will limit capabilities of this processor. I.e. when list of attributes is empty, it is expected to get all flow file attributes on output, but the list is unknown, so a schema cannot be pre-defined.

@mattyb149 , as alternative, we can issue enhancement jira for AttributeToRecord to address schema-specific extracts.

bdesert · 2018-05-23T15:35:55Z

@joetrite , while AttributesToJSON has regex support for attributes, new AttributesToCSV doesn't have it - is there a reason for not keeping consistency?

bdesert

Added some java related questions/comments

bdesert · 2018-05-23T15:36:58Z

...i-standard-processors/src/main/java/org/apache/nifi/processors/standard/AttributesToCSV.java

+        "escaped with double quotes.  Any double quote characters in the attribute value are escaped with " +
+        "another double quote.  If the attribute value does not contain a comma, newline or double quote, then the " +
+        "attribute value is returned unchanged.")
+@WritesAttribute(attribute = "CSVAttributes", description = "CSV representation of Attributes")


In case a destination is 'content', CSVAttributes won't be written at all (neither null nor empty string). We could mention that in description.

bdesert · 2018-05-23T15:37:16Z

...i-standard-processors/src/main/java/org/apache/nifi/processors/standard/AttributesToCSV.java

+                    "and the 'Include Core Attributes' property is false, the core attribute will be included. The attribute list " +
+                    "ALWAYS wins.")
+            .required(false)
+            .addValidator(StandardValidators.NON_EMPTY_VALIDATOR)


Since the attribute supports EL, can we use NON_EMPTY_EL_VALIDATOR?

bdesert · 2018-05-23T15:37:44Z

...i-standard-processors/src/main/java/org/apache/nifi/processors/standard/AttributesToCSV.java

+                    "or written in the flowfile content. Writing to flowfile content will overwrite any " +
+                    "existing flowfile content.")
+            .required(true)
+            .allowableValues(OUTPUT_NEW_ATTRIBUTE, OUTPUT_OVERWRITE_CONTENT)


Use AllowableValue Class - it's more flexible than just a string. So, on line 91 for description of "destination" you don't need to put complex explanation, instead - add description for each allowable value separately.

bdesert · 2018-05-23T15:38:17Z

...i-standard-processors/src/main/java/org/apache/nifi/processors/standard/AttributesToCSV.java

+            .build();
+
+    public static final PropertyDescriptor INCLUDE_CORE_ATTRIBUTES = new PropertyDescriptor.Builder()
+            .name("include-core-userSpecifiedAttributes")


a name of the property suggests 'userSpecifiedAttributes', which is confusing in conjunction with prefix 'include-core-'. IMO should be just 'include-core-attributes'

bdesert · 2018-05-23T15:41:56Z

...i-standard-processors/src/main/java/org/apache/nifi/processors/standard/AttributesToCSV.java

+
+    private List<PropertyDescriptor> properties;
+    private Set<Relationship> relationships;
+    private volatile Set<String> userSpecifiedAttributes;


This variable isn't being reused as object level variable. Also, you reset it every time in onTrigger, and pass as param to other methods. IMO it's better to keep it as local variable in onTrigger method.

bdesert · 2018-05-23T15:44:50Z

...i-standard-processors/src/main/java/org/apache/nifi/processors/standard/AttributesToCSV.java

+            String[] ats = StringUtils.split(attributeList, OUTPUT_SEPARATOR);
+            if (ats != null) {
+                for (String str : ats) {
+                    String trim = str.trim();


There is no point to define local var, since it's not being reused. You can just have result.add(str.trim());

bdesert · 2018-05-23T15:48:44Z

...i-standard-processors/src/main/java/org/apache/nifi/processors/standard/AttributesToCSV.java

+
+    private Set<String> attributeListStringToSet(String attributeList) {
+        //take the user specified attribute list string and convert to list of strings.
+        Set<String> result = new HashSet<>();


By using HashSet you will ignore order of provided attributes, so you won't have control on CSV structure. If you change it to LinkedHashSet - you'll honor the order.

bdesert · 2018-05-23T15:49:59Z

...i-standard-processors/src/main/java/org/apache/nifi/processors/standard/AttributesToCSV.java

+        Map<String, String> result;
+        if (!userSpecifiedAttributes.isEmpty()) {
+            //the user gave a list of attributes
+            result = new HashMap<>(userSpecifiedAttributes.size());


By using HashMap you will ignore order of provided attributes, so you won't have control on CSV structure. If you change it to LinkedHashMap - you'll honor the order. Should come along with a change for HashSet (commented below)

bdesert · 2018-05-23T15:54:07Z

...i-standard-processors/src/main/java/org/apache/nifi/processors/standard/AttributesToCSV.java

+            for (String coreAttribute : coreAttributes) {
+                //make sure this coreAttribute is applicable to this flowfile.
+                String val = ff.getAttribute(coreAttribute);
+                if(ff.getAttributes().containsKey(coreAttribute)) {


since ff.getAttributes() creates new object every time you call it, it's better to call it once outside the loop (before line 186).

bdesert · 2018-05-23T15:58:09Z

...i-standard-processors/src/main/java/org/apache/nifi/processors/standard/AttributesToCSV.java

+        final int atrListSize = atrList.values().size() -1;
+        final StringBuilder sb = new StringBuilder();
+        for (final String val : atrList.values()) {
+            sb.append(StringEscapeUtils.escapeCsv(val));


It looks like StringEscapeUtils class is deprecated. It is suggested to use org.apache.commons.text.StringEscapeUtils instead of lang3

joetrite · 2018-05-23T20:43:26Z

@bdesert it looks like the regex support was added to AttributesToJSON after copied the code. I'll add regex support to this processor and check if anything else has changed.

bdesert

Looks good. Test suite has been updated to test regex based attribute list. I'll build and test it later tonight on local nifi, and will update with results.

bdesert · 2018-05-24T22:36:52Z

...i-standard-processors/src/main/java/org/apache/nifi/processors/standard/AttributesToCSV.java

+            return;
+        }
+        if(context.getProperty(ATTRIBUTES_REGEX).isSet()) {
+            pattern = Pattern.compile(context.getProperty(ATTRIBUTES_REGEX).evaluateAttributeExpressions(original).getValue());


Local usage of variable only. Should not be defined as field/instance member (same as attributeList).

bdesert · 2018-05-25T05:42:08Z

@joetrite , @MikeThomsen ,
So, I tested the processor. It works OK.
But I've got a question.
In JSON it's easy - json structure will include both attribute name and attribute value. in CSV - there will be only values on the output. Don't you think it would be useful to add a header, or attribute with avro-like generated schema? I agree that specified attributes will be in provided order, but if regex is used - then user won't be able to relate a value to a name. The same is about core attributes - since we add only existing not empty.
If we decide to add avro-like schema, we gonna have problems - attribute names could be non Avro-safe.

joetrite · 2018-05-25T16:03:15Z

@bdesert good catch, we should add the option to push the header to attribute or content based on the selection. The header should also be csv format so that when the header and data are in the output file they will have the same schema.

bdesert

Please add test cases to cover core attributes in schema details attribute/content (both new test cases have only regex coverage)

bdesert · 2018-05-25T19:40:43Z

...i-standard-processors/src/main/java/org/apache/nifi/processors/standard/AttributesToCSV.java

 public class AttributesToCSV extends AbstractProcessor {
-    private static final String OUTPUT_ATTRIBUTE_NAME = "CSVAttributes";
+    private static final String DATA_ATTRIBUTE_NAME = "CSVData";
+    private static final String SCHEMA_ATTRIBUTE_NAME = "CSVSchema";


This attribute should be declared @ WritesAttribute(....)

Also, there are style-check errors related to "if" one-line statement.

bdesert · 2018-05-28T23:33:43Z

@joetrite , no more comments from my side, thanks for addressing code review comments!
@MikeThomsen , LGTM. Could you please give a second look on it now? I believe it's ready, but there are issues with build, which seems not related to this PR.

mattyb149 · 2018-06-07T02:37:53Z

...i-standard-processors/src/main/java/org/apache/nifi/processors/standard/AttributesToCSV.java

+                    "or written in the flowfile content.")
+            .required(true)
+            .allowableValues(OUTPUT_NEW_ATTRIBUTE, OUTPUT_OVERWRITE_CONTENT)
+            .defaultValue(OUTPUT_NEW_ATTRIBUTE.getDisplayName())


The "write to attribute" is a cool concept, I thought it might also be beneficial in other areas (including UpdateAttribute), so I wrote up https://issues.apache.org/jira/browse/NIFI-5276 to cover it (and will soon have a PR). Were it included, would it preclude you from adding such a property to this processor? If not then no worries, just wanted to mention it, in case the other Jira would take care of that use case instead.

Nevermind about NIFI-5276, turns out you can already do this with allMatchingAttributes. Since that is the case, do you think it is still useful to have the same type of feature here, or should it just support flowfile content as a destination? The latter would make it a little easier to configure the processor as there wouldn't be a Destination property, and the associated doc would be unnecessary as well

we are writting two attributes, the CSVSchema(attribute names) and the CSVData(attribute values). it looks like allMatchingAttributes will take care of the CSVData attribute nicely, but I don't see any option to handle the schema attribute. I don't see anything like allMatchingAttributeNames, which would take care of the CSVSchema part.

Good point. It would be nice to have something in EL that matches on attribute names and returns them instead of evaluating them to their values, I'll take a look at what that would entail. In the meantime it's cool to have that here, running some final tests but things are looking good :)

mattyb149 · 2018-06-07T19:03:53Z

+1 LGTM, thanks all for the reviews and @joetrite for the new feature! I ran unit tests and tried various combinations of options on a live NiFi, everything looks good, merging to master.

MikeThomsen reviewed May 23, 2018

View reviewed changes

ottobackwards reviewed May 23, 2018

View reviewed changes

bdesert reviewed May 23, 2018

View reviewed changes

NIFI-1705 Adding AttributesToCSV processor.

df4444c

bdesert reviewed May 24, 2018

View reviewed changes

NIFI-1705 Adding AttributesToCSV processor.

4e645d1

bdesert requested changes May 25, 2018

View reviewed changes

NIFI-1705 Adding AttributesToCSV processor.

de9c4b8

mattyb149 reviewed Jun 7, 2018

View reviewed changes

asfgit closed this in f536487 Jun 7, 2018

		testRunner.setProperty(AttributesToCSV.NULL_VALUE_FOR_EMPTY_STRING, "false");


		ProcessSession session = testRunner.getProcessSessionFactory().createSession();

NIFI-1705 - Adding AttributesToCSV processor #2711

NIFI-1705 - Adding AttributesToCSV processor #2711

Conversation

joetrite commented May 16, 2018

MikeThomsen left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

bdesert commented May 23, 2018

bdesert commented May 23, 2018

bdesert left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

joetrite commented May 23, 2018

bdesert left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

bdesert commented May 25, 2018

joetrite commented May 25, 2018

bdesert left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

bdesert commented May 28, 2018

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

mattyb149 commented Jun 7, 2018