MINIFICPP-681 - Add content hash processor #445

arpadboda · 2018-11-20T12:11:03Z

Thank you for submitting a contribution to Apache NiFi - MiNiFi C++.

In order to streamline the review of the contribution we ask you
to ensure the following steps have been taken:

For all changes:

Is there a JIRA ticket associated with this PR? Is it referenced
in the commit message?
Does your PR title start with MINIFICPP-XXXX where XXXX is the JIRA number you are trying to resolve? Pay particular attention to the hyphen "-" character.
Has your PR been rebased against the latest commit within the target branch (typically master)?
Is your initial contribution a single, squashed commit?

For code changes:

If adding new dependencies to the code, are these dependencies licensed in a way that is compatible for inclusion under ASF 2.0?
If applicable, have you updated the LICENSE file?
If applicable, have you updated the NOTICE file?

For documentation related changes:

Have you ensured that format looks appropriate for the output in which it is rendered?

Note:

Please ensure that once the PR is submitted, you check travis-ci for build issues and submit an update to your PR as soon as possible.

phrocker · 2018-11-21T16:31:14Z

taking a look.

PROCESSORS.md

libminifi/include/processors/ContentHash.h

libminifi/src/processors/ContentHash.cpp

phrocker · 2018-11-21T16:39:59Z

libminifi/include/processors/ContentHash.h

+    ret_val.first = digestToString(digest, SHA_DIGEST_LENGTH);
+    return ret_val;
+  }
+


Can these functions be combined to reduce duplication?

Honestly, I wanted to but failed: the digest, the init and final calls, and the digest length constant all differ.
The header file is pure C; some template magic could easily reduce the code duplication here, but I don't think we should add that here. Maybe to a util file, but definitely not to a processor.
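For reference, the "template magic" mentioned above could look roughly like the sketch below. It is not the code from this PR. The trait structs, `hashBytes`, and the toy FNV-1a and XOR digests are all assumptions chosen so the example compiles without OpenSSL; in a real util file each trait would presumably wrap the matching OpenSSL context type, init/update/final calls, and digest length constant.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>

// Hypothetical trait structs. Each one bundles what differs between the
// hash functions: context type, init/update/finish calls, digest length.
// FNV-1a and a XOR checksum are toy stand-ins, NOT MD5 or SHA.
struct Fnv32Traits {
  using Context = std::uint32_t;
  static constexpr std::size_t kDigestLength = 4;
  static void init(Context* ctx) { *ctx = 2166136261u; }
  static void update(Context* ctx, const unsigned char* data, std::size_t len) {
    for (std::size_t i = 0; i < len; ++i) {
      *ctx ^= data[i];
      *ctx *= 16777619u;
    }
  }
  static void finish(unsigned char* digest, const Context* ctx) {
    for (std::size_t i = 0; i < kDigestLength; ++i) {
      // Big-endian byte order so the hex string reads like the number.
      digest[i] = static_cast<unsigned char>(*ctx >> (8 * (kDigestLength - 1 - i)));
    }
  }
};

struct XorTraits {
  using Context = unsigned char;
  static constexpr std::size_t kDigestLength = 1;
  static void init(Context* ctx) { *ctx = 0; }
  static void update(Context* ctx, const unsigned char* data, std::size_t len) {
    for (std::size_t i = 0; i < len; ++i) *ctx ^= data[i];
  }
  static void finish(unsigned char* digest, const Context* ctx) { digest[0] = *ctx; }
};

// Lower-case hex encoding of a raw digest.
inline std::string digestToString(const unsigned char* digest, std::size_t len) {
  assert(digest != nullptr);
  static const char kHex[] = "0123456789abcdef";
  std::string out;
  out.reserve(len * 2);
  for (std::size_t i = 0; i < len; ++i) {
    out.push_back(kHex[digest[i] >> 4]);
    out.push_back(kHex[digest[i] & 0x0F]);
  }
  return out;
}

// The single shared body: only the Traits parameter changes per algorithm.
template <typename Traits>
std::string hashBytes(const std::string& data) {
  typename Traits::Context ctx;
  Traits::init(&ctx);
  Traits::update(&ctx, reinterpret_cast<const unsigned char*>(data.data()), data.size());
  unsigned char digest[Traits::kDigestLength];
  Traits::finish(digest, &ctx);
  return digestToString(digest, Traits::kDigestLength);
}
```

With this shape, `hashBytes<Fnv32Traits>("a")` and `hashBytes<XorTraits>("a")` share one function body, which is the duplication the review comment asks about. Whether that belongs in a util header rather than the processor is the open question raised in the reply.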

phrocker · 2018-11-21T17:04:54Z

libminifi/src/processors/ContentHash.cpp

+  // Erase '-' to make sha-256 and sha-2 work, too
+  algoName.erase(std::remove(algoName.begin(), algoName.end(), '-'), algoName.end());
+
+  // This throws in case algo is not found, but that's fine


Might the code above be less duplicative with a simple if statement here?
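As a point of reference, the normalization step in the quoted diff is the standard erase-remove idiom. A minimal standalone sketch follows; `normalizeAlgoName` is a hypothetical helper name, and the upper-casing step is an assumption that is not visible in the quoted lines.

```cpp
#include <algorithm>
#include <cctype>
#include <string>

// Hypothetical helper: strips '-' and upper-cases the name so that
// "sha-256", "SHA-256" and "sha256" all map to the same lookup key.
// Upper-casing is an assumption; the quoted diff only shows the erase step.
inline std::string normalizeAlgoName(std::string name) {
  // std::remove shifts the kept characters forward and returns the new
  // logical end; erase() then trims the leftover tail.
  name.erase(std::remove(name.begin(), name.end(), '-'), name.end());
  std::transform(name.begin(), name.end(), name.begin(),
                 [](unsigned char c) { return static_cast<char>(std::toupper(c)); });
  return name;
}
```

Normalizing once, before the lookup, keeps the set of accepted spellings in one place instead of spreading them across several comparisons.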

phrocker · 2018-11-21T17:08:28Z

libminifi/src/processors/ContentHash.cpp

+  // Erase '-' to make sha-256 and sha-2 work, too
+  algoName.erase(std::remove(algoName.begin(), algoName.end(), '-'), algoName.end());
+
+  // This throws in case algo is not found, but that's fine


Curious about the comment, "This throws in case algo is not found, but that's fine" What do you mean by "that's fine?"
That would cause a rollback, which may then put back pressure on the flow. This may not be desired. It doesn't allow the user to gracefully deal with the failure relationship. Might there be a way to deal with this such that failure is a condition we can account for in our relationships?

This is definitely an interesting point.
I would consider the failure relationship to be used mostly for flowfile-specific and environment issues. For example, the processor expects some data to be present in the attributes/content of the flowfile and this criterion isn't met, PutFile cannot write because the disk is full, GetFile has no permission to read, etc.
This case, however, is about the processor being misconfigured, which I consider a somewhat different case. In an ideal world this shouldn't even happen, as setting the property should fail if the provided value is not one of the allowed ones.

Failure is a generic term and is generally up to the individual processor to define. Your definition does not hold across all processors in NiFi, so I'm fine leaving that up to the author to decide in this case. However, is the empty content case accounted for within the processor? Some have perceived empty content hashing as a potential failure case. That can be a follow-on task, but we should probably reach parity with NiFi eventually.
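To make the two options in this thread concrete, here is a rough sketch of the non-throwing alternative. Everything in it is hypothetical (`Route`, `HashFn`, `hashOrFail`) and none of it is the MiNiFi C++ API; it only assumes that the algorithms live in a map keyed by name, and that the throwing behaviour comes from a lookup such as `std::map::at`.

```cpp
#include <functional>
#include <map>
#include <string>

// Hypothetical names throughout; these are not the MiNiFi C++ API.
enum class Route { Success, Failure };

using HashFn = std::function<std::string(const std::string&)>;

// Looks the algorithm up with find() instead of a throwing lookup, so an
// unknown name becomes a Failure route rather than an exception that would
// roll the session back.
inline Route hashOrFail(const std::map<std::string, HashFn>& algorithms,
                        const std::string& algoName,
                        const std::string& content,
                        std::string* digestOut) {
  const auto it = algorithms.find(algoName);
  if (it == algorithms.end()) {
    return Route::Failure;  // caller would transfer to a failure relationship
  }
  *digestOut = it->second(content);
  return Route::Success;
}
```

The other position in the thread, rejecting a bad value when the property is set, would make this branch unreachable; the sketch only shows what handling it at trigger time might look like.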

libminifi/include/processors/ContentHash.h

phrocker · 2018-11-21T17:22:04Z

libminifi/src/processors/ContentHash.cpp

+
+  const auto& ret_val = algo(stream);
+
+  if (ret_val.second <= 0) {


Is this necessary? Cryptographic hash functions ensure the result will never be empty. If the stream failed the digest string would be empty. This would make the return code unnecessary.

The return code was originally introduced to meet the requirements of readCallback: its int64_t return value expects the number of bytes read to be returned.
The condition here can be removed; we can stamp the empty hash as well.

phrocker · 2018-11-21T17:25:02Z

libminifi/include/processors/ContentHash.h

+      ret = stream->readData(buffer, HASH_BUFFER_SIZE);
+      if(ret > 0) {
+        MD5_Update(&context, buffer, ret);
+        ret_val.second += ret;


As mentioned below, the HashReturnType seems like it might not be necessary. If ret < 0 on any given read, you exit the conditional and the loop, then proceed to call finalize on the hash functions with that partially written context. The code then supplies a digest that is potentially incorrect. Alternatively, you can simply short-circuit and return an empty string on any stream error and be guaranteed that the resulting hash is an error case.

As per the previous comment, ret is used as the return value of the readCallback.
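The short-circuit suggested in this thread can be sketched as follows. This is an illustration under stated assumptions, not the PR's code: `FakeStream` is a hypothetical stand-in for the flow file stream, `hashStream` is a made-up name, and FNV-1a replaces the real MD5/SHA calls so the example builds without OpenSSL. The only behaviour taken from the discussion is that `readData` returns the byte count, with a negative value signalling an error.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <cstdio>
#include <cstring>
#include <string>
#include <utility>

// Hypothetical stand-in for the flow file stream: readData() returns the
// number of bytes copied, 0 at end of stream, or a negative value on error.
class FakeStream {
 public:
  // failOnRead == 0 means "never fail"; N > 0 fails on the N-th read.
  FakeStream(std::string data, int failOnRead)
      : data_(std::move(data)), failOnRead_(failOnRead) {}

  int readData(unsigned char* buffer, int size) {
    ++reads_;
    if (reads_ == failOnRead_) return -1;  // simulated I/O error
    const std::size_t n = std::min<std::size_t>(size, data_.size() - pos_);
    std::memcpy(buffer, data_.data() + pos_, n);
    pos_ += n;
    return static_cast<int>(n);
  }

 private:
  std::string data_;
  int failOnRead_;
  std::size_t pos_ = 0;
  int reads_ = 0;
};

const int kBufferSize = 4;  // tiny on purpose so the loop runs several times

// Returns the hex digest, or an empty string if any read failed. A partially
// updated context is never finalized, so no misleading digest escapes.
template <typename Stream>
std::string hashStream(Stream& stream) {
  std::uint32_t state = 2166136261u;  // FNV-1a stand-in for the real init call
  unsigned char buffer[kBufferSize];
  for (;;) {
    const int ret = stream.readData(buffer, kBufferSize);
    if (ret < 0) return std::string();  // stream error: no digest at all
    if (ret == 0) break;                // end of stream
    for (int i = 0; i < ret; ++i) {
      state ^= buffer[i];
      state *= 16777619u;
    }
  }
  char hex[9];
  std::snprintf(hex, sizeof(hex), "%08x", static_cast<unsigned>(state));
  return std::string(hex);
}
```

Note that empty content still yields a non-empty digest here, so an empty return value unambiguously means a stream error. A separate byte counter would only be needed if the caller, such as a read callback, must report the number of bytes read.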

libminifi/include/processors/HashContent.h

phrocker · 2018-11-28T12:12:58Z

libminifi/src/processors/HashContent.cpp

+  std::string value;
+
+  attrKey_ = (context->getProperty(HashAttribute.getName(), value)) ? value : "Checksum";
+  algoName_ = (context->getProperty(HashAlgorithm.getName(), value)) ? value : "MD5";


Default should probably be sha-256. I believe NiFi has transitioned to this.

I'm not sure that's "low power friendly," but they can deal with that through configuration.

Sure, changed.
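The fallback pattern in the quoted diff, with the default switched as agreed above, can be sketched in isolation. `Properties` and `propertyOr` are hypothetical stand-ins for the processor context, and the property names and the exact default spelling "SHA256" used in the usage note are assumptions; the thread only says the default became sha-256.

```cpp
#include <map>
#include <string>

// Hypothetical stand-in for the processor context: a plain key/value map.
using Properties = std::map<std::string, std::string>;

// Returns the configured value, or the fallback when the property is
// missing or empty.
inline std::string propertyOr(const Properties& props, const std::string& key,
                              const std::string& fallback) {
  const auto it = props.find(key);
  return (it != props.end() && !it->second.empty()) ? it->second : fallback;
}
```

A call such as `propertyOr(props, "Hash Algorithm", "SHA256")` then yields the stronger default unless the user explicitly configures something lighter, which matches the "low power friendly" remark above.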

alopresto · 2018-11-28T19:31:08Z

You can also look at CryptographicHashContent and HashService in NiFi to see how these actions are currently handled.

phrocker

I see a "ContentHash" in the doc, but will change that upon merge. thanks!

This closes apache#445. Signed-off-by: Marc Parisi <phrocker@apache.org>

arpadboda force-pushed the MINIFICPP-681 branch 3 times, most recently from 4bc64af to e58aac6 Compare November 20, 2018 15:50

phrocker requested changes Nov 21, 2018


arpadboda force-pushed the MINIFICPP-681 branch from e58aac6 to c4f02d4 Compare November 22, 2018 13:57

MINIFICPP-681 - Add content hash processor

c2c8983

arpadboda force-pushed the MINIFICPP-681 branch from c4f02d4 to c2c8983 Compare November 22, 2018 22:57

phrocker reviewed Nov 28, 2018


phrocker reviewed Nov 28, 2018


phrocker approved these changes Nov 29, 2018


asfgit closed this in b53f497 Nov 29, 2018

nghiaxlee pushed a commit to nghiaxlee/nifi-minifi-cpp that referenced this pull request Jul 8, 2019

MINIFICPP-681 - Add content hash processor

d94ad7f

This closes apache#445. Signed-off-by: Marc Parisi <phrocker@apache.org>
