Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

MINIFICPP-681 - Add content hash processor #445

Closed
wants to merge 1 commit into from

Conversation

arpadboda
Copy link
Contributor

Thank you for submitting a contribution to Apache NiFi - MiNiFi C++.

In order to streamline the review of the contribution we ask you
to ensure the following steps have been taken:

For all changes:

  • Is there a JIRA ticket associated with this PR? Is it referenced
    in the commit message?

  • Does your PR title start with MINIFICPP-XXXX where XXXX is the JIRA number you are trying to resolve? Pay particular attention to the hyphen "-" character.

  • Has your PR been rebased against the latest commit within the target branch (typically master)?

  • Is your initial contribution a single, squashed commit?

For code changes:

  • If adding new dependencies to the code, are these dependencies licensed in a way that is compatible for inclusion under ASF 2.0?
  • If applicable, have you updated the LICENSE file?
  • If applicable, have you updated the NOTICE file?

For documentation related changes:

  • Have you ensured that format looks appropriate for the output in which it is rendered?

Note:

Please ensure that once the PR is submitted, you check travis-ci for build issues and submit an update to your PR as soon as possible.

@arpadboda arpadboda force-pushed the MINIFICPP-681 branch 3 times, most recently from 4bc64af to e58aac6 Compare November 20, 2018 15:50
@phrocker
Copy link
Contributor

taking a look.

PROCESSORS.md Show resolved Hide resolved
libminifi/include/processors/ContentHash.h Outdated Show resolved Hide resolved
libminifi/src/processors/ContentHash.cpp Outdated Show resolved Hide resolved
libminifi/src/processors/ContentHash.cpp Outdated Show resolved Hide resolved
ret_val.first = digestToString(digest, SHA_DIGEST_LENGTH);
return ret_val;
}

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can these functions be combined to reduce duplication?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Honestly I wanted to do but failed, given the digest, the init and the fine calls, moreover the digest length const all differ.
The header file is pure C, some tempalte magic there could easily help to reduce code duplication here, but I don't think we should do that and add it here. Maybe to a util file, but definitely not to a processor.

// Erase '-' to make sha-256 and sha-2 work, too
algoName.erase(std::remove(algoName.begin(), algoName.end(), '-'), algoName.end());

// This throws in case algo is not found, but that's fine
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Might the code above be less duplicative with a simple if statement here?

// Erase '-' to make sha-256 and sha-2 work, too
algoName.erase(std::remove(algoName.begin(), algoName.end(), '-'), algoName.end());

// This throws in case algo is not found, but that's fine
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Curious about the comment, "This throws in case algo is not found, but that's fine" What do you mean by "that's fine?"
That would cause a rollback, which may then put back pressure on the flow. This may not be desired. It doesn't allow the user to gracefully deal with the failure relationship. Might there be a way to deal with this such that failure is a condition we can account for in our relationships?

Copy link
Contributor Author

@arpadboda arpadboda Nov 22, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is definitely a interesting point.
I would consider failure relationship being used for flowfile-specific and environment issues mostly. For eg. the processor expects some data to be present in attributes/content of the flowfile, and this criteria isn't met, putfile cannot write as disk is full, getfile has no permission to read, etc.
Although this case is about the processor being misconfigured, which I consider to be a bit different case. In an ideal world this shoudn't even happen as setting the property should fail in case the provided value is not one of the allowed ones.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Failure is a generic term and is generally up to the independent processor to define -- your definition is not true across all processors in NiFi, so I'm fine leaving that up to the author to decide in this case; however, if the empty content case accounted for within the processor? Some have perceived empty content hashing as a potential failure case. Can make that a follow on task, though, but we should probably reach parity with NiFI eventually

libminifi/include/processors/ContentHash.h Outdated Show resolved Hide resolved

const auto& ret_val = algo(stream);

if (ret_val.second <= 0) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this necessary? Cryptographic hash functions ensure the result will never be empty. If the stream failed the digest string would be empty. This would make the return code unnecessary.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Rerturn code was originally introduced to meet the requirements of readCallback: it's int64_t return value expexts the number of bytes read to be returned.
The condition here can be removed, we can stamp the empty hash as well.

ret = stream->readData(buffer, HASH_BUFFER_SIZE);
if(ret > 0) {
MD5_Update(&context, buffer, ret);
ret_val.second += ret;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As mentioned, below, the HashReturnType seems like it might not be necessary. If ret < 0 on any given rad you exit the conditional and loop, then you proceed to call finalize on the hash functions with that partially written context. The code then supplies a digest that is potentially incorrect. Alternatively you can simply short circuit and return an empty string on any stream error and be guaranteed that the resulting hash is an error case.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As per the previous comment, ret is to used as return value of the readCallback.

std::string value;

attrKey_ = (context->getProperty(HashAttribute.getName(), value)) ? value : "Checksum";
algoName_ = (context->getProperty(HashAlgorithm.getName(), value)) ? value : "MD5";
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Default should probably be sha-256. I believe NiFi has transitioned to this.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure that's "low power friendly," but they can deal with that through configuration.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, changed.

@alopresto
Copy link

You can also look at CryptographicHashContent and HashService in NiFi to see how these actions are currently handled.

Copy link
Contributor

@phrocker phrocker left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see a "ContentHash" in the doc, but will change that upon merge. thanks!

@asfgit asfgit closed this in b53f497 Nov 29, 2018
nghiaxlee pushed a commit to nghiaxlee/nifi-minifi-cpp that referenced this pull request Jul 8, 2019
This closes apache#445.

Signed-off-by: Marc Parisi <phrocker@apache.org>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
3 participants