New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
MINIFICPP-681 - Add content hash processor #445
Conversation
4bc64af
to
e58aac6
Compare
taking a look. |
ret_val.first = digestToString(digest, SHA_DIGEST_LENGTH); | ||
return ret_val; | ||
} | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can these functions be combined to reduce duplication?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Honestly I wanted to do but failed, given the digest, the init and the fine calls, moreover the digest length const all differ.
The header file is pure C, some tempalte magic there could easily help to reduce code duplication here, but I don't think we should do that and add it here. Maybe to a util file, but definitely not to a processor.
// Erase '-' to make sha-256 and sha-2 work, too | ||
algoName.erase(std::remove(algoName.begin(), algoName.end(), '-'), algoName.end()); | ||
|
||
// This throws in case algo is not found, but that's fine |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Might the code above be less duplicative with a simple if statement here?
// Erase '-' to make sha-256 and sha-2 work, too | ||
algoName.erase(std::remove(algoName.begin(), algoName.end(), '-'), algoName.end()); | ||
|
||
// This throws in case algo is not found, but that's fine |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Curious about the comment, "This throws in case algo is not found, but that's fine" What do you mean by "that's fine?"
That would cause a rollback, which may then put back pressure on the flow. This may not be desired. It doesn't allow the user to gracefully deal with the failure relationship. Might there be a way to deal with this such that failure is a condition we can account for in our relationships?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is definitely a interesting point.
I would consider failure relationship being used for flowfile-specific and environment issues mostly. For eg. the processor expects some data to be present in attributes/content of the flowfile, and this criteria isn't met, putfile cannot write as disk is full, getfile has no permission to read, etc.
Although this case is about the processor being misconfigured, which I consider to be a bit different case. In an ideal world this shoudn't even happen as setting the property should fail in case the provided value is not one of the allowed ones.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Failure is a generic term and is generally up to the independent processor to define -- your definition is not true across all processors in NiFi, so I'm fine leaving that up to the author to decide in this case; however, if the empty content case accounted for within the processor? Some have perceived empty content hashing as a potential failure case. Can make that a follow on task, though, but we should probably reach parity with NiFI eventually
|
||
const auto& ret_val = algo(stream); | ||
|
||
if (ret_val.second <= 0) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this necessary? Cryptographic hash functions ensure the result will never be empty. If the stream failed the digest string would be empty. This would make the return code unnecessary.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Rerturn code was originally introduced to meet the requirements of readCallback: it's int64_t return value expexts the number of bytes read to be returned.
The condition here can be removed, we can stamp the empty hash as well.
ret = stream->readData(buffer, HASH_BUFFER_SIZE); | ||
if(ret > 0) { | ||
MD5_Update(&context, buffer, ret); | ||
ret_val.second += ret; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As mentioned, below, the HashReturnType seems like it might not be necessary. If ret < 0 on any given rad you exit the conditional and loop, then you proceed to call finalize on the hash functions with that partially written context. The code then supplies a digest that is potentially incorrect. Alternatively you can simply short circuit and return an empty string on any stream error and be guaranteed that the resulting hash is an error case.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As per the previous comment, ret is to used as return value of the readCallback.
e58aac6
to
c4f02d4
Compare
c4f02d4
to
c2c8983
Compare
std::string value; | ||
|
||
attrKey_ = (context->getProperty(HashAttribute.getName(), value)) ? value : "Checksum"; | ||
algoName_ = (context->getProperty(HashAlgorithm.getName(), value)) ? value : "MD5"; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Default should probably be sha-256. I believe NiFi has transitioned to this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure that's "low power friendly," but they can deal with that through configuration.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure, changed.
You can also look at |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see a "ContentHash" in the doc, but will change that upon merge. thanks!
This closes apache#445. Signed-off-by: Marc Parisi <phrocker@apache.org>
Thank you for submitting a contribution to Apache NiFi - MiNiFi C++.
In order to streamline the review of the contribution we ask you
to ensure the following steps have been taken:
For all changes:
Is there a JIRA ticket associated with this PR? Is it referenced
in the commit message?
Does your PR title start with MINIFICPP-XXXX where XXXX is the JIRA number you are trying to resolve? Pay particular attention to the hyphen "-" character.
Has your PR been rebased against the latest commit within the target branch (typically master)?
Is your initial contribution a single, squashed commit?
For code changes:
For documentation related changes:
Note:
Please ensure that once the PR is submitted, you check travis-ci for build issues and submit an update to your PR as soon as possible.