NIFI-5147 Implement CalculateAttributeHash processor #2980

Closed
wants to merge 20 commits into from

Conversation

alopresto
Contributor

Thank you for submitting a contribution to Apache NiFi.

In order to streamline the review of the contribution we ask you
to ensure the following steps have been taken:

For all changes:

  • Is there a JIRA ticket associated with this PR? Is it referenced
    in the commit message?

  • Does your PR title start with NIFI-XXXX where XXXX is the JIRA number you are trying to resolve? Pay particular attention to the hyphen "-" character.

  • Has your PR been rebased against the latest commit within the target branch (typically master)?

  • Is your initial contribution a single, squashed commit?

For code changes:

  • Have you ensured that the full suite of tests is executed via mvn -Pcontrib-check clean install at the root nifi folder?
  • Have you written or updated unit tests to verify your changes?
  • If adding new dependencies to the code, are these dependencies licensed in a way that is compatible for inclusion under ASF 2.0?
  • If applicable, have you updated the LICENSE file, including the main LICENSE file under nifi-assembly?
  • If applicable, have you updated the NOTICE file, including the main NOTICE file found under nifi-assembly?
  • If adding new Properties, have you added .displayName in addition to .name (programmatic access) for each of the new properties?

For documentation related changes:

  • Have you ensured that format looks appropriate for the output in which it is rendered?

Note:

Please ensure that once the PR is submitted, you check travis-ci for build issues and submit an update to your PR as soon as possible.

ottobackwards and others added 20 commits August 30, 2018 17:37
- added properties to control behavior when attributes that are configured are partially or completely missing
- set charset with a property
- added tests
- Cleaned up typos and descriptions.
- Added unit test demonstrating missing Blake2 algorithm.
- Unit test for all default test vectors passes.
@alopresto
Contributor Author

This encapsulates the changes @ottobackwards made in PR 2836, but also:

  • Adds the SHA-224, SHA-512/224, SHA-512/256, SHA-3 (SHA3-224, SHA3-256, SHA3-384, SHA3-512), and BLAKE2 (BLAKE2-160, BLAKE2-256, BLAKE2-384, BLAKE2-512) functions
  • Moves the hashing functionality into an enum and service which can be reused by HashContent
  • Clearly marks cryptographically broken algorithms as such
  • Adds unit tests

I will open follow-on issues to:

  1. Add documentation to HashAttribute to explain the different scenarios where these processors are used
  2. Refactor HashContent to use the HashService

* SHA-512 (SHA2)
* SHA-512/224 (SHA2)
* SHA-512/256 (SHA2)
* SHA3-256
Contributor Author

Add SHA3-224 and Blake2b-160.

@ottobackwards (Contributor) left a comment

This looks great, just a couple of comments

public boolean isStrongAlgorithm() {
    return (!BROKEN_ALGORITHMS.contains(name));
}
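The quoted check can be derived from a single set of known-broken algorithm names held alongside the rest of the enum's metadata. A minimal sketch of that shape, with hypothetical constant names and values (this is not the actual NiFi source):

```java
import java.util.Set;

// Hypothetical sketch, not the NiFi source: an enum carrying algorithm
// metadata, with the cryptographically broken algorithms listed once so
// isStrongAlgorithm() is derived rather than hard-coded per constant.
enum HashAlgorithm {
    MD5("MD5", 16),
    SHA1("SHA-1", 20),
    SHA256("SHA-256", 32),
    SHA512("SHA-512", 64);

    // MD5 and SHA-1 have practical collision attacks, so they are marked broken
    private static final Set<String> BROKEN_ALGORITHMS = Set.of("MD5", "SHA-1");

    private final String name;
    private final int digestBytesLength;

    HashAlgorithm(String name, int digestBytesLength) {
        this.name = name;
        this.digestBytesLength = digestBytesLength;
    }

    public String getName() {
        return name;
    }

    public int getDigestBytesLength() {
        return digestBytesLength;
    }

    public boolean isStrongAlgorithm() {
        return !BROKEN_ALGORITHMS.contains(name);
    }
}
```

Keeping the broken-algorithm list in one place means adding a new constant never silently inherits "strong" status by accident of a missing override.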

Contributor

What is the isBlake2 check about? Is there a way to make it more general? It seems strange to call out by the name as opposed to the "why"

Contributor Author

The Blake2 implementations need BouncyCastle and use different API calls than the other MessageDigest instances.
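To illustrate why the dispatch is keyed on the algorithm name: the JDK's bundled security providers do not include BLAKE2, so those digests cannot be obtained through `MessageDigest.getInstance()` and need a separate code path. A hedged sketch of that branching (hypothetical class and method names; the BouncyCastle call is shown only as a comment since the dependency is not included here):

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical sketch of the dispatch under discussion, not the actual
// HashService code. BLAKE2 is absent from the standard JCA providers, so
// it cannot share the MessageDigest path used by every other algorithm.
class HashDispatchSketch {

    static String hash(String algorithmName, String value) {
        if (algorithmName.startsWith("BLAKE2")) {
            // BouncyCastle path (assumed; dependency and wiring not shown).
            // Its API differs from MessageDigest:
            //   Blake2bDigest digest = new Blake2bDigest(256);
            //   digest.update(bytes, 0, bytes.length);
            //   digest.doFinal(out, 0);
            throw new UnsupportedOperationException("Requires BouncyCastle");
        }
        try {
            // All other algorithms resolve through the standard JCA API
            byte[] digest = MessageDigest.getInstance(algorithmName)
                    .digest(value.getBytes(StandardCharsets.UTF_8));
            StringBuilder hex = new StringBuilder();
            for (byte b : digest) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalArgumentException("Unknown algorithm: " + algorithmName, e);
        }
    }
}
```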

        return traditionalHash(algorithm, value);
    }
}

Contributor

Could we put this functionality in the enum and simplify this class? Having the specialization there?

Contributor Author

I don't think it makes sense to move the execution logic into the enum. The enum is there to capture metadata about the acceptable values, while the logic is independent from that selection.

@thenatog
Contributor

thenatog commented Sep 7, 2018

Reviewing...

@thenatog
Contributor

thenatog commented Sep 7, 2018

Tested out the HashAttribute processor. This all worked fine:

  • MD5 and creating a new attribute
  • MD5 and overwriting the attribute with hashed value
  • SHA256 and creating a new attribute
  • MD5 of Chinese characters using UTF-8 (matched a web-based hasher and the command-line md5 utility)

UTF-16 is where I came unstuck:

  • MD5 of a simple string using "UTF-16" encoding gives a different hash from what I expect.
  • MD5 of the same string using "UTF-16BE" and "UTF-16LE" encodings DOES match what I expect.

Test input string in all cases: “hehe”

NiFi CalculateAttributeHash:
UTF-8:MD5 = 529ca8050a00180790cf88b63468826a
UTF-16BE:MD5 = b0ed26b524e0b0606551d78e42b5b7bc
UTF-16LE:MD5 = 2db0ecc27f7abd29ba95412feb3b5e07
UTF-16:MD5 = 9b6dcd3887ebdb43d66fb4b3ef9c259b

CyberChef (https://gchq.github.io/CyberChef/#recipe=Encode_text('UTF16BE%20(1201)')MD5()&input=aGVoZQ):
UTF-8:MD5 = 529ca8050a00180790cf88b63468826a
UTF-16BE:MD5 = b0ed26b524e0b0606551d78e42b5b7bc
UTF-16LE:MD5 = 2db0ecc27f7abd29ba95412feb3b5e07

I found that “UTF-16” differs because, when encoding, Java adds a big-endian BOM: “When decoding, the UTF-16 charset interprets the byte-order mark at the beginning of the input stream to indicate the byte-order of the stream but defaults to big-endian if there is no byte-order mark; when encoding, it uses big-endian byte order and writes a big-endian byte-order mark.” As expected, adding the BOM changes the output bytes that are then hashed, resulting in a different hash from the “UTF-16BE” encoding. Is this a problem, or is it simply expected behavior? That is, should the user realize that the “UTF-16” and “UTF-16BE” encodings will produce different hashes?
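The BOM behavior described above can be reproduced in a few lines of plain Java. The class name below is hypothetical; the UTF-16BE digest matches the value reported for "hehe" earlier in this thread:

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Demonstrates the discrepancy: Java's "UTF-16" charset prepends a
// big-endian byte-order mark (0xFE 0xFF) when encoding, so the hashed
// bytes differ from plain "UTF-16BE" even for the same input string.
class BomDemo {

    static String md5Hex(byte[] input) {
        try {
            StringBuilder hex = new StringBuilder();
            for (byte b : MessageDigest.getInstance("MD5").digest(input)) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e); // MD5 is mandated in every JRE
        }
    }

    public static void main(String[] args) {
        byte[] utf16 = "hehe".getBytes(StandardCharsets.UTF_16);      // BOM + payload
        byte[] utf16be = "hehe".getBytes(StandardCharsets.UTF_16BE);  // payload only

        // The two extra leading bytes in the UTF-16 form are the BOM
        System.out.printf("UTF-16 length=%d, UTF-16BE length=%d%n",
                utf16.length, utf16be.length);
        System.out.printf("Leading bytes: %02x %02x%n", utf16[0], utf16[1]);

        // Different input bytes, therefore different MD5 digests
        System.out.println("UTF-16   MD5: " + md5Hex(utf16));
        System.out.println("UTF-16BE MD5: " + md5Hex(utf16be));
    }
}
```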

@alopresto
Contributor Author

Thanks for discovering this, @thenatog. This is an excellent catch.

I've added behavior to catch this, better documentation, and unit tests. However, I added them on the branch that includes PR 2983. Let's mark this PR as closed and just review the other one, as it is more complete and addresses this issue.

2018-09-07 21:21:19,784 WARN [Timer-Driven Process Thread-6] o.a.n.security.util.crypto.HashService The charset provided was UTF-16, but Java will insert a Big Endian BOM in the decoded message before hashing, so switching to UTF-16BE
2018-09-07 21:21:19,797 INFO [Timer-Driven Process Thread-9] o.a.n.processors.standard.LogAttribute LogAttribute[id=b15f3209-344d-10a6-4a7b-454530bb72fc] logging for flow file StandardFlowFileRecord[uuid=a4a223fb-aa11-43b9-93a3-d7675c44593c,claim=StandardContentClaim [resourceClaim=StandardResourceClaim[id=1536378604366-1, container=default, section=1], offset=56, length=4],offset=0,name=33467912436349,size=4]
--------------------[SUCCESS] --------------------
Standard FlowFile Attributes
Key: 'entryDate'
	Value: 'Fri Sep 07 21:21:19 PDT 2018'
Key: 'lineageStartDate'
	Value: 'Fri Sep 07 21:21:19 PDT 2018'
Key: 'fileSize'
	Value: '4'
FlowFile Attribute Map Content
Key: 'filename'
	Value: '33467912436349'
Key: 'path'
	Value: './'
Key: 'test_attribute'
	Value: 'hehe'
Key: 'test_attribute_md5_utf16le'
	Value: '2db0ecc27f7abd29ba95412feb3b5e07'
Key: 'uuid'
	Value: 'a4a223fb-aa11-43b9-93a3-d7675c44593c'
--------------------[SUCCESS] --------------------
hehe
2018-09-07 21:21:19,799 INFO [Timer-Driven Process Thread-9] o.a.n.processors.standard.LogAttribute LogAttribute[id=b15f3209-344d-10a6-4a7b-454530bb72fc] logging for flow file StandardFlowFileRecord[uuid=b7459e40-500b-488d-a0dc-3e09ebc6b86e,claim=StandardContentClaim [resourceClaim=StandardResourceClaim[id=1536378604366-1, container=default, section=1], offset=56, length=4],offset=0,name=33467912436349,size=4]
--------------------[SUCCESS] --------------------
Standard FlowFile Attributes
Key: 'entryDate'
	Value: 'Fri Sep 07 21:21:19 PDT 2018'
Key: 'lineageStartDate'
	Value: 'Fri Sep 07 21:21:19 PDT 2018'
Key: 'fileSize'
	Value: '4'
FlowFile Attribute Map Content
Key: 'filename'
	Value: '33467912436349'
Key: 'path'
	Value: './'
Key: 'test_attribute'
	Value: 'hehe'
Key: 'test_attribute_md5_utf16'
	Value: 'b0ed26b524e0b0606551d78e42b5b7bc'
Key: 'uuid'
	Value: 'b7459e40-500b-488d-a0dc-3e09ebc6b86e'
--------------------[SUCCESS] --------------------
hehe
2018-09-07 21:21:19,801 INFO [Timer-Driven Process Thread-9] o.a.n.processors.standard.LogAttribute LogAttribute[id=b15f3209-344d-10a6-4a7b-454530bb72fc] logging for flow file StandardFlowFileRecord[uuid=25c5d1b1-faa4-418d-911c-5c0cea399b83,claim=StandardContentClaim [resourceClaim=StandardResourceClaim[id=1536378604366-1, container=default, section=1], offset=56, length=4],offset=0,name=33467912436349,size=4]
--------------------[SUCCESS] --------------------
Standard FlowFile Attributes
Key: 'entryDate'
	Value: 'Fri Sep 07 21:21:19 PDT 2018'
Key: 'lineageStartDate'
	Value: 'Fri Sep 07 21:21:19 PDT 2018'
Key: 'fileSize'
	Value: '4'
FlowFile Attribute Map Content
Key: 'filename'
	Value: '33467912436349'
Key: 'path'
	Value: './'
Key: 'test_attribute'
	Value: 'hehe'
Key: 'test_attribute_md5_utf16be'
	Value: 'b0ed26b524e0b0606551d78e42b5b7bc'
Key: 'uuid'
	Value: '25c5d1b1-faa4-418d-911c-5c0cea399b83'
--------------------[SUCCESS] --------------------
hehe

@alopresto alopresto closed this Sep 8, 2018