Skip to content

NIFI-6970 add DistributeRecord processor#3984

Closed
IlyaKovalev wants to merge 24 commits intoapache:masterfrom
IlyaKovalev:NIFI-6970
Closed

NIFI-6970 add DistributeRecord processor#3984
IlyaKovalev wants to merge 24 commits intoapache:masterfrom
IlyaKovalev:NIFI-6970

Conversation

@IlyaKovalev
Copy link
Contributor

Thank you for submitting a contribution to Apache NiFi.

Please provide a short description of the PR here:

add DistributeRecord processor for distribute data over user specified relationships by distribution key/keys.

In order to streamline the review of the contribution we ask you
to ensure the following steps have been taken:

For all changes:

  • Is there a JIRA ticket associated with this PR? Is it referenced
    in the commit message?

  • Does your PR title start with NIFI-XXXX where XXXX is the JIRA number you are trying to resolve? Pay particular attention to the hyphen "-" character.

  • Has your PR been rebased against the latest commit within the target branch (typically master)?

  • Is your initial contribution a single, squashed commit? Additional commits in response to PR reviewer feedback should be made on this branch and pushed to allow change tracking. Do not squash or use --force when pushing to allow for clean monitoring of changes.

For code changes:

  • Have you ensured that the full suite of tests is executed via mvn -Pcontrib-check clean install at the root nifi folder?
  • Have you written or updated unit tests to verify your changes?
  • Have you verified that the full build is successful on both JDK 8 and JDK 11?
  • If adding new dependencies to the code, are these dependencies licensed in a way that is compatible for inclusion under ASF 2.0?
  • If applicable, have you updated the LICENSE file, including the main LICENSE file under nifi-assembly?
  • If applicable, have you updated the NOTICE file, including the main NOTICE file found under nifi-assembly?
  • If adding new Properties, have you added .displayName in addition to .name (programmatic access) for each of the new properties?

For documentation related changes:

  • Have you ensured that format looks appropriate for the output in which it is rendered?

Note:

Please ensure that once the PR is submitted, you check travis-ci for build issues and submit an update to your PR as soon as possible.

@IlyaKovalev IlyaKovalev changed the title add DistributeRecord processor NIFI-6970 add DistributeRecord processor Jan 14, 2020
@shayneburgess
Copy link
Contributor

@IlyaKovalev I had a couple of quick questions looking at this:

  1. Are you going to add this to the manifest for the standard processor NAR?
  2. Distributing based on a field is really interesting and useful but the DistributeLoad also has the ability to round robin distribute. Did you think about doing that here? Potentially make the Keys field optional and do round-robin if not specified. I'm thinking of a case where I don't want to distribute based on the actual data.

@IlyaKovalev
Copy link
Contributor Author

1 Fixed, also remove ORIGINAL relationship (it's excessively i think)
2 Yes, i really thought about it. In case merge logic of DistributeRecord with DistributeLoad we should be add reader, writer, keys, hash_function, strategy fields but logic for understanding how processor will process input is growing i suggest it's too ambigious.
So look at this like:
DistributeLoad - works on flowfile level without processing content i.e distribute flowfiles.
DistributeRecord - works on content level i.e distribute content
(So maybe we need rename DistributeLoad to DistributeFlowFile just because DistributeLoad is
very general definition)
I think it's much easier for understanding processor logic(how it would works) and much simplest for user to appreciate init process.

@joewitt
Copy link
Contributor

joewitt commented Feb 23, 2020

Hello. This is an interesting PR/good idea. I wonder though if this should just be a partitioning strategy in PartitionRecord. If we keep it separate like this we might want to go with a more specific name than DistributeRecord as we might want different distribution options and this one seems to be pretty specific to weighted distribution using one or more hashing functions. Maybe then the name should be 'WeightedRecordDistribution'

In any case how about squashing the commits and rebasing to master and pushing the forced PR. This will let the new CI processes run against the PR.

@IlyaKovalev IlyaKovalev force-pushed the NIFI-6970 branch 2 times, most recently from eb7d5e9 to b9a583f Compare February 24, 2020 15:44
@IlyaKovalev
Copy link
Contributor Author

Hello.
1 Done
2 Yes, i agree, we can implement this logic with PartitionRecord only if RecordPath will have hash functions and operations for number processing like mod. So PartitionRecord + RouteOnAttribute will have the same effect as this processor. (Implementation of weights can be awkward but it is definitely possible)
3 Renamed processor with DistributeHashRecord. I think the key feature here is distribution over hashed key.
and windows build failed ... hmm...

NIFI-6970 add DistributeRecord to the meta-inf, remove ORIGINAL relationship

NIFI-6970 fix bug

NIFI-6970 rename processor

NIFI-6970 add 'replacement value' property and add tests
r65535 and others added 9 commits May 29, 2020 10:28
Signed-off-by: Pierre Villard <pierre.villard.fr@gmail.com>

This closes apache#4302.
… docs

Signed-off-by: Pierre Villard <pierre.villard.fr@gmail.com>

This closes apache#4293.
…scription

Signed-off-by: Pierre Villard <pierre.villard.fr@gmail.com>

This closes apache#4294.
Signed-off-by: Pierre Villard <pierre.villard.fr@gmail.com>

This closes apache#4292.
This closes apache#3734

Signed-off-by: Mike Thomsen <mthomsen@apache.org>
…e call the onFailed or onCompleted function. If the result is failed, return true and do sth

NIFI-7403:Add an extension point to adjust the result, if the result is failed then process onFailed function

NIFI-7403:Implement the AdjustFailed Function, if PutSQL set the SUPPORT_TRANSACTIONS true, then check whether the result contains REL_RETRY or REL_FAILURE.If it contains that, reroute the result and return true.

NIFI-7403: fix reroute logic in AdjustFailed function

NIFI-7403:Add and modify some unit test for PutSQL's SUPPORT_TRANSACTIONS property

NIFI-7403:Update for PR recheck

NIFI-7403:Add documentation on the Support Fragmented Transactions property to indicate the transactions rollback behavior

NIFI-7403: Fix Checkstyle issue

Signed-off-by: Matthew Burgess <mattyb149@apache.org>

This closes apache#4266
.displayName("Hash Function")
.required(true)
.description("Hash algorithm for keys hashing")
.allowableValues(MURMURHASH_32)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm curious what this hash function is, and why there is a configurable property for it if there is only one allowable value.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is name of hash function for distribution. I suggest there will be several hash functions in future (MD5 for example). After keys hashing processor will decide where record should route (by division without remainder)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it possible to add the other hashing algorithms now?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes! Done!

apuntandoanulo and others added 9 commits June 1, 2020 14:08
…the flowfile

NIFI-7477 Improving description and unit test now verifies attribute content

NIFI-7477: Fixed checkstyle errors

Signed-off-by: Matthew Burgess <mattyb149@apache.org>

This closes apache#4301
…t object is defined as an interface, proxy that interface. This way, any method call into the object will also change the classloader to the appropriate classloader.
This closes apache#4303.

Signed-off-by: Mark Payne <markap14@hotmail.com>
…d missing precision when reading in records

Signed-off-by: Mark Payne <markap14@hotmail.com>
…ss tokens when supplied with appropriate credentials.

Added skeleton of oauth2 provider.
Added copy of our code.
Refactored a few things.
Updated apis to better match flow descriptions.
Updated poms and other artifacts.
Updated copyright notice.
Updated LICENSE.

This closes apache#4173

Signed-off-by: Jeremy Dyer <jeremydyer@apache.org>
…ocessGroup level

Added FlowFileOutboundPolicy to ProcessGroups and updated LocalPort to make use of it
Persisted FlowFile Concurrency and FlowFile Output Policy to flow.xml.gz and included in flow fingerprint
Added configuration for FlowFile concurrency and outbound policy to UI for configuration of Process Groups
Added system tests. Fixed a couple of bugs that were found
Fixed a couple of typos in the RecordPath guide

Signed-off-by: Pierre Villard <pierre.villard.fr@gmail.com>

This closes apache#4306.
…iceInvocationHandler

and fix checkstyle violation on NiFiSystemIT
Removed list structure for peer selection as it was unnecessary and often wasteful (most clusters are 3 - 7 nodes, the list was always 128 elements).
Changed integer percentages to double to allow for better normalization.
Removed 80% cap on remote peers as it was due to legacy requirements.
Added unit tests for non-deterministic distribution calculations.
Added unit tests for edge cases due to rounding errors, single valid remotes, unbalanced clusters, and peer queue consecutive selection tracking.
Migrated all legacy PeerSelector unit tests to new API.
Removed unused System time manipulation as tests no longer need it.
Added class-level Javadoc to PeerSelector.
Removed S2S details request replication, as the responses were not being merged, which led to incorrect ports being returned and breaking S2S peer retrieval.
Fixed copy/paste error where input ports were being listed as output ports during remote flow refresh.
Fixed comments and added unbalanced cluster test scenarios.
Removed unnecessary marker interface.
Removed commented code.
Changed weighting & penalization behavior.
Changed dependency scope to test.

This closes apache#4289.

Signed-off-by: Mark Payne <markap14@hotmail.com>
NIFI-6970 add DistributeRecord to the meta-inf, remove ORIGINAL relationship

NIFI-6970 fix bug

NIFI-6970 rename processor

NIFI-6970 add 'replacement value' property and add tests
@IlyaKovalev IlyaKovalev closed this Jun 9, 2020
@IlyaKovalev IlyaKovalev deleted the NIFI-6970 branch June 9, 2020 09:23
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.