Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

NIFI-3163: Flow Fingerprint should include new RPG configurations #1332

Closed
wants to merge 2 commits into from

Conversation

ijokarumawak
Copy link
Member

Thank you for submitting a contribution to Apache NiFi.

In order to streamline the review of the contribution we ask you
to ensure the following steps have been taken:

For all changes:

  • Is there a JIRA ticket associated with this PR? Is it referenced
    in the commit message?

  • Does your PR title start with NIFI-XXXX where XXXX is the JIRA number you are trying to resolve? Pay particular attention to the hyphen "-" character.

  • Has your PR been rebased against the latest commit within the target branch (typically master)?

  • Is your initial contribution a single, squashed commit? No. This PR contains two commits for specific review purpose.

For code changes:

  • Have you ensured that the full suite of tests is executed via mvn -Pcontrib-check clean install at the root nifi folder?
  • Have you written or updated unit tests to verify your changes?

Note:

Please ensure that once the PR is submitted, you check travis-ci for build issues and submit an update to your PR as soon as possible.

@ijokarumawak
Copy link
Member Author

This PR contains two commits for easier review process (I hope).

1st commit: Added RPG settings to fingerprint

The first commit adds RemoteProcessGroup's (RPG) timeout, yieldPeriod, transportProtocol, proxyHost, proxyPort, proxyUser and proxyPassword to flow fingerprint.
This commit changes NiFi cluster behavior in a scenario below:

  • For example, there are 3 nodes (node1, 2 and 3) in a cluster, and a RPG is used in a flow
  • Node2 gets disconnected, after these steps:

image

  • Case1: The cluster (node1 and node3) remove node2 from the cluster, and update the RPG setting, such as Transport Protocol. Then node2 is restarted.
  • Case2: Node2 starts running as a standalone mode since it is disconnected from the cluster. It's able to modify the flow from node2 UI. Update the RPG setting such as Transport Protocol. Then node2 is reconnected by the cluster.

With above case 1 and 2, the results will be different with this fix.

  • Case1:

    • Before fix: Node2 can join the cluster because flow of Node2 and the one in the cluster are determined as equal by fingerprint. Node2 will update its flow with the one in the cluster. This is problematic when a cluster is entirely restarted. An old flow can be elected and possibly overwrites the latest flow.
    • After fix: Node2 CANNOT join the cluster since it has different flow than the cluster. This ensures each node has the latest flow.
  • Case2:

    • Before fix: Node2 can join the cluster. It keeps working with the old flow! E.g. the latest flow uses HTTP as transport protocol on Node 1 and 3 while Node2 uses RAW protocol.
    • After fix: Node2 CANNOT join the cluster. This ensures each node has the latest flow when it gets reconnected.

For both cases, expected recover operation would be: DataFlowManager removes the flow.xml.gz on Node2 then restart it to join the cluster.

Tested above scenarios before and after applying this fix. Confirmed it works as written above.

2nd commit: Removed unused private methods from FingerprintFactory

FingerprintFactory has two types of fingerprinting method, from XML
elements and from DTO. However, the ones from DTO are not used by
anywhere. IDE didn't report those private methods unused because
addProcessGroupFingerprint and addSnippetFingerprint call each other,
but those are not used from outside actually.

This commit removes those private methods to keep the class clean to
avoid unnecessary code maintenance and tests.

@ijokarumawak ijokarumawak changed the title Nifi 3163 NIFI-3163: Flow Fingerprint should include new RPG configurations Dec 15, 2016
@trixpan
Copy link
Contributor

trixpan commented Feb 19, 2017

@ijokarumawak - This PR needs rebasing

@ijokarumawak
Copy link
Member Author

@trixpan Rebased with the latest master, confirmed contrib check passed locally. Thanks!

- Added timeout, yieldPeriod, transportProtocol, proxyHost, proxyPort,
  proxyUser and proxyPassword
FingerprintFactory has two types of fingerprinting method, from XML
elements and from DTO. However, the ones from DTO are not used by
anywhere. IDE didn't report those private methods unused because
addProcessGroupFingerprint and addSnippetFingerprint call each other,
but those are not used from outside actuallly.

This commit removes those private methods to keep the class clean to
avoid unnecessary code maintenance and tests.
@ijokarumawak
Copy link
Member Author

ijokarumawak commented Mar 7, 2017

@markap14 Would you be able to review this PR? It's there for a while and I had to update this PR couple of times to resolve conflicts with the latest master. This PR is required to move NIFI-1202 (#1306) forward which adds batch related configurations at RPG Port for better control over load distribution. Thanks a lot in advance!

@alopresto
Copy link
Contributor

I have a question about this process (may be beyond the scope of this PR though) -- why not allow the DFM to remove the flow.xml.gz from the node with outdated flow via the REST API and UI from any connected node? The issue is not limited or blocked connectivity between the nodes, so there could be logic on the node itself to remove its flow definition when given a properly-authorized command from a remote node that it knows is connected to the cluster. The authentication should not be an issue as the DFM permissions should be the same across the cluster.

I at first considered allowing the node to perform this logic on its own by archiving its current flow and then deleting it to attempt to rejoin the cluster, but I can understand how admins would be uncomfortable that the level of autonomous activity could lead to data loss.

Thoughts?

@ijokarumawak
Copy link
Member Author

@alopresto It'd be convenient if a node can fix stale flow.xml.gz automatically and join a cluster by removing its flow.xml.gz and fetch the latest one from primary node. I think we need to add a nifi.property to enable that logic (disable by default). As you wrote, that is beyond the scope of this PR. It may be worth for another JIRA to keep discussion going on.

@pvillard31
Copy link
Contributor

@alopresto @ijokarumawak I'm a +1 on this idea, it'd make a lot of sense to me.

@asfgit asfgit closed this in 27afd2e Mar 8, 2017
@markap14
Copy link
Contributor

markap14 commented Mar 8, 2017

@ijokarumawak all looks good to me. Code looks fine, and i verified behavior. Thanks for addressing this. A good catch. +1 merged to master.

aperepel pushed a commit to aperepel/nifi that referenced this pull request Mar 29, 2017
- Added timeout, yieldPeriod, transportProtocol, proxyHost, proxyPort,
  proxyUser and proxyPassword

- Removed unused fingerprint methods

FingerprintFactory has two types of fingerprinting method, from XML
elements and from DTO. However, the ones from DTO are not used by
anywhere. IDE didn't report those private methods unused because
addProcessGroupFingerprint and addSnippetFingerprint call each other,
but those are not used from outside actuallly.

This commit removes those private methods to keep the class clean to
avoid unnecessary code maintenance and tests.

This closes apache#1332.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
5 participants