Skip to content

NIFI-6197 HBase Client Service uses a bad default for retries#3418

Closed
lfrancke wants to merge 33 commits intoapache:masterfrom
lfrancke:NIFI-6197
Closed

NIFI-6197 HBase Client Service uses a bad default for retries#3418
lfrancke wants to merge 33 commits intoapache:masterfrom
lfrancke:NIFI-6197

Conversation

@lfrancke
Copy link
Member

@lfrancke lfrancke commented Apr 8, 2019

This changes the default to 15 as it is as of HBASE-19660.

I ran contrib-check successfully.

lfrancke and others added 3 commits April 8, 2019 23:17
…alm' exceptions

NIFI-5953 Manage GetTwitter connection retries on '420 Enhance Your Calm' exceptions

Update "Max Client Error Retries" parameter name.
reintriduce client.reconnect() on HTTP_ERROR 420

This closes apache#3276.

Signed-off-by: Koji Kawamura <ijokarumawak@apache.org>
This closes apache#3419.

Signed-off-by: Koji Kawamura <ijokarumawak@apache.org>
@bbende
Copy link
Contributor

bbende commented Apr 9, 2019

@lfrancke my only concern is that the most common time that this retry happens is when trying to enable to the service with bad configuration, and then the service is stuck there retrying 15 times. I purposely made this 1 retry because of that, although I admit I was not aware that it would impact other operations, and at the time I think the default HBase retry was 30 times so it was quite ridiculous.

markap14 added 3 commits April 9, 2019 10:19
…outgoing connections

This closes apache#3417.

Signed-off-by: Bryan Bende <bbende@apache.org>
…iable's value can be changed. Previously, the logic had a bug that resulted in a failure if any child group contained a running processor that references an overridden variable and the user attempts to change the overridden variable at the higher level. We should not include any components of descendent groups if the descendent group overrides the variable.

This closes apache#3420.

Signed-off-by: Bryan Bende <bbende@apache.org>
…r async tasks to complete before making assertions

This closes apache#3421.

Signed-off-by: Bryan Bende <bbende@apache.org>
@lfrancke
Copy link
Member Author

lfrancke commented Apr 9, 2019

Thanks for the background information. I understand and yes that's unfortunate but I think that's a pill we should swallow.

The way it is now normal things like when HBase moves a Region will be a fatal error. That's not good. The default used to be 10 and was upped to 15 recently.

I've actually tested today and I believe this to be even worse: The default is set to 1 so it overrides whatever is set in an hbase-site.xml. That's definitely not good. Ideally I think we should just remove the default to be honest. It should fall back to the hardcoded default in HBase anyway. So I don't really see a reason to provide this option at all.

@bbende
Copy link
Contributor

bbende commented Apr 9, 2019

The reason for the option is that you can use the service without an hbase-site.xml by specifying the ZK Quorum, ZK Port, and ZK ZNode directly in the service. Although this is probably an uncommon use-case we can't assume no one is using it like this, so we have to keep the existing properties.

Also, we don't necessarily have to rely on the retry within the client, anyone can also built retry logic into their flow to route failure relationships from hbase related processors through a retry loop. I'm not totally disagreeing about increasing the default from 1, but just pointing at that I don't think users are in a terrible place as is.

@lfrancke
Copy link
Member Author

lfrancke commented Apr 9, 2019

Actually: I'm using it like that (i.e. without an hbase-site.xml) so I do appreciate that those options are there. But I believe that the retry option is not needed to connect to HBase. It feels very much like the odd one out from all those options.

I'll try to run a test later today against HBase without an explicitly set retry.

I'm in favor of either (ordered by priority):

  1. removing the retry config option altogether or
  2. remove the default (and the validation that it needs to be set) to rely on the HBase default but people can override it here or
  3. change the default from 1 to 15 but make it so that hbase-site.xml overrides the setting (this is what I thought would happen from my interpretation of the documentation string)
  4. change the default from 1 to 15

@lfrancke
Copy link
Member Author

lfrancke commented Apr 9, 2019

I verified that not setting client retries defaults to 15 in HBase 2.0. So there's no reason to force users to manually choose a value and the default isn't great.

I'll change this pull request to implement option two from above. I believe options 3 and 4 don't make much sense considering that we can rely on the sane defaults that HBase itself provides.

@bbende
Copy link
Contributor

bbende commented Apr 9, 2019

I personally like #3 so that a NiFi user without a site xml can easily change the value. I still don't fully understand why we need to retry something 15 times, if it hasn't worked after 3-4 times do we really expect it to start working?

@lfrancke
Copy link
Member Author

lfrancke commented Apr 9, 2019

Option #2 will allow that as well!
It'll just remove the hardcoded default in NiFi (deferring to what HBase deems a reasonable default) but it will allow everyone to override the default (and also a potential value from hbase-site.xml).

And yes, we do :)
One example: A RegionServer crashes with lots of WALs. Those need to be processed by the new RegionServers taking over the duties of the old one. For that they need to split/read them etc.
This can take a significant amount of time but is still "expected".

I'd like to widen the scope of this PR & Jira to include fixing deprecations in the HBase 2 API as well as fixing the other properties (i.e. ZooKeeper parent node is not required, neither is ZooKeeper client port, both come with sane defaults). I'm running contrib-check now and will update then.

lfrancke and others added 17 commits April 9, 2019 23:09
This closes apache#3405.

Signed-off-by: Koji Kawamura <ijokarumawak@apache.org>
This closes apache#3397

Signed-off-by: Mike Thomsen <mikerthomsen@gmail.com>
…ny server requests when holding down CTRL-R

* Added lodash utility library to leverage its throttle capability (and many other useful functions in the future).
* Made lodash available in all JSP's so it could be imported into nf-common (or any component for that matter).
* Added a throttle function to nf-common that just wraps _.throttle

This closes apache#3393
…gurator

Signed-off-by: Matthew Burgess <mattyb149@apache.org>

This closes apache#3406
Adds resetting the batch size to fix broken batch processing

Removes empty line

Signed-off-by: Matthew Burgess <mattyb149@apache.org>

This closes apache#3337
This closes apache#3409.

Signed-off-by: Koji Kawamura <ijokarumawak@apache.org>
… always return a value even when it should return Optional.empty().

NIFI-6172 Fixed broken integration test.

Signed-off-by: Matthew Burgess <mattyb149@apache.org>

This closes apache#3399
… to prevent premature results retrieval

This closes apache#3408.
Signed-off-by: Brandon Devries <devriesb@apache.org>
Signed-off-by: Pierre Villard <pierre.villard.fr@gmail.com>

This closes apache#3407.
…RecordLookupProcessor

Signed-off-by: Pierre Villard <pierre.villard.fr@gmail.com>

This closes apache#3400.
…s" and "Occurrence offset" configurations

Fixed indentation errors to pass checkstyle-checks

Added Evaluation Modes as per discussion in PR thread

Adding exclusions of test files on rat plugin

Added new property 'Line-by-Line Evaluation Mode' and refactored common code

This closes apache#3375.

Signed-off-by: Koji Kawamura <ijokarumawak@apache.org>
Refactored to use functions to better handle strategy specific variables
via closure.
Thread.currentThread().getContextClassLoader().getResourceAsStream(...) works in UnitTest, but not if the NAR is deployed in /extensions folder. If you want to use the processer the resource 'file.txt' is not found.
I changed this to 'getClass().getClassLoader()...' and its working as UnitTest and when deployed

Signed-off-by: Matthew Burgess <mattyb149@apache.org>

This closes apache#3381
…from nifi-livy-processors causing race condition, added assertion to catch failures

Signed-off-by: Joe Witt <joewitt@apache.org>
NIFI-6136 - return the appropriate value (false) when verifyDisconnectedCluster is called from a non-clustered environment.

This closes apache#3390
…alm' exceptions

NIFI-5953 Manage GetTwitter connection retries on '420 Enhance Your Calm' exceptions

Update "Max Client Error Retries" parameter name.
reintriduce client.reconnect() on HTTP_ERROR 420

This closes apache#3276.

Signed-off-by: Koji Kawamura <ijokarumawak@apache.org>
lfrancke and others added 6 commits April 9, 2019 23:09
This closes apache#3419.

Signed-off-by: Koji Kawamura <ijokarumawak@apache.org>
…outgoing connections

This closes apache#3417.

Signed-off-by: Bryan Bende <bbende@apache.org>
…iable's value can be changed. Previously, the logic had a bug that resulted in a failure if any child group contained a running processor that references an overridden variable and the user attempts to change the overridden variable at the higher level. We should not include any components of descendent groups if the descendent group overrides the variable.

This closes apache#3420.

Signed-off-by: Bryan Bende <bbende@apache.org>
…r async tasks to complete before making assertions

This closes apache#3421.

Signed-off-by: Bryan Bende <bbende@apache.org>
@lfrancke
Copy link
Member Author

lfrancke commented Apr 9, 2019

Uhm... I believe I broke my Pull Request somehow....trying to repair or I'll create a new one.

@lfrancke lfrancke closed this Apr 9, 2019
@lfrancke
Copy link
Member Author

lfrancke commented Apr 9, 2019

I'll have to open a new Pull Request. I'll reference this one. I have no idea what I did to break this :)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.