Curator 110 by cammckenzie · Pull Request #9 · apache/curator

cammckenzie · 2014-06-02T23:03:13Z

CURATOR-110 - Modified state handling to treat 'CONNECTED' and 'RECONNECTED' events in the same way. Added a test case for the leader latch being started before a connection to ZK has been established.

Randgalt · 2014-06-03T23:15:54Z

curator-recipes/src/main/java/org/apache/curator/framework/recipes/leader/LeaderLatch.java

Hmm - this suggests that we should have an isLost() method in the ConnectionState no?

CURATOR-110 - Modified the enum to have an abstract isConnected() method that can be overridden by each enum instance.
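
As a rough illustration of the "abstract method overridden by each enum instance" idea, here is a minimal sketch. The enum name `SketchConnectionState` and its constants are hypothetical stand-ins chosen for this example; it is not a copy of the code in the pull request.

```java
/** Hypothetical stand-in for a connection-state enum (illustrative names only). */
enum SketchConnectionState {
    CONNECTED {
        @Override
        public boolean isConnected() {
            return true;
        }
    },
    SUSPENDED {
        @Override
        public boolean isConnected() {
            return false;
        }
    },
    RECONNECTED {
        @Override
        public boolean isConnected() {
            return true;
        }
    },
    LOST {
        @Override
        public boolean isConnected() {
            return false;
        }
    };

    /** Each constant supplies its own answer, so callers need no switch statement. */
    public abstract boolean isConnected();
}
```

With this shape, a listener can treat 'CONNECTED' and 'RECONNECTED' uniformly by calling `newState.isConnected()` instead of comparing against individual constants.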

…nnected' method. This avoids a race condition between the start() method and the ConnectionStateListener in the LeaderLatch recipe. Updated the leader latch start() method to block (in a background thread) until a connection is available before it attempts to set up its state. This means that it will work correctly if started before a connection to ZK is available.

Randgalt · 2014-06-12T12:57:02Z

curator-framework/src/main/java/org/apache/curator/framework/CuratorFramework.java

Why are these needed? They already exist in the CuratorZookeeperClient.

The blockUntilConnectedOrTimedOut() method doesn't do what is needed in this case, because there's no way of specifying a timeout. We could call it in a loop I guess, but we would need a second thread to interrupt it when our specified timeout expired, which is pretty ugly. Alternatively, we could modify the CuratorZookeeperClient to expose a blockUntilConnected() method that takes a timeout, and have blockUntilConnectedOrTimedOut() call it with the connection timeout. Preferences?

Technically, there is a timeout. It's the connectionTimeout used when creating the CuratorFramework instance. Is that not sufficient? If it's still needed, it seems better to push it down to CuratorZookeeperClient with the other method.

I just thought it's nicer to give the caller control over how long they want to wait, rather than being bound to the connection timeout. So, you would prefer that I expose a blockUntilConnected() method on the CuratorZookeeperClient that takes a timeout, and then refactor the current blockUntilConnectedOrTimedOut() to call that with the connection timeout? The only issue with this is that blockUntilConnectedOrTimedOut() is currently implemented with 1 second sleeps, so the best timeout granularity you're going to get is a full second. This was partly why I implemented it via a connection state listener in the CuratorFramework. It just seemed cleaner than sitting in a loop, sleeping for a second and then checking the state. I don't think that the connection state listeners are available from the CuratorZookeeperClient though.
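
To make the granularity point concrete, here is a minimal sketch of a caller-controlled timed wait that is released by a callback instead of polling in 1 second sleeps. `FirstConnectionGate` and `onConnected()` are hypothetical names for this example, and the sketch only covers the first connection (a `CountDownLatch` cannot be reset).

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

/** Hypothetical one-shot gate that opens on the first 'connected' notification. */
final class FirstConnectionGate {
    private final CountDownLatch connected = new CountDownLatch(1);

    /** Assumed to be invoked from some connection-state callback. */
    void onConnected() {
        connected.countDown();
    }

    /**
     * Blocks for at most the caller's timeout, at whatever granularity the
     * caller asks for, and returns true if the gate opened in time.
     */
    boolean awaitConnected(long timeout, TimeUnit unit) throws InterruptedException {
        return connected.await(timeout, unit);
    }
}
```

A caller could then write `gate.awaitConnected(250, TimeUnit.MILLISECONDS)` and get a 250 ms bound, which a loop of 1 second sleeps cannot offer.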

Randgalt · 2014-06-13T00:23:17Z

Man - it's getting complicated huh? I guess there's no other way. I wonder if there's some simplification that can be done.

cammckenzie · 2014-06-13T00:38:17Z

For what I originally thought was going to be a 1 line fix, yes!

It seems to me that the wait logic is cleaner to implement at the CuratorFramework level, rather than in the CuratorZookeeperClient, just because you've got a nice ConnectionStateListener framework to deal with. The internalBlockUntilConnectedOrTimedOut() method is a bit ugly in that it has to block in 1 second increments. Is there a reason that it can't just wait for the entire session timeout in one go?

As an aside, it's a bit inefficient too, as it allocates and destroys a CountDownLatch and a Watcher on each iteration of the loop.

Randgalt · 2014-06-13T00:41:08Z

OK - I see your point. I forget why the CuratorZookeeperClient does a spin loop like that. That's some of the oldest code in the lib.

cammckenzie · 2014-06-13T01:09:12Z

Ok, so what's the way forward? We can either refactor internalBlockUntilConnectedOrTimedOut() in CuratorZookeeperClient to allow arbitrary timeouts and try to remove the 1 second sleep increments, or we can move the wait logic to the CuratorFramework and use the ConnectionStateListener.

Assuming there's no technical reason why the CuratorZookeeperClient can't block for arbitrary lengths of time, then it's ok to implement it there. I don't really have a strong preference either way.

Randgalt · 2014-06-13T18:03:12Z

I'd appreciate your opinion. I'm OK with either.

cammckenzie · 2014-06-14T05:00:01Z

Ok, I've implemented it in the CuratorZookeeperClient. Doesn't seem to be any issues with using an arbitrary sleep length. Still need to do a bit of testing. Will get something sorted early next week hopefully.

cammckenzie · 2014-06-16T03:49:25Z

Scratch that, I think that the reason that the CuratorZookeeperClient was doing a spin loop was because there's a race condition with the watchers. It's possible for the local watcher to get a 'connected' event before the ConnectionState gets its 'connected' event. This means that when you call into ConnectionState.isConnected() it returns false, even though we know it's actually true.

So, while we could return the boolean based on what we know the state to be, it's going to be inconsistent for a short period with the ConnectionState, and this has potential for knock on consequences, even though the window of inconsistency is short.

So, I think it's actually better to move this wait logic into the CuratorFramework and use the ConnectionStateListener.

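CURATOR-110 - Moved the 'wait until connection established' logic into the ExecuteAfterConnectionEstablished utility class. Cleaned up the blockUntilConnected() logic in the CuratorFrameworkImpl

CURATOR-110 - Fixed up the connection notifications so that they will only notify blocked clients when a connected state is reached, rather than any state.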
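
The "only notify blocked clients when a connected state is reached" behaviour can be sketched with a plain monitor. This is a hedged illustration, not the CuratorFrameworkImpl code: `ConnectedMonitor`, its nested `State` enum, and `stateChanged()` are hypothetical names for this example.

```java
import java.util.concurrent.TimeUnit;

/** Hypothetical monitor that wakes waiters only when a connected state arrives. */
final class ConnectedMonitor {
    enum State { CONNECTED, SUSPENDED, RECONNECTED, LOST }

    private final Object lock = new Object();
    private boolean connected = false;

    /** Assumed to be called by a connection-state listener on every change. */
    void stateChanged(State newState) {
        synchronized (lock) {
            connected = (newState == State.CONNECTED || newState == State.RECONNECTED);
            if (connected) {
                lock.notifyAll(); // non-connected states never wake the waiters
            }
        }
    }

    /** Waits up to maxWait for a connected state; returns the final status. */
    boolean blockUntilConnected(long maxWait, TimeUnit unit) throws InterruptedException {
        long remainingNanos = unit.toNanos(maxWait);
        final long deadline = System.nanoTime() + remainingNanos;
        synchronized (lock) {
            // Loop to guard against spurious wakeups and early returns.
            while (!connected && remainingNanos > 0) {
                TimeUnit.NANOSECONDS.timedWait(lock, remainingNanos);
                remainingNanos = deadline - System.nanoTime();
            }
            return connected;
        }
    }
}
```

Because the flag and the notification are updated under the same lock, a waiter never observes a 'connected' wakeup while the recorded state still says disconnected, which is the kind of inconsistency described in the comment above.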

cammckenzie · 2014-06-16T07:14:06Z

Latest commit has everything implemented at the CuratorFramework level. Have a look and see what you think. I still seem to have random completely unrelated tests failing occasionally which is a bit disconcerting. They work fine when I rerun them though. I'm not sure if these are due to race conditions or flakiness of the TestingServer.

Randgalt · 2014-06-16T17:23:17Z

The tests are flaky because many of them have to wait "for a bit" for things to settle (session to fail, ephemeral to delete, etc.). But the ZK server has essentially random behavior in terms of when it connects, reconnects, etc. If the VM is gc'ing or your system is slow, it gets worse.

I keep trying to tune things but it's still not perfect.

Randgalt · 2014-06-16T19:00:53Z

Interestingly, now that we have ExecuteAfterConnectionEstablished, we no longer need ConnectionState.isConnected() nor do we need the change to LeaderLatch.handleStateChange().

If you don't mind, I'll remove those changes.
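
For readers unfamiliar with the pattern, the general idea of "run this once a connection exists, without blocking the caller" can be sketched as follows. This is not the ExecuteAfterConnectionEstablished source; `RunAfterConnected` is a hypothetical name, and a `CountDownLatch` stands in for whatever blocking "wait for connection" call is actually used.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

/** Hypothetical helper: defers a task to a background thread until connected. */
final class RunAfterConnected {
    private RunAfterConnected() {
    }

    static Future<Void> submit(final CountDownLatch connected, final Runnable task) {
        ExecutorService executor = Executors.newSingleThreadExecutor();
        Callable<Void> work = () -> {
            connected.await(); // stand-in for blocking until a connection exists
            task.run();        // the caller's setup logic runs only after that
            return null;
        };
        Future<Void> future = executor.submit(work);
        // shutdown() lets the already submitted task finish, then frees the thread.
        executor.shutdown();
        return future;
    }
}
```

The caller's start() can return immediately while the background thread waits, which matches the behaviour described for the leader latch being started before ZK is reachable.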

cammckenzie · 2014-06-16T22:03:35Z

ConnectionState.isConnected() is being used in the CuratorFrameworkImpl when we're blocking for a connection. I think that leaving it in is probably not a bad idea, as it still has the potential to be useful in other places too.

In regards to the tests, I wonder if there's a way of structuring them to be more event driven, rather than just waiting some arbitrary amount of time and hoping an event occurs within that window? I'll have a look and see if I can find anything, but that's for another JIRA ticket.

Cameron McKenzie added 2 commits June 2, 2014 16:30

CURATOR-110 - Modified state handling to treat 'CONNECTED' and 'RECONNECTED' events in the same way. Added a test case for the leader latch being started before a connection to ZK has been established.

1a63a10

CURATOR-110 - Fixed up formatting to Curator standards

0b7ae7e

Randgalt reviewed Jun 3, 2014

Cameron McKenzie and others added 2 commits June 4, 2014 10:24

CURATOR-110 - Modified the enum to have an abstract isConnected() method that can be overridden by each enum instance.

2c376b9

Randgalt reviewed Jun 12, 2014

cammckenzie added 2 commits June 16, 2014 15:59

CURATOR-110 - Moved the 'wait until connection established' logic into the ExecuteAfterConnectionEstablished utility class. Cleaned up the blockUntilConnected() logic in the CuratorFrameworkImpl

e8138ed

CURATOR-110 - Fixed up the connection notifications so that they will only notify blocked clients when a connected state is reached, rather than any state.

59bab73

asfgit merged commit 59bab73 into apache:master Jun 17, 2014

cammckenzie deleted the CURATOR-110 branch June 17, 2014 23:10

3 participants