Dynamic config2 #16

Closed · wants to merge 14 commits into from

Conversation

@igor47 (Collaborator) commented Oct 18, 2013

same as #10 but rebased on current master

@pcarrier @brndnmtthws

@ghost assigned brndnmtthws Oct 18, 2013
@igor47 (Collaborator, Author) commented Oct 18, 2013

warning: DO NOT MERGE THIS!

@brndnmtthws

I added a new thing recently:

  • when an ephemeral service client disconnects, the service is cleaned up
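For readers unfamiliar with EventMachine's connection lifecycle, here is a minimal sketch of that cleanup pattern. This is not the PR's actual code; `ServiceRegistry` and the port are made-up stand-ins for nerve's internal service list and listen address:

```ruby
# Hypothetical sketch of per-connection ephemeral services (not the PR's code).
# ServiceRegistry is an assumed stand-in for nerve's internal service list.
require 'eventmachine'
require 'json'

class EphemeralServiceConnection < EM::Connection
  # Assume the client sends one JSON service definition per connection.
  def receive_data(data)
    config = JSON.parse(data)
    @service_name = config['name']
    ServiceRegistry.add(@service_name, config)
    send_data("registered #{@service_name}\n")
  rescue JSON::ParserError => e
    send_data("error: #{e.message}\n")
    close_connection_after_writing
  end

  # unbind fires when the client disconnects; the ephemeral service is
  # cleaned up here, mirroring the behavior described above.
  def unbind
    ServiceRegistry.remove(@service_name) if @service_name
  end
end

EM.run { EM.start_server('127.0.0.1', 1025, EphemeralServiceConnection) }
```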

@igor47 (Collaborator, Author) commented Oct 22, 2013

sorry for the delay; in general, i'm not opposed to the concept here, but i'm not a fan of the implementation. i would like a cleaner separation of concerns, with a cleaner interface for dynamically adding and removing services in the service watcher, and a better-documented, separated-out server. i'm also not sure that eventmachine is the right approach. i would not merge this as-is.

also, i think @pcarrier has additional concerns about the concept as a whole. pierre, can you elaborate?

@brndnmtthws

I'm not overly attached to this implementation--it's just a minimum viable product. The intention is to be able to make progress on other things while you guys decide what the overall direction is for dynamic service discovery. We need functionality like this, but it doesn't necessarily need to be done this way.

That said, we've been using this for a few weeks now without any trouble.

@igor47 (Collaborator, Author) commented Oct 30, 2013

we talked in person about how to move forward on this. @brndnmtthws is going to (a) split out the server code, (b) improve the interface for setting up additional watchers internally, and (c) make the TCP protocol for service registration more resilient to failure modes we haven't anticipated yet.

this should all happen within the next week.

@brndnmtthws

I've made progress on this, but I'm stuck on an issue with JRuby right now. I can't seem to get the eventmachine jar to load correctly without forcing it to use the Ruby version of eventmachine rather than the Java one.

@brndnmtthws

Bump. Added some documentation to the readme.

@brndnmtthws

@igor47 @pcarrier

Bump!

Also, this is a bit of a duplicate of #17 since it resolves some JRuby issues.

Pierre Carrier and others added 6 commits March 7, 2014 12:44
Scenario:

Nerve terminates prematurely (perhaps because of a disconnection from a ZK
server).

It gets restarted by runit.

It then runs its health checks and finds that the service is up.

It tries to create the node, fails because the node already exists,
and so just sets its value.

The session for the previous instance of nerve expires, and the node,
being ephemeral and attached to that session, disappears.

Nerve doesn't do anything about that unless the service goes down and then
back up, at which point the node gets recreated.
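A defensive way to handle this race is to delete and recreate the node whenever it already exists, so that the ephemeral node is attached to the current session rather than the dying one. A minimal sketch, assuming the zk gem's API; the path and data are made up:

```ruby
require 'zk'

zk   = ZK.new('localhost:2181')
path = '/nerve/services/my_service/host_0001'   # made-up path
data = '{"host":"10.0.0.1","port":31124}'

begin
  zk.create(path, data, mode: :ephemeral)
rescue ZK::Exceptions::NodeExists
  # The node is left over from the previous session. Just setting its value
  # would leave it attached to that session, so it vanishes on expiry.
  # Deleting and recreating attaches it to *our* session instead.
  zk.delete(path)
  zk.create(path, data, mode: :ephemeral)
end
```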
Expose a TCP interface for changing the nerve configuration at runtime.
@brndnmtthws

Updated and rebased. Might be completely broken now, however.

Can we get this merged please? Thanks.

@brndnmtthws

Still need this, fyi.

@brndnmtthws

Yep, still important.

@brndnmtthws

@airbnb/sre

Yep, still need this.

The original PR (#10) was opened 5 months ago. What's the problem?

@igor47 (Collaborator, Author) commented Mar 13, 2014

what does "might be completely broken now" mean?

@brndnmtthws

You changed a bunch of things and ignored this PR, so for all I know this branch is broken now. I'll try and test it today, but it's rather frustrating.

@brndnmtthws

I've fixed the breakage; everything works again.

@pcarrier (Contributor)

I haven't ignored this, I've suggested running multiple nerve instances to achieve the same effect.

You've ignored my recommendation.

@brndnmtthws

Your suggestion was never ignored. We discussed it, and our team decided this was the best path.

@nelgau commented Mar 27, 2014

I've observed that this version of Nerve fails to announce services: it logs that a service is "up" even though no mention of that host can be found in ZooKeeper.

Consider host i-72a9b25c:

  • Kafka is running on port 31124.
  • The last logs from Nerve (nerve-0.5.3-test.jar; built from this branch) were:
I, [2014-03-26T21:09:46.253000 #26106]  INFO -- Nerve::ServiceCheck::TcpServiceCheck: nerve: service check kafka-h1 tcp-xx.xx.xxx.xxx:31124 initial check returned true
I, [2014-03-26T21:09:46.268000 #26106]  INFO -- Nerve::ServiceWatcher: nerve: service kafka-h1 is now up

However, ZooKeeper has no i-72a9b25c registered; restarting Nerve corrected the issue.
A second instance had the same pathology, and it was also corrected by a restart.

I received a stack trace from @brndnmtthws on Tuesday that might be of some help to track down the cause of the issue: https://gist.github.com/nelgau/aad34287936abce3ca08

When our ZK node is deleted or otherwise changes, reset it to ensure we
don't drop our node.
@brndnmtthws

I tracked the problem @nelgau mentioned above down to a bug in how nerve handles ZK nodes. I've pushed a patch here:

4d5690a

With ZooKeeper, in the event of a session timeout (or other interruption), you need to recreate ephemeral nodes. Nerve now correctly recreates the nodes when they are lost.
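As a rough illustration of that recreate-on-loss behavior (a sketch against the zk gem, not the patch itself; the path and data are made up):

```ruby
require 'zk'

zk   = ZK.new('localhost:2181')
path = '/nerve/services/my_service/host_0001'   # made-up path
data = '{"host":"10.0.0.1","port":31124}'

register = lambda do
  begin
    zk.create(path, data, mode: :ephemeral)
  rescue ZK::Exceptions::NodeExists
    # already present; nothing to do
  end
end

# Watch the node: if it gets deleted (e.g. an old session expired),
# recreate it, then re-arm the watch.
zk.register(path) do |event|
  register.call if event.node_deleted?
  zk.exists?(path, watch: true)
end

# After a session expiry the client reconnects with a fresh session;
# recreate the node on (re)connection as well.
zk.on_connected { register.call }

register.call
zk.exists?(path, watch: true)
```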

@jolynch (Collaborator) commented Sep 14, 2015

I'm going to close this because I don't think dynamic nerve/synapse config reloads should be the goal going forward. If our current hitless system turns out not to scale (very possible given the design), we'll reconsider and work on this then.

Basically, you can always restart nerve without dropping traffic by starting two nerves and then cycling between them. I actually think the system is easier to understand this way because the config is the source of truth for what nerve/synapse are doing.
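For illustration, the cycling approach might look like the sketch below. The config paths, the `--config` invocation, and the sleep-based synchronization are all assumptions, not nerve's documented interface:

```ruby
# Sketch of a hitless "restart" by cycling two nerve instances.
# Paths, flags, and timings here are assumptions.
old_pid = Process.spawn('nerve', '--config', '/etc/nerve/nerve-old.yml')
sleep 30  # give the first instance time to register its services

# Start a second nerve with the new config; registrations now overlap,
# so consumers never see an empty service set.
new_pid = Process.spawn('nerve', '--config', '/etc/nerve/nerve-new.yml')
sleep 30  # wait until the new instance has registered too

# Stop the old instance; its ephemeral nodes disappear with its session.
Process.kill('TERM', old_pid)
Process.waitpid(old_pid)
```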

jolynch closed this Sep 14, 2015