Dynamic config2 #16
warning: DO NOT MERGE THIS! |
I added a new thing recently:
|
sorry for the delay; in general, i'm not opposed to the concept here, but i'm not a fan of the implementation. i would like to see a cleaner separation of concerns, with a cleaner interface for dynamically adding and removing services in the service watcher, and a better-documented, separated-out server. i'm also not sure if eventmachine is the right approach. i would not merge this as-is. also, i think @pcarrier has additional concerns about the concept as a whole. pierre, can you elaborate? |
I'm not overly attached to this implementation--it's just a minimum viable product. The intention is to be able to make progress on other things while you guys decide what the overall direction is for dynamic service discovery. We need functionality like this, but it doesn't necessarily need to be done this way. That said, we've been using this for a few weeks now without any trouble. |
we talked in person about how to move forward on this. @brndnmtthws is going to (a) split out the server code (b) improve the interface for setting up additional watchers internally and (c) think about improving the TCP protocol for service registration to be more resilient to currently unknown effects. this should be within the next week. |
I've made progress on this, but I'm stuck on an issue with JRuby right now. I can't seem to get the eventmachine jar to load correctly without forcing it to use the ruby version of eventmachine rather than the java one. |
Bump. Added some documentation to the readme. |
Makes it nice to kill.
Scenario: Nerve terminates prematurely (maybe because it was disconnected from a ZK server). It gets restarted by runit. It then does its health checks and finds that the service is up. It tries to create the node, fails because the node already exists, and so sets the node's value instead. The session for the previous instance of nerve then expires, and the node, being ephemeral and attached to that session, disappears. Nerve doesn't do anything about that unless the service goes down and then back up, at which point the node gets recreated.
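The scenario above can be reproduced in miniature. The following is only a sketch: `FakeZk` is a hypothetical in-memory stand-in for ZooKeeper's ephemeral-node behaviour (it is not the real client library), and `report_up` and the path are made-up names that mimic the "create, or set on conflict" logic described in the commit message.

```ruby
# Hypothetical in-memory stand-in for ZooKeeper ephemeral nodes.
class FakeZk
  def initialize
    @nodes = {} # path => { data:, owner: }
  end

  # Create an ephemeral node owned by `session`; fails if the path exists.
  def create(path, data, session)
    raise KeyError, "node exists: #{path}" if @nodes.key?(path)
    @nodes[path] = { data: data, owner: session }
  end

  # Setting data does not change which session owns the node.
  def set(path, data)
    @nodes.fetch(path)[:data] = data
  end

  def exists?(path)
    @nodes.key?(path)
  end

  # Session expiry removes every ephemeral node that session created.
  def expire(session)
    @nodes.delete_if { |_, node| node[:owner] == session }
  end
end

# Mimics the behaviour described above: create, or set on conflict.
def report_up(zk, path, data, session)
  zk.create(path, data, session)
rescue KeyError
  zk.set(path, data)
end

zk = FakeZk.new
path = "/services/example/host1" # illustrative path only

report_up(zk, path, "up", :old_session) # first nerve registers
report_up(zk, path, "up", :new_session) # restarted nerve hits "node exists"
registered_after_restart = zk.exists?(path)

zk.expire(:old_session)                 # old session finally times out
registered_after_expiry = zk.exists?(path)

puts "after restart: #{registered_after_restart}, after expiry: #{registered_after_expiry}"
```

In this model the node survives the restart but vanishes once the old session expires, even though the restarted process still believes it is registered.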
Expose a TCP interface for changing the nerve configuration at runtime.
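For readers who want a feel for what a runtime-configuration interface of this kind could look like, here is a minimal sketch. Everything in it is an assumption for illustration: the line-delimited JSON commands, the `add`/`remove` actions, and the `ServiceRegistry` class are hypothetical and are not the protocol or classes used in this branch. It also uses plain threads from the standard library rather than eventmachine.

```ruby
require "socket"
require "json"

# Hypothetical in-process registry of watched services, keyed by name.
class ServiceRegistry
  def initialize
    @services = {}
    @lock = Mutex.new
  end

  # Apply one parsed command and return a one-line status string.
  def apply(command)
    return "bad command" unless command.is_a?(Hash)
    @lock.synchronize do
      case command["action"]
      when "add"
        @services[command.fetch("name")] = command.fetch("config")
        "ok"
      when "remove"
        @services.delete(command.fetch("name")) ? "ok" : "unknown service"
      else
        "unknown action"
      end
    end
  end

  def names
    @lock.synchronize { @services.keys.sort }
  end
end

# Accepts one JSON command per line and replies with one status line.
def serve(server, registry)
  Thread.new do
    loop do
      Thread.new(server.accept) do |conn|
        while (line = conn.gets)
          reply = begin
            registry.apply(JSON.parse(line))
          rescue JSON::ParserError, KeyError => e
            "error: #{e.class}"
          end
          conn.puts(reply)
        end
        conn.close
      end
    end
  end
end

registry = ServiceRegistry.new
server = TCPServer.new("127.0.0.1", 0) # port 0: let the OS pick a free port
serve(server, registry)

client = TCPSocket.new("127.0.0.1", server.addr[1])
client.puts({ action: "add", name: "web", config: { "port" => 8080 } }.to_json)
add_reply = client.gets.chomp
names_after_add = registry.names
client.puts({ action: "remove", name: "web" }.to_json)
remove_reply = client.gets.chomp
client.close
```

Keeping the registry behind a single `apply` method is one way to get the separation between server code and watcher management that was asked for earlier in this thread.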
Updated and rebased. Might be completely broken now, however. Can we get this merged please? Thanks. |
Still need this, fyi. |
Yep, still important. |
@airbnb/sre Yep, still need this. The original PR (#10 (comment)) was opened 5 months ago. What's the problem? |
what does "might be completely broken now" mean? |
You changed a bunch of things and ignored this PR, so for all I know this branch is broken now. I'll try and test it today, but it's rather frustrating. |
I've fixed the breakage, everything works again. |
I haven't ignored this, I've suggested running multiple nerve instances to achieve the same effect. You've ignored my recommendation. |
Your suggestion was never ignored. We discussed it, and our team decided
|
I've observed that this version of Nerve fails to announce services and logs that a service is "up" when no mention of that host can be found in Zookeeper. Consider host
However, Zookeeper has no
I received a stack trace from @brndnmtthws on Tuesday that might be of some help to track down the cause of the issue: https://gist.github.com/nelgau/aad34287936abce3ca08 |
When our ZK node is deleted or otherwise changes, reset it to ensure we don't drop our node.
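The idea in that commit message, watching our own node and rewriting it whenever it is deleted or changed, can be sketched as follows. `WatchedStore` and `keep_registered` are hypothetical names for an in-memory model, not the real ZooKeeper client. One simplification to note: real ZooKeeper watches are one-shot and must be re-registered after firing, whereas the watchers here stay registered for brevity.

```ruby
# Hypothetical in-memory store with change notifications.
class WatchedStore
  def initialize
    @nodes = {}
    @watchers = Hash.new { |hash, key| hash[key] = [] }
  end

  # Register a callback fired on every change or deletion of `path`.
  def watch(path, &block)
    @watchers[path] << block
  end

  def write(path, data)
    @nodes[path] = data
    notify(path, :changed)
  end

  def delete(path)
    @nodes.delete(path)
    notify(path, :deleted)
  end

  def read(path)
    @nodes[path]
  end

  private

  def notify(path, event)
    @watchers[path].each { |watcher| watcher.call(event) }
  end
end

# Write our node, then restore it whenever it is deleted or altered.
def keep_registered(store, path, data)
  store.write(path, data)
  store.watch(path) do |_event|
    # Only rewrite when the content differs, so our own write does not loop.
    store.write(path, data) unless store.read(path) == data
  end
end

store = WatchedStore.new
path = "/services/example/host1" # illustrative path only
keep_registered(store, path, "up")

store.delete(path)               # e.g. the old session's node expires
after_delete = store.read(path)

store.write(path, "stale")       # e.g. something else overwrites the node
after_overwrite = store.read(path)

puts "after delete: #{after_delete}, after overwrite: #{after_overwrite}"
```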
I'm going to close this because I don't think that dynamic nerve/synapse config reloads should be the goal going forward. We can reconsider if our current hitless system turns out not to scale (very possible given the design), in which case we'll work on it then. Basically, you can always restart nerve without dropping traffic by starting two nerves and then cycling between them. I actually think the system is easier to understand this way because the config is the source of truth for what nerve/synapse are doing. |
same as #10 but rebased on current master
@pcarrier @brndnmtthws