
Buddy does not deregister service from consul. #17

Closed · lettrung99 opened this issue Nov 19, 2015 · 7 comments

Comments

@lettrung99

I realized that if I stop and start the container, Containerbuddy does not refresh the service IP address in Consul; I have to deregister the service manually. Is this a bug?

@tgross (Contributor) commented Nov 23, 2015

> Is this a bug?

It's a known limitation. It was in the README at one point (ref cadb5e6) but it looks like I missed it when I moved the Roadmap items into Github issues. Let's mark this issue as an enhancement.

@misterbisson (Contributor)

This is an interesting question in the context of marking nodes for maintenance.

Should that process accept two signals: one to suspend health checks and mark the service as down, and another (new) signal to fully remove it from Consul? Alternatively, marking nodes for maintenance could simply remove the service entirely on the current signal.

I have mixed feelings about what's "correct" here.

@tgross (Contributor) commented Nov 24, 2015

Something to think about here is that we have a limited number of signals at our disposal before we risk interfering with normal operations. If we use a signal for this:

  • SIGHUP: reload config
  • SIGUSR1: mark nodes for maintenance
  • SIGUSR2: mark nodes for removal

And that's about it. The remaining signals are going to be used by the OS to manage the container or are just not right semantically.

Another thing to note here is that if the container halts on its own (i.e. it crashes), we still end up with nodes left behind in Consul. But I don't know... maybe that's ok because it differentiates between nodes intentionally removed and nodes involuntarily removed?
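To make that division of labor concrete, here is a minimal Go sketch of how such handlers might be wired up; `reloadConfig`, `markMaintenance`, and `deregister` are hypothetical stand-ins for whatever the discovery backend actually exposes, not Containerbuddy functions:

```go
package main

import (
	"os"
	"os/signal"
	"syscall"
)

// Hypothetical helpers; placeholders for real discovery-backend calls.
func reloadConfig()    { /* re-read configuration */ }
func markMaintenance() { /* suspend health checks, mark the service as down */ }
func deregister()      { /* remove the service from Consul entirely */ }

func main() {
	sigs := make(chan os.Signal, 1)
	signal.Notify(sigs, syscall.SIGHUP, syscall.SIGUSR1, syscall.SIGUSR2)

	go func() {
		for sig := range sigs {
			switch sig {
			case syscall.SIGHUP:
				reloadConfig()
			case syscall.SIGUSR1:
				markMaintenance()
			case syscall.SIGUSR2:
				deregister()
			}
		}
	}()

	// ... run and supervise the main application here ...
	select {}
}
```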

@tgross (Contributor) commented Nov 24, 2015

I almost hate to suggest this, but one way out of the signal namespace problem is to have a socket in the container (a la docker.sock, except inside the container) that accepts some simple protocol. I'd like to avoid having an HTTP interface just because then securing it becomes a pain in the ass; this way we can assume that the container can manage itself and that any scheduler (which needs to have docker exec privileges anyway) can manage it too.
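Purely as an illustration of that idea (the socket path and command names below are made up, not anything Containerbuddy ships), a line-oriented control socket could look like:

```go
package main

import (
	"bufio"
	"log"
	"net"
	"os"
	"strings"
)

// Hypothetical socket path; the scheduler only needs docker exec access to reach it.
const controlSocket = "/var/run/containerbuddy.sock"

func main() {
	os.Remove(controlSocket) // clean up a stale socket from a previous run
	ln, err := net.Listen("unix", controlSocket)
	if err != nil {
		log.Fatal(err)
	}
	defer ln.Close()

	for {
		conn, err := ln.Accept()
		if err != nil {
			log.Println(err)
			continue
		}
		go handle(conn)
	}
}

// handle reads simple line-oriented commands from one connection.
func handle(conn net.Conn) {
	defer conn.Close()
	scanner := bufio.NewScanner(conn)
	for scanner.Scan() {
		switch strings.TrimSpace(scanner.Text()) {
		case "maintenance":
			// suspend health checks, mark the service as failing
		case "deregister":
			// remove the service from the discovery backend
		default:
			conn.Write([]byte("unknown command\n"))
		}
	}
}
```

A scheduler could then drive it from inside the container with something like `echo maintenance | socat - UNIX-CONNECT:/var/run/containerbuddy.sock`.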

@tgross (Contributor) commented Nov 24, 2015

@misterbisson and I had a phone call about this and we've realized that the method we're using in #15 is limiting us here. If we reap the service from the discovery service instead of simply sending a FailTTL signal, we accomplish the same thing from the perspective of doing temporary maintenance and can cover the issue of reaping as well.

I'm going to update #15 in light of that.
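For context, reaping the service (rather than just letting its TTL check fail) is a single call with Consul's Go client; this sketch assumes a placeholder service ID:

```go
package main

import (
	"log"

	consul "github.com/hashicorp/consul/api"
)

func main() {
	client, err := consul.NewClient(consul.DefaultConfig())
	if err != nil {
		log.Fatal(err)
	}

	// Deregistering removes the service entirely, so Consul stops advertising
	// it at all. "my-service-id" is a placeholder for whatever ID the service
	// was registered under.
	if err := client.Agent().ServiceDeregister("my-service-id"); err != nil {
		log.Fatal(err)
	}
}
```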

@tgross (Contributor) commented Nov 24, 2015

#15 has been merged and I've got a release awaiting #19.

tgross closed this as completed on Nov 24, 2015.
tgross added the released label and removed the open PR label on Dec 16, 2015.