-
Notifications
You must be signed in to change notification settings - Fork 138
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add config support for health, onChange, sensor timeouts #199
Conversation
489e6fd
to
55ec387
Compare
I've pushed a commit for extending our test coverage, so I think we're good there. Still having trouble reproducing #139 (comment) |
Figured out that previous build failure was because the fix in a later commit wasn't included in that build. ref #139 (comment) for details. |
Oops, I need to add docs for this feature now. |
Ok I think this is ready for review. |
@@ -119,13 +122,15 @@ The format of the JSON file configuration is as follows: | |||
- `poll` is the time in seconds between polling for health checks. | |||
- `ttl` is the time-to-live of a successful health check. This should be longer than the polling rate so that the polling process and the TTL aren't racing; otherwise Consul will mark the service as unhealthy. | |||
- `tags` is an optional array of tags. If the discovery service supports it (Consul does), the service will register itself with these tags. | |||
- `timeout` an optional value to wait before forcibly killing the health check. Health checks killed in this way are terminated immediately (`SIGKILL`) without an opportunity to clean up their state. This means that a heartbeat will not be sent. The minimum timeout is `1ms`. Omitting this field means that ContainerPilot will wait indefinitely for the health check. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Omitting this field means that ContainerPilot will wait indefinitely for the health check.
You probably debated this privately before deciding on it, but would the service's poll
or ttl
value be a more sane default? Yes, this default behavior matches the previous behavior, but is that most desirable?
This looks solid, but I have a question about the default behavior. @justenwalker: have you had a look? |
@misterbisson wrote:
Probably the That being said, the behavior as it exists right now is consistent across all pollables. Making an exception to the rule for health checks might mean that the results are surprising - so there's that trade-off to consider. I'm just not sure what is least surprising - but I'd lean towards using |
I need to explain to @tgross that I wasn't trying to draw @justenwalker into an argument. Rather, I was pinging him to make sure he saw the change in general. As for the default timeout, here's what this change affects:
Each of those has a |
I left it as forever for backwards compatibility. If we default to having a timeout where there was none previously then we might break someone's application. For example, currently autopilotpattern/mysql relies on the fact that it's not going to get timed out during health checks so that it can do the snapshot to Manta (will be changed in autopilotpattern/mysql#44). The primary will be marked as unhealthy during that time but it's intentional. That being said, I'm absolutely in agreement that using the |
Well put. It's a plan. Should we ticket that and note it in the docs? |
Will do. |
Opened #206 for future work and added deprecation warning to documentation. |
lgtm, 🏡 🚶 |
Ok, I'm going to merge this and cut the 2.4.0 release from it. |
For #139
This PR provides configuration options for the
health
,onChange
, andsensor
handlers to take advantage of the timeout option provided by PR #184.TODO:Donecc @misterbisson @justenwalker