Add config support for health, onChange, sensor timeouts #199

tgross · 2016-08-05T14:52:30Z

This PR provides configuration options for the health, onChange, and sensor handlers to take advantage of the timeout option provided by PR #184.

~~TODO:~~ Done

- verify that I haven't reduced test coverage; the timeout options themselves have been well-tested previously so I just need to make sure the extra config is covered
- figure out what's going on in "Have health checks timeout" #139 (comment), which I haven't been able to reproduce locally.

cc @misterbisson @justenwalker

tgross · 2016-08-05T17:13:56Z

I've pushed a commit for extending our test coverage, so I think we're good there. Still having trouble reproducing #139 (comment)

tgross · 2016-08-05T17:31:45Z

Figured out that previous build failure was because the fix in a later commit wasn't included in that build. ref #139 (comment) for details.

tgross · 2016-08-05T17:39:59Z

Oops, I need to add docs for this feature now.

tgross · 2016-08-05T17:45:38Z

Ok I think this is ready for review.

misterbisson · 2016-08-11T21:32:23Z

documentation/12-configuration/README.md

@@ -119,13 +122,15 @@ The format of the JSON file configuration is as follows:
 - `poll` is the time in seconds between polling for health checks.
 - `ttl` is the time-to-live of a successful health check. This should be longer than the polling rate so that the polling process and the TTL aren't racing; otherwise Consul will mark the service as unhealthy.
 - `tags` is an optional array of tags. If the discovery service supports it (Consul does), the service will register itself with these tags.
+- `timeout` an optional value to wait before forcibly killing the health check. Health checks killed in this way are terminated immediately (`SIGKILL`) without an opportunity to clean up their state. This means that a heartbeat will not be sent. The minimum timeout is `1ms`. Omitting this field means that ContainerPilot will wait indefinitely for the health check.


> Omitting this field means that ContainerPilot will wait indefinitely for the health check.

You probably debated this privately before deciding on it, but would the service's poll or ttl value be a more sane default? Yes, this default behavior matches the previous behavior, but is that most desirable?

misterbisson · 2016-08-11T21:33:41Z

This looks solid, but I have a question about the default behavior.

@justenwalker: have you had a look?

justenwalker · 2016-08-11T21:40:48Z

@misterbisson wrote:

> You probably debated this privately before deciding on it, but would the service's poll or ttl value be a more sane default? Yes, this default behavior matches the previous behavior, but is that most desirable?

Probably the poll would be a sane default, since another health check would be queued up to run.

That being said, the behavior as it exists right now is consistent across all pollables. Making an exception to the rule for health checks might mean that the results are surprising - so there's that trade-off to consider. I'm just not sure what is least surprising - but I'd lean towards using poll as a default over waiting forever since I can't really come up with a good reason why I'd ever want that.

misterbisson · 2016-08-11T21:52:10Z

I need to explain to @tgross that I wasn't trying to draw @justenwalker into an argument. Rather, I was pinging him to make sure he saw the change in general.

As for the default timeout, here's what this change affects:

- `health` in services
- `onChange` in backends
- `check` in sensors

Each of those has a poll value that might be a suitable default. I don't have a strong commitment to changing it, just asking if defaulting to forever was intentional.

tgross · 2016-08-12T13:22:42Z

> just asking if defaulting to forever was intentional.

I left it as forever for backwards compatibility. If we default to having a timeout where there was none previously then we might break someone's application. For example, currently autopilotpattern/mysql relies on the fact that it's not going to get timed out during health checks so that it can do the snapshot to Manta (will be changed in autopilotpattern/mysql#44). The primary will be marked as unhealthy during that time but it's intentional.

That being said, I'm absolutely in agreement that using the poll value as the default (as we did for task hooks) is the right approach in a future 3.0 release.

misterbisson · 2016-08-12T13:39:39Z

> That being said, I'm absolutely in agreement that using the poll value as the default (as we did for task hooks) is the right approach in a future 3.0 release.

Well put. It's a plan. Should we ticket that and note it in the docs?

tgross · 2016-08-12T14:32:59Z

> Should we ticket that and note it in the docs?

Will do.

tgross · 2016-08-12T16:54:37Z

Opened #206 for future work and added deprecation warning to documentation.

misterbisson · 2016-08-12T18:34:30Z

lgtm, 🏡 🚶

tgross · 2016-08-12T18:38:40Z

Ok, I'm going to merge this and cut the 2.4.0 release from it.

tgross added 2 commits August 5, 2016 13:12

Add config support for health, onChange, sensor timeouts

01f3a41

Expand backend and service config parsing tests

55ec387

tgross force-pushed the gh139_timeout_config branch from 489e6fd to 55ec387 on August 5, 2016 17:13

Add timeout config documentation

bedcd5b

tgross mentioned this pull request Aug 5, 2016

wait: no child processes #178

Closed

misterbisson reviewed Aug 11, 2016

tgross mentioned this pull request Aug 12, 2016

pollables should timeout by default #206

Closed

Added deprecation warning to documentation

2799052

tgross merged commit 8d4f123 into TritonDataCenter:master Aug 12, 2016

tgross deleted the gh139_timeout_config branch April 4, 2017 15:47
