Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add config support for health, onChange, sensor timeouts #199

Merged
merged 4 commits into from
Aug 12, 2016

Conversation

tgross
Copy link
Contributor

@tgross tgross commented Aug 5, 2016

For #139

This PR provides configuration options for the health, onChange, and sensor handlers to take advantage of the timeout option provided by PR #184.

TODO: Done

  • verify that I haven't reduced test coverage; the timeout options themselves have been well-tested previously so I just need to make sure the extra config is covered
  • figure out what's going on in Have health checks timeout #139 (comment), which I haven't been able to reproduce locally.

cc @misterbisson @justenwalker

@tgross
Copy link
Contributor Author

tgross commented Aug 5, 2016

I've pushed a commit for extending our test coverage, so I think we're good there. Still having trouble reproducing #139 (comment)

@tgross
Copy link
Contributor Author

tgross commented Aug 5, 2016

Figured out that previous build failure was because the fix in a later commit wasn't included in that build. ref #139 (comment) for details.

@tgross
Copy link
Contributor Author

tgross commented Aug 5, 2016

Oops, I need to add docs for this feature now.

@tgross
Copy link
Contributor Author

tgross commented Aug 5, 2016

Ok I think this is ready for review.

@tgross tgross mentioned this pull request Aug 5, 2016
@@ -119,13 +122,15 @@ The format of the JSON file configuration is as follows:
- `poll` is the time in seconds between polling for health checks.
- `ttl` is the time-to-live of a successful health check. This should be longer than the polling rate so that the polling process and the TTL aren't racing; otherwise Consul will mark the service as unhealthy.
- `tags` is an optional array of tags. If the discovery service supports it (Consul does), the service will register itself with these tags.
- `timeout` an optional value to wait before forcibly killing the health check. Health checks killed in this way are terminated immediately (`SIGKILL`) without an opportunity to clean up their state. This means that a heartbeat will not be sent. The minimum timeout is `1ms`. Omitting this field means that ContainerPilot will wait indefinitely for the health check.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Omitting this field means that ContainerPilot will wait indefinitely for the health check.

You probably debated this privately before deciding on it, but would the service's poll or ttl value be a more sane default? Yes, this default behavior matches the previous behavior, but is that most desirable?

@misterbisson
Copy link
Contributor

This looks solid, but I have a question about the default behavior.

@justenwalker: have you had a look?

@justenwalker
Copy link
Contributor

justenwalker commented Aug 11, 2016

@misterbisson wrote:

You probably debated this privately before deciding on it, but would the service's poll or ttl value be a more sane default? Yes, this default behavior matches the previous behavior, but is that most desirable?

Probably the poll would be a sane default, since another health check would be queued up to run.

That being said, the behavior as it exists right now is consistent across all pollables. Making an exception to the rule for health checks might mean that the results are surprising - so there's that trade-off to consider. I'm just not sure what is least surprising - but I'd lean towards using poll as a default over waiting forever since I can't really come up with a good reason why I'd ever want that.

@misterbisson
Copy link
Contributor

I need to explain to @tgross that I wasn't trying to draw @justenwalker into an argument. Rather, I was pinging him to make sure he saw the change in general.

As for the default timeout, here's what this change affects:

  • health in services
  • onChange in backends
  • check in sensors

Each of those has a poll value that might be a suitable default. I don't have a strong commitment to changing it, just asking if defaulting to forever was intentional.

@tgross
Copy link
Contributor Author

tgross commented Aug 12, 2016

just asking if defaulting to forever was intentional.

I left it as forever for backwards compatibility. If we default to having a timeout where there was none previously then we might break someone's application. For example, currently autopilotpattern/mysql relies on the fact that it's not going to get timed out during health checks so that it can do the snapshot to Manta (will be changed in autopilotpattern/mysql#44). The primary will be marked as unhealthy during that time but it's intentional.

That being said, I'm absolutely in agreement that using the poll value as the default (as we did for task hooks) is the right approach in a future 3.0 release.

@misterbisson
Copy link
Contributor

That being said, I'm absolutely in agreement that using the poll value as the default (as we did for task hooks) is the right approach in a future 3.0 release.

Well put. It's a plan. Should we ticket that and note it in the docs?

@tgross
Copy link
Contributor Author

tgross commented Aug 12, 2016

Should we ticket that and note it in the docs?

Will do.

@tgross
Copy link
Contributor Author

tgross commented Aug 12, 2016

Opened #206 for future work and added deprecation warning to documentation.

@misterbisson
Copy link
Contributor

lgtm, 🏡 🚶

@tgross
Copy link
Contributor Author

tgross commented Aug 12, 2016

Ok, I'm going to merge this and cut the 2.4.0 release from it.

@tgross tgross merged commit 8d4f123 into TritonDataCenter:master Aug 12, 2016
@tgross tgross deleted the gh139_timeout_config branch April 4, 2017 15:47
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants