Quiet down healthchecks to save battery and CPU usage, fixes #1663 #1674

Merged
merged 8 commits into from Jul 3, 2019

rfay (Member) commented Jun 26, 2019

The Problem/Issue/Bug:

#1663 complains about the constant running of the ddev-ssh-agent healthcheck, which uses CPU and battery. But in fact most of the containers have this problem, and the cost multiplies across all of them.

Docker gives us only the "interval" setting for healthchecks, which controls both the wait before the first healthcheck and how often it runs after that.

In ddev, we mostly care about when a container first becomes ready and usable; it rarely needs to be checked after that, but we were always stymied by Docker's design mistake here.

How this PR Solves The Problem:

This is admittedly a hacky approach, but it works:

  • Use a 1-second interval. Docker waits only 1 second before the first check, and then checks every second.
  • Use a long 120s timeout, so Docker will wait that long for a check to finish. This leaves room for the sleep described below.
  • Track the health status out-of-band by creating /tmp/healthy when we find the container is healthy.
  • If the healthcheck finds, when it starts, that the container was most recently healthy, it sleeps for a defined time (configured here to about a minute) before checking. That slows down the whole healthcheck cycle.
  • The start script deletes /tmp/healthy. This handles the case where a container is docker-stopped (paused in ddev parlance) and then started again: on resume, the start script deletes /tmp/healthy, so health checking starts fresh.
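The steps above can be sketched as a small bash healthcheck. This is a minimal illustration under stated assumptions, not the script this PR merged: the path in the HEALTHCHECK line, the variable names HEALTHY_MARKER, SLEEP_SECONDS and CHECK_COMMAND, and the check_service placeholder are all hypothetical. Removing the marker on failure is also our own addition, so that a container that turns unhealthy drops back to 1-second checks.

```shell
#!/bin/bash
# healthcheck.sh: sketch of the "sleep if recently healthy" healthcheck.
# Hypothetical Dockerfile wiring:
#   HEALTHCHECK --interval=1s --timeout=120s \
#     CMD ["bash", "-c", ". /healthcheck.sh && healthcheck"]
# HEALTHY_MARKER, SLEEP_SECONDS and CHECK_COMMAND are illustrative
# names, not ddev settings.

# Placeholder for the real, container-specific readiness test.
check_service() {
    sh -c "${CHECK_COMMAND:-true}"
}

healthcheck() {
    local marker="${HEALTHY_MARKER:-/tmp/healthy}"
    local pause="${SLEEP_SECONDS:-59}"

    # Last check succeeded: wait before probing again. Docker does not
    # start the next check until this one exits, so the pause stretches
    # the whole cycle despite the 1s interval. The 120s timeout must
    # stay longer than this pause plus the probe itself.
    if [ -f "$marker" ]; then
        sleep "$pause"
    fi

    if check_service; then
        touch "$marker"   # record health out-of-band
        echo "healthy"
        return 0
    fi

    rm -f "$marker"       # unhealthy: go back to fast 1s checks
    echo "unhealthy"
    return 1
}
```

With a 1s interval and a 59s sleep, a healthy container works out to roughly one real probe per minute instead of one per second. The matching start script would only need to run rm -f /tmp/healthy before launching the service, as the last bullet describes.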

Manual Testing Instructions:

Run ddev stop -a --stop-ssh-agent, then ddev start.
Look at your computer's CPU usage.
Look at docker events; you should see far fewer events than before.
A useful tool is docker inspect ddev-router | jq ".[].State.Health", which shows the last few healthchecks and their results (change the container name as necessary). This is particularly interesting early in a container's life.

Automated Testing Overview:

None provided.

Related Issue Link(s):

OP #1663

Release/Deployment notes:

rfay added this to the v1.10 milestone Jun 26, 2019
rfay (Member, Author) commented Jun 27, 2019

This is close, but ddev pause (docker stop) leaves /tmp/healthy in place, so on start the health status doesn't update soon enough. I need to figure out a way to use the actual date on /tmp/healthy, or delete it on startup, or something.
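The "actual date" idea in this comment could be sketched like this: trust /tmp/healthy only if it was modified recently, so a marker left behind by a long pause is ignored. This is not what the PR ended up doing (the final description has the start script delete the marker instead), and the helper names and the 90-second threshold are hypothetical.

```shell
#!/bin/bash
# Sketch: decide whether a health marker file is recent enough to trust.
# marker_age_seconds / marker_is_fresh are hypothetical helper names.

# Seconds since the file was last modified.
# Tries GNU/busybox stat first, then falls back to BSD/macOS stat.
marker_age_seconds() {
    local file="$1" mtime now
    mtime=$(stat -c %Y "$file" 2>/dev/null || stat -f %m "$file" 2>/dev/null) || return 1
    now=$(date +%s)
    echo $(( now - mtime ))
}

# Succeeds only if the marker exists and is younger than max_age seconds.
marker_is_fresh() {
    local file="$1" max_age="${2:-90}" age
    [ -f "$file" ] || return 1
    age=$(marker_age_seconds "$file") || return 1
    [ "$age" -lt "$max_age" ]
}

# A healthcheck would then sleep only when the marker is fresh, e.g.:
#   if marker_is_fresh /tmp/healthy 90; then sleep 59; fi
```

One weakness of this approach: a stop and restart shorter than the threshold still sees a "fresh" marker, which is one reason deleting the marker at startup is the simpler fix.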

rfay force-pushed the 20190626_healthcheck branch 2 times, most recently from 60d32d4 to 6b6a654, June 28, 2019 22:21
rfay (Member, Author) commented Jun 28, 2019

Artifacts for this round are at https://circleci.com/gh/drud/ddev/15097#artifacts/containers/0 for anybody who would like to take a look.

rfay (Member, Author) commented Jun 30, 2019

j6s reviewed and commented in #1663.

j6s commented Jun 30, 2019

One minor question (not sure how this project handles configuration): would it make sense to pass the sleep time down via an environment variable? That way we could have something like

ENV HEALTHCHECK_SLEEP=59
HEALTHCHECK...

in the Dockerfile.
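A minimal sketch of this suggestion, assuming the healthcheck script is bash: read the back-off from HEALTHCHECK_SLEEP and fall back to 59 seconds when it is unset or not a non-negative integer. The helper name is hypothetical; HEALTHCHECK_SLEEP is the name proposed in the comment, not an existing ddev setting.

```shell
#!/bin/bash
# Sketch of the suggestion: take the back-off from HEALTHCHECK_SLEEP
# (e.g. set with ENV HEALTHCHECK_SLEEP=59 in the Dockerfile).
# healthcheck_sleep_seconds is a hypothetical helper name.

healthcheck_sleep_seconds() {
    local default=59
    local value="${HEALTHCHECK_SLEEP:-$default}"
    # Fall back to the default unless the value is a non-negative integer.
    case "$value" in
        ''|*[!0-9]*) value="$default" ;;
    esac
    echo "$value"
}

# In the healthcheck script, the fixed sleep would then become:
#   sleep "$(healthcheck_sleep_seconds)"
```

Validating the value keeps a typo in the Dockerfile from making sleep exit with an error and silently skip the back-off.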

rfay (Member, Author) commented Jun 30, 2019

I think it's absolutely worth making that configurable, thanks. I thought about it as the work went on, but just wanted to prove the concept at the time and never got back to it. I'll try to add a global config for that setting if this flies with everybody.

rfay merged commit a2b82a6 into ddev:master Jul 3, 2019
rfay deleted the 20190626_healthcheck branch July 3, 2019 19:41