Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Make ContainerWait and healthchecks smarter, fixes #1287 #1301

Merged
merged 6 commits into from
Nov 30, 2018

Conversation

rfay
Copy link
Member

@rfay rfay commented Nov 24, 2018

The Problem/Issue/Bug:

  1. ContainerWait() was sometimes waiting for a really long time when one of the containers (usually db) had already exited. (It does seem like the code already handles this. I think the actual problem is waiting too long for an unhealthy container; in other words, the container does not display as unhealthy soon enough. )

It turns out that this behavior was mostly because we were polling health of the web container first and waiting for its healthcheck to complete, then tried the db container, which may have long since exited.

  1. People have mentioned (and I've noticed) that ddev projects use quite a lot of CPU and battery. And it's mostly the healthcheck. (I experimented with a number of solutions to this, but all of them result in ddev start taking longer, which I think is probably unacceptable. So I backed out my experiments.)

How this PR Solves The Problem:

  • Check db container first.
  • Add healthcheck timeouts and retries to the docker-compose.yaml (and related) so we control them at docker-compose level.
  • back off how long we'll wait for db container. It was set up for 30 retries, trying every 2 seconds, so 60 seconds before it became "unhealthy".

Manual Testing Instructions:

  • Try ddev rm && ddev start.
  • Try breaking a container deliberately. My technique for this was to do a docker-compose.breakit.yaml with contents like this:
version: '3.6'
services:
  web:
    command: "bash -c 'sleep 2 && exit '"
  • Do things to break the db container:
    • in mysql DROP DATABASE db; and then watch behavior as it changes (ddev list) then ddev rm && ddev start and see how long it takes to fail.
    • In the db container, rm /var/lib/mysql/mysql/*.MYD. Watch behavior after you do it. Then try ddev rm && ddev start

Automated Testing Overview:

The overall behavior has not changed, I don't think it needs new tests.

This doesn't make a major change in behavior, so no test changes yet.

Related Issue Link(s):

OP #1287

Release/Deployment notes:

It may be worth

  • Writing a stack overflow to tell people how to use longer healthchecks to save battery. Or perhaps adding a ddev start --slow-healthcheck or something.

@rfay rfay added this to the v1.5.0 milestone Nov 24, 2018
@rfay rfay self-assigned this Nov 24, 2018
@rfay rfay changed the title Make ContainerWait smarter, less healthcheck churn, fixes #1287 Make ContainerWait and healthchecks smarter, fixes #1287 Nov 25, 2018
Copy link
Contributor

@andrewfrench andrewfrench left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code changes look good to me, and testing showed healthy and unhealthy containers being reported reliably.

  • ddev rm && ddev start reported healthy containers,
  • ddev rm && ddev start with .ddev/compose.breakit.yaml reported an unhealthy web container within a few seconds
  • After dropping the db database, ddev list reported healthy containers for a few seconds, then reported an unhealthy db container; the container was still reported as unhealthy after ddev rm && ddev start after just a few seconds (time ddev start reports 11s, the second execution, after other containers are running, takes 3.5s)
  • After deleting mysql files, ddev rm && ddev start reported unhealthy containers in around the same time as above

@rfay rfay merged commit 7909843 into ddev:master Nov 30, 2018
@rfay rfay deleted the 20181124_containerwait branch November 30, 2018 20:03
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants