fix: shutdown supervisor in healthcheck.sh on fatal error, fixes #5993 #6033

Merged
rfay merged 2 commits into ddev:master from the 20240329_stasadev_healthcheck branch on Apr 4, 2024

Conversation

stasadev

The Issue

How This PR Solves The Issue

Before making any changes I checked https://docs.docker.com/reference/dockerfile/#healthcheck. When start_period is used, the container cannot become unhealthy during that period, no matter what the healthcheck does:

start period provides initialization time for containers that need time to bootstrap. Probe failure during that period will not be counted towards the maximum number of retries.
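
One way to observe this behavior is to start a container whose healthcheck always fails and watch its health status during the start period. The sketch below is only an illustration: the helper name, the image, and the timing values are made up for this example and are not part of this PR. The `--health-*` flags are standard `docker run` options.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of this PR): start a container whose
# healthcheck always fails, with a 60s start period.
# All arguments are passed through to `docker run` (image and command).
run_with_failing_healthcheck() {
  docker run -d --rm \
    --health-cmd 'exit 1' \
    --health-interval 5s \
    --health-retries 3 \
    --health-start-period 60s \
    "$@"
}

# Example usage (image and command are placeholders):
#   id=$(run_with_failing_healthcheck busybox sleep 300)
#   docker inspect --format '{{ .State.Health.Status }}' "$id"
```

Per the documentation quoted above, failed probes during the start period should not count towards the retries, so the container should not be reported as unhealthy until that period is over.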

I also looked for a way to fail fast during container startup, but this is all I found:
https://forums.docker.com/t/healthcheck-fail-fast-feature-during-container-start-up/121201

So, in the end, I decided to do the same thing as suggested in:

but in a simpler way.

I also decreased startretries in supervisor from 10 to 3. There is no reason to keep retrying and waiting longer if the process has already failed to start several times.
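
The general idea can be sketched as follows. This is a minimal illustration, not the code merged in this PR: the function name `fatal_services` is hypothetical, and it assumes `supervisorctl status` prints one line per process in the form "name STATE description". The list of critical services is passed as arguments.

```shell
#!/usr/bin/env bash
# Sketch only, not the code from this PR.
# Reads `supervisorctl status`-style lines ("name  STATE  description")
# on stdin and prints the names of the given services that are in FATAL.
fatal_services() {
  local critical=" $* "   # space-padded list for whole-word matching
  local name state _rest
  while read -r name state _rest; do
    if [ "${state}" = "FATAL" ] && [[ "${critical}" == *" ${name} "* ]]; then
      printf '%s\n' "${name}"
    fi
  done
}

# Hypothetical wiring inside a healthcheck script:
#   failed=$(supervisorctl status | fatal_services php-fpm nginx apache2)
#   if [ -n "${failed}" ]; then
#     supervisorctl shutdown   # stop supervisord instead of waiting for the timeout
#     exit 1
#   fi
```

Keeping the parsing in a function that reads stdin makes it possible to try it against sample status output without a running supervisor.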

Manual Testing Instructions

The php-fpm error should appear after 20-30 seconds, without waiting for the default 2-minute timeout:

echo 'RUN chmod 000 /run/php' > .ddev/web-build/Dockerfile
ddev restart
...
Failed waiting for web/db containers to become ready: web container failed: log=&{2024-03-29 21:42:41.347467771 +0200 EET 2024-03-29 21:42:42.086281673 +0200 EET 143 php-fpm:FATAL Shut down
}, err=ddev-d10-web container is unhealthy: &{2024-03-29 21:42:41.347467771 +0200 EET 2024-03-29 21:42:42.086281673 +0200 EET 143 php-fpm:FATAL Shut down
}, please use 'ddev logs -s web' and 'docker logs ddev-d10-web' and 'docker inspect --format "{{ json .State.Health }}" ddev-d10-web' to find out why it failed

Automated Testing Overview

Related Issue Link(s)

Release/Deployment Notes

@stasadev stasadev requested review from a team as code owners March 29, 2024 19:51
@github-actions github-actions bot added bugfix dependencies Pull requests that update a dependency file labels Mar 29, 2024

rfay commented Mar 30, 2024

Just looking at the code says this is fine, will manually test, thanks!


@rfay rfay left a comment


This is great and works fine. The one request I have is not to include gunicorn, which I think will make it easier for python folks to debug. Gunicorn is more like web_extra_daemons in that it's easier to mess up, and when things go bad you want to be able to restart it.

I tested this with a bad nginx configuration and it behaved correctly. I also tried a bad PHP extra config, and surprisingly php-fpm just ignored that.

I really liked the clear output when nginx is the problem:
Waiting for web/db containers to become ready: [web db] Failed waiting for web/db containers to become ready: web container failed: log=&{2024-04-01 07:04:12.760287761 -0600 MDT 2024-04-01 07:04:13.11287521 -0600 MDT 143 nginx:FATAL Shut down }, err=ddev-d10-web container is unhealthy: &{2024-04-01 07:04:12.760287761 -0600 MDT 2024-04-01 07:04:13.11287521 -0600 MDT 143 nginx:FATAL Shut down

We don't get good info like that from apache or php.

@@ -20,6 +20,14 @@ if [ -f /tmp/healthy ]; then
sleep ${sleeptime}
fi

# Shutdown the supervisor if one of the critical processes is in the FATAL state
for service in php-fpm nginx apache2 gunicorn; do

One of the hardest things about python/gunicorn is that it doesn't stay up if we have a settings problem. I'd rather treat it like we do the web_extra_daemons. It might be a step forward with python.

Suggested change
- for service in php-fpm nginx apache2 gunicorn; do
+ for service in php-fpm nginx apache2; do

@stasadev stasadev force-pushed the 20240329_stasadev_healthcheck branch from be1317a to 01f0529 Compare April 1, 2024 17:23
@stasadev stasadev force-pushed the 20240329_stasadev_healthcheck branch from 01f0529 to 7f542a6 Compare April 1, 2024 17:33

stasadev commented Apr 1, 2024

Rebased, and pushed the image again.


@rfay rfay left a comment


This worked great on my first try. I set up https://github.com/ddev/test-django4-bakerydemo and didn't remember that other things had to be done. So it failed.

But I was able to ddev ssh and study it, which is the good thing here.

After I did those things, it worked.

@rfay rfay merged commit 7f2d79b into ddev:master Apr 4, 2024
24 checks passed
@stasadev stasadev deleted the 20240329_stasadev_healthcheck branch April 4, 2024 17:23
stasadev added a commit to stasadev/ddev that referenced this pull request Apr 24, 2024
stasadev added a commit to stasadev/ddev that referenced this pull request May 10, 2024
stasadev added a commit to stasadev/ddev that referenced this pull request May 13, 2024