Remove the healthcheck step #740
lib/kamal/cli/app/boot.rb
    if barrier
      if barrier_role?
        barrier.open
      else
        wait_for_barrier
      end
    end
Early return suggestion:

    if barrier
      if barrier_role?
        barrier.open
      else
        wait_for_barrier
      end
    end

could become:

    return unless barrier

    if barrier_role?
      barrier.open
    else
      wait_for_barrier
    end
lib/kamal/cli/app.rb
    @@ -282,4 +285,8 @@ def current_running_version(host: KAMAL.primary_host)
      def version_or_latest
        options[:version] || KAMAL.config.latest_tag
      end

      def web_and_non_web_roles?
This became unused after recent changes.
I'm hoping to get capistrano/sshkit#535 added to SSHKit, so we don't need to patch it.
@djmb At the moment, we are using the fact that the healthcheck step runs before everything else to run our migrations. Any recommendations for achieving a similar outcome (making sure commands run successfully on one container before traffic is served to new containers or new containers are booted) once the healthcheck step is removed? Thanks.
Also, would this PR resolve the issue with not being able to deploy to a specific role on a host that doesn't have the primary role, as mentioned in #709? For additional context:

    kamal deploy -r secondary-role
    Ensure app can pass healthcheck...
    INFO [703ee48c] Running docker run --detach --name healthcheck-primary-role --publish 3999:80 --label service=healthcheck-primary-role -e KAMAL_CONTAINER_NAME="healthcheck-primary-role" --env-file .kamal/env/roles/primary-role.env --health-cmd "curl -f http://localhost/up || exit 1"
    INFO [a8ccc7f3] Finished in 0.075 seconds with exit status 123 (failed).
    Releasing the deploy lock...
    Finished all in 22.6 seconds
    ERROR (SSHKit::Command::Failed): Exception while executing on host 192.168.0.11: docker exit status: 125
    docker stdout: Nothing written
    docker stderr: docker: open .kamal/env/roles/primary-role.env: no such file or directory.
@argia-andreas - you should be able to run migrations before anything is booted with a hook. You'd want to run something like:

Regarding #709 - this is intended to fix that, though I'll need to double check that it does!
To speed up deployments, we'll remove the healthcheck step.

This adds some risk to deployments for non-web roles: if they don't have a Docker healthcheck configured, the only check we do is whether the container is running. If there is a bad image, we might see the container running before it exits and deploy it. Previously the healthcheck step avoided this by ensuring a web container could boot and serve traffic first.

To mitigate this, we'll add a deployment barrier. Until one of the primary role containers passes its healthcheck, we'll keep the barrier up and avoid stopping the containers on the non-primary roles. If the primary role container fails its healthcheck, we'll close the barrier and shut down the new containers on the waiting roles.

We also have a new integration test to check that we correctly handle a broken image. This highlighted that SSHKit's default runner stops at the first error it encounters. We now have a custom runner that waits for all threads to finish, allowing them to clean up.
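In outline, a barrier like this can be built from a mutex and a condition variable: the first decision (open or close) wins, and every waiter learns whether the barrier was opened or closed. This is an illustrative sketch only; the class and method names are hypothetical, not Kamal's actual implementation.

```ruby
# Hypothetical sketch of a deploy barrier. The primary role opens the
# barrier once a container passes its healthcheck, or closes it on failure;
# other roles block in #wait until one of those decisions is made.
class DeployBarrier
  def initialize
    @mutex = Mutex.new
    @cond = ConditionVariable.new
    @state = nil # becomes :opened or :closed, and the first decision wins
  end

  def open
    decide(:opened)
  end

  def close
    decide(:closed)
  end

  # Blocks until a decision is made. Returns true if the barrier was
  # opened (safe to proceed), false if it was closed (roll back).
  def wait
    @mutex.synchronize do
      @cond.wait(@mutex) while @state.nil?
      @state == :opened
    end
  end

  private

  def decide(state)
    @mutex.synchronize do
      @state ||= state # later decisions are ignored
      @cond.broadcast  # wake every waiting role
    end
  end
end
```

Waiters that arrive after the decision return immediately, which matters when role threads reach the barrier at different times.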
If a primary role container is unhealthy, the health check poller might take a while to time out. In the meantime, if we have started the other roles, they'll be running two containers. This could be a problem, especially if they run jobs, as that doubles the worker capacity and could cause excessive load. We'll wait for the first primary role container to boot successfully before starting the containers for the other roles.
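The "wait for the first healthy container, with a timeout" part can be sketched as a small poll loop. The method name and signature here are illustrative assumptions, not Kamal's API; the block stands in for whatever health probe is used (for example, inspecting the container's Docker health status).

```ruby
# Hypothetical poller sketch: repeatedly call the given block (the health
# probe) until it returns true, or until the timeout elapses.
# Returns true if the container became healthy in time, false otherwise.
def wait_until_healthy(timeout: 30, interval: 1)
  deadline = Time.now + timeout
  loop do
    return true if yield                 # probe reports healthy
    return false if Time.now >= deadline # gave up: treat as unhealthy
    sleep interval                       # back off before the next probe
  end
end
```

A false return here is what would trigger closing the barrier and shutting down the new containers on the waiting roles.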
Using a different SSHKit runner doesn't work well, because the group runner uses the Parallel runner internally. So instead we'll patch its behaviour to fail slow. We'll also get it to return all the errors so we can report on all the hosts that failed.
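The fail-slow idea can be illustrated in miniature: run the per-host work in parallel, join every thread before raising, and report all failures together rather than aborting on the first one. This is a standalone sketch of the behaviour described above, not SSHKit's actual classes or the patch itself.

```ruby
# Hypothetical fail-slow runner. Unlike a fail-fast runner, every thread
# is joined before any error is raised, so hosts that need to clean up
# after a failure elsewhere get the chance to finish.
class FailSlowRunner
  class MultiError < StandardError
    attr_reader :errors

    def initialize(errors)
      @errors = errors
      super("#{errors.size} host(s) failed")
    end
  end

  def run(hosts, &block)
    threads = hosts.map do |host|
      Thread.new do
        block.call(host)
        nil
      rescue StandardError => e
        e # capture the error instead of killing the whole run
      end
    end
    errors = threads.map(&:value).compact # joins all threads first
    raise MultiError.new(errors) unless errors.empty?
  end
end
```

Collecting the errors also makes it possible to report every failed host at the end, as the commit message describes.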
To speed up deployments, we'll remove the healthcheck step.
This adds some risk to deployments for non-web roles: if they don't have a Docker healthcheck configured, the only check we do is whether the container is running.
If there is a bad image, we might see the container running before it exits and deploy it. Previously the healthcheck step avoided this by ensuring a web container could boot and serve traffic first.
To mitigate this, we'll add a web barrier. Non-web containers will wait to shut down the old containers until at least one web container has passed its healthcheck.
If the web container fails its healthcheck, we'll close the barrier and shut down the new containers on the non-web roles.
We also have a new integration test to check that we correctly handle a broken image. This highlighted that SSHKit's default runner stops at the first error it encounters. We now have a custom runner that waits for all threads to finish, allowing them to clean up.
That means that if a deployment completes on some hosts but not others, we can run

    kamal app version --quiet

to see which version is running on each host.