[Bug]: Healthcheck not working as expected #2073

Closed
EinfachHans opened this issue Apr 26, 2024 · 30 comments · Fixed by #2097 or #2155

@EinfachHans

Description

After a longer email conversation with andras about health checks, I want to track the last health-check bugs I still have here. Thanks for all the help so far 👍🏼

Before, I had this configuration in my Dockerfile:

HEALTHCHECK --interval=10s --timeout=3s --start-period=30s \
 CMD curl -f http://localhost/healthz || exit 1

Now I have this configuration:

[Image: Bildschirmfoto 2024-04-26 um 11 01 05]

The two problems I have are:

  • It looks like the configured route is not called. When I look into the logs, I don't see any entries for that route. When I had the healthcheck in my Dockerfile, I could see the healthz route being called every 10 seconds.
  • I'm unsure about the start period, because in my deployment logs the health check is performed immediately and the attempts are counted up. Is this value used?

In my health check I check the database connection. As a test I shut down the database. 5 minutes later the backend resource was still marked as "healthy", so I think the health check is not really executing.

Minimal Reproduction (if possible, example repository)

Please let me know if you need a reproduction and what it should contain.

Exception or Error

No response

Version

v4.0.0-beta.266

@pagoru

pagoru commented Apr 28, 2024

Same problem on 270

@andrasbacsai
Member

I have made some modifications in the upcoming version.

  1. The start period and check interval are fixed. There was a strange bug that caused them to be ignored.
  2. When you define a Docker image or Dockerfile buildpack, the health check will be turned off by default, as it cannot be determined automatically what to check.
  3. HEALTHCHECK is parsed from the Dockerfile and used.

@EinfachHans
Copy link
Author

EinfachHans commented Apr 29, 2024

Will re-test after the release and let you know if it works, thanks

Member

Let me know if it still does not work

@fcpauldiaz

fcpauldiaz commented Apr 30, 2024

I tried to redeploy my app today, without code changes, and the health check is failing. I'm on v4.0.0-beta.271. Curl inside the container works as expected.

@fcpauldiaz

fcpauldiaz commented May 1, 2024

I added a log when accessing the root path, which should be the default health path.

This is how a deployment looks: the first is the current container and the second is the one being deployed. The "GET /" print doesn't show up on the second one, meaning the health URL isn't called.

[Image: Screenshot 2024-04-30 at 6 40 37 PM]

Additional: once I had used a custom health URL, I couldn't revert to the panel config after deleting it. It seems the custom health URL stayed persisted.

@andrasbacsai reopened this May 1, 2024

Member

Can you please show me the healthcheck configuration page?

Also, is your app based on the Dockerfile buildpack?

@EinfachHans
Author

EinfachHans commented May 1, 2024

Hey @andrasbacsai ,

I also checked my health check now with 271. I use the same health check configuration as in my original post:

[Image: Bildschirmfoto 2024-04-26 um 11 01 05]

My issues seem to be resolved now. The start period is now awaited, and I now see the logs of my defined route, which is called at the interval I provided! 👍🏼

I still have the following points in mind:

  1. Instead of waiting out the start period, you could start checking health right away and simply not count failed attempts during that period. That way the container could be marked healthy sooner, while still having a start period in which it's okay for the healthcheck to fail.
  2. In my health check I currently do only a simple check of whether my DB is accessible. When I turn off the DB, the following happens: I stop seeing the log entries for requests to the health URL, but the requests still seem to be made, because the container is marked as unhealthy after a while. It also becomes healthy again after I turn the DB back on. This is awesome; the only thing missing here would be notifications for health status changes.

Edit: The logs are shown when I call the health URL via Postman while the DB is inaccessible.
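
For reference on point 1: Docker's own HEALTHCHECK instruction already treats the start period this way. According to the Docker documentation, probes still run during --start-period, failures in that window are not counted towards --retries, and a success during it marks the container as started. A minimal sketch reusing the values from the original post; the --retries value is an assumption, not something stated in this thread:

```dockerfile
# Probes still run during the 30s start period; failures in that window
# are not counted towards --retries (assumed value: 3, which is Docker's default).
HEALTHCHECK --interval=10s --timeout=3s --start-period=30s --retries=3 \
  CMD curl -f http://localhost/healthz || exit 1
```

Whether Coolify's wait during deployment mirrors this behaviour is a separate question; the sketch only shows what the Docker engine does with these flags.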

@andrasbacsai self-assigned this May 1, 2024

@fcpauldiaz

fcpauldiaz commented May 1, 2024

My app is based on a Dockerfile.

I tried this config

HEALTHCHECK --interval=10s --timeout=3s --start-period=10s \
 CMD curl -f http://localhost:4000 || exit 1

and I removed it afterwards, but the config panel kept the custom URL.

[Image: Screenshot 2024-05-01 at 8 51 54 AM]

@pagoru

pagoru commented May 2, 2024

Still not working for me either. (I disabled it because it fails.)

[Image]

@gkibria

gkibria commented May 2, 2024

Hi, why am I not getting any healthcheck option in the UI? I tried a Docker Compose project. Any idea? Using 272. Thanks

@pagoru

pagoru commented May 2, 2024

> Hi, why am I not getting any healthcheck option in the UI? I tried a Docker Compose project. Any idea? Using 272. Thanks

This is not related to this bug. Please create another issue or discussion, or ask on Discord!

@EinfachHans
Author

EinfachHans commented May 3, 2024

I just set up a new project (production) where the health check is configured the same as in the other one (staging), and it does not work.

The container just doesn't get marked as healthy, and I still see no logs for requests against the endpoint.

EDIT: Wrong configuration on my side, sorry^^ But the logs are still not shown, as mentioned above.

@lewisdewsbury

I'm also having issues with health checks failing, but only when the Build Pack is set to Dockerfile and the version of Coolify is greater than v4.0.0-beta.270.

Part of Deployment Log on v4.0.0-beta.274 (fails):

[2024-May-03 14:47:06.359594] Rolling update started.
[2024-May-03 14:47:06.870879] New container started.
[2024-May-03 14:47:06.872930] Waiting for healthcheck to pass on the new container.
[2024-May-03 14:47:06.874496] Healthcheck URL (inside the container): GET: http://localhost:3000/
[2024-May-03 14:47:06.876012] Waiting for the start period (5 seconds) before starting healthcheck.
[2024-May-03 14:47:11.994908] Attempt 1 of 10 | Healthcheck status: "starting"
[2024-May-03 14:47:17.132463] Attempt 2 of 10 | Healthcheck status: "starting"
[2024-May-03 14:47:22.250097] Attempt 3 of 10 | Healthcheck status: "starting"
[2024-May-03 14:47:27.399172] Attempt 4 of 10 | Healthcheck status: "starting"
[2024-May-03 14:47:32.552693] Attempt 5 of 10 | Healthcheck status: "starting"
[2024-May-03 14:47:37.678722] Attempt 6 of 10 | Healthcheck status: "starting"
[2024-May-03 14:47:42.803334] Attempt 7 of 10 | Healthcheck status: "starting"
[2024-May-03 14:47:47.946351] Attempt 8 of 10 | Healthcheck status: "starting"
[2024-May-03 14:47:53.118104] Attempt 9 of 10 | Healthcheck status: "starting"
[2024-May-03 14:47:58.280413] Attempt 10 of 10 | Healthcheck status: "unhealthy"
[2024-May-03 14:47:58.283316] Removing old containers.
[2024-May-03 14:47:58.286167] New container is not healthy, rolling back to the old container.
[2024-May-03 14:47:58.627524] Rolling update completed.

Part of Deployment Log on v4.0.0-beta.270 (passes):

[2024-May-03 14:49:25.141632] Rolling update started.
[2024-May-03 14:49:25.730963] New container started.
[2024-May-03 14:49:25.733040] Waiting for healthcheck to pass on the new container.
[2024-May-03 14:49:25.854393] Attempt 1 of 10 | Healthcheck status: "starting"
[2024-May-03 14:49:27.597697] Attempt 2 of 10 | Healthcheck status: "starting"
[2024-May-03 14:49:32.734846] Attempt 3 of 10 | Healthcheck status: "healthy"
[2024-May-03 14:49:32.741230] New container is healthy.
[2024-May-03 14:49:32.742427] Removing old containers.
[2024-May-03 14:49:33.155857] Rolling update completed.

I'm using the Dockerfile from the Next.js with-docker example, on a newly created Next.js project.

The health check settings are the default, with only the port changed to 3000.

Downgrading to 270 or disabling the health check allows the deployment to succeed.

@andrasbacsai
Member

No matter how I try to replicate this issue, I cannot.

The healthcheck works on older versions of Coolify with the Dockerfile buildpack because there was basically no healthcheck (an exit 0 command was used, so it always passed).

@lewisdewsbury Does your Next.js application return a 200 on path /?

@EinfachHans
Author

But could you reproduce my two problems?

  1. Missing notification in case of health status changes
  2. Somehow missing logs for failed health checks

Also, I would like to hear your opinion on my suggestion for the start period: don't wait for it to elapse, but don't count attempts while in this period.

@lewisdewsbury

> No matter how I try to replicate this issue, I cannot.
>
> The healthcheck works on older versions of Coolify with the Dockerfile buildpack because there was basically no healthcheck (an exit 0 command was used, so it always passed).
>
> @lewisdewsbury Does your Next.js application return a 200 on path /?

Hi andras,

I've tracked down the cause of my issue, everything is now working perfectly on 274.

The problem was not having curl installed on the runner. The Dockerfile was based on node:18-alpine, which does not have curl installed by default. Installing curl on the runner, or changing the base image to node:18 allowed the health checks to pass.

Thank you for your help, it got me looking in the right direction 👍
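
The fix described above could look roughly like this for an Alpine-based image. This is a sketch, not the actual Dockerfile from the Next.js example: the single-stage layout is an assumption, and only the base image comes from this thread.

```dockerfile
FROM node:18-alpine

# node:18-alpine does not ship curl, so a curl-based healthcheck that runs
# inside the container fails until curl is installed.
RUN apk add --no-cache curl

# ... the rest of the build (copying the app, CMD, etc.) stays unchanged ...
```

In a multi-stage Dockerfile the RUN line has to go into the final stage (the "runner" mentioned above), since earlier stages are not part of the running container.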

@thomasmol

I had the exact same problem as @lewisdewsbury. The healthcheck stopped working on version beta.271 for a resource that is built with a Dockerfile. The Dockerfile is based on a Bun base image. I added apt-get install -y curl to the Dockerfile and now the healthcheck runs fine again! Thanks @lewisdewsbury for the solution.
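
For a Debian-based image the equivalent uses apt-get. A sketch under assumptions: oven/bun:1 stands in here as an example of a Debian-based Bun image, since the exact base image from the comment above is not known.

```dockerfile
# Assumed base image; replace with the one your project actually uses.
FROM oven/bun:1

# Install curl so a curl-based healthcheck can run inside the container.
# Removing the apt lists afterwards keeps the image layer small.
RUN apt-get update \
 && apt-get install -y curl \
 && rm -rf /var/lib/apt/lists/*
```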

@fcpauldiaz

fcpauldiaz commented May 5, 2024

I had the same issue. @andrasbacsai you can close the issue now.

Member

andrasbacsai commented May 6, 2024

I will make a few changes. Coolify will check whether curl or wget is available. If neither is, it will either try to use the proc filesystem for the healthcheck or just report a healthy state. Not sure yet.
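
The curl-or-wget idea can also be written by hand in a Dockerfile. This is only a sketch of the idea, not Coolify's implementation: the URL, port and timing values are assumptions borrowed from earlier comments, and failing when neither tool exists is just one possible choice.

```dockerfile
# Use curl if present, fall back to wget, and report unhealthy if neither exists.
# Shell-form CMD is run via /bin/sh -c inside the container.
HEALTHCHECK --interval=10s --timeout=3s --start-period=30s \
  CMD if command -v curl >/dev/null 2>&1; then \
        curl -f http://localhost:3000/ || exit 1; \
      elif command -v wget >/dev/null 2>&1; then \
        wget -q --spider http://localhost:3000/ || exit 1; \
      else \
        exit 1; \
      fi
```

Exiting with 1 in the last branch makes a missing tool visible as "unhealthy" instead of silently passing, which is the opposite trade-off to the always-true exit 0 behaviour discussed earlier in the thread.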

@EinfachHans
Author

> But could you reproduce my two problems?
>
> 1. Missing notification in case of health status changes
> 2. Somehow missing logs for failed health checks
>
> Also, I would like to hear your opinion on my suggestion for the start period: don't wait for it to elapse, but don't count attempts while in this period.

@andrasbacsai have you already looked at these three points?

@andrasbacsai
Member

> 1. Missing notification in case of health status changes

You mean if your app has been running for a while and the health status changes somehow? There is no notification for that yet.

> 2. Somehow missing logs for failed health checks

Failed healthchecks are logged as starting or unhealthy. How do you want to log it? I think this is clear (Coolify gets these values from the Docker engine).

@EinfachHans
Author

@andrasbacsai

About 1: Yes, this notification would be awesome! I would be notified directly if my service is unhealthy and could then look into why.

About 2: My application is an Express backend. I log every request. Successful requests to the health URL are logged correctly. When I turn off my Postgres, which causes the health check to return an error, the container status is updated correctly (so the requests are sent, I guess), but there are absolutely no entries for these failed requests in my logs. When I then manually call the health check URL via Postman, there is a log entry. That's quite weird.

About 3: That's my suggestion for getting a healthy container faster. This could be achieved by not waiting out the start period, but not counting failed attempts during it.

@EinfachHans
Author

@andrasbacsai should i create a separate issue for these points?

@EinfachHans
Author

I opened a feature discussion about point 1 & 3: #2200

@vikyw89

vikyw89 commented May 15, 2024

On v4.0.0-beta.276

[Image]

It was always working before; I think this issue started about a month ago. I need to disable the healthcheck for it to work.

@vikyw89

vikyw89 commented May 15, 2024

> dockerfile. Dockerfile is based on a Bun base image. Added apt-get install -y curl in the dockerfile and now the healthcheck runs fine again! thanks @lewisdewsbury for the solution

So do we need to run the healthcheck from inside the container? I always thought a healthcheck should be observed from outside the Docker container.

@vikyw89

vikyw89 commented May 15, 2024

> No matter how I try to replicate this issue, I cannot.
>
> The healthcheck works on older versions of Coolify with the Dockerfile buildpack because there was basically no healthcheck (an exit 0 command was used, so it always passed).
>
> @lewisdewsbury Does your Next.js application return a 200 on path /?

Is there a way to get the old "healthcheck" back? It worked perfectly for my use case (knowing whether the app is running when accessed from outside Docker).

@fcpauldiaz

Previously it wasn't doing anything; you can disable the healthcheck.

@vikyw89

vikyw89 commented May 17, 2024

> previously it wasn't doing anything, you can disable the healthcheck

Before:

  • I could deploy and have the old application replaced after the new one was ready, with no downtime.

Currently:

  • With the healthcheck disabled, the application experiences downtime while the new app is starting. And if the new deployment fails, the app is down.
