mpHealth-2.2 responds with a status UP briefly during startup #26195
Comments
After further investigation, we found that one application was taking longer to start than the others. After speaking with the kernel team, we believe that, because of the significant number of applications being started, the thread pool is becoming exhausted, so the context listeners that we use to check the server status aren't being called in time. The server reported that it had successfully started after 41 seconds, before the final application had finished starting, which took 87 seconds. There's a configuration attribute (startTimeout) that can be set in server.xml, which will delay Liberty from reporting a successful start until all the applications have started. Adding startTimeout is the supported solution in Liberty for slow-starting applications like this, as it prevents false-positive responses from mpHealth.
This issue has been resolved and merged. I'll leave an additional comment to confirm which fixpack will contain this fix.
Fixpack 24.0.0.3 will contain this fix.
Describe the bug
The mpHealth-2.2 Readiness check (at /health/ready) responds with a 200 {"checks":[], "status": "UP"} for a brief time during startup. The sequence is around 40 seconds of connection refused, then a second or two of this 200/UP response, followed by 503/DOWN responses. The brief UP status causes Kubernetes to think the application has finished starting up when it has not. Kubernetes then starts calling the liveness/readiness probes, which have a much shorter failure threshold, and this causes the pod to be restarted.
A stack trace will be provided to Prashanth via Slack as requested with "com.ibm.ws.logging.trace.specification=*=info:HEALTH=all" tracing enabled.
Steps to Reproduce
Use the feature mpHealth-2.2
Configure two health checks: one Liveness and one Readiness.
Delete the old pod; Kubernetes will recreate a new pod.
Repeatedly hit the /health/ready endpoint, such as with a command like "watch curl -v http://localhost:9080/health/ready"
Observe that one of the responses is a 200/UP even though the application is not fully started
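The polling step above is easy to miss by eye, since the UP window only lasts a second or two. The following is a minimal sketch of a poller that flags the symptom automatically, assuming the endpoint from the curl command above. The class name `ReadinessFlapDetector`, the `NO_RESPONSE` sentinel, the 200 ms interval, and the two-minute limit are illustrative choices, not part of Liberty or mpHealth.

```java
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.URL;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for reproducing the issue; not part of Open Liberty.
final class ReadinessFlapDetector {

    /** Sentinel recorded when the connection is refused or times out. */
    static final int NO_RESPONSE = -1;

    /** Performs one GET and returns the HTTP status, or NO_RESPONSE if unreachable. */
    static int probe(String endpoint) {
        try {
            HttpURLConnection conn =
                    (HttpURLConnection) new URL(endpoint).openConnection();
            conn.setConnectTimeout(1000);
            conn.setReadTimeout(1000);
            try {
                return conn.getResponseCode();
            } finally {
                conn.disconnect();
            }
        } catch (IOException e) {
            return NO_RESPONSE;
        }
    }

    /**
     * Returns true if a 200 is followed later by anything other than 200
     * (a 503 or no response at all), i.e. the server reported UP too early.
     */
    static boolean hasFalsePositive(List<Integer> statuses) {
        boolean seenUp = false;
        for (int status : statuses) {
            if (status == 200) {
                seenUp = true;
            } else if (seenUp) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) throws InterruptedException {
        // Assumed default: the endpoint used in the curl command above.
        String endpoint = args.length > 0 ? args[0] : "http://localhost:9080/health/ready";
        List<Integer> statuses = new ArrayList<>();
        // Poll every 200 ms for roughly two minutes (arbitrary limits).
        for (int i = 0; i < 600; i++) {
            statuses.add(probe(endpoint));
            if (hasFalsePositive(statuses)) {
                System.out.println("Premature UP detected; statuses so far: " + statuses);
                return;
            }
            Thread.sleep(200);
        }
        System.out.println("No premature UP observed.");
    }
}
```

Start it right after deleting the pod (for example with `java ReadinessFlapDetector.java` on Java 11 or later, or compile it first on Java 8). Note that a pod restart after a genuine UP would also be flagged, because a refused connection counts as a non-200 result.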
Expected behavior
The /health/ready endpoint should not return 200/UP before the application has started.
Diagnostic information:
OpenLiberty Version: 23.0.0.3
Affected feature(s): mpHealth-2.2
Java Version:
java version "1.8.0_381"
Java(TM) SE Runtime Environment (build 8.0.8.10 - pxa6480sr8fp10-20230703_02(SR8 FP10))
IBM J9 VM (build 2.9, JRE 1.8.0 Linux amd64-64-Bit Compressed References 20230628_53798 (JIT enabled, AOT enabled)
OpenJ9 - a962f72
OMR - 40dbd2d
IBM - 696e9df)
JCL - 20230630_01 based on Oracle jdk8u381-b09
server.xml configuration: Unable to provide, can be provided via Slack if required
If it would be useful, upload the messages.log file found in $WLP_OUTPUT_DIR/messages.log: Provided to Prashanth via Slack.