
mpHealth-2.2 responds with a status UP briefly during startup #26195

Closed
keithper opened this issue Sep 6, 2023 · 3 comments · Fixed by #27360

Labels
Needs member attention, release bug (This bug is present in a released version of Open Liberty), release:24003, team:Lumberjack

Comments

keithper commented Sep 6, 2023

Describe the bug
The mpHealth-2.2 readiness check (at /health/ready) briefly responds with 200 {"checks":[], "status": "UP"} during startup. The sequence is roughly 40 seconds of connection refused, a second or two of this 200/UP response, and then 503/DOWN responses again. The brief UP status makes Kubernetes treat the application as started when it has not finished starting. Kubernetes then starts calling the liveness/readiness probes, which have a much shorter failure threshold, and the pod is restarted.
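To see why a single transient UP can be enough to get the pod restarted, here is a deliberately simplified Java model of Kubernetes probe handling (not Kubernetes code). It assumes a startup probe that ends the startup phase on its first success, after which consecutive failures count against a much smaller liveness failure threshold, and it assumes both phases see the same sequence of probe results. The class `ProbeModel`, the method `restartTick`, and the thresholds 30 and 3 are hypothetical. In Kubernetes, only startup and liveness probe failures lead to a container restart; readiness failures only remove the pod from service endpoints.

```java
import java.util.Arrays;
import java.util.List;

/**
 * Hypothetical, simplified model of Kubernetes probe handling (not Kubernetes code).
 * Assumption: one probe result sequence feeds both the startup and liveness phases.
 */
class ProbeModel {

    /**
     * @param results true = HTTP 200/UP, false = connection refused or 503/DOWN
     * @return index of the probe that triggers a restart, or -1 if none does
     */
    static int restartTick(List<Boolean> results, int startupFailureThreshold,
                           int livenessFailureThreshold) {
        boolean started = false; // becomes true on the first startup-probe success
        int consecutiveFailures = 0;
        for (int tick = 0; tick < results.size(); tick++) {
            if (results.get(tick)) {
                started = true;          // a single success is enough to end startup
                consecutiveFailures = 0;
                continue;
            }
            consecutiveFailures++;
            int threshold = started ? livenessFailureThreshold : startupFailureThreshold;
            if (consecutiveFailures >= threshold) {
                return tick;             // the kubelet would restart the container here
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        // Refused x4, one transient 200/UP, then 503/DOWN (the pattern in the report).
        List<Boolean> observed = Arrays.asList(false, false, false, false, true, false, false, false, false);
        System.out.println(restartTick(observed, 30, 3)); // prints 7

        // Same length, but without the transient UP: still within the startup budget.
        List<Boolean> noBlip = Arrays.asList(false, false, false, false, false, false, false, false, false);
        System.out.println(restartTick(noBlip, 30, 3));   // prints -1
    }
}
```

With the transient UP, the model switches to the tight liveness budget and restarts at tick 7; without it, the same failures stay inside the generous startup budget and no restart happens.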

A trace captured with "com.ibm.ws.logging.trace.specification=*=info:HEALTH=all" enabled will be provided to Prashanth via Slack, as requested.

Steps to Reproduce
1. Enable the mpHealth-2.2 feature.
2. Configure two health checks: one liveness check and one readiness check.
3. Delete the old pod; Kubernetes recreates it.
4. Repeatedly query the /health/ready endpoint, for example with "watch curl -v http://localhost:9080/health/ready".
5. Observe that one of the responses is 200/UP even though the application has not fully started.

Expected behavior
The /health/ready endpoint should not return 200/UP before the application has started.

Diagnostic information:

  • Open Liberty version: 23.0.0.3

  • Affected feature(s): mpHealth-2.2

  • Java Version:
    java version "1.8.0_381"
    Java(TM) SE Runtime Environment (build 8.0.8.10 - pxa6480sr8fp10-20230703_02(SR8 FP10))
    IBM J9 VM (build 2.9, JRE 1.8.0 Linux amd64-64-Bit Compressed References 20230628_53798 (JIT enabled, AOT enabled)
    OpenJ9 - a962f72
    OMR - 40dbd2d
    IBM - 696e9df)
    JCL - 20230630_01 based on Oracle jdk8u381-b09

  • server.xml configuration: Cannot be shared here; can be provided via Slack if required.

  • messages.log (found in $WLP_OUTPUT_DIR/messages.log): Provided to Prashanth via Slack.

keithper added the release bug label Sep 6, 2023
tonyreigns (Contributor) commented Oct 2, 2023

After further investigation, we found that one application was taking longer to start than the others. After speaking with the kernel team, we believe that, because of the large number of applications being started, the thread pool is becoming exhausted, so the context listeners we use to check the server status are not called in time. The server reported a successful start after 41 seconds, before the final application had started; that application took 87 seconds to start.
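The {"checks":[], "status": "UP"} response in the report is consistent with a readiness aggregate computed over an empty set of checks: "no check is DOWN" is vacuously true when no check has been registered yet. The Java sketch below is a hypothetical model of that effect, not Open Liberty's implementation. `ReadinessModel`, `naiveAggregate`, `guardedAggregate`, and the application names are illustrative, and the guard (require every expected application to have started) is only one possible way to avoid the false positive.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/**
 * Hypothetical model of a readiness aggregate (not Open Liberty's implementation).
 * It shows how an empty list of registered checks yields UP during startup.
 */
class ReadinessModel {
    enum Status { UP, DOWN }

    // Checks registered so far; stays empty until the applications providing them start.
    final List<Status> registeredChecks = new ArrayList<>();

    // Naive rule: UP unless some registered check is DOWN (vacuously UP when empty).
    Status naiveAggregate() {
        return registeredChecks.contains(Status.DOWN) ? Status.DOWN : Status.UP;
    }

    // Guarded rule: report DOWN until every expected application has started.
    Status guardedAggregate(Set<String> expectedApps, Set<String> startedApps) {
        if (!startedApps.containsAll(expectedApps)) {
            return Status.DOWN;
        }
        return naiveAggregate();
    }

    public static void main(String[] args) {
        ReadinessModel model = new ReadinessModel();
        Set<String> expected = new HashSet<>(Arrays.asList("appA", "appB"));
        Set<String> started = new HashSet<>(Arrays.asList("appA")); // appB is still starting

        System.out.println(model.naiveAggregate());                    // prints UP
        System.out.println(model.guardedAggregate(expected, started)); // prints DOWN
    }
}
```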

The applicationManager element has a startTimeout attribute, set in server.xml, that delays Liberty from reporting a successful start, for up to the configured time, until all the applications have started.

Adding <applicationManager startTimeout="120s"/> to server.xml is all that is required. Setting startTimeout to a longer value has no negative impact if the applications take less time to start.

startTimeout is the supported way in Liberty to handle slow-starting applications like these, because it prevents false-positive responses from mpHealth.
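The startTimeout behaviour described above can be pictured as a bounded wait: report "started" once every application has started, or when the timeout elapses, whichever comes first. The following Java sketch models that assumption with a `CountDownLatch`; it is an illustration, not Liberty's implementation, and `StartTimeoutModel`, `awaitApps`, and the simulated start times are hypothetical. Because `CountDownLatch.await(timeout, unit)` returns as soon as the count reaches zero, a generous timeout only matters when an application is genuinely slow, which matches the point that a larger startTimeout does not hurt fast-starting applications.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

/**
 * Hypothetical model of a bounded "wait for all applications" step (not Liberty's
 * implementation). Each entry in appStartMillis is a simulated application start time.
 */
class StartTimeoutModel {

    /** Returns true if all applications started before the timeout, false otherwise. */
    static boolean awaitApps(long[] appStartMillis, long startTimeoutMillis) {
        CountDownLatch allStarted = new CountDownLatch(appStartMillis.length);
        ExecutorService pool = Executors.newFixedThreadPool(Math.max(1, appStartMillis.length));
        try {
            for (long delay : appStartMillis) {
                pool.submit(() -> {
                    try {
                        Thread.sleep(delay);    // simulated application startup work
                        allStarted.countDown(); // the application reports that it has started
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt(); // cancelled after the timeout
                    }
                });
            }
            // Returns as soon as the count reaches zero, so a long timeout adds no delay
            // for applications that start quickly.
            return allStarted.await(startTimeoutMillis, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        } finally {
            pool.shutdownNow(); // stop any application that is still "starting"
        }
    }

    public static void main(String[] args) {
        long begin = System.nanoTime();
        boolean ok = awaitApps(new long[] {50, 200}, 5_000);
        long elapsedMs = TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - begin);
        System.out.println("all started: " + ok + " after about " + elapsedMs + " ms");

        // A timeout shorter than the slowest application: the wait gives up first.
        System.out.println("all started: " + awaitApps(new long[] {50, 2_000}, 300));
    }
}
```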

tonyreigns (Contributor) commented

The fix for this issue has been merged. I'll leave another comment to confirm which fix pack will contain it.

tonyreigns (Contributor) commented

Fix pack 24.0.0.3 will contain this fix.


5 participants