mpHealth-2.2 responds with a status UP briefly during startup #26195

keithper · 2023-09-06T19:13:47Z

Describe the bug
The mpHealth-2.2 readiness check (at /health/ready) briefly responds with 200 {"checks":[], "status": "UP"} during startup. The sequence is roughly 40 seconds of connection refused, a second or two of this 200/UP response, and then 503/DOWN responses. The brief UP status causes Kubernetes to conclude that the application has finished starting when it has not. Kubernetes then begins calling the liveness/readiness probes, which have a much shorter failure threshold, and the pod is restarted.

A stack trace will be provided to Prashanth via Slack as requested with "com.ibm.ws.logging.trace.specification=*=info:HEALTH=all" tracing enabled.

Steps to Reproduce
Use the feature mpHealth-2.2
Configure two health checks: one liveness and one readiness.
Delete the old pod; Kubernetes will create a new one.
Repeatedly hit the /health/ready endpoint, such as with a command like "watch curl -v http://localhost:9080/health/ready"
Observe that one of the returns is a 200/UP even though the application is not fully started
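
The polling step above can also be scripted so that the short-lived 200 is not missed by eye. The following is a minimal sketch, not part of the original report: the helper names (`probe`, `find_false_ups`), the 0.2-second interval, and the 120-second duration are all assumptions chosen for illustration. It polls the same URL as the `watch curl` command and flags any 200 that is later followed by a non-200 HTTP response.

```python
import time
import urllib.error
import urllib.request


def probe(url, timeout=2.0):
    """Return the HTTP status code, or None if no HTTP response was received."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        return err.code  # e.g. 503 while the readiness check reports DOWN
    except (urllib.error.URLError, OSError):
        return None  # connection refused, timeout, reset, etc.


def find_false_ups(codes):
    """Return the indexes of 200 responses that are later followed by a
    non-200 HTTP response (connection failures, stored as None, are ignored)."""
    flagged = []
    for i, code in enumerate(codes):
        if code != 200:
            continue
        if any(c is not None and c != 200 for c in codes[i + 1:]):
            flagged.append(i)
    return flagged


if __name__ == "__main__":
    # URL taken from the reproduction steps; interval and duration are assumptions.
    url = "http://localhost:9080/health/ready"
    codes = []
    deadline = time.monotonic() + 120
    while time.monotonic() < deadline:
        codes.append(probe(url))
        time.sleep(0.2)
    print("observed codes:", codes)
    print("premature UP at sample indexes:", find_false_ups(codes))
```

A non-empty list on the last line would correspond to the 200/UP followed by 503/DOWN pattern described in this report.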

Expected behavior
The /health/ready endpoint should not return 200/UP before the application has started.

Diagnostic information:

OpenLiberty Version: 23.0.0.3
Affected feature(s): mpHealth-2.2
Java Version:
java version "1.8.0_381"
Java(TM) SE Runtime Environment (build 8.0.8.10 - pxa6480sr8fp10-20230703_02(SR8 FP10))
IBM J9 VM (build 2.9, JRE 1.8.0 Linux amd64-64-Bit Compressed References 20230628_53798 (JIT enabled, AOT enabled)
OpenJ9 - a962f72
OMR - 40dbd2d
IBM - 696e9df)
JCL - 20230630_01 based on Oracle jdk8u381-b09
server.xml configuration: Unable to provide, can be provided via Slack if required
If it would be useful, upload the messages.log file found in $WLP_OUTPUT_DIR/messages.log : Provided to Prashanth via Slack.


tonyreigns · 2023-10-02T21:17:07Z

After further investigation, we found that one application was taking longer to start than the others. After speaking with the kernel team, we believe that, because of the significant number of applications being started, the thread pool becomes exhausted, so the context listeners that we use to check the server status aren't called in time. The server reported a successful start after 41 seconds, before the final application had started, which took 87 seconds.
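
To illustrate why a late listener callback can produce an UP response with an empty checks list, here is a toy model in Python. It is only a sketch of the general idea and is not Liberty's implementation: the function names, the "UP"/"DOWN" strings, and the notion of a separately known list of configured applications are all assumptions made for the example. The naive version aggregates only the checks it already knows about, so an empty set is vacuously UP; the guarded version also compares the started applications against the configured ones.

```python
def naive_readiness(check_results):
    """Aggregate only the checks seen so far.

    check_results maps a check name to True (UP) or False (DOWN).
    With no checks registered yet, all() over an empty collection is True,
    so the result is UP even though nothing has been verified.
    """
    return "UP" if all(check_results.values()) else "DOWN"


def guarded_readiness(configured_apps, started_apps, check_results):
    """Report DOWN while any configured application has not started yet."""
    missing = set(configured_apps) - set(started_apps)
    if missing:
        return "DOWN"
    return "UP" if all(check_results.values()) else "DOWN"


if __name__ == "__main__":
    # Hypothetical application names; the slow one has not reported in yet.
    configured = ["fast-app", "slow-app"]
    started = ["fast-app"]
    print(naive_readiness({}))
    print(guarded_readiness(configured, started, {}))
```

In this model, the first call corresponds to the brief false positive and the second to the behaviour expected in the report.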

There is a configuration attribute (startTimeout) that can be set in server.xml to delay Liberty from reporting a successful start until all the applications have started.

Adding <applicationManager startTimeout="120s"/> to the server.xml is all that's required. Setting the startTimeout parameter to a longer value will not have a negative impact if the application takes less time to start.

startTimeout is the supported solution in Liberty for slow-starting applications like this, because it prevents false positive responses from mpHealth.

tonyreigns · 2024-03-04T19:13:02Z

The fix for this issue has been merged. I'll leave an additional comment to confirm which fixpack will contain it.

tonyreigns · 2024-03-14T13:18:33Z

Fixpack 24.0.0.3 will contain this fix.

keithper added the release bug This bug is present in a released version of Open Liberty label Sep 6, 2023

pgunapal added the team:Lumberjack label Sep 6, 2023

LibbyBot added the Needs member attention label Sep 6, 2023

donbourne assigned tonyreigns Sep 26, 2023

pgunapal added this to the [Iteration 23.20] Sep 25 - Oct 6 milestone Sep 27, 2023

fmhwong modified the milestones: [Iteration 23.20] Sep 25 - Oct 6, [Iteration 23.21] Oct 9 - Oct 20 Oct 12, 2023

fmhwong modified the milestones: [Iteration 23.21] Oct 9 - Oct 20, [Iteration 23.22] Oct 23 - Nov 3 Oct 24, 2023

fmhwong modified the milestones: [Iteration 23.22] Oct 23 - Nov 3, [Iteration 23.23] Nov 6 - Nov 17 Nov 6, 2023

fmhwong modified the milestones: [Iteration 23.23] Nov 6 - Nov 17, [Iteration 23.24] Nov 20 - Dec 1 Nov 20, 2023

fmhwong modified the milestones: [Iteration 23.24] Nov 20 - Dec 1, [Iteration 23.25] Dec 4 - Dec 15 Dec 5, 2023

fmhwong modified the milestones: [Iteration 23.25] Dec 4 - Dec 15, [Iteration 24.1] Jan 1 - Jan 19 Jan 2, 2024

fmhwong modified the milestones: [Iteration 24.1] Jan 1 - Jan 19, [Iteration 24.2] Jan 22 - Feb 2 Jan 23, 2024

fmhwong modified the milestones: [Iteration 24.2] Jan 22 - Feb 2, [Iteration 24.3] Feb 5 - Feb 16 Feb 6, 2024

fmhwong modified the milestones: [Iteration 24.3] Feb 5 - Feb 16, [Iteration 24.4] Feb 19 - Mar 1 Feb 22, 2024

tonyreigns mentioned this issue Mar 4, 2024

Using configAdmin to find undetected apps for mpHealth #27360

Merged

tonyreigns closed this as completed in #27360 Mar 4, 2024

fmhwong modified the milestones: [Iteration 24.4] Feb 19 - Mar 1, [Iteration 24.5] Mar 4 - Mar 15 Mar 4, 2024

LibbyBot added the release:24003 label Mar 12, 2024
