Hook DataSource validation into MP Health check by default #10021

aguibert · 2019-12-04T16:57:20Z

Goal

Combine MP Health checks and the recently delivered Validator REST API for DataSources so that user applications have more meaningful health checks out of the box by testing DB connections before claiming that an application is ready/healthy.

Background

When initially working on the Validator feature I had this idea and asked our MP Health SME if it was good practice to have health checks call out to external resources or not and was told "no". However, since that discussion I have seen several sources that indicate it is an acceptable thing to do.

IETF Health Check (draft) section 5 shows an example of including response time and number of connections to a Cassandra DB
Quarkus datasource health check is enabled by default
IBM Garage method: Health checks says:

At minimum, a health check API is a separate REST service that is implemented within a microservice component that quickly returns the operational status of the service and an indication of its ability to connect to downstream dependent services.

Proposed solution

Create a jdbc + mpHealth auto feature that enables health checks for all configured datasources by default. Similar to the IETF draft, the health check could include 2 components:

Validate that a connection could be made
Validate that the connection pool is below 100% capacity (if MP Health had a WARN function like the IETF draft suggests, we could warn for a threshold such as 80-99%)
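
To make the two components concrete, here is a minimal sketch of how they might combine into one result. All names are hypothetical and nothing here is Liberty or MP Health API. The WARN state is an assumption borrowed from the IETF draft, since MP Health only reports UP or DOWN, and the 80% threshold is just the example figure from above.

```java
/** Illustrative only: hypothetical names, not part of Liberty or MP Health. */
final class DataSourceHealthSketch {

    /** WARN is an assumption taken from the IETF draft; MP Health has only UP and DOWN. */
    enum Status { UP, WARN, DOWN }

    /** Pool usage at or above this percentage (but below 100%) is reported as WARN. */
    static final int WARN_PERCENT = 80;

    /** Classifies pool usage: a full pool is DOWN, 80-99% is WARN, anything lower is UP. */
    static Status poolStatus(int inUse, int maxSize) {
        if (maxSize <= 0) {
            throw new IllegalArgumentException("maxSize must be positive");
        }
        if (inUse >= maxSize) {
            return Status.DOWN;
        }
        // Integer comparison (scaled by 100) avoids floating-point rounding at the boundary.
        if ((long) inUse * 100 >= (long) WARN_PERCENT * maxSize) {
            return Status.WARN;
        }
        return Status.UP;
    }

    /** Combines both components: a failed connection test is DOWN regardless of pool usage. */
    static Status overall(boolean connectionOk, int inUse, int maxSize) {
        return connectionOk ? poolStatus(inUse, maxSize) : Status.DOWN;
    }
}
```

Until MP Health has a WARN equivalent, a real check would have to map WARN onto UP (or onto extra data attached to the response), which is a design choice this sketch leaves open.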

If health checks are enabled by default, this raises a few concerns:

1. Zero migration. Customers that have servers with jdbc + mpHealth today could have their health checks start to fail after updating to a newer version of Liberty if they had invalid datasources in their configuration. Options here could be:
   A. Wait for a new version of mpHealth and key the auto feature off of that. MP Health 2.2 should be coming out in the next few months
   B. Have datasource health checks disabled by default (not ideal)
2. False negatives. We can easily simulate connectivity for container-managed-auth resources (which is the dominant use case), but we cannot simulate connectivity for app-managed-auth resources (e.g. app code does ds.getConnection(user, pass)) ootb because it requires user input. As discussed in the comments, this may be an acceptable limitation because of the prevalence of container-auth
3. The ability to disable the datasource health checks. Some solutions here could include:
   A. Adding a <dataSource checkHealth="true|false"> attribute. If mpHealth is not enabled then this attribute is ignored
   B. Add some new metatype for enabling/disabling health checks by element ID. For example:

<server>
  <!-- features... -->
  <dataSource id="myDS"/>
  <mpHealth disabledChecks="myDS"/>
</server>
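
As a rough illustration of option B, the sketch below shows how a runtime might turn a disabledChecks value into a set of element IDs and decide whether a given datasource should be checked. The class and method names are hypothetical, and treating the attribute as a comma-separated list is an assumption; the example above only shows a single ID.

```java
import java.util.LinkedHashSet;
import java.util.Set;

/** Illustrative only: hypothetical helper, not part of Liberty. */
final class DisabledChecksSketch {

    /** Parses the attribute value; a comma-separated list is an assumption of this sketch. */
    static Set<String> parse(String attributeValue) {
        Set<String> ids = new LinkedHashSet<>();
        if (attributeValue == null) {
            return ids; // attribute absent: nothing is disabled
        }
        for (String token : attributeValue.split(",")) {
            String id = token.trim();
            if (!id.isEmpty()) {
                ids.add(id);
            }
        }
        return ids;
    }

    /** A datasource is checked unless its element ID is listed as disabled. */
    static boolean isEnabled(String dataSourceId, Set<String> disabledIds) {
        return !disabledIds.contains(dataSourceId);
    }
}
```

With the configuration above, parse("myDS") would yield a set containing only myDS, so isEnabled("myDS", ...) is false and every other datasource stays checked. A datasource without an explicit id would not be addressable this way.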


frowe · 2019-12-04T17:07:20Z

I like the idea. I'm a bit leery of enabling it by default for the current version of mpHealth. For a new version of mpHealth, I could see making it the default. I'm also not sure about adding a new attribute on the dataSource element, but I like it better than the mpHealth equivalent because trying to get the element id for an unnamed dataSource is less than obvious.

aguibert · 2019-12-04T17:49:55Z

According to @pgunapal there is an MP Health 2.2 spec coming out soon (next few months)

frowe · 2019-12-04T17:56:59Z

That's reasonable. How would the auto feature know exactly what REST URL to generate? For example, if I'm using auth=application, how would it know what to put in the X-Validation-User and X-Validation-Password headers?
Since there is a default instance of DefaultDataSource, wouldn't it be flagged as unhealthy unless the admin overrides the dataSource def and hooks it to a valid destination? It would seem confusing to a user to have a datasource that is flagged as unhealthy, but is not in their server.xml.

aguibert · 2019-12-04T18:18:36Z

The auto feature would probably call an internal SPI instead of looping back to itself via an actual HTTP call, but it would have to do the default validation mode of auth=container, which covers >80% I expect. If a user does need app-auth, then they can disable the built-in check and write their own health check using MP Health API.

Good point about the DefaultDataSource, that would contaminate the health checks for a lot of valid configurations. Perhaps we can rework how we do DefaultDataSource so that it doesn't use a partial defaultInstances.xml and instead we insert the javaCompDefaultName attribute at runtime if the user has actually configured a DefaultDataSource

njr-11 · 2019-12-04T18:23:13Z

This could apply to other connection factories (JMS, generic JCA) as well, and possibly to other resource types such as Cloudant, just as the validator API does.

I like the idea of validating by default, but the scenarios with application authentication and in the case of container authentication where the bindings identify the proper auth data/login module will be a problem. The default health check indicating these resources are bad when they actually work perfectly fine in the app will look really bad. Any idea how other implementations which have it default (Quarkus) behave here?

aguibert · 2019-12-04T18:29:40Z

In Quarkus there isn't really any separation between "app config" or "server config", there is just an application.properties (also similar to Vertx and Spring Boot) where you can configure your datasource. Also, in Quarkus they only inject Datasources with CDI @Inject, not with Java EE @Resource so the user doesn't really have a choice for app auth vs. container auth.

I agree that there isn't a good solution for app auth here (aside from disabling the check), but I believe the container-auth use case is dominant enough that enabling this ootb is justified. Additionally, I don't think we should complicate this proposal by trying to accommodate the app auth scenario, because for these instances users could simply disable the built-in check and write their own check by implementing the MP Health interface inside their app.
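
For the app auth case, the core of such a user-written check could look roughly like the sketch below. It depends only on the JDK; the class and method names are hypothetical, and where the user and password come from is left to the application. In a real app this logic would sit inside an MP Health HealthCheck implementation (likely marked @Readiness), which is omitted here so that the example has no external dependencies.

```java
import java.sql.Connection;
import java.sql.SQLException;
import javax.sql.DataSource;

/** Illustrative only: hypothetical helper an application might call from its own health check. */
final class AppAuthProbeSketch {

    /**
     * Opens a connection with application-supplied credentials and asks the driver
     * whether it is usable. Any SQLException is treated as "not healthy".
     */
    static boolean canConnect(DataSource ds, String user, String password, int timeoutSeconds) {
        // try-with-resources closes the connection even if isValid throws
        try (Connection con = ds.getConnection(user, password)) {
            return con.isValid(timeoutSeconds);
        } catch (SQLException e) {
            return false;
        }
    }
}
```

Returning a plain boolean keeps the probe easy to map onto an UP/DOWN response; a real check would probably also want to log the exception rather than swallow it.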

frowe · 2019-12-04T18:34:39Z

But it's not all container auth, it would be another subset, because of the potential for resource bindings as mentioned by Nathan

njr-11 · 2019-12-04T18:35:36Z

Even with CDI inject of a DataSource, unless they have gone out of the way to reject the datasource.getConnection(user, password) method, there is a way for the Quarkus application to get what is, in effect, the equivalent of Liberty's application authentication. What happens in this case? Does Quarkus report these data sources as invalid? Or somehow skip over them?

aguibert · 2019-12-04T18:39:24Z

Quarkus only tests datasource.getConnection(), so it would report the data source as invalid. Here is their health check code for reference:
https://github.com/quarkusio/quarkus/blob/master/extensions/agroal/runtime/src/main/java/io/quarkus/agroal/runtime/health/DataSourceHealthCheck.java#L43

frowe · 2019-12-04T18:53:36Z

I could be convinced to leave aside app auth and container auth with bindings, IFF, it is clearly documented that the health check only works for container auth where the credentials are somehow configured in server.xml (like uid/pwd, auth alias, URL, etc). We get way too many skill cases where customer is complaining "test connection fails, but my kerberos datasource works from app", "test connection fails from dmgr", etc, because we only intended to support a subset of cases, but never documented those limitations.

aguibert · 2019-12-04T19:07:10Z

I believe a lot of those scenarios are solved with the default liberty implementation of datasource validation (including server-defined custom login modules and I think kerberos too) but I agree with your point about documenting limitations.

scottkurz · 2020-07-23T14:28:34Z

Do we see this as more of a readiness or liveness check? I'm thinking readiness, since there's a decent chance the app container can recover from certain conditions (e.g. temporarily out of connections).

aguibert · 2020-07-23T14:56:17Z

@scottkurz agreed, Readiness seems more appropriate here (and that is also what Quarkus classifies the check as)

tevans78 · 2020-12-08T15:43:25Z

Agreed that this should be implemented as a generic framework for anything that plugs into the validation mechanisms. Should be put on the backlog and prioritized.

frowe · 2023-01-16T14:19:15Z

@malincoln Per your request for status, this issue is not currently being worked, and based on its relative priority, I'd guess it will start after completion of Prepopulation of DB connections.

cbridgha · 2023-05-30T13:33:13Z

After evaluating the relative priorities and constraints, this item is not likely to complete in the next 2 years, so we are going to close it for now. We can revisit if needed.

aguibert added team:Zombie Apocalypse design-issue in:JDBC/JCA in:MicroProfile/Health labels Dec 4, 2019

aguibert self-assigned this Dec 4, 2019

NottyCode added this to Backlog in Design Issues Jan 21, 2020

scottkurz mentioned this issue Sep 22, 2020

Add health check sample OpenLiberty/devfile-stack-samples#5

Merged

tevans78 added the Epic Used to track Feature Epics that are following the UFO process label Dec 8, 2020

tevans78 moved this from Backlog to Needs Implemention in Design Issues Dec 8, 2020

jhanders34 added this to New in Open Liberty Roadmap Jan 4, 2021

NottyCode moved this from New to Foundation in Open Liberty Roadmap Jan 4, 2021

scottkurz mentioned this issue Jan 6, 2021

Leverage Open Liberty DB connection validation to JPA template OpenLiberty/devfile-stack#106

Closed

NottyCode unassigned aguibert Mar 16, 2021

cbridgha closed this as completed May 30, 2023

cbridgha removed this from Core Runtime in Open Liberty Roadmap May 30, 2023
