Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Hook DataSource validation into MP Health check by default #10021

Closed
aguibert opened this issue Dec 4, 2019 · 16 comments
Closed

Hook DataSource validation into MP Health check by default #10021

aguibert opened this issue Dec 4, 2019 · 16 comments
Labels
design-issue Epic Used to track Feature Epics that are following the UFO process in:JDBC/JCA in:MicroProfile/Health team:Zombie Apocalypse

Comments

@aguibert
Copy link
Contributor

aguibert commented Dec 4, 2019

Goal

Combine MP Health checks and the recently delivered Validator REST API for DataSources so that user applications have more meaningful health checks out of the box by testing DB connections before claiming that an application is ready/healthy.

Background

When initially working on the Validator feature I had this idea and asked our MP Health SME if it was good practice to have health checks call out to external resources or not and was told "no". However, since that discussion I have seen several sources that indicate it is an acceptable thing to do.

  1. IETF Health Check (draft) section 5 shows an example of including response time and number of connections to a Cassandra DB
  2. Quarkus datasource health check is enabled by default
  3. IBM Garage method: Health checks says:

At minimum, a health check API is a separate REST service that is implemented within a microservice component that quickly returns the operational status of the service and an indication of its ability to connect to downstream dependent services.

Proposed solution

Create a jdbc + mpHealth auto feature that enables health checks for all configured datasources by default. Similar to the IETF draft, the health check could include 2 components:

  1. Validate that a connection could be made
  2. Validate that the connection pool is below 100% capacity (if MP Health had a WARN function like the EITF draft suggests, we could warn for a threshold such as 80-99%)

If health checks are enabled by default, this raises a few concerns:

  1. Zero migration. Customers that have servers with jdbc + mpHealth today could have their health checks start to fail after updating to a newer version of Liberty if they had invalid datasources in their configuration. Options here could be:
    A. Wait for new version of mpHealth and key the auto feature off of that. MP Health 2.2 should be coming out in the next few months
    B. Have datasource health checks disabled by default (not ideal)
  2. False negatives. We can easily simulate connectivity for container-managed-auth resources (which is the dominant use case), but we cannot simulate connectivity for app-managed-auth resources (e.g. app code does ds.getConnection(user, pass)) ootb because it requires user input. As discussed in the comments, this may be an acceptable limitation because of the prevalence of container-auth
  3. The ability to disable the datasource health checks. Some solutions here could include:
    A. Adding a <dataSource checkHealth="true|false> attribute. If mpHealth is not enabled then this attribute is ignored
    B. Add some new metatype for enabling/disabling health checks by element ID. For example:
<server>
   <!-- features... -->
  <dataSource id="myDS"/>
  <mpHealth disabledChecks="myDS"/>
</server>
@frowe
Copy link
Member

frowe commented Dec 4, 2019

I like the idea. I'm a bit leery of enabled by default for current version of mpHealth. For a new version of mpHelath, I could see making it the default. I'm also not sure of adding new attribute on dataSource element, but I like it better than the mpHealth equivalent because trying to get the element id for an unnamed dataSource is less than obvious

@aguibert
Copy link
Contributor Author

aguibert commented Dec 4, 2019

According to @pgunapal there is an MP Health 2.2 spec coming out soon (next few months)

@frowe
Copy link
Member

frowe commented Dec 4, 2019

That's reasonable. How would the auto feature know what exactly what REST URL to generate? For example, I'm using auth=application, how would it know what to put in the X-Validation-User and X-Validation-Password headers?
Since there is a default instance of DefaultDataSource, wouldn't it be flagged as unhealthy unless the admin overrides the dataSource def and hooked it to a valid destination? It would seem confusing to a user to have a datasource that is flagged as unhealthy, but is not in their server.xml.

@aguibert
Copy link
Contributor Author

aguibert commented Dec 4, 2019

The auto feature would probably call an internal SPI instead of looping back to itself via an actual HTTP call, but it would have to do the default validation mode of auth=container, which covers >80% I expect. If a user does need app-auth, then they can disable the built-in check and write their own health check using MP Health API.

Good point about the DefaultDataSource, that would contaminate the health checks for a lot of valid configurations. Perhaps we can rework how we do DefaultDataSource so that it doesn't use a partial defaultInstances.xml and instead we insert the javaCompDefaultName attribute at runtime if the user has actually configured a DefaultDataSource

@njr-11
Copy link
Contributor

njr-11 commented Dec 4, 2019

This could apply to other connection factories (JMS, generic JCA) as well, and possibly to other resource types such as Cloudant, just as the validator API does.

I like the idea of validating by default, but the scenarios with application authentication and in the case of container authentication where the bindings identify the proper auth data/login module will be a problem. The default health check indicating these resources are bad when they actually work perfectly fine in the app will look really bad. Any idea how other implementations which have it default (Quarkus) behave here?

@aguibert
Copy link
Contributor Author

aguibert commented Dec 4, 2019

In Quarkus there isn't really any separation between "app config" or "server config", there is just an application.properties (also similar to Vertx and Spring Boot) where you can configure your datasource. Also, in Quarkus they only inject Datasources with CDI @Inject, not with Java EE @Resource so the user doesn't really have a choice for app auth vs. container auth.

I agree that there isn't a good solution for app auth here (aside from disabling the check), but I believe container-auth use case is dominant enough that enabling this ootb is justified. Additionally, I don't think we should attempt to complicate this proposal by trying to accommodate the app auth scenario, because for these instances users could simply disable the built-in check and write their own check by implement the MP Health interface inside their app.

@frowe
Copy link
Member

frowe commented Dec 4, 2019

But it's not all container auth, it would be another subset, because of the potential for resource bindings as mentioned by Nathan

@njr-11
Copy link
Contributor

njr-11 commented Dec 4, 2019

Even with CDI inject of a DataSource, unless they have gone out of the way to reject the datasource.getConnection(user, password) method, there is a way for the Quarkus application to get what is, in effect, the equivalent of Liberty's application authentication. What happens in this case? Does Quarkus report these data sources as invalid? Or somehow skip over them?

@aguibert
Copy link
Contributor Author

aguibert commented Dec 4, 2019

Quarkus only tests datasource.getConnection(), so it would report the data source as invalid. Here is their health check code for reference:
https://github.com/quarkusio/quarkus/blob/master/extensions/agroal/runtime/src/main/java/io/quarkus/agroal/runtime/health/DataSourceHealthCheck.java#L43

@frowe
Copy link
Member

frowe commented Dec 4, 2019

I could be convinced to leave aside app auth and container auth with bindings, IFF, it is clearly documented that the health check only works for container auth where the credentials are somehow configured in server.xml (like uid/pwd, auth alias, URL, etc). We get way too many skill cases where customer is complaining "test connection fails, but my kerberos datasource works from app", "test connection fails from dmgr", etc, because we only intended to support a subset of cases, but never documented those limitations.

@aguibert
Copy link
Contributor Author

aguibert commented Dec 4, 2019

I believe a lot of those scenarios are solved with the default liberty implementation of datasource validation (including server-defined custom login modules and I think kerberos too) but I agree with your point about documenting limitations.

@NottyCode NottyCode added this to Backlog in Design Issues Jan 21, 2020
@scottkurz
Copy link
Member

Do we see this as more of a readiness or liveness check? I'm thinking readiness, since there's a decent chance the app container can recover from certain conditions (e.g. temporarily out of connections).

@aguibert
Copy link
Contributor Author

@scottkurz agreed, Readiness seems more appropriate here (and that is also what Quarkus classifies the check as also)

@tevans78
Copy link
Member

tevans78 commented Dec 8, 2020

Agreed that this should be implemented as a generic framework for anything that plugs into the validation mechanisms. Should be put on the backlog and prioritized.

@tevans78 tevans78 added the Epic Used to track Feature Epics that are following the UFO process label Dec 8, 2020
@tevans78 tevans78 moved this from Backlog to Needs Implemention in Design Issues Dec 8, 2020
@NottyCode NottyCode moved this from New to Foundation in Open Liberty Roadmap Jan 4, 2021
@frowe
Copy link
Member

frowe commented Jan 16, 2023

@malincoln Per your request for status, this issue is not currently being worked, and based on it's relative priority, I'd guess it will start after completion of Prepopulation of DB connections

@cbridgha
Copy link
Member

After evaluating the relative priorities and constraints this item is not likely to complete in next 2 years, we are going to close for now - we can revisit if needed.

@cbridgha cbridgha removed this from Core Runtime in Open Liberty Roadmap May 30, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
design-issue Epic Used to track Feature Epics that are following the UFO process in:JDBC/JCA in:MicroProfile/Health team:Zombie Apocalypse
Projects
Design Issues
Needs Implemention
Development

No branches or pull requests

6 participants