Improve DatabaseHealthCheck robustness and connection cleanup #34942
…n leak anti-patterns #34839
- Replace DbConnectionFactory.getConnection() with getDataSource().getConnection() to bypass ThreadLocal caching and test actual database connectivity
- Switch from prepareStatement("SELECT 1") to JDBC4 conn.isValid(2) for efficient health validation without statement allocation
- Add interrupt-resilient connection cleanup matching wrapConnection() pattern to prevent orphaned connections when shutdownNow() fires during close()
- Replace shutdownNow() with graceful shutdown→awaitTermination→shutdownNow fallback to allow in-flight cleanup to complete
https://claude.ai/code/session_01KKx8BrFmhNVAA4JSTvhDXu
wezell
left a comment
fine. lotta logic to run a test query.
wezell
left a comment
Seems like a lot of code to run a db query.
…d configurable timeout #34839
Replace manual interrupt-resilient cleanup with try-with-resources (safe here because getDataSource().getConnection() bypasses ThreadLocal, so no deeper code relies on this connection). Drop redundant isValid() since HikariCP validates on borrow. Make timeout configurable via health.check.database.timeout.seconds.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Fair point: the code isn't dramatically shorter, but each piece now has a clear reason for being there, and the accidental complexity is gone.
The previous version manually managed connection cleanup with a Thread.interrupted() / …
It's interactions with the ThreadLocal connection (via DbConnectionFactory.getConnection()) that make try-with-resources tricky: if code called at a deeper level expects that same connection to still be open, closing it out from …
Kept isValid() as the validation. While HikariCP validates connections on borrow, it skips validation for connections idle less than 500ms and doesn't validate newly created connections. Since the health check's job is to confirm …
The timeout is now configurable via health.check.database.timeout.seconds (default 2s). This is intentionally lower than HikariCP's connectionTimeout (default 5s): the health check should fail fast and report status quickly, not …
Also replaced the per-call Executors.newSingleThreadExecutor() with HealthCheckUtils.executeWithTimeout(), a shared cached thread pool with daemon threads, the same pattern used by CacheHealthCheck and DatabaseHealthEventManager.
This also aligns with the direction in #34832: once keepaliveTime and tcpKeepAlive land, HikariCP will proactively validate idle connections, further strengthening the signal from a simple borrow + isValid() check.
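For readers following the discussion, the borrow + isValid() approach described in this comment could look roughly like the sketch below. The class and method names (`PoolProbe`, `isDatabaseUp`) are hypothetical and are not taken from the PR; the timeout is passed in as a parameter, on the assumption that the caller reads it from health.check.database.timeout.seconds.

```java
import java.sql.Connection;
import java.sql.SQLException;
import javax.sql.DataSource;

/** Hypothetical helper for illustration; not the actual DatabaseHealthCheck code. */
final class PoolProbe {

    private PoolProbe() {
    }

    /**
     * Borrows a connection straight from the pool and validates it.
     * try-with-resources hands the connection back on every exit path.
     */
    static boolean isDatabaseUp(DataSource dataSource, int timeoutSeconds) {
        try (Connection conn = dataSource.getConnection()) {
            // JDBC4 validation; how the server is probed is up to the driver.
            return conn.isValid(timeoutSeconds);
        } catch (SQLException e) {
            // A failed borrow (or a failed close) is reported as unhealthy.
            return false;
        }
    }
}
```

Because the connection comes from the DataSource rather than a ThreadLocal, nothing else holds a reference to it, which is what makes closing it in the same scope safe.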
… SingleThreadExecutor #34839
Replace per-call Executors.newSingleThreadExecutor() with the shared HealthCheckUtils.executeWithTimeout() cached thread pool, the same pattern used by CacheHealthCheck and DatabaseHealthEventManager. Eliminates thread allocation on every health check poll and the manual executor shutdown/awaitTermination boilerplate.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
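As a rough illustration of the shared-pool pattern this commit describes, the sketch below runs a task on a cached pool of daemon threads and gives up after a timeout. It is not the real HealthCheckUtils: the class name `TimeoutRunner`, the method signature, and the fallback parameter are all assumptions made for the example.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

/** Hypothetical stand-in for a shared timeout helper; signature is assumed. */
final class TimeoutRunner {

    // One pool shared by all checks. Daemon threads never block JVM shutdown,
    // so no per-call shutdown()/awaitTermination() is needed.
    private static final ExecutorService POOL = Executors.newCachedThreadPool(runnable -> {
        Thread thread = new Thread(runnable, "health-check-worker");
        thread.setDaemon(true);
        return thread;
    });

    private TimeoutRunner() {
    }

    /** Runs the task, returning fallback on timeout, failure, or interruption. */
    static <T> T executeWithTimeout(Callable<T> task, long timeout, TimeUnit unit, T fallback) {
        Future<T> future = POOL.submit(task);
        try {
            return future.get(timeout, unit);
        } catch (TimeoutException e) {
            future.cancel(true); // interrupt the worker; the pool itself stays usable
            return fallback;
        } catch (ExecutionException e) {
            return fallback; // the task itself threw
        } catch (InterruptedException e) {
            future.cancel(true);
            Thread.currentThread().interrupt(); // keep the caller's interrupt status
            return fallback;
        }
    }
}
```

Idle threads in a cached pool are reclaimed after 60 seconds, so a periodic health poll reuses a worker instead of allocating a new thread on each call.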
### Proposed Changes
* Replace `prepareStatement("SELECT 1")` with JDBC4 `isValid()` for more
efficient database health checks
* Bypass ThreadLocal connection caching by calling
`getDataSource().getConnection()` directly to test actual pool
connectivity
* Implement interrupt-resilient connection cleanup that preserves thread
interrupt status during timeout scenarios
* Replace aggressive `shutdownNow()` with graceful `shutdown()` followed
by `awaitTermination()` to allow in-flight cleanup to complete
* Add comprehensive comments explaining the rationale for connection
handling and executor shutdown strategy
* Organize imports alphabetically for consistency
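
The graceful shutdown sequence listed above can be sketched as follows. This is a minimal illustration using only `java.util.concurrent`; the class name `GracefulShutdown` and the boolean return value are assumptions for the example, not code from this PR.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.TimeUnit;

/** Hypothetical helper showing shutdown, awaitTermination, then shutdownNow. */
final class GracefulShutdown {

    private GracefulShutdown() {
    }

    /** Returns true if the executor drained within the grace period. */
    static boolean shutdownGracefully(ExecutorService executor, long grace, TimeUnit unit) {
        executor.shutdown(); // stop accepting work, let in-flight tasks finish
        try {
            if (executor.awaitTermination(grace, unit)) {
                return true;
            }
            executor.shutdownNow(); // grace period expired: interrupt stragglers
            return false;
        } catch (InterruptedException e) {
            executor.shutdownNow();
            Thread.currentThread().interrupt(); // preserve the caller's interrupt status
            return false;
        }
    }
}
```

The grace period gives a task that is in the middle of closing a connection a chance to finish before any interrupt is delivered.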
### Rationale
The previous implementation had several issues:
1. **Inefficient health check**: `prepareStatement("SELECT 1")`
allocates unnecessary statement resources, while `isValid()` is the
standard JDBC4 approach that pgjdbc implements without statement
allocation
2. **Cached connection testing**: `DbConnectionFactory.getConnection()`
may return a cached ThreadLocal connection, so the check may never
exercise actual pool availability
3. **Connection leak risk**: Aggressive `shutdownNow()` could interrupt
connection cleanup, potentially orphaning connections when health checks
time out
4. **Thread interrupt loss**: The previous cleanup didn't preserve the
interrupted flag, which could mask timeout conditions in calling code
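
To make the interrupt-handling points concrete, here is one possible shape for cleanup that survives a pending interrupt and then restores the flag. It is a sketch under assumptions: the class and method names are invented for illustration, and the PR's actual cleanup code may differ.

```java
/** Hypothetical helper; illustrates clearing and restoring the interrupt flag around close(). */
final class InterruptSafeClose {

    private InterruptSafeClose() {
    }

    static void closePreservingInterrupt(AutoCloseable resource) {
        // Thread.interrupted() reads AND clears the flag, so close() is not
        // cut short by an interrupt that arrived earlier (e.g. from shutdownNow()).
        boolean wasInterrupted = Thread.interrupted();
        try {
            resource.close();
        } catch (Exception e) {
            // Best-effort cleanup: a failed close must not hide the original outcome.
        } finally {
            if (wasInterrupted) {
                // Restore the flag so calling code can still observe the timeout.
                Thread.currentThread().interrupt();
            }
        }
    }
}
```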
### Checklist
- [ ] Tests
- [ ] Translations
- [ ] Security Implications Contemplated
### Additional Info
The changes maintain backward compatibility while improving reliability.
The health check now:
- Tests actual database pool connectivity rather than cached state
- Uses the standard JDBC4 validation method
- Properly handles thread interruption during executor shutdown
- Allows graceful cleanup before forcing termination
No functional behavior changes from a user perspective—the health check
still validates database availability, just more robustly.
https://claude.ai/code/session_01KKx8BrFmhNVAA4JSTvhDXu
This PR fixes: #34839
---------
Co-authored-by: Claude <noreply@anthropic.com>