Skip to content

Improve DatabaseHealthCheck robustness and connection cleanup#34942

Merged
wezell merged 3 commits into
mainfrom
claude/fix-issue-34839-xXIgF
Mar 13, 2026
Merged

Improve DatabaseHealthCheck robustness and connection cleanup#34942
wezell merged 3 commits into
mainfrom
claude/fix-issue-34839-xXIgF

Conversation

@spbolton
Copy link
Copy Markdown
Contributor

@spbolton spbolton commented Mar 11, 2026

Proposed Changes

  • Replace prepareStatement("SELECT 1") with JDBC4 isValid() for more efficient database health checks
  • Bypass ThreadLocal connection caching by calling getDataSource().getConnection() directly to test actual pool connectivity
  • Implement interrupt-resilient connection cleanup that preserves thread interrupt status during timeout scenarios
  • Replace aggressive shutdownNow() with graceful shutdown() followed by awaitTermination() to allow in-flight cleanup to complete
  • Add comprehensive comments explaining the rationale for connection handling and executor shutdown strategy
  • Organize imports alphabetically for consistency

Rationale

The previous implementation had several issues:

  1. Inefficient health check: prepareStatement("SELECT 1") allocates unnecessary statement resources, while isValid() is the standard JDBC4 approach that pgjdbc implements without statement allocation
  2. Cached connection testing: Using DbConnectionFactory.getConnection() may return cached ThreadLocal connections, not testing actual pool availability
  3. Connection leak risk: Aggressive shutdownNow() could interrupt connection cleanup, potentially orphaning connections when health checks timeout
  4. Thread interrupt loss: The previous cleanup didn't preserve the interrupted flag, which could mask timeout conditions in calling code

Checklist

  • Tests
  • Translations
  • Security Implications Contemplated

Additional Info

The changes maintain backward compatibility while improving reliability. The health check now:

  • Tests actual database pool connectivity rather than cached state
  • Uses the standard JDBC4 validation method
  • Properly handles thread interruption during executor shutdown
  • Allows graceful cleanup before forcing termination

No functional behavior changes from a user perspective—the health check still validates database availability, just more robustly.

https://claude.ai/code/session_01KKx8BrFmhNVAA4JSTvhDXu

This PR fixes: #34839

…n leak anti-patterns #34839

- Replace DbConnectionFactory.getConnection() with getDataSource().getConnection()
  to bypass ThreadLocal caching and test actual database connectivity
- Switch from prepareStatement("SELECT 1") to JDBC4 conn.isValid(2) for
  efficient health validation without statement allocation
- Add interrupt-resilient connection cleanup matching wrapConnection() pattern
  to prevent orphaned connections when shutdownNow() fires during close()
- Replace shutdownNow() with graceful shutdown→awaitTermination→shutdownNow
  fallback to allow in-flight cleanup to complete

https://claude.ai/code/session_01KKx8BrFmhNVAA4JSTvhDXu
Comment thread dotCMS/src/main/java/com/dotcms/health/checks/cdi/DatabaseHealthCheck.java Outdated
Copy link
Copy Markdown
Member

@wezell wezell left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good.

Copy link
Copy Markdown
Member

@wezell wezell left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fine. lotta logic to run a test query.

Copy link
Copy Markdown
Member

@wezell wezell left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems like a lot of code to run a db query.

…d configurable timeout #34839

Replace manual interrupt-resilient cleanup with try-with-resources (safe here
because getDataSource().getConnection() bypasses ThreadLocal — no deeper code
relies on this connection). Drop redundant isValid() since HikariCP validates
on borrow. Make timeout configurable via health.check.database.timeout.seconds.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@spbolton
Copy link
Copy Markdown
Contributor Author

spbolton commented Mar 12, 2026

Seems like a lot of code to run a db query.

Fair point — the code isn't dramatically shorter, but each piece now has a clear reason for being there, and the accidental complexity is gone. The previous version manually managed connection cleanup with a Thread.interrupted() /
restore pattern to prevent pgjdbc from throwing during close() on an interrupted thread. But try-with-resources works fine here because we're calling getDataSource().getConnection() directly and nothing inside the try block calls
deeper code that relies on this connection. We just borrow it from the pool, validate, and release it — close() returns the HikariCP proxy straight back to the pool.

It's interactions with the ThreadLocal connection (via DbConnectionFactory.getConnection()) that make try-with-resources tricky — if code called at a deeper level expects that same connection to still be open, closing it out from
under them breaks the flow. That doesn't apply here since no other code touches this connection.

Kept isValid() as the validation — while HikariCP validates connections on borrow, it skips validation for connections idle less than 500ms and doesn't validate newly created connections. Since the health check's job is to confirm
the database can actually process requests (not just that a TCP connection exists), isValid() closes that gap. pgjdbc implements it with the empty query protocol — one round-trip, no statement allocation.

The timeout is now configurable via health.check.database.timeout.seconds (default 2s). This is intentionally lower than HikariCP's connectionTimeout (default 5s) — the health check should fail fast and report status quickly, not
wait as long as a normal request would. If the pool can't hand out and validate a connection within 2s, that's already a signal for the readiness probe to remove the pod from load balancing.

Also replaced the per-call Executors.newSingleThreadExecutor() with HealthCheckUtils.executeWithTimeout() — a shared cached thread pool with daemon threads, the same pattern used by CacheHealthCheck and DatabaseHealthEventManager.
This eliminates thread allocation on every health check poll and the manual executor shutdown/awaitTermination boilerplate that was needed to prevent pgjdbc interrupt-during-close issues (#34490). The shared pool reuses threads
across polls and lets them expire after 60s idle.

This also aligns with the direction in #34832 — once keepaliveTime and tcpKeepAlive land, HikariCP will proactively validate idle connections, further strengthening the signal from a simple borrow + isValid() check.

… SingleThreadExecutor #34839

Replace per-call Executors.newSingleThreadExecutor() with the shared
HealthCheckUtils.executeWithTimeout() cached thread pool — same pattern
used by CacheHealthCheck and DatabaseHealthEventManager. Eliminates
thread allocation on every health check poll and the manual executor
shutdown/awaitTermination boilerplate.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Copy link
Copy Markdown
Member

@dcolina dcolina left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍🏽

@wezell wezell added this pull request to the merge queue Mar 13, 2026
Merged via the queue into main with commit f7f60ef Mar 13, 2026
43 checks passed
@wezell wezell deleted the claude/fix-issue-34839-xXIgF branch March 13, 2026 15:16
spbolton added a commit that referenced this pull request Mar 24, 2026
### Proposed Changes
* Replace `prepareStatement("SELECT 1")` with JDBC4 `isValid()` for more
efficient database health checks
* Bypass ThreadLocal connection caching by calling
`getDataSource().getConnection()` directly to test actual pool
connectivity
* Implement interrupt-resilient connection cleanup that preserves thread
interrupt status during timeout scenarios
* Replace aggressive `shutdownNow()` with graceful `shutdown()` followed
by `awaitTermination()` to allow in-flight cleanup to complete
* Add comprehensive comments explaining the rationale for connection
handling and executor shutdown strategy
* Organize imports alphabetically for consistency

### Rationale
The previous implementation had several issues:
1. **Inefficient health check**: `prepareStatement("SELECT 1")`
allocates unnecessary statement resources, while `isValid()` is the
standard JDBC4 approach that pgjdbc implements without statement
allocation
2. **Cached connection testing**: Using
`DbConnectionFactory.getConnection()` may return cached ThreadLocal
connections, not testing actual pool availability
3. **Connection leak risk**: Aggressive `shutdownNow()` could interrupt
connection cleanup, potentially orphaning connections when health checks
timeout
4. **Thread interrupt loss**: The previous cleanup didn't preserve the
interrupted flag, which could mask timeout conditions in calling code

### Checklist
- [ ] Tests
- [ ] Translations
- [ ] Security Implications Contemplated

### Additional Info
The changes maintain backward compatibility while improving reliability.
The health check now:
- Tests actual database pool connectivity rather than cached state
- Uses the standard JDBC4 validation method
- Properly handles thread interruption during executor shutdown
- Allows graceful cleanup before forcing termination

No functional behavior changes from a user perspective—the health check
still validates database availability, just more robustly.

https://claude.ai/code/session_01KKx8BrFmhNVAA4JSTvhDXu

This PR fixes: #34839

---------

Co-authored-by: Claude <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Area : Backend PR changes Java/Maven backend code

Projects

Status: No status

5 participants