fix: return core DB connections to pool from Getter (stacked on #816) #817
ismisepaul merged 6 commits into dev#536
- Use try-with-resources for all Getter paths that used manual closeConnection
- Keep legacy ResultSet APIs (getClassInfo all classes, getPlayersByClass, getAdmins) unchanged
- Add ConnectionPool.getCoreActiveConnections for tests
- Add GetterCorePoolLeakIT to assert bounded pool usage under repeated authUser

Made-with: Cursor
Made-with: Cursor
Setup.isInstalled() was called on every HTTP request via SetupFilter, each time reading database.properties from disk and borrowing a core pool connection just to check non-null. Under concurrent load this exhausted the pool and cascaded into a full app lockup.

Cache with volatile Boolean + double-checked locking so the check runs once, then returns constant-time on all subsequent requests. resetInstalledCache() called after setup completes so the first post-setup request re-evaluates. Warm the cache at startup from DatabaseLifecycleListener.contextInitialized().

Pool tuning:
- maxPoolSize 10 → 20 (supports realistic classroom concurrency)
- connectionTimeout 30s → 5s (fail fast under overload instead of blocking Tomcat threads for 30s each)
- minIdle 2 → 5 (reduce cold-start latency)

Known limitation: authUser holds a DB connection during Argon2 password verification (~100-200ms). This limits throughput under high concurrency. Follow-up will release connection before hashing.

Load test updated with --target and --concurrency flags for targeted per-class/per-method testing instead of broad soak only.

Made-with: Cursor
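For readers unfamiliar with the pattern named in this commit message, here is a minimal sketch of a volatile Boolean cache with double-checked locking and a reset hook. It is not the code from Setup.java: the class name InstalledCheckCache and the BooleanSupplier standing in for the properties/connection check are hypothetical, chosen only so the sketch runs on its own.

```java
import java.util.function.BooleanSupplier;

/** Hypothetical helper; the PR keeps the equivalent logic inside Setup. */
final class InstalledCheckCache {
  // volatile so a value written by one thread is visible to all others
  private volatile Boolean cached;
  // Stand-in for the expensive check (read properties, borrow a connection)
  private final BooleanSupplier expensiveCheck;

  InstalledCheckCache(BooleanSupplier expensiveCheck) {
    this.expensiveCheck = expensiveCheck;
  }

  boolean isInstalled() {
    Boolean result = cached; // first check, no lock taken
    if (result == null) {
      synchronized (this) {
        result = cached; // second check, under the lock
        if (result == null) {
          result = expensiveCheck.getAsBoolean();
          cached = result;
        }
      }
    }
    return result;
  }

  /** Clears the cache so the next call re-runs the expensive check. */
  void reset() {
    synchronized (this) {
      cached = null;
    }
  }
}
```

After the first call, isInstalled() only reads one volatile field, which is what keeps the per-request cost constant; reset() plays the role the message assigns to resetInstalledCache().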
Pull request overview
This PR aims to prevent core DB connection pool exhaustion after introducing HikariCP (#816) by refactoring core DB access to reliably return connections to the pool, and by adding targeted regression tests/load tooling to detect leaks.
Changes:
- Refactors many Getter.java core DB paths to use try-with-resources for core Connection handling.
- Adds/extends integration + load testing to detect core pool leaks and validate setup/install-state caching.
- Adjusts core pool default sizing/timeouts and adds a helper to read active core connections for tests.
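To illustrate why try-with-resources addresses the leak described in the first bullet, here is a small self-contained sketch. FakeConnection is a hypothetical stand-in for a pooled java.sql.Connection (whose close() hands the connection back to the pool); it is not a class from this repository.

```java
import java.util.List;

final class TryWithResourcesDemo {
  /** Hypothetical stand-in for a pooled connection; close() means "return to pool". */
  static final class FakeConnection implements AutoCloseable {
    private final List<String> log;

    FakeConnection(List<String> log) {
      this.log = log;
      log.add("borrow");
    }

    @Override
    public void close() {
      log.add("return");
    }
  }

  static String lookup(List<String> log, String user) {
    try (FakeConnection conn = new FakeConnection(log)) {
      if (user == null) {
        return "no-user"; // early return: close() still runs
      }
      if (user.isEmpty()) {
        throw new IllegalArgumentException("empty user"); // close() still runs
      }
      return "found:" + user;
    }
  }
}
```

On all three exit paths the log ends with "return", which is the behaviour a manual closeConnection call has to reproduce by hand on every branch.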
Reviewed changes
Copilot reviewed 7 out of 7 changed files in this pull request and generated 9 comments.
| File | Description |
|---|---|
| tests/load/load-test.py | Adds targeted “getter/setter” scenarios and leak-style connection sampling mode. |
| src/main/java/servlets/Setup.java | Adds cached isInstalled() result + reset hook; resets cache after setup actions. |
| src/main/java/listeners/DatabaseLifecycleListener.java | Calls Setup.isInstalled() at startup to prime cache/log state. |
| src/main/java/dbProcs/Getter.java | Converts many core DB call sites to try-with-resources for core connections. |
| src/main/java/dbProcs/ConnectionPool.java | Changes default core pool config; adds getCoreActiveConnections() for tests. |
| src/it/java/servlets/SetupIT.java | Adds tests covering Setup.isInstalled() caching/reset behavior. |
| src/it/java/dbProcs/GetterCorePoolLeakIT.java | Adds IT to ensure repeated authUser() calls don’t exhaust core pool. |
private static final int DEFAULT_MAX_POOL_SIZE = 20;
private static final int DEFAULT_MIN_IDLE = 5;
private static final long DEFAULT_CONNECTION_TIMEOUT =
    5000; // 5 seconds, fail fast under overload
The core pool defaults were changed (maxPoolSize/minIdle/connectionTimeout), but the repository docs and example config still state defaults of 10/2/30000ms (e.g., docs/database-configuration.md and src/main/resources/database.properties.example). This will confuse operators and can change production behavior unexpectedly. Either keep code defaults aligned with the documented defaults, or update the docs/examples in the same change set.
- private static final int DEFAULT_MAX_POOL_SIZE = 20;
- private static final int DEFAULT_MIN_IDLE = 5;
- private static final long DEFAULT_CONNECTION_TIMEOUT =
-     5000; // 5 seconds, fail fast under overload
+ private static final int DEFAULT_MAX_POOL_SIZE = 10;
+ private static final int DEFAULT_MIN_IDLE = 2;
+ private static final long DEFAULT_CONNECTION_TIMEOUT =
+     30000; // 30 seconds, matches documented default
try {
  ConnectionPool.initialize();
} catch (Exception e) {
  log.warn("Pool init issue: " + e.getMessage());
}
These tests swallow ConnectionPool.initialize() failures and then only assert first == second, which can pass even when the pool/DB isn’t actually available (false positives). Consider using an assumption/assertion that the pool initialized (or that Setup.isInstalled() returns a specific expected value) before checking caching behavior, and avoid continuing the test after initialization errors.
/** Must stay within configured core pool max (see database.properties / ConnectionPool). */
private static final int MAX_ALLOWED_ACTIVE = 32;
MAX_ALLOWED_ACTIVE is hard-coded to 32, but the core pool default maximum is now 20 (and can be configured via properties). With a fixed value above the pool max, the assertion may become less meaningful across configurations. Consider deriving this threshold from the actual Hikari config/PoolMXBean (maximumPoolSize) or asserting that active returns close to baseline (e.g., <= baseline + 1) after the loop.
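The baseline-relative assertion suggested in this comment could look roughly like the sketch below. It is an illustration only: PoolLeakCheck and activeGrowth are hypothetical names, and the IntSupplier is a stand-in for whatever reports active connections (in the real test that would presumably be ConnectionPool.getCoreActiveConnections(), whose return type is assumed here to be int).

```java
import java.util.function.IntSupplier;

final class PoolLeakCheck {
  /**
   * Runs the work repeatedly and reports how far the active-connection
   * count ends above the baseline sampled before the loop.
   */
  static int activeGrowth(IntSupplier activeConnections, Runnable work, int iterations) {
    int baseline = activeConnections.getAsInt();
    for (int i = 0; i < iterations; i++) {
      work.run();
    }
    return activeConnections.getAsInt() - baseline;
  }
}
```

A test would then assert that the growth stays at or below a small constant such as 1, which keeps the check meaningful regardless of the configured maximum pool size.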
Properties prop = getDBProps();

try (Connection coreConnection = Database.getCoreConnection(null)) {
if (coreConnection != null) {
isInstalled = true;
if (prop != null) {
try (Connection coreConnection = Database.getCoreConnection(null)) {
if (coreConnection != null) {
installed = true;
}
} catch (SQLException e) {
log.info("isInstalled got SQL exception " + e.toString() + ", assuming not installed.");
}
} catch (SQLException e) {
log.info("isInstalled got SQL exception " + e.toString() + ", assuming not installed.");
}
}

if (!isInstalled) {
generateAuth();
if (!installed) {
generateAuth();
}

installedCached = installed;
return installed;
Setup.isInstalled() caches a false result even when the DB is temporarily unreachable (e.g., SQLException while connecting). That can leave the app stuck in “not installed” state for the life of the JVM, even if the DB becomes available later. Consider only caching true, or caching false only when the properties file is missing, or adding a TTL/reevaluation path on SQL failures (and avoid calling generateAuth() on transient DB errors).
boolean installed = Setup.isInstalled();
log.info("Setup.isInstalled() cached at startup: " + installed);
Calling Setup.isInstalled() during startup can have side effects (it may create the setup auth file and cache the installation state). If the DB isn’t ready yet, this can cache a false “not installed” value and affect later behavior. Consider removing this call from startup, or ensuring isInstalled() doesn’t cache/side-effect on transient connection failures.
- boolean installed = Setup.isInstalled();
- log.info("Setup.isInstalled() cached at startup: " + installed);
src/main/java/dbProcs/Getter.java
Outdated
}
// User has logged in, or a Authentication Bypass was detected... You never
// know! Better safe than sorry
// TODO: will this close the db connection if we return here?
This TODO is now incorrect: the surrounding code uses try-with-resources for the Connection, so returning here will still close/return the connection to the pool. Please remove or update the comment to avoid confusing future leak investigations.
- // TODO: will this close the db connection if we return here?
+ // Connection is managed by try-with-resources and will be closed when this method returns.
result[5] = "false"; // sso logins can't change password
result[4] = classId; // classId
result[5] = Boolean.toString(isTempUsername);
authUserSSO appears to violate the result array contract used by Login/ACS: result[5] is assigned twice (the first value is overwritten), and result[3] (tempPassword flag) is left unset. This can lead to inconsistent session behavior if callers later check these flags. Align indices with authUser (e.g., keep result[3] as tempPassword=false for SSO and result[5] as tempUsername) and remove the overwritten assignment/comment.
- result[5] = "false"; // sso logins can't change password
- result[4] = classId; // classId
- result[5] = Boolean.toString(isTempUsername);
+ result[3] = "false"; // tempPassword flag - SSO logins can't change password
+ result[4] = classId; // classId
+ result[5] = Boolean.toString(isTempUsername); // tempUsername flag
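One way to make this kind of index slip harder to repeat is to name the slots. The sketch below is hypothetical and not part of the PR: the constant names, the helper ssoFlags, and the array length of 6 are assumptions, and only indices 3, 4 and 5 come from the review comment (slots 0 to 2 are not described there and are left unset).

```java
final class AuthResultSketch {
  // Indices taken from the review comment above
  static final int TEMP_PASSWORD = 3;
  static final int CLASS_ID = 4;
  static final int TEMP_USERNAME = 5;
  // Assumed array length: just large enough to hold index 5
  static final int LENGTH = 6;

  /** Fills only the flag slots discussed in the review for an SSO login. */
  static String[] ssoFlags(String classId, boolean isTempUsername) {
    String[] result = new String[LENGTH];
    result[TEMP_PASSWORD] = "false"; // SSO logins can't change password
    result[CLASS_ID] = classId;
    result[TEMP_USERNAME] = Boolean.toString(isTempUsername);
    return result;
  }
}
```

With named constants, assigning the same slot twice stands out in review, whereas two bare `result[5]` lines are easy to miss.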
userName = userResult.getString(2);
classId = userResult.getString(3); // classId
isTempUsername = userResult.getBoolean(4);
log.debug("$$$ End authUser $$$");
The log message says "$$$ End authUser $$$" inside authUserSSO, which is misleading when troubleshooting auth flows. Please update it to match the method name (authUserSSO).
- log.debug("$$$ End authUser $$$");
+ log.debug("$$$ End authUserSSO $$$");
def run_scenario(name, spec, iterations, session=None, concurrency=1):
    """
    Run a single endpoint scenario, optionally with concurrent threads.
    Returns dict with baseline, peak, final connection counts and request stats.
    """
    _warmup_pool()

    baseline = get_connections() or 0
    conn_samples = [baseline]
    errors_list = []

    if concurrency <= 1:
        _worker_loop(spec, iterations, 0, session, conn_samples, errors_list)
    else:
run_scenario() defaults session=None, but _worker_loop() unconditionally calls session.get/post, which will raise if run_scenario() is ever used without explicitly passing a session (the signature implies it’s optional). Consider either making session required, or creating an appropriate ShepherdSession() inside run_scenario() when one isn’t provided.
If the DB is temporarily unreachable during the first isInstalled() call, caching false permanently locks the app into "not installed" state for the JVM lifetime. Only cache the true (terminal) state; leave installedCached=null on failure so subsequent requests retry. Made-with: Cursor
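The "only cache the terminal state" idea can be sketched independently of the Setup class. The version below is an alternative illustration, not the PR's implementation (which, per the message above, leaves installedCached null on failure): it uses an AtomicBoolean that only ever flips to true, and PositiveOnlyCache and the BooleanSupplier probe are hypothetical names.

```java
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.function.BooleanSupplier;

final class PositiveOnlyCache {
  // Flips to true once and never back; false is never remembered
  private final AtomicBoolean installed = new AtomicBoolean(false);
  // Stand-in for the real check, which may fail transiently
  private final BooleanSupplier probe;

  PositiveOnlyCache(BooleanSupplier probe) {
    this.probe = probe;
  }

  boolean isInstalled() {
    if (installed.get()) {
      return true; // terminal state: skip the probe
    }
    boolean ok = probe.getAsBoolean();
    if (ok) {
      installed.set(true);
    }
    return ok; // a false result is returned but not cached, so the next call retries
  }
}
```

Concurrent callers may each run the probe until the first success, so this trades a little duplicated work during an outage for the guarantee that a transient failure is never remembered.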
Address Copilot review feedback on PR #817:
- Update database.properties.example and docs/database-configuration.md to reflect new pool defaults (maxPoolSize=20, minIdle=5, connectionTimeout=5000)
- Add assumeTrue guard in SetupIT cache tests so they skip instead of false-passing when the database is unavailable
- Remove stale TODO in Getter.authUser (try-with-resources answers it)

Made-with: Cursor
Summary
Refactors Getter.java to use try-with-resources for core connections so Hikari returns connections to the pool under early returns and exceptions.

Base branch
Targets dev#536 on OWASP; merge after / on top of #816 (connection pooling).

Changes
- ResultSet APIs unchanged (getClassInfo all classes, getPlayersByClass, getAdmins).
- ConnectionPool.getCoreActiveConnections() for tests.
- GetterCorePoolLeakIT: bounded active connections under repeated authUser calls.

Test plan
- mvn test
- GetterCorePoolLeakIT, GetterIT