fix: return core DB connections to pool from Getter (stacked on #816) #817

Merged: ismisepaul merged 6 commits into dev#536 from dev#536-getter-leak-fix on Mar 29, 2026

@ismisepaul
Member

Summary

Refactors Getter.java to use try-with-resources for core connections so Hikari returns connections to the pool under early returns and exceptions.
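
As an illustration of why try-with-resources covers both early returns and exceptions, here is a minimal sketch. It assumes a stand-in `FakePooledConnection` whose `close()` plays the role of handing a connection back to Hikari; none of these class or method names come from Getter.java.

```java
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical stand-in for a pooled connection: close() hands it back to the "pool". */
final class FakePooledConnection implements AutoCloseable {
    static final AtomicInteger ACTIVE = new AtomicInteger();

    FakePooledConnection() {
        ACTIVE.incrementAndGet(); // borrowed from the pool
    }

    @Override
    public void close() {
        ACTIVE.decrementAndGet(); // returned to the pool
    }
}

final class TryWithResourcesSketch {
    /** Early return inside the try block: close() still runs. */
    static String lookup(boolean found) {
        try (FakePooledConnection conn = new FakePooledConnection()) {
            if (!found) {
                return null; // early return, connection is still closed
            }
            return "row";
        }
    }

    /** Exception inside the try block: close() runs before the exception propagates. */
    static void failing() {
        try (FakePooledConnection conn = new FakePooledConnection()) {
            throw new IllegalStateException("query failed");
        }
    }
}
```

After any mix of `lookup(...)` and `failing()` calls, `FakePooledConnection.ACTIVE` is back at zero, which is the property the leak test in this PR checks against the real pool.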

Base branch

Targets dev#536 on OWASP — merge after / on top of #816 (connection pooling).

Changes

  • try-with-resources across Getter core paths; legacy ResultSet APIs unchanged (getClassInfo all classes, getPlayersByClass, getAdmins).
  • ConnectionPool.getCoreActiveConnections() for tests.
  • GetterCorePoolLeakIT — bounded active connections under repeated authUser calls.

Test plan

  • mvn test
  • Integration tests with DB (GetterCorePoolLeakIT, GetterIT)

- Use try-with-resources for all Getter paths that used manual closeConnection
- Keep legacy ResultSet APIs (getClassInfo all classes, getPlayersByClass, getAdmins) unchanged
- Add ConnectionPool.getCoreActiveConnections for tests
- Add GetterCorePoolLeakIT to assert bounded pool usage under repeated authUser

Made-with: Cursor
Setup.isInstalled() was called on every HTTP request via SetupFilter,
each time reading database.properties from disk and borrowing a core
pool connection just to check non-null. Under concurrent load this
exhausted the pool and cascaded into a full app lockup.

Cache with volatile Boolean + double-checked locking so the check
runs once, then returns constant-time on all subsequent requests.
resetInstalledCache() called after setup completes so the first
post-setup request re-evaluates. Warm the cache at startup from
DatabaseLifecycleListener.contextInitialized().
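
A minimal sketch of the caching pattern described above, assuming the expensive check is passed in as a `BooleanSupplier`. The class name and method shapes are illustrative only and are not the actual Setup.java code; this version caches whichever value the first check returns.

```java
import java.util.function.BooleanSupplier;

final class InstalledCacheSketch {
    private static volatile Boolean cached; // null means "not evaluated yet"
    private static final Object LOCK = new Object();

    /** Runs the expensive check at most once until reset() is called. */
    static boolean isInstalled(BooleanSupplier expensiveCheck) {
        Boolean local = cached; // single volatile read on the fast path
        if (local == null) {
            synchronized (LOCK) {
                local = cached; // re-check under the lock
                if (local == null) {
                    local = expensiveCheck.getAsBoolean();
                    cached = local;
                }
            }
        }
        return local;
    }

    /** Clears the cache so the next call re-evaluates, e.g. after setup completes. */
    static void reset() {
        cached = null;
    }
}
```

The `volatile` field is what makes the unsynchronized first read safe; without it, another thread could miss the published value.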

Pool tuning:
- maxPoolSize 10 → 20 (supports realistic classroom concurrency)
- connectionTimeout 30s → 5s (fail fast under overload instead of
  blocking Tomcat threads for 30s each)
- minIdle 2 → 5 (reduce cold-start latency)

Known limitation: authUser holds a DB connection during Argon2
password verification (~100-200ms). This limits throughput under
high concurrency. Follow-up will release connection before hashing.
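
One possible shape for that follow-up, sketched under stated assumptions: `Conn`, `fetchStoredHash`, and `slowVerify` are hypothetical placeholders (no real Argon2 call is made), and the real authUser may need more data from the database than a single hash.

```java
import java.util.concurrent.atomic.AtomicInteger;

final class AuthSketch {
    /** Counts connections currently borrowed from the stand-in pool. */
    static final AtomicInteger ACTIVE = new AtomicInteger();
    /** Records how many connections were held while the slow verification ran. */
    static int activeDuringVerify = -1;

    /** Hypothetical stand-in for a pooled connection. */
    static final class Conn implements AutoCloseable {
        Conn() { ACTIVE.incrementAndGet(); }
        String fetchStoredHash(String user) { return "hash-of-secret"; } // placeholder lookup
        @Override public void close() { ACTIVE.decrementAndGet(); }
    }

    /** Placeholder for the password-hash verification step. */
    static boolean slowVerify(String storedHash, String password) {
        activeDuringVerify = ACTIVE.get();
        return storedHash.equals("hash-of-" + password);
    }

    static boolean authUser(String user, String password) {
        String storedHash;
        try (Conn conn = new Conn()) {
            storedHash = conn.fetchStoredHash(user); // short DB work only
        } // connection is back in the pool here
        return slowVerify(storedHash, password); // CPU-heavy work holds no connection
    }
}
```

The point of the split is that the number of connections held no longer scales with the number of requests that are busy hashing.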

Load test updated with --target and --concurrency flags for targeted
per-class/per-method testing instead of broad soak only.

Made-with: Cursor

Copilot AI left a comment

Pull request overview

This PR aims to prevent core DB connection pool exhaustion after introducing HikariCP (#816) by refactoring core DB access to reliably return connections to the pool, and by adding targeted regression tests/load tooling to detect leaks.

Changes:

  • Refactors many Getter.java core DB paths to use try-with-resources for core Connection handling.
  • Adds/extends integration + load testing to detect core pool leaks and validate setup/install-state caching.
  • Adjusts core pool default sizing/timeouts and adds a helper to read active core connections for tests.

Reviewed changes

Copilot reviewed 7 out of 7 changed files in this pull request and generated 9 comments.

| File | Description |
| --- | --- |
| tests/load/load-test.py | Adds targeted “getter/setter” scenarios and leak-style connection sampling mode. |
| src/main/java/servlets/Setup.java | Adds cached isInstalled() result + reset hook; resets cache after setup actions. |
| src/main/java/listeners/DatabaseLifecycleListener.java | Calls Setup.isInstalled() at startup to prime cache/log state. |
| src/main/java/dbProcs/Getter.java | Converts many core DB call sites to try-with-resources for core connections. |
| src/main/java/dbProcs/ConnectionPool.java | Changes default core pool config; adds getCoreActiveConnections() for tests. |
| src/it/java/servlets/SetupIT.java | Adds tests covering Setup.isInstalled() caching/reset behavior. |
| src/it/java/dbProcs/GetterCorePoolLeakIT.java | Adds IT to ensure repeated authUser() calls don’t exhaust core pool. |

Comment on lines +44 to +47
private static final int DEFAULT_MAX_POOL_SIZE = 20;
private static final int DEFAULT_MIN_IDLE = 5;
private static final long DEFAULT_CONNECTION_TIMEOUT =
5000; // 5 seconds — fail fast under overload

Copilot AI Mar 29, 2026

The core pool defaults were changed (maxPoolSize/minIdle/connectionTimeout), but the repository docs and example config still state defaults of 10/2/30000ms (e.g., docs/database-configuration.md and src/main/resources/database.properties.example). This will confuse operators and can change production behavior unexpectedly. Either keep code defaults aligned with the documented defaults, or update the docs/examples in the same change set.
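
To make the operator impact concrete, here is a small sketch of how a code default typically interacts with a properties file. The key name `pool.maxSize` and the `setting` helper are hypothetical and are not taken from ConnectionPool.java or database.properties.

```java
import java.util.Properties;

final class PoolDefaultsSketch {
    // Values as proposed in this PR.
    static final int DEFAULT_MAX_POOL_SIZE = 20;
    static final int DEFAULT_MIN_IDLE = 5;
    static final long DEFAULT_CONNECTION_TIMEOUT_MS = 5000;

    /** Returns the configured value, or the code default when the key is absent. */
    static long setting(Properties props, String key, long codeDefault) {
        String raw = props.getProperty(key);
        return raw == null ? codeDefault : Long.parseLong(raw.trim());
    }
}
```

A deployment that never set the key silently moves from the old default to the new one on upgrade, while a deployment that pinned a value keeps it. That is why the docs and the example file should state the same numbers as the code.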

Suggested change
- private static final int DEFAULT_MAX_POOL_SIZE = 20;
- private static final int DEFAULT_MIN_IDLE = 5;
- private static final long DEFAULT_CONNECTION_TIMEOUT =
- 5000; // 5 seconds — fail fast under overload
+ private static final int DEFAULT_MAX_POOL_SIZE = 10;
+ private static final int DEFAULT_MIN_IDLE = 2;
+ private static final long DEFAULT_CONNECTION_TIMEOUT =
+ 30000; // 30 seconds — matches documented default

Comment on lines +162 to +166
try {
ConnectionPool.initialize();
} catch (Exception e) {
log.warn("Pool init issue: " + e.getMessage());
}

Copilot AI Mar 29, 2026

These tests swallow ConnectionPool.initialize() failures and then only assert first == second, which can pass even when the pool/DB isn’t actually available (false positives). Consider using an assumption/assertion that the pool initialized (or that Setup.isInstalled() returns a specific expected value) before checking caching behavior, and avoid continuing the test after initialization errors.

Comment on lines +25 to +27
/** Must stay within configured core pool max (see database.properties / ConnectionPool). */
private static final int MAX_ALLOWED_ACTIVE = 32;

Copilot AI Mar 29, 2026

MAX_ALLOWED_ACTIVE is hard-coded to 32, but the core pool default maximum is now 20 (and can be configured via properties). With a fixed value above the pool max, the assertion may become less meaningful across configurations. Consider deriving this threshold from the actual Hikari config/PoolMXBean (maximumPoolSize) or asserting that active returns close to baseline (e.g., <= baseline + 1) after the loop.
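
The baseline-relative check suggested above could look roughly like this. It is a sketch only: `ACTIVE` stands in for whatever `ConnectionPool.getCoreActiveConnections()` reports, and `Conn`, `wellBehaved`, and `leaky` are hypothetical helpers.

```java
import java.util.concurrent.atomic.AtomicInteger;

final class LeakCheckSketch {
    /** Stand-in for the pool's active-connection gauge. */
    static final AtomicInteger ACTIVE = new AtomicInteger();

    /** Hypothetical stand-in for a pooled connection. */
    static final class Conn implements AutoCloseable {
        Conn() { ACTIVE.incrementAndGet(); }
        @Override public void close() { ACTIVE.decrementAndGet(); }
    }

    /** Runs the operation repeatedly and returns how far the gauge grew above its baseline. */
    static int growthAfter(int iterations, Runnable operation) {
        int baseline = ACTIVE.get();
        for (int i = 0; i < iterations; i++) {
            operation.run();
        }
        return ACTIVE.get() - baseline;
    }

    /** Closes its connection on every call. */
    static void wellBehaved() {
        try (Conn conn = new Conn()) {
            // use the connection
        }
    }

    /** Borrows a connection and never closes it. */
    static void leaky() {
        new Conn();
    }
}
```

Asserting on growth relative to the baseline stays meaningful whatever maximum the pool is configured with, whereas a fixed ceiling above the pool maximum can never be exceeded.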

Comment on lines +386 to +403
Properties prop = getDBProps();

try (Connection coreConnection = Database.getCoreConnection(null)) {
if (coreConnection != null) {
isInstalled = true;
if (prop != null) {
try (Connection coreConnection = Database.getCoreConnection(null)) {
if (coreConnection != null) {
installed = true;
}
} catch (SQLException e) {
log.info("isInstalled got SQL exception " + e.toString() + ", assuming not installed.");
}
} catch (SQLException e) {
log.info("isInstalled got SQL exception " + e.toString() + ", assuming not installed.");
}
}

if (!isInstalled) {
generateAuth();
if (!installed) {
generateAuth();
}

installedCached = installed;
return installed;

Copilot AI Mar 29, 2026

Setup.isInstalled() caches a false result even when the DB is temporarily unreachable (e.g., SQLException while connecting). That can leave the app stuck in “not installed” state for the life of the JVM, even if the DB becomes available later. Consider only caching true, or caching false only when the properties file is missing, or adding a TTL/reevaluation path on SQL failures (and avoid calling generateAuth() on transient DB errors).
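
A sketch of the "only cache true" option, assuming the check is supplied as a `BooleanSupplier` that returns false on a transient failure. Names are illustrative and this is not the actual Setup.java code.

```java
import java.util.function.BooleanSupplier;

final class TerminalStateCacheSketch {
    private static volatile Boolean cached; // only ever null or TRUE
    private static final Object LOCK = new Object();

    /** Caches a positive result permanently; a negative result is re-evaluated next call. */
    static boolean isInstalled(BooleanSupplier check) {
        Boolean local = cached;
        if (local != null) {
            return local; // fast path once installed
        }
        synchronized (LOCK) {
            local = cached;
            if (local != null) {
                return local;
            }
            boolean installed = check.getAsBoolean();
            if (installed) {
                cached = Boolean.TRUE; // terminal state, safe to remember
            }
            return installed; // false is returned but not remembered
        }
    }
}
```

The trade-off is that every request pays for the check while the app is not yet installed, so the check itself should stay cheap or fail fast.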

Comment on lines +48 to +50
boolean installed = Setup.isInstalled();
log.info("Setup.isInstalled() cached at startup: " + installed);

Copilot AI Mar 29, 2026

Calling Setup.isInstalled() during startup can have side effects (it may create the setup auth file and cache the installation state). If the DB isn’t ready yet, this can cache a false “not installed” value and affect later behavior. Consider removing this call from startup, or ensuring isInstalled() doesn’t cache/side-effect on transient connection failures.

Suggested change
- boolean installed = Setup.isInstalled();
- log.info("Setup.isInstalled() cached at startup: " + installed);

}
// User has logged in, or a Authentication Bypass was detected... You never
// know! Better safe than sorry
// TODO: will this close the db connection if we return here?

Copilot AI Mar 29, 2026

This TODO is now incorrect: the surrounding code uses try-with-resources for the Connection, so returning here will still close/return the connection to the pool. Please remove or update the comment to avoid confusing future leak investigations.

Suggested change
- // TODO: will this close the db connection if we return here?
+ // Connection is managed by try-with-resources and will be closed when this method returns.

Comment on lines +457 to +459
result[5] = "false"; // sso logins can't change password
result[4] = classId; // classId
result[5] = Boolean.toString(isTempUsername);

Copilot AI Mar 29, 2026

authUserSSO appears to violate the result array contract used by Login/ACS: result[5] is assigned twice (the first value is overwritten), and result[3] (tempPassword flag) is left unset. This can lead to inconsistent session behavior if callers later check these flags. Align indices with authUser (e.g., keep result[3] as tempPassword=false for SSO and result[5] as tempUsername) and remove the overwritten assignment/comment.
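
One way to make this kind of index slip harder to repeat is to name the slots. The sketch below is an assumption-laden illustration: only indices 3, 4, and 5 are described in this thread, so slots 0 to 2 are left unset, and the array length of 6 and all constant names are hypothetical.

```java
final class AuthResultSketch {
    // Slot meanings as described in the review comment; names are hypothetical.
    static final int TEMP_PASSWORD = 3;
    static final int CLASS_ID = 4;
    static final int TEMP_USERNAME = 5;
    static final int LENGTH = 6; // assumed: index 5 is the highest one mentioned

    /** Fills only the slots discussed here; slots 0 to 2 are not defined in this thread. */
    static String[] ssoResult(String classId, boolean isTempUsername) {
        String[] result = new String[LENGTH];
        result[TEMP_PASSWORD] = "false"; // SSO logins can't change password
        result[CLASS_ID] = classId;
        result[TEMP_USERNAME] = Boolean.toString(isTempUsername);
        return result;
    }
}
```

With named constants, assigning the same slot twice stands out in review, and authUser and authUserSSO can share one definition of the layout.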

Suggested change
- result[5] = "false"; // sso logins can't change password
- result[4] = classId; // classId
- result[5] = Boolean.toString(isTempUsername);
+ result[3] = "false"; // tempPassword flag - SSO logins can't change password
+ result[4] = classId; // classId
+ result[5] = Boolean.toString(isTempUsername); // tempUsername flag

userName = userResult.getString(2);
classId = userResult.getString(3); // classId
isTempUsername = userResult.getBoolean(4);
log.debug("$$$ End authUser $$$");

Copilot AI Mar 29, 2026

The log message says "$$$ End authUser $$$" inside authUserSSO, which is misleading when troubleshooting auth flows. Please update it to match the method name (authUserSSO).

Suggested change
- log.debug("$$$ End authUser $$$");
+ log.debug("$$$ End authUserSSO $$$");

Comment on lines +460 to +473
def run_scenario(name, spec, iterations, session=None, concurrency=1):
"""
Run a single endpoint scenario, optionally with concurrent threads.
Returns dict with baseline, peak, final connection counts and request stats.
"""
_warmup_pool()

baseline = get_connections() or 0
conn_samples = [baseline]
errors_list = []

if concurrency <= 1:
_worker_loop(spec, iterations, 0, session, conn_samples, errors_list)
else:

Copilot AI Mar 29, 2026

run_scenario() defaults session=None, but _worker_loop() unconditionally calls session.get/post, which will raise if run_scenario() is ever used without explicitly passing a session (the signature implies it’s optional). Consider either making session required, or creating an appropriate ShepherdSession() inside run_scenario() when one isn’t provided.

If the DB is temporarily unreachable during the first isInstalled()
call, caching false permanently locks the app into "not installed"
state for the JVM lifetime. Only cache the true (terminal) state;
leave installedCached=null on failure so subsequent requests retry.

Made-with: Cursor
Address Copilot review feedback on PR #817:

- Update database.properties.example and docs/database-configuration.md
  to reflect new pool defaults (maxPoolSize=20, minIdle=5,
  connectionTimeout=5000)
- Add assumeTrue guard in SetupIT cache tests so they skip instead of
  false-passing when the database is unavailable
- Remove stale TODO in Getter.authUser (try-with-resources answers it)

Made-with: Cursor
@ismisepaul merged commit c88250e into dev#536 on Mar 29, 2026
12 checks passed