Bad Behavior: Handling Database Down

Brett Wooldridge edited this page Jan 10, 2017 · 1 revision

Here's the scenario. You've configured your pool (HikariCP or other) to timeout on calls to DataSource.getConnection() after 5 seconds. A connection timeout is a contract between you and the pool; your application should get a connection within the specified time, or get an exception. Various different pools provide such timeout mechanisms, but getting timeouts right under adverse network conditions is more difficult than it seems.

We setup a test that configures 4 pools (HikariCP, C3P0, Dbcp2, Vibur) against a remote MySQL server. Every 2 seconds getConnection() is called, a query is executed, and then the connection is closed (returned to the pool).

Each pool is configured (where possible) to validate connections on check-out:

Pool Validation Enablement
HikariCP default
Dbcp2 setTestOnBorrow(true)
C3P0 setTestConnectionOnCheckout(true)
Vibur setTestConnectionQuery("isValid")

Pool timeouts configured as follows:

Pool Timeout
HikariCP setConnectionTimeout(5000)
Tomcat setMaxWait(5000) + setValidationQueryTimeout(5)
C3P0 setCheckoutTimeout(5000)
Vibur setConnectionTimeoutInMs(5000)
setUseNetworkTimeout(true)

TODO: Update test run output.

Start of test run:

04:49:337 [Timer-0] INFO DbDownTest - HikariDataSource got a connection.
04:49:404 [Timer-4] INFO DbDownTest - BasicDataSource got a connection.
04:49:418 [Timer-1] INFO DbDownTest - ComboPooledDataSource got a connection.
04:49:427 [Timer-3] INFO DbDownTest - BoneCPDataSource got a connection.
04:49:440 [Timer-2] INFO DbDownTest - ViburDBCPDataSource got a connection.
04:51:332 [Timer-3] INFO DbDownTest - BoneCPDataSource got a connection.
04:51:332 [Timer-4] INFO DbDownTest - BasicDataSource got a connection.
04:51:332 [Timer-2] INFO DbDownTest - ViburDBCPDataSource got a connection.
04:51:333 [Timer-1] INFO DbDownTest - ComboPooledDataSource got a connection.
04:51:333 [Timer-0] INFO DbDownTest - HikariDataSource got a connection.
04:53:336 [Timer-3] INFO DbDownTest - BoneCPDataSource got a connection.
04:53:336 [Timer-2] INFO DbDownTest - ViburDBCPDataSource got a connection.
04:53:336 [Timer-4] INFO DbDownTest - BasicDataSource got a connection.
04:53:337 [Timer-1] INFO DbDownTest - ComboPooledDataSource got a connection.
04:53:337 [Timer-0] INFO DbDownTest - HikariDataSource got a connection.

1Note BasicDataSource above is the class name of the Tomcat DataSource, ComboPooledDataSource is C3P0

All pools are running without issue.

At this instant (04:54:000) the MySQL server network cable is disconnected. The next execution for each pool should be at ~04:55:300.

We expect every pool to fail within their configured timeout (5 seconds). Here is what happens:

04:55:340 [Timer-3] INFO DbDownTest - BoneCPDataSource got a connection.
04:55:340 [Timer-4] INFO DbDownTest - DataSource got a connection.
04:55:340 [Timer-2] INFO DbDownTest - ViburDBCPDataSource got a connection.
05:00:367 [Timer-0] ERROR DbDownTest - HikariDataSource Exception getting connection: Timeout after 5027ms of waiting for a connection.
05:05:371 [Timer-0] ERROR DbDownTest - HikariDataSource Exception getting connection: Timeout after 5004ms of waiting for a connection.
05:10:375 [Timer-0] ERROR DbDownTest - HikariDataSource Exception getting connection: Timeout after 5004ms of waiting for a connection.
05:15:377 [Timer-0] ERROR DbDownTest - HikariDataSource Exception getting connection: Timeout after 5002ms of waiting for a connection.
05:20:382 [Timer-0] ERROR DbDownTest - HikariDataSource Exception getting connection: Timeout after 5005ms of waiting for a connection.
05:25:386 [Timer-0] ERROR DbDownTest - HikariDataSource Exception getting connection: Timeout after 5004ms of waiting for a connection.
05:30:389 [Timer-0] ERROR DbDownTest - HikariDataSource Exception getting connection: Timeout after 5002ms of waiting for a connection.
05:35:392 [Timer-0] ERROR DbDownTest - HikariDataSource Exception getting connection: Timeout after 5003ms of waiting for a connection.
05:40:393 [Timer-0] ERROR DbDownTest - HikariDataSource Exception getting connection: Timeout after 5001ms of waiting for a connection.
05:45:396 [Timer-0] ERROR DbDownTest - HikariDataSource Exception getting connection: Timeout after 5003ms of waiting for a connection.
What is going on?

HikariCP starts throwing SQLExceptions reflecting a timeout exceeding 5 seconds (5000ms), and continues to do so every time the harness makes an attempt.

C3P0 did not return a connection, and also did not timeout. The execution of the validation query is stuck due to unacknowledged TCP traffic. The setCheckoutTimeout(5000) call is useless for handling network outages. This method is only useful if the pool is "alive" but oversubscribed. There are no other meaningful timeout settings that apply to a network failure.

Dbcp2 did not return a connection, and also did not timeout. The execution of the validation query is stuck due to unacknowledged TCP traffic. Subsequently, the SQL Statement run on the (bad) connection by the test harness hangs (due to unacknowledged TCP). setMaxWait(5000) is seemingly useless for handling network outages. There are no other meaningful timeout settings that apply to a network failure.

Vibur starts throwing SQLExceptions reflecting a timeout exceeding 5 seconds (5000ms), and continues to do so every time the harness makes an attempt.

Continuing...

[snip]
06:00:211 [Timer-4] ERROR DbDownTest - DataSource Exception getting connection: Communications link failure
06:00:222 [Timer-2] ERROR DbDownTest - ViburDBCPDataSource Exception getting connection: Communications link failure
06:00:408 [Timer-0] ERROR DbDownTest - HikariDataSource Exception getting connection: Timeout after 5003ms of waiting for a connection.
06:02:211 [Timer-4] ERROR DbDownTest - DataSource Exception getting connection: Communications link failure
06:03:236 [Timer-2] ERROR DbDownTest - ViburDBCPDataSource Exception getting connection: Communications link failure
06:04:200 [Timer-3] ERROR DbDownTest - BoneCPDataSource Exception getting connection: Timed out waiting for a free available connection.
06:04:215 [Timer-4] ERROR DbDownTest - DataSource Exception getting connection: Communications link failure
06:05:411 [Timer-0] ERROR DbDownTest - HikariDataSource Exception getting connection: Timeout after 5003ms of waiting for a connection.
06:06:219 [Timer-4] ERROR DbDownTest - DataSource Exception getting connection: Communications link failure
06:06:248 [Timer-2] ERROR DbDownTest - ViburDBCPDataSource Exception getting connection: Communications link failure
06:08:224 [Timer-4] ERROR DbDownTest - DataSource Exception getting connection: Communications link failure
06:09:203 [Timer-3] ERROR DbDownTest - BoneCPDataSource Exception getting connection: Timed out waiting for a free available connection.
06:09:257 [Timer-2] ERROR DbDownTest - ViburDBCPDataSource Exception getting connection: Communications link failure
06:10:228 [Timer-4] ERROR DbDownTest - DataSource Exception getting connection: Communications link failure
06:10:415 [Timer-0] ERROR DbDownTest - HikariDataSource Exception getting connection: Timeout after 5004ms of waiting for a connection.
06:14:207 [Timer-3] ERROR DbDownTest - BoneCPDataSource Exception getting connection: Timed out waiting for a free available connection.
06:15:420 [Timer-0] ERROR DbDownTest - HikariDataSource Exception getting connection: Timeout after 5005ms of waiting for a connection.
[snip]
06:35:431 [Timer-0] ERROR DbDownTest - HikariDataSource Exception getting connection: Timeout after 5001ms of waiting for a connection.
06:36:262 [Timer-4] ERROR DbDownTest - DataSource Exception getting connection: Communications link failure
06:37:332 [Timer-2] ERROR DbDownTest - ViburDBCPDataSource Exception getting connection: Communications link failure
06:38:267 [Timer-4] ERROR DbDownTest - DataSource Exception getting connection: Communications link failure
06:39:223 [Timer-3] ERROR DbDownTest - BoneCPDataSource Exception getting connection: Timed out waiting for a free available connection.
06:40:271 [Timer-4] ERROR DbDownTest - DataSource Exception getting connection: Communications link failure
06:40:344 [Timer-2] ERROR DbDownTest - ViburDBCPDataSource Exception getting connection: Communications link failure
06:40:435 [Timer-0] ERROR DbDownTest - HikariDataSource Exception getting connection: Timeout after 5004ms of waiting for a connection.
06:43:058 [Timer-1] ERROR DbDownTest - ComboPooledDataSource Exception getting connection: An SQLException was provoked by the following failure: com.mchange.v2.resourcepool.ResourcePoolException: Attempted to use a closed or broken resource pool
06:43:058 [Timer-1] ERROR DbDownTest - ComboPooledDataSource Exception getting connection: An SQLException was provoked by the following failure: com.mchange.v2.resourcepool.ResourcePoolException: Attempted to use a closed or broken resource pool
06:44:228 [Timer-3] ERROR DbDownTest - BoneCPDataSource Exception getting connection: Timed out waiting for a free available connection.
[...]

Now the other pools are finally coming back to life.

HikariCP continues to throw timeout exceptions every 5 seconds.

C3P0 didn't respond at first with a connection, because it was stuck in the validation query, but after approximately 2 minutes C3P0 finally wakes up with an error. Unfortunately, it does not seem to honor the 5 second timeout, and continues to fail immediately when the harness calls getConnection() every 2 seconds.

Tomcat finally wakes up after ~55 seconds when the harness query against the bad connection finally fails. Unfortunately, it does not seem to honor the 5 second timeout, and continues to fail immediately when the harness calls getConnection() every 2 seconds.

HikariCP continues to throw timeout exceptions every 5 seconds.

Our Subjective Grading

Pool Grade Reason
HikariCP A Properly handles connection timeouts.
Vibur B+ Properly handles connection timeouts. Unfortunately, setUseNetworkTimeout(true) is not the default and its importance likely missed by casual users.
Dbcp2 D Seemingly hangs until the OS TCP timeout limit.
C3P0 D Seemingly hangs until the OS TCP timeout limit.

2 Results obtained with a JDBC 4.1 compliant driver, results may vary with older drivers.


If you find inaccuracies running the test yourself, or with our pool configurations, please open an issue and we will make corrections.