Deadlock during concurrent updates on Derby #2420

lmsurpre · 2021-05-24T14:42:32Z

The check to catch the version conflict was a relatively simple fix. Cleaned things up a bit because for this implementation there's no need to support an asserted version id (which might not be the latest version id - useful for logical replication). However the fix did expose a deadlock scenario for Derby caused by the default isolation level being TRANSACTION_REPEATABLE_READ. This resulted in shared locks lingering on rows which were selected when reading the current version of the resource earlier in the transaction. In Derby, these locks are not compatible with the exclusive lock requested by the UPDATE xx_logical_resources statement. The resulting deadlock graph looks something like this:

A lock could not be obtained due to a deadlock, cycle of locks and waiters is:
Lock : ROW, LOGICAL_RESOURCES, (115,11)
  Waiting XID : {74421, U} , APP, SELECT logical_resource_id FROM logical_resources WHERE resource_type_id = ? AND logical_id = ? FOR UPDATE WITH RS
  Granted XID : {74409, U}
Lock : ROW, PATIENT_LOGICAL_RESOURCES, (8,29)
  Waiting XID : {74409, X} , APP, UPDATE Patient_logical_resources SET current_resource_id = ?, is_deleted = ?, last_updated = ?, version_id = ? WHERE logical_resource_id = ?
  Granted XID : {74401, S} , {74416, S} , {74409, S} , {74421, S}
. The selected victim is XID : 74421.

The solution is to set the isolation level for Derby datasources in openliberty:

     <dataSource id="fhirDatasourceBootstrapDefaultDefault" jndiName="jdbc/fhir_default_default" type="javax.sql.XADataSource" statementCacheSize="200" syncQueryTimeoutWithTransactionTimeout="true" isolationLevel="TRANSACTION_READ_COMMITTED">
        <jdbcDriver javax.sql.XADataSource="org.apache.derby.jdbc.EmbeddedXADataSource" libraryRef="fhirSharedLib"/>
            <properties.derby.embedded createDatabase="create" databaseName="derby/fhirDB"/>
        <connectionManager maxPoolSize="50" minPoolSize="10"/>
    </dataSource>

As part of the effort to uncover the deadlock, the stored procedure implementations were simplified by removing an update on xx_logical_resources when the resource was new. The new solution allocates the resource_id early and uses it to set the xx_logical_resources.current_resource_id column before the actual resource row is created. This should be particularly helpful for PostgreSQL, because it avoids creating a tombstone which then needs to be vacuumed.

Originally posted by @punktilious in #2405 (comment)

The text was updated successfully, but these errors were encountered:

Previously this was flagged as a db2-only test, which helped contribute to us missing this issue. Additionally, this test was written to highlight a potential issue but didn't yet take into account our solution which is to return a 409 for one of the updates (in the case of a conflict). The updated test will retry on a tight loop, ensuring that we can continue to make progress in the face of contentious PUT requests to the same resource. Signed-off-by: Lee Surprenant <lmsurpre@us.ibm.com>

Signed-off-by: Lee Surprenant <lmsurpre@us.ibm.com>

lmsurpre · 2021-06-02T21:46:13Z

I updated our existing ConcurrentUpdateTest to retry in the case of a 409 Conflict response.
Additionally I removed the flags that caused us to skip this test in our CI and so now we should be running this against all db types on each PR.

Specifically, the test forks 12 threads (easy to change) and asks each one to update the same resource in parallel. Because each thread will retry (in a tight loop) upon failure, this shows that one thread should always win (i.e. it continues to make progress despite the contention), that we avoid a deadlock in such cases, and that no updates are lost.

lmsurpre assigned punktilious May 24, 2021

lmsurpre mentioned this issue May 24, 2021

Concurrent updates to the same resource can result in lost updates #2405

Closed

prb112 added persistence bug Something isn't working labels May 24, 2021

lmsurpre added a commit that referenced this issue May 24, 2021

issue #2420 - add TRANSACTION_READ_COMMITTED isolationLevel for Db2

7f77aef

Signed-off-by: Lee Surprenant <lmsurpre@us.ibm.com>

lmsurpre added a commit that referenced this issue May 24, 2021

issue #2420 - add TRANSACTION_READ_COMMITTED isolationLevel for Db2

21c7f6e

Signed-off-by: Lee Surprenant <lmsurpre@us.ibm.com>

lmsurpre closed this as completed Jun 2, 2021

lmsurpre added the configuration label Jun 4, 2021

lmsurpre mentioned this issue Jun 6, 2021

issue 2405 fixed concurrent updates to correctly return 409 on version conflict #2408

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Deadlock during concurrent updates on Derby #2420

Deadlock during concurrent updates on Derby #2420

lmsurpre commented May 24, 2021

lmsurpre commented Jun 2, 2021

Deadlock during concurrent updates on Derby #2420

Deadlock during concurrent updates on Derby #2420

Comments

lmsurpre commented May 24, 2021

lmsurpre commented Jun 2, 2021