You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
The system tests occasionally fail due to locking in Derby:
[10/5/21, 22:15:54:977 UTC] 0000002e com.ibm.fhir.persistence.jdbc.dao.impl.ResourceReferenceDAO E SELECT code_system_name, code_system_id FROM code_systems WHERE code_system_name IN (?,?,?,?,?)^M
java.sql.SQLTransactionRollbackException: A lock could not be obtained due to a deadlock, cycle of locks and waiters is:
Lock : ROW, CODE_SYSTEMS, (1,30)
Waiting XID : {28669, S} , APP, SELECT code_system_name, code_system_id FROM code_systems WHERE code_system_name IN (?,?,?,?,?)
Granted XID : {28664, X}
Lock : ROW, CODE_SYSTEMS, (1,25)
Waiting XID : {28664, S} , APP, SELECT code_system_name, code_system_id FROM code_systems WHERE code_system_name IN (?,?,?,?,?,?,?,?)
Granted XID : {28669, X}
. The selected victim is XID : 28669.^M
Environment
4.10.0
To Reproduce
Run the server integration test suite using Derby multiple times until the failure occurs.
Expected behavior
No deadlock.
Additional context
The deadlock happens when two threads are attempting to populate the common values table - in this case CODE_SYSTEMS. This typically happens early on when the table is empty, or a batch of data being ingested in parallel contains new values which are shared among resources.
The conflict occurs when the transaction data is being flushed to the database:
at com.ibm.fhir.persistence.jdbc.dao.impl.ResourceReferenceDAO.upsertCodeSystems(ResourceReferenceDAO.java:388)^M
at com.ibm.fhir.persistence.jdbc.dao.impl.ResourceReferenceDAO.persist(ResourceReferenceDAO.java:825)^M
at com.ibm.fhir.persistence.jdbc.impl.FHIRPersistenceJDBCImpl.persistResourceTokenValueRecords(FHIRPersistenceJDBCImpl.java:2801)^M
at com.ibm.fhir.persistence.jdbc.impl.ParameterTransactionDataImpl.persist(ParameterTransactionDataImpl.java:66)^M
at com.ibm.fhir.persistence.jdbc.dao.impl.TransactionDataImpl.lambda$persist$0(TransactionDataImpl.java:46)^M
at java.base/java.util.HashMap$Values.forEach(HashMap.java:976)^M
at com.ibm.fhir.persistence.jdbc.dao.impl.TransactionDataImpl.persist(TransactionDataImpl.java:46)^M
at com.ibm.fhir.persistence.jdbc.impl.CacheTransactionSync.beforeCompletion(CacheTransactionSync.java:57)^
To avoid a deadlock, we should endeavour to always request/obtain locks in the same order. This means:
Sort the list of values before we insert.
Sort the list of values in the above select statement so that the IN list is collated in the same way as the values sorted in item 1.
This will ensure that inserts don't conflict with other inserts, and the read (S) locks will be requested in the same order as the write (X) locks are requested and granted.
In ResourceReferenceDAO#persist, we need to sort systemMisses before the upsert:
// Grab the ids for all the code-systems, and upsert any misses
List<ResourceTokenValueRec> systemMisses = new ArrayList<>();
cache.resolveCodeSystems(records, systemMisses);
cache.resolveCodeSystems(tagRecs, systemMisses);
cache.resolveCodeSystems(securityRecs, systemMisses);
upsertCodeSystems(systemMisses);
Do the same for the other collections before their upserts.
The sort needs to happen lower in the code, within:
public void upsertCodeSystems(List<ResourceTokenValueRec> systems) {
This is because the list of system names is converted to a Set for uniqueness. The number of objects in this set will generally be relatively small (<<100). Either we insert the values into a SortedSet, or convert the Set to a List and sort that separately. There's probably not a lot of difference in performance when dealing with small object counts.
The text was updated successfully, but these errors were encountered:
Due to enhancements in the FHIR search implementation, new parameter names are created at runtime which can trigger more deadlocks with Derby. This has been addressed in #2857 by preloading the parameter_names table with all the parameter names discovered after a successful run-through of the system tests.
The deadlock issue seems to be addressed (per recent CI results).
Unfortunately, there is still an intermittent failure in the CI from actions running with derby...specifically contention on the sequence allocation during ingestion of many resources.
Describe the bug
The system tests occasionally fail due to locking in Derby:
[10/5/21, 22:15:54:977 UTC] 0000002e com.ibm.fhir.persistence.jdbc.dao.impl.ResourceReferenceDAO E SELECT code_system_name, code_system_id FROM code_systems WHERE code_system_name IN (?,?,?,?,?)^M
java.sql.SQLTransactionRollbackException: A lock could not be obtained due to a deadlock, cycle of locks and waiters is:
Lock : ROW, CODE_SYSTEMS, (1,30)
Waiting XID : {28669, S} , APP, SELECT code_system_name, code_system_id FROM code_systems WHERE code_system_name IN (?,?,?,?,?)
Granted XID : {28664, X}
Lock : ROW, CODE_SYSTEMS, (1,25)
Waiting XID : {28664, S} , APP, SELECT code_system_name, code_system_id FROM code_systems WHERE code_system_name IN (?,?,?,?,?,?,?,?)
Granted XID : {28669, X}
. The selected victim is XID : 28669.^M
Environment
4.10.0
To Reproduce
Run the server integration test suite using Derby multiple times until the failure occurs.
Expected behavior
No deadlock.
Additional context
The deadlock happens when two threads are attempting to populate the common values table - in this case CODE_SYSTEMS. This typically happens early on when the table is empty, or a batch of data being ingested in parallel contains new values which are shared among resources.
The conflict occurs when the transaction data is being flushed to the database:
To avoid a deadlock, we should endeavour to always request/obtain locks in the same order. This means:
This will ensure that inserts don't conflict with other inserts, and the read (S) locks will be requested in the same order as the write (X) locks are requested and granted.
In ResourceReferenceDAO#persist, we need to sort systemMisses before the upsert:
Do the same for the other collections before their upserts.
The sort needs to happen lower in the code, within:
This is because the list of system names is converted to a Set for uniqueness. The number of objects in this set will generally be relatively small (<<100). Either we insert the values into a SortedSet, or convert the Set to a List and sort that separately. There's probably not a lot of difference in performance when dealing with small object counts.
The text was updated successfully, but these errors were encountered: