Because the locking methods call blocking methods that throw InterruptedException, the handling of these needs more care, so as not to leave the lock in a deadlocked or bugged state. Logically, this can happen during shutdown. Current symptom is that the DistributedReadWriteLockIT integration test fails if the tests of lockInterruptibly are run. The solution is first to define cancellation policies for each lock method (e.g. will it complete or try to return early? Is it interruptible only during method entry? What guarantees are we making about the state?) and then implement and document them. Cancellation anywhere other than on method entry can be difficult and expensive because it requires cleaning up after yourself. All lock methods have to at least leave the lock in a consistent state, otherwise thread interruption (e.g. Shutting down one server in a cluster) can cause deadlock across the whole cluster. Oops! That would be bad. Already working on a fix.
The text was updated successfully, but these errors were encountered:
fridgebuzz
changed the title
Interruptible tests in DistributedReadWriteLockIT causing other tests to fail
Interrupting a thread during lock acquisition can cause deadlock
Jun 2, 2014
Because the locking methods call blocking methods that throw InterruptedException, the handling of these needs more care, so as not to leave the lock in a deadlocked or bugged state. Logically, this can happen during shutdown. Current symptom is that the DistributedReadWriteLockIT integration test fails if the tests of lockInterruptibly are run. The solution is first to define cancellation policies for each lock method (e.g. will it complete or try to return early? Is it interruptible only during method entry? What guarantees are we making about the state?) and then implement and document them. Cancellation anywhere other than on method entry can be difficult and expensive because it requires cleaning up after yourself. All lock methods have to at least leave the lock in a consistent state, otherwise thread interruption (e.g. Shutting down one server in a cluster) can cause deadlock across the whole cluster. Oops! That would be bad. Already working on a fix.
The text was updated successfully, but these errors were encountered: