IllegalMonitorStateException in Transaction handling over HTTP #630

Closed
abrokenjester opened this issue Oct 26, 2016 · 7 comments

@abrokenjester
Contributor

abrokenjester commented Oct 26, 2016

During transactions executed over HTTP, an exception can occur when commit is invoked.

Stacktrace:

org.eclipse.rdf4j.http.server.ServerHTTPException: Transaction handling error: null
        at org.eclipse.rdf4j.http.server.repository.transaction.TransactionController.processModificationOperation(TransactionController.java:311)
        at org.eclipse.rdf4j.http.server.repository.transaction.TransactionController.handleRequestInternal(TransactionController.java:194)
        at org.springframework.web.servlet.mvc.AbstractController.handleRequest(AbstractController.java:147)
        at org.springframework.web.servlet.mvc.SimpleControllerHandlerAdapter.handle(SimpleControllerHandlerAdapter.java:50)
        at org.springframework.web.servlet.DispatcherServlet.doDispatch(DispatcherServlet.java:959)
        at org.springframework.web.servlet.DispatcherServlet.doService(DispatcherServlet.java:893)
        at org.springframework.web.servlet.FrameworkServlet.processRequest(FrameworkServlet.java:970)
        at org.springframework.web.servlet.FrameworkServlet.doPut(FrameworkServlet.java:883)
        at javax.servlet.http.HttpServlet.service(HttpServlet.java:651)

....

Caused by: java.lang.IllegalMonitorStateException: null
        at java.util.concurrent.locks.ReentrantLock$Sync.tryRelease(ReentrantLock.java:151)
        at java.util.concurrent.locks.AbstractQueuedSynchronizer.release(AbstractQueuedSynchronizer.java:1261)
        at java.util.concurrent.locks.ReentrantLock.unlock(ReentrantLock.java:457)
        at org.eclipse.rdf4j.sail.nativerdf.NativeSailStore$NativeSailSink.close(NativeSailStore.java:341)
        at org.eclipse.rdf4j.sail.nativerdf.SailSourceModel.dataset(SailSourceModel.java:397)
        at org.eclipse.rdf4j.sail.nativerdf.SailSourceModel.iterator(SailSourceModel.java:299)
        at org.eclipse.rdf4j.sail.nativerdf.MemoryOverflowModel.iterator(MemoryOverflowModel.java:128)
        at org.eclipse.rdf4j.sail.base.SailSourceBranch.flush(SailSourceBranch.java:536)
        at org.eclipse.rdf4j.sail.base.SailSourceBranch.flush(SailSourceBranch.java:492)
        at org.eclipse.rdf4j.sail.base.SailSourceBranch.flush(SailSourceBranch.java:275)
        at org.eclipse.rdf4j.sail.base.SailSourceBranch.autoFlush(SailSourceBranch.java:359)
        at org.eclipse.rdf4j.sail.base.SailSourceBranch$2.close(SailSourceBranch.java:221)
        at org.eclipse.rdf4j.sail.base.SailSourceConnection.endUpdateInternal(SailSourceConnection.java:611)
        at org.eclipse.rdf4j.sail.helpers.AbstractSailConnection.endUpdate(AbstractSailConnection.java:573)
        at org.eclipse.rdf4j.sail.helpers.AbstractSailConnection.flush(AbstractSailConnection.java:374)
        at org.eclipse.rdf4j.sail.helpers.SailConnectionWrapper.flush(SailConnectionWrapper.java:247)
        at org.eclipse.rdf4j.repository.sail.SailRepositoryConnection.commit(SailRepositoryConnection.java:167)
        at org.eclipse.rdf4j.http.server.repository.transaction.TransactionController.processModificationOperation(TransactionController.java:293)

The problem is caused by the fact that NativeSailStore keeps an internal ReentrantLock, and that (due to the way the TransactionController is set up) this lock may be released by a different thread than the thread that originally acquired it.

This bug was likely introduced by fixes for issue #293 and associated cleanup.
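
To illustrate the underlying JDK behaviour (a standalone sketch, not rdf4j code): a ReentrantLock that one thread acquires and another thread tries to release throws exactly this exception.

    import java.util.concurrent.locks.ReentrantLock;

    public class CrossThreadUnlock {

        public static void main(String[] args) throws InterruptedException {
            ReentrantLock lock = new ReentrantLock();

            // Thread A acquires the lock (analogous to the request thread
            // that starts the transaction)
            Thread locker = new Thread(lock::lock);
            locker.start();
            locker.join();

            // A different thread (here: main, analogous to the thread
            // handling the commit request) attempts to release it
            lock.unlock(); // throws java.lang.IllegalMonitorStateException
        }
    }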

@abrokenjester abrokenjester added the 🐞 bug issue is a bug label Oct 26, 2016
@abrokenjester
Contributor Author

@ansell in NativeSailStore you introduced, as part of #293, this use of a variable nextTxnLockAcquired as an additional check around the locking and unlocking of the ReentrantLock:

    public synchronized void close() {
        boolean nextTxnLockAcquired = txnLockAcquired;
        txnLockAcquired = false;
        if (nextTxnLockAcquired) {
            txnLockManager.unlock();
        }
    }

and

    private synchronized void acquireExclusiveTransactionLock()
        throws SailException
    {
        boolean nextTxnLockAcquired = txnLockAcquired;
        if (!nextTxnLockAcquired) {
            txnLockManager.lock();
            try {
                if (txnLockManager.getHoldCount() == 1) {
                    // first object
                    tripleStore.startTransaction();
                }
                nextTxnLockAcquired = txnLockAcquired = true;
            }
            catch (IOException e) {
                throw new SailException(e);
            }
            finally {
                if (!nextTxnLockAcquired) {
                    txnLockManager.unlock();
                }
            }
        }
    }

Can you briefly talk me through why this extra boolean is necessary? I get the feeling the bug we encounter is caused by this extra check: some thread doesn't release the lock, which causes another thread later on in the transaction to fail when trying to release it.

@ansell
Contributor

ansell commented Oct 26, 2016

The reasoning behind why I refactored it to match the pattern used in other cases is this: if you are going to guard on a volatile value locally in a threaded situation, you should take a local copy of it, to ensure the guard value matches the value you see inside the guarded block. That doesn't necessarily apply to synchronized methods, but there may still be side effects from non-synchronized methods that become more deterministic with a safe guard pattern.

I may have made a mistake but I can't see it right now.

The key to the pattern here appears to be that the unlock should only successfully complete if either getHoldCount was a value other than 1 (not sure what values those could be), or optionally if both getHoldCount was 1 and the transaction was started successfully.

I don't fully understand the significance of getHoldCount: why it should pass through to get an exclusive lock if getHoldCount was a value other than 1, and perhaps more importantly why it could be a value other than 1 at all, given that it is checked from within a synchronized method.

The fact that it was all occurring within a synchronized method seemed to mean that if there was a bug, it was likely to be outside of that method, and the tests never started to fail over the time period that I was working through it, or in the weeks since. The local variable assignments are no-ops within each synchronized method, but they fit the general pattern after the refactoring, so that their effects are easier to reason about if someone touched the variable outside of a synchronized method. That seems to point to outside interference with that variable, which was previously being allowed for in a part of the pattern that I don't understand.
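
For reference, ReentrantLock.getHoldCount() reports how many times the current thread has acquired the lock without releasing it, so a value other than 1 simply indicates a reentrant (nested) acquisition by the same thread. A standalone illustration:

    import java.util.concurrent.locks.ReentrantLock;

    public class HoldCountDemo {

        public static void main(String[] args) {
            ReentrantLock txnLock = new ReentrantLock();

            txnLock.lock();
            System.out.println(txnLock.getHoldCount()); // 1: outermost acquisition

            txnLock.lock();                             // reentrant acquisition by the same thread
            System.out.println(txnLock.getHoldCount()); // 2: nested acquisition

            txnLock.unlock();
            txnLock.unlock();                           // fully released only once the count drops to 0
        }
    }

In acquireExclusiveTransactionLock() above, the getHoldCount() == 1 check thus ensures startTransaction() runs only on the outermost acquisition.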

@abrokenjester
Contributor Author

I believe there may be quite a bit of legacy here confusing the issue. The variable txnLockAcquired was originally introduced by me in NativeStoreConnection, before the entire native store was refactored for optimistic locking. I get the feeling the mechanism as currently in place is redundant. I'll take a closer look.

@abrokenjester
Contributor Author

I've done some simplifying, essentially just ripping this boolean variable out completely, and instead relying on ReentrantLock.isHeldByCurrentThread() to verify locking state. Will put up a PR soon (currently waiting for the build tests to complete); can you eyeball it and see if it makes sense?
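
A rough sketch of that direction (illustrative only, not the actual patch; field names follow the snippets quoted above): the boolean is dropped and the lock itself is asked whether the current thread holds it.

    // Illustrative sketch, not the actual rdf4j change.
    public synchronized void close() {
        // only release if this thread actually holds the lock
        if (txnLockManager.isHeldByCurrentThread()) {
            txnLockManager.unlock();
        }
    }

    private synchronized void acquireExclusiveTransactionLock()
        throws SailException
    {
        if (!txnLockManager.isHeldByCurrentThread()) {
            txnLockManager.lock();
            try {
                tripleStore.startTransaction();
            }
            catch (IOException e) {
                txnLockManager.unlock();
                throw new SailException(e);
            }
        }
    }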

abrokenjester added a commit that referenced this issue Oct 26, 2016
…tLock object, removing separate boolean

Signed-off-by: Jeen Broekstra <jeen.broekstra@gmail.com>
abrokenjester added a commit that referenced this issue Oct 26, 2016
Signed-off-by: Jeen Broekstra <jeen.broekstra@gmail.com>
abrokenjester added a commit that referenced this issue Oct 26, 2016
Signed-off-by: Jeen Broekstra <jeen.broekstra@gmail.com>
@abrokenjester
Contributor Author

I've decided to start again: instead of treating this as a bug in the native store, I should look at this as a fundamental flaw in the TransactionController. RepositoryConnections are not guaranteed to be thread-safe, yet the TransactionController still lets multiple threads operate on the same connection.

A possible fix is to let TransactionController create dedicated threads, one for each transaction, and ensure that that same thread is used for all incoming operations on that transaction.

@ansell
Contributor

ansell commented Oct 27, 2016

In the distant past I added synchronized statements in a few places where the repository connections were used, such as:

https://github.com/eclipse/rdf4j/blob/master/core/http/server-spring/src/main/java/org/eclipse/rdf4j/http/server/repository/RepositoryController.java#L155

Is it possible there are new locations that have popped up since then that also need synchronized statements around them?
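
The guard pattern referred to is, roughly, serializing access to the shared connection object (a sketch only; the linked RepositoryController line has the real code, and the names here are hypothetical):

    // Illustrative only: "repositoryCon" and "operation" are hypothetical.
    void handleRequest(RepositoryConnection repositoryCon, Runnable operation) {
        synchronized (repositoryCon) {
            // only one request thread at a time may use the
            // non-thread-safe connection
            operation.run();
        }
    }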

@abrokenjester
Contributor Author

Possibly, but the problem is more fundamental than that. The REST API supports transactions by means of individual PUT/POST operations against a single "transaction resource". Since these are multiple HTTP requests that need to be mapped to the same transaction, the TransactionController uses a cache (the ActiveTransactionRegistry) to get a RepositoryConnection object for the identified transaction.

This is all fine and dandy, but the problem is that each new HTTP request runs on a new thread! So we get that Thread A starts the transaction, Thread B then adds some data, and finally Thread C commits. It all happens sequentially (that is why there is so much locking going on in the ActiveTransactionRegistry), but at the end of the day they're different threads, and NativeStoreConnection can't handle that: internally it uses ReentrantLocks for the duration of a transaction, and those can only be unlocked by the same thread that locked them in the first place.

So I think the solution is that the TransactionController spawns a new, dedicated Thread for each new transaction, and keeps that Thread alive until the transaction is complete, making sure that each transaction operation is executed on that thread, instead of on the HTTP request worker thread itself.

I'm currently working on a simple redesign using an Executors.newSingleThreadExecutor() per transaction.
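
A minimal sketch of that design, assuming a hypothetical Transaction wrapper (the names are illustrative, not the eventual rdf4j implementation): every operation on a transaction is funneled through its own single-thread executor, so locks are always acquired and released by one and the same thread.

    import java.util.concurrent.Callable;
    import java.util.concurrent.ExecutorService;
    import java.util.concurrent.Executors;
    import java.util.concurrent.Future;

    // Hypothetical sketch of the proposed design, not the actual rdf4j code.
    class Transaction {

        // One dedicated thread per transaction: all operations run on it, so
        // the native store's ReentrantLocks are locked and unlocked by the
        // same thread.
        private final ExecutorService executor = Executors.newSingleThreadExecutor();

        <T> T execute(Callable<T> operation) throws Exception {
            Future<T> result = executor.submit(operation);
            return result.get(); // HTTP worker thread blocks until the operation completes
        }

        void close() {
            executor.shutdown();
        }
    }

An HTTP worker handling a commit would then call something like txn.execute(() -> { connection.commit(); return null; }) rather than invoking the connection directly.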

@abrokenjester abrokenjester self-assigned this Oct 27, 2016
abrokenjester added a commit that referenced this issue Oct 28, 2016
…d txn threads

Signed-off-by: Jeen Broekstra <jeen.broekstra@gmail.com>
abrokenjester added a commit that referenced this issue Oct 28, 2016
Transaction.close now ensures a second call is silently ignored

Signed-off-by: Jeen Broekstra <jeen.broekstra@gmail.com>
abrokenjester added a commit that referenced this issue Oct 28, 2016
…en checking for active operations

Signed-off-by: Jeen Broekstra <jeen.broekstra@gmail.com>
abrokenjester added a commit that referenced this issue Oct 28, 2016
Signed-off-by: Jeen Broekstra <jeen.broekstra@gmail.com>
@abrokenjester abrokenjester added this to the 2.1.1 milestone Oct 28, 2016
abrokenjester added a commit that referenced this issue Oct 31, 2016
abrokenjester added a commit that referenced this issue Oct 31, 2016