MODE-1279 Corrected deadlock between save() and query execution. #207
Closed
rhauch wants to merge 1 commit into ModeShape:master
I've replicated the issue with a new unit test and discovered that the bug was caused by a deadlock between the indexing triggered by a save() and a user-invoked query. I found a fix that is minimally intrusive and low-risk. It involves three things:

1) A few changes to the federation connector so that it does a better job of knowing whether the incoming requests are all read-only and, if so, submits this information with the corresponding requests to the source connector(s). In other words, it passes down a bit more state. This change is very low-risk.
2) Eliminating a read-write lock within the search engine component that wraps Lucene. Lucene 2.x used a pretty poor locking mechanism for the indexes, so we had to put a read-write lock in our layer to prevent concurrent writes. However, Lucene 3.x (which we've been using for a while) has a really good native file system locking mechanism, so our lock was actually superfluous. Yet that lock was the thing causing our contention, and eliminating it results in my unit test passing (even when repeating it over and over).
3) Changes to some test cases to accommodate the changes in several constructors.

With these changes, all unit and integration tests pass.
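For readers unfamiliar with this kind of contention, here is a minimal Java sketch of how a wrapper-level read-write lock can stall an indexing writer while a query is in flight. It is not ModeShape code: the class name, method name, and latch-based "query" are hypothetical, and the exact lock cycle behind MODE-1279 is not spelled out in this description. A timed tryLock() is used so the sketch terminates instead of hanging.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical illustration only; none of these names come from ModeShape.
final class OuterLockContentionSketch {

    /**
     * Simulates an "indexing" caller that needs the outer write lock while a
     * "query" thread is still running. Returns true if the write lock was
     * obtained within the timeout.
     */
    static boolean indexerGetsWriteLock(boolean queryHoldsReadLock, long timeoutMillis) {
        ReentrantReadWriteLock outerLock = new ReentrantReadWriteLock();
        CountDownLatch queryStarted = new CountDownLatch(1);
        CountDownLatch queryMayFinish = new CountDownLatch(1);

        Thread query = new Thread(() -> {
            if (queryHoldsReadLock) {
                outerLock.readLock().lock();
            }
            queryStarted.countDown();
            try {
                // Stands in for a query that cannot complete yet. If it were
                // waiting on the indexer itself, this would be a true deadlock.
                queryMayFinish.await();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            } finally {
                if (queryHoldsReadLock) {
                    outerLock.readLock().unlock();
                }
            }
        });
        query.start();

        try {
            queryStarted.await();
            // The indexing side: blocked for as long as any reader holds the lock.
            boolean acquired = outerLock.writeLock().tryLock(timeoutMillis, TimeUnit.MILLISECONDS);
            if (acquired) {
                outerLock.writeLock().unlock();
            }
            return acquired;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } finally {
            // Let the simulated query finish so the sketch always terminates.
            queryMayFinish.countDown();
            try {
                query.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }
    }

    public static void main(String[] args) {
        System.out.println("with outer read lock held: " + indexerGetsWriteLock(true, 200));
        System.out.println("without outer lock:        " + indexerGetsWriteLock(false, 200));
    }
}
```

When the simulated query holds the outer read lock, the indexing side times out; when the outer lock is not involved, it proceeds immediately. That is the general shape of the contention that removing the superfluous lock is meant to avoid, under the assumptions stated above.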
Comment (Contributor, Author):
Holding off on merging, since I periodically get a deadlock in one of the integration tests now.
Comment (Contributor, Author):
Closing WITHOUT merging because of the aforementioned difficulties.