Skip to content

Comments

MODE-1279 Corrected deadlock between save() and query execution.#207

Closed
rhauch wants to merge 1 commit intoModeShape:masterfrom
rhauch:mode-1279
Closed

MODE-1279 Corrected deadlock between save() and query execution.#207
rhauch wants to merge 1 commit intoModeShape:masterfrom
rhauch:mode-1279

Conversation

@rhauch
Copy link
Contributor

@rhauch rhauch commented Oct 17, 2011

I've replicated the issue with a new unit test and discovered the bug was caused by a deadlock between the indexing that resulted from a save() and the user-invoked query.

I found a fix for this that is minimally intrusive and low-risk. It basically involves a few things:

  1. A few changes to the federation connector to do a better job about knowing whether the incoming requests are all read-only, and if so to submit this info with the corresponding requests to the source connector(s). IOW, passing down a bit more state. This change is very low-risk.

  2. Eliminating a read-write lock within the search engine component that wraps Lucene. Lucene 2.x used a pretty poor locking mechanism for the indexes, so we had to put a read-write lock in our layer to prevent concurrent writes. However, Lucene 3.x (which we've been using for a while), Lucene has had a really good native file system locking mechanism, so our lock was actually superfluous. Yet that lock was the thing that was causing our contention, and eliminating it results in my unit test passing (even when repeating it over and over).

  3. Changes to some test cases to accommodate the changes in several constructors.

With these changes, all unit and integration tests pass.

I've replicated the issue with a new unit test and discovered the bug was caused by a deadlock between the indexing that resulted from a save() and the user-invoked query.

I found a fix for this that is minimally intrusive and low-risk. It basically involves a few things:

1) A few changes to the federation connector to do a better job about knowing whether the incoming requests are all read-only, and if so to submit this info with the corresponding requests to the source connector(s). IOW, passing down a bit more state. This change is very low-risk.

2) Eliminating a read-write lock within the search engine component that wraps Lucene. Lucene 2.x used a pretty poor locking mechanism for the indexes, so we had to put a read-write lock in our layer to prevent concurrent writes. However, Lucene 3.x (which we've been using for a while), Lucene has had a really good native file system locking mechanism, so our lock was actually superfluous. Yet that lock was the thing that was causing our contention, and eliminating it results in my unit test passing (even when repeating it over and over).

3) Changes to some test cases to accommodate the changes in several constructors.

With these changes, all unit and integration tests pass.
@rhauch
Copy link
Contributor Author

rhauch commented Oct 17, 2011

Holding off on merging, since I periodically get a deadlock in one of the integration tests now.

@rhauch
Copy link
Contributor Author

rhauch commented Oct 27, 2011

Closing WITHOUT merging because of the aforementioned difficulties.

@rhauch rhauch closed this Oct 27, 2011
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant