Fetch / Count might fail if executed on a relocated shard. #4273

s1monw · 2013-11-27T14:59:33Z

When we relocate a shard, we might still have pending SearchContext
instances hanging around that will be used by "in-flight" searches
on the already relocated shard. This is a valid operation, but if
the underlying directory has already been closed (which happens
concurrently during cleanup), the close call on the IndexReader can
trigger an AlreadyClosedException when the NRT reader tries to clean
up files via the IndexWriter. This smells like a bug in Lucene; a
close should never throw that exception, IMO.
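
To make the race concrete, here is a minimal, self-contained sketch of the failure mode and of one possible way to tolerate it. It does not use Lucene or Elasticsearch classes: `FakeDirectory`, `FakeReader`, `DirectoryClosedException`, and `releaseQuietly` are hypothetical stand-ins invented for illustration, and swallowing the exception on release is only an assumed mitigation, not necessarily the fix that was applied.

```java
import java.util.ArrayDeque;
import java.util.Deque;

public class RelocatedShardRelease {

    /** Hypothetical stand-in for Lucene's unchecked AlreadyClosedException. */
    static class DirectoryClosedException extends IllegalStateException {
        DirectoryClosedException(String message) {
            super(message);
        }
    }

    /** Hypothetical directory that rejects file operations once it is closed. */
    static class FakeDirectory {
        private volatile boolean open = true;

        void close() {
            open = false;
        }

        void deleteFile(String name) {
            if (!open) {
                throw new DirectoryClosedException("this Directory is closed");
            }
            // A real directory would remove the file here.
        }
    }

    /** Hypothetical reader whose close() deletes pending files, as in the trace. */
    static class FakeReader {
        private final FakeDirectory directory;
        private final Deque<String> pendingFiles = new ArrayDeque<>();

        FakeReader(FakeDirectory directory, String... pending) {
            this.directory = directory;
            for (String file : pending) {
                pendingFiles.add(file);
            }
        }

        void close() {
            while (!pendingFiles.isEmpty()) {
                directory.deleteFile(pendingFiles.peek());
                pendingFiles.poll();
            }
        }
    }

    /**
     * Releases the reader. Returns false instead of propagating the exception
     * when the directory was already closed by a concurrent shard cleanup.
     */
    static boolean releaseQuietly(FakeReader reader) {
        try {
            reader.close();
            return true;
        } catch (DirectoryClosedException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        FakeDirectory directory = new FakeDirectory();
        FakeReader reader = new FakeReader(directory, "_0.cfs");
        directory.close(); // shard cleanup wins the race
        System.out.println(releaseQuietly(reader)); // prints "false"
    }
}
```

The point of the sketch is that the in-flight search has nothing left to clean up once the shard is gone, so a failed release can be reported rather than failing the whole fetch or count request.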

s1monw · 2013-11-27T15:03:49Z

Here is an example exception showing the issue:

[2013-11-27 13:58:18,425][DEBUG][action.search.type       ] [node_1] [21276] Failed to execute fetch phase
org.apache.lucene.store.AlreadyClosedException: this Directory is closed
    at org.apache.lucene.store.BaseDirectory.ensureOpen(BaseDirectory.java:66)
    at org.elasticsearch.index.store.Store$StoreDirectory.deleteFile(Store.java:370)
    at org.apache.lucene.index.IndexFileDeleter.deleteFile(IndexFileDeleter.java:584)
    at org.apache.lucene.index.IndexFileDeleter.deletePendingFiles(IndexFileDeleter.java:407)
    at org.apache.lucene.index.IndexWriter.deletePendingFiles(IndexWriter.java:4559)
    at org.apache.lucene.index.StandardDirectoryReader.doClose(StandardDirectoryReader.java:371)
    at org.apache.lucene.index.IndexReader.decRef(IndexReader.java:231)
    at org.apache.lucene.search.SearcherManager.decRef(SearcherManager.java:111)
    at org.apache.lucene.search.SearcherManager.decRef(SearcherManager.java:58)
    at org.apache.lucene.search.ReferenceManager.release(ReferenceManager.java:253)
    at org.elasticsearch.index.engine.robin.RobinEngine$RobinSearcher.release(RobinEngine.java:1559)
    at org.elasticsearch.test.engine.MockRobinEngine$AssertingSearcher.release(MockRobinEngine.java:128)
    at org.elasticsearch.search.internal.SearchContext.release(SearchContext.java:210)
    at org.elasticsearch.search.SearchService.freeContext(SearchService.java:514)
    at org.elasticsearch.search.SearchService.executeFetchPhase(SearchService.java:426)
    at org.elasticsearch.search.action.SearchServiceTransportAction.sendExecuteFetch(SearchServiceTransportAction.java:406)
    at org.elasticsearch.action.search.type.TransportSearchQueryThenFetchAction$AsyncAction.executeFetch(TransportSearchQueryThenFetchAction.java:150)
    at org.elasticsearch.action.search.type.TransportSearchQueryThenFetchAction$AsyncAction$2.run(TransportSearchQueryThenFetchAction.java:134)
    at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:895)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:918)
    at java.lang.Thread.run(Thread.java:662)

When we relocate a shard, we might still have pending SearchContext instances hanging around that will be used by "in-flight" searches on the already relocated shard. This is a valid operation, but if the underlying directory has already been closed (which happens concurrently during cleanup), the close call on the IndexReader can trigger an AlreadyClosedException when the NRT reader tries to clean up files via the IndexWriter. Closes #4273


ghost assigned s1monw Nov 27, 2013

s1monw closed this as completed in 71eb453 Nov 27, 2013
