Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

SearchContext is occasionally closed prematurely #5165

Closed
jaymode opened this issue Feb 18, 2014 · 3 comments
Closed

SearchContext is occasionally closed prematurely #5165

jaymode opened this issue Feb 18, 2014 · 3 comments

Comments

@jaymode
Copy link
Member

jaymode commented Feb 18, 2014

We have noticed that sometime all shards do not respond to a DFS query then fetch and we get only 4/5 shards responding (and in debugging noticed the same for query then fetch). Turning the logs to debug we see the following message when only 4/5 shards respond.

2014-02-18 00:01:19,574 DEBUG (elasticsearch[dev][search][T#2]) log4j.Log4jESLogger<109>: [dev] [17] Failed to execute query phase
org.elasticsearch.search.SearchContextMissingException: No search context found for id [17] 
        at org.elasticsearch.search.SearchService.findContext(SearchService.java:455) 
        at org.elasticsearch.search.SearchService.executeQueryPhase(SearchService.java:279) 
        at org.elasticsearch.search.action.SearchServiceTransportAction.sendExecuteQuery(SearchServiceTransportAction.java:236) 
        at org.elasticsearch.action.search.type.TransportSearchDfsQueryThenFetchAction$AsyncAction.executeQuery(TransportSearchDfsQueryThenFetchAction.java:148)
        at org.elasticsearch.action.search.type.TransportSearchDfsQueryThenFetchAction$AsyncAction$2.run(TransportSearchDfsQueryThenFetchAction.java:132) 
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) 
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
        at java.lang.Thread.run(Thread.java:744) 

Looking into what causes this, I was able to reproduce the issue more quickly by setting the SearchService reaper thread to run almost continuously by explicitly setting "search.keep_alive_interval" to a low value in the milliseconds range vs every minute. (we do see the same behavior without modifying this value, but

I saw two issues occur with some extra debugging. The first is that when SearchContext is created, the default value of lastAccessedTime is 0 and if the reaper runs against that context quickly enough, the context will be freed before it is used.

2014-02-18 15:32:53,394 DEBUG (elasticsearch[dev][scheduler][T#1]) log4j.Log4jESLogger<104>: [dev] freeing search context 1390 time: 1392737573376 lastAccessTime: 0 keepAlive: 300000
2014-02-18 15:32:53,399 DEBUG (elasticsearch[dev][search][T#3]) log4j.Log4jESLogger<109>: [dev] [1390] Failed to execute query phase
org.elasticsearch.search.SearchContextMissingException: No search context found for id [1390]
        at org.elasticsearch.search.SearchService.findContext(SearchService.java:455)
        at org.elasticsearch.search.SearchService.executeQueryPhase(SearchService.java:279)
        at org.elasticsearch.search.action.SearchServiceTransportAction.sendExecuteQuery(SearchServiceTransportAction.java:236)
        at org.elasticsearch.action.search.type.TransportSearchDfsQueryThenFetchAction$AsyncAction.executeQuery(TransportSearchDfsQueryThenFetchAction.java:148)
        at org.elasticsearch.action.search.type.TransportSearchDfsQueryThenFetchAction$AsyncAction$2.run(TransportSearchDfsQueryThenFetchAction.java:132)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
        at java.lang.Thread.run(Thread.java:744)

The second the reaper calls context.lastAccessTime() multiple times, but the value can change after the first if statement and an incorrect value will be used in the next statement (such as -1 when the context is being used).

2014-02-18 15:24:38,721 DEBUG (elasticsearch[dev][scheduler][T#1]) log4j.Log4jESLogger<104>: [dev] freeing search context 1691 time: 1392737078619 lastAccessTime: -1 keepAlive: 300000
2014-02-18 15:24:38,725 DEBUG (elasticsearch[dev][search][T#4]) log4j.Log4jESLogger<109>: [dev] [1691] Failed to execute query phase
org.elasticsearch.search.SearchContextMissingException: No search context found for id [1691]
        at org.elasticsearch.search.SearchService.findContext(SearchService.java:455)
        at org.elasticsearch.search.SearchService.executeQueryPhase(SearchService.java:279)
        at org.elasticsearch.search.action.SearchServiceTransportAction.sendExecuteQuery(SearchServiceTransportAction.java:236)
        at org.elasticsearch.action.search.type.TransportSearchDfsQueryThenFetchAction$AsyncAction.executeQuery(TransportSearchDfsQueryThenFetchAction.java:148)
        at org.elasticsearch.action.search.type.TransportSearchDfsQueryThenFetchAction$AsyncAction$2.run(TransportSearchDfsQueryThenFetchAction.java:132)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
        at java.lang.Thread.run(Thread.java:744)

This gist contains code that I have used to resolve the issues. If this needs to be submitted as a pull request, I can do that as well.

@s1monw
Copy link
Contributor

s1monw commented Feb 18, 2014

I looked at the gist and I think you should submit this as a PR! Good catch! Can you make sure you sign the CLA as well so we can pull this in quickly.

One comment about the initialization I think we should initialize the new context with -1 instead so we just skip it on the reaper?

@jaymode
Copy link
Member Author

jaymode commented Feb 18, 2014

Will create a PR and sign the CLA shortly.

Regarding initialization, I thought about using -1 but decided against it as there could be some unknown case, where lastAccessTime is always -1, so the search context will never be cleaned up by the reaper. Initializing to the estimated time allows it to be cleaned up by the reaper in that case. Let me know what you think.

@s1monw
Copy link
Contributor

s1monw commented Feb 18, 2014

@jaymode I agree on the -1 and the special casing. We actually set it to -1 each time we access the context. to prevent the reaper from closing the context while we are using it. I think we should do the same when initialising. There should be some logic that frees the context if there is an exception thrown too.

jaymode pushed a commit to jaymode/elasticsearch that referenced this issue Feb 18, 2014
Fixes SearchContext from being closed during initialization or immediately
after processing is started

Closes elastic#5165
s1monw pushed a commit that referenced this issue Feb 19, 2014
Fixes SearchContext from being closed during initialization or immediately
after processing is started

Closes #5165
s1monw added a commit that referenced this issue Feb 19, 2014
s1monw pushed a commit that referenced this issue Feb 19, 2014
Fixes SearchContext from being closed during initialization or immediately
after processing is started

Closes #5165
s1monw added a commit that referenced this issue Feb 19, 2014
s1monw pushed a commit that referenced this issue Feb 19, 2014
Fixes SearchContext from being closed during initialization or immediately
after processing is started

Closes #5165
s1monw added a commit that referenced this issue Feb 19, 2014
@s1monw s1monw closed this as completed in 3dec916 Feb 19, 2014
s1monw added a commit that referenced this issue Feb 19, 2014
mute pushed a commit to mute/elasticsearch that referenced this issue Jul 29, 2015
Fixes SearchContext from being closed during initialization or immediately
after processing is started

Closes elastic#5165
mute pushed a commit to mute/elasticsearch that referenced this issue Jul 29, 2015
mute pushed a commit to mute/elasticsearch that referenced this issue Jul 29, 2015
Fixes SearchContext from being closed during initialization or immediately
after processing is started

Closes elastic#5165
mute pushed a commit to mute/elasticsearch that referenced this issue Jul 29, 2015
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

2 participants