Wait until engine is started up when acquiring searcher #7456

Merged
merged 1 commit into from Aug 26, 2014

s1monw
Contributor

@s1monw s1monw commented Aug 26, 2014

Today there is a small window in which a searcher can be acquired while
the engine is still starting up. This causes an NPE that triggers a
shard failure if the caller is fast enough. This commit handles that
situation gracefully.

Closes #7455
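
To illustrate the race described above, here is a minimal sketch of one way to make searcher acquisition wait for startup. It is not the code from this commit: `SketchEngine`, `start()`, `acquireSearcher(long)` and the `searcherManager` field are hypothetical names, and the use of a `CountDownLatch` is an assumption chosen for brevity.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

/** Hypothetical stand-in for an engine; not Elasticsearch's InternalEngine. */
class SketchEngine {
    // Released exactly once, after all startup state has been assigned.
    private final CountDownLatch started = new CountDownLatch(1);

    // Stays null until start() runs; reading it too early is the NPE window.
    private volatile Object searcherManager;

    void start() {
        searcherManager = new Object(); // assign state first
        started.countDown();            // then signal that the engine is started
    }

    /** Blocks until the engine has started, or fails after the timeout. */
    Object acquireSearcher(long timeoutMillis) {
        try {
            if (!started.await(timeoutMillis, TimeUnit.MILLISECONDS)) {
                throw new IllegalStateException(
                        "engine did not start within " + timeoutMillis + "ms");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt flag
            throw new IllegalStateException("interrupted while waiting for engine start", e);
        }
        return searcherManager; // guaranteed non-null once the latch is released
    }
}
```

A caller that arrives during startup blocks briefly instead of dereferencing a field that has not been set, and a caller that waits too long gets an explicit exception rather than an NPE.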

@s1monw s1monw added the review label Aug 26, 2014
@bleskes
Contributor

bleskes commented Aug 26, 2014

LGTM

@bleskes bleskes removed the review label Aug 26, 2014
@s1monw s1monw merged commit 0676869 into elastic:master Aug 26, 2014
@clintongormley clintongormley changed the title [ENGINE] Wait until engine is started up when acquireing searcher [ENGINE] Wait until engine is started up when acquiring searcher Aug 26, 2014
@clintongormley clintongormley changed the title [ENGINE] Wait until engine is started up when acquiring searcher Internal: Wait until engine is started up when acquiring searcher Sep 8, 2014
@ajhalani

We are seeing an issue with v1.3.2 that may be related to this. We would really appreciate it if someone could confirm whether it is related.

After a node restart, the recovery of one of the shards sometimes gets stuck indefinitely, with no updates after the trace [index.engine.internal ] [linux01.node] [myindex][0] starting engine. Running hot threads a few times, about a minute apart, gives dumps like the one below:

   102.9% (514.5ms out of 500ms) cpu usage by thread 'elasticsearch[linux01.node][generic][T#5]'
     10/10 snapshots sharing following 15 elements
       java.util.Collections$UnmodifiableCollection$1.hasNext(Collections.java:1066)
       org.elasticsearch.index.engine.internal.InternalEngine$SearchFactory.newSearcher(InternalEngine.java:1561)
       org.apache.lucene.search.SearcherManager.getSearcher(SearcherManager.java:160)
       org.apache.lucene.search.SearcherManager.refreshIfNeeded(SearcherManager.java:122)
       org.apache.lucene.search.SearcherManager.refreshIfNeeded(SearcherManager.java:58)
       org.apache.lucene.search.ReferenceManager.doMaybeRefresh(ReferenceManager.java:176)
       org.apache.lucene.search.ReferenceManager.maybeRefresh(ReferenceManager.java:225)
       org.elasticsearch.index.engine.internal.InternalEngine.refresh(InternalEngine.java:766)
       org.elasticsearch.index.engine.internal.InternalEngine.delete(InternalEngine.java:686)
       org.elasticsearch.index.shard.service.InternalIndexShard.performRecoveryOperation(InternalIndexShard.java:780)
       org.elasticsearch.index.gateway.local.LocalIndexShardGateway.recover(LocalIndexShardGateway.java:250)
       org.elasticsearch.index.gateway.IndexShardGatewayService$1.run(IndexShardGatewayService.java:132)
       java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
       java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
       java.lang.Thread.run(Thread.java:722)

@clintongormley

@bleskes can you confirm that #7456 (comment) is a symptom of this (fixed) issue?

@bleskes
Contributor

bleskes commented Oct 14, 2014

@clintongormley this bug caused an NPE (because a variable was not set yet). The stack trace indicates that the recovery is in its translog phase, meaning the engine has already started.

@ajhalani it looks like you use a lot of delete-by-query operations. Is that true? Those are replayed when you recover the shard, which takes longer than simple indexing (and we refresh after them by default, which is not needed during recovery; that is a potential optimization we can make here).
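
To make the cost concrete, here is a rough sketch that counts refreshes during a replay. It only models the idea; `TranslogReplaySketch`, `OpType` and `replay` are hypothetical names, not Elasticsearch APIs, and deferring the refresh to the end of the replay is an assumption about what the optimization could look like.

```java
import java.util.Arrays;
import java.util.List;

/** Hypothetical translog replay; names are illustrative, not Elasticsearch APIs. */
class TranslogReplaySketch {
    enum OpType { INDEX, DELETE_BY_QUERY }

    int refreshes = 0;

    void refresh() {
        refreshes++; // stands in for an expensive searcher refresh
    }

    /**
     * Replays the operations. When refreshAfterEachDeleteByQuery is true, every
     * delete-by-query is followed by a refresh (the default behaviour described
     * above). When false, one refresh is issued after the whole replay.
     */
    void replay(List<OpType> ops, boolean refreshAfterEachDeleteByQuery) {
        for (OpType op : ops) {
            // Applying the operation itself is omitted in this sketch.
            if (op == OpType.DELETE_BY_QUERY && refreshAfterEachDeleteByQuery) {
                refresh();
            }
        }
        if (!refreshAfterEachDeleteByQuery) {
            refresh(); // single refresh once recovery has replayed everything
        }
    }

    static List<OpType> sampleOps() {
        return Arrays.asList(
                OpType.INDEX, OpType.DELETE_BY_QUERY, OpType.DELETE_BY_QUERY,
                OpType.INDEX, OpType.DELETE_BY_QUERY);
    }
}
```

With the sample operations, the eager mode refreshes once per delete-by-query, while the deferred mode refreshes once in total, so the gap grows with the number of delete-by-query operations in the translog.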

@ajhalani
Copy link

Thanks for the response.

@bleskes - you're right. My issue doesn't seem to be related to this. It was indeed due to slow recovery from the translog. In the hot threads dumps I see frequent refreshes and flushes, and it was much slower than regular indexing.

@clintongormley clintongormley changed the title Internal: Wait until engine is started up when acquiring searcher Wait until engine is started up when acquiring searcher Jun 7, 2015