Wait until engine is started up when acquiring searcher #7456

s1monw · 2014-08-26T11:56:20Z

Today there is a small window in which a searcher can be acquired while
the engine is still starting up. If the caller is fast enough, this
causes an NPE that triggers a shard failure. This commit handles that
situation gracefully.

Closes #7455
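
For readers following along, the race described above can be illustrated with a minimal sketch. This is not the code from the commit: the class name, the field names, and the use of a CountDownLatch are all assumptions made for illustration. It only shows the general idea of making the acquire path wait until startup has finished instead of reading a field that is not set yet.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

// Hypothetical stand-in for an engine; none of these names come from Elasticsearch.
class EngineStartupSketch {
    // Only assigned at the end of start(); reading it earlier would yield null.
    private volatile String searcher;
    // Released exactly once, when startup has completed.
    private final CountDownLatch started = new CountDownLatch(1);

    void start() {
        searcher = "searcher";   // finish initialization first
        started.countDown();     // then let waiting callers through
    }

    // Blocks until the engine has started or the timeout elapses.
    String acquireSearcher(long timeoutMillis) {
        try {
            if (!started.await(timeoutMillis, TimeUnit.MILLISECONDS)) {
                throw new IllegalStateException("engine did not start in time");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt status
            throw new IllegalStateException("interrupted while waiting for engine start", e);
        }
        return searcher;
    }
}
```

A caller that races with start() on another thread now either gets a non-null searcher or a clear exception after the timeout, rather than an NPE.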

bleskes · 2014-08-26T12:00:44Z

LGTM


ajhalani · 2014-09-30T15:28:43Z

We are seeing an issue with v1.3.2 that may be related to this. We would really appreciate it if someone could confirm whether it is indeed related.

After a node restart, one shard's recovery sometimes gets stuck indefinitely, with no updates after the trace [index.engine.internal ] [linux01.node] [myindex][0] starting engine. Running hot threads a few times, about a minute apart, gives dumps like the one below:

   102.9% (514.5ms out of 500ms) cpu usage by thread 'elasticsearch[linux01.node][generic][T#5]'
     10/10 snapshots sharing following 15 elements
       java.util.Collections$UnmodifiableCollection$1.hasNext(Collections.java:1066)
       org.elasticsearch.index.engine.internal.InternalEngine$SearchFactory.newSearcher(InternalEngine.java:1561)
       org.apache.lucene.search.SearcherManager.getSearcher(SearcherManager.java:160)
       org.apache.lucene.search.SearcherManager.refreshIfNeeded(SearcherManager.java:122)
       org.apache.lucene.search.SearcherManager.refreshIfNeeded(SearcherManager.java:58)
       org.apache.lucene.search.ReferenceManager.doMaybeRefresh(ReferenceManager.java:176)
       org.apache.lucene.search.ReferenceManager.maybeRefresh(ReferenceManager.java:225)
       org.elasticsearch.index.engine.internal.InternalEngine.refresh(InternalEngine.java:766)
       org.elasticsearch.index.engine.internal.InternalEngine.delete(InternalEngine.java:686)
       org.elasticsearch.index.shard.service.InternalIndexShard.performRecoveryOperation(InternalIndexShard.java:780)
       org.elasticsearch.index.gateway.local.LocalIndexShardGateway.recover(LocalIndexShardGateway.java:250)
       org.elasticsearch.index.gateway.IndexShardGatewayService$1.run(IndexShardGatewayService.java:132)
       java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
       java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
       java.lang.Thread.run(Thread.java:722)

clintongormley · 2014-10-14T13:22:37Z

@bleskes can you confirm that #7456 (comment) is a symptom of this (fixed) issue?

bleskes · 2014-10-14T13:34:50Z

@clintongormley this bug caused an NPE (because a variable was not set yet). The stack trace indicates that the recovery is in its translog phase, meaning the engine has already started.

@ajhalani it looks like you use a lot of delete-by-query operations. Is that true? Those are replayed when you recover the shard, which takes longer than simple indexing (and we refresh after them by default, which is not needed during recovery; that is a potential optimization we can do here).
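
To make the cost concrete, here is a rough sketch of the difference between refreshing after every replayed delete-by-query and deferring to a single refresh at the end of the replay. It is purely illustrative: the class, the string operation names, and the boolean flag are hypothetical and do not reflect how InternalEngine or the translog recovery is implemented.

```java
import java.util.List;

// Hypothetical model of translog replay; it only counts how often a refresh would run.
class TranslogReplaySketch {
    private int refreshCount = 0;

    private void refresh() {
        refreshCount++; // stands in for an expensive searcher refresh
    }

    int refreshCount() {
        return refreshCount;
    }

    // Replays the operations in order. When refreshPerDeleteByQuery is true, a refresh
    // runs after every delete-by-query; otherwise one refresh runs at the end, and only
    // if at least one delete-by-query was replayed.
    void replay(List<String> operations, boolean refreshPerDeleteByQuery) {
        boolean refreshPending = false;
        for (String op : operations) {
            // Applying the operation itself is omitted in this sketch.
            if ("delete_by_query".equals(op)) {
                if (refreshPerDeleteByQuery) {
                    refresh();
                } else {
                    refreshPending = true;
                }
            }
        }
        if (refreshPending) {
            refresh();
        }
    }
}
```

With a translog containing many delete-by-query entries, the first mode refreshes once per entry while the second refreshes once in total. Whether deferring is safe in a real engine depends on details this sketch does not model.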

ajhalani · 2014-10-14T15:34:32Z

Thanks for the response.

@bleskes - you're right. My issue doesn't seem to be related to this. It was indeed due to slow recovery from the translog. In the hot threads dumps I see frequent refreshes and flushes, and recovery was much slower than regular indexing.

s1monw added the review label Aug 26, 2014

bleskes removed the review label Aug 26, 2014

s1monw force-pushed the issues/7455 branch from 02649c0 to 0676869 Compare August 26, 2014 12:07

s1monw merged commit 0676869 into elastic:master Aug 26, 2014

clintongormley changed the title ~~[ENGINE] Wait until engine is started up when acquireing searcher~~ [ENGINE] Wait until engine is started up when acquiring searcher Aug 26, 2014

clintongormley added >bug v1.3.3 v1.4.0.Beta1 v2.0.0-beta1 labels Sep 8, 2014

clintongormley changed the title ~~[ENGINE] Wait until engine is started up when acquiring searcher~~ Internal: Wait until engine is started up when acquiring searcher Sep 8, 2014

clintongormley changed the title ~~Internal: Wait until engine is started up when acquiring searcher~~ Wait until engine is started up when acquiring searcher Jun 7, 2015

clintongormley added the :Internal label Jun 7, 2015
