[CI] AssertionError in ShardSearchStats #37185

Closed
droberts195 opened this issue Jan 7, 2019 · 3 comments
Labels
:Search/Search Search-related issues that do not fall into other categories >test-failure Triaged test failures from CI v7.2.0 v8.0.0-alpha1

Comments

@droberts195
Contributor

A 6.6 ML test timed out in https://elasticsearch-ci.elastic.co/job/elastic+elasticsearch+6.6+matrix-java-periodic/ES_BUILD_JAVA=java11,ES_RUNTIME_JAVA=zulu11,nodes=virtual&&linux/38/consoleText because the thread that was initializing a scroll threw an assertion error:

ERROR   0.00s J3 | TooManyJobsIT (suite) <<< FAILURES!
   > Throwable #1: java.lang.Exception: Suite timeout exceeded (>= 1200000 msec).
   >    at __randomizedtesting.SeedInfo.seed([ACEE7CCD596311C5]:0)
   > Throwable #2: com.carrotsearch.randomizedtesting.UncaughtExceptionError: Captured an uncaught exception in thread: Thread[id=1882, name=elasticsearch[node_t1][search][T#3], state=RUNNABLE, group=TGRP-TooManyJobsIT]
   > Caused by: java.lang.AssertionError
   >    at __randomizedtesting.SeedInfo.seed([ACEE7CCD596311C5]:0)
   >    at org.elasticsearch.index.search.stats.ShardSearchStats.lambda$onQueryPhase$2(ShardSearchStats.java:101)
   >    at org.elasticsearch.index.search.stats.ShardSearchStats.computeStats(ShardSearchStats.java:142)
   >    at org.elasticsearch.index.search.stats.ShardSearchStats.onQueryPhase(ShardSearchStats.java:93)
   >    at org.elasticsearch.index.shard.SearchOperationListener$CompositeListener.onQueryPhase(SearchOperationListener.java:155)
   >    at org.elasticsearch.search.SearchService.executeQueryPhase(SearchService.java:407)
   >    at org.elasticsearch.search.SearchService.access$100(SearchService.java:126)
   >    at org.elasticsearch.search.SearchService$2.onResponse(SearchService.java:360)
   >    at org.elasticsearch.search.SearchService$2.onResponse(SearchService.java:356)
   >    at org.elasticsearch.search.SearchService$4.doRun(SearchService.java:1117)
   >    at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun(ThreadContext.java:759)
   >    at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37)
   >    at org.elasticsearch.common.util.concurrent.TimedRunnable.doRun(TimedRunnable.java:41)
   >    at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37)
   >    at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
   >    at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
   >    at java.base/java.lang.Thread.run(Thread.java:834)

The REPRO command is this:

./gradlew :x-pack:plugin:ml:internalClusterTest \
  -Dtests.seed=ACEE7CCD596311C5 \
  -Dtests.class=org.elasticsearch.xpack.ml.integration.TooManyJobsIT \
  -Dtests.method="testSingleNode" \
  -Dtests.security.manager=true \
  -Dtests.locale=yue-Hans-CN \
  -Dtests.timezone=UCT \
  -Dcompiler.java=11 \
  -Druntime.java=11

However, it doesn't reproduce locally for me.

The scroll that was being initialized when this happened is the one set up by the following code:

SearchRequest searchRequest = new SearchRequest(index);
searchRequest.indicesOptions(MlIndicesUtils.addIgnoreUnavailable(SearchRequest.DEFAULT_INDICES_OPTIONS));
searchRequest.scroll(CONTEXT_ALIVE_DURATION);
searchRequest.source(new SearchSourceBuilder()
        .size(BATCH_SIZE)
        .query(getQuery())
        .fetchSource(shouldFetchSource())
        .sort(SortBuilders.fieldSort(ElasticsearchMappings.ES_DOC)));
SearchResponse searchResponse = client.search(searchRequest).actionGet();
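
One plausible reading of why the suite timed out instead of failing fast: if the search thread dies on an `AssertionError` before a response is delivered, a caller blocked in something like `actionGet()` never wakes up. The sketch below illustrates that failure mode with plain JDK classes only. `LostResponseSketch`, `failingListener` and `callerTimesOut` are hypothetical names made up for illustration, and this is an assumption about the mechanism, not a trace of the actual Elasticsearch code path.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

final class LostResponseSketch {

    // Hypothetical stand-in for a listener whose internal assertion trips.
    static void failingListener() {
        throw new AssertionError("stats counter went negative");
    }

    // Returns true if the caller gave up waiting because no response was ever delivered.
    static boolean callerTimesOut(long waitMillis) {
        ExecutorService searchPool = Executors.newSingleThreadExecutor();
        CompletableFuture<String> response = new CompletableFuture<>();
        try {
            searchPool.execute(() -> {
                failingListener();          // throws, so the task ends here
                response.complete("hits");  // never reached
            });
            // The caller blocks here; nothing ever completes the future.
            response.get(waitMillis, TimeUnit.MILLISECONDS);
            return false;
        } catch (TimeoutException e) {
            return true;
        } catch (InterruptedException | ExecutionException e) {
            return false;
        } finally {
            searchPool.shutdownNow();
        }
    }

    public static void main(String[] args) {
        System.out.println("caller timed out: " + callerTimesOut(200));
    }
}
```

In the sketch the caller uses a bounded wait so the example terminates; an unbounded wait would hang until some outer timeout (such as a suite timeout) fires.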

@droberts195 droberts195 added :Search/Search Search-related issues that do not fall into other categories >test-failure Triaged test failures from CI v6.6.0 labels Jan 7, 2019
@elasticmachine
Collaborator

Pinging @elastic/es-search

@alpar-t alpar-t added the v7.0.0 label Jan 9, 2019
@s1monw s1monw assigned s1monw and unassigned s1monw Jan 10, 2019
s1monw added a commit to s1monw/elasticsearch that referenced this issue Jan 15, 2019
…called

Today we have several implementations of executing SearchOperationListener
in SearchService. While most of them seem to be safe, at least one, the one that
executes scroll searches, can cause illegal execution of SearchOperationListener
that can then in turn trigger assertions in ShardSearchStats. This change
adds a SearchOperationListenerExecutor that uses try-with-resources blocks to
ensure listeners are called in a safe way.

Relates to elastic#37185
@pcsanwald pcsanwald added v6.7.0 and removed v6.6.0 labels Jan 17, 2019
s1monw added a commit that referenced this issue Jan 23, 2019
…called (#37467)

Today we have several implementations of executing SearchOperationListener
in SearchService. While most of them seem to be safe, at least one, the one that
executes scroll searches, can cause illegal execution of SearchOperationListener
that can then in turn trigger assertions in ShardSearchStats. This change
adds a SearchOperationListenerExecutor that uses try-with-resources blocks to
ensure listeners are called in a safe way.

Relates to #37185
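
The idea described in the commit message can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the code from #37467: `ListenerExecutorSketch`, `InFlightStats`, `Executor`, `success()` and `run()` are hypothetical names, and the stats class is only a stand-in for a listener that asserts its in-flight counter never goes negative. The point is that an `AutoCloseable` wrapper used in a try-with-resources block fires the "pre" callback once and exactly one of the "done" or "failed" callbacks on every exit path.

```java
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.Supplier;

final class ListenerExecutorSketch {

    // Hypothetical stand-in for a per-shard stats listener.
    static final class InFlightStats {
        final AtomicLong current = new AtomicLong();
        final AtomicLong completed = new AtomicLong();
        final AtomicLong failed = new AtomicLong();

        void onPre() { current.incrementAndGet(); }
        void onDone() { completed.incrementAndGet(); decrement(); }
        void onFailed() { failed.incrementAndGet(); decrement(); }

        private void decrement() {
            long now = current.decrementAndGet();
            // An unbalanced call sequence would drive the counter below zero.
            if (now < 0) {
                throw new AssertionError("in-flight count went negative: " + now);
            }
        }
    }

    // Calls onPre on construction and exactly one of onDone/onFailed on close.
    static final class Executor implements AutoCloseable {
        private final InFlightStats stats;
        private boolean succeeded;

        Executor(InFlightStats stats) {
            this.stats = stats;
            stats.onPre();
        }

        void success() { succeeded = true; }

        @Override
        public void close() {
            if (succeeded) {
                stats.onDone();
            } else {
                stats.onFailed();
            }
        }
    }

    // Runs a phase with balanced listener calls, whether it returns or throws.
    static <T> T run(InFlightStats stats, Supplier<T> phase) {
        try (Executor executor = new Executor(stats)) {
            T result = phase.get();
            executor.success();
            return result;
        }
    }
}
```

With this shape, a phase that throws still reaches `close()`, so the counter is decremented once via the failure callback and never twice.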
@jasontedor jasontedor added v8.0.0 and removed v7.0.0 labels Feb 6, 2019
@s1monw
Contributor

s1monw commented Feb 19, 2019

We just spoke about this again and decided to close it, since we added infrastructure to prevent it in #37467.
