Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ReindexIT#testReindexTask can fail with assertion error. #46144

Closed
jtibshirani opened this issue Aug 29, 2019 · 6 comments
Closed

ReindexIT#testReindexTask can fail with assertion error. #46144

jtibshirani opened this issue Aug 29, 2019 · 6 comments
Assignees
Labels
:Distributed/Reindex Issues relating to reindex that are not caused by issues further down >test-failure Triaged test failures from CI

Comments

@jtibshirani
Copy link
Contributor

Unfortunately the failure doesn't reproduce for me locally, and I didn't see any clues in the logs as to what could be causing the reindex to fail to complete. This failure first popped up in a master intake build.


Link to the build: https://elasticsearch-ci.elastic.co/view/Elasticsearch%20master/job/elastic+elasticsearch+master+multijob+fast+part1/1032/

Command to reproduce:

./gradlew :client:rest-high-level:integTestRunner \
  --tests "org.elasticsearch.client.ReindexIT.testReindexTask" \
  -Dtests.seed=81E17437413A554A \
  -Dtests.security.manager=true \
  -Dtests.locale=sr-ME \
  -Dtests.timezone=Iran \
  -Dcompiler.java=12 \
  -Druntime.java=11

Relevant excerpt from the logs:

org.elasticsearch.client.ReindexIT > testReindexTask FAILED
    java.lang.AssertionError
        at __randomizedtesting.SeedInfo.seed([81E17437413A554A:194B793877AFBFD6]:0)
        at org.junit.Assert.fail(Assert.java:86)
        at org.junit.Assert.assertTrue(Assert.java:41)
        at org.junit.Assert.assertTrue(Assert.java:52)
        at org.elasticsearch.client.ReindexIT.lambda$checkCompletionStatus$0(ReindexIT.java:464)
        at org.elasticsearch.test.ESTestCase.assertBusy(ESTestCase.java:869)
        at org.elasticsearch.test.ESTestCase.assertBusy(ESTestCase.java:843)
        at org.elasticsearch.client.ReindexIT.testReindexTask(ReindexIT.java:152)
@jtibshirani jtibshirani added >test-failure Triaged test failures from CI :Distributed/Reindex Issues relating to reindex that are not caused by issues further down labels Aug 29, 2019
@elasticmachine
Copy link
Collaborator

Pinging @elastic/es-distributed

@jtibshirani jtibshirani changed the title ReindexIT.testReindexTask can fail with assertion error. ReindexIT#testReindexTask can fail with assertion error. Aug 29, 2019
@henningandersen henningandersen self-assigned this Aug 29, 2019
@henningandersen
Copy link
Contributor

The server log file contents are here:

[2019-08-29T16:33:32,840][INFO ][o.e.c.m.TemplateUpgradeService] [integTest-0] Starting template upgrade to version 8.0.0, 1 templates will be updated and 0 will be removed
[2019-08-29T16:33:32,843][INFO ][o.e.c.m.MetaDataIndexTemplateService] [integTest-0] adding template [.logstash-management] for index patterns [.logstash]
[2019-08-29T16:33:32,860][INFO ][o.e.c.m.TemplateUpgradeService] [integTest-0] Templates were upgraded successfully to version 8.0.0
[2019-08-29T16:33:32,882][INFO ][o.e.x.i.a.TransportPutLifecycleAction] [integTest-0] adding index lifecycle policy [watch-history-ilm-policy]
[2019-08-29T16:33:32,933][INFO ][o.e.x.i.a.TransportPutLifecycleAction] [integTest-0] adding index lifecycle policy [slm-history-ilm-policy]
[2019-08-29T16:33:33,000][INFO ][o.e.c.m.MetaDataCreateIndexService] [integTest-0] [source123] creating index, cause [api], templates [], shards [1]/[0], mappings []
[2019-08-29T16:33:33,061][INFO ][o.e.c.r.a.AllocationService] [integTest-0] Cluster health status changed from [YELLOW] to [GREEN] (reason: [shards started [[source123][0]]]).
[2019-08-29T16:33:33,083][INFO ][o.e.c.m.MetaDataCreateIndexService] [integTest-0] [dest2] creating index, cause [api], templates [], shards [1]/[0], mappings []
[2019-08-29T16:33:33,152][INFO ][o.e.c.r.a.AllocationService] [integTest-0] Cluster health status changed from [YELLOW] to [GREEN] (reason: [shards started [[dest2][0]]]).
[2019-08-29T16:33:33,192][INFO ][o.e.c.m.MetaDataMappingService] [integTest-0] [source123/rtNISclfSl6m6El5alvufA] create_mapping [_doc]
[2019-08-29T16:33:34,702][INFO ][o.e.c.m.MetaDataMappingService] [integTest-0] [source123/rtNISclfSl6m6El5alvufA] update_mapping [_doc]
[2019-08-29T16:33:37,149][INFO ][o.e.c.m.MetaDataMappingService] [integTest-0] [dest2/U4aGRqR4SQm70i7nyuia9A] create_mapping [_doc]
[2019-08-29T16:33:38,592][INFO ][o.e.c.m.MetaDataMappingService] [integTest-0] [dest2/U4aGRqR4SQm70i7nyuia9A] update_mapping [_doc]
[2019-08-29T16:33:43,683][INFO ][o.e.c.m.MetaDataCreateIndexService] [integTest-0] [.tasks] creating index, cause [auto(task api)], templates [], shards [1]/[1], mappings [task]
[2019-08-29T16:33:43,684][INFO ][o.e.c.r.a.AllocationService] [integTest-0] updating number_of_replicas to [0] for indices [.tasks]
[2019-08-29T16:33:49,158][INFO ][o.e.c.m.MetaDataDeleteIndexService] [integTest-0] [dest2/U4aGRqR4SQm70i7nyuia9A] deleting index
[2019-08-29T16:33:49,158][INFO ][o.e.c.m.MetaDataDeleteIndexService] [integTest-0] [source123/rtNISclfSl6m6El5alvufA] deleting index
[2019-08-29T16:33:53,234][INFO ][o.e.c.m.MetaDataIndexTemplateService] [integTest-0] removing template [.logstash-management]
[2019-08-29T16:33:57,340][INFO ][o.e.c.m.TemplateUpgradeService] [integTest-0] Starting template upgrade to version 8.0.0, 1 templates will be updated and 0 will be removed
[2019-08-29T16:33:57,344][INFO ][o.e.c.m.MetaDataIndexTemplateService] [integTest-0] adding template [.logstash-management] for index patterns [.logstash]
[2019-08-29T16:33:59,295][INFO ][o.e.c.m.TemplateUpgradeService] [integTest-0] Templates were upgraded successfully to version 8.0.0
[2019-08-29T16:33:59,297][INFO ][o.e.c.r.a.AllocationService] [integTest-0] Cluster health status changed from [YELLOW] to [GREEN] (reason: [shards started [[.tasks][0]]]).
[2019-08-29T16:33:59,327][INFO ][o.e.t.LoggingTaskListener] [integTest-0] 27270 finished with response BulkByScrollResponse[took=6.5s,timed_out=false,sliceId=null,updated=0,created=2,deleted=0,batches=1,versionConflicts=0,noops=0,retries=0,throttledUntil=0s,bulk_failures=[],search_failures=[]]
[2019-08-29T16:33:59,339][INFO ][o.e.x.i.a.TransportPutLifecycleAction] [integTest-0] adding index lifecycle policy [watch-history-ilm-policy]
[2019-08-29T16:33:59,375][INFO ][o.e.x.i.a.TransportPutLifecycleAction] [integTest-0] adding index lifecycle policy [slm-history-ilm-policy]
[2019-08-29T16:33:59,456][INFO ][o.e.c.m.MetaDataCreateIndexService] [integTest-0] [testupdatebyqueryconflict] creating index, cause [api], templates [], shards [1]/[0], mappings []
[2019-08-29T16:33:59,532][INFO ][o.e.c.r.a.AllocationService] [integTest-0] Cluster health status changed from [YELLOW] to [GREEN] (reason: [shards started [[testupdatebyqueryconflict][0]]]).
[2019-08-29T16:33:59,558][INFO ][o.e.c.m.MetaDataMappingService] [integTest-0] [testupdatebyqueryconflict/0P4bGiQjSAixEP-z0gIvEw] create_mapping [_doc]
[2019-08-29T16:33:59,667][INFO ][o.e.c.m.MetaDataDeleteIndexService] [integTest-0] [testupdatebyqueryconflict/0P4bGiQjSAixEP-z0gIvEw] deleting index
[2019-08-29T16:33:59,667][INFO ][o.e.c.m.MetaDataDeleteIndexService] [integTest-0] [.tasks/ZWziASpISrysirpbsi71OQ] deleting index
[2019-08-29T16:33:59,704][INFO ][o.e.c.m.MetaDataIndexTemplateService] [integTest-0] removing template [.logstash-management]

Looks like it was very slow in running this. Not only reindex was slow, but also storing the task result afterwards.

@henningandersen
Copy link
Contributor

@henningandersen
Copy link
Contributor

Following is some of the log of activity right before/after:

18:33:18 > Task :distribution:archives:buildNoJdkDarwinTar
18:33:18 
18:33:18 > Task :distribution:tools:plugin-cli:test
18:33:18 Picked up JAVA_TOOL_OPTIONS: -Dfile.encoding=UTF8
18:33:18 Picked up JAVA_TOOL_OPTIONS: -Dfile.encoding=UTF8
18:33:20 
18:33:20 > Task :qa:wildfly:writeElasticsearchProperties
18:33:21 > Task :qa:smoke-test-plugins:integTest
18:33:21 > Task :qa:smoke-test-plugins:check
18:33:21 
18:33:21 > Task :distribution:tools:keystore-cli:test
18:33:21 Picked up JAVA_TOOL_OPTIONS: -Dfile.encoding=UTF8
18:33:21 
18:33:21 > Task :distribution:archives:no-jdk-darwin-tar:checkExtraction
18:33:21 > Task :qa:smoke-test-plugins:checkPart1
18:33:21 > Task :distribution:archives:no-jdk-darwin-tar:checkLicense
18:33:21 > Task :distribution:archives:no-jdk-darwin-tar:checkNotice
18:33:21 > Task :distribution:archives:no-jdk-darwin-tar:check
18:33:21 > Task :distribution:archives:no-jdk-darwin-tar:checkPart1
18:33:22 
18:33:22 > Task :plugins:repository-hdfs:integTestSecureHaRunner
18:33:22 WARNING: Illegal reflective access by org.apache.hadoop.security.authentication.util.KerberosUtil (file:/var/lib/jenkins/.gradle/caches/modules-2/files-2.1/org.apache.hadoop/hadoop-auth/2.8.1/688ccccc0e0739d8737a93b0039a4a661e52084b/hadoop-auth-2.8.1.jar) to method sun.security.krb5.Config.getInstance()
18:33:22 WARNING: Illegal reflective access by org.apache.hadoop.security.authentication.util.KerberosUtil (file:/var/lib/jenkins/.gradle/caches/modules-2/files-2.1/org.apache.hadoop/hadoop-auth/2.8.1/688ccccc0e0739d8737a93b0039a4a661e52084b/hadoop-auth-2.8.1.jar) to method sun.security.krb5.Config.getDefaultRealm()
18:33:22 WARNING: Illegal reflective access by org.apache.hadoop.security.authentication.util.KerberosUtil (file:/var/lib/jenkins/.gradle/caches/modules-2/files-2.1/org.apache.hadoop/hadoop-auth/2.8.1/688ccccc0e0739d8737a93b0039a4a661e52084b/hadoop-auth-2.8.1.jar) to method sun.security.krb5.Config.getInstance()
18:33:22 WARNING: Illegal reflective access by org.apache.hadoop.security.authentication.util.KerberosUtil (file:/var/lib/jenkins/.gradle/caches/modules-2/files-2.1/org.apache.hadoop/hadoop-auth/2.8.1/688ccccc0e0739d8737a93b0039a4a661e52084b/hadoop-auth-2.8.1.jar) to method sun.security.krb5.Config.getDefaultRealm()
18:33:27 
18:33:27 > Task :distribution:tools:keystore-cli:test
18:33:27 Picked up JAVA_TOOL_OPTIONS: -Dfile.encoding=UTF8
18:33:27 Picked up JAVA_TOOL_OPTIONS: -Dfile.encoding=UTF8
18:33:27 Picked up JAVA_TOOL_OPTIONS: -Dfile.encoding=UTF8
18:33:27 
18:33:27 > Task :plugins:repository-hdfs:secureHaHdfsFixture#stop
18:33:27 > Task :plugins:repository-hdfs:integTestSecureHa
18:33:27 
18:33:27 > Task :distribution:tools:plugin-cli:test
18:33:27 Picked up JAVA_TOOL_OPTIONS: -Dfile.encoding=UTF8
18:33:27 Picked up JAVA_TOOL_OPTIONS: -Dfile.encoding=UTF8
18:33:28 
18:33:28 > Task :plugins:repository-hdfs:check
18:33:28 > Task :plugins:repository-hdfs:checkPart1
18:33:29 
18:33:29 > Task :test:fixtures:krb5kdc-fixture:composeDown
18:33:29 Stopping 1ce265e3fc8c0780c028523db0e56d51_krb5kdc-fixture__hdfs_1  ... 
18:33:29 Stopping 1ce265e3fc8c0780c028523db0e56d51_krb5kdc-fixture__peppa_1 ... 
18:33:30 
18:33:30 > Task :distribution:tools:keystore-cli:check
18:33:30 > Task :distribution:tools:keystore-cli:checkPart1
18:33:35 > Task :qa:wildfly:startWildfly
18:33:36 > Task :distribution:packages:buildDeb
18:33:37 > Task :qa:wildfly:configureClient
18:33:37 > Task :distribution:tools:plugin-cli:check
18:33:37 > Task :distribution:tools:plugin-cli:checkPart1
18:33:38 > Task :distribution:archives:buildNoJdkLinuxTar
18:33:38 > Task :plugins:repository-s3:s3Fixture#stop
18:33:38 > Task :plugins:repository-s3:integTest
18:33:38 > Task :plugins:repository-s3:integTestECS
18:33:39 
18:33:39 > Task :qa:wildfly:integTestRunner
18:33:39 Picked up JAVA_TOOL_OPTIONS: -Dfile.encoding=UTF8
18:33:46 
18:33:46 > Task :distribution:archives:no-jdk-linux-tar:checkExtraction
18:33:46 > Task :distribution:archives:no-jdk-linux-tar:checkLicense
18:33:46 > Task :distribution:archives:no-jdk-linux-tar:checkNotice
18:33:46 > Task :distribution:archives:no-jdk-linux-tar:check
18:33:46 > Task :distribution:archives:no-jdk-linux-tar:checkPart1
18:33:52 > Task :qa:wildfly:stopWildfly
18:33:52 > Task :qa:wildfly:integTest
18:33:52 > Task :qa:wildfly:check
18:33:52 > Task :qa:wildfly:checkPart1
18:33:53 
18:33:53 > Task :test:fixtures:krb5kdc-fixture:composeDown
18:33:53 Stopping 1ce265e3fc8c0780c028523db0e56d51_krb5kdc-fixture__peppa_1 ... done
18:33:53 
18:33:53 > Task :distribution:packages:deb:checkExtraction SKIPPED
18:33:53 > Task :distribution:packages:deb:checkLicense SKIPPED
18:33:53 > Task :distribution:packages:deb:checkLicenseMetadata SKIPPED
18:33:53 > Task :distribution:packages:deb:checkNotice SKIPPED
18:33:53 > Task :distribution:packages:deb:check UP-TO-DATE
18:33:53 > Task :distribution:packages:deb:checkPart1 UP-TO-DATE
18:33:57 
18:33:57 > Task :test:fixtures:krb5kdc-fixture:composeDown
18:33:57 Stopping 1ce265e3fc8c0780c028523db0e56d51_krb5kdc-fixture__hdfs_1  ... done
18:33:58 Removing 1ce265e3fc8c0780c028523db0e56d51_krb5kdc-fixture__hdfs_1  ... 
18:33:58 Removing 1ce265e3fc8c0780c028523db0e56d51_krb5kdc-fixture__peppa_1 ... 
18:33:59 
18:33:59 > Task :distribution:packages:processNoJdkDebFiles
18:33:59 
18:33:59 > Task :client:rest-high-level:integTestRunner
18:33:59 
18:33:59 
18:33:59 org.elasticsearch.client.ReindexIT > testReindexTask FAILED
18:33:59     java.lang.AssertionError
18:33:59         at __randomizedtesting.SeedInfo.seed([81E17437413A554A:194B793877AFBFD6]:0)
18:33:59         at org.junit.Assert.fail(Assert.java:86)
18:33:59         at org.junit.Assert.assertTrue(Assert.java:41)
18:33:59         at org.junit.Assert.assertTrue(Assert.java:52)
18:33:59         at org.elasticsearch.client.ReindexIT.lambda$checkCompletionStatus$0(ReindexIT.java:464)
18:33:59         at org.elasticsearch.test.ESTestCase.assertBusy(ESTestCase.java:869)
18:33:59         at org.elasticsearch.test.ESTestCase.assertBusy(ESTestCase.java:843)
18:33:59         at org.elasticsearch.client.ReindexIT.testReindexTask(ReindexIT.java:152)
18:33:59 
18:33:59 > Task :test:fixtures:krb5kdc-fixture:composeDown
18:33:59 Removing 1ce265e3fc8c0780c028523db0e56d51_krb5kdc-fixture__hdfs_1  ... done
18:33:59 Removing 1ce265e3fc8c0780c028523db0e56d51_krb5kdc-fixture__peppa_1 ... done
18:33:59 Removing network 1ce265e3fc8c0780c028523db0e56d51_krb5kdc-fixture__default
18:34:00 
18:34:00 > Task :plugins:repository-s3:integTestMinioRunner
18:34:00 Picked up JAVA_TOOL_OPTIONS: -Dfile.encoding=UTF8
18:34:00 
18:34:00 > Task :distribution:archives:buildNoJdkWindowsZip

It looks like archives are being build concurrently with test running, which I imagine could be the cause of the slowness depending on available iops/bandwidth on disks.

Following are my main suspects of harmful parallel activity (which is rather much guesswork):

18:33:18 > Task :distribution:archives:buildNoJdkDarwinTar
18:33:29 > Task :test:fixtures:krb5kdc-fixture:composeDown
18:33:36 > Task :distribution:packages:buildDeb
18:33:38 > Task :distribution:archives:buildNoJdkLinuxTar

A hypothesis here is that there is heavy writing (possibly in multiple threads) going on while the test runs and that this delays the test enough that it fails.

@henningandersen
Copy link
Contributor

henningandersen commented Sep 2, 2019

Ping @elastic/es-core-infra

I hope you can comment on this and maybe read more out of the logs (on what activity was really parallel) than I can. Maybe also add info on how our CI servers are sized (seems like the job used 16 parallel threads)?

@henningandersen
Copy link
Contributor

The resolution done for #46091 should also ensure this test stays within its 10s limit, so closing this.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
:Distributed/Reindex Issues relating to reindex that are not caused by issues further down >test-failure Triaged test failures from CI
Projects
None yet
Development

No branches or pull requests

3 participants