
test fails with: can't resolve processed cluster state with uuid #32767

Closed
alpar-t opened this issue Aug 10, 2018 · 4 comments
Labels
:Distributed/Cluster Coordination Cluster formation and cluster state publication, including cluster membership and fault detection. >test-failure Triaged test failures from CI v7.0.0-beta1

Comments

@alpar-t
Contributor

alpar-t commented Aug 10, 2018

Example build failure

https://elasticsearch-ci.elastic.co/job/elastic+elasticsearch+master+multijob-unix-compatibility/os=amazon/2608/console

Reproduction line (does not reproduce locally)

./gradlew :server:integTest \
  -Dtests.seed=CFEB21B624409D08 \
  -Dtests.class=org.elasticsearch.cluster.ack.AckIT \
  -Dtests.method="testIndicesAliasesAcknowledgement" \
  -Dtests.security.manager=true \
  -Dtests.locale=ar-EG \
  -Dtests.timezone=Europe/Busingen

Example relevant log

12:05:51   1> [2018-08-09T14:05:49,772][INFO ][o.e.c.a.AckIT            ] [testIndicesAliasesAcknowledgement] before test
12:05:51   1> [2018-08-09T14:05:49,772][INFO ][o.e.c.a.AckIT            ] [testIndicesAliasesAcknowledgement] [AckIT#testIndicesAliasesAcknowledgement]: setting up test
12:05:51   1> [2018-08-09T14:05:49,773][WARN ][o.e.d.z.PublishClusterStateAction] [node_sm0] timed out waiting for all nodes to process published state [98] (timeout [0s], pending nodes: [{node_sd1}{SUmmm71tS4S8Rjjmu7RRzQ}{KMqfr68_QR2_o6qIfel-pw}{127.0.0.1}{127.0.0.1:37251}, {node_sc3}{D3fZVHArSOm2z7CSJQjLLQ}{pdC1isitQaSDqccMh9K1ZQ}{127.0.0.1}{127.0.0.1:36167}, {node_sd2}{jqkfT8R_Q_ihxv7ORnNGcQ}{mRSClYlNTSu04uSSA50YbQ}{127.0.0.1}{127.0.0.1:39169}])
12:05:51   1> [2018-08-09T14:05:49,786][INFO ][o.e.c.m.MetaDataIndexTemplateService] [node_sm0] adding template [random_index_template] for index patterns [*]
12:05:51   1> [2018-08-09T14:05:49,787][WARN ][o.e.d.z.PublishClusterStateAction] [node_sm0] timed out waiting for all nodes to process published state [99] (timeout [0s], pending nodes: [{node_sd1}{SUmmm71tS4S8Rjjmu7RRzQ}{KMqfr68_QR2_o6qIfel-pw}{127.0.0.1}{127.0.0.1:37251}, {node_sc3}{D3fZVHArSOm2z7CSJQjLLQ}{pdC1isitQaSDqccMh9K1ZQ}{127.0.0.1}{127.0.0.1:36167}, {node_sd2}{jqkfT8R_Q_ihxv7ORnNGcQ}{mRSClYlNTSu04uSSA50YbQ}{127.0.0.1}{127.0.0.1:39169}])
12:05:51   1> [2018-08-09T14:05:49,790][INFO ][o.e.c.a.AckIT            ] [testIndicesAliasesAcknowledgement] [AckIT#testIndicesAliasesAcknowledgement]: all set up test
12:05:51   1> [2018-08-09T14:05:49,790][INFO ][o.e.c.a.AckIT            ] [testIndicesAliasesAcknowledgement] using custom data_path for index: [bZFgSZDbso]
12:05:51   1> [2018-08-09T14:05:49,802][INFO ][o.e.c.m.MetaDataCreateIndexService] [node_sm0] [test] creating index, cause [api], templates [random_index_template], shards [1]/[0], mappings []
12:05:51   1> [2018-08-09T14:05:49,803][WARN ][o.e.d.z.PublishClusterStateAction] [node_sm0] timed out waiting for all nodes to process published state [100] (timeout [0s], pending nodes: [{node_sd1}{SUmmm71tS4S8Rjjmu7RRzQ}{KMqfr68_QR2_o6qIfel-pw}{127.0.0.1}{127.0.0.1:37251}, {node_sc3}{D3fZVHArSOm2z7CSJQjLLQ}{pdC1isitQaSDqccMh9K1ZQ}{127.0.0.1}{127.0.0.1:36167}, {node_sd2}{jqkfT8R_Q_ihxv7ORnNGcQ}{mRSClYlNTSu04uSSA50YbQ}{127.0.0.1}{127.0.0.1:39169}])
12:05:51   1> [2018-08-09T14:05:49,813][INFO ][o.e.c.a.AckIT            ] [testIndicesAliasesAcknowledgement] [AckIT#testIndicesAliasesAcknowledgement]: cleaning up after test
12:05:51   1> [2018-08-09T14:05:49,815][ERROR][o.e.t.d.TestZenDiscovery ] [node_sd2] unexpected failure applying [master {node_sm0}{hGe5mTwjSEeYCIoxAs1RNQ}{AIWCpEcCT2-pAsyJJvAk6w}{127.0.0.1}{127.0.0.1:36719} committed version [100]]
12:05:51   1> java.lang.IllegalStateException: can't resolve processed cluster state with uuid [8KrcUpcjR4ulxE1zmnc2Xg], version [100]
12:05:51   1> 	at org.elasticsearch.discovery.zen.PendingClusterStatesQueue.markAsProcessed(PendingClusterStatesQueue.java:147) ~[main/:?]
12:05:51   1> 	at org.elasticsearch.discovery.zen.ZenDiscovery$3.onSuccess(ZenDiscovery.java:795) [main/:?]
12:05:51   1> 	at org.elasticsearch.cluster.service.ClusterApplierService$SafeClusterApplyListener.onSuccess(ClusterApplierService.java:533) [main/:?]
12:05:51   1> 	at org.elasticsearch.cluster.service.ClusterApplierService.applyChanges(ClusterApplierService.java:485) [main/:?]
12:05:51   1> 	at org.elasticsearch.cluster.service.ClusterApplierService.runTask(ClusterApplierService.java:427) [main/:?]
12:05:51   1> 	at org.elasticsearch.cluster.service.ClusterApplierService$UpdateTask.run(ClusterApplierService.java:157) [main/:?]
12:05:51   1> 	at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingRunnable.run(ThreadContext.java:624) [main/:?]
12:05:51   1> 	at org.elasticsearch.common.util.concurrent.PrioritizedEsThreadPoolExecutor$TieBreakingPrioritizedRunnable.runAndClean(PrioritizedEsThreadPoolExecutor.java:244) [main/:?]
12:05:51   1> 	at org.elasticsearch.common.util.concurrent.PrioritizedEsThreadPoolExecutor$TieBreakingPrioritizedRunnable.run(PrioritizedEsThreadPoolExecutor.java:207) [main/:?]
12:05:51   1> 	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [?:1.8.0_181]
12:05:51   1> 	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [?:1.8.0_181]
12:05:51   1> 	at java.lang.Thread.run(Thread.java:748) [?:1.8.0_181]
12:05:51   1> [2018-08-09T14:05:49,815][ERROR][o.e.t.d.TestZenDiscovery ] [node_sd2] unexpected exception while failing [master {node_sm0}{hGe5mTwjSEeYCIoxAs1RNQ}{AIWCpEcCT2-pAsyJJvAk6w}{127.0.0.1}{127.0.0.1:36719} committed version [100]]
12:05:51   1> java.lang.IllegalArgumentException: can't resolve failed cluster state with uuid [8KrcUpcjR4ulxE1zmnc2Xg], version [100]
12:05:51   1> 	at org.elasticsearch.discovery.zen.PendingClusterStatesQueue.markAsFailed(PendingClusterStatesQueue.java:110) ~[main/:?]
12:05:51   1> 	at org.elasticsearch.discovery.zen.ZenDiscovery$3.onFailure(ZenDiscovery.java:807) [main/:?]
12:05:51   1> 	at org.elasticsearch.discovery.zen.ZenDiscovery$3.onSuccess(ZenDiscovery.java:797) [main/:?]
12:05:51   1> 	at org.elasticsearch.cluster.service.ClusterApplierService$SafeClusterApplyListener.onSuccess(ClusterApplierService.java:533) [main/:?]
12:05:51   1> 	at org.elasticsearch.cluster.service.ClusterApplierService.applyChanges(ClusterApplierService.java:485) [main/:?]
12:05:51   1> 	at org.elasticsearch.cluster.service.ClusterApplierService.runTask(ClusterApplierService.java:427) [main/:?]
12:05:51   1> 	at org.elasticsearch.cluster.service.ClusterApplierService$UpdateTask.run(ClusterApplierService.java:157) [main/:?]
12:05:51   2> REPRODUCE WITH: ./gradlew :server:integTest -Dtests.seed=CFEB21B624409D08 -Dtests.class=org.elasticsearch.cluster.ack.AckIT -Dtests.method="testIndicesAliasesAcknowledgement" -Dtests.security.manager=true -Dtests.locale=ar-EG -Dtests.timezone=Europe/Busingen
12:05:51   1> 	at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingRunnable.run(ThreadContext.java:624) [main/:?]
12:05:51   1> 	at org.elasticsearch.common.util.concurrent.PrioritizedEsThreadPoolExecutor$TieBreakingPrioritizedRunnable.runAndClean(PrioritizedEsThreadPoolExecutor.java:244) [main/:?]
12:05:51   1> 	at org.elasticsearch.common.util.concurrent.PrioritizedEsThreadPoolExecutor$TieBreakingPrioritizedRunnable.run(PrioritizedEsThreadPoolExecutor.java:207) [main/:?]
12:05:51   1> 	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [?:1.8.0_181]
12:05:51   1> 	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [?:1.8.0_181]
12:05:51   1> 	at java.lang.Thread.run(Thread.java:748) [?:1.8.0_181]
12:05:51   1> 	Suppressed: java.lang.IllegalStateException: can't resolve processed cluster state with uuid [8KrcUpcjR4ulxE1zmnc2Xg], version [100]
12:05:51   1> 		at org.elasticsearch.discovery.zen.PendingClusterStatesQueue.markAsProcessed(PendingClusterStatesQueue.java:147) ~[main/:?]
12:05:51   1> 		at org.elasticsearch.discovery.zen.ZenDiscovery$3.onSuccess(ZenDiscovery.java:795) [main/:?]
12:05:51   1> 		at org.elasticsearch.cluster.service.ClusterApplierService$SafeClusterApplyListener.onSuccess(ClusterApplierService.java:533) [main/:?]
12:05:51   1> 		at org.elasticsearch.cluster.service.ClusterApplierService.applyChanges(ClusterApplierService.java:485) [main/:?]
12:05:51   1> 		at org.elasticsearch.cluster.service.ClusterApplierService.runTask(ClusterApplierService.java:427) [main/:?]
12:05:51   1> 		at org.elasticsearch.cluster.service.ClusterApplierService$UpdateTask.run(ClusterApplierService.java:157) [main/:?]
12:05:51   1> 		at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingRunnable.run(ThreadContext.java:624) [main/:?]
12:05:51   1> 		at org.elasticsearch.common.util.concurrent.PrioritizedEsThreadPoolExecutor$TieBreakingPrioritizedRunnable.runAndClean(PrioritizedEsThreadPoolExecutor.java:244) [main/:?]
12:05:51   1> 		at org.elasticsearch.common.util.concurrent.PrioritizedEsThreadPoolExecutor$TieBreakingPrioritizedRunnable.run(PrioritizedEsThreadPoolExecutor.java:207) [main/:?]
12:05:51   1> 		at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [?:1.8.0_181]
12:05:51   1> 		at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [?:1.8.0_181]
12:05:51   1> 		at java.lang.Thread.run(Thread.java:748) [?:1.8.0_181]
12:05:51   1> [2018-08-09T14:05:49,819][INFO ][o.e.c.r.a.AllocationService] [node_sm0] Cluster health status changed from [YELLOW] to [GREEN] (reason: [shards started [[test][0]] ...]).
12:05:51   1> [2018-08-09T14:05:49,820][WARN ][o.e.d.z.PublishClusterStateAction] [node_sm0] timed out waiting for all nodes to process published state [101] (timeout [0s], pending nodes: [{node_sd1}{SUmmm71tS4S8Rjjmu7RRzQ}{KMqfr68_QR2_o6qIfel-pw}{127.0.0.1}{127.0.0.1:37251}, {node_sc3}{D3fZVHArSOm2z7CSJQjLLQ}{pdC1isitQaSDqccMh9K1ZQ}{127.0.0.1}{127.0.0.1:36167}, {node_sd2}{jqkfT8R_Q_ihxv7ORnNGcQ}{mRSClYlNTSu04uSSA50YbQ}{127.0.0.1}{127.0.0.1:39169}])
12:05:51   1> [2018-08-09T14:05:49,823][INFO ][o.e.c.m.MetaDataDeleteIndexService] [node_sm0] [test/TdvnaRhdSu290Bl-3Dj4qg] deleting index
12:05:51   1> [2018-08-09T14:05:49,823][WARN ][o.e.d.z.PublishClusterStateAction] [node_sm0] timed out waiting for all nodes to process published state [102] (timeout [0s], pending nodes: [{node_sd1}{SUmmm71tS4S8Rjjmu7RRzQ}{KMqfr68_QR2_o6qIfel-pw}{127.0.0.1}{127.0.0.1:37251}, {node_sc3}{D3fZVHArSOm2z7CSJQjLLQ}{pdC1isitQaSDqccMh9K1ZQ}{127.0.0.1}{127.0.0.1:36167}, {node_sd2}{jqkfT8R_Q_ihxv7ORnNGcQ}{mRSClYlNTSu04uSSA50YbQ}{127.0.0.1}{127.0.0.1:39169}])
12:05:51   1> [2018-08-09T14:05:49,833][INFO ][o.e.c.m.MetaDataIndexTemplateService] [node_sm0] removing template [random_index_template]
12:05:51   1> [2018-08-09T14:05:49,833][WARN ][o.e.d.z.PublishClusterStateAction] [node_sm0] timed out waiting for all nodes to process published state [103] (timeout [0s], pending nodes: [{node_sd1}{SUmmm71tS4S8Rjjmu7RRzQ}{KMqfr68_QR2_o6qIfel-pw}{127.0.0.1}{127.0.0.1:37251}, {node_sc3}{D3fZVHArSOm2z7CSJQjLLQ}{pdC1isitQaSDqccMh9K1ZQ}{127.0.0.1}{127.0.0.1:36167}, {node_sd2}{jqkfT8R_Q_ihxv7ORnNGcQ}{mRSClYlNTSu04uSSA50YbQ}{127.0.0.1}{127.0.0.1:39169}])
12:05:51   1> [2018-08-09T14:05:49,836][INFO ][o.e.c.a.AckIT            ] [testIndicesAliasesAcknowledgement] [AckIT#testIndicesAliasesAcknowledgement]: cleaned up after test
12:05:51   1> [2018-08-09T14:05:49,836][INFO ][o.e.c.a.AckIT            ] [testIndicesAliasesAcknowledgement] after test
12:05:51 FAILURE 0.12s J2 | AckIT.testIndicesAliasesAcknowledgement <<< FAILURES!
12:05:51    > Throwable #1: java.lang.AssertionError: CreateIndexResponse failed - not acked
12:05:51    > Expected: <true>
12:05:51    >      but: was <false>
12:05:51    > 	at __randomizedtesting.SeedInfo.seed([CFEB21B624409D08:4BF121D23848E9F9]:0)
12:05:51    > 	at org.hamcrest.MatcherAssert.assertThat(MatcherAssert.java:20)
12:05:51    > 	at org.elasticsearch.test.hamcrest.ElasticsearchAssertions.assertAcked(ElasticsearchAssertions.java:108)
12:05:51    > 	at org.elasticsearch.test.hamcrest.ElasticsearchAssertions.assertAcked(ElasticsearchAssertions.java:96)
12:05:51    > 	at org.elasticsearch.test.ESIntegTestCase.createIndex(ESIntegTestCase.java:741)
12:05:51    > 	at org.elasticsearch.cluster.ack.AckIT.testIndicesAliasesAcknowledgement(AckIT.java:214)
12:05:51    > 	at java.lang.Thread.run(Thread.java:748)

Frequency

This particular test has not failed before; however, the test logs show that "can't resolve processed cluster state with uuid" failures started on 2018-06-24 and have recurred almost every week since.

@alpar-t alpar-t added >test-failure Triaged test failures from CI v7.0.0 :Distributed/Distributed A catch all label for anything in the Distributed Area. If you aren't sure, use this one. labels Aug 10, 2018
@elasticmachine
Collaborator

Pinging @elastic/es-distributed

@ywelsch ywelsch added :Distributed/Cluster Coordination Cluster formation and cluster state publication, including cluster membership and fault detection. and removed :Distributed/Distributed A catch all label for anything in the Distributed Area. If you aren't sure, use this one. labels Aug 10, 2018
@ywelsch
Contributor

ywelsch commented Aug 14, 2018

The reason these acking tests started failing on 2018-06-24 is that #30672 fixed the actual acking behavior, which previously just erroneously reported failures as successful acks. The tests are completely bogus and would never have worked if the acking implementation had not been broken since its inception. Acking cannot work in this class because it uses a publish timeout of 0, which means that cluster states can arrive out of order on nodes; in certain situations this leads to a cluster state being rejected because a newer cluster state was already fully applied on the node. One way to work around this limitation in the test might be to bring the cluster to a stable state first, and only then change the publish timeout for the one publishing round we want to check the acking for (a rough sketch of that idea follows below).

If this keeps failing frequently, we can disable the whole class (AckIT) for now.
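
Below is a minimal sketch of that workaround idea, not the actual fix that was merged. It assumes discovery.zen.publish_timeout is the dynamic setting this test class pins to 0s, and the class and method names (AckWorkaroundSketchIT, testAliasAckingWithTemporaryPublishTimeout) are hypothetical:

import org.elasticsearch.common.settings.Settings;
import org.elasticsearch.test.ESIntegTestCase;

import static org.elasticsearch.test.hamcrest.ElasticsearchAssertions.assertAcked;

// Hypothetical test class illustrating the suggested workaround; not part of the repository.
public class AckWorkaroundSketchIT extends ESIntegTestCase {

    public void testAliasAckingWithTemporaryPublishTimeout() throws Exception {
        createIndex("test");
        // 1. Bring the cluster to a stable state so no stale cluster states are still pending.
        ensureStableCluster(internalCluster().size());
        ensureGreen("test");

        // 2. Restore a non-zero publish timeout just for the publishing round whose acking
        //    we want to verify (assumes the dynamic "discovery.zen.publish_timeout" setting).
        assertAcked(client().admin().cluster().prepareUpdateSettings()
            .setTransientSettings(Settings.builder().put("discovery.zen.publish_timeout", "30s")));
        try {
            // 3. Only this request is expected to be reliably acked by all nodes.
            assertAcked(client().admin().indices().prepareAliases().addAlias("test", "alias1"));
        } finally {
            // 4. Reset the timeout so the rest of the suite keeps its original 0s behavior.
            client().admin().cluster().prepareUpdateSettings()
                .setTransientSettings(Settings.builder().putNull("discovery.zen.publish_timeout"))
                .get();
        }
    }
}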

jpountz added a commit that referenced this issue Aug 16, 2018
jpountz added a commit that referenced this issue Aug 16, 2018
jpountz added a commit that referenced this issue Aug 16, 2018
@jpountz
Contributor

jpountz commented Aug 16, 2018

I AwaitFix'ed this test.
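
For context, a class-level AwaitsFix mute in this test framework typically looks like the sketch below; this illustrates the mechanism and is not a copy of the exact commit that was pushed:

import org.apache.lucene.util.LuceneTestCase.AwaitsFix;
import org.elasticsearch.test.ESIntegTestCase;

// Class-level mute: the randomized test runner skips every test in the class and
// points readers at the tracking issue until the annotation is removed.
@AwaitsFix(bugUrl = "https://github.com/elastic/elasticsearch/issues/32767")
public class AckIT extends ESIntegTestCase {
    // existing test methods unchanged
}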

jimczi pushed a commit that referenced this issue Aug 17, 2018
@ywelsch
Contributor

ywelsch commented Sep 13, 2018

Closed in favour of #33673

@ywelsch ywelsch closed this as completed Sep 13, 2018
tlrx added a commit that referenced this issue Feb 1, 2019
Those tests are broken as explained in #33673. Similar tests have
already been muted (see #32767) and this specific class has been muted
in master already (see #34053).

Closes #37478
@jimczi jimczi added v7.0.0-beta1 and removed v7.0.0 labels Feb 7, 2019