Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

After relocation shards might temporarily not be searchable if still in POST_RECOVERY #9421

Closed
brwe opened this issue Jan 26, 2015 · 10 comments · Fixed by #13246
Closed

After relocation shards might temporarily not be searchable if still in POST_RECOVERY #9421

brwe opened this issue Jan 26, 2015 · 10 comments · Fixed by #13246
Assignees
Labels

Comments

@brwe
Copy link
Contributor

brwe commented Jan 26, 2015

SimpleSortTests.testIssue8226 for example fails about once a week. Example failure:
http://build-us-00.elasticsearch.org/job/es_g1gc_1x_metal/3129/

I can reproduce it locally (although very rarely) with some additional logging (action.search.type: TRACE).

Here is a brief analysis of what happened. Would be great if someone could take a look and let me know if this makes sense.

Failure:

1> REPRODUCE WITH  : mvn clean test -Dtests.seed=774A2866F1B6042D -Dtests.class=org.elasticsearch.search.sort.SimpleSortTests -Dtests.method="testIssue8226 {#76 seed=[774A2866F1B6042D:ACB4FF9F8C8CA341]}" -Des.logger.level=DEBUG -Des.node.mode=network -Dtests.security.manager=true -Dtests.nightly=false -Dtests.client.ratio=0.0 -Dtests.heap.size=512m -Dtests.jvm.argline="-server -XX:+UseConcMarkSweepGC -XX:-UseCompressedOops -XX:+AggressiveOpts -Djava.net.preferIPv4Stack=true" -Dtests.locale=fi_FI -Dtests.timezone=Etc/GMT+9 -Dtests.processors=4
  1> Throwable:
  1> java.lang.AssertionError: One or more shards were not successful but didn't trigger a failure
  1> Expected: <47>
  1>      but: was <46>

Here is an example failure in detail, the relevant parts of the logs are below:

State

node_0 is master.
[test_5][0] is relocating from node_1 to node_0.
Cluster state 3673 has the shard as relocating, in cluster state 3674 it is started.
node_0 is the coordinating node for the search request.

In brief, the request fails for shard [test_5][0] because node_0 operates on an older cluster state 3673 when processing the search request, while node_1 is already on 3674.

Course of events:

  1. node_0 sends shard started, but the shard is still in state POST_RECOVERY and will remain so until it receives the new cluster state and applies it locally
  2. node_0(master) receives the shard started request and publishes the new cluster state 3674 to node_0 and node_1
  3. node_1 receives the cluster state 3674 and applies it locally
  4. node_0 sends search request for [test_5][0] to node_1 because according to cluster state 3673 the shard is there and relocating
    -> request fails with IndexShardMissingException because node_1 already applied cluster state 3674 and deleted the shard.
  5. node_0 then sends request for [test_5][0] to node_0 because the shard is there as well (according to cluster state 3673 it is and initializing)
    -> request fails with IllegalIndexShardStateException because node_0 has not yet processed cluster state 3674 and therefore the shard is in POST_RECOVERY instead of STARTED
    No shard failure is logged because IndexShardMissingException and IllegalIndexShardStateException are explicitly excluded from shard failures.
  6. node_0 finally also gets to process the new cluster state and moves the shard [test_5][0] to STARTED but it is too late

This is a very rare condition and maybe too bad on client side because the information that one shard did not deliver results is there although it is not explicitly listed as shard failure. We can probably make the test pass easily be just waiting for relocations before executing the search request but that seems wrong because any search request can fail this way.

Sample log

[....]

  1> [2015-01-26 09:27:14,435][DEBUG][indices.recovery         ] [node_0] [test_5][0] recovery completed from [[node_1][G4AEDzbrRae5BC_UD9zItA][schmusi][inet[/192.168.2.102:9401]]{mode=network, enable_custom_paths=true}], took [84ms]
  1> [2015-01-26 09:27:14,435][DEBUG][cluster.action.shard     ] [node_0] sending shard started for [test_5][0], node[GQ6yYxmyRT-sfvT0cmuqQQ], relocating [G4AEDzbrRae5BC_UD9zItA], [P], s[INITIALIZING], indexUUID [E3T8J7CaRkyK533W0hMBPw], reason [after recovery (replica) from node [[node_1][G4AEDzbrRae5BC_UD9zItA][schmusi][inet[/192.168.2.102:9401]]{mode=network, enable_custom_paths=true}]]
  1> [2015-01-26 09:27:14,435][DEBUG][cluster.action.shard     ] [node_0] received shard started for [test_5][0], node[GQ6yYxmyRT-sfvT0cmuqQQ], relocating [G4AEDzbrRae5BC_UD9zItA], [P], s[INITIALIZING], indexUUID [E3T8J7CaRkyK533W0hMBPw], reason [after recovery (replica) from node [[node_1][G4AEDzbrRae5BC_UD9zItA][schmusi][inet[/192.168.2.102:9401]]{mode=network, enable_custom_paths=true}]]
  1> [2015-01-26 09:27:14,436][DEBUG][cluster.service          ] [node_0] processing [shard-started ([test_5][0], node[GQ6yYxmyRT-sfvT0cmuqQQ], relocating [G4AEDzbrRae5BC_UD9zItA], [P], s[INITIALIZING]), reason [after recovery (replica) from node [[node_1][G4AEDzbrRae5BC_UD9zItA][schmusi][inet[/192.168.2.102:9401]]{mode=network, enable_custom_paths=true}]]]: execute
  1> [2015-01-26 09:27:14,436][DEBUG][cluster.action.shard     ] [node_0] [test_5][0] will apply shard started [test_5][0], node[GQ6yYxmyRT-sfvT0cmuqQQ], relocating [G4AEDzbrRae5BC_UD9zItA], [P], s[INITIALIZING], indexUUID [E3T8J7CaRkyK533W0hMBPw], reason [after recovery (replica) from node [[node_1][G4AEDzbrRae5BC_UD9zItA][schmusi][inet[/192.168.2.102:9401]]{mode=network, enable_custom_paths=true}]]


[....]


  1> [2015-01-26 09:27:14,441][DEBUG][cluster.service          ] [node_0] cluster state updated, version [3674], source [shard-started ([test_5][0], node[GQ6yYxmyRT-sfvT0cmuqQQ], relocating [G4AEDzbrRae5BC_UD9zItA], [P], s[INITIALIZING]), reason [after recovery (replica) from node [[node_1][G4AEDzbrRae5BC_UD9zItA][schmusi][inet[/192.168.2.102:9401]]{mode=network, enable_custom_paths=true}]]]
  1> [2015-01-26 09:27:14,441][DEBUG][cluster.service          ] [node_0] publishing cluster state version 3674
  1> [2015-01-26 09:27:14,442][DEBUG][discovery.zen.publish    ] [node_1] received cluster state version 3674
  1> [2015-01-26 09:27:14,443][DEBUG][cluster.service          ] [node_1] processing [zen-disco-receive(from master [[node_0][GQ6yYxmyRT-sfvT0cmuqQQ][schmusi][inet[/192.168.2.102:9400]]{mode=network, enable_custom_paths=true}])]: execute
  1> [2015-01-26 09:27:14,443][DEBUG][cluster.service          ] [node_1] cluster state updated, version [3674], source [zen-disco-receive(from master [[node_0][GQ6yYxmyRT-sfvT0cmuqQQ][schmusi][inet[/192.168.2.102:9400]]{mode=network, enable_custom_paths=true}])]
  1> [2015-01-26 09:27:14,443][DEBUG][cluster.service          ] [node_1] set local cluster state to version 3674
  1> [2015-01-26 09:27:14,443][DEBUG][indices.cluster          ] [node_1] [test_5][0] removing shard (not allocated)
  1> [2015-01-26 09:27:14,443][DEBUG][index                    ] [node_1] [test_5] [0] closing... (reason: [removing shard (not allocated)])
  1> [2015-01-26 09:27:14,443][INFO ][test.store               ] [node_1] [test_5][0] Shard state before potentially flushing is STARTED
  1> [2015-01-26 09:27:14,453][DEBUG][search.sort              ] cluster state:
  1> version: 3673
  1> meta data version: 2043
  1> nodes:
  1>    [node_1][G4AEDzbrRae5BC_UD9zItA][schmusi][inet[/192.168.2.102:9401]]{mode=network, enable_custom_paths=true}
  1>    [node_0][GQ6yYxmyRT-sfvT0cmuqQQ][schmusi][inet[/192.168.2.102:9400]]{mode=network, enable_custom_paths=true}, local, master
  1> routing_table (version 3006):
  1> -- index [test_4]
  1> ----shard_id [test_4][4]
  1> --------[test_4][4], node[GQ6yYxmyRT-sfvT0cmuqQQ], [R], s[STARTED]
  1> --------[test_4][4], node[G4AEDzbrRae5BC_UD9zItA], [P], s[STARTED]
  1> ----shard_id [test_4][7]
  1> --------[test_4][7], node[GQ6yYxmyRT-sfvT0cmuqQQ], [P], s[STARTED]
  1> --------[test_4][7], node[G4AEDzbrRae5BC_UD9zItA], [R], s[STARTED]
  1> ----shard_id [test_4][0]
  1> --------[test_4][0], node[GQ6yYxmyRT-sfvT0cmuqQQ], [R], s[STARTED]
  1> --------[test_4][0], node[G4AEDzbrRae5BC_UD9zItA], [P], s[STARTED]
  1> ----shard_id [test_4][3]
  1> --------[test_4][3], node[GQ6yYxmyRT-sfvT0cmuqQQ], [P], s[STARTED]
  1> --------[test_4][3], node[G4AEDzbrRae5BC_UD9zItA], [R], s[STARTED]
  1> ----shard_id [test_4][1]
  1> --------[test_4][1], node[GQ6yYxmyRT-sfvT0cmuqQQ], [R], s[STARTED]
  1> --------[test_4][1], node[G4AEDzbrRae5BC_UD9zItA], [P], s[STARTED]
  1> ----shard_id [test_4][5]
  1> --------[test_4][5], node[GQ6yYxmyRT-sfvT0cmuqQQ], [P], s[STARTED]
  1> --------[test_4][5], node[G4AEDzbrRae5BC_UD9zItA], [R], s[STARTED]
  1> ----shard_id [test_4][6]
  1> --------[test_4][6], node[GQ6yYxmyRT-sfvT0cmuqQQ], [R], s[STARTED]
  1> --------[test_4][6], node[G4AEDzbrRae5BC_UD9zItA], [P], s[STARTED]
  1> ----shard_id [test_4][2]
  1> --------[test_4][2], node[GQ6yYxmyRT-sfvT0cmuqQQ], [R], s[STARTED]
  1> --------[test_4][2], node[G4AEDzbrRae5BC_UD9zItA], [P], s[STARTED]
  1>
  1> -- index [test_3]
  1> ----shard_id [test_3][2]
  1> --------[test_3][2], node[GQ6yYxmyRT-sfvT0cmuqQQ], [P], s[STARTED]
  1> --------[test_3][2], node[G4AEDzbrRae5BC_UD9zItA], [R], s[STARTED]
  1> ----shard_id [test_3][0]
  1> --------[test_3][0], node[GQ6yYxmyRT-sfvT0cmuqQQ], [R], s[STARTED]
  1> --------[test_3][0], node[G4AEDzbrRae5BC_UD9zItA], [P], s[STARTED]
  1> ----shard_id [test_3][3]
  1> --------[test_3][3], node[GQ6yYxmyRT-sfvT0cmuqQQ], [R], s[STARTED]
  1> --------[test_3][3], node[G4AEDzbrRae5BC_UD9zItA], [P], s[STARTED]
  1> ----shard_id [test_3][1]
  1> --------[test_3][1], node[GQ6yYxmyRT-sfvT0cmuqQQ], [R], s[STARTED]
  1> --------[test_3][1], node[G4AEDzbrRae5BC_UD9zItA], [P], s[STARTED]
  1> ----shard_id [test_3][5]
  1> --------[test_3][5], node[GQ6yYxmyRT-sfvT0cmuqQQ], [R], s[STARTED]
  1> --------[test_3][5], node[G4AEDzbrRae5BC_UD9zItA], [P], s[STARTED]
  1> ----shard_id [test_3][6]
  1> --------[test_3][6], node[GQ6yYxmyRT-sfvT0cmuqQQ], [P], s[STARTED]
  1> --------[test_3][6], node[G4AEDzbrRae5BC_UD9zItA], [R], s[STARTED]
  1> ----shard_id [test_3][4]
  1> --------[test_3][4], node[GQ6yYxmyRT-sfvT0cmuqQQ], [P], s[STARTED]
  1> --------[test_3][4], node[G4AEDzbrRae5BC_UD9zItA], [R], s[STARTED]
  1>
  1> -- index [test_2]
  1> ----shard_id [test_2][2]
  1> --------[test_2][2], node[GQ6yYxmyRT-sfvT0cmuqQQ], [R], s[STARTED]
  1> --------[test_2][2], node[G4AEDzbrRae5BC_UD9zItA], [P], s[STARTED]
  1> ----shard_id [test_2][0]
  1> --------[test_2][0], node[GQ6yYxmyRT-sfvT0cmuqQQ], [R], s[STARTED]
  1> --------[test_2][0], node[G4AEDzbrRae5BC_UD9zItA], [P], s[STARTED]
  1> ----shard_id [test_2][3]
  1> --------[test_2][3], node[GQ6yYxmyRT-sfvT0cmuqQQ], [P], s[STARTED]
  1> --------[test_2][3], node[G4AEDzbrRae5BC_UD9zItA], [R], s[STARTED]
  1> ----shard_id [test_2][1]
  1> --------[test_2][1], node[GQ6yYxmyRT-sfvT0cmuqQQ], [P], s[STARTED]
  1> --------[test_2][1], node[G4AEDzbrRae5BC_UD9zItA], [R], s[STARTED]
  1> ----shard_id [test_2][5]
  1> --------[test_2][5], node[GQ6yYxmyRT-sfvT0cmuqQQ], [P], s[STARTED]
  1> --------[test_2][5], node[G4AEDzbrRae5BC_UD9zItA], [R], s[STARTED]
  1> ----shard_id [test_2][4]
  1> --------[test_2][4], node[GQ6yYxmyRT-sfvT0cmuqQQ], [R], s[STARTED]
  1> --------[test_2][4], node[G4AEDzbrRae5BC_UD9zItA], [P], s[STARTED]
  1>
  1> -- index [test_1]
  1> ----shard_id [test_1][0]
  1> --------[test_1][0], node[GQ6yYxmyRT-sfvT0cmuqQQ], [P], s[STARTED]
  1> ----shard_id [test_1][1]
  1> --------[test_1][1], node[G4AEDzbrRae5BC_UD9zItA], [P], s[STARTED]
  1>
  1> -- index [test_0]
  1> ----shard_id [test_0][4]
  1> --------[test_0][4], node[GQ6yYxmyRT-sfvT0cmuqQQ], [P], s[STARTED]
  1> --------[test_0][4], node[G4AEDzbrRae5BC_UD9zItA], [R], s[STARTED]
  1> ----shard_id [test_0][0]
  1> --------[test_0][0], node[GQ6yYxmyRT-sfvT0cmuqQQ], [P], s[STARTED]
  1> --------[test_0][0], node[G4AEDzbrRae5BC_UD9zItA], [R], s[STARTED]
  1> ----shard_id [test_0][7]
  1> --------[test_0][7], node[GQ6yYxmyRT-sfvT0cmuqQQ], [R], s[STARTED]
  1> --------[test_0][7], node[G4AEDzbrRae5BC_UD9zItA], [P], s[STARTED]
  1> ----shard_id [test_0][3]
  1> --------[test_0][3], node[GQ6yYxmyRT-sfvT0cmuqQQ], [R], s[STARTED]
  1> --------[test_0][3], node[G4AEDzbrRae5BC_UD9zItA], [P], s[STARTED]
  1> ----shard_id [test_0][1]
  1> --------[test_0][1], node[GQ6yYxmyRT-sfvT0cmuqQQ], [R], s[STARTED]
  1> --------[test_0][1], node[G4AEDzbrRae5BC_UD9zItA], [P], s[STARTED]
  1> ----shard_id [test_0][5]
  1> --------[test_0][5], node[GQ6yYxmyRT-sfvT0cmuqQQ], [R], s[STARTED]
  1> --------[test_0][5], node[G4AEDzbrRae5BC_UD9zItA], [P], s[STARTED]
  1> ----shard_id [test_0][6]
  1> --------[test_0][6], node[GQ6yYxmyRT-sfvT0cmuqQQ], [P], s[STARTED]
  1> --------[test_0][6], node[G4AEDzbrRae5BC_UD9zItA], [R], s[STARTED]
  1> ----shard_id [test_0][2]
  1> --------[test_0][2], node[GQ6yYxmyRT-sfvT0cmuqQQ], [P], s[STARTED]
  1> --------[test_0][2], node[G4AEDzbrRae5BC_UD9zItA], [R], s[STARTED]
  1>
  1> -- index [test_8]
  1> ----shard_id [test_8][0]
  1> --------[test_8][0], node[GQ6yYxmyRT-sfvT0cmuqQQ], [P], s[STARTED]
  1> ----shard_id [test_8][1]
  1> --------[test_8][1], node[G4AEDzbrRae5BC_UD9zItA], [P], s[STARTED]
  1> ----shard_id [test_8][2]
  1> --------[test_8][2], node[GQ6yYxmyRT-sfvT0cmuqQQ], [P], s[STARTED]
  1>
  1> -- index [test_7]
  1> ----shard_id [test_7][2]
  1> --------[test_7][2], node[G4AEDzbrRae5BC_UD9zItA], [P], s[STARTED]
  1> ----shard_id [test_7][0]
  1> --------[test_7][0], node[G4AEDzbrRae5BC_UD9zItA], [P], s[STARTED]
  1> ----shard_id [test_7][3]
  1> --------[test_7][3], node[GQ6yYxmyRT-sfvT0cmuqQQ], [P], s[STARTED]
  1> ----shard_id [test_7][1]
  1> --------[test_7][1], node[GQ6yYxmyRT-sfvT0cmuqQQ], [P], s[STARTED]
  1> ----shard_id [test_7][4]
  1> --------[test_7][4], node[G4AEDzbrRae5BC_UD9zItA], [P], s[STARTED]
  1>
  1> -- index [test_6]
  1> ----shard_id [test_6][0]
  1> --------[test_6][0], node[GQ6yYxmyRT-sfvT0cmuqQQ], [R], s[STARTED]
  1> --------[test_6][0], node[G4AEDzbrRae5BC_UD9zItA], [P], s[STARTED]
  1> ----shard_id [test_6][3]
  1> --------[test_6][3], node[GQ6yYxmyRT-sfvT0cmuqQQ], [R], s[STARTED]
  1> --------[test_6][3], node[G4AEDzbrRae5BC_UD9zItA], [P], s[STARTED]
  1> ----shard_id [test_6][1]
  1> --------[test_6][1], node[GQ6yYxmyRT-sfvT0cmuqQQ], [R], s[STARTED]
  1> --------[test_6][1], node[G4AEDzbrRae5BC_UD9zItA], [P], s[STARTED]
  1> ----shard_id [test_6][2]
  1> --------[test_6][2], node[GQ6yYxmyRT-sfvT0cmuqQQ], [P], s[STARTED]
  1> --------[test_6][2], node[G4AEDzbrRae5BC_UD9zItA], [R], s[STARTED]
  1>
  1> -- index [test_5]
  1> ----shard_id [test_5][0]
  1> --------[test_5][0], node[G4AEDzbrRae5BC_UD9zItA], relocating [GQ6yYxmyRT-sfvT0cmuqQQ], [P], s[RELOCATING]
  1> ----shard_id [test_5][3]
  1> --------[test_5][3], node[G4AEDzbrRae5BC_UD9zItA], [P], s[STARTED]
  1> ----shard_id [test_5][1]
  1> --------[test_5][1], node[G4AEDzbrRae5BC_UD9zItA], [P], s[STARTED]
  1> ----shard_id [test_5][2]
  1> --------[test_5][2], node[GQ6yYxmyRT-sfvT0cmuqQQ], [P], s[STARTED]
  1>
  1> routing_nodes:
  1> -----node_id[GQ6yYxmyRT-sfvT0cmuqQQ][V]
  1> --------[test_4][4], node[GQ6yYxmyRT-sfvT0cmuqQQ], [R], s[STARTED]
  1> --------[test_4][7], node[GQ6yYxmyRT-sfvT0cmuqQQ], [P], s[STARTED]
  1> --------[test_4][0], node[GQ6yYxmyRT-sfvT0cmuqQQ], [R], s[STARTED]
  1> --------[test_4][3], node[GQ6yYxmyRT-sfvT0cmuqQQ], [P], s[STARTED]
  1> --------[test_4][1], node[GQ6yYxmyRT-sfvT0cmuqQQ], [R], s[STARTED]
  1> --------[test_4][5], node[GQ6yYxmyRT-sfvT0cmuqQQ], [P], s[STARTED]
  1> --------[test_4][6], node[GQ6yYxmyRT-sfvT0cmuqQQ], [R], s[STARTED]
  1> --------[test_4][2], node[GQ6yYxmyRT-sfvT0cmuqQQ], [R], s[STARTED]
  1> --------[test_3][2], node[GQ6yYxmyRT-sfvT0cmuqQQ], [P], s[STARTED]
  1> --------[test_3][0], node[GQ6yYxmyRT-sfvT0cmuqQQ], [R], s[STARTED]
  1> --------[test_3][3], node[GQ6yYxmyRT-sfvT0cmuqQQ], [R], s[STARTED]
  1> --------[test_3][1], node[GQ6yYxmyRT-sfvT0cmuqQQ], [R], s[STARTED]
  1> --------[test_3][5], node[GQ6yYxmyRT-sfvT0cmuqQQ], [R], s[STARTED]
  1> --------[test_3][6], node[GQ6yYxmyRT-sfvT0cmuqQQ], [P], s[STARTED]
  1> --------[test_3][4], node[GQ6yYxmyRT-sfvT0cmuqQQ], [P], s[STARTED]
  1> --------[test_2][2], node[GQ6yYxmyRT-sfvT0cmuqQQ], [R], s[STARTED]
  1> --------[test_2][0], node[GQ6yYxmyRT-sfvT0cmuqQQ], [R], s[STARTED]
  1> --------[test_2][3], node[GQ6yYxmyRT-sfvT0cmuqQQ], [P], s[STARTED]
  1> --------[test_2][1], node[GQ6yYxmyRT-sfvT0cmuqQQ], [P], s[STARTED]
  1> --------[test_2][5], node[GQ6yYxmyRT-sfvT0cmuqQQ], [P], s[STARTED]
  1> --------[test_2][4], node[GQ6yYxmyRT-sfvT0cmuqQQ], [R], s[STARTED]
  1> --------[test_1][0], node[GQ6yYxmyRT-sfvT0cmuqQQ], [P], s[STARTED]
  1> --------[test_0][4], node[GQ6yYxmyRT-sfvT0cmuqQQ], [P], s[STARTED]
  1> --------[test_0][0], node[GQ6yYxmyRT-sfvT0cmuqQQ], [P], s[STARTED]
  1> --------[test_0][7], node[GQ6yYxmyRT-sfvT0cmuqQQ], [R], s[STARTED]
  1> --------[test_0][3], node[GQ6yYxmyRT-sfvT0cmuqQQ], [R], s[STARTED]
  1> --------[test_0][1], node[GQ6yYxmyRT-sfvT0cmuqQQ], [R], s[STARTED]
  1> --------[test_0][5], node[GQ6yYxmyRT-sfvT0cmuqQQ], [R], s[STARTED]
  1> --------[test_0][6], node[GQ6yYxmyRT-sfvT0cmuqQQ], [P], s[STARTED]
  1> --------[test_0][2], node[GQ6yYxmyRT-sfvT0cmuqQQ], [P], s[STARTED]
  1> --------[test_8][0], node[GQ6yYxmyRT-sfvT0cmuqQQ], [P], s[STARTED]
  1> --------[test_8][2], node[GQ6yYxmyRT-sfvT0cmuqQQ], [P], s[STARTED]
  1> --------[test_7][3], node[GQ6yYxmyRT-sfvT0cmuqQQ], [P], s[STARTED]
  1> --------[test_7][1], node[GQ6yYxmyRT-sfvT0cmuqQQ], [P], s[STARTED]
  1> --------[test_6][0], node[GQ6yYxmyRT-sfvT0cmuqQQ], [R], s[STARTED]
  1> --------[test_6][3], node[GQ6yYxmyRT-sfvT0cmuqQQ], [R], s[STARTED]
  1> --------[test_6][1], node[GQ6yYxmyRT-sfvT0cmuqQQ], [R], s[STARTED]
  1> --------[test_6][2], node[GQ6yYxmyRT-sfvT0cmuqQQ], [P], s[STARTED]
  1> --------[test_5][0], node[GQ6yYxmyRT-sfvT0cmuqQQ], relocating [G4AEDzbrRae5BC_UD9zItA], [P], s[INITIALIZING]
  1> --------[test_5][2], node[GQ6yYxmyRT-sfvT0cmuqQQ], [P], s[STARTED]
  1> -----node_id[G4AEDzbrRae5BC_UD9zItA][V]
  1> --------[test_4][4], node[G4AEDzbrRae5BC_UD9zItA], [P], s[STARTED]
  1> --------[test_4][7], node[G4AEDzbrRae5BC_UD9zItA], [R], s[STARTED]
  1> --------[test_4][0], node[G4AEDzbrRae5BC_UD9zItA], [P], s[STARTED]
  1> --------[test_4][3], node[G4AEDzbrRae5BC_UD9zItA], [R], s[STARTED]
  1> --------[test_4][1], node[G4AEDzbrRae5BC_UD9zItA], [P], s[STARTED]
  1> --------[test_4][5], node[G4AEDzbrRae5BC_UD9zItA], [R], s[STARTED]
  1> --------[test_4][6], node[G4AEDzbrRae5BC_UD9zItA], [P], s[STARTED]
  1> --------[test_4][2], node[G4AEDzbrRae5BC_UD9zItA], [P], s[STARTED]
  1> --------[test_3][2], node[G4AEDzbrRae5BC_UD9zItA], [R], s[STARTED]
  1> --------[test_3][0], node[G4AEDzbrRae5BC_UD9zItA], [P], s[STARTED]
  1> --------[test_3][3], node[G4AEDzbrRae5BC_UD9zItA], [P], s[STARTED]
  1> --------[test_3][1], node[G4AEDzbrRae5BC_UD9zItA], [P], s[STARTED]
  1> --------[test_3][5], node[G4AEDzbrRae5BC_UD9zItA], [P], s[STARTED]
  1> --------[test_3][6], node[G4AEDzbrRae5BC_UD9zItA], [R], s[STARTED]
  1> --------[test_3][4], node[G4AEDzbrRae5BC_UD9zItA], [R], s[STARTED]
  1> --------[test_2][2], node[G4AEDzbrRae5BC_UD9zItA], [P], s[STARTED]
  1> --------[test_2][0], node[G4AEDzbrRae5BC_UD9zItA], [P], s[STARTED]
  1> --------[test_2][3], node[G4AEDzbrRae5BC_UD9zItA], [R], s[STARTED]
  1> --------[test_2][1], node[G4AEDzbrRae5BC_UD9zItA], [R], s[STARTED]
  1> --------[test_2][5], node[G4AEDzbrRae5BC_UD9zItA], [R], s[STARTED]
  1> --------[test_2][4], node[G4AEDzbrRae5BC_UD9zItA], [P], s[STARTED]
  1> --------[test_1][1], node[G4AEDzbrRae5BC_UD9zItA], [P], s[STARTED]
  1> --------[test_0][4], node[G4AEDzbrRae5BC_UD9zItA], [R], s[STARTED]
  1> --------[test_0][0], node[G4AEDzbrRae5BC_UD9zItA], [R], s[STARTED]
  1> --------[test_0][7], node[G4AEDzbrRae5BC_UD9zItA], [P], s[STARTED]
  1> --------[test_0][3], node[G4AEDzbrRae5BC_UD9zItA], [P], s[STARTED]
  1> --------[test_0][1], node[G4AEDzbrRae5BC_UD9zItA], [P], s[STARTED]
  1> --------[test_0][5], node[G4AEDzbrRae5BC_UD9zItA], [P], s[STARTED]
  1> --------[test_0][6], node[G4AEDzbrRae5BC_UD9zItA], [R], s[STARTED]
  1> --------[test_0][2], node[G4AEDzbrRae5BC_UD9zItA], [R], s[STARTED]
  1> --------[test_8][1], node[G4AEDzbrRae5BC_UD9zItA], [P], s[STARTED]
  1> --------[test_7][2], node[G4AEDzbrRae5BC_UD9zItA], [P], s[STARTED]
  1> --------[test_7][0], node[G4AEDzbrRae5BC_UD9zItA], [P], s[STARTED]
  1> --------[test_7][4], node[G4AEDzbrRae5BC_UD9zItA], [P], s[STARTED]
  1> --------[test_6][0], node[G4AEDzbrRae5BC_UD9zItA], [P], s[STARTED]
  1> --------[test_6][3], node[G4AEDzbrRae5BC_UD9zItA], [P], s[STARTED]
  1> --------[test_6][1], node[G4AEDzbrRae5BC_UD9zItA], [P], s[STARTED]
  1> --------[test_6][2], node[G4AEDzbrRae5BC_UD9zItA], [R], s[STARTED]
  1> --------[test_5][0], node[G4AEDzbrRae5BC_UD9zItA], relocating [GQ6yYxmyRT-sfvT0cmuqQQ], [P], s[RELOCATING]
  1> --------[test_5][3], node[G4AEDzbrRae5BC_UD9zItA], [P], s[STARTED]
  1> --------[test_5][1], node[G4AEDzbrRae5BC_UD9zItA], [P], s[STARTED]
  1> ---- unassigned
  1>
  1> tasks: (1):
  1> 13638/URGENT/shard-started ([test_5][0], node[GQ6yYxmyRT-sfvT0cmuqQQ], relocating [G4AEDzbrRae5BC_UD9zItA], [P], s[INITIALIZING]), reason [after recovery (replica) from node [[node_1][G4AEDzbrRae5BC_UD9zItA][schmusi][inet[/192.168.2.102:9401]]{mode=network, enable_custom_paths=true}]]/13ms
  1>



[...]


 1> [2015-01-26 09:27:14,459][TRACE][action.search.type       ] [node_0] got first-phase result from [G4AEDzbrRae5BC_UD9zItA][test_3][3]
  1> [2015-01-26 09:27:14,460][TRACE][action.search.type       ] [node_0] [test_5][0], node[G4AEDzbrRae5BC_UD9zItA], relocating [GQ6yYxmyRT-sfvT0cmuqQQ], [P], s[RELOCATING]: Failed to execute [org.elasticsearch.action.search.SearchRequest@1469040a] lastShard [false]
  1> org.elasticsearch.transport.RemoteTransportException: [node_1][inet[/192.168.2.102:9401]][indices:data/read/search[phase/dfs]]
  1> Caused by: org.elasticsearch.index.IndexShardMissingException: [test_5][0] missing
  1>     at org.elasticsearch.index.IndexService.shardSafe(IndexService.java:203)
  1>     at org.elasticsearch.search.SearchService.createContext(SearchService.java:539)
  1>     at org.elasticsearch.search.SearchService.createAndPutContext(SearchService.java:523)
  1>     at org.elasticsearch.search.SearchService.executeDfsPhase(SearchService.java:208)
  1>     at org.elasticsearch.search.action.SearchServiceTransportAction$SearchDfsTransportHandler.messageReceived(SearchServiceTransportAction.java:757)
  1>     at org.elasticsearch.search.action.SearchServiceTransportAction$SearchDfsTransportHandler.messageReceived(SearchServiceTransportAction.java:748)
  1>     at org.elasticsearch.transport.netty.MessageChannelHandler$RequestHandler.doRun(MessageChannelHandler.java:275)
  1>     at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:36)
  1>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
  1>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
  1>     at java.lang.Thread.run(Thread.java:745)
  1> [2015-01-26 09:27:14,455][TRACE][action.search.type       ] [node_0] got first-phase result from [GQ6yYxmyRT-sfvT0cmuqQQ][test_6][1]
  1> [2015-01-26 09:27:14,455][TRACE][action.search.type       ] [node_0] got first-phase result from [GQ6yYxmyRT-sfvT0cmuqQQ][test_5][2]
  1> [2015-01-26 09:27:14,455][TRACE][action.search.type       ] [node_0] got first-phase result from [GQ6yYxmyRT-sfvT0cmuqQQ][test_7][1]
  1> [2015-01-26 09:27:14,455][TRACE][action.search.type       ] [node_0] got first-phase result from [GQ6yYxmyRT-sfvT0cmuqQQ][test_3][2]
  1> [2015-01-26 09:27:14,455][TRACE][action.search.type       ] [node_0] got first-phase result from [GQ6yYxmyRT-sfvT0cmuqQQ][test_8][2]
  1> [2015-01-26 09:27:14,455][TRACE][action.search.type       ] [node_0] got first-phase result from [GQ6yYxmyRT-sfvT0cmuqQQ][test_1][0]


[...]


  1> [2015-01-26 09:27:14,463][TRACE][action.search.type       ] [node_0] [test_5][0], node[GQ6yYxmyRT-sfvT0cmuqQQ], relocating [G4AEDzbrRae5BC_UD9zItA], [P], s[INITIALIZING]: Failed to execute [org.elasticsearch.action.search.SearchRequest@1469040a] lastShard [true]
  1> org.elasticsearch.index.shard.IllegalIndexShardStateException: [test_5][0] CurrentState[POST_RECOVERY] operations only allowed when started/relocated
  1>     at org.elasticsearch.index.shard.IndexShard.readAllowed(IndexShard.java:839)
  1>     at org.elasticsearch.index.shard.IndexShard.acquireSearcher(IndexShard.java:651)
  1>     at org.elasticsearch.index.shard.IndexShard.acquireSearcher(IndexShard.java:647)
  1>     at org.elasticsearch.search.SearchService.createContext(SearchService.java:543)
  1>     at org.elasticsearch.search.SearchService.createAndPutContext(SearchService.java:523)
  1>     at org.elasticsearch.search.SearchService.executeDfsPhase(SearchService.java:208)
  1>     at org.elasticsearch.search.action.SearchServiceTransportAction$3.call(SearchServiceTransportAction.java:197)
  1>     at org.elasticsearch.search.action.SearchServiceTransportAction$3.call(SearchServiceTransportAction.java:194)
  1>     at org.elasticsearch.search.action.SearchServiceTransportAction$23.run(SearchServiceTransportAction.java:559)
  1>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
  1>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
  1>     at java.lang.Thread.run(Thread.java:745)
  1> [2015-01-26 09:27:14,459][TRACE][action.search.type       ] [node_0] got first-phase result from [G4AEDzbrRae5BC_UD9zItA][test_2][4]



[...]



  1> [2015-01-26 09:27:14,493][DEBUG][cluster.service          ] [node_0] set local cluster state to version 3674
  1> [2015-01-26 09:27:14,493][DEBUG][index.shard              ] [node_0] [test_5][0] state: [POST_RECOVERY]->[STARTED], reason [global state is [STARTED]]
  1> [2015-01-26 09:27:14,493][DEBUG][river.cluster            ] [node_0] processing [reroute_rivers_node_changed]: execute
  1> [2015-01-26 09:27:14,493][DEBUG][river.cluster            ] [node_0] processing [reroute_rivers_node_changed]: no change in cluster_state
  1> [2015-01-26 09:27:14,493][DEBUG][cluster.service          ] [node_0] processing [shard-started ([test_5][0], node[GQ6yYxmyRT-sfvT0cmuqQQ], relocating [G4AEDzbrRae5BC_UD9zItA], [P], s[INITIALIZING]), reason [after recovery (replica) from node [[node_1][G4AEDzbrRae5BC_UD9zItA][schmusi][inet[/192.168.2.102:9401]]{mode=network, enable_custom_paths=true}]]]: done applying updated cluster_state (version: 3674)
  1> [2015-01-26 09:27:14,456][TRACE][action.search.type       ] [node_0] got first-phase result from [GQ6yYxmyRT-sfvT0cmuqQQ][test_2][3]


[...]

  1> [2015-01-26 09:27:14,527][DEBUG][search.sort              ] cluster state:
  1> version: 3674
  1> meta data version: 2043
  1> nodes:
  1>    [node_1][G4AEDzbrRae5BC_UD9zItA][schmusi][inet[/192.168.2.102:9401]]{mode=network, enable_custom_paths=true}
  1>    [node_0][GQ6yYxmyRT-sfvT0cmuqQQ][schmusi][inet[/192.168.2.102:9400]]{mode=network, enable_custom_paths=true}, local, master
  1> routing_table (version 3007):
  1> -- index [test_4]
  1> ----shard_id [test_4][2]
  1> --------[test_4][2], node[GQ6yYxmyRT-sfvT0cmuqQQ], [R], s[STARTED]
  1> --------[test_4][2], node[G4AEDzbrRae5BC_UD9zItA], [P], s[STARTED]
  1> ----shard_id [test_4][7]
  1> --------[test_4][7], node[GQ6yYxmyRT-sfvT0cmuqQQ], [P], s[STARTED]
  1> --------[test_4][7], node[G4AEDzbrRae5BC_UD9zItA], [R], s[STARTED]
  1> ----shard_id [test_4][0]
  1> --------[test_4][0], node[GQ6yYxmyRT-sfvT0cmuqQQ], [R], s[STARTED]
  1> --------[test_4][0], node[G4AEDzbrRae5BC_UD9zItA], [P], s[STARTED]
  1> ----shard_id [test_4][3]
  1> --------[test_4][3], node[GQ6yYxmyRT-sfvT0cmuqQQ], [P], s[STARTED]
  1> --------[test_4][3], node[G4AEDzbrRae5BC_UD9zItA], [R], s[STARTED]
  1> ----shard_id [test_4][1]
  1> --------[test_4][1], node[GQ6yYxmyRT-sfvT0cmuqQQ], [R], s[STARTED]
  1> --------[test_4][1], node[G4AEDzbrRae5BC_UD9zItA], [P], s[STARTED]
  1> ----shard_id [test_4][5]
  1> --------[test_4][5], node[GQ6yYxmyRT-sfvT0cmuqQQ], [P], s[STARTED]
  1> --------[test_4][5], node[G4AEDzbrRae5BC_UD9zItA], [R], s[STARTED]
  1> ----shard_id [test_4][6]
  1> --------[test_4][6], node[GQ6yYxmyRT-sfvT0cmuqQQ], [R], s[STARTED]
  1> --------[test_4][6], node[G4AEDzbrRae5BC_UD9zItA], [P], s[STARTED]
  1> ----shard_id [test_4][4]
  1> --------[test_4][4], node[GQ6yYxmyRT-sfvT0cmuqQQ], [R], s[STARTED]
  1> --------[test_4][4], node[G4AEDzbrRae5BC_UD9zItA], [P], s[STARTED]
  1>
  1> -- index [test_3]
  1> ----shard_id [test_3][4]
  1> --------[test_3][4], node[GQ6yYxmyRT-sfvT0cmuqQQ], [P], s[STARTED]
  1> --------[test_3][4], node[G4AEDzbrRae5BC_UD9zItA], [R], s[STARTED]
  1> ----shard_id [test_3][0]
  1> --------[test_3][0], node[GQ6yYxmyRT-sfvT0cmuqQQ], [R], s[STARTED]
  1> --------[test_3][0], node[G4AEDzbrRae5BC_UD9zItA], [P], s[STARTED]
  1> ----shard_id [test_3][3]
  1> --------[test_3][3], node[GQ6yYxmyRT-sfvT0cmuqQQ], [R], s[STARTED]
  1> --------[test_3][3], node[G4AEDzbrRae5BC_UD9zItA], [P], s[STARTED]
  1> ----shard_id [test_3][1]
  1> --------[test_3][1], node[GQ6yYxmyRT-sfvT0cmuqQQ], [R], s[STARTED]
  1> --------[test_3][1], node[G4AEDzbrRae5BC_UD9zItA], [P], s[STARTED]
  1> ----shard_id [test_3][5]
  1> --------[test_3][5], node[GQ6yYxmyRT-sfvT0cmuqQQ], [R], s[STARTED]
  1> --------[test_3][5], node[G4AEDzbrRae5BC_UD9zItA], [P], s[STARTED]
  1> ----shard_id [test_3][6]
  1> --------[test_3][6], node[GQ6yYxmyRT-sfvT0cmuqQQ], [P], s[STARTED]
  1> --------[test_3][6], node[G4AEDzbrRae5BC_UD9zItA], [R], s[STARTED]
  1> ----shard_id [test_3][2]
  1> --------[test_3][2], node[GQ6yYxmyRT-sfvT0cmuqQQ], [P], s[STARTED]
  1> --------[test_3][2], node[G4AEDzbrRae5BC_UD9zItA], [R], s[STARTED]
  1>
  1> -- index [test_2]
  1> ----shard_id [test_2][4]
  1> --------[test_2][4], node[GQ6yYxmyRT-sfvT0cmuqQQ], [R], s[STARTED]
  1> --------[test_2][4], node[G4AEDzbrRae5BC_UD9zItA], [P], s[STARTED]
  1> ----shard_id [test_2][0]
  1> --------[test_2][0], node[GQ6yYxmyRT-sfvT0cmuqQQ], [R], s[STARTED]
  1> --------[test_2][0], node[G4AEDzbrRae5BC_UD9zItA], [P], s[STARTED]
  1> ----shard_id [test_2][3]
  1> --------[test_2][3], node[GQ6yYxmyRT-sfvT0cmuqQQ], [P], s[STARTED]
  1> --------[test_2][3], node[G4AEDzbrRae5BC_UD9zItA], [R], s[STARTED]
  1> ----shard_id [test_2][1]
  1> --------[test_2][1], node[GQ6yYxmyRT-sfvT0cmuqQQ], [P], s[STARTED]
  1> --------[test_2][1], node[G4AEDzbrRae5BC_UD9zItA], [R], s[STARTED]
  1> ----shard_id [test_2][5]
  1> --------[test_2][5], node[GQ6yYxmyRT-sfvT0cmuqQQ], [P], s[STARTED]
  1> --------[test_2][5], node[G4AEDzbrRae5BC_UD9zItA], [R], s[STARTED]
  1> ----shard_id [test_2][2]
  1> --------[test_2][2], node[GQ6yYxmyRT-sfvT0cmuqQQ], [R], s[STARTED]
  1> --------[test_2][2], node[G4AEDzbrRae5BC_UD9zItA], [P], s[STARTED]
  1>
  1> -- index [test_1]
  1> ----shard_id [test_1][0]
  1> --------[test_1][0], node[GQ6yYxmyRT-sfvT0cmuqQQ], [P], s[STARTED]
  1> ----shard_id [test_1][1]
  1> --------[test_1][1], node[G4AEDzbrRae5BC_UD9zItA], [P], s[STARTED]
  1>
  1> -- index [test_0]
  1> ----shard_id [test_0][2]
  1> --------[test_0][2], node[GQ6yYxmyRT-sfvT0cmuqQQ], [P], s[STARTED]
  1> --------[test_0][2], node[G4AEDzbrRae5BC_UD9zItA], [R], s[STARTED]
  1> ----shard_id [test_0][0]
  1> --------[test_0][0], node[GQ6yYxmyRT-sfvT0cmuqQQ], [P], s[STARTED]
  1> --------[test_0][0], node[G4AEDzbrRae5BC_UD9zItA], [R], s[STARTED]
  1> ----shard_id [test_0][7]
  1> --------[test_0][7], node[GQ6yYxmyRT-sfvT0cmuqQQ], [R], s[STARTED]
  1> --------[test_0][7], node[G4AEDzbrRae5BC_UD9zItA], [P], s[STARTED]
  1> ----shard_id [test_0][3]
  1> --------[test_0][3], node[GQ6yYxmyRT-sfvT0cmuqQQ], [R], s[STARTED]
  1> --------[test_0][3], node[G4AEDzbrRae5BC_UD9zItA], [P], s[STARTED]
  1> ----shard_id [test_0][1]
  1> --------[test_0][1], node[GQ6yYxmyRT-sfvT0cmuqQQ], [R], s[STARTED]
  1> --------[test_0][1], node[G4AEDzbrRae5BC_UD9zItA], [P], s[STARTED]
  1> ----shard_id [test_0][5]
  1> --------[test_0][5], node[GQ6yYxmyRT-sfvT0cmuqQQ], [R], s[STARTED]
  1> --------[test_0][5], node[G4AEDzbrRae5BC_UD9zItA], [P], s[STARTED]
  1> ----shard_id [test_0][6]
  1> --------[test_0][6], node[GQ6yYxmyRT-sfvT0cmuqQQ], [P], s[STARTED]
  1> --------[test_0][6], node[G4AEDzbrRae5BC_UD9zItA], [R], s[STARTED]
  1> ----shard_id [test_0][4]
  1> --------[test_0][4], node[GQ6yYxmyRT-sfvT0cmuqQQ], [P], s[STARTED]
  1> --------[test_0][4], node[G4AEDzbrRae5BC_UD9zItA], [R], s[STARTED]
  1>
  1> -- index [test_8]
  1> ----shard_id [test_8][0]
  1> --------[test_8][0], node[GQ6yYxmyRT-sfvT0cmuqQQ], [P], s[STARTED]
  1> ----shard_id [test_8][1]
  1> --------[test_8][1], node[G4AEDzbrRae5BC_UD9zItA], [P], s[STARTED]
  1> ----shard_id [test_8][2]
  1> --------[test_8][2], node[GQ6yYxmyRT-sfvT0cmuqQQ], [P], s[STARTED]
  1>
  1> -- index [test_7]
  1> ----shard_id [test_7][4]
  1> --------[test_7][4], node[G4AEDzbrRae5BC_UD9zItA], [P], s[STARTED]
  1> ----shard_id [test_7][0]
  1> --------[test_7][0], node[G4AEDzbrRae5BC_UD9zItA], [P], s[STARTED]
  1> ----shard_id [test_7][3]
  1> --------[test_7][3], node[GQ6yYxmyRT-sfvT0cmuqQQ], [P], s[STARTED]
  1> ----shard_id [test_7][1]
  1> --------[test_7][1], node[GQ6yYxmyRT-sfvT0cmuqQQ], [P], s[STARTED]
  1> ----shard_id [test_7][2]
  1> --------[test_7][2], node[G4AEDzbrRae5BC_UD9zItA], [P], s[STARTED]
  1>
  1> -- index [test_6]
  1> ----shard_id [test_6][0]
  1> --------[test_6][0], node[GQ6yYxmyRT-sfvT0cmuqQQ], [R], s[STARTED]
  1> --------[test_6][0], node[G4AEDzbrRae5BC_UD9zItA], [P], s[STARTED]
  1> ----shard_id [test_6][3]
  1> --------[test_6][3], node[GQ6yYxmyRT-sfvT0cmuqQQ], [R], s[STARTED]
  1> --------[test_6][3], node[G4AEDzbrRae5BC_UD9zItA], [P], s[STARTED]
  1> ----shard_id [test_6][1]
  1> --------[test_6][1], node[GQ6yYxmyRT-sfvT0cmuqQQ], [R], s[STARTED]
  1> --------[test_6][1], node[G4AEDzbrRae5BC_UD9zItA], [P], s[STARTED]
  1> ----shard_id [test_6][2]
  1> --------[test_6][2], node[GQ6yYxmyRT-sfvT0cmuqQQ], [P], s[STARTED]
  1> --------[test_6][2], node[G4AEDzbrRae5BC_UD9zItA], [R], s[STARTED]
  1>
  1> -- index [test_5]
  1> ----shard_id [test_5][0]
  1> --------[test_5][0], node[GQ6yYxmyRT-sfvT0cmuqQQ], [P], s[STARTED]
  1> ----shard_id [test_5][3]
  1> --------[test_5][3], node[G4AEDzbrRae5BC_UD9zItA], [P], s[STARTED]
  1> ----shard_id [test_5][1]
  1> --------[test_5][1], node[G4AEDzbrRae5BC_UD9zItA], [P], s[STARTED]
  1> ----shard_id [test_5][2]
  1> --------[test_5][2], node[GQ6yYxmyRT-sfvT0cmuqQQ], [P], s[STARTED]
  1>
  1> routing_nodes:
  1> -----node_id[GQ6yYxmyRT-sfvT0cmuqQQ][V]
  1> --------[test_4][2], node[GQ6yYxmyRT-sfvT0cmuqQQ], [R], s[STARTED]
  1> --------[test_4][7], node[GQ6yYxmyRT-sfvT0cmuqQQ], [P], s[STARTED]
  1> --------[test_4][0], node[GQ6yYxmyRT-sfvT0cmuqQQ], [R], s[STARTED]
  1> --------[test_4][3], node[GQ6yYxmyRT-sfvT0cmuqQQ], [P], s[STARTED]
  1> --------[test_4][1], node[GQ6yYxmyRT-sfvT0cmuqQQ], [R], s[STARTED]
  1> --------[test_4][5], node[GQ6yYxmyRT-sfvT0cmuqQQ], [P], s[STARTED]
  1> --------[test_4][6], node[GQ6yYxmyRT-sfvT0cmuqQQ], [R], s[STARTED]
  1> --------[test_4][4], node[GQ6yYxmyRT-sfvT0cmuqQQ], [R], s[STARTED]
  1> --------[test_3][4], node[GQ6yYxmyRT-sfvT0cmuqQQ], [P], s[STARTED]
  1> --------[test_3][0], node[GQ6yYxmyRT-sfvT0cmuqQQ], [R], s[STARTED]
  1> --------[test_3][3], node[GQ6yYxmyRT-sfvT0cmuqQQ], [R], s[STARTED]
  1> --------[test_3][1], node[GQ6yYxmyRT-sfvT0cmuqQQ], [R], s[STARTED]
  1> --------[test_3][5], node[GQ6yYxmyRT-sfvT0cmuqQQ], [R], s[STARTED]
  1> --------[test_3][6], node[GQ6yYxmyRT-sfvT0cmuqQQ], [P], s[STARTED]
  1> --------[test_3][2], node[GQ6yYxmyRT-sfvT0cmuqQQ], [P], s[STARTED]
  1> --------[test_2][4], node[GQ6yYxmyRT-sfvT0cmuqQQ], [R], s[STARTED]
  1> --------[test_2][0], node[GQ6yYxmyRT-sfvT0cmuqQQ], [R], s[STARTED]
  1> --------[test_2][3], node[GQ6yYxmyRT-sfvT0cmuqQQ], [P], s[STARTED]
  1> --------[test_2][1], node[GQ6yYxmyRT-sfvT0cmuqQQ], [P], s[STARTED]
  1> --------[test_2][5], node[GQ6yYxmyRT-sfvT0cmuqQQ], [P], s[STARTED]
  1> --------[test_2][2], node[GQ6yYxmyRT-sfvT0cmuqQQ], [R], s[STARTED]
  1> --------[test_1][0], node[GQ6yYxmyRT-sfvT0cmuqQQ], [P], s[STARTED]
  1> --------[test_0][2], node[GQ6yYxmyRT-sfvT0cmuqQQ], [P], s[STARTED]
  1> --------[test_0][0], node[GQ6yYxmyRT-sfvT0cmuqQQ], [P], s[STARTED]
  1> --------[test_0][7], node[GQ6yYxmyRT-sfvT0cmuqQQ], [R], s[STARTED]
  1> --------[test_0][3], node[GQ6yYxmyRT-sfvT0cmuqQQ], [R], s[STARTED]
  1> --------[test_0][1], node[GQ6yYxmyRT-sfvT0cmuqQQ], [R], s[STARTED]
  1> --------[test_0][5], node[GQ6yYxmyRT-sfvT0cmuqQQ], [R], s[STARTED]
  1> --------[test_0][6], node[GQ6yYxmyRT-sfvT0cmuqQQ], [P], s[STARTED]
  1> --------[test_0][4], node[GQ6yYxmyRT-sfvT0cmuqQQ], [P], s[STARTED]
  1> --------[test_8][0], node[GQ6yYxmyRT-sfvT0cmuqQQ], [P], s[STARTED]
  1> --------[test_8][2], node[GQ6yYxmyRT-sfvT0cmuqQQ], [P], s[STARTED]
  1> --------[test_7][3], node[GQ6yYxmyRT-sfvT0cmuqQQ], [P], s[STARTED]
  1> --------[test_7][1], node[GQ6yYxmyRT-sfvT0cmuqQQ], [P], s[STARTED]
  1> --------[test_6][0], node[GQ6yYxmyRT-sfvT0cmuqQQ], [R], s[STARTED]
  1> --------[test_6][3], node[GQ6yYxmyRT-sfvT0cmuqQQ], [R], s[STARTED]
  1> --------[test_6][1], node[GQ6yYxmyRT-sfvT0cmuqQQ], [R], s[STARTED]
  1> --------[test_6][2], node[GQ6yYxmyRT-sfvT0cmuqQQ], [P], s[STARTED]
  1> --------[test_5][0], node[GQ6yYxmyRT-sfvT0cmuqQQ], [P], s[STARTED]
  1> --------[test_5][2], node[GQ6yYxmyRT-sfvT0cmuqQQ], [P], s[STARTED]
  1> -----node_id[G4AEDzbrRae5BC_UD9zItA][V]
  1> --------[test_4][2], node[G4AEDzbrRae5BC_UD9zItA], [P], s[STARTED]
  1> --------[test_4][7], node[G4AEDzbrRae5BC_UD9zItA], [R], s[STARTED]
  1> --------[test_4][0], node[G4AEDzbrRae5BC_UD9zItA], [P], s[STARTED]
  1> --------[test_4][3], node[G4AEDzbrRae5BC_UD9zItA], [R], s[STARTED]
  1> --------[test_4][1], node[G4AEDzbrRae5BC_UD9zItA], [P], s[STARTED]
  1> --------[test_4][5], node[G4AEDzbrRae5BC_UD9zItA], [R], s[STARTED]
  1> --------[test_4][6], node[G4AEDzbrRae5BC_UD9zItA], [P], s[STARTED]
  1> --------[test_4][4], node[G4AEDzbrRae5BC_UD9zItA], [P], s[STARTED]
  1> --------[test_3][4], node[G4AEDzbrRae5BC_UD9zItA], [R], s[STARTED]
  1> --------[test_3][0], node[G4AEDzbrRae5BC_UD9zItA], [P], s[STARTED]
  1> --------[test_3][3], node[G4AEDzbrRae5BC_UD9zItA], [P], s[STARTED]
  1> --------[test_3][1], node[G4AEDzbrRae5BC_UD9zItA], [P], s[STARTED]
  1> --------[test_3][5], node[G4AEDzbrRae5BC_UD9zItA], [P], s[STARTED]
  1> --------[test_3][6], node[G4AEDzbrRae5BC_UD9zItA], [R], s[STARTED]
  1> --------[test_3][2], node[G4AEDzbrRae5BC_UD9zItA], [R], s[STARTED]
  1> --------[test_2][4], node[G4AEDzbrRae5BC_UD9zItA], [P], s[STARTED]
  1> --------[test_2][0], node[G4AEDzbrRae5BC_UD9zItA], [P], s[STARTED]
  1> --------[test_2][3], node[G4AEDzbrRae5BC_UD9zItA], [R], s[STARTED]
  1> --------[test_2][1], node[G4AEDzbrRae5BC_UD9zItA], [R], s[STARTED]
  1> --------[test_2][5], node[G4AEDzbrRae5BC_UD9zItA], [R], s[STARTED]
  1> --------[test_2][2], node[G4AEDzbrRae5BC_UD9zItA], [P], s[STARTED]
  1> --------[test_1][1], node[G4AEDzbrRae5BC_UD9zItA], [P], s[STARTED]
  1> --------[test_0][2], node[G4AEDzbrRae5BC_UD9zItA], [R], s[STARTED]
  1> --------[test_0][0], node[G4AEDzbrRae5BC_UD9zItA], [R], s[STARTED]
  1> --------[test_0][7], node[G4AEDzbrRae5BC_UD9zItA], [P], s[STARTED]
  1> --------[test_0][3], node[G4AEDzbrRae5BC_UD9zItA], [P], s[STARTED]
  1> --------[test_0][1], node[G4AEDzbrRae5BC_UD9zItA], [P], s[STARTED]
  1> --------[test_0][5], node[G4AEDzbrRae5BC_UD9zItA], [P], s[STARTED]
  1> --------[test_0][6], node[G4AEDzbrRae5BC_UD9zItA], [R], s[STARTED]
  1> --------[test_0][4], node[G4AEDzbrRae5BC_UD9zItA], [R], s[STARTED]
  1> --------[test_8][1], node[G4AEDzbrRae5BC_UD9zItA], [P], s[STARTED]
  1> --------[test_7][4], node[G4AEDzbrRae5BC_UD9zItA], [P], s[STARTED]
  1> --------[test_7][0], node[G4AEDzbrRae5BC_UD9zItA], [P], s[STARTED]
  1> --------[test_7][2], node[G4AEDzbrRae5BC_UD9zItA], [P], s[STARTED]
  1> --------[test_6][0], node[G4AEDzbrRae5BC_UD9zItA], [P], s[STARTED]
  1> --------[test_6][3], node[G4AEDzbrRae5BC_UD9zItA], [P], s[STARTED]
  1> --------[test_6][1], node[G4AEDzbrRae5BC_UD9zItA], [P], s[STARTED]
  1> --------[test_6][2], node[G4AEDzbrRae5BC_UD9zItA], [R], s[STARTED]
  1> --------[test_5][3], node[G4AEDzbrRae5BC_UD9zItA], [P], s[STARTED]
  1> --------[test_5][1], node[G4AEDzbrRae5BC_UD9zItA], [P], s[STARTED]
  1> ---- unassigned
  1>
  1> tasks: (0):
  1>

[...]
@clintongormley clintongormley added >bug >test-failure Triaged test failures from CI labels Jan 26, 2015
@dakrone
Copy link
Member

dakrone commented Jan 27, 2015

dakrone added a commit that referenced this issue Jan 27, 2015
See:
#9421

Conflicts:
	src/test/java/org/elasticsearch/deleteByQuery/DeleteByQueryTests.java
@brwe
Copy link
Contributor Author

brwe commented Jan 27, 2015

I think this is unrelated. I actually fixed the DeleteByQueryTests yesterday (c3f1982) and this commit does not seem to be in the build you linked to.

A brief explanation: DeleteByQuery is a write operation. The shard header returned and checked in DeleteByQueryTests is different from the one return for search requests. The reason why DeleteByQuery failed is because I added the check

assertThat(shardInfo.getSuccessful(), greaterThanOrEqualTo(numShards.totalNumShards));

before which was wrong because there was no ensureGreen() so some of the replicas might not have ben initialized yet. I fixed this in c3f1982 by instead checking

assertThat(shardInfo.getSuccessful(), greaterThanOrEqualTo(numShards.numPrimaries));

@bleskes
Copy link
Contributor

bleskes commented Jan 27, 2015

I wonder if we should just allow reads in the POST_RECOVERY phase. At that point the shards is effectively ready to do everything it needs to do. @brwe this will solve the issue, right?

@dakrone
Copy link
Member

dakrone commented Jan 27, 2015

@brwe okay, does that mean I can unmute the DeleteByQueryTests.testDeleteAllOneIndex?

@brwe
Copy link
Contributor Author

brwe commented Jan 27, 2015

yes

@dakrone
Copy link
Member

dakrone commented Jan 27, 2015

Unmuted the DeleteByQueryTests.testDeleteAllOneIndex test

@brwe brwe self-assigned this Jan 28, 2015
@brwe
Copy link
Contributor Author

brwe commented Jan 28, 2015

@bleskes I think that would fix it. However, before I push I want to try and write a test that reproduces reliably. Will not do before next week.

@bleskes
Copy link
Contributor

bleskes commented Feb 24, 2015

@brwe please ping before starting on this. I want to make sure that we capture the original issue which caused us to introduce POST_RECOVERY. I don't recall exactly recall what the problem was (it was refresh related) and I think it was solved by a more recent change to how refresh work (#6545) but it requires careful thought

@brwe
Copy link
Contributor Author

brwe commented Mar 20, 2015

@bleskes ping :)
I finally came back to this and wrote a test that reproduces the failure reliably (#10194) but I did not quite get what you meant by "capture the original issue". Can you elaborate?

@brwe brwe changed the title [CI Failure] SimpleSortTests.testIssue8226 After relocation shards might temporarily not be searchable if still in POST_RECOVERY Apr 10, 2015
@s1monw
Copy link
Contributor

s1monw commented Apr 13, 2015

@kimchy do you recall why we can't read in that state?

brwe added a commit to brwe/elasticsearch that referenced this issue Aug 11, 2015
…y not be searchable if still in POST_RECOVERY)

see elastic#9421
brwe added a commit to brwe/elasticsearch that referenced this issue Aug 11, 2015
brwe added a commit to brwe/elasticsearch that referenced this issue Aug 11, 2015
When a client indexes a documents and then calls refresh on the index
then the document must be visible after that with search requests.
This might not be the case if refresh is a BroadcastOperationAction,
see DiscoveryWithServiceDisruptionsTests.testReadOnPostRecoveryShards

related to elastic#9421
imotov added a commit to imotov/elasticsearch that referenced this issue Aug 18, 2015
The restore portion of some snapshot/restore test is failing randomly due to elastic#9421. This change suspends rebalance during snapshot/restore operations until elastic#9421 is fixed.

Closes elastic#12855
imotov added a commit that referenced this issue Aug 18, 2015
The restore portion of some snapshot/restore test is failing randomly due to #9421. This change suspends rebalance during snapshot/restore operations until #9421 is fixed.

Closes #12855
brwe added a commit to brwe/elasticsearch that referenced this issue Sep 1, 2015
brwe added a commit to brwe/elasticsearch that referenced this issue Sep 2, 2015
Currently, we do not allow reads on shards which are in POST_RECOVERY which
unfortunately can cause search failures on shards which just recovered if there no replicas (elastic#9421).
The reason why we did not allow reads on shards that are in POST_RECOVERY is
that after relocating a shard might miss a refresh if the node that executed the
refresh is behind with cluster state processing. If that happens, a user might execute
index/refresh/search but still not find the document that was indexed.

We changed how refresh works now in elastic#13068 to make sure that shards cannot miss a refresh this
way by sending refresh requests the same way that we send write requests.

This commit changes IndexShard to allow reads on POST_RECOVERY now.
In addition it adds two test:

- test for issue elastic#9421 (After relocation shards might temporarily not be searchable if still in POST_RECOVERY)
- test for visibility issue with relocation and refresh if reads allowed when shard is in POST_RECOVERY

closes elastic#9421
brwe added a commit that referenced this issue Sep 2, 2015
Currently, we do not allow reads on shards which are in POST_RECOVERY which
unfortunately can cause search failures on shards which just recovered if there no replicas (#9421).
The reason why we did not allow reads on shards that are in POST_RECOVERY is
that after relocating a shard might miss a refresh if the node that executed the
refresh is behind with cluster state processing. If that happens, a user might execute
index/refresh/search but still not find the document that was indexed.

We changed how refresh works now in #13068 to make sure that shards cannot miss a refresh this
way by sending refresh requests the same way that we send write requests.

This commit changes IndexShard to allow reads on POST_RECOVERY now.
In addition it adds two test:

- test for issue #9421 (After relocation shards might temporarily not be searchable if still in POST_RECOVERY)
- test for visibility issue with relocation and refresh if reads allowed when shard is in POST_RECOVERY

closes #9421
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging a pull request may close this issue.

5 participants