Recently we have hit a split-brain issue three times on my Elasticsearch cluster hosted on Azure VMs.
I have three nodes, and each node is both a data node and a master-eligible node. I have set discovery.zen.minimum_master_nodes to 2 and use the Azure discovery plugin.
Recently Azure has been doing frequent maintenance on its host machines, which causes network instability between the VMs and has triggered the split-brain issue in my Elasticsearch cluster.
We found that a node which has already joined one master can be forced to rejoin another master, and that is when the split-brain issue happens!
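For reference, the relevant part of my elasticsearch.yml looks roughly like the sketch below (cluster/node names and the Azure credential settings are left out, and exact option names may vary slightly by plugin version):

# every node is both master-eligible and a data node
node.master: true
node.data: true
# quorum for 3 master-eligible nodes: floor(3 / 2) + 1 = 2
discovery.zen.minimum_master_nodes: 2
# use the Azure discovery plugin for node discovery
discovery.type: azure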
From the logs we can see that:
09-22 16:38:27, node1 lost its connection.
09-22 16:45:15, node2 lost its connection, node3 became master (node1 had recovered by then).
09-22 16:45:24, node3 lost its connection, node2 became master.
09-22 16:45:43, node3 recovered and became master again!
Below are the errors and warnings from these three nodes when the split-brain issue happened:
[2015-09-22 16:45:43,126][WARN ][index.store ] [caps-prod-wus3] [0222d67c4146405497a70df65629e634][0] Can't open file to read checksums
java.io.FileNotFoundException: No such file [_3app.fdt]
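In case it helps anyone reproducing this, each node's view of the current master can be checked with the cat API (a sketch only; the IPs below are the node addresses from the logs above):

curl -s 'http://10.3.0.5:9200/_cat/master?v'   # caps-prod-wus1
curl -s 'http://10.3.0.4:9200/_cat/master?v'   # caps-prod-wus2
curl -s 'http://10.3.0.6:9200/_cat/master?v'   # caps-prod-wus3

During a split brain, these responses would not all name the same master node.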
I see. So I suspect this is #2488, which was fixed in 1.4. I suggest you upgrade (to 1.7.2) as soon as possible. Many, many things have been fixed since 1.3.2.
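As a quick check before and after the upgrade, the version each node is actually running can be read from the root endpoint (any of the node IPs from the logs works), e.g.:

curl -s 'http://10.3.0.4:9200/?pretty'
# look at version.number in the response; it should report 1.7.2 once the node is upgraded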
Hi all,
Here are the full errors and warnings from each of the three nodes when the split-brain issue happened:
Node 1 (search-prod-wus1):
[2015-09-22 16:45:15,618][INFO ][cluster.service ] [caps-prod-wus1] master {new [caps-prod-wus3][qQYWMttZScaqbAfBPkV5gw][search-prod-wbu][inet[/10.3.0.6:9300]], previous [caps-prod-wus2][9v_FW_4AQ7KQ4fA5CLPdTg][search-prod-wus][inet[/10.3.0.4:9300]]}, removed {[caps-prod-wus2][9v_FW_4AQ7KQ4fA5CLPdTg][search-prod-wus][inet[/10.3.0.4:9300]],}, reason: zen-disco-receive(from master [[caps-prod-wus3][qQYWMttZScaqbAfBPkV5gw][search-prod-wbu][inet[/10.3.0.6:9300]]])
[2015-09-22 16:45:20,566][WARN ][index.store ] [caps-prod-wus1] [32c3c289eef54e42be5913a63dfd280a][2] Can't open file to read checksums
java.io.FileNotFoundException: No such file [_89i0_es090_0.doc]
[2015-09-22 16:45:21,019][WARN ][index.store ] [caps-prod-wus1] [32c3c289eef54e42be5913a63dfd280a][2] Can't open file to read checksums
java.io.FileNotFoundException: No such file [_89i0_es090_0.doc]
[2015-09-22 16:45:24,364][INFO ][cluster.service ] [caps-prod-wus1] master {new [caps-prod-wus2][9v_FW_4AQ7KQ4fA5CLPdTg][search-prod-wus][inet[/10.3.0.4:9300]], previous [caps-prod-wus3][qQYWMttZScaqbAfBPkV5gw][search-prod-wbu][inet[/10.3.0.6:9300]]}, removed {[caps-prod-wus3][qQYWMttZScaqbAfBPkV5gw][search-prod-wbu][inet[/10.3.0.6:9300]],}, added {[caps-prod-wus2][9v_FW_4AQ7KQ4fA5CLPdTg][search-prod-wus][inet[/10.3.0.4:9300]],}, reason: zen-disco-receive(from master [[caps-prod-wus2][9v_FW_4AQ7KQ4fA5CLPdTg][search-prod-wus][inet[/10.3.0.4:9300]]])
[2015-09-22 16:45:43,059][INFO ][cluster.service ] [caps-prod-wus1] master {new [caps-prod-wus3][qQYWMttZScaqbAfBPkV5gw][search-prod-wbu][inet[/10.3.0.6:9300]], previous [caps-prod-wus2][9v_FW_4AQ7KQ4fA5CLPdTg][search-prod-wus][inet[/10.3.0.4:9300]]}, removed {[caps-prod-wus2][9v_FW_4AQ7KQ4fA5CLPdTg][search-prod-wus][inet[/10.3.0.4:9300]],}, added {[caps-prod-wus3][qQYWMttZScaqbAfBPkV5gw][search-prod-wbu][inet[/10.3.0.6:9300]],}, reason: zen-disco-receive(from master [[caps-prod-wus3][qQYWMttZScaqbAfBPkV5gw][search-prod-wbu][inet[/10.3.0.6:9300]]])
Node 2 (search-prod-wus2):
[2015-09-22 16:38:37,162][WARN ][action.index ] [caps-prod-wus2] Failed to perform index on remote replica [caps-prod-wus1][vcBwbCeJTQ61Mw2aOqaflg][search-prod-wbp][inet[/10.3.0.5:9300]][32c3c289eef54e42be5913a63dfd280a][3]
org.elasticsearch.transport.NodeDisconnectedException: [caps-prod-wus1][inet[/10.3.0.5:9300]][index/replica] disconnected
[2015-09-22 16:38:37,162][WARN ][cluster.action.shard ] [caps-prod-wus2] [32c3c289eef54e42be5913a63dfd280a][3] sending failed shard for [32c3c289eef54e42be5913a63dfd280a][3], node[vcBwbCeJTQ61Mw2aOqaflg], [R], s[STARTED], indexUUID [vLXa7OmETyGPGrI82L4SVg], reason [Failed to perform [index] on replica, message [NodeDisconnectedException[[caps-prod-wus1][inet[/10.3.0.5:9300]][index/replica] disconnected]]]
[2015-09-22 16:38:37,162][WARN ][action.index ] [caps-prod-wus2] Failed to perform index on remote replica [caps-prod-wus1][vcBwbCeJTQ61Mw2aOqaflg][search-prod-wbp][inet[/10.3.0.5:9300]][32c3c289eef54e42be5913a63dfd280a][3]
org.elasticsearch.transport.SendRequestTransportException: [caps-prod-wus1][inet[/10.3.0.5:9300]][index/replica]
[2015-09-22 16:38:37,162][WARN ][cluster.action.shard ] [caps-prod-wus2] [32c3c289eef54e42be5913a63dfd280a][3] sending failed shard for [32c3c289eef54e42be5913a63dfd280a][3], node[vcBwbCeJTQ61Mw2aOqaflg], [R], s[STARTED], indexUUID [vLXa7OmETyGPGrI82L4SVg], reason [Failed to perform [index] on replica, message [SendRequestTransportException[[caps-prod-wus1][inet[/10.3.0.5:9300]][index/replica]]; nested: NodeNotConnectedException[[caps-prod-wus1][inet[/10.3.0.5:9300]] Node not connected]; ]]
[2015-09-22 16:45:24,005][INFO ][cluster.service ] [caps-prod-wus2] removed {[caps-prod-wus3][qQYWMttZScaqbAfBPkV5gw][search-prod-wbu][inet[/10.3.0.6:9300]],}, reason: zen-disco-node_failed([caps-prod-wus3][qQYWMttZScaqbAfBPkV5gw][search-prod-wbu][inet[/10.3.0.6:9300]]), reason transport disconnected (with verified connect)
Node 3 (search-prod-wus3):
[2015-09-22 16:38:38,677][WARN ][action.index ] [caps-prod-wus3] Failed to perform index on remote replica [caps-prod-wus1][vcBwbCeJTQ61Mw2aOqaflg][search-prod-wbp][inet[/10.3.0.5:9300]][32c3c289eef54e42be5913a63dfd280a][2]
org.elasticsearch.transport.SendRequestTransportException: [caps-prod-wus1][inet[/10.3.0.5:9300]][index/replica]
[2015-09-22 16:45:15,156][INFO ][discovery.azure ] [caps-prod-wus3] master_left [[caps-prod-wus2][9v_FW_4AQ7KQ4fA5CLPdTg][search-prod-wus][inet[/10.3.0.4:9300]]], reason [transport disconnected (with verified connect)]
[2015-09-22 16:45:15,156][INFO ][cluster.service ] [caps-prod-wus3] master {new [caps-prod-wus3][qQYWMttZScaqbAfBPkV5gw][search-prod-wbu][inet[/10.3.0.6:9300]], previous [caps-prod-wus2][9v_FW_4AQ7KQ4fA5CLPdTg][search-prod-wus][inet[/10.3.0.4:9300]]}, removed {[caps-prod-wus2][9v_FW_4AQ7KQ4fA5CLPdTg][search-prod-wus][inet[/10.3.0.4:9300]],}, reason: zen-disco-master_failed ([caps-prod-wus2][9v_FW_4AQ7KQ4fA5CLPdTg][search-prod-wus][inet[/10.3.0.4:9300]])
[2015-09-22 16:45:43,126][WARN ][index.store ] [caps-prod-wus3] [0222d67c4146405497a70df65629e634][0] Can't open file to read checksums
java.io.FileNotFoundException: No such file [_3app.fdt]