We recently upgraded one of our ES Clusters from ES Version 1.1.0 to 1.4.1.
We run dedicated master, data, and search nodes in AWS. Cluster settings are the same for all clusters.
Strangely, in only one cluster (6 nodes), nodes are constantly failing to connect to the master node and then rejoining.
It happens all the time, even during idle periods (when there are no reads or writes).
We keep seeing the following exception in the logs: org.elasticsearch.transport.NodeNotConnectedException
Because of this, the cluster has slowed down considerably.
We use the kopf plugin for monitoring, and it keeps popping up the message "Loading cluster information is taking too long".
There is not much data on individual nodes; almost 80% of the disk is free. CPU and heap are doing fine.
The only difference between this cluster and the others is the number of indices and shards. The other clusters have shards in the hundreds and indices in the double digits.
This cluster, however, has around 5000 shards and close to 250 indices.
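For anyone who wants to compare shard and index counts across clusters, here is a rough sketch that tallies them from the plain-text output of `GET _cat/shards`. It assumes the default column order (index, shard, prirep, state, ...) and that the output has already been saved as a string; the function name `summarize_cat_shards` and the sample rows are made up for illustration, not taken from our cluster.

```python
from collections import Counter


def summarize_cat_shards(text):
    """Tally shards and indices from `_cat/shards` plain-text output.

    Assumes the default columns, where the first four fields are
    index, shard, prirep, state. Later columns are ignored because
    node names may contain spaces.
    """
    per_index = Counter()
    per_state = Counter()
    for line in text.splitlines():
        cols = line.split()
        if len(cols) < 4:
            continue  # skip blank or malformed lines
        index, state = cols[0], cols[3]
        per_index[index] += 1
        per_state[state] += 1
    return {
        "shards": sum(per_index.values()),
        "indices": len(per_index),
        "by_state": dict(per_state),
    }


if __name__ == "__main__":
    # Hypothetical sample rows, for illustration only.
    sample = (
        "logs-a 0 p STARTED 10 1kb 10.0.0.1 node-1\n"
        "logs-a 0 r UNASSIGNED\n"
        "logs-b 0 p STARTED 5 1kb 10.0.0.2 node-2\n"
    )
    print(summarize_cat_shards(sample))
```

Note that each row in `_cat/shards` is one shard copy, so the total counts primaries and replicas together.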
Has there been any change in 1.4.1 that could cause reconnection issues between nodes when the number of shards or indices is high?
Any help will be appreciated!
PS. After rolling back the cluster to version 1.3.2, things are back to normal.
Thanks,
Does the problem disappear if you stop monitoring the cluster, i.e. no Marvel, no kopf, and no other tool that requests nodes info or stats? I'm wondering if it is related to this: #9683