
Error in cluster deployment: failed to send join request to master #25305

Closed
tianmingxing opened this issue Jun 20, 2017 · 12 comments
Comments

@tianmingxing

I have three virtual machines (192.168.245.128, 192.168.245.129, 192.168.245.130), each running ES 5.1.2. While configuring the cluster I ran into errors, and the errors on all three machines are similar.

The three machines can ping each other, and telnet between them works.
Each machine logs the following:

192.168.245.128

[2017-06-20T10:04:25,434][INFO ][o.e.t.TransportService   ] [node-1] publish_address {192.168.245.128:9300}, bound_addresses {192.168.245.128:9300}
[2017-06-20T10:04:25,440][INFO ][o.e.b.BootstrapCheck     ] [node-1] bound or publishing to a non-loopback or non-link-local address, enforcing bootstrap checks
[2017-06-20T10:04:33,811][INFO ][o.e.d.z.ZenDiscovery     ] [node-1] failed to send join request to master [{node-2}{X-m7gPTMQn2TsdlByavfEg}{S2ucttQDSXCqLjcyi7wjKA}{192.168.245.129}{192.168.245.129:9300}], reason [RemoteTransportException[[node-2][192.168.245.129:9300][internal:discovery/zen/join]]; nested: NotMasterException[Node [{node-2}{X-m7gPTMQn2TsdlByavfEg}{S2ucttQDSXCqLjcyi7wjKA}{192.168.245.129}{192.168.245.129:9300}] not master for join request]; ], tried [3] times

192.168.245.129

[2017-06-20T10:04:30,429][INFO ][o.e.t.TransportService   ] [node-2] publish_address {192.168.245.129:9300}, bound_addresses {192.168.245.129:9300}
[2017-06-20T10:04:30,435][INFO ][o.e.b.BootstrapCheck     ] [node-2] bound or publishing to a non-loopback or non-link-local address, enforcing bootstrap checks
[2017-06-20T10:04:33,813][INFO ][o.e.d.z.ZenDiscovery     ] [node-2] failed to send join request to master [{node-1}{X-m7gPTMQn2TsdlByavfEg}{YFmPk0dUTh-S1Ef9of5RTA}{192.168.245.128}{192.168.245.128:9300}], reason [RemoteTransportException[[node-1][192.168.245.128:9300][internal:discovery/zen/join]]; nested: NotMasterException[Node [{node-1}{X-m7gPTMQn2TsdlByavfEg}{YFmPk0dUTh-S1Ef9of5RTA}{192.168.245.128}{192.168.245.128:9300}] not master for join request]; ], tried [3] times

192.168.245.130

[2017-06-20T10:04:33,983][INFO ][o.e.t.TransportService   ] [node-3] publish_address {192.168.245.130:9300}, bound_addresses {192.168.245.130:9300}
[2017-06-20T10:04:33,991][INFO ][o.e.b.BootstrapCheck     ] [node-3] bound or publishing to a non-loopback or non-link-local address, enforcing bootstrap checks
[2017-06-20T10:04:37,354][INFO ][o.e.d.z.ZenDiscovery     ] [node-3] failed to send join request to master [{node-1}{X-m7gPTMQn2TsdlByavfEg}{YFmPk0dUTh-S1Ef9of5RTA}{192.168.245.128}{192.168.245.128:9300}], reason [RemoteTransportException[[node-1][192.168.245.128:9300][internal:discovery/zen/join]]; nested: NotMasterException[Node [{node-1}{X-m7gPTMQn2TsdlByavfEg}{YFmPk0dUTh-S1Ef9of5RTA}{192.168.245.128}{192.168.245.128:9300}] not master for join request]; ], tried [3] times

Here is the configuration file from one of the machines:

cluster.name: my-test
node.name: node-1
network.host: 192.168.245.128
discovery.zen.ping.unicast.hosts: ["192.168.245.129", "192.168.245.130"]
discovery.zen.minimum_master_nodes: 2
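
For comparison, a symmetrical three-node setup gives every node the same `cluster.name` and the same unicast host list; listing all master-eligible nodes, including the local one, is common and harmless. A sketch of what the second machine's elasticsearch.yml might look like (the node name and file contents below are an assumption following the pattern above, not the poster's actual file):

```yaml
# Hypothetical elasticsearch.yml for the second machine (192.168.245.129).
cluster.name: my-test                  # must be identical on every node
node.name: node-2
network.host: 192.168.245.129
# Listing all three nodes (including this one) is common and harmless:
discovery.zen.ping.unicast.hosts: ["192.168.245.128", "192.168.245.129", "192.168.245.130"]
discovery.zen.minimum_master_nodes: 2  # floor(3 / 2) + 1 for three master-eligible nodes
```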
@jasontedor
Member

Elastic provides a forum for asking general questions and instead prefers to use GitHub only for verified bug reports and feature requests. There's an active community there that should be able to help get an answer to your question. As such, I hope you don't mind that I close this.

@tianmingxing
Author

@tvernum
Contributor

tvernum commented Jun 21, 2017

@xiaoxing598 You asked that 1 day ago. The forums are offered on a best-efforts basis, and it sometimes takes a few days before anyone is available to respond.

In any case, the people who read and process these github issues are also active on the forums. If we haven't had time to answer it on the forum, then you're not going to improve the situation by ignoring our issue guidelines and asking it here.

@tianmingxing
Author

@tvernum I'm sorry for the inconvenience; I just need to finish this task for my boss.

@mlasak

mlasak commented Mar 14, 2018

Awesome. I've now stumbled upon a similar problem (using ec2-discovery). On the forum the discussion about this was closed unanswered. OK, let's see how many hours I'll spend on this :(

@dadoonet
Member

@mlasak Please open a discussion in the forum and we can probably help there.

@mlasak

mlasak commented Mar 14, 2018

OK, I'll do that. Meanwhile I found the cause of the issue. I'll document the problem and the solution on the forum so people can benefit from it. Thanks.

@bobby259

@mlasak Can you share how you solved it? or, the link to the forum thread where you posted the solution already?

@mlasak

mlasak commented Apr 24, 2018

@bobby259 so basically the issue was as follows (issue marked in bold):

  • we set up images on AWS (one image for the master/ingest nodes, one for the data nodes), all using the discovery-ec2 plug-in
  • we tested the images and built AMIs for autoscaling groups
  • after launching 3 master and 3 data nodes, the data nodes were fine, but **the master nodes still contained the old /data folder from the testing step** (during testing the nodes had not been reachable by the data nodes due to a wrong security group, etc.)

So the fix is: if you encounter this problem, shut down your cluster's master nodes and delete their /data folders. Then start the cluster nodes again and it works!

Hope this helps you! Please comment/confirm.
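
Note that this is consistent with the original poster's logs, where node-1 and node-2 report the same node ID (X-m7gPTMQn2TsdlByavfEg), as happens when the data directory is cloned along with a VM image. The fix might be sketched as follows; the install path and variable name are assumptions for a default tar.gz layout, so adjust them to your setup:

```shell
# Sketch of the fix, assuming a tar.gz install under $ES_HOME (an assumption).
ES_HOME="${ES_HOME:-/opt/elasticsearch-5.1.2}"

# 1. Stop the master node (via your service manager, or by stopping the
#    Elasticsearch Java process).
# 2. Remove the stale node state baked into the cloned image/AMI; this is
#    what carries the duplicated node ID between clones:
rm -rf "$ES_HOME/data/nodes"

# 3. Start the node again; it generates a fresh node ID on first boot.
```

On a package (deb/rpm) install, the data directory is elsewhere (commonly /var/lib/elasticsearch), so check `path.data` in your elasticsearch.yml before deleting anything.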

@bobby259

@mlasak Thank you for the quick response.

@te701

te701 commented May 9, 2018

@mlasak many thanks for the information: I had copied a test environment, including the data folder, to prepare the production system, and the cluster wasn't working.
Removing the data folder and restarting Elasticsearch did the job.

@NoviceZeng

> @bobby259 so basically the issue was as follows (issue marked in bold):
>
>   • we set up images on AWS (one image for the master/ingest nodes, one for the data nodes), all using the discovery-ec2 plug-in
>   • we tested the images and built AMIs for autoscaling groups
>   • after launching 3 master and 3 data nodes, the data nodes were fine, but **the master nodes still contained the old /data folder from the testing step** (during testing the nodes had not been reachable by the data nodes due to a wrong security group, etc.)
>
> So the fix is: if you encounter this problem, shut down your cluster's master nodes and delete their /data folders. Then start the cluster nodes again and it works!
>
> Hope this helps you! Please comment/confirm.

Thanks, this solved my problem: I had copied my VM from node-1 to node-2, and after I deleted the copied data it's OK now.
