
tribe node: failed to send join request to master #15373

Closed
rogerwelin opened this issue Dec 10, 2015 · 10 comments
@rogerwelin

I'm running 2 clusters and cannot get a tribe node to connect to them. I'm running ES 2.1 on every ES node, including the tribe node.

ClusterA and clusterB each have only one node.

ClusterA:
cluster.name: clusterA
node.name: nodeA
node.master: true
node.data: true
network.host: 0.0.0.0
http.port: 9200
transport.tcp.port: 9300
discovery.zen.ping.multicast.enabled: false
discovery.zen.ping.unicast.hosts: ["dns-name-for-clusterA:9300"]

Configuration for the tribe node (for simplicity's sake, I'm only trying to connect to clusterA here):

cluster.name: elasticsearch-tribe
node.name: tribe-node
tribe.t1.path.conf: /etc/elasticsearch/tribe.yml #14573
tribe.t1.cluster.name: clusterA
tribe.t1.discovery.zen.multicast.enabled: false
tribe.t1.discovery.zen.ping.unicast.hosts: ["dns-name-for-clusterA:9300"]
network.host: 0.0.0.0
http.port: 9201
transport.tcp.port: 9301

When starting both tribe and cluster this is shown in the ES tribe log:

[2015-12-10 16:30:29,576][INFO ][discovery ] [tribe-node/t1] clusterA/bcjwWrFiRw6ixzNxW9pLRA
[2015-12-10 16:30:48,723][INFO ][discovery.zen ] [tribe-node/t1] failed to send join request to master [{....}{R-LGKQuaRaObh89PJbmPig}{ip-number}{ip-number:9300}{max_local_storage_nodes=1, master=true}], reason [RemoteTransportException[[dns-name][10.85.96.85:9300][internal:discovery/zen/join]]; nested: IllegalStateException[Node [{dns-name}{R-LGKQuaRaObh89PJbmPig}{ip-number}{ip-number:9300}{max_local_storage_nodes=1, master=true}] not master for join request]; ]
[2015-12-10 16:30:59,577][WARN ][discovery ] [tribe-node/t1] waited for 30s and no initial state was set by the discovery
[2015-12-10 16:30:59,579][INFO ][node ] [tribe-node/t1] started
[2015-12-10 16:30:59,579][INFO ][node ] [tribe-node] started
[2015-12-10 16:32:25,784][INFO ][discovery.zen ] [tribe-node/t1] failed to send join request to master [{ip-number}{R-LGKQuaRaObh89PJbmPig}{ip-number}{ip-number:9300}{max_local_storage_nodes=1, master=true}], reason [RemoteTransportException[[dns-name][ipnumber:9300][internal:discovery/zen/join]]; nested: IllegalStateException[Node [{dnsname}{R-LGKQuaRaObh89PJbmPig}{ipnumber}{ipnumber:9300}{max_local_storage_nodes=1, master=true}] not master for join request]; ]

And in clusterA's log, it adds and removes the tribe node repeatedly:

[2015-12-10 16:32:28,807][INFO ][cluster.service ] [dnsname] new_master {dnsname}{R-LGKQuaRaObh89PJbmPig}{ip-number}{ip-number:9300}{max_local_storage_nodes=1, master=true}, r
eason: zen-disco-join(elected_as_master, [0] joins received)
[2015-12-10 16:32:28,849][INFO ][cluster.service ] [dnsname] added {{tribe-node/t1}{bcjwWrFiRw6ixzNxW9pLRA}{127.0.0.1}{127.0.0.1:9300}{data=false, client=true},}, reason: zen-disco-join(join from
node[{tribe-node/t1}{bcjwWrFiRw6ixzNxW9pLRA}{127.0.0.1}{127.0.0.1:9300}{data=false, client=true}])
[2015-12-10 16:32:58,854][WARN ][discovery.zen.publish ] [dnsname] timed out waiting for all nodes to process published state [14](timeout [30s], pending nodes: [{tribe-node/t1}{bcjwWrFiRw6ixzNxW9pLRA
}{127.0.0.1}{127.0.0.1:9300}{data=false, client=true}])
[2015-12-10 16:32:58,857][WARN ][cluster.service ] [dnsname] cluster state update task [zen-disco-join(join from node[{tribe-node/t1}{bcjwWrFiRw6ixzNxW9pLRA}{127.0.0.1}{127.0.0.1:9300}{data=false
, client=true}])] took 30s above the warn threshold of 30s
[2015-12-10 16:32:58,860][INFO ][cluster.service ] [dnsname] removed {{tribe-node/t1}{bcjwWrFiRw6ixzNxW9pLRA}{127.0.0.1}{127.0.0.1:9300}{data=false, client=true},}, reason: zen-disco-node_failed(
{tribe-node/t1}{bcjwWrFiRw6ixzNxW9pLRA}{127.0.0.1}{127.0.0.1:9300}{data=false, client=true}), reason failed to ping, tried [3] times, each with maximum [30s] timeout

What am I doing wrong here? I've tried several different configuration changes but cannot get it to work. I also had no problem with the exact same setup with ES 1.4.4.

@ywelsch
Contributor

ywelsch commented Dec 13, 2015

Looking at the following line in the log of cluster A

[2015-12-10 16:32:28,849][INFO ][cluster.service ] [dnsname] added {{tribe-node/t1}{bcjwWrFiRw6ixzNxW9pLRA}{127.0.0.1}{127.0.0.1:9300}{data=false, client=true},}, reason: zen-disco-join(join from
node[{tribe-node/t1}{bcjwWrFiRw6ixzNxW9pLRA}{127.0.0.1}{127.0.0.1:9300}{data=false, client=true}])

it seems that tribe-node/t1 has 127.0.0.1:9300 as its publish address. Cluster A cannot reach the tribe node at that address (as I assume it's on a different host).

As of ES 2.0, Elasticsearch binds only to localhost by default (see https://www.elastic.co/blog/elasticsearch-unplugged). The solution is to set network.host for the tribe node (tribe.t1.network.host); it does not inherit the top-level network.host setting.
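A minimal sketch of the suggested change to the tribe node's elasticsearch.yml, based on the original poster's config (the angle-bracket value is a placeholder for the tribe node's actual reachable address, not a setting from this thread):

cluster.name: elasticsearch-tribe
node.name: tribe-node
tribe.t1.cluster.name: clusterA
tribe.t1.discovery.zen.ping.unicast.hosts: ["dns-name-for-clusterA:9300"]
tribe.t1.network.host: <tribe node's reachable IP>    # not inherited from top-level network.host
network.host: 0.0.0.0
http.port: 9201
transport.tcp.port: 9301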

@clintongormley

network.host: 0.0.0.0 causes it to bind to localhost.

@clintongormley

Whoops - that is incorrect. 0.0.0.0 should bind to all available interfaces, I think. Hopefully this will be fixed by #15300.

Could you let us know if it does once 2.1.1 is out?

@Zenexer

Zenexer commented Dec 20, 2015

Change to:

network.bind_host: 0.0.0.0
network.publish_host: A.SPECIFIC.IP.ADDRESS

This works, but it's unintuitive, and it's a major change from 1.x that isn't well publicized.

@thn-dev

thn-dev commented Dec 31, 2015

With ES v2.1.1, I had to do the following to get the tribe node talking to two different clusters, cluster A and cluster B:

# tribe node's configuration (elasticsearch.yml)
network.host: 0.0.0.0
transport.tcp.port: 9300
http.port: 9200
http.enabled: true

tribe.t1.cluster.name:
tribe.t1.discovery.zen.ping.unicast.hosts: <cluster A's master node>
tribe.t1.discovery.zen.ping.multicast.enabled: false
tribe.t1.path.conf: <valid path/to/conf>
tribe.t1.path.plugins: <valid path/to/plugin>
tribe.t1.network.bind_host: 0.0.0.0
tribe.t1.network.publish_host: <tribe node's IP>
tribe.t1.transport.tcp.port:

Repeat the same block with "t1" replaced by "t2" for cluster B, filling in the proper info for cluster B. Keep the tribe.t2.network.* settings the same, but use a different tribe.t2.transport.tcp.port value from t1 if you specify one.
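Spelled out, the cluster B block described above would look like the following sketch (angle-bracket values are placeholders to be filled in with cluster B's details):

tribe.t2.cluster.name: <cluster B's name>
tribe.t2.discovery.zen.ping.unicast.hosts: <cluster B's master node>
tribe.t2.discovery.zen.ping.multicast.enabled: false
tribe.t2.path.conf: <valid path/to/conf>
tribe.t2.path.plugins: <valid path/to/plugin>
tribe.t2.network.bind_host: 0.0.0.0
tribe.t2.network.publish_host: <tribe node's IP>     # same as tribe.t1.network.publish_host
tribe.t2.transport.tcp.port: <different port from t1, if specified>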

@clintongormley

The networking docs have been greatly improved. I don't think there is anything more to do here, so I'll close.

@chinmoydas1

@tri Nguyen
The post is helpful, but I am not able to fix the issue even with the suggested configuration. My questions are below:

  1. tribe.t1.discovery.zen.ping.unicast.hosts: <cluster A's master node> -- Is it cluster A's master node's host:port or just the node name?
  2. Are the following two lines mandatory in the tribe node's elasticsearch.yml:
    tribe.t1.path.conf: <valid path/to/conf>
    tribe.t1.path.plugins: <valid path/to/plugin>
  3. tribe.t1.transport.tcp.port: -- transport.tcp.port was mentioned above as 9300, what else can be mentioned for tribe.t1.transport.tcp.port?

@thn-dev

thn-dev commented May 9, 2016

Regarding tribe.t1.discovery.zen.ping.unicast.hosts: if you only provide cluster A's master node's hostname or IP address, it will try to connect using the default port number. I would specify both host and port here to make sure you have the correct settings.
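For example (the hostname is a placeholder; 9300 is Elasticsearch's default transport port, as used elsewhere in this thread):

tribe.t1.discovery.zen.ping.unicast.hosts: ["<cluster-A-master-host>:9300"]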

Regarding the *.path.* parameters: I did not have to set them when I was using v1.7.3, but I had to set them explicitly in v2.1.1 (even though I was told I don't need to set them). I have not tested v2.3.x, so I can't say much there, but I expect they fixed that bug, since it was reported as a bug at some point (if I remember correctly).

Regarding *.t1.transport.tcp.port, you can ignore this one.

@chinmoydas1

@tri Nguyen:
tribe.t1.path.conf: I hope this is the config folder, as I could not find any folder named conf in Elasticsearch 2.1.1.

@thn-dev

thn-dev commented May 10, 2016

Yes, it is. If you install ES using the .rpm file, by default it's in /etc/elasticsearch.

Here is the link for ES 2.1 that you should be using
https://www.elastic.co/guide/en/elasticsearch/reference/2.1/setup-dir-layout.html

Anyway, this ticket has been closed; I suggest you use the "discussion" board from ES. The link below from the discussion is somewhat related to what you are looking for:

https://discuss.elastic.co/t/tribe-node-connect-to-specific-ips/45721/10
