elasticsearch fails to start tribe node #14573

Closed
kt97679 opened this issue Nov 6, 2015 · 39 comments · Fixed by #15300
Labels: >bug, :Delivery/Packaging, Team:Delivery, v2.1.1, v2.2.0, v5.0.0-alpha1

@kt97679

kt97679 commented Nov 6, 2015

Hi folks,

I'm trying to start tribe node using following config:

transport.tcp.port: 9301
http.port: 9201
network.host: 0.0.0.0
path.data: /var/lib/elasticsearch/
path.logs: /var/log/elasticsearch/

tribe:
    kibana:
        cluster.name: logstash-kibana
        discovery.zen.ping.multicast.enabled: false
        discovery.zen.ping.unicast.hosts: ["127.0.0.1"]
    els:
        cluster.name: logstash-data
        discovery.zen.ping.multicast.enabled: false
        discovery.zen.ping.unicast.hosts: ["10.128.69.48", "10.128.75.237"]

This config resides in the file /etc/tribe-elasticsearch/elasticsearch.yml. I'm starting it using the following command:

sudo -u elasticsearch /usr/share/elasticsearch/bin/elasticsearch -Ddefault.path.conf=/etc/tribe-elasticsearch/

Elasticsearch fails with the following output:

[2015-11-05 17:07:42,433][INFO ][node                     ] [Bucky] version[2.0.0], pid[25943], build[de54438/2015-10-22T08:09:48Z]
[2015-11-05 17:07:42,434][INFO ][node                     ] [Bucky] initializing ...
[2015-11-05 17:07:42,596][INFO ][plugins                  ] [Bucky] loaded [], sites []
Exception in thread "main" java.security.AccessControlException: access denied ("java.io.FilePermission" "/usr/share/elasticsearch/config/elasticsearch.yml" "read")
        at java.security.AccessControlContext.checkPermission(AccessControlContext.java:457)
        at java.security.AccessController.checkPermission(AccessController.java:884)
        at java.lang.SecurityManager.checkPermission(SecurityManager.java:549)
        at java.lang.SecurityManager.checkRead(SecurityManager.java:888)
        at sun.nio.fs.UnixPath.checkRead(UnixPath.java:795)
        at sun.nio.fs.UnixFileSystemProvider.checkAccess(UnixFileSystemProvider.java:290)
        at java.nio.file.Files.exists(Files.java:2385)
        at org.elasticsearch.node.internal.InternalSettingsPreparer.prepareEnvironment(InternalSettingsPreparer.java:87)
        at org.elasticsearch.node.Node.<init>(Node.java:128)
        at org.elasticsearch.node.NodeBuilder.build(NodeBuilder.java:145)
        at org.elasticsearch.tribe.TribeService.<init>(TribeService.java:136)
        at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
        at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
        at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
        at java.lang.reflect.Constructor.newInstance(Constructor.java:422)
        at <<<guice>>>
        at org.elasticsearch.node.Node.<init>(Node.java:198)
        at org.elasticsearch.node.NodeBuilder.build(NodeBuilder.java:145)
        at org.elasticsearch.bootstrap.Bootstrap.setup(Bootstrap.java:170)
        at org.elasticsearch.bootstrap.Bootstrap.init(Bootstrap.java:270)
        at org.elasticsearch.bootstrap.Elasticsearch.main(Elasticsearch.java:35)

I'm not sure why it tries to access /usr/share/elasticsearch/config/elasticsearch.yml. There is no such file in the elasticsearch deb package. I created this file, but the command above still fails with the same output. Please advise how this can be resolved.

I'm running Elasticsearch 2.0.0 installed from the Debian package downloaded from the official site. I'm using Ubuntu 14.

Thanks,
Kirill.

@erwan-koffi mentioned this issue Nov 7, 2015
@clintongormley

You're specifying the custom config file location incorrectly.

See https://www.elastic.co/guide/en/elasticsearch/reference/current/breaking_20_setting_changes.html#_custom_config_file
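
For reference, in 2.x the custom config location is a directory passed with --path.conf on the command line; for example, adapting the command from the report above:

sudo -u elasticsearch /usr/share/elasticsearch/bin/elasticsearch --path.conf=/etc/tribe-elasticsearch/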

@kt97679
Author

kt97679 commented Nov 9, 2015

Hi @clintongormley, thanks for the quick response. As you can see from the description I provided, I was using the -Ddefault.path.conf option. I tried the same command again with the --path.conf option. There was no longer an exception about config access, but I also had to specify --path.data and --path.logs, because for some reason those settings from my config were ignored. My config also specifies nonstandard ports, and those settings are not used either. Any advice on what could be wrong?

Thanks,
Kirill.

@kt97679
Author

kt97679 commented Nov 9, 2015

Looks like the config is ignored completely. If I specify all options via the command line, I still get the same exception as above:

# sudo -u elasticsearch /usr/share/elasticsearch/bin/elasticsearch --path.conf=/etc/tribe-elasticsearch/ --path.logs=/var/log/elasticsearch --path.data=/var/lib/elasticsearch/ --transport.tcp.port=9301 --http.port=9201 --network.host=0.0.0.0 --tribe.els.cluster.name=logstash-data --tribe.els.discovery.zen.ping.multicast.enabled=false --tribe.els.discovery.zen.ping.unicast.hosts=["10.128.69.48","10.128.75.237"]
log4j:WARN No appenders could be found for logger (bootstrap).
log4j:WARN Please initialize the log4j system properly.
log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more info.
Exception in thread "main" java.security.AccessControlException: access denied ("java.io.FilePermission" "/usr/share/elasticsearch/config/elasticsearch.yml" "read")
        at java.security.AccessControlContext.checkPermission(AccessControlContext.java:457)
        at java.security.AccessController.checkPermission(AccessController.java:884)
        at java.lang.SecurityManager.checkPermission(SecurityManager.java:549)
        at java.lang.SecurityManager.checkRead(SecurityManager.java:888)
        at sun.nio.fs.UnixPath.checkRead(UnixPath.java:795)
        at sun.nio.fs.UnixFileSystemProvider.checkAccess(UnixFileSystemProvider.java:290)
        at java.nio.file.Files.exists(Files.java:2385)
        at org.elasticsearch.node.internal.InternalSettingsPreparer.prepareEnvironment(InternalSettingsPreparer.java:87)
        at org.elasticsearch.node.Node.<init>(Node.java:128)
        at org.elasticsearch.node.NodeBuilder.build(NodeBuilder.java:145)
        at org.elasticsearch.tribe.TribeService.<init>(TribeService.java:136)
        at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
        at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
        at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
        at java.lang.reflect.Constructor.newInstance(Constructor.java:422)
        at <<<guice>>>
        at org.elasticsearch.node.Node.<init>(Node.java:198)
        at org.elasticsearch.node.NodeBuilder.build(NodeBuilder.java:145)
        at org.elasticsearch.bootstrap.Bootstrap.setup(Bootstrap.java:170)
        at org.elasticsearch.bootstrap.Bootstrap.init(Bootstrap.java:270)
        at org.elasticsearch.bootstrap.Elasticsearch.main(Elasticsearch.java:35)

@clintongormley added the >bug, :Delivery/Packaging, and v2.1.0 labels Nov 17, 2015
@clintongormley

Thanks for persisting. I've managed to replicate this and it is indeed a bug.

When the tribe node attempts to instantiate a client node for the tribe service, that node checks for access to the config directory, but the path.conf setting is no longer available to it, so it falls back to the default config directory under path.home.

This can be replicated with a simple config file, saved as foo/elasticsearch.yml:

node.name: foo

tribe:
    foo:
        cluster.name: bar

Start elasticsearch as:

./elasticsearch-2.0.0/bin/elasticsearch --path.conf foo/

And it fails with:

[2015-11-17 13:54:47,763][INFO ][node                     ] [foo] version[2.0.0], pid[5940], build[de54438/2015-10-22T08:09:48Z]
[2015-11-17 13:54:47,763][INFO ][node                     ] [foo] initializing ...
[2015-11-17 13:54:47,836][INFO ][plugins                  ] [foo] loaded [], sites []
Exception in thread "main" java.security.AccessControlException: access denied ("java.io.FilePermission" "/Users/clinton/workspace/servers/elasticsearch-2.0.0/config/elasticsearch.yml" "read")
  at java.security.AccessControlContext.checkPermission(AccessControlContext.java:472)
  at java.security.AccessController.checkPermission(AccessController.java:884)
  at java.lang.SecurityManager.checkPermission(SecurityManager.java:549)
  at java.lang.SecurityManager.checkRead(SecurityManager.java:888)
  at sun.nio.fs.UnixPath.checkRead(UnixPath.java:795)
  at sun.nio.fs.UnixFileSystemProvider.checkAccess(UnixFileSystemProvider.java:290)
  at java.nio.file.Files.exists(Files.java:2385)
  at org.elasticsearch.node.internal.InternalSettingsPreparer.prepareEnvironment(InternalSettingsPreparer.java:87)
  at org.elasticsearch.node.Node.<init>(Node.java:128)
  at org.elasticsearch.node.NodeBuilder.build(NodeBuilder.java:145)
  at org.elasticsearch.tribe.TribeService.<init>(TribeService.java:136)
  at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
  at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
  at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
  at java.lang.reflect.Constructor.newInstance(Constructor.java:422)
  at <<<guice>>>
  at org.elasticsearch.node.Node.<init>(Node.java:198)
  at org.elasticsearch.node.NodeBuilder.build(NodeBuilder.java:145)
  at org.elasticsearch.bootstrap.Bootstrap.setup(Bootstrap.java:170)
  at org.elasticsearch.bootstrap.Bootstrap.init(Bootstrap.java:270)
  at org.elasticsearch.bootstrap.Elasticsearch.main(Elasticsearch.java:35)

@clintongormley

@javanna could you take a look at this please?

@javanna
Member

javanna commented Nov 17, 2015

I had a look at this. Only selected settings are forwarded from the tribe node to the inner tribe clients: path.home is one of them, but path.conf is not. That said, if I remember correctly the tribe clients shouldn't read from the configuration file (or sysprops) at all, but only inherit a few selected settings from the parent node (as happens in TribeService), something we enforced with #9721. I think something got lost with #13383, where loadConfigSettings was removed; that flag was our way to prevent loading anything from the config file. With it set to false I believe we wouldn't even check for the existence of the file, and thus wouldn't need any permission for it. At this point it seems we would have to forward path.conf to the tribe clients just because we check for its existence at some point, even though we have nothing to load from it (otherwise we check for path.home, which we have no permission for). I'd need @rjernst to verify whether this makes sense; it may be that I overlooked something.

@rjernst
Member

rjernst commented Nov 17, 2015

If I understand the tribe node correctly, it is no different than any other client node (well, creating multiple client nodes internally). So to me, it should be passing along any settings it needs to configure the node (including path.conf). However, I'm not sure what this has to do with the transport client? The transport client by definition now does not use the config file settings (and the stack trace shown above indicates the exception was from building a node, not a transport client).

@javanna
Member

javanna commented Nov 18, 2015

However, I'm not sure what this has to do with the transport client?

@rjernst it doesn't have to do directly with the transport client, but the inner tribe nodes have a similar requirement when it comes to loading from the config file. They should not read the config file but only inherit some selected settings from their "parent" node (the actual tribe node), and that is why we were previously setting loadConfigSettings to false, which has since been removed. If my analysis is correct, the security manager barfs because we check whether the config file exists while creating the inner client nodes as part of TribeService, even though we shouldn't need to read from that file at that point anyway. I could forward the path.conf setting to the client nodes too, but I feel it is not the right fix, given that we should not be reading from that file nor checking whether it exists. Not sure what the right fix is though.

@javanna
Member

javanna commented Nov 19, 2015

I looked deeper and can confirm this is not just a problem of passing the right path.conf to the inner nodes. The inner client nodes must not read from the main configuration file, something that was fixed in #9721. The option to skip loading config settings for a node was removed with #13383, though. I had expected TribeUnitTests to fail after that change, but unfortunately it doesn't. If you set, for instance, transport.tcp.port in the configuration file, the tribe node will get that port, but the inner nodes will try to get the same port too and will fail. The inner nodes should only get some selected settings from their parent node, but never read from the config file or system properties.
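
As a minimal illustration of that clash (hypothetical cluster names and an arbitrary port, just to show the shape of the problem):

transport.tcp.port: 9400

tribe:
    t1:
        cluster.name: cluster_one
    t2:
        cluster.name: cluster_two

The tribe node binds 9400 first; each inner client node then reads the same file, tries to bind 9400 as well, and fails.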

@ESamir

ESamir commented Nov 19, 2015

+1. Removing path.conf did not resolve the issue.

The config used:

bootstrap:
  mlockall: true
cluster:
  name: tribe.elk.h2.com
discovery:
  zen:
    minimum_master_nodes: 2
    ping:
      unicast:
        hosts:
             - h2-clt01
             - h2-clt02
             - h2-clt03
network:
  host: h2-clt01
node:
  data: false
  master: true
  name: h2-ct01-h2-ct01
path:
  data: /data/h2-ct01
tribe:
  h2:
    cluster:
      name: elk.h2.com
    discovery:
      zen:
        ping:
          unicast:
            hosts:
                 - h2-cm01
                 - h2-cm02
                 - h2-cm03
  h3:
    cluster:
      name: elk.h3.com
    discovery:
      zen:
        ping:
          unicast:
            hosts:
                 - h3-cm01
                 - h3-cm02
                 - h3-cm03

@clintongormley

There is a workaround for this bug. Assuming your tribe config directory is /etc/tribe/:

cd /etc
cp -a /etc/tribe /etc/tribe-client
echo "" > /etc/tribe-client/elasticsearch.yml
chown -R elasticsearch /etc/tribe-client

Then edit /etc/tribe/elasticsearch.yml and specify a path.conf for each tribe cluster, e.g.:

# arbitrary config
transport.tcp.port: 9301
http.port: 9201
network.host: 0.0.0.0
path.data: /var/lib/elasticsearch/
path.logs: /var/log/elasticsearch/

tribe:
    kibana:
        path.conf: /etc/tribe-client  ### ADD THIS LINE
        cluster.name: logstash-kibana
        discovery.zen.ping.multicast.enabled: false
        discovery.zen.ping.unicast.hosts: ["127.0.0.1"]
    els:
        path.conf: /etc/tribe-client  ### ADD THIS LINE
        cluster.name: logstash-data
        discovery.zen.ping.multicast.enabled: false
        discovery.zen.ping.unicast.hosts: ["10.128.69.48", "10.128.75.237"]

Then start elasticsearch as:

./bin/elasticsearch --path.conf /etc/tribe

The tribe node will use /etc/tribe/ as its config directory. It then starts a node client for each cluster, and each of those clients will use /etc/tribe-client as its config directory; but /etc/tribe-client/elasticsearch.yml is empty, so no settings will be loaded.

@javanna
Member

javanna commented Nov 19, 2015

The workaround above works; the only caveat is that, depending on where the additional empty configuration file is located, we might not have permission to read it. I think it should also work to simply add an empty configuration file under the tribe node's own config directory and point straight to it, specifying not just its parent directory but the complete path including the filename:

tribe.t1.path.conf: /path/to/config/tribe.yml
tribe.t2.path.conf: /path/to/config/tribe.yml

@ppf2
Member

ppf2 commented Nov 19, 2015

Ran into this last night when attempting to set up a tribe node on 2.0. This will also affect users who attempt to set a custom transport.tcp.port for the tribe node. In this case, setting a custom transport.tcp.port for the tribe node causes a misleading BindException[Address already in use] when the port specified is not actually already in use.

cluster.name: elasticsearch_2_0_0_tribe_cluster
network.host: 127.0.0.1
transport.tcp.port: 11111
node.name: tribe_cluster_node1
tribe:
  t1:
    cluster.name: elasticsearch_2_0_0_cluster1
  t2:
    cluster.name: elasticsearch_2_0_0_cluster2

Settings for the 2 clusters:

cluster.name: elasticsearch_2_0_0_cluster2
network.host: 127.0.0.1
transport.tcp.port: 9301
http.port: 9201
node.name: cluster2_node1

and

cluster.name: elasticsearch_2_0_0_cluster1
network.host: 127.0.0.1
transport.tcp.port: 9300
http.port: 9200
node.name: cluster1_node1

The problem is that the tribe node will not start up as long as I have the transport.tcp.port: 11111 in place. If I don't set a custom transport port for the tribe node, it starts up fine and can connect with the 2 clusters.

The following is the error that shows up when I attempt to set transport.tcp.port for the tribe node. Note that prior to starting the tribe node, I used lsof to confirm that there's no process on the machine using port 11111 (and it doesn't matter what port I set it to; as long as transport.tcp.port is set for the tribe node, it will throw the same bind exception).

[2015-11-19 01:09:12,816][DEBUG][discovery.zen.elect      ] [tribe_cluster_node1/t1] using minimum_master_nodes [-1]
[2015-11-19 01:09:12,816][DEBUG][discovery.zen.ping.unicast] [tribe_cluster_node1/t1] using initial hosts [127.0.0.1, [::1]], with concurrent_connects [10]
[2015-11-19 01:09:12,817][DEBUG][discovery.zen            ] [tribe_cluster_node1/t1] using ping.timeout [3s], join.timeout [1m], master_election.filter_client [true], master_election.filter_data [false]
[2015-11-19 01:09:12,817][DEBUG][discovery.zen.fd         ] [tribe_cluster_node1/t1] [master] uses ping_interval [1s], ping_timeout [30s], ping_retries [3]
[2015-11-19 01:09:12,817][DEBUG][discovery.zen.fd         ] [tribe_cluster_node1/t1] [node  ] uses ping_interval [1s], ping_timeout [30s], ping_retries [3]
[2015-11-19 01:09:12,820][DEBUG][script                   ] [tribe_cluster_node1/t1] using script cache with max_size [100], expire [null]
[2015-11-19 01:09:12,853][DEBUG][cluster.routing.allocation.decider] [tribe_cluster_node1/t1] using [cluster.routing.allocation.allow_rebalance] with [indices_all_active]
[2015-11-19 01:09:12,853][DEBUG][cluster.routing.allocation.decider] [tribe_cluster_node1/t1] using [cluster_concurrent_rebalance] with [2]
[2015-11-19 01:09:12,854][DEBUG][cluster.routing.allocation.decider] [tribe_cluster_node1/t1] using node_concurrent_recoveries [2], node_initial_primaries_recoveries [4]
[2015-11-19 01:09:12,855][DEBUG][gateway                  ] [tribe_cluster_node1/t1] using initial_shards [quorum]
[2015-11-19 01:09:12,885][DEBUG][indices.recovery         ] [tribe_cluster_node1/t1] using max_bytes_per_sec[40mb], concurrent_streams [3], file_chunk_size [512kb], translog_size [512kb], translog_ops [1000], and compress [true]
[2015-11-19 01:09:12,886][DEBUG][indices.store            ] [tribe_cluster_node1/t1] using indices.store.throttle.type [NONE], with index.store.throttle.max_bytes_per_sec [10gb]
[2015-11-19 01:09:12,886][DEBUG][indices.memory           ] [tribe_cluster_node1/t1] using indexing buffer size [99mb], with indices.memory.min_shard_index_buffer_size [4mb], indices.memory.max_shard_index_buffer_size [512mb], indices.memory.shard_inactive_time [5m], indices.memory.interval [30s]
[2015-11-19 01:09:12,887][DEBUG][indices.cache.query      ] [tribe_cluster_node1/t1] using [node] query cache with size [10%], actual_size [99mb], max filter count [1000]
[2015-11-19 01:09:12,887][DEBUG][indices.fielddata.cache  ] [tribe_cluster_node1/t1] using size [-1] [-1b], expire [null]
[2015-11-19 01:09:12,897][INFO ][node                     ] [tribe_cluster_node1/t1] initialized
[2015-11-19 01:09:12,906][INFO ][node                     ] [tribe_cluster_node1] initialized
[2015-11-19 01:09:12,907][INFO ][node                     ] [tribe_cluster_node1] starting ...
[2015-11-19 01:09:12,924][DEBUG][netty.channel.socket.nio.SelectorUtil] Using select timeout of 500
[2015-11-19 01:09:12,924][DEBUG][netty.channel.socket.nio.SelectorUtil] Epoll-bug workaround enabled = false
[2015-11-19 01:09:12,947][DEBUG][transport.netty          ] [tribe_cluster_node1] using profile[default], worker_count[8], port[11111], bind_host[null], publish_host[null], compress[false], connect_timeout[30s], connections_per_node[2/3/6/1/1], receive_predictor[512kb->512kb]
[2015-11-19 01:09:12,957][DEBUG][transport.netty          ] [tribe_cluster_node1] binding server bootstrap to: 127.0.0.1
[2015-11-19 01:09:12,985][DEBUG][transport.netty          ] [tribe_cluster_node1] Bound profile [default] to address {127.0.0.1:11111}
[2015-11-19 01:09:12,986][INFO ][transport                ] [tribe_cluster_node1] publish_address {127.0.0.1:11111}, bound_addresses {127.0.0.1:11111}
[2015-11-19 01:09:12,993][DEBUG][discovery.local          ] [tribe_cluster_node1] Connected to cluster [Cluster [elasticsearch_2_0_0_tribe_cluster]]
[2015-11-19 01:09:12,996][INFO ][discovery                ] [tribe_cluster_node1] elasticsearch_2_0_0_tribe_cluster/baK4hDMwRiaKGS5D8ivYng
[2015-11-19 01:09:12,996][WARN ][discovery                ] [tribe_cluster_node1] waited for 0s and no initial state was set by the discovery
[2015-11-19 01:09:12,996][DEBUG][gateway                  ] [tribe_cluster_node1] can't wait on start for (possibly) reading state from gateway, will do it asynchronously
[2015-11-19 01:09:13,010][DEBUG][http.netty               ] [tribe_cluster_node1] Bound http to address {127.0.0.1:22222}
[2015-11-19 01:09:13,011][INFO ][http                     ] [tribe_cluster_node1] publish_address {127.0.0.1:22222}, bound_addresses {127.0.0.1:22222}
[2015-11-19 01:09:13,011][INFO ][node                     ] [tribe_cluster_node1/t2] starting ...
[2015-11-19 01:09:13,016][DEBUG][transport.netty          ] [tribe_cluster_node1/t2] using profile[default], worker_count[8], port[11111], bind_host[null], publish_host[null], compress[false], connect_timeout[30s], connections_per_node[2/3/6/1/1], receive_predictor[512kb->512kb]
[2015-11-19 01:09:13,022][DEBUG][transport.netty          ] [tribe_cluster_node1/t2] binding server bootstrap to: 127.0.0.1
[2015-11-19 01:09:13,039][INFO ][node                     ] [tribe_cluster_node1/t2] stopping ...
[2015-11-19 01:09:13,041][INFO ][node                     ] [tribe_cluster_node1/t2] stopped
[2015-11-19 01:09:13,042][INFO ][node                     ] [tribe_cluster_node1/t2] closing ...
[2015-11-19 01:09:13,048][INFO ][node                     ] [tribe_cluster_node1/t2] closed
[2015-11-19 01:09:13,048][INFO ][node                     ] [tribe_cluster_node1/t1] closing ...
[2015-11-19 01:09:13,052][INFO ][node                     ] [tribe_cluster_node1/t1] closed
Exception in thread "main" BindTransportException[Failed to bind to [11111]]; nested: ChannelException[Failed to bind to: /127.0.0.1:11111]; nested: BindException[Address already in use];
Likely root cause: java.net.BindException: Address already in use
        at sun.nio.ch.Net.bind0(Native Method)
        at sun.nio.ch.Net.bind(Net.java:444)
        at sun.nio.ch.Net.bind(Net.java:436)
        at sun.nio.ch.ServerSocketChannelImpl.bind(ServerSocketChannelImpl.java:214)
        at sun.nio.ch.ServerSocketAdaptor.bind(ServerSocketAdaptor.java:74)
        at org.jboss.netty.channel.socket.nio.NioServerBoss$RegisterTask.run(NioServerBoss.java:193)
        at org.jboss.netty.channel.socket.nio.AbstractNioSelector.processTaskQueue(AbstractNioSelector.java:391)
        at org.jboss.netty.channel.socket.nio.AbstractNioSelector.run(AbstractNioSelector.java:315)
        at org.jboss.netty.channel.socket.nio.NioServerBoss.run(NioServerBoss.java:42)
        at org.jboss.netty.util.ThreadRenamingRunnable.run(ThreadRenamingRunnable.java:108)
        at org.jboss.netty.util.internal.DeadLockProofWorker$1.run(DeadLockProofWorker.java:42)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
        at java.lang.Thread.run(Thread.java:745)
Refer to the log for complete error details.
[2015-11-19 01:09:13,058][INFO ][node                     ] [tribe_cluster_node1] stopping ...
[2015-11-19 01:09:13,064][INFO ][node                     ] [tribe_cluster_node1] stopped
[2015-11-19 01:09:13,064][INFO ][node                     ] [tribe_cluster_node1] closing ...
[2015-11-19 01:09:13,066][INFO ][node                     ] [tribe_cluster_node1] closed

Note that I cannot reproduce this on 1.7.2. On 1.7.2, I can set up a custom transport.tcp.port for the tribe node and it will start up fine.

@javanna
Member

javanna commented Nov 19, 2015

@ppf2 this happens because the tribe node process starts three nodes: the first one gets the configured port, and the second tries to bind the same port because it reads the same configuration file. The workaround provided by Clint above should work until we fix this properly.
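
Applied to ppf2's configuration above, the workaround would look roughly like this, where /etc/tribe-client is a placeholder for a directory containing an empty elasticsearch.yml, as in Clint's earlier comment:

cluster.name: elasticsearch_2_0_0_tribe_cluster
network.host: 127.0.0.1
transport.tcp.port: 11111
node.name: tribe_cluster_node1
tribe:
  t1:
    path.conf: /etc/tribe-client
    cluster.name: elasticsearch_2_0_0_cluster1
  t2:
    path.conf: /etc/tribe-client
    cluster.name: elasticsearch_2_0_0_cluster2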

@rjernst
Member

rjernst commented Nov 19, 2015

@javanna I am going to explore having the tribe node use its own subclass of Node which can customize this single behavior (how the node gets its settings). I don't think we should add back this general-purpose flag, as we need to keep the number of ways Nodes can be configured to a minimum.

@javanna
Member

javanna commented Nov 20, 2015

@rjernst thanks that sounds good to me.

@ppf2
Member

ppf2 commented Nov 20, 2015

Confirmed that the workaround works to prevent the BindTransportException error, thx!

@ppf2
Member

ppf2 commented Nov 20, 2015

@rjernst Do we have a sense of whether the fix will make it to the upcoming 2.1 release? Or will it likely be after 2.1 (i.e. use the workaround until a later 2.x release)?

@rjernst
Member

rjernst commented Nov 20, 2015

@ppf2 Definitely after 2.1. I would not want to destabilize 2.1 with a refactoring like this.

@ppf2
Member

ppf2 commented Nov 20, 2015

@rjernst sounds good, thx!

@javanna assigned rjernst and unassigned javanna Nov 20, 2015
@clintongormley

This requires some fairly extensive changes, so we will target this for 2.2. In the meantime, we should document the workaround in the 2.1 docs.

@rjernst
Member

rjernst commented Dec 8, 2015

I opened a PR to fix this here: #15300.

Note that I was able to do the fix simply enough that I think it will be ok to backport to 2.1.x

@clintongormley

thanks @rjernst

@ppf2
Member

ppf2 commented Dec 10, 2015

Thanks @rjernst !

@lb425

lb425 commented Dec 12, 2015

I'm late to the party, but thought this might be useful for anyone coming across this. I found that the dummy config file isn't needed to work around the issue. Instead of creating a new directory (/etc/tribe-client in the example), path.conf can reference the current configuration directory.

Using the above example, where the config directory was /etc/tribe:

# arbitrary config

transport.tcp.port: 9301
http.port: 9201
network.host: 0.0.0.0
path.data: /var/lib/elasticsearch/
path.logs: /var/log/elasticsearch/

tribe:
    kibana:
        path.conf: /etc/tribe
        cluster.name: logstash-kibana
        discovery.zen.ping.multicast.enabled: false
        discovery.zen.ping.unicast.hosts: ["127.0.0.1"]
    els:
        path.conf: /etc/tribe
        cluster.name: logstash-data
        discovery.zen.ping.multicast.enabled: false
        discovery.zen.ping.unicast.hosts: ["10.128.69.48", "10.128.75.237"]

@kt97679
Author

kt97679 commented Dec 22, 2015

Is this fixed in 2.1.1?

@thn-dev

thn-dev commented Dec 30, 2015

With v2.1.1, I still have to specify path.conf, and I used the valid path as mentioned above by lb425. In my case, I also had to specify path.plugins for a similar reason. Otherwise, I kept getting an AccessControlException error.

I did not have to specify either path.conf or path.plugins when I was using v1.7.3.

@thn-dev

thn-dev commented Dec 31, 2015

WRT ES v2.1.1, I have to do the following to get the tribe node talking to two different clusters, cluster A and cluster B:

# tribe node's configuration (elasticsearch.yml)
network.host: 0.0.0.0
transport.tcp.port: 9300
http.port: 9200
http.enabled: true

tribe.t1.cluster.name:
tribe.t1.discovery.zen.ping.unicast.hosts: <cluster A's master node>
tribe.t1.discovery.zen.ping.multicast.enabled: false
tribe.t1.path.conf: <valid path/to/conf>
tribe.t1.path.plugins: <valid path/to/plugin>
tribe.t1.network.bind_host: 0.0.0.0
tribe.t1.network.publish_host: <tribe node's IP>
tribe.t1.transport.tcp.port:

Repeat the same block, replacing "t1" with "t2" for cluster B and filling in the proper info for cluster B, but keep the tribe.t2.network.* settings the same; use a different tribe.t2.transport.tcp.port value from t1 if one is specified.
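
Spelled out, the t2 block would look roughly like this (the angle-bracket values are placeholders, mirroring the t1 block above):

tribe.t2.cluster.name: <cluster B's name>
tribe.t2.discovery.zen.ping.unicast.hosts: <cluster B's master node>
tribe.t2.discovery.zen.ping.multicast.enabled: false
tribe.t2.path.conf: <valid path/to/conf>
tribe.t2.path.plugins: <valid path/to/plugin>
tribe.t2.network.bind_host: 0.0.0.0
tribe.t2.network.publish_host: <tribe node's IP>
tribe.t2.transport.tcp.port: <a different port from t1, if specified>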

@rjernst
Member

rjernst commented Dec 31, 2015

@thn-dev Setting network and path settings for the tribes (the t1 and t2 here) should not be necessary. Can you share your full elasticsearch.yml for the tribe node as well as for cluster A and cluster B?

@thn-dev

thn-dev commented Jan 1, 2016

@rjernst I did not have to set network and path settings when I was using v1.7.3. It was a surprise to me when v2.1.1 kept giving me an AccessControlException error message. Initially it pointed to the "plugins" location; after I set that, it complained about the "config" location. If I did not set the network settings for t1 and t2, it was not able to connect to cluster A and/or B. This part is weird too. Again, I did not have to do this in v1.7.3.

All ES instances are installed from the .rpm file, not the .zip file.

My settings for the tribe node are above, with these additional parameters:

  • cluster.name
  • discovery.zen.ping.multicast.enabled: false

Clusters A and B each have 1 master node and 3 data nodes with the following settings (I don't have all the information with me at the moment):

  • cluster.name:
  • network.host: 0.0.0.0
  • transport.tcp.port: 9300
  • http.port: 9200 (master)
  • http.enabled: true (master)
  • discovery.zen.ping.multicast.enabled: false
  • discovery.zen.ping.unicast.hosts: <master node's IP>
  • path.conf: /data/es/config
  • path.plugins: /data/es/plugins
  • path.data: /data/es

@rjernst
Member

rjernst commented Jan 5, 2016

@thn-dev I tried a very minimal configuration with both 2.1.1 and the 2.2 branch. The only tribe settings necessary were cluster.name and discovery.zen.ping.unicast.hosts. If you can reproduce this, please create a new issue.
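
For comparison, the minimal tribe configuration described here would be roughly the following (cluster names and hosts are placeholders):

tribe:
  t1:
    cluster.name: <cluster A's name>
    discovery.zen.ping.unicast.hosts: ["<cluster A's master node>"]
  t2:
    cluster.name: <cluster B's name>
    discovery.zen.ping.unicast.hosts: ["<cluster B's master node>"]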

@thn-dev

thn-dev commented Jan 6, 2016

@rjernst Thank you for looking into it. As I mentioned before, I did not have to do that in v1.7.3. One thing I do know: when I upgraded ES from v1.7.3 to 2.1.1, I did "rpm -Uvh elasticsearch-.rpm" instead of removing v1.7.3 completely. Everything I have described so far is running on CentOS 6.5 or 6.7. I'm in the middle of a stress test right now; once I have the opportunity to redo the cluster, I will report back whether installing ES 2.1.1 from scratch makes a difference or not.

Once again, thank you.

@TinLe

TinLe commented Feb 29, 2016

Still broken in ES 2.2.0.

Simple setup: one tribe node and a cluster of 7 nodes, all running ES 2.2.0.

Config for the tribe node:

cluster.name: psec-tribe-elasticsearch-ela4
node.name: ela4-app7246
node.master: false
node.data: false
node.max_local_storage_nodes: 1

path.data: /export/content/data/
path.plugins: /export/content/lid/apps/psec-tribe-elasticsearch/i001/plugins

################################## Tribe ################################
tribe:
   prod-ltx1_psec-elasticsearch:
     cluster.name: psec-elasticsearch_prod-ltx1
     path.conf: /export/content/lid/apps/psec-tribe-elasticsearch/i001/elasticsearch/config/psec-tribe-elasticsearch/
     path.home: /export/content/lid/apps/psec-tribe-elasticsearch/i001/
     path.plugins: /export/content/lid/apps/psec-tribe-elasticsearch/i001/plugins/
     discovery.zen.ping.unicast.hosts:
      - ltx1-app9624
      - ltx1-app9495

Cluster is up and running, reachable.

Log from the tribe node when ES is started:

[2016-02-29 23:04:22,694][INFO ][node                     ] [ela4-app7246.prod] version[2.2.0], pid[17205], build[8ff36d1/2016-01-27T13:32:39Z]
[2016-02-29 23:04:22,694][INFO ][node                     ] [ela4-app7246.prod] initializing ...
[2016-02-29 23:04:22,925][INFO ][plugins                  ] [ela4-app7246.prod] modules [], plugins [license, kopf], sites [kopf]
[2016-02-29 23:04:24,160][INFO ][node                     ] [ela4-app7246.prod/prod-ltx1_lotr-elasticsearch] version[2.2.0], pid[17205], build[8ff36d1/2016-01-27T13:32:39Z]
[2016-02-29 23:04:24,160][INFO ][node                     ] [ela4-app7246.prod/prod-ltx1_lotr-elasticsearch] initializing ...
[2016-02-29 23:04:24,236][INFO ][plugins                  ] [ela4-app7246.prod/prod-ltx1_lotr-elasticsearch] modules [], plugins [license, kopf], sites [kopf]
[2016-02-29 23:04:24,344][INFO ][node                     ] [ela4-app7246.prod/prod-ltx1_lotr-elasticsearch] initialized
[2016-02-29 23:04:24,349][INFO ][node                     ] [ela4-app7246.prod] initialized
[2016-02-29 23:04:24,349][INFO ][node                     ] [ela4-app7246.prod] starting ...
[2016-02-29 23:04:24,589][INFO ][transport                ] [ela4-app7246.prod] publish_address {ela4-app7246.prod/172.25.22.199:9300}, bound_addresses {[::]:9300}
[2016-02-29 23:04:24,594][INFO ][discovery                ] [ela4-app7246.prod] psec-tribe-elasticsearch-ela4/yNB3ZuZOSceWreWyi-825Q
[2016-02-29 23:04:24,594][WARN ][discovery                ] [ela4-app7246.prod] waited for 0s and no initial state was set by the discovery
[2016-02-29 23:04:24,618][INFO ][http                     ] [ela4-app7246.prod] publish_address {ela4-app7246.prod/172.25.22.199:9200}, bound_addresses {[::]:9200}
[2016-02-29 23:04:24,619][INFO ][node                     ] [ela4-app7246.prod/prod-ltx1_lotr-elasticsearch] starting ...
[2016-02-29 23:04:24,661][INFO ][transport                ] [ela4-app7246.prod/prod-ltx1_lotr-elasticsearch] publish_address {127.0.0.1:9301}, bound_addresses {[::1]:9301}, {127.0.0.1:9301}
[2016-02-29 23:04:24,665][INFO ][discovery                ] [ela4-app7246.prod/prod-ltx1_lotr-elasticsearch] lotr-elasticsearch_prod-ltx1/ZCGhRNvCQLWF8beCT3MsUw
[2016-02-29 23:04:27,881][INFO ][discovery.zen            ] [ela4-app7246.prod/prod-ltx1_lotr-elasticsearch] failed to send join request to master [{ltx1-app9495}{1GAZZhN8T5qprr2aalxhRw}{10.149.74.222}{10.149.74.222:9300}{max_local_storage_nodes=1, master=true}], reason [RemoteTransportException[[ltx1-app9495][10.149.74.222:9300][internal:discovery/zen/join]]; nested: ConnectTransportException[[ela4-app7246.prod/prod-ltx1_lotr-elasticsearch][127.0.0.1:9301] connect_timeout[30s]]; nested: NotSerializableExceptionWrapper[Connection refused: /127.0.0.1:9301]; ]
[2016-02-29 23:04:30,974][INFO ][discovery.zen            ] [ela4-app7246.prod/prod-ltx1_lotr-elasticsearch] failed to send join request to master [{ltx1-app9495}{1GAZZhN8T5qprr2aalxhRw}{10.149.74.222}{10.149.74.222:9300}{max_local_storage_nodes=1, master=true}], reason [RemoteTransportException[[ltx1-app9495][10.149.74.222:9300][internal:discovery/zen/join]]; nested: ConnectTransportException[[ela4-app7246.prod/prod-ltx1_lotr-elasticsearch][127.0.0.1:9301] connect_timeout[30s]]; nested: NotSerializableExceptionWrapper[Connection refused: /127.0.0.1:9301]; ]
[2016-02-29 23:04:34,062][INFO ][discovery.zen            ] [ela4-app7246.prod/prod-ltx1_lotr-elasticsearch] failed to send join request to master [{ltx1-app9495}{1GAZZhN8T5qprr2aalxhRw}{10.149.74.222}{10.149.74.222:9300}{max_local_storage_nodes=1, master=true}], reason [RemoteTransportException[[ltx1-app9495][10.149.74.222:9300][internal:discovery/zen/join]]; nested: ConnectTransportException[[ela4-app7246.prod/prod-ltx1_lotr-elasticsearch][127.0.0.1:9301] connect_timeout[30s]]; nested: NotSerializableExceptionWrapper[Connection refused: /127.0.0.1:9301]; ]

@bleskes
Contributor

bleskes commented Mar 1, 2016

The problem lies in the publish address of the tribe's client node: it's localhost, which prevents other nodes from connecting back to it, and that is why it fails to join the cluster:

[2016-02-29 23:04:24,661][INFO ][transport                ] [ela4-app7246.prod/prod-ltx1_lotr-elasticsearch] publish_address {127.0.0.1:9301}, bound_addresses {[::1]:9301}, {127.0.0.1:9301}

The tribe node itself does bind to a non-local address (ela4-app7246.prod/172.25.22.199), but I don't see that in the configuration you supplied (we default to localhost in 2.x). Can you share your complete yml file?

@bleskes
Contributor

bleskes commented Mar 1, 2016

@TinLe another option is to supply these settings from the command line; are you perhaps doing that?

@TinLe

TinLe commented Mar 1, 2016

@bleskes

The two missing lines are:

network.publish_host: ela4-app7246.prod
network.host: 0.0.0.0

In another email exchange with @sherry-ger, I got the correct settings for the tribe. I need to add the following for the tribe to work:

tribe.prod-ltx1_psec-elasticsearch.network.publish_host: ela4-app7246.prod
tribe.prod-ltx1_psec-elasticsearch.network.host: 0.0.0.0

The network.publish_host setting pointing to itself is something I would not have guessed....

In any case, I got tribe working in ES 2.2.0 now.

@rjernst
Member

rjernst commented Mar 1, 2016

@TinLe I opened a PR (#16893) to fix the issue of not passing through e.g. network.publish_host, as well as an issue to validate per-tribe client settings to avoid confusion (#16894; you don't need any of the path.* settings there and they are ignored).

@thn-dev

thn-dev commented Mar 1, 2016

Glad to see a fix for this. Thanks @rjernst @TinLe

@TinLe

TinLe commented Mar 2, 2016

FYI, the tribe node not passing on settings from elasticsearch.yml is breaking plugins.
