This repository has been archived by the owner on Jan 4, 2021. It is now read-only.

Federation nodes fail #290

Closed
dforste opened this issue Jun 19, 2017 · 7 comments

@dforste

dforste commented Jun 19, 2017

When I include this in the catalog, it creates the federation brokers, but they never start up.

$nats_federation_collective_servers = [
  'lvsprdmco04.example.net',
  'lvsprdmco05.example.net',
  'lvsprdmco06.example.net'
]
$nats_collective_servers = [
  'lvsprdmco01.example.net',
  'lvsprdmco02.example.net',
  'lvsprdmco03.example.net'
]
$nats_collective = 'lvs'
mcollective_choria::federation_broker { $nats_collective:
  instances                   => 2,
  stats_base_port             => 8000,
  federation_middleware_hosts => $nats_federation_collective_servers,
  collective_middleware_hosts => $nats_collective_servers,
}

This results in the configuration on the broker like this:

libdir = /opt/puppetlabs/mcollective/plugins
logfile = /var/log/puppetlabs/mcollective-federation_lvs_1.log
loglevel = info

securityprovider = choria
connector = nats
identity = lvsprdmco01.example.net

plugin.choria.federation.cluster = lvs
plugin.choria.federation.instance = 1 @ lvsprdmco01.example.net
plugin.choria.srv_domain = example.net
plugin.choria.federation_middleware_hosts = lvsprdmco04.example.net, lvsprdmco05.example.net, lvsprdmco06.example.net
plugin.choria.middleware_hosts = lvsprdmco01.example.net, lvsprdmco02.example.net, lvsprdmco03.example.net
plugin.choria.stats_port = 8000

In the log of the federation broker this is repeated several times:

W, [2017-06-19T15:33:07.161284 #8971]  WARN -- : base.rb:230:in `rescue in block in start' Federation Federation Broker failed: URI::InvalidURIError: bad URI(is not URI?): nats:// lvsprdmco05.example.net:
W, [2017-06-19T15:33:07.173795 #8971]  WARN -- : base.rb:230:in `rescue in block in start' Collective Federation Broker failed: URI::InvalidURIError: bad URI(is not URI?): nats:// lvsprdmco02.example.net:
W, [2017-06-19T15:33:08.161539 #8971]  WARN -- : base.rb:230:in `rescue in block in start' Federation Federation Broker failed: URI::InvalidURIError: bad URI(is not URI?): nats:// lvsprdmco05.example.net:
W, [2017-06-19T15:33:08.174116 #8971]  WARN -- : base.rb:230:in `rescue in block in start' Collective Federation Broker failed: URI::InvalidURIError: bad URI(is not URI?): nats:// lvsprdmco02.example.net:
W, [2017-06-19T15:33:09.161947 #8971]  WARN -- : base.rb:230:in `rescue in block in start' Federation Federation Broker failed: URI::InvalidURIError: bad URI(is not URI?): nats:// lvsprdmco05.example.net:
W, [2017-06-19T15:33:09.174415 #8971]  WARN -- : base.rb:230:in `rescue in block in start' Collective Federation Broker failed: URI::InvalidURIError: bad URI(is not URI?): nats:// lvsprdmco02.example.net:
W, [2017-06-19T15:33:10.162326 #8971]  WARN -- : base.rb:230:in `rescue in block in start' Federation Federation Broker failed: URI::InvalidURIError: bad URI(is not URI?): nats:// lvsprdmco05.example.net:
W, [2017-06-19T15:33:10.174786 #8971]  WARN -- : base.rb:230:in `rescue in block in start' Collective Federation Broker failed: URI::InvalidURIError: bad URI(is not URI?): nats:// lvsprdmco02.example.net:
W, [2017-06-19T15:33:11.162675 #8971]  WARN -- : base.rb:230:in `rescue in block in start' Federation Federation Broker failed: URI::InvalidURIError: bad URI(is not URI?): nats:// lvsprdmco05.example.net:
W, [2017-06-19T15:33:11.175130 #8971]  WARN -- : base.rb:230:in `rescue in block in start' Collective Federation Broker failed: URI::InvalidURIError: bad URI(is not URI?): nats:// lvsprdmco02.example.net:
W, [2017-06-19T15:33:12.162919 #8971]  WARN -- : base.rb:230:in `rescue in block in start' Federation Federation Broker failed: URI::InvalidURIError: bad URI(is not URI?): nats:// lvsprdmco05.example.net:
W, [2017-06-19T15:33:12.175424 #8971]  WARN -- : base.rb:230:in `rescue in block in start' Collective Federation Broker failed: URI::InvalidURIError: bad URI(is not URI?): nats:// lvsprdmco02.example.net:
W, [2017-06-19T15:33:13.163249 #8971]  WARN -- : base.rb:230:in `rescue in block in start' Federation Federation Broker failed: URI::InvalidURIError: bad URI(is not URI?): nats:// lvsprdmco05.example.net:
W, [2017-06-19T15:33:13.175749 #8971]  WARN -- : base.rb:230:in `rescue in block in start' Collective Federation Broker failed: URI::InvalidURIError: bad URI(is not URI?): nats:// lvsprdmco02.example.net:
W, [2017-06-19T15:33:14.163507 #8971]  WARN -- : base.rb:230:in `rescue in block in start' Federation Federation Broker failed: URI::InvalidURIError: bad URI(is not URI?): nats:// lvsprdmco05.example.net:
W, [2017-06-19T15:33:14.176092 #8971]  WARN -- : base.rb:230:in `rescue in block in start' Collective Federation Broker failed: URI::InvalidURIError: bad URI(is not URI?): nats:// lvsprdmco02.example.net:
W, [2017-06-19T15:33:15.163835 #8971]  WARN -- : base.rb:230:in `rescue in block in start' Federation Federation Broker failed: URI::InvalidURIError: bad URI(is not URI?): nats:// lvsprdmco05.example.net:
W, [2017-06-19T15:33:15.176412 #8971]  WARN -- : base.rb:230:in `rescue in block in start' Collective Federation Broker failed: URI::InvalidURIError: bad URI(is not URI?): nats:// lvsprdmco02.example.net:
W, [2017-06-19T15:33:16.164119 #8971]  WARN -- : base.rb:230:in `rescue in block in start' Federation Federation Broker failed: URI::InvalidURIError: bad URI(is not URI?): nats:// lvsprdmco05.example.net:
W, [2017-06-19T15:33:16.176684 #8971]  WARN -- : base.rb:230:in `rescue in block in start' Collective Federation Broker failed: URI::InvalidURIError: bad URI(is not URI?): nats:// lvsprdmco02.example.net:
E, [2017-06-19T15:33:16.244478 #8971] ERROR -- : stats.rb:90:in `rescue in block in start_stats_publisher' Failed to publish stats to federation choria.federation.lvs.stats: NoMethodError: undefined method `<<' for nil:NilClass

If I modify the config file like so:

plugin.choria.federation_middleware_hosts =lvsprdmco04.gspt.net:4222,lvsprdmco05.gspt.net:4222,lvsprdmco06.gspt.net:4222
plugin.choria.middleware_hosts =lvsprdmco01.gspt.net:4222,lvsprdmco02.gspt.net:4222,lvsprdmco03.gspt.net:4222

The federation broker seems to start up:

I, [2017-06-19T16:29:20.694526 #6333]  INFO -- : nats.rb:15:in `initialize' Choria NATS.io connector using pure ruby nats/io/client 0.2.2 with protocol version 1
I, [2017-06-19T16:29:20.695009 #6333]  INFO -- : server.rb:197:in `ensure in block in start' going to shutdown ...
I, [2017-06-19T16:29:20.695144 #6333]  INFO -- : server.rb:200:in `ensure in block in start' WEBrick::HTTPServer#start done.
I, [2017-06-19T16:29:20.960633 #6396]  INFO -- : config.rb:167:in `loadconfig' The Marionette Collective version 2.10.4 started by /opt/puppetlabs/bin/mco using config file /etc/puppetlabs/mcollective/federation/lvs_1.cfg
I, [2017-06-19T16:29:21.613276 #6396]  INFO -- : choria.rb:351:in `valid_certificate?' Verified certificate /CN=lvsprdmco01.gspt.net against CA /CN=Puppet CA: puppetmaster.gspt.net
I, [2017-06-19T16:29:21.633646 #6396]  INFO -- : stats.rb:109:in `block in start_stats_web_server' Listening for stats requests on localhost:8000/stats
I, [2017-06-19T16:29:21.633780 #6396]  INFO -- : server.rb:105:in `initialize' WEBrick 1.3.1
I, [2017-06-19T16:29:21.633837 #6396]  INFO -- : server.rb:106:in `initialize' ruby 2.1.9 (2016-03-30) [x86_64-linux]
I, [2017-06-19T16:29:21.633566 #6396]  INFO -- : stats.rb:78:in `start_stats_publisher' Starting statistics publisher publishing to choria.federation.lvs.stats
I, [2017-06-19T16:29:21.634510 #6396]  INFO -- : server.rb:161:in `block in start' WEBrick::HTTPServer#start: pid=6396 port=8000
I, [2017-06-19T16:29:21.639477 #6396]  INFO -- : base.rb:203:in `start_connection_and_handlers' Starting Federation Broker collective Processor lvs#1 @ lvsprdmco01.gspt.net against ["nats://lvsprdmco01.gspt.net:4222", "nats://lvsprdmco02.gspt.net:4222", "nats://lvsprdmco03.gspt.net:4222"]
I, [2017-06-19T16:29:21.640569 #6396]  INFO -- : base.rb:203:in `start_connection_and_handlers' Starting Federation Broker federation Processor lvs#1 @ lvsprdmco01.gspt.net against ["nats://lvsprdmco04.gspt.net:4222", "nats://lvsprdmco05.gspt.net:4222", "nats://lvsprdmco06.gspt.net:4222"]
I, [2017-06-19T16:29:21.741181 #6396]  INFO -- : base.rb:268:in `consume_from' Starting consuming message from choria.federation.lvs.collective
I, [2017-06-19T16:29:21.741366 #6396]  INFO -- : base.rb:153:in `block in inbox_handler' Starting inbox handler for collective
I, [2017-06-19T16:29:21.743413 #6396]  INFO -- : base.rb:268:in `consume_from' Starting consuming message from choria.federation.lvs.federation
I, [2017-06-19T16:29:21.743559 #6396]  INFO -- : base.rb:153:in `block in inbox_handler' Starting inbox handler for federation

I still do not see it connecting to the federation broker in nats-top:

Server:
  Load: CPU: 0%  Memory: 8.3M
  In:   Msgs: 8.0  Bytes: 28.8K  Msgs/Sec: 0.0  Bytes/Sec: 0.0
  Out:  Msgs: 0.0  Bytes: 0.0  Msgs/Sec: 0.0  Bytes/Sec: 0.0

Connections: 0
  HOST                 CID      SUBS    PENDING     MSGS_TO     MSGS_FROM   BYTES_TO    BYTES_FROM

Any ideas of what I might be missing?

@ripienaar
Collaborator

Hmm, I suspect the docs need a fix. Can you try host:4222 for each of those?

@dforste
Author

dforste commented Jun 20, 2017

Yes, host:4222 seems to work, but the federation brokers themselves still do not work. I have tried several ways but can't get them working. I can't use DNS because we have multiple datacenters in one domain without split DNS.

@ripienaar
Collaborator

I will need to see logs etc. Please run it in debug mode.

@ripienaar
Collaborator

Closing this. If you can provide debug logs later, feel free to re-open.

@dforste
Author

dforste commented Aug 18, 2017

Same issue on a new cluster.

W, [2017-08-18T18:48:13.396125 #57065]  WARN -- : base.rb:230:in `rescue in block in start' Collective Federation Broker failed: URI::InvalidURIError: bad URI(is not URI?): nats:// lvsprdmco01.gspt.net:4222
D, [2017-08-18T18:48:13.396293 #57065] DEBUG -- : base.rb:231:in `rescue in block in start' /opt/puppetlabs/puppet/lib/ruby/2.1.0/uri/common.rb:176:in `split'
	/opt/puppetlabs/puppet/lib/ruby/2.1.0/uri/common.rb:211:in `parse'
	/opt/puppetlabs/puppet/lib/ruby/2.1.0/uri/common.rb:747:in `parse'
	/opt/puppetlabs/puppet/lib/ruby/2.1.0/uri/common.rb:1232:in `URI'
	/opt/puppetlabs/mcollective/plugins/mcollective/util/federation_broker/collective_processor.rb:10:in `block in servers'
	/opt/puppetlabs/mcollective/plugins/mcollective/util/federation_broker/collective_processor.rb:9:in `map'
	/opt/puppetlabs/mcollective/plugins/mcollective/util/federation_broker/collective_processor.rb:9:in `servers'
	/opt/puppetlabs/mcollective/plugins/mcollective/util/federation_broker/base.rb:201:in `start_connection_and_handlers'
	/opt/puppetlabs/mcollective/plugins/mcollective/util/federation_broker/base.rb:228:in `block in start'

How are these read into the configuration? I have been digging through the code and cannot find it.
I think I see a place where a simple .lstrip would suffice. Can you verify? The relevant code seems to be here:
https://github.com/choria-io/mcollective-choria/blob/b89265734eff234c789d40829377727ab18e63a5/lib/mcollective/util/federation_broker/collective_processor.rb#L8-L12
and
https://github.com/choria-io/mcollective-choria/blob/b89265734eff234c789d40829377727ab18e63a5/lib/mcollective/util/federation_broker/federation_processor.rb#L8-L16
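For readers following along, the failure and the proposed fix can be reproduced outside the broker. This is only a minimal sketch: `to_uris` is a hypothetical helper written for illustration, not the code in the linked files, and the sample value merely mimics the shape of the generated `plugin.choria.middleware_hosts` line. It assumes each entry is turned into a URI by prefixing `nats://`, as the backtrace above suggests.

```ruby
require "uri"

# Hypothetical sample, shaped like the generated config value (note the space after the comma).
raw = "lvsprdmco01.example.net:4222, lvsprdmco02.example.net:4222"

# Hypothetical helper: split a comma separated list and build one URI per entry.
# With strip: false the leading space survives the split and ends up inside the URI.
def to_uris(list, strip:)
  list.split(",").map do |entry|
    entry = entry.strip if strip
    URI("nats://#{entry}")
  end
end

begin
  to_uris(raw, strip: false)
rescue URI::InvalidURIError => e
  # The second entry becomes "nats:// lvsprdmco02..." which is not a valid URI.
  puts "without strip: #{e.class}"
end

puts to_uris(raw, strip: true).map(&:to_s).inspect
```

The first entry parses either way because it has no leading space; only entries after a ", " separator fail, which would be consistent with the second host showing up in the June warnings.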

@dforste
Author

dforste commented Aug 18, 2017

@ripienaar
Collaborator

@dforste yes, I think putting that in server_resolver will work, let me do that

@ripienaar ripienaar reopened this Aug 19, 2017
ripienaar added a commit to ripienaar/mcollective-choria that referenced this issue Aug 19, 2017
When hosts can be listed as a comma separated list of nodes like in
server lists these had to have no spaces between the commas or trailing
spaces or it would fail

This strips any trailing and leading spaces from the hosts and ports
ripienaar added a commit that referenced this issue Aug 19, 2017
(#290) Strip spaces from host/port lists
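
As an illustration of what "strip spaces from host/port lists" can look like, here is a rough sketch. It is not the code from the commit: `parse_server_list` and `DEFAULT_PORT` are hypothetical names, and falling back to port 4222 when no port is given is an assumption made for the example (4222 is simply the port used earlier in this thread), not confirmed plugin behavior.

```ruby
# Assumption for illustration only: fall back to the port used in this thread.
DEFAULT_PORT = "4222"

# Hypothetical helper: turn "host:port, host:port" into [[host, port], ...],
# stripping leading and trailing spaces from both the host and the port.
def parse_server_list(value, default_port = DEFAULT_PORT)
  pairs = value.split(",").map do |entry|
    host, port = entry.split(":", 2).map(&:strip)
    next if host.nil? || host.empty? # skip blank entries such as a trailing comma

    [host, port.nil? || port.empty? ? default_port : port]
  end
  pairs.compact
end

servers = parse_server_list(" lvsprdmco04.example.net : 4222 ,lvsprdmco05.example.net, ")
puts servers.map { |host, port| "nats://#{host}:#{port}" }.inspect
```

Stripping both halves after splitting on ":" means spaces on either side of the comma or the colon no longer matter, so lists joined with ", " and lists joined with "," parse the same way.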