New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

After long GCs, getting NodePingThread - Did not find meta info of this node. Re-registering. #2404

Closed
gitfrederic opened this Issue Jun 23, 2016 · 6 comments

Comments

Projects
None yet
4 participants
@gitfrederic
Contributor

gitfrederic commented Jun 23, 2016

We're noticing that frequently when Graylog Server has a GC event (as reported in the logs by 'org.elasticsearch.monitor.jvm'), it is followed by a "org.graylog2.periodical.NodePingThread - Did not find meta info of this node. Re-registering." warning. For example:

00:53:35.097 [elasticsearch[graylog-a0a3fb6e-ac36-48ad-bc10-e901d185228d][scheduler][T#1]] WARN  org.elasticsearch.monitor.jvm - [graylog-a0a3fb6e-ac36-48ad-bc10-e901d185228d] [gc][young][175582][14187] duration [2.1s], collections [1]/[2.9s], total [2.1s]/[51.6m], memory [10.4gb]->[8.1gb]/[11.7gb], all_pools {[young] [2.3gb]->[17.7mb]/[2.4gb]}{[survivor] [5.4mb]->[5.2mb]/[307.1mb]}{[old] [8gb]->[8gb]/[9gb]}

00:53:35.108 [scheduled-daemon-25] WARN  org.graylog2.periodical.NodePingThread - Did not find meta info of this node. Re-registering.

Aside from trying to minimize GC durations, is there a way to increase the timeouts when Graylog Server queries MongoDb (or does Graylog rely on the MongoDB URI parameters for such things)?

(Contiguous) Graylog Server log showing this occurring:

00:26:56.345 [inputbufferprocessor-2] INFO  kafka.log.Log - Rolled new log segment for 'messagejournal-0' in 2 ms.
00:28:10.946 [elasticsearch[graylog-a0a3fb6e-ac36-48ad-bc10-e901d185228d][scheduler][T#1]] WARN  org.elasticsearch.monitor.jvm - [graylog-a0a3fb6e-ac36-48ad-bc10-e901d185228d] [gc][young][174069][14180] duration [1.8s], collections [1]/[2.1s], total [1.8s]/[51.4m], memory [10.4gb]->[8.1gb]/[11.7gb], all_pools {[young] [2.3gb]->[12.7mb]/[2.4gb]}{[survivor] [44.3mb]->[40mb]/[307.1mb]}{[old] [8gb]->[8gb]/[9gb]}
00:29:16.280 [inputbufferprocessor-2] INFO  kafka.log.Log - Rolled new log segment for 'messagejournal-0' in 2 ms.
00:31:34.100 [inputbufferprocessor-2] INFO  kafka.log.Log - Rolled new log segment for 'messagejournal-0' in 2 ms.
00:31:57.789 [elasticsearch[graylog-a0a3fb6e-ac36-48ad-bc10-e901d185228d][scheduler][T#1]] WARN  org.elasticsearch.monitor.jvm - [graylog-a0a3fb6e-ac36-48ad-bc10-e901d185228d] [gc][young][174294][14181] duration [2.1s], collections [1]/[2.8s], total [2.1s]/[51.4m], memory [10.5gb]->[8.1gb]/[11.7gb], all_pools {[young] [2.3gb]->[2.6mb]/[2.4gb]}{[survivor] [40mb]->[35.3mb]/[307.1mb]}{[old] [8gb]->[8gb]/[9gb]}
00:33:47.866 [inputbufferprocessor-2] INFO  kafka.log.Log - Rolled new log segment for 'messagejournal-0' in 2 ms.
00:35:22.462 [elasticsearch[graylog-a0a3fb6e-ac36-48ad-bc10-e901d185228d][scheduler][T#1]] WARN  org.elasticsearch.monitor.jvm - [graylog-a0a3fb6e-ac36-48ad-bc10-e901d185228d] [gc][young][174497][14182] duration [1.9s], collections [1]/[2.6s], total [1.9s]/[51.5m], memory [10.4gb]->[8.1gb]/[11.7gb], all_pools {[young] [2.3gb]->[10.2mb]/[2.4gb]}{[survivor] [35.3mb]->[28mb]/[307.1mb]}{[old] [8gb]->[8gb]/[9gb]}
00:35:22.467 [scheduled-daemon-26] WARN  org.graylog2.periodical.NodePingThread - Did not find meta info of this node. Re-registering.
00:35:22.474 [rest-worker-11] WARN  org.graylog2.shared.rest.resources.ProxiedResource - Node <a0a3fb6e-ac36-48ad-bc10-e901d185228d> not found while trying to call org.graylog2.shared.rest.resources.system.RemoteMetricsResource on it.
00:35:59.633 [inputbufferprocessor-2] INFO  kafka.log.Log - Rolled new log segment for 'messagejournal-0' in 2 ms.
00:38:14.099 [inputbufferprocessor-2] INFO  kafka.log.Log - Rolled new log segment for 'messagejournal-0' in 2 ms.
00:39:08.664 [elasticsearch[graylog-a0a3fb6e-ac36-48ad-bc10-e901d185228d][scheduler][T#1]] WARN  org.elasticsearch.monitor.jvm - [graylog-a0a3fb6e-ac36-48ad-bc10-e901d185228d] [gc][young][174722][14183] duration [2s], collections [1]/[2.1s], total [2s]/[51.5m], memory [10.5gb]->[8.1gb]/[11.7gb], all_pools {[young] [2.3gb]->[10mb]/[2.4gb]}{[survivor] [28mb]->[14.3mb]/[307.1mb]}{[old] [8gb]->[8gb]/[9gb]}
00:39:08.671 [scheduled-daemon-4] WARN  org.graylog2.periodical.NodePingThread - Did not find meta info of this node. Re-registering.
00:40:28.676 [inputbufferprocessor-2] INFO  kafka.log.Log - Rolled new log segment for 'messagejournal-0' in 2 ms.
00:42:41.242 [elasticsearch[graylog-a0a3fb6e-ac36-48ad-bc10-e901d185228d][scheduler][T#1]] WARN  org.elasticsearch.monitor.jvm - [graylog-a0a3fb6e-ac36-48ad-bc10-e901d185228d] [gc][young][174933][14184] duration [1.9s], collections [1]/[2.5s], total [1.9s]/[51.5m], memory [10.4gb]->[8.1gb]/[11.7gb], all_pools {[young] [2.3gb]->[6.1mb]/[2.4gb]}{[survivor] [14.3mb]->[11mb]/[307.1mb]}{[old] [8gb]->[8gb]/[9gb]}
00:42:41.250 [scheduled-daemon-20] WARN  org.graylog2.periodical.NodePingThread - Did not find meta info of this node. Re-registering.
00:42:44.588 [inputbufferprocessor-2] INFO  kafka.log.Log - Rolled new log segment for 'messagejournal-0' in 3 ms.
00:45:02.086 [inputbufferprocessor-2] INFO  kafka.log.Log - Rolled new log segment for 'messagejournal-0' in 2 ms.
00:46:24.988 [elasticsearch[graylog-a0a3fb6e-ac36-48ad-bc10-e901d185228d][scheduler][T#1]] WARN  org.elasticsearch.monitor.jvm - [graylog-a0a3fb6e-ac36-48ad-bc10-e901d185228d] [gc][young][175155][14185] duration [2.1s], collections [1]/[2.6s], total [2.1s]/[51.6m], memory [10.4gb]->[8.1gb]/[11.7gb], all_pools {[young] [2.3gb]->[27.1mb]/[2.4gb]}{[survivor] [11mb]->[5.7mb]/[307.1mb]}{[old] [8gb]->[8gb]/[9gb]}
00:47:13.684 [inputbufferprocessor-2] INFO  kafka.log.Log - Rolled new log segment for 'messagejournal-0' in 3 ms.
00:49:22.110 [inputbufferprocessor-2] INFO  kafka.log.Log - Rolled new log segment for 'messagejournal-0' in 6 ms.
00:49:52.068 [elasticsearch[graylog-a0a3fb6e-ac36-48ad-bc10-e901d185228d][scheduler][T#1]] WARN  org.elasticsearch.monitor.jvm - [graylog-a0a3fb6e-ac36-48ad-bc10-e901d185228d] [gc][young][175361][14186] duration [1.9s], collections [1]/[2s], total [1.9s]/[51.6m], memory [10.4gb]->[8.1gb]/[11.7gb], all_pools {[young] [2.3gb]->[21.5mb]/[2.4gb]}{[survivor] [5.7mb]->[5.4mb]/[307.1mb]}{[old] [8gb]->[8gb]/[9gb]}
00:49:52.075 [scheduled-daemon-1] WARN  org.graylog2.periodical.NodePingThread - Did not find meta info of this node. Re-registering.
00:51:33.840 [inputbufferprocessor-2] INFO  kafka.log.Log - Rolled new log segment for 'messagejournal-0' in 2 ms.
00:53:35.097 [elasticsearch[graylog-a0a3fb6e-ac36-48ad-bc10-e901d185228d][scheduler][T#1]] WARN  org.elasticsearch.monitor.jvm - [graylog-a0a3fb6e-ac36-48ad-bc10-e901d185228d] [gc][young][175582][14187] duration [2.1s], collections [1]/[2.9s], total [2.1s]/[51.6m], memory [10.4gb]->[8.1gb]/[11.7gb], all_pools {[young] [2.3gb]->[17.7mb]/[2.4gb]}{[survivor] [5.4mb]->[5.2mb]/[307.1mb]}{[old] [8gb]->[8gb]/[9gb]}
00:53:35.108 [scheduled-daemon-25] WARN  org.graylog2.periodical.NodePingThread - Did not find meta info of this node. Re-registering.
00:53:48.543 [inputbufferprocessor-2] INFO  kafka.log.Log - Rolled new log segment for 'messagejournal-0' in 1 ms.

Version info, etc.:

23:55:42.304 [main] INFO  org.elasticsearch.node - [graylog-a0a3fb6e-ac36-48ad-bc10-e901d185228d] version[2.3.2], pid[28391], build[b9e4a6a/2016-04-21T16:03:47Z]
...
23:55:47.575 [main] INFO  org.graylog2.bootstrap.ServerBootstrap - Graylog server 2.0.2 (4da1379) starting up
23:55:47.576 [main] INFO  org.graylog2.bootstrap.ServerBootstrap - JRE: Oracle Corporation 1.8.0_72 on Linux 3.10.0-123.el7.x86_64
23:55:47.576 [main] INFO  org.graylog2.bootstrap.ServerBootstrap - Deployment: unknown
23:55:47.576 [main] INFO  org.graylog2.bootstrap.ServerBootstrap - OS: CentOS Linux 7 (Core) (centos)
23:55:47.576 [main] INFO  org.graylog2.bootstrap.ServerBootstrap - Arch: amd64
@gitfrederic

This comment has been minimized.

Show comment
Hide comment
@gitfrederic

gitfrederic Jun 23, 2016

Contributor

To help troubleshoot, I'm also including the relevant MongoDB-related init/startup info (from Graylog Server startup):

03:01:39.402 [main] INFO  org.mongodb.driver.cluster - Cluster created with settings {hosts=[<IP1.IP1.IP1.IP1>:27017, <IP2.IP2.IP2.IP2>:27017, <IP3.IP3.IP3.IP3>:27017, <IP4.IP4.IP4.IP4>:27017, <IP5.IP5.IP5.IP5>:27017, <IP6.IP6.IP6.IP6>:27017, <IP7.IP7.IP7.IP7>:27017], mode=MULTIPLE, requiredClusterType=REPLICA_SET, serverSelectionTimeout='30000 ms', maxWaitQueueSize=500, requiredReplicaSetName='gl-mdb-replset'}
03:01:39.402 [main] INFO  org.mongodb.driver.cluster - Adding discovered server <IP1.IP1.IP1.IP1>:27017 to client view of cluster
03:01:39.424 [main] INFO  org.mongodb.driver.cluster - Adding discovered server <IP2.IP2.IP2.IP2>:27017 to client view of cluster
03:01:39.426 [main] INFO  org.mongodb.driver.cluster - Adding discovered server <IP3.IP3.IP3.IP3>:27017 to client view of cluster
03:01:39.428 [main] INFO  org.mongodb.driver.cluster - Adding discovered server <IP4.IP4.IP4.IP4>:27017 to client view of cluster
03:01:39.429 [main] INFO  org.mongodb.driver.cluster - Adding discovered server <IP5.IP5.IP5.IP5>:27017 to client view of cluster
03:01:39.431 [main] INFO  org.mongodb.driver.cluster - Adding discovered server <IP6.IP6.IP6.IP6>:27017 to client view of cluster
03:01:39.432 [main] INFO  org.mongodb.driver.cluster - Adding discovered server <IP7.IP7.IP7.IP7>:27017 to client view of cluster
03:01:39.452 [main] INFO  org.mongodb.driver.cluster - No server chosen by ReadPreferenceServerSelector{readPreference=primary} from cluster description ClusterDescription{type=REPLICA_SET, connectionMode=MULTIPLE, all=[ServerDescription{address=<IP1.IP1.IP1.IP1>:27017, type=UNKNOWN, state=CONNECTING}, ServerDescription{address=<IP2.IP2.IP2.IP2>:27017, type=UNKNOWN, state=CONNECTING}, ServerDescription{address=<IP3.IP3.IP3.IP3>:27017, type=UNKNOWN, state=CONNECTING}, ServerDescription{address=<IP4.IP4.IP4.IP4>:27017, type=UNKNOWN, state=CONNECTING}, ServerDescription{address=<IP5.IP5.IP5.IP5>:27017, type=UNKNOWN, state=CONNECTING}, ServerDescription{address=<IP6.IP6.IP6.IP6>:27017, type=UNKNOWN, state=CONNECTING}, ServerDescription{address=<IP7.IP7.IP7.IP7>:27017, type=UNKNOWN, state=CONNECTING}]}. Waiting for 30000 ms before timing out
03:01:39.495 [cluster-ClusterId{value='576b5113cddc5a53f3756872', description='null'}-<IP4.IP4.IP4.IP4>:27017] INFO  org.mongodb.driver.connection - Opened connection [connectionId{localValue:4, serverValue:861}] to <IP4.IP4.IP4.IP4>:27017
03:01:39.496 [cluster-ClusterId{value='576b5113cddc5a53f3756872', description='null'}-<IP1.IP1.IP1.IP1>:27017] INFO  org.mongodb.driver.connection - Opened connection [connectionId{localValue:1, serverValue:1025}] to <IP1.IP1.IP1.IP1>:27017
03:01:39.498 [cluster-ClusterId{value='576b5113cddc5a53f3756872', description='null'}-<IP6.IP6.IP6.IP6>:27017] INFO  org.mongodb.driver.connection - Opened connection [connectionId{localValue:6, serverValue:9404}] to <IP6.IP6.IP6.IP6>:27017
03:01:39.498 [cluster-ClusterId{value='576b5113cddc5a53f3756872', description='null'}-<IP2.IP2.IP2.IP2>:27017] INFO  org.mongodb.driver.connection - Opened connection [connectionId{localValue:5, serverValue:903}] to <IP2.IP2.IP2.IP2>:27017
03:01:39.498 [cluster-ClusterId{value='576b5113cddc5a53f3756872', description='null'}-<IP4.IP4.IP4.IP4>:27017] INFO  org.mongodb.driver.cluster - Monitor thread successfully connected to server with description ServerDescription{address=<IP4.IP4.IP4.IP4>:27017, type=REPLICA_SET_SECONDARY, state=CONNECTED, ok=true, version=ServerVersion{versionList=[3, 2, 4]}, minWireVersion=0, maxWireVersion=4, maxDocumentSize=16777216, roundTripTimeNanos=1542962, setName='gl-mdb-replset', canonicalAddress=<IP4.IP4.IP4.IP4>:27017, hosts=[<IP2.IP2.IP2.IP2>:27017, <IP3.IP3.IP3.IP3>:27017, <IP1.IP1.IP1.IP1>:27017, <IP6.IP6.IP6.IP6>:27017, <IP7.IP7.IP7.IP7>:27017, <IP5.IP5.IP5.IP5>:27017, <IP4.IP4.IP4.IP4>:27017], passives=[], arbiters=[], primary='<IP7.IP7.IP7.IP7>:27017', tagSet=TagSet{[]}, electionId=null, setVersion=1}
03:01:39.499 [cluster-ClusterId{value='576b5113cddc5a53f3756872', description='null'}-<IP3.IP3.IP3.IP3>:27017] INFO  org.mongodb.driver.connection - Opened connection [connectionId{localValue:3, serverValue:877}] to <IP3.IP3.IP3.IP3>:27017
03:01:39.500 [cluster-ClusterId{value='576b5113cddc5a53f3756872', description='null'}-<IP5.IP5.IP5.IP5>:27017] INFO  org.mongodb.driver.connection - Opened connection [connectionId{localValue:2, serverValue:909}] to <IP5.IP5.IP5.IP5>:27017
03:01:39.500 [cluster-ClusterId{value='576b5113cddc5a53f3756872', description='null'}-<IP1.IP1.IP1.IP1>:27017] INFO  org.mongodb.driver.cluster - Monitor thread successfully connected to server with description ServerDescription{address=<IP1.IP1.IP1.IP1>:27017, type=REPLICA_SET_SECONDARY, state=CONNECTED, ok=true, version=ServerVersion{versionList=[3, 2, 4]}, minWireVersion=0, maxWireVersion=4, maxDocumentSize=16777216, roundTripTimeNanos=4246770, setName='gl-mdb-replset', canonicalAddress=<IP1.IP1.IP1.IP1>:27017, hosts=[<IP2.IP2.IP2.IP2>:27017, <IP3.IP3.IP3.IP3>:27017, <IP1.IP1.IP1.IP1>:27017, <IP6.IP6.IP6.IP6>:27017, <IP7.IP7.IP7.IP7>:27017, <IP5.IP5.IP5.IP5>:27017, <IP4.IP4.IP4.IP4>:27017], passives=[], arbiters=[], primary='<IP7.IP7.IP7.IP7>:27017', tagSet=TagSet{[]}, electionId=null, setVersion=1}
03:01:39.502 [cluster-ClusterId{value='576b5113cddc5a53f3756872', description='null'}-<IP5.IP5.IP5.IP5>:27017] INFO  org.mongodb.driver.cluster - Monitor thread successfully connected to server with description ServerDescription{address=<IP5.IP5.IP5.IP5>:27017, type=REPLICA_SET_SECONDARY, state=CONNECTED, ok=true, version=ServerVersion{versionList=[3, 2, 4]}, minWireVersion=0, maxWireVersion=4, maxDocumentSize=16777216, roundTripTimeNanos=1385909, setName='gl-mdb-replset', canonicalAddress=<IP5.IP5.IP5.IP5>:27017, hosts=[<IP2.IP2.IP2.IP2>:27017, <IP3.IP3.IP3.IP3>:27017, <IP1.IP1.IP1.IP1>:27017, <IP6.IP6.IP6.IP6>:27017, <IP7.IP7.IP7.IP7>:27017, <IP5.IP5.IP5.IP5>:27017, <IP4.IP4.IP4.IP4>:27017], passives=[], arbiters=[], primary='<IP7.IP7.IP7.IP7>:27017', tagSet=TagSet{[]}, electionId=null, setVersion=1}
03:01:39.502 [cluster-ClusterId{value='576b5113cddc5a53f3756872', description='null'}-<IP6.IP6.IP6.IP6>:27017] INFO  org.mongodb.driver.cluster - Monitor thread successfully connected to server with description ServerDescription{address=<IP6.IP6.IP6.IP6>:27017, type=REPLICA_SET_SECONDARY, state=CONNECTED, ok=true, version=ServerVersion{versionList=[3, 2, 4]}, minWireVersion=0, maxWireVersion=4, maxDocumentSize=16777216, roundTripTimeNanos=3924441, setName='gl-mdb-replset', canonicalAddress=<IP6.IP6.IP6.IP6>:27017, hosts=[<IP2.IP2.IP2.IP2>:27017, <IP3.IP3.IP3.IP3>:27017, <IP1.IP1.IP1.IP1>:27017, <IP6.IP6.IP6.IP6>:27017, <IP7.IP7.IP7.IP7>:27017, <IP5.IP5.IP5.IP5>:27017, <IP4.IP4.IP4.IP4>:27017], passives=[], arbiters=[], primary='<IP7.IP7.IP7.IP7>:27017', tagSet=TagSet{[]}, electionId=null, setVersion=1}
03:01:39.503 [cluster-ClusterId{value='576b5113cddc5a53f3756872', description='null'}-<IP3.IP3.IP3.IP3>:27017] INFO  org.mongodb.driver.cluster - Monitor thread successfully connected to server with description ServerDescription{address=<IP3.IP3.IP3.IP3>:27017, type=REPLICA_SET_SECONDARY, state=CONNECTED, ok=true, version=ServerVersion{versionList=[3, 2, 4]}, minWireVersion=0, maxWireVersion=4, maxDocumentSize=16777216, roundTripTimeNanos=3875877, setName='gl-mdb-replset', canonicalAddress=<IP3.IP3.IP3.IP3>:27017, hosts=[<IP2.IP2.IP2.IP2>:27017, <IP3.IP3.IP3.IP3>:27017, <IP1.IP1.IP1.IP1>:27017, <IP6.IP6.IP6.IP6>:27017, <IP7.IP7.IP7.IP7>:27017, <IP5.IP5.IP5.IP5>:27017, <IP4.IP4.IP4.IP4>:27017], passives=[], arbiters=[], primary='<IP7.IP7.IP7.IP7>:27017', tagSet=TagSet{[]}, electionId=null, setVersion=1}
03:01:39.503 [cluster-ClusterId{value='576b5113cddc5a53f3756872', description='null'}-<IP2.IP2.IP2.IP2>:27017] INFO  org.mongodb.driver.cluster - Monitor thread successfully connected to server with description ServerDescription{address=<IP2.IP2.IP2.IP2>:27017, type=REPLICA_SET_SECONDARY, state=CONNECTED, ok=true, version=ServerVersion{versionList=[3, 2, 4]}, minWireVersion=0, maxWireVersion=4, maxDocumentSize=16777216, roundTripTimeNanos=4948377, setName='gl-mdb-replset', canonicalAddress=<IP2.IP2.IP2.IP2>:27017, hosts=[<IP2.IP2.IP2.IP2>:27017, <IP3.IP3.IP3.IP3>:27017, <IP1.IP1.IP1.IP1>:27017, <IP6.IP6.IP6.IP6>:27017, <IP7.IP7.IP7.IP7>:27017, <IP5.IP5.IP5.IP5>:27017, <IP4.IP4.IP4.IP4>:27017], passives=[], arbiters=[], primary='<IP7.IP7.IP7.IP7>:27017', tagSet=TagSet{[]}, electionId=null, setVersion=1}
03:01:39.505 [cluster-ClusterId{value='576b5113cddc5a53f3756872', description='null'}-<IP7.IP7.IP7.IP7>:27017] INFO  org.mongodb.driver.connection - Opened connection [connectionId{localValue:7, serverValue:1925}] to <IP7.IP7.IP7.IP7>:27017
03:01:39.510 [cluster-ClusterId{value='576b5113cddc5a53f3756872', description='null'}-<IP7.IP7.IP7.IP7>:27017] INFO  org.mongodb.driver.cluster - Monitor thread successfully connected to server with description ServerDescription{address=<IP7.IP7.IP7.IP7>:27017, type=REPLICA_SET_PRIMARY, state=CONNECTED, ok=true, version=ServerVersion{versionList=[3, 2, 4]}, minWireVersion=0, maxWireVersion=4, maxDocumentSize=16777216, roundTripTimeNanos=3725427, setName='gl-mdb-replset', canonicalAddress=<IP7.IP7.IP7.IP7>:27017, hosts=[<IP2.IP2.IP2.IP2>:27017, <IP3.IP3.IP3.IP3>:27017, <IP1.IP1.IP1.IP1>:27017, <IP6.IP6.IP6.IP6>:27017, <IP7.IP7.IP7.IP7>:27017, <IP5.IP5.IP5.IP5>:27017, <IP4.IP4.IP4.IP4>:27017], passives=[], arbiters=[], primary='<IP7.IP7.IP7.IP7>:27017', tagSet=TagSet{[]}, electionId=7fffffff0000000000000003, setVersion=1}
03:01:39.510 [cluster-ClusterId{value='576b5113cddc5a53f3756872', description='null'}-<IP7.IP7.IP7.IP7>:27017] INFO  org.mongodb.driver.cluster - Setting max election id to 7fffffff0000000000000003 from replica set primary <IP7.IP7.IP7.IP7>:27017
03:01:39.510 [cluster-ClusterId{value='576b5113cddc5a53f3756872', description='null'}-<IP7.IP7.IP7.IP7>:27017] INFO  org.mongodb.driver.cluster - Setting max set version to 1 from replica set primary <IP7.IP7.IP7.IP7>:27017
03:01:39.511 [cluster-ClusterId{value='576b5113cddc5a53f3756872', description='null'}-<IP7.IP7.IP7.IP7>:27017] INFO  org.mongodb.driver.cluster - Discovered replica set primary <IP7.IP7.IP7.IP7>:27017
03:01:39.527 [main] INFO  org.mongodb.driver.connection - Opened connection [connectionId{localValue:8, serverValue:1926}] to <IP7.IP7.IP7.IP7>:27017
03:01:39.790 [main] INFO  org.graylog2.plugin.system.NodeId - Node ID: a0a3fb6e-ac36-48ad-bc10-e901d185228d

I've replaced the actual IP addresses with something like: <IP7.IP7.IP7.IP7> .

Contributor

gitfrederic commented Jun 23, 2016

To help troubleshoot, I'm also including the relevant MongoDB-related init/startup info (from Graylog Server startup):

03:01:39.402 [main] INFO  org.mongodb.driver.cluster - Cluster created with settings {hosts=[<IP1.IP1.IP1.IP1>:27017, <IP2.IP2.IP2.IP2>:27017, <IP3.IP3.IP3.IP3>:27017, <IP4.IP4.IP4.IP4>:27017, <IP5.IP5.IP5.IP5>:27017, <IP6.IP6.IP6.IP6>:27017, <IP7.IP7.IP7.IP7>:27017], mode=MULTIPLE, requiredClusterType=REPLICA_SET, serverSelectionTimeout='30000 ms', maxWaitQueueSize=500, requiredReplicaSetName='gl-mdb-replset'}
03:01:39.402 [main] INFO  org.mongodb.driver.cluster - Adding discovered server <IP1.IP1.IP1.IP1>:27017 to client view of cluster
03:01:39.424 [main] INFO  org.mongodb.driver.cluster - Adding discovered server <IP2.IP2.IP2.IP2>:27017 to client view of cluster
03:01:39.426 [main] INFO  org.mongodb.driver.cluster - Adding discovered server <IP3.IP3.IP3.IP3>:27017 to client view of cluster
03:01:39.428 [main] INFO  org.mongodb.driver.cluster - Adding discovered server <IP4.IP4.IP4.IP4>:27017 to client view of cluster
03:01:39.429 [main] INFO  org.mongodb.driver.cluster - Adding discovered server <IP5.IP5.IP5.IP5>:27017 to client view of cluster
03:01:39.431 [main] INFO  org.mongodb.driver.cluster - Adding discovered server <IP6.IP6.IP6.IP6>:27017 to client view of cluster
03:01:39.432 [main] INFO  org.mongodb.driver.cluster - Adding discovered server <IP7.IP7.IP7.IP7>:27017 to client view of cluster
03:01:39.452 [main] INFO  org.mongodb.driver.cluster - No server chosen by ReadPreferenceServerSelector{readPreference=primary} from cluster description ClusterDescription{type=REPLICA_SET, connectionMode=MULTIPLE, all=[ServerDescription{address=<IP1.IP1.IP1.IP1>:27017, type=UNKNOWN, state=CONNECTING}, ServerDescription{address=<IP2.IP2.IP2.IP2>:27017, type=UNKNOWN, state=CONNECTING}, ServerDescription{address=<IP3.IP3.IP3.IP3>:27017, type=UNKNOWN, state=CONNECTING}, ServerDescription{address=<IP4.IP4.IP4.IP4>:27017, type=UNKNOWN, state=CONNECTING}, ServerDescription{address=<IP5.IP5.IP5.IP5>:27017, type=UNKNOWN, state=CONNECTING}, ServerDescription{address=<IP6.IP6.IP6.IP6>:27017, type=UNKNOWN, state=CONNECTING}, ServerDescription{address=<IP7.IP7.IP7.IP7>:27017, type=UNKNOWN, state=CONNECTING}]}. Waiting for 30000 ms before timing out
03:01:39.495 [cluster-ClusterId{value='576b5113cddc5a53f3756872', description='null'}-<IP4.IP4.IP4.IP4>:27017] INFO  org.mongodb.driver.connection - Opened connection [connectionId{localValue:4, serverValue:861}] to <IP4.IP4.IP4.IP4>:27017
03:01:39.496 [cluster-ClusterId{value='576b5113cddc5a53f3756872', description='null'}-<IP1.IP1.IP1.IP1>:27017] INFO  org.mongodb.driver.connection - Opened connection [connectionId{localValue:1, serverValue:1025}] to <IP1.IP1.IP1.IP1>:27017
03:01:39.498 [cluster-ClusterId{value='576b5113cddc5a53f3756872', description='null'}-<IP6.IP6.IP6.IP6>:27017] INFO  org.mongodb.driver.connection - Opened connection [connectionId{localValue:6, serverValue:9404}] to <IP6.IP6.IP6.IP6>:27017
03:01:39.498 [cluster-ClusterId{value='576b5113cddc5a53f3756872', description='null'}-<IP2.IP2.IP2.IP2>:27017] INFO  org.mongodb.driver.connection - Opened connection [connectionId{localValue:5, serverValue:903}] to <IP2.IP2.IP2.IP2>:27017
03:01:39.498 [cluster-ClusterId{value='576b5113cddc5a53f3756872', description='null'}-<IP4.IP4.IP4.IP4>:27017] INFO  org.mongodb.driver.cluster - Monitor thread successfully connected to server with description ServerDescription{address=<IP4.IP4.IP4.IP4>:27017, type=REPLICA_SET_SECONDARY, state=CONNECTED, ok=true, version=ServerVersion{versionList=[3, 2, 4]}, minWireVersion=0, maxWireVersion=4, maxDocumentSize=16777216, roundTripTimeNanos=1542962, setName='gl-mdb-replset', canonicalAddress=<IP4.IP4.IP4.IP4>:27017, hosts=[<IP2.IP2.IP2.IP2>:27017, <IP3.IP3.IP3.IP3>:27017, <IP1.IP1.IP1.IP1>:27017, <IP6.IP6.IP6.IP6>:27017, <IP7.IP7.IP7.IP7>:27017, <IP5.IP5.IP5.IP5>:27017, <IP4.IP4.IP4.IP4>:27017], passives=[], arbiters=[], primary='<IP7.IP7.IP7.IP7>:27017', tagSet=TagSet{[]}, electionId=null, setVersion=1}
03:01:39.499 [cluster-ClusterId{value='576b5113cddc5a53f3756872', description='null'}-<IP3.IP3.IP3.IP3>:27017] INFO  org.mongodb.driver.connection - Opened connection [connectionId{localValue:3, serverValue:877}] to <IP3.IP3.IP3.IP3>:27017
03:01:39.500 [cluster-ClusterId{value='576b5113cddc5a53f3756872', description='null'}-<IP5.IP5.IP5.IP5>:27017] INFO  org.mongodb.driver.connection - Opened connection [connectionId{localValue:2, serverValue:909}] to <IP5.IP5.IP5.IP5>:27017
03:01:39.500 [cluster-ClusterId{value='576b5113cddc5a53f3756872', description='null'}-<IP1.IP1.IP1.IP1>:27017] INFO  org.mongodb.driver.cluster - Monitor thread successfully connected to server with description ServerDescription{address=<IP1.IP1.IP1.IP1>:27017, type=REPLICA_SET_SECONDARY, state=CONNECTED, ok=true, version=ServerVersion{versionList=[3, 2, 4]}, minWireVersion=0, maxWireVersion=4, maxDocumentSize=16777216, roundTripTimeNanos=4246770, setName='gl-mdb-replset', canonicalAddress=<IP1.IP1.IP1.IP1>:27017, hosts=[<IP2.IP2.IP2.IP2>:27017, <IP3.IP3.IP3.IP3>:27017, <IP1.IP1.IP1.IP1>:27017, <IP6.IP6.IP6.IP6>:27017, <IP7.IP7.IP7.IP7>:27017, <IP5.IP5.IP5.IP5>:27017, <IP4.IP4.IP4.IP4>:27017], passives=[], arbiters=[], primary='<IP7.IP7.IP7.IP7>:27017', tagSet=TagSet{[]}, electionId=null, setVersion=1}
03:01:39.502 [cluster-ClusterId{value='576b5113cddc5a53f3756872', description='null'}-<IP5.IP5.IP5.IP5>:27017] INFO  org.mongodb.driver.cluster - Monitor thread successfully connected to server with description ServerDescription{address=<IP5.IP5.IP5.IP5>:27017, type=REPLICA_SET_SECONDARY, state=CONNECTED, ok=true, version=ServerVersion{versionList=[3, 2, 4]}, minWireVersion=0, maxWireVersion=4, maxDocumentSize=16777216, roundTripTimeNanos=1385909, setName='gl-mdb-replset', canonicalAddress=<IP5.IP5.IP5.IP5>:27017, hosts=[<IP2.IP2.IP2.IP2>:27017, <IP3.IP3.IP3.IP3>:27017, <IP1.IP1.IP1.IP1>:27017, <IP6.IP6.IP6.IP6>:27017, <IP7.IP7.IP7.IP7>:27017, <IP5.IP5.IP5.IP5>:27017, <IP4.IP4.IP4.IP4>:27017], passives=[], arbiters=[], primary='<IP7.IP7.IP7.IP7>:27017', tagSet=TagSet{[]}, electionId=null, setVersion=1}
03:01:39.502 [cluster-ClusterId{value='576b5113cddc5a53f3756872', description='null'}-<IP6.IP6.IP6.IP6>:27017] INFO  org.mongodb.driver.cluster - Monitor thread successfully connected to server with description ServerDescription{address=<IP6.IP6.IP6.IP6>:27017, type=REPLICA_SET_SECONDARY, state=CONNECTED, ok=true, version=ServerVersion{versionList=[3, 2, 4]}, minWireVersion=0, maxWireVersion=4, maxDocumentSize=16777216, roundTripTimeNanos=3924441, setName='gl-mdb-replset', canonicalAddress=<IP6.IP6.IP6.IP6>:27017, hosts=[<IP2.IP2.IP2.IP2>:27017, <IP3.IP3.IP3.IP3>:27017, <IP1.IP1.IP1.IP1>:27017, <IP6.IP6.IP6.IP6>:27017, <IP7.IP7.IP7.IP7>:27017, <IP5.IP5.IP5.IP5>:27017, <IP4.IP4.IP4.IP4>:27017], passives=[], arbiters=[], primary='<IP7.IP7.IP7.IP7>:27017', tagSet=TagSet{[]}, electionId=null, setVersion=1}
03:01:39.503 [cluster-ClusterId{value='576b5113cddc5a53f3756872', description='null'}-<IP3.IP3.IP3.IP3>:27017] INFO  org.mongodb.driver.cluster - Monitor thread successfully connected to server with description ServerDescription{address=<IP3.IP3.IP3.IP3>:27017, type=REPLICA_SET_SECONDARY, state=CONNECTED, ok=true, version=ServerVersion{versionList=[3, 2, 4]}, minWireVersion=0, maxWireVersion=4, maxDocumentSize=16777216, roundTripTimeNanos=3875877, setName='gl-mdb-replset', canonicalAddress=<IP3.IP3.IP3.IP3>:27017, hosts=[<IP2.IP2.IP2.IP2>:27017, <IP3.IP3.IP3.IP3>:27017, <IP1.IP1.IP1.IP1>:27017, <IP6.IP6.IP6.IP6>:27017, <IP7.IP7.IP7.IP7>:27017, <IP5.IP5.IP5.IP5>:27017, <IP4.IP4.IP4.IP4>:27017], passives=[], arbiters=[], primary='<IP7.IP7.IP7.IP7>:27017', tagSet=TagSet{[]}, electionId=null, setVersion=1}
03:01:39.503 [cluster-ClusterId{value='576b5113cddc5a53f3756872', description='null'}-<IP2.IP2.IP2.IP2>:27017] INFO  org.mongodb.driver.cluster - Monitor thread successfully connected to server with description ServerDescription{address=<IP2.IP2.IP2.IP2>:27017, type=REPLICA_SET_SECONDARY, state=CONNECTED, ok=true, version=ServerVersion{versionList=[3, 2, 4]}, minWireVersion=0, maxWireVersion=4, maxDocumentSize=16777216, roundTripTimeNanos=4948377, setName='gl-mdb-replset', canonicalAddress=<IP2.IP2.IP2.IP2>:27017, hosts=[<IP2.IP2.IP2.IP2>:27017, <IP3.IP3.IP3.IP3>:27017, <IP1.IP1.IP1.IP1>:27017, <IP6.IP6.IP6.IP6>:27017, <IP7.IP7.IP7.IP7>:27017, <IP5.IP5.IP5.IP5>:27017, <IP4.IP4.IP4.IP4>:27017], passives=[], arbiters=[], primary='<IP7.IP7.IP7.IP7>:27017', tagSet=TagSet{[]}, electionId=null, setVersion=1}
03:01:39.505 [cluster-ClusterId{value='576b5113cddc5a53f3756872', description='null'}-<IP7.IP7.IP7.IP7>:27017] INFO  org.mongodb.driver.connection - Opened connection [connectionId{localValue:7, serverValue:1925}] to <IP7.IP7.IP7.IP7>:27017
03:01:39.510 [cluster-ClusterId{value='576b5113cddc5a53f3756872', description='null'}-<IP7.IP7.IP7.IP7>:27017] INFO  org.mongodb.driver.cluster - Monitor thread successfully connected to server with description ServerDescription{address=<IP7.IP7.IP7.IP7>:27017, type=REPLICA_SET_PRIMARY, state=CONNECTED, ok=true, version=ServerVersion{versionList=[3, 2, 4]}, minWireVersion=0, maxWireVersion=4, maxDocumentSize=16777216, roundTripTimeNanos=3725427, setName='gl-mdb-replset', canonicalAddress=<IP7.IP7.IP7.IP7>:27017, hosts=[<IP2.IP2.IP2.IP2>:27017, <IP3.IP3.IP3.IP3>:27017, <IP1.IP1.IP1.IP1>:27017, <IP6.IP6.IP6.IP6>:27017, <IP7.IP7.IP7.IP7>:27017, <IP5.IP5.IP5.IP5>:27017, <IP4.IP4.IP4.IP4>:27017], passives=[], arbiters=[], primary='<IP7.IP7.IP7.IP7>:27017', tagSet=TagSet{[]}, electionId=7fffffff0000000000000003, setVersion=1}
03:01:39.510 [cluster-ClusterId{value='576b5113cddc5a53f3756872', description='null'}-<IP7.IP7.IP7.IP7>:27017] INFO  org.mongodb.driver.cluster - Setting max election id to 7fffffff0000000000000003 from replica set primary <IP7.IP7.IP7.IP7>:27017
03:01:39.510 [cluster-ClusterId{value='576b5113cddc5a53f3756872', description='null'}-<IP7.IP7.IP7.IP7>:27017] INFO  org.mongodb.driver.cluster - Setting max set version to 1 from replica set primary <IP7.IP7.IP7.IP7>:27017
03:01:39.511 [cluster-ClusterId{value='576b5113cddc5a53f3756872', description='null'}-<IP7.IP7.IP7.IP7>:27017] INFO  org.mongodb.driver.cluster - Discovered replica set primary <IP7.IP7.IP7.IP7>:27017
03:01:39.527 [main] INFO  org.mongodb.driver.connection - Opened connection [connectionId{localValue:8, serverValue:1926}] to <IP7.IP7.IP7.IP7>:27017
03:01:39.790 [main] INFO  org.graylog2.plugin.system.NodeId - Node ID: a0a3fb6e-ac36-48ad-bc10-e901d185228d

I've replaced the actual IP addresses with something like: <IP7.IP7.IP7.IP7> .

@kroepke

This comment has been minimized.

Show comment
Hide comment
@kroepke

kroepke Jun 23, 2016

Member

This happens because some other node is dropping the entry in the nodes collection.
It can happen for two reasons:

  1. slow nodes (due to GC or simply being overwhelmed)
  2. time drift between nodes

There is a configuration setting stale_master_timeout to increase the timeout value after which graylog nodes are being considered outdated.
The default is 2 seconds, but you can increase this to avoid the message.
Other than removing it from the set of nodes considered alive there is nothing else that is being done.

We will consider reworking how the clustering works in a future version but it's not planned for any immediate version yet.

Member

kroepke commented Jun 23, 2016

This happens because some other node is dropping the entry in the nodes collection.
It can happen for two reasons:

  1. slow nodes (due to GC or simply being overwhelmed)
  2. time drift between nodes

There is a configuration setting stale_master_timeout to increase the timeout value after which graylog nodes are being considered outdated.
The default is 2 seconds, but you can increase this to avoid the message.
Other than removing it from the set of nodes considered alive there is nothing else that is being done.

We will consider reworking how the clustering works in a future version but it's not planned for any immediate version yet.

@gitfrederic

This comment has been minimized.

Show comment
Hide comment
@gitfrederic

gitfrederic Jun 23, 2016

Contributor

Thanks @kroepke -- we'll try that increasing stale_master_timeout to something well-above the "normal" expected GCs (e.g. 10 seconds or so).

Contributor

gitfrederic commented Jun 23, 2016

Thanks @kroepke -- we'll try that increasing stale_master_timeout to something well-above the "normal" expected GCs (e.g. 10 seconds or so).

@kroepke kroepke added the question label Jun 27, 2016

@kroepke

This comment has been minimized.

Show comment
Hide comment
@kroepke

kroepke Jun 27, 2016

Member

I'll close this since we found a workaround for the warnings.

Member

kroepke commented Jun 27, 2016

I'll close this since we found a workaround for the warnings.

@kroepke kroepke closed this Jun 27, 2016

@scampuza

This comment has been minimized.

Show comment
Hide comment
@scampuza

scampuza May 11, 2017

@kroepke I am having this very same issue with a similar installation of GrayLog.. could please tell me what did you do for the warnings ?

scampuza commented May 11, 2017

@kroepke I am having this very same issue with a similar installation of GrayLog.. could please tell me what did you do for the warnings ?

@joschi

This comment has been minimized.

Show comment
Hide comment
@joschi

joschi May 11, 2017

Contributor

@scampuza We are using GitHub issues for tracking bugs in Graylog itself, but this doesn't look like one. Please post this issue to our discussion forum or join the #graylog channel on freenode IRC.

Thank you!

Contributor

joschi commented May 11, 2017

@scampuza We are using GitHub issues for tracking bugs in Graylog itself, but this doesn't look like one. Please post this issue to our discussion forum or join the #graylog channel on freenode IRC.

Thank you!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment