Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Icinga2 Service on Win Client goes up to 100% if it tries to connect to master #5105

Closed
darmagan opened this issue Mar 30, 2017 · 11 comments
Closed
Labels
area/windows Windows agent and plugins

Comments

@darmagan
Copy link

Hi,

on a Win Agent, if configured over director(Agent powershell script) the load of the icinga2 service goes up to 100% always on the point where it opens a new client connection to the master.
There are no errors, just the high load.

If endpoint and zones config are commented out it runs normally. Also if the Agent is manually configured over the Wizard it works, too.

The problem is that it is not reliably reproducible. On some Win clients it works, on some not. The clients have the same policies and firewall config. So it is appearing randomly to me.

The clients are clear, too. No config, except the standard config shifted with the Agent. The only new config is the config synced by the director to /var/lib/icinga2/api/zones/director-global/director/ but they are ignored, cause the assign rules are not matching.

working icinga2.conf

/** Icinga 2 Config - proposed by Icinga Director */
include "constants.conf"
include <itl>
include <plugins>
include <nscp>
include <windows-plugins>
// include <plugins-contrib>
if (!globals.contains("NscpPath")) {
  NscpPath = dirname(msi_get_component_path("{5C45463A-4AE9-4325-96DB-6E239C034F93}"))
}
object FileLogger "main-log" {
  severity = "information"
  path = LocalStateDir + "/log/icinga2/icinga2.log"
}
// TODO: improve establish connection handling
#object Endpoint "DESKTOP-9PBK4P6" {}
#object Endpoint "box17.int.netways.de"{}
#object Endpoint "box18.int.netways.de"{}
#
#object Zone "master" {
# endpoints = [ "box17.int.netways.de", "box18.int.netways.de" ]
#}
#object Zone "director-global" { global = true }
#object Zone "DESKTOP-9PBK4P6" {
#  parent = "master"
#  endpoints = [ "DESKTOP-9PBK4P6" ]
#}
#object ApiListener "api" {
#  cert_path = SysconfDir + "/icinga2/pki/DESKTOP-9PBK4P6.crt"
#  key_path = SysconfDir + "/icinga2/pki/DESKTOP-9PBK4P6.key"
#  ca_path = SysconfDir + "/icinga2/pki/ca.crt"
#  accept_commands = true
#  accept_config = true
#}

Regards

@TheFlyingCorpse
Copy link
Contributor

Do you have some debug logs to show this behavior?

@darmagan
Copy link
Author

darmagan commented Apr 18, 2017

debug.log

C:\Program Files\ICINGA2\sbin>icinga2.exe daemon -x debug
[2017-04-18 12:39:11 W. Europe Daylight Time] information/cli: Icinga applicatio
n loader (version: v2.6.3-2-g9d26a9c)
[2017-04-18 12:39:11 W. Europe Daylight Time] information/cli: Loading configura
tion file(s).
[2017-04-18 12:39:11 W. Europe Daylight Time] notice/ConfigCompiler: Compiling c
onfig file: C:\ProgramData\icinga2\etc/icinga2/icinga2.conf
[2017-04-18 12:39:11 W. Europe Daylight Time] notice/ConfigCompiler: Compiling c
onfig file: C:\ProgramData\icinga2\etc\icinga2/constants.conf
[2017-04-18 12:39:11 W. Europe Daylight Time] notice/ConfigCompiler: Compiling c
onfig file: C:\Program Files\ICINGA2\\share\icinga2\include/itl
[2017-04-18 12:39:11 W. Europe Daylight Time] notice/ConfigCompiler: Compiling c
onfig file: C:\Program Files\ICINGA2\\share\icinga2\include/command-icinga.conf
[2017-04-18 12:39:11 W. Europe Daylight Time] notice/ConfigCompiler: Compiling c
onfig file: C:\Program Files\ICINGA2\\share\icinga2\include/plugins
[2017-04-18 12:39:11 W. Europe Daylight Time] notice/ConfigCompiler: Compiling c
onfig file: C:\Program Files\ICINGA2\\share\icinga2\include/command-plugins.conf

[2017-04-18 12:39:11 W. Europe Daylight Time] notice/ConfigCompiler: Compiling c
onfig file: C:\Program Files\ICINGA2\\share\icinga2\include/nscp
[2017-04-18 12:39:11 W. Europe Daylight Time] notice/ConfigCompiler: Compiling c
onfig file: C:\Program Files\ICINGA2\\share\icinga2\include/command-nscp-local.c
onf
[2017-04-18 12:39:11 W. Europe Daylight Time] notice/ConfigCompiler: Compiling c
onfig file: C:\Program Files\ICINGA2\\share\icinga2\include/windows-plugins
[2017-04-18 12:39:11 W. Europe Daylight Time] notice/ConfigCompiler: Compiling c
onfig file: C:\Program Files\ICINGA2\\share\icinga2\include/command-plugins-wind
ows.conf
[2017-04-18 12:39:11 W. Europe Daylight Time] notice/ConfigCompiler: Compiling c
onfig file: C:\ProgramData\icinga2\var\lib\icinga2\api\packages/_api/include.con
f
[2017-04-18 12:39:11 W. Europe Daylight Time] notice/ConfigCompiler: Compiling c
onfig file: C:\ProgramData\icinga2\var\lib\icinga2\api\packages\_api\testclient
-1492511472-1/include.conf
[2017-04-18 12:39:11 W. Europe Daylight Time] notice/ConfigCompiler: Compiling c
onfig file: C:\ProgramData\icinga2\var\lib\icinga2\api\packages\_api\testclient
-1492511472-1\../active.conf
[2017-04-18 12:39:11 W. Europe Daylight Time] notice/ConfigCompiler: Compiling c
onfig file: C:\ProgramData\icinga2\var\lib\icinga2\api\packages\_api\testclient
-1492511472-1/conf.d/downtimes/testclient!load!testclient-1492511472-0.conf
[2017-04-18 12:39:11 W. Europe Daylight Time] information/ConfigItem: Committing
config item(s).
[2017-04-18 12:39:11 W. Europe Daylight Time] notice/WorkQueue: Spawning WorkQue
ue threads for 'DaemonUtility::LoadConfigFiles'
[2017-04-18 12:39:11 W. Europe Daylight Time] information/ApiListener: My API id
entity: testclient
[2017-04-18 12:39:12 W. Europe Daylight Time] notice/ThreadPool: Pool #1: Pendin
g tasks: 0; Average latency: 0ms; Threads: 4; Pool utilization: 0%
[2017-04-18 12:39:12 W. Europe Daylight Time] notice/ConfigObject: Ignoring conf
ig object 'testclient-1492511472-0' of type 'Downtime' due to errors: Error: V
alidation failed for object 'testclient!load!testclient
-1492511472-0' of type 'Downtime'; Attribute 'host_name': Object 'testclient
.domain.com' of type 'Host' does not exist.
Location: in C:\ProgramData\icinga2\var\lib\icinga2\api\packages\_api\testclient
-1492511472-1/conf.d/downtimes/testclient!load!testclient-1492511472-0.conf: 1:0-1:58
C:\ProgramData\icinga2\var\lib\icinga2\api\packages\_api\testclient-1492511472
-1/conf.d/downtimes/testclient!load!testclient-149251
1472-0.conf(1): object Downtime "testclient-1492511472-0" ignore_on_error {


^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
C:\ProgramData\icinga2\var\lib\icinga2\api\packages\_api\testclient-1492511472
-1/conf.d/downtimes/testclient!load!testclient-149251
1472-0.conf(2): author = "icingaadmin"
C:\ProgramData\icinga2\var\lib\icinga2\api\packages\_api\testclient-1492511472
-1/conf.d/downtimes/testclient!load!testclient-149251
1472-0.conf(3): comment = "Scheduled downtime for backup"


[2017-04-18 12:39:12 W. Europe Daylight Time] information/ConfigItem: Instantiat
ed 1 ApiListener.
[2017-04-18 12:39:12 W. Europe Daylight Time] information/ConfigItem: Instantiat
ed 3 Zones.
[2017-04-18 12:39:12 W. Europe Daylight Time] information/ConfigItem: Instantiat
ed 2 Endpoints.
[2017-04-18 12:39:12 W. Europe Daylight Time] information/ConfigItem: Instantiat
ed 1 FileLogger.
[2017-04-18 12:39:12 W. Europe Daylight Time] information/ConfigItem: Instantiat
ed 86 CheckCommands.
[2017-04-18 12:39:12 W. Europe Daylight Time] information/ConfigItem: Instantiat
ed 1 IcingaApplication.
[2017-04-18 12:39:12 W. Europe Daylight Time] information/ScriptGlobal: Dumping
variables to file 'C:\ProgramData\icinga2\var/cache/icinga2/icinga2.vars'
[2017-04-18 12:39:12 W. Europe Daylight Time] notice/WorkQueue: Stopped WorkQueu
e threads for 'DaemonUtility::LoadConfigFiles'
[2017-04-18 12:39:12 W. Europe Daylight Time] information/ConfigItem: Triggering
Start signal for config items
[2017-04-18 12:39:12 W. Europe Daylight Time] notice/WorkQueue: Spawning WorkQue
ue threads for 'DaemonCommand::Run'
[2017-04-18 12:39:12 W. Europe Daylight Time] information/ApiListener: Adding ne
w listener on port '5665'
[2017-04-18 12:39:12 W. Europe Daylight Time] debug/ApiListener: Not connecting
to Endpoint 'testclient' because that's us.
[2017-04-18 12:39:12 W. Europe Daylight Time] information/ConfigItem: Activated
all objects.
[2017-04-18 12:39:12 W. Europe Daylight Time] debug/ApiListener: Not connecting
to Endpoint 'Icingaserver' because the host/port attributes are missing.
[2017-04-18 12:39:12 W. Europe Daylight Time] notice/ApiListener: Current zone m
aster: testclient
[2017-04-18 12:39:12 W. Europe Daylight Time] notice/ApiListener: Connected endp
oints:
[2017-04-18 12:39:12 W. Europe Daylight Time] notice/WorkQueue: Stopped WorkQueu
e threads for 'DaemonCommand::Run'
[2017-04-18 12:39:12 W. Europe Daylight Time] debug/IcingaApplication: In Icinga
Application::Main()
[2017-04-18 12:39:21 W. Europe Daylight Time] notice/WorkQueue: #4 (ApiListener,
RelayQueue) tasks: 0
[2017-04-18 12:39:21 W. Europe Daylight Time] notice/WorkQueue: #5 (ApiListener,
SyncQueue) tasks: 0
[2017-04-18 12:39:27 W. Europe Daylight Time] notice/ThreadPool: Pool #1: Pendin
g tasks: 0; Average latency: 0ms; Threads: 4; Pool utilization: 0.0066491%
[2017-04-18 12:39:30 W. Europe Daylight Time] information/ApiListener: New clien
t connection for identity 'Icingaserver' fro

at this point the load of the Icinga service goes up to 100%

@micha-09
Copy link

micha-09 commented May 4, 2017

Any ideas?

@darmagan
Copy link
Author

darmagan commented Jun 2, 2017

Any news here?

@TheFlyingCorpse
Copy link
Contributor

Sorry, been a bit much todo both at work and in private.

I do not see anything amiss in your logs, except the downtime entry, however that looks like "just" noise.

A)
Please post the icinga2 master version, distribution and what package repo (if any).
Please also post the icinga2 node version, OS and what package repo (if any).

Why?
With that I should be able to rebuild your environment, as I haven't seen or been able to reproduce this.

B)
It would be interesting to see a matching debug from the master as the node is connecting, if possible.

Why?
It might be caused by the master.

A hint on running it via console on Windows, increase the console width, I usually have it around 150 or so, so there are fewer newlines. see more here

@cptnkoiz
Copy link

Same issue here on Random Windows Servers.

Master:
CentOS7
Icinga 2.7.0 Icinga RPM Repo
Icinga Director 2.3.1
Windows Server 2008 R2, Server 2012 R2
Agent Version 2.6.3 and 2.7.0

@TheFlyingCorpse
Copy link
Contributor

@cptnkoiz - Can you reproduce it reliably on those platforms? I cannot do anything with the information provided to reproduce the issue. My environment has 450+ windows servers, yet none exhibit this issue and havent since I started with this setup in July 2016.
Without any way to reproduce it reliably or logs from when it occurs, this is close to impossible to work with.

@cptnkoiz
Copy link

@TheFlyingCorpse - on specific hosts i can! i have currently about 400 Windows Hosts in this Setup and about 15 Hosts have this Issue. All Hosts are deployed in the same way.

Today i enabled the debuglog on one of those Hosts and it seems like there is an issue with the Configuration thats generated by the Powershell Script. I will take another look at it tomorrow and post the Debuglog of the Windows Host.

The behaviour is somewhat strange.

Deployment via Powershell Script -> Agent gets installed, config generated

After that the Hosts are split in two. One works as it should and after a time (i don't know how long) the icinga2.exe process gets 100% CPU (if Cores >=2 no Problem here) the Hosts switches to Unknown on all Checks.

The Second Group of hosts gets to 100% CPU right after the first start and stays there.

@hypermagicmountain
Copy link

hypermagicmountain commented Oct 17, 2017

We have about 13 Windows Server 2012 R2 machines, 3 of them have only one virtual CPU assigned, 1 machine has this Issue.

@mInrOz
Copy link

mInrOz commented May 18, 2018

I dont really have any experience with Icinga but im also seeing this issue on several Windows Hosts.
See below for the log output from one of the servers experiencing the problem.

Is there any other logs or information i can provide?

OS: Windows Server 2008 R2

[2018-05-17 10:06:50 +0200] information/WorkQueue: #10 (JsonRpcConnection, #2) items: 0, rate: 0.6/s (36/min 180/5min 536/15min);
[2018-05-17 10:07:48 +0200] information/WorkQueue: #8 (JsonRpcConnection, #0) items: 0, rate: 0.6/s (36/min 180/5min 536/15min);
[2018-05-17 10:08:36 +0200] information/WorkQueue: #11 (JsonRpcConnection, #3) items: 0, rate:  0/s (0/min 0/5min 0/15min);
[2018-05-17 10:08:45 +0200] information/ConfigObject: Dumping program state to file 'C:\ProgramData\icinga2\var/lib/icinga2/icinga2.state'
[2018-05-17 10:09:04 +0200] information/WorkQueue: #5 (ApiListener, SyncQueue) items: 0, rate: 0.0166667/s (1/min 2/5min 3/15min);
[2018-05-17 10:09:31 +0200] information/WorkQueue: #9 (JsonRpcConnection, #1) items: 0, rate: 0.6/s (36/min 158/5min 158/15min);
[2018-05-17 10:10:25 +0200] information/WorkQueue: #4 (ApiListener, RelayQueue) items: 0, rate: 0.7/s (42/min 186/5min 543/15min);
[2018-05-17 10:17:05 +0200] information/CheckerComponent: 'checker' started.
[2018-05-17 10:17:05 +0200] information/ApiListener: Adding new listener on port '5665'
[2018-05-17 10:17:05 +0200] critical/TcpSocket: Invalid socket: 10048, "Only one usage of each socket address (protocol/network address/port) is normally permitted."
Context:
	(0) Activating object 'api' of type 'ApiListener'

[2018-05-17 10:17:05 +0200] critical/ApiListener: Cannot bind TCP socket for host '::' on port '5665'.
Context:
	(0) Activating object 'api' of type 'ApiListener'

[2018-05-17 10:17:05 +0200] critical/ApiListener: Cannot add listener on host '::' for port '5665'.
Context:
	(0) Activating object 'api' of type 'ApiListener'

[2018-05-17 17:28:23 +0200] information/CheckerComponent: 'checker' started.
[2018-05-17 17:28:23 +0200] information/ApiListener: Adding new listener on port '5665'
[2018-05-17 17:28:23 +0200] critical/TcpSocket: Invalid socket: 10048, "Only one usage of each socket address (protocol/network address/port) is normally permitted."
Context:
	(0) Activating object 'api' of type 'ApiListener'

[2018-05-17 17:28:23 +0200] critical/ApiListener: Cannot bind TCP socket for host '::' on port '5665'.
Context:
	(0) Activating object 'api' of type 'ApiListener'

[2018-05-17 17:28:23 +0200] critical/ApiListener: Cannot add listener on host '::' for port '5665'.
Context:
	(0) Activating object 'api' of type 'ApiListener'

[2018-05-17 17:28:43 +0200] information/CheckerComponent: 'checker' started.
[2018-05-17 17:28:43 +0200] information/ApiListener: Adding new listener on port '5665'
[2018-05-17 17:28:43 +0200] critical/TcpSocket: Invalid socket: 10048, "Only one usage of each socket address (protocol/network address/port) is normally permitted."
Context:
	(0) Activating object 'api' of type 'ApiListener'

[2018-05-17 17:28:43 +0200] critical/ApiListener: Cannot bind TCP socket for host '::' on port '5665'.
Context:
	(0) Activating object 'api' of type 'ApiListener'

[2018-05-17 17:28:43 +0200] critical/ApiListener: Cannot add listener on host '::' for port '5665'.
Context:
	(0) Activating object 'api' of type 'ApiListener'

[2018-05-17 17:28:58 +0200] information/CheckerComponent: 'checker' started.
[2018-05-17 17:28:58 +0200] information/ApiListener: Adding new listener on port '5665'
[2018-05-17 17:28:58 +0200] critical/TcpSocket: Invalid socket: 10048, "Only one usage of each socket address (protocol/network address/port) is normally permitted."
Context:
	(0) Activating object 'api' of type 'ApiListener'

[2018-05-17 17:28:58 +0200] critical/ApiListener: Cannot bind TCP socket for host '::' on port '5665'.
Context:
	(0) Activating object 'api' of type 'ApiListener'

[2018-05-17 17:28:58 +0200] critical/ApiListener: Cannot add listener on host '::' for port '5665'.
Context:
	(0) Activating object 'api' of type 'ApiListener'

[2018-05-17 17:31:26 +0200] information/CheckerComponent: 'checker' started.
[2018-05-17 17:31:26 +0200] information/ApiListener: Adding new listener on port '5665'
[2018-05-17 17:31:26 +0200] critical/TcpSocket: Invalid socket: 10048, "Only one usage of each socket address (protocol/network address/port) is normally permitted."
Context:
	(0) Activating object 'api' of type 'ApiListener'

[2018-05-17 17:31:26 +0200] critical/ApiListener: Cannot bind TCP socket for host '::' on port '5665'.
Context:
	(0) Activating object 'api' of type 'ApiListener'

[2018-05-17 17:31:26 +0200] critical/ApiListener: Cannot add listener on host '::' for port '5665'.
Context:
	(0) Activating object 'api' of type 'ApiListener'

[2018-05-17 18:05:38 +0200] information/CheckerComponent: 'checker' started.
[2018-05-17 18:05:38 +0200] information/ApiListener: Adding new listener on port '5665'
[2018-05-17 18:05:38 +0200] critical/TcpSocket: Invalid socket: 10048, "Only one usage of each socket address (protocol/network address/port) is normally permitted."
Context:
	(0) Activating object 'api' of type 'ApiListener'

[2018-05-17 18:05:38 +0200] critical/ApiListener: Cannot bind TCP socket for host '::' on port '5665'.
Context:
	(0) Activating object 'api' of type 'ApiListener'

[2018-05-17 18:05:38 +0200] critical/ApiListener: Cannot add listener on host '::' for port '5665'.
Context:
	(0) Activating object 'api' of type 'ApiListener'

[2018-05-18 19:49:47 +0200] information/CheckerComponent: 'checker' started.
[2018-05-18 19:49:47 +0200] information/ApiListener: Adding new listener on port '5665'
[2018-05-18 19:49:47 +0200] information/ConfigItem: Committing config item(s).
[2018-05-18 19:49:47 +0200] information/ConfigItem: Activated all objects.
[2018-05-18 19:49:47 +0200] information/ApiListener: Reconnecting to endpoint 'XXX' via host 'XXX' and port '5665'
[2018-05-18 19:49:47 +0200] warning/ApplyRule: Apply rule 'satellite-host' (in C:\ProgramData\icinga2\etc\icinga2/conf.d/satellite.conf: 29:1-29:41) for type 'Dependency' does not match anywhere!
[2018-05-18 19:49:47 +0200] information/ConfigItem: Instantiated 1 Downtime.
[2018-05-18 19:49:47 +0200] information/ConfigItem: Triggering Start signal for config items
[2018-05-18 19:49:47 +0200] information/ConfigItem: Activated all objects.
[2018-05-18 19:49:47 +0200] information/ApiListener: New client connection for identity 'XXX' to [XXX]:5665
[2018-05-18 19:49:47 +0200] information/ApiListener: Finished reconnecting to endpoint 'XXX' via host 'XXX' and port '5665'
[2018-05-18 19:49:47 +0200] information/ApiListener: Requesting new certificate for this Icinga instance from endpoint 'XXX'.
[2018-05-18 19:49:47 +0200] information/ApiListener: Applying config update from endpoint 'XXX' of zone 'master'.
[2018-05-18 19:49:47 +0200] information/ApiListener: Sending config updates for endpoint 'XXX' in zone 'master'.
[2018-05-18 19:49:47 +0200] information/ApiListener: Finished sending config file updates for endpoint 'XXX' in zone 'master'.
[2018-05-18 19:49:47 +0200] information/ApiListener: Syncing runtime objects to endpoint 'XXXt'.
[2018-05-18 19:49:47 +0200] information/ApiListener: Finished syncing runtime objects to endpoint 'XXX'.
[2018-05-18 19:49:47 +0200] information/ApiListener: Finished sending runtime config updates for endpoint 'XXX' in zone 'master'.
[2018-05-18 19:49:47 +0200] information/ApiListener: Sending replay log for endpoint 'XXX' in zone 'master'.
[2018-05-18 19:49:47 +0200] information/ApiListener: Replayed 3 messages.
[2018-05-18 19:49:47 +0200] information/ApiListener: Finished sending replay log for endpoint 'XXX' in zone 'master'.
[2018-05-18 19:49:47 +0200] information/ApiListener: Finished syncing endpoint 'XXX' in zone 'master'.
[2018-05-18 19:49:56 +0200] information/WorkQueue: #5 (ApiListener, SyncQueue) items: 0, rate: 0.0166667/s (1/min 1/5min 1/15min);
[2018-05-18 19:49:56 +0200] information/WorkQueue: #4 (ApiListener, RelayQueue) items: 0, rate: 0.266667/s (16/min 16/5min 16/15min);
[2018-05-18 19:49:57 +0200] information/WorkQueue: #9 (JsonRpcConnection, #0) items: 0, rate: 0.133333/s (8/min 8/5min 8/15min);
[2018-05-18 19:49:57 +0200] information/WorkQueue: #12 (JsonRpcConnection, #3) items: 0, rate:  0/s (0/min 0/5min 0/15min);
[2018-05-18 19:49:57 +0200] information/WorkQueue: #11 (JsonRpcConnection, #2) items: 0, rate:  0/s (0/min 0/5min 0/15min);
[2018-05-18 19:49:57 +0200] information/WorkQueue: #10 (JsonRpcConnection, #1) items: 0, rate:  0/s (0/min 0/5min 0/15min);
[2018-05-18 19:50:31 +0200] information/CheckerComponent: 'checker' started."

@dnsmichi
Copy link
Contributor

This has been fixed with #6378 and will be released with 2.9. Please test the snapshot packages.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/windows Windows agent and plugins
Projects
None yet
Development

No branches or pull requests

8 participants