Icinga2 startup fails, if network stack is not fully loaded. #6758

Closed
jschanz opened this Issue Nov 8, 2018 · 7 comments

@jschanz
Contributor

jschanz commented Nov 8, 2018

Icinga2 startup fails if the network stack is not fully loaded.
I'm not sure whether this is a systemd or an icinga2 problem.

Icinga2 can't determine the FQDN of the host if the network stack takes longer than usual to start (e.g. if you use a bridge and several network interfaces).

Icinga2 then falls back to the short hostname without the domain, and startup fails while loading the certs.

2018-11-07T17:08:45.232722+01:00 icinga-01 icinga2[788]: [2018-11-07 17:08:45 +0100] critical/SSL: Error on bio X509 AUX reading pem file '/var/lib/icinga2/certs//icinga-01.crt': 33558530, "error:02001002:lib(2):func(1):reason(2)"
2018-11-07T17:08:45.256520+01:00 icinga-01 icinga2[788]: [2018-11-07 17:08:45 +0100] critical/config: Error: Cannot get certificate from cert path: '/var/lib/icinga2/certs//icinga-01.crt'.

hostname is "icinga-01"
domain is "localdomain.local"
fqdn is "icinga-01.localdomain.local"

certs are stored with fqdn naming scheme

icinga-01.localdomain.local:/etc/sysconfig/network # ll /var/lib/icinga2/certs/
total 16
-rw-rw---- 1 icinga icinga 1720 25. Okt 06:37 ca.crt
-rw-rw---- 1 icinga icinga 1785 25. Okt 06:37 trusted-master.crt
-rw-rw---- 1 icinga icinga 1777 25. Okt 06:37 icinga-01.localdomain.local.crt
-rw------- 1 icinga icinga 3243 25. Okt 06:37 icinga-01.localdomain.local.key
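
To make the mismatch concrete: the log shows a lookup for `icinga-01.crt`, while the directory only holds files named after the FQDN. The following is a minimal sketch, assuming the certificate path is simply the certs directory plus the detected node name plus `.crt` (inferred from the log line above, not taken from the Icinga 2 source). `cert_path`, `cert_exists` and `CERT_FILES` are illustrative names.

```python
import posixpath

# File names copied from the directory listing above.
CERT_FILES = {
    "ca.crt",
    "trusted-master.crt",
    "icinga-01.localdomain.local.crt",
    "icinga-01.localdomain.local.key",
}

CERTS_DIR = "/var/lib/icinga2/certs"


def cert_path(node_name: str, certs_dir: str = CERTS_DIR) -> str:
    """Build the certificate path for a node name (assumed naming scheme)."""
    return posixpath.join(certs_dir, node_name + ".crt")


def cert_exists(node_name: str) -> bool:
    """Check the file name against the listing above, not the real disk."""
    return posixpath.basename(cert_path(node_name)) in CERT_FILES


print(cert_exists("icinga-01"))                    # short hostname -> False
print(cert_exists("icinga-01.localdomain.local"))  # FQDN -> True
```

With only the short hostname available, the lookup misses the existing FQDN-named certificate, which matches the "Cannot get certificate from cert path" error.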

full log of initialization ...

2018-11-07T17:08:44.121362+01:00 icinga-01 systemd[1]: Starting system-network.slice.
2018-11-07T17:08:44.121581+01:00 icinga-01 systemd[1]: Created slice system-network.slice.
2018-11-07T17:08:44.126479+01:00 icinga-01 systemd[1]: Starting ifup managed network interface eth0...
2018-11-07T17:08:44.166293+01:00 icinga-01 ifup[1363]: eth0      device: Intel Corporation 82578DM Gigabit Network Connection (rev 05)
2018-11-07T17:08:44.167068+01:00 icinga-01 ifup[1363]:     eth0      device: Intel Corporation 82578DM Gigabit Network Connection (rev 05)
2018-11-07T17:08:44.437492+01:00 icinga-01 icinga2[788]: [2018-11-07 17:08:44 +0100] information/cli: Icinga application loader (version: r2.10.1-1)
2018-11-07T17:08:44.437715+01:00 icinga-01 icinga2[788]: [2018-11-07 17:08:44 +0100] information/cli: Loading configuration file(s).
2018-11-07T17:08:44.706756+01:00 icinga-01 kernel: [   24.758137] e1000e 0000:00:19.0: irq 43 for MSI/MSI-X
2018-11-07T17:08:44.807769+01:00 icinga-01 kernel: [   24.859013] e1000e 0000:00:19.0: irq 43 for MSI/MSI-X
2018-11-07T17:08:44.807793+01:00 icinga-01 kernel: [   24.859162] IPv6: ADDRCONF(NETDEV_UP): eth0: link is not ready
2018-11-07T17:08:44.948371+01:00 icinga-01 icinga2[788]: [2018-11-07 17:08:44 +0100] information/ConfigItem: Committing config item(s).
2018-11-07T17:08:45.232722+01:00 icinga-01 icinga2[788]: [2018-11-07 17:08:45 +0100] critical/SSL: Error on bio X509 AUX reading pem file '/var/lib/icinga2/certs//icinga-01.crt': 33558530, "error:02001002:lib(2):func(1):reason(2)"
2018-11-07T17:08:45.256520+01:00 icinga-01 icinga2[788]: [2018-11-07 17:08:45 +0100] critical/config: Error: Cannot get certificate from cert path: '/var/lib/icinga2/certs//icinga-01.crt'.
2018-11-07T17:08:45.256679+01:00 icinga-01 icinga2[788]: Location: in /etc/icinga2/icinga2.conf: 29:1-29:24
2018-11-07T17:08:45.256830+01:00 icinga-01 icinga2[788]: /etc/icinga2/icinga2.conf(27): }
2018-11-07T17:08:45.256967+01:00 icinga-01 icinga2[788]: /etc/icinga2/icinga2.conf(28):
2018-11-07T17:08:45.257101+01:00 icinga-01 icinga2[788]: /etc/icinga2/icinga2.conf(29): object ApiListener "api" {
2018-11-07T17:08:45.257237+01:00 icinga-01 icinga2[788]: ^^^^^^^^^^^^^^^^^^^^^^^^
2018-11-07T17:08:45.257372+01:00 icinga-01 icinga2[788]: /etc/icinga2/icinga2.conf(30):   accept_commands = true
2018-11-07T17:08:45.257507+01:00 icinga-01 icinga2[788]: /etc/icinga2/icinga2.conf(31):   accept_config = true
2018-11-07T17:08:45.257649+01:00 icinga-01 icinga2[788]: [2018-11-07 17:08:45 +0100] critical/config: 1 error
2018-11-07T17:08:45.268125+01:00 icinga-01 systemd[1]: icinga2.service: main process exited, code=exited, status=1/FAILURE
2018-11-07T17:08:45.269265+01:00 icinga-01 systemd[1]: Failed to start Icinga host/service/network monitoring system.
2018-11-07T17:08:45.269488+01:00 icinga-01 systemd[1]: Unit icinga2.service entered failed state.
2018-11-07T17:08:46.159716+01:00 icinga-01 kernel: [   26.212521] e1000e: eth0 NIC Link is Up 100 Mbps Full Duplex, Flow Control: None
2018-11-07T17:08:46.159734+01:00 icinga-01 kernel: [   26.212634] e1000e 0000:00:19.0 eth0: 10/100 speed: disabling TSO
2018-11-07T17:08:46.159735+01:00 icinga-01 kernel: [   26.212670] IPv6: ADDRCONF(NETDEV_CHANGE): eth0: link becomes ready
2018-11-07T17:08:46.234230+01:00 icinga-01 systemd[1]: Started ifup managed network interface eth0.
2018-11-07T17:08:46.268042+01:00 icinga-01 systemd[1]: Expecting device sys-subsystem-net-devices-br0.device...
2018-11-07T17:08:46.268271+01:00 icinga-01 systemd[1]: Starting ifup managed network interface br0...
2018-11-07T17:08:46.321031+01:00 icinga-01 ifup[1942]: br0
2018-11-07T17:08:46.321739+01:00 icinga-01 ifup[1942]:     br0
2018-11-07T17:08:46.362713+01:00 icinga-01 kernel: [   26.415058] Bridge firewalling registered
2018-11-07T17:08:46.370519+01:00 icinga-01 ifup[1942]: br0       Ports: [eth0]
2018-11-07T17:08:46.370799+01:00 icinga-01 kernel: [   26.423412] device eth0 entered promiscuous mode
2018-11-07T17:08:46.373718+01:00 icinga-01 kernel: [   26.426703] br0: port 1(eth0) entered forwarding state
2018-11-07T17:08:46.373725+01:00 icinga-01 kernel: [   26.426707] br0: port 1(eth0) entered forwarding state
2018-11-07T17:08:46.374563+01:00 icinga-01 ifup-bridge[2019]:     br0       forwarddelay (see man ifcfg-bridge)
2018-11-07T17:08:46.375138+01:00 icinga-01 ifup[1942]: br0       forwarddelay (see man ifcfg-bridge) ... ready
2018-11-07T17:08:46.375375+01:00 icinga-01 systemd-sysctl[2047]: Overwriting earlier assignment of kernel/sysrq in file '/etc/sysctl.d/99-sysctl.conf'.
2018-11-07T17:08:46.376159+01:00 icinga-01 systemd[1]: Found device /sys/subsystem/net/devices/br0.
2018-11-07T17:08:46.378135+01:00 icinga-01 ifup-bridge[2019]: ... ready
2018-11-07T17:08:46.492138+01:00 icinga-01 systemd[1]: Started ifup managed network interface br0.
2018-11-07T17:08:46.504507+01:00 icinga-01 network[817]: ..done..done..doneSetting up service network  .  .  .  .  .  .  .  .  .  .  .  .  ...done
2018-11-07T17:08:46.505711+01:00 icinga-01 systemd[1]: Started LSB: Configure network interfaces and set up routing.
2018-11-07T17:08:46.505891+01:00 icinga-01 systemd[1]: Starting Network.
2018-11-07T17:08:46.507845+01:00 icinga-01 systemd[1]: Reached target Network.

If you restart the service after the system has fully started, everything works as expected and the service starts.

Expected Behavior

Shouldn't fail

Current Behavior

Fails sometimes, if initialization of network stack is slow

Possible Solution

Steps to Reproduce (for bugs)

Not reproducible every time: sometimes it works, sometimes it doesn't.

Your Environment

  • Version used (icinga2 --version):
    icinga2 - The Icinga 2 network monitoring daemon (version: r2.10.1-1)

Copyright (c) 2012-2018 Icinga Development Team (https://icinga.com/)
License GPLv2+: GNU GPL version 2 or later http://gnu.org/licenses/gpl2.html
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.

System information:
Platform: openSUSE
Platform version: 13.1 (Bottle)
Kernel: Linux
Kernel version: 3.11.10-29-desktop
Architecture: i686

Build information:
Compiler: GNU 4.8.1
Build host: server342vmx

Application information:

General paths:
Config directory: /etc/icinga2
Data directory: /var/lib/icinga2
Log directory: /var/log/icinga2
Cache directory: /var/cache/icinga2
Spool directory: /var/spool/icinga2
Run directory: /var/run/icinga2

Old paths (deprecated):
Installation root: /usr
Sysconf directory: /etc
Run directory (base): /var/run
Local state directory: /var

Internal paths:
Package data directory: /usr/share/icinga2
State path: /var/lib/icinga2/icinga2.state
Modified attributes path: /var/lib/icinga2/modified-attributes.conf
Objects path: /var/cache/icinga2/icinga2.debug
Vars path: /var/cache/icinga2/icinga2.vars
PID path: /var/run/icinga2/icinga2.pid

  • Operating System and version:

openSUSE 13.1 (i586)
VERSION = 13.1
CODENAME = Bottle

@dnsmichi dnsmichi added the Packages label Nov 8, 2018

@jschanz

Contributor

jschanz commented Nov 8, 2018

I think it has something to do with name resolution. If no entry is set in /etc/hosts, getaddrinfo fails without network. If an entry is set in /etc/hosts, the FQDN is resolved even without network.
Maybe it only needs a documentation update recommending an entry in /etc/hosts, which I could also do later.

@jschanz

Contributor

jschanz commented Nov 9, 2018

I also get messages like these:

2018-11-08T03:05:51.734071+01:00 icinga-01 icinga2[694]: [2018-11-08 03:05:51 +0100] critical/TcpSocket: getaddrinfo() failed with error code -2, "Name or service not known"
2018-11-08T03:05:51.747005+01:00 icinga-01 icinga2[694]: [2018-11-08 03:05:51 +0100] critical/TcpSocket: getaddrinfo() failed with error code -2, "Name or service not known"
@Crunsher

Member

Crunsher commented Nov 9, 2018

I looked this up yesterday: at startup Icinga calls getaddrinfo to get the FQDN; if that fails it falls back to the hostname, and if that fails too it uses 'localhost'.

I don't think there is anything we can do about this either, except document it 🤷‍♀️
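
The fallback chain described above can be sketched in Python. This is an illustration only, not the Icinga 2 implementation: the function name `detect_node_name` and the exact order of calls (hostname first, then `getaddrinfo` on it with `AI_CANONNAME`) are assumptions. The resolver functions are parameters so the failure cases can be simulated without unplugging anything.

```python
import socket
from typing import Callable


def detect_node_name(
    gethostname: Callable[[], str] = socket.gethostname,
    getaddrinfo: Callable[..., list] = socket.getaddrinfo,
) -> str:
    """Return the FQDN if resolvable, else the short hostname, else 'localhost'."""
    try:
        hostname = gethostname()
    except OSError:
        return "localhost"
    if not hostname:
        return "localhost"
    try:
        # AI_CANONNAME asks the resolver for the canonical (fully qualified) name.
        results = getaddrinfo(hostname, None, flags=socket.AI_CANONNAME)
    except OSError:
        # socket.gaierror ("Name or service not known") is a subclass of OSError.
        return hostname
    for _family, _type, _proto, canonname, _sockaddr in results:
        if canonname:
            return canonname
    return hostname
```

Calling `detect_node_name()` with the defaults uses the real resolver, so the result depends on the host's /etc/hosts and DNS state at that moment, which is exactly the timing problem in this issue.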

@dgoetz

Member

dgoetz commented Nov 9, 2018

Just to be sure, @jschanz, can you show the content of the systemd icinga2.service unit?

It should contain After=... network-online.target ..., which should be enough. If it is not, as in your case, make sure the wait-online service that corresponds to your network management daemon is enabled (systemctl is-enabled NetworkManager-wait-online.service systemd-networkd-wait-online.service). If that still does not help, I would say it is a problem of that daemon rather than of Icinga 2.

For further details, have a look at https://www.freedesktop.org/wiki/Software/systemd/NetworkTarget/

@jschanz

Contributor

jschanz commented Nov 9, 2018

@dgoetz

[Unit]
Description=Icinga host/service/network monitoring system
After=syslog.target network-online.target postgresql.service mariadb.service carbon-cache.service carbon-relay.service

[Service]
Type=notify
EnvironmentFile=/etc/sysconfig/icinga2
ExecStartPre=/usr/lib/icinga2/prepare-dirs /etc/sysconfig/icinga2
ExecStart=/usr/sbin/icinga2 daemon -e /var/log/icinga2/error.log
PIDFile=/var/run/icinga2/icinga2.pid
ExecReload=/usr/lib/icinga2/safe-reload /etc/sysconfig/icinga2
TimeoutStartSec=30m

# Systemd >228 enforces a lower process number for services.
# Depending on the distribution and Systemd version, this must
# be explicitly raised. Packages will set the needed values
# into /etc/systemd/system/icinga2.service.d/limits.conf
#
# Please check the troubleshooting documentation for further details.
# The values below can be used as examples for customized service files.

#TasksMax=infinity
#LimitNPROC=62883

[Install]
WantedBy=multi-user.target

Target "network" is reached after icinga2 start:

2018-11-07T17:08:46.507845+01:00 icinga-01 systemd[1]: Reached target Network.

but

2018-11-07T17:08:45.269265+01:00 icinga-01 systemd[1]: Failed to start Icinga host/service/network monitoring system.

I can reproduce this now ... Please unplug the network cable and try to use the following /etc/hosts

#
# hosts         This file describes a number of hostname-to-address
#               mappings for the TCP/IP subsystem.  It is mostly
#               used at boot time, when no name servers are running.
#               On small systems, this file can be used instead of a
#               "named" name server.
# Syntax:
#    
# IP-Address  Full-Qualified-Hostname  Short-Hostname
#
127.0.0.1	localhost.localdomain localhost

So no address resolution (local, DNS, etc.) is possible. Icinga2 is unable to determine the FQDN with getaddrinfo, fails while looking up the certs in /var/lib/icinga2/certs/, and therefore won't start.
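
A quick way to see why this /etc/hosts cannot help is to check whether it names the host's FQDN at all. The sketch below parses hosts-style text from a string; the helper names are illustrative, and the second hosts variant with the address 192.0.2.10 is a hypothetical example, not a line from the reporter's system.

```python
def hosts_names(hosts_text: str) -> set:
    """Collect every hostname and alias from /etc/hosts-style text."""
    names = set()
    for line in hosts_text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments
        fields = line.split()
        if len(fields) >= 2:
            names.update(fields[1:])  # fields[0] is the IP address
    return names


def has_local_fqdn(hosts_text: str, fqdn: str) -> bool:
    """True if the FQDN can be resolved from the hosts text alone."""
    return fqdn in hosts_names(hosts_text)


# The only active line of the hosts file shown above.
MINIMAL_HOSTS = "127.0.0.1\tlocalhost.localdomain localhost\n"
# Hypothetical extra entry; 192.0.2.10 is a documentation placeholder address.
WITH_ENTRY = MINIMAL_HOSTS + "192.0.2.10\ticinga-01.localdomain.local icinga-01\n"

print(has_local_fqdn(MINIMAL_HOSTS, "icinga-01.localdomain.local"))  # False
print(has_local_fqdn(WITH_ENTRY, "icinga-01.localdomain.local"))     # True
```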

@jschanz

Contributor

jschanz commented Nov 9, 2018

Tested on SLES and openSUSE. Needs more testing in other environments.
Remove the entry for the host from /etc/hosts and reboot. The icinga2 service should now start after successful network initialization.

@dgoetz

Member

dgoetz commented Nov 9, 2018

I tried to reproduce this on CentOS 7. With NetworkManager.service and NetworkManager-wait-online.service enabled, Icinga 2 is always started after networking. Enabling the old network.service and disabling NetworkManager.service and NetworkManager-wait-online.service gave me the same problem. Disabling network.service and enabling only NetworkManager.service also did not cause a problem. So it depends entirely on the network management service.

With an additional Requires it also works with network.service only. Although the Requires can delay system startup, I would say let's add it.
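
For context, in systemd `After=` only orders units and does not pull the target into the boot transaction; `Requires=` or `Wants=` does that. The sketch below checks a unit file for both. It assumes the "additional Requires" names network-online.target, which this thread does not spell out, and the helper names are illustrative. The unit text is an abbreviated copy of the [Unit] section posted earlier in this thread.

```python
import configparser

# Abbreviated copy of the [Unit] section posted in this thread.
ORIGINAL_UNIT = """\
[Unit]
Description=Icinga host/service/network monitoring system
After=syslog.target network-online.target postgresql.service mariadb.service
"""

# Assumption: the "additional Requires" names network-online.target.
PATCHED_UNIT = ORIGINAL_UNIT + "Requires=network-online.target\n"


def unit_values(unit_text: str, key: str) -> set:
    """Return the space-separated values of one [Unit] key.

    Simplification: systemd merges repeated keys, configparser keeps the last.
    """
    parser = configparser.ConfigParser(strict=False, interpolation=None)
    parser.optionxform = str  # keep key case, e.g. "After"
    parser.read_string(unit_text)
    return set(parser.get("Unit", key, fallback="").split())


def waits_for_network(unit_text: str, target: str = "network-online.target") -> bool:
    """True if the unit is ordered after the target and also pulls it in."""
    ordered = target in unit_values(unit_text, "After")
    pulled_in = (target in unit_values(unit_text, "Requires")
                 or target in unit_values(unit_text, "Wants"))
    return ordered and pulled_in


print(waits_for_network(ORIGINAL_UNIT))  # False
print(waits_for_network(PATCHED_UNIT))   # True
```

Whether the target actually delays startup until the network is usable still depends on the wait-online service of the network management daemon, as observed in the CentOS 7 test above.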

@dnsmichi dnsmichi closed this in 2754175 Nov 9, 2018

dnsmichi added a commit that referenced this issue Nov 9, 2018

@dnsmichi dnsmichi added this to the 2.11.0 milestone Nov 9, 2018
