vspheredb daemon fails all the time after update to 1.1.0 #143

Closed
terra-nova opened this issue Dec 3, 2019 · 15 comments

@terra-nova

Expected Behavior

The icinga-vspheredb.service systemd service should be running constantly.

Current Behavior

The service starts up, terminates, starts up again, terminates, ... ad infinitum

Logs:

Dec 03 09:20:33 xxx systemd[1]: Started Icinga vSphereDB Daemon.
Dec 03 09:20:43 xxx systemd[1]: icinga-vspheredb.service watchdog timeout (limit 10s)!
Dec 03 09:20:43 xxx systemd[1]: icinga-vspheredb.service: main process exited, code=killed, status=6/ABRT
Dec 03 09:20:43 xxx systemd[1]: Unit icinga-vspheredb.service entered failed state.
Dec 03 09:20:43 xxx systemd[1]: icinga-vspheredb.service failed.
Dec 03 09:21:14 xxx systemd[1]: icinga-vspheredb.service holdoff time over, scheduling restart.
Dec 03 09:21:14 xxx systemd[1]: Stopped Icinga vSphereDB Daemon.
Dec 03 09:21:14 xxx systemd[1]: Starting Icinga vSphereDB Daemon...
...

Shortly after starting up, the service reports "Running DB cleanup (this could take some time)" and attempts to run a MySQL OPTIMIZE TABLE command on the vspheredb_daemonlog table:

...
| 374 | icinga_vspheredb | localhost | icinga_vspheredb | Query   |     2 | altering table         | OPTIMIZE TABLE vspheredb_daemonlog |
...
+-----+------------------+-----------+------------------+---------+-------+------------------------+------------------------------------+

If I run that statement manually (using the same database user), I get this output:

mysql> optimize table vspheredb_daemonlog;
+--------------------------------------+----------+----------+-------------------------------------------------------------------+
| Table                                | Op       | Msg_type | Msg_text                                                          |
+--------------------------------------+----------+----------+-------------------------------------------------------------------+
| icinga_vspheredb.vspheredb_daemonlog | optimize | note     | Table does not support optimize, doing recreate + analyze instead |
| icinga_vspheredb.vspheredb_daemonlog | optimize | status   | OK                                                                |
+--------------------------------------+----------+----------+-------------------------------------------------------------------+
2 rows in set (5.15 sec)

So the operation does not seem to have failed. Shortly after, the daemon process is terminated (watchdog timeout (limit 10s)!).

Possible Solution

Steps to Reproduce (for bugs)

Your Environment

  • VMware vCenter®/ESXi™-Version:
  • Version/GIT-Hash of this module: v1.7.2
  • Icinga Web 2 version: 2.7.3
  • Operating System and version: CentOS x64 7.7.1908
  • Webserver, PHP versions: httpd 2.4.6-90, rh-php71-php 7.1.30-1, rh-mysql80-mysql-server-8.0.17-1
@log1-c

log1-c commented Dec 4, 2019

Can confirm:

root@server01:/usr/share/icingaweb2/modules/vspheredb# systemctl status icinga-vspheredb.service                                                                                           
● icinga-vspheredb.service - Icinga vSphereDB Daemon
   Loaded: loaded (/etc/systemd/system/icinga-vspheredb.service; enabled; vendor preset: enabled)
   Active: active (running) since Wed 2019-12-04 15:22:39 CET; 9s ago
     Docs: https://icinga.com/docs/icinga-vsphere/latest/
 Main PID: 29689 (icingacli)
   Status: "Running DB cleanup (this could take some time)"
    Tasks: 1 (limit: 4660)
   CGroup: /system.slice/icinga-vspheredb.service
           └─29689 Icinga::vSphereDB::main: 0 active runners

Dez 04 15:22:38 server01 systemd[1]: Starting Icinga vSphereDB Daemon...
Dez 04 15:22:39 server01 systemd[1]: Started Icinga vSphereDB Daemon.
root@server01:/usr/share/icingaweb2/modules/vspheredb# systemctl status icinga-vspheredb.service                                                                                           
● icinga-vspheredb.service - Icinga vSphereDB Daemon
   Loaded: loaded (/etc/systemd/system/icinga-vspheredb.service; enabled; vendor preset: enabled)
   Active: activating (auto-restart) (Result: watchdog) since Wed 2019-12-04 15:22:49 CET; 7s ago
     Docs: https://icinga.com/docs/icinga-vsphere/latest/
  Process: 29689 ExecStart=/usr/bin/icingacli vspheredb daemon run (code=dumped, signal=ABRT)
 Main PID: 29689 (code=dumped, signal=ABRT)
   Status: "Running DB cleanup (this could take some time)"
root@server01:/usr/share/icingaweb2/modules/vspheredb# systemctl status icinga-vspheredb.service                                                                                           
● icinga-vspheredb.service - Icinga vSphereDB Daemon
   Loaded: loaded (/etc/systemd/system/icinga-vspheredb.service; enabled; vendor preset: enabled)
   Active: activating (auto-restart) (Result: watchdog) since Wed 2019-12-04 15:22:49 CET; 15s ago
     Docs: https://icinga.com/docs/icinga-vsphere/latest/
  Process: 29689 ExecStart=/usr/bin/icingacli vspheredb daemon run (code=dumped, signal=ABRT)
 Main PID: 29689 (code=dumped, signal=ABRT)
   Status: "Running DB cleanup (this could take some time)"
root@server01:/usr/share/icingaweb2/modules/vspheredb# systemctl status icinga-vspheredb.service                                                                                           
● icinga-vspheredb.service - Icinga vSphereDB Daemon
   Loaded: loaded (/etc/systemd/system/icinga-vspheredb.service; enabled; vendor preset: enabled)
   Active: activating (auto-restart) (Result: watchdog) since Wed 2019-12-04 15:22:49 CET; 22s ago
     Docs: https://icinga.com/docs/icinga-vsphere/latest/
  Process: 29689 ExecStart=/usr/bin/icingacli vspheredb daemon run (code=dumped, signal=ABRT)
 Main PID: 29689 (code=dumped, signal=ABRT)
   Status: "Running DB cleanup (this could take some time)"
root@server01:/usr/share/icingaweb2/modules/vspheredb# systemctl status icinga-vspheredb.service                                                                                           
● icinga-vspheredb.service - Icinga vSphereDB Daemon
   Loaded: loaded (/etc/systemd/system/icinga-vspheredb.service; enabled; vendor preset: enabled)
   Active: activating (auto-restart) (Result: watchdog) since Wed 2019-12-04 15:24:10 CET; 7s ago
     Docs: https://icinga.com/docs/icinga-vsphere/latest/
  Process: 30116 ExecStart=/usr/bin/icingacli vspheredb daemon run (code=dumped, signal=ABRT)
 Main PID: 30116 (code=dumped, signal=ABRT)
   Status: "Running DB cleanup (this could take some time)"

Dez 04 15:24:10 server01 systemd[1]: icinga-vspheredb.service: Failed with result 'watchdog'.

Icinga Web 2 Version: 2.7.3
Git Commit: 06cabfe8ba28cf545a42c92f25484383191a4e51
PHP Version: 7.2.24-0ubuntu0.18.04.1
Git Commit Date: 2019-10-18

Module: vspheredb
Status: enabled
Version: 1.1.0
Git Commit: 5bc3546

@uffsalot

uffsalot commented Dec 5, 2019

Can confirm.

Icinga Web 2 Version: 2.7.3
Git Commit: 06cabfe8ba28cf545a42c92f25484383191a4e51
PHP Version: 7.3.11-1~deb10u1
Git Commit Date: 2019-10-18

VMware vCenter®/ESXi™-Version: 6.7
Version/GIT-Hash of this module: v1.1.0
Operating System and version: Debian 10 x64

@Obivatelj

Obivatelj commented Dec 9, 2019

Can confirm here too.
When started with icingacli vspheredb daemon run --debug, it works. It seems that when "Running DB cleanup" takes longer than 10s, the watchdog timeout is hit, which produces this message and restarts the service.
I think it is related to #138

@madmax01

madmax01 commented Dec 29, 2019

Is there a fix for this?

I did a fresh install today and the service is not starting up. Does adding an ESXi/vCenter only work once the service is up? After clicking "Add" there is nothing to add: everything underneath "Create a new vCenter/ESXi-Connection" is blank.

Icinga Web 2 Version: 2.7.3
PHP Version: 7.2.26

All dependencies are installed.

Errors:

PHP Fatal error: Uncaught Error: Call to undefined function Icinga\Module\Vspheredb\Daemon\posix_getpid() in /usr/share/icingaweb2/modules/vspheredb/library/Vspheredb/Daemon/Daemon.php:64
Stack trace:
#0 /usr/share/icingaweb2/modules/vspheredb/library/Vspheredb/Daemon/Daemon.php(57): Icinga\Module\Vspheredb\Daemon\Daemon->detectProcessInfo()
#1 /usr/share/icingaweb2/modules/vspheredb/application/clicommands/DaemonCommand.php(25): Icinga\Module\Vspheredb\Daemon\Daemon->__construct()
#2 /usr/share/php/Icinga/Cli/Loader.php(265): Icinga\Module\Vspheredb\Clicommands\DaemonCommand->runAction()
#3 /usr/share/php/Icinga/Application/Cli.php(152): Icinga\Cli\Loader->dispatch()
#4 /usr/share/php/Icinga/Application/Cli.php(142): Icinga\Application\Cli->dispatchOnce()
#5 /usr/bin/icingacli(7): Icinga\Application\Cli->dispatch()
#6 {main}
thrown in /usr/share/icingaweb2/modules/vspheredb/library/Vspheredb/Daemon/Daemon.php on line 64

Fatal error: Uncaught Error: Call to undefined function Icinga\Module\Vspheredb\Daemon\posix_getpid() in /usr/share/icingaweb2/modules/vspheredb/library/Vspheredb/Daemon/Daemon.php:64
Stack trace:
#0 /usr/share/icingaweb2/modules/vspheredb/library/Vspheredb/Daemon/Daemon.php(57): Icinga\Module\Vspheredb\Daemon\Daemon->detectProcessInfo()
#1 /usr/share/icingaweb2/modules/vspheredb/application/clicommands/DaemonCommand.php(25): Icinga\Module\Vspheredb\Daemon\Daemon->__construct()
#2 /usr/share/php/Icinga/Cli/Loader.php(265): Icinga\Module\Vspheredb\Clicommands\DaemonCommand->runAction()
#3 /usr/share/php/Icinga/Application/Cli.php(152): Icinga\Cli\Loader->dispatch()
#4 /usr/share/php/Icinga/Application/Cli.php(142): Icinga\Application\Cli->dispatchOnce()
#5 /usr/bin/icingacli(7): Icinga\Application\Cli->dispatch()
#6 {main}
thrown in /usr/share/icingaweb2/modules/vspheredb/library/Vspheredb/Daemon/Daemon.php on line 64

@madmax01

Somehow I got the service to start, but I am not able to add a vCenter: everything below "Create a new vCenter/ESXi-Connection" is blank.

@Thomas-Gelf
Contributor

@madmax01: please check the requirements section in our installation documentation

@guldil

guldil commented Jan 4, 2020

I have the same issue: the service keeps restarting at "Running DB cleanup (this could take some time)", but if I run "icingacli vspheredb daemon run --debug" it works.

@guldil

guldil commented Jan 5, 2020

I found a solution: change WatchdogSec=10 to WatchdogSec=360 in /etc/systemd/system/icinga-vspheredb.service, then run systemctl daemon-reload and systemctl start icinga-vspheredb.
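
A drop-in file is an alternative to editing the unit file in place, because it stays separate from the unit file itself. The sketch below is only an illustration of that variant: `write_watchdog_override` and the `override.conf` file name are assumptions, not part of the module, and 360 is simply the value reported above.

```shell
# Hypothetical helper: write a systemd drop-in that overrides WatchdogSec
# for icinga-vspheredb.service.
#   $1 = unit directory (normally /etc/systemd/system)
#   $2 = watchdog limit in seconds
write_watchdog_override() {
    unit_root="$1"
    seconds="$2"
    dropin_dir="$unit_root/icinga-vspheredb.service.d"
    mkdir -p "$dropin_dir"
    printf '[Service]\nWatchdogSec=%s\n' "$seconds" > "$dropin_dir/override.conf"
}

# Usage (as root), followed by a reload and restart:
#   write_watchdog_override /etc/systemd/system 360
#   systemctl daemon-reload
#   systemctl restart icinga-vspheredb.service
```

Running `systemctl cat icinga-vspheredb.service` afterwards should list the drop-in below the original unit file if systemd picked it up.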

@stultitiophobia

stultitiophobia commented Jan 7, 2020

Can confirm the fix is working here. Thanks for that!

@slasse

slasse commented Jan 9, 2020

We had the same problem and can also confirm the fix.

@Wintermute2k6

We can also confirm this as a working fix for the issue.

@hhamester

hhamester commented Mar 12, 2020

Hi,

Does anyone else have the same problem?

# systemctl status icinga-vspheredb.service

● icinga-vspheredb.service - Icinga vSphereDB Daemon
     Loaded: loaded (/etc/systemd/system/icinga-vspheredb.service; enabled; vendor preset: enabled)
     Active: active (running) since Thu 2020-03-12 07:28:23 CET; 26min ago
       Docs: https://icinga.com/docs/icinga-vsphere/latest/
   Main PID: 2156724 (icingacli)
     Status: "DB has been cleaned up"
      Tasks: 1 (limit: 9490)
     Memory: 13.3M
     CGroup: /system.slice/icinga-vspheredb.service
             └─2156724 Icinga::vSphereDB::main: 3 active runners

Mar 12 07:28:23 98lipmoni3 systemd[1]: Started Icinga vSphereDB Daemon.
Mar 12 07:32:42 98lipmoni3 icingacli[2156724]: Got invalid NetString data:
Mar 12 07:32:42 98lipmoni3 icingacli[2156724]: Fatal error: Uncaught Error: Class 'SoapVar' not found in /usr/share/icingaweb2/mod>
Mar 12 07:32:42 98lipmoni3 icingacli[2156724]: Got invalid NetString data:
Mar 12 07:32:42 98lipmoni3 icingacli[2156724]: Fatal error: Uncaught Error: Class 'SoapVar' not found in /usr/share/icingaweb2/mod>
Mar 12 07:32:42 98lipmoni3 icingacli[2156724]: Got invalid NetString data:
Mar 12 07:32:42 98lipmoni3 icingacli[2156724]: Fatal error: Uncaught Error: Class 'SoapVar' not found in /usr/share/icingaweb2/mod>
Mar 12 07:32:42 98lipmoni3 icingacli[2156724]: Server for vCenterID=2 failed, will try again in 30 seconds
Mar 12 07:32:42 98lipmoni3 icingacli[2156724]: Server for vCenterID=4 failed, will try again in 30 seconds
Mar 12 07:32:42 98lipmoni3 icingacli[2156724]: Server for vCenterID=6 failed, will try again in 30 seconds

# icingacli vspheredb daemon run --trace --debug

Got invalid NetString data: 
Fatal error: Uncaught Error: Class 'SoapVar' not found in /usr/share/icingaweb2/modules/vspheredb/l[..] truncated 1120 bytes [..] a\Module\Vspheredb in /usr/share/icingaweb2/modules/vspheredb/library/Vspheredb/Api.php on line 410
Got invalid NetString data: 
Fatal error: Uncaught Error: Class 'SoapVar' not found in /usr/share/icingaweb2/modules/vspheredb/l[..] truncated 1120 bytes [..] a\Module\Vspheredb in /usr/share/icingaweb2/modules/vspheredb/library/Vspheredb/Api.php on line 410
Got invalid NetString data: 
Fatal error: Uncaught Error: Class 'SoapVar' not found in /usr/share/icingaweb2/modules/vspheredb/l[..] truncated 1120 bytes [..] a\Module\Vspheredb in /usr/share/icingaweb2/modules/vspheredb/library/Vspheredb/Api.php on line 410
Server for vCenterID=2 failed, will try again in 30 seconds
Pid 2158637 stopped
Server for vCenterID=4 failed, will try again in 30 seconds
Pid 2158638 stopped
Server for vCenterID=6 failed, will try again in 30 seconds
Pid 2158639 stopped
SQLSTATE[23000]: Integrity constraint violation: 1062 Duplicate entry '\x8D\xA3\xF8\xE6I\x00s\xFE\xBA\x18K#\xCE\x92=\x1C' for key 'PRIMARY', query was: INSERT INTO vspheredb_daemon (instance_uuid, ts_last_refresh, process_info, pid, fqdn, username, php_version) VALUES (?, ?, ?, ?, ?, ?, ?)
Database connection has been closed
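
Both fatal errors in this thread, the undefined function `posix_getpid()` and the missing class `SoapVar`, are what PHP reports when the `posix` and `soap` extensions are not loaded. A rough check follows, assuming the `php` binary on the PATH is the same interpreter that `icingacli` uses (this may not hold with software collections such as rh-php71); `missing_php_extensions` is a hypothetical helper, not part of the module.

```shell
# Hypothetical helper: read a module list on stdin (one name per line, as
# printed by `php -m`) and print every requested extension that is absent.
missing_php_extensions() {
    # Normalise case so that "SOAP" and "soap" compare equal.
    loaded="$(tr '[:upper:]' '[:lower:]')"
    for ext in "$@"; do
        printf '%s\n' "$loaded" | grep -qxF "$ext" || printf '%s\n' "$ext"
    done
}

# Usage: prints nothing when both extensions are loaded.
#   php -m | missing_php_extensions posix soap
```

The extension names are passed as arguments so the same check can cover anything else listed in the requirements section of the installation documentation.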

@ChristianMoritz

ChristianMoritz commented Mar 16, 2020

I tested the watchdog timer option, and it does not work for me.

I started the module with --debug, and after about 1 hour I got a status update:

"DB has been cleaned up"

A while later I got the next error:

root@smon03:/# systemctl status icinga-vspheredb
● icinga-vspheredb.service - Icinga vSphereDB Daemon
Loaded: loaded (/etc/systemd/system/icinga-vspheredb.service; enabled; vendor preset: enabled)
Active: active (running) since Mon 2020-03-16 14:59:36 CET; 1min 41s ago
Docs: https://icinga.com/docs/icinga-vsphere/latest/
Main PID: 11396 (icingacli)
Status: "DB has been cleaned up"
CGroup: /system.slice/icinga-vspheredb.service
├─11396 Icinga::vSphereDB::main: 5 active runners
├─12977 Icinga::vSphereDB::sync (shv19call01)
├─12978 Icinga::vSphereDB::sync (shv06call01)
├─12980 Icinga::vSphereDB::sync (shvwerm01)
├─12982 Icinga::vSphereDB::sync (shv1911: Event Stream)
└─12987 Icinga::vSphereDB::sync (svcs: VM DataStore Usage)

Mar 16 14:59:36 smon03 systemd[1]: Starting Icinga vSphereDB Daemon...
Mar 16 14:59:36 smon03 systemd[1]: Started Icinga vSphereDB Daemon.
Mar 16 15:00:39 smon03 icingacli[11396]: Task perfCounterInfo failed: SQLSTATE[22001]: String data, right truncated: 1406 Data too long for column 'summary' at row 1, query was: INSERT INTO performance_counter (vcenter_uuid, counter_key, name, group_name, unit_name, label, summary, rollup_type, stats_type, level, per_device_level) VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?)

Does anyone have a hint for me?

My environment:
icinga2: r2.11.3-1
icingaweb2: 2.7.3
vspheredb: 1.1.0
ipl: 0.5.0
incubator: 0.5.0
reactbundle: 0.7.0

UPDATE: after about 2 hours the module is working fine again (without me doing anything).

@wp-perc

wp-perc commented Mar 31, 2020

I have the same issue. I'm not familiar with systemd internals, but it seems there is some kind of "pulse" that the vspheredb daemon process must send to systemd to avoid being killed.

Cleaning up the daemon log table can take a very long time, depending on both how often you restart the vspheredb service and how many virtual centers are monitored.
Because of this, I don't feel comfortable increasing the watchdog timeout: in case of a failure, the unit will take a long time to be restarted automatically... Or am I wrong?

Therefore, a trade-off is needed: how often you restart the vspheredb service vs. how much log you want to keep.

That said, increasing the watchdog timeout to 600 seconds resolved it for me. But I'm not that happy with it.
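
The "pulse" described above matches systemd's watchdog protocol: when `WatchdogSec=` is set, systemd exports the interval to the service as `WATCHDOG_USEC` and expects a `WATCHDOG=1` notification before it elapses, commonly sent at half the interval. The shell sketch below only illustrates that protocol. It is not how the vspheredb daemon (a PHP process) implements it; `watchdog_ping_interval` and `run_with_heartbeat` are hypothetical helpers, and keep-alives sent from a helper process via `systemd-notify` are only accepted if the unit allows it (for example with `NotifyAccess=all`).

```shell
# Hypothetical helper: half the watchdog interval in whole seconds (minimum 1).
#   $1 = interval in microseconds, e.g. the value of WATCHDOG_USEC
watchdog_ping_interval() {
    usec="${1:-0}"
    secs=$(( usec / 2000000 ))
    if [ "$secs" -lt 1 ]; then secs=1; fi
    printf '%s\n' "$secs"
}

# Hypothetical helper: run a long command in the background and keep sending
# watchdog keep-alives until it finishes, then return its exit status.
run_with_heartbeat() {
    interval="$(watchdog_ping_interval "${WATCHDOG_USEC:-0}")"
    "$@" &
    job=$!
    while kill -0 "$job" 2>/dev/null; do
        # Outside of systemd this call fails or does nothing; ignore that.
        systemd-notify WATCHDOG=1 2>/dev/null || true
        sleep "$interval"
    done
    wait "$job"
}

# With the 10 s limit from the logs, a ping would go out every 5 seconds:
#   watchdog_ping_interval 10000000
```

The point of the design is that a long-running step, such as a slow table cleanup, no longer forces a choice between a short watchdog limit and a daemon that survives startup.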

Thomas-Gelf self-assigned this Apr 29, 2020
Thomas-Gelf added this to the v1.2.0 milestone Apr 29, 2020
@Thomas-Gelf
Contributor

This has been fixed, see #138 for related commits. Please upgrade to the current master (or the upcoming v1.2.0 release), apply schema migrations, restore the former watchdog setting and restart your daemon.
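
If the workaround from earlier in this thread was applied by editing the unit file, restoring the former setting means putting `WatchdogSec=10` back (10 seconds is the limit shown in the original logs). A minimal sketch, assuming GNU sed and a unit file that already contains a `WatchdogSec=` line; `set_watchdog_sec` is a hypothetical helper and does not cover the upgrade or the schema migrations.

```shell
# Hypothetical helper: rewrite the WatchdogSec= line of a unit file in place.
#   $1 = path to the unit file
#   $2 = watchdog limit in seconds
set_watchdog_sec() {
    unit_file="$1"
    seconds="$2"
    sed -i "s/^WatchdogSec=.*/WatchdogSec=$seconds/" "$unit_file"
}

# Usage (as root), followed by a reload and restart:
#   set_watchdog_sec /etc/systemd/system/icinga-vspheredb.service 10
#   systemctl daemon-reload
#   systemctl restart icinga-vspheredb.service
```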
