Poller timeout because of one down host #586

Closed
reboot1983 opened this issue Apr 19, 2017 · 2 comments
Comments

reboot1983 commented Apr 19, 2017

Cacti 1.1.3 (official release from the website).

I see high load on my Cacti server (RPi 3B) and a poller timeout whenever a single host is down. I shut down the NAS for maintenance (a firmware upgrade), so Cacti could not reach it. Cacti correctly detected that the host was down, but the entire poller stalled and waited. As a result, the poller exited after 300 s, the other hosts were not collected (which produced malformed graphs), and no new poller run was started, which caused even more missing data.

[Image: cacti_7]

2017-04-19 21:50:07 - SYSTEM STATS: Time:5.4570 Method:cmd.php Processes:4 Threads:N/A Hosts:4 HostsPerProcess:1 DataSources:66 RRDsProcessed:47
2017-04-19 21:45:06 - SYSTEM STATS: Time:4.5418 Method:cmd.php Processes:4 Threads:N/A Hosts:4 HostsPerProcess:1 DataSources:66 RRDsProcessed:47
2017-04-19 21:45:04 - CMDPHP Device[2] Description[QNAP TS-231] NOTICE: HOST EVENT: Device Returned FROM DOWN State:
2017-04-19 21:40:07 - SYSTEM STATS: Time:4.4803 Method:cmd.php Processes:4 Threads:N/A Hosts:4 HostsPerProcess:1 DataSources:66 RRDsProcessed:47
2017-04-19 21:35:15 - RECACHE STATS: Poller: RecacheTime:7.9856 DevicesRecached:1
2017-04-19 21:35:07 - PCOMMAND Device[2] Description[QNAP TS-231] WARNING: Recache Event Detected for Device
2017-04-19 21:35:06 - SYSTEM STATS: Time:4.5311 Method:cmd.php Processes:4 Threads:N/A Hosts:4 HostsPerProcess:1 DataSources:66 RRDsProcessed:47
2017-04-19 21:35:04 - POLLER: Poller[1] ASSERT: '47327<27979' failed. Recaching host '192.168.1.2', data query #5
2017-04-19 21:35:04 - POLLER: Poller[1] ASSERT: '47326<27978' failed. Recaching host '192.168.1.2', data query #4
2017-04-19 21:35:04 - POLLER: Poller[1] ASSERT: '47326<27977' failed. Recaching host '192.168.1.2', data query #1
2017-04-19 21:32:07 - SYSTEM STATS: Time:125.7961 Method:cmd.php Processes:4 Threads:N/A Hosts:4 HostsPerProcess:1 DataSources:66 RRDsProcessed:55
2017-04-19 21:30:07 - CMDPHP WARNING: SNMP Error:'Timeout (2000 ms)', Device:'192.168.1.2', OID:'.1.3.6.1.2.1.1.3.0'
2017-04-19 21:30:01 - POLLER: Poller[1] WARNING: There are '1' detected as overrunning a polling process, please investigate
2017-04-19 21:30:00 - SYSTEM STATS: Time:298.1245 Method:cmd.php Processes:4 Threads:N/A Hosts:4 HostsPerProcess:1 DataSources:66 RRDsProcessed:15
2017-04-19 21:30:00 - POLLER: Poller[1] Maximum runtime of 298 seconds exceeded. Exiting.
2017-04-19 21:25:07 - CMDPHP WARNING: SNMP Error:'Timeout (2000 ms)', Device:'192.168.1.2', OID:'.1.3.6.1.2.1.1.3.0'
2017-04-19 21:25:03 - CMDPHP WARNING: SNMP Error:'Timeout (2000 ms)', Device:'192.168.1.2', OID:'.1.3.6.1.2.1.1.1.0'
2017-04-19 21:24:55 - CMDPHP WARNING: SNMP Error:'Timeout (2500 ms)', Device:'192.168.1.2', OID:'.1.3.6.1.2.1.1.1.0'
2017-04-19 21:22:51 - AUTH LOGIN: User 'admin' Authenticated via Authentication Cookie
2017-04-19 21:20:06 - SYSTEM STATS: Time:4.4696 Method:cmd.php Processes:4 Threads:N/A Hosts:4 HostsPerProcess:1 DataSources:66 RRDsProcessed:47
2017-04-19 21:15:16 - RECACHE STATS: Poller: RecacheTime:8.5893 DevicesRecached:1
2017-04-19 21:15:07 - PCOMMAND Device[2] Description[QNAP TS-231] WARNING: Recache Event Detected for Device
2017-04-19 21:15:06 - SYSTEM STATS: Time:4.4532 Method:cmd.php Processes:4 Threads:N/A Hosts:4 HostsPerProcess:1 DataSources:66 RRDsProcessed:47
2017-04-19 21:15:04 - POLLER: Poller[1] ASSERT: '314162081<17366' failed. Recaching host '192.168.1.2', data query #5
2017-04-19 21:15:04 - POLLER: Poller[1] ASSERT: '314162080<17365' failed. Recaching host '192.168.1.2', data query #4
2017-04-19 21:15:04 - POLLER: Poller[1] ASSERT: '314162080<17364' failed. Recaching host '192.168.1.2', data query #1
2017-04-19 21:12:56 - SYSTEM STATS: Time:173.9243 Method:cmd.php Processes:4 Threads:N/A Hosts:4 HostsPerProcess:1 DataSources:66 RRDsProcessed:135
2017-04-19 21:12:55 - CMDPHP SQL Backtrace: (/cmd.php: 766 record_cmdphp_done)(/cmd.php: 74 db_execute_prepared)(/lib/database.php: 178 cacti_debug_backtrace)
2017-04-19 21:12:55 - CMDPHP ERROR: A DB Exec Failed!, Error: MySQL server has gone away
2017-04-19 21:12:55 - DBCALL ERROR: A DB Exec Failed!, Error:2006, SQL:"UPDATE poller_time SET end_time=NOW() WHERE poller_id = ? AND pid = ?'
2017-04-19 21:12:55 - CMDPHP SQL Backtrace: (/cmd.php: 732 db_execute)(/lib/database.php: 113 db_execute_prepared)(/lib/database.php: 178 cacti_debug_backtrace)
2017-04-19 21:12:55 - CMDPHP ERROR: A DB Exec Failed!, Error: MySQL server has gone away
2017-04-19 21:12:55 - DBCALL ERROR: A DB Exec Failed!, Error:2006, SQL:"INSERT IGNORE INTO poller_output (local_data_id, rrd_name, time, output) VALUES (7, 'ssCpuIdle', '2017-04-19 20:40:03', '96'), (8, 'ssCpuSystem', '2017-04-19 20:40:03', '0'), (9, 'ssCpuUser', '2017-04-19 20:40:03', '3'), (10, 'load_1min', '2017-04-19 20:40:03', '0.39'), (11, 'load_15min', '2017-04-19 20:40:03', '0.19'), (12, 'load_5min', '2017-04-19 20:40:03', '0.23'), (13, 'mem_buffers', '2017-04-19 20:40:03', '55140'), (14, 'mem_cache', '2017-04-19 20:40:03', '557248'), (15, 'mem_free', '2017-04-19 20:40:03', '68908'), (16, 'mem_total', '2017-04-19 20:40:03', '947732'), (18, 'users', '2017-04-19 20:40:03', '0'), (19, 'proc', '2017-04-19 20:40:03', '137'), (24, 'traffic_in', '2017-04-19 20:40:03', '615475736'), (24, 'traffic_out', '2017-04-19 20:40:03', '492830897'), (25, 'hdd_free', '2017-04-19 20:40:03', '42960'), (25, 'hdd_used', '2017-04-19 20:40:03', '21496'), (26, 'hdd_free', '2017-04-19 20:40:03', '13142676'), (26, 'hdd_used', '2017-04-19 20:40:03', '1425228'), (27, 'Bytes_Read', '2017-04-19 20:40:03', '409622016'), (27, 'Bytes_Written', '2017-04-19 20:40:03', '17734750720'), (28, 'Bytes_Read', '2017-04-19 20:40:03', '942592'), (28, 'Bytes_Written', '2017-04-19 20:40:03', '512'), (29, 'Bytes_Read', '2017-04-19 20:40:03', '408269824'), (29, 'Bytes_Written', '2017-04-19 20:40:03', '17734750208'), (66, 'hrSystemUptime', '2017-04-19 20:40:03', '46179074'), (67, 'degrees', '2017-04-19 20:40:03', '50.5')'
2017-04-19 21:10:09 - CMDPHP WARNING: SNMP Error:'Timeout (2500 ms)', Device:'192.168.1.2', OID:'.1.3.6.1.2.1.1.3.0'
2017-04-19 21:10:02 - POLLER: Poller[1] WARNING: There are '1' detected as overrunning a polling process, please investigate
2017-04-19 21:10:01 - SYSTEM STATS: Time:299.1497 Method:cmd.php Processes:4 Threads:N/A Hosts:4 HostsPerProcess:1 DataSources:66 RRDsProcessed:15
2017-04-19 21:10:01 - POLLER: Poller[1] Maximum runtime of 298 seconds exceeded. Exiting.
2017-04-19 21:05:09 - CMDPHP WARNING: SNMP Error:'Timeout (2500 ms)', Device:'192.168.1.2', OID:'.1.3.6.1.2.1.1.3.0'
2017-04-19 21:05:02 - POLLER: Poller[1] WARNING: There are '1' detected as overrunning a polling process, please investigate
2017-04-19 21:05:00 - SYSTEM STATS: Time:299.1229 Method:cmd.php Processes:4 Threads:N/A Hosts:4 HostsPerProcess:1 DataSources:66 RRDsProcessed:15
2017-04-19 21:05:00 - POLLER: Poller[1] Maximum runtime of 298 seconds exceeded. Exiting.
2017-04-19 21:00:08 - CMDPHP WARNING: SNMP Error:'Timeout (2500 ms)', Device:'192.168.1.2', OID:'.1.3.6.1.2.1.1.3.0'
2017-04-19 21:00:01 - POLLER: Poller[1] WARNING: There are '1' detected as overrunning a polling process, please investigate
2017-04-19 21:00:00 - SYSTEM STATS: Time:298.1997 Method:cmd.php Processes:4 Threads:N/A Hosts:4 HostsPerProcess:1 DataSources:66 RRDsProcessed:15
2017-04-19 21:00:00 - POLLER: Poller[1] Maximum runtime of 298 seconds exceeded. Exiting.
2017-04-19 20:55:09 - CMDPHP WARNING: SNMP Error:'Timeout (2500 ms)', Device:'192.168.1.2', OID:'.1.3.6.1.2.1.1.3.0'
2017-04-19 20:55:02 - POLLER: Poller[1] WARNING: There are '1' detected as overrunning a polling process, please investigate
2017-04-19 20:55:01 - SYSTEM STATS: Time:299.1642 Method:cmd.php Processes:4 Threads:N/A Hosts:4 HostsPerProcess:1 DataSources:66 RRDsProcessed:15
2017-04-19 20:55:01 - POLLER: Poller[1] Maximum runtime of 298 seconds exceeded. Exiting.
2017-04-19 20:50:10 - CMDPHP WARNING: SNMP Error:'Timeout (2500 ms)', Device:'192.168.1.2', OID:'.1.3.6.1.2.1.1.3.0'
2017-04-19 20:50:02 - POLLER: Poller[1] WARNING: There are '1' detected as overrunning a polling process, please investigate
2017-04-19 20:50:01 - SYSTEM STATS: Time:299.1396 Method:cmd.php Processes:4 Threads:N/A Hosts:4 HostsPerProcess:1 DataSources:66 RRDsProcessed:15
2017-04-19 20:50:01 - POLLER: Poller[1] Maximum runtime of 298 seconds exceeded. Exiting.
2017-04-19 20:45:09 - CMDPHP Device[2] Description[QNAP TS-231] ERROR: HOST EVENT: Device is DOWN Message: Device did not respond to SNMP
2017-04-19 20:45:09 - CMDPHP WARNING: SNMP Error:'Timeout (2500 ms)', Device:'192.168.1.2', OID:'.1.3.6.1.2.1.1.3.0'
2017-04-19 20:45:02 - POLLER: Poller[1] WARNING: There are '1' detected as overrunning a polling process, please investigate
2017-04-19 20:45:01 - SYSTEM STATS: Time:299.1370 Method:cmd.php Processes:4 Threads:N/A Hosts:4 HostsPerProcess:1 DataSources:66 RRDsProcessed:15
2017-04-19 20:45:01 - POLLER: Poller[1] Maximum runtime of 298 seconds exceeded. Exiting.
2017-04-19 20:40:09 - CMDPHP WARNING: SNMP Error:'Timeout (2500 ms)', Device:'192.168.1.2', OID:'.1.3.6.1.2.1.1.3.0'
2017-04-19 20:35:06 - SYSTEM STATS: Time:4.4616 Method:cmd.php Processes:4 Threads:N/A Hosts:4 HostsPerProcess:1 DataSources:66 RRDsProcessed:47
2017-04-19 20:30:11 - SYSTEM STATS: Time:9.4890 Method:cmd.php Processes:4 Threads:N/A Hosts:4 HostsPerProcess:1 DataSources:66 RRDsProcessed:47
2017-04-19 20:25:09 - SYSTEM STATS: Time:6.4769 Method:cmd.php Processes:4 Threads:N/A Hosts:4 HostsPerProcess:1 DataSources:66 RRDsProcessed:47
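
To pick the overrunning cycles out of a log like the one above, a small parser can help. The following is a minimal sketch in Python, not part of Cacti: the function name `find_slow_runs` and the 60-second threshold are assumptions chosen for illustration, and the regular expression only targets the `SYSTEM STATS` line format shown in this log.

```python
import re

# Matches the SYSTEM STATS lines as they appear in the log above.
STATS_RE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) - SYSTEM STATS: "
    r"Time:(?P<time>[\d.]+) .*RRDsProcessed:(?P<rrds>\d+)"
)

def find_slow_runs(lines, threshold=60.0):
    """Return (timestamp, seconds, rrds_processed) for runs slower than threshold.

    The threshold is a hypothetical cut-off, not a Cacti setting.
    """
    slow = []
    for line in lines:
        m = STATS_RE.match(line.strip())
        if m and float(m.group("time")) > threshold:
            slow.append((m.group("ts"), float(m.group("time")), int(m.group("rrds"))))
    return slow

# Two lines copied from the log above: one overrunning cycle, one normal cycle.
sample = [
    "2017-04-19 20:50:01 - SYSTEM STATS: Time:299.1396 Method:cmd.php Processes:4 "
    "Threads:N/A Hosts:4 HostsPerProcess:1 DataSources:66 RRDsProcessed:15",
    "2017-04-19 20:35:06 - SYSTEM STATS: Time:4.4616 Method:cmd.php Processes:4 "
    "Threads:N/A Hosts:4 HostsPerProcess:1 DataSources:66 RRDsProcessed:47",
]

for ts, secs, rrds in find_slow_runs(sample):
    print(ts, secs, rrds)
```

Run against the full log, this would list the cycles that ran close to the maximum runtime, together with how few RRDs each of them processed.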

cigamit commented Apr 21, 2017

Yeah, this is why many people use spine.

cigamit added a commit that referenced this issue Apr 21, 2017
Overrunning pollers can cause system load spikes.

cigamit commented Apr 21, 2017

Resolving in cmd.php, which will have a cascading effect on poller.php.
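
The referenced commit is not shown in this thread, so the actual change is unknown here. As a general illustration of how a poller can avoid stalling on one unreachable device, here is a minimal sketch in Python (Cacti itself is PHP). It assumes that each timed-out request blocks for the full timeout, and it stops querying a host after that host's first timeout. The names `poll_items` and `fake_fetch`, and the second address `192.168.1.3`, are hypothetical.

```python
def poll_items(items, fetch):
    """Poll (host, oid) pairs, skipping a host after its first timeout.

    `fetch` is any callable that returns a value or raises TimeoutError.
    Skipped items are recorded as None instead of waiting for another timeout.
    """
    down_hosts = set()
    results = {}
    for host, oid in items:
        if host in down_hosts:
            results[(host, oid)] = None  # host already known to be down
            continue
        try:
            results[(host, oid)] = fetch(host, oid)
        except TimeoutError:
            down_hosts.add(host)
            results[(host, oid)] = None
    return results, down_hosts

calls = []

def fake_fetch(host, oid):
    """Stand-in for an SNMP request: the NAS address always times out."""
    calls.append((host, oid))
    if host == "192.168.1.2":
        raise TimeoutError("simulated SNMP timeout")
    return 1

items = [
    ("192.168.1.2", ".1.3.6.1.2.1.1.3.0"),
    ("192.168.1.2", ".1.3.6.1.2.1.1.1.0"),
    ("192.168.1.3", ".1.3.6.1.2.1.1.3.0"),  # hypothetical second host
]

results, down = poll_items(items, fake_fetch)
print(sorted(down))  # hosts marked down during this cycle
print(len(calls))    # requests actually sent
```

In this sketch the down host costs one timeout per cycle rather than one per data source, so the remaining hosts are still collected within the polling interval.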

cigamit closed this as completed Apr 21, 2017
github-actions bot locked and limited conversation to collaborators Jun 30, 2020