
Improve downed device detection #304

Closed
iskandarbasman opened this issue Mar 23, 2023 · 16 comments

Comments

@iskandarbasman

Describe the bug

After upgrading from Cacti 1.2.21 to 1.2.24, I noticed that graphs for Nokia router VPRNs have stopped polling.
After troubleshooting, I found that the Downed Device Detection setting "Ping or SNMP Uptime" is causing the issue.
Reconfiguring Downed Device Detection to "Ping" only resolves the problem.

Nokia router VPRN devices do not support the SNMP uptime OID, and in the previous Cacti version (1.2.21) "Ping or SNMP Uptime" worked well with them.

To Reproduce

Steps to reproduce the behavior:

1. Poll a device that does not support the SNMP uptime OID, such as a Nokia router VPRN.
2. Change Downed Device Detection to "Ping or SNMP Uptime".
3. Polling stops for the device.

Expected behavior

The configuration is "Ping or SNMP Uptime". Since this is an OR condition, the expectation is that polling keeps working as long as the device is still pingable.

Screenshots

[screenshot]

@TheWitness
Member

Going to move this to spine. Try Ping Sys Description or Ping GetNext.

@TheWitness TheWitness transferred this issue from Cacti/cacti Mar 23, 2023
@iskandarbasman
Author

> Going to move this to spine. Try Ping Sys Description or Ping GetNext.

SNMP Desc is not supported on Nokia VPRNs.
[screenshot]

SNMP GetNext is not supported on Nokia VPRNs.
[screenshot]

"Ping or SNMP Uptime" worked previously but is broken in Cacti 1.2.24. As you can see, ping works, so Cacti should not be detecting the devices as DOWN.
[screenshot]

"Ping" only works fine: Cacti polls the device because it is detected as UP.
[screenshot]

@TheWitness
Member

Are you using php-snmp or Net-SNMP? If using php-snmp, uninstall it and restart Apache.

@iskandarbasman
Author

> Are you using php-snmp or Net-SNMP? If using php-snmp, uninstall it and restart Apache.

There might be a misunderstanding if you think there is an issue with SNMP polling on the Nokia routers.

SNMP uptime polling works fine on the Nokia "physical" router (Nokia 7750 SR), as shown in the screenshot below.
[screenshot]

The Nokia 7750 SR has a feature called VPRN, which essentially creates a virtual router instance within the physical router.
We can SNMP-poll these VPRN virtual routers just like real routers; however, they do not support all SNMP OIDs (e.g. SNMP uptime). OIDs such as the SNMP interface counters poll fine.

The bug I am trying to highlight is that the OR condition in downed device detection ("Ping or SNMP Uptime") stopped working properly in Cacti 1.2.24.
Even though the Nokia VPRN router is still pingable, Cacti detects it as a downed device and therefore stops SNMP polling.

@TheWitness
Member

What spine version were you using before?

@iskandarbasman
Author

iskandarbasman commented Mar 27, 2023

> What spine version were you using before?

Previously I was on Cacti and spine 1.2.21.

@netniV
Member

netniV commented Jun 8, 2023

I don't think spine is the issue here, because in ping.c it treats a "missing OID" response as the host having responded. In poller.c, the other place where it reads these OIDs, it only assigns a value to the polling information if it gets one, but does not stop the process otherwise.

I did spot one bug where the uptime OID check only applied to one of the two uptime OIDs, so I have now applied it to both.
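netniV's point is that an agent which answers with "no such object" has still responded, so it should not make the device look down, and that both uptime OIDs should get the same treatment. The sketch below is a minimal illustration of that idea in C, not spine's actual code. `fake_agent`, `fake_get` and `snmp_uptime_responded` are hypothetical stand-ins that let the example run without a network. Using sysUpTime.0 and snmpEngineTime.0 as "the two uptime OIDs" is also an assumption. In Net-SNMP, the middle outcome would correspond to the `SNMP_NOSUCHOBJECT` / `SNMP_NOSUCHINSTANCE` varbind types.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical outcome of a single SNMP GET. */
typedef enum {
    SNMP_VALUE,       /* agent returned a value */
    SNMP_NO_SUCH_OID, /* agent answered, but does not implement the OID */
    SNMP_TIMEOUT      /* no answer at all */
} snmp_outcome;

/* Hypothetical stand-in for an SNMP agent so the sketch runs offline. */
typedef struct {
    bool reachable;       /* does the agent answer at all? */
    bool has_sysuptime;   /* implements sysUpTime.0 */
    bool has_engine_time; /* implements snmpEngineTime.0 */
} fake_agent;

/* Assumed uptime OIDs (sysUpTime.0 and snmpEngineTime.0). */
#define OID_SYSUPTIME   ".1.3.6.1.2.1.1.3.0"
#define OID_ENGINE_TIME ".1.3.6.1.6.3.10.2.1.3.0"

/* Simulates one GET against the fake agent. */
static snmp_outcome fake_get(const fake_agent *agent, const char *oid) {
    assert(agent != NULL && oid != NULL);
    if (!agent->reachable) {
        return SNMP_TIMEOUT;
    }
    if (strcmp(oid, OID_SYSUPTIME) == 0 && agent->has_sysuptime) {
        return SNMP_VALUE;
    }
    if (strcmp(oid, OID_ENGINE_TIME) == 0 && agent->has_engine_time) {
        return SNMP_VALUE;
    }
    return SNMP_NO_SUCH_OID;
}

/* The agent counts as responding if either uptime OID gets any answer,
   including "no such object"; only a timeout on both OIDs means no response. */
static bool snmp_uptime_responded(const fake_agent *agent) {
    static const char *const oids[] = { OID_SYSUPTIME, OID_ENGINE_TIME };
    for (size_t i = 0; i < sizeof oids / sizeof oids[0]; i++) {
        if (fake_get(agent, oids[i]) != SNMP_TIMEOUT) {
            return true;
        }
    }
    return false;
}
```

A VPRN-like agent that answers but implements neither OID (`{ true, false, false }`) counts as responding. An unreachable agent times out on both OIDs and does not.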

@netniV
Member

netniV commented Jun 8, 2023

Can I suggest you run spine in debug mode for this particular host and report the results back to us?

@netniV netniV added this to the v1.2.25 milestone Jun 8, 2023
@netniV netniV changed the title Cacti 1.2.24 - Downed Device Detection "Ping or SNMP Time" issue Improve downed Device detection Jun 8, 2023
netniV added a commit that referenced this issue Jun 8, 2023
This relates to cacti/#5356
@netniV
Member

netniV commented Jun 8, 2023

I made some changes to the Cacti core as well as spine; see how you get on.

netniV added a commit that referenced this issue Jun 12, 2023
This relates to cacti/#5356
@TheWitness
Member

@netniV, is this resolved now?

@netniV
Member

netniV commented Jun 20, 2023

We haven't heard anything back, and I believe the changes I made are for the better.

@jdcoats

jdcoats commented Jun 26, 2023

I'm tracking this too. I had noticed the same thing with "Ping and" vs. "Ping or" but didn't have time to chase down the details. I have updated and will let you know tomorrow if I find time to dig in deeper.

@iskandarbasman
Author

I can only test this once the fix is pushed to a main Cacti release.
I do not run cacti-develop on my production Cacti, and unfortunately I do not have a router with Nokia VPRNs enabled on my UAT Cacti (but I can try to set something up later).

I decided to install cacti-develop and spine-develop on my UAT Cacti anyway.
I'm now running into a different problem: the Cacti main page will not load, and I get the errors below.


```
[root@sptcacti03-dc1 cacti]# cat /var/log/httpd/error_log
[Sun Jun 25 03:49:01.887867 2023] [lbmethod_heartbeat:notice] [pid 8493] AH02282: No slotmem from mod_heartmonitor
[Sun Jun 25 03:49:01.922552 2023] [mpm_prefork:notice] [pid 8493] AH00163: Apache/2.4.6 (CentOS) PHP/7.3.28 configured -- resuming normal operations
[Sun Jun 25 03:49:01.922574 2023] [core:notice] [pid 8493] AH00094: Command line: '/usr/sbin/httpd -D FOREGROUND'
[Tue Jun 27 09:21:21.155731 2023] [php7:error] [pid 1381] [client 10.107.0.66:49986] PHP Parse error:  syntax error, unexpected '|', expecting variable (T_VARIABLE) in /var/www/html/cacti/lib/database.php on line 240
[Tue Jun 27 09:21:38.771366 2023] [php7:error] [pid 24785] [client 10.107.0.66:49998] PHP Parse error:  syntax error, unexpected '|', expecting variable (T_VARIABLE) in /var/www/html/cacti/lib/database.php on line 240
[Tue Jun 27 09:21:58.568096 2023] [php7:error] [pid 27419] [client 10.107.0.66:50010] PHP Parse error:  syntax error, unexpected '|', expecting variable (T_VARIABLE) in /var/www/html/cacti/lib/database.php on line 240
[Tue Jun 27 09:24:55.335313 2023] [php7:error] [pid 26936] [client 10.107.0.66:50092] PHP Parse error:  syntax error, unexpected '|', expecting variable (T_VARIABLE) in /var/www/html/cacti/lib/database.php on line 240
```

Line 240 of lib/database.php:

```php
function db_check_reconnect(object|false $db_conn = false, $log = true) {
        global $config, $database_details;

        include(CACTI_PATH_INCLUDE . '/config.php');

        if (cacti_sizeof($database_details) && $db_conn !== false) {
                foreach ($database_details as $det) {
                        if (spl_object_hash($det['database_conn']) == spl_object_hash($db_conn)) {
                                $database_hostname = $det['database_hostname'];
                                $database_username = $det['database_username'];
                                $database_password = $det['database_password'];
                                $database_default  = $det['database_default'];
                                $database_type     = $det['database_type'];
                                $database_port     = $det['database_port'];
                                $database_retries  = $det['database_retries'];
                                $database_ssl      = $det['database_ssl'];
                                $database_ssl_key  = $det['database_ssl_key'];
                                $database_ssl_cert = $det['database_ssl_cert'];
                                $database_ssl_ca   = $det['database_ssl_ca'];

                                break;
                        }
                }
        } else {
                if (!isset($database_ssl)) {
                        $database_ssl      = false;
                }

                if (!isset($database_ssl_key)) {
                        $database_ssl_key  = '';
                }

                if (!isset($database_ssl_cert)) {
                        $database_ssl_cert = '';
                }

                if (!isset($database_ssl_ca)) {
                        $database_ssl_ca   = '';
                }

                if (!isset($database_retries)) {
                        $database_retries  = 2;
                }

                if (!isset($database_port)) {
                        $database_port     = 3306;
                }
```

@jdcoats

jdcoats commented Jun 27, 2023

@iskandarbasman Use the 1.2.x branch, not develop.

@jdcoats

jdcoats commented Jun 27, 2023

This is working as expected now. With "Ping or SNMP Uptime", if SNMP stops responding, the device is not considered down as long as it still answers pings. Likewise, with "Ping and SNMP Uptime", the device is down if either check stops responding.
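The corrected behavior jdcoats describes reduces to a boolean OR versus AND of the two checks. Below is a minimal C sketch of just that decision. The names `detect_mode` and `device_is_up` are hypothetical and are not spine's API. The timeouts, retries and failure counters that real downed-device detection involves are deliberately omitted.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical names for the two combined detection modes discussed in this thread. */
typedef enum {
    DETECT_PING_OR_SNMP,  /* up if either check succeeds */
    DETECT_PING_AND_SNMP  /* up only if both checks succeed */
} detect_mode;

/* Returns true when the device should be treated as up. */
static bool device_is_up(detect_mode mode, bool ping_ok, bool snmp_ok) {
    switch (mode) {
    case DETECT_PING_OR_SNMP:
        return ping_ok || snmp_ok;
    case DETECT_PING_AND_SNMP:
        return ping_ok && snmp_ok;
    }
    assert(!"unknown detection mode");
    return false;
}
```

For a Nokia VPRN that answers pings but not the uptime OID, `device_is_up(DETECT_PING_OR_SNMP, true, false)` is true. The same inputs under `DETECT_PING_AND_SNMP` mark the device as down.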

@TheWitness
Member

The ping method "SNMP GetNext" is the most compatible with all devices. Use it if you have issues.

@netniV netniV changed the title Improve downed Device detection Improve downed device detection Sep 4, 2023