Error interpreting snmpUptime when the value is too big #4428

Closed
arno-st opened this issue Oct 18, 2021 · 19 comments
Labels
bug (Undesired behaviour), resolved (A fixed issue)
Milestone

Comments

@arno-st
Contributor

arno-st commented Oct 18, 2021

On a device whose uptime is too long (more than 224 days, I think), the device is shown as down when availability monitoring is set to SNMP Uptime and Ping.

As an example, I have this:
Uptime: 4190934469 (485days, 1hours, 29minutes)
and the device is down.
On 1.2.18 I have this:

sre-core 10.0.2.26 3159 28 29 Down 2d:23h:56m N/A  

And on 1.2.17:

sre-core 10.0.2.26 363 37 38 Up 226d:23h:49m 485d:1h:55m  
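For reference, sysUpTime is an SNMP TimeTicks value, counted in hundredths of a second, so the raw number above has to be divided by 100 before it is split into days, hours and minutes. A minimal Python sketch of that conversion (the function name is illustrative, not part of Cacti or spine):

```python
def timeticks_to_dhm(ticks: int) -> tuple[int, int, int]:
    """Convert SNMP TimeTicks (hundredths of a second) to (days, hours, minutes)."""
    seconds = ticks // 100              # drop the centisecond part
    days, rem = divmod(seconds, 86400)  # 86400 seconds per day
    hours, rem = divmod(rem, 3600)
    minutes = rem // 60
    return days, hours, minutes


if __name__ == "__main__":
    # Raw uptime value reported in this issue
    print(timeticks_to_dhm(4190934469))  # (485, 1, 29)
```

This reproduces the "485days, 1hours, 29minutes" shown in the device output.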
arno-st added the bug (Undesired behaviour) and unverified (Some days we don't have a clue) labels Oct 18, 2021
@netniV
Member

netniV commented Oct 18, 2021

This may be the difference between 32-bit and 64-bit uptime counters. Uptime is recorded in seconds, if I remember rightly, as I used to have this issue on older 32-bit hardware, which made Cacti think the device had rebooted when it hadn't.

What is your OS/Arch for all involved systems?
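To put numbers on the 32-bit theory: sysUpTime is defined as TimeTicks, i.e. hundredths of a second (RFC 2578), so the uptime at which a 32-bit field runs out can be computed directly. The sketch below also shows what an uptime above 2^31 - 1 becomes if something stores it in a signed 32-bit integer. Whether, and where, such a conversion happens in Cacti or spine is an assumption here, not something this thread establishes.

```python
TICKS_PER_DAY = 100 * 86400  # TimeTicks are hundredths of a second

INT32_MAX = 2**31 - 1   # largest signed 32-bit value
UINT32_MAX = 2**32 - 1  # largest unsigned 32-bit value (sysUpTime wraps after this)


def as_signed32(value: int) -> int:
    """Reinterpret the low 32 bits of value as a two's-complement signed integer."""
    value &= 0xFFFFFFFF
    return value - 2**32 if value > INT32_MAX else value


if __name__ == "__main__":
    print(f"signed 32-bit limit:   {INT32_MAX / TICKS_PER_DAY:.1f} days")   # 248.6
    print(f"unsigned 32-bit limit: {UINT32_MAX / TICKS_PER_DAY:.1f} days")  # 497.1
    # The uptime from this issue, forced into a signed 32-bit slot
    print(as_signed32(4190934469))  # -104032827
```

So a signed 32-bit field overflows after roughly 248.5 days of uptime, while the unsigned SNMP counter itself only wraps after about 497 days.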

@arno-st
Contributor Author

arno-st commented Oct 18, 2021

My 1.2.18 runs on CentOS 10.0-1160.42.2.el7.x86_64

My 1.2.17 runs on CentOS 10.0-1160.42.2.el7.x86_64

The only difference I can find is that the 1.2.17 system has PHP 7.4.2, while the one running Cacti 1.2.18 has PHP 7.4.14.

Otherwise, they should be the same!

As for the clients I'm polling, both are the same device, a Cisco switch.

@netniV
Member

netniV commented Oct 18, 2021

What about the SNMP libraries? Are you using php-snmp or net-snmp?

@arno-st
Contributor Author

arno-st commented Oct 18, 2021

php-snmp, the same version on both:
Name : php-snmp
Arch : x86_64
Version : 7.4.24
Release : 1.el7.remi

@arno-st
Contributor Author

arno-st commented Oct 18, 2021

I don't think the problem is in the polling part,
since the DB gives me both:
cacti 1.2.17 4165077143
cacti 1.2.18 4192683188

Minor difference.

@TheWitness
Member

I think we need to move that column to a bigint later tonight, my time, unless @netniV wants to hammer it out.
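Both stored values (4165077143 and 4192683188) are above the signed INT maximum of 2147483647. The sketch below checks a value against the documented MySQL integer column ranges; it assumes the uptime column was a signed INT before the change, as the move to BIGINT suggests. The actual ALTER statements live in the 1_2_19.php upgrade file and are not reproduced here.

```python
# Documented MySQL integer column ranges (inclusive)
MYSQL_INT_RANGES = {
    "INT": (-(2**31), 2**31 - 1),
    "INT UNSIGNED": (0, 2**32 - 1),
    "BIGINT": (-(2**63), 2**63 - 1),
}


def fitting_types(value: int) -> list[str]:
    """Return the MySQL integer column types whose range can hold value."""
    return [name for name, (lo, hi) in MYSQL_INT_RANGES.items() if lo <= value <= hi]


if __name__ == "__main__":
    # Uptime values stored by the 1.2.17 and 1.2.18 installations
    for uptime in (4165077143, 4192683188):
        print(uptime, fitting_types(uptime))
```

Since a TimeTicks value never exceeds 2^32 - 1, INT UNSIGNED would also hold it; BIGINT simply leaves more headroom.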

TheWitness removed the unverified (Some days we don't have a clue) label Oct 19, 2021
TheWitness added this to the v1.2.19 milestone Oct 19, 2021
@TheWitness
Member

Okay, this is resolved for the 1.2.19 release. If you want to hack it in now, you can hand-run the two SQL ALTER statements at the bottom of the 1_2_19.php file.

TheWitness added the resolved (A fixed issue) label Oct 19, 2021
@TheWitness
Member

Thanks for keeping your eye on the ball.

@arno-st
Copy link
Contributor Author

arno-st commented Oct 19, 2021

Can you reopen it? It doesn't solve the problem.
I'm working on it to find what's wrong.

Here is the output of my DB: the first 2 records use SNMP v3 and are shown as DOWN; the last one uses SNMP v2 and is seen as UP. I'm following that track to see if it's an SNMP version problem.
"id","poller_id","site_id","host_template_id","description","hostname","location","notes","external_id","snmp_community","snmp_version","snmp_username","snmp_password","snmp_auth_protocol","snmp_priv_passphrase","snmp_priv_protocol","snmp_context","snmp_engine_id","snmp_port","snmp_timeout","snmp_sysDescr","snmp_sysObjectID","snmp_sysUpTimeInstance","snmp_sysContact","snmp_sysName","snmp_sysLocation","availability_method","ping_method","ping_port","ping_timeout","ping_retries","max_oids","bulk_walk_size","device_threads","deleted","disabled","monitor","monitor_text","monitor_criticality","monitor_warn","monitor_alert","thold_send_email","thold_host_email","status","status_event_count","status_fail_date","status_rec_date","status_last_error","min_time","max_time","cur_time","avg_time","polling_time","total_polls","failed_polls","availability","last_updated","serial_no","model","isPhone","keep_mac_track","password","console_type","can_be_upgraded","can_be_rebooted","do_backup","login","mode"

"3159","1","3","8","core","10.0.2.26",," EZV: En service","SR02129 SR02128 ","telvlsn","3","SNMP_USER","SNMP_KEY","SHA","SNMP_KEY","AES128",,,"161","500","Cisco IOS Software [Fuji], Catalyst L3 Switch Software (CAT9K_IOSXE), Version 16.9.5, RELEASE SOFTWARE (fc1)
Technical Support: http://www.cisco.com/techsupport
Copyright (c) 1986-2020 by Cisco Systems, Inc.
Compiled Thu 30-Jan-20 18:53 by mcpre","iso.3.6.1.4.1.9.1.2593","4165077143",,"CORE","LOCATION","1","1","23","400","1","10","-1","1",,,,,"0","0","0","1","0","1","5670","2021-10-15 10:06:02","2021-10-13 15:13:02","Device responded to SNMP, ICMP: Destination address not specified","0.00000","88.03201","0.00000","0.74397","0.027","795147","8664","98.91040","2021-10-19 08:35:02","FCW2211A0AT FCW2211A0B8","C9500-16X","of","of","SNMP_KEY","1","off","off","on","PASSWORD","bundle"

"3160","1","3","7","vbb","10.1.128.10",,,,"telvlsn","3","SNMP_USER","SNMP_KEY","SHA","SNMP_KEY","AES128",,,"161","500","Cisco IOS Software, IOS-XE Software, Catalyst 4500 L3 Switch Software (cat4500e-UNIVERSALK9-M), Version 03.08.08.E RELEASE SOFTWARE (fc3)
Technical Support: http://www.cisco.com/techsupport
Copyright (c) 1986-2019 by Cisco Systems, Inc.
Compiled Fri 1","iso.3.6.1.4.1.9.1.1732","2013484542","SOI - Telecom","VBB.recolte.lausanne.ch","LOCATION","1","1","23","400","1","10","-1","1",,,,,"0","0","0","1","0","1","11080","2021-10-11 15:56:02","2020-06-20 08:27:00","Device responded to SNMP, ICMP: Destination address not specified","0.03791","995.94558","1.81699","6.07636","0.034","795146","11232","98.58740","2021-10-19 08:35:02","JAE17430H3B JAE17430BD6","WS-C4500X-32","of","of","SNMP_KEY","1","off","off","on","PASSWORD","bundle"

"4515","1","3","9","pdp","10.128.1.41",," EZV: En service","SR01288 ","telvlsn","2",,,,,,,,"161","500","Cisco NX-OS(tm) n5000, Software (n5000-uk9), Version 5.1(3)N2(1b), RELEASE SOFTWARE Copyright (c) 2002-2011 by Cisco Systems, Inc. Device Manager Version 5.2(1), Compiled 8/31/2012 17:00:00","iso.3.6.1.4.1.9.12.3.1.3.1084","1643546154","","PDP","LOCATION","4","1","23","400","1","10","-1","1",,,"on",,"0","0","0","1","0","3","0","2021-08-04 18:05:05","2021-08-04 18:19:02","Device did not respond to SNMP, ICMP: Ping timed out","0.00000","55.65691","0.00000","1.20858","0.046","371268","14","99.99620","2021-10-19 08:35:02","SSI154300JT","N5K-C5548UP",,"of","UNAW2m3sFF+9uSzZf","1","off","off","off","PASSWORD","bundle"
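To line up SNMP version against uptime for the three records above, here is a small sketch that takes the id, snmp_version and snmp_sysUpTimeInstance fields copied from the dump and converts the uptime to days, assuming TimeTicks (hundredths of a second). The helper name is illustrative only.

```python
TICKS_PER_DAY = 100 * 86400  # TimeTicks are hundredths of a second
INT32_MAX = 2**31 - 1

# (id, snmp_version, snmp_sysUpTimeInstance) copied from the export above
DEVICES = [
    (3159, 3, 4165077143),
    (3160, 3, 2013484542),
    (4515, 2, 1643546154),
]


def uptime_days(ticks: int) -> float:
    """Convert a TimeTicks uptime to (fractional) days."""
    return ticks / TICKS_PER_DAY


if __name__ == "__main__":
    for dev_id, version, ticks in DEVICES:
        side = "above" if ticks > INT32_MAX else "below"
        print(f"{dev_id}: SNMP v{version}, {uptime_days(ticks):.1f} days, {side} 2^31 - 1")
```

Laid out this way, device 3160's stored value is below 2^31 - 1 even though it is reported as DOWN, which is consistent with looking beyond the column size alone.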

@TheWitness
Member

Are you using spine or cmd.php? The problem could be in spine at this point.

@arno-st
Contributor Author

arno-st commented Oct 19, 2021

I'm using spine, and I'm looking into it.
I don't know yet; I have to understand how the status is checked.
How can I debug spine?
Does it affect all pollers, or can I run it once for a specific device?

@arno-st
Contributor Author

arno-st commented Oct 19, 2021

So here is the output of spine:
./spine -S -H 3160 --verbosity=5 --conf=../etc/spine.conf
SPINE: Using spine config file [../etc/spine.conf]
Total[0.0064] DEBUG: The path_php_server variable is /usr/share/cacti/script_server.php
Total[0.0065] DEBUG: The path_cactilog variable is /usr/share/cacti/log/cacti.log
Total[0.0065] DEBUG: The log_destination variable is 4 (STDOUT)
Total[0.0067] DEBUG: The path_php variable is /bin/php
Total[0.0069] DEBUG: The availability_method variable is 4
Total[0.0070] DEBUG: The ping_recovery_count variable is 6
Total[0.0071] DEBUG: The ping_failure_count variable is 4
Total[0.0072] DEBUG: The ping_method variable is 1
Total[0.0073] DEBUG: The ping_retries variable is 1
Total[0.0074] DEBUG: The ping_timeout variable is 400
Total[0.0074] DEBUG: The snmp_retries variable is 3
Total[0.0075] DEBUG: The log_perror variable is 1
Total[0.0076] DEBUG: The log_pwarn variable is 1
Total[0.0077] DEBUG: The boost_redirect variable is 1
Total[0.0078] DEBUG: The boost_rrd_update_enable variable is 0
Total[0.0079] DEBUG: The log_pstats variable is 1
Total[0.0080] DEBUG: The threads variable is 13
Total[0.0081] DEBUG: The polling interval is 60 seconds
Total[0.0082] DEBUG: The number of concurrent processes is 2
Total[0.0083] DEBUG: The script timeout is 25
Total[0.0084] DEBUG: The selective_device_debug variable is 3160,3159
Total[0.0085] DEBUG: The spine_log_level variable is 0
Total[0.0086] DEBUG: The number of php script servers to run is 5
Total[0.0087] DEBUG: Device List to be polled='3160', TotalPHPScripts='1'
Total[0.0087] DEBUG: The PHP Script Server is Required
Total[0.0088] DEBUG: The Maximum SNMP OID Get Size is 10
Total[0.0088] DEBUG: Selective Debug Devices 3160,3159
Total[0.0090] DEBUG: Total Connections made 1
Total[0.0090] DEBUG: Creating Local Connection Pool of 13 threads.
Total[0.0090] DEBUG: Creating Local Connection 0.
Total[0.0092] DEBUG: Total Connections made 2
Total[0.0096] DEBUG: Creating Local Connection 1.
Total[0.0098] DEBUG: Total Connections made 3
Total[0.0101] DEBUG: Creating Local Connection 2.
Total[0.0103] DEBUG: Total Connections made 4
Total[0.0107] DEBUG: Creating Local Connection 3.
Total[0.0109] DEBUG: Total Connections made 5
Total[0.0113] DEBUG: Creating Local Connection 4.
Total[0.0115] DEBUG: Total Connections made 6
Total[0.0119] DEBUG: Creating Local Connection 5.
Total[0.0120] DEBUG: Total Connections made 7
Total[0.0124] DEBUG: Creating Local Connection 6.
Total[0.0126] DEBUG: Total Connections made 8
Total[0.0130] DEBUG: Creating Local Connection 7.
Total[0.0132] DEBUG: Total Connections made 9
Total[0.0136] DEBUG: Creating Local Connection 8.
Total[0.0138] DEBUG: Total Connections made 10
Total[0.0142] DEBUG: Creating Local Connection 9.
Total[0.0143] DEBUG: Total Connections made 11
Total[0.0147] DEBUG: Creating Local Connection 10.
Total[0.0149] DEBUG: Total Connections made 12
Total[0.0153] DEBUG: Creating Local Connection 11.
Total[0.0155] DEBUG: Total Connections made 13
Total[0.0159] DEBUG: Creating Local Connection 12.
Total[0.0161] DEBUG: Total Connections made 14
Total[0.0166] DEBUG: Version 1.2.18 starting
Total[0.0166] DEBUG: MySQL is Thread Safe!
Total[0.0166] DEBUG: Spine running as 0 UID, 0 EUID
Total[0.0167] DEBUG: Spine is running as root.
Total[0.0167] DEBUG: Spine has got ICMP
Total[0.0167] DEBUG: Initializing Net-SNMP API
Total[0.0167] DEBUG: Issues with SNMP Header Version information, assuming old version of Net-SNMP.
Total[0.0180] DEBUG: Initializing PHP Script Server(s)
Total[0.0180] DEBUG: SS[0] PHP Script Server Routine Starting
Total[0.0180] DEBUG: SS[0] PHP Script Server About to FORK Child Process
Total[0.0185] DEBUG: SS[0] PHP Script Server Child FORK Success
Total[0.1668] DEBUG: SS[0] Confirmed PHP Script Server running using readfd[20], writefd[19]
Total[0.1668] DEBUG: SS[1] PHP Script Server Routine Starting
Total[0.1668] DEBUG: SS[1] PHP Script Server About to FORK Child Process
Total[0.1670] DEBUG: SS[1] PHP Script Server Child FORK Success
Total[0.3097] DEBUG: SS[1] Confirmed PHP Script Server running using readfd[22], writefd[21]
Total[0.3097] DEBUG: SS[2] PHP Script Server Routine Starting
Total[0.3097] DEBUG: SS[2] PHP Script Server About to FORK Child Process
Total[0.3099] DEBUG: SS[2] PHP Script Server Child FORK Success
Total[0.4525] DEBUG: SS[2] Confirmed PHP Script Server running using readfd[24], writefd[23]
Total[0.4525] DEBUG: SS[3] PHP Script Server Routine Starting
Total[0.4525] DEBUG: SS[3] PHP Script Server About to FORK Child Process
Total[0.4527] DEBUG: SS[3] PHP Script Server Child FORK Success
Total[0.6048] DEBUG: SS[3] Confirmed PHP Script Server running using readfd[26], writefd[25]
Total[0.6048] DEBUG: SS[4] PHP Script Server Routine Starting
Total[0.6049] DEBUG: SS[4] PHP Script Server About to FORK Child Process
Total[0.6050] DEBUG: SS[4] PHP Script Server Child FORK Success
Total[0.7482] DEBUG: SS[4] Confirmed PHP Script Server running using readfd[28], writefd[27]
Total[0.7492] Spine will support multithread device polling.
Total[0.7497] DEBUG: Initial Value of Available Threads is 13 (0 outstanding)
Total[0.7499] DEBUG: Valid Thread to be Created
Total[0.7500] DEBUG: Available Threads is 12 (1 outstanding)
Total[0.7500] DEBUG: In Poller, About to Start Polling of Device for Device ID 0
Total[0.7500] DEBUG: Traversing Local Connection Pool for free connection.
Total[0.7500] DEBUG: Checking Local Pool ID 0.
Total[0.7500] DEBUG: Allocating Local Pool ID 0.
Total[0.7502] DEBUG: Valid Thread to be Created
Total[0.7502] DEBUG: Available Threads is 11 (2 outstanding)
Total[0.7502] WARNING: Spine Sleeping While Waiting for 2 Threads to End
Total[0.7502] DEBUG: In Poller, About to Start Polling of Device for Device ID 3160
Total[0.7503] Device[0] HT[1] Updating Poller Items for Next Poll
Total[0.7503] DEBUG: Traversing Local Connection Pool for free connection.
Total[0.7503] DEBUG: Checking Local Pool ID 0.
Total[0.7503] DEBUG: Checking Local Pool ID 1.
Total[0.7503] DEBUG: Allocating Local Pool ID 1.
Total[0.7506] Device[0] HT[1] Total Time: 0.00064 Seconds
Total[0.7508] get_namebyhost(10.1.128.10) - Allocating name_t
Total[0.7508] get_namebyhost(10.1.128.10) - Token #1
Total[0.7508] get_hostbyname(10.1.128.10) - No matching method for 11 chars: 10.1.128.10
Total[0.7508] get_namebyhost(10.1.128.10) - Setting hostname: 10.1.128.10
Total[0.7508] DEBUG: Freeing Local Pool ID 0
Total[0.7508] DEBUG: Device[0] HT[1] DEBUG: HOST COMPLETE: About to Exit Device Polling Thread Function
Total[0.7509] Device[3160] INFO: SNMP Device '10.1.128.10' has timeout 500000 (500), retries 3
Total[0.7770] Device[3160] IPv4 address 10.1.128.10 (10.1.128.10)

Total[0.7770] Device[3160] DEBUG: Entering ICMP Ping
Total[0.7771] WARNING: Spine Sleeping While Waiting for 1 Threads to End
Total[0.7778] WARNING: Spine Sleeping While Waiting for 1 Threads to End
Total[0.7778] WARNING: Spine Sleeping While Waiting for 1 Threads to End
Total[0.7778] WARNING: Spine Sleeping While Waiting for 1 Threads to End
Total[0.7779] Device[3160] DEBUG: Entering SNMP Ping
Total[0.7823] Device[3160] PING Result: ICMP: Destination address not specified
Total[0.7823] Device[3160] SNMP Result: Device responded to SNMP

Total[0.7851] Device[3160] HT[1] NOTE: There are '350' Polling Items for this Device
Total[0.7852] DEBUG: Setting up writes to local database
Total[0.7854] Device[3160] HT[1] Updating Poller Items for Next Poll
Total[0.7865] Device[3160] HT[1] Total Time: 0.036 Seconds
Total[0.7867] DEBUG: Freeing Local Pool ID 1
Total[0.7867] DEBUG: Device[3160] HT[1] DEBUG: HOST COMPLETE: About to Exit Device Polling Thread Function
Total[1.2779] The Final Value of Threads is 0
Total[1.2786] DEBUG: Closing Local Connection Pool ID 0
Total[1.2786] DEBUG: Closing Local Connection Pool ID 1
Total[1.2787] DEBUG: Closing Local Connection Pool ID 2
Total[1.2787] DEBUG: Closing Local Connection Pool ID 3
Total[1.2787] DEBUG: Closing Local Connection Pool ID 4
Total[1.2787] DEBUG: Closing Local Connection Pool ID 5
Total[1.2787] DEBUG: Closing Local Connection Pool ID 6
Total[1.2787] DEBUG: Closing Local Connection Pool ID 7
Total[1.2787] DEBUG: Closing Local Connection Pool ID 8
Total[1.2787] DEBUG: Closing Local Connection Pool ID 9
Total[1.2788] DEBUG: Closing Local Connection Pool ID 10
Total[1.2788] DEBUG: Closing Local Connection Pool ID 11
Total[1.2788] DEBUG: Closing Local Connection Pool ID 12
Total[1.2788] DEBUG: Thread Cleanup Complete
Total[1.2788] DEBUG: SS[0] Script Server Shutdown Started
Total[1.3289] DEBUG: SS[1] Script Server Shutdown Started
Total[1.3790] DEBUG: SS[2] Script Server Shutdown Started
Total[1.4291] DEBUG: SS[3] Script Server Shutdown Started
Total[1.4793] DEBUG: SS[4] Script Server Shutdown Started
Total[1.5294] DEBUG: PHP Script Server Pipes Closed
Total[1.5294] DEBUG: Allocated Variable Memory Freed
Total[1.5294] DEBUG: MYSQL Free & Close Completed
Total[1.5295] DEBUG: Net-SNMP Close Completed
Total[1.5295] Time: 1.2779 s, Threads: 13, Devices: 2
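When reading verbose spine output like this, it can help to pull out just the availability results for one device. A small sketch, assuming the `Total[...] Device[ID] PING Result: ...` / `SNMP Result: ...` line format seen above; the regex is written against this sample, not against a documented log format, and the function name is hypothetical.

```python
import re

# Matches lines such as: "Total[0.7823] Device[3160] SNMP Result: Device responded to SNMP"
RESULT_RE = re.compile(r"Device\[(\d+)\]\s+(PING|SNMP) Result:\s*(.*)")


def device_results(log_text: str, device_id: int) -> dict[str, str]:
    """Return {'PING': ..., 'SNMP': ...} result messages for one device."""
    results = {}
    for line in log_text.splitlines():
        m = RESULT_RE.search(line)
        if m and int(m.group(1)) == device_id:
            results[m.group(2)] = m.group(3).strip()
    return results


if __name__ == "__main__":
    sample = (
        "Total[0.7823] Device[3160] PING Result: ICMP: Destination address not specified\n"
        "Total[0.7823] Device[3160] SNMP Result: Device responded to SNMP\n"
    )
    print(device_results(sample, 3160))
```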

@TheWitness
Member

Open a spine bug, would you? Cacti is fixed.

@TheWitness
Member

Let me take that back: spine uses a string, so it might have something to do with the SNMP library. In the meantime, edit poller.c and make the modification shown in the highlighted row below:

[Screenshot: highlighted modification in poller.c]

Then rebuild spine with make and run it as follows:

./spine -R --mibs --first host_id1 --last host_id2
<snip>
NOTE: The SNMP Uptime was 8536518
NOTE: The SNMP Uptime was 8537397
NOTE: The SNMP Uptime was 8536553
NOTE: The SNMP Uptime was 318654799
NOTE: The SNMP Uptime was 318654799
Time: 2.8684 s, Threads: 4, Devices: 47

This should produce output like the above. Let us know whether the value is correct there.
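With the extra debug line in place, the reported uptimes can be scanned for values that would not fit in a signed 32-bit integer. A sketch assuming the `NOTE: The SNMP Uptime was N` format shown above; that line comes from the temporary poller.c change, so its exact wording is an assumption, as is the helper name.

```python
import re

UPTIME_RE = re.compile(r"NOTE: The SNMP Uptime was (\d+)")
INT32_MAX = 2**31 - 1  # largest signed 32-bit value


def uptimes_over_int32(output: str) -> list[int]:
    """Return every reported uptime that exceeds the signed 32-bit maximum."""
    values = [int(v) for v in UPTIME_RE.findall(output)]
    return [v for v in values if v > INT32_MAX]


if __name__ == "__main__":
    sample = "NOTE: The SNMP Uptime was 8536518\nNOTE: The SNMP Uptime was 318654799\n"
    print(uptimes_over_int32(sample))  # []
```

Pasting the real spine output for the affected devices into `uptimes_over_int32` shows at a glance whether any of them cross the 2147483647 boundary.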

@TheWitness
Member

Continuing this discussion on the spine side.

@TheWitness
Member

Okay, I made a few more GUI changes so that when you edit the device, you also see the correct uptime. I also addressed cmd.php and reindexing there.

@arno-st
Copy link
Contributor Author

arno-st commented Oct 20, 2021

Just wondering: where did you change the GUI?

@TheWitness
Copy link
Member

TheWitness commented Oct 20, 2021

What do you mean? If you edit the device, it grabs uptime dynamically.

@arno-st
Contributor Author

arno-st commented Oct 20, 2021

Oh OK, I was wondering if you had added a field,
because the uptime was visible in the SNMP information, and I didn't check the device page.

TheWitness added a commit that referenced this issue Oct 23, 2021
This particular check was not caught.  Thanks @jdcoats
TheWitness added a commit that referenced this issue Oct 23, 2021
TheWitness added a commit that referenced this issue Oct 24, 2021
The wrong OID uptime is inserted into the poller_reindex table.
github-actions bot locked and limited conversation to collaborators Jan 19, 2022
3 participants