Description
Describe the bug
We have a JunOS query which should return the SPU utilization (you can check the file; I gave you and Sean our whole set of queries and scripts. It is: resource/snmp_queries/juniper_spu_all.xml).
We have several cases where this query fails on newly integrated devices, and I suspect it has to do with the device having only a single SPU installed. When spine does the re-caching checks, it somehow fails to query the only sub-OID ".0" ("0" is the only object index in the SNMP tree; see the manual query below).
I have tested multiple spine versions: the query started to fail completely in version 1.2.22, and we now use 1.2.27.
Up to 1.2.21 we still get the re-cache assert failure, but the device is not dropped completely.
Starting with 1.2.22, the whole device is marked with the "ignore" flag and polling stops for all other data sources as well.
In our case we use the index OID ".1.3.6.1.4.1.2636.3.39.1.12.1.1.1.3", which is also marked as "walk" in the fields section (does spine even check this? Shouldn't it use "walk" by default when checking an index?). We also use the OID parser to determine the index from the OID itself, even though the manual suggests that Cacti can determine this on its own.
Here's the output of a snmpwalk on the tree .1.3.6.1.4.1.2636.3.39.1.12.1.1.1
.1.3.6.1.4.1.2636.3.39.1.12.1.1.1.2.0 = Gauge32: 0
.1.3.6.1.4.1.2636.3.39.1.12.1.1.1.3.0 = Gauge32: 0
.1.3.6.1.4.1.2636.3.39.1.12.1.1.1.4.0 = Gauge32: 0
.1.3.6.1.4.1.2636.3.39.1.12.1.1.1.5.0 = Gauge32: 43
.1.3.6.1.4.1.2636.3.39.1.12.1.1.1.6.0 = Gauge32: 50
.1.3.6.1.4.1.2636.3.39.1.12.1.1.1.7.0 = Gauge32: 62914560
.1.3.6.1.4.1.2636.3.39.1.12.1.1.1.8.0 = Gauge32: 0
.1.3.6.1.4.1.2636.3.39.1.12.1.1.1.9.0 = Gauge32: 0
.1.3.6.1.4.1.2636.3.39.1.12.1.1.1.10.0 = Gauge32: 0
.1.3.6.1.4.1.2636.3.39.1.12.1.1.1.11.0 = STRING: "single"
.1.3.6.1.4.1.2636.3.39.1.12.1.1.1.12.0 = Gauge32: 50
.1.3.6.1.4.1.2636.3.39.1.12.1.1.1.13.0 = Gauge32: 0
.1.3.6.1.4.1.2636.3.39.1.12.1.1.1.14.0 = Gauge32: 0
.1.3.6.1.4.1.2636.3.39.1.12.1.1.1.15.0 = Gauge32: 0
.1.3.6.1.4.1.2636.3.39.1.12.1.1.1.16.0 = Gauge32: 24
Remember, "0" is the only instance, so it might look to Cacti as if this is not a table but a bunch of single objects.
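To illustrate the suspicion, here is a minimal Python sketch that simulates the agent with three of the OIDs from the walk output above. It is not spine's code: `snmp_get`, `snmp_walk` and `index_of` are hypothetical helpers, and the trailing-number regex is only an assumed stand-in for our OID parser. It shows that an exact GET on the column OID finds no instance, while a walk below the same OID does find ".0".

```python
import bisect
import re


def parse(oid: str) -> tuple:
    """Turn a dotted OID string into a tuple of ints so it sorts like a MIB tree."""
    return tuple(int(part) for part in oid.strip(".").split("."))


# Three instances copied from the snmpwalk output; values as reported there.
AGENT = {
    parse("1.3.6.1.4.1.2636.3.39.1.12.1.1.1.2.0"): 0,
    parse("1.3.6.1.4.1.2636.3.39.1.12.1.1.1.3.0"): 0,
    parse("1.3.6.1.4.1.2636.3.39.1.12.1.1.1.4.0"): 0,
}
SORTED_OIDS = sorted(AGENT)

# The index OID used by the data query (a column, not an instance).
COLUMN = "1.3.6.1.4.1.2636.3.39.1.12.1.1.1.3"


def snmp_get(oid: str):
    """Exact-match lookup; None stands in for a 'No such Instance' reply."""
    return AGENT.get(parse(oid))


def snmp_walk(base: str) -> list:
    """Return every (oid, value) pair strictly below base, in tree order."""
    base_t = parse(base)
    start = bisect.bisect_right(SORTED_OIDS, base_t)
    found = []
    for oid in SORTED_OIDS[start:]:
        if oid[: len(base_t)] != base_t:
            break  # left the subtree
        found.append((oid, AGENT[oid]))
    return found


def index_of(oid: tuple) -> str:
    """Assumed index rule: the index is the last sub-identifier of the OID."""
    dotted = "." + ".".join(str(part) for part in oid)
    return re.search(r"\.(\d+)$", dotted).group(1)
```

With this model, `snmp_get(COLUMN)` yields `None`, `snmp_get(COLUMN + ".0")` yields the value, and walking `COLUMN` yields exactly one row whose parsed index is "0". That seems consistent with the "No such Instance" error in the log below, if the re-cache check does a plain GET on the index OID instead of a walk.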
And here is what cacti does once it reaches the "SPU" query ("interfaces" is re-cached perfectly fine!):
1727694394.408670 Total[0.4960] Device[5224] DEBUG: snmp_pdu_create(1.3.6.1.4.1.2636.3.39.1.12.1.1.1.3)
1727694394.408704 Total[0.4960] Device[5224] DEBUG: snmp_pdu_create(1.3.6.1.4.1.2636.3.39.1.12.1.1.1.3) [complete]
1727694394.408738 Total[0.4960] Device[5224] DEBUG: snmp_parse_oid(1.3.6.1.4.1.2636.3.39.1.12.1.1.1.3)
1727694394.408771 Total[0.4960] Device[5224] DEBUG: snmp_parse_oid(1.3.6.1.4.1.2636.3.39.1.12.1.1.1.3) [complete]
1727694394.408805 Total[0.4960] Device[5224] DEBUG: snmp_add_null_var(1.3.6.1.4.1.2636.3.39.1.12.1.1.1.3)
1727694394.408838 Total[0.4960] Device[5224] DEBUG: snmp_add_null_var(1.3.6.1.4.1.2636.3.39.1.12.1.1.1.3) [complete]
1727694394.408872 Total[0.4960] Device[5224] DEBUG: snmp_sess_sync_response(1.3.6.1.4.1.2636.3.39.1.12.1.1.1.3)
1727694394.408905 Total[0.4966] Device[5224] DEBUG: snmp_sess_sync_response(1.3.6.1.4.1.2636.3.39.1.12.1.1.1.3) [complete]
**1727694394.408939 Total[0.4966] ERROR: No such Instance for oid '1.3.6.1.4.1.2636.3.39.1.12.1.1.1.3' for Device[5224] with Status[1]**
**1727694394.408972 Total[0.4966] Device[5224] HT[1] DQ[8] RECACHE OID: 1.3.6.1.4.1.2636.3.39.1.12.1.1.1.3, (assert: 0 = output: U)**
**1727694394.409006 Total[0.4966] Device[5224] HT[1] DQ[8] RECACHE ASSERT FAILED: '0=U'**
1727694394.409040 Total[0.4966] WARNING: Skipped oid '.1.3.6.1.4.1.2636.3.39.1.12.1.4.1.1' for Device[5224] as host ignore flag is active
1727694394.409073 Total[0.4966] Device[5224] HT[1] DQ[9] RECACHE OID: .1.3.6.1.4.1.2636.3.39.1.12.1.4.1.1, (assert: 0 = output: (null))
1727694394.409106 Total[0.4966] Device[5224] HT[1] DQ[9] RECACHE ASSERT FAILED: '0=(null)'
Expected behavior
- A subtree should be parsable even if the objects in the tree end in .0
- spine should not mark the whole device as failing if one query fails; it should continue with the queries that can be run
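The second expectation could be sketched roughly like this in Python. This is only an illustration of the requested behavior, not spine's implementation: `Device`, `run_recache_checks` and the `fetch` callback are hypothetical names, and the reply used for DQ 9 in the usage note is made up.

```python
from dataclasses import dataclass, field


@dataclass
class Device:
    device_id: int
    ignore: bool = False  # device-wide flag, like the "host ignore flag" in the log
    failed_queries: set = field(default_factory=set)


def run_recache_checks(device: Device, checks: list, fetch) -> dict:
    """Run every re-cache check and record failures per data query.

    checks is a list of (query_id, oid, expected) tuples.
    fetch(oid) returns the polled value, or None when the agent has no instance.
    """
    results = {}
    for query_id, oid, expected in checks:
        value = fetch(oid)
        passed = value is not None and str(value) == str(expected)
        results[query_id] = passed
        if not passed:
            # Isolate the failure: only this data query is flagged,
            # device.ignore stays untouched and the loop continues.
            device.failed_queries.add(query_id)
    return results
```

For example, with the two checks from the log (DQ 8 on 1.3.6.1.4.1.2636.3.39.1.12.1.1.1.3 and DQ 9 on .1.3.6.1.4.1.2636.3.39.1.12.1.4.1.1, both asserting 0) and a `fetch` that returns `None` for the first and a made-up "0" for the second, only DQ 8 ends up in `failed_queries` and DQ 9 is still evaluated.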
Server (please complete the following information):
- OS: RHEL 7.9
- Version 1.2.27
Compiling (please complete the following information):
- compiler: gcc (GCC) 4.8.5 20150623 (Red Hat 4.8.5-39)
- autoconf: GNU Autoconf 2.69.
- glibc: glibc-2.17-260.el7_6.3
- source: release 1.2.27 (starts from 1.2.22)
Additional context
Logs can be provided on demand.