When syncing data collectors, a reindex event may be triggered unnecessarily #3931
I am trying this with cmd.php right now to see if the same behavior happens.
Hmm, same thing with cmd.php.
I also found the following:
It seems spine is doing a lot with this OID, almost like a mini loop.
Disable data collector sync. We've found that this causes the issue.
Might be something else too.
Just tried disabling replication, but no change. I am not sure if this is something with the device or with Cacti at this point, but I only see this happening on a handful of the devices; I have 400 of them in total, and 40 of them are doing this.
On a deeper look, it turns out it's not just a handful of devices, it's all of this specific device type, but the error remains the same: unable to fetch '.1.3.6.1.2.1.1.3.0'. snmpwalk still works fine on the same device for that OID.
Did this happen to occur when we switched from DST to standard time?
Nope: in 2020 the switches were Sunday, March 8 and Sunday, November 1.
@bmfmancini, someone is messing with the clocks on those devices. That's the reason.
That's weird; it's happening at every poll. Would that mean the poller is seeing the time change every time?
What is the re-index method?
It's set to uptime.
Okay, so each polling cycle, Cacti takes the "assert_value" stored in poller_reindex and compares it against an snmpget of the OID in "arg1"; if the assert operator fails, that error is raised. From my system below. Tinker with calling snmpget on that OID and compare the result with what is stored in that table. These are also remote devices, right?
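For context, with the uptime re-index method the stored assert_value is the last sysUptime the poller saw, so a reboot, or a clock jumping backwards, makes the assert fail and queues a re-index. A minimal manual check might look like the following, with host_id 329 taken from the spine output further down, and the SNMP version, community string, and device address as placeholders to substitute for your own setup:
# what Cacti has stored for the device
mysql -e "select * from poller_reindex where host_id=329" cacti
# what the device reports right now
snmpget -v 2c -c public 192.0.2.10 .1.3.6.1.2.1.1.3.0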
This could be an artifact from a recent change, BTW.
Okay, there is a problem. Lucky day!
Should have the solution shortly.
Turns out this is a Cacti issue. Committing in a bit.
- Recache due to failed to get OID but SNMPWALK works
- Certain Device actions cause the removal of poller items from the remote data collector
Test ASAP. This will force us to move the release ahead.
Thanks Larry, I should be able to test this on Monday; if I get a chance earlier, I'll report back.
The sooner the better; I don't want this bug hanging out there for too long.
I tested this morning; I grabbed the files off the 1.2.x branch, and the same behavior is seen.
Should I try to rebuild the poller cache?
Yes.
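For reference, Cacti ships a CLI script to rebuild the poller cache; the install path here is an assumption for a typical setup:
php /var/www/html/cacti/cli/rebuild_poller_cache.php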
OK, just rebuilt the cache; will report back soon.
@TheWitness No change on my side.
Are your PHP files replicating to the remotes?
I will confirm for sure, but I am pretty sure they did.
It was buggered up on my system again. Research tonight.
OK, cool, thanks!
Okay, take the latest lib/poller.php and do a full sync to your pollers. This should get it fixed.
Sorry man, no dice; they're still coming in.
So the re-index warnings still happen after replication, then? I want to ensure that we are not mixing issues: the poller cache evaporating vs. the reindex errors.
The recache OID warnings still come even after replication.
Yeah, just about to make an update.
* forgot to handle the poller_reindex cache
Okay, should be fixed now.
OK, testing now.
Bump!
Sorry, I thought I had replied. Sorry man, I'm still seeing the same thing.
Okay, take this watch command, re-write it for your system, and then run it. While it's running, note the "assert" values. Then do a full sync to the remote; the value on the remote should go back.
watch 'echo "Main Hosts"; mysql -e "select * from poller_reindex where host_id=13" cacti; echo "Remote Host"; mysql -ucactiuser -pcactiuser -hvmhost1 -e "select * from poller_reindex where host_id=13" cacti'
Output should look something like:
You should notice that the main collector's time stays at the value from before the last sync, and that it is not updated afterwards. The only thing writing that assert_value should be the remote data collector, unless you have another collector pushing data into your production system, which I think would be a setup error. For that case, it might be good to log the connection attempts and where they are coming from.
If the latter is the case, you might want to consider a stricter ACL on who can connect to which databases, if you don't have that already. I don't want to jump to any conclusions here, though.
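For illustration only, a stricter ACL might look like the following; the user name, password, and collector address are hypothetical, not taken from this thread:
# allow the Cacti database user to connect only from the known remote collector
mysql -e "CREATE USER 'cactiuser'@'192.0.2.20' IDENTIFIED BY 'choose-a-real-password'"
mysql -e "GRANT ALL PRIVILEGES ON cacti.* TO 'cactiuser'@'192.0.2.20'"
mysql -e "FLUSH PRIVILEGES"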
I am trying this now; the thing is, I have devices doing this that are on the main poller.
That's very odd then. Might even be some device issue.
@bmfmancini, I'm marking this one closed as it was a real problem for remote data collectors, and that issue is resolved. As for devices on the main data collector, I suspect a time zone collision or some hardware issue.
Hey all,
I am having a weird issue, and I see a discrepancy between spine and net-snmp.
I have some new devices in my lab, around 400 of them. I recently started noticing that a handful of them always seem to be set for recache.
Digging further, I see that the recache has been triggered because of .1.3.6.1.2.1.1.3.0. This is a wireless modem, and for whatever reason that OID puts out the uptime of the wireless connection and not the modem's system uptime.
Here is the log:
I also started seeing the following in the error log:
When I do a poll directly from spine, I get the following:
./spine -R -H 329 | more
2020-10-09 15:29:33 - SPINE: Poller[1] PID[18512] Device[329] WARNING: snmp_pdu_create(.1.3.6.1.2.1.1.3.0)
2020-10-09 15:29:33 - SPINE: Poller[1] PID[18512] Device[329] WARNING: snmp_pdu_create(.1.3.6.1.2.1.1.3.0) [complete]
2020-10-09 15:29:33 - SPINE: Poller[1] PID[18512] Device[329] WARNING: snmp_parse_oid(.1.3.6.1.2.1.1.3.0)
2020-10-09 15:29:33 - SPINE: Poller[1] PID[18512] Device[329] WARNING: snmp_parse_oid(.1.3.6.1.2.1.1.3.0) [complete]
2020-10-09 15:29:33 - SPINE: Poller[1] PID[18512] Device[329] WARNING: snmp_add_null_var(.1.3.6.1.2.1.1.3.0)
2020-10-09 15:29:33 - SPINE: Poller[1] PID[18512] Device[329] WARNING: snmp_add_null_var(.1.3.6.1.2.1.1.3.0) [complete]
2020-10-09 15:29:33 - SPINE: Poller[1] PID[18512] Device[329] WARNING: snmp_sess_sync_response(.1.3.6.1.2.1.1.3.0)
2020-10-09 15:29:35 - SPINE: Poller[1] PID[18512] Device[329] WARNING: snmp_sess_sync_response(.1.3.6.1.2.1.1.3.0) [complete]
2020-10-09 15:29:35 - SPINE: Poller[1] PID[18512] ERROR: Failed to get oid '.1.3.6.1.2.1.1.3.0' for Device[329]
But if I do an snmpwalk on that OID, it responds fine.
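For comparison, the walk that succeeds looks like the following; the SNMP version, community string, and device address are placeholders, since the thread does not state them:
snmpwalk -v 2c -c public 192.0.2.10 .1.3.6.1.2.1.1.3.0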
Spine v1.2.12
Cacti v1.2.12
NET-SNMP version: 5.7.2