When many hosts go offline, Recache Event can be constantly logged #4892

Closed
bmfmancini opened this issue Aug 17, 2022 · 1 comment
Labels: bug (Undesired behaviour), confirmed (Bug is confirm by dev team), resolved (A fixed issue)
Milestone: v1.2.23

Comments

@bmfmancini
Member

I have found that when multiple devices go offline, say after a network outage, a recache loop gets triggered:

2022-08-17 15:38:00 - PCOMMAND Device[1105] NOTE: Recache Event Detected for Device
2022-08-17 15:37:50 - PCOMMAND Device[1102] NOTE: Recache Event Detected for Device
2022-08-17 15:37:42 - PCOMMAND Device[1100] NOTE: Recache Event Detected for Device
2022-08-17 15:37:34 - PCOMMAND Device[1099] NOTE: Recache Event Detected for Device
2022-08-17 15:37:26 - PCOMMAND Device[1097] NOTE: Recache Event Detected for Device
2022-08-17 15:37:19 - PCOMMAND Device[1096] NOTE: Recache Event Detected for Device
2022-08-17 15:37:11 - PCOMMAND Device[1089] NOTE: Recache Event Detected for Device
2022-08-17 15:37:03 - PCOMMAND Device[1087] NOTE: Recache Event Detected for Device

In this case, these devices have a large number of OIDs being polled, so they take time to respond.
The poller_commands.php script times out even after 20 minutes because of how long it takes to run through each device.

However, I notice that the poller_command table does not update even after you reindex the devices manually.
I also switched all of the reindex methods to None, and the poller_commands.php script still runs.

Example: device 1576

2022-08-17 15:26:38 - PCOMMAND Device[1576] NOTE: Recache Event Detected for Device
2022-08-17 14:46:50 - PCOMMAND Device[1576] NOTE: Recache Event Detected for Device
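
A device stuck in the loop shows up repeatedly in the log, as device 1576 does here. A minimal sketch in Python for tallying these lines per device, assuming the log format matches the lines quoted in this issue; `count_recache_events` is a made-up helper name, not part of Cacti:

```python
import re
from collections import Counter

# Pattern inferred from the log lines quoted in this issue; other Cacti
# versions or log settings may format the line differently.
RECACHE_RE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) - PCOMMAND "
    r"Device\[(?P<device>\d+)\] NOTE: Recache Event Detected for Device"
)

def count_recache_events(lines):
    """Return a Counter mapping device id -> number of recache events."""
    counts = Counter()
    for line in lines:
        match = RECACHE_RE.match(line.strip())
        if match:
            counts[int(match.group("device"))] += 1
    return counts

# Lines copied from this issue.
sample = [
    "2022-08-17 15:26:38 - PCOMMAND Device[1576] NOTE: Recache Event Detected for Device",
    "2022-08-17 14:46:50 - PCOMMAND Device[1576] NOTE: Recache Event Detected for Device",
    "2022-08-17 15:38:00 - PCOMMAND Device[1105] NOTE: Recache Event Detected for Device",
]

# Devices with the most recache events are listed first.
for device, hits in count_recache_events(sample).most_common():
    print(f"Device[{device}]: {hits} recache event(s)")
```

In practice the lines would come from the Cacti log file rather than a hard-coded list.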

Manually re-indexing the device completes without error; however, the poller_command table still has entries for that device:

MariaDB [cacti]> select * from poller_command where command like '%1576%'  \G
*************************** 1. row ***************************
   poller_id: 1
        time: 2022-07-20 20:40:02
      action: 1
     command: 1576:10
last_updated: 2022-07-20 20:40:02
*************************** 2. row ***************************
   poller_id: 1
        time: 2022-07-20 20:40:02
      action: 1
     command: 1576:4
last_updated: 2022-07-20 20:40:02
2 rows in set (0.001 sec)

The reindex method is set to None for this device, yet the poller_command table still has it queued for recache:

MariaDB [cacti]> select * from host_snmp_query where host_id = 1576;
+---------+---------------+------------+-----------------+----------------+
| host_id | snmp_query_id | sort_field | title_format    | reindex_method |
+---------+---------------+------------+-----------------+----------------+
|    1576 |             4 | ifIndex    | |query_ifIndex| |              0 |
|    1576 |            10 | t1         | |query_t1|      |              0 |
+---------+---------------+------------+-----------------+----------------+
2 rows in set (0.000 sec)
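
The same mismatch could be checked for every device at once. The sketch below uses Python with an in-memory SQLite database as a stand-in for the Cacti MariaDB schema, loaded only with the rows shown in this issue. It assumes that `command` holds `host_id:snmp_query_id` (inferred from the rows for device 1576, not confirmed in this thread) and that a `reindex_method` of 0 means None. `find_stale_commands` is a hypothetical helper, not a Cacti function.

```python
import sqlite3

def find_stale_commands(conn):
    """Return poller_command rows whose data query has reindex_method = 0."""
    stale = []
    rows = conn.execute(
        "SELECT poller_id, action, command FROM poller_command"
    ).fetchall()
    for poller_id, action, command in rows:
        # Assumed format: "<host_id>:<snmp_query_id>"
        host_id, _, query_id = command.partition(":")
        if not (host_id.isdigit() and query_id.isdigit()):
            continue  # skip commands that do not fit the assumed format
        method = conn.execute(
            "SELECT reindex_method FROM host_snmp_query "
            "WHERE host_id = ? AND snmp_query_id = ?",
            (int(host_id), int(query_id)),
        ).fetchone()
        if method is not None and method[0] == 0:
            stale.append((poller_id, action, command))
    return stale

# Simplified copies of the two tables, holding only the rows from this issue.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE poller_command (
        poller_id INTEGER, time TEXT, action INTEGER,
        command TEXT, last_updated TEXT
    );
    CREATE TABLE host_snmp_query (
        host_id INTEGER, snmp_query_id INTEGER, sort_field TEXT,
        title_format TEXT, reindex_method INTEGER
    );
    INSERT INTO poller_command VALUES
        (1, '2022-07-20 20:40:02', 1, '1576:10', '2022-07-20 20:40:02'),
        (1, '2022-07-20 20:40:02', 1, '1576:4',  '2022-07-20 20:40:02');
    INSERT INTO host_snmp_query VALUES
        (1576, 4,  'ifIndex', '|query_ifIndex|', 0),
        (1576, 10, 't1',      '|query_t1|',      0);
""")

for row in find_stale_commands(conn):
    print("stale:", row)
```

The parsing is done in Python rather than in SQL so the sketch does not depend on MariaDB-specific string functions.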

Spine reports that the device has nothing for recache:

2022-08-17 15:50:19 - SPINE: Poller[1] PID[3914359] PT[140412896401152] Device[1576] HT[1] Device has no information for recache.
2022-08-17 15:50:19 - SPINE: Poller[1] PID[3914359] PT[140412896401152] Device[1576] HT[1] NOTE: There are '78' Polling Items for this Device
@bmfmancini bmfmancini added bug Undesired behaviour unverified Some days we don't have a clue labels Aug 17, 2022
@bmfmancini
Member Author

Workaround for this:

I truncated the poller_command table. Make sure poller_commands.php is not running first; if it is, kill the process, or you may still see recache events.
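
The order matters here: the table should only be emptied once no poller_commands.php process is left. A rough sketch in Python of that check, assuming a process listing in the style of `ps -eo pid,args`. The helper `running_poller_commands` and the sample listing, including its PIDs and paths, are hypothetical and only illustrate the idea:

```python
def running_poller_commands(ps_output):
    """Return PIDs of processes whose command line mentions poller_commands.php."""
    pids = []
    for line in ps_output.splitlines():
        parts = line.split(None, 1)  # "<pid> <args...>"
        if (
            len(parts) == 2
            and parts[0].isdigit()
            and "poller_commands.php" in parts[1]
        ):
            pids.append(int(parts[0]))
    return pids

# Hypothetical listing for illustration only (made-up PIDs and paths).
sample_ps = """\
  PID COMMAND
 4101 php /path/to/cacti/poller.php
 4177 php /path/to/cacti/poller_commands.php
"""

pids = running_poller_commands(sample_ps)
if pids:
    print("stop these first:", pids)
else:
    # Only at this point would the table be emptied in the MariaDB client:
    #   TRUNCATE TABLE poller_command;
    print("no poller_commands.php running; table can be truncated")
```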

@TheWitness TheWitness changed the title [1.2.20] Recache Loop after many devices go offline Recache Loop after many devices go offline Aug 18, 2022
TheWitness added a commit that referenced this issue Aug 18, 2022
Issue #3131 has been available through the Edit Device interface for some time.  So, marking that complete.  The other two address first a bug in 1.2.22 with the removal of re-index records from the poller_command table, and introduces parallel processing to the whole re-index game.
@TheWitness TheWitness added this to the v1.2.23 milestone Aug 19, 2022
@TheWitness TheWitness added resolved A fixed issue confirmed Bug is confirm by dev team and removed unverified Some days we don't have a clue labels Aug 19, 2022
@github-actions github-actions bot locked and limited conversation to collaborators Nov 28, 2022
@netniV netniV changed the title Recache Loop after many devices go offline When many hosts go offline, Recache Event can be constantly logged Dec 31, 2022