When boost is running, graphs can appear broken #4941

Closed
bmfmancini opened this issue Oct 3, 2022 · 25 comments
Labels: bug (Undesired behaviour), confirmed (Bug is confirm by dev team), resolved (A fixed issue)

@bmfmancini
Member

Hey All,

On 1.2.22, some devices show gaps in their graphs between boost runs.
I traced the data from spine all the way to poller_output_boost.

The data makes its way through correctly until the RRA write, where it seems that for some devices the data is not written.
The data is removed from poller_output_boost after each run, so it's not stuck in there.

No relevant errors in the log.
I have rebuilt the poller cache for the affected devices, with no difference.

@bmfmancini bmfmancini added bug Undesired behaviour unverified Some days we don't have a clue labels Oct 3, 2022
@bmfmancini
Member Author

Found this when running boost manually:

php poller_boost.php --verbose --debug --force >> /tmp/boost.txt

2022/10/03 14:33:40 - CMDPHP SQL Backtrace: (/poller_boost.php[222]:boost_output_rrd_data(), /poller_boost.php[588]:boost_process_local_data_ids(), /poller_boost.php[688]:db_fetch_assoc(), /lib/database.php[593]:db_fetch_assoc_prepared(), /lib/database.php[613]:db_execute_prepared())
2022/10/03 14:33:40 - CMDPHP ERROR: A DB Row Failed!, Error: Table 'cacti.poller_output_boost_arch_1664822004' doesn't exist
2022/10/03 14:33:40 - BOOST CHILD DEBUG: Processing 128 of 130 for Boost Process 1
2022/10/03 14:33:40 - CMDPHP SQL Backtrace: (/poller_boost.php[222]:boost_output_rrd_data(), /poller_boost.php[588]:boost_process_local_data_ids(), /poller_boost.php[688]:db_fetch_assoc(), /lib/database.php[593]:db_fetch_assoc_prepared(), /lib/database.php[613]:db_execute_prepared())
2022/10/03 14:33:40 - CMDPHP ERROR: A DB Row Failed!, Error: Table 'cacti.poller_output_boost_arch_1664822004' doesn't exist
2022/10/03 14:33:40 - BOOST CHILD DEBUG: Processing 127 of 130 for Boost Process 1
2022/10/03 14:33:40 - CMDPHP SQL Backtrace: (/poller_boost.php[222]:boost_output_rrd_data(), /poller_boost.php[588]:boost_process_local_data_ids(), /poller_boost.php[688]:db_fetch_assoc(), /lib/database.php[593]:db_fetch_assoc_prepared(), /lib/database.php[613]:db_execute_prepared())
2022/10/03 14:33:40 - CMDPHP ERROR: A DB Row Failed!, Error: Table 'cacti.poller_output_boost_arch_1664822004' doesn't exist
2022/10/03 14:33:40 - BOOST CHILD DEBUG: Processing 126 of 130 for Boost Process 1
2022/10/03 14:33:40 - CMDPHP SQL Backtrace: (/poller_boost.php[222]:boost_output_rrd_data(), /poller_boost.php[588]:boost_process_local_data_ids(), /poller_boost.php[688]:db_fetch_assoc(), /lib/database.php[593]:db_fetch_assoc_prepared(), /lib/database.php[613]:db_execute_prepared())
2022/10/03 14:33:40 - CMDPHP ERROR: A DB Row Failed!, Error: Table 'cacti.poller_output_boost_arch_1664822004' doesn't exist
2022/10/03 14:33:40 - BOOST CHILD DEBUG: Processing 125 of 130 for Boost Process 1
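The same error repeats once per data source, so it can help to collapse the capture into one line per missing table. A minimal sketch, assuming the log lines have the format shown above; `summarize_missing_tables` is a hypothetical helper name, not part of Cacti:

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of Cacti): count "doesn't exist" errors
# per table name. Reads the file given as $1, or stdin when no file is given.
summarize_missing_tables() {
    # -o prints only the matching part, -h suppresses file name prefixes
    grep -oh "Table '[^']*' doesn't exist" "$@" \
        | sed "s/^Table '//; s/' doesn't exist\$//" \
        | sort | uniq -c | sort -rn \
        | awk '{ print $2, $1 }'   # output: <table name> <error count>
}

# Usage against the capture file from the command above:
#   summarize_missing_tables /tmp/boost.txt
```

If every error names the same archive table, that points at a single table disappearing mid-run rather than many unrelated failures.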

@bmfmancini
Member Author

Output from debug file

DEBUG: Checking if Boost is ready to run.
DEBUG: Last Runtime was 2022-10-03 14:08:27 (1664820507).
DEBUG: Next Runtime is 2022-10-03 15:08:27 (1664824107).
DEBUG: Records Found:6717232, Max Threshold:7000000.
DEBUG: Time to Run Boost, Force Run is true!
DEBUG: Parallel Process Setup Begins.
DEBUG: Data Sources:89253, Concurrent Processes:1
DEBUG: Parallel Process Setup Complete.  Ready to spawn children.
DEBUG: About to launch 1 processes.
DEBUG: Launching Boost Process Number 1
Total[1.4670] DEBUG: About to Spawn a Remote Process [CMD: /bin/php, ARGS: /var/www/html/cacti/poller_boost.php --child=1 --debug]
DEBUG: 1 Processes Running, Sleeping for 2 seconds.
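Reading the debug lines above: the next runtime is exactly 3600 seconds after the last one, and the record count is below the max threshold, so this run only happened because of --force. The sketch below reproduces that reading with the printed values. It is an illustration of the arithmetic, not Cacti's actual scheduling code, and `boost_due` is a hypothetical name:

```shell
#!/usr/bin/env bash
# Hypothetical helper, NOT Cacti's code: classify why a boost run would start,
# using the same inputs the debug output prints.
boost_due() {
    local now=$1 next_run=$2 records=$3 max_records=$4 force=${5:-0}
    if [ "$force" -eq 1 ]; then
        echo "forced"
    elif [ "$now" -ge "$next_run" ]; then
        echo "time"
    elif [ "$records" -ge "$max_records" ]; then
        echo "records"
    else
        echo "not-due"
    fi
}

# Values from the debug output; 1664822020 is an illustrative "now"
# that falls between the last and next runtime.
echo $((1664824107 - 1664820507))                  # prints 3600
boost_due 1664822020 1664824107 6717232 7000000 1  # prints forced
boost_due 1664822020 1664824107 6717232 7000000    # prints not-due
```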

@bmfmancini
Member Author

Boost tables are clean according to audit_database

bash-4.2$ php audit_database.php --report | grep boost
Checking Table: 'poller_output_boost' - Clean
Checking Table: 'poller_output_boost_local_data_ids' - Clean
Checking Table: 'poller_output_boost_processes' - Clean
bash-4.2$

@bmfmancini
Member Author

bmfmancini commented Oct 3, 2022

OK, so when I ran boost manually the first time, I think I collided with Cacti running it?
Rerunning it manually seems fine, but the strange thing is that the first time I ran it, the boost table went empty.
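One way to avoid that kind of collision is to check the process list before forcing a manual run. A rough sketch, assuming boost processes show `poller_boost.php` in their command line as in the spawn line of the debug output; `boost_state` is a hypothetical helper, not a Cacti command:

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of Cacti): reads a process listing on stdin,
# one command line per row, and reports whether a boost process is in it.
boost_state() {
    # The [p] bracket keeps the pattern from matching its own text when it
    # appears inside another process's command line.
    if grep -q '[p]oller_boost\.php'; then
        echo running
    else
        echo idle
    fi
}

# Usage before a manual run:
#   ps -eo args | boost_state
```

This only narrows the window. A scheduled run can still start between the check and the manual run.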

@TheWitness TheWitness changed the title [1.2.22] - When Boost runs gaps in graphs appear for some devices Viewing graphs can break when boost is running in some rare cases Oct 5, 2022
TheWitness added a commit that referenced this issue Oct 5, 2022
Viewing graphs can break when boost is running in some rare cases
@TheWitness TheWitness added confirmed Bug is confirm by dev team resolved A fixed issue and removed unverified Some days we don't have a clue labels Oct 5, 2022
@TheWitness TheWitness added this to the v1.2.23 milestone Oct 5, 2022
@TheWitness
Member

Test now @bmfmancini

@bmfmancini
Member Author

So far so good @TheWitness, will let it soak for a bit and let you know.

@bmfmancini
Member Author

@TheWitness unfortunately still seeing gaps in plotting

@bmfmancini
Member Author

Confirmed: it only happens after a graph has been viewed and boost runs afterwards.

@TheWitness
Member

Any errors in the log?

@bmfmancini
Member Author

bmfmancini commented Oct 7, 2022 via email

@TheWitness
Member

Well, that's good. Now you have to find the real reason. How many poller items for the device in question?

@bmfmancini
Member Author

bmfmancini commented Oct 8, 2022 via email

@TheWitness
Member

You need to be very specific. If there is more than one device, give me a count for each.

@bmfmancini
Member Author

bmfmancini commented Oct 8, 2022 via email

@TheWitness
Member

What RRDtool version?

@TheWitness
Member

I have another theory...

@TheWitness
Member

However, you need to answer the poller items question for a few of the cases.

@bmfmancini
Member Author

bmfmancini commented Oct 10, 2022 via email

@TheWitness
Member

Upgrade to 1.8

@bmfmancini
Member Author

OK, updated to RRDtool 1.8:

RRDtool 1.8.0  Copyright by Tobias Oetiker <tobi@oetiker.ch>
               Compiled Oct 11 2022 11:19:31

Gaps are still being seen. After viewing a graph, the data for that time period is removed from the poller_output_boost table.
Sometimes the RRA is updated without a problem, but other times the graph shows a large gap.

While checking for data, the poller_output_boost table has entries for that data source, and then they disappear from the table while the graph still shows a gap. On the next boost run the graph starts to plot again, but only with the data that has populated the table since the graph was viewed.

Here are my steps

1.) View poller_output_boost table

MariaDB [cacti]> select * from poller_output_boost where local_data_id = '67278' \G
*************************** 1. row ***************************
local_data_id: 67278
     rrd_name: discards_in
         time: 2022-10-11 13:18:02
       output: 0
*************************** 2. row ***************************
local_data_id: 67278
     rrd_name: discards_out
         time: 2022-10-11 13:18:02
       output: 0
*************************** 3. row ***************************
local_data_id: 67278
     rrd_name: errors_in
         time: 2022-10-11 13:18:02
       output: 0
*************************** 4. row ***************************
local_data_id: 67278
     rrd_name: errors_out
         time: 2022-10-11 13:18:02
       output: 0
4 rows in set (0.001 sec)

2.) View the graph

image

3.) Check the poller_output_boost table: entries are removed for the timespan you are viewing, except for newly polled data

MariaDB [cacti]> select * from poller_output where local_data_id = '67278';
Empty set (0.000 sec)

MariaDB [cacti]> select * from poller_output where local_data_id = '67278';
Empty set (0.000 sec)

MariaDB [cacti]> select * from poller_output_boost  where local_data_id = '67278';
Empty set (0.000 sec)

MariaDB [cacti]> select * from poller_output_boost  where local_data_id = '67278';
+---------------+--------------+---------------------+--------+
| local_data_id | rrd_name     | time                | output |
+---------------+--------------+---------------------+--------+
|         67278 | discards_in  | 2022-10-11 13:23:02 | 0      |
|         67278 | discards_out | 2022-10-11 13:23:02 | 0      |
|         67278 | errors_in    | 2022-10-11 13:23:02 | 0      |
|         67278 | errors_out   | 2022-10-11 13:23:02 | 0      |
+---------------+--------------+---------------------+--------+
4 rows in set (0.001 sec)

The graph will still show the gap until boost runs, but only the newly polled data will make it to the RRA.
The other data will be lost.

image
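The before/after comparison in the steps above can be scripted so the vanished rows are listed explicitly. A minimal sketch, assuming each snapshot is a text file with one `rrd_name time` row per line; `vanished_rows` and the snapshot file names are hypothetical:

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of Cacti): print rows that are present in the
# "before" snapshot but missing from the "after" snapshot.
vanished_rows() {
    before_file=$1
    after_file=$2
    # -F fixed strings, -x whole-line match, -v keep lines with no match.
    # grep exits 1 when nothing vanished; treat that as success.
    grep -Fxv -f "$after_file" "$before_file" || [ $? -eq 1 ]
}

# Possible way to take the snapshots around viewing the graph:
#   mysql -N -B -e "SELECT rrd_name, time FROM poller_output_boost \
#       WHERE local_data_id = 67278" cacti > before.txt
#   ... view the graph ...
#   mysql -N -B -e "SELECT rrd_name, time FROM poller_output_boost \
#       WHERE local_data_id = 67278" cacti > after.txt
#   vanished_rows before.txt after.txt
```

Rows leaving the table are expected once they have been flushed. The loss only shows up if the listed timestamps are also absent from the RRD file, which `rrdtool fetch` on that data source's file can confirm.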

@bmfmancini
Member Author

Oops, forgot to add: the poller item count for this example device is 12.

@bmfmancini
Member Author

@TheWitness is the above info everything you were looking for?

TheWitness added a commit that referenced this issue Oct 16, 2022
This change simplifies the function to handle only a single local data id since the mass boost update uses its own function.
@TheWitness
Member

Okay, bug was confirmed and this is resolved now.

@TheWitness
Member

This is still broken when boost is running. Looking to get a fix together.

TheWitness added a commit that referenced this issue Oct 17, 2022
The previous sort algo was not sorting properly.  Replacing with one that is known good.
@TheWitness
Member

@bmfmancini I'm going to mark this resolved. If after updating tomorrow to the latest in test, you find issues, we can re-open.

TheWitness added a commit that referenced this issue Oct 18, 2022
So, I had to back out this version of lib/boost.php today due to some sort a new failure, but I have been unable to reproduce in my lab.  Still looking into it.  This is a minor change that I'm making as a part of my bug hunt.
TheWitness added a commit that referenced this issue Oct 19, 2022
Okay, working now.   Type and no log.
@netniV netniV changed the title Viewing graphs can break when boost is running in some rare cases When boost is running, graphs can appear broken Dec 31, 2022
@github-actions github-actions bot locked and limited conversation to collaborators Apr 1, 2023