Infinite recursion is possible during a database failure #5490
Using the EPEL cacti-1.2.23-1.el8.noarch.rpm (though I can't find an Issue for it here nor see any recent changes that would prevent it in the develop branch) we've come across an issue...
Under some circumstances we find that the poller_automation.php processes go into a tight loop: their memory usage skyrockets, consuming all available RAM and then all available swap, until they bring the whole machine to its knees.
After some investigation it appears this happens when the connection to the MySQL server is lost, whether the server crashes or is stopped. In fact you can easily reproduce it as shown below: su to the appropriate user, run the automation poller by hand, wait a bit (10-20 seconds is usually enough), then stop MariaDB.
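Concretely, from a shell as that user (these are the same commands described above):

```sh
php /usr/share/cacti/poller_automation.php -M --force
# let it run for 10-20 seconds, then from another shell:
systemctl stop mariadb
```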
Running strace on the poller_automation processes shows that they attempt to write their next SQL query (usually one of the "SELECT id, snmp_version ..." ones), get an EPIPE because the remote end has closed, close the actual socket file descriptor, then go into a loop calling mmap() for 2 MB chunks (the behaviour of malloc() when leaking in a tight loop) until death, of the process or the machine.
That doesn't really tell us much on its own, so sprinkling a few `echo` statements around the area shows what's really going on:
So, within db_execute_prepared() we try to execute() the SQL statement, which fails (and internally the database handle, which knows the failure is unrecoverable, closes its file descriptor). That failure is detected and Cacti tries to log it; in fact for this particular error two calls to cacti_log() are made: the first reports the statement, the second reports the error. The behaviour of cacti_log() is modified by certain configuration options, so it tries to read those; but config options are stored in the database, so for at least some of them it calls back into the database layer to fetch them. The handle has already been closed, so the failure is immediately reported back to Cacti, which tries to log that failure... and we are now in an infinite recursion which chews up RAM for the PHP stack frames and local variables until none is left. If there's no additional I/O to do (as for the original version, without my additional `echo` statements and with things like DEBUG_SQL_CMD turned off) that happens pretty fast, so the machine becomes unresponsive soon after.
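To make the cycle concrete, here is a stripped-down sketch of the shape of the problem (the function bodies are illustrative stand-ins, not the actual Cacti source); a static re-entrancy flag in the logger, falling back to plain error_log(), is one way to break it:

```php
<?php
// Illustrative sketch only -- simplified stand-ins, not the actual Cacti code.

function db_execute_prepared($sql) {
    $ok = false; // pretend the server is gone: every statement fails
    if (!$ok) {
        cacti_log("ERROR: SQL failed: $sql");      // first log call: the statement
        cacti_log("ERROR: server has gone away");  // second log call: the error
        return false;
    }
    return true;
}

function read_config_option($name) {
    // Config options live in the database, so reading one issues another query...
    return db_execute_prepared("SELECT value FROM settings WHERE name = '$name'");
}

function cacti_log($message) {
    static $in_log = false;

    // Without this guard the next call re-enters the database layer, which
    // fails, which calls cacti_log() again: unbounded mutual recursion.
    if ($in_log) {
        error_log($message);  // fall back to plain stderr logging
        return;
    }

    $in_log = true;
    $destination = read_config_option('log_destination');
    // ... format and write $message according to $destination ...
    error_log($message);
    $in_log = false;
}

db_execute_prepared('SELECT id, snmp_version FROM host');
```

Strip out the $in_log check and the same script recurses until PHP runs out of memory (or hits memory_limit, if one is set).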
So:
Comments

What MySQL/MariaDB version are you using on the backend?

Well, it really doesn't matter, as it should be pretty obvious from the diagnosis above: who cares what the software version is on a server that is unreachable or not actually running. However, since you asked, I repro'd against MariaDB-server-10.5.15-1.el8.x86_64 (probably the CentOS version thereof).

Yeah, I understand, but there have been some recent changes as well. This has been a known issue for some time. Can you generate a pull request?

It is? Because I did look and couldn't find any issue that sounded like it...

PR raised.

Thanks!

Closing this now that I'm satisfied. Thanks for the help!