Spine Memory map errors show up randomly #154
Comments
OK, I've traced back what was done before this happened. A threshold was modified, and shortly afterwards the backtrace above occurred:
2020/04/08 14:59:51 - CMDPHP PHP ERROR Backtrace: (CactiShutdownHandler())
followed by:
2020/04/08 15:04:13 - CMDPHP PHP ERROR Backtrace: (CactiShutdownHandler())
I did report in Thold a couple of weeks ago that things went haywire when editing a threshold. I have since upgraded to Thold 1.4 and that hasn't happened again, so perhaps there is something lurking either there or in a spine condition.
Thold report
To me, and I could be wrong here, the double free is occurring within the MySQL client library itself. Have you updated MySQL recently?
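One quick way to confirm which client library spine is actually loading, if it helps narrow that down (standard commands; the paths below are taken from the backtrace and may differ on your system):

# Which MySQL/MariaDB client library is spine linked against?
ldd /usr/local/spine/bin/spine | grep -iE 'mysql|maria'

# Which package owns that library (RPM-based systems)?
rpm -qf /lib64/libmysqlclient.so.18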
No changes to MySQL. These are the library versions that are installed:
rpm -qa | grep -i mariadb
I can't easily reproduce this. The only thing I caught was that Thold went haywire and shut down again last time, which has happened before as well, but I'm not sure whether that is the cause or just a side effect of spine crashing or timing out.
Oh, by the way, I found this in my message log:
[604396.693209] spine[18286]: segfault at 0 ip (null) sp 00007f277f7d8608 error 14 in spine[400000+1c000]
I caught this today:
2020/04/15 10:59:17 - SPINE: Poller[Main Poller] PID[23607] FATAL: Spine Encountered a Segmentation Fault [9, Bad file descriptor] (Spine thread)
It happens during replication of the remote pollers.
Capture a core file, then load it with gdb and do a backtrace for us. We need to know where it's failing.
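A rough sketch of that, assuming spine is installed at /usr/local/spine/bin/spine; the ulimit and core_pattern settings must apply to the environment spine is launched from, and core locations vary by distribution:

# Allow core files to be written
ulimit -c unlimited

# Optionally have the kernel write cores to a predictable location
echo '/var/tmp/core.%e.%p' > /proc/sys/kernel/core_pattern

# After the next crash, load the core into gdb and capture the backtrace
gdb /usr/local/spine/bin/spine /var/tmp/core.spine.<pid>
(gdb) thread apply all bt full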
Is this case still occurring @bmfmancini?
@TheWitness I just saw this same type of behavior on 1.2.12:
======= Memory map: ========
I'm not seeing any actual impact on the collection process, so I'm thinking these errors may just be verbose noise when spine times out?
You are likely running out of connections.
Connections seem fine:
MariaDB [(none)]> SHOW GLOBAL VARIABLES LIKE 'max_connections';
MariaDB [(none)]> SHOW GLOBAL STATUS LIKE 'max_used_connections';
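For reference, a quick side-by-side check, and a runtime bump if max_used_connections really is hitting the ceiling (a sketch only; the 2000 value is an arbitrary example, size it for your install and make it permanent under [mysqld] in my.cnf):

# Compare the configured limit against the high-water mark
mysql -e "SHOW GLOBAL VARIABLES LIKE 'max_connections'; SHOW GLOBAL STATUS LIKE 'max_used_connections';"

# Raise the limit until the next restart if the two values are converging
mysql -e "SET GLOBAL max_connections = 2000;"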
Could be the shutdown handler. Look for a poller runtime very close to, or above, the poller interval in the cacti.log for that time range, within 2 seconds of it.
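One way to spot those runs quickly (a sketch; the log path is an assumption for a default install, adjust it for yours, and the exact stats line format varies a little between Cacti versions):

# Poller totals per run; compare the Time: value against your polling interval
grep 'SYSTEM STATS' /var/www/html/cacti/log/cacti.log | grep '2020/04/15'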
And bingo, that's it. The few times this has happened, when I search the log it's when the poller timed out.
So it sounds like spine needs better handling of the loss of connections, by cascading that up.
Just did a commit. You should test with it. Check the CHANGELOG. We've had other customers reporting connection issues lately.
Bad file descriptor? Hmm.
Try this in /etc/security/limits.d:
[root@vmhost3 limits.d]# cat 21-cacti.conf
* soft nofile 65535
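Once that file is in place (note the soft limit cannot exceed the hard limit, so a matching hard nofile entry may be needed), you can confirm it actually reached the process while spine is running (assumes a single spine instance for pidof -s):

# Open-file limit as seen by the running spine process
grep 'open files' /proc/$(pidof -s spine)/limits

# Current number of descriptors in use
ls /proc/$(pidof -s spine)/fd | wc -l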
Update to the latest 1.2.x branch and report back @bmfmancini.
Will do, give me a few days.
Sean, please update to the latest 1.2.x. I found an issue in the re-index logic that was causing segfaults. I think this explains the somewhat random nature of the crashes.
Related to #193
Hello Everyone
I have spine 1.2.9 on our production system, and I notice that sometimes the following errors come up:
*** Error in `/usr/local/spine/bin/spine': double free or corruption (out): 0x00007f712400bc70 ***
======= Backtrace: =========
/lib64/libc.so.6(+0x81609)[0x7f714266e609]
/lib64/libmysqlclient.so.18(+0x5a2f7)[0x7f71437af2f7]
/lib64/libmysqlclient.so.18(+0x2aab5)[0x7f714377fab5]
/lib64/libmysqlclient.so.18(+0x30606)[0x7f7143785606]
/lib64/libmysqlclient.so.18(+0x2da14)[0x7f7143782a14]
/lib64/libmysqlclient.so.18(mysql_close+0x1a)[0x7f7143782a4a]
/usr/local/spine/bin/spine[0x40cf2d]
/usr/local/spine/bin/spine[0x40f29c]
/lib64/libpthread.so.0(+0x7dd5)[0x7f7142ec7dd5]
/lib64/libc.so.6(clone+0x6d)[0x7f71426eb02d]
======= Memory map: ========
7fad3dd04000-7fad3dd05000 rw-p 00000000 00:00 0
7ffe16b80000-7ffe16c81000 rw-p 00000000 00:00 0 [stack]
7ffe16cb7000-7ffe16cb9000 r-xp 00000000 00:00 0 [vdso]
ffffffffff600000-ffffffffff601000 r-xp 00000000 00:00 0 [vsyscall]
This is followed by spine having a hard time polling; eventually the errors stop with no intervention and polling continues as normal.
We recently moved from 1.2.7 to 1.2.9, and this is the second time this has happened in the last three days.
There are no memory issues on the host, and you will see this on the remote collectors as well.
Please let me know if this is a known spine-level issue or if it could be something else.
Thanks!