getting a crash of server with segmentation fault on SLES 11 64 SP2 #3233
Last updated: 2013-03-07 12:41:22 +0100
Date: 2013-02-12 19:45:54 +0100
We got the segfaults below when a MonetDB build compiled against SLES 10 SP4 was run on SLES 11 SP2. Any suggestions? I am filing this as a bug because I don't have access to my registered email. Any pointer will help!
cat /var/log/messages | grep mserver
Feb 12 14:54:17 blrec3vm6 kernel: mserver5: segfault at 2000 ip 00007fa105edb4d5 sp 00007fa0ffffee30 error 6 in lib_sql.so[7fa105dd8000+140000]
Feb 12 15:03:03 blrec3vm6 kernel: mserver5: segfault at 28 ip 00007fc0eb916e95 sp 00007fc0e78d4f20 error 6 in libmonetdb5.so.13[7fc0eb467000+63b000]
Date: 2013-02-13 17:26:43 +0100
I got this with the latest candidate build as well. Any suggestions as to when this can happen?
Feb 13 14:16:36 blrec3vm30 kernel: mserver5: segfault at 28 ip 00007f712a08de95 sp 00007f71260cbf20 error 6 in libmonetdb5.so.13[7f7129bde000+63b000]
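For readers trying to interpret kernel lines like the ones above, here is a minimal sketch of how they can be decoded. It assumes the usual x86-64 Linux message layout (fault address, instruction pointer, stack pointer, page-fault error code in hex, then the mapped object with its base address and size). The function name `decode_segfault` and the dictionary keys are made up for illustration and are not part of MonetDB or any tool mentioned in this thread.

```python
import re

# Assumed layout of the x86-64 Linux "segfault at ..." kernel message.
SEGFAULT_RE = re.compile(
    r"segfault at (?P<addr>[0-9a-f]+) ip (?P<ip>[0-9a-f]+) sp (?P<sp>[0-9a-f]+) "
    r"error (?P<err>[0-9a-f]+) in (?P<obj>[^\[\s]+)\[(?P<base>[0-9a-f]+)\+(?P<size>[0-9a-f]+)\]"
)


def decode_segfault(line):
    """Return a dict describing the fault, or None if the line does not match."""
    m = SEGFAULT_RE.search(line)
    if m is None:
        return None
    err = int(m.group("err"), 16)   # the kernel prints the error code in hex
    ip = int(m.group("ip"), 16)
    base = int(m.group("base"), 16)
    return {
        "object": m.group("obj"),
        "fault_address": int(m.group("addr"), 16),
        # Offset of the faulting instruction inside the shared object;
        # this is the value one would pass to a tool such as addr2line.
        "offset_in_object": ip - base,
        # x86 page-fault error code bits: 0 = page present, 1 = write, 2 = user mode.
        "page_present": bool(err & 1),
        "access": "write" if err & 2 else "read",
        "mode": "user" if err & 4 else "kernel",
    }


if __name__ == "__main__":
    line = ("Feb 13 14:16:36 blrec3vm30 kernel: mserver5: segfault at 28 "
            "ip 00007f712a08de95 sp 00007f71260cbf20 error 6 "
            "in libmonetdb5.so.13[7f7129bde000+63b000]")
    info = decode_segfault(line)
    print(info["object"], hex(info["offset_in_object"]), info["access"], info["mode"])
```

Read this way, "error 6" is a user-mode write to a page that is not mapped. A fault address as small as 0x28 often points to a write through a NULL pointer plus a field offset, but that is only a guess without a backtrace.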
Date: 2013-02-13 17:40:24 +0100
No. To get useful information you should compile with
--enable-assert --enable-debug --disable-optimize --enable-strict
Followed by a backtrace of the threads, e.g.
thr apply all where
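One possible way to capture such a backtrace non-interactively is sketched below. It assumes gdb is installed and relies on its `-batch` and `-ex` options. The helper names `gdb_backtrace_command` and `collect_backtrace` and the core file name are hypothetical, chosen only for this illustration.

```python
import subprocess


def gdb_backtrace_command(binary, core=None):
    """Build a gdb argv that prints a backtrace of every thread and then exits."""
    # "thread apply all where" is the long form of "thr apply all where".
    argv = ["gdb", "-batch", "-ex", "thread apply all where", binary]
    if core is not None:
        argv.append(core)  # optional core dump to inspect post mortem
    return argv


def collect_backtrace(binary, core):
    """Run gdb on a core dump and return its textual output (requires gdb)."""
    result = subprocess.run(
        gdb_backtrace_command(binary, core),
        capture_output=True,
        text=True,
        check=False,
    )
    return result.stdout


if __name__ == "__main__":
    # "core.mserver5" is a placeholder name; use the path of your actual core file.
    print(" ".join(gdb_backtrace_command("/usr/local/bin/mserver5", "core.mserver5")))
```

The resulting text can be attached to the report as-is. The backtrace is most readable when the server was built with the debug options listed above.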
Date: 2013-02-14 10:00:03 +0100
When I compile with the options you suggested, I get the error below. Any pointers, please?
/bin/sh ../libtool --tag=CC --mode=compile gcc -DHAVE_CONFIG_H -I. -I.. -I. -I../common/options -I./../common/options -I../common/stream -I./../common/stream -I../common/utils -I./../common/utils -DLIBGDK -g -Werror -Wall -Wextra -W -Werror-implicit-function-declaration -Wpointer-arith -Wdeclaration-after-statement -Wundef -Wformat=2 -Wno-format-nonliteral -Winit-self -Winvalid-pch -Wmissing-declarations -Wmissing-format-attribute -Wmissing-prototypes -Wold-style-definition -Wpacked -Wunknown-pragmas -Wvariadic-macros -fstack-protector-all -Wstack-protector -Wmissing-include-dirs -D_REENTRANT -c -o libbat_la-gdk_utils.lo
Date: 2013-02-14 10:19:04 +0100
Date: 2013-02-14 10:21:01 +0100
Moreover, without knowing what you are doing that does (or might) trigger the segfaults, e.g., which kind of workload you run on what kind of data, we cannot say much about why they happen ...
Date: 2013-02-14 10:33:31 +0100
Removing --enable-strict did the trick.
Date: 2013-02-18 14:03:17 +0100
I have created core dump files with backtraces, which should be helpful for you.
Core was generated by `/usr/local/bin/mserver5 --dbpath=/opt/ashishtest/testfarm/pbsworksdb --set mero'.
Thread 9 (Thread 0x2b73390a7f90 (LWP 7635)):
Thread 8 (Thread 7636):
Thread 7 (Thread 7637):
Thread 6 (Thread 7638):
Thread 5 (Thread 7640):
Thread 4 (Thread 7796):
Thread 3 (Thread 7797):
Thread 2 (Thread 7798):
Thread 1 (Thread 0x41a93940 (LWP 9131)):
Date: 2013-02-18 14:03:55 +0100
Created attachment 180
Date: 2013-02-18 14:04:07 +0100
Created attachment 181
Date: 2013-02-18 14:06:45 +0100
A description of our use case:
One of our application components performs the following activities:
The above can happen in parallel.
Please advise. This crash is critical for our successful use of MonetDB and is a showstopper for us.
Date: 2013-02-18 14:09:49 +0100
This is using the Feb candidate build from http://dev.monetdb.org/downloads/testing/sources/Latest/
Date: 2013-02-18 17:07:27 +0100
Just wondering whether anyone can have a look at the issue and suggest some workarounds or fixes?
Date: 2013-02-18 17:16:03 +0100
Does this still happen with the latest nightly build?
Date: 2013-02-18 17:18:56 +0100
I took the one posted in the latest testing area, which was placed there on the 12th of this month. Are you suggesting that something like this was fixed after that?
Date: 2013-02-18 18:06:45 +0100
The latest from the Feb branch is not building!
Date: 2013-02-18 18:24:03 +0100
You took the sources of the Feb2013 release candidate. After the release candidate was built there have been fixes to the Feb2013 branch from which it was built. What I want to know is whether one of those fixes fixed your issue already (in which case I may decide to build a new release candidate).
I hope this is clear.
Date: 2013-02-18 18:33:38 +0100
Yes, this is clear. When I try to compile the latest from the February 2013 dev branch, I get the errors below:
I am doing:
The changeset I took is f82d192b0bf7
The 12th Feb release candidate built fine, though.
-I../common/stream -I./../common/stream -I../common/utils -I./../common/utils -DLIBGDK -g -Werror -Wall -Wextra -W -Werror-implicit-function-declaration -Wpointer-arith -Wdeclaration-after-statement -Wundef -Wformat=2 -Wno-format-nonliteral -Winit-self -Winvalid-pch -Wmissing-declarations -Wmissing-format-attribute -Wmissing-prototypes -Wold-style-definition -Wpacked -Wunknown-pragmas -Wvariadic-macros -fstack-protector-all -Wstack-protector -Wmissing-include-dirs -D_REENTRANT -c gdk_search.c -fPIC -DPIC -o .libs/libbat_la-gdk_search.o
Date: 2013-02-18 18:51:26 +0100
(In reply to comment 17)
That is not the correct branch. Please use the command I gave you:
hg clone -u Feb2013 http://dev.monetdb.org/hg/MonetDB
(Or clean up everything that was created by this attempt and then do:
Date: 2013-02-18 18:56:46 +0100
(In reply to comment 13)
Date: 2013-02-18 19:15:32 +0100
Thanks! The tests are running now; I will let you know the outcome.
Date: 2013-02-18 19:27:36 +0100
No luck. Houston, we have a problem!
Program received signal SIGABRT, Aborted.
Thread 46 (Thread 0x407f8940 (LWP 1865)):
Thread 7 (Thread 0x41edf940 (LWP 1130)):
Thread 6 (Thread 0x40312940 (LWP 1131)):
Thread 5 (Thread 0x420e0940 (LWP 1132)):
Thread 4 (Thread 0x4194b940 (LWP 1133)):
Thread 3 (Thread 0x41631940 (LWP 1215)):
Thread 2 (Thread 0x422e1940 (LWP 1216)):
Thread 1 (Thread 0x2af73df16f30 (LWP 1129)):
Date: 2013-02-18 19:37:49 +0100
Did your last run start with a clean database or with some older (and therefore possibly corrupt) db? Just to exclude this case.
Date: 2013-02-18 19:47:05 +0100
That was with the OLD database:
Below is the crash with a fresh DB farm:
(gdb) thr app all bt
Thread 40 (Thread 0x4201a940 (LWP 4668)):
Thread 7 (Thread 0x40cac940 (LWP 4190)):
Thread 6 (Thread 0x41e19940 (LWP 4191)):
Thread 5 (Thread 0x40a0a940 (LWP 4192)):
Thread 4 (Thread 0x417e4940 (LWP 4193)):
Thread 3 (Thread 0x41bb3940 (LWP 4227)):
Thread 2 (Thread 0x4221b940 (LWP 4228)):
Thread 1 (Thread 0x2b502caf0f30 (LWP 4189)):
Date: 2013-02-18 20:03:17 +0100
In your debugger, once the assertion happens, could you please go to the thread in which it happened (thread 40 in your latest trace), go "up" to the function where the assertion was triggered ("delta_append_val()"), execute the following print commands, and share their output:
Date: 2013-02-18 20:12:04 +0100
That session got closed, and a new crash is showing a different log. Do you want me to run the same commands and share the output?
#0  0x0000003722430285 in raise () from /lib64/libc.so.6
Thread 56 (Thread 0x40cea940 (LWP 7012)):
Thread 7 (Thread 0x4128d940 (LWP 6804)):
Thread 6 (Thread 0x41ad0940 (LWP 6805)):
Thread 5 (Thread 0x41eed940 (LWP 6806)):
Thread 4 (Thread 0x4148e940 (LWP 6807)):
Thread 3 (Thread 0x408e8940 (LWP 6819)):
Thread 2 (Thread 0x40ae9940 (LWP 6820)):
Thread 1 (Thread 0x2b2a48745f30 (LWP 6803)):
Date: 2013-02-18 20:15:41 +0100
No need now. Let's first have a look at the Java code you have.
Date: 2013-02-18 20:17:01 +0100
Created attachment 182
Make changes for the DB connection in the ConfigurationDBConnection.Java class
Create the schema using the schema file
Run the com.altair.test.MonetDBLoadTestBasedOnQueueAndQueryFrequently class
Date: 2013-02-18 20:17:25 +0100
Created attachment 183
Date: 2013-02-18 20:18:10 +0100
(In reply to comment 26)
Shared. Let me know if you need an online Skype meeting to get you up to speed with this.
Date: 2013-02-19 13:45:45 +0100
Any thoughts on this issue? Workarounds or a possible timeline for fixes?
Date: 2013-02-20 11:08:51 +0100
One of our application releases is on hold waiting for this issue and we are pressed for time. We are ready to try any workarounds you can suggest that would help avoid this issue as an intermediate step.
Any suggestions will be helpful! Thanks in advance!
Date: 2013-02-20 11:10:24 +0100
Corrected typo below
(In reply to comment 31)
Date: 2013-02-20 12:04:27 +0100
I remind you that you are relying on MonetDB as an open-source project
Thank you for reporting the issues. If fixes are required they will appear
regards, Martin Kersten
Date: 2013-02-20 14:27:23 +0100
I understand it. Your technology was surely very useful for us!
Date: 2013-02-21 08:28:15 +0100
Dear Niels and all,
I ran a slightly modified test today which has not crashed so far in 10 hours. A description is below:
I hope this information is useful in digging into the issue. Please share your thoughts, findings, and suggestions.
Date: 2013-02-21 12:47:30 +0100
For complete details, see http://dev.monetdb.org/hg/MonetDB?cmd=changeset;node=ba7ad0186586
Date: 2013-02-22 08:10:35 +0100
The fix works great: 10 hours straight with no crash so far. Will it be possible to have a February 2013 SP1 with this fix any time soon? I think it is important, as the bug was leading to an invalid database state or corruption.
Date: 2013-02-25 16:12:41 +0100
I assume that the bug is now fixed.
Date: 2013-03-07 12:41:22 +0100
Feb2013-SP1 has been released.