Assertion on BATgroup_internal #3237

Closed
monetdb-team opened this issue Nov 30, 2020 · 0 comments

Date: 2013-02-21 13:25:24 +0100
From: @swingbit
To: GDK devs <>
Version: 11.15.1 (Feb2013)
CC: @drstmane

Last updated: 2013-03-07 12:41:23 +0100

Comment 18534

Date: 2013-02-21 13:25:24 +0100
From: @swingbit

User-Agent: Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.17 (KHTML, like Gecko) Chrome/24.0.1312.57 Safari/537.17
Build Identifier:

Unfortunately it is not easy to make this issue reproducible.

I got the issue on a query that looks like this:

CREATE TABLE "_attributesString" (
"subject" INTEGER,
"attribute" CHARACTER LARGE OBJECT,
"value" CHARACTER LARGE OBJECT,
"prob" DOUBLE DEFAULT 1.0
);

SELECT subject, attribute, value, MAX(prob) as prob FROM "_attributesString" GROUP BY subject, attribute, value;

However, the failure is data-dependent and the data that triggers the failure is produced inside an iterative process.

I got two types of error, depending on the data at hand:

An assertion: gdk/gdk_group.c:482: BATgroup_internal: Assertion `hs->link[hb] == ((BUN) 9223372036854775807LL) || hs->link[hb] < hb' failed.

And a SEGFAULT (I am relatively sure this happens when evaluating the same query):

Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x7f9fbc3ff700 (LWP 17502)]
0x00007f9fc74cf8ca in runMALsequence (cntxt=0xf86630, mb=0x7f9f941c24e0, startpc=1, stoppc=49, stk=0x7f9f94e5dc50, env=0x7f9f94e198f0, pcicaller=0x7f9f94debfa0)
at /opt/spinque/MonetDBServer/MonetDB.Spinque_Feb2013/src/monetdb5/mal/mal_interpreter.c:801
801 if (isaBatType(getArgType(mb, pci, i))) {
(gdb) bt
#0 0x00007f9fc74cf8ca in runMALsequence (cntxt=0xf86630, mb=0x7f9f941c24e0, startpc=1, stoppc=49, stk=0x7f9f94e5dc50, env=0x7f9f94e198f0, pcicaller=0x7f9f94debfa0)
    at /opt/spinque/MonetDBServer/MonetDB.Spinque_Feb2013/src/monetdb5/mal/mal_interpreter.c:801
#1 0x00007f9fc74cf2b3 in runMALsequence (cntxt=0xf86630, mb=0x7f9f941b4860, startpc=1, stoppc=0, stk=0x7f9f94e198f0, env=0x0, pcicaller=0x0)
    at /opt/spinque/MonetDBServer/MonetDB.Spinque_Feb2013/src/monetdb5/mal/mal_interpreter.c:720
#2 0x00007f9fc74ce335 in callMAL (cntxt=0xf86630, mb=0x7f9f941b4860, env=0x7f9fbc3feba0, argv=0x7f9f94c96ee0, debug=0 '\000')
    at /opt/spinque/MonetDBServer/MonetDB.Spinque_Feb2013/src/monetdb5/mal/mal_interpreter.c:469
#3 0x00007f9fbf282f56 in SQLexecutePrepared (c=0xf86630, be=0x7f9f94164740, q=0x7f9f9415c600) at /opt/spinque/MonetDBServer/MonetDB.Spinque_Feb2013/src/sql/backends/monet5/sql_scenario.c:1840
#4 0x00007f9fbf283345 in SQLengineIntern (c=0xf86630, be=0x7f9f94164740) at /opt/spinque/MonetDBServer/MonetDB.Spinque_Feb2013/src/sql/backends/monet5/sql_scenario.c:1907
#5 0x00007f9fbf2838ba in SQLengine (c=0xf86630) at /opt/spinque/MonetDBServer/MonetDB.Spinque_Feb2013/src/sql/backends/monet5/sql_scenario.c:2008
#6 0x00007f9fc74fba95 in runPhase (c=0xf86630, phase=4) at /opt/spinque/MonetDBServer/MonetDB.Spinque_Feb2013/src/monetdb5/mal/mal_scenario.c:522
#7 0x00007f9fc74fbc82 in runScenarioBody (c=0xf86630) at /opt/spinque/MonetDBServer/MonetDB.Spinque_Feb2013/src/monetdb5/mal/mal_scenario.c:567
#8 0x00007f9fc74fbdb4 in runScenario (c=0xf86630) at /opt/spinque/MonetDBServer/MonetDB.Spinque_Feb2013/src/monetdb5/mal/mal_scenario.c:586
#9 0x00007f9fc74fce4a in MSserveClient (dummy=0xf86630) at /opt/spinque/MonetDBServer/MonetDB.Spinque_Feb2013/src/monetdb5/mal/mal_session.c:431
#10 0x0000003599007761 in start_thread () from /lib64/libpthread.so.0
#11 0x0000003598ce098d in clone () from /lib64/libc.so.6

Reproducible: Sometimes

MonetDB 5 server v11.15.2 (64-bit, 64-bit oids)
This is an unreleased version
Copyright (c) 1993-July 2008 CWI
Copyright (c) August 2008-2013 MonetDB B.V., all rights reserved
Visit http://www.monetdb.org/ for further information
Found 35.5GiB available memory, 8 available cpu cores
Libraries:
libpcre: 7.8 2008-09-05 (compiled with 7.8)
openssl: OpenSSL 1.0.0d 8 Feb 2011 (compiled with OpenSSL 1.0.0d-fips 8 Feb 2011)
libxml2: 2.7.7 (compiled with 2.7.7)
Compiled by: roberto@spinque01.ins.cwi.nl (x86_64-unknown-linux-gnu)
Compilation: gcc -g -Werror -Wall -Wextra -W -Werror-implicit-function-declaration -Wpointer-arith -Wdeclaration-after-statement -Wformat=2 -Wno-format-nonliteral -Winit-self -Winvalid-pch -Wmissing-declarations -Wmissing-format-attribute -Wmissing-prototypes -Wold-style-definition -Wpacked -Wunknown-pragmas -Wvariadic-macros -fstack-protector-all -Wstack-protector -Wpacked-bitfield-compat -Wsync-nand -Wmissing-include-dirs
Linking : /usr/bin/ld -m elf_x86_64

Comment 18535

Date: 2013-02-21 14:13:17 +0100
From: @sjoerdmullender

This is the assertion that our hashes link backwards. Apparently they don't always.

Roberto, when this happens in the debugger, can you print both hb and hs->link[hb]?
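
To make the property behind the assertion concrete, here is a minimal sketch of a chained hash table in which every link points to a lower position. It is not the GDK code: the struct, the `toy_` functions and the `TOY_` constants are all hypothetical, and only the names `link` and `hb` are borrowed from the assertion text. It assumes entries are prepended to their bucket's chain, which yields backward links only as long as positions are inserted in ascending order.

```c
#include <stddef.h>
#include <stdint.h>

#define TOY_NONE    SIZE_MAX  /* end-of-chain marker (hypothetical stand-in) */
#define TOY_BUCKETS 4
#define TOY_MAX     16

/* Toy hash table: head[] holds the most recently inserted position per
 * bucket, link[] chains each position to the previous head of its bucket. */
struct toy_hash {
    size_t head[TOY_BUCKETS];
    size_t link[TOY_MAX];
};

static void toy_init(struct toy_hash *h)
{
    for (size_t i = 0; i < TOY_BUCKETS; i++)
        h->head[i] = TOY_NONE;
    for (size_t i = 0; i < TOY_MAX; i++)
        h->link[i] = TOY_NONE;
}

/* Prepend position pos to the collision chain of its bucket. */
static void toy_insert(struct toy_hash *h, size_t pos, unsigned key)
{
    size_t b = key % TOY_BUCKETS;
    h->link[pos] = h->head[b];
    h->head[b] = pos;
}

/* Return 1 if, for every position hb below n, link[hb] is either the end
 * marker or a lower position; return 0 as soon as one link points forward. */
static int toy_links_backwards(const struct toy_hash *h, size_t n)
{
    for (size_t hb = 0; hb < n; hb++)
        if (h->link[hb] != TOY_NONE && h->link[hb] >= hb)
            return 0;
    return 1;
}
```

In this toy model, inserting positions 0, 1, 2 in that order keeps all links backward, whereas inserting position 5 and then position 2 into the same bucket leaves `link[2] == 5`, the same shape of violation that the assertion reports.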

Comment 18536

Date: 2013-02-21 14:37:54 +0100
From: @swingbit

(gdb) p hb
$1 = 140
(gdb) p hs->link[hb]
$2 = 282
(gdb)

A bit more context:

#0 0x0000003598c328f5 in raise () from /lib64/libc.so.6
#1 0x0000003598c340d5 in abort () from /lib64/libc.so.6
#2 0x0000003598c2b8b5 in __assert_fail () from /lib64/libc.so.6
#3 0x00007f1bb3421368 in BATgroup_internal (groups=0x7f1ba8481448, extents=0x7f1ba8481440, histo=0x7f1ba8481438, b=0x7f1b5c3c7610, g=0x7f1b5cac82b0, e=0x0, h=0x0, subsorted=0)
    at /opt/spinque/MonetDBServer/MonetDB.Spinque_Feb2013/src/gdk/gdk_group.c:481
#4 0x00007f1bb3427762 in BATgroup (groups=0x7f1ba8481448, extents=0x7f1ba8481440, histo=0x7f1ba8481438, b=0x7f1b5c3c7610, g=0x7f1b5cac82b0, e=0x0, h=0x0)
    at /opt/spinque/MonetDBServer/MonetDB.Spinque_Feb2013/src/gdk/gdk_group.c:734
#5 0x00007f1bb3baad52 in GRPsubgroup4 (ngid=0x7f1b5c5945f0, next=0x7f1b5c594600, nhis=0x7f1b5c594610, bid=0x7f1b5c5944f0, gid=0x7f1b5c5945c0, eid=0x0, hid=0x0)
    at /opt/spinque/MonetDBServer/MonetDB.Spinque_Feb2013/src/monetdb5/modules/kernel/group.mx:565
#6 0x00007f1bb3baae93 in GRPsubgroup2 (ngid=0x7f1b5c5945f0, next=0x7f1b5c594600, nhis=0x7f1b5c594610, bid=0x7f1b5c5944f0, gid=0x7f1b5c5945c0)
    at /opt/spinque/MonetDBServer/MonetDB.Spinque_Feb2013/src/monetdb5/modules/kernel/group.mx:586

Comment 18537

Date: 2013-02-21 14:56:05 +0100
From: @sjoerdmullender

It would be interesting to know how the bat (the first argument to the MAL function) got its hash table. Do you know?
Is it possible that the hash table was updated after it was created?

The simple solution would be to not depend on the supposed fact that the linked lists in the hash table always point towards lower BUNs. But it would be interesting to know where in the code that supposition is broken.
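
As an illustration of what "not depending on the link order" can look like, here is a minimal sketch that finds the lowest matching position in a collision chain by visiting the whole chain. It is an assumption about one possible approach, not the change that was made in GDK: `toy_lowest_match`, `TOY_NONE`, and the plain `int` value array are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

#define TOY_NONE SIZE_MAX  /* end-of-chain marker (hypothetical stand-in) */

/* Walk one collision chain starting at `start` and return the lowest
 * position whose value equals v, or TOY_NONE if no entry matches.
 * Every entry of the chain is visited, so the result is the same whether
 * the links point to lower positions, higher positions, or a mix. */
static size_t toy_lowest_match(const size_t *link, const int *vals,
                               size_t start, int v)
{
    size_t best = TOY_NONE;

    for (size_t p = start; p != TOY_NONE; p = link[p])
        if (vals[p] == v && p < best)  /* any valid p is below TOY_NONE */
            best = p;
    return best;
}
```

The trade-off is that the walk can no longer stop early at the first or last match, so it costs a full pass over the chain in exchange for not relying on an ordering that may not hold.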

Comment 18538

Date: 2013-02-21 15:06:02 +0100
From: @swingbit

Unfortunately I have difficulties getting the MAL plan of the failing transaction.

Comment 18541

Date: 2013-02-22 11:06:56 +0100
From: @sjoerdmullender

Changeset 88acc1bec9c7 made by Sjoerd Mullender <sjoerd@acm.org> in the MonetDB repo, refers to this bug.

For complete details, see http://dev.monetdb.org/hg/MonetDB?cmd=changeset;node=88acc1bec9c7

Changeset description:

We can't exploit hash links being in reverse order for existing hash tables.
We tried to exploit the supposed fact that links in the collision
lists of our hash tables were in reverse order of the BUN number.  It
seems this assumption was not correct, at least not in all cases.
This should fix bug #3237.

Comment 18542

Date: 2013-02-22 11:07:20 +0100
From: @sjoerdmullender

Roberto, can you test please.

Comment 18543

Date: 2013-02-22 13:35:41 +0100
From: @swingbit

The problem seems indeed solved, thanks!

Comment 18545

Date: 2013-02-22 15:31:59 +0100
From: @swingbit

The SEGFAULT mentioned in the initial bug report is apparently not (directly) related to the bug fix, as it still occurs (changing the bug title accordingly).

Comment 18546

Date: 2013-02-22 15:49:43 +0100
From: @sjoerdmullender

Can you then submit a fresh bug report for the segfault?

Comment 18549

Date: 2013-02-23 00:06:32 +0100
From: @drstmane

Does that segfault also occur with assertions enabled?

Comment 18550

Date: 2013-02-25 10:07:44 +0100
From: @swingbit

Stefan, I will try that.

What I found so far is that the SEGFAULT does not seem to be related to grouping at all.

I have now reduced it to a simple selection on a view with a user function that does string processing (pcre). But again, it seems data-dependent, so I will file a bug report as soon as I can make it somewhat reproducible (unfortunately I am not allowed to share the whole data set as it is).

Comment 18551

Date: 2013-02-25 10:17:53 +0100
From: @swingbit

Stefan,

Now that I think about it, I am compiling as developer, so assertions are enabled:

--enable-strict --enable-assert --enable-debug --disable-optimize

Comment 18593

Date: 2013-03-07 12:41:23 +0100
From: @sjoerdmullender

Feb2013-SP1 has been released.
