Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

SIGSEGV in BATins_kdiff #3577

monetdb-team opened this issue Nov 30, 2020 · 0 comments

SIGSEGV in BATins_kdiff #3577

monetdb-team opened this issue Nov 30, 2020 · 0 comments


Copy link

@monetdb-team monetdb-team commented Nov 30, 2020

Date: 2014-09-19 16:09:53 +0200
From: Richard Hughes <<richard.monetdb>>
To: GDK devs <>
Version: 11.17.21 (Jan2014-SP3)
CC: @njnes

Last updated: 2014-10-31 14:13:49 +0100

Comment 20168

Date: 2014-09-19 16:09:53 +0200
From: Richard Hughes <<richard.monetdb>>

[This bug report is somewhat of a continuation of a conversation Niels and I have been having via private e-mail. See that for some more of the history, but I'll try to repeat everything here]

We are currently running a build that is Jan2014 head (b8aff254b139) plus a patch Niels sent me on August 17 (which got committed as part of the unrelated 9698a5f83bf6) plus a cherry-pick of fe0b3b84c297 This survives our torture tests considerably better than any previous build, but after 130 minutes of simulating traffic as fast as possible (starting from a new, clean database) things started going downhill. We first got a SEGV:

0 0x00007f0b361f8f0b in BATins_kdiff (bn=0x7f0b363e0ab0 <oid_nil>,
bn@entry=0x7f0af84abe30, l=0x7f0930face18, r=0x7f0af82b3aa8,
r@entry=0x7f0b201e0598) at gdk_setop.c:794
1 0x00007f0b3622205c in diff_intersect (set=0, diff=1, r=,
l=0x7f0af82b3900) at gdk_setop.c:833
2 BATkdiff (l=l@entry=0x7f0af82b3900, r=0x7f0b201e0598) at gdk_setop.c:883
3 0x00007f0b2f929b54 in SQLtid (cntxt=, mb=,
stk=, pci=) at sql.c:2244
4 0x00007f0b36753168 in runMALsequence (cntxt=0x7f0b363e0ab0 <oid_nil>,
mb=0x7f09cb729130, startpc=-131384664, stoppc=0, stk=0x7f094b7aee60,
env=0x7f0b3660e3d0 <BATatoms+1104>, pcicaller=0x0)
at mal_interpreter.c:651
5 0x00007f0b367552c5 in DFLOWworker (T=0x7f0b363e0ab0 <oid_nil>)
at mal_dataflow.c:358
6 0x00007f0b350e70a4 in start_thread (arg=0x7f0a7c89f700)
at pthread_create.c:309
7 0x00007f0b34e1c04d in clone ()
at ../sysdeps/unix/sysv/linux/x86_64/clone.S:111

then exactly 300 seconds after it came back up we got "42S01!CREATE VIEW: name 'uniondata' already in use" [this error came from an event type (4) described below, which makes it especially incomprehensible given that the immediately preceding statement was DROP VIEW.] followed by "GDK reported error. BATproject: does not match always" followed by the whole mserver5 wedging up and refusing to accept any more connections.

The load we're using is generated by two processes. One just does SELECTs of the usual sorts of query that this database is expected to handle (e.g. SELECT ... FROM uniondata WHERE ... GROUP BY ... ORDER BY ...) and the other is inserting new data into those tables by replicating it from an external row store.

The replication consists of four types of event, all of them in manual-commit mode:

  1. Normally rows are inserted with COPY INTO staging FROM STDIN; UPDATE replication_control SET ... [this is a single-row table];COMMIT

  2. Very occasionally a row is UPDATEd in the row store, which we replicate to MonetDB using CREATE TEMP TABLE updates AS SELECT * FROM staging WITH NO DATA;COPY INTO updates FROM STDIN;DELETE FROM updates WHERE EXISTS(SELECT * FROM staging x WHERE AND x.ver>=updates.ver);DELETE FROM staging WHERE EXISTS(SELECT * FROM updates x WHERE AND x.ver>staging.ver);INSERT INTO staging SELECT * from updates;DROP TABLE updates;COMMIT

  3. Every 100,000 rows we flush the "staging" table using INSERT INTO data_yyyymm;DELETE FROM staging;COMMIT

  4. If that flush contains data from a month we haven't seen before then we'll populate it instead with CREATE TABLE data_yyyymm AS SELECT * FROM staging WITH DATA;DROP VIEW uniondata;CREATE VIEW uniondata AS SELECT * FROM data_201405 UNION ALL SELECT * FROM data_201406 UNION ALL ...;COMMIT.

The crash in the backtrace above was immediately after a row update as described in (2), between the DROP TABLE updates and the COMMIT. It was the fourth occurrence of this in our test load, amid tens of millions of normal inserts.

I have no consistent reproduction recipe at the moment and the problem is clearly intermittent. I have the core dump if you want any more information from it. I had a quick look at the BATins_kdiff macro to see if I could figure out which line crashed, but quickly got lost. If you're having no luck figuring it out then I can have another go.

I suspect (but don't know) that we're looking at two different problems here: the initial crash is just a simple bug which might be easy to find, and all the subsequent problems are caused by some mis-initialization in log replay which could be harder to spot.

Comment 20194

Date: 2014-09-22 11:55:55 +0200
From: Richard Hughes <<richard.monetdb>>

Grovelling through the disassembly and the preprocessed source, I think the crash is actually on line 487 of gdk_setop.c:

HASHloopa4(ri, r->H->hash, s2, h) {

specifically, r->H->hash->width fails because r->H->hash is NULL. Here are some local variables you might be interested in:

(gdb) p r->H
$1 = (COLrec *) 0x0
(gdb) p *r
$2 = {batCacheid = 0, H = 0x0, T = 0x0, P = 0x224f06, U = 0x224f06}
(gdb) p *l
Cannot access memory at address 0x7f0930face18
(gdb) i loc
p1 = 1522116
s2 = 139677453176344
li = {b = 0x7f0af82b3900, hvid = 41994288, tvid = 41994288}
hash = 2089412968
(gdb) p *li.b->H
$3 = {id = 0x7f0b363d947f "h", width = 0, type = 0 '\000', shift = 0 '\000',
varsized = 1, key = 0, dense = 0, nonil = 1, nil = 0, sorted = 1,
revsorted = 0, align = 150795355, nokey = {0, 0}, nosorted = 0,
norevsorted = 0, nodense = 0, seq = 40472172, heap = {free = 0, size = 0,
base = 0x0, filename = 0x0, copied = 0, hashash = 0, forcemap = 0,
storage = STORE_MEM, newstorage = STORE_MEM, dirty = 0 '\000',
parentid = 0}, vheap = 0x0, hash = 0x0, imprints = 0x0, props = 0x0}
(gdb) p *li.b->T
$4 = {id = 0x7f0b363e8565 "t", width = 0, type = 0 '\000', shift = 0 '\000',
varsized = 1, key = 0, dense = 0, nonil = 1, nil = 0, sorted = 1,
revsorted = 0, align = 150795356, nokey = {0, 0}, nosorted = 0,
norevsorted = 0, nodense = 0, seq = 40472172, heap = {free = 0, size = 0,
base = 0x0, filename = 0x0, copied = 0, hashash = 0, forcemap = 0,
storage = STORE_MEM, newstorage = STORE_MEM, dirty = 0 '\000',
parentid = 0}, vheap = 0x0, hash = 0x0, imprints = 0x0, props = 0x0}

I'm fairly sure I'm reading the disassembly right, which would imply that we're looking at a race where one thread is clearing out r while this thread is reading it. I say this because the current values of the variables imply that we should have crashed trying to get r->H->hash, but we got one step further.

Nearly all threads are either idle (in select()/sem_wait()) or in BATins_kdiff(). The only one that isn't is:

(gdb) bt
0 0x00007f0b34e10d17 in unlink () at ../sysdeps/unix/syscall-template.S:81
1 0x00007f0b362a13ec in GDKunlink (dir=dir@entry=0x7f0b363e8563 "bat",
nme=nme@entry=0x7f0ac80c7480 "21/77/217714",
ext=ext@entry=0x7f0b363d9724 "tail") at gdk_storage.c:197
2 0x00007f0b36190db6 in HEAPdelete (h=,
o=o@entry=0x7f0ac80c7480 "21/77/217714",
ext=ext@entry=0x7f0b363d9724 "tail") at gdk_heap.c:743
3 0x00007f0b362a3453 in BATdelete (b=b@entry=0x7f0adc103c70)
at gdk_storage.c:802
4 0x00007f0b3618cd1a in BBPdestroy (b=b@entry=0x7f0adc103c70)
at gdk_bbp.c:2446
5 0x00007f0b3618bd9a in decref (i=73676, logical=,
releaseShare=, lock=1) at gdk_bbp.c:2180
6 0x00007f0b36752ff6 in runMALsequence (cntxt=0x7f0b2eef0920,
mb=0x7f0b219e0b10, startpc=4, stoppc=-1, stk=0x7f0a713d5e20,
env=0x7f0b350d8460 <_IO_strn_jumps>, pcicaller=0x0)
at mal_interpreter.c:815
7 0x00007f0b36754690 in callMAL (cntxt=0x1c81368, mb=0x7f0b363d9724,
mb@entry=0x7f0b219e0b10, env=0x7f0b2eef1dc8, argv=0x7f0b2eef1dc0,
debug=0 '\000') at mal_interpreter.c:465
8 0x00007f0b2f93906f in SQLexecutePrepared (be=,
q=0x7f0b23b57ca0, c=) at sql_scenario.c:2224
9 SQLengineIntern (c=, be=)
at sql_scenario.c:2286
10 0x00007f0b3676f4f7 in runPhase (phase=4, c=0x1c81368)
at mal_scenario.c:528
11 runScenarioBody (c=c@entry=0x1c81368) at mal_scenario.c:572
12 0x00007f0b3677005d in runScenario (c=c@entry=0x1c81368)
at mal_scenario.c:592
13 0x00007f0b36770128 in MSserveClient (dummy=0x1c81368) at mal_session.c:456
14 0x00007f0b362bddeb in thread_starter (arg=0x7f0b28000a20)
at gdk_system.c:505
15 0x00007f0b350e70a4 in start_thread (arg=0x7f0b2eef2700)
at pthread_create.c:309
16 0x00007f0b34e1c04d in clone ()
at ../sysdeps/unix/sysv/linux/x86_64/clone.S:111

I can't see any relationship between the data being accessed in those two threads, but I don't have a good enough picture of the code to be at all confident. It's also possible (likely, even) that one or more threads ran on a considerable distance before the kernel could catch it in order to perform the code dump.

Comment 20198

Date: 2014-09-25 19:16:16 +0200
From: Richard Hughes <<richard.monetdb>>

NB: some of the conclusions in my previous comment were wrong because gdb was lying to me about the values of local variables.

The two stacks which are fighting are:

0 0x00007ffff72c3caf in BATins_kdiff (bn=0x7fffc80096f0, l=0x7fffc80030f0,
r=0x7fffe0158458) at gdk_setop.c:794
1 0x00007ffff72c91bb in diff_intersect (l=0x7fffc80030f0, r=0x7fffe0158458,
diff=1, set=0) at gdk_setop.c:833
2 0x00007ffff72c9730 in BATkdiff (l=0x7fffc80030f0, r=0x7fffe0158458)
at gdk_setop.c:883
3 0x00007fffef5f0d2e in SQLtid (cntxt=0x628470, mb=0x7fff944cbc10,
stk=0x7fff941995e0, pci=0x7fff94143d00) at sql.c:2244
4 0x00007ffff78a9486 in runMALsequence (cntxt=0x628470, mb=0x7fff944cbc10,
startpc=4, stoppc=5, stk=0x7fff941995e0, env=0x0, pcicaller=0x0)
at mal_interpreter.c:651


HASHremove (gdk_search.c:497) ("b->H->hash = NULL;")
HASHdestroy (gdk_search.c:508) ("HASHremove(BATmirror(b));")
BATextend (gdk_bat.c:493)
BATappend (gdk_batop.c:499)
delta_delete_bat (bat_storage.c:612)
delete_tab (bat_storage.c:688)
mvc_delete_wrap (sql.c:1886)
runMALsequence (mal_interpreter.c:651)

That second stack trace was reverse-engineered from printf debugging, hence the strange format and lack of detail and line numbers that don't sync with any upstream tree.

Tomorrow's task is to figure out which mutex is supposed to stop that happening, and thence why it isn't.

Comment 20204

Date: 2014-09-26 16:55:09 +0200
From: Richard Hughes <<richard.monetdb>>

My mutex search is not going well.

GDKhashLock appears to protect writes to ->hash (although it doesn't seem to be taken in all cases (e.g. HASHremove()). Still trying to figure out if those cases are safe because of the type of write they're doing).

http//devmonetdborg/hg/MonetDB?cmd=changeset;node=ffabd521cfbf fixed exactly the problem in this bug when it was triggered by the MAL optimizers scheduling deletes in parallel with reads within the same query. That was done by preventing the optimizers from doing that transform, so doesn't apply in this case.

I can't find anything that protects reads in different queries to deletes. Should GDKhashLock be a reader-writer lock instead? Should there be some MVCC trick that makes it work? I'm stuck.

Comment 20210

Date: 2014-09-28 15:15:14 +0200
From: @njnes

There is no mutex protecting deletes and concurrent reads. The trick used is to
keep deletes in a seperate delete list (bat). For each transaction these delete
lists should be private.

Comment 20212

Date: 2014-09-28 16:44:15 +0200
From: Richard Hughes <<richard.monetdb>>

Hi Niels,

I'd found the delete bat, and I've been assuming the kdiff in the reader thread's stack is for the purpose of removing the list of already-deleted rows from the full table being returned. What I can't figure out is what happens on the second (or subsequent) delete transaction to the same table.

Are you saying that there are multiple delete bats on a single table? I can't see where those are resolved on the reader side given that neither SQLtid() nor bind_del() have a suitable-looking loop in them.

Or are you saying that a transaction has a private delete bat until commit time, when it is appended to the single global delete bat? If that's the case then there is still going to be an instant when the hash on the global delete bat (which is what I think is being raced on here) is in an unstable state for concurrent readers.

Comment 20213

Date: 2014-09-28 16:58:14 +0200
From: @njnes


Indeed the bat with deletes holds the already deleted rows (ie from previous
transactions). Any new transaction should get a new (copy) of this bat.

At commit the deletes are moved to the central transaction, but not by inserting.
The full deletion list is replaced.

Comment 20214

Date: 2014-09-28 17:08:29 +0200
From: @njnes

the fresh copy is created in the dup_dbat code.

Comment 20215

Date: 2014-09-28 18:16:21 +0200
From: Richard Hughes <<richard.monetdb>>

Ah, that makes much more sense than some of the wild theories I was trying to come up with.

So, things to look in to tomorrow (my test system is offline today):

  • Is dup_dbat() not being called (e.g. because !(!t->data || !t->base.allocated) in delete_tab())?
  • Is the duplicate being published to the world prematurely (e.g. directly referencing dbat rather than using timestamp_dbat())?
  • Are writes being done to the original (or both) when they should be being done to the copy only?

Comment 20216

Date: 2014-09-29 14:28:38 +0200
From: Richard Hughes <<richard.monetdb>>

Definitely option 2: the reader thread is accessing data which it's not supposed to be able to see yet. At bind_del() time everything seems right, but by the time of the SEGV the data seems to have been switched out from underneath and it's all gone horribly wrong.

How's this for a patch?

diff -r 8f10a8b13e77 sql/storage/bat/bat_storage.c
--- a/sql/storage/bat/bat_storage.c Tue Sep 23 08:02:46 2014 +0200
+++ b/sql/storage/bat/bat_storage.c Mon Sep 29 13:24:17 2014 +0100
@@ -520,7 +520,7 @@
bat->wtime = obat->wtime;
if (bat->dbid) {
if (is_new) {

  •  	obat->dbid = temp_copy(bat->dbid, temp);
  •  	bat->dbid = temp_copy(bat->dbid, temp);
     } else {
     	bat->dbid = ebat_copy(bat->dbid, 0, temp);

It survives my test case (which I still haven't been able to reduce successfully) and vaguely seems like it ought to be right.

Comment 20217

Date: 2014-09-29 16:43:08 +0200
From: @njnes

is_new path should not be taken as your table isn't new. The use of obat here looks indeed wrong.

Comment 20218

Date: 2014-09-29 17:11:32 +0200
From: Richard Hughes <<richard.monetdb>>

Yeah, I was a bit confused about that. Nevertheless, that is definitely the branch being taken (bat->dbid=0, is_new=1, temp=0). I can put some more printf calls in if you tell me where you want them.

Comment 20219

Date: 2014-09-29 17:15:56 +0200
From: Richard Hughes <<richard.monetdb>>

I should clarify: I printed those values at the top of the function, not immediately prior to the first if-statement. bat->dbid is non-zero by the time of the first if, of course.

Comment 20220

Date: 2014-09-29 19:24:04 +0200
From: @njnes

we need to figure out why the is_new is set. Starting one level up (ie any call to dub_dbat, and in these functions we need to know the table name.

Comment 20221

Date: 2014-09-29 19:45:01 +0200
From: Richard Hughes <<richard.monetdb>>

Breakpoint 1, dup_dbat (tr=0x7fffe0007f90, obat=0x7fffe026b450,
bat=0x7fffe02a6410, is_new=1, temp=0) at bat_storage.c:523
523 {
(gdb) up
1 0x00007fffef4c2f07 in delete_tab (tr=0x7fffe0007f90, t=0x7fffe0bd6e20,
ib=0x7fff88004990, tpe=4) at bat_storage.c:659
659 dup_dbat(tr, obat, bat, isNew(ot), isTempTable(t));
(gdb) p *t
$1 = {base = {wtime = 0, rtime = 83, allocated = 0, flag = 0, id = 7080,
name = 0x7fffe0bd6f50 "data_201407"}, type = 0, system = 0 '\000',
persistence = SQL_PERSIST, commit_action = CA_COMMIT, readonly = 0 '\000',
query = 0x0, sz = 1024, pkey = 0x0, columns = {sa = 0x7fffe0008020,
destroy = 0x7fffef4a5d68 <column_destroy>, set = 0x7fffe0bd7030,
dset = 0x0, nelm = 0x0}, idxs = {sa = 0x7fffe0008020,
destroy = 0x7fffef4a5c4a <idx_destroy>, set = 0x0, dset = 0x0,
nelm = 0x0}, keys = {sa = 0x7fffe0008020,
destroy = 0x7fffef4a5abb <key_destroy>, set = 0x0, dset = 0x0,
nelm = 0x0}, triggers = {sa = 0x7fffe0008020,
destroy = 0x7fffef4a5ce9 <trigger_destroy>, set = 0x0, dset = 0x0,
nelm = 0x0}, tables = {sa = 0x0, destroy = 0x0, set = 0x0, dset = 0x0,
nelm = 0x0}, drop_action = 0, cleared = 0, data = 0x7fffe02a6410,
s = 0x7fffe0008290, p = 0x0}
(gdb) p *ot
$2 = {base = {wtime = 83, rtime = 82, allocated = 1, flag = 1, id = 7080,
name = 0x1c014a0 "data_201407"}, type = 0, system = 0 '\000',
persistence = SQL_PERSIST, commit_action = CA_COMMIT, readonly = 0 '\000',
query = 0x0, sz = 1024, pkey = 0x0, columns = {sa = 0x188e760,
destroy = 0x7fffef4a5d68 <column_destroy>, set = 0x1c01530, dset = 0x0,
nelm = 0x0}, idxs = {sa = 0x188e760,
destroy = 0x7fffef4a5c4a <idx_destroy>, set = 0x0, dset = 0x0,
nelm = 0x0}, keys = {sa = 0x188e760,
destroy = 0x7fffef4a5abb <key_destroy>, set = 0x0, dset = 0x0,
nelm = 0x0}, triggers = {sa = 0x188e760,
destroy = 0x7fffef4a5ce9 <trigger_destroy>, set = 0x0, dset = 0x0,
nelm = 0x0}, tables = {sa = 0x0, destroy = 0x0, set = 0x0, dset = 0x0,
nelm = 0x0}, drop_action = 0, cleared = 0, data = 0x7fffe026b450,
s = 0x19e3600, p = 0x0}
(gdb) p *tr
$3 = {name = 0x0, stime = 83, wstime = 84, rtime = 83, wtime = 84,
schema_number = 33, schema_updates = 0, status = 0, dropped = 0x0,
schemas = {sa = 0x7fffe0008020, destroy = 0x7fffef4a5e76 <schema_destroy>,
set = 0x7fffe003fb80, dset = 0x0, nelm = 0x0}, sa = 0x7fffe0008020,
parent = 0x18d3fd0, stk = 0}
(gdb) p *obat
$4 = {dname = 0x7fffe026ad70 "D_sys_data_201407", dbid = 946, cnt = 96452,
wtime = 31, next = 0x0}
(gdb) p *bat
$5 = {dname = 0x0, dbid = 0, cnt = 0, wtime = 0, next = 0x0}

"data_201407" is indeed one of the tables on which delete statements execute (see comment 1 for more detail).

Comment 20222

Date: 2014-09-29 19:54:56 +0200
From: @njnes

that data_* table exists longer, ie isNew(ot) should be 0 (but is not). I'll have to see why this is the case.

Comment 20260

Date: 2014-10-08 09:12:20 +0200
From: @njnes

fixed by the collective patches on the storage layer

Comment 20272

Date: 2014-10-08 19:59:57 +0200
From: Richard Hughes <<richard.monetdb>>

Created attachment 302
patch to store.c

Did these patches ever get checked in? I'm tracking two patches related to this bug and neither appears to have gone upstream yet.

You e-mailed me this diff on 2014-09-29 18:22 UTC

Attached file: tr-old-tr-new.patch (text/plain, 1925 bytes)
Description: patch to store.c

Comment 20273

Date: 2014-10-08 20:01:08 +0200
From: Richard Hughes <<richard.monetdb>>

Created attachment 303
patch to bat_storage.c

...and this is the patch from comment 9

Attached file: dup_dbat-isnew.patch (text/plain, 494 bytes)
Description: patch to bat_storage.c

Comment 20274

Date: 2014-10-08 21:57:29 +0200
From: MonetDB Mercurial Repository <>

Changeset a2d9f9e5a5af made by Niels Nes in the MonetDB repo, refers to this bug.

For complete details, see http//devmonetdborg/hg/MonetDB?cmd=changeset;node=a2d9f9e5a5af

Changeset description:

once tables get persistent they should no longer be flaged as 'new'. Solves
problems with deletes. Thanks too Richard Hughes for finding the problem.
See also bug #3577.

Comment 20275

Date: 2014-10-08 21:59:19 +0200
From: @njnes

checked the patch (302), which should be enough for this problem.

Comment 20339

Date: 2014-10-31 14:13:49 +0100
From: @sjoerdmullender

Oct2014 has been released.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
None yet
Linked pull requests

Successfully merging a pull request may close this issue.

None yet
1 participant