Double-free of imprints #3599

Closed
monetdb-team opened this issue Nov 30, 2020 · 0 comments

monetdb-team commented Nov 30, 2020

Date: 2014-10-08 13:27:38 +0200
From: Richard Hughes <<richard.monetdb>>
To: GDK devs <>
Version: 11.17.21 (Jan2014-SP3)

Last updated: 2014-10-31 14:14:40 +0100

Comment 20263

Date: 2014-10-08 13:27:38 +0200
From: Richard Hughes <<richard.monetdb>>

I apologise in advance for the poor quality of this bug report. The 'steps to reproduce' amount to heavy, sustained read/write traffic involving tens of GB of data and several different machines. I've already described our traffic pattern in bug #3577, so refer to that for more background. I can't find any correlation between the operations our clients perform and the crashes. Sometimes it crashes after 10 minutes; sometimes it runs for hours.

Build is Oct2014 1acfd2fe6767 plus a bunch of local patches (most from Niels).

Nevertheless, here's what I've got:

The final result is usually a SIGSEGV in HEAPdelete (I once managed to get AddressSanitizer to pick up the double-free, but that's unusual):
0 0x00007ffff59e2679 in HEAPdelete (h=0x676e69676174735f, o=0x604000665fa0 "27/73/277342", ext=0x7ffff61c04a0 "timprints") at gdk_heap.c:733
1 0x00007ffff60cf190 in IMPSremove (b=0x61500058a010) at gdk_imprints.c:892
2 0x00007ffff60cf525 in IMPSdestroy (b=0x61500058a010) at gdk_imprints.c:909
3 0x00007ffff5ce3959 in BATdelete (b=0x61500058a010) at gdk_storage.c:828
4 0x00007ffff59cbe49 in BBPdestroy (b=0x61500058a010) at gdk_bbp.c:2527
5 0x00007ffff59c364f in decref (i=98018, logical=1, releaseShare=0, lock=1) at gdk_bbp.c:2260
6 0x00007ffff59c3acc in BBPdecref (i=98018, logical=1) at gdk_bbp.c:2291
7 0x00007ffff660d287 in runMALsequence (cntxt=0x62e0000006f0, mb=0x61100036c1d0, startpc=34, stoppc=35, stk=0x6210000de910, env=0x0, pcicaller=0x0) at mal_interpreter.c:816
8 0x00007ffff6616781 in DFLOWworker (T=0x7ffff6e2e978 <workers+56>) at mal_dataflow.c:362
9 0x00007ffff45fe0a4 in start_thread (arg=0x7fffcec1e700) at pthread_create.c:309
10 0x00007ffff4333c2d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:111

Notice that the 'h' parameter to HEAPdelete is gibberish: its value (not the data it points to) is the ASCII string "_staging", which is actually a fragment of the name of one of our tables. Looking in IMPSremove() for where that value came from, I found that the entire 'imprints' structure had been overwritten by unrelated data.
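The "_staging" decoding can be checked mechanically. This is a minimal sketch (the helper name `pointer_bytes_as_text` is hypothetical, not part of MonetDB) that writes out a 64-bit value byte by byte, least significant byte first, which is how x86_64 stores it in memory:

```c
#include <stdint.h>

/* Lay out a 64-bit value as the 8 bytes it occupies in memory on a
 * little-endian machine such as x86_64, and store them as a
 * NUL-terminated string so they can be printed as text. */
static void pointer_bytes_as_text(uint64_t value, char out[9])
{
    for (int i = 0; i < 8; i++)
        out[i] = (char)((value >> (8 * i)) & 0xff);  /* lowest byte first */
    out[8] = '\0';
}
```

Called with 0x676e69676174735f it produces "_staging", consistent with part of a table name having been written over the memory that once held the heap pointer.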

At this stage I moved on to printf debugging. Here's the complete history of the imprint 0x6070000a0e00 since mserver5 started:

926:7fffe6584700 bat2 0x615000023810 ((nil)=0x6070000a0e00) 0x7ffff595f32e 0x7ffff596f546 0x7ffff596fd04
1028:7ffda0d08700 viewcreate 0x615000171b90=0x615000023810/0x615000023810 ((nil) 0x6070000a0e00 = (nil) 0x6070000a0e00) 0x7ffff55c2d0a 0x7ffff61276ba 0x7fffeab44a66
1227:7fffe5b76700 viewcreate 0x61500024d190=0x615000023810/0x615000023810 ((nil) 0x6070000a0e00 = (nil) 0x6070000a0e00) 0x7ffff55c2d0a 0x7ffff61276ba 0x7fffeab44a66
1228:7fffe1514700 viewcreate 0x61500027b310=0x61500024d190/0x61500024d190 ((nil) 0x6070000a0e00 = (nil) 0x6070000a0e00) 0x7ffff55c2d0a 0x7ffff61276ba 0x7ffff6913614
3234:7fffe7499700 viewcreate 0x6150003f6e10=0x615000023810/0x615000023810 ((nil) 0x6070000a0e00 = (nil) 0x6070000a0e00) 0x7ffff55c2d0a 0x7ffff61276ba 0x7fffeab44a66
3235:7fffe1f22700 viewcreate 0x6150001f3b90=0x6150003f6e10/0x6150003f6e10 ((nil) 0x6070000a0e00 = (nil) 0x6070000a0e00) 0x7ffff55c2d0a 0x7ffff61276ba 0x7ffff6913614
10648:7fffe475a700 viewcreate 0x61500019d290=0x615000023810/0x615000023810 ((nil) 0x6070000a0e00 = (nil) 0x6070000a0e00) 0x7ffff55c2d0a 0x7ffff61276ba 0x7fffeab44a66
10649:7fffe2930700 viewcreate 0x615000521510=0x61500019d290/0x61500019d290 ((nil) 0x6070000a0e00 = (nil) 0x6070000a0e00) 0x7ffff55c2d0a 0x7ffff61276ba 0x7ffff6913614
17401:7fffe79a0700 viewcreate 0x6150002ce510=0x615000023810/0x615000023810 ((nil) 0x6070000a0e00 = (nil) 0x6070000a0e00) 0x7ffff55c2d0a 0x7ffff61276ba 0x7fffeab44a66
17402:7ffda0d08700 viewcreate 0x615000316090=0x6150002ce510/0x6150002ce510 ((nil) 0x6070000a0e00 = (nil) 0x6070000a0e00) 0x7ffff55c2d0a 0x7ffff61276ba 0x7ffff6913614
23746:7fffe1514700 viewcreate 0x6150004de210=0x615000023810/0x615000023810 ((nil) 0x6070000a0e00 = (nil) 0x6070000a0e00) 0x7ffff55c2d0a 0x7ffff61276ba 0x7fffeab44a66
23747:7ffda0d08700 viewcreate 0x61500028f810=0x6150004de210/0x6150004de210 ((nil) 0x6070000a0e00 = (nil) 0x6070000a0e00) 0x7ffff55c2d0a 0x7ffff61276ba 0x7ffff6913614
31248:7fffe6f92700 viewcreate 0x615000418790=0x615000023810/0x615000023810 ((nil) 0x6070000a0e00 = (nil) 0x6070000a0e00) 0x7ffff55c2d0a 0x7ffff61276ba 0x7fffeab44a66
31249:7fffe83ae700 viewcreate 0x61500023e190=0x615000418790/0x615000418790 ((nil) 0x6070000a0e00 = (nil) 0x6070000a0e00) 0x7ffff55c2d0a 0x7ffff61276ba 0x7ffff6913614
355379:7ffda0d08700 viewcreate 0x615000641490=0x615000023810/0x615000023810 ((nil) 0x6070000a0e00 = (nil) 0x6070000a0e00) 0x7ffff55c2d0a 0x7ffff61276ba 0x7fffeab44a66
355847:7fffdbb3c700 viewcreate 0x615000772210=0x615000023810/0x615000023810 ((nil) 0x6070000a0e00 = (nil) 0x6070000a0e00) 0x7ffff55c2d0a 0x7ffff61276ba 0x7fffeab44a66
355848:7fffdbb3c700 viewcreate 0x615000245990=0x615000772210/0x615000772210 ((nil) 0x6070000a0e00 = (nil) 0x6070000a0e00) 0x7ffff5995965 0x7ffff5999ef7 0x7ffff5d748df
355849:7fffe83ae700 impsremove 0x615000772210 (0x6070000a0e00=NULL) 0x7ffff60cf525 0x7ffff5ce3959 0x7ffff59cbe49
362783:7fffe2930700 viewcreate 0x61500053de90=0x615000023810/0x615000023810 ((nil) 0x6070000a0e00 = (nil) 0x6070000a0e00) 0x7ffff55c2d0a 0x7ffff61276ba 0x7fffeab44a66
362784:7fffe4f67700 viewcreate 0x61500055e190=0x61500053de90/0x61500053de90 ((nil) 0x6070000a0e00 = (nil) 0x6070000a0e00) 0x7ffff55c2d0a 0x7ffff61276ba 0x7ffff6913614
370993:7fffe333e700 viewcreate 0x61500058a010=0x615000023810/0x615000023810 ((nil) 0x6070000a0e00 = (nil) 0x6070000a0e00) 0x7ffff55c2d0a 0x7ffff61276ba 0x7fffeab44a66
370994:7fffe333e700 viewcreate 0x61500044b190=0x61500058a010/0x61500058a010 ((nil) 0x6070000a0e00 = (nil) 0x6070000a0e00) 0x7ffff5995965 0x7ffff5999ef7 0x7ffff5d748df
370995:7fffcec1e700 impsremove 0x61500058a010 (0x6070000a0e00=NULL) 0x7ffff60cf525 0x7ffff5ce3959 0x7ffff59cbe49

The layout is:
[line number in log file]:[pthread_self] [event] [BAT owning imprint] [more event-specific data] [caller's instruction pointer] [caller's caller] [caller's caller's caller]

'bat2' is BATimprints() at gdk_imprints.c:789 (my line numbers may differ from yours because of the inserted printfs). The parameters are [BAT] ([old imprints value]=[new imprints value]).

'viewcreate' is VIEWcreate_() at gdk_align.c:350. The parameters are [bn]=[h]/[t] ([bn->H->imprints] [bn->T->imprints] = [h->H->imprints] [t->T->imprints])

'impsremove' is IMPSremove() at gdk_imprints.c:890. The parameters are [b] ([b->T->imprints]=NULL)
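To sift through a log in the layout described above, the common prefix of each line can be pulled apart with `sscanf`. This is a minimal sketch, assuming every log line starts exactly as described; `struct log_prefix`, `parse_prefix` and `is_event` are hypothetical names, and the event-specific fields and caller addresses are left unparsed:

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical record for the common prefix of one debug-log line:
 * "<log line>:<pthread_self> <event> <BAT> ..." */
struct log_prefix {
    unsigned long lineno;
    unsigned long long thread;
    char event[32];
    unsigned long long bat;
};

/* Returns 1 if the line matches the layout, 0 otherwise.
 * %llx accepts an optional 0x prefix, so the same conversion reads both
 * the bare pthread_self value and the 0x-prefixed BAT pointer; it stops
 * at the first non-hex character (e.g. '=' in viewcreate lines). */
static int parse_prefix(const char *line, struct log_prefix *p)
{
    return sscanf(line, "%lu:%llx %31s %llx",
                  &p->lineno, &p->thread, p->event, &p->bat) == 4;
}

/* True if the parsed line is the given event, e.g. "impsremove". */
static int is_event(const struct log_prefix *p, const char *name)
{
    return strcmp(p->event, name) == 0;
}
```

For the first line of the history it yields line 926, thread 0x7fffe6584700, event `bat2` and BAT 0x615000023810.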

Notice that IMPSremove() is called twice for the same imprints pointer without any intervening creation event, hence the title of this bug report. Also notice that in both cases the BAT passed to IMPSremove() was created by an earlier call to VIEWcreate(), which seems suspicious.
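The double-free argument can be phrased as a check over the event history: only `bat2` allocates an imprints structure, `viewcreate` merely copies the pointer, and `impsremove` frees it. A minimal sketch, assuming the events have already been reduced to (event name, imprints pointer) pairs; `struct imps_event`, `find_double_free` and `MAX_TRACKED` are hypothetical:

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified event: the event name from the log and the
 * imprints pointer it concerns (the new value for "bat2", the value
 * being freed for "impsremove"). "viewcreate" only copies a pointer,
 * so it neither allocates nor frees and is ignored here. */
struct imps_event {
    const char *event;
    unsigned long long imprints;
};

#define MAX_TRACKED 64

/* Returns the index of the first "impsremove" whose pointer is not
 * currently live (already freed, with no "bat2" re-allocating it in
 * between), or -1 if every free matches an allocation. */
static long find_double_free(const struct imps_event *ev, size_t n)
{
    unsigned long long live[MAX_TRACKED];
    size_t nlive = 0;

    for (size_t i = 0; i < n; i++) {
        if (strcmp(ev[i].event, "bat2") == 0) {
            if (nlive < MAX_TRACKED)
                live[nlive++] = ev[i].imprints;
        } else if (strcmp(ev[i].event, "impsremove") == 0) {
            size_t j;
            for (j = 0; j < nlive; j++)
                if (live[j] == ev[i].imprints)
                    break;
            if (j == nlive)
                return (long)i;      /* freeing a pointer that is not live */
            live[j] = live[--nlive]; /* freed: remove it from the live set */
        }
    }
    return -1;
}
```

Fed the events for 0x6070000a0e00 (one `bat2`, some `viewcreate`s, then two `impsremove`s), it flags the second `impsremove`. Pointers allocated once the table is full are not recorded and would later be misreported, so a real tool would need a growable set.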

The call stack leading to IMPSremove() (decoded from the last three fields) is:
1 0x7ffff60cf525 IMPSdestroy() gdk_imprints.c:912
2 0x7ffff5ce3959 BATdelete() gdk_storage.c:830
3 0x7ffff59cbe49 BBPdestroy() gdk_bbp.c:2529

The call stack leading to VIEWcreate_() is either:
1 0x7ffff55c2d0a BATslice() gdk_batop.c:816
2 0x7ffff61276ba BATproject() gdk_join.c:2944
3 0x7fffeab44a66 DELTAproject() sql.c:2198
or
1 0x7ffff5995965 VIEWcreate() gdk_align.c:373
2 0x7ffff5999ef7 VIEWreset() gdk_align.c:660
3 0x7ffff5d748df BATsetaccess() gdk_bat.c:2584

That's all the facts I've got. Now I move on to unsubstantiated hypotheses.

How's this for a fix?

diff -r d106b7648549 gdk/gdk_align.c
--- a/gdk/gdk_align.c Tue Oct 07 13:26:20 2014 +0100
+++ b/gdk/gdk_align.c Wed Oct 08 12:23:24 2014 +0100
@@ -566,6 +566,12 @@
 			b->H->hash = NULL;
 		if (tpb && b->T->hash && b->T->hash == tpb->H->hash)
 			b->T->hash = NULL;
+
+		/* unlink imprints shared with parent */
+		if (hpb && b->H->imprints && b->H->imprints == hpb->H->imprints)
+			b->H->imprints = NULL;
+		if (tpb && b->T->imprints && b->T->imprints == tpb->H->imprints)
+			b->T->imprints = NULL;
 	}
 
 }
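The ownership rule behind this patch can be shown in isolation: a view may copy its parent's imprints pointer, but only the parent should free it, so the view drops any pointer it shares with its parent. The sketch below is a heavily simplified model, not MonetDB code; `struct column`, `struct bat`, `unlink_shared_imprints` and `destroy_imprints` are hypothetical, and it compares tail with tail, whereas the patch compares `b->T->imprints` with `tpb->H->imprints`:

```c
#include <stdlib.h>

/* Hypothetical stand-ins for a column and a BAT (possibly a view);
 * only the fields needed to show the ownership rule are kept. */
struct column {
    void *imprints;            /* heap-allocated index, or NULL */
};

struct bat {
    struct column *tail;
    struct bat *parent;        /* non-NULL if this BAT is a view */
};

/* Same idea as the patch: forget an imprints pointer that the view
 * merely shares with its parent, so only the parent frees it. */
static void unlink_shared_imprints(struct bat *view)
{
    struct bat *p = view->parent;
    if (p && view->tail->imprints &&
        view->tail->imprints == p->tail->imprints)
        view->tail->imprints = NULL;
}

/* Frees whatever imprints the BAT still holds (free(NULL) is a no-op). */
static void destroy_imprints(struct bat *b)
{
    free(b->tail->imprints);
    b->tail->imprints = NULL;
}
```

With the unlink step, destroying the view and then the parent frees the index exactly once; without it, both calls would free the same pointer, which is the pattern visible in the log.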

Comment 20264

Date: 2014-10-08 13:32:54 +0200
From: Richard Hughes <<richard.monetdb>>

Created attachment 301
history of imprint 0x6070000a0e00

Hmm, Bugzilla mangled my line breaks in the data dump above. Here's a copy of the same text as an attachment.

Attached file: monetdb-bug3599-imprints-6070000a0e00.txt (text/plain, 3676 bytes)
Description: history of imprint 0x6070000a0e00

Comment 20270

Date: 2014-10-08 18:34:41 +0200
From: MonetDB Mercurial Repository <>

Changeset 108f44871d45 made by Sjoerd Mullender sjoerd@acm.org in the MonetDB repo, refers to this bug.

For complete details, see http://dev.monetdb.org/hg/MonetDB?cmd=changeset;node=108f44871d45

Changeset description:

Unlink imprints shared with parent.
Patch submitted by Richard Hughes.
This fixes bug #3599.

Comment 20271

Date: 2014-10-08 18:35:23 +0200
From: @sjoerdmullender

Thanks Richard. Your analysis makes perfect sense. I have applied your patch.

Comment 20385

Date: 2014-10-31 14:14:40 +0100
From: @sjoerdmullender

Oct2014 has been released.
