Segfault on vacuum with parallel updates #4048

Closed
monetdb-team opened this issue Nov 30, 2020 · 0 comments

@monetdb-team monetdb-team commented Nov 30, 2020

Date: 2016-08-01 15:24:06 +0200
From: anthonin.bonnefoy
To: SQL devs <>
Version: 11.23.7 (Jun2016-SP1)
CC: frederic.jolliton+monetdb, @njnes, richard.monetdb, @yzchang

Last updated: 2017-01-26 14:56:27 +0100

Comment 22275

Date: 2016-08-01 15:24:06 +0200
From: anthonin.bonnefoy

User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:45.0) Gecko/20100101 Firefox/45.0
Build Identifier:

Calling vacuum while updates run in parallel can lead to a segfault and can even corrupt the database (depending on which table is vacuumed).

Reproducible: Always

Steps to Reproduce:

Launch the given script:


#!/bin/bash
set -e

mclient nova -s "create schema sact;" 2> /dev/null || true

line="9GJ3152\t1467287703373954\t1467287703759937\t3\t3\t10\t\t62438190489824\t116350668306\t3\t0\t3232238295\t\t3223098188\t\t\t\t55460\t443\t0\t0\t0\t0\t52\t52\t\t46\tEthernet/IPv4/TCP\t21ae5e54-637e-405f-99f5-41b93d9d769a\t258\t132\t4\t2\t0\t0\t1\t1\t0\t0\t0\t0\t0\t1\t1\t0\t0\t0\t1\t0\t0\t0\t0\t0\t0\t0\t0\t0\t0\t1\t274\t75076\t1\t150529\t22658979841\t0\t0\t0\t0\t0\t0\t0\t0\t0\t1\t605119\t366169004161\t\n"
rm -f /tmp/data
for (( i = 0; i < 10; i++ )); do
echo -e -n $line >> /tmp/data
done

function drop_tables() {
echo "Launch drop of tables"
tables=$(mclient nova -f csv -s "select name from _tables where schema_id > 5000 ;")
drop_query=""
local i=0
for table in $tables ; do
drop_query="$drop_query drop table sact.$table;"
i=$((i + 1))
if [[ $i == 25 ]]; then
break
fi
done
mclient nova -a -s "$drop_query; commit; rollback;" > /dev/null
}

function update_tables() {
echo "Launch update"
tables=$(mclient nova -f csv -s "select name from _tables where schema_id > 5000 order by id desc;")
local i=0
for table in $tables ; do
mclient nova -s "update sact.$table set toto1 = 'C8CCF6D5-A7B7-449E-911E-5D9082D73640' where toto1 = '9GJ3152';" > /dev/null
if [[ $i == 10 ]]; then
break
fi
i=$((i + 1))
done
echo "Update finished"
}

drop_tables

for (( i = 0; i < 10000; i++ )); do
table="sact.test_$i"

 create_query="$create_query CREATE TABLE $table (toto1 TEXT, toto2 BIGINT, toto3 BIGINT, toto4 SMALLINT, toto5 SMALLINT, toto6 INT, toto7 INT, toto8 BIGINT, toto9 BIGINT, toto10 INT, toto11 INT, toto12 BIGINT, toto13 HUGEINT, toto14 BIGINT, toto15 HUGEINT, toto16 BIGINT, toto17 HUGEINT, toto18 INT, toto19 INT, toto20 SMALLINT, toto21 SMALLINT, toto22 SMALLINT, toto23 SMALLINT, toto24 INT, toto25 INT, toto26 TEXT, toto27 INT, toto28 TEXT, toto29 UUID, toto30 BIGINT, toto31 BIGINT, toto32 BIGINT, toto33 BIGINT, toto34 BIGINT, toto35 BIGINT, toto36 BIGINT, toto37 BIGINT, toto38 BIGINT, toto39 BIGINT, toto40 BIGINT, toto41 BIGINT, toto42 BIGINT, toto43 BIGINT, toto44 BIGINT, toto45 BIGINT, toto46 BIGINT, toto47 BIGINT, toto48 BIGINT, toto49 BIGINT, toto50 BIGINT, toto51 INT, toto52 INT, toto53 BIGINT, toto54 BIGINT, toto55 HUGEINT, toto56 BIGINT, toto57 BIGINT, toto58 HUGEINT, toto59 BIGINT, toto60 BIGINT, toto61 HUGEINT, toto62 BIGINT, toto63 BIGINT, toto64 HUGEINT, toto65 BIGINT, toto66 BIGINT, toto67 HUGEINT, toto68 BIGINT, toto69 BIGINT, toto70 HUGEINT, toto71 BIGINT, toto72 BIGINT, toto73 HUGEINT, toto74 BIGINT, toto75 BIGINT, toto76 HUGEINT, toto77 UUID);"
 create_query="$create_query COPY INTO $table FROM '/tmp/data' DELIMITERS '\t','\n','\"' NULL AS '<NULL>';"

 if [[ $((i % 4)) == 0 ]]; then
     echo "Launching create at $i"
     mclient nova -a -s "$create_query; commit; rollback;" > /dev/null
     create_query=""
 fi

 if [[ $((i % 10)) == 0 ]]; then
     update_tables &
 fi

 if [[ $((i % 20)) == 0 ]]; then
     drop_tables
 fi

  if [[ $((i % 30)) == 0 ]]; then
      mclient nova -s "call sys.vacuum('sys', '_tables');" 2> /dev/null || true
      mclient nova -s "call sys.vacuum('sys', '_columns');" 2> /dev/null || true
  fi

done
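
To make the interleaving in the loop above easier to follow, here is a minimal sketch that lists which operations the script triggers at a given iteration. The function name `actions_for` is hypothetical and not part of the original script; it only mirrors the modulo conditions of the loop and does not talk to the server.

```shell
#!/bin/bash
# Hypothetical helper (not part of the original script): prints the
# operations the reproduction loop triggers at iteration $1, using the
# same modulo conditions and the same order as the loop.
actions_for() {
    local i=$1
    local actions=()
    (( i % 4 == 0 ))  && actions+=("create")   # flush batched CREATE TABLE + COPY INTO
    (( i % 10 == 0 )) && actions+=("update")   # update_tables, started in the background
    (( i % 20 == 0 )) && actions+=("drop")     # drop_tables
    (( i % 30 == 0 )) && actions+=("vacuum")   # sys.vacuum on _tables and _columns
    echo "${actions[*]}"
}

for i in 0 30 60; do
    echo "$i: $(actions_for "$i")"
done
```

At iteration 30, for example, the vacuum calls start right after update_tables has been launched in the background, so the two can overlap.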

Actual Results:

With only _columns vacuumed, I got a double free:

0 0x00007fa9248ad067 in __GI_raise (sig=sig@entry=6) at ../nptl/sysdeps/unix/sysv/linux/raise.c:56
1 0x00007fa9248ae448 in __GI_abort () at abort.c:89
2 0x00007fa9248eb1b4 in __libc_message (do_abort=do_abort@entry=1, fmt=fmt@entry=0x7fa9249e0530 "*** Error in `%s': %s: 0x%s ***\n") at ../sysdeps/posix/libc_fatal.c:175
3 0x00007fa9248f098e in malloc_printerr (action=1, str=0x7fa9249e0638 "double free or corruption (!prev)", ptr=) at malloc.c:4996
4 0x00007fa9248f1696 in _int_free (av=, p=, have_lock=0) at malloc.c:3840
5 0x00007fa926082860 in GDKfree (blk=0x7fa904497980) at gdk_utils.c:748
6 0x00007fa9260194b9 in HEAPfree (h=0x7fa9044fb7a8, remove=1) at gdk_heap.c:564
7 0x00007fa9261196f7 in BATdelete (b=0x7fa9044fb680) at gdk_storage.c:963
8 0x00007fa926011ca0 in BBPdestroy (b=0x7fa9044fb680) at gdk_bbp.c:2637
9 0x00007fa92600faed in decref (i=9647, logical=1, releaseShare=0, lock=1) at gdk_bbp.c:2381
10 0x00007fa92600fcc2 in BBPdecref (i=9647, logical=1) at gdk_bbp.c:2412
11 0x00007fa91f7e059a in vacuum (cntxt=0x7fa91fdd0330, mb=0x7fa9046d5690, stk=0x7fa904248e00, pci=0x7fa9042d6c10, func=0x7fa9267251dd , name=0x7fa91f939488 "sql.reuse") at sql.c:4549
12 0x00007fa91f7e0692 in SQLreuse (cntxt=0x7fa91fdd0330, mb=0x7fa9046d5690, stk=0x7fa904248e00, pci=0x7fa9042d6c10) at sql.c:4564
13 0x00007fa91f7e0aaf in SQLvacuum (cntxt=0x7fa91fdd0330, mb=0x7fa9046d5690, stk=0x7fa904248e00, pci=0x7fa9042d6c10) at sql.c:4629
14 0x00007fa9266a57eb in runMALsequence (cntxt=0x7fa91fdd0330, mb=0x7fa9046d5690, startpc=1, stoppc=0, stk=0x7fa904248e00, env=0x0, pcicaller=0x0) at mal_interpreter.c:631
15 0x00007fa9266a4c8b in callMAL (cntxt=0x7fa91fdd0330, mb=0x7fa9046d5690, env=0x7fa91de6cae8, argv=0x7fa91de6cb40, debug=0 '\000') at mal_interpreter.c:447
16 0x00007fa91f7ea8fe in SQLexecutePrepared (c=0x7fa91fdd0330, be=0x7fa9041805b0, q=0x7fa9042c7f80) at sql_execute.c:328
17 0x00007fa91f7ead54 in SQLengineIntern (c=0x7fa91fdd0330, be=0x7fa9041805b0) at sql_execute.c:390
18 0x00007fa91f7e98f6 in SQLengine (c=0x7fa91fdd0330) at sql_scenario.c:1323
19 0x00007fa9266c63b6 in runPhase (c=0x7fa91fdd0330, phase=4) at mal_scenario.c:515
20 0x00007fa9266c6594 in runScenarioBody (c=0x7fa91fdd0330) at mal_scenario.c:559
21 0x00007fa9266c66a4 in runScenario (c=0x7fa91fdd0330) at mal_scenario.c:579
22 0x00007fa9266c777a in MSserveClient (dummy=0x7fa91fdd0330) at mal_session.c:439
23 0x00007fa9266c72e9 in MSscheduleClient (command=0x7fa90426d850 "", challenge=0x7fa91de6ce70 "0FK6fYQkzIq", fin=0x7fa904282c30, fout=0x7fa904922810) at mal_session.c:319
24 0x00007fa926784f52 in doChallenge (data=0x7fa9100008d0) at mal_mapi.c:184
25 0x00007fa92613ce85 in thread_starter (arg=0x7fa910000d20) at gdk_system.c:459
26 0x00007fa924c2b0a4 in start_thread (arg=0x7fa91de6d700) at pthread_create.c:309
27 0x00007fa92496087d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:111

With only _tables vacuumed, I get the segfault from bug 4047:

0 0x00007f2d28983498 in count_col (tr=0x23bf3b0, c=0x7f2cf528a2d0, all=1) at bat_storage.c:826
1 0x00007f2d2887e57b in SQLgetStatistics (cntxt=0x7f2d28e3a640, m=0x7f2cf8d46490, mb=0x7f2cf8a72f70) at sql_optimizer.c:168
2 0x00007f2d2887eabd in addOptimizers (c=0x7f2d28e3a640, mb=0x7f2cf8a72f70, pipe=0x7f2cf8192eb0 "default_pipe") at sql_optimizer.c:251
3 0x00007f2d2887ec78 in addQueryToCache (c=0x7f2d28e3a640) at sql_optimizer.c:293
4 0x00007f2d2887ccff in backend_dumpproc (be=0x7f2cf93dde20, c=0x7f2d28e3a640, cq=0x7f2cf8e53e10, s=0x7f2cf8a5b060) at sql_gencode.c:2815
5 0x00007f2d2884d2fa in SQLparser (c=0x7f2d28e3a640) at sql_scenario.c:1216
6 0x00007f2d2f72a3b6 in runPhase (c=0x7f2d28e3a640, phase=1) at mal_scenario.c:515
7 0x00007f2d2f72a501 in runScenarioBody (c=0x7f2d28e3a640) at mal_scenario.c:550
8 0x00007f2d2f72a6a4 in runScenario (c=0x7f2d28e3a640) at mal_scenario.c:579
9 0x00007f2d2f72b77a in MSserveClient (dummy=0x7f2d28e3a640) at mal_session.c:439
10 0x00007f2d2f72b2e9 in MSscheduleClient (command=0x7f2cf817cfd0 "", challenge=0x7f2d26ed0e70 "cF8qOyXd", fin=0x7f2cf8a7ce10, fout=0x7f2cf90554b0) at mal_session.c:319
11 0x00007f2d2f7e8f52 in doChallenge (data=0x7f2d18000a60) at mal_mapi.c:184
12 0x00007f2d2f1a0e85 in thread_starter (arg=0x7f2d18000d50) at gdk_system.c:459
13 0x00007f2d2dc8f0a4 in start_thread (arg=0x7f2d26ed1700) at pthread_create.c:309
14 0x00007f2d2d9c487d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:111

And mclient then fails with the following message:

CREATE FILTER FUNCTION: name 'like' (clob, clob, clob) already in use
ParseException:SQLparser:CREATE FILTER FUNCTION: name 'like' (clob, clob, clob) already in use

CREATE FILTER FUNCTION: name 'like' (clob, clob, clob) already in use
ParseException:SQLparser:CREATE FILTER FUNCTION: name 'like' (clob, clob, clob) already in use

When manually inspecting the content of the name column of sys._tables, I find:

sys._tables name
Tail: 06/655.tail

h t name
void str type

[ 0@0, "schemas" ]
[ 1@0, "types" ]
[ 2@0, "functions" ]
[ 3@0, "args" ]
[ 4@0, "sequences" ]
[ 5@0, "dependencies" ]
[ 6@0, "connections" ]
[ 7@0, "_tables" ]
[ 8@0, "_columns" ]
[ 9@0, "keys" ]
[ 10@0, "idxs" ]
[ 11@0, "triggers" ]
[ 12@0, "objects" ]
[ 13@0, "_tables" ]
[ 14@0, "_columns" ]
[ 15@0, "keys" ]
[ 16@0, "idxs" ]
[ 17@0, "triggers" ]
[ 18@0, "objects" ]
[ 19@0, "tables" ]
[ 20@0, "columns" ]
[ 21@0, "db_user_info" ]
[ 22@0, "users" ]
[ 23@0, "user_role" ]
[ 24@0, "auths" ]
[ 25@0, "privileges" ]
[ 26@0, "querylog_catalog" ]
[ 27@0, "querylog_calls" ]
[ 28@0, "querylog_history" ]
[ 29@0, "tracelog" ]
[ 30@0, "sessions" ]
[ 31@0, "optimizers" ]
[ 32@0, "environment" ]
[ 33@0, "queue" ]
[ 34@0, "rejects" ]
[ 35@0, "keywords" ]
[ 36@0, "table_types" ]
[ 37@0, "dependency_types" ]
[ 38@0, "storage" ]
[ 39@0, "storagemodelinput" ]
[ 40@0, "test_568" ]
[ 41@0, "test_567" ]
[ 42@0, "test_566" ]
[ 43@0, "test_565" ]
[ 44@0, "test_564" ]

Tables storagemodel, tablestoragemodel, statistics and systemfunctions are missing.
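
The listing above can also be checked mechanically. As a minimal sketch, assuming the names arrive one per line (for example from `mclient nova -f csv -s "select name from sys._tables;"`, the same invocation style the script uses), the following reports names that occur more than once and expected names that are absent. The function names and the shortened sample input are illustrative only.

```shell
#!/bin/bash
# Hypothetical helpers; both read table names on stdin, one per line.

# Print every name that appears more than once (C locale for a stable order).
duplicate_names() {
    LC_ALL=C sort | uniq -d
}

# Print each expected name (given as arguments) that is absent from stdin.
missing_names() {
    local names expected
    names=$(cat)
    for expected in "$@"; do
        grep -qxF -- "$expected" <<< "$names" || echo "$expected"
    done
}

# Shortened sample taken from the listing above.
sample=$'schemas\n_tables\n_columns\nkeys\n_tables\n_columns\nkeys\nstorage'
echo "duplicates:"
duplicate_names <<< "$sample"
echo "missing:"
missing_names statistics storage systemfunctions <<< "$sample"
```

On the sample, the duplicate check flags _columns, _tables and keys, and the missing check flags statistics and systemfunctions.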

Comment 22313

Date: 2016-08-19 17:02:06 +0200
From: @njnes

Is this also the case on Jun2016?

Comment 22329

Date: 2016-08-29 14:48:39 +0200
From: Frédéric Jolliton <<frederic.jolliton+monetdb>>

Jun2016 shows the exact same stacktrace, pretty quickly, using the script.

The tested version:

MonetDB 5 server v11.23.8 (64-bit, 64-bit oids, 128-bit integers)
This is an unreleased version
Copyright (c) 1993-July 2008 CWI
Copyright (c) August 2008-2016 MonetDB B.V., all rights reserved
Visit http://www.monetdb.org/ for further information
Found 15.6GiB available memory, 4 available cpu cores
Libraries:
libpcre: 8.39 2016-06-14 (compiled with 8.39)
openssl: OpenSSL 1.0.2h 3 May 2016 (compiled with OpenSSL 1.0.2h 3 May 2016)
libxml2: 2.9.4 (compiled with 2.9.4)
Compiled by: fjolliton@localhost (x86_64-pc-linux-gnu)
Compilation: gcc -O3 -fomit-frame-pointer -pipe -Werror -Wall -Wextra -W -Werror-implicit-function-declaration -Wpointer-arith -Wdeclaration-after-statement -Wundef -Wformat=2 -Wno-format-nonliteral -Winit-self -Winvalid-pch -Wmissing-declarations -Wmissing-format-attribute -Wmissing-prototypes -Wold-style-definition -Wpacked -Wunknown-pragmas -Wvariadic-macros -fstack-protector-all -Wstack-protector -Wpacked-bitfield-compat -Wsync-nand -Wjump-misses-init -Wmissing-include-dirs -Wlogical-op -Wunreachable-code -D_FORTIFY_SOURCE=2
Linking : /usr/bin/ld -m elf_x86_64

Comment 22335

Date: 2016-08-29 15:16:29 +0200
From: Frédéric Jolliton <<frederic.jolliton+monetdb>>

Changing version.

Comment 24811

Date: 2016-12-18 21:04:40 +0100
From: @njnes

We now disallow vacuum on system tables. The vacuum function isn't safe enough for these tables; a better vacuum solution for the system tables is needed.
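
On a server that does not yet have this restriction, a client-side guard can serve a similar purpose. This is only a sketch under stated assumptions: `safe_vacuum` and `is_system_schema` are hypothetical names, not part of MonetDB, and treating every table in schema `sys` as a system table is a simplification. The `mclient` call is the one used in the reproduction script, with `nova` as the database name from that script.

```shell
#!/bin/bash
# Hypothetical client-side guard, not part of MonetDB.
# Assumption: every table in schema "sys" is treated as a system table.
is_system_schema() {
    [[ $1 == "sys" ]]
}

# Refuse to vacuum system tables; otherwise issue the same call the
# reproduction script uses.
safe_vacuum() {
    local schema=$1 table=$2
    if is_system_schema "$schema"; then
        echo "refusing to vacuum system table $schema.$table" >&2
        return 1
    fi
    mclient nova -s "call sys.vacuum('$schema', '$table');"
}

# The guard returns before mclient is ever invoked for sys tables.
safe_vacuum sys _tables || echo "skipped"
```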

Comment 24813

Date: 2016-12-19 12:26:24 +0100
From: Richard Hughes <<richard.monetdb>>

Did 4e604f0989bc introduce a new bug?

If ordered and 0 < BATcount(del) <= cnt/20 then BBPunfix(del->batCacheid) will get called twice. Is this a double-free or is there some other protection in place that I don't know about?

[This is from reading the patch - I haven't tried executing it].

Comment 24814

Date: 2016-12-19 12:43:54 +0100
From: @sjoerdmullender

(In reply to Richard Hughes from comment 5)

> Did 4e604f0989bc introduce a new bug?
>
> If ordered and 0 < BATcount(del) <= cnt/20 then BBPunfix(del->batCacheid)
> will get called twice. Is this a double-free or is there some other
> protection in place that I don't know about?
>
> [This is from reading the patch - I haven't tried executing it].

Looks like you're right. And also del is dereferenced after the unfix.

Comment 24838

Date: 2016-12-20 10:59:39 +0100
From: @yzchang

(In reply to Sjoerd Mullender from comment 6)

> (In reply to Richard Hughes from comment 5)
>
>> Did 4e604f0989bc introduce a new bug?
>>
>> If ordered and 0 < BATcount(del) <= cnt/20 then BBPunfix(del->batCacheid)
>> will get called twice. Is this a double-free or is there some other
>> protection in place that I don't know about?
>>
>> [This is from reading the patch - I haven't tried executing it].
>
> Looks like you're right. And also del is dereferenced after the unfix.

Just for the record, this is fixed in changeset 9931514f6477.

(In reply to the original bug report)
One problem with the current vacuum function is that, after vacuuming, it doesn't update the foreign keys accordingly. That makes applying vacuum to the system tables particularly dangerous.

With Changeset: 8f3ba20b071e we now automatically vacuum the SQL catalogue tables in the background, when the DB is idle.

Comment 24916

Date: 2017-01-26 14:56:27 +0100
From: @kutsurak

Fixed in version Dec2016-SP1.
