Date: 2015-10-28 17:04:36 +0100
From: Kevin Boulain <<kevin.boulain>>
To: SQL devs <>
Version: 11.21.5 (Jul2015)
CC: frederic.jolliton+monetdb, @njnes
Last updated: 2016-01-15 11:38:02 +0100
Comment 21416
Date: 2015-10-28 17:04:36 +0100
From: Kevin Boulain <<kevin.boulain>>
User-Agent: Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:39.0) Gecko/20100101 Firefox/39.0
Build Identifier:
Using a savepoint in a session may crash the database if no commit is done and another session is opened afterwards.
From the merovingian.log:
[...]
2015-10-28 16:52:17 MSG db[2428]: loading sql script: 99_system.sql
2015-10-28 16:52:17 MSG merovingian[2420]: proxying client (local) for database 'db' to mapi:monetdb:///tmp/farm/db/.mapi.sock?database=db
2015-10-28 16:52:17 MSG merovingian[2420]: target connection is on local UNIX domain socket, passing on filedescriptor instead of proxying
2015-10-28 16:52:17 MSG merovingian[2420]: proxying client (local) for database 'db' to mapi:monetdb:///tmp/farm/db/.mapi.sock?database=db
2015-10-28 16:52:17 MSG merovingian[2420]: target connection is on local UNIX domain socket, passing on filedescriptor instead of proxying
2015-10-28 16:52:17 ERR db[2428]: *** Error in `/data/securactive/main/sact.nova/parts/monetdb/bin/mserver5': free(): invalid pointer: 0x00007fc70c162fd0 ***
2015-10-28 16:52:17 MSG merovingian[2420]: database 'db' (2428) was killed by signal SIGABRT
The first connection executed a simple table creation within a savepoint; the second connection immediately crashed the database.
Reproducible: Always
Steps to Reproduce:
The following shell script lists all the steps to reproduce:
#!/bin/sh
farm=/tmp/farm
db=db
# usual setup
monetdbd create "$farm"
monetdbd start "$farm"
monetdb create "$db"
monetdb release "$db"
# trigger the crash
mclient -a "$db" << SQL
SAVEPOINT failsafe;
-- need to do something
create table blublu (x int);
RELEASE SAVEPOINT failsafe;
-- do not commit
SQL
mclient -a "$db" # reconnect to trigger the crash
# cleanup
tail "$farm/merovingian.log"
monetdbd stop "$farm"
rm -rf "$farm"
Actual Results:
Database is crashing.
Expected Results:
Database should not crash?
Reproduced on rel-Jul2015 (commit was: http://dev.monetdb.org/hg/MonetDB/rev/1290110df036)
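The reproduction steps above could be turned into a pass/fail check along the following lines. This is only a sketch: `MCLIENT`, `db` and `check_savepoint_crash` are names made up for illustration, and it assumes mclient's `-d` (database) and `-s` (statement) options for the follow-up query, which the original script does not use.

```shell
#!/bin/sh
# Sketch of a regression check for the savepoint crash described above.
# MCLIENT, db and check_savepoint_crash are illustrative names, not MonetDB tooling.
MCLIENT=${MCLIENT:-mclient}
db=${db:-db}

check_savepoint_crash() {
    # Session 1: autocommit off (-a); savepoint, DDL, release, and no COMMIT.
    "$MCLIENT" -a "$db" >/dev/null 2>&1 <<SQL
SAVEPOINT failsafe;
create table blublu (x int);
RELEASE SAVEPOINT failsafe;
SQL
    # Session 2: in the report, a fresh connection is what brought the server down.
    if "$MCLIENT" -d "$db" -s "SELECT 1;" >/dev/null 2>&1; then
        echo "ok: server still answers"
        return 0
    fi
    echo "FAIL: server did not answer after the savepoint session"
    return 1
}
```

Calling `check_savepoint_crash` against a running farm returns a non-zero status when the second connection fails, so it can be used directly in a test harness. It cannot distinguish a crash from a server that was never started.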
Comment 21425
Date: 2015-10-30 22:38:33 +0100
From: @njnes
I cannot reproduce this on the Jul2015 branch.
Comment 21426
Date: 2015-11-02 11:01:33 +0100
From: Frédéric Jolliton <<frederic.jolliton+monetdb>>
In addition to what my coworker said, I can confirm that I also get the crash in a different environment (a different Linux distribution, Gentoo, and different tool versions). So I'm reopening this bug. We can provide more details if necessary.
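I've performed the following steps:
took the most recent Jul2015 branch
./bootstrap
./configure --prefix=/some/where (no other flags)
make -j8
make install
cd /some/where
bin/monetdbd create /tmp/bug3840
bin/monetdbd set port=53000 /tmp/bug3840
bin/monetdbd start /tmp/bug3840
bin/monetdb -p 53000 create db
bin/monetdb -p 53000 start db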
The rest produces the same thing that Kevin described:
fjolliton@workstation $ bin/mclient -p 53000 -a db
Welcome to mclient, the MonetDB/SQL interactive terminal (unreleased)
Database: MonetDB v11.21.12 (unreleased), 'mapi:monetdb://workstation:53000/db'
Type \q to quit, ? for a list of available commands
auto commit mode: off
sql>SAVEPOINT failsafe;
auto commit mode: off
sql>create table blublu (x int);
operation successful (0.647ms)
sql>RELEASE SAVEPOINT failsafe;
auto commit mode: off
sql>^D
fjolliton@workstation $ bin/mclient -p 53000 -a db
<NOTHING - MonetDB crashed>
fjolliton@workstation $ tail /tmp/bug3840/merovingian.log
2015-11-02 09:43:10 MSG db[16208]: loading sql script: 80_udf.sql
2015-11-02 09:43:10 MSG db[16208]: loading sql script: 80_udf_hge.sql
2015-11-02 09:43:10 MSG db[16208]: loading sql script: 90_generator.sql
2015-11-02 09:43:10 MSG db[16208]: loading sql script: 90_generator_hge.sql
2015-11-02 09:43:10 MSG db[16208]: loading sql script: 99_system.sql
2015-11-02 09:43:24 MSG merovingian[16157]: proxying client (local) for database 'db' to mapi:monetdb:///tmp/bug3840/db/.mapi.sock?database=db
2015-11-02 09:43:24 MSG merovingian[16157]: target connection is on local UNIX domain socket, passing on filedescriptor instead of proxying
2015-11-02 09:43:45 MSG merovingian[16157]: proxying client (local) for database 'db' to mapi:monetdb:///tmp/bug3840/db/.mapi.sock?database=db
2015-11-02 09:43:45 MSG merovingian[16157]: target connection is on local UNIX domain socket, passing on filedescriptor instead of proxying
2015-11-02 09:43:45 MSG merovingian[16157]: database 'db' (16208) was killed by signal SIGSEGV
$ bin/mserver5 --version
MonetDB 5 server v11.21.12 (64-bit, 64-bit oids, 128-bit integers)
This is an unreleased version
Copyright (c) 1993-July 2008 CWI
Copyright (c) August 2008-2015 MonetDB B.V., all rights reserved
Visit http://www.monetdb.org/ for further information
Found 15.6GiB available memory, 4 available cpu cores
Libraries:
libpcre: 8.36 2014-09-26 (compiled with 8.36)
openssl: OpenSSL 1.0.1g 7 Apr 2014 (compiled with OpenSSL 1.0.1g 7 Apr 2014)
libxml2: 2.9.1 (compiled with 2.9.1)
Compiled by: fjolliton@workstation (x86_64-unknown-linux-gnu)
Compilation: gcc -g -Werror -Wall -Wextra -W -Werror-implicit-function-declaration -Wpointer-arith -Wdeclaration-after-statement -Wundef -Wformat=2 -Wno-format-nonliteral -Winit-self -Winvalid-pch -Wmissing-declarations -Wmissing-format-attribute -Wmissing-prototypes -Wold-style-definition -Wpacked -Wunknown-pragmas -Wvariadic-macros -fstack-protector-all -Wstack-protector -Wpacked-bitfield-compat -Wsync-nand -Wjump-misses-init -Wmissing-include-dirs -Wlogical-op -Wunreachable-code
Linking : /usr/x86_64-pc-linux-gnu/bin/ld -m elf_x86_64
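Both log excerpts in this report end with a "was killed by signal" line (SIGABRT in one environment, SIGSEGV in the other). A small helper like the one below could pull those lines out of a merovingian.log when comparing runs. `list_crashes` is a hypothetical name, and the pattern is written only against the log format quoted in this report.

```shell
#!/bin/sh
# list_crashes is a hypothetical helper; the pattern mirrors the log lines quoted above.
list_crashes() {
    # Prints "<date> <time> <database> <signal>" for each crash recorded in the log file $1.
    sed -n "s/^\([0-9-]* [0-9:]*\) MSG merovingian\[[0-9]*\]: database '\([^']*\)' ([0-9]*) was killed by signal \(SIG[A-Z]*\)\$/\1 \2 \3/p" "$1"
}
```

For example, `list_crashes /tmp/bug3840/merovingian.log` on the log shown above would print `2015-11-02 09:43:45 db SIGSEGV`.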
Comment 21535
Date: 2015-11-17 12:53:16 +0100
From: Frédéric Jolliton <<frederic.jolliton+monetdb>>
Some update on this bug.
We upgraded our test database to the latest Jul2015 (as of today, November 17th) and we still get the same crash. We build it from scratch each time to ensure that we do not rely on an unclean state.
Are you confirming that no crash occurs on your side on the latest Jul2015?
Comment 21558
Date: 2015-11-19 17:29:06 +0100
From: @sjoerdmullender
I was able to reproduce this. Looks like a double free. Here is the stack trace; note the address given to GDKfree.
#0  0x00007ffff7277bac in GDKfree (blk=0xdbdbdbdbdbdbdbdb)
    at /ufs/sjoerd/src/MonetDB/stable/gdk/gdk_utils.c:724
#1  0x00007fffe9088fb7 in destroy_dbat (tr=0x0, bat=0x7fffd4229b80)
    at /ufs/sjoerd/src/MonetDB/stable/sql/storage/bat/bat_storage.c:1392
#2  0x00007fffe908911b in destroy_del (tr=0x0, t=0x1f9e260)
    at /ufs/sjoerd/src/MonetDB/stable/sql/storage/bat/bat_storage.c:1414
#3  0x00007fffe9076270 in reset_table (tr=0x1eaa1e0, ft=0x1f9e260,
    pft=0x1f93c70) at /ufs/sjoerd/src/MonetDB/stable/sql/storage/store.c:3162
#4  0x00007fffe9075bf2 in reset_changeset (tr=0x1eaa1e0, fs=0x1f9abc0,
    pfs=0x1f90670, b=0x1f9ab90, rf=0x7fffe90761d0 <reset_table>,
    fd=0x7fffe90726f3 <table_dup>)
    at /ufs/sjoerd/src/MonetDB/stable/sql/storage/store.c:3048
#5  0x00007fffe907678a in reset_schema (tr=0x1eaa1e0, fs=0x1f9ab90,
    pfs=0x1f90640) at /ufs/sjoerd/src/MonetDB/stable/sql/storage/store.c:3240
#6  0x00007fffe9075bf2 in reset_changeset (tr=0x1eaa1e0, fs=0x1eaa210,
    pfs=0x1eaa180, b=0x1eaa150, rf=0x7fffe90764bc <reset_schema>,
    fd=0x7fffe9073282 <schema_dup>)
    at /ufs/sjoerd/src/MonetDB/stable/sql/storage/store.c:3048
#7  0x00007fffe907688e in reset_trans (tr=0x1eaa1e0, ptr=0x1eaa150)
    at /ufs/sjoerd/src/MonetDB/stable/sql/storage/store.c:3257
#8  0x00007fffe907ed27 in sql_trans_begin (s=0x7fffd41b3080)
    at /ufs/sjoerd/src/MonetDB/stable/sql/storage/store.c:5187
#9  0x00007fffe8fe0939 in mvc_trans (m=0x7fffd41b3570)
    at /ufs/sjoerd/src/MonetDB/stable/sql/server/sql_mvc.c:169
#10 0x00007fffe8f28d2c in monet5_user_set_def_schema (m=0x7fffd41b3570, user=0)
    at /ufs/sjoerd/src/MonetDB/stable/sql/backends/monet5/sql_user.c:470
#11 0x00007fffe8f2ab26 in SQLinitClient (c=0x7fffea70a328)
    at /ufs/sjoerd/src/MonetDB/stable/sql/backends/monet5/sql_scenario.c:458
#12 0x00007ffff7929254 in runPhase (c=0x7fffea70a328, phase=5)
    at /ufs/sjoerd/src/MonetDB/stable/monetdb5/mal/mal_scenario.c:515
#13 0x00007ffff792936e in runScenarioBody (c=0x7fffea70a328)
    at /ufs/sjoerd/src/MonetDB/stable/monetdb5/mal/mal_scenario.c:542
#14 0x00007ffff79295a6 in runScenario (c=0x7fffea70a328)
    at /ufs/sjoerd/src/MonetDB/stable/monetdb5/mal/mal_scenario.c:579
#15 0x00007ffff792b057 in MSserveClient (dummy=0x7fffea70a328)
    at /ufs/sjoerd/src/MonetDB/stable/monetdb5/mal/mal_session.c:439
#16 0x00007ffff792aa50 in MSscheduleClient (command=0x7fffd41994d0 "",
    challenge=0x7fffdf813d70 "um3uQq5g", fin=0x7fffd413ea50,
    fout=0x7fffd416faf0)
    at /ufs/sjoerd/src/MonetDB/stable/monetdb5/mal/mal_session.c:319
#17 0x00007ffff7a1ab1a in doChallenge (data=0x7fffd0000a60)
    at /ufs/sjoerd/src/MonetDB/stable/monetdb5/modules/mal/mal_mapi.c:184
#18 0x00007ffff7340fe5 in thread_starter (arg=0x7fffd0000e70)
    at /ufs/sjoerd/src/MonetDB/stable/gdk/gdk_system.c:458
#19 0x00007ffff4abf555 in start_thread () from /lib64/libpthread.so.0
#20 0x00007ffff47fab9d in clone () from /lib64/libc.so.6
Comment 21559
Date: 2015-11-19 17:35:25 +0100
From: @sjoerdmullender
(In reply to Sjoerd Mullender from comment 4)
Comment 21693
Date: 2015-12-25 09:54:50 +0100
From: @njnes
The crash we saw was fixed recently.
Comment 21697
Date: 2015-12-29 15:06:19 +0100
From: Kevin Boulain <<kevin.boulain>>
Testing with the Jul2015 branch, we do not encounter the problem described here any more.
Do you know which particular commit fixed it? Is there a unit test for this special case?
Comment 21704
Date: 2015-12-31 02:13:47 +0100
From: @sjoerdmullender
(In reply to Kevin Boulain from comment 7)
According to hg bisect, that was changeset 93e7f9dbca06
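For readers unfamiliar with using hg bisect to find a fix rather than a regression, here is a sketch of how such a search might be driven. It is not necessarily how the changeset above was found. `HG`, `find_fixing_changeset` and the three arguments are illustrative, and the check command (which would have to rebuild the server and run the reproduction script) is left to the caller.

```shell
#!/bin/sh
# Sketch only: HG, find_fixing_changeset and its arguments are illustrative names.
HG=${HG:-hg}

find_fixing_changeset() {
    crashing_rev=$1   # a revision known to crash, e.g. 1290110df036 from the report
    fixed_rev=$2      # a later revision known not to crash
    check_cmd=$3      # must exit 0 while the crash still reproduces, 1 once it is gone

    # The labels are swapped on purpose: bisect searches for the first "bad"
    # revision, so "bad" here means "the crash no longer happens".
    "$HG" bisect --reset &&
    "$HG" bisect --good "$crashing_rev" &&
    "$HG" bisect --bad "$fixed_rev" &&
    "$HG" bisect --command "$check_cmd"
}
```

With `--command`, hg treats exit status 0 as good, 125 as skip, and other non-zero statuses as bad, which is why the check command has to report success while the crash is still present.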