savepoints may crash the database #3840

Closed
monetdb-team opened this issue Nov 30, 2020 · 0 comments

@monetdb-team monetdb-team commented Nov 30, 2020

Date: 2015-10-28 17:04:36 +0100
From: Kevin Boulain <<kevin.boulain>>
To: SQL devs <>
Version: 11.21.5 (Jul2015)
CC: frederic.jolliton+monetdb, @njnes

Last updated: 2016-01-15 11:38:02 +0100

Comment 21416

Date: 2015-10-28 17:04:36 +0100
From: Kevin Boulain <<kevin.boulain>>

User-Agent: Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:39.0) Gecko/20100101 Firefox/39.0
Build Identifier:

Using a savepoint in a session may crash the database if no commit is done and another session is opened afterwards.

From the merovingian.log:
[...]
2015-10-28 16:52:17 MSG db[2428]: loading sql script: 99_system.sql
2015-10-28 16:52:17 MSG merovingian[2420]: proxying client (local) for database 'db' to mapi:monetdb:///tmp/farm/db/.mapi.sock?database=db
2015-10-28 16:52:17 MSG merovingian[2420]: target connection is on local UNIX domain socket, passing on filedescriptor instead of proxying
2015-10-28 16:52:17 MSG merovingian[2420]: proxying client (local) for database 'db' to mapi:monetdb:///tmp/farm/db/.mapi.sock?database=db
2015-10-28 16:52:17 MSG merovingian[2420]: target connection is on local UNIX domain socket, passing on filedescriptor instead of proxying
2015-10-28 16:52:17 ERR db[2428]: *** Error in `/data/securactive/main/sact.nova/parts/monetdb/bin/mserver5': free(): invalid pointer: 0x00007fc70c162fd0 ***
2015-10-28 16:52:17 MSG merovingian[2420]: database 'db' (2428) was killed by signal SIGABRT

The first connection executed a simple table creation inside a savepoint; the second connection immediately crashed the database.
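
The crash shows up in merovingian.log as a "was killed by signal" line, so it can be checked mechanically rather than by reading the tail of the log. The following is a minimal sketch, not part of the original report: the function name crash_signals is hypothetical, and the pattern assumes the log line format quoted above.

```sh
#!/bin/sh
# Hypothetical helper (not from the report): print the signal name of every
# recorded crash of one database, based on log lines of the form
#   ... database 'db' (2428) was killed by signal SIGABRT
# $1: path to merovingian.log   $2: database name
crash_signals() {
    log=$1
    db=$2
    # -n plus the trailing "p" prints only the lines where the substitution
    # matched; \1 is the captured signal name.
    sed -n -E "s/.*database '$db' \([0-9]+\) was killed by signal ([A-Z0-9]+).*/\1/p" "$log"
}
```

For example, `crash_signals /tmp/farm/merovingian.log db` prints nothing if the log holds no such line for that database, so a script can treat non-empty output as "the crash was reproduced".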

Reproducible: Always

Steps to Reproduce:

The following shell script lists all the steps to reproduce:
#!/bin/sh

farm=/tmp/farm
db=db

# usual setup
monetdbd create "$farm"
monetdbd start "$farm"
monetdb create "$db"
monetdb release "$db"

# trigger the crash
mclient -a "$db" << SQL
SAVEPOINT failsafe;
-- need to do something
create table blublu (x int);
RELEASE SAVEPOINT failsafe;
-- do not commit
SQL
mclient -a "$db" # reconnect to trigger the crash

# cleanup
tail "$farm/merovingian.log"
monetdbd stop "$farm"
rm -rf "$farm"

Actual Results:

Database is crashing.

Expected Results:

Database should not crash?

Reproduced on rel-Jul2015 (commit was: http://dev.monetdb.org/hg/MonetDB/rev/1290110df036)

Comment 21425

Date: 2015-10-30 22:38:33 +0100
From: @njnes

I cannot reproduce this on the Jul2015 branch.

Comment 21426

Date: 2015-11-02 11:01:33 +0100
From: Frédéric Jolliton <<frederic.jolliton+monetdb>>

In addition to what my coworker reported, I can confirm that I also get the crash, on a different Linux distribution (Gentoo) with different tool versions, so in a different environment. I'm therefore reopening this bug. We can provide more details if necessary.

I've performed the following steps:

  • took the most recent Jul2015 branch
  • ./bootstrap
  • ./configure --prefix=/some/where (no other flags)
  • make -j8
  • make install
  • cd /some/where
  • bin/monetdbd create /tmp/bug3840
  • bin/monetdbd set port=53000 /tmp/bug3840
  • bin/monetdbd start /tmp/bug3840
  • bin/monetdb -p 53000 create db
  • bin/monetdb -p 53000 start db

The rest produces the same thing that Kevin described:

fjolliton@workstation $ bin/mclient -p 53000 -a db
Welcome to mclient, the MonetDB/SQL interactive terminal (unreleased)
Database: MonetDB v11.21.12 (unreleased), 'mapi:monetdb://workstation:53000/db'
Type \q to quit, ? for a list of available commands
auto commit mode: off
sql>SAVEPOINT failsafe;
auto commit mode: off
sql>create table blublu (x int);
operation successful (0.647ms)
sql>RELEASE SAVEPOINT failsafe;
auto commit mode: off
sql>^D

fjolliton@workstation $ bin/mclient -p 53000 -a db
<NOTHING - MonetDB crashed>

fjolliton@workstation $ tail /tmp/bug3840/merovingian.log
2015-11-02 09:43:10 MSG db[16208]: loading sql script: 80_udf.sql
2015-11-02 09:43:10 MSG db[16208]: loading sql script: 80_udf_hge.sql
2015-11-02 09:43:10 MSG db[16208]: loading sql script: 90_generator.sql
2015-11-02 09:43:10 MSG db[16208]: loading sql script: 90_generator_hge.sql
2015-11-02 09:43:10 MSG db[16208]: loading sql script: 99_system.sql
2015-11-02 09:43:24 MSG merovingian[16157]: proxying client (local) for database 'db' to mapi:monetdb:///tmp/bug3840/db/.mapi.sock?database=db
2015-11-02 09:43:24 MSG merovingian[16157]: target connection is on local UNIX domain socket, passing on filedescriptor instead of proxying
2015-11-02 09:43:45 MSG merovingian[16157]: proxying client (local) for database 'db' to mapi:monetdb:///tmp/bug3840/db/.mapi.sock?database=db
2015-11-02 09:43:45 MSG merovingian[16157]: target connection is on local UNIX domain socket, passing on filedescriptor instead of proxying
2015-11-02 09:43:45 MSG merovingian[16157]: database 'db' (16208) was killed by signal SIGSEGV

$ bin/mserver5 --version
MonetDB 5 server v11.21.12 (64-bit, 64-bit oids, 128-bit integers)
This is an unreleased version
Copyright (c) 1993-July 2008 CWI
Copyright (c) August 2008-2015 MonetDB B.V., all rights reserved
Visit http://www.monetdb.org/ for further information
Found 15.6GiB available memory, 4 available cpu cores
Libraries:
libpcre: 8.36 2014-09-26 (compiled with 8.36)
openssl: OpenSSL 1.0.1g 7 Apr 2014 (compiled with OpenSSL 1.0.1g 7 Apr 2014)
libxml2: 2.9.1 (compiled with 2.9.1)
Compiled by: fjolliton@workstation (x86_64-unknown-linux-gnu)
Compilation: gcc -g -Werror -Wall -Wextra -W -Werror-implicit-function-declaration -Wpointer-arith -Wdeclaration-after-statement -Wundef -Wformat=2 -Wno-format-nonliteral -Winit-self -Winvalid-pch -Wmissing-declarations -Wmissing-format-attribute -Wmissing-prototypes -Wold-style-definition -Wpacked -Wunknown-pragmas -Wvariadic-macros -fstack-protector-all -Wstack-protector -Wpacked-bitfield-compat -Wsync-nand -Wjump-misses-init -Wmissing-include-dirs -Wlogical-op -Wunreachable-code
Linking : /usr/x86_64-pc-linux-gnu/bin/ld -m elf_x86_64

Comment 21535

Date: 2015-11-17 12:53:16 +0100
From: Frédéric Jolliton <<frederic.jolliton+monetdb>>

Some update on this bug.

We upgraded our test database to the latest Jul2015 (from today, November 17th) and we still have the same crash. We're building it from scratch each time to ensure that we do not rely on an unclean state.

Are you confirming that no crash occurs on your side on the latest Jul2015?

Comment 21558

Date: 2015-11-19 17:29:06 +0100
From: @sjoerdmullender

I was able to reproduce this. Looks like a double free. Here is the stack trace, not the address given to GDKfree.

#0  0x00007ffff7277bac in GDKfree (blk=0xdbdbdbdbdbdbdbdb)
    at /ufs/sjoerd/src/MonetDB/stable/gdk/gdk_utils.c:724
#1  0x00007fffe9088fb7 in destroy_dbat (tr=0x0, bat=0x7fffd4229b80)
    at /ufs/sjoerd/src/MonetDB/stable/sql/storage/bat/bat_storage.c:1392
#2  0x00007fffe908911b in destroy_del (tr=0x0, t=0x1f9e260)
    at /ufs/sjoerd/src/MonetDB/stable/sql/storage/bat/bat_storage.c:1414
#3  0x00007fffe9076270 in reset_table (tr=0x1eaa1e0, ft=0x1f9e260,
    pft=0x1f93c70) at /ufs/sjoerd/src/MonetDB/stable/sql/storage/store.c:3162
#4  0x00007fffe9075bf2 in reset_changeset (tr=0x1eaa1e0, fs=0x1f9abc0,
    pfs=0x1f90670, b=0x1f9ab90, rf=0x7fffe90761d0 <reset_table>,
    fd=0x7fffe90726f3 <table_dup>)
    at /ufs/sjoerd/src/MonetDB/stable/sql/storage/store.c:3048
#5  0x00007fffe907678a in reset_schema (tr=0x1eaa1e0, fs=0x1f9ab90,
    pfs=0x1f90640) at /ufs/sjoerd/src/MonetDB/stable/sql/storage/store.c:3240
#6  0x00007fffe9075bf2 in reset_changeset (tr=0x1eaa1e0, fs=0x1eaa210,
    pfs=0x1eaa180, b=0x1eaa150, rf=0x7fffe90764bc <reset_schema>,
    fd=0x7fffe9073282 <schema_dup>)
    at /ufs/sjoerd/src/MonetDB/stable/sql/storage/store.c:3048
#7  0x00007fffe907688e in reset_trans (tr=0x1eaa1e0, ptr=0x1eaa150)
    at /ufs/sjoerd/src/MonetDB/stable/sql/storage/store.c:3257
#8  0x00007fffe907ed27 in sql_trans_begin (s=0x7fffd41b3080)
    at /ufs/sjoerd/src/MonetDB/stable/sql/storage/store.c:5187
#9  0x00007fffe8fe0939 in mvc_trans (m=0x7fffd41b3570)
    at /ufs/sjoerd/src/MonetDB/stable/sql/server/sql_mvc.c:169
#10 0x00007fffe8f28d2c in monet5_user_set_def_schema (m=0x7fffd41b3570, user=0)
    at /ufs/sjoerd/src/MonetDB/stable/sql/backends/monet5/sql_user.c:470
#11 0x00007fffe8f2ab26 in SQLinitClient (c=0x7fffea70a328)
    at /ufs/sjoerd/src/MonetDB/stable/sql/backends/monet5/sql_scenario.c:458
#12 0x00007ffff7929254 in runPhase (c=0x7fffea70a328, phase=5)
    at /ufs/sjoerd/src/MonetDB/stable/monetdb5/mal/mal_scenario.c:515
#13 0x00007ffff792936e in runScenarioBody (c=0x7fffea70a328)
    at /ufs/sjoerd/src/MonetDB/stable/monetdb5/mal/mal_scenario.c:542
#14 0x00007ffff79295a6 in runScenario (c=0x7fffea70a328)
    at /ufs/sjoerd/src/MonetDB/stable/monetdb5/mal/mal_scenario.c:579
#15 0x00007ffff792b057 in MSserveClient (dummy=0x7fffea70a328)
    at /ufs/sjoerd/src/MonetDB/stable/monetdb5/mal/mal_session.c:439
#16 0x00007ffff792aa50 in MSscheduleClient (command=0x7fffd41994d0 "",
    challenge=0x7fffdf813d70 "um3uQq5g", fin=0x7fffd413ea50,
    fout=0x7fffd416faf0)
    at /ufs/sjoerd/src/MonetDB/stable/monetdb5/mal/mal_session.c:319
#17 0x00007ffff7a1ab1a in doChallenge (data=0x7fffd0000a60)
    at /ufs/sjoerd/src/MonetDB/stable/monetdb5/modules/mal/mal_mapi.c:184
#18 0x00007ffff7340fe5 in thread_starter (arg=0x7fffd0000e70)
    at /ufs/sjoerd/src/MonetDB/stable/gdk/gdk_system.c:458
#19 0x00007ffff4abf555 in start_thread () from /lib64/libpthread.so.0
#20 0x00007ffff47fab9d in clone () from /lib64/libc.so.6

Comment 21559

Date: 2015-11-19 17:35:25 +0100
From: @sjoerdmullender

(In reply to Sjoerd Mullender from comment 4)

> trace, not the address given to GDKfree.
s/not/note/

Comment 21693

Date: 2015-12-25 09:54:50 +0100
From: @njnes

The crash we saw was fixed recently.

Comment 21697

Date: 2015-12-29 15:06:19 +0100
From: Kevin Boulain <<kevin.boulain>>

Testing with the Jul2015 branch, we do not encounter the problem described here any more.
Do you know which particular commit fixed it (is there a unit test for this special case?).

Comment 21704

Date: 2015-12-31 02:13:47 +0100
From: @sjoerdmullender

(In reply to Kevin Boulain from comment 7)

> Do you know which particular commit fixed it (is there a unit test for this
> special case?).

According to hg bisect, that was changeset 93e7f9dbca06
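
For readers who want to repeat such a search: hg bisect looks for the first "bad" revision, so hunting for a fix means swapping the labels (revisions that still crash are marked good, revisions that no longer crash are marked bad). The sketch below is only an illustration under that assumption, not a record of the commands actually used; fix_hunt_status and the repro script name are hypothetical.

```sh
#!/bin/sh
# Hypothetical predicate for `hg bisect --command` when looking for the
# changeset that FIXED a crash. It runs the given repro command, which is
# assumed to exit 0 when no crash occurs and non-zero when the server crashes,
# and swaps the verdict:
#   no crash       -> 1   ("bad"  = the fix is already present)
#   exit 125       -> 125 (hg bisect treats 125 as "skip this revision")
#   anything else  -> 0   ("good" = still crashing, fix not yet present)
fix_hunt_status() {
    "$@" && status=0 || status=$?
    case $status in
        0)   return 1 ;;
        125) return 125 ;;
        *)   return 0 ;;
    esac
}

# Intended use (placeholders in angle brackets are not from this report):
#   hg bisect --reset
#   hg bisect --good <revision that still crashes>
#   hg bisect --bad <revision that no longer crashes>
#   hg bisect --command '<script that calls fix_hunt_status ./repro.sh>'
```

With the labels swapped this way, the "first bad revision" reported by hg bisect is the first revision where the crash no longer occurs.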
