Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

memory corruption on unclean connection shutdown with local temporary tables #3648

Closed
monetdb-team opened this issue Nov 30, 2020 · 0 comments
Closed

Comments

@monetdb-team
Copy link

@monetdb-team monetdb-team commented Nov 30, 2020

Date: 2015-01-12 21:20:39 +0100
From: sorear
To: SQL devs <>
Version: 11.19.7 (Oct2014-SP1)
CC: @njnes

Last updated: 2015-05-07 12:38:06 +0200

Comment 20552

Date: 2015-01-12 21:20:39 +0100
From: sorear

User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_7_5) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/39.0.2171.95 Safari/537.36
Build Identifier:

If a connection is dropped while temp tables are active, trans_drop_tmp will remove them from the active transaction without resetting tables.nelm on the relevant sql_schema object.

The next time that transaction is cleaned, reset_schema will call list_remove_node on the (dangling) table node; not finding it, list_remove_node will decrement tables->set.cnt and perform no other action.

If as a result of this tables->set.cnt is -1 but h and t are null, the next call to list_append will dereference a null pointer.

The attached patch fixes the initial issue in trans_drop_tmp and adds asserts to catch the cases where nelm points outside the list and the count goes below zero. It's not clear to me what list_remove_node should do if passed a pointer which is not a node in the list; it's possible there should be an assert(p != NULL) in list_remove_node somewhere.

What I did with "cs_check" is a little weird too. Ideally we'd avoid the function call and the iteration under -DNDEBUG; perhaps also there should be more cs_foo functions added so that nobody outside sql_changeset.c ever has to modify the set lists?

Reproducible: Always

Steps to Reproduce:

Start a MonetDB instance, create and release a database called "test". Run the following:

for i in $(seq 1 100); do echo $i; perl -MDBI -e 'my $db = DBI->connect("dbi:monetdb:database=test;host=localhost;port=50000", "monetdb", "monetdb"); $db->do("START TRANSACTION"); $db->do("CREATE LOCAL TEMPORARY TABLE foo ( bar int ) ON COMMIT DROP")'; done

Actual Results:

mserver5 segfaults once every 8 connections.

Expected Results:

No segfaults.

The bug was first observed with Jan2014-SP1. It may be older than that.
Not certain about the severity setting - it crashes frequently for us, but most people probably aren't using local temporary tables.

Comment 20553

Date: 2015-01-12 21:22:43 +0100
From: sorear

Created attachment 314
Patch which appears to fix the bug and adds assertions to avoid a 3rd-level crash

Attached file: temp_table_crash.patch (text/plain, 2623 bytes)
Description: Patch which appears to fix the bug and adds assertions to avoid a 3rd-level crash

Comment 20577

Date: 2015-01-26 19:15:26 +0100
From: @njnes

added a new cs_remove_node function which includes (re)setting the nelm pointer.

Comment 20579

Date: 2015-01-26 20:14:48 +0100
From: MonetDB Mercurial Repository <>

Changeset 9335d29210f2 made by Niels Nes niels@cwi.nl in the MonetDB repo, refers to this bug.

For complete details, see http//devmonetdborg/hg/MonetDB?cmd=changeset;node=9335d29210f2

Changeset description:

fixed bug #3648, ie properly set nelm after removing a node.
Thanks you sorear@gudtech.com for reporting the bug and supppling the patch.

Comment 20580

Date: 2015-01-27 02:05:57 +0100
From: sorear

Isn't the use of n->next in cs_remove_node a use after free, since list_remove_node calls node_destroy which (may) call GDKfree?

Comment 20585

Date: 2015-01-27 09:28:42 +0100
From: @njnes

indeep patch will follow soon..

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Linked pull requests

Successfully merging a pull request may close this issue.

None yet
1 participant