Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Assert failure on concurrent queries when querying sys.queue #3939

Closed
monetdb-team opened this issue Nov 30, 2020 · 0 comments
Closed

Assert failure on concurrent queries when querying sys.queue #3939

monetdb-team opened this issue Nov 30, 2020 · 0 comments

Comments

@monetdb-team
Copy link

@monetdb-team monetdb-team commented Nov 30, 2020

Date: 2016-03-02 09:14:04 +0100
From: anthonin.bonnefoy
To: MonetDB5 devs <>
Version: 11.21.19 (Jul2015-SP4)
CC: @mlkersten

Last updated: 2016-10-13 10:04:07 +0200

Comment 21849

Date: 2016-03-02 09:14:04 +0100
From: anthonin.bonnefoy

User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:45.0) Gecko/20100101 Firefox/45.0 Iceweasel/45.0
Build Identifier:

When doing concurrent 2 mclient with one querying sys.queue, mserver segfault with the given (abbreviated) stacktrace

11 0x00007f4257d5a312 in __GI___assert_fail (assertion=0x7f42598838de "(v[i] & 0xC0) == 0x80", file=0x7f4259883710 "gdk_atoms.c", line=1283, function=0x7f4259883af6 <PRETTY_FUNCTION.10385> "strPut") at assert.c:101
12 0x00007f425959c941 in strPut (h=0x7f42341f3050, dst=0x7f4250dad290, v=0x7f41d6596b20 '\333' <repeats 160 times>, "\300") at gdk_atoms.c:1283
13 0x00007f425963fef5 in BUNappend (b=0x7f42342cc010, t=0x7f41d6596b20, force=0 '\000') at gdk_bat.c:1318
14 0x00007f4259d93785 in SYSMONqueue (cntxt=0x7f4252f7f330, mb=0x7f4234239880, stk=0x7f42342dee20, pci=0x7f42342fb5c0) at sysmon.c:103
15 0x00007f4259c1418a in runMALsequence (cntxt=0x7f4252f7f330, mb=0x7f4234239880, startpc=1, stoppc=0, stk=0x7f42342dee20, env=0x0, pcicaller=0x0) at mal_interpreter.c:631
16 0x00007f4259c132db in callMAL (cntxt=0x7f4252f7f330, mb=0x7f4234239880, env=0x7f4250dad998, argv=0x7f4250dada40, debug=0 '\000') at mal_interpreter.c:447

From what I could debug, here is what happened:

  • QRYqueue[0] and QRYqueue[1] are filled
  • QRYqueue[0] is finished and cleaned in runtimeProfileFinish, QRYqueue[1] is moved in QRYqueue[0].
    We still have a QRYqueue information in QRYqueue[1], that should not pose problems since qtop is here
  • When a query is done on sys.queue, the loop in sysmon.c:89 is done from 0 to QRYqueue[i].tag, so we hit QRYqueue[1].query which has been freed but not setted to NULL.

Exporting qtop and doing the loop until qtop seems to solve the problem.

Reproducible: Always

Comment 21850

Date: 2016-03-02 09:40:06 +0100
From: anthonin.bonnefoy

By the way, in gdk_atoms.c:1247, the line

memset(h->base + h->free, 0, h->size - h->free);

Looks like an off-by one (h->base[h->size] is set to 0)

Comment 21909

Date: 2016-03-15 10:30:58 +0100
From: @sjoerdmullender

(In reply to anthonin.bonnefoy from comment 1)

By the way, in gdk_atoms.c:1247, the line

memset(h->base + h->free, 0, h->size - h->free);

Looks like an off-by one (h->base[h->size] is set to 0)

No. memset sets the memory starting at h->base+h->free for (h->size-h->free) bytes. That is to say, from h->base+h->free up to (but not including(!)) h->base+h->free+(h->size-h->free) == h->base+h->size.

Comment 22072

Date: 2016-04-21 10:19:41 +0200
From: anthonin.bonnefoy

Yeah, my bad.

Still, the bug on sys.queue is valid. Freed memory is read and garbage can appear in the query column.

Just by doing 2 simultaneous

while mclient db -s "select * from sys.queue;" ; do :; done

This will trigger an "utf8strlenmax: Assertion `n == 0' failed."

Comment 22254

Date: 2016-07-25 21:04:50 +0200
From: @mlkersten

Running the one liner does not cause problems.

Running two clients causes an assertion in BUNappend
in SYSMONqueue, which is protected by a mal_delaylock (weird)

0 0x00007fd50b4389c8 in raise () from /lib64/libc.so.6
1 0x00007fd50b43a65a in abort () from /lib64/libc.so.6
2 0x00007fd50b431187 in __assert_fail_base () from /lib64/libc.so.6
3 0x00007fd50b431232 in __assert_fail () from /lib64/libc.so.6
4 0x00007fd50e0f44f1 in strPut (h=0x7fd454155c60, dst=0x7fd461f2f288, v=0x7fd45414f1f0 '\333' <repeats 40 times>, "q") at /export/scratch1/mk/default//package/gdk/gdk_atoms.c:1323
5 0x00007fd50e18ef54 in BUNappend (b=0x7fd454155b20, t=0x7fd45414f1f0, force=0 '\000') at /export/scratch1/mk/default//package/gdk/gdk_bat.c:955
6 0x00007fd50e83af7d in SYSMONqueue (cntxt=0x7fd5014bc330, mb=0x7fd454107280, stk=0x7fd45415a320, pci=0x7fd4541371b0) at /export/scratch1/mk/default//package/monetdb5/modules/mal/sysmon.c:77

Comment 22257

Date: 2016-07-25 21:37:33 +0200
From: @sjoerdmullender

Run the server under valgrind. One of the messages is:

==21299== Invalid read of size 1
==21299== at 0x5566DF1: strCmp (gdk_atoms.c:1084)
==21299== by 0x5601AF9: setcolprops (gdk_bat.c:818)
==21299== by 0x56025CC: BUNappend (gdk_bat.c:952)
==21299== by 0x4F94D5D: SYSMONqueue (sysmon.c:76)
==21299== by 0x4E85B1E: runMALsequence (mal_interpreter.c:629)
==21299== by 0x4E84CE6: callMAL (mal_interpreter.c:434)
==21299== by 0x18C816C1: SQLexecutePrepared (sql_execute.c:525)
==21299== by 0x18C81ADB: SQLengineIntern (sql_execute.c:576)
==21299== by 0x18C7F1E4: SQLengine (sql_scenario.c:1178)
==21299== by 0x4EACB4D: runPhase (mal_scenario.c:531)
==21299== by 0x4EACD2F: runScenarioBody (mal_scenario.c:574)
==21299== by 0x4EACE3F: runScenario (mal_scenario.c:594)
==21299== Address 0x1353c920 is 16 bytes inside a block of size 56 free'd
==21299== at 0x4C2CD5A: free (vg_replace_malloc.c:530)
==21299== by 0x55614DA: GDKfree (gdk_utils.c:1804)
==21299== by 0x4E7AF32: runtimeProfileFinish (mal_runtime.c:113)
==21299== by 0x4E86654: runMALsequence (mal_interpreter.c:760)
==21299== by 0x4E84CE6: callMAL (mal_interpreter.c:434)
==21299== by 0x18C816C1: SQLexecutePrepared (sql_execute.c:525)
==21299== by 0x18C81ADB: SQLengineIntern (sql_execute.c:576)
==21299== by 0x18C7F1E4: SQLengine (sql_scenario.c:1178)
==21299== by 0x4EACB4D: runPhase (mal_scenario.c:531)
==21299== by 0x4EACD2F: runScenarioBody (mal_scenario.c:574)
==21299== by 0x4EACE3F: runScenario (mal_scenario.c:594)
==21299== by 0x4EAE928: MSserveClient (mal_session.c:449)
==21299== Block was alloc'd at
==21299== at 0x4C2BBAD: malloc (vg_replace_malloc.c:299)
==21299== by 0x55610E0: GDKmalloc_prefixsize (gdk_utils.c:1679)
==21299== by 0x55611F7: GDKmallocmax (gdk_utils.c:1722)
==21299== by 0x55612FF: GDKmalloc (gdk_utils.c:1746)
==21299== by 0x55617B8: GDKstrdup (gdk_utils.c:1881)
==21299== by 0x4E7AA32: runtimeProfileInit (mal_runtime.c:87)
==21299== by 0x4E851E5: runMALsequence (mal_interpreter.c:500)
==21299== by 0x4E84CE6: callMAL (mal_interpreter.c:434)
==21299== by 0x18C816C1: SQLexecutePrepared (sql_execute.c:525)
==21299== by 0x18C81ADB: SQLengineIntern (sql_execute.c:576)
==21299== by 0x18C7F1E4: SQLengine (sql_scenario.c:1178)
==21299== by 0x4EACB4D: runPhase (mal_scenario.c:531)

In other words, the query being added by SYSMONqueue has aleady been freed.

Comment 22259

Date: 2016-07-25 21:58:24 +0200
From: MonetDB Mercurial Repository <>

Changeset 9f67d824a657 made by Sjoerd Mullender sjoerd@acm.org in the MonetDB repo, refers to this bug.

For complete details, see http//devmonetdborg/hg/MonetDB?cmd=changeset;node=9f67d824a657

Changeset description:

Provide sentinel for SYSMONqueue().
This fixs bug #3939.

Comment 22260

Date: 2016-07-25 22:00:12 +0200
From: @sjoerdmullender

Should be fixed now.

Comment 24492

Date: 2016-10-13 10:04:07 +0200
From: @sjoerdmullender

Jun2016-SP2 has been released.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Linked pull requests

Successfully merging a pull request may close this issue.

None yet
1 participant