Deadlock in mal_dataflow's q_dequeue() (SciQL-2 branch) #3346

Closed
monetdb-team opened this issue Nov 30, 2020 · 0 comments


Date: 2013-08-20 17:26:15 +0200
From: @drstmane
To: MonetDB5 devs <>
Version: 11.15.11 (Feb2013-SP3)
CC: @mlkersten

Last updated: 2013-09-27 13:47:18 +0200

Comment 19035

Date: 2013-08-20 17:26:15 +0200
From: @drstmane

Created attachment 223
SciQL test script

On my 8-core machine (4 physical cores with hyperthreading), the attached SciQL script finished successfully in less than 2 seconds (debug build) when run with t = {1,7,8} threads (mserver5 --set gdk_nr_threads=t).

However, when run with t = {2,3,4,5,6} threads, the script hangs because several threads --- 3(!) when the server runs with 2(!) threads --- block on the MT_sema_down(&q->s, "q_dequeue"); call in dataflow's q_dequeue() function; see also the gdb trace below.

This happens with the latest version of the SciQL-2 branch (changeset c935ec8da74c), but similar (or identical?) behaviour has been observed also with earlier versions, i.e., before the recent re-cast of the worker-pool had been propagated from the Feb2013 branch.

(gdb) thread apply all bt

Thread 8 (Thread 0x7fffea7d7700 (LWP 27115)):
0 0x000000337ce0d6a0 in sem_wait () from /lib64/libpthread.so.0
1 0x00007ffff7903b3b in q_dequeue (q=0x7fffd4003660) at /ufs/manegold//Monet/HG/ANY/source/MonetDB/monetdb5/mal/mal_dataflow.c:210
2 0x00007ffff7905530 in DFLOWscheduler (flow=0x7fffd40035d0) at /ufs/manegold//Monet/HG/ANY/source/MonetDB/monetdb5/mal/mal_dataflow.c:573
3 0x00007ffff7905b33 in runMALdataflow (cntxt=0x628028, mb=0x7fffdc2eed50, startpc=12, stoppc=43, stk=0x7fffd4003e70) at /ufs/manegold//Monet/HG/ANY/source/MonetDB/monetdb5/mal/mal_dataflow.c:653
4 0x00007ffff7aa719f in MALstartDataflow (cntxt=0x628028, mb=0x7fffdc2eed50, stk=0x7fffd4003e70, pci=0x7fffdc526270) at /ufs/manegold//Monet/HG/ANY/source/MonetDB/monetdb5/modules/mal/language.c:136
5 0x00007ffff790083c in runMALsequence (cntxt=0x628028, mb=0x7fffdc2eed50, startpc=1, stoppc=54, stk=0x7fffd4003e70, env=0x7fffdc591100, pcicaller=0x7fffdc5257b0) at /ufs/manegold//Monet/HG/ANY/source/MonetDB/monetdb5/mal/mal_interpreter.c:650
6 0x00007ffff7900c8f in runMALsequence (cntxt=0x628028, mb=0x7fffdc33b810, startpc=38, stoppc=39, stk=0x7fffdc591100, env=0x0, pcicaller=0x0) at /ufs/manegold//Monet/HG/ANY/source/MonetDB/monetdb5/mal/mal_interpreter.c:730
7 0x00007ffff7903f95 in DFLOWworker (t=0x7ffff7f2e028 <workers+8>) at /ufs/manegold/_/Monet/HG/ANY/source/MonetDB/monetdb5/mal/mal_dataflow.c:301
8 0x000000337ce07d15 in start_thread () from /lib64/libpthread.so.0
9 0x000000337c2f253d in clone () from /lib64/libc.so.6

Thread 7 (Thread 0x7fffea9d8700 (LWP 27114)):
0 0x000000337ce0d6a0 in sem_wait () from /lib64/libpthread.so.0
1 0x00007ffff7903b3b in q_dequeue (q=0x7fffe00037b0) at /ufs/manegold//Monet/HG/ANY/source/MonetDB/monetdb5/mal/mal_dataflow.c:210
2 0x00007ffff7905530 in DFLOWscheduler (flow=0x7fffe0003670) at /ufs/manegold//Monet/HG/ANY/source/MonetDB/monetdb5/mal/mal_dataflow.c:573
3 0x00007ffff7905b33 in runMALdataflow (cntxt=0x628028, mb=0x7fffdc528030, startpc=12, stoppc=43, stk=0x7fffe00040b0) at /ufs/manegold//Monet/HG/ANY/source/MonetDB/monetdb5/mal/mal_dataflow.c:653
4 0x00007ffff7aa719f in MALstartDataflow (cntxt=0x628028, mb=0x7fffdc528030, stk=0x7fffe00040b0, pci=0x7fffdc525390) at /ufs/manegold//Monet/HG/ANY/source/MonetDB/monetdb5/modules/mal/language.c:136
5 0x00007ffff790083c in runMALsequence (cntxt=0x628028, mb=0x7fffdc528030, startpc=1, stoppc=54, stk=0x7fffe00040b0, env=0x7fffdc591100, pcicaller=0x7fffdc56f5b0) at /ufs/manegold//Monet/HG/ANY/source/MonetDB/monetdb5/mal/mal_interpreter.c:650
6 0x00007ffff7900c8f in runMALsequence (cntxt=0x628028, mb=0x7fffdc33b810, startpc=41, stoppc=42, stk=0x7fffdc591100, env=0x0, pcicaller=0x0) at /ufs/manegold//Monet/HG/ANY/source/MonetDB/monetdb5/mal/mal_interpreter.c:730
7 0x00007ffff7903f95 in DFLOWworker (t=0x7ffff7f2e020 ) at /ufs/manegold/_/Monet/HG/ANY/source/MonetDB/monetdb5/mal/mal_dataflow.c:301
8 0x000000337ce07d15 in start_thread () from /lib64/libpthread.so.0
9 0x000000337c2f253d in clone () from /lib64/libc.so.6

Thread 6 (Thread 0x7fffeabd9700 (LWP 27113)):
0 0x000000337ce0d6a0 in sem_wait () from /lib64/libpthread.so.0
1 0x00007ffff7903b3b in q_dequeue (q=0x7fffdc338b90) at /ufs/manegold//Monet/HG/ANY/source/MonetDB/monetdb5/mal/mal_dataflow.c:210
2 0x00007ffff7905530 in DFLOWscheduler (flow=0x7fffdc2eb7c0) at /ufs/manegold//Monet/HG/ANY/source/MonetDB/monetdb5/mal/mal_dataflow.c:573
3 0x00007ffff7905b33 in runMALdataflow (cntxt=0x628028, mb=0x7fffdc33b810, startpc=21, stoppc=42, stk=0x7fffdc591100) at /ufs/manegold//Monet/HG/ANY/source/MonetDB/monetdb5/mal/mal_dataflow.c:653
4 0x00007ffff7aa719f in MALstartDataflow (cntxt=0x628028, mb=0x7fffdc33b810, stk=0x7fffdc591100, pci=0x7fffdc33e8d0) at /ufs/manegold//Monet/HG/ANY/source/MonetDB/monetdb5/modules/mal/language.c:136
5 0x00007ffff790083c in runMALsequence (cntxt=0x628028, mb=0x7fffdc33b810, startpc=1, stoppc=104, stk=0x7fffdc591100, env=0x7fffdc594120, pcicaller=0x7fffdc582590) at /ufs/manegold//Monet/HG/ANY/source/MonetDB/monetdb5/mal/mal_interpreter.c:650
6 0x00007ffff7900c8f in runMALsequence (cntxt=0x628028, mb=0x7fffdc11d020, startpc=1, stoppc=288, stk=0x7fffdc594120, env=0x7fffdc58bdf0, pcicaller=0x7fffdc2f3cc0) at /ufs/manegold//Monet/HG/ANY/source/MonetDB/monetdb5/mal/mal_interpreter.c:730
7 0x00007ffff7900c8f in runMALsequence (cntxt=0x628028, mb=0x7fffdc313f00, startpc=1, stoppc=0, stk=0x7fffdc58bdf0, env=0x0, pcicaller=0x0) at /ufs/manegold//Monet/HG/ANY/source/MonetDB/monetdb5/mal/mal_interpreter.c:730
8 0x00007ffff78ffd01 in callMAL (cntxt=0x628028, mb=0x7fffdc313f00, env=0x7fffeabd8b48, argv=0x7fffeabd8ba0, debug=0 '\000') at /ufs/manegold//Monet/HG/ANY/source/MonetDB/monetdb5/mal/mal_interpreter.c:472
9 0x00007fffef073b06 in SQLexecutePrepared (c=0x628028, be=0x7fffdc02b470, q=0x7fffdc2044c0) at /ufs/manegold//Monet/HG/ANY/source/MonetDB/sql/backends/monet5/sql_scenario.c:1888
10 0x00007fffef073f42 in SQLengineIntern (c=0x628028, be=0x7fffdc02b470) at /ufs/manegold//Monet/HG/ANY/source/MonetDB/sql/backends/monet5/sql_scenario.c:1951
11 0x00007fffef0745c7 in SQLengine (c=0x628028) at /ufs/manegold//Monet/HG/ANY/source/MonetDB/sql/backends/monet5/sql_scenario.c:2057
12 0x00007ffff792e12a in runPhase (c=0x628028, phase=4) at /ufs/manegold//Monet/HG/ANY/source/MonetDB/monetdb5/mal/mal_scenario.c:522
13 0x00007ffff792e313 in runScenarioBody (c=0x628028) at /ufs/manegold//Monet/HG/ANY/source/MonetDB/monetdb5/mal/mal_scenario.c:566
14 0x00007ffff792e436 in runScenario (c=0x628028) at /ufs/manegold//Monet/HG/ANY/source/MonetDB/monetdb5/mal/mal_scenario.c:586
15 0x00007ffff792f4a8 in MSserveClient (dummy=0x628028) at /ufs/manegold/_/Monet/HG/ANY/source/MonetDB/monetdb5/mal/mal_session.c:431
16 0x000000337ce07d15 in start_thread () from /lib64/libpthread.so.0
17 0x000000337c2f253d in clone () from /lib64/libc.so.6

Thread 5 (Thread 0x7fffeadda700 (LWP 27110)):
0 0x000000337c2eb863 in select () from /lib64/libc.so.6
1 0x00007ffff713bb95 in MT_sleep_ms (ms=50) at /ufs/manegold//Monet/HG/ANY/source/MonetDB/gdk/gdk_posix.c:1226
2 0x00007fffef198995 in store_manager () at /ufs/manegold//Monet/HG/ANY/source/MonetDB/sql/storage/store.c:1593
3 0x00007fffef11806e in mvc_logmanager () at /ufs/manegold/_/Monet/HG/ANY/source/MonetDB/sql/server/sql_mvc.c:195
4 0x000000337ce07d15 in start_thread () from /lib64/libpthread.so.0
5 0x000000337c2f253d in clone () from /lib64/libc.so.6

Thread 4 (Thread 0x7fffeafdb700 (LWP 27109)):
0 0x000000337c2eb863 in select () from /lib64/libc.so.6
1 0x00007ffff7aad259 in SERVERlistenThread (Sock=0x1a5cd10) at /ufs/manegold/_/Monet/HG/ANY/source/MonetDB/monetdb5/modules/mal/mal_mapi.c:209
2 0x000000337ce07d15 in start_thread () from /lib64/libpthread.so.0
3 0x000000337c2f253d in clone () from /lib64/libc.so.6

Thread 3 (Thread 0x7ffff03cb700 (LWP 27108)):
0 0x000000337c2eb863 in select () from /lib64/libc.so.6
1 0x00007ffff713bb95 in MT_sleep_ms (ms=1000) at /ufs/manegold//Monet/HG/ANY/source/MonetDB/gdk/gdk_posix.c:1226
2 0x00007ffff791d9ac in profilerHeartbeat (dummy=0x0) at /ufs/manegold//Monet/HG/ANY/source/MonetDB/monetdb5/mal/mal_profiler.c:1431
3 0x000000337ce07d15 in start_thread () from /lib64/libpthread.so.0
4 0x000000337c2f253d in clone () from /lib64/libc.so.6

Thread 2 (Thread 0x7ffff05cc700 (LWP 27107)):
0 0x000000337c2eb863 in select () from /lib64/libc.so.6
1 0x00007ffff713bb95 in MT_sleep_ms (ms=50) at /ufs/manegold//Monet/HG/ANY/source/MonetDB/gdk/gdk_posix.c:1226
2 0x00007ffff707712c in GDKvmtrim (limit=0x7ffff779ecf8 <GDK_mem_maxsize>) at /ufs/manegold//Monet/HG/ANY/source/MonetDB/gdk/gdk_utils.c:921
3 0x000000337ce07d15 in start_thread () from /lib64/libpthread.so.0
4 0x000000337c2f253d in clone () from /lib64/libc.so.6

Thread 1 (Thread 0x7ffff6b3e840 (LWP 27082)):
0 0x000000337ce0e12d in read () from /lib64/libpthread.so.0
1 0x0000003380e2a3c1 in rl_getc () from /lib64/libreadline.so.6
2 0x0000003380e2abc9 in rl_read_key () from /lib64/libreadline.so.6
3 0x0000003380e15d51 in readline_internal_char () from /lib64/libreadline.so.6
4 0x0000003380e162a5 in readline () from /lib64/libreadline.so.6
5 0x00007ffff791e7d2 in getConsoleInput (c=0x627d40, prompt=0x65ac70 ">", linemode=0, exit_on_error=1) at /ufs/manegold//Monet/HG/ANY/source/MonetDB/monetdb5/mal/mal_readline.c:329
6 0x00007ffff791ed94 in readConsole (cntxt=0x627d40) at /ufs/manegold//Monet/HG/ANY/source/MonetDB/monetdb5/mal/mal_readline.c:473
7 0x00007ffff792f6f2 in MALreader (c=0x627d40) at /ufs/manegold//Monet/HG/ANY/source/MonetDB/monetdb5/mal/mal_session.c:491
8 0x00007ffff792e12a in runPhase (c=0x627d40, phase=0) at /ufs/manegold//Monet/HG/ANY/source/MonetDB/monetdb5/mal/mal_scenario.c:522
9 0x00007ffff792e229 in runScenarioBody (c=0x627d40) at /ufs/manegold//Monet/HG/ANY/source/MonetDB/monetdb5/mal/mal_scenario.c:552
10 0x00007ffff792e436 in runScenario (c=0x627d40) at /ufs/manegold//Monet/HG/ANY/source/MonetDB/monetdb5/mal/mal_scenario.c:586
11 0x00007ffff792f4a8 in MSserveClient (dummy=0x627d40) at /ufs/manegold//Monet/HG/ANY/source/MonetDB/monetdb5/mal/mal_session.c:431
12 0x000000000040367c in main (argc=3, av=0x7fffffffd6c8) at /ufs/manegold//Monet/HG/ANY/source/MonetDB/tools/mserver/mserver5.c:622
(gdb)

Attached file: deadlock.sciql (application/octet-stream, 4253 bytes)
Description: SciQL test script

Comment 19036

Date: 2013-08-20 19:54:35 +0200
From: @mlkersten

Yes, there might be a potential deadlock in the following situation. Recall that we have a single worker pool per session. If we call a function containing a parallel block, then we have effectively reduced the available pool by 1 worker, because the calling MAL instruction is put on hold without releasing its worker thread. After a few such calls, all workers may be occupied handling a MAL function call, each putting new instructions in the queue that no free worker is left to execute.
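
The starvation described above can be reproduced outside MonetDB. The following is a minimal Python sketch, not MonetDB code: it assumes a fixed-size `ThreadPoolExecutor` as a stand-in for the per-session worker pool, and `run_nested`, `outer` and `inner` are hypothetical names chosen for illustration. Each outer task holds a worker while it waits for a nested task queued on the same pool. A timeout is used only so that the demo terminates; the q_dequeue() call in the trace waits on its semaphore without one.

```python
import threading
from concurrent.futures import ThreadPoolExecutor
from concurrent.futures import TimeoutError as FutureTimeout


def run_nested(pool_size, outer_calls, wait_s=0.5):
    """Return True if all nested calls finish, False if the pool starves.

    Hypothetical model: every outer call occupies one pool worker and then
    blocks on an inner task that is submitted to the same pool.
    """
    pool = ThreadPoolExecutor(max_workers=pool_size)
    # Make sure every outer call already holds a worker before any of them
    # submits its nested task.
    started = threading.Barrier(outer_calls)

    def inner():
        return 1

    def outer():
        started.wait(timeout=wait_s)
        child = pool.submit(inner)            # queued on the same pool
        return child.result(timeout=wait_s)   # worker is held while waiting

    futures = [pool.submit(outer) for _ in range(outer_calls)]
    try:
        for f in futures:
            f.result()
        return True
    except (FutureTimeout, threading.BrokenBarrierError):
        # No free worker ever picked up the nested tasks in time.
        return False
    finally:
        pool.shutdown(wait=True)


if __name__ == "__main__":
    print("2 workers, 2 nested calls:", run_nested(2, 2))
    print("3 workers, 2 nested calls:", run_nested(3, 2))
```

With as many blocking callers as workers, nothing is left to run the queued inner tasks; one spare worker is enough for the same workload to complete.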

Comment 19037

Date: 2013-08-20 21:22:52 +0200
From: @mlkersten

The situation is re-created with the test BugTracker-2013/Tests/nestedcalls.sql
A related one is BugTracker-2013/Tests/recursive.sql

Comment 19040

Date: 2013-08-21 08:41:04 +0200
From: @drstmane

Please be aware that the problem (also) exists in the Feb2013 (release) branch (and in the SciQL-2 branch, which is spawned off the Feb2013 branch). --- You added your tests (only) to the default (development) branch.

Comment 19041

Date: 2013-08-21 08:48:32 +0200
From: @drstmane

Do I understand you correctly that, with "an unfortunate constellation" of nested/recursive MAL function calls --- only if the calling or the called or both functions involve dataflow blocks? ---, all worker threads might become DFLOWschedulers, so that no threads are left to do the actual work (DFLOWworker), and thus all DFLOWschedulers wait for work that no available worker thread can do?

Given that the DFLOWscheduler is not supposed/expected to do much work itself, would it be an option to spawn a new scheduler-thread with each (non-inlined/-inlineable) MAL function call --- only if the calling or the called or both functions involve dataflow blocks? ---, thus keeping the worker threads free to do the "real" work?
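
The suggestion in this comment can be sketched as follows. This is a hedged Python illustration of the idea only, not the fix that was later applied in MonetDB: `run_with_dedicated_schedulers` and `scheduler` are hypothetical names, and a `ThreadPoolExecutor` again stands in for the worker pool. Each blocking caller runs on its own thread, so waiting for nested work does not occupy a pool worker.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


def run_with_dedicated_schedulers(pool_size, outer_calls, wait_s=2.0):
    """Run each blocking caller on its own thread; return the nested results."""
    pool = ThreadPoolExecutor(max_workers=pool_size)
    results = [None] * outer_calls

    def scheduler(i):
        # Runs outside the pool, so blocking here costs no pool worker.
        child = pool.submit(lambda: i * 2)
        results[i] = child.result(timeout=wait_s)

    threads = [
        threading.Thread(target=scheduler, args=(i,))
        for i in range(outer_calls)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    pool.shutdown(wait=True)
    return results


if __name__ == "__main__":
    # More blocking callers than workers, yet all nested tasks complete.
    print(run_with_dedicated_schedulers(2, 4))
```

The trade-off is one extra thread per nested call, which is cheap only as long as the scheduler itself does little work, as the comment assumes.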

Comment 19042

Date: 2013-08-21 11:39:30 +0200
From: MonetDB Mercurial Repository <>

Changeset 98ac58eef94c made by Stefan Manegold Stefan.Manegold@cwi.nl in the MonetDB repo, refers to this bug.

For complete details, see http://dev.monetdb.org/hg/MonetDB?cmd=changeset;node=98ac58eef94c

Changeset description:

revert unintended(?) changes that "slipped in" with changeset [489815265a61](https://dev.monetdb.org/hg/MonetDB?cmd=changeset;node=489815265a61)

these appear unrelated to fixing bug #3346,
and rather the result of too coarse copy-&-paste

Comment 19047

Date: 2013-08-21 23:37:05 +0200
From: @drstmane

More exhaustive testing indicates that Martin's fix (changeset 489815265a61 ff.) does also work fine with the initial SciQL problem.

Thanks to Martin for the prompt fix and for providing a concise SQL-only test in sql/test/BugTracker-2013/Tests/nestedcalls.sql.
