Scheduling issues #3258

monetdb-team opened this issue Nov 30, 2020 · 0 comments

Date: 2013-03-18 07:51:43 +0100
From: @mlkersten
To: MonetDB5 devs <>
Version: 11.15.15 (Feb2013-SP4)

Last updated: 2013-12-03 13:59:32 +0100

Comment 18624

Date: 2013-03-18 07:51:43 +0100
From: @mlkersten

User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.7; rv:19.0) Gecko/20100101 Firefox/19.0
Build Identifier:

When you have a big query running, all worker threads are busy.
Trying to connect with another mclient is then hindered, because
all instructions are merged into a global queue. This most likely
means you cannot easily stop the big query.

Solutions: either have a worker set per client connection (preferred) or
change the scheduler to better balance instructions.

Reproducible: Always

Steps to Reproduce:

  1. Run sf100.
  2. Start a separate mclient to perform a sys.pause(x).

Comment 19003

Date: 2013-08-15 23:57:29 +0200
From: MonetDB Mercurial Repository <>

Changeset 3669ddd28bf0 made by Martin Kersten in the MonetDB repo, refers to this bug.

For complete details, see http://dev.monetdb.org/hg/MonetDB?cmd=changeset;node=3669ddd28bf0

Changeset description:

New thread group per dataflow block
The centralized worker thread group could lead to an unacceptable
situation. If a user is heavily processing complex queries, then
no other user could even log into the system, because their MAL statements
ended up at the end of the shared queues.

The problem has been resolved by introducing a thread group per dataflow
block. This may lead to a large number of processes, whose resources
are managed by the OS.

It solves bug #3258

Comment 19162

Date: 2013-09-18 18:01:45 +0200
From: MonetDB Mercurial Repository <>

Changeset 8a77f78a4fe4 made by Sjoerd Mullender in the MonetDB repo, refers to this bug.

For complete details, see http://dev.monetdb.org/hg/MonetDB?cmd=changeset;node=8a77f78a4fe4

Changeset description:

Revert creating dataflow pools per client and fix problem differently.

When a MAL function calls language.dataflow(), the thread executing
the call waits until the whole dataflow block is executed by the other
threads in the dataflow pool.  If this is done recursively, we go
through all available threads and all threads end up waiting for their
dataflow block to finish, which doesn't happen since there are no
worker threads available anymore.  The solution that was tried before
was to create N threads whenever language.dataflow() was called, and
those threads never exited.  This can very quickly cause very many
threads to be created (I have seen over 1300 threads on a system with
many cores).  The current solution instead creates a single extra
thread whenever a thread is blocked waiting for the dataflow block to
be finished, and when the block is finished, it stops a single thread
(possibly a different one, but who cares: the result is the same).

This may also fix bug #3258 in a different way than the original fix.

Comment 19163

Date: 2013-09-18 19:33:16 +0200
From: @mlkersten

This 'new' solution does not handle the main problem addressed by the
thread-pool-per-client approach!

The point is that once a complex/expensive query is started from a fixed pool,
it effectively blocks (administrative) SQL access to the server for querying.
Such server access is needed to inspect the query queue and kill, e.g., the rogue query. => there should be a worker pool per client session!
Also, parallel execution of concurrent queries is left to the OS thread scheduling.

If you want to reduce the pool, then garbage collect the threads set aside per client when the session terminates.

Recursion will now also still create a large number of threads (equal to the
recursion depth), thereby not addressing the real problem. The MAL interpreter
has a high-water mark to stop too deeply recursive function calls (against
stack overflow). A similar approach should be considered if you want to
control the number of processes. => The stack depth can be used to assess whether dataflow parallelism should be used. Otherwise, simply continue in serial mode.
(I thought it already did so.)

I propose to revert this patch in the light of this consideration.

Comment 19164

Date: 2013-09-18 19:51:19 +0200
From: @mlkersten

DFLOWinitialize... here we stop parallel interpretation and continue serially:

    MT_lock_unset(&mal_contextLock, "DFLOWinitialize");
    if (grp > THREADS) {
        /* continue non-parallel */
        return -1;

runMALdataflow... here we stop parallel processing if there are too many pools:

    /* too many threads turns dataflow processing off */
    if (cntxt->idx > MAXQ) {
        *ret = TRUE;

Recursive calls can 'steal' a worker pool, because the recursive function may
have to be run in parallel as well.

Also, if we cannot create a worker thread, we continue with serial execution.

On a large multicore system you may end up with MAXQ * THREADS processes

Comment 19165

Date: 2013-09-18 20:42:27 +0200
From: @mlkersten

A test like this could also be used in dataflow to turn off parallel processing:

    if ((unsigned)stk->stkdepth > THREAD_STACK_SIZE / sizeof(mb->var[0]) / 4 && THRhighwater())
        /* we are running low on stack space */

Comment 19323

Date: 2013-11-05 14:05:56 +0100
From: MonetDB Mercurial Repository <>

Changeset 21892a4f04a1 made by Sjoerd Mullender in the MonetDB repo, refers to this bug.

For complete details, see http://dev.monetdb.org/hg/MonetDB?cmd=changeset;node=21892a4f04a1

Changeset description:

Fix for bug #3258.
We now maintain a pool of N-1 generic worker threads which is extended
by one client-specific worker thread for each client that enters the
dataflow code.

Comment 19324

Date: 2013-11-05 15:20:20 +0100
From: @sjoerdmullender

Unless proven otherwise, this is now fixed.

Comment 19376

Date: 2013-12-03 13:59:32 +0100
From: @sjoerdmullender

Feb2013-SP6 has been released.
