In a setup where:

- There are multiple worker processes (possibly on different servers in a cluster).
- There are multiple threads in each process (possibly a different number on different servers).
It isn't trivial to create such a setup - one needs to tweak the launching of worker processes so that they are multi-threaded. It would be easy if there were a command-line flag for `julia` that specified the number of threads, as requested in JuliaLang/julia#34309. But it is still possible to create such a setup today with a bit of effort, and it is useful, as all the threads in each worker process benefit from automatic shared memory for "everything", rather than being restricted to constructs such as `SharedArray`. Of course, this means one needs to be careful.
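One way to get multi-threaded workers today, as a minimal sketch: since worker processes inherit their environment at launch, the `JULIA_NUM_THREADS` environment variable can be set around the `addprocs` call. The worker and thread counts here are arbitrary placeholders.

```julia
using Distributed

# Spawn workers inside withenv so each inherits JULIA_NUM_THREADS,
# making every worker process multi-threaded from the start.
withenv("JULIA_NUM_THREADS" => "4") do
    addprocs(2)   # two local workers, each starting with 4 threads
end

# Verify: ask every worker how many threads it actually has.
thread_counts = [remotecall_fetch(Threads.nthreads, w) for w in workers()]
```

For remote workers added via SSH, the environment would instead need to be set on the remote side, which is part of the "bit of effort" mentioned above.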
In such a scenario, the current behavior is very clear:

- A `@threads` loop uses the threads of the current (main or worker) process.
- A `@distributed` loop and `pmap` use a single thread in each worker process.

This has the advantage of simplicity and clarity. It also allows using a nested `@threads` in each iteration of `@distributed` or `pmap` to utilize all the threads on all the machines.
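The nesting pattern just described could be sketched as follows; the loop body is placeholder per-thread work, not anything prescribed by the proposal.

```julia
using Distributed

# The outer @distributed loop splits iterations statically across worker
# processes; the nested Threads.@threads loop uses each worker's own
# threads, so together they exercise every thread on every machine.
# (With no workers added, everything runs in the main process.)
results = @distributed (vcat) for chunk in 1:8
    T = Threads.nthreads()
    buf = zeros(Int, T)
    Threads.@threads for t in 1:T
        buf[t] = chunk * t          # placeholder per-thread work
    end
    [sum(buf)]                      # one result per outer iteration
end
```

The `(vcat)` reducer makes the outer loop blocking and collects one result per iteration, in order.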
However, it would also be useful to have `@distributed_threads` and `pmap_threads`.
A `@distributed_threads` would statically allocate the same number of iterations to each thread across all the machines - that is, it would allocate more iterations to worker processes with more threads, and then internally use `@threads` to execute them on each worker process's threads. This would be the natural extension of `@distributed`, which statically allocates iterations to processes.
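The static, thread-count-proportional allocation could look like the helper below. `proportional_split` is a hypothetical name, not an existing API; it only illustrates how a range might be carved up so that workers with more threads receive proportionally more iterations.

```julia
# Hypothetical sketch: split a range statically among worker processes,
# proportionally to each one's thread count. `thread_counts[i]` is the
# number of threads in the i-th worker process.
function proportional_split(r::UnitRange{Int}, thread_counts::Vector{Int})
    total = sum(thread_counts)
    n = length(r)
    chunks = UnitRange{Int}[]
    lo = first(r)
    acc = 0
    for t in thread_counts
        acc += t
        # Cumulative share of iterations after this worker, rounded down,
        # so the chunks exactly cover `r` with no gaps or overlaps.
        hi = first(r) + div(n * acc, total) - 1
        push!(chunks, lo:hi)
        lo = hi + 1
    end
    return chunks
end
```

A `@distributed_threads` macro could then send each chunk to its worker and run a plain `@threads` loop over it there.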
A `pmap_threads` would dynamically allocate tasks to each thread across all machines. The batch size, if specified, would apply individually to each thread. It might be useful to add a second batch-group size (a positive number of batches) so that each worker process gets a whole group of batches at once and uses its threads to execute the smaller batches, reducing the amount of cross-process coordination required. This would be the natural extension of `pmap`, which dynamically allocates iterations to processes.
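A rough sketch of that idea, under the stated assumptions: `pmap_threads_sketch`, `batch_size`, and `group_size` are all hypothetical names. It hands each worker a whole group of batches via `pmap` and lets the worker's threads process the individual batches concurrently.

```julia
using Distributed

# Hypothetical sketch of pmap_threads: pmap dynamically assigns batch
# *groups* to worker processes; within a worker, one task per batch is
# spawned so the batches run on separate threads.
function pmap_threads_sketch(f, xs; batch_size::Int = 1,
                             group_size::Int = Threads.nthreads())
    batches = [xs[i:min(i + batch_size - 1, end)]
               for i in 1:batch_size:length(xs)]
    groups = [batches[i:min(i + group_size - 1, end)]
              for i in 1:group_size:length(batches)]
    group_results = pmap(groups) do group
        tasks = [Threads.@spawn map(f, batch) for batch in group]
        reduce(vcat, fetch.(tasks))     # keep batch order within the group
    end
    return reduce(vcat, group_results)  # keep group order overall
end
```

Shipping one group per `pmap` task is what cuts coordination: the scheduler does one round-trip per group instead of one per batch.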