In a setup where:

- There are multiple worker processes (possibly on different servers in a cluster).
- There are multiple threads in each process (possibly a different number on different servers).
It isn't trivial to create such a setup - one needs to tweak the launching of worker processes so that they are multi-threaded. It would be easy if there were a command-line flag for `julia` that specified the number of threads, as requested in JuliaLang/julia#34309. But it is still possible to create such a setup today with a bit of effort, and it is useful, as all the threads in each worker process benefit from automatic shared memory for "everything", rather than being restricted to constructs such as `SharedArray`. Of course, this means one needs to be careful.
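One way to get multi-threaded workers today, as a minimal sketch: since worker processes inherit their environment at launch, the `JULIA_NUM_THREADS` environment variable can be set around the `addprocs` call. The worker and thread counts here are arbitrary placeholders.

```julia
using Distributed

# Spawn workers inside withenv so each inherits JULIA_NUM_THREADS,
# making every worker process multi-threaded from the start.
withenv("JULIA_NUM_THREADS" => "4") do
    addprocs(2)   # two local workers, each starting with 4 threads
end

# Verify: ask every worker how many threads it actually has.
thread_counts = [remotecall_fetch(Threads.nthreads, w) for w in workers()]
```

For remote workers added via SSH, the environment would instead need to be set on the remote side, which is part of the "bit of effort" mentioned above.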
In such a scenario, the current behavior is very clear:

- A `@threads` loop uses the threads of the current (main or worker) process.
- A `@distributed` loop and `pmap` use a single thread in each worker process.

This has the advantage of simplicity and clarity. It also allows using a nested `@threads` in each iteration of `@distributed` or `pmap` to utilize all the threads on all the machines.
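The nesting pattern just described could be sketched as follows; the loop body is placeholder per-thread work, not anything prescribed by the proposal.

```julia
using Distributed

# The outer @distributed loop splits iterations statically across worker
# processes; the nested Threads.@threads loop uses each worker's own
# threads, so together they exercise every thread on every machine.
# (With no workers added, everything runs in the main process.)
results = @distributed (vcat) for chunk in 1:8
    T = Threads.nthreads()
    buf = zeros(Int, T)
    Threads.@threads for t in 1:T
        buf[t] = chunk * t          # placeholder per-thread work
    end
    [sum(buf)]                      # one result per outer iteration
end
```

The `(vcat)` reducer makes the outer loop blocking and collects one result per iteration, in order.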
However, it would also be useful to have `@distributed_threads` and `pmap_threads`.
A `@distributed_threads` would statically allocate the same number of iterations to each thread across all the machines - that is, it would allocate more iterations to worker processes with more threads, and then internally use `@threads` to execute them on each worker process's threads. This would be the natural extension of `@distributed`, which statically allocates iterations to processes.
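The static, thread-count-proportional allocation could look like the helper below. `proportional_split` is a hypothetical name, not an existing API; it only illustrates how a range might be carved up so that workers with more threads receive proportionally more iterations.

```julia
# Hypothetical sketch: split a range statically among worker processes,
# proportionally to each one's thread count. `thread_counts[i]` is the
# number of threads in the i-th worker process.
function proportional_split(r::UnitRange{Int}, thread_counts::Vector{Int})
    total = sum(thread_counts)
    n = length(r)
    chunks = UnitRange{Int}[]
    lo = first(r)
    acc = 0
    for t in thread_counts
        acc += t
        # Cumulative share of iterations after this worker, rounded down,
        # so the chunks exactly cover `r` with no gaps or overlaps.
        hi = first(r) + div(n * acc, total) - 1
        push!(chunks, lo:hi)
        lo = hi + 1
    end
    return chunks
end
```

A `@distributed_threads` macro could then send each chunk to its worker and run a plain `@threads` loop over it there.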
A `pmap_threads` would dynamically allocate tasks to each thread across all machines. The batch size, if specified, would apply individually to each thread. It might be useful to add a second batch-group size (a positive number of batches) so that each worker process gets a whole group of batches at once and uses its threads to execute the smaller batches, reducing the amount of cross-process coordination required. This would be the natural extension of `pmap`, which dynamically allocates iterations to processes.
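A rough sketch of that idea, under the stated assumptions: `pmap_threads_sketch`, `batch_size`, and `group_size` are all hypothetical names. It hands each worker a whole group of batches via `pmap` and lets the worker's threads process the individual batches concurrently.

```julia
using Distributed

# Hypothetical sketch of pmap_threads: pmap dynamically assigns batch
# *groups* to worker processes; within a worker, one task per batch is
# spawned so the batches run on separate threads.
function pmap_threads_sketch(f, xs; batch_size::Int = 1,
                             group_size::Int = Threads.nthreads())
    batches = [xs[i:min(i + batch_size - 1, end)]
               for i in 1:batch_size:length(xs)]
    groups = [batches[i:min(i + group_size - 1, end)]
              for i in 1:group_size:length(batches)]
    group_results = pmap(groups) do group
        tasks = [Threads.@spawn map(f, batch) for batch in group]
        reduce(vcat, fetch.(tasks))     # keep batch order within the group
    end
    return reduce(vcat, group_results)  # keep group order overall
end
```

Shipping one group per `pmap` task is what cuts coordination: the scheduler does one round-trip per group instead of one per batch.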