
random generators compatibility with parallel processing #513

Merged
lfarv merged 13 commits into master from random_generators_for_multiprocessing on Nov 29, 2022

Conversation

lfarv
Contributor

@lfarv lfarv commented Nov 4, 2022

The simple random number generators (RNGs) used up to now are not compatible with parallel processing: the streams of numbers in the different threads are likely to be identical. We'll try here to solve this problem.

@lfarv lfarv added WIP work in progress Matlab For Matlab/Octave AT code Python For python AT code labels Nov 4, 2022
@lfarv
Contributor Author

lfarv commented Nov 4, 2022

The first step is the implementation of RNGs with an external state variable. Three are implemented (note the _r suffix indicating the external state):

  • double atrandd_r(pcg32_random_t *rng) for uniform [0, 1] distribution
  • double atrandn_r(pcg32_random_t *rng) for Gaussian distribution
  • int atrandp_r(pcg32_random_t *rng, double lambda) for Poisson distribution

Three equivalent generators use an internal state variable, for backward compatibility:

  • double atrandd() for uniform [0, 1] distribution
  • double atrandn() for Gaussian distribution
  • int atrandp(double lambda) for Poisson distribution

The state variable is a private type, and there are also two functions to initialise it (set a new seed):

  • void pcg32_srandom_r(pcg32_random_t* rng, uint64_t initstate, uint64_t initseq) for external seed
  • void pcg32_srandom(uint64_t initstate, uint64_t initseq) for internal seed

Note that the "seed" is now a pair of 64-bit integers. The idea is that initstate is the actual seed, while initseq is a "stream selector", which makes it possible to generate distinct, non-overlapping streams.
So for the moment, I removed the init_seed function, which is too limited for this scheme.

For now, nothing else has changed, except that the RNG is no longer reinitialised at the beginning of VariableMPolePass.
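For illustration, the seeding scheme with initstate and initseq can be sketched in Python. This assumes the standard public-domain pcg_basic algorithm that the pcg32_* names suggest; the PCG32 class and its method names below are mine and not part of AT:

```python
# Python sketch of the PCG32 generator, assuming the public-domain
# pcg_basic algorithm (the PCG32 class and methods are illustrative,
# not AT's actual C API).
MASK64 = (1 << 64) - 1
MULT = 6364136223846793005   # PCG default 64-bit multiplier


class PCG32:
    def __init__(self, initstate, initseq):
        # Equivalent of pcg32_srandom_r: the stream selector initseq
        # becomes the (always odd) increment, so different initseq values
        # yield distinct, non-overlapping sequences.
        self.state = 0
        self.inc = ((initseq << 1) | 1) & MASK64
        self.next_uint32()
        self.state = (self.state + initstate) & MASK64
        self.next_uint32()

    def next_uint32(self):
        # LCG state transition + output permutation (xorshift, rotate)
        old = self.state
        self.state = (old * MULT + self.inc) & MASK64
        xorshifted = (((old >> 18) ^ old) >> 27) & 0xFFFFFFFF
        rot = old >> 59
        return ((xorshifted >> rot) | (xorshifted << ((-rot) & 31))) & 0xFFFFFFFF

    def random(self):
        # Uniform [0, 1), analogous to atrandd_r
        return self.next_uint32() / 2 ** 32


# Same initstate, different initseq: two independent streams
a = PCG32(42, 0)
b = PCG32(42, 1)
print(a.random(), b.random())
```

With a thread id folded into initseq, each worker would get its own stream, while omitting it keeps all workers on the same one.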

@lfarv
Contributor Author

lfarv commented Nov 4, 2022

Now, how to use this:

  • Either we initialise the state variables once (and provide a way to reset them, so that a sequence can be "replayed"). That works for MPI, and hopefully for python multiprocessing if we find a way to identify its threads. One could add two state variables to the "struct param" structure accessible in all integrators: one whose initseq parameter includes the thread id, so that each thread gets an independent stream (for quantum lifetime), and one without it, so that all threads follow the same stream (for the variable multipole). For OpenMP there would be a single stream, as without multiprocessing, but we could "share" the state variable.

    The initialisation might be done on the 1st pass in atpass.

  • Or we initialise the state variable on each pass in the integrator. Then we need another level of RNG to provide a new seed on each pass (or use the turn number? But what if there are several identical elements in the lattice?). Everything is then contained in the integrator, but it looks more complicated. Is the overhead acceptable?

There may be many other solutions; please feel free to propose!
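For option 1, numpy's documented parallel-seeding pattern gives an idea of what the seed bookkeeping could look like. This is only a sketch: make_generators and the seed values are mine, not a proposed AT API.

```python
# Sketch of option 1 with numpy's parallel-seeding tools: one "common"
# stream identical in all workers, one "thread" stream unique per worker.
# (make_generators is illustrative, not an AT function.)
from numpy.random import Generator, PCG64, SeedSequence


def make_generators(seed, rank):
    """Return (common, thread) generators for worker number `rank`."""
    # Every worker builds the same SeedSequence: identical common stream
    common = Generator(PCG64(SeedSequence(seed)))
    # spawn() derives independent children deterministically; worker
    # `rank` takes the rank-th child: a stream unique to that worker
    child = SeedSequence(seed).spawn(rank + 1)[rank]
    thread = Generator(PCG64(child))
    return common, thread


c0, t0 = make_generators(42, rank=0)
c1, t1 = make_generators(42, rank=1)
print(c0.random() == c1.random())  # common: same sequence in every worker
print(t0.random(), t1.random())    # thread: independent sequences
```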

@swhite2401
Contributor

Hello @lfarv, thanks for looking into this. What you propose seems good to me.

I would personally go for the simpler option 1. I do not see the practical advantage of handling everything in the integrator: is there a good reason to push for that, or is it just to maintain the present strategy?

Identifying the threads of python multiprocessing has to be done on the python side and passed to atpass, I believe; you can probably use multiprocessing.current_process() for that, to be tested.
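As a quick probe of that suggestion (untested in AT itself; worker_id and the pool size below are mine), pool workers can be told apart through multiprocessing.current_process():

```python
# Probing worker identity with multiprocessing.current_process().
# Note: _identity is a private attribute; whether it is a reliable
# stream selector for AT is exactly what remains to be tested.
import multiprocessing as mp


def worker_id(_):
    ident = mp.current_process()._identity  # e.g. (1,), (2,) in pool workers
    return ident[0] if ident else 0         # 0 in the main process


if __name__ == '__main__':
    with mp.Pool(2) as pool:
        ids = set(pool.map(worker_id, range(8)))
    print(sorted(ids))  # small integers numbering the pool workers
```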

@lfarv
Contributor Author

lfarv commented Nov 7, 2022

Hi @swhite2401: I also have a preference for option 1. It makes the integrators simpler and more easily allows resetting the seeds, though even that is not obvious (atpass is the single entry point). There are however a few delicate points. For the "single" random stream (variable multipole), all the threads should generate the same numbers, so we must ensure that no thread misses an RNG call (because a test skips a call, for instance). Should we check at the end of tracking that all threads are still in sync?
For OpenMP, in the "shared state" option, one must take care of race conditions, using "critical blocks" or "atomic" variables. There may be a performance penalty for that; I have no experience with this, so it'll take some time to test…

Following your proposal, I'll start experimenting with option 1 in this branch, and try to set up "test integrators" to check the behaviour.

@lfarv lfarv marked this pull request as draft November 7, 2022 09:56
@swhite2401
Contributor

Sounds good.

How would you check the sync condition? You would count the number of calls?

The check would be useful only for python multiprocessing; in the case of MPI we can check this during the tracking, as processes can communicate with each other.

Race conditions are a tricky problem... However, a simple solution would be to not implement OpenMP in the variable multipole element; this is just one element after all, so the performance loss may not be significant.

@lfarv
Contributor Author

lfarv commented Nov 9, 2022

How would you check the sync condition? You would count the number of calls?

No, after tracking is finished, I would draw one more random value in each thread and check that they are all equal!

In the meantime, I found an unexpected behaviour of patpass: it starts one process per particle! There are indeed as many simultaneous processes as specified in pool_size, but new processes are started as soon as the first ones terminate.

This is far from optimal, and it makes it very difficult to have all processes follow a single stream of random numbers for the variable multipole.

So I'll open another PR to solve this first.

@lfarv
Contributor Author

lfarv commented Nov 9, 2022

I also found another point to be clarified in at.tracking.particles.beam(). The particle coordinates are generated by numpy.random.normal(). This is a legacy interface, and it is not clear how it behaves in parallel computation. numpy now offers a new implementation which, by the way, preferentially uses the PCG method, the one now used in the C part of AT! The numpy doc even describes how to deal with parallelism.

So we also have to work on the python part of AT. I don't know if there are other uses of random numbers besides at.tracking.particles.beam()…
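For reference, the difference between the legacy interface used in beam() and the new numpy API looks like this (the seed and array sizes are arbitrary):

```python
import numpy as np

# Legacy interface: one hidden global state, hard to control in parallel code
legacy_coords = np.random.normal(size=(6, 5))

# New interface: an explicit Generator object, backed by PCG64 by default;
# numpy's SeedSequence/spawn machinery then handles parallel streams
rng = np.random.default_rng(42)
coords = rng.normal(loc=0.0, scale=1e-3, size=(6, 5))  # 5 particles, 6 coords

print(type(rng.bit_generator).__name__)  # PCG64
```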

@lfarv lfarv force-pushed the random_generators_for_multiprocessing branch from 3e7383a to c7489de on November 16, 2022
@lfarv lfarv force-pushed the random_generators_for_multiprocessing branch from 0565619 to efd7587 on November 27, 2022
@lfarv lfarv removed the WIP work in progress label Nov 27, 2022
@lfarv lfarv marked this pull request as ready for review November 27, 2022 14:14
@lfarv
Contributor Author

lfarv commented Nov 27, 2022

Here is a test showing the behaviour of the two python RNGs for common or thread-specific random number streams. The new at.random object has two attributes, common and thread, which are standard numpy.random.Generator objects:

```python
from at import random, DConstant
from mpi4py import MPI


def run():
    comm = MPI.COMM_WORLD
    size = comm.Get_size()
    rank = comm.Get_rank()
    ismpi = DConstant.mpi
    print("MPI: {}, rank: {}/{}, common:{}, thread:{}".format(ismpi, rank, size,
                                                              random.common.random(),
                                                              random.thread.random()))


if __name__ == '__main__':
    run()
```

The result is:

```
(mpi39) bijou:MPI $ mpiexec -n 4 python test_atrand.py
MPI: True, rank: 2/4, common:0.8699988509120198, thread:0.3919583896455968
MPI: True, rank: 3/4, common:0.8699988509120198, thread:0.8189201849105845
MPI: True, rank: 0/4, common:0.8699988509120198, thread:0.37707540876459267
MPI: True, rank: 1/4, common:0.8699988509120198, thread:0.141339263238754
```

The at.random.thread generator is ready to be used in the at.beam function!

@lfarv
Contributor Author

lfarv commented Nov 27, 2022

Similarly, two random generators are available to C integrators: the struct parameters *Param argument of each trackFunction has two new fields:

  1. pcg32_random_t *common_rng for a generator which will produce identical numbers in all threads,
  2. pcg32_random_t *thread_rng for a generator which will produce different numbers in different threads.

For example, quantum diffusion elements should use the thread_rng generator, while the variable multipole element should use common_rng.

These generators are available with MPI, python multiprocessing and OpenMP.

@lfarv
Contributor Author

lfarv commented Nov 27, 2022

This PR is ready for merging. It does not change any integrator, nor the at.beam function: these will be upgraded in upcoming pull requests.

@swhite2401
Contributor

Hello @lfarv, this seems fine. I think we can start disseminating this in the passmethods!
I don't think there is anything other than the particle generation in python for now.

@lfarv lfarv force-pushed the random_generators_for_multiprocessing branch from dd168e4 to 538206e on November 28, 2022
@lfarv lfarv merged commit b5614c0 into master Nov 29, 2022
@lfarv lfarv deleted the random_generators_for_multiprocessing branch November 29, 2022 21:10
@lfarv lfarv mentioned this pull request Jun 7, 2023