
random generators compatibility with parallel processing #513

Merged
lfarv merged 13 commits into master from random_generators_for_multiprocessing on Nov 29, 2022

Conversation

lfarv
Contributor

@lfarv lfarv commented Nov 4, 2022

The simple random number generators (RNGs) used up to now are not compatible with parallel processing: the streams of numbers in the different threads are likely to be identical. We'll try here to solve this problem.

@lfarv lfarv added WIP work in progress Matlab For Matlab/Octave AT code Python For python AT code labels Nov 4, 2022
@lfarv
Contributor Author

lfarv commented Nov 4, 2022

The first step is the implementation of RNGs with an external state variable. Three are implemented (note the _r suffix indicating the external state):

  • double atrandd_r(pcg32_random_t *rng) for uniform [0, 1] distribution
  • double atrandn_r(pcg32_random_t *rng) for Gaussian distribution
  • int atrandp_r(pcg32_random_t *rng, double lambda) for Poisson distribution

Three equivalent generators use an internal state variable, for backward compatibility:

  • double atrandd() for uniform [0, 1] distribution
  • double atrandn() for Gaussian distribution
  • int atrandp(double lambda) for Poisson distribution

The state variable is a private type, and there are also two functions to initialise it (set a new seed):

  • void pcg32_srandom_r(pcg32_random_t* rng, uint64_t initstate, uint64_t initseq) for external seed
  • void pcg32_srandom(uint64_t initstate, uint64_t initseq) for internal seed

Note that the "seed" is now a pair of 64-bit integers. The idea is that initstate is the actual seed, while initseq is a "stream selector", which makes it possible to generate distinct, non-overlapping streams.
So for the moment, I removed the init_seed function, which is too limited for this scheme.

For now, nothing else has changed, except that the RNG is no longer reinitialised at the beginning of VariableMPolePass.
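For illustration, the seeding scheme with initstate and initseq can be sketched in Python. This assumes the standard public-domain pcg_basic algorithm that the pcg32_* names suggest; the PCG32 class and its method names below are mine and not part of AT:

```python
# Python sketch of the PCG32 generator, assuming the public-domain
# pcg_basic algorithm (the PCG32 class and methods are illustrative,
# not AT's actual C API).
MASK64 = (1 << 64) - 1
MULT = 6364136223846793005   # PCG default 64-bit multiplier


class PCG32:
    def __init__(self, initstate, initseq):
        # Equivalent of pcg32_srandom_r: the stream selector initseq
        # becomes the (always odd) increment, so different initseq values
        # yield distinct, non-overlapping sequences.
        self.state = 0
        self.inc = ((initseq << 1) | 1) & MASK64
        self.next_uint32()
        self.state = (self.state + initstate) & MASK64
        self.next_uint32()

    def next_uint32(self):
        # LCG state transition + output permutation (xorshift, rotate)
        old = self.state
        self.state = (old * MULT + self.inc) & MASK64
        xorshifted = (((old >> 18) ^ old) >> 27) & 0xFFFFFFFF
        rot = old >> 59
        return ((xorshifted >> rot) | (xorshifted << ((-rot) & 31))) & 0xFFFFFFFF

    def random(self):
        # Uniform [0, 1), analogous to atrandd_r
        return self.next_uint32() / 2 ** 32


# Same initstate, different initseq: two independent streams
a = PCG32(42, 0)
b = PCG32(42, 1)
print(a.random(), b.random())
```

With a thread id folded into initseq, each worker would get its own stream, while omitting it keeps all workers on the same one.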

@lfarv
Contributor Author

lfarv commented Nov 4, 2022

Now, how to use this:

  • Either we initialise the state variables once (and provide a way to reset them, so that a sequence can be "replayed"). That works for MPI, and hopefully for python multiprocessing if we find a way to identify its threads. One could add two state variables to the "struct param" structure accessible in all integrators: one whose initseq parameter includes the thread id, so that each thread gets an independent stream (for quantum lifetime), and one without it, so that all threads follow the same stream (for the variable multipole). For OpenMP there would be a single stream, as without multiprocessing, but we could "share" the state variable.

    The initialisation might be done on the 1st pass in atpass.

  • Or we initialise the state variable on each pass in the integrator. Then we need another level of RNG to provide a new seed on each pass (or use the turn number? But what if there are several identical elements in the lattice?). Everything is then contained in the integrator, but it looks more complicated. Is the overhead acceptable?

There may be many other solutions; please feel free to propose!
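For option 1, numpy's documented parallel-seeding pattern gives an idea of what the seed bookkeeping could look like. This is only a sketch: make_generators and the seed values are mine, not a proposed AT API.

```python
# Sketch of option 1 with numpy's parallel-seeding tools: one "common"
# stream identical in all workers, one "thread" stream unique per worker.
# (make_generators is illustrative, not an AT function.)
from numpy.random import Generator, PCG64, SeedSequence


def make_generators(seed, rank):
    """Return (common, thread) generators for worker number `rank`."""
    # Every worker builds the same SeedSequence: identical common stream
    common = Generator(PCG64(SeedSequence(seed)))
    # spawn() derives independent children deterministically; worker
    # `rank` takes the rank-th child: a stream unique to that worker
    child = SeedSequence(seed).spawn(rank + 1)[rank]
    thread = Generator(PCG64(child))
    return common, thread


c0, t0 = make_generators(42, rank=0)
c1, t1 = make_generators(42, rank=1)
print(c0.random() == c1.random())  # common: same sequence in every worker
print(t0.random(), t1.random())    # thread: independent sequences
```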

@swhite2401
Contributor

Hello @lfarv, thanks for looking into this. What you propose seems good to me.

I would personally go for the simpler option 1. I do not see the practical advantage of handling everything in the integrator: is there a good reason to push for that, or is it just to maintain the present strategy?

Identifying the threads of python multiprocessing has to be done on the python side and passed to atpass, I believe; you can probably use multiprocessing.current_process() for that, to be tested.
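As a quick probe of that suggestion (untested in AT itself; worker_id and the pool size below are mine), pool workers can be told apart through multiprocessing.current_process():

```python
# Probing worker identity with multiprocessing.current_process().
# Note: _identity is a private attribute; whether it is a reliable
# stream selector for AT is exactly what remains to be tested.
import multiprocessing as mp


def worker_id(_):
    ident = mp.current_process()._identity  # e.g. (1,), (2,) in pool workers
    return ident[0] if ident else 0         # 0 in the main process


if __name__ == '__main__':
    with mp.Pool(2) as pool:
        ids = set(pool.map(worker_id, range(8)))
    print(sorted(ids))  # small integers numbering the pool workers
```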

@lfarv
Contributor Author

lfarv commented Nov 7, 2022

Hi @swhite2401: I also have a preference for option 1. It makes the integrators simpler and more easily allows resetting the seeds, though even that is not obvious (atpass is the single entry point). There are however a few delicate points. For the "single" random stream (variable multipole), all the threads should generate the same numbers, so we must ensure that no thread misses an RNG call (because a test skips a call, for instance). Should we check at the end of tracking that all threads are still in sync?
For OpenMP, in the "shared state" option, one must take care of race conditions, using "critical blocks" or "atomic" variables. There may be a performance penalty for that; I have no experience with this, so it'll take some time to test…

Following your proposal, I'll start experimenting with option 1 in this branch, and try to set up "test integrators" to check the behaviour.

@lfarv lfarv marked this pull request as draft November 7, 2022 09:56
@swhite2401
Contributor

Sounds good.

How would you check the sync condition? You would count the number of calls?

The check would be useful only for python multiprocessing; in the case of MPI we can check this during the tracking, as processes can communicate with each other.

Race conditions are a tricky problem... However, a simple solution would be to not implement OpenMP in the variable multipole element; this is just one element after all, so the performance loss may not be significant.

@lfarv
Contributor Author

lfarv commented Nov 9, 2022

How would you check the sync condition? You would count the number of calls?

No, after tracking is finished, I would draw one more random value in each thread and check that they are all equal!

In the meantime, I found an unexpected behaviour of patpass: it starts one process per particle! There are indeed as many simultaneous processes as specified in pool_size, but new processes are started as soon as the first ones terminate.

This is far from optimal, and it makes it very difficult to have all processes follow a single stream of random numbers for the variable multipole.

So I'll open another PR to solve this first.

@lfarv
Contributor Author

lfarv commented Nov 9, 2022

I also found another point to be clarified in at.tracking.particles.beam(). The particle coordinates are generated by numpy.random.normal(). This is a legacy interface, and it is not clear how it behaves in parallel computation. numpy now offers a new implementation which, by the way, preferentially uses the PCG method, the one now used in the C part of AT! The numpy doc even describes how to deal with parallelism.

So we also have to work on the python part of AT. I don't know if there are other uses of random numbers besides at.tracking.particles.beam()…
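For reference, the difference between the legacy interface used in beam() and the new numpy API looks like this (the seed and array sizes are arbitrary):

```python
import numpy as np

# Legacy interface: one hidden global state, hard to control in parallel code
legacy_coords = np.random.normal(size=(6, 5))

# New interface: an explicit Generator object, backed by PCG64 by default;
# numpy's SeedSequence/spawn machinery then handles parallel streams
rng = np.random.default_rng(42)
coords = rng.normal(loc=0.0, scale=1e-3, size=(6, 5))  # 5 particles, 6 coords

print(type(rng.bit_generator).__name__)  # PCG64
```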

@lfarv lfarv force-pushed the random_generators_for_multiprocessing branch from 3e7383a to c7489de on November 16, 2022
@lfarv lfarv force-pushed the random_generators_for_multiprocessing branch from 0565619 to efd7587 on November 27, 2022
@lfarv lfarv removed the WIP work in progress label Nov 27, 2022
@lfarv lfarv marked this pull request as ready for review November 27, 2022 14:14
@lfarv
Contributor Author

lfarv commented Nov 27, 2022

Here is a test showing the behaviour of the two python RNGs for common or thread-specific random number streams. The new at.random object has two attributes, common and thread, which are standard numpy.random.Generator objects:

```python
from at import random, DConstant
from mpi4py import MPI


def run():
    comm = MPI.COMM_WORLD
    size = comm.Get_size()
    rank = comm.Get_rank()
    ismpi = DConstant.mpi
    print("MPI: {}, rank: {}/{}, common:{}, thread:{}".format(ismpi, rank, size,
                                                              random.common.random(),
                                                              random.thread.random()))


if __name__ == '__main__':
    run()
```

The result is:

```
(mpi39) bijou:MPI $ mpiexec -n 4 python test_atrand.py
MPI: True, rank: 2/4, common:0.8699988509120198, thread:0.3919583896455968
MPI: True, rank: 3/4, common:0.8699988509120198, thread:0.8189201849105845
MPI: True, rank: 0/4, common:0.8699988509120198, thread:0.37707540876459267
MPI: True, rank: 1/4, common:0.8699988509120198, thread:0.141339263238754
```

The at.random.thread generator is ready to be used in the at.beam function!

@lfarv
Contributor Author

lfarv commented Nov 27, 2022

Similarly, two random generators are available to C integrators: the struct parameters *Param argument of each trackFunction has two new fields:

  1. pcg32_random_t *common_rng for a generator which will produce identical numbers in all threads,
  2. pcg32_random_t *thread_rng for a generator which will produce different numbers in different threads.

For example, quantum diffusion elements should use the thread_rng generator, while the variable multipole element should use common_rng.

These generators are available with MPI, python multiprocessing and OpenMP.

@lfarv
Contributor Author

lfarv commented Nov 27, 2022

This PR is ready for merging. It does not change any integrator, nor the at.beam function: these will be upgraded in upcoming pull requests.

@swhite2401
Contributor

Hello @lfarv, this seems fine. I think we can start disseminating this in the passmethods!
I don't think there is anything other than the particle generation in python for now.

@lfarv lfarv force-pushed the random_generators_for_multiprocessing branch from dd168e4 to 538206e on November 28, 2022
@lfarv lfarv merged commit b5614c0 into master Nov 29, 2022
@lfarv lfarv deleted the random_generators_for_multiprocessing branch November 29, 2022 21:10
@lfarv lfarv mentioned this pull request Jun 7, 2023