Collective effects with MPI + improve interface to passmethod impedance #320
Conversation
…ong, so no multi-turn wakes were being applied. Second, the slices did not act on themselves, which was wrong. Now the buffer sign is correct and `ds=0` is allowed to pass into the slice loop.
OK for me, I trust you for the collective effects.
Unfortunately, at the moment one has to choose between OpenMP for tracking (everything except `WakefieldPass`) and MPI for wakefields. So:
- Since there is no overlap between the two, can both be activated together?
- An OpenMP option for wakefields should be easy to add (one directive for each loop). For those who don't have a cluster, this should be very efficient. But then it's one OR the other.
In fact I thought about it and wanted to discuss it with you, because it is not obvious what will happen if both are activated. What I wanted to propose should prevent unwanted combinations; any suggestions?
I tried to activate both. No problem up to now, but I did not check OpenMP performance yet, and I lack data to test wakefields. The code outside the OpenMP directives is unchanged, and MPI is apparently limited to library calls (but I don't know anything about MPI), so I don't expect problems, but it needs more checks.
Hi @swhite2401. I started looking at the implementation of collective effects: very impressive! I think it deserves some documentation (possibly separate from this PR). If you have anything, I can take care of the integration in the Web site. I have a question about the python part: as far as I could see, there is no interaction between python and MPI. It seems that python can totally ignore whether MPI is active. Concerning the C part, I have to look briefly at MPI…
You are right, but for the MPI commands in C to work, MPI needs to be initialized. This can be done in C by calling `MPI_Init()`.
This is why I have on my todo list an MPI wrapper class to handle all this. I can write some documentation with benchmarks added; let's wait for @lcarver's study. Very good that you are looking into this! Any suggestions are more than welcome, my MPI skills are certainly not that great. Do you need example scripts using mpirun?
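For reference, a minimal sketch of what the python-side initialization could look like, assuming the standard `mpi4py` binding (an assumption, not named in this PR):

```python
# Minimal sketch, assuming mpi4py: importing MPI calls MPI_Init()
# behind the scenes, so the C passmethods later find an initialized
# MPI runtime.
from mpi4py import MPI

comm = MPI.COMM_WORLD                      # communicator shared by all ranks
print(f"rank {comm.Get_rank()} of {comm.Get_size()}")
```

Launched with e.g. `mpirun -n 4 python script.py`, each of the four copies prints its own rank.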
So if I understand, a number of python processes will be started simultaneously by `mpirun`. And yes, I am interested in example scripts!
(I had to upload the python script as a txt file... you will have to change the extension yourself.) Hi @lfarv, I converted one of the scripts I had and simplified it as much as I could, as an example of a longitudinal resonator wake function with MPI. While it uses the same parameters as the plot I showed above for the benchmarking, there are subtle differences between this script and the one that generated the figure (for the figure I used a wake potential identical to the one used in the Haissinski solver, with more particles and slices and some post-processing; here we use a simple longitudinal resonator wake function with no post-processing). This script can be run (for 5 cores) with `mpirun -n 5 python example_resonator.py`. For me on grappa, 1 core takes 83.9 seconds with these settings, and 10 cores take 11.5 seconds (for the tracking part, of course).
@lfarv: `mpirun` will launch the requested number of copies of your script.
Hi @swhite2401, can you send me your S28F.mat? Thanks
S28F.txt
@swhite2401: I ran your test on my Mac (4 cores), and here is what I get:
But then I modified the script to remove the Wakefield element (just commented it out).
The shape of the result is correct (6, 50000) and the bunch shape is symmetrical, so it looks correct (but I'm not sure it is). So does this mean that without any MPI call, tracking is magically parallelised? Or are the 4 processes doing exactly the same thing, but then why is the time reduced? And where is the sharing of particles between the processes done? For me, there are still a number of things to be clarified about MPI…
The number of particles used in the tracking is 50000/size, where size is the number of cores you specified. In the MPI case you specified 4 cores, so each core tracks 50000/4 particles, and they are brought together at the end. Without the wakefield element, you are simply tracking fewer particles per core, so it is faster. EDIT: in the no-wakefield case, what you are doing is splitting the particles among the cores; they are still tracked independently with no cross-talk, but at the end the gather still collects them, so you return to a shape of (6, 50000).
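To make this concrete, a hedged sketch of the split-and-gather pattern described above, assuming `mpi4py` (the tracking call is left as a placeholder):

```python
# Hedged sketch of the particle splitting described above (mpi4py assumed).
import numpy as np
from mpi4py import MPI

comm = MPI.COMM_WORLD
ntotal = 50000
nlocal = ntotal // comm.Get_size()       # e.g. 12500 particles on 4 cores

# each rank creates and tracks only its own slice of the beam
rin = np.zeros((6, nlocal), order='F')
# ... tracking of rin through the ring would go here ...

# gather the per-rank (6, nlocal) blocks; only rank 0 receives them
blocks = comm.gather(rin, root=0)
if comm.Get_rank() == 0:
    rout = np.concatenate(blocks, axis=1)    # back to shape (6, 50000)
```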
mpirun will basically launch four copies of your code, whether MPI commands are present or not.
This is what I understand, and cross-talk is not necessary in that case, but the question is: where does this splitting occur? Is the generation of particles by…
Ah, the splitting is implicitly done in the line that generates the particles: each rank only creates its own fraction of the beam.
Another possible way is to generate the full beam on rank 0 and then split it across the ranks, as sketched below.
Then at the end you gather the full array on rank 0. By the way, running the code without `mpirun` works as well; MPI then runs with a single process. Not sure if that answers your questions...
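For illustration, a hedged sketch of the second variant (generate the full beam on rank 0, then split), again assuming `mpi4py` and a particle count divisible by the number of ranks:

```python
# Hedged sketch: build the whole beam on rank 0, then scatter equal
# chunks to every rank.
import numpy as np
from mpi4py import MPI

comm = MPI.COMM_WORLD
ntotal = 50000

chunks = None
if comm.Get_rank() == 0:
    full_beam = np.random.default_rng(0).normal(size=(6, ntotal))
    # split along the particle axis into one chunk per rank
    chunks = np.split(full_beam, comm.Get_size(), axis=1)

local_beam = comm.scatter(chunks, root=0)    # each rank gets (6, ntotal/size)
```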
I think the answer to Laurent's question is that the splitting is handled by mpirun, not by python. By launching mpirun, you run the same script on each thread, in an environment where you can communicate with the other threads using the comm object.
Thanks @swhite2401 and @lcarver, it's getting clearer! So I understand that:
So in my opinion, this looks very promising, though it's not yet ready for general use. Concerning the wakefields, it's OK for me, merge it when you are confident.
@lfarv, I think that is a good summary!
I made a comparison of the TMCI threshold and horizontal tune shift. Frustratingly, I was not able to recover the exact longitudinal wake potential that I used in my original simulations. Nonetheless, while there are small differences in the tune shift, the TMCI threshold remains the same; for me this is perfectly fine. I know there are small differences in the longitudinal wake between the two cases, which explains why the tune shifts are not identical. The fact that the horizontal tune shift is close also shows that the quadrupolar wake is cancelling the dipolar wake in the horizontal plane. For me there are no major issues. I have a few other benchmarks ongoing: I am checking the resistive-wall wake against theory and PyHEADTAIL, and also a transverse broadband resonator. But I think this can be merged before those benchmarks are complete, as this passmethod is at least doing the same as `ImpedanceTablePass`. Edit: I also remind you of the previous comment on this PR, where the longitudinal broadband resonator agreed perfectly with the Haissinski formula.
…cted a typo in NormFact
I set this one as ready for review. Please comment in case you see something odd; if not, I think we can merge this one.
This branch introduces several changes:
- `WakefieldPass` and `atimplib`: comprehensive passmethod and library to perform multi-turn tracking in the presence of wakefields. MPI capabilities were added, and `setup.py` was modified in order to compile with `mpicc`.
- `collective/wake_object`: python object defining a wakefield.
- `collective/wake_elements`: classes for wake element creation, to be completed with other specific cases.
- `collective/wake_functions`: collection of analytical wake functions, to be completed.

The wake can be accumulated over multiple turns, which allows modeling beam loading using the `LongitudinalResonatorElement`. A full multi-bunch implementation is possible once this is validated.

Single bunch/turn results are identical to the previous `ImpedanceTablePass.c` implementation. Multi-turn results have been benchmarked by @lcarver against `BLonD` and `pyHEADTAIL`; this needs to be re-validated with the latest developments.

A C macro was added to `setup.py` in order to compile (or not) the MPI-related code. The default behavior of pyAT is preserved: no MPI code is compiled unless the macro is activated, and the collective-effects modules have to be explicitly imported.
Build with MPI:

```bash
rm -rf build   # seems to be needed for unknown reason
MPI=1 python ./setup.py develop
```
Several additions are needed in future PRs:
- multi-bunch implementation
- silent `atpass` to avoid allocating large memory space for multi-particle tracking when only the center of mass is of interest
- class to wrap `mpi` functionalities such as `gather`, `broadcast` etc., to simplify actions on `numpy` arrays such as particle generation (a rough sketch follows below)
- `passmethod` compatible with MPI that writes useful quantities to file
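As an indication of what the wrapper class mentioned above could look like, here is a hypothetical sketch; none of these names exist in pyAT, and `mpi4py` is assumed:

```python
# Hypothetical sketch of the future MPI wrapper class; all names are
# illustrative only and assume the mpi4py binding.
import numpy as np
from mpi4py import MPI


class MpiBeam:
    """Hide the gather/broadcast bookkeeping behind a small helper."""

    def __init__(self, ntotal):
        self.comm = MPI.COMM_WORLD
        self.ntotal = ntotal
        self.nlocal = ntotal // self.comm.Get_size()

    def generate(self, sigma=1e-4):
        # every rank builds only its own share of the beam
        rng = np.random.default_rng(self.comm.Get_rank())
        return sigma * rng.normal(size=(6, self.nlocal))

    def gather(self, rin):
        # collect the per-rank blocks into the full beam on rank 0
        blocks = self.comm.gather(rin, root=0)
        return None if blocks is None else np.concatenate(blocks, axis=1)
```

Such a wrapper would keep explicit MPI calls out of user scripts.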