Update collectionsoa.py to allow user defined MPI partitioning #1414
This code allows users to change the function that determines which particles run on which MPI jobs, via the function setPartitionFunction().
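The setter approach can be sketched as follows. This assumes the active partitioning function is kept in a module-level variable that particle set creation reads at call time, which is also why the new function has to be set before the particle set is made. Only the name setPartitionFunction comes from this PR. default_partition, _partition_function and create_particle_set are hypothetical stand-ins, and the round-robin default is a placeholder for the real KMeans-based default.

```python
import numpy as np


def default_partition(coords, mpi_size=1):
    """Hypothetical placeholder default: assign particles to ranks round-robin."""
    return np.arange(coords.shape[0]) % mpi_size


# Module-level variable holding the partitioning function currently in effect.
_partition_function = default_partition


def setPartitionFunction(func):
    """Replace the partitioning function used by any later particle set creation."""
    global _partition_function
    _partition_function = func


def create_particle_set(lon, lat, mpi_size):
    """Hypothetical particle set creation: looks up the partitioner at call time."""
    coords = np.column_stack([lon, lat])
    return _partition_function(coords, mpi_size=mpi_size)


lon = np.linspace(0.0, 10.0, 6)
lat = np.zeros(6)
before = create_particle_set(lon, lat, mpi_size=2)  # uses default_partition
setPartitionFunction(lambda coords, mpi_size=1: np.zeros(coords.shape[0], dtype=int))
after = create_particle_set(lon, lat, mpi_size=2)  # uses the newly set function
print(before, after)  # [0 1 0 1 0 1] [0 0 0 0 0 0]
```

A particle set created before the call to setPartitionFunction() keeps the partitioning it was built with. Only sets created afterwards pick up the new function.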
for more information, see https://pre-commit.ci
Oops. I created this branch only a few days ago, but apparently many changes have been made to collectionsoa.py since then. The diff shows many changes in the code, but only two of them are mine: in the new code, lines 29-83 and 131-137. I am afraid my unfamiliarity with GitHub is showing...
Co-authored-by: Erik van Sebille <e.vansebille@uu.nl>
@erikvansebille I was trying to do this with as little change to the code as possible, since I had not originally thought of putting it into the master branch. What you say makes sense, but I need a little time to dive into the code to make sure I understand how particle set creation integrates with functions like .from_line() and .from_list(). I will try to add a unit test, if it seems straightforward. I will be able to get back to this once I get some reviews off my desk...
With these changes, any ParticleSet creation function should accept a partitionFunction kwarg, which defines how particles are partitioned across MPI jobs/ranks.
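Here is a minimal sketch of how a partitionFunction kwarg could flow through particle set creation. It assumes the partitioner returns one integer rank per particle and that each MPI rank then keeps only its own particles. The names from_list_sketch, default_partition and by_parity, and the explicit rank argument, are all hypothetical. In a real MPI run the rank and size would come from the communicator, for example mpi4py's `MPI.COMM_WORLD.Get_rank()` and `Get_size()`.

```python
import numpy as np


def default_partition(coords, mpi_size=1):
    """Hypothetical default: split particles into contiguous, equal-count blocks."""
    n = coords.shape[0]
    return np.arange(n) * mpi_size // max(n, 1)


def from_list_sketch(lon, lat, rank=0, mpi_size=1, partitionFunction=None):
    """Hypothetical creator: return lon/lat of the particles that `rank` should run."""
    lon = np.asarray(lon, dtype=float)
    lat = np.asarray(lat, dtype=float)
    # Use the caller's partitioner if one was passed, otherwise fall back to the default.
    partition = default_partition if partitionFunction is None else partitionFunction
    labels = np.asarray(partition(np.column_stack([lon, lat]), mpi_size=mpi_size))
    # Check the output contract: one rank index in [0, mpi_size) per particle.
    if labels.shape != lon.shape or labels.min() < 0 or labels.max() >= mpi_size:
        raise ValueError("partitionFunction must return one rank in [0, mpi_size) per particle")
    mine = labels == rank
    return lon[mine], lat[mine]


def by_parity(coords, mpi_size=1):
    """Hypothetical custom partitioner: alternate ranks by integer longitude."""
    return coords[:, 0].astype(int) % mpi_size


lon, lat = [0.0, 1.0, 2.0, 3.0], [0.0, 0.0, 0.0, 0.0]
r1_default, _ = from_list_sketch(lon, lat, rank=1, mpi_size=2)
r1_custom, _ = from_list_sketch(lon, lat, rank=1, mpi_size=2, partitionFunction=by_parity)
print(r1_default, r1_custom)  # [2. 3.] [1. 3.]
```

Passing the function as a kwarg ties the choice to one particle set. A global setter changes the behaviour of every particle set created afterwards.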
Update particle set to specify partitionFunction
This PR is a follow-up to #1414.
This code allows users to change the function that determines which particles run on which MPI jobs, via the function setPartitionFunction(). Attached to this pull request are a modified version of example_stommel.py that uses this functionality, and run_example_stommel.py, which shows how to implement different partitioning schemes. Using this partitioning scheme on my global runs saves about 20% of the run time, which is significant for my 10-day runs.
To use this function, a new partitioning function must be created with two arguments:
(coords,mpi_size=1)
The arguments and output are:
The existing partitioning function in this format is now:
One example I have found useful is a function that requires the number of particles in each MPI job to be roughly equal. This prevents the default KMeans algorithm from making small clusters around, for example, the Hawaiian Islands. Unequal MPI job sizes lead to unequal allocation of compute resources, and to long runs, because some MPI processes take much longer to finish than others. To make the allocation of particles equal, I use a constrained KMeans algorithm. This can be very slow, so I include an option (ncull) to do the initial clustering on a subset of the particles. Note that this new partitioning function does NOT need to be included in the parcels distribution: it is created entirely by the user of parcels.
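The author's partitionParticles4MPI_KMeans_constrained is not reproduced in this thread, and the constrained KMeans implementation it relies on is not named. The numpy-only sketch below therefore swaps in a simpler heuristic with the same goal. It runs a few Lloyd (k-means) iterations on a random subset of ncull particles to place mpi_size centres. It then assigns every particle greedily to its nearest centre that still has room, capping each centre at ceil(N / mpi_size) particles. This greedy capacity step is not constrained KMeans. The function name, its defaults and the seed handling are all assumptions.

```python
import numpy as np


def partition_balanced(coords, mpi_size=1, ncull=None, n_iter=20, seed=0):
    """Hypothetical balanced partitioner: k-means on a subset, then capped greedy assignment.

    Assumes coords is an (N, 2) array of [lon, lat] rows and that mpi_size does not
    exceed the number of particles used for clustering. Returns an int array of
    length N with values in [0, mpi_size); no rank gets more than ceil(N / mpi_size).
    """
    coords = np.asarray(coords, dtype=float)
    n = coords.shape[0]
    rng = np.random.default_rng(seed)

    # Cluster only a random subset of ncull particles to keep this step cheap.
    if ncull is None or ncull >= n:
        subset = coords
    else:
        subset = coords[rng.choice(n, size=ncull, replace=False)]

    # Plain Lloyd (k-means) iterations on the subset, seeded from random subset points.
    centres = subset[rng.choice(subset.shape[0], size=mpi_size, replace=False)].copy()
    for _ in range(n_iter):
        dist = np.linalg.norm(subset[:, None, :] - centres[None, :, :], axis=2)
        nearest = dist.argmin(axis=1)
        for k in range(mpi_size):
            if np.any(nearest == k):  # leave a centre in place if it lost all its points
                centres[k] = subset[nearest == k].mean(axis=0)

    # Assign every particle to its nearest centre that still has room.
    capacity = -(-n // mpi_size)  # ceil(n / mpi_size)
    counts = np.zeros(mpi_size, dtype=int)
    labels = np.empty(n, dtype=int)
    dist = np.linalg.norm(coords[:, None, :] - centres[None, :, :], axis=2)
    for i in np.argsort(dist.min(axis=1)):  # particles closest to a centre go first
        for k in np.argsort(dist[i]):  # try centres from nearest to farthest
            if counts[k] < capacity:
                labels[i] = k
                counts[k] += 1
                break
    return labels


# Usage: four clumps with very different (made-up) particle counts.
rng = np.random.default_rng(1)
sizes = [400, 50, 30, 20]
clump_centres = [(0.0, 0.0), (10.0, 0.0), (0.0, 10.0), (10.0, 10.0)]
coords = np.vstack([rng.normal(c, 0.5, size=(s, 2)) for c, s in zip(clump_centres, sizes)])
labels = partition_balanced(coords, mpi_size=4, ncull=200)
print(np.bincount(labels, minlength=4))  # [125 125 125 125]
```

The greedy pass guarantees the size cap but not compact groups. A true constrained KMeans optimises the assignment jointly and would be expected to give tighter clusters, at a higher cost.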
This code, and the following example of its use, come from the attached example_stommel.py. To use this function, we must import setPartitionFunction() with

from parcels.collection.collectionsoa import setPartitionFunction

and, BEFORE making the particle set, set the new function to be used with

setPartitionFunction(partitionParticles4MPI_KMeans_constrained)

I have attached figures for an example in which the initial particle positions are four clumps containing very different numbers of particles. The default KMeans code works correctly: it successfully identifies the spatially separate clumps, and so creates MPI jobs with very different numbers of particles.
The constrained KMeans breaks the particles into less compact but more equally sized groups.
There is a clear trade-off between equal-size MPI jobs and particle locality. In my case, however, I have found equal-size MPI jobs to be a big win.
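One way to put a number on the size side of that trade-off is the load-imbalance factor: the largest per-rank particle count divided by the mean count. This assumes each rank's run time scales roughly with its particle count. A perfectly balanced partition scores 1.0. The function name load_imbalance is hypothetical, and the clump sizes below are invented for illustration. They are not taken from the attached figures.

```python
import numpy as np


def load_imbalance(labels, mpi_size):
    """Ratio of the busiest rank's particle count to the mean count per rank."""
    counts = np.bincount(np.asarray(labels), minlength=mpi_size)
    return counts.max() / counts.mean()


# Four clumps of very different sizes, one clump per rank (as a spatial clustering might give).
clumped = np.repeat([0, 1, 2, 3], [400, 50, 30, 20])
# The same 500 particles split evenly across the four ranks.
balanced = np.repeat([0, 1, 2, 3], 125)
print(load_imbalance(clumped, 4))  # 3.2
print(load_imbalance(balanced, 4))  # 1.0
```

Under the run-time assumption, the busiest rank in the clumped split does 3.2 times the work of any rank in the balanced split. This metric says nothing about locality, which is the other side of the trade-off.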
If y'all like where this is going, I can write up some documentation for it.
codeForStommelExample.zip