domain decomposition on (ra, dec, z) surveys leads to unbalanced loads #364

nickhand · 2017-07-17T15:13:22Z

When you convert (ra, dec, z) to Cartesian coordinates, the particles aren't uniformly distributed in the resulting box. As a result, the normal grid domain decomposition in SurveyDataPairCount leaves many ranks with zero or very few particles, so the load is unbalanced and a few ranks are responsible for most of the work.
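To make the imbalance concrete, here is a minimal NumPy sketch, not nbodykit code. It assumes a hypothetical survey footprint (one octant in ra, 0 to 30 degrees in dec), uses a made-up radial distance in place of a redshift-to-distance conversion, and treats each cell of a regular 4x4x4 grid over the bounding box as one rank's domain. The helper names `sky_to_cartesian` and `counts_per_cell` are illustrative only.

```python
import numpy as np

def sky_to_cartesian(ra_deg, dec_deg, r):
    """Convert (ra, dec) in degrees plus a radial distance to (x, y, z)."""
    ra, dec = np.deg2rad(ra_deg), np.deg2rad(dec_deg)
    x = r * np.cos(dec) * np.cos(ra)
    y = r * np.cos(dec) * np.sin(ra)
    z = r * np.sin(dec)
    return np.column_stack([x, y, z])

def counts_per_cell(pos, ncells):
    """Count particles in each cell of a regular grid over the bounding box."""
    lo, hi = pos.min(axis=0), pos.max(axis=0)
    idx = np.floor((pos - lo) / (hi - lo) * np.asarray(ncells)).astype(int)
    idx = np.clip(idx, 0, np.asarray(ncells) - 1)  # points on the upper edge
    flat = np.ravel_multi_index(tuple(idx.T), ncells)
    return np.bincount(flat, minlength=int(np.prod(ncells)))

rng = np.random.default_rng(42)
n = 100_000
ra = rng.uniform(0.0, 90.0, n)
# uniform in sin(dec), i.e. uniform on the sky between dec = 0 and 30 degrees
dec = np.rad2deg(np.arcsin(rng.uniform(0.0, 0.5, n)))
# assumption: a made-up radial distance stands in for the redshift conversion
r = rng.uniform(1000.0, 2000.0, n)

pos = sky_to_cartesian(ra, dec, r)
counts = counts_per_cell(pos, (4, 4, 4))  # one grid cell per "rank"

imbalance = counts.max() / counts.mean()
print("empty cells:", int((counts == 0).sum()), "of", counts.size)
print("max / mean load: %.2f" % imbalance)
```

Because the survey volume is a curved shell, the corners of its bounding box contain no particles at all, so some cells are guaranteed to be empty while others carry far more than the mean.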

@rainwoodman do you see a potential fix for this, or maybe we just want to remove the domain decomposition?

rainwoodman · 2017-07-17T20:29:34Z

Does domain decomposition make it slower? I prefer keeping the code formally parallel, since non-trivial thought went into it.

We can come up with finer domain decomposition schemes. For example, we can reassign ranks to spatial regions differently and skip empty regions.
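One way to picture "reassign ranks to spatial regions and skip empty regions" is the sketch below. It is only an illustration under stated assumptions, not the scheme pmesh implements: the domain is over-decomposed into more cells than ranks, each cell has a known particle count, and cells are handed out greedily (heaviest first) to whichever rank currently holds the least. The function name `assign_cells_to_ranks` and the example loads are hypothetical.

```python
import heapq
import numpy as np

def assign_cells_to_ranks(cell_load, nranks):
    """Greedy heaviest-first assignment of grid cells to ranks.

    Empty cells are skipped and get owner -1. Returns the owner of each
    cell and the total load per rank.
    """
    cell_load = np.asarray(cell_load)
    owner = np.full(cell_load.size, -1, dtype=int)
    heap = [(0, rank) for rank in range(nranks)]  # (current load, rank)
    heapq.heapify(heap)
    for cell in np.argsort(cell_load)[::-1]:      # heaviest cells first
        if cell_load[cell] == 0:
            break                                 # the remaining cells are empty
        load, rank = heapq.heappop(heap)          # least-loaded rank so far
        owner[cell] = rank
        heapq.heappush(heap, (load + int(cell_load[cell]), rank))
    assigned = owner >= 0
    rank_load = np.bincount(owner[assigned], weights=cell_load[assigned],
                            minlength=nranks)
    return owner, rank_load

# hypothetical per-cell particle counts for 10 cells shared among 3 ranks
cell_load = np.array([50, 0, 30, 0, 20, 10, 0, 40, 5, 5])
owner, rank_load = assign_cells_to_ranks(cell_load, nranks=3)
print("owner of each cell:", owner)
print("load per rank:", rank_load)
```

The balance can only be as good as the granularity of the cells allows, which is why having more cells than ranks matters here.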

nickhand · 2017-07-17T21:05:11Z

Yes, I agree that's probably what we should do. That functionality doesn't exist yet, right? I am not that familiar with all of the features of the GridND object, etc.

And yep, the code is significantly slower in this case because the distribution of particles across ranks is quite bad. In the pair count algorithm for (ra, dec, z) input, we decompose the position array that's already on the rank (with smoothing=0) and then correlate against the decomposed position (smoothed by r-max). The problem is that the number density is far from constant in the Cartesian box because of the (ra, dec, z) --> (x, y, z) transformation, so you end up with a lot of ranks doing very little (or no) work, while one or two ranks do most of the work.

I think we just need to ensure that the number of objects on each rank after that first decomposition (smoothing=0) is relatively even still.

nickhand · 2017-09-21T15:37:27Z

Initial tests seem to show that rainwoodman/pmesh#26 fixes this, although you do need to over-decompose the domain grid in order for the load balance to do anything.

Some implementation questions for @rainwoodman. In the past, I think the design we have been using is:

1. domain decompose pos1 and exchange with smoothing R=0
2. domain decompose pos2 and exchange with smoothing R=max(bins)
3. cross-correlate pos1 with pos2

Is step 1 necessary? The pos1 array is already evenly distributed before step 1, and may not be after the exchange. I guess I am wondering why any particles are exchanged at all when the smoothing = 0?

rainwoodman · 2017-09-21T19:44:49Z

It is evenly distributed, but it is not spatially tight, so it is impossible to ensure that the volume in step 2 encloses all particles residing on the current process.

rainwoodman · 2017-09-21T19:46:14Z

So I think you'll have to do the load balance in step 1, then use the domain assignment of step 1 for step 2.
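A serial toy version of that scheme, to show why the step 1 assignment has to be reused: this is a sketch with made-up 2D data, it emulates ranks with plain Python lists, and it does not call pmesh or nbodykit. The helper names (`cell_index`, `decompose`, `count_pairs`) and all parameters are hypothetical. Cells are balanced over ranks using the pos1 counts, pos1 is distributed with smoothing 0, and pos2 is copied to every rank owning a cell within rmax of it, using the same cell-to-rank map.

```python
import itertools
import numpy as np

def cell_index(pos, lo, cellsize, ncells):
    """Integer cell coordinates of each position, clipped to the grid."""
    idx = np.floor((pos - lo) / cellsize).astype(int)
    return np.clip(idx, 0, np.asarray(ncells) - 1)

def decompose(pos, lo, cellsize, ncells, owner, nranks, smoothing=0.0):
    """Per rank, the indices of particles within `smoothing` of its cells."""
    lo_idx = cell_index(pos - smoothing, lo, cellsize, ncells)
    hi_idx = cell_index(pos + smoothing, lo, cellsize, ncells)
    held = [set() for _ in range(nranks)]
    for i in range(len(pos)):
        spans = [range(a, b + 1) for a, b in zip(lo_idx[i], hi_idx[i])]
        for cell in itertools.product(*spans):
            rank = owner[np.ravel_multi_index(cell, ncells)]
            if rank >= 0:                 # skip cells no rank owns
                held[rank].add(i)
    return [np.array(sorted(s), dtype=int) for s in held]

def count_pairs(p1, p2, rmax):
    """Brute-force count of (p1, p2) pairs closer than rmax."""
    if len(p1) == 0 or len(p2) == 0:
        return 0
    d = np.linalg.norm(p1[:, None, :] - p2[None, :, :], axis=-1)
    return int((d < rmax).sum())

rng = np.random.default_rng(0)
pos1 = rng.uniform(0, 1, (200, 2)) ** 2   # made-up data, clustered near origin
pos2 = rng.uniform(0, 1, (300, 2)) ** 2
lo, ncells, nranks, rmax = 0.0, (4, 4), 3, 0.1
cellsize = 1.0 / np.asarray(ncells)

# step 1: balance cells over ranks using pos1 counts, exchange with smoothing 0
flat1 = np.ravel_multi_index(tuple(cell_index(pos1, lo, cellsize, ncells).T), ncells)
load = np.bincount(flat1, minlength=int(np.prod(ncells)))
owner = np.full(load.size, -1, dtype=int)
rank_load = np.zeros(nranks, dtype=int)
for cell in np.argsort(load)[::-1]:       # heaviest non-empty cells first
    if load[cell] > 0:
        r = int(rank_load.argmin())
        owner[cell] = r
        rank_load[r] += load[cell]
local1 = decompose(pos1, lo, cellsize, ncells, owner, nranks, smoothing=0.0)

# step 2: reuse the SAME owner map for pos2, with smoothing rmax
local2 = decompose(pos2, lo, cellsize, ncells, owner, nranks, smoothing=rmax)

# step 3: each "rank" correlates its pos1 with its (ghosted) pos2
total = sum(count_pairs(pos1[local1[r]], pos2[local2[r]], rmax)
            for r in range(nranks))
brute = count_pairs(pos1, pos2, rmax)
print("pos1 per rank:", [len(l) for l in local1])
print("decomposed:", total, "brute force:", brute)
```

Each pos1 particle lands on exactly one rank, and every pos2 particle within rmax of it is guaranteed to be on that same rank, so the per-rank counts sum to the brute-force total. If pos1 were left where it was instead of being exchanged in step 1, there would be no bounded region per rank against which to collect the pos2 neighbours.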

nickhand mentioned this issue Aug 9, 2017

Remove domain decomposition from SurveyDataPairCount #371

Merged

nickhand mentioned this issue Sep 21, 2017

Balance the load among different ranks rainwoodman/pmesh#26

Merged

nickhand mentioned this issue Sep 26, 2017

PairCount upgrades (angular pair count + load balance) #408

Merged

nickhand closed this as completed in #408 Sep 26, 2017
