domain decomposition on (ra, dec, z) surveys leads to unbalanced loads #364

Closed
nickhand opened this Issue Jul 17, 2017 · 5 comments

2 participants
@nickhand
Member

nickhand commented Jul 17, 2017

When you convert (ra, dec, z) to Cartesian coordinates, the particles aren't uniformly distributed in the resulting box. As a consequence, when you do the normal grid domain decomposition in SurveyDataPairCount, a lot of ranks end up with zero or very few particles. The load is unbalanced, and a few ranks are responsible for most of the work.
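To make the imbalance concrete, here is a minimal, standard-library-only sketch (not nbodykit code). It assumes a hypothetical survey footprint (0 < ra < 120, -10 < dec < 60, in degrees) and uses a radial distance between 500 and 1500 as a stand-in for redshift, since converting z to a distance needs a cosmology. Points are drawn uniformly within that wedge, converted to Cartesian coordinates, and binned on a regular grid over the bounding box.

```python
import math
import random
from collections import Counter

def radec_to_cartesian(ra_deg, dec_deg, r):
    """Convert (ra, dec) in degrees and a radial distance r to (x, y, z)."""
    ra, dec = math.radians(ra_deg), math.radians(dec_deg)
    return (r * math.cos(dec) * math.cos(ra),
            r * math.cos(dec) * math.sin(ra),
            r * math.sin(dec))

def cell_occupancy(points, ncell):
    """Count points per cell of an ncell^3 grid spanning the bounding box."""
    lo = [min(p[i] for p in points) for i in range(3)]
    hi = [max(p[i] for p in points) for i in range(3)]
    counts = Counter()
    for p in points:
        idx = tuple(
            min(int((p[i] - lo[i]) / (hi[i] - lo[i]) * ncell), ncell - 1)
            for i in range(3)
        )
        counts[idx] += 1
    return counts

# Hypothetical footprint; the numbers are illustrative assumptions only.
rng = random.Random(42)
points = []
for _ in range(20000):
    ra = rng.uniform(0.0, 120.0)
    # uniform in sin(dec) and r^3 gives a uniform density inside the wedge
    sin_dec = rng.uniform(math.sin(math.radians(-10.0)),
                          math.sin(math.radians(60.0)))
    dec = math.degrees(math.asin(sin_dec))
    r = rng.uniform(500.0 ** 3, 1500.0 ** 3) ** (1.0 / 3.0)
    points.append(radec_to_cartesian(ra, dec, r))

ncell = 4
counts = cell_occupancy(points, ncell)   # only non-empty cells are stored
n_empty = ncell ** 3 - len(counts)
mean_load = len(points) / ncell ** 3
print("empty cells:", n_empty, "of", ncell ** 3)
print("max / mean load: %.2f" % (max(counts.values()) / mean_load))
```

Even though the density is uniform inside the survey volume, the corners of the bounding box lie outside the wedge, so some grid cells (and the ranks that own them) receive nothing.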

@rainwoodman do you see a potential fix for this, or maybe we just want to remove the domain decomposition?

@rainwoodman

Member

rainwoodman commented Jul 17, 2017

Does domain decomposition make it slower? I prefer keeping the code formally parallel, since non-trivial thought went into it.

We can come up with finer domain decomposition schemes. For example, we can reassign ranks to spatial regions differently and skip empty regions.
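One possible way to reassign ranks to regions and skip the empty ones is a greedy "heaviest cell to least-loaded rank" assignment. The sketch below is only an illustration of that idea under assumed names (`balance_cells`, `naive_loads`); it is not an existing pmesh or nbodykit function, and the cell counts are made up.

```python
import heapq

def balance_cells(cell_counts, nranks):
    """Assign non-empty cells to ranks, heaviest cell first,
    always giving the next cell to the currently least-loaded rank."""
    heap = [(0, rank) for rank in range(nranks)]
    heapq.heapify(heap)
    assignment = {}
    for cell, n in sorted(cell_counts.items(), key=lambda kv: -kv[1]):
        if n == 0:
            continue  # empty regions are skipped entirely
        load, rank = heapq.heappop(heap)
        assignment[cell] = rank
        heapq.heappush(heap, (load + n, rank))
    loads = [0] * nranks
    for cell, rank in assignment.items():
        loads[rank] += cell_counts[cell]
    return assignment, loads

def naive_loads(cell_counts, nranks):
    """Regular grid decomposition: contiguous blocks of cells per rank."""
    per_rank = len(cell_counts) // nranks
    loads = [0] * nranks
    for cell, n in cell_counts.items():
        loads[min(cell // per_rank, nranks - 1)] += n
    return loads

# Made-up particle counts for 8 cells shared among 4 ranks.
cell_counts = {0: 0, 1: 0, 2: 50, 3: 30, 4: 20, 5: 0, 6: 10, 7: 10}
assignment, loads = balance_cells(cell_counts, 4)
print("regular grid loads:", naive_loads(cell_counts, 4))  # [0, 80, 20, 20]
print("balanced loads:    ", loads)                        # [50, 30, 20, 20]
```

The balanced loads are still bounded below by the heaviest single cell, which is why the cells need to be smaller than one-per-rank for this to help.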

@nickhand

Member

nickhand commented Jul 17, 2017

Yes, I agree that's probably what we should do. That functionality doesn't exist yet, right? I am not that familiar with all of the features of the GridND object, etc.

And yep, the code is significantly slower in this case because the distribution of work across ranks is quite bad. In the pair count algorithm for (ra, dec, z) input, we decompose the position array that's already on the rank (with smoothing=0) and then correlate against the decomposed position (smoothed by r-max). The problem is that the number density is far from constant in the Cartesian box because of the (ra, dec, z) --> (x, y, z) transformation, so you end up with a lot of ranks doing very little (or no) work, while one or two ranks do most of the work.

I think we just need to ensure that the number of objects on each rank is still relatively even after that first decomposition (smoothing=0).

@nickhand

Member

nickhand commented Sep 21, 2017

Initial tests seem to show that rainwoodman/pmesh#26 fixes this, although you do need to over-decompose the domain grid in order for the load balance to do anything.
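A rough illustration of why over-decomposition matters: with one cell per rank there is nothing to rebalance, while with many small cells per rank a balancer has room to even out the load. The sketch below uses a greedy assignment as a stand-in (it is not the algorithm from the pmesh pull request) and a made-up, strongly clustered 1D distribution.

```python
import heapq

def histogram(xs, ncells):
    """Count points in ncells equal-width cells on [0, 1]."""
    counts = [0] * ncells
    for x in xs:
        counts[min(int(x * ncells), ncells - 1)] += 1
    return counts

def max_rank_load(counts, nranks):
    """Greedy balance: heaviest cell goes to the least-loaded rank.
    Returns the load of the busiest rank."""
    heap = [(0, r) for r in range(nranks)]
    heapq.heapify(heap)
    for n in sorted(counts, reverse=True):
        if n == 0:
            continue
        load, r = heapq.heappop(heap)
        heapq.heappush(heap, (load + n, r))
    return max(load for load, _ in heap)

npart, nranks = 1000, 4
# Made-up clustered positions: most points pile up near x = 0.
xs = [(i / npart) ** 3 for i in range(npart)]

imbalance = {}
for cells_per_rank in (1, 4, 16):
    counts = histogram(xs, cells_per_rank * nranks)
    # ratio of the busiest rank's load to the ideal (mean) load
    imbalance[cells_per_rank] = max_rank_load(counts, nranks) / (npart / nranks)
    print(cells_per_rank, "cells per rank -> max/mean load = %.2f"
          % imbalance[cells_per_rank])
```

With a single cell per rank the busiest rank is stuck with whatever its cell contains; splitting the grid more finely lets the heaviest region be spread over several ranks.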

Some implementation questions for @rainwoodman. In the past, I think the design we have been using is:

  1. domain decompose pos1 and exchange with smoothing R=0
  2. domain decompose pos2 and exchange with smoothing R=max(bins)
  3. cross-correlate pos1 with pos2
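The three steps above can be sketched serially in 1D to check that they reproduce the brute-force pair count. This is a toy model under stated assumptions: `owner`, `decompose`, and `count_pairs` are hypothetical helpers, each "rank" is just a list, and the domains are fixed intervals on [0, 1].

```python
import random

def owner(x, edges):
    """Index of the domain [edges[i], edges[i+1]) that contains x."""
    for i in range(len(edges) - 1):
        if edges[i] <= x < edges[i + 1]:
            return i
    return len(edges) - 2  # x sits on the last edge

def decompose(pos, edges, smoothing):
    """smoothing == 0: each point goes to exactly one domain.
    smoothing > 0: a point is copied to every domain whose interval,
    padded by `smoothing` on both sides, contains it."""
    ndom = len(edges) - 1
    out = [[] for _ in range(ndom)]
    for x in pos:
        if smoothing == 0:
            out[owner(x, edges)].append(x)
        else:
            for i in range(ndom):
                if edges[i] - smoothing <= x <= edges[i + 1] + smoothing:
                    out[i].append(x)
    return out

def count_pairs(a_list, b_list, rmax):
    """Number of (a, b) pairs separated by at most rmax."""
    return sum(1 for a in a_list for b in b_list if abs(a - b) <= rmax)

rng = random.Random(0)
pos1 = [rng.random() for _ in range(200)]
pos2 = [rng.random() for _ in range(200)]
edges = [0.0, 0.25, 0.5, 0.75, 1.0]   # four domains, one per "rank"
rmax = 0.05                           # plays the role of max(bins)

pos1_local = decompose(pos1, edges, 0)      # step 1
pos2_local = decompose(pos2, edges, rmax)   # step 2
distributed = sum(count_pairs(p1, p2, rmax)           # step 3
                  for p1, p2 in zip(pos1_local, pos2_local))
brute_force = count_pairs(pos1, pos2, rmax)
print(distributed, brute_force)
```

Because every pos1 point lives in exactly one domain, each pair is counted once, and because pos2 is padded by rmax, no neighbor of a local pos1 point is missing.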

Is step 1 necessary? The pos1 array is already evenly distributed before step 1, and may not be after the exchange. I guess I am wondering why any particles are exchanged at all when the smoothing = 0?

@rainwoodman

Member

rainwoodman commented Sep 21, 2017

It is evenly distributed, but it is not spatially tight. Without the exchange in step 1, it is impossible to ensure that the volume in step 2 encloses all particles residing on the current process.
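A tiny worked example of this point, with made-up 1D positions and two ranks: if pos1 stays wherever it happened to be loaded (even, but not spatially tight), the pos2 copies that arrive on a rank need not be anywhere near that rank's pos1 points, and pairs are silently missed.

```python
def count_pairs(a_list, b_list, rmax):
    """Number of (a, b) pairs separated by at most rmax."""
    return sum(1 for a in a_list for b in b_list if abs(a - b) <= rmax)

rmax = 0.1
# Two spatial domains, one per rank: [0.0, 0.5) and [0.5, 1.0].
# pos2 after decomposition with smoothing rmax: each rank holds the
# points inside its own domain padded by rmax.
pos2_local = [[0.12], [0.88]]

# pos1 as loaded from disk: one point per rank (evenly distributed),
# but each point sits in the *other* rank's domain.
pos1_loose = [[0.9], [0.1]]
# pos1 after the smoothing=0 exchange: each point is on the rank
# that owns its position.
pos1_tight = [[0.1], [0.9]]

without_step1 = sum(count_pairs(p1, p2, rmax)
                    for p1, p2 in zip(pos1_loose, pos2_local))
with_step1 = sum(count_pairs(p1, p2, rmax)
                 for p1, p2 in zip(pos1_tight, pos2_local))
print(without_step1, with_step1)  # 0 2
```

Both true pairs, (0.1, 0.12) and (0.9, 0.88), are found only when pos1 has been moved to the rank that owns its region.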

@rainwoodman

Member

rainwoodman commented Sep 21, 2017

So I think you'll have to do the load balancing in step 1, then reuse the domain assignment from step 1 for step 2.
