
Decompose to the correct boxsize for surveydata #471

Merged
rainwoodman merged 3 commits into bccp:master on Apr 14, 2018

Conversation

rainwoodman
Member

Survey data does not start from [0, 0, 0]. Thus we need to set
the left edge and right edge properly.

This is one of the causes of a crash in @martinjameswhite's script.

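In sketch form, the fix amounts to deriving the decomposition edges from the actual data extent rather than assuming a [0, boxsize] box. This is a hypothetical helper, not the actual patch, and the padding factor is an assumption:

```python
import numpy as np

def domain_edges(pos):
    """Derive decomposition edges from the actual data extent.

    Survey positions are centred on the observer and span negative
    coordinates, so the edges cannot be assumed to be [0, boxsize];
    they must come from the data itself.
    """
    left = pos.min(axis=0)
    right = pos.max(axis=0)
    # pad slightly so points sitting exactly on the boundary
    # still fall strictly inside the domain (padding is arbitrary)
    pad = 1e-5 * (right - left)
    return left - pad, right + pad

# extents similar to the position ranges quoted in the logs in this thread
pos = np.array([[-1947.13, -1907.57, -657.36],
                [ 1946.63,  1820.81,  1891.69]])
left, right = domain_edges(pos)
```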
@martinjameswhite

I don't understand the technical issue, but I agree the survey data would normally have the observer at the origin and so a box which surrounds the origin.

@rainwoodman
Member Author

Yes. And I think we will need technical reviewers.

@rainwoodman
Member Author

I tested that the fix lets your script run through.

The question is how to deliver it before we tag a new release of nbodykit.

I can certainly merge this and deploy the master branch of nbodykit to bccpdesi, but I would suggest a decentralized way -- if each user has their own conda environment, then there is no single point of failure.

(e.g. I did the testing in my own environment without touching bccpdesi, which is read-only to my nersc account as well).
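A per-user environment along those lines might look like the following. This is a hedged sketch: the environment name and Python version are placeholders, and the install-from-master step assumes the bccp/nbodykit GitHub repository (pulling in compiled dependencies such as Corrfunc may require extra conda packages):

```shell
# create a personal conda environment (name and version are placeholders)
conda create -n my-nbodykit python=3.6
source activate my-nbodykit

# install the unreleased master branch that contains this fix
pip install git+https://github.com/bccp/nbodykit.git@master
```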

@rainwoodman
Member Author

Here is how it looks when it worked:

(bccp) yfeng1@cori08:/global/cscratch1/sd/mwhite/DESI/DarkSky> python -m pdb ~/tmp/two_point.py elg 0.8 0.801
> /global/homes/y/yfeng1/tmp/two_point.py(6)<module>()
-> import numpy as np
(Pdb) r
/global/homes/y/yfeng1/.conda/envs/bccp/lib/python3.6/site-packages/h5py/__init__.py:36: FutureWarning: Conversion of the second argument of issubdtype from `float` to `np.floating` is deprecated. In future, it will be treated as `np.float64 == np.dtype(float).type`.
  from ._conv import register_converters as _register_converters
Data has size:  111188
Rand has size:  724828
[ 000001.19 ]   0: 04-11 17:07  SurveyDataPairCount INFO     using cpu grid decomposition: (1, 1, 1)
[ 000001.77 ]   0: 04-11 17:08  SurveyDataPairCount INFO     position variable range on rank 0 (max, min) = [1946.62996486 1820.80787017 1891.68533076], [-1947.1250374  -1907.56946594  -657.35840959]
[ 000001.82 ]   0: 04-11 17:08  SurveyDataPairCount INFO     correlating 111188 x 111188 objects in total
[ 000001.83 ]   0: 04-11 17:08  SurveyDataPairCount INFO     correlating A x B = 111188 x 111188 objects (median) per rank
[ 000001.83 ]   0: 04-11 17:08  SurveyDataPairCount INFO     min A load per rank = 111188
[ 000001.83 ]   0: 04-11 17:08  SurveyDataPairCount INFO     max A load per rank = 111188
[ 000001.83 ]   0: 04-11 17:08  SurveyDataPairCount INFO     (even distribution would result in 111188 x 111188)
[ 000001.84 ]   0: 04-11 17:08  MPICorrfuncCallable INFO     calling function 'Corrfunc.mocks.DDsmu_mocks.DDsmu_mocks'
[ 000003.07 ]   0: 04-11 17:08  SurveyDataPairCount INFO     using cpu grid decomposition: (1, 1, 1)
[ 000004.31 ]   0: 04-11 17:08  SurveyDataPairCount INFO     position variable range on rank 0 (max, min) = [1947.06468152 1821.4905677  1892.19278023], [-1947.15302515 -1909.30545604  -663.65326608]
[ 000004.66 ]   0: 04-11 17:08  SurveyDataPairCount INFO     correlating 724828 x 724828 objects in total
[ 000004.67 ]   0: 04-11 17:08  SurveyDataPairCount INFO     correlating A x B = 724828 x 724828 objects (median) per rank
[ 000004.67 ]   0: 04-11 17:08  SurveyDataPairCount INFO     min A load per rank = 724828
[ 000004.67 ]   0: 04-11 17:08  SurveyDataPairCount INFO     max A load per rank = 724828
[ 000004.67 ]   0: 04-11 17:08  SurveyDataPairCount INFO     (even distribution would result in 724828 x 724828)
[ 000004.69 ]   0: 04-11 17:08  MPICorrfuncCallable INFO     calling function 'Corrfunc.mocks.DDsmu_mocks.DDsmu_mocks'
[ 000071.19 ]   0: 04-11 17:09  SurveyDataPairCount INFO     using cpu grid decomposition: (1, 1, 1)
[ 000072.87 ]   0: 04-11 17:09  SurveyDataPairCount INFO     position variable range on rank 0 (max, min) = [1947.06468152 1821.4905677  1892.19278023], [-1947.15302515 -1909.30545604  -663.65326608]
[ 000073.07 ]   0: 04-11 17:09  SurveyDataPairCount INFO     correlating 111188 x 724828 objects in total
[ 000073.07 ]   0: 04-11 17:09  SurveyDataPairCount INFO     correlating A x B = 111188 x 724828 objects (median) per rank
[ 000073.07 ]   0: 04-11 17:09  SurveyDataPairCount INFO     min A load per rank = 111188
[ 000073.07 ]   0: 04-11 17:09  SurveyDataPairCount INFO     max A load per rank = 111188
[ 000073.07 ]   0: 04-11 17:09  SurveyDataPairCount INFO     (even distribution would result in 111188 x 724828)
[ 000073.09 ]   0: 04-11 17:09  MPICorrfuncCallable INFO     calling function 'Corrfunc.mocks.DDsmu_mocks.DDsmu_mocks'
--Return--
> /global/homes/y/yfeng1/tmp/two_point.py(83)<module>()->None

@martinjameswhite

martinjameswhite commented Apr 12, 2018 via email

@rainwoodman
Member Author

This is the first time I have seriously read this code written by Nick -- ideally I'd like another person to take a look at the technical details before merging this. If nothing happens by tomorrow I'll merge it regardless.

@michaelJwilson

michaelJwilson commented Apr 12, 2018 via email

@nickhand
Member

@rainwoodman happy to take a look at this tomorrow if you describe a bit more what the issue is. Related to position for survey data going from -L/2 to L/2?

Hopefully the code wasn’t too unreadable ;)

@rainwoodman
Member Author

@nickhand

Survey data was decomposed assuming a periodic [0, boxsize] box. But it actually is neither periodic nor positive definite (DEC < 0 would give negative z). As a result, some points are wrapped to wrong processors.

As an example, running on a single rank, some points near 0 are wrapped to an out-of-bound rank (rank 1), which produces an exception in pmesh.
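A toy model of the failure mode (not the actual pmesh internals; the box size and coordinates are illustrative): flooring the cell index under a [0, boxsize) assumption sends points just outside that range to invalid ranks, while indexing against the true data bounds keeps them in range.

```python
import math

def naive_rank(x, boxsize, nranks):
    # decomposition assuming positions live in [0, boxsize):
    # floor the cell index, no wrapping or bounds check
    return math.floor(x / (boxsize / nranks))

def bounded_rank(x, left, right, nranks):
    # fix: index against the true data extent [left, right] and clamp
    cell = math.floor((x - left) / ((right - left) / nranks))
    return min(max(cell, 0), nranks - 1)

boxsize, nranks = 100.0, 1
print(naive_rank(-0.5, boxsize, nranks))            # -1: below the valid range
print(naive_rank(100.5, boxsize, nranks))           # 1: out of bounds on a 1-rank run
print(bounded_rank(-0.5, -1950.0, 1950.0, nranks))  # 0: stays on the only rank
```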

@rainwoodman
Member Author

@michaelJwilson :

In current practice, we usually install only released versions of software. For this PR to go into bccpdesi, three steps must be done:

  1. Merging the PR;
  2. Tagging a release;
  3. Installing into bccpdesi;

Tagging a release is a somewhat formal process (changelog, etc.), and since I've just tagged a release this week, I don't feel comfortable tagging another in just two days for a single bug fix.

However, bccpdesi relies on the SurveyData two-point functions.

A way to ease this tension is to deploy a non-released version (head of master branch after this PR is merged) to bccpdesi.

Single point of failure:

Imagine the case where the maintainer (currently me) of the bccpdesi environment is suddenly run over by a bus or abducted by aliens; no one will deploy these updates. However, if each user maintains their own environment, there is no such concern -- it is de-centralized. Of course, the precondition is that environment maintenance is sufficiently straightforward, which may not be the case for everyone.

@rainwoodman rainwoodman merged commit af37af8 into bccp:master Apr 14, 2018
@rainwoodman rainwoodman deleted the surveydatabounds branch April 14, 2018 03:28
@rainwoodman
Member Author

@martinjameswhite I merged this and deployed it to cori's bccpdesi env.

I got it to this point:

(bccpdesi) yfeng1@cori06:/global/cscratch1/sd/mwhite/DESI/DarkSky> python two_point.py elg 0.8 0.801
/usr/common/contrib/bccp/anaconda3/envs/bccpdesi/lib/python3.6/site-packages/h5py/__init__.py:36: FutureWarning: Conversion of the second argument of issubdtype from `float` to `np.floating` is deprecated. In future, it will be treated as `np.float64 == np.dtype(float).type`.
  from ._conv import register_converters as _register_converters
Data has size:  111188
Rand has size:  72482788
[ 000004.91 ]   0: 04-13 20:32  SurveyDataPairCount INFO     using cpu grid decomposition: (1, 1, 1)
[ 000005.16 ]   0: 04-13 20:32  SurveyDataPairCount INFO     position variable range on rank 0 (max, min) = [1946.62996486 1820.80787017 1891.68533076], [-1947.1250374  -1907.56946594  -657.35840959]
[ 000005.21 ]   0: 04-13 20:32  SurveyDataPairCount INFO     correlating 111188 x 111188 objects in total
[ 000005.21 ]   0: 04-13 20:32  SurveyDataPairCount INFO     correlating A x B = 111188 x 111188 objects (median) per rank
[ 000005.21 ]   0: 04-13 20:32  SurveyDataPairCount INFO     min A load per rank = 111188
[ 000005.21 ]   0: 04-13 20:32  SurveyDataPairCount INFO     max A load per rank = 111188
[ 000005.21 ]   0: 04-13 20:32  SurveyDataPairCount INFO     (even distribution would result in 111188 x 111188)
[ 000005.22 ]   0: 04-13 20:32  MPICorrfuncCallable INFO     calling function 'Corrfunc.mocks.DDsmu_mocks.DDsmu_mocks'
[ 000006.38 ]   0: 04-13 20:32  SurveyDataPairCount INFO     using cpu grid decomposition: (1, 1, 1)
[ 000080.05 ]   0: 04-13 20:33  SurveyDataPairCount INFO     position variable range on rank 0 (max, min) = [1947.20344211 1823.55088511 1893.06785805], [-1947.20548096 -1909.82602368  -664.97224893]

Pair-counting 72 million points with a single rank is probably going to take a very long time. The script needs to be MPI-parallelized.
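Since nbodykit algorithms pick up the MPI communicator (MPI.COMM_WORLD by default), running the unchanged script under multiple ranks distributes the pair counting. A hedged sketch of a Slurm invocation on Cori; the node, rank, and queue settings are placeholders:

```shell
# allocate nodes interactively (settings are placeholders)
salloc -N 2 -q interactive -C haswell

# launch the script across many ranks; nbodykit splits the
# pair counting over MPI.COMM_WORLD automatically
srun -n 64 python two_point.py elg 0.8 0.801
```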

@michaelJwilson

michaelJwilson commented Apr 14, 2018 via email

@michaelJwilson

michaelJwilson commented Apr 14, 2018 via email

@rainwoodman
Member Author

@michaelJwilson PR = pull request. In our simplified process, tag = release.

I can't find the place where the releasing process is written down. For most packages we maintain, it is: edit version.py to the release version, git commit, git tag, git push, then edit version.py to the next 'dev' version, and do it all over again. The rest (PyPI upload, CI = continuous integration) is done automatically. It means at each step you have to wait until the green light is on, and immediate manual intervention is necessary if things turn sour.
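The steps above might look roughly like this. A sketch only: the version numbers, commit messages, and editor are placeholders, and the exact CI trigger is an assumption:

```shell
vi version.py                      # set the version to the release, e.g. 0.3.1
git commit -am "bump version to 0.3.1"
git tag 0.3.1
git push origin master --tags      # tag push triggers CI and the PyPI upload
vi version.py                      # set the version to the next dev, e.g. 0.3.2.dev0
git commit -am "back to development version"
git push
```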

For nbodykit there is an additional step of updating the changelog to cross-reference and document all recent changes -- bug fixes, new features, etc.
