New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Decompose to the correct boxsize for surveydata #471

Merged
merged 3 commits into from Apr 14, 2018

Conversation

Projects
None yet
4 participants
@rainwoodman
Member

rainwoodman commented Apr 11, 2018

Survey data does not start from [0, 0, 0]. Thus we need to set
the left edge and right edge properly.

This is one of the cause of a crash in @martinjameswhite's script.

Decompose to the correct boxsize for surveydata
Survey data does not start from [0, 0, 0]. Thus we need to set
the left edge and right edge properly.

This is one of the cause of a crash in @martinjameswhite's script.

@rainwoodman rainwoodman requested a review from martinjameswhite Apr 11, 2018

@martinjameswhite

This comment has been minimized.

martinjameswhite commented Apr 11, 2018

I don't understand the technical issue, but I agree the survey data would normally have the observer at the origin and so a box which surrounds the origin.

@rainwoodman

This comment has been minimized.

Member

rainwoodman commented Apr 11, 2018

Yes. And I think we will need technical reviewers.

@rainwoodman rainwoodman requested a review from michaelJwilson Apr 11, 2018

@rainwoodman

This comment has been minimized.

Member

rainwoodman commented Apr 12, 2018

I tested the fix lets your script run though.

The question is how to deliver it before we tag a new release of nbodykit.

I can certainly merge this as deploy the master branch of nbodykit to bccpdesi, but I would suggest a decentralized way -- if each user have their own conda environment then there is no single point of failure.

(e.g. I did the testing in my own environment without touching bccpdesi, which is read-only to my nersc account as well).

@rainwoodman

This comment has been minimized.

Member

rainwoodman commented Apr 12, 2018

Here is how it looks when it worked:

(bccp) yfeng1@cori08:/global/cscratch1/sd/mwhite/DESI/DarkSky> python -m pdb ~/tmp/two_point.py elg 0.8 0.801
> /global/homes/y/yfeng1/tmp/two_point.py(6)<module>()
-> import numpy as np
(Pdb) r
/global/homes/y/yfeng1/.conda/envs/bccp/lib/python3.6/site-packages/h5py/__init__.py:36: FutureWarning: Conversion of the second argument of issubdtype from `float` to `np.floating` is deprecated. In future, it will be treated as `np.float64 == np.dtype(float).type`.
  from ._conv import register_converters as _register_converters
Data has size:  111188
Rand has size:  724828
[ 000001.19 ]   0: 04-11 17:07  SurveyDataPairCount INFO     using cpu grid decomposition: (1, 1, 1)
[ 000001.77 ]   0: 04-11 17:08  SurveyDataPairCount INFO     position variable range on rank 0 (max, min) = [1946.62996486 1820.80787017 1891.68533076], [-1947.1250374  -1907.56946594  -657.35840959]
[ 000001.82 ]   0: 04-11 17:08  SurveyDataPairCount INFO     correlating 111188 x 111188 objects in total
[ 000001.83 ]   0: 04-11 17:08  SurveyDataPairCount INFO     correlating A x B = 111188 x 111188 objects (median) per rank
[ 000001.83 ]   0: 04-11 17:08  SurveyDataPairCount INFO     min A load per rank = 111188
[ 000001.83 ]   0: 04-11 17:08  SurveyDataPairCount INFO     max A load per rank = 111188
[ 000001.83 ]   0: 04-11 17:08  SurveyDataPairCount INFO     (even distribution would result in 111188 x 111188)
[ 000001.84 ]   0: 04-11 17:08  MPICorrfuncCallable INFO     calling function 'Corrfunc.mocks.DDsmu_mocks.DDsmu_mocks'
[ 000003.07 ]   0: 04-11 17:08  SurveyDataPairCount INFO     using cpu grid decomposition: (1, 1, 1)
[ 000004.31 ]   0: 04-11 17:08  SurveyDataPairCount INFO     position variable range on rank 0 (max, min) = [1947.06468152 1821.4905677  1892.19278023], [-1947.15302515 -1909.30545604  -663.65326608]
[ 000004.66 ]   0: 04-11 17:08  SurveyDataPairCount INFO     correlating 724828 x 724828 objects in total
[ 000004.67 ]   0: 04-11 17:08  SurveyDataPairCount INFO     correlating A x B = 724828 x 724828 objects (median) per rank
[ 000004.67 ]   0: 04-11 17:08  SurveyDataPairCount INFO     min A load per rank = 724828
[ 000004.67 ]   0: 04-11 17:08  SurveyDataPairCount INFO     max A load per rank = 724828
[ 000004.67 ]   0: 04-11 17:08  SurveyDataPairCount INFO     (even distribution would result in 724828 x 724828)
[ 000004.69 ]   0: 04-11 17:08  MPICorrfuncCallable INFO     calling function 'Corrfunc.mocks.DDsmu_mocks.DDsmu_mocks'
[ 000071.19 ]   0: 04-11 17:09  SurveyDataPairCount INFO     using cpu grid decomposition: (1, 1, 1)
[ 000072.87 ]   0: 04-11 17:09  SurveyDataPairCount INFO     position variable range on rank 0 (max, min) = [1947.06468152 1821.4905677  1892.19278023], [-1947.15302515 -1909.30545604  -663.65326608]
[ 000073.07 ]   0: 04-11 17:09  SurveyDataPairCount INFO     correlating 111188 x 724828 objects in total
[ 000073.07 ]   0: 04-11 17:09  SurveyDataPairCount INFO     correlating A x B = 111188 x 724828 objects (median) per rank
[ 000073.07 ]   0: 04-11 17:09  SurveyDataPairCount INFO     min A load per rank = 111188
[ 000073.07 ]   0: 04-11 17:09  SurveyDataPairCount INFO     max A load per rank = 111188
[ 000073.07 ]   0: 04-11 17:09  SurveyDataPairCount INFO     (even distribution would result in 111188 x 724828)
[ 000073.09 ]   0: 04-11 17:09  MPICorrfuncCallable INFO     calling function 'Corrfunc.mocks.DDsmu_mocks.DDsmu_mocks'
--Return--
> /global/homes/y/yfeng1/tmp/two_point.py(83)<module>()->None
@martinjameswhite

This comment has been minimized.

martinjameswhite commented Apr 12, 2018

@rainwoodman rainwoodman requested a review from eelregit Apr 12, 2018

@rainwoodman

This comment has been minimized.

Member

rainwoodman commented Apr 12, 2018

It is the first time I seriously read this code written by Nick -- optimally I'd like another person taking a look at the technical before merging this. If nothing happens by tomorrow I'll merge it regardless.

@michaelJwilson

This comment has been minimized.

michaelJwilson commented Apr 12, 2018

@nickhand

This comment has been minimized.

Member

nickhand commented Apr 12, 2018

@rainwoodman happy to take a look at this tomorrow if you describe a bit more what the issue is. Related to position for survey data going from -L/2 to L/2?

Hopefully the code wasn’t too unreadable ;)

@rainwoodman

This comment has been minimized.

Member

rainwoodman commented Apr 12, 2018

@nickhand

Survey data was decomposed assuming a periodic [0, boxsize] box. But it actually is neither periodic nor positive definite (DEC < 0 would give negative z). As a result, some points are wrapped to wrong processors.

As an example, running on a single rank, some points near 0 are wrapped to an out-of-bound rank (rank 1), and produces an exception pmesh.

@rainwoodman

This comment has been minimized.

Member

rainwoodman commented Apr 12, 2018

@michaelJwilson :

In the current practice, we usually only installed released version of softwares. For this PR to go into a bccpdesi, three steps must be done:

  1. Merging the PR;
  2. Tagging a release;
  3. Installing into bccpdesi;

Tagging a release is a somewhat formal process (changelog, etc), and since I've just tagged a release this week, I don't feel comfortable tagging another in just two days for a single bug fix.

However, bccpdesi relies on SurveyData 2 point functions.

A way to ease this tension is to deploy a non-released version (head of master branch after this PR is merged) to bccpdesi.

Single point of failure:

Imagine the case where if maintainer (currently me) of the bccpdesi environment, is suddenly ran over by a bus or abduct by aliens; no one will be deploying these updates. However if each user maintain their own environment, then there is no such concern -- de-centralized. Of course the precondition is that the environment maintenance is sufficiently straight-forward, which may not be the case to some.

@rainwoodman rainwoodman merged commit af37af8 into bccp:master Apr 14, 2018

2 checks passed

continuous-integration/travis-ci/pr The Travis CI build passed
Details
coverage/coveralls Coverage increased (+0.003%) to 95.373%
Details

@rainwoodman rainwoodman deleted the rainwoodman:surveydatabounds branch Apr 14, 2018

@rainwoodman

This comment has been minimized.

Member

rainwoodman commented Apr 14, 2018

@martinjameswhite I merged this and deployed it to cori's bccpdesi env.

got it to the point

(bccpdesi) yfeng1@cori06:/global/cscratch1/sd/mwhite/DESI/DarkSky> python two_point.py elg 0.8 0.801
/usr/common/contrib/bccp/anaconda3/envs/bccpdesi/lib/python3.6/site-packages/h5py/__init__.py:36: FutureWarning: Conversion of the second argument of issubdtype from `float` to `np.floating` is deprecated. In future, it will be treated as `np.float64 == np.dtype(float).type`.
  from ._conv import register_converters as _register_converters
Data has size:  111188
Rand has size:  72482788
[ 000004.91 ]   0: 04-13 20:32  SurveyDataPairCount INFO     using cpu grid decomposition: (1, 1, 1)
[ 000005.16 ]   0: 04-13 20:32  SurveyDataPairCount INFO     position variable range on rank 0 (max, min) = [1946.62996486 1820.80787017 1891.68533076], [-1947.1250374  -1907.56946594  -657.35840959]
[ 000005.21 ]   0: 04-13 20:32  SurveyDataPairCount INFO     correlating 111188 x 111188 objects in total
[ 000005.21 ]   0: 04-13 20:32  SurveyDataPairCount INFO     correlating A x B = 111188 x 111188 objects (median) per rank
[ 000005.21 ]   0: 04-13 20:32  SurveyDataPairCount INFO     min A load per rank = 111188
[ 000005.21 ]   0: 04-13 20:32  SurveyDataPairCount INFO     max A load per rank = 111188
[ 000005.21 ]   0: 04-13 20:32  SurveyDataPairCount INFO     (even distribution would result in 111188 x 111188)
[ 000005.22 ]   0: 04-13 20:32  MPICorrfuncCallable INFO     calling function 'Corrfunc.mocks.DDsmu_mocks.DDsmu_mocks'
[ 000006.38 ]   0: 04-13 20:32  SurveyDataPairCount INFO     using cpu grid decomposition: (1, 1, 1)
[ 000080.05 ]   0: 04-13 20:33  SurveyDataPairCount INFO     position variable range on rank 0 (max, min) = [1947.20344211 1823.55088511 1893.06785805], [-1947.20548096 -1909.82602368  -664.97224893]

paircounting 72 million points with a single rank is probably going to take a very long time.. The script needs to be MPI-lized.

@michaelJwilson

This comment has been minimized.

michaelJwilson commented Apr 14, 2018

@michaelJwilson

This comment has been minimized.

michaelJwilson commented Apr 14, 2018

@rainwoodman

This comment has been minimized.

Member

rainwoodman commented Apr 16, 2018

@michaelJwilson PR = pull request. In our simplified process tag = release.

I can't find the place the releasing process is written. For most packages we do, it is: editing version.py to the release, git commit, git tag, git push, then edit version.py to the future 'dev' version, and do it all over again. The rest (PyPI, CI=continuous integration) are done automatically. It means each step you'll have to wait till the green light is on and immediate manual intervention is necessary if things turn sour.

For nbodykit there is an additional step of updating changelog to cross reference document all recent changes -- bug fixes, new features, etc.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment