Decompose to the correct boxsize for surveydata #471

rainwoodman · 2018-04-11T18:06:02Z

Survey data does not start from [0, 0, 0]. Thus we need to set
the left edge and right edge properly.

This is one of the cause of a crash in @martinjameswhite's script.

@martinjameswhite

Survey data does not start from [0, 0, 0]. Thus we need to set the left edge and right edge properly. This is one of the cause of a crash in @martinjameswhite's script.

martinjameswhite · 2018-04-11T18:31:43Z

I don't understand the technical issue, but I agree the survey data would normally have the observer at the origin and so a box which surrounds the origin.

rainwoodman · 2018-04-11T21:06:09Z

Yes. And I think we will need technical reviewers.

rainwoodman · 2018-04-12T00:12:07Z

I tested the fix lets your script run though.

The question is how to deliver it before we tag a new release of nbodykit.

I can certainly merge this as deploy the master branch of nbodykit to bccpdesi, but I would suggest a decentralized way -- if each user have their own conda environment then there is no single point of failure.

(e.g. I did the testing in my own environment without touching bccpdesi, which is read-only to my nersc account as well).

rainwoodman · 2018-04-12T00:12:35Z

Here is how it looks when it worked:

(bccp) yfeng1@cori08:/global/cscratch1/sd/mwhite/DESI/DarkSky> python -m pdb ~/tmp/two_point.py elg 0.8 0.801
> /global/homes/y/yfeng1/tmp/two_point.py(6)<module>()
-> import numpy as np
(Pdb) r
/global/homes/y/yfeng1/.conda/envs/bccp/lib/python3.6/site-packages/h5py/__init__.py:36: FutureWarning: Conversion of the second argument of issubdtype from `float` to `np.floating` is deprecated. In future, it will be treated as `np.float64 == np.dtype(float).type`.
  from ._conv import register_converters as _register_converters
Data has size:  111188
Rand has size:  724828
[ 000001.19 ]   0: 04-11 17:07  SurveyDataPairCount INFO     using cpu grid decomposition: (1, 1, 1)
[ 000001.77 ]   0: 04-11 17:08  SurveyDataPairCount INFO     position variable range on rank 0 (max, min) = [1946.62996486 1820.80787017 1891.68533076], [-1947.1250374  -1907.56946594  -657.35840959]
[ 000001.82 ]   0: 04-11 17:08  SurveyDataPairCount INFO     correlating 111188 x 111188 objects in total
[ 000001.83 ]   0: 04-11 17:08  SurveyDataPairCount INFO     correlating A x B = 111188 x 111188 objects (median) per rank
[ 000001.83 ]   0: 04-11 17:08  SurveyDataPairCount INFO     min A load per rank = 111188
[ 000001.83 ]   0: 04-11 17:08  SurveyDataPairCount INFO     max A load per rank = 111188
[ 000001.83 ]   0: 04-11 17:08  SurveyDataPairCount INFO     (even distribution would result in 111188 x 111188)
[ 000001.84 ]   0: 04-11 17:08  MPICorrfuncCallable INFO     calling function 'Corrfunc.mocks.DDsmu_mocks.DDsmu_mocks'
[ 000003.07 ]   0: 04-11 17:08  SurveyDataPairCount INFO     using cpu grid decomposition: (1, 1, 1)
[ 000004.31 ]   0: 04-11 17:08  SurveyDataPairCount INFO     position variable range on rank 0 (max, min) = [1947.06468152 1821.4905677  1892.19278023], [-1947.15302515 -1909.30545604  -663.65326608]
[ 000004.66 ]   0: 04-11 17:08  SurveyDataPairCount INFO     correlating 724828 x 724828 objects in total
[ 000004.67 ]   0: 04-11 17:08  SurveyDataPairCount INFO     correlating A x B = 724828 x 724828 objects (median) per rank
[ 000004.67 ]   0: 04-11 17:08  SurveyDataPairCount INFO     min A load per rank = 724828
[ 000004.67 ]   0: 04-11 17:08  SurveyDataPairCount INFO     max A load per rank = 724828
[ 000004.67 ]   0: 04-11 17:08  SurveyDataPairCount INFO     (even distribution would result in 724828 x 724828)
[ 000004.69 ]   0: 04-11 17:08  MPICorrfuncCallable INFO     calling function 'Corrfunc.mocks.DDsmu_mocks.DDsmu_mocks'
[ 000071.19 ]   0: 04-11 17:09  SurveyDataPairCount INFO     using cpu grid decomposition: (1, 1, 1)
[ 000072.87 ]   0: 04-11 17:09  SurveyDataPairCount INFO     position variable range on rank 0 (max, min) = [1947.06468152 1821.4905677  1892.19278023], [-1947.15302515 -1909.30545604  -663.65326608]
[ 000073.07 ]   0: 04-11 17:09  SurveyDataPairCount INFO     correlating 111188 x 724828 objects in total
[ 000073.07 ]   0: 04-11 17:09  SurveyDataPairCount INFO     correlating A x B = 111188 x 724828 objects (median) per rank
[ 000073.07 ]   0: 04-11 17:09  SurveyDataPairCount INFO     min A load per rank = 111188
[ 000073.07 ]   0: 04-11 17:09  SurveyDataPairCount INFO     max A load per rank = 111188
[ 000073.07 ]   0: 04-11 17:09  SurveyDataPairCount INFO     (even distribution would result in 111188 x 724828)
[ 000073.09 ]   0: 04-11 17:09  MPICorrfuncCallable INFO     calling function 'Corrfunc.mocks.DDsmu_mocks.DDsmu_mocks'
--Return--
> /global/homes/y/yfeng1/tmp/two_point.py(83)<module>()->None

martinjameswhite · 2018-04-12T00:16:19Z

Yu, Glad to hear it's "fixed". I'm happy with however you choose to deploy this fix, as long as I get the fixed code so I can run it ;). Martin.

…

On 4/11/18 5:12 PM, Yu Feng wrote: I tested the fix lets your script run though. The question is how to deliver it before we tag a new release of nbodykit. I can certainly merge this as deploy the master branch of nbodykit to bccpdesi, but I would suggest a decentralized way -- if each user have their own conda environment then there is no single point of failure. (e.g. I did the testing in my own environment without touching bccpdesi, which is read-only to my nersc account as well). — You are receiving this because your review was requested. Reply to this email directly, view it on GitHub <#471 (comment)>, or mute the thread <https://github.com/notifications/unsubscribe-auth/AE-_B2D6mwF86LZj60ulj7mR_UeBgX3Mks5tnpvZgaJpZM4TQfn2>.

rainwoodman · 2018-04-12T00:26:35Z

It is the first time I seriously read this code written by Nick -- optimally I'd like another person taking a look at the technical before merging this. If nothing happens by tomorrow I'll merge it regardless.

michaelJwilson · 2018-04-12T00:53:32Z

Hey Yu, I'm happy to take a look, but you might need to give me a bit more time to get acquainted and to parse your comments. My taxes seem to be done, barring any surprises! "The domain for survey data is non-periodic." Generally true, but can't we consider padding periodic? "I can certainly merge this as deploy the master branch of nbodykit to

bccpdesi, but I would suggest a decentralized way -- if each user have their own conda environment then there is no single point of failure".

I believe you're suggesting you could update the master of bccpdesi/nbodykit but not nbodykit itself?

"if each user have their own conda environment then there is no single point of failure".

You're going to have to explain this more. Sorry for the stupid questions. Thanks! Mike

…

On Wed, Apr 11, 2018 at 5:26 PM, Yu Feng ***@***.***> wrote: It is the first time I seriously read this code written by Nick -- optimally I'd like another person taking a look at the technical before merging this. If nothing happens by tomorrow I'll merge it regardless. — You are receiving this because your review was requested. Reply to this email directly, view it on GitHub <#471 (comment)>, or mute the thread <https://github.com/notifications/unsubscribe-auth/AFwSklWShnDdbvDMwtwecH7vhtYG2nh0ks5tnp88gaJpZM4TQfn2> .

-- Dr. Michael Wilson Cosmology Postdoctoral Fellow http://www.roe.ac.uk/~mjw/ Lawrence Berkeley National Lab Building 50A: Mailstop 50A5104 1 Cyclotron Rd, Berkeley CA 94720, USA Tel: (+01) 510-502-3448 (+44) 7742 373026

nickhand · 2018-04-12T01:12:25Z

@rainwoodman happy to take a look at this tomorrow if you describe a bit more what the issue is. Related to position for survey data going from -L/2 to L/2?

Hopefully the code wasn’t too unreadable ;)

rainwoodman · 2018-04-12T17:25:09Z

@nickhand

Survey data was decomposed assuming a periodic [0, boxsize] box. But it actually is neither periodic nor positive definite (DEC < 0 would give negative z). As a result, some points are wrapped to wrong processors.

As an example, running on a single rank, some points near 0 are wrapped to an out-of-bound rank (rank 1), and produces an exception pmesh.

rainwoodman · 2018-04-12T17:34:30Z

@michaelJwilson :

In the current practice, we usually only installed released version of softwares. For this PR to go into a bccpdesi, three steps must be done:

Merging the PR;
Tagging a release;
Installing into bccpdesi;

Tagging a release is a somewhat formal process (changelog, etc), and since I've just tagged a release this week, I don't feel comfortable tagging another in just two days for a single bug fix.

However, bccpdesi relies on SurveyData 2 point functions.

A way to ease this tension is to deploy a non-released version (head of master branch after this PR is merged) to bccpdesi.

Single point of failure:

Imagine the case where if maintainer (currently me) of the bccpdesi environment, is suddenly ran over by a bus or abduct by aliens; no one will be deploying these updates. However if each user maintain their own environment, then there is no such concern -- de-centralized. Of course the precondition is that the environment maintenance is sufficiently straight-forward, which may not be the case to some.

rainwoodman · 2018-04-14T03:34:07Z

@martinjameswhite I merged this and deployed it to cori's bccpdesi env.

got it to the point

(bccpdesi) yfeng1@cori06:/global/cscratch1/sd/mwhite/DESI/DarkSky> python two_point.py elg 0.8 0.801
/usr/common/contrib/bccp/anaconda3/envs/bccpdesi/lib/python3.6/site-packages/h5py/__init__.py:36: FutureWarning: Conversion of the second argument of issubdtype from `float` to `np.floating` is deprecated. In future, it will be treated as `np.float64 == np.dtype(float).type`.
  from ._conv import register_converters as _register_converters
Data has size:  111188
Rand has size:  72482788
[ 000004.91 ]   0: 04-13 20:32  SurveyDataPairCount INFO     using cpu grid decomposition: (1, 1, 1)
[ 000005.16 ]   0: 04-13 20:32  SurveyDataPairCount INFO     position variable range on rank 0 (max, min) = [1946.62996486 1820.80787017 1891.68533076], [-1947.1250374  -1907.56946594  -657.35840959]
[ 000005.21 ]   0: 04-13 20:32  SurveyDataPairCount INFO     correlating 111188 x 111188 objects in total
[ 000005.21 ]   0: 04-13 20:32  SurveyDataPairCount INFO     correlating A x B = 111188 x 111188 objects (median) per rank
[ 000005.21 ]   0: 04-13 20:32  SurveyDataPairCount INFO     min A load per rank = 111188
[ 000005.21 ]   0: 04-13 20:32  SurveyDataPairCount INFO     max A load per rank = 111188
[ 000005.21 ]   0: 04-13 20:32  SurveyDataPairCount INFO     (even distribution would result in 111188 x 111188)
[ 000005.22 ]   0: 04-13 20:32  MPICorrfuncCallable INFO     calling function 'Corrfunc.mocks.DDsmu_mocks.DDsmu_mocks'
[ 000006.38 ]   0: 04-13 20:32  SurveyDataPairCount INFO     using cpu grid decomposition: (1, 1, 1)
[ 000080.05 ]   0: 04-13 20:33  SurveyDataPairCount INFO     position variable range on rank 0 (max, min) = [1947.20344211 1823.55088511 1893.06785805], [-1947.20548096 -1909.82602368  -664.97224893]

paircounting 72 million points with a single rank is probably going to take a very long time.. The script needs to be MPI-lized.

michaelJwilson · 2018-04-14T03:48:40Z

taxes submitted! PR is Public Release? When you say only installed, you mean the conda channel update to NERSC? Where's the best place to look for a description of the tagging process? I guess I'm confused why the maintenance of the bccpdesi environment isn't more automatic than that. As a start, is there a breakdown description of the current packages in the environment - what they do/why they're loaded. Thanks for this! Mike

…

On Thu, Apr 12, 2018 at 10:34 AM, Yu Feng ***@***.***> wrote: @michaelJwilson <https://github.com/michaelJwilson> : In the current practice, we usually only installed released version of softwares. For this PR to go into a bccpdesi, three steps must be done: 1. Merging the PR; 2. Tagging a release; 3. Installing into bccpdesi; Tagging a release is a somewhat formal process (changelog, etc), and since I've just tagged a release this week, I don't feel comfortable tagging another in just two days for a single bug fix. However, bccpdesi relies on SurveyData 2 point functions. A way to ease this tension is to deploy a non-released version (head of master branch after this PR is merged) to bccpdesi. Single point of failure: Imagine the case where if maintainer (currently me) of the bccpdesi environment, is suddenly ran over by a bus or abduct by aliens; no one will be deploying these updates. However if each user maintain their own environment, then there is no such concern -- de-centralized. Of course the precondition is that the environment maintenance is sufficiently straight-forward, which may not be the case to some. — You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub <#471 (comment)>, or mute the thread <https://github.com/notifications/unsubscribe-auth/AFwSkt0dNm_5YZaQ3e2_Wt-8FPlR1B4aks5tn5AngaJpZM4TQfn2> .

-- Dr. Michael Wilson Cosmology Postdoctoral Fellow http://www.roe.ac.uk/~mjw/ Lawrence Berkeley National Lab Building 50A: Mailstop 50A5104 1 Cyclotron Rd, Berkeley CA 94720, USA Tel: (+01) 510-502-3448 (+44) 7742 373026

michaelJwilson · 2018-04-14T03:52:47Z

Is there a reason it's not also deployed to edison? CorrFunc has no MPI capability? M

…

On Fri, Apr 13, 2018 at 8:34 PM, Yu Feng ***@***.***> wrote: @martinjameswhite <https://github.com/martinjameswhite> I merged this and deployed it to cori's bccpdesi env. got it to the point (bccpdesi) ***@***.***:/global/cscratch1/sd/mwhite/DESI/DarkSky> python two_point.py elg 0.8 0.801 /usr/common/contrib/bccp/anaconda3/envs/bccpdesi/lib/python3.6/site-packages/h5py/__init__.py:36: FutureWarning: Conversion of the second argument of issubdtype from `float` to `np.floating` is deprecated. In future, it will be treated as `np.float64 == np.dtype(float).type`. from ._conv import register_converters as _register_converters Data has size: 111188 Rand has size: 72482788 [ 000004.91 ] 0: 04-13 20:32 SurveyDataPairCount INFO using cpu grid decomposition: (1, 1, 1) [ 000005.16 ] 0: 04-13 20:32 SurveyDataPairCount INFO position variable range on rank 0 (max, min) = [1946.62996486 1820.80787017 1891.68533076], [-1947.1250374 -1907.56946594 -657.35840959] [ 000005.21 ] 0: 04-13 20:32 SurveyDataPairCount INFO correlating 111188 x 111188 objects in total [ 000005.21 ] 0: 04-13 20:32 SurveyDataPairCount INFO correlating A x B = 111188 x 111188 objects (median) per rank [ 000005.21 ] 0: 04-13 20:32 SurveyDataPairCount INFO min A load per rank = 111188 [ 000005.21 ] 0: 04-13 20:32 SurveyDataPairCount INFO max A load per rank = 111188 [ 000005.21 ] 0: 04-13 20:32 SurveyDataPairCount INFO (even distribution would result in 111188 x 111188) [ 000005.22 ] 0: 04-13 20:32 MPICorrfuncCallable INFO calling function 'Corrfunc.mocks.DDsmu_mocks.DDsmu_mocks' [ 000006.38 ] 0: 04-13 20:32 SurveyDataPairCount INFO using cpu grid decomposition: (1, 1, 1) [ 000080.05 ] 0: 04-13 20:33 SurveyDataPairCount INFO position variable range on rank 0 (max, min) = [1947.20344211 1823.55088511 1893.06785805], [-1947.20548096 -1909.82602368 -664.97224893] paircounting 72 million points with a single rank is probably going to take a very long time.. The script needs to be MPI-lized. — You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub <#471 (comment)>, or mute the thread <https://github.com/notifications/unsubscribe-auth/AFwSkugdlF1s5oJuWKwJhgDgUhWUePUIks5toW4ygaJpZM4TQfn2> .

-- Dr. Michael Wilson Cosmology Postdoctoral Fellow http://www.roe.ac.uk/~mjw/ Lawrence Berkeley National Lab Building 50A: Mailstop 50A5104 1 Cyclotron Rd, Berkeley CA 94720, USA Tel: (+01) 510-502-3448 (+44) 7742 373026

rainwoodman · 2018-04-16T16:03:27Z

@michaelJwilson PR = pull request. In our simplified process tag = release.

I can't find the place the releasing process is written. For most packages we do, it is: editing version.py to the release, git commit, git tag, git push, then edit version.py to the future 'dev' version, and do it all over again. The rest (PyPI, CI=continuous integration) are done automatically. It means each step you'll have to wait till the green light is on and immediate manual intervention is necessary if things turn sour.

For nbodykit there is an additional step of updating changelog to cross reference document all recent changes -- bug fixes, new features, etc.

Decompose to the correct boxsize for surveydata

f916977

Survey data does not start from [0, 0, 0]. Thus we need to set the left edge and right edge properly. This is one of the cause of a crash in @martinjameswhite's script.

rainwoodman requested a review from martinjameswhite April 11, 2018 18:06

rainwoodman requested a review from michaelJwilson April 11, 2018 21:06

rainwoodman added 2 commits April 11, 2018 15:14

The domain for survey data is non-periodic.

d331c1e

cover tests on single rank.

57a0a49

rainwoodman requested a review from eelregit April 12, 2018 00:21

rainwoodman merged commit af37af8 into bccp:master Apr 14, 2018

rainwoodman deleted the surveydatabounds branch April 14, 2018 03:28

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Decompose to the correct boxsize for surveydata #471

Decompose to the correct boxsize for surveydata #471

rainwoodman commented Apr 11, 2018

martinjameswhite commented Apr 11, 2018

rainwoodman commented Apr 11, 2018

rainwoodman commented Apr 12, 2018

rainwoodman commented Apr 12, 2018

martinjameswhite commented Apr 12, 2018 via email

rainwoodman commented Apr 12, 2018

michaelJwilson commented Apr 12, 2018 via email

nickhand commented Apr 12, 2018

rainwoodman commented Apr 12, 2018

rainwoodman commented Apr 12, 2018

rainwoodman commented Apr 14, 2018

michaelJwilson commented Apr 14, 2018 via email

michaelJwilson commented Apr 14, 2018 via email

rainwoodman commented Apr 16, 2018

Decompose to the correct boxsize for surveydata #471

Decompose to the correct boxsize for surveydata #471

Conversation

rainwoodman commented Apr 11, 2018

martinjameswhite commented Apr 11, 2018

rainwoodman commented Apr 11, 2018

rainwoodman commented Apr 12, 2018

rainwoodman commented Apr 12, 2018

martinjameswhite commented Apr 12, 2018 via email

rainwoodman commented Apr 12, 2018

michaelJwilson commented Apr 12, 2018 via email

nickhand commented Apr 12, 2018

rainwoodman commented Apr 12, 2018

rainwoodman commented Apr 12, 2018

rainwoodman commented Apr 14, 2018

michaelJwilson commented Apr 14, 2018 via email

michaelJwilson commented Apr 14, 2018 via email

rainwoodman commented Apr 16, 2018