Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

BEAST runs in python 2 and 3 #113

Merged
merged 31 commits into from
Aug 30, 2017
Merged

BEAST runs in python 2 and 3 #113

merged 31 commits into from
Aug 30, 2017

Conversation

karllark
Copy link
Member

@karllark karllark commented Jul 19, 2017

The BEAST has been updated to run in python 3 in addition to python 2. This required changes to many of the files. In addition, the ezpipe "external" code has been removed as it was causing significant issues. The run_beast.py file now has direct calls to the steps needed to create the physicsmodel grid. Finally, the ezunits package was removed for similar reasons and units support moved to the astropy units package.

The update was done by starting with the 2to3 tool. Then the code was tested with both python 2 and 3 simultaneously correcting errors as they arose to allow the code to run in both python versions.

This was mainly done during the M31 Lorentz meeting in Jul 2017, especially on the hackday. And then finished up a couple of weeks later.

This pull request should close issues #8, #11, and #79.

@karllark
Copy link
Member Author

@mfouesneau : Travis CI error on ezunits. Can you have a look and tell me if this can be fixed easily?
https://travis-ci.org/BEAST-Fitting/beast/jobs/253864276
If that is not easy, other possible solutions are update to the latest pint code or change to astropy units.

@mfouesneau
Copy link
Member

We can switch to astropy for sure. That might be easy, if we parse a few API calls.

the travis errors are on eztables not ezunits if I read correctly. eztables should not be used, is it?

@karllark
Copy link
Member Author

The end of the travis check link in the previous comment seems to be about ezunits. Or am I miss reading the errors?

Excerpt:

/Users/travis/miniconda/envs/test/lib/python3.6/site-packages/astropy/tests/pytest_plugins.py:188: in collect
for test in finder.find(module):
/Users/travis/miniconda/envs/test/lib/python3.6/site-packages/astropy/tests/pytest_plugins.py:400: in find
extraglobs)
/Users/travis/miniconda/envs/test/lib/python3.6/doctest.py:933: in find
self._find(tests, obj, name, module, source_lines, globs, {})
/Users/travis/miniconda/envs/test/lib/python3.6/doctest.py:992: in _find
if ((inspect.isroutine(inspect.unwrap(val))
/Users/travis/miniconda/envs/test/lib/python3.6/inspect.py:482: in unwrap
while _is_wrapper(func):
/Users/travis/miniconda/envs/test/lib/python3.6/inspect.py:476: in _is_wrapper
return hasattr(f, 'wrapped')
beast/external/ezunits/pint.py:470: in getattr
return self.Quantity(1, item)
beast/external/ezunits/pint.py:712: in new
inst = cls._REGISTRY._parse_expression(units)
beast/external/ezunits/pint.py:670: in _parse_expression
raise UndefinedUnitError(unknown)
E beast.external.ezunits.pint.UndefinedUnitError: 'wrapped' is not defined in the unit registry

@karllark
Copy link
Member Author

It it is ezunits, one solution is to move to astropy units. Similar solution to eztables problems (move to astropy tables).
Probably doing both would be good really. Removes code we have to maintain.

@mfouesneau
Copy link
Member

The simplest interface is to use simpletable already somewhere with the isochrones web interfaces. (also with ezdata that you downloaded the other day).
This simpletable package includes the conversion codes to astropy or pandas & co and therefore will allow minimum intrusion. It is compatible with eztable formats of header etc.

Removed distance info from observations.py as not used anymore (and potentially confusing).
@@ -289,6 +287,5 @@ def get_obscat(obsfile=obsfile, distanceModulus=distanceModulus,
obs: GenFluxCatalog
observation catalog
"""
obs = GenFluxCatalog(obsfile, distanceModulus=distanceModulus,
filters=filters)
obs = GenFluxCatalog(obsfile, filters=filters)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

distanceModulus has gone because it's not used or what?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No longer used in the observed catalog. This was left over when we were putting the observed data at 10 pc (absolute mags?). Now we put the model at the distance of the galaxy as this is needed for a number of reasons. Figured it was best to take this out as it means less code to maintain.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's fair.
If we decide not to use the distanceModulus in observed catalog, I think you need to delete all methods in the "Observations" class that use 'distanceModulus'. So users avoid any possible confusions.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point. I've done this and will commit the code.

Copy link
Member

@lcjohnso lcjohnso Aug 24, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wait -- but instead of placing observed data at 10 pc, the current convention was to place both the observations and models at the distance of the galaxy. We need to keep distanceModulus in order to convert observations from observed fluxes to intrinsic fluxes. See my other comment on observationmodel/observations.py.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For completeness, we never covert the observed fluxes to intrinsic fluxes when doing the fitting. Are you arguing for keeping this for some other purpose? Asking to make sure.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry for my misunderstanding -- your thorough explanation below was very helpful. Removing distanceModulus here is OK, as I see no need/use for it.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No worries. Very useful conversation to have (and have archived!).

@@ -36,8 +37,8 @@ def hdf5diff(fname1, fname2):
hdfb = h5py.File(fname2, 'r')

hd = hdf5diff_results()
for sname in hdfa.keys():
if sname not in hdfb.keys():
for sname in list(hdfa.keys()):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

list() is not needed here. a for-loop does the job, unless h5py sucks

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed. h5py doesn't seem to suck. :-)

for cname, cvalue in hdfa[sname].items():
if cname not in hdfb[sname].keys():
for cname, cvalue in list(hdfa[sname].items()):
if cname not in list(hdfb[sname].keys()):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

list is needed indeed here, but you should only make it once before the for-loop so that you do not instanciate it many times.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed.

@@ -262,7 +264,7 @@ def Q_all_memory(prev_result, obs, sedgrid, ast, qnames_in, p=[16., 50., 84.],

# get the names of all the children in the ast structure
ast_children = []
for label, node in ast.root._v_children.items():
for label, node in list(ast.root._v_children.items()):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are we mixing h5py and pytables? if so we should only use one.
(list is not needed)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes. I've been trying to move to h5py. But this move is not complete as you might imagine. Part of the move to astropy tables.

@@ -123,7 +124,7 @@ def __init__(self, shape, h = 1.0, domain = None, norm = None):
norm = 1.0
self._normconst = norm
self.domain = domain
if callable(shape):
if isinstance(shape, collections.Callable):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Caution, I've never used that object but it refers to an ABC class that has multiple methods.
I do not know if any object with a call() method enters that category.
The callable function was simply a test on the call method

class collections.Callable
ABCs for classes that provide respectively the methods contains(), hash(), len(), and call().

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure how to answer this question. This change was done by the 2to3 program.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

By the way, where do we use the kernels.py file in the fitting process? Am I missing something?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We don't use it. This was part of previous ways of interpolation - at least that is my memory of this. We could remove this, but should do this with a different issue/pull request. Removing old code we are not using is on the to do list and there is an issue for this.


import numpy as np
import tables

import toothpick
from . import toothpick
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we have a separe toothpick if in "generic_noisemodel" we make more toothpick functions?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The idea for the generic_noisemodel was to provide a noisemodel that could be used by most projects. No need for the same code to be in every project directory.
Keeping the toothpick model details separate allows for someone to make a custom noisemodel if they want.

@@ -331,9 +335,9 @@ def interpolate(self, sedgrid, progress=True):
compl = np.empty((N, M), dtype=float)

if progress is True:
it = Pbar(desc='Evaluating model').iterover(range(M))
it = Pbar(desc='Evaluating model').iterover(list(range(M)))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pbar can take the number of iterations (here M) as parameter so you don't have to instanciate the list. (useful when it gets big)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok. Thanks.

@@ -90,30 +83,11 @@ def __getitem__(self, *args, **kwargs):

def keys(self):
""" Returns dataset content names """
return self.data.keys()
return list(self.data.keys())
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should respect the way python works and therefore returns the keys not the list of them.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed. list() removed.

@@ -173,8 +174,10 @@ def gen_spectral_grid_from_stellib(osl, oiso, ages=(1e7,), masses=(3,),

# some constants
kdata = 0
rsun = ezunits.unit['Rsun'].to('m').magnitude # 6.955e8 m

rsun = 6.955e8 # in meters
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

use the astropy constant as imported above. Makes more sense and propagates units.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree. I did not do this yet as I wanted the regression testing to not key off of small changes in the solar radius. Will make an issue so that we make the change to astropy constants for the whole BEAST including here at some point.

…d). Also removed the hard-coding of the survey name from the IAU name creation code.
@@ -117,7 +174,6 @@

# read in the observed data
obsdata = datamodel.get_obscat(datamodel.obsfile,
datamodel.distanceModulus,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please make the same correction in line 129.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

Copy link
Contributor

@galaxyumi galaxyumi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks Karl for making this many changes. I don't see major issues in updating python 2 to 3. Tests are needed to make sure that all these changes work well and reproduce what we got with the python 2 version BEAST.

@coveralls
Copy link

Coverage Status

Coverage increased (+2.8%) to 12.27% when pulling 16b7e48 on py3_compat into 0c58f49 on master.

@coveralls
Copy link

Coverage Status

Coverage increased (+2.8%) to 12.27% when pulling 2f23cf8 on py3_compat into 0c58f49 on master.

@coveralls
Copy link

Coverage Status

Coverage increased (+2.8%) to 12.27% when pulling 29972ac on py3_compat into 0c58f49 on master.

Copy link
Member

@lcjohnso lcjohnso left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One error caught in ezmist's simpletable.py (need to mirror edits made to ezpadova's version), and concerns about dropping distanceModulus.

@@ -125,35 +99,33 @@ def setFilters(self, filters):

def getMags(self, num, filters):
raise Exception('Do not use as magnitudes')
return np.array([ self.data[tt][num] - self.distanceModulus for tt in filters])
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The removal of distanceModulus seems like it will affect the comparison of obs and model fluxes -- are their any sanity checks that all code is abiding by new convention? i.e. -- when trimming the grid I get an error stating: "no models are brighter than the minimum ASTs run", which could be caused by mismatch in conventions.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

After further thought, I think the removal of distanceModulus needs to be reverted. Comparison between models and data is done in intrinsic fluxes (at distance of the galaxy), and convention was to convert flux observations from measured to intrinsic to match the unscaled model fluxes. This conversion from measured to intrinsic still needs to occur to match the flux scale of the models!

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We have not been using the distancemodulus in the observation model for sometime. It was originally used when we took the observed data and put it into absolute units (magnitudes I believe). Then the models were all computed in absolute units (distance of 10 pc). And we were comparing the models to the observations in magnutide units - which was incorrect!

This is not the correct way to do the comparison given to get the correct noisemodel, we have to put the models at the distance of the galaxy we are observing and compute the uncertainties on the model fluxes using the ASTs. Thus, we removed the use of the distancemodulus in the observationmodel. But did not remove passing it to the routines that read in the observed data.

We still use the distance to the galaxy. It is used to calculate the model fluxes at the distance of the galaxy. And then we directly compare the observed fluxes to the model fluxes and uncertainties.

So in summary, removing the distancemodulus from the observationmodel code is just remove old code no longer used. And make things consistent and avoid going back to the old and incorrect way of comparing data to models. Ok?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hopefully this comment was not too long-winded. Just trying to clearly state the issue. Not sure I did. Having some allergy fuzziness issues today.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Another way of saying this is that the distance should be used in the physicsmodel code, but not in the observationsmodel code.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Given @lcjohnso's comment and error message, it's important to check that properly. @lcjohnso can you give more to @karllark to chew?

I agree with @karllark, Beast makes the models at the distance of the population, not the way around (in principle). The only place where the distance modulus in Observations class is useful is to generate mock data, but that could be left to the tester.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the further explanation. If I'm understanding correctly, are you saying that the getMags and getFlux functions defined here are never used? If so, then I'd advocate for removing them completely to avoid confusion rather than change their utility/purpose (from distance-corrected to non-distance-corrected).

Can you confirm that these functions are not used elsewhere? Otherwise the abrupt change of removing the distance correction continues to worry me - shouldn't this removal require subsequent edits elsewhere to account for the different treatment of distance?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The getmags is only used in observations.py itself.
The getflux is used in other places, but the key is that the stock getflux function is redefined in datamodel.py. And we have been redefining it this way for a long time. This actually illustrates how having the distance in the stock versions in observations.py is dangerous. The "experts" know not to use it, but newer users would not necessarily know this. Removing it in the stock functions means that the experts and soon-to-be experts will both use the right functions.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@lcjohnso in your first comment, you had an error. Was this with the phat_small example? I've just run this example and am not getting that error.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks -- I agree that your edits make good sense. Sorry for the delay -- I didn't realize that getFlux() was being redefined in datamodel.py, so the non-distanceModulus version of the transformation was already in effect.

The error I was encountering was due to an unrelated issue. I was testing the code on SMIDGE data where I'm only fitting a subset of the bands, and the n_detected keyword in trim_grid.trim_models() needed to be set to a non-default value to run correctly.

@@ -71,13 +72,13 @@
if PY3:
iteritems = operator.methodcaller('items')
itervalues = operator.methodcaller('values')
basestring = (str, bytes)
str = (str, bytes)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

str -> strtype as in simpletable.py in ezpadova throughout this file, or one could simply duplicate ezpadova's simpletable.py here.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure I understand this comment. What particular change are you suggesting?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here the variable currently named str, which is being set equal to (str, bytes), should be renamed strtype throughout the file. Not sure one can make a variable named str since it is a built-in type. This substitution would make ezmist/simpletable.py consistent with the edits made to ezpadova/simpletable.py (in fact, the files should be identical in the end).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree with Cliff here, you should not redefine the str type variable, esp. by a tuple containing itself.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point. In fact, having two copies of the same file is not good either. So I moved the simpletable.py from ezpadova to the parent directory and have changed both padova.py and mist.py to import from this single file. Easier to maintain if we have just one version of this file.

@coveralls
Copy link

Coverage Status

Coverage increased (+2.7%) to 12.164% when pulling 0d02c55 on py3_compat into 0c58f49 on master.

Copy link
Member

@lcjohnso lcjohnso left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for your thorough explanation regarding my questions. The code looks good to go!

@karllark karllark merged commit 431625f into master Aug 30, 2017
@lcjohnso lcjohnso mentioned this pull request Oct 17, 2017
@karllark karllark deleted the py3_compat branch October 17, 2017 20:27
galaxyumi pushed a commit to galaxyumi/beast that referenced this pull request Jun 7, 2020
This pull request was closed.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants