Add regression tests replicating current phat_small example run #140

Merged
karllark merged 26 commits from karllark:ci_regression into BEAST-Fitting:master on Jan 18, 2018

Conversation

karllark
Member

@karllark karllark commented Nov 23, 2017

This starts getting regression tests into the Travis CI runs. Addresses #97.
The full phat_small model is run in stages and the output of each stage is compared to cached versions of the files.
A number of changes were needed to get the tests to run in the automated system, where the code is run after being installed (not just cloned). Hence this work has also advanced the effort to get the BEAST installable and working.
Changes include:

  • adding the ability to pass the location of various library files directly, as the standard location in the source code directory does not work for the installed BEAST
  • removed the keep column: addresses Use of SED grid 'keep' column? #73
  • removed the C code from fit_metrics, as something was keeping the test code from importing anything from that directory. This is likely related to something in the install process. I could not fix it, so I simplified the fit_metrics directory to only the needed python code. All the old fit_metrics code is still around; it just moved to a subdirectory of the beast/old directory: addresses Remove C code in fit_metrics? #153.
  • updated the travis setup
  • discovered via the test coverage report that eztables is not used in the current phat_small BEAST runs. Will investigate removing this code/package in a separate pull request.

In the process, I found that there are small differences between the Linux and Mac versions of the BEAST computations. It is not clear why. I spent time trying to debug before, during, and after the Nov 2017 BEAST HackDay; it looks like it may be due to a small difference in the numerical computation between machine types.

@coveralls

Coverage Status

Coverage increased (+6.6%) to 18.739% when pulling 51994e5 on karllark:ci_regression into 03207d2 on BEAST-Fitting:master.

Member

@mfouesneau mfouesneau left a comment

It seems ok to me.

Imports in the tests should go back to relative imports.

I do not know Travis and astropy packaging well enough to tell if anything is wrong with those parts.

For the stellib interpolation, I have a faster and simplified version:
http://mfouesneau.github.io/docs/pystellibs/
We could think about adapting it for the BEAST. It also has more libraries and handles different interpolators (mostly the units need to be adapted, as they are not astropy units).

@@ -146,9 +146,9 @@
# MISTWeb() -- `rotation` param (choices: vvcrit0.0=default, vvcrit0.4)
#
# Default: PARSEC+CALIBRI
Member

it's COLIBRI (like the bird)

# Alternative: PARSEC1.2S -- old grid parameters
oiso = isochrone.PadovaWeb(modeltype='parsec12s', filterPMS=True)
#oiso = isochrone.PadovaWeb(modeltype='parsec12s', filterPMS=True)
Member

I don't think the default works properly with the BEAST. There are changes in the outputs that need some mapping (defining aliases) and some testing to work.

Member Author

I do not understand your comment. Line 151 does work in the BEAST, in that it produces reasonable output. Can you be more specific?

Member

parsec12s and the latest version parsec12sr16 do not have the same output format and column names. This may generate bugs downstream.

Member Author

I have not seen any bugs or issues between the two versions. I assume this is handled in the parsec.py code. @lcjohnso, any comments on this topic?

@@ -566,6 +570,9 @@ def add_spectral_properties(specgrid, filternames=None, filters=None,
naming format to adopt for filternames and filters
default value is '{0:s}_0' where the value will be the filter name

filterLib: str
full filename to the filter library hd5 file
Member

Is it a filename or filtername?

Member Author

This is the full filename of the filter library file. It is needed for the Travis testing because the library files are not in the source directory, since the testing is done on an installed version of the BEAST. This allows the testing code to pass the location of the filter library file directly instead of relying on the default location.

This raises the issue of how to handle the library files for the installed version of the BEAST (we should make this work for those just using the BEAST). I've opened issue #141 for this.
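The override pattern described here can be sketched generically. The helper below is hypothetical (not the actual BEAST API; the names and default paths are made up for illustration), showing how an explicitly passed library filename takes precedence over the source-tree default:

```python
import os

# Hypothetical helper (not the actual BEAST code) illustrating the pattern:
# an explicitly passed filter-library filename wins; otherwise fall back to
# the default location next to the source code, which does not exist for an
# installed BEAST.
def resolve_filter_library(filterLib=None,
                           default_dir="beast/physicsmodel/libs",
                           default_name="filters.hd5"):
    """Return the full filename of the filter library hd5 file."""
    if filterLib is not None:
        # e.g. the test suite passes the downloaded library directly
        return filterLib
    return os.path.join(default_dir, default_name)
```

The test suite would then call the grid-building code with an explicit `filterLib` path, while interactive users keep the default behavior.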

@@ -4,7 +4,7 @@
import numpy as np
import pytest

from . import extinction
from beast.physicsmodel.dust import extinction
Member

Why an absolute import? A relative import is better, so the code can be used without installing it into the system.

from .. import extinction

from astropy.tests.helper import remote_data
from astropy.table import Table

from beast.physicsmodel.stars import isochrone
Member

Same as before. Imports should be relative as much as possible.

@karllark
Member Author

The Mac versus Linux issue is that the no-dust photometry is slightly different between the Mac and Linux versions. The relevant error message from the Mac run is below (as well as the URL to the full output). As the file being tested against is from the Linux run, this shows that about 0.5% of the computed values differ between Mac and Linux. I can dig deeper to see if this is some kind of edge case; I'm asking here in case there is a "quick" fix or a known issue. We use the scipy.integrate.trapz function for the integration. I should run the comparison of the two files outside of the testing system to compare the full file, as the testing system stops when it finds the first issue. But I don't have access to a Mac, so I need to get someone else to run the testing code on their Mac. Probably something to do on Monday during the BEAST@ST hack day.

https://travis-ci.org/BEAST-Fitting/beast/jobs/306428490

    # go through the file and check that it is exactly the same
    for sname in hdf_cache.keys():
        if isinstance(hdf_cache[sname], h5py.Dataset):
            cvalue = hdf_cache[sname]
            cvalue_new = hdf_new[sname]
            if cvalue.dtype.isbuiltin:
                # simple dtype: compare the full dataset at once
                np.testing.assert_equal(cvalue.value, cvalue_new.value,
                                        'testing %s' % (sname))
            else:
                # compound dtype: compare field by field
                for ckey in cvalue.dtype.fields.keys():
                    np.testing.assert_equal(cvalue.value[ckey],
                                            cvalue_new.value[ckey],
                                            'testing %s/%s' % (sname, ckey))

E AssertionError:
E Arrays are not equal
E testing grid/logHST_ACS_WFC_F475W_nd
E (mismatch 0.4805491990846633%)
E x: array([-17.706708, -17.705989, -17.70227 , ..., -17.403936, -17.196232,
E -30.209619])
E y: array([-17.706708, -17.705989, -17.70227 , ..., -17.403936, -17.196232,
E -30.209619])

Issue seems to be that the radius is *slightly* (1e-14) different for
a small number of models (<10 in test)
@karllark
Member Author

Differences in the radius column may be due to differences in the results from np.power or np.sqrt. The differences are small (1e-14) and affect very few elements (< 10). There is no clear reason why; they are not clearly correlated with the logT or logL values. Putting the Mac test with remote-data in the allowed failures.

https://stackoverflow.com/questions/44765611/slightly-different-result-from-exp-function-on-mac-and-linux
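Given that the platform differences are at the 1e-14 relative level, one option (a sketch of an alternative, not what this PR does) is to compare with a small relative tolerance instead of exact equality:

```python
import numpy as np

# Values mimicking the cached Linux photometry and a Mac run that agrees
# only to ~1e-14 relative precision.
linux_vals = np.array([-17.706708, -17.705989, -17.70227, -17.403936])
mac_vals = linux_vals * (1.0 + 1e-14)

# Exact comparison (what np.testing.assert_equal does) reports a mismatch...
exactly_equal = bool(np.array_equal(linux_vals, mac_vals))

# ...while a relative tolerance well above machine epsilon but far below
# any scientifically meaningful difference treats the platforms as equal.
np.testing.assert_allclose(mac_vals, linux_vals, rtol=1e-12)
print(exactly_equal)  # False
```

The trade-off is that a tolerance can mask a real regression of the same magnitude, which is why understanding the source of the difference first is worthwhile.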

@coveralls

Coverage Status

Coverage increased (+6.6%) to 18.739% when pulling 2a67f7e on karllark:ci_regression into 03207d2 on BEAST-Fitting:master.

@karllark
Member Author

karllark commented Nov 28, 2017

More info on the differences. The differences are not the same for different columns, which makes me think it might be due to how the numbers are being saved in the hdf5 file. It is not clear how different columns could differ: for example, the radius differences should affect all the columns that depend on flux, but this does not seem to be the case.

[kgordon@grimy comptests]$ python test_two_spec.py


***** logHST_ACS_WFC_F475W_nd *****


min/max diff [%]: -2.12102363115e-14 2.00556172964e-14
values: [-17.9000348 -17.90809893 -19.45450982 -18.18661445 -18.64675207
-17.34072645 -18.63916794 -17.82567481 -21.21896864 -17.7143073
-17.24287029 -18.59103382 -18.69728022 -18.78303055 -19.13966618
-16.74999574 -17.75385644 -18.56444092 -17.59229913 -17.71455725]
diffs [%]: [ -1.98475239e-14 -1.98385864e-14 1.82616458e-14 1.95347721e-14
1.90527212e-14 -2.04876865e-14 -1.90604736e-14 -1.99303180e-14
-1.67431025e-14 2.00556173e-14 -2.06039576e-14 -1.91098231e-14
1.90012325e-14 -1.89144860e-14 -1.85620462e-14 -2.12102363e-14
-2.00109407e-14 1.91371973e-14 -2.01947094e-14 -2.00553343e-14]
logL: [ 2.552 2.767 0.829 1.892 1.36 3.248 1.677 2.981 -0.344 2.644
3.145 1.719 1.66 1.358 0.86 4.285 2.745 1.681 2.854 3.147]
logT: [ 4.1849 4.2822 3.696 3.7944 3.9684 3.6161 3.684 3.5817 3.5445
4.1496 3.6591 3.6822 3.6665 3.7593 3.8952 3.5548 3.6342 4.0934
3.6419 3.5732]


***** radius *****


min/max diff [%]: -5.89805831405e-14 1.83949685264e-14
values: [ 1.84757113 2.25944185 17.55905615 112.90187844 93.36866897
4.82837695 1.88235342 2.19887817 88.04428434]
diffs [%]: [ -4.80727591e-14 -3.93096384e-14 -2.02329422e-14 -2.51738145e-14
-1.52201535e-14 1.83949685e-14 -5.89805831e-14 -4.03923433e-14
-1.61405761e-14]
logL: [ 1.36 2.076 3.159 3.099 3.19 1.358 1.201 1.336 3.139]
logT: [ 3.9684 4.1037 3.9292 3.5101 3.5741 3.7593 3.9246 3.9246 3.5741]


***** logHST_ACS_WFC_F814W_nd *****


min/max diff [%]: -2.01701987515e-14 2.03811248134e-14
values: [-18.34342942 -17.6136771 -17.4313916 -18.49406467 -18.35498535
-20.31699145 -18.42723283 -19.3992071 -18.54534658 -21.66570469
-18.34487901]
diffs [%]: [ -1.93677725e-14 -2.01701988e-14 2.03811248e-14 1.92100208e-14
-1.93555789e-14 -1.74864162e-14 1.92796917e-14 1.83137056e-14
-1.91569010e-14 -1.63978681e-14 -1.93662421e-14]
logL: [ 3.142 4.172 4.434 2.606 2.012 0.073 2.705 1.299 2.588 -1.296
2.023]
logT: [ 4.2906 4.4183 4.4615 4.1512 3.6528 3.6171 4.1583 3.9783 4.1632
3.5858 3.7208]


***** logHST_WFC3_F336W_nd *****


min/max diff [%]: -2.09138723297e-14 2.0003631678e-14
values: [-17.7603434 -20.73491907 -16.98735472 -18.8233656 -19.24527613
-19.85497323 -19.13784628 -17.02266876 -18.19092006 -24.23210154
-19.26460376 -22.44189561 -18.53689665]
diffs [%]: [ 2.00036317e-14 1.71339645e-14 -2.09138723e-14 1.88739557e-14
-1.84601855e-14 1.78933189e-14 -1.85638113e-14 -2.08704859e-14
-1.95301484e-14 1.46611868e-14 1.84416649e-14 1.58307201e-14
-1.91656335e-14]
logL: [ 2.568 0.419 3.175 3.765 2.37 0.84 1.358 3.228 3.537 -2.72
1.702 -1.134 1.681]
logT: [ 4.2155 3.6588 4.0842 3.563 3.6259 3.7203 3.7593 3.9692 3.6233
3.4683 3.688 3.6283 4.0934]


***** logHST_WFC3_F275W_nd *****


min/max diff [%]: -2.14910554968e-14 1.94470571415e-14
values: [-16.531127 -17.58087307 -18.3233043 -24.33471439 -19.23846426
-17.13818814 -20.15395223 -25.15072835 -20.58416648 -18.26864421
-19.2465264 -20.32651345 -19.79023364 -20.97112735 -23.10661902
-19.63074666 -19.41112975 -25.09666515 -17.39911034 -19.17312964
-24.80866684 -19.4178416 -19.7039245 ]
diffs [%]: [ -2.14910555e-14 -2.02078342e-14 1.93890448e-14 -1.45993646e-14
-1.84667218e-14 -2.07298091e-14 1.76278759e-14 1.41256890e-14
-1.72594488e-14 1.94470571e-14 -1.84589863e-14 -1.74782246e-14
-1.79518531e-14 -1.69409761e-14 -1.53753073e-14 -1.80977002e-14
1.83024570e-14 -1.41561186e-14 -2.04189387e-14 1.85296493e-14
-1.43204538e-14 -1.82961307e-14 -1.80304877e-14]
logL: [ 3.714 2.528 2.163 -1.134 1.067 2.894 3.411 -2.098 0.096 3.033
3.247 0.267 2.12 2.685 -0.037 1.358 1.507 -1.888 3.398 1.668
-1.695 1.668 2.114]
logT: [ 4.3962 4.2033 3.878 3.4931 3.9137 4.1257 3.5986 3.4823 3.7805
3.725 3.6403 3.7961 3.6653 3.5681 3.6098 3.7593 3.7694 3.5203
3.8309 3.7866 3.5628 3.7239 3.6691]


***** logHST_WFC3_F110W_nd *****


min/max diff [%]: -2.07851307728e-14 2.03829590486e-14
values: [-31.73744785 -19.54151472 -19.83521631 -18.33141659 -18.24172346
-21.23477622 -17.42982297 -18.33777493 -20.80867991 -19.33387339
-20.44828891 -17.09257314 -17.402233 ]
diffs [%]: [ -1.11940749e-14 1.81803393e-14 -1.79111416e-14 1.93804645e-14
-1.94757567e-14 -1.67306387e-14 2.03829590e-14 -1.93737446e-14
1.70732295e-14 -1.83755919e-14 1.73741368e-14 -2.07851308e-14
-2.04152747e-14]
logL: [-9.999 2.235 1.206 2.197 2.281 -0.766 3.126 2.229 -0.353 1.358
0.226 3.337 3.129]
logT: [ 4.191 4.2021 3.9394 3.5975 3.5908 3.5554 3.6329 3.6472 3.5491
3.7593 3.7476 3.5266 3.5752]
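The per-column report above comes from a comparison script; a self-contained sketch of that kind of report (hypothetical, not the actual test_two_spec.py) looks like:

```python
import numpy as np

def report_column_diffs(name, vals_cache, vals_new):
    """Print min/max percent differences and the offending cached values,
    mirroring the per-column report format above (sketch, not the real
    script)."""
    diffs = 100.0 * (vals_new - vals_cache) / vals_cache
    bad = diffs != 0.0
    print('***** %s *****' % name)
    print('min/max diff [%%]: %g %g' % (diffs.min(), diffs.max()))
    print('values:', vals_cache[bad])
    print('diffs [%]:', diffs[bad])
    return diffs

# Cached values plus a copy perturbed by one ulp in a single element,
# reproducing the ~1e-14 % scale of the differences reported above.
cache = np.array([-17.9000348, -17.90809893, -19.45450982])
new = cache.copy()
new[0] = np.nextafter(cache[0], 0.0)  # one-ulp change toward zero
diffs = report_column_diffs('logHST_ACS_WFC_F475W_nd', cache, new)
```

A one-ulp change in a double near 17.9 is about 2e-16 relative, i.e. ~2e-14 %, matching the magnitudes in the report.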

@karllark karllark closed this Nov 28, 2017
@karllark karllark reopened this Nov 28, 2017
@karllark karllark changed the title Isochrone download and spectral grid regression tests Add regression tests replicating current phat_small example run Nov 28, 2017
@mfouesneau
Member

Smells like a compiler issue: macOS uses LLVM and Clang while Linux uses GCC or g++.
You have 1e-14 % variations?

@coveralls

Coverage Status

Coverage increased (+6.6%) to 18.741% when pulling 2a67f7e on karllark:ci_regression into 03207d2 on BEAST-Fitting:master.

@karllark
Member Author

Yes, 1e-14 variations. Very small. I would like to understand them to make sure they are not an indication of a bigger issue.
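As a quick check of the compiler/libm hypothesis, two algebraically equivalent routes to the same number can be compared. This is a minimal sketch with illustrative values, not the actual BEAST radius calculation:

```python
import numpy as np

# sqrt(y) and y**0.5 are mathematically identical, but np.power goes
# through libm's pow(), which can round differently from sqrt() depending
# on the compiler and math library a platform uses.
y = 10.0 ** np.linspace(-0.5, 4.3, 1000)  # e.g. L/Lsun-like values
a = np.sqrt(y)
b = np.power(y, 0.5)

# Any disagreement sits at the few-ulp level (~1e-16 relative), the same
# order as the Mac/Linux differences seen in these tests.
rel_diff = np.abs(a - b) / a
print(rel_diff.max())

# A tolerance a little above machine epsilon absorbs such noise.
np.testing.assert_allclose(a, b, rtol=1e-13)
```

Whether the two routes agree exactly depends on the platform's pow() implementation, which is consistent with differences appearing only on some machines.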

@coveralls

Coverage Status

Coverage increased (+7.6%) to 19.731% when pulling 7ecbb87 on karllark:ci_regression into 03207d2 on BEAST-Fitting:master.

@coveralls

Coverage Status

Coverage increased (+7.5%) to 19.673% when pulling de0f160 on karllark:ci_regression into 03207d2 on BEAST-Fitting:master.

@coveralls

Coverage Status

Coverage increased (+9.7%) to 21.792% when pulling 41c6e5f on karllark:ci_regression into 03207d2 on BEAST-Fitting:master.

@coveralls

Coverage Status

Coverage increased (+9.7%) to 21.784% when pulling ef57548 on karllark:ci_regression into 03207d2 on BEAST-Fitting:master.

@coveralls

Coverage Status

Coverage increased (+11.4%) to 23.524% when pulling 3296dfb on karllark:ci_regression into 03207d2 on BEAST-Fitting:master.

@coveralls

Coverage Status

Coverage increased (+12.0%) to 24.085% when pulling 06297fa on karllark:ci_regression into 03207d2 on BEAST-Fitting:master.

Included moving most of the code in fit_metrics to fit_metrics_old.
The fit_metrics code would not import in the built version of the beast
and I could not figure out why. As a test, I created a simple version of
the fit_metrics code that only included what was needed. This worked.
So there is something complicated about the inclusion of the C and python
versions and the switching between them that I could not fix. We have
only been using the python versions for a few years, and tests I did back
in the day found them to have the same speed. The old code is saved, so
we can always go back.
This means it will be ignored for the testing coverage.
@coveralls

Coverage Status

Coverage increased (+17.6%) to 29.692% when pulling 78c93b5 on karllark:ci_regression into 03207d2 on BEAST-Fitting:master.

@karllark
Member Author

Ready for merging!

@karllark
Member Author

Bump.

# Alternative: PARSEC1.2S -- old grid parameters
oiso = isochrone.PadovaWeb(modeltype='parsec12s', filterPMS=True)
#oiso = isochrone.PadovaWeb(modeltype='parsec12s', filterPMS=True)
Member

Again, be very careful: some parameters do not produce correct files on the PARSEC website when you use this model version.

Member Author

Can you be more specific?
I've run the BEAST with the parsec12s option and it works. Maybe the BEAST is not sensitive to the differences?

from astropy.tests.helper import remote_data
from astropy import units

from beast.observationmodel.observations import Observations
Member

Assuming beast is installed, correct?

Member Author

When running tests (with 'python setup.py test' or via the travis service), the beast is first installed and then the tests are run. So this is as it should be.

@karllark
Member Author

One review is enough for merging.

@karllark karllark merged commit da0b573 into BEAST-Fitting:master Jan 18, 2018
@karllark karllark deleted the ci_regression branch January 22, 2018 22:56
galaxyumi pushed a commit to galaxyumi/beast that referenced this pull request Jun 7, 2020
Add regression tests replicating current phat_small example run