New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
add ignore_nan option to median_absolute_deviation #5232
Conversation
astropy/stats/funcs.py
Outdated
@@ -729,6 +729,11 @@ def median_absolute_deviation(a, axis=None): | |||
axis : int, optional | |||
Axis along which the MADs are computed. The default (`None`) is | |||
to compute the MAD of the flattened array. | |||
ignore_nan : bool | |||
Ignore NaN values (treat them as if they are not in the array) when | |||
computing the median. This will use `np.nanmedian`, which requires |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this causes sphinx failure, please change to numpy.nanmedian
GH shows all green, but this just an annoying bug to show all green after @keflavich - Could you please fix the failures? np.nanmedian was added in 1.10, so the test should be skipped for earlier versions. The sphinx failure is also a trivial one. |
astropy/stats/tests/test_funcs.py
Outdated
@@ -9,6 +9,8 @@ | |||
from numpy.testing import assert_equal | |||
from numpy.testing.utils import assert_allclose | |||
|
|||
NUMPY_LT_1P9 = [int(x) for x in np.__version__.split('.')[:2]] < [1, 9] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Most of these are already available from astropy.utils.compat
@larrybradley , do you wish to have a look as well? |
The one failure looks like an issue in numpy 1.9: apparently, Would anyone like to confirm that interpretation of the error? I'll set the minimum np version to 1.10 as @bsipocz recommended earlier. |
@keflavich - try |
@astrofrog: agreed that there's a sorting error, but it seems to just put the nan at the end or beginning, not sure which:
EDIT: so, I think we just want to keep the version bumped to 1.10 for the test |
@keflavich - I think this needs to work for all Numpys we support - so just add a workaround function for Numpy < 1.10 which does the nanmedian manually |
astropy/stats/funcs.py
Outdated
@@ -761,6 +766,8 @@ def median_absolute_deviation(a, axis=None): | |||
# returns an masked array even if the result should be scalar). (#4658) | |||
if isinstance(a, np.ma.MaskedArray): | |||
func = np.ma.median | |||
elif ignore_nan: | |||
func = np.nanmedian |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Something like this could do the trick:
try:
func = np.nanmedian
except AttributeError:
func = lambda x: np.median(x[~np.isnan(x)])
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this would fail if an axis
argument is given because boolean indexing returns a 1d array
The function already supports masked arrays so one could also mask the nans:
This would work for masked and unmasked arrays alike and work across different numpy versions. It's however a bit slower to use |
I suggest the same as @MSeifert04. If no invalid values are present we could ensure the output is not masked (with |
@MSeifert04 I've implemented your suggestion in f887005, which also led me to tackle another corner case (if @larrybradley do you want an extra check at the end that the return is not a masked array? |
@keflavich Currently, there is a bug in |
@larrybradley would you like me to add that same workaround to this PR, then? |
@keflavich Yes. For example, I would expect |
@keflavich But I thought it might be better always to use More general if we want to return scalars instead of returning masked-arrays wouldn't it be better to ship some compat/monkeypatch-function instead of adding the workaround to several functions? |
Yes, I should have suggested that before. For |
Yes, if the test isn't already there, I would suggest adding the test with this addition that |
astropy/stats/funcs.py
Outdated
@@ -716,7 +720,7 @@ def poisson_conf_interval(n, interval='root-n', sigma=1, background=0, | |||
return conf_interval | |||
|
|||
|
|||
def median_absolute_deviation(a, axis=None): | |||
def median_absolute_deviation(data, axis=None, ignore_nan=False): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If you're going to rename the parameter #5214 would be ideal here 😄
(sorry for double posting this)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(done)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry, I should have mentioned that the PR is not yet reviewed nor merged.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If you are going to use stuff from #5214, then the tag from this PR needs to be changed from "affect release" to "affect dev", right?
curious, |
@keflavich Let me know if you still want to update this and have it reviewed. Looks like some tests failed and now a conflict with some other code updates. |
This is ready for further review. I can't see any remaining problems with sphinx, these warnings all seem unassociated:
|
Ready for review again! |
astropy/stats/tests/test_funcs.py
Outdated
|
||
if not NUMPY_LT_1_10: | ||
assert np.isnan(funcs.mad_std([1,2,3,4,5,np.nan])) | ||
assert_allclose(funcs.mad_std([1,2,3,4,5,np.nan], ignore_nan=True), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nitpick on those two lines: add blank space after commas :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Apart from the minor inline comments, I think the main point is what we discussed in #5835 (with @crawfordsm): it would be useful to extract the logic to choose which median to use in a astropy.stats.median
function, and it would simplify the code.
Also something should be done is a func
is provided and ignore_nan
is True. Just issue a warning or raise an exception ? This could also be stated more clearly in the docstring.
CHANGES.rst
Outdated
that will use ``np.ma.median`` with nans masked out or ``np.nanmedian`` | ||
instead of ``np.median`` when computing the median. [#5232] | ||
|
||
- ``astropy.sphinx`` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Residual from rebase ?
astropy/stats/funcs.py
Outdated
func = np.ma.median | ||
if ignore_nan: | ||
data = np.ma.masked_where(np.isnan(data), data) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You could use np.ma.masked_invalid
instead (it is already used below).
astropy/utils/compat/numpycompat.py
Outdated
@@ -19,3 +19,4 @@ | |||
NUMPY_LT_1_10_4 = not minversion('numpy', '1.10.4') | |||
NUMPY_LT_1_11 = not minversion('numpy', '1.11.0') | |||
NUMPY_LT_1_12 = not minversion('numpy', '1.12') | |||
NUMPY_GE_1_13 = minversion('numpy', '1.13dev') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is not used, can be removed ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@keflavich - looks good except that I think now that we dropped numpy 1.7 and 1.8, this can be quite a bit simplified (since nanmedian
is in 1.9, contrary to the earlier comment).
But wow, what a hassle!
@@ -766,26 +778,59 @@ def median_absolute_deviation(a, axis=None, func=None): | |||
# See https://github.com/numpy/numpy/issues/7330 why using np.ma.median | |||
# for normal arrays should not be done (summary: np.ma.median always | |||
# returns an masked array even if the result should be scalar). (#4658) | |||
if isinstance(a, np.ma.MaskedArray): | |||
if isinstance(data, np.ma.MaskedArray): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If func
is explicitly given, this whole stanza should be ignored.
Also, maybe use np.ma.isMaskedArray
here?
astropy/stats/funcs.py
Outdated
data = np.ma.masked_where(np.isnan(data), data) | ||
elif ignore_nan: | ||
is_masked = False | ||
if NUMPY_LT_1_10: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm confused - help on np.nanmedian
states this has been present since numpy 1.9, and I just checked that this is indeed the case (contrary to what was stated in #5232 (comment)).. Since we just dropped support for 1.7 and 1.8, it would seem this means we can get rid of this stanza!
astropy/stats/funcs.py
Outdated
"unlikely to be what you expect.") | ||
|
||
data = np.asanyarray(data) | ||
data_median = func(data, axis=axis) | ||
|
||
# broadcast the median array before subtraction |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
np.nanmedian
has a keepdims
argument... (it was only added in 1.10 for np.ma.median
). But since any user function can be passed in, it is probably best to keep this (maybe add a comment, though).
@@ -835,7 +885,8 @@ def mad_std(data, axis=None, func=None): | |||
""" | |||
|
|||
# NOTE: 1. / scipy.stats.norm.ppf(0.75) = 1.482602218505602 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is unrelated to this PR, but it would be good to mention this factor in the docstring; if I understand correctly, it essentially is the exact number for the following:
1./np.median(np.abs(np.random.normal(size=100000)))
# 1.4832115241968997
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@mhvk I did describe this factor in the docstring:
http://docs.astropy.org/en/stable/api/astropy.stats.mad_std.html#astropy.stats.mad_std
The factor is the normal inverse cumulative distribution function evaluated at P=3/4. It's exactly 1. / scipy.stats.norm.ppf(0.75)
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@larrybradley - sorry, don't know how I could have missed this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No worries!
data[5,5] = np.nan | ||
rslt = funcs.mad_std(data, ignore_nan=True) | ||
assert np.isscalar(rslt) | ||
with catch_warnings() as warns: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are any warnings still given with ignore_nan=True
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think there should be but I am admittedly fuzzy-headed at the moment.
astropy/utils/compat/numpycompat.py
Outdated
@@ -8,8 +8,8 @@ | |||
from ...utils import minversion | |||
|
|||
|
|||
__all__ = ['NUMPY_LT_1_9_1', 'NUMPY_LT_1_10', 'NUMPY_LT_1_10_4', | |||
'NUMPY_LT_1_11', 'NUMPY_LT_1_12'] | |||
__all__ = ['NUMPY_LT_1_9_1', 'NUMPY_LT_1_10', |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just leave the line as is? Then this file does not need to be touched at all.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Probably right; I think this was from a failed merge during rebasing.
astropy/stats/funcs.py
Outdated
from astropy.utils.compat import NUMPY_LT_1_10, NUMPY_LT_1_12 | ||
from astropy.utils.decorators import deprecated_renamed_argument | ||
|
||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should just be one blank line.
CHANGES.rst
Outdated
@@ -202,8 +202,6 @@ Bug Fixes | |||
``median_absolute_deviation``. And allow to use these functions with | |||
a multi-dimensional ``axis``. [#5835] | |||
|
|||
- ``astropy.sphinx`` | |||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@keflavich You inadvertently deleted these lines instead of the ones below (line 551).
astropy/stats/funcs.py
Outdated
from warnings import warn | ||
|
||
from astropy.utils.compat import NUMPY_LT_1_10 | ||
from astropy.utils.decorators import deprecated_renamed_argument |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would write these as
from ..utils.compat import NUMPY_LT_1_10
from ..utils.decorators import deprecated_renamed_argument
and then remove the follow two blank lines.
I... should probably squash all this if we ever decide to merge it. I think the commit log is now longer than the changes? |
I'm happy to merge this after any additional comments. Thanks @keflavich for taking all the time to push this through. |
make mad_std nan-compatible mad_std too add changelog entry [ci skip] np->numpy in docstrings & check numpy version (using 1.9, because officially it was added to numpy then according to their docs...) use astropy.utils.compat (thanks @bsipocz) 1_9, not 1P9 bump to 1.10 to avoid np 1.9 weirdness attempt to work around older numpy versions using ma.median add tests for scalarness and use np.ma.median except when axis=None. Also, change 'a' to 'data'. 'a' is a bad variable name and led to a few mistakes add another test of madstd with nan & axes add a warning about NaNs in arrays for np<1.10. Remove a test for those versions. update docstrings & changelog. Add deprecation of old argument correct the code to match the comment change order of bool checking to do expensive one last Test appropriate numpy versions, check warnings, and use asked array comparison switch to catch_warnigns for numpy 1.13, treatment of NaNs in masked arrays changed add GE_13 to __ALL__ xfail the masked array test to avoid supporting incorrect behavior see astropy#5232 (review) udpate the special cases as per @mwcraig's helpful chart! [ci skip] fix the tests: the behavior of the function when given masked arrays vs non-masked arrays has changed: we're now forcing the correct behavior for earlier versions implement @juliantaylor's suggetions test should no longer use np.ma.allclose (though I don't know why ma wouldn't just work....) this commit will fail: I've fixed the underlying issue but kept the tests flexible using the wrong data type... tests for array type double backticks a second instance of bacticks had a case wrong: one of the warn cases should *not* return a NaN for np<1.11 (which is why we're catching a warning there) numpy 1.10 np.ma.median([1,2,nan,4,5]) returns nan by default no more old numpy supports. Mixed feelings about this fix an import cleanup to address the numpy <=1.8 deprecation more cleanup fix changelog add a note about keepdims whitespace fix fix imports trailing whitespace?!!?!?!
(this is a squash of 38 commits with messages below:) make mad_std nan-compatible mad_std too add changelog entry np->numpy in docstrings & check numpy version (using 1.9, because officially it was added to numpy then according to their docs...) use astropy.utils.compat (thanks @bsipocz) 1_9, not 1P9 bump to 1.10 to avoid np 1.9 weirdness attempt to work around older numpy versions using ma.median add tests for scalarness and use np.ma.median except when axis=None. Also, change 'a' to 'data'. 'a' is a bad variable name and led to a few mistakes add another test of madstd with nan & axes add a warning about NaNs in arrays for np<1.10. Remove a test for those versions. update docstrings & changelog. Add deprecation of old argument correct the code to match the comment change order of bool checking to do expensive one last Test appropriate numpy versions, check warnings, and use asked array comparison switch to catch_warnigns for numpy 1.13, treatment of NaNs in masked arrays changed add GE_13 to __ALL__ xfail the masked array test to avoid supporting incorrect behavior see astropy#5232 (review) udpate the special cases as per @mwcraig's helpful chart! fix the tests: the behavior of the function when given masked arrays vs non-masked arrays has changed: we're now forcing the correct behavior for earlier versions implement @juliantaylor's suggetions test should no longer use np.ma.allclose (though I don't know why ma wouldn't just work....) this commit will fail: I've fixed the underlying issue but kept the tests flexible using the wrong data type... tests for array type double backticks a second instance of bacticks had a case wrong: one of the warn cases should *not* return a NaN for np<1.11 (which is why we're catching a warning there) numpy 1.10 np.ma.median([1,2,nan,4,5]) returns nan by default no more old numpy supports. Mixed feelings about this fix an import cleanup to address the numpy <=1.8 deprecation more cleanup fix changelog add a note about keepdims whitespace fix fix imports trailing whitespace?!!?!?!
Rebased and squashed. |
@crawfordsm - Could you clarify, is this ready to be merged or we're still waiting for others to comment? |
I was primarily just waiting on the rebase and squashing. Merged. Thanks @keflavich ! |
Simple PR: use
np.nanmedian
instead ofnp.median
ifignore_nan
is set. I would favor it defaulting toTrue
, but to avoid breaking backward compatibility I set it toFalse
. Feedback on that option would be helpful.