MaskedQuantity breaking np.nanmean etc. methods in aggregate_downsample #12435
Comments
@dhomeier - thanks, this example definitely proves we should support Another question is what should happen with all-NaN columns. I guess most logical is to just mask those. |
I thought one could just apply the nanfunctions to the numerical content (as in the |
What output would one expect for them? Technically it's a function on an empty sequence/array either way; np.nanfunctions generally are consistent in that, only for the all-masked case it looks a bit off:
# all-NaN, should be 0.0 / 0
np.nanmean(ma[4:6])
<stdin>:1: RuntimeWarning: Mean of empty slice
nan
# all masked
np.nanmean(ma[5:7])
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "<__array_function__ internals>", line 5, in nanmean
File "/opt/lib/python3.10/site-packages/numpy/lib/nanfunctions.py", line 950, in nanmean
avg = _divide_by_count(tot, cnt, out=out)
File "/opt/lib/python3.10/site-packages/numpy/lib/nanfunctions.py", line 212, in _divide_by_count
return np.divide(a, b, out=a, casting='unsafe')
ValueError: output array is read-only |
@dhomeier - I tried to think back a little more what the problem was, and remembered that I had a half-finished implementation. For the inexact types, this is easy: Just fill all masked values with The trick was the integer and other non-exact dtypes. As you noted, the implementation for But for the nanfunctions, that would not be correct. For instance, the docstring of Anyway, I think your issue means this should just get done... I've started updating the old PR I had, but I'm still getting errors... Also, I am now seeing that p.s. Note that the nanfunctions do not properly support
|
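The split between inexact and other dtypes mentioned above can be sketched with plain `np.ma` arrays. This is only an illustration, not the implementation in the PR: `masked_nanmean` is a hypothetical helper, and filling masked entries with NaN is an assumption about what the truncated sentence refers to.

```python
import numpy as np


def masked_nanmean(a):
    """Mean over elements that are neither masked nor NaN (hypothetical helper)."""
    a = np.ma.asarray(a)
    if np.issubdtype(a.dtype, np.inexact):
        # Float and complex dtypes can hold NaN, so masked entries can be
        # replaced by NaN and then skipped by np.nanmean.
        return np.nanmean(a.filled(np.nan))
    # Integer dtypes cannot hold NaN, so only the mask needs to be honoured.
    return a.mean()


floats = np.ma.array([1.0, 2.0, np.nan, 100.0], mask=[0, 0, 0, 1])
ints = np.ma.array([1, 2, 3, 100], mask=[0, 0, 0, 1])

print(masked_nanmean(floats))  # 1.5
print(masked_nanmean(ints))    # 2.0
```

For the float case both the NaN and the masked 100 are skipped; for the integer case the plain masked mean already gives the desired result.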
Yes, the way they are ignoring masked elements or not does not make much sense to me.
>>> np.nanmedian(np.ma.array([1, 2, 3, 100, np.nan, np.nan], mask=[0, 0, 0, 1, 1, 0]))
/Users/derek/opt/lib/python3.10/site-packages/numpy/lib/function_base.py:3685: UserWarning: Warning: 'partition' will ignore the 'mask' of the MaskedArray.
part.partition(kth)
3.0
>>> np.nanmedian(np.ma.array([1, 2, 3, 100, np.nan, np.nan, 200], mask=[0, 0, 0, 1, 1, 0, 0]))
nan
>>> np.nanmedian(np.ma.array([1, 2, 3, 100, np.nan, np.nan, 200], mask=[0, 0, 0, 1, 1, 1, 0]))
/Users/derek/opt/lib/python3.10/site-packages/numpy/core/fromnumeric.py:755: UserWarning: Warning: 'partition' will ignore the 'mask' of the MaskedArray.
a.partition(kth, axis=axis, kind=kind, order=order)
nan
>>> np.nanmedian(Masked(np.ma.array([1, 2, 3, 100, np.nan, np.nan, 200], mask=[0, 0, 0, 1, 1, 1, 0])))
MaskedNDArray(2.5)
>>> np.nanmedian(Masked(np.ma.array([1, 2, 3, 100, np.nan, np.nan, 200], mask=[0, 0, 0, 1, 1, 1, 1])))
MaskedNDArray(2.) |
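For comparison, the two `Masked` results above (2.5 and 2.0) can be reproduced with `np.ma` alone by turning masked entries into NaN before calling `np.nanmedian`. This is a minimal sketch; the helper name is hypothetical and it says nothing about how `Masked` computes the value internally.

```python
import numpy as np

data = [1, 2, 3, 100, np.nan, np.nan, 200]


def nanmedian_skipping_masked(values, mask):
    """Median over entries that are neither masked nor NaN (hypothetical helper)."""
    ma = np.ma.array(values, mask=mask, dtype=float)
    # Masked entries become NaN, so np.nanmedian ignores them together
    # with the NaNs that were already in the data.
    return np.nanmedian(ma.filled(np.nan))


print(nanmedian_skipping_masked(data, [0, 0, 0, 1, 1, 1, 0]))  # 2.5
print(nanmedian_skipping_masked(data, [0, 0, 0, 1, 1, 1, 1]))  # 2.0
```

Converting to a plain array first also avoids the `partition` warning, since no `MaskedArray` reaches `np.nanmedian`.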
I understand that as |
I have not tracked down exactly how the timeseries is initialised here, but I think the critical step is the change in reading in the QTable, which previously had the columns in question as NaN-valued Quantities:
>>> qt = table.QTable.read('tess2019128220341-0000000410458113-0016-s_lc.fits')
WARNING: dropping mask in Quantity column 'PSF_CENTR2_ERR': masked Quantity not supported [astropy.table.table]
>>> qt[:3]['PSF_CENTR1_ERR']
<Quantity [nan, nan, nan] pix>
whereas in 5.0x this becomes
<MaskedColumn name='PSF_CENTR1_ERR' dtype='float32' unit='pix' format='{:14.7e}' length=3>
--
--
--
>>> qt[:3]['PSF_CENTR1_ERR'].data
masked_array(data=[--, --, --],
mask=[ True, True, True],
fill_value=1e+20,
dtype=float32)
Now the original file of course has no record of a mask, but simply
>>> hdulist[1].data[:3]['PSF_CENTR1_ERR']
array([nan, nan, nan], dtype=float32)
so it feels to me overly imposing to automatically convert the values to masked elements. My suggestion is to provide some option |
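The two representations discussed here, NaN-valued and masked, can be converted into each other with plain NumPy. The sketch below only illustrates that round trip on a made-up `float32` array; it is not the astropy FITS reader and does not correspond to any existing reader option.

```python
import numpy as np

# Made-up stand-in for a float32 FITS column that stores missing values as NaN.
raw = np.array([np.nan, 1.5, np.nan], dtype=np.float32)

# NaN-valued representation -> masked representation.
masked = np.ma.masked_invalid(raw)

# Masked representation -> NaN-valued representation.
restored = masked.filled(np.nan)

print(masked.mask)   # which entries were treated as missing
print(restored)      # NaNs are back in place of the masked entries
```

Note that `np.ma.masked_invalid` also masks infinities, so the round trip is only lossless for data whose invalid entries are all NaN.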
I'll try to push my fix for the nanfunctions later today - it was mostly done... |
@dhomeier - the FITS standard is to use |
@mhvk - thanks! |
Sorry, I had not checked that part of the standard. As it was already treated this way in non-Q Tables, it's certainly a step forward to enable this for |
@dhomeier - indeed, it will be good if all types of tables behave the same way! And no surprise you would have missed that; there really should be a what's new entry about it (see #11914 (comment)...). Of course, that hopefully will mean that But in the meantime, hopefully #12454 will fix the regression... |
@mhvk that's actually a good workaround for now, to replace them with their standard counterparts assuming that all NaN values would be masked anyway!
rmse_func = lambda x: np.sum(x) if hasattr(x, 'mask') and np.all(np.isfinite(x)) else np.nansum(x) if np.any(np.isfinite(x)) else np.nan
in |
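The chained conditional in the one-liner above may be easier to follow when written out as a function. The sketch below keeps the same three branches; the name `nan_aware_sum` is hypothetical, and the inputs are plain NumPy and `np.ma` arrays rather than astropy `Masked` objects.

```python
import numpy as np


def nan_aware_sum(x):
    """Same branches as the one-liner above, spelled out (hypothetical name)."""
    finite = np.isfinite(x)
    if hasattr(x, "mask") and np.all(finite):
        # Masked input whose unmasked entries are all finite: the ordinary
        # sum already skips the masked entries.
        return np.sum(x)
    if np.any(finite):
        # At least one finite value: let np.nansum skip the NaNs.
        return np.nansum(x)
    # No finite value at all.
    return np.nan


print(nan_aware_sum(np.array([1.0, np.nan, 2.0])))                      # 3.0
print(nan_aware_sum(np.array([np.nan, np.nan])))                        # nan
print(nan_aware_sum(np.ma.array([1.0, 2.0, np.nan], mask=[0, 0, 1])))   # 3.0
```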
Missed again that you already had the PR in. It (or the 5.0.x backport) does fix the lightkurve regression, thanks a lot. Only thing I noted is that the returned columns from |
Great to hear that the issue is fixed, and even better that with the new |
I guess we should close this issue - while I'm indeed not completely sure the present behaviour is entirely consistent, that is better as a separate discussion. At least this is fixed! |
Description
A downstream issue lightkurve/lightkurve#1157 appears to have been triggered by the use of Masked objects, probably following #11127. The same code that previously returned a timeseries subclass with Quantity columns, "masked" with np.nan where applicable, is creating columns of MaskedQuantity with astropy 5.0rc1. This is breaking any code using functions like np.nanmean that are currently not supported.
Expected behaviour
MaskedQuantity should either fully support required array functions or not be used where still unsupported functions are required.
Actual behaviour
For a more minimal example that can also be constructed in 4.3 (i.e. the aggregate_downsample failure is independent of its updates in 5.0), note that the array functions do work for np.ma.array, but the NaNs don't seem to be correctly caught in the downsampling:
Steps to Reproduce
I am not sure how exactly the timeseries subclass in the downstream issue is instantiated, but the change to MaskedQuantity occurred with 5.0rc1 without any other code changes (@barentsen may have more background on the read-in details). So the code should probably be more conservative in using Masked objects for now, or functions like aggregate_downsample will need to work around those methods by applying them either to the unmasked instances or to their numpy content.
Better still would of course be to directly support those functions. A naïve attempt to just add them to function_helpers.MASKED_SAFE_FUNCTIONS actually (sort of) worked for np.nanmedian, but np.nanmean, np.nansum, np.nanprod then fail with
so they will probably need a dedicated implementation, analogously to MaskedIterator.mean?
System Details
Python: 3.8.12 / 3.9.8 / 3.10.0
Platform: macOS-10.14.6-x86_64-i386-64bit
Astropy: 5.0rc1
Numpy: 1.21.2
Scipy: 1.7.1
Matplotlib: 3.5.0rc1