Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Filtering of Tables with datetime MaskedColumns now produces TypeError #9374

Closed
talister opened this issue Oct 15, 2019 · 14 comments
Closed

Comments

@talister
Copy link

Description

Filtering of Tables with MaskedColumns to produce a "row mask" which can then be used to produce a subset of the Table used to work in 3.2.1 but no longer works in 3.2.2

Expected behavior

A smaller subset of the Table should be produced

Actual behavior

Produces a TypeError: Cannot set fill value of string with array of dtype bool as per attached stack trace:
table_filtering_stack_trace.txt

Steps to Reproduce

This is using a Table produced by astroquery.jplhorizons but I have no reason to believe it's specific to that code as that just returns a Table.

from datetime import datetime, timedelta
from astroquery.jplhorizons import Horizons
from astropy.table import Column

start = datetime(2019,10,13)
end = start + timedelta(days=10)
eph = Horizons(id='2011 YS62', id_type='smallbody', epochs={'start' : start.strftime("%Y-%m-%d %H:%M:%S"),  'stop' : end.strftime("%Y-%m-%d %H:%M:%S"), 'step' : '10m'}, location='-1')
ephem = eph.ephemerides(quantities='1,3,4,9,19,20,23,24,38,42',\
            skip_daylight=False, airmass_lessthan=99, max_hour_angle=12)
dates = Column([datetime.strptime(d, "%Y-%b-%d %H:%M") for d in ephem['datetime_str']])
ephem.add_column(dates, name='datetime')
visible_ephem = ephem[(ephem['datetime'] >= datetime(2019,10,14)) & (ephem['datetime'] < datetime(2019,10,15))]

Issue seems to be with the datetime column, filtering on other columns with dtype=str1 or dtype=float64 seems to work:

In [17]: ephem['solar_presence'].info                                                                                   
Out[17]: 
name = solar_presence
dtype = str1
unit = ---
class = MaskedColumn
n_bad = 0
length = 1441
In [6]: len(ephem)                                                                                                      
Out[6]: 1441
In [11]: subset = ephem[(ephem['solar_presence'] != 'C') & (ephem['solar_presence'] != 'N')]                            
In [12]: len(subset)                                                                                                    
Out[12]: 1355

In [10]: ephem['alpha'].info                                                                                            
Out[10]: 
name = alpha
dtype = float64
unit = deg
class = MaskedColumn
n_bad = 0
length = 1441
In [15]: subset = ephem[(ephem['alpha'] > 58.0)]                                                                          
In [16]: len(subset)                                                                                                    
Out[16]: 992

System Details

Linux-3.10.0-1062.1.1.el7.x86_64-x86_64-with-centos-7.7.1908-Core
Python 3.7.3 (LCOGT.net, May 28 2019, 18:39:56) 
[GCC 4.8.5 20150623 (Red Hat 4.8.5-36)]
Numpy 1.17.2
Scipy 1.3.1
astropy 3.2.2
@bsipocz bsipocz added the table label Oct 15, 2019
@taldcroft
Copy link
Member

This is likely caused by #8904 which was introduced in 3.2.2. Cc @mhvk.

An immediate workaround is if you use Time instead of Column. Then you get a bona-fide Time object in the table and no crash.

dates = Time([datetime.strptime(d, "%Y-%b-%d %H:%M") for d in ephem['datetime_str']])

@taldcroft taldcroft added the Bug label Oct 16, 2019
@mhvk
Copy link
Contributor

mhvk commented Oct 16, 2019

@taldcroft - I don't see where Quantity comes in... But one fortunate thing: the problem seems to have disappeared on master! (i.e., it will be gone in 4.0).

More minimal example:

from datetime import datetime
from astropy.table import MaskedColumn
mc = MaskedColumn([datetime(2000, 1, 1), datetime(2001, 1, 1)])
mc > datetime(2000, 6, 6)

@mhvk
Copy link
Contributor

mhvk commented Oct 16, 2019

p.s. @talister - the above said, I would agree with @taldcroft that it is nicer to work with Time directly.

@taldcroft
Copy link
Member

@mhvk - I reproduce the original problem on master. It maybe a numpy version issue? Your minimal example fails for me on numpy 1.17.2.

@taldcroft
Copy link
Member

Starting from the minimal example and using the post #8904 version and the pre- version:

In [10]: getattr(super(type(mc), mc), '__lt__')(datetime(2000, 6, 6))
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
~/miniconda3/envs/astropy/lib/python3.6/site-packages/numpy/ma/core.py in _check_fill_value(fill_value, ndtype)
    472             try:
--> 473                 fill_value = np.array(fill_value, copy=False, dtype=ndtype)
    474             except (OverflowError, ValueError):

ValueError: invalid literal for int() with base 10: '?'

During handling of the above exception, another exception occurred:

TypeError                                 Traceback (most recent call last)
<ipython-input-10-e6b1ed32c4f5> in <module>()
----> 1 getattr(super(type(mc), mc), '__lt__')(datetime(2000, 6, 6))

~/git/astropy/astropy/table/column.py in _compare(self, other)
    986 
    987             # Now just let the regular ndarray.__eq__, etc., take over.
--> 988             result = getattr(super(), op)(other)
    989             # But we should not return Column instances for this case.
    990             return result.data if isinstance(result, Column) else result

~/git/astropy/astropy/table/column.py in __array_finalize__(self, obj)
    360 
    361         if callable(super().__array_finalize__):
--> 362             super().__array_finalize__(obj)
    363 
    364         # Self was created from template (e.g. obj[slice] or (obj * 2))

~/miniconda3/envs/astropy/lib/python3.6/site-packages/numpy/ma/core.py in __array_finalize__(self, obj)
   3015         # Finalize the fill_value
   3016         if self._fill_value is not None:
-> 3017             self._fill_value = _check_fill_value(self._fill_value, self.dtype)
   3018         elif self.dtype.names is not None:
   3019             # Finalize the default fill_value for structured arrays

~/miniconda3/envs/astropy/lib/python3.6/site-packages/numpy/ma/core.py in _check_fill_value(fill_value, ndtype)
    477                 # that the passed fill_value is not compatible with the ndtype.
    478                 err_msg = "Cannot convert fill_value %s to dtype %s"
--> 479                 raise TypeError(err_msg % (fill_value, ndtype))
    480     return np.array(fill_value)
    481 

TypeError: Cannot convert fill_value ? to dtype bool

In [11]: getattr(mc.data, '__lt__')(datetime(2000, 6, 6))
Out[11]: 
masked_array(data=[ True, False],
             mask=False,
       fill_value='?')

@mhvk
Copy link
Contributor

mhvk commented Oct 16, 2019

Hmm, that's weird! It now fails for me too... I must have done something silly before.

@mhvk
Copy link
Contributor

mhvk commented Oct 16, 2019

Hmm, this seems a numpy problem that was fixed for __eq__ and __ne__ in numpy/numpy#12257 (on a problem reported by @astrofrog...), but not for the others.

@mhvk
Copy link
Contributor

mhvk commented Oct 16, 2019

Yikes, there is some inconsistency going on in trying to reproduce examples; at some points, a MaskedArray example gave problems too, and then at others it didn't. Bizarre.

@mhvk
Copy link
Contributor

mhvk commented Oct 16, 2019

OK, I'm almost going crazy, given this:

import numpy as np        
from datetime import datetime
from astropy.table import MaskedColumn
ma = np.ma.MaskedArray([datetime(2000, 1, 1), datetime(2001, 1, 1)], mask=False)
mc = MaskedColumn(ma)
mc > datetime(2000, 6, 6)
# works
mc2 = MaskedColumn([datetime(2000, 1, 1), datetime(2001, 1, 1)])
mc2 > datetime(2000, 6, 6)
# TypeError

But there is a minute difference:

mc._fill_value
# array('?', dtype=object)
mc2._fill_value
# array('?', dtype='<U1')

The latter is wrong - should be object given that the dtype of the array is object.

@mhvk
Copy link
Contributor

mhvk commented Oct 16, 2019

See numpy/numpy#14727
It may be possible to work around this, but I cannot quite bring myself to look into this further...

@talister
Copy link
Author

This is likely caused by #8904 which was introduced in 3.2.2. Cc @mhvk.

An immediate workaround is if you use Time instead of Column. Then you get a bona-fide Time object in the table and no crash.

dates = Time([datetime.strptime(d, "%Y-%b-%d %H:%M") for d in ephem['datetime_str']])

I tried making a Column of a Time initialized from a list, which failed the same way - I will try adding a Time directly.

@mhvk
Copy link
Contributor

mhvk commented Oct 16, 2019

Yes, it would need to be the Time instance directly; I think every Column with object dtype will fail because of the MaskedArray bug.

@MridulS
Copy link
Contributor

MridulS commented Mar 1, 2024

BTW this is fixed upstream now (numpy 1.24) and the codeblocks in #9374 (comment) work just fine now.

@mhvk
Copy link
Contributor

mhvk commented Mar 1, 2024

Thanks, good to cross-check! Let's just close the issue, since our minimum numpy version will be 1.24 soon enough...

@mhvk mhvk closed this as completed Mar 1, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

5 participants