Using Masked inside Time #15231
@mhvk - you got it that my big concern would be about breaking existing code. The minimal impact to existing astropy tests is a good start, but some quick questions without really looking at the code yet:
I do indeed worry about that, and have not checked. I do not have a strong opinion on whether the mask should be writeable or not, so we can stick more to the
See https://docs.astropy.org/en/latest/utils/masked/index.html#utils-masked-vs-numpy-maskedarray I think that if we implement the config item I suggested that enforces
I would expect a (slight) degradation in both cases, since there now is some overhead in propagating and storing the mask (latter only 1 byte for 16 bytes of data, though, since mask is shared). Given that actual time calculations are expensive, I think the time overhead will be minimal. Would be good if we still had functioning benchmarks! (
Not directly. But it will enable using [...] Note that the larger goal I still have in mind here is to add masking to [...] Anyway, am happy to make changes, but would like to be sure you think this is OK at least in principle...
@mhvk - in principle I am fine going ahead. In part this is because I think [...] Thanks for the work!
OK, sounds good. I add two checks on top for things I should implement.
@taldcroft - I pushed an update to this effort to use [...] Compared to what I had before, the main change is that I kept the option simple: choose either [...] This is fairly complete, but there are a few related things I'm wondering about; would be great to get your opinion!
Here, all could be done as follow-up, but I think at least the first point would be best addressed before 6.0, since otherwise people get another API change in 6.1. The other items just add features, so could be done in a later version too (but good to think through a bit what we want, since it would affect [...]
p.s. I checked, and it turns out to be fairly trivial to get [...] p.s.2 The scheme I used to create [...]
This makes me think about a big picture question and overall API consistency. The current [...] If there will be a [...] Writing it out like this makes me think that we should adhere to "one and only one way" of doing masking for each class at the expense of inconsistency in how masking is done. That would imply answers to your questions (of course up for more discussion):
Thanks, that's super helpful! I agree with your logic. Shall I add [...] One further question: it would seem sensible to allow setting [...] As for working with [...]
@mhvk - About the [...] In any case, I suppose that [...] Conceptually this gets to the original implementation and analogy with Pandas that from the user perspective there doesn't need to be much (if any) distinction between masked or unmasked. This is the benefit of having a container class (or using sentinel values). In the other direction, doing [...] Upshot: I'm on the fence about whether to allow setting [...]
OK, why don't I leave settable [...]
OK, I pushed the further commits that make [...]
Note that the test on scalars uncovered a small bug in [...]
Here is where we come back to the fundamental issue of masked vs missing. This is surprising to me because of the implication that the data are valid but simply masked:
Most people would probably expect
I wonder if we could address this ambiguity with this:
Note that these issues are not necessarily new with this PR, but are somehow more obvious to me now.
OK, the final push worked locally at least...
This replaces the use of nan for jd2 with Masked arrays for jd1 and jd2. For now, the mask is not writeable if nothing Masked was input, since we do not want to change the nature of jd1 and jd2 without explicit input. Along the way, this uncovered a bug in how masked elements were treated in fitstime. The old implementation merged jd1 and jd2 together using np.stack, which removes any mask. But because jd2 was nan for any masked element anyway, the element was still recognized as masked. Now it is treated properly.
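As a rough illustration of the difference described in the commit message above (a NumPy-only sketch, not astropy's actual code; the array names and values are made up for the example): with a NaN sentinel the mask has to be recovered from jd2 and the original value is lost, whereas with an explicit mask the underlying value stays available.

```python
import numpy as np

# Hypothetical two-part Julian dates; the middle element is "bad".
jd1 = np.array([2451545.0, 2451546.0, 2451547.0])
jd2 = np.array([0.25, 0.5, 0.75])
bad = np.array([False, True, False])

# Sentinel approach: overwrite jd2 with NaN where the element is bad.
# The mask is then inferred from the data, and the value 0.5 is gone.
jd2_sentinel = np.where(bad, np.nan, jd2)
mask_from_sentinel = np.isnan(jd2_sentinel)

# Explicit-mask approach (np.ma used here only as a stand-in):
# the mask is stored next to the data, which is left untouched.
jd1_masked = np.ma.MaskedArray(jd1, mask=bad)
jd2_masked = np.ma.MaskedArray(jd2, mask=bad)

print(mask_from_sentinel)   # mask recovered from the NaN sentinel
print(jd2_masked.mask)      # mask stored explicitly
print(jd2_masked.data[1])   # underlying value is still there
```

With the sentinel, a genuine NaN in the input cannot be told apart from a masked element, which is the "masked vs missing" ambiguity raised elsewhere in this thread.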
Moving the cache to Time itself seems more logical, since apart from the mask, all the handling of cache state is done in Time. Indeed, it was on Time before setting of Time elements was introduced in gh-6028. This just moves it back now that the mask handling is much simpler. One resulting change is that any cached format information still needs to be deleted when changing format, since it can be out of date.
Also check that the mask is shared internally between jd1 and jd2, but not with any inputs or outputs.
Looks good, approved!!
To do:

- [...] `.mask` [...], perhaps also internally).
- [...] `np.ma.MaskedArray` always on output.

@taldcroft - when I introduced `Masked`, I had hoped to soon use it to have masked `SkyCoord`, etc. Of course, things never go very fast. But even at the time I had made a start by seeing how things would work for `Time`, where it would replace using `np.nan` for `jd2`. This PR is to actually do that (as draft for now). This is triggered in part by #15230 which led back to #6509, a request to have an option for "NaT", i.e., a marker for a bad time (as opposed to a masked one).

I remember vaguely that at the time you were rather hesitant to go this route, but do not really remember why, except that it probably involved backward compatibility! The approach taken here is that if `Time` gets `np.ma.MaskedArray` as input, that's also what it will give as output (reusing the `_shaped_like_input` method as before), otherwise it will be `Masked`. I think this is a reasonable option, though I think we should add a configuration option for those who want to just set a given type. If you think that makes sense, I'll add that.

Note that the PR consists of 4 commits all of which are working states, so perhaps useful to review independently:

- [...] `Masked` everywhere. Conversion upon inputs, all output as `Masked`. EDIT: also a small bug fix in `fitstime`, uncovered because now jd2 is no longer nan if an element is masked.
- [...] `Time` proper.
- [...] `t2 = t-t[0]`, then `t2` just uses `Masked` again. Probably should take `_masked_cls` from the left-most argument or so. Or maybe this should be an attribute on `info` and let that do the merging?!

One API change is that the `mask` is now writeable if masked values were passed in. This is consistent with how masked arrays generally work (but could be changed). One consequence of that is that the `masked` property can no longer be cached. (Obviously, this could be changed. Since the mask is always a copy of that from the inputs, we could write-protect it.)

Out of scope here:

- `NaN` equivalent for `astropy.time.Time` #6509 - though if we use `Masked` internally, we become free to use `np.nan` to indicate "NaT";
- [...] `t = Time(val.unmasked, ...)` and then set the mask after, with `t[val.mask] = np.ma.masked`).
- [...] `Masked(times, mask=...)` to work too.

Small trial: