Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Allow in-place modification of coordinate BaseFrame and SkyCoord #9857

Merged
merged 24 commits into from Apr 25, 2020

Conversation

taldcroft
Copy link
Member

@taldcroft taldcroft commented Jan 13, 2020

Description

This pull request is to address #6394, #5745 and #6348 by adding __setitem__ to the SkyCoord and BaseCoordinateFrame classes to allow setting values in array-valued instances.

It also adds an insert() method and SkyCoordInfo.new_like() method to support table operations with coordinates (vstack, dstack, insert). With this update, SkyCoord is fully supported in Table for non-masked operations. With one small exception, the changes to the table package in this PR are only in testing. The table change was needed because SkyCoord does not allow reshaping by setting the shape attribute.

Key points:

  • This is intentionally a limited implementation that requires the right hand side value to be entirely equivalent to the left side except for the actual representation data. This includes strict class equality and all frame attributes being equivalent.
  • This is the minimal requirement to support Table operations.
  • Setting values in Representation objects is done by a private _setitem method so prevent updating representation data without clearing the parent frame cache.
  • The key assumption here is that clearing the frame cache is always sufficient to prevent any inconsistencies after updating representation data.

Closes #6394
Closes #5745
Closes #6348

@taldcroft
Copy link
Member Author

@adrn @eteq - still looking for some high-level feedback on this PR to allow a limited and safe version of setitem in coordinates.

@bsipocz bsipocz added this to the v4.1 milestone Feb 21, 2020
@taldcroft
Copy link
Member Author

@adrn @eteq - still looking for some high-level feedback on this PR to allow a limited and safe version of setitem in coordinates.

elif frame_attr_shape != frame.shape:
# How to broadcast a ShapedLikeNDArray subclass like Time. E.g.
# an obstime attribute with shape (3,) and the frame has shape (10, 3).
raise NotImplementedError('help me @mhvk!')
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The attribute __get__ should have ensured the attribute is either a scalar or has been broadcast to the frame shape:

if isinstance(out, ShapedLikeNDArray):
out = out._apply(np.broadcast_to, shape=instance_shape,
subok=True)
else:
out = np.broadcast_to(out, instance_shape, subok=True)

But that of course may leave it unwriteable... I guess the safest might be to do the same broadcasting (sadly, two different ways to do it until I get around to #8610) and then copy.

@@ -255,6 +255,19 @@ def _apply(self, method, *args, **kwargs):
apply_method(getattr(self, component)))
return new

def _setitem(self, item, value):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My sense would be to move this method to BaseFrame, as I don't think Representation has to know about this.

Or implement __setitem__ for representations and (somehow) take care that coord.data contains or returns a read-only version.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This cannot easily be moved out to BaseFrame because of the differentials. Notice that there are two versions of _setitem and the differentials one calls the super() version. So these really are part of the Representation class structure (in that they deal with different behavior of subclasses).

On the other idea of making the representation data be read-only, that pretty much breaks everything, including the existing (as of 4.0) API for updating coordinates in place.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, fair point about the subclassing.

But I think I wasn't very clear about my alternative suggestion: it was to just define __setitem__ with what you have here, since representations themselves have no reason to be immutable (they're just complicated quantities as far as I'm concerned, and your current _setitem ensures we do not set with a wrong representation class).

The read-only aspect I mentioned only because, as far as I can tell, the only reason representations are currently immutable is to ensure people do not mess up by setting data. But as you note, we in fact show work-arounds for that, so if we are making SkyCoord mutable anyway, perhaps we do not have to worry too much about sky_coord.data becoming mutable itself.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would say the other reason for .data not being mutable is that it makes it impossible to know when to invalidate a cache (although that might be what you meant by "mess up"?). That is, a user doing coord.data[<whatever>] = something will muck up anything cached in coord. Most important of those is that in some cases (I forget when exactly), the cartesian representation is cached.

That said, it's true we specifically allow ways around it, but with the "here there be dragons" warning in the docs. I think maybe that convention is ok here as long as it's clear anyone stumbling across it sees it (which I guess then needs to be highlighted in the docs - I have some suggestions about that in my review).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@eteq, @taldcroft - thinking more about this, I still think we should just give representations a __setitem__ - they really have no reason to be immutable. On the caching question, I think it is important that we make sure our code works well, and that we have appropriate warnings to avoid user errors; I don't think we should make convoluted code just to ensure users cannot make mistakes. So, I would suggest to use __setitem__ for the representations as well as for SkyCoordinate and BaseFrame, and be sure to add a warning to the docstring of data.

p.s. Just as an example of how hopeless it is to prevent users messing up if they go for attributes, consider a user who, quite logically in my opinion (arguably more logical at least than setting data), thinks that one can change coordinates from their named attributes:

sc = SkyCoord(np.arange(2.)*u.deg, np.arange(2.)*u.deg)                                        
sc                                                                                             
# <SkyCoord (ICRS): (ra, dec) in deg
#     [(0., 0.), (1., 1.)]>
sc.ra                                                                                          
# <Longitude [0., 1.] deg>
# Aha, I can just change RA, right?
sc.ra[1] = -1*u.deg                                                                            
# Let me just check to be sure
sc.ra                                                                                          
# <Longitude [  0., 359.] deg>
# Clearly, that worked.
# But...
sc                                                                                             
# <SkyCoord (ICRS): (ra, dec) in deg
#     [(0., 0.), (1., 1.)]>

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

p.s. Of course, none of this would matter if we didn't have to have caching... https://www.martinfowler.com/bliki/TwoHardThings.html

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why should the lower level be done in a different way? What makes them special?

It is exactly the caching and the fact that .data does not have any reference to a parent frame, so it cannot invalidate the cache in __setitem__. Caching is basically needed because of APE-5 and astropy/astropy-api#12, e.g.

assert icrs.spherical.lat == 5*u.deg
assert icrs.cartesian.z.value > 0
icrs.representation_type = 'cartesian'  # It works

In other words the API allows seamlessly changing representations, so caching is needed to make that performant. I will accept some blame here, this seemed like a fine idea and the community accepted it. If we had a time machine it would be worth trying to avoid anything in the API that requires caching. But that ship has sailed. I don't see a way out without make a new class or a serious API break.

Having said all this, can we open a new issue to "consider adding __setitem__ to representations", and stipulate that this is out of scope for this PR, which is explicitly about making SkyCoord and Frame a little better and to support table operations for SkyCoord. In other words, perfect is the enemy of good.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just to be clear, I am not claiming caching is a bad idea, just remarking that it is known in computer science that to do it right is very hard (Indeed, that has already been amply demonstrated for Time, where it similarly makes sense to cache, but where of course setting the equivalent of .data, jd1[...] and jd2[...] screws up the cache as well).

But before just giving in on making a separate issue (which just means I have to make the changes and we will need to have the whole discussion all over again): what is really wrong with just changing _set_item to __setitem__? Right now one can screw up nearly everything else in a SkyCoord by setting an item in most attributes (except TimeAttribute, which is made read-only; quantities, locations, and now coordinate origins are settable), so why make .data different. In fact, arguably doing so is confusing since not allowing setting it suggests it is safe to set all the other attributes.

And of course, by changing to __setitem__, table would work for representations as well!

Note that there is no additional risk of previous code breaking; we are only talking about new users possibly messing up.

p.s. I'm quite willing to make a PR to yours that does it, adjusts the .data docstring, and adds some tests for representations.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is really the issue is that making the change is controversial, so I do not want my PR being blocked and delayed by a controversial discussion that has nothing to do with the goal of this PR. That change is out of scope and so it really belongs in a separate PR. We should keep each PR as small as possible, no?

This one is done and ready to merge now (apart from one missing backtick and final review of doc updates from @eteq), and there is no reason it needs a new PR to make it complete or even better. Once this is merged (which could happen as soon as @eteq looks over changes), then you can make a nice clean PR against master.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@taldcrof - OK, let's indeed go that route. I've gotten quite convinced this all is a good idea, and, as you say, we can do the next step in a separate PR.

@mhvk
Copy link
Contributor

mhvk commented Apr 1, 2020

The implementation here looks OK, but I guess the general question is whether we are throwing away too much simplicity, and increasing the risk for inconsistency, by making SkyCoord mutable. I think if we stick to the current implementation where frames have to be identical, we are probably OK (it might have to be made slightly more flexible, like in the Time.__setitem__ case, by explicitly checking that the item being replaced has exactly the same attributes, rather than the whole frame.).

But I do worry about the more general case, i.e., of setting an arbitrary SkyCoord with another arbitrary one (maybe including handling things like mixing proper motion with no proper motion, etc.). This will not be simple to get right.

For instance, I do not think it is easy to keep the promise of in-place setting for the general case, in that while the coordinates themselves are set in place, in the background attributes may have to be copied (since they can be scalars but will need to have the coordinate's shape), which means things would not necessarily propagate back correctly. E.g., consider the following example

sc = SkyCoord([[1., 2.], [3., 4]]*u.hour, [[-10, -5], [5, 10]]*u.deg, obstime=Time('J2000'))
sc0 = sc[0]
sc0[1] = SkyCoord(5*u.hour, 20*u.deg, obstime=Time('J2010'))

And in any semi-trivial implementation the coordinates in sc will be changed, but obstime will not, because in sc0 it will have to be copied (as it is a scalar) to allow setting an element.

It may be good to be clear about use cases. Presumably one of the case which this addresses is that in the documentation, where initializing a SkyCoord is too slow. But is the is_equivalent_frame check fast enough that this would actually help?

Another case is probably Table, in particular stacking and joining. For that case, would having a well-working stack/join option be good enough? I ask in particular since between clearing the cache and possibly updating attributes, one is effectively going to make quite a few copies anyway (just accessing .ra will make a copy in my above example cases, since those are in hourangle, and .ra is always in degrees), which suggests that perhaps ensuring that any change creates a new copy is the safest/easiest route. (I like a separate code path, e.g., implementing np.stack for coordinates, better since it isolates any problems...)

@taldcroft
Copy link
Member Author

I guess the general question is whether we are throwing away too much simplicity, and increasing the risk for inconsistency, by making SkyCoord mutable.

This PR does not make SkyCoord mutable, it has been mutable since at least astropy 2.0 (https://docs.astropy.org/en/stable/coordinates/inplace.html). There are not any open issues (from what I could see) related to changing a SkyCoord in-place. Even bing goes straight to the right documentation page for a search for "modify astropy skycoord in-place". (Duckduckgo seemed lost in space however).

But certainly the goal here is to get the experts to try to conjure any scenario in which this PR (and the currently documented mutability) could result in a broken or inconsistent coordinate object.

My interest is largely related to making tables work across the board with astropy mixin columns, and to fix #6348 and #5745. So I'm just thinking about supporting only a very limited case of precisely equivalent frames. I'm not interested in the home-run of allowing arbitrary right-side coordinate attributes. I agree that seems hard to impossible to do correctly all the time.

For high-performance one probably needs to use the back-door method that is currently documented. But it is worth noting that doing a single in-place update with __setitem__ here is about 3 times faster than making a new SkyCoord. And we certainly could add a kwarg to __setitem__ like skip_checks which allows skipping the checks if the caller is certain they are not needed. For instance sc.__setitem__(12, sc[13], skip_checks=True).

@taldcroft
Copy link
Member Author

A self-reminder to look back at #9536 which is related.

@mhvk
Copy link
Contributor

mhvk commented Apr 3, 2020

@taldcroft - I note that the current text at the link you provide states "Coordinates are generally considered to be immutable"...

But looking back at Time, where I was somewhat similarly worried, it has turned out that just providing the simple has-to-be-the-same location has been good enough. And similar strictness clearly is not very intrusive in terms of code. So, as long as we make clear in the documentation that we have no plans to generalize this, I'm fine with implementing your simple solution (i.e., without any change in attributes).

@taldcroft
Copy link
Member Author

I note that the current text at the link you provide states "Coordinates are generally considered to be immutable"...

Good point, that needs to be changed! 😼

Are you also good with having a validated kwarg in __setitem__ ala TableColumns? This will help performance in table operations where we have just created the new SkyCoord with info.new_like() and already know that it is fine. As usual, new_like() will take care that the inputs are equivalent.

def __setitem__(self, item, value, validated=False):

@mhvk
Copy link
Contributor

mhvk commented Apr 3, 2020

Yes, 👍 to the additional keyword; easier than having to reimplement parts of __setitem__ elsewhere.

@taldcroft
Copy link
Member Author

I started writing new_like() and realized that we may actually need validated (or something similar) instead of just wanting it. Right now the BaseCoordinateFrame.__setitem__ implementation ignores frame attributes entirely because it assumes they are all exactly the same.

However, the idea of new_like() is that it makes a blank template object that gets filled in. For a SkyCoord with an array attribute like obstime one needs to copy those array values in the same way as the primary data values, and that would happen in __setitem__. Of course one could start with a more restricted limitation that does not allow table operations for a SkyCoord with array-valued frame attributes.

@mhvk
Copy link
Contributor

mhvk commented Apr 3, 2020

Yes, that makes sense. It helps to be able to override things if one knows they are correct.

@taldcroft taldcroft changed the title [WIP] __setitem__ for coordinate BaseFrame and SkyCoord Allow in-place modification of coordinate BaseFrame and SkyCoord Apr 15, 2020
@taldcroft
Copy link
Member Author

@mhvk - this is ready for final review. The docs build failure is just a whitespace error in the changelog. I have that commit ready locally but I'll wait to push that pending your review. The other two fails look unrelated.

@taldcroft
Copy link
Member Author

BTW I updated the top description as well.

@taldcroft
Copy link
Member Author

taldcroft commented Apr 24, 2020

there is a problem: by default, since representations were meant to be immutable, input arrays are broadcast against each other

Argh. That seems like a bit of unnecessary optimization for how coordinates get used in real life (or maybe I'm missing some use case). I really hope you can fix this, but maybe in the worst case I find a way to raise an exception in setitem if the components are not the same shape? Or just broadcast the components all to the same shape at that time (on-demand)?

Ah, now I see that the components are broadcast initially but sharing memory. So can one check shapes and make sure a copy gets done if shapes are not matching?

@mhvk
Copy link
Contributor

mhvk commented Apr 24, 2020

It was simply the easiest way to ensure consistent shapes. I think now it should be possible to move the copy after the broadcast. I agree that it is not a particularly super-useful optimization since it gets lost the moment there is any transformation anyway.

@mhvk
Copy link
Contributor

mhvk commented Apr 25, 2020

@taldcroft - I added a commit which (I think) fixes things. Please have a look...

@mhvk
Copy link
Contributor

mhvk commented Apr 25, 2020

Oh, darn, I forgot to adjust the representation tests...

@mhvk
Copy link
Contributor

mhvk commented Apr 25, 2020

Should be OK now.

This so that memory is never broadcast, so setting items is safe.
@mhvk
Copy link
Contributor

mhvk commented Apr 25, 2020

Darn, I think I have somehow removed one of the docs-fixing commits. Will try to fix tomorrow.

@taldcroft
Copy link
Member Author

In theory I have fixed it. This is the result of git automerge being broken and actually corrupting the changelog, and it is now the second time I fixed that problem in this repo.

@mhvk
Copy link
Contributor

mhvk commented Apr 25, 2020

OK, thanks. Let's get this in!

@mhvk mhvk dismissed eteq’s stale review April 25, 2020 11:01

Changes were made following request

@mhvk mhvk merged commit 9f3858e into astropy:master Apr 25, 2020
@taldcroft taldcroft deleted the coord-setitem branch April 25, 2020 13:32
@taldcroft
Copy link
Member Author

@mhvk - thanks for the review and good catch + fix on the broadcasting issue.

@bsipocz
Copy link
Member

bsipocz commented Apr 25, 2020

This may deserve a what's new entry, as the skycoord mixin support for table operations is a great improvement

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
4 participants