New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Allow in-place modification of coordinate BaseFrame and SkyCoord #9857
Conversation
elif frame_attr_shape != frame.shape: | ||
# How to broadcast a ShapedLikeNDArray subclass like Time. E.g. | ||
# an obstime attribute with shape (3,) and the frame has shape (10, 3). | ||
raise NotImplementedError('help me @mhvk!') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The attribute __get__
should have ensured the attribute is either a scalar or has been broadcast to the frame shape:
astropy/astropy/coordinates/attributes.py
Lines 110 to 114 in 60fb417
if isinstance(out, ShapedLikeNDArray): | |
out = out._apply(np.broadcast_to, shape=instance_shape, | |
subok=True) | |
else: | |
out = np.broadcast_to(out, instance_shape, subok=True) |
But that of course may leave it unwriteable... I guess the safest might be to do the same broadcasting (sadly, two different ways to do it until I get around to #8610) and then copy.
@@ -255,6 +255,19 @@ def _apply(self, method, *args, **kwargs): | |||
apply_method(getattr(self, component))) | |||
return new | |||
|
|||
def _setitem(self, item, value): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My sense would be to move this method to BaseFrame
, as I don't think Representation
has to know about this.
Or implement __setitem__
for representations and (somehow) take care that coord.data
contains or returns a read-only version.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This cannot easily be moved out to BaseFrame
because of the differentials. Notice that there are two versions of _setitem
and the differentials one calls the super()
version. So these really are part of the Representation
class structure (in that they deal with different behavior of subclasses).
On the other idea of making the representation data
be read-only, that pretty much breaks everything, including the existing (as of 4.0) API for updating coordinates in place.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm, fair point about the subclassing.
But I think I wasn't very clear about my alternative suggestion: it was to just define __setitem__
with what you have here, since representations themselves have no reason to be immutable (they're just complicated quantities as far as I'm concerned, and your current _setitem
ensures we do not set with a wrong representation class).
The read-only aspect I mentioned only because, as far as I can tell, the only reason representations are currently immutable is to ensure people do not mess up by setting data. But as you note, we in fact show work-arounds for that, so if we are making SkyCoord
mutable anyway, perhaps we do not have to worry too much about sky_coord.data
becoming mutable itself.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would say the other reason for .data
not being mutable is that it makes it impossible to know when to invalidate a cache (although that might be what you meant by "mess up"?). That is, a user doing coord.data[<whatever>] = something
will muck up anything cached in coord
. Most important of those is that in some cases (I forget when exactly), the cartesian representation is cached.
That said, it's true we specifically allow ways around it, but with the "here there be dragons" warning in the docs. I think maybe that convention is ok here as long as it's clear anyone stumbling across it sees it (which I guess then needs to be highlighted in the docs - I have some suggestions about that in my review).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@eteq, @taldcroft - thinking more about this, I still think we should just give representations a __setitem__
- they really have no reason to be immutable. On the caching question, I think it is important that we make sure our code works well, and that we have appropriate warnings to avoid user errors; I don't think we should make convoluted code just to ensure users cannot make mistakes. So, I would suggest to use __setitem__
for the representations as well as for SkyCoordinate
and BaseFrame
, and be sure to add a warning to the docstring of data
.
p.s. Just as an example of how hopeless it is to prevent users messing up if they go for attributes, consider a user who, quite logically in my opinion (arguably more logical at least than setting data
), thinks that one can change coordinates from their named attributes:
sc = SkyCoord(np.arange(2.)*u.deg, np.arange(2.)*u.deg)
sc
# <SkyCoord (ICRS): (ra, dec) in deg
# [(0., 0.), (1., 1.)]>
sc.ra
# <Longitude [0., 1.] deg>
# Aha, I can just change RA, right?
sc.ra[1] = -1*u.deg
# Let me just check to be sure
sc.ra
# <Longitude [ 0., 359.] deg>
# Clearly, that worked.
# But...
sc
# <SkyCoord (ICRS): (ra, dec) in deg
# [(0., 0.), (1., 1.)]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
p.s. Of course, none of this would matter if we didn't have to have caching... https://www.martinfowler.com/bliki/TwoHardThings.html
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why should the lower level be done in a different way? What makes them special?
It is exactly the caching and the fact that .data
does not have any reference to a parent frame, so it cannot invalidate the cache in __setitem__
. Caching is basically needed because of APE-5 and astropy/astropy-api#12, e.g.
assert icrs.spherical.lat == 5*u.deg
assert icrs.cartesian.z.value > 0
icrs.representation_type = 'cartesian' # It works
In other words the API allows seamlessly changing representations, so caching is needed to make that performant. I will accept some blame here, this seemed like a fine idea and the community accepted it. If we had a time machine it would be worth trying to avoid anything in the API that requires caching. But that ship has sailed. I don't see a way out without make a new class or a serious API break.
Having said all this, can we open a new issue to "consider adding __setitem__
to representations", and stipulate that this is out of scope for this PR, which is explicitly about making SkyCoord and Frame a little better and to support table operations for SkyCoord. In other words, perfect is the enemy of good.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just to be clear, I am not claiming caching is a bad idea, just remarking that it is known in computer science that to do it right is very hard (Indeed, that has already been amply demonstrated for Time
, where it similarly makes sense to cache, but where of course setting the equivalent of .data
, jd1[...]
and jd2[...]
screws up the cache as well).
But before just giving in on making a separate issue (which just means I have to make the changes and we will need to have the whole discussion all over again): what is really wrong with just changing _set_item
to __setitem__
? Right now one can screw up nearly everything else in a SkyCoord
by setting an item in most attributes (except TimeAttribute
, which is made read-only; quantities, locations, and now coordinate origins are settable), so why make .data
different. In fact, arguably doing so is confusing since not allowing setting it suggests it is safe to set all the other attributes.
And of course, by changing to __setitem__
, table would work for representations as well!
Note that there is no additional risk of previous code breaking; we are only talking about new users possibly messing up.
p.s. I'm quite willing to make a PR to yours that does it, adjusts the .data
docstring, and adds some tests for representations.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What is really the issue is that making the change is controversial, so I do not want my PR being blocked and delayed by a controversial discussion that has nothing to do with the goal of this PR. That change is out of scope and so it really belongs in a separate PR. We should keep each PR as small as possible, no?
This one is done and ready to merge now (apart from one missing backtick and final review of doc updates from @eteq), and there is no reason it needs a new PR to make it complete or even better. Once this is merged (which could happen as soon as @eteq looks over changes), then you can make a nice clean PR against master.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@taldcrof - OK, let's indeed go that route. I've gotten quite convinced this all is a good idea, and, as you say, we can do the next step in a separate PR.
The implementation here looks OK, but I guess the general question is whether we are throwing away too much simplicity, and increasing the risk for inconsistency, by making But I do worry about the more general case, i.e., of setting an arbitrary For instance, I do not think it is easy to keep the promise of in-place setting for the general case, in that while the coordinates themselves are set in place, in the background attributes may have to be copied (since they can be scalars but will need to have the coordinate's shape), which means things would not necessarily propagate back correctly. E.g., consider the following example
And in any semi-trivial implementation the coordinates in It may be good to be clear about use cases. Presumably one of the case which this addresses is that in the documentation, where initializing a Another case is probably |
This PR does not make SkyCoord mutable, it has been mutable since at least astropy 2.0 (https://docs.astropy.org/en/stable/coordinates/inplace.html). There are not any open issues (from what I could see) related to changing a SkyCoord in-place. Even bing goes straight to the right documentation page for a search for "modify astropy skycoord in-place". (Duckduckgo seemed lost in space however). But certainly the goal here is to get the experts to try to conjure any scenario in which this PR (and the currently documented mutability) could result in a broken or inconsistent coordinate object. My interest is largely related to making tables work across the board with astropy mixin columns, and to fix #6348 and #5745. So I'm just thinking about supporting only a very limited case of precisely equivalent frames. I'm not interested in the home-run of allowing arbitrary right-side coordinate attributes. I agree that seems hard to impossible to do correctly all the time. For high-performance one probably needs to use the back-door method that is currently documented. But it is worth noting that doing a single in-place update with |
A self-reminder to look back at #9536 which is related. |
@taldcroft - I note that the current text at the link you provide states "Coordinates are generally considered to be immutable"... But looking back at |
Good point, that needs to be changed! 😼 Are you also good with having a astropy/astropy/table/table.py Line 252 in e85a741
|
Yes, 👍 to the additional keyword; easier than having to reimplement parts of |
I started writing However, the idea of |
Yes, that makes sense. It helps to be able to override things if one knows they are correct. |
@mhvk - this is ready for final review. The docs build failure is just a whitespace error in the changelog. I have that commit ready locally but I'll wait to push that pending your review. The other two fails look unrelated. |
BTW I updated the top description as well. |
9116ebd
to
929b2a9
Compare
Argh. That seems like a bit of unnecessary optimization for how coordinates get used in real life (or maybe I'm missing some use case). I really hope you can fix this, Ah, now I see that the components are broadcast initially but sharing memory. So can one check shapes and make sure a copy gets done if shapes are not matching? |
It was simply the easiest way to ensure consistent shapes. I think now it should be possible to move the copy after the broadcast. I agree that it is not a particularly super-useful optimization since it gets lost the moment there is any transformation anyway. |
@taldcroft - I added a commit which (I think) fixes things. Please have a look... |
Oh, darn, I forgot to adjust the representation tests... |
Should be OK now. |
This so that memory is never broadcast, so setting items is safe.
Darn, I think I have somehow removed one of the docs-fixing commits. Will try to fix tomorrow. |
In theory I have fixed it. This is the result of git automerge being broken and actually corrupting the changelog, and it is now the second time I fixed that problem in this repo. |
OK, thanks. Let's get this in! |
@mhvk - thanks for the review and good catch + fix on the broadcasting issue. |
This may deserve a what's new entry, as the skycoord mixin support for table operations is a great improvement |
Adding rows to a `Table` containing a `SkyCoord` column was made possible by astropy#9857, but the relevant table documentation was not updated until now.
Description
This pull request is to address #6394, #5745 and #6348 by adding
__setitem__
to theSkyCoord
andBaseCoordinateFrame
classes to allow setting values in array-valued instances.It also adds an
insert()
method andSkyCoordInfo.new_like()
method to support table operations with coordinates (vstack
,dstack
,insert
). With this update,SkyCoord
is fully supported inTable
for non-masked operations. With one small exception, the changes to thetable
package in this PR are only in testing. The table change was needed becauseSkyCoord
does not allow reshaping by setting theshape
attribute.Key points:
Representation
objects is done by a private_setitem
method so prevent updating representation data without clearing the parent frame cache.cache
is always sufficient to prevent any inconsistencies after updating representation data.Closes #6394
Closes #5745
Closes #6348