Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Quantities with structured units #11775

Merged
merged 57 commits into from
Aug 9, 2021
Merged

Conversation

mhvk
Copy link
Contributor

@mhvk mhvk commented May 23, 2021

Ever since we made the ERFA ufuncs, I've wanted to be ablve to override all of them, but that was tricky because many require either Time input or deal with structured arrays. This PR addresses the latter problem, by implementing a StructuredUnit, which can be used by quantities with structured dtype. As one can see from the commit history (EDIT: well, not under "commits", but from the list in-line; first commit is from 2018 May 28, almost 3 years ago...), it has taken me quite a bit of time, on and off, but the final implementation is actually reasonably simple - by far the most here is tests, especially of all the ERFA functions that use structured arrays. If this is thought a good idea, I'll complete the ERFA coverage in follow-up in two stages, first for just units, and second also with Time (latter to be discussed how best to do it).

There is also a small documentation page, just to give a sense of how it works.

Opened as draft to get initial feedback. cc @nstarman, who might enjoy a different aspect of astropy.

Fixes #3777

@mhvk mhvk added this to the v5.0 milestone May 23, 2021
@mhvk mhvk requested a review from adrn May 23, 2021 03:37
@github-actions github-actions bot added the Docs label May 23, 2021
@github-actions
Copy link

👋 Thank you for your draft pull request! Do you know that you can use [ci skip] or [skip ci] in your commit messages to skip running continuous integration tests until you are ready?

Copy link
Member

@nstarman nstarman left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall this looks great and extremely useful!

I need to check out this branch and hunt for edge cases because there are a number of complex things happening.
As I'm reading through this now, my two conceptual concerns are: 1) should StructuredUnit inherit from UnitBase and 2) SpecificTypeQuantity, does this work (esp for ones with added equivalencies)?

astropy/units/quantity.py Show resolved Hide resolved
docs/units/structured_units.rst Outdated Show resolved Hide resolved
astropy/units/structured.py Show resolved Hide resolved
astropy/units/structured.py Outdated Show resolved Hide resolved
astropy/units/format/generic.py Show resolved Hide resolved
@@ -3,15 +3,39 @@
"""Quantity helpers for the ERFA ufuncs."""
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are any of these publicly scoped? Nothing has docstrings...

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, they're not. There are some comments about what these functions are supposed to do in quantity_helpers/helpers.py. I think a good docstring in quantity_helpers/__init__.py would be an idea... (I've long wondered whether we should use module docstrings for that type of information on implementation; #8930)

Copy link
Member

@nstarman nstarman May 25, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems reasonable. It's where I write that, when I remember to.
I looked at #8930. I like everything except the custom section title. Numpydoc will complain unless we register in a new section title type. The existing section titles seems sufficient, Notes or routine listings can go into details. Or just stuff in the extended summary.

astropy/units/structured.py Show resolved Hide resolved
astropy/units/structured.py Show resolved Hide resolved
astropy/units/structured.py Show resolved Hide resolved
astropy/units/quantity.py Show resolved Hide resolved
@mhvk
Copy link
Contributor Author

mhvk commented May 23, 2021

On inheriting from UnitBase - I thought about it, but in the end the structured unit is really quite different (indeed, I initially had it inherit from np.void!). E.g., it is not something that one can usefully put in add_enabled_units. Also, the fact that I would be overriding essentially all methods suggests it is not really a subclass, but rather more like a duck-type.

That said, it would be useful to have some type of abstract unit class that defines the interfaces that are actually used (like having a .to() method and .is_equivalent(). The same problem happened for FunctionUnit and there is even an issue about it: #3650.

@mhvk
Copy link
Contributor Author

mhvk commented May 24, 2021

@nstarman - structured SpecificTypeQuantity is indeed possible, but testing it brought out a bug (now fixed) that u.m.is_equivalent(structured_unit) raised a FutureWarning from the physical type ID comparison. And that, really, those comparisons were including field names, which is wrong. So, keep those questions coming!

@mhvk mhvk force-pushed the quantity-structured-unit branch from 4705573 to bdf2e4e Compare May 24, 2021 01:09
@mhvk mhvk marked this pull request as ready for review May 24, 2021 15:48
@nstarman
Copy link
Member

Found a bug in printing

Screen Shot 2021-05-24 at 16 25 47

@nstarman
Copy link
Member

I'm not sure I get the design principle here.

The following works

>>> u.m.to(u.km)
0.001

But not for a structured unit.

>>> u.Unit((u.km, u.km/u.s)).to(u.Unit((u.m, u.m/u.s)))
TypeError: to() missing 1 required positional argument: 'value'

I guess I was expecting a structured array of the conversion factors, or maybe a tuple.

@nstarman
Copy link
Member

Should

def _values(self):
"""Turn the coordinates into a record array with the coordinate values.

have a structured unit?

return self._units.dtype.names

def items(self):
return zip(self._units.dtype.names, self._units.item())
Copy link
Member

@nstarman nstarman May 24, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this use ItemsView ?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And zip doesn't look good when printed. Maybe consume the iterator into a tuple?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since both the names and values are tuples and not very long, I agree one might as well make it a tuple. I guess the only downside is that it will slow down some of the code which uses .items() as an iterator, but the additional 100 ns is probably negligible to everything else...

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a frequently encountered problem Python really need more sophisticated tools for building views. ValuesView, KeysView, and ItemsView should make dict-like views.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would be nice for repr, though I am still quite enchanted with the idea that all you need to know about the output of .items() is that it is an iterable that provides key, value sequences. (If you hadn't noticed already, I love ducktyping!)

operator.methodcaller('_get_physical_type_id'), cls=Structure)

@property
def physical_type(self):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Q: should this perhaps be a dictionary the same keys as the unit?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is currently a void that can be indexed with the same keys, but, like for physical_type_id of the Structure subclass, which for comparisons ignores the field names. I think that's important. Of course, __repr__ could be overridden, but the the unit input is also as a tuple. Overall, I feel this is most similar to how an individual element of the actual array looks. But I'm not sure...

return self._units.dtype.names

def items(self):
return zip(self._units.dtype.names, self._units.item())
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And zip doesn't look good when printed. Maybe consume the iterator into a tuple?

astropy/units/structured.py Show resolved Hide resolved
astropy/units/structured.py Show resolved Hide resolved
astropy/units/structured.py Show resolved Hide resolved
@nstarman
Copy link
Member

Multiplication, division, and exponentiation of structured Quantities is broken

>>> pv_values = np.array([([1., 0., 0.], [0., 0.125, 0.]),
...                      ([0., 1., 0.], [-0.125, 0., 0.])],
...                     dtype=[('p', '(3,)f8'), ('v', '(3,)f8')])
>>> pv = u.Quantity(pv_values, u.StructuredUnit((u.km, u.km/u.s)))
>>> pv * 2
TypeError: invalid type promotion

Also (at least) the addition, multiplication, and division of two structured Quantities is broken

>>> pv + pv
UFuncTypeError: ufunc 'add' did not contain a loop...

@mhvk
Copy link
Contributor Author

mhvk commented May 25, 2021

Thanks for the further comments! On the ones not in-line:

  • I fixed _repr_latex_ - thanks!
  • I'll fix su.to(other_su) to have by default values of 1 in every item (and return a void). I did wonder a bit whether we wanted that? I never liked it much for regular units, since it is useful only for units conversions that are plain scalings (i.e., no equivalencies, no functional units), and I think it only encourages code that will get broken eventually. But since I did add it for function units, where it also makes little sense, I'll do it anyway...
  • I've indeed been wondering whether representations could be turned into structured quantities. But for later!
  • Operations do not work on structured quantities because they also do not work on structured arrays. This is a numpy design decision that would need serious thought before breaking. E.g., what should np.arctan(sq1, sq2) do? Note that I similarly do not allow su1 * su2.

@nstarman
Copy link
Member

Operations do not work on structured quantities because they also do not work on structured arrays. This is a numpy design decision that would need serious thought before breaking. E.g., what should np.arctan(sq1, sq2) do? Note that I similarly do not allow su1 * su2.

It makes sense that if numpy doesn't allow it, neither should we. But not being able to do 2 * pv is very unintuitive. How does one perform operations on a structured array?

@mhvk
Copy link
Contributor Author

mhvk commented May 25, 2021

It makes sense that if numpy doesn't allow it, neither should we. But not being able to do 2 * pv is very unintuitive. How does one perform operations on a structured array?

Essentially, one doesn't 😸 Of course, for things like pv, multiplication with 2 might make sense, but for ldbody or astrom it would not (similarly, for x,y,z vs lon,lat,distance). I think the general idea is that the structures are for heterogenous data. In this respect, perhaps there still is room for a deviation: if all structures have the same unit, then one could broadcast. But I think that may still be better done in a different class...

@mhvk mhvk force-pushed the quantity-structured-unit branch from b723597 to d0e3298 Compare May 29, 2021 00:07
@nstarman
Copy link
Member

nstarman commented Jun 1, 2021

Let me know when I should give this another look, and perhaps where I should look. The API surface is quite large.

@mhvk
Copy link
Contributor Author

mhvk commented Jun 1, 2021

@nstarman and @adrn - yes, further review would be very good! There are really four things to this rather large PR:

  • Is the concept OK? Mostly means looking at the new documentation, structured_units.rst
  • Do the changes to the existing stuff make sense (core.py, quantity.py, parser, etc.)
  • Is the implementation of StructuredUnit OK.
  • Is the use for erfa functions OK. This is mostly in erfa.py, but frankly it makes probably more sense to check that the tests are OK...

Copy link
Member

@nstarman nstarman left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@nstarman - yes, further review would be very good! There are really four things to this rather large PR:

  • Do the changes to the existing stuff make sense (core.py, quantity.py, parser, etc.)
  • core.py and quantity look good. I'm really not familiar with the parser functions, but what I can parse seems 👍.

Why is Quantity.to_list() not allowed? If it were, it would also make sense for structured Quantity.

  • Is the implementation of StructuredUnit OK.

I think I found a bug in view

import astropy.units as u, numpy as np
pv_values = np.array([([1., 0., 0.], [0., 0.125, 0.]),
                      ([0., 1., 0.], [-0.125, 0., 0.])],
                     dtype=[('p', '(3,)f8'), ('v', '(3,)f8')])
pv = u.Quantity(pv_values, u.StructuredUnit((u.km, u.km/u.s)))

x = pv_values.view(float); print(x[0])
x = pv.view(float)  # this shouldn't work
print(x[0])  # this doesn't work but should

namedtuples, should they be treated differently when creating a structured unit? Like a named tuple structured unit has fields names and can only match up with a structured array with the same names? This is almost certainly for a followup.

  • Is the use for erfa functions OK. This is mostly in erfa.py, but frankly it makes probably more sense to check that the tests are OK...

I like what I see. Looks useful for the XRepresentation PRs.

I checked out the PR and have been playing around with the examples, trying to get a feel for how it all works. I don't use structured arrays all too often, nor ERFA, so I'm certainly not seeing a lot of the edge cases. From what I can tell, structured Quantities seem quite consistent with numpy's structured arrays, and that's really the point. Hope this helps.

astropy/units/quantity.py Show resolved Hide resolved
astropy/units/quantity_helper/erfa.py Show resolved Hide resolved
astropy/units/quantity_helper/erfa.py Show resolved Hide resolved
astropy/units/quantity_helper/function_helpers.py Outdated Show resolved Hide resolved
astropy/units/structured.py Show resolved Hide resolved
try:
other = Unit(other, parse_strict='silent')
except Exception:
return NotImplemented
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When does this arise? When other is not a unit string?
And can the error be specified, like UnitConversionError or something?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, when other is something arbitrary. Here, the goal is simply to check whether the structured unit knows how to deal with other, and I think catching arbitrary exceptions is correct - we should definitely return NotImplemented on anything that goes wrong, because that means we do not know how to deal with it, and other should get a chance.

astropy/units/structured.py Outdated Show resolved Hide resolved
@mhvk
Copy link
Contributor Author

mhvk commented Jun 6, 2021 via email

@nstarman
Copy link
Member

nstarman commented Jun 6, 2021

Why is Quantity.tolist() not allowed? If it were, it would also make sense for structured Quantity.
Obviously, a bit separate from this PR, but what should the output be? For ndarray, it is just a
python representation of the array, but how can we make that for Quantity?

Looking at a shaped array, tolist returns a list of arrays. Quantity could do the same.

>>> import astropy.units as u, numpy as np
>>> pv_values = np.array([([1., 0., 0.], [0., 0.125, 0.]),
...                       ([0., 1., 0.], [-0.125, 0., 0.])],
...                      dtype=[('p', '(3,)f8'), ('v', '(3,)f8')])
>>> pv = u.Quantity(pv_values, u.StructuredUnit((u.km, u.km/u.s)))
>>> pv_values.tolist()
[(array([1., 0., 0.]), array([0.   , 0.125, 0.   ])),
 (array([0., 1., 0.]), array([-0.125,  0.   ,  0.   ]))]

i.e.

>>> (pv_values * u.m).tolist()
[([1., 0., 0.] m, [0.   , 0.125, 0.   ] m),
 ([0., 1., 0.] m, [-0.125,  0.   ,  0.   ] m)]

and for structured quantities

>>> pv.tolist()
[([1., 0., 0.], [0.   , 0.125, 0.   ]) (km, km / s),
 ([0., 1., 0.], [-0.125,  0.   ,  0.   ]) (km, km / s)]

@mhvk
Copy link
Contributor Author

mhvk commented Jun 6, 2021

I'm not sure that isn't a bug in np.ndarray.tolist()... E.g., if the structure does not have multiple values in a given element:

In [2]: a = np.array([(0., 1.), (2., 3.)], 'f8,f8')

In [3]: a.tolist()
Out[3]: [(0.0, 1.0), (2.0, 3.0)]

In [4]: type(a.tolist()[0])
Out[4]: tuple

It keeps using tuples for more complicated structures as well.

@adrn
Copy link
Member

adrn commented Jun 7, 2021

Quick note to say sorry for my lack of review -- am away right now, but would very much like to take a close look at this. I think realistically that won't happen until next week, unfortunately -- sorry!

@mhvk
Copy link
Contributor Author

mhvk commented Jun 7, 2021

@adrn - no worries, other than me wanting this off my plate, there is no real hurry!

mhvk added 18 commits July 18, 2021 19:26
Simple to do by storing units in void.

Also adjust the erfa quantity wrappers to become independent
of field name -- although pyerfa itself does not yet support that.
This since Quantity is no longer imported in structured.py
Plus tests and some cleanup.
Plus a few small changes to ensure those work.
This to avoid possible clash with UnitBase.names.
Also speed up case for dtype.  Plus cleanup and more documentation.
Turned out this needed a fix to how physical type IDs were returned,
as a plain numpy.void gave a FutureWarning on comparisons, and
is sensitive to names, which we do not want to be.
And fix .items() to directly return a tuple, for nicer interactive viewing.
@mhvk
Copy link
Contributor Author

mhvk commented Jul 19, 2021

@adrn - thanks for the review and the nice top-level thoughts! I really like the idea of ensuring QTable can be transformed to a structured quantity! Right now, one tricky aspect is that one cannot have true unit-less entries (i.e., a QTable that also has an index column would not work so nicely). This mostly since for erfa I didn't need it... (Though having None for the unit is an obvious extension - just not sure how to interface that with u.Unit(string)...) I also wonder a bit about the interface: one could imaging a as_array(units=True) (where the default might depend on Table or QTable), or a new as_quantity(), or even just ensuring Quantity(qtable) itself does "the right thing". Overall, perhaps best to leave this for follow-up?

I'd definitely like to make interactions between structured quantities and representations possible! Perhaps representations could even store their data as such (or at least keep it). I'm also considering using them in the transformations, so that we do not have to do explicit unit conversions any more. But, as you note, for follow up!

On the documentation, to me it is not so clear yet what structured quantities will be used for beyond making interaction with pyerfa possible, so my tendency is to keep the documentation as is until we have more to say. Essentially, it will be relatively obvious to people who use structured arrays, and for those who do not know about those, they probably should read the numpy documentation first (not that that is all that great...).

@mhvk mhvk force-pushed the quantity-structured-unit branch from 010725e to 5055b82 Compare July 19, 2021 00:00
@mhvk
Copy link
Contributor Author

mhvk commented Aug 1, 2021

@adrn - did you have a chance to think about the high-level questions & answers above? My sense is to get this in and leave it all for follow-up...

@adrn
Copy link
Member

adrn commented Aug 8, 2021

@mhvk - sorry for the delay! Some answers below:

Right now, one tricky aspect is that one cannot have true unit-less entries (i.e., a QTable that also has an index column would not work so nicely).

Ah, yes, that's a good point...

Overall, perhaps best to leave this for follow-up?

OK yea, you've convinced me that there are some important / subtle details to think over!

I'd definitely like to make interactions between structured quantities and representations possible! Perhaps representations could even store their data as such (or at least keep it). I'm also considering using them in the transformations, so that we do not have to do explicit unit conversions any more. But, as you note, for follow up!

Yes, great!

On the documentation, to me it is not so clear yet what structured quantities will be used for beyond making interaction with pyerfa possible, so my tendency is to keep the documentation as is until we have more to say. Essentially, it will be relatively obvious to people who use structured arrays, and for those who do not know about those, they probably should read the numpy documentation first (not that that is all that great...).

I think they are more general, but since I have a clearer idea of other use cases, perhaps I should submit a follow-up to make the documentation more general (i.e. it shouldn't hold up this PR!)

Copy link
Member

@adrn adrn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's get this in so we can start exploring where we might be able to use this internally within Astropy (e.g., in Representation or Differential) with enough time to test before v5.0! I'll hold off on merging in case there are any last things you'd like to update, but please ping me again if so for another quick look. If not, feel free to merge @mhvk

@mhvk mhvk merged commit 3e1bda7 into astropy:main Aug 9, 2021
@mhvk mhvk deleted the quantity-structured-unit branch August 9, 2021 13:54
@pllim
Copy link
Member

pllim commented Aug 9, 2021

Congratz on fixing an issue from 2015! 👏

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Create ndarray (recarray?) subclass that can hold multiple Quantities of mixed units
4 participants