-
-
Notifications
You must be signed in to change notification settings - Fork 1.7k
Quantity Arrays
This page summarizes different approaches that have been attempted in order to have proper support for arrays in the Quantity class. A set of 'ideal' requirements (which may not all be met) is:
- The quantity class with arrays should behave as for scalars when combining with scalars or other arrays.
- Numpy ufuncs should return the correct units
- Warnings should be raised if the units are dropped
At this time, there is an issue with the current implementation in master (#899):
In [2]: (3. * u.m) * np.array([1,2,3])
Out[2]: <Quantity [ 3. 6. 9.] m>
In [3]: np.array([1,2,3]) * (3. * u.m)
Out[3]: array([ 3., 6., 9.])
This would normally be fixed by adding the __array_priority__
attribute for Quantity
, but this does not work due to the fact that __array_priority__
gets ignored if __array__
is defined (numpy/numpy#3164) - this is how things are defined in Numpy, and I don't think we can rely on this being fixed (even if it was fixed in 1.8, things wouldn't work well for most users). This is the only real bug with the current implementation.
The second main issue is that at the moment we can't use e.g. Numpy ufuncs and get Quantities with the right units out. For example, we might want to compute np.sqrt(np.array([1,2,3]) * u.m**2)
but at the moment the units will simply be dropped.
What solutions exist?
Simply removing the __array__
method is one solution (#1139). If we do this, then we can still get ufuncs to work because it turns out that e.g. np.sin
will look for a Quantity.sin
method:
In [6]: np.sin(q)
ERROR: AttributeError: sin [IPython.core.interactiveshell]
---------------------------------------------------------------------------
AttributeError Traceback (most recent call last)
<ipython-input-6-784b22e8171a> in <module>()
----> 1 np.sin(q)
AttributeError: sin
Implementing it:
def sin(self):
return Quantity(np.sin(self.value), unit=Unit(1))
gives:
In [5]: np.sin(np.array([1,2,3]) * u.degree)
Out[5]:
array([<Quantity 0.841470984808 >, <Quantity 0.909297426826 >,
<Quantity 0.14112000806 >], dtype=object)
which is not quite what we want, because we want an array of Quantity
object. Why is it returning an array of Quantity
objects when we asked for a Quantity
with an array inside? Because it turns out that if the Quantity
class defines both __len__
and __getitem__
, then Numpy will first extract each element of the array (as a scalar quantity) using __getitem__
, then call Quantity.sin
for each Quantity
, and will store the output in an object array. If we remove either __len__
or __getitem__
, we get what we expect:
In [3]: np.sin(np.array([1,2,3]) * u.degree)
Out[3]: <Quantity [ 0.84147098 0.90929743 0.14112001] >
Of course, we probably don't want to get rid of __len__
or __getitem__
, so we can instead define __array_wrap__
. It's going to be a mess and inefficient to try and convert an array of Quantity
objects back to a Quantity
containing an array, so it's easier to just re-compute it:
def __array_wrap__(self, out, context=None):
if context:
from . import dimensionless_unscaled
if context[0] in [np.sin]:
return Quantity(context[0](self.value), dimensionless_unscaled)
return self.__class__(out, unit=self._unit)
which works as expected:
In [3]: np.sin(np.array([1,2,3]) * u.degree)
Out[3]: <Quantity [ 0.84147098 0.90929743 0.14112001] >
However, this is not a good solution either because internally, it's still creating an array of Quantity
objects, which will be very expensive once we are talking about e.g. millions of values.
Another caveat of this approach is that it is not possible to pass quantity arrays to any function that would normally take Numpy arrays for calculations, so it would then make sense to also disable implicit float conversion for scalars.
Pros: never silently drop units
Cons: bad performance for large arrays due to conversion to array of quantities
The following example demonstrates the issue with the automated conversion to an array of Quantity
objects:
import numpy as np
from astropy import units as u
class Quantity(object):
__array_priority__ = 1000.
def __init__(self, value, unit):
self.unit = unit
self.value = value
def __getitem__(self, key):
return Quantity(self.value[key], unit=self.unit)
def __len__(self):
return len(self.value)
def sin(self):
return Quantity(np.sin(self.value), unit=u.dimensionless_unscaled)
q = Quantity(np.array([1,2,3]), u.degree)
print(np.sin(q))
For which the output is:
[<__main__.Quantity object at 0x102e86a10>
<__main__.Quantity object at 0x102e86a50>
<__main__.Quantity object at 0x102e86a90>]
Note that before deciding to split up the array into multiple Quantity
objects, Numpy checks whether the following attributes exist:
__array_interface__
__array_struct__
__array__
Is there a way to use __array_interface__
or __array_struct__
to get what we want?
Even simpler example/question on StackOverflow here - maybe someone will have an idea on there?
If we want to have optimal performance, we can also consider simply removing __len__
or __getitem__
(if any, __len__
would be a better one to remove).
Another solution is to have Quantity be a sub-class of ndarray (#929) - this should allow us to define the correct behavior for ufuncs without the performance hit described above, but the downside is that it isn't possible to customize the behavior for any non-ufunc function that accepts arrays - the units will be silently dropped:
In [9]: q.dot(q)
Out[9]: <Quantity 14 m2>
In [10]: np.dot(q, q)
Out[10]: 14
Another issue is that if a ufunc returns a scalar (mean, median, etc.), then __array_wrap__
does not get called, so it's impossible to add the units back to the scalar output. One solution is to separate the Quantity
class into Quantity
and QuantityArray
:
import numpy as np
class Quantity(object):
def __init__(self, value, unit):
self.value = value
self.unit = unit
def __div__(self, other):
s = Quantity(self.value / other, self.unit)
return s
def __str__(self):
return str(self.value) + ' ' + self.unit
class QuantityArray(np.ndarray):
def __new__(cls, input_array, unit=None):
obj = np.asarray(input_array).view(cls)
obj.unit = unit
return obj
def __array_finalize__(self, obj):
if obj is None:
return
self.unit = getattr(obj, 'unit', None)
def __array_wrap__(self, out_arr, context=None):
wrapped_arr = np.ndarray.__array_wrap__(self, out_arr, context)
if wrapped_arr.ndim == 0:
wrapped_arr = Quantity(float(wrapped_arr), wrapped_arr.unit)
return wrapped_arr
def __str__(self):
return np.ndarray.__str__(self) + ' ' + self.unit
a = QuantityArray(np.array([1,2,3]), unit='m')
print '-' * 50
print a.cumsum(), type(a.cumsum())
print '-' * 50
print a.mean(), type(a.mean())
print '-' * 50
which gives:
--------------------------------------------------
[1 3 6] m <class '__main__.QuantityArray'>
--------------------------------------------------
2.0 m <class '__main__.Quantity'>
--------------------------------------------------
Another solution is to explicitly define the ufuncs as methods of the Quantity
class, e.g.:
def mean(self, *args, **kwargs):
value = np.ndarray.mean(self, *args, **kwargs)
if isinstance(value, Quantity):
return value
else:
return Quantity(value, self.unit)
Pros: high performance since Quantity is an array
Cons: units might be silently dropped when passed to non-ufunc functions that support Numpy arrays.