Skip to content
Thomas Robitaille edited this page Jul 3, 2013 · 11 revisions

About

This page summarizes different approaches that have been attempted in order to have proper support for arrays in the Quantity class. A set of 'ideal' requirements (which may not all be met) is:

  • The quantity class with arrays should behave as for scalars when combining with scalars or other arrays.
  • Numpy ufuncs should return the correct units
  • Warnings should be raised if the units are dropped

What is wrong with the current implementation?

At this time, there is an issue with the current implementation in master (#899):

In [2]: (3. * u.m) * np.array([1,2,3])
Out[2]: <Quantity [ 3.  6.  9.] m>

In [3]: np.array([1,2,3]) * (3. * u.m)
Out[3]: array([ 3.,  6.,  9.])

This would normally be fixed by adding the __array_priority__ attribute for Quantity, but this does not work due to the fact that __array_priority__ gets ignored if __array__ is defined (numpy/numpy#3164) - this is how things are defined in Numpy, and I don't think we can rely on this being fixed (even if it was fixed in 1.8, things wouldn't work well for most users). This is the only real bug with the current implementation.

The second main issue is that at the moment we can't use e.g. Numpy ufuncs and get Quantities with the right units out. For example, we might want to compute np.sqrt(np.array([1,2,3]) * u.m**2) but at the moment the units will simply be dropped.

What solutions exist?

Remove __array__

Simply removing the __array__ method is one solution (#1139). If we do this, then we can still get ufuncs to work because it turns out that e.g. np.sin will look for a Quantity.sin method:

In [6]: np.sin(q)
ERROR: AttributeError: sin [IPython.core.interactiveshell]
---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<ipython-input-6-784b22e8171a> in <module>()
----> 1 np.sin(q)

AttributeError: sin

Implementing it:

def sin(self):
    return Quantity(np.sin(self.value), unit=Unit(1))

gives:

In [5]: np.sin(np.array([1,2,3]) * u.degree)
Out[5]: 
array([<Quantity 0.841470984808 >, <Quantity 0.909297426826 >,
       <Quantity 0.14112000806 >], dtype=object)

which is not quite what we want, because we want an array of Quantity object. Why is it returning an array of Quantity objects when we asked for a Quantity with an array inside? Because it turns out that if the Quantity class defines both __len__ and __getitem__, then Numpy will first extract each element of the array (as a scalar quantity) using __getitem__, then call Quantity.sin for each Quantity, and will store the output in an object array. If we remove either __len__ or __getitem__, we get what we expect:

In [3]: np.sin(np.array([1,2,3]) * u.degree)
Out[3]: <Quantity [ 0.84147098  0.90929743  0.14112001] >

Of course, we probably don't want to get rid of __len__ or __getitem__, so we can instead define __array_wrap__. It's going to be a mess and inefficient to try and convert an array of Quantity objects back to a Quantity containing an array, so it's easier to just re-compute it:

def __array_wrap__(self, out, context=None):
    if context:
        from . import dimensionless_unscaled
        if context[0] in [np.sin]:
            return Quantity(context[0](self.value), dimensionless_unscaled)
    return self.__class__(out, unit=self._unit)

which works as expected:

In [3]: np.sin(np.array([1,2,3]) * u.degree)
Out[3]: <Quantity [ 0.84147098  0.90929743  0.14112001] >

However, this is not a good solution either because internally, it's still creating an array of Quantity objects, which will be very expensive once we are talking about e.g. millions of values.

Another caveat of this approach is that it is not possible to pass quantity arrays to any function that would normally take Numpy arrays for calculations, so it would then make sense to also disable implicit float conversion for scalars.

Pros: never silently drop units

Cons: bad performance for large arrays due to conversion to array of quantities

The following example demonstrates the issue with the automated conversion to an array of Quantity objects:

import numpy as np
from astropy import units as u

class Quantity(object):

    __array_priority__ = 1000.

    def __init__(self, value, unit):
        self.unit = unit
        self.value = value

    def __getitem__(self, key):
        return Quantity(self.value[key], unit=self.unit)

    def __len__(self):
        return len(self.value)

    def sin(self):
        return Quantity(np.sin(self.value), unit=u.dimensionless_unscaled)

q = Quantity(np.array([1,2,3]), u.degree)
print(np.sin(q))

For which the output is:

[<__main__.Quantity object at 0x102e86a10>
 <__main__.Quantity object at 0x102e86a50>
 <__main__.Quantity object at 0x102e86a90>]

Note that before deciding to split up the array into multiple Quantity objects, Numpy checks whether the following attributes exist:

__array_interface__
__array_struct__
__array__

Is there a way to use __array_interface__ or __array_struct__ to get what we want?

Even simpler example/question on StackOverflow here - maybe someone will have an idea on there?

If we want to have optimal performance, we can also consider simply removing __len__ or __getitem__ (if any, __len__ would be a better one to remove).

Sub-classing ndarray

Another solution is to have Quantity be a sub-class of ndarray (#929) - this should allow us to define the correct behavior for ufuncs without the performance hit described above, but the downside is that it isn't possible to customize the behavior for any non-ufunc function that accepts arrays - the units will be silently dropped:

In [9]: q.dot(q)
Out[9]: <Quantity 14 m2>

In [10]: np.dot(q, q)
Out[10]: 14

Another issue is that if a ufunc returns a scalar (mean, median, etc.), then __array_wrap__ does not get called, so it's impossible to add the units back to the scalar output. One solution is to separate the Quantity class into Quantity and QuantityArray:

import numpy as np


class Quantity(object):
    def __init__(self, value, unit):
        self.value = value
        self.unit = unit
    def __div__(self, other):
        s = Quantity(self.value / other, self.unit)
        return s
    def __str__(self):
        return str(self.value) + ' ' + self.unit


class QuantityArray(np.ndarray):

    def __new__(cls, input_array, unit=None):
        obj = np.asarray(input_array).view(cls)
        obj.unit = unit
        return obj

    def __array_finalize__(self, obj):
        if obj is None:
            return
        self.unit = getattr(obj, 'unit', None)

    def __array_wrap__(self, out_arr, context=None):
        wrapped_arr = np.ndarray.__array_wrap__(self, out_arr, context)
        if wrapped_arr.ndim == 0:
            wrapped_arr = Quantity(float(wrapped_arr), wrapped_arr.unit)
        return wrapped_arr

    def __str__(self):
        return np.ndarray.__str__(self) + ' ' + self.unit


a = QuantityArray(np.array([1,2,3]), unit='m')
print '-' * 50
print a.cumsum(), type(a.cumsum())
print '-' * 50
print a.mean(), type(a.mean())
print '-' * 50

which gives:

--------------------------------------------------
[1 3 6] m <class '__main__.QuantityArray'>
--------------------------------------------------
2.0 m <class '__main__.Quantity'>
--------------------------------------------------

Another solution is to explicitly define the ufuncs as methods of the Quantity class, e.g.:

def mean(self, *args, **kwargs):
    value = np.ndarray.mean(self, *args, **kwargs)
    if isinstance(value, Quantity):
        return value
    else:
        return Quantity(value, self.unit)

Pros: high performance since Quantity is an array

Cons: units might be silently dropped when passed to non-ufunc functions that support Numpy arrays.

Clone this wiki locally