authors: Thomas Robitaille, Perry Greenfield, Matt Craig
date-created: 2014 October 2
date-last-revised: 2014 December 17
date-accepted: 2014 December 17
type: Standard Track
This APE is intended to provide a long-term plan for the astropy.nddata
sub-package. The package has been the subject of continuous debate since the
start of the astropy project, and has changed in scope several times, so this
APE is aimed at agreeing on the scope and future of the sub-package and
providing a well-structured foundation on which future development of
astropy.nddata can occur.
At the first Astropy coordination meeting in 2011, it was decided that as well
as having a generic table container, it would be useful to have a generic
container for gridded data. The
astropy.table package has since then seen a
large amount of development, and the API has now stabilized. The added value of
the table package compared to using simple Numpy structured arrays is clear -
the Table class makes it very easy to do common operations on tables such
as adding or removing columns or rows, and reading/writing tables to common
file formats.
On the other hand,
NDData development has stagnated and we have not been
able to converge on a stable API. Part of this is because there is
a huge variety of 'n-dimensional datasets' and very
little in common, for example, between a spectrum and an image in terms of
what can be done with them. This has prevented the
NDData class from including
much functionality; indeed, we have been adding functionality and then
removing it after realizing that it is not general enough.
An example to illustrate this issue is that of arithmetic - what happens when we add two n-dimensional datasets? Of course, the data values can be added, but what happens to the mask, to the flags, or to the uncertainties? The truth is that there is no general recipe for dealing with this, and that it may be a mistake to try and define it in such a general way.
This APE takes the approach of re-thinking the purpose of NDData and trying to define a scope for the future.
The 'why' of NDData
First, why do we actually need a generic data container that can be sub-classed? What is the benefit of this versus simply defining separate base classes for spectra, images, and other types of data? There are several possible answers:
- We want to provide users with a consistent experience across data objects -
  that is, the user should know that meta-data is always consistently named
  meta, that a mask can always be accessed with mask, and that the data can
  always be accessed with data.
- By providing a common base class which can define a unified I/O interface
  that taps into the astropy I/O registry, we can ensure that all data
  objects have read and write methods that behave consistently.
- We want functions and methods in Astropy to know if an object passed to them
  is an n-dimensional data object that can be expected to have specific
  attributes (such as data, mask, and so on).
In principle, none of these require a base class. We could simply agree on a
standard for data objects that defines what certain attributes should be
called. This is a valid solution, but at the same time, having a base class can
enforce this standard and factor out some boilerplate code, and, as described
in Alternatives, the questions raised here would equally apply to a base
class that was sub-classed as e.g. CCDImage, and so on.
Proposal for an NDDataBase abstract base class
The first proposal in this APE is to separate a definition of the
interface from concrete realizations of NDData-like objects by creating an
abstract base class called NDDataBase. NDDataBase simplifies the current
NDData class to the extreme, such that it only defines which properties an
NDData-like object is expected to have. The NDDataBase class should not
define any arithmetic operations, which are impossible to generalize.
The following properties should be included in the base class:
- data - the data itself. No restrictions are placed on the type of this data
  in the ABC. Subclasses could, for example, make it a plain Numpy array, a
  masked Numpy array, an Astropy Quantity, or an h5py data buffer. Subclasses
  could also require, for example, that the data have a shape attribute
  and/or be sliceable in order to 'prove' that it is an n-dimensional data
  object.
- mask - the mask of the data, following the Numpy convention of True meaning
  masked, and False meaning unmasked. Sub-classes could choose to connect
  this to data.mask. Masks do not need to be Numpy arrays; they could be, for
  example, 'lazy' masks based on functions that are evaluated on-the-fly.
- unit - the unit of the data values, which will be internally represented as
  an Astropy Unit. If present, subclasses should try to ensure numerical
  operations properly take into account and propagate units. Sub-classes
  could choose to connect this to data.unit, in which case data should be a
  Quantity or behave like one.
- wcs - an object that can be used to describe the relationship between input
  and world coordinates. This can (but does not have to) be an Astropy WCS
  object. Once the generalized WCS system is in place in Astropy, we will
  probably require this to be such an object. Subclasses are free to be more
  restrictive in what they permit for the wcs attribute.
- meta - a dict-like object that can be used to contain arbitrary metadata.
  This could be a plain Python dict, an ordered dict, a FITS Header object,
  and so on, provided that it offers dict-like item access and iteration.
- uncertainty - an object that represents the uncertainty in the data for
  each element of the array. This APE places no restriction on what type of
  uncertainty this is (e.g. variance, standard deviation, or Poisson count
  rate), nor does it require the attribute to be set at all (other than
  defaulting to None). It places only one restriction on uncertainty: it must
  have an attribute uncertainty_type, which should be a human-readable
  string.
While not a requirement, the following uncertainty_type strings are strongly
recommended for common ways of specifying normal distributions:
- 'std': uncertainty stores the standard deviation/sigma (either a single
  value or on a per-pixel basis).
- 'var': uncertainty stores the variance (either a single value or on a
  per-pixel basis).
- 'ivar': uncertainty stores the inverse variance (either a single value or
  on a per-pixel basis).
The only required attribute is data; all others default to None if
not initialized or overridden in a subclass.
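As an illustration, a minimal sketch of such an abstract base class might look
as follows. This is not the eventual astropy implementation; the class names
MinimalNDData and StdDevUncertainty are hypothetical and used only for
demonstration:

```python
from abc import ABC, abstractmethod


class NDDataBase(ABC):
    """Sketch of the proposed ABC: one required property, the rest None."""

    @property
    @abstractmethod
    def data(self):
        """The stored dataset (the only required attribute)."""

    # All remaining properties simply default to None.
    mask = property(lambda self: None)
    unit = property(lambda self: None)
    wcs = property(lambda self: None)
    meta = property(lambda self: None)
    uncertainty = property(lambda self: None)


class StdDevUncertainty:
    """Hypothetical uncertainty class satisfying the one APE requirement."""

    uncertainty_type = 'std'  # human-readable string, as recommended above

    def __init__(self, array):
        self.array = array


class MinimalNDData(NDDataBase):
    """Hypothetical concrete subclass providing only data."""

    def __init__(self, data):
        self._data = data

    @property
    def data(self):
        return self._data
```

A subclass only has to provide data; everything else falls back to the
defaults defined by the base class.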
Proposal for the NDData class

The NDData class would be a concrete class that is far less restrictive
than the current
NDData while still providing some basic functionality as
a container. It will place loose restrictions on what
data can be set to
and is described further in Implementation.
The NDData class would not include methods such as __array__,
__array_prepare__, and so on, which allow a class to be treated as a Numpy
array. This behavior has been identified as potentially ambiguous in
the general case, because it will depend on the details of e.g. how masks are
handled, and it does not make explicit what units the resulting data are in.
Proposal for mixin classes to provide additional functionality
Specific functionality such as uncertainty handling and arithmetic can be
developed as mix-in classes that can be used by NDData sub-classes.
Generic slicing capabilities, further described in Implementation, will be
provided as a mixin class called NDSlicing.
NDData in Astropy and affiliated packages
If a user has a data object such as an image, it would be nice if they can use
functions directly on this image and have them return an image object. At the
same time, we do not want to force people to use special data containers if
they have for example a Numpy array and a WCS object. This raises the question
of whether we should duplicate the API for all functions, providing one for
NDData subclasses and one for separate attributes. The
proposal in this APE is that functions should only define a single API that
takes separate keyword arguments for e.g. data, mask, and so on, but
that we then provide a way for users to be able to call these functions with
NDData sub-classes (see Implementation).
Implementation

NDDataBase will be implemented as an abstract base class. The only input
validation it will provide is enforcing the existence of an
uncertainty_type attribute if
uncertainty is not
None, as described above.
We will not include setters for properties other than data,
because it is ambiguous what setting e.g. the unit or WCS after
initialization would mean: it could either mean changing the unit or WCS, or
it could mean that the user wants to convert the data to the new unit or WCS.
Given this ambiguity, it is safer not to have setters for the core
attributes; this is also consistent with e.g. the Quantity class.
NDData will be a concrete subclass of NDDataBase that provides some logic
for handling the setting of data:
- If the object passed in as data has a shape attribute, is sliceable, and
  has an __array__ method, so that it can easily be used as a Numpy array,
  then NDData.data will be set to that object.
- Otherwise, NDData will attempt to create a numpy.ndarray from the input
  data and use that as the internal representation of the data.
Unified read and write methods will be developed via a mixin class.
This APE suggests adding a mixin class,
NDSlicing, to handle basic
slicing. This could be done by simply having code similar to the following:
```python
def __getitem__(self, slice):
    new = self.__class__()
    if self.data is not None:
        new._data = self.data[slice]
    if self.mask is not None:
        new._mask = self.mask[slice]
    if self.wcs is not None:
        new._wcs = self.wcs[slice]
    ...
```
Note that this is only meant as an illustration of the idea suggested here,
and the final implementation will likely differ from this - but the basic
idea is that the slicing would be delegated to the member attributes. For
example, the WCS class would need to define itself how it should be sliced.
Some attributes (such as
meta) would not necessarily need to be sliceable.
Note that slicing an attribute presumably returns a similar object; e.g., for
wcs, it returns a new WCS appropriate to the sliced data attribute.
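To make the delegation idea concrete, here is a runnable toy version; the
SlicedData class is hypothetical and exists only for this demonstration, and
the real mixin would be more careful about which attributes exist:

```python
import numpy as np


class NDSlicing:
    """Toy version of the proposed slicing mixin."""

    def __getitem__(self, item):
        new = self.__class__.__new__(self.__class__)
        # Delegate slicing to each attribute that supports it ...
        new._data = self._data[item]
        new._mask = None if self._mask is None else self._mask[item]
        # ... while attributes such as meta are carried over unsliced.
        new._meta = self._meta
        return new


class SlicedData(NDSlicing):
    """Hypothetical concrete class used only for this demonstration."""

    def __init__(self, data, mask=None, meta=None):
        self._data = np.asarray(data)
        self._mask = None if mask is None else np.asarray(mask)
        self._meta = {} if meta is None else meta

    @property
    def data(self):
        return self._data

    @property
    def mask(self):
        return self._mask

    @property
    def meta(self):
        return self._meta
```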
The arithmetic methods currently in
NDData will be implemented in a mixin class.

Facilitating the use of NDData in functions
One question that has come up as part of several affiliated packages is how
to deal with
NDData objects in functions. For example, if we consider a
downsample function that can downsample an image, should the function
accept only NDData (or sub-class) objects? Should it also
accept plain Numpy arrays? If so, how do we pass any additional meta-data
such as WCS? Should we return a downsampled Numpy array and downsampled WCS,
or a single downsampled
NDData (or sub-class) instance? In this example, one option would
be to provide two APIs, one for
NDData and/or sub-classes and one for separate Numpy arrays
and attributes, but maintaining two parallel APIs is not an ideal solution.
An alternative is for each function to encode the logic of checking the input
type and deciding on the output type based on the input type. However, this
means repeating a lot of similar code such as:
```python
def downsample(data, wcs=None):
    if isinstance(data, NDData):
        if wcs is not None:
            raise ValueError("wcs cannot be specified if an NDData "
                             "instance was passed")
        wcs = data.wcs
        data = data.data
```
and this will become a lot more complex once more attributes are needed by the function.
In order to make it easier for functions to accept
NDData sub-classes and
return these, we can implement a decorator that will automatically split up an
NDData object as needed. Let us consider the following function:
```python
def test(data, wcs=None, unit=None, n_iterations=3):
    ...
```
We can provide a decorator called e.g. support_nddata:
```python
@support_nddata
def test(data, wcs=None, unit=None, n_iterations=3):
    ...
```
which makes it so that if the user passes an
NDData sub-class called e.g.
nd, the function would automatically be called with:
```python
test(nd.data, wcs=nd.wcs, unit=nd.unit)
```
That is, the decorator looks at the signature of the function and checks if any
of the arguments are also properties of the
NDData object, and passes them
as individual arguments.
An error could be raised if an
NDData property is set but the function does
not accept it - for example, if
wcs is set, but the function cannot support
WCS objects, an error would be raised. On the other hand, if an argument in the
function does not exist in the
NDData object or is not set, it is simply
left to its default value. This behavior could be customizable but the
details are beyond the scope of this APE document.
If the function call succeeds, then the decorator will make a new NDData
object (with the correct class) and will populate the properties as needed. In
order to figure out what is returned by the function, the decorator will need
to accept a list which gives the name of the output values:
```python
@support_nddata(returns=['data', 'wcs'])
def test(data, wcs=None, unit=None, n_iterations=3):
    ...
```
Finally, the decorator could be made to restrict input to specific
sub-classes (and sub-classes of those):
```python
@support_nddata(accepts=CCDImage, returns=['data', 'wcs'])
def test(data, wcs=None, unit=None, n_iterations=3):
    ...
```
With this decorator, the functions could be seamlessly used either with
separate arguments (e.g. Numpy array and WCS) or with subclasses of
NDData such as CCDImage.
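To illustrate how such a decorator could work, here is a simplified sketch of
the unpacking half only (the repacking of return values via returns is
omitted for brevity). The SimpleND class is a hypothetical stand-in for an
NDData sub-class, and the eventual astropy implementation may differ:

```python
import inspect
from functools import wraps


def support_nddata(accepts=object):
    """Simplified sketch of the proposed decorator (unpacking only)."""

    def decorator(func):
        params = set(inspect.signature(func).parameters)

        @wraps(func)
        def wrapper(data, *args, **kwargs):
            if hasattr(data, 'data') and isinstance(data, accepts):
                # Pass each set NDData property as a keyword argument,
                # erroring out if the function cannot accept it.
                for name in ('mask', 'wcs', 'unit', 'uncertainty', 'meta'):
                    value = getattr(data, name, None)
                    if value is None:
                        continue
                    if name not in params:
                        raise TypeError(
                            "%s cannot handle the %r attribute of the input"
                            % (func.__name__, name))
                    kwargs.setdefault(name, value)
                data = data.data
            return func(data, *args, **kwargs)

        return wrapper

    return decorator


class SimpleND:
    """Minimal stand-in for an NDData sub-class."""

    def __init__(self, data, wcs=None):
        self.data, self.wcs = data, wcs


@support_nddata()
def downsample(data, wcs=None, factor=2):
    return data[::factor], wcs
```

With this in place, downsample can be called either with a plain array-like
object and an explicit wcs keyword, or with a single SimpleND instance.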
Example of subclassing from both NDData and Quantity
The Quantity class would benefit from the ability to share the same
interface that NDData provides and to tap into NDData's metadata and WCS
support.
Because subclassing from numpy.ndarray involves subtleties that differ
from typical subclassing in Python, an example subclass may be implemented
as part of astropy.nddata. Should it turn out to be
unreasonably difficult to do, an attempt may be made to implement a class
that uses a Quantity as the data store, with the ability to link
unit to the underlying properties of the data attribute. Should
that also prove to be unworkable, an explanation of the issues that prevented
implementation may be provided in the astropy.nddata documentation.
Branches and pull requests
Initial decorator implementation: https://github.com/astropy/astropy/pull/2855
Initial refactoring of NDData: https://github.com/astropy/astropy/pull/2905
This APE will require some affiliated packages to completely refactor how
they use the NDData class. This will also break compatibility for users
currently using NDData directly, but this is assumed to be a very small
fraction (if any) of users.
Alternatives

One alternative is to remove the NDData class altogether and to start
the base classes at the level of e.g. Image. In this case many
of the ideas of this APE (including the attribute names, decorators, etc.)
would still apply to these base classes. The benefits of having a base
NDData class instead of starting at that level are:
- The NDData class enforces the naming of the base properties to ensure
  consistency across all sub-classes.
- It allows slicing to be implemented at the core level as a mixin, whereas
  this would need to be repeated in each base class if we had e.g. Image and
  SpectralCube as the base classes.
- It allows the connection to the unified I/O framework to be defined once,
  whereas this would also need to be repeated in each base class otherwise.
On the other hand, the downside of having a core NDData class is that it
reduces the flexibility of the sub-classes - for instance, Spectrum has to be
implemented taking into consideration the restrictions on e.g. attribute
names defined by the base class. In the spectral-cube package, at the moment
we do not have a data attribute, because we have a custom masking
framework and define attributes like unmasked_data. Of course, we should
aim to make this more compliant with what is decided here, but this is just
to demonstrate that this type of flexibility may be lost. However, this may
be a good thing, as it enforces consistency for users.
Subclass NDData from ndarray

The original implementation of the NDData class behaved like a numpy
ndarray; an alternative to making NDData a more generic container is
to make it a full-fledged subclass of ndarray or of Quantity. The
advantage of this approach is that it potentially reduces duplication of code
by using the infrastructure of numpy.
It has the disadvantage of reducing the flexibility of NDData and presents
the challenge of handling the attributes (especially meta and
wcs) in a sensible way for arbitrary operations on an NDData. Even in
one of the most straightforward cases, the addition of two NDData objects
with metadata, it is unclear what the meta of the result should be.
There is a need for a more generic container with metadata than would be
possible if subclassing from ndarray. In addition, it would be
straightforward to implement a subclass of the NDData proposed in this APE
that ties the unit and, when available, the
uncertainty to those properties of the data attribute. In other
words, a subclass which is essentially a Quantity with meta wrapped in
the NDData interface is straightforward.
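A toy sketch of this idea follows, using a minimal stand-in instead of the
real astropy.units.Quantity; both class names here are hypothetical:

```python
class FakeQuantity:
    """Stand-in for astropy.units.Quantity: a value plus a unit attribute."""

    def __init__(self, value, unit):
        self.value = value
        self.unit = unit


class QuantityNDData:
    """Hypothetical NDData-style wrapper that delegates unit to its data."""

    def __init__(self, data, meta=None):
        self._data = data
        self._meta = {} if meta is None else meta

    @property
    def data(self):
        return self._data

    @property
    def unit(self):
        # The unit lives on the wrapped Quantity-like data object.
        return getattr(self._data, 'unit', None)

    @property
    def meta(self):
        return self._meta
```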
If NDData subclasses from ndarray then it will be difficult or
impossible to subclass a more generic container from it, which is likely to
lead, down the road, to the need for the type of generic container proposed
in this APE.
Decision rationale

This APE led to lengthy discussion both on the mailing list and in the
astropy-APEs pull request. This produced some major changes to the APE,
leading to the current form. A final vote was held, with unanimous support.
Hence this APE was accepted on 2014 December 17.