New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

NDArithmeticMixin and NDUncertainty Refactor #4272

Merged
merged 12 commits into from Jan 30, 2016

Conversation

Projects
None yet
9 participants
@MSeifert04
Contributor

MSeifert04 commented Oct 25, 2015

Initial try of splitting the large #4234. More information will follow when the splitting is done.

edit 12/22/2015: added checklist

Next steps:

  • Rework metadata arithmetic so that it does not enforce any specific methods of combining the metadata -- that works by calling methods on the metadata but that returns None with a warning if the methods don't exist. _arithmetic_meta
  • Make thin wrapper class around a meta if it doesn't have any arithmetic operations.
  • Then, if that looks promising:*
  • Add wrapper classes for other properties.
@MSeifert04

This comment has been minimized.

Contributor

MSeifert04 commented Oct 25, 2015

Short Summary

  • UnknownUncertainty is now the default uncertainty for NDData-like classes.
  • NDUncertainty has now adopted the internal mechanics of StdDevUncertainty.
  • StdDevUncertainty now propagates uncertainties always correct and can use a given correlation
  • NDArithmeticMixin has classmethods that allow arithmetic operations where the first element is not a NDData-like class but the result should be.
  • NDArithmeticMixin has now different methods that operate on the attributes. Also different ways of handling the meta in arithmetic operations are now possible.

I think I forgot to mention some further changes and I hope this is not to much change for you to give me some comments.

@MSeifert04

This comment has been minimized.

Contributor

MSeifert04 commented Oct 25, 2015

One issue I found about the Uncertainty refactor was how to organize the classes. Since NDUncertainty is a MetaClass but it does everything except the actual propagation. As an alternative I could make NDUncertainty like NDDataBase to a class that only implements an interface (UnknownUncertainty would then be like a Minimal working implementation of it) and make the mechanics (like __init__ and propagation abstractmethods) a Mixin with StdDevUncertainty being a working implementation of it. I don't know what would be better.

I hope you understand what I mean, if not I could wrap this idea in code (just superficially). But maybe I'm thinking too much about it and it is fine just like it is.

function that is used by the name of the operation better than by
the function itself. Suppose we have ``data`` that is a
`~numpy.matrix` or ``Pandas.DataFrame`` we need to use different
UFUNCS (maybe) and that explodes the number of if/elif checks in

This comment has been minimized.

@mwcraig

mwcraig Oct 31, 2015

Contributor

I think this (need to use different UFUNCS) isn't actually an issue per http://pandas.pydata.org/pandas-docs/stable/dsintro.html#dataframe-interoperability-with-numpy-functions

Even if it were an issue I think it would be better to make operation a function that defaults to the appropriate numpy ufunc because then the user can substitute a different, more appropriate, ufunc without needing to subclass or add a special case via a PR.

This comment has been minimized.

@MSeifert04

MSeifert04 Oct 31, 2015

Contributor

Just to get this straight before I change it:
add should provide the numpy ufunc (np.add) and pass it to _arithmetic as operation? So that another ufunc could be provided by calling _arithmetic directly?
Another thing: How should I evaluate the function in _arithmetics_meta: compare the function name or simply apply the operation-parameter?
For uncertainty_propagation I'll need to compare the operation in a way, should I compare it like operation is np.add or by operation.__name__ == 'add'?

I'm not sure about what would be best, that was one another reason why I thought string comparisons were the best way to go.

This comment has been minimized.

@mwcraig

mwcraig Nov 2, 2015

Contributor

add should provide the numpy ufunc (np.add) and pass it to _arithmetic as operation? So that another ufunc could be provided by calling _arithmetic directly?

Mostly, yes. I think what I said before about substituting a different function without subclassing doesn't actually work, but a subclass that wanted to use a different addition could do so by just overriding add without needing to change anything in _arithmetic.

The other advantage of making this change is that you eliminate the need for the duplicate logic in _arithmetic_data and _arithmetic_meta

This comment has been minimized.

@mwcraig

mwcraig Nov 2, 2015

Contributor

Another thing: How should I evaluate the function in _arithmetics_meta: compare the function name or simply apply the operation-parameter?

I would just apply the parameter.

This comment has been minimized.

@mwcraig

mwcraig Nov 2, 2015

Contributor

For uncertainty_propagation I'll need to compare the operation in a way, should I compare it like operation is np.add or by operation.__name__ == 'add'?

Good question...looking over the code, I'd lean towards using operation.__name__. What about an approach like that in the current implementation [1]? That adds an argument to _arithmetic that just gets passed through to the uncertainty call but that maybe isn't so bad.

if isinstance(self.wcs, WCS) and isinstance(operand.wcs, WCS):
log.info("Astropy.WCS does not allow equality checks.")
else:
log.info("WCS is not equal.")

This comment has been minimized.

@mwcraig

mwcraig Oct 31, 2015

Contributor

Given that the user can choose to compare WCS or not, I think an exception be raised here instead of warning if the two don't match.

This comment has been minimized.

@MSeifert04

MSeifert04 Oct 31, 2015

Contributor

in both cases or just when it's not an astropy.wcs.WCS object?

return self._arithmetic(
operand, propagate_uncertainties, "subtraction", np.subtract)
# Both have WCS information. If requested compare the WCS and
# issue a warning if they are not equal

This comment has been minimized.

@mwcraig

mwcraig Oct 31, 2015

Contributor

In the current version of NDArithmeticMixin there is a check self.wcs != operand.wcs. DO you know if this fails if the two WCS are the same?

This comment has been minimized.

@wkerzendorf

wkerzendorf Dec 4, 2015

Member

here similar to my comments above: overwrite __ne__.

operand, propagate_uncertainties, "subtraction", np.subtract)
# Both have WCS information. If requested compare the WCS and
# issue a warning if they are not equal
# TODO: Currently (astropy 1.04) this is a bit tricky because no

This comment has been minimized.

@mwcraig

mwcraig Oct 31, 2015

Contributor

Would #4085 address this when fixed? Not suggesting that you try to fix that one...but if comparison does not work properly, and the issue is distinct from #4085, could you please open a new issue with a short code snippet that will reproduce the issue? I gather the problem is that two identical WCS objects (identical in the sense that one is a deepcopy of the other) do not compare as the same?

This comment has been minimized.

@MSeifert04

MSeifert04 Oct 31, 2015

Contributor

Yes, astropy.wcs.WCS does not implement a __eq__ method so python defaults to check if the wcs1 is wcs2 is True. I'll supply an example later or tomorrow.

This comment has been minimized.

@wkerzendorf

wkerzendorf Dec 4, 2015

Member

yep - that's what I mean. It should for our case. Maybe make an super thin subclass can accomplish that (and later gwcs can implement that natively`)

# that is a scalar... (ok, at least if we don't want to allow
# the meta to contain numpy arrays :-) )
if operand.data.size != 1:
log.info("The second operand is not a scalar. Cannot "

This comment has been minimized.

@mwcraig

mwcraig Oct 31, 2015

Contributor

I think this should raise an exception instead of logging an information message.

# Same thing but reversed
if self.data.size != 1:
log.info("The first operand is not a scalar. Cannot"
"operate meta keywords with arrays.")

This comment has been minimized.

@mwcraig

mwcraig Oct 31, 2015

Contributor

Same comment as above.

(np.array(5) * u.Mpc, np.array(10)),
(np.array(5), np.array(10) * u.s),
])
def test_arithmetics_data_unit_NOT_identical(data1, data2):

This comment has been minimized.

@mwcraig

mwcraig Oct 31, 2015

Contributor

nitpick: could you please lower-case the function name?

This comment has been minimized.

@wkerzendorf

wkerzendorf Dec 4, 2015

Member

PEP8 syntax requirement not nitpick 😉

else:
# Make sure these masks are boolean or ndarrays of bools so this
# kind of mask operation is possible.
if self.mask is not None:

This comment has been minimized.

@mwcraig

mwcraig Oct 31, 2015

Contributor

I think the checks below are too restrictive and are unnecessary...why not simply do the mask operation self.mask | operand.mask? If it fails an exception will presumably be raised...if the desire is to catch the exception that is raised if the operation fails then it could be put inside a try/except.

This comment has been minimized.

@MSeifert04

MSeifert04 Oct 31, 2015

Contributor

Unfortunatly the | operator works on most (string arrays raise an exception but it works on every numerical array) kinds of numpy arrays but the result is sometimes quite strange.

import numpy as np
a = np.array([1,0,1])
b = np.array([1,0,0])
a | b
# array([1, 0, 1], dtype=int32) this is what is happening to booleans too
b = np.array([1,0,2])
a | b
# array([1, 0, 3], dtype=int32) not what is expected...

This comment has been minimized.

@mwcraig

mwcraig Nov 2, 2015

Contributor

what about doing this: np.array(a | b, dtype='bool')?

Then I get a bool even with integer arrays:

import numpy as np
a = np.array([1,0,1])
b = np.array([1,0,2])
np.array(a | b, dtype='bool')
# array([ True, False,  True], dtype=bool)

This comment has been minimized.

@wkerzendorf

wkerzendorf Dec 4, 2015

Member

This again shows that I believe the individual components of NDData need to decide if they can be added/subtracted/multiplied/divided. This would only mean that one moves a lot of the logic into light classes or the existing classes. another thing shouldn't it be and instead of or.

@mwcraig

This comment has been minimized.

Contributor

mwcraig commented Oct 31, 2015

@MSeifert04 -- lots of inline comments, I still want to try running the tests locally and see if it breaks anything in ccdproc :). Hoping to do that today.

@MSeifert04

This comment has been minimized.

Contributor

MSeifert04 commented Oct 31, 2015

@mwcraig Thanks for the comments, I've adressed every obvious comment but left replies where I wasn't sure how to proceed.

If you need any help with ccdproc I can help if you want.

@mwcraig

This comment has been minimized.

Contributor

mwcraig commented Nov 2, 2015

@jehturner -- would you mind looking over this? It would impact NDDataArithmetic. While comments on the code itself are certainly welcome, it would be useful if you could try this out with what you are developing with NDDataArithmetic and report back whether it breaks anything.

@jehturner

This comment has been minimized.

Member

jehturner commented Nov 3, 2015

OK. Any sane changes to the arithmetic are unlikely to break things for me, as I'm not systematically using that functionality yet (though obviously that's the intention). Most of the calculations are done by the external processes I'm initially wrapping and then gradually replacing, something I'm just completing the outline of. I can certainly do a quick check but it might take a few days as I'm juggling 2 deadlines. Anyway, thanks for asking and thanks to Michael for improving the code.

@jehturner

This comment has been minimized.

Member

jehturner commented Nov 25, 2015

I confirm that this does not break my code (in a couple of dozen quick tests). Sorry it has taken so long to comment (has it really been 3 weeks?). I haven't yet looked closely at what these changes actually do but I'm not noticing any unexpected problems.

@embray

This comment has been minimized.

Member

embray commented Nov 25, 2015

Thanks for checking it out @jehturner --the fact that it works for you is encouraging.

I hope to review these changes soon myself.

@MSeifert04

This comment has been minimized.

Contributor

MSeifert04 commented Dec 4, 2015

@mwcraig @embray Should I incorporate @mwcraig suggestions or still wait a bit longer?

@jehturner Thanks for the tests.

@mwcraig

This comment has been minimized.

Contributor

mwcraig commented Dec 4, 2015

@MSeifert04 -- hold off for a couple more days, hoping to get comments form @wkerzendorf (he was traveling for most of November).

@@ -23,212 +168,638 @@ class NDArithmeticMixin(object):
When subclassing, be sure to list the superclasses in the correct order

This comment has been minimized.

@wkerzendorf

wkerzendorf Dec 4, 2015

Member

Is there a way to automate or force the correct order? @embray you know a lot of python magic.

This comment has been minimized.

@mhvk

mhvk Dec 4, 2015

Contributor

It is probably worth checking what makes it required to have a certain order. Often, one can write things such that order does not matter (but certainly not always).

This comment has been minimized.

@mwcraig

mwcraig Dec 5, 2015

Contributor

The MRO dictates the order - a helpful reference is http://nedbatchelder.com/blog/201210/multiple_inheritance_is_hard.html and the comments underneath.

If the mixin overrides a method in NDDataBase you want the mixin listed before the base class.

It might be better to simply point out that order matters and provide a reference....

This class only aims at covering the most common cases so there are certain
restrictions on the saved attributes::
- ``uncertainty`` : has to be something that has a `NDUncertainty`-like

This comment has been minimized.

@wkerzendorf

wkerzendorf Dec 4, 2015

Member

Correct me if I'm wrong - I believe that the idea was that it is an arbitrary object that has an attribute .uncertainty_type

This comment has been minimized.

@MSeifert04

MSeifert04 Dec 17, 2015

Contributor

That was actually shortsighted of me, with the current implementation one only needed to pass the right propagate method for the uncertainty object. Probably I should switch back to the propagate-method parameter. But since there is no actual API what parameters should be passed to the propagate method makes it really hard to cover all cases.

Notes
-----
1. It is not tried to decompose the units, mainly due to the internal

This comment has been minimized.

@wkerzendorf

wkerzendorf Dec 4, 2015

Member

I wonder if we need to note this as this is the standard (and sensible behaviour) - see km/s / Mpc should not be Hz 😉

This comment has been minimized.

@MSeifert04

MSeifert04 Dec 17, 2015

Contributor

Well sometimes it makes sense, sometimes one would like to decompose the units. So I thought I should mention it somewhere.

>>> from astropy.nddata import *
>>> class NDDataWithMath(NDArithmeticMixin, NDData): pass
>>> nd = NDDataWithMath([1,2,3], unit='meter')
>>> nd_inv = nd.ic_division(1, nd)

This comment has been minimized.

@wkerzendorf

wkerzendorf Dec 4, 2015

Member

Why can't we just do nd_inv = 1 / nd?

This comment has been minimized.

@MSeifert04

MSeifert04 Dec 17, 2015

Contributor

Because someone in APE said, that for NDData the magic mathematical methods should not be used. And without them it's getting really hard (I can't think of a way) to simulate __radd__. If __add__ for the first element isn't possible.

This comment has been minimized.

@mwcraig

mwcraig Jan 21, 2016

Contributor

IIRC an issue that comes up is if you try to add an NDData to a Quantity. Both override __add__ and __radd_ and the result ends up being of the type of whatever the first operand is. I think I was the one who raised this in the APE and it could be that I'm wrong...but overriding the arithmetic operators should be a separate PR (if it is ever done).

# operations with numbers, lists, numpy arrays, numpy masked arrays
# astropy quantities, masked quantities and of other subclasses of
# NDData
operand = self.__class__(operand)

This comment has been minimized.

@wkerzendorf

wkerzendorf Dec 4, 2015

Member

this is nice.

# enforces that the dtype is NOT objects)?
kwargs = {}
# First check that the WCS allows the arithmetic operation

This comment has been minimized.

@wkerzendorf

wkerzendorf Dec 4, 2015

Member

I would attach this to the wcs object. In the end, NDData just makes the operation wcs1 + wcs2 and the either wcs object will complain if that doesn't work. you only need to check if both of them are None which is a special case.

# quantity)
result = self._arithmetic_data(operation, operand, **kwds)
# Determine the other properties
if propagate_uncertainties is None:

This comment has been minimized.

@wkerzendorf

wkerzendorf Dec 4, 2015

Member

There is an NDUncertainty class that in the end was thought to be a mixin. @mhvk how far along is that?

This comment has been minimized.

@mhvk

mhvk Dec 4, 2015

Contributor

@wkerzendorf - My Variable class (#3715) is quite complete and does track uncertainties properly for all numpy ufunc. I've not worked on it further since I posted it, however, since there really should be a bit of general thought in when to stop tracking correlations.

else:
kwargs['uncertainty'] = self._arithmetic_uncertainty(
operation, operand, result, uncertainty_correlation, **kwds)
if handle_mask is None:

This comment has been minimized.

@wkerzendorf

wkerzendorf Dec 4, 2015

Member

Maybe this should be part of the mask as well.

if self.unit is None and operand.unit is None:
if result_unit is dimensionless_unscaled:
result_unit = None
if handle_meta is None:

This comment has been minimized.

@wkerzendorf

wkerzendorf Dec 4, 2015

Member

I think similar to wcs this maybe should be part of the metaclass - you just do the arithmetic on the meta class and it will know what to do. This would then also be the same for tables, which should show the same behaviour.

This comment has been minimized.

@MSeifert04

MSeifert04 Dec 17, 2015

Contributor

I never worked with metaclasses yet and what I've read you need to define the metaclass as superclass (at least I heard so for python 3.+). Also we allow everything that is a subclass from dict, how would we get the metaclass in there without breaking something. Or do you mean a wrapper like NDUncertainty?

Also I think that the container should define the way the properties are handled/propagated since the context might be different? Adding two dicts normally means something completly different from my implementation here.

@MSeifert04 MSeifert04 force-pushed the MSeifert04:ndarithmeticmixin_nduncertainty branch from 988f796 to 7096103 Jan 27, 2016

@MSeifert04

This comment has been minimized.

Contributor

MSeifert04 commented Jan 27, 2016

@jehturner - Sorry, I haven't seen your comments. I just used the uncertainty_correlation as given in the error propagation formulas (I don't recall which book I had them from but they are also given here: https://en.wikipedia.org/wiki/Propagation_of_uncertainty#Example_formulas). This is just a shortcut if one really tries to use some known correlation. Thank you for pointing it out!
Implementing some covariance handling by itself is a bit beyond my capabilities. But in some cases I know the uncertainty (because I divide an image A by the pixel-wise mean of Image A, B, C) and want to have the right final uncertainty (including the correlation effect). Not sure if anyone else wants to use it though. :)

@mwcraig - I rebased and deleted the provisional code parts. I hope I have time the next days to make an accurate Changelog entry.

@MSeifert04

This comment has been minimized.

Contributor

MSeifert04 commented Jan 27, 2016

I've added a small changelog entry. I think it covers everything related to public methods.

@eteq

This comment has been minimized.

Member

eteq commented Jan 28, 2016

@MSeifert04 @mwcraig and all the reviewers - Lots of great work here, +💯 !

I have one concern, though, and that's the lack of (narrative) documentation and examples, particularly with the correlation stuff (which is notoriously tricky). I see that #4538 was created as a reminder to do that. But, in my experience, if something gets merged with a separate "do the documentation" issue, often the documentation doesn't get written until it's far too late (i.e., someone mis-used it or gave up on it because they didn't understand it).

That said, I understand you might want to get it in soon for other changes. So @mwcraig, I'm fine with you making a command decision about whether my concern here is warranted or if you're confident you and/or @MSeifert04 will get the docs in for this in a reasonably timely fashion before the key points have escaped your heads.

@eteq

This comment has been minimized.

Member

eteq commented Jan 28, 2016

Oh one other thing that's more informational than anything else: I think we'll also want to try to at least get a start on something with Quantity and coorelated uncertainties for v1.2 - maybe @mhvk's #3715, or maybe something simpler that then #3715 ends up building on. We will certainly want to try to figure out how to harmonize that with nddata's uncertainties... But I think it's probably better to let them operate seperately at least initially so that we have APIs that make sense for those objects, and then later on try to hook them together better.

@mhvk

This comment has been minimized.

Contributor

mhvk commented Jan 28, 2016

Agreed to let Quantity and NDData proceed separately; my notes here were really about making people aware of #3715, since it does correlations perfectly (though the resulting derivative matrices can get rather out of hand -- which is what needs thought).

@mwcraig

This comment has been minimized.

Contributor

mwcraig commented Jan 30, 2016

@eteq -- I'm going to go ahead and merge this; @MSeifert04 has an outline in an ipython notebook in #4538.

Agreed it would have been better to flesh out the docs at the same time, but this has lingered long enough I'd like to get it closed.

I'll work with @MSeifert04 to divide up the work on the docs with a target date of a week or two.

mwcraig added a commit that referenced this pull request Jan 30, 2016

Merge pull request #4272 from MSeifert04/ndarithmeticmixin_nduncertainty
NDArithmeticMixin and NDUncertainty Refactor

@mwcraig mwcraig merged commit b49d1ad into astropy:master Jan 30, 2016

@MSeifert04 MSeifert04 deleted the MSeifert04:ndarithmeticmixin_nduncertainty branch Feb 28, 2016

@MSeifert04 MSeifert04 referenced this pull request Mar 5, 2016

Closed

Added an unknown uncertainty #3714

1 of 3 tasks complete
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment