Moment: handle all elements being masked #5339
|
@jcrist do you have a moment to look through this? I've never really touched masked arrays before. |
             denominator = np.nan
-    else:
+    elif denominator is not np.ma.masked:
         denominator[denominator < 0] = np.nan
I wonder if we should set only the unmasked values to np.nan? (just guessing here)
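To illustrate why the elif guard in the diff matters, here is a minimal NumPy-only sketch. The helper name clamp_denominator is hypothetical and only mirrors the branch structure of the diff; it is not dask's actual function. It relies on the fact that reducing a fully masked array yields the np.ma.masked singleton, which an identity check can detect.

```python
from numbers import Number

import numpy as np


def clamp_denominator(denominator):
    """Hypothetical helper mirroring the branch structure in the diff."""
    if isinstance(denominator, Number):
        # Plain scalar: a negative count is meaningless, so use NaN.
        return np.nan if denominator < 0 else denominator
    elif denominator is not np.ma.masked:
        # Array case: replace negative entries in place.
        denominator[denominator < 0] = np.nan
    # np.ma.masked falls through untouched.
    return denominator


# Reducing a fully masked array collapses to the np.ma.masked singleton.
all_masked_sum = np.ma.array([1.0, 2.0], mask=True).sum()
print(all_masked_sum is np.ma.masked)
print(clamp_denominator(all_masked_sum) is np.ma.masked)
print(clamp_denominator(np.array([-1.0, 2.0])))
```

Whether only the unmasked entries of a partially masked array should be set to NaN, as the comment asks, is not addressed by this sketch.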
dfunc = getattr(da, reduction)
func = getattr(np, reduction)

assert_eq_ma(dfunc(dx), func(x))
dask/array/ufunc.py
Outdated
 log1p = ufunc(np.log1p)
 expm1 = ufunc(np.expm1)
-sqrt = ufunc(np.sqrt)
+sqrt = ufunc(np.ma.sqrt)
Yeah, my guess is that we probably don't want to elevate np.ma like this. It's such an uncommon case and I would expect that this would cause problems in some other workload (but again, I'm just guessing here)
Agreed, I also think this is a bad idea, but don't know what else to do.
We sort of want a ufunc like:
    def sqrt(x):
        return np.ma.sqrt(x) if isinstance(x, np.ma.masked_array) else np.sqrt(x)

Or perhaps in std, we could switch which sqrt is used based on the type of _meta? (That seems to sometimes, but not always, reflect whether the array is going to be masked or not.)
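The type-dispatching sqrt sketched in this comment can be tried on plain NumPy inputs. This is only an illustration at the NumPy level, not a dask ufunc, and it spells the class as np.ma.MaskedArray (np.ma.masked_array is an alias for it).

```python
import numpy as np


def sqrt(x):
    # Use the masked-aware sqrt only when the input is already a masked array.
    # np.ma.masked itself is an instance of a MaskedArray subclass.
    return np.ma.sqrt(x) if isinstance(x, np.ma.MaskedArray) else np.sqrt(x)


plain = sqrt(np.array([np.nan, 4.0]))  # plain ndarray in, plain ndarray out
constant = sqrt(np.ma.masked)          # masked constant in

print(type(plain).__name__, plain)
print(constant is np.ma.masked)
```

Masked inputs still go through np.ma.sqrt here, so an unmasked NaN inside a masked array would come back masked. The dispatch also needs the concrete chunk type, which is exactly what a dask array may not know before computing.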
|
Also, I wonder if @bjlittle could recommend someone to help review this |
|
cc'ing also @DPeterK in case he knows someone comfortable with reviewing masked array code. |
|
How's this going @gjoseph92? Is there anything in particular that others like @bjlittle or @DPeterK can help out with?
|
@TomAugspurger, this is currently blocked on finding people able to review masked array code. It seems like we no longer have anyone qualified who is able to review these things. I wonder if we should stop supporting masked arrays. @jacobtomlinson do you know if this is still used within UK Met? If so, do you know anyone there who we could lean on to help?
|
AFAIK it is and I think @DPeterK is the right person to ping, he's just on vacation right now. |
|
@TomAugspurger I've been out on vacation myself, so just catching up on this. The core issue is that dask's data model for masked arrays doesn't quite line up with NumPy's (and frankly, kind of abuses dask's): NumPy uses a separate subclass, while dask holds them in a plain dask array, and just knows that the functions in the graph will produce masked arrays. I don't believe there's a definitive way to tell if a dask array is masked or not without actually computing it.

I think Besides making a separate dask MaskedArray subclass (lot of effort), we could at least expose dask versions of all the

We could also make the dask ufuncs automatically switch between masked and non-masked versions on the type of the input array (like I mentioned here). But that seems bad, because users can't control it, and it changes current behavior out from under them.

Or, to get this merged, I can just remove the test for
|
This appears to be stalled. @jcrist you're probably the best equipped person to handle this, but I can understand that it might not be high on your priority list. I thought I'd make you aware of it in case you have some extra time. @gjoseph92 I apologize that no one has been found who is able to review this work so far.
45cef9a to
94b9e76
|
Thanks @gjoseph92. I've updated this PR to make the required fixes a bit more localized. Thanks for being patient here, this one slipped through the cracks (apologies). |
|
Hmmm, tests are failing for numpy 1.15. Debugging. |
|
Was a bug in the test, fixed now. |
|
Wow thanks @jcrist! Wasn't expecting this to get solved, but we've still been running into it, so I really appreciate the help. |
|
Failure is unrelated. cc @jrbourbeau or @TomAugspurger for a quick review, but I think this is good-to-go. |
jrbourbeau
left a comment
Thanks for the PR @gjoseph92, and thanks @jcrist and @mrocklin for reviewing!
This almost works, but still fails for std, which returns a 1-element array where the element is masked, instead of the MaskedConstant it should.

The issue is that np.sqrt doesn't return a masked constant, whereas np.ma.sqrt does:

da.std calls sqrt, which is a dask ufunc wrapping np.sqrt.

Replacing the sqrt ufunc with the masked version seems like a potentially significant change around the edge cases. For example, they handle NaN differently:

I tried this in 45cef9a, but it doesn't seem like a good idea. (And breaks other tests due to the above discrepancy).

So the question is: how to write std in such a way that if the variance is a MaskedConstant, we just return the MaskedConstant without passing it to sqrt? (Or use a different version of sqrt that handles the MaskedConstant correctly, but doesn't interpret NaN as masked?)

black dask / flake8 dask
Closes #5338.
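The two discrepancies named in the description (the masked constant and NaN handling) can be checked with a short NumPy-only sketch. It is an illustration written for this discussion, not the snippet from the original description, and the variable names are made up.

```python
import numpy as np

# Masked constant: np.ma.sqrt returns the np.ma.masked singleton.
print(np.ma.sqrt(np.ma.masked) is np.ma.masked)
# np.sqrt goes through the generic ufunc machinery instead; per the
# description above, it does not hand back the singleton.
print(np.sqrt(np.ma.masked) is np.ma.masked)

# NaN handling: np.sqrt propagates NaN, np.ma.sqrt masks it.
values = np.array([np.nan, 4.0])
plain_result = np.sqrt(values)
masked_result = np.ma.sqrt(values)
print(plain_result)
print(masked_result)
print(np.ma.getmaskarray(masked_result))
```

The NaN case is why swapping the ufunc wholesale changes results for arrays that were never masked to begin with.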