prevent div by zero for mean of all-masks #14341
Thank you for your contribution to Astropy! 🌌 This checklist is meant to remind the package maintainers who will review this pull request of some common things to look for.
Thanks for separating things out!
I am a bit worried about slowing things down for the normal case that no slices are fully masked, but in the end the warning is not useful, so probably it is better to avoid it. Some comments inline.
astropy/utils/masked/core.py
    # catch the case when an axis is fully masked to prevent div by zero:
    fully_masked_axes = n == 0
    divisor = np.where(fully_masked_axes, 1, n)
From my tests, slightly faster is the following:

    neq0 = n == 0
    n += neq0
    result /= n
    result.unmasked[neq0] = np.nan

Note that the mask is already set correctly, so no need to do that again.

Actually, it may well be a lot faster to do the sum on the unmasked data and add the mask back. Something like:

    result = self.unmasked.sum(...)
    n = np.add.reduce(where, axis=axis, keepdims=keepdims)
    neq0 = n == 0
    n += neq0
    result /= n
    result.unmasked[neq0] = np.nan
    return self._masked_result(result, neq0)

But fine to do that separately.
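To make the suggestion above concrete, here is a minimal sketch of the same counting trick on plain NumPy arrays. It is not the astropy implementation: `masked_mean` is a hypothetical helper, the data and mask are passed as two separate arrays instead of a `Masked` instance, and `axis` is assumed to be a single integer.

```python
import numpy as np


def masked_mean(data, mask, axis=0):
    """Mean over ``axis``, skipping elements where ``mask`` is True.

    Hypothetical helper for illustration; returns ``(mean, result_mask)``.
    """
    where = ~mask
    # Sum only the unmasked elements along the axis.
    total = np.sum(data, axis=axis, dtype=float, where=where)
    # Count the unmasked elements that went into each sum.
    n = np.add.reduce(where, axis=axis)
    neq0 = n == 0
    # Bump zero counts to one so the division below cannot be 0/0.
    n += neq0
    total /= n
    # Restore the value that 0/0 would have produced for fully masked slices.
    total[neq0] = np.nan
    return total, neq0


data = np.arange(6.0).reshape(3, 2)
mask = np.array([[False, True], [False, True], [True, True]])
mean, result_mask = masked_mean(data, mask)
print(mean, result_mask)
```

The second column is masked everywhere, so its mean comes back as nan with the result mask set, and no division warning is emitted along the way.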
Speeds tested in #14341 (comment).
Collapsed below are the results of a speed test for the algorithm in the PR and the two alternatives from @mhvk.

Speed test

    from astropy.utils.masked import Masked
    import numpy as np


    def pr(arr):
        result = arr.sum(
            axis=axis, dtype=dtype, out=out, keepdims=keepdims, where=where
        )
        n = np.add.reduce(where, axis=axis, keepdims=keepdims)
        fully_masked_axes = n == 0
        divisor = np.where(fully_masked_axes, 1, n)
        result /= divisor
        # mask resulting values from fully-masked axes
        # which have been divided by 1
        result[fully_masked_axes] = np.ma.masked_array(np.nan, True)
        return result


    def v2(arr):
        result = arr.sum(
            axis=axis, dtype=dtype, out=out, keepdims=keepdims, where=where
        )
        n = np.add.reduce(where, axis=axis, keepdims=keepdims)
        neq0 = n == 0
        n += neq0
        result /= n
        result.unmasked[neq0] = np.nan
        return result


    def v3(arr):
        result = arr.sum(
            axis=axis, dtype=dtype, out=out, keepdims=keepdims, where=where
        )
        n = np.add.reduce(where, axis=axis, keepdims=keepdims)
        neq0 = n == 0
        n += neq0
        result /= n
        result.unmasked[neq0] = np.nan
        return arr._masked_result(result, neq0, None)


    def get_args():
        shape = (100, 10, 5)
        axis = 0
        where = True
        keepdims = False
        dtype = float
        out = None
        arr = Masked(
            np.arange(np.prod(shape), dtype=dtype).reshape(shape),
            np.random.randint(0, 2, size=shape).astype(bool)
        )
        where = ~arr.mask & where
        return arr, axis, dtype, out, keepdims, where


    # pr, v2 and v3 read these names as module-level globals.
    arr, axis, dtype, out, keepdims, where = get_args()

    %timeit pr(arr)
    %timeit v2(arr)
    %timeit v3(arr)

Results
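The `%timeit` lines above only work inside IPython. As a rough sketch of how a similar comparison could be run as a plain script, the example below uses the standard-library `timeit` module on ordinary NumPy arrays. The function names `mean_with_np_where` and `mean_with_inplace_count` are made up for this illustration, and because no `Masked` instance is involved, the absolute timings are not comparable with the ones measured in the PR.

```python
import timeit

import numpy as np

rng = np.random.default_rng(0)
shape = (100, 10, 5)
data = np.arange(np.prod(shape), dtype=float).reshape(shape)
mask = rng.integers(0, 2, size=shape).astype(bool)
mask[:, 0, 0] = True  # force one fully masked slice along axis 0
where = ~mask


def mean_with_np_where():
    # Build a separate divisor array, as in the PR's first version.
    total = np.sum(data, axis=0, where=where)
    n = np.add.reduce(where, axis=0)
    fully_masked = n == 0
    total /= np.where(fully_masked, 1, n)
    total[fully_masked] = np.nan
    return total


def mean_with_inplace_count():
    # Adjust the count in place, as in the review suggestion.
    total = np.sum(data, axis=0, where=where)
    n = np.add.reduce(where, axis=0)
    neq0 = n == 0
    n += neq0
    total /= n
    total[neq0] = np.nan
    return total


for func in (mean_with_np_where, mean_with_inplace_count):
    calls = 200
    seconds = timeit.timeit(func, number=calls) / calls
    print(f"{func.__name__}: {seconds * 1e6:.1f} us per call")
```

Both functions should return identical arrays, so any timing difference comes only from how the divisor is built.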
Nice to test thoroughly! Just one comment on a comment... And maybe also squash the commits to one.
astropy/utils/masked/core.py
    result /= n
    # mask resulting values from fully-masked axes
Maybe the comment should be changed to something like "# Correct fully-masked slice results to what is expected for 0/0 division." -- we're not touching the mask here!
Much clearer, thanks.
6982319 to c349f2e
Great! Let's get this in once the tests have passed (excluding coverage -- what the heck is up with that? your lines are most obviously covered!)
Coverage looks like a heisenbug reported in codecov/codecov-action#598
This needs a rebase to pick up the new mpl322 job.
revisions from Marten clearer comments
c349f2e to 069e3ca
@pllim Rebased. 🤞🏻
@mhvk, you sure this doesn't need a change log?
The upload fails again with a different error, maybe it is OpenAstronomy/github-actions-workflows#105
Huh, now the coverage is back to normal. 🤪
Thanks, all!
Description
In testing #14175, I was often doing operations on Masked arrays. Sometimes all entries along a given axis in the array are masked. You can run into trouble if you take the `mean` of an array that is masked everywhere, since the `mean` is `sum(axis)/n`, where the number of non-masked points `n` can be zero. There are a bunch of `filterwarnings` for this in the masked tests, like this one: astropy/astropy/utils/masked/tests/test_masked.py, line 884 in 93d9d9d.

I go back and forth on whether or not the current behavior (warning) above is helpful. On one hand, the warning is useful if you don't know how `masked` works and you need to trace nans. On the other, the module is emitting a warning when nothing is "actually going wrong," because the undefined result will be masked anyway.

This PR circumvents the warning from division by zero. The modified `mean` method will divide by `1` where `n=0` to dodge the warning, and return a masked nan value for that result.
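For readers who want to see the warning itself, the following is a small stand-alone sketch using plain NumPy arrays in place of `Masked` ones. The arrays `total` and `n` are made-up stand-ins for the per-slice sums and unmasked counts; the sketch records warnings for the naive division and for the divide-by-one approach described above.

```python
import warnings

import numpy as np

total = np.array([6.0, 0.0])  # sums of the unmasked values in each slice
n = np.array([3, 0])          # unmasked counts; the second slice is fully masked

# Naive division: 0.0 / 0 gives nan and emits a RuntimeWarning.
with warnings.catch_warnings(record=True) as caught_naive:
    warnings.simplefilter("always")
    with np.errstate(divide="warn", invalid="warn"):
        naive = total / n

# Divide by 1 where n == 0, then put nan back for those slices.
with warnings.catch_warnings(record=True) as caught_fixed:
    warnings.simplefilter("always")
    with np.errstate(divide="warn", invalid="warn"):
        fully_masked = n == 0
        fixed = total / np.where(fully_masked, 1, n)
        fixed[fully_masked] = np.nan

print([str(w.message) for w in caught_naive])
print(naive, fixed)
```

Both approaches end up with the same values, a finite mean for the first slice and nan for the fully masked one; only the naive division triggers a warning.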