prevent div by zero for mean of all-masks #14341

bmorris3 · 2023-01-30T19:42:52Z

Description

In testing #14175, I was often doing operations on Masked arrays. Sometimes all entries along a given axis in the array are masked. You can run into trouble if you take the mean of an array that is masked everywhere, since the mean is sum(axis)/n where the number of non-masked points n can be zero. There are a bunch of filterwarnings for this in the masked tests like this:

astropy/astropy/utils/masked/tests/test_masked.py

Line 884 in 93d9d9d

@pytest.mark.filterwarnings("ignore:.*encountered in.*divide")
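For illustration, the warning can be reproduced with plain NumPy, without `Masked`. This is only a minimal sketch: the array, the mask and the variable names are made up, and the `where=` sum merely mimics the description above (sum over unmasked entries divided by their count `n`).

```python
import warnings

import numpy as np

data = np.arange(6, dtype=float).reshape(3, 2)
# Hypothetical mask: True means "masked"; column 1 is masked everywhere.
mask = np.array([[False, True], [True, True], [False, True]])

where = ~mask
total = data.sum(axis=0, where=where)  # masked entries contribute nothing
n = np.add.reduce(where, axis=0)       # number of unmasked entries per column

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    mean = total / n                   # 0 / 0 in the fully masked column

print(n)                               # counts per column, second one is 0
print(mean)                            # second entry is nan
print([str(w.message) for w in caught])
```

The exact wording of the message depends on the NumPy version, which is presumably why the filter in the test uses a loose pattern.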

I go back and forth on whether or not the current behavior (the warning above) is helpful. On one hand, the warning is useful if you don't know how Masked works and you need to trace NaNs. On the other, the module is emitting a warning when nothing is "actually going wrong," because the undefined result will be masked anyway.

This PR circumvents the warning from division by zero. The modified mean method will divide by 1 where n=0 to dodge the warning, and return a masked nan value for that result.
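The idea can be sketched with plain NumPy arrays. The helper below is hypothetical (its name, signature and return value are not part of astropy) and assumes the reduction leaves at least one axis, so that the result is an array rather than a scalar.

```python
import numpy as np


def mean_ignoring_masked(data, mask, axis=0):
    """Hypothetical helper: mean over the unmasked entries along ``axis``.

    Returns ``(values, result_mask)``; fully masked slices get NaN in
    ``values`` and True in ``result_mask``.
    """
    where = ~mask
    total = data.sum(axis=axis, where=where, dtype=float)
    n = np.add.reduce(where, axis=axis)
    fully_masked = n == 0
    # Divide by 1 instead of 0 so that no warning is emitted.
    divisor = np.where(fully_masked, 1, n)
    values = total / divisor
    # Put back the NaN that 0/0 would have produced.
    values[fully_masked] = np.nan
    return values, fully_masked


data = np.arange(6, dtype=float).reshape(3, 2)
mask = np.array([[False, True], [True, True], [False, True]])

# errstate(all="raise") would turn any floating-point warning into an error.
with np.errstate(all="raise"):
    values, result_mask = mean_ignoring_masked(data, mask)

print(values)
print(result_mask)
```

Here the first column averages its two unmasked entries, while the fully masked second column comes back as NaN with its result mask set.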

github-actions · 2023-01-30T19:43:32Z

mhvk

Thanks for separating things out!

I am a bit worried about slowing things down for the normal case that no slices are fully masked, but in the end the warning is not useful, so probably it is better to avoid it. Some comments inline.

mhvk · 2023-01-31T01:57:30Z

astropy/utils/masked/core.py

+
+        # catch the case when an axis is fully masked to prevent div by zero:
+        fully_masked_axes = n == 0
+        divisor = np.where(fully_masked_axes, 1, n)


From my tests, slightly faster is the following

  neq0 = n == 0
  n += neq0
  result /= n
  result.unmasked[neq0] = np.nan

Note that the mask is already set correctly, so no need to do that again.
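A small check, using made-up counts, that adding the boolean array in place yields the same divisor as the `np.where` call in the diff. This is only a sketch of the arithmetic, not of the `Masked` code path.

```python
import numpy as np

n = np.array([3, 0, 2, 0])
expected = np.where(n == 0, 1, n)  # divisor as computed in the diff above

neq0 = n == 0
n += neq0                          # True counts as 1, so zeros become 1 in place

print(n)                           # [3 1 2 1]
print(np.array_equal(n, expected)) # True
```

The in-place form avoids allocating a second array for the divisor, which may account for part of the speed difference.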

Actually, it may well be a lot faster to do the sum on the unmasked data and add the mask back. Something like

  result = self.unmasked.sum(...)
  n = np.add.reduce(where, axis=axis, keepdims=keepdims)
  neq0 = n == 0
  n += neq0
  result /= n
  result.unmasked[neq0] = np.nan
  return self._masked_result(result, neq0)

But fine to do that separately.

Speeds tested in #14341 (comment).

astropy/utils/masked/tests/test_masked.py

bmorris3 · 2023-01-31T14:59:32Z

Collapsed below are the results of a speed test for the algorithm in the PR and the two alternatives from @mhvk in #14341 (comment). Both suggestions are faster, but I think the first is clearer, so I will push it in a moment.

Speed test

  from astropy.utils.masked import Masked
  import numpy as np
  
  def pr(arr):
      result = arr.sum(
          axis=axis, dtype=dtype, out=out, keepdims=keepdims, where=where
      )
      n = np.add.reduce(where, axis=axis, keepdims=keepdims)
      fully_masked_axes = n == 0
      divisor = np.where(fully_masked_axes, 1, n)
      result /= divisor
      # mask resulting values from fully-masked axes
      # which have been divided by 1
      result[fully_masked_axes] = np.ma.masked_array(np.nan, True)
      return result
  
  def v2(arr):
      result = arr.sum(
          axis=axis, dtype=dtype, out=out, keepdims=keepdims, where=where
      )
      n = np.add.reduce(where, axis=axis, keepdims=keepdims)
      neq0 = n == 0
      n += neq0
      result /= n
      result.unmasked[neq0] = np.nan
      return result
  
  def v3(arr):
      result = arr.sum(
          axis=axis, dtype=dtype, out=out, keepdims=keepdims, where=where
      )
      n = np.add.reduce(where, axis=axis, keepdims=keepdims)
      neq0 = n == 0
      n += neq0
      result /= n
      result.unmasked[neq0] = np.nan
      return arr._masked_result(result, neq0, None)
  
  # Benchmark inputs, defined at module level so that pr, v2 and v3 can see them
  shape = (100, 10, 5)
  axis = 0
  keepdims = False
  dtype = float
  out = None

  arr = Masked(
      np.arange(np.prod(shape), dtype=dtype).reshape(shape),
      np.random.randint(0, 2, size=shape).astype(bool)
  )
  # only unmasked elements contribute to the sum and to the count
  where = ~arr.mask

  %timeit pr(arr)
  %timeit v2(arr)
  %timeit v3(arr)

Results

  90.5 µs ± 1.45 µs per loop (mean ± std. dev. of 7 runs, 10,000 loops each)
  84.5 µs ± 2.95 µs per loop (mean ± std. dev. of 7 runs, 10,000 loops each)
  84.4 µs ± 2.24 µs per loop (mean ± std. dev. of 7 runs, 10,000 loops each)

mhvk

Nice to test thoroughly! Just one comment on a comment... And maybe also squash the commits to one.

mhvk · 2023-01-31T15:15:55Z

astropy/utils/masked/core.py

        result /= n
+        # mask resulting values from fully-masked axes


Maybe the comment should be changed to something like "# Correct fully-masked slice results to what is expected for 0/0 division." -- we're not touching the mask here!

Much clearer, thanks.

mhvk

Great! Let's get this in once the tests have passed (excluding coverage -- what the heck is up with that? your lines are most obviously covered!)

pllim · 2023-01-31T17:13:58Z

Coverage looks like a heisenbug reported in codecov/codecov-action#598

pllim · 2023-01-31T17:14:18Z

This needs a rebase to pick up the new mpl322 job.

revisions from Marten clearer comments

bmorris3 · 2023-01-31T17:20:56Z

@pllim Rebased. 🤞🏻

pllim · 2023-01-31T17:26:06Z

@mhvk , you sure this doesn't need a change log?

pllim · 2023-01-31T17:45:33Z

The upload fails again with a different error, maybe it is OpenAstronomy/github-actions-workflows#105

mhvk · 2023-01-31T18:20:41Z

@pllim, you are right, it is probably best to add a changelog entry. After all, it is possible people were on purpose turning divide-by-zero warnings into errors...

@bmorris3 - sorry about not realizing that earlier: would you mind adding a changelog fragment in docs/changes/utils/?
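The concern about warnings turned into errors can be illustrated with plain NumPy. This is a hedged sketch with made-up numbers; `np.errstate(invalid="raise")` stands in for whatever mechanism a user might have chosen, and for 0/0 it is NumPy's "invalid" category that fires.

```python
import numpy as np

total = np.array([4.0, 0.0])
n = np.array([2, 0])  # the second slice is fully masked

raised = False
try:
    # Hypothetical user setting: treat invalid operations such as 0/0 as errors.
    with np.errstate(invalid="raise"):
        total / n
except FloatingPointError:
    raised = True

# With the divisor bumped to 1, the same setting no longer triggers.
with np.errstate(invalid="raise"):
    safe = total / np.where(n == 0, 1, n)

print(raised)  # True
print(safe)    # [2. 0.]
```

Code that relied on the first behavior would see a change, which is the argument for a changelog entry.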

pllim · 2023-01-31T20:29:30Z

Huh, now the coverage is back to normal. 🤪

pllim · 2023-01-31T20:29:42Z

Thanks, all!

github-actions bot added the utils.masked label Jan 30, 2023

pllim added this to the v5.3 milestone Jan 30, 2023

pllim requested a review from mhvk January 30, 2023 19:44

bmorris3 mentioned this pull request Jan 30, 2023

ENH nddata: collapse operations on NDDataArray, improved Masked Quantity support #14175

Merged

10 tasks

mhvk reviewed Jan 31, 2023

mhvk added Refactoring no-changelog-entry-needed labels Jan 31, 2023

bmorris3 force-pushed the masked-div-by-zero branch from 6982319 to c349f2e Compare January 31, 2023 15:19

mhvk approved these changes Jan 31, 2023

mhvk added the 💤 merge-when-ci-passes Do not use: We have auto-merge option now. label Jan 31, 2023

prevent div by zero for mean of all-masks

069e3ca

revisions from Marten clearer comments

bmorris3 force-pushed the masked-div-by-zero branch from c349f2e to 069e3ca Compare January 31, 2023 17:20

mhvk added utils and removed no-changelog-entry-needed labels Jan 31, 2023

adding changelog entry

f93975a

pllim merged commit ed7160e into astropy:main Jan 31, 2023
