Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

prevent div by zero for mean of all-masks #14341

Merged
merged 2 commits into from Jan 31, 2023

Conversation

bmorris3
Copy link
Contributor

Description

In testing #14175, I was often doing operations on Masked arrays. Sometimes all entries along a given axis in the array are masked. You can run into trouble if you take the mean of an array that is masked everywhere, since the mean is sum(axis)/n where the number of non-masked points n can be zero. There are a bunch of filterwarnings for this in the masked tests like this:

@pytest.mark.filterwarnings("ignore:.*encountered in.*divide")

I go back and forth on whether or not the current behavior (warning) above is helpful. On one hand, the warning is useful if you don't know how masked works and you need to trace nans. On the other, the module is emitting a warning when nothing is "actually going wrong," because the undefined result will be masked anyway.

This PR circumvents the warning from division by zero. The modified mean method will divide by 1 where n=0 to dodge the warning, and return a masked nan value for that result.

@github-actions
Copy link

github-actions bot commented Jan 30, 2023

Thank you for your contribution to Astropy! 🌌 This checklist is meant to remind the package maintainers who will review this pull request of some common things to look for.

  • Do the proposed changes actually accomplish desired goals?
  • Do the proposed changes follow the Astropy coding guidelines?
  • Are tests added/updated as required? If so, do they follow the Astropy testing guidelines?
  • Are docs added/updated as required? If so, do they follow the Astropy documentation guidelines?
  • Is rebase and/or squash necessary? If so, please provide the author with appropriate instructions. Also see "When to rebase and squash commits".
  • Did the CI pass? If no, are the failures related? If you need to run daily and weekly cron jobs as part of the PR, please apply the "Extra CI" label. Codestyle issues can be fixed by the bot.
  • Is a change log needed? If yes, did the change log check pass? If no, add the "no-changelog-entry-needed" label. If this is a manual backport, use the "skip-changelog-checks" label unless special changelog handling is necessary.
  • Is this a big PR that makes a "What's new?" entry worthwhile and if so, is (1) a "what's new" entry included in this PR and (2) the "whatsnew-needed" label applied?
  • Is a milestone set? Milestone must be set but we cannot check for it on Actions; do not let the green checkmark fool you.
  • At the time of adding the milestone, if the milestone set requires a backport to release branch(es), apply the appropriate "backport-X.Y.x" label(s) before merge.

Copy link
Contributor

@mhvk mhvk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for separating things out!

I am a bit worried about slowing things down for the normal case that no slices are fully masked, but in the end the warning is not useful, so probably it is better to avoid it. Some comments inline.


# catch the case when an axis is fully masked to prevent div by zero:
fully_masked_axes = n == 0
divisor = np.where(fully_masked_axes, 1, n)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

From my tests, slightly faster is the following

neq0 = n == 0
n += neq0
result /= n
result.unmasked[neq0] = np.nan

Note that the mask is already set correctly, so no need to do that again.

Actually, it may well be a lot faster to do the sum on the unmasked data and add the mask back. Something like

result = self.unmasked.sum(...)
n = np.add.reduce(where, axis=axis, keepdims=keepdims)
neq0 = n == 0
n += neq0
result /= n
result.unmasked[neq0] = np.nan
return self._masked_result(result, neq0)

But fine to do that separately.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Speeds tested in #14341 (comment).

astropy/utils/masked/tests/test_masked.py Outdated Show resolved Hide resolved
astropy/utils/masked/tests/test_masked.py Show resolved Hide resolved
@bmorris3
Copy link
Contributor Author

bmorris3 commented Jan 31, 2023

Collapsed below are the results of a speed test for the algorithm in the PR and the two alternatives from @mhvk in
#14341 (comment). The both suggestions are faster, but I think the first is clearer, so I will push it in a moment.

Speed test

Speed test

  from astropy.utils.masked import Masked
  import numpy as np
  
  def pr(arr):
      result = arr.sum(
          axis=axis, dtype=dtype, out=out, keepdims=keepdims, where=where
      )
      n = np.add.reduce(where, axis=axis, keepdims=keepdims)
      fully_masked_axes = n == 0
      divisor = np.where(fully_masked_axes, 1, n)
      result /= divisor
      # mask resulting values from fully-masked axes
      # which have been divided by 1
      result[fully_masked_axes] = np.ma.masked_array(np.nan, True)
      return result
  
  def v2(arr):
      result = arr.sum(
          axis=axis, dtype=dtype, out=out, keepdims=keepdims, where=where
      )
      n = np.add.reduce(where, axis=axis, keepdims=keepdims)
      neq0 = n == 0
      n += neq0
      result /= n
      result.unmasked[neq0] = np.nan
  
  def v3(arr):
      result = arr.sum(
          axis=axis, dtype=dtype, out=out, keepdims=keepdims, where=where
      )
      n = np.add.reduce(where, axis=axis, keepdims=keepdims)
      neq0 = n == 0
      n += neq0
      result /= n
      result.unmasked[neq0] = np.nan
      return arr._masked_result(result, neq0, None)
  
  def get_args():
      shape = (100, 10, 5)
      axis = 0
      where = True
      keepdims = False
      dtype = float
      out = None
  
      arr = Masked(
          np.arange(np.prod(shape), dtype=dtype).reshape(shape),
          np.random.randint(0, 2, size=shape).astype(bool)
      )
      where = ~arr.mask & where
      return arr

  arr = get_args()
  %timeit pr(arr)
  %timeit v2(arr)
  %timeit v3(arr)

Results

  90.5 µs ± 1.45 µs per loop (mean ± std. dev. of 7 runs, 10,000 loops each)
  84.5 µs ± 2.95 µs per loop (mean ± std. dev. of 7 runs, 10,000 loops each)
  84.4 µs ± 2.24 µs per loop (mean ± std. dev. of 7 runs, 10,000 loops each)

Copy link
Contributor

@mhvk mhvk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice to test throroughly! Just one comment on a comment... And maybe also squash the commits to one.

result /= n
# mask resulting values from fully-masked axes
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe the comment should be changed to something like "# Correct fully-masked slice results to what is expected for 0/0 division." -- we're not touching the mask here!

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Much clearer, thanks.

Copy link
Contributor

@mhvk mhvk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great! Let's get this in once the tests have passed (excluding coverage -- what the heck is up with that? your lines are most obviously covered!)

@mhvk mhvk added the 💤 merge-when-ci-passes Do not use: We have auto-merge option now. label Jan 31, 2023
@pllim
Copy link
Member

pllim commented Jan 31, 2023

Coverage looks like a heisenbug reported in codecov/codecov-action#598

@pllim
Copy link
Member

pllim commented Jan 31, 2023

This needs a rebase to pick up the new mpl322 job.

revisions from Marten

clearer comments
@bmorris3
Copy link
Contributor Author

@pllim Rebased. 🤞🏻

@pllim
Copy link
Member

pllim commented Jan 31, 2023

@mhvk , you sure this doesn't need a change log?

@pllim
Copy link
Member

pllim commented Jan 31, 2023

The upload fails again with a different error, maybe it is OpenAstronomy/github-actions-workflows#105

@mhvk
Copy link
Contributor

mhvk commented Jan 31, 2023

@pllim, you are right, it is probably best to add a changelog entry. After all, it is possible people were on purpose turning divide-by-zero warnings into errors...

@bmorris3 - sorry about not realizing that earlier: would you mind adding a changelog fragment in docs/changes/utils/?

@pllim
Copy link
Member

pllim commented Jan 31, 2023

Huh, now the coverage is back to normal. 🤪

@pllim pllim merged commit ed7160e into astropy:main Jan 31, 2023
@pllim
Copy link
Member

pllim commented Jan 31, 2023

Thanks, all!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants