Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support mixin columns for group aggregation operations #12825

Merged
merged 5 commits into from
Apr 23, 2022

Conversation

taldcroft
Copy link
Member

@taldcroft taldcroft commented Feb 5, 2022

Description

This pull request adds support for using mixin columns in group aggregation operations when the mixin supports the specified operation (e.g. np.sum works for Quantity but not Time). In cases where the operation is not supported the code now issues a warning and drops the column instead of raising an exception.

Notes:

  • To support mixins the groups attribute was added to the info property. For Column this is taken from the parent, while for mixins this is a distinct new attribute.
  • The tests were updated to include a Quantity column in most grouping tests.
  • As before, when a column cannot be aggregated for any reason it is dropped from the output and a warning is issued. Now the text of the warning includes the exception that caused the problem.

Fixes #12249

Checklist for package maintainer(s)

This checklist is meant to remind the package maintainer(s) who will review this pull request of some common things to look for. This list is not exhaustive.

  • Do the proposed changes actually accomplish desired goals?
  • Do the proposed changes follow the Astropy coding guidelines?
  • Are tests added/updated as required? If so, do they follow the Astropy testing guidelines?
  • Are docs added/updated as required? If so, do they follow the Astropy documentation guidelines?
  • Is rebase and/or squash necessary? If so, please provide the author with appropriate instructions. Also see "When to rebase and squash commits".
  • Did the CI pass? If no, are the failures related? If you need to run daily and weekly cron jobs as part of the PR, please apply the Extra CI label.
  • Is a change log needed? If yes, did the change log check pass? If no, add the no-changelog-entry-needed label. If this is a manual backport, use the skip-changelog-checks label unless special changelog handling is necessary.
  • Is a milestone set? Milestone must be set but astropy-bot check might be missing; do not let the green checkmark fool you.
  • At the time of adding the milestone, if the milestone set requires a backport to release branch(es), apply the appropriate backport-X.Y.x label(s) before merge.

Copy link
Contributor

@mhvk mhvk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wow, what a nice demonstration of how well .info can be used! I've only minimal comments, really more questions, so will approve now (failures seem due to new pytest version).

@@ -278,7 +278,7 @@ For the example grouped table ``obs_by_name`` from above, we compute the group
means with the :meth:`~astropy.table.groups.TableGroups.aggregate` method::

>>> obs_mean = obs_by_name.groups.aggregate(np.mean) # doctest: +SHOW_WARNINGS
AstropyUserWarning: Cannot aggregate column 'obs_date' with type '<U10'
WARNING: Cannot aggregate column 'obs_date' with type '<U10': cannot perform reduceat with flexible type [astropy.table.groups]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe better for this to remain an AstropyUserWarning? Though I'm actually not sure why this changed! Anyway, definitely not a big deal.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Weird, I didn't notice that change. Looks like RTD was OK on this. It failed but for an unrelated reason. Hmm.

astropy/table/groups.py Show resolved Hide resolved
docs/changes/table/12825.feature.rst Show resolved Hide resolved
@taldcroft
Copy link
Member Author

I'll just let this wait here while hopefully the unrelated problems CI and RTD get fixed. Thoughts @pllim?

In the meantime @mhvk you can click the checklist boxes which apply, noting that CI has not yet passed. 😄

@mhvk
Copy link
Contributor

mhvk commented Feb 6, 2022

@taldcroft - looks all good. I guess we'll see after the pytest fixes plus rebase whether the change in warning remains... But definitely all OK.

p.s. The info setting is indeed strange; turning, say, a Column into a Quantity or vice versa one would think things like name should be transferred... Anyway, as you noted, for another issue!

@nstarman
Copy link
Member

@taldcroft it looks like numpy needs to be imported for the doctest.

@taldcroft
Copy link
Member Author

it looks like numpy needs to be imported for the doctest.

@nstarman - done, hopefully that will do the trick.

@nstarman
Copy link
Member

The test suite doesn't appear to be running. Rebase?

@pllim
Copy link
Member

pllim commented Mar 25, 2022

Actions going through maintenance. Try again in a few hours, maybe.

@taldcroft
Copy link
Member Author

@pllim - actions still look stuck. What's the easiest way to restart?

@pllim
Copy link
Member

pllim commented Mar 28, 2022

Lemme close/re-open

@pllim pllim closed this Mar 28, 2022
@pllim pllim reopened this Mar 28, 2022
@pllim
Copy link
Member

pllim commented Mar 28, 2022

OK running now

@taldcroft
Copy link
Member Author

Dang, I figured out what is happening with the numpy=1.19 test failure. It looks like there was a bug in numpy that got fixed in numpy 1.20.

In numpy 1.18 and 1.19:

>>> from astropy.time import Time
>>> tm = Time([1,2,3], format='cxcsec')
>>> import numpy as np

# This should FAIL because it is reducing an object time that doesn't support addition
# but instead it just reduces the 0-index value
>>> np.add.reduceat(tm, np.array([0]))
array([<Time object: scale='tt' format='cxcsec' value=1.0>], dtype=object)

In numpy 1.20:

>>> from astropy.time import Time
>>> tm = Time([1,2,3], format='cxcsec')
>>> import numpy as np
>>> np.add.reduceat(tm, np.array([0]))
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Users/aldcroft/miniconda3/envs/numpy-19/lib/python3.9/site-packages/astropy/time/core.py", line 2067, in __add__
    raise OperandTypeError(self, other, '+')
astropy.time.core.OperandTypeError: Unsupported operand type(s) for +: 'Time' and 'Time'

It isn't immediately clear what's the best path forward given this problem for numpy < 1.20. Maybe only allow Quantity mixins for aggregation in that case?

@nstarman
Copy link
Member

nstarman commented Mar 30, 2022

Restating the problem to make sure I understand.
For all versions of numpy, Quantity mixin aggregations works.
For numpy v1.20+ any mixin aggregation works.

The proposed solution is to check the numpy version and allow only aggregation of Quantity mixins for numpy < 1.20?

@taldcroft
Copy link
Member Author

For all versions of numpy, Quantity mixin aggregations works.

Correct.

For numpy v1.20+ any mixin aggregation works.

Correct, with the caveat that "works" means that it correctly fails in some case (where it will generate a warning and drop the column from the aggregation output).

The proposed solution is to check the numpy version and allow only aggregation of Quantity mixins for numpy < 1.20?

Correct. I haven't evaluated the code required for this special-casing but I'm guessing it is not too bad. Based on NEP 29 it looks like astropy 5.2 could support only numpy 1.20+ and we can back out the special-case code.

@taldcroft
Copy link
Member Author

@nstarman - I rebased and added a possible solution to the numpy < 1.20 issue in cca0400.

Copy link
Member

@nstarman nstarman left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is looking really good! CI failure appears real from a reduction on a flexible type.

@taldcroft
Copy link
Member Author

I think the failure is just a sphinx doctests question (@plim?). The specific error message from numpy changes depending on version:

280   >>> obs_mean = obs_by_name.groups.aggregate(np.mean)  # doctest: +SHOW_WARNINGS
Expected:
    AstropyUserWarning: Cannot aggregate column 'obs_date' with type '<U10': ufunc 'add' did not contain a loop with signature matching types (dtype('<U10'), dtype('<U10')) -> None
Got:
    AstropyUserWarning: Cannot aggregate column 'obs_date' with type '<U10': cannot perform reduceat with flexible type

/home/runner/work/astropy/astropy/docs/table/operations.rst:280: DocTestFailure

Can I do a ... in this where the docs would be:

  >>> obs_mean = obs_by_name.groups.aggregate(np.mean)  # doctest: +SHOW_WARNINGS
  AstropyUserWarning: Cannot aggregate column 'obs_date' with type '<U10':  ...

@pllim
Copy link
Member

pllim commented Apr 6, 2022

Yes, you can use the ellipsis directive.

@taldcroft
Copy link
Member Author

@nstarman @mhvk - with the docs fix I think this is ready. The ci/circleci failures are unrelated.

@taldcroft taldcroft requested review from nstarman and mhvk April 21, 2022 14:16
@saimn
Copy link
Contributor

saimn commented Apr 22, 2022

@taldcroft - Could you rebase to see if it fixes the dev job ? (Failure is probably unrelated ?)

Copy link
Contributor

@mhvk mhvk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me! Hopefully the CI errors can be resolved by a rebase.

@pllim
Copy link
Member

pllim commented Apr 23, 2022

The failures are unrelated, so merging. Thanks!

@pllim pllim merged commit 05d5a69 into astropy:main Apr 23, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

SkyCoord in Table breaks aggregate on group_by
5 participants