Add keep_order argument to table.join for original table order #16361

Merged
merged 3 commits into from
May 1, 2024

taldcroft
Member

@taldcroft taldcroft commented Apr 30, 2024

Description

This pull request allows maintaining the original table order when doing a left, right, or inner table join, by adding a new keyword argument keep_order to the join function.

For outer or cartesian joins there is no meaningful unique ordering, so the keep_order argument is ignored, but a warning is issued for keep_order=True.

Fixes #11619

Close #15994

  • By checking this box, the PR author has requested that maintainers do NOT use the "Squash and Merge" button. Maintainers should respect this when possible; however, the final decision is at the discretion of the maintainer that merges the PR.

@taldcroft taldcroft added this to the v7.0.0 milestone Apr 30, 2024
@taldcroft taldcroft changed the title Add keep_order argument to astropy.table.join for original table order Add keep_order argument to table.join for original table order Apr 30, 2024

Thank you for your contribution to Astropy! 🌌 This checklist is meant to remind the package maintainers who will review this pull request of some common things to look for.

  • Do the proposed changes actually accomplish desired goals?
  • Do the proposed changes follow the Astropy coding guidelines?
  • Are tests added/updated as required? If so, do they follow the Astropy testing guidelines?
  • Are docs added/updated as required? If so, do they follow the Astropy documentation guidelines?
  • Is rebase and/or squash necessary? If so, please provide the author with appropriate instructions. Also see instructions for rebase and squash.
  • Did the CI pass? If no, are the failures related? If you need to run daily and weekly cron jobs as part of the PR, please apply the "Extra CI" label. Codestyle issues can be fixed by the bot.
  • Is a change log needed? If yes, did the change log check pass? If no, add the "no-changelog-entry-needed" label. If this is a manual backport, use the "skip-changelog-checks" label unless special changelog handling is necessary.
  • Is this a big PR that makes a "What's new?" entry worthwhile and if so, is (1) a "what's new" entry included in this PR and (2) the "whatsnew-needed" label applied?
  • At the time of adding the milestone, if the milestone set requires a backport to release branch(es), apply the appropriate "backport-X.Y.x" label(s) before merge.

Contributor

@mhvk mhvk left a comment

Overall, looks good. My one worry is that if the _join itself fails (say, because of a metadata conflict), an input table can be left with an extra column. So I'd suggest wrapping the lot in a try/finally, with the special column removed in the finally part.
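
A minimal sketch of the try/finally pattern suggested here, using plain dicts as stand-in tables. The names `TEMP_COL`, `join_with_cleanup` and `failing_join` are hypothetical and are not taken from the astropy source.

```python
TEMP_COL = "__order__"  # hypothetical name for the temporary index column


def join_with_cleanup(left, right, join_func):
    """Tag `left` with a row-index column, join, and always remove the tag.

    `left` and `right` are dicts mapping column name -> list of values,
    standing in for tables; `join_func` is any callable that joins them.
    """
    n_rows = len(next(iter(left.values())))
    left[TEMP_COL] = list(range(n_rows))
    try:
        return join_func(left, right)
    finally:
        # Runs whether join_func returned or raised, so the caller's
        # table never keeps the extra column.
        del left[TEMP_COL]


def failing_join(left, right):
    # Simulates a join that fails partway, e.g. on a metadata conflict.
    raise ValueError("metadata conflict")


left = {"id": [3, 1, 2]}
try:
    join_with_cleanup(left, {"id": [1, 2, 3]}, failing_join)
except ValueError:
    pass

print(list(left))  # the temporary column is gone even though the join failed
```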

On the API: an option to avoid having to swap input arguments for inner would be to allow keep_order to be a string ('right' or 'left'). But arguably overkill.

@taldcroft
Member Author

> Overall, looks good. My one worry is that if the _join itself fails (say, because of a metadata conflict), an input table can be left with an extra column. So I'd suggest wrapping the lot in a try/finally, with the special column removed in the finally part.

Good catch, will do.

> On the API: an option to avoid having to swap input arguments for inner would be to allow keep_order to be a string ('right' or 'left'). But arguably overkill.

Accepting 'left' and 'right' would then require validation to reject combinations such as keep_order="right" with join_type="left", or vice versa. Overall it seems like overkill, and the bool option is good enough for pandas.
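
To illustrate the extra validation a string-valued keep_order would need, here is a rough sketch. The function `check_keep_order` is hypothetical and describes the alternative discussed above, not the API that was merged.

```python
def check_keep_order(keep_order, join_type):
    """Validate a hypothetical bool-or-string keep_order (not the merged API)."""
    if isinstance(keep_order, bool):
        return keep_order
    if keep_order not in ("left", "right"):
        raise ValueError(
            f"keep_order must be a bool, 'left' or 'right', got {keep_order!r}"
        )
    if join_type in ("left", "right") and keep_order != join_type:
        # For example keep_order="right" with join_type="left".
        raise ValueError(
            f"keep_order={keep_order!r} conflicts with join_type={join_type!r}"
        )
    return keep_order


# An inner join could follow either table, so both strings pass.
print(check_keep_order("right", "inner"))
```

With a plain bool none of these checks are needed, which is the argument for keeping the simpler option.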

Contributor

@neutrinoceros neutrinoceros left a comment

Two minor remarks, otherwise looks good (and a relief if it means I can ditch #15994 😅)!

astropy/table/operations.py
@@ -426,6 +449,13 @@ def join(
        keys_right=keys_right,
    )

    if sort_table is not None:
        # Sort the table back to the original order and remove the temporary columns.
        # If sort_table is not None that implies keep_order=True.
Contributor

So why not use `if keep_order:` directly?

Member Author

With keep_order=True and join_type="outer", sort_table is None and the code would fail.
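
A toy model of the control flow behind this answer, assuming (from the discussion above) that sort_table is only set for left, right and inner joins. The function `plan_order_restore` and the warning text are hypothetical, not the astropy implementation.

```python
import warnings


def plan_order_restore(join_type, keep_order):
    """Return which table's order to restore, or None if there is none."""
    sort_table = None
    if keep_order:
        if join_type in ("inner", "left"):
            sort_table = "left"
        elif join_type == "right":
            sort_table = "right"
        else:
            # Outer and cartesian joins: keep_order is ignored with a warning,
            # so sort_table stays None even though keep_order is True.
            warnings.warn("keep_order=True is ignored for this join type")
    return sort_table


with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    outer_plan = plan_order_restore("outer", keep_order=True)

print(outer_plan, len(caught))
```

Testing `sort_table is not None` covers this case, whereas `if keep_order:` would go on to use a table that was never set.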

@taldcroft
Member Author

@mhvk @neutrinoceros - I think I have addressed the comments and CI is passing.

@taldcroft taldcroft enabled auto-merge (squash) May 1, 2024 11:28
Contributor

@mhvk mhvk left a comment

Great, all OK now!

@taldcroft taldcroft merged commit 835f64f into astropy:main May 1, 2024
27 checks passed
Development

Successfully merging this pull request may close these issues.

Add option to preserve input order on table joins
3 participants