-
-
Notifications
You must be signed in to change notification settings - Fork 1.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Support table join on different keys #11954
Conversation
👋 Thank you for your draft pull request! Do you know that you can use |
47bd8bd
to
8a6da12
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is very; my comments are tiny.
One alternative for the API: to avoid having any API change and the checks on keys
not being used when keys_left
and keys_right
are, you could also let the keys
argument be a dict with two entries, 'left' and 'right'. Or, if it is a list, allow it to have entries that are 2-element tuples (i.e., the example would be either keys={'left': 'name', 'right': 'obs_id'}
or keys=[('name', 'obs_id')]
Note that I'm not convinced either is better - just thought I'd pass them by.
Since I do think this is all OK except for the typo in the docstring, I'll approve now.
astropy/table/operations.py
Outdated
@@ -354,6 +355,12 @@ def join(left, right, keys=None, join_type='inner', | |||
Default is to use all columns which are common to both tables. | |||
join_type : str | |||
Join type ('inner' | 'outer' | 'left' | 'right' | 'cartesian'), default is 'inner' | |||
keys_left : str or list of str of list of column-like, optional |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
second "of" should be "or list of column-like"
astropy/table/operations.py
Outdated
right_orig = right | ||
left, right, keys = _join_keys_left_right( | ||
left, right, keys, keys_left, keys_right, join_funcs) | ||
|
||
# If we have a single key, put it in a tuple |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not really directly related, but this comment should be a few lines down.
I also thought about the exact same API alternatives. In the end the precedent from pandas seems like a good thing, but I also think this API is more readable. Supplying a tuple of the left and right keys is definitely confusing (if you haven't studied the docs), so that one is easy to reject. The dict is OK, but I think at the end having distinct keywords is easier to read. This will also give a better error message if someone runs code written for astropy 5.0 on astropy < 5.0. Right now if you do that it will effectively run as if you had done |
Yes, I figured you had thought it through already! So, will merge now. Nice little addition! |
Description
This pull request is to allow doing a table join on table columns that have different names. It also allows doing a join on arbitrary column-like data which serve as the join values. The API is inspired by
left_on
andright_on
in pandas merge.This also introduces a small API change by requiring keyword-only args after
left, right, keys, join_type
. I think that after this it is unlikely that much external code is using positional args. The next arg was previouslyuniq_col_name
.Fixes #3752
Checklist for package maintainer(s)
This checklist is meant to remind the package maintainer(s) who will review this pull request of some common things to look for. This list is not exhaustive.
Extra CI
label.no-changelog-entry-needed
label.astropy-bot
check might be missing; do not let the green checkmark fool you.backport-X.Y.x
label(s) before merge.