Add outer join option for dataframe merge #3019

Closed
stress-tess opened this issue Mar 4, 2024 · 0 comments · Fixed by #3022

@stress-tess
Member

A customer requested that we add df.merge(df2, how='outer'), as is possible in pandas.

An (ugly) workaround for the time being is to take the union of a left join and a right join, then drop the duplicate keys:

>>> import arkouda as ak
>>> ak.connect()  # assumes an arkouda server is already running
>>> df1 = ak.DataFrame(
...:     {
...:         "key": ak.arange(4),
...:         "value1": ak.array(["A", "B", "C", "D"]),
...:         "value3": ak.arange(4, 0, -1),
...:     }
...: )
...:
...: df2 = ak.DataFrame(
...:     {
...:         "key": ak.arange(2, 6, 1),
...:         "value1": ak.array(["A", "B", "D", "F"]),
...:         "value2": ak.array(["apple", "banana", "cherry", "date"]),
...:     }
...: )

>>> left_join = df1.merge(df2, how="left", on="key")

>>> right_join = df1.merge(df2, how="right", on="key")

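>>> def convert_int_col_to_float(df):
...:     # Cast every int64 column to float64 in place so both join results share
...:     # dtypes: unmatched rows of a join are filled with NaN, which int64 cannot hold.
...:     for col in df.columns:
...:         if df[col].dtype == ak.int64:
...:             df[col] = ak.cast(df[col], ak.float64)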

>>> convert_int_col_to_float(left_join)

>>> convert_int_col_to_float(right_join)

>>> my_df = ak.DataFrame.append(left_join, right_join).drop_duplicates(subset=["key"]).reset_index()

>>> my_df
   key value1_y  value2 value1_x  value3
0  0.0      nan     nan        A     4.0
1  1.0      nan     nan        B     3.0
2  2.0        A   apple        C     2.0
3  3.0        B  banana        D     1.0
4  4.0        D  cherry      nan     NaN
5  5.0        F    date      nan     NaN
(6 rows x 5 columns)
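The integer-to-float cast is needed because the two joins disagree on dtypes. In the left join every df1 row is kept, so df1's value3 column stays int64; in the right join, keys 4 and 5 have no df1 row, so their value3 is filled with NaN, and int64 has no NaN. A minimal sketch of the value3 column, modeled with NumPy on the assumption that arkouda's dtypes behave like NumPy's here:

```python
import numpy as np

# Left join: every df1 row survives, so "value3" is fully populated and stays int64.
value3_left = np.array([4, 3, 2, 1], dtype=np.int64)

# Right join: keys 2..5 from df2; keys 4 and 5 have no df1 row, so "value3" is NaN.
# NumPy can only store NaN in a floating-point array, so this column is float64.
value3_right = np.array([2, 1, np.nan, np.nan])

print(value3_left.dtype, value3_right.dtype)  # int64 float64

# Casting the int64 column to float64 (what convert_int_col_to_float does with
# ak.cast) makes both halves share one dtype before they are appended.
value3_left_as_float = value3_left.astype(np.float64)
print(value3_left_as_float.dtype == value3_right.dtype)  # True
```

Because convert_int_col_to_float casts every int64 column, key is converted as well, which is why my_df shows keys as 0.0, 1.0, ... while pandas keeps them as integers.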

# verify it matches pandas
>>> df1.to_pandas().merge(df2.to_pandas(), how="outer", on="key")
   key value1_x  value3 value1_y  value2
0    0        A     4.0      NaN     NaN
1    1        B     3.0      NaN     NaN
2    2        C     2.0        A   apple
3    3        D     1.0        B  banana
4    4      NaN     NaN        D  cherry
5    5      NaN     NaN        F    date
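To see why the left/right union reproduces an outer join, and where it can break, here is a pure-Python model of the workaround. The helpers left_join, outer_via_left_right, outer_join and same_rows are hypothetical: frames are lists of dicts, a missing value is an absent column, and overlapping columns get pandas-style _x/_y suffixes. On the issue's data the union matches a true outer join, but drop_duplicates(subset=["key"]) assumes each key appears at most once per frame; with a repeated key it collapses rows that an outer join keeps.

```python
from collections import Counter


def _columns(frame):
    return {col for row in frame for col in row}


def left_join(left, right, on, suffixes=("_x", "_y")):
    """Keep every `left` row, paired with each matching `right` row (or none)."""
    overlap = (_columns(left) & _columns(right)) - {on}

    def rename(row, suffix):
        # Suffix only the non-key columns that exist in both frames.
        return {(c + suffix if c in overlap else c): v for c, v in row.items()}

    rows = []
    for lrow in left:
        matches = [r for r in right if r[on] == lrow[on]] or [{}]
        for rrow in matches:
            rows.append({**rename(rrow, suffixes[1]), **rename(lrow, suffixes[0])})
    return rows


def outer_via_left_right(df1, df2, on):
    """The workaround: append left and right joins, keep the first row per key."""
    appended = left_join(df1, df2, on) + left_join(df2, df1, on, suffixes=("_y", "_x"))
    seen, rows = set(), []
    for row in appended:
        if row[on] not in seen:  # like drop_duplicates(subset=[on]), keep="first"
            seen.add(row[on])
            rows.append(row)
    return rows


def outer_join(df1, df2, on):
    """Reference outer join: left join plus right rows whose key is absent from df1."""
    df1_keys = {row[on] for row in df1}
    unmatched = [row for row in left_join(df2, df1, on, suffixes=("_y", "_x"))
                 if row[on] not in df1_keys]
    return left_join(df1, df2, on) + unmatched


def same_rows(a, b):
    """Compare two frames as multisets of rows, ignoring row order."""
    return Counter(frozenset(r.items()) for r in a) == Counter(frozenset(r.items()) for r in b)


# The issue's data: each key appears at most once per frame.
df1 = [{"key": k, "value1": v, "value3": n}
       for k, v, n in zip(range(4), "ABCD", [4, 3, 2, 1])]
df2 = [{"key": k, "value1": v, "value2": s}
       for k, v, s in zip(range(2, 6), "ABDF", ["apple", "banana", "cherry", "date"])]
print(same_rows(outer_via_left_right(df1, df2, "key"), outer_join(df1, df2, "key")))  # True

# Key 2 repeated on the left: the workaround drops one of the key-2 rows.
left = [{"key": 1, "a": "x"}, {"key": 2, "a": "y"}, {"key": 2, "a": "z"}]
right = [{"key": 2, "b": "p"}, {"key": 3, "b": "q"}]
print(len(outer_via_left_right(left, right, "key")), len(outer_join(left, right, "key")))  # 3 4
```

An arkouda version of the outer_join idea (untested here) would append only the right-join rows whose key does not occur in df1, for example by filtering with ak.in1d, instead of deduplicating the whole appended frame.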
ajpotts self-assigned this Mar 4, 2024
ajpotts added a commit to ajpotts/arkouda that referenced this issue Mar 8, 2024
github-merge-queue bot pushed a commit that referenced this issue Mar 11, 2024
Co-authored-by: Amanda Potts <ajpotts@users.noreply.github.com>
Labels: None yet
Projects: None yet

2 participants