Description
Currently the hash join outputs key columns from both sides. On an outer join this can help distinguish between a row that matched but had entirely null payloads on one side and a row that didn't match on one side.
However, that distinction is sometimes not very important and many databases will simply coalesce the key columns into one. For example, we might get an outer join result today that looks like:
| L_KEY | R_KEY | L_PAY | R_PAY |
|-------|-------|-------|-------|
| 0     | 0     | x     | Y     |
| NULL  | 1     | NULL  | Z     |
| 2     | NULL  | A     | NULL  |
Ideally we could specify a "combine key columns" option to get a result that looks like:
| KEY | L_PAY | R_PAY |
|-----|-------|-------|
| 0   | x     | Y     |
| 1   | NULL  | Z     |
| 2   | A     | NULL  |
This can be done today with an extra project step, and a built-in option isn't likely to offer much performance benefit, but from a usability perspective it would be nice if users didn't have to add that step themselves.
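For illustration, here is a minimal pure-Python sketch of what that extra project step amounts to. It does not use the Arrow API: `full_outer_join` and `combine_key_columns` are hypothetical helpers, rows are modeled as plain dicts, and keys are assumed to be non-null.

```python
from typing import Any, Dict, List

Row = Dict[str, Any]


def full_outer_join(left: List[Row], right: List[Row]) -> List[Row]:
    """Naive full outer hash join that emits key columns from both sides.

    Hypothetical helper for illustration only; assumes keys are never None.
    """
    # Build side: index the right rows by key.
    right_by_key: Dict[Any, List[Row]] = {}
    for r in right:
        right_by_key.setdefault(r["KEY"], []).append(r)

    matched_keys = set()
    out: List[Row] = []

    # Probe side: emit matched rows, or left-only rows padded with None.
    for l in left:
        matches = right_by_key.get(l["KEY"], [])
        if matches:
            matched_keys.add(l["KEY"])
            for r in matches:
                out.append({"L_KEY": l["KEY"], "R_KEY": r["KEY"],
                            "L_PAY": l["PAY"], "R_PAY": r["PAY"]})
        else:
            out.append({"L_KEY": l["KEY"], "R_KEY": None,
                        "L_PAY": l["PAY"], "R_PAY": None})

    # Emit right rows that never matched, padded with None on the left.
    for key, rows in right_by_key.items():
        if key not in matched_keys:
            for r in rows:
                out.append({"L_KEY": None, "R_KEY": key,
                            "L_PAY": None, "R_PAY": r["PAY"]})
    return out


def combine_key_columns(rows: List[Row]) -> List[Row]:
    """The extra project step: coalesce L_KEY and R_KEY into a single KEY."""
    return [
        {"KEY": row["L_KEY"] if row["L_KEY"] is not None else row["R_KEY"],
         "L_PAY": row["L_PAY"],
         "R_PAY": row["R_PAY"]}
        for row in rows
    ]


left = [{"KEY": 0, "PAY": "x"}, {"KEY": 2, "PAY": "A"}]
right = [{"KEY": 0, "PAY": "Y"}, {"KEY": 1, "PAY": "Z"}]

joined = full_outer_join(left, right)
combined = combine_key_columns(joined)
```

A "combine key columns" option would fold the second function into the join itself, so callers would get the single-`KEY` result directly.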
Reporter: Weston Pace / @westonpace
Related issues:
- [C++] Key column behavior in joins (is depended upon by)
Note: This issue was originally created as ARROW-15957. Please see the migration documentation for further details.