[C++] Add option to consolidate key columns in hash join #31383

@asfimport

Description

Currently the hash join outputs key columns from both sides. On an outer join this can help distinguish between a row that matched but had entirely null payloads on one side and a row that didn't match on one side.

However, that distinction often does not matter, and many databases simply coalesce the key columns into one. For example, an outer join today might produce a result that looks like:


L_KEY | R_KEY | L_PAY | R_PAY
    0 |     0 |     x |     Y
 NULL |     1 |  NULL |     Z
    2 |  NULL |     A |  NULL

Ideally we could specify a "combine key columns" option to get a result that looks like:


KEY | L_PAY | R_PAY
  0 |     x |     Y
  1 |  NULL |     Z
  2 |     A |  NULL

This can already be done with an extra project step, and a built-in option is unlikely to offer much performance benefit. From a usability perspective, though, it would be nice if users didn't have to add that extra project step themselves.
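To make the requested behavior concrete, here is a minimal sketch of what the extra project step computes for each row: a coalesce of the two key columns. It uses only the C++ standard library and is not Arrow code. The `JoinedRow` and `ConsolidatedRow` structs and the `ConsolidateKeys` / `ConsolidateAll` functions are hypothetical names invented for illustration, and it assumes a single integer key column with string payloads.

```cpp
#include <cassert>
#include <optional>
#include <string>
#include <vector>

// Hypothetical row shape for the current outer-join output (not an Arrow type).
// std::nullopt stands in for NULL.
struct JoinedRow {
  std::optional<int> l_key;
  std::optional<int> r_key;
  std::optional<std::string> l_pay;
  std::optional<std::string> r_pay;
};

// Hypothetical row shape after the key columns are consolidated.
struct ConsolidatedRow {
  std::optional<int> key;
  std::optional<std::string> l_pay;
  std::optional<std::string> r_pay;
};

// Coalesce the two key columns: take the left key if present, else the right.
ConsolidatedRow ConsolidateKeys(const JoinedRow& row) {
  // For an equality join, a matched row carries the same key on both sides.
  assert(!row.l_key || !row.r_key || *row.l_key == *row.r_key);
  return {row.l_key ? row.l_key : row.r_key, row.l_pay, row.r_pay};
}

// Apply the consolidation to every row of a join result.
std::vector<ConsolidatedRow> ConsolidateAll(const std::vector<JoinedRow>& rows) {
  std::vector<ConsolidatedRow> out;
  out.reserve(rows.size());
  for (const JoinedRow& row : rows) {
    out.push_back(ConsolidateKeys(row));
  }
  return out;
}
```

Applied to the three rows of the first table, this yields the three rows of the second. As the description notes, the consolidated form can no longer tell a row that matched with an entirely null payload apart from a row that had no match, which is why this would be an option rather than the default.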

Reporter: Weston Pace / @westonpace

Note: This issue was originally created as ARROW-15957. Please see the migration documentation for further details.
