[WIP] ARROW-13642: [C++][Compute] Inner and outer join #11047

Closed

michalursa
Contributor

This is a work in progress, posted for visibility; it is not working code yet.
The code is based on the branch michalursa:ARROW-13532-filter-interface-for-grouper.

This PR is a collection of building blocks for implementing all flavors of hash join (semi, anti-semi, inner, outer).
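
To make the flavors concrete, here is a minimal sketch of how they could be enumerated. It assumes that the 8 flavors mentioned in this description are the left/right variants of semi and anti-semi plus inner, left outer, right outer and full outer. All names below (`JoinType`, `EmitsUnmatchedLeft`, `Mirror`, ...) are hypothetical and are not taken from the join_type file in this PR.

```cpp
#include <cstdint>

// Hypothetical enumeration of the 8 join flavors (an assumption, not the PR's
// actual constants).
enum class JoinType : uint8_t {
  kLeftSemi,
  kRightSemi,
  kLeftAntiSemi,
  kRightAntiSemi,
  kInner,
  kLeftOuter,
  kRightOuter,
  kFullOuter
};

// True if output rows carry columns from the left side.
constexpr bool OutputsLeftColumns(JoinType t) {
  return t != JoinType::kRightSemi && t != JoinType::kRightAntiSemi;
}

// True if output rows carry columns from the right side.
constexpr bool OutputsRightColumns(JoinType t) {
  return t != JoinType::kLeftSemi && t != JoinType::kLeftAntiSemi;
}

// True if left rows without any match must still be emitted.
constexpr bool EmitsUnmatchedLeft(JoinType t) {
  return t == JoinType::kLeftAntiSemi || t == JoinType::kLeftOuter ||
         t == JoinType::kFullOuter;
}

// True if right rows without any match must still be emitted.
constexpr bool EmitsUnmatchedRight(JoinType t) {
  return t == JoinType::kRightAntiSemi || t == JoinType::kRightOuter ||
         t == JoinType::kFullOuter;
}

// Returns the flavor obtained by swapping the roles of the two sides.
constexpr JoinType Mirror(JoinType t) {
  switch (t) {
    case JoinType::kLeftSemi:      return JoinType::kRightSemi;
    case JoinType::kRightSemi:     return JoinType::kLeftSemi;
    case JoinType::kLeftAntiSemi:  return JoinType::kRightAntiSemi;
    case JoinType::kRightAntiSemi: return JoinType::kLeftAntiSemi;
    case JoinType::kLeftOuter:     return JoinType::kRightOuter;
    case JoinType::kRightOuter:    return JoinType::kLeftOuter;
    case JoinType::kInner:         return JoinType::kInner;
    case JoinType::kFullOuter:     return JoinType::kFullOuter;
  }
  return t;  // unreachable for valid enum values
}
```

Helpers like `Mirror` are one way to let the probe logic handle only half of the flavors and obtain the rest by swapping sides; whether the PR does this is not stated here.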
For easier navigation, the code is split across multiple files:

  • join_schema - helper classes for finding corresponding pairs of columns in two different sources (batch, hash table)
  • join_batch - helper classes for assembling and accumulating output rows in a batch, taking input from both the input batch and the hash table; the source pairs of row ids come from the hash table lookup
  • join_hashtable - building and querying hash table and related structures
  • join_filter - Bloom-like filter implementation
  • join_probe - (not implemented yet) probe-side processing logic needed to implement all 8 flavors of join
  • join_side - processing state for each of the two sides of a join: storage of accumulated rows, hash table, and Bloom-like filter (called early filter or approximate membership test in the code)
  • join_type - constants and their manipulation for 8 flavors of join
  • join - (not implemented yet) glue code for all of the above and implementation of ExecNode interface
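
As a rough illustration of how the hash table, the Bloom-like filter and the row id pairs fit together, here is a simplified sketch of an inner join probe. It is not the code in this PR: `EarlyFilter`, `BuildSide`, `HashKey` and `ProbeInner` are hypothetical names, the key is assumed to be a single `int64_t` column, `std::unordered_multimap` stands in for the real hash table, and the filter is a plain two-bit-per-key bit vector rather than the actual implementation.

```cpp
#include <cstddef>
#include <cstdint>
#include <unordered_map>
#include <utility>
#include <vector>

// 64-bit mixing step in the style of the splitmix64 finalizer.
inline uint64_t HashKey(int64_t key) {
  uint64_t x = static_cast<uint64_t>(key);
  x ^= x >> 30; x *= 0xbf58476d1ce4e5b9ULL;
  x ^= x >> 27; x *= 0x94d049bb133111ebULL;
  x ^= x >> 31;
  return x;
}

// Approximate membership test: may report false positives, never false
// negatives. Sets two bits per inserted hash.
class EarlyFilter {
 public:
  explicit EarlyFilter(int log2_num_bits)
      : mask_((uint64_t{1} << log2_num_bits) - 1),
        bits_(static_cast<std::size_t>(((uint64_t{1} << log2_num_bits) + 63) / 64)) {}

  void Insert(uint64_t hash) {
    SetBit(hash & mask_);
    SetBit((hash >> 32) & mask_);
  }

  bool MayContain(uint64_t hash) const {
    return GetBit(hash & mask_) && GetBit((hash >> 32) & mask_);
  }

 private:
  void SetBit(uint64_t pos) { bits_[pos / 64] |= uint64_t{1} << (pos % 64); }
  bool GetBit(uint64_t pos) const { return (bits_[pos / 64] >> (pos % 64)) & 1; }

  uint64_t mask_;
  std::vector<uint64_t> bits_;
};

// State of the build side: accumulated keys, hash table and early filter.
struct BuildSide {
  std::vector<int64_t> keys;                          // row id -> key
  std::unordered_multimap<int64_t, uint32_t> table;   // key -> row id
  EarlyFilter filter{10};                             // 1024 bits

  void Append(int64_t key) {
    uint32_t row_id = static_cast<uint32_t>(keys.size());
    keys.push_back(key);
    table.emplace(key, row_id);
    filter.Insert(HashKey(key));
  }
};

// Returns (probe row id, build row id) pairs for an inner join on the key.
std::vector<std::pair<uint32_t, uint32_t>> ProbeInner(
    const BuildSide& build, const std::vector<int64_t>& probe_keys) {
  std::vector<std::pair<uint32_t, uint32_t>> out;
  for (uint32_t i = 0; i < probe_keys.size(); ++i) {
    // Cheap rejection before touching the hash table.
    if (!build.filter.MayContain(HashKey(probe_keys[i]))) continue;
    auto range = build.table.equal_range(probe_keys[i]);
    for (auto it = range.first; it != range.second; ++it) {
      out.emplace_back(i, it->second);
    }
  }
  return out;
}
```

The row id pairs returned by `ProbeInner` are what a batch assembler would then use to gather output columns from the probe batch and from the rows stored on the build side. A false positive in the filter only costs an extra hash table lookup; it never changes the result.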

The main features that will be missing when this code is ready for review are:

  • parallel hash table and Bloom-like filter build
  • handling of dictionaries
  • support for residual predicates with outer joins (non-equality filters that are part of the join match condition)
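
For context on the last item, here is a small sketch of why residual predicates need special care with outer joins: a left row whose key matches but whose every candidate fails the residual must still be emitted with a null right side, so the "matched" flag can only be decided after the residual is evaluated. This is an illustration under assumptions, not code from this PR; `Row`, `kNullRow` and `LeftOuterJoin` are hypothetical names and `std::unordered_multimap` stands in for the real hash table.

```cpp
#include <cstddef>
#include <cstdint>
#include <functional>
#include <unordered_map>
#include <utility>
#include <vector>

// Sentinel row id meaning "no matching row; output nulls for that side".
constexpr int64_t kNullRow = -1;

struct Row {
  int64_t key;
  int64_t value;
};

// Left outer join on `key` with a residual predicate evaluated on each
// key-equal candidate pair. Returns (left row id, right row id) pairs, with
// kNullRow as the right row id for left rows that have no surviving match.
std::vector<std::pair<int64_t, int64_t>> LeftOuterJoin(
    const std::vector<Row>& left, const std::vector<Row>& right,
    const std::function<bool(const Row&, const Row&)>& residual) {
  std::unordered_multimap<int64_t, int64_t> table;  // key -> right row id
  for (std::size_t i = 0; i < right.size(); ++i) {
    table.emplace(right[i].key, static_cast<int64_t>(i));
  }

  std::vector<std::pair<int64_t, int64_t>> out;
  for (std::size_t i = 0; i < left.size(); ++i) {
    bool matched = false;
    auto range = table.equal_range(left[i].key);
    for (auto it = range.first; it != range.second; ++it) {
      const Row& candidate = right[static_cast<std::size_t>(it->second)];
      // A key match only counts if the residual predicate also holds.
      if (residual(left[i], candidate)) {
        out.emplace_back(static_cast<int64_t>(i), it->second);
        matched = true;
      }
    }
    // Outer join: keep the left row even when no candidate survived.
    if (!matched) out.emplace_back(static_cast<int64_t>(i), kNullRow);
  }
  return out;
}
```

With an inner join the residual can simply be applied as a filter on the join output. With an outer join that would drop the left row entirely instead of emitting it with nulls, which is why the two cases differ.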

github-actions bot commented Sep 1, 2021

Thanks for opening a pull request!

If this is not a minor PR, could you open an issue for this pull request on JIRA? https://issues.apache.org/jira/browse/ARROW

Opening JIRAs ahead of time contributes to the Openness of the Apache Arrow project.

Then could you also rename the pull request title to the following format?

ARROW-${JIRA_ID}: [${COMPONENT}] ${SUMMARY}

or

MINOR: [${COMPONENT}] ${SUMMARY}

