[BEAM-4076] Use beam join api in sql #11041
@kennknowles This PR still needs some comments and unit tests, but what do you think? It pulls most of the complexity of the join transforms into the generic schema Join transform. A lot of code is deleted. I could have deleted even more, except that it is still used by "lookup" joins.
A quick summary: As part of this, several bugs and gaps were discovered in the schema APIs. One was fixed in a previous PR, and the rest are fixed in this PR. This also adds "broadcast" join capability to the schema Join API, which replaces the side-input join functionality that was previously implemented in SQL. Lookup join remains in SQL. I'm not sure it's worth trying to pull this into the core Beam transforms (i.e., that it's a general-enough use case).
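For readers unfamiliar with the term, here is a minimal plain-Java sketch of the idea behind a "broadcast" (side-input style) left join: the small side is materialized into an in-memory index, and each row of the large side is matched against it. This is an illustration only, not the code in this PR and not the Beam API; `BroadcastJoinSketch`, `KeyedRow`, and `leftOuterBroadcastJoin` are hypothetical names, and a list of strings stands in for the joined output rows.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class BroadcastJoinSketch {
  /** Hypothetical stand-in for a schema row: a join key plus a payload. */
  public static final class KeyedRow {
    final String key;
    final String value;

    public KeyedRow(String key, String value) {
      this.key = key;
      this.value = value;
    }
  }

  /**
   * Left outer join where the small side is fully materialized in memory,
   * the way a side input would be, and the large side is streamed past it.
   */
  public static List<String> leftOuterBroadcastJoin(
      List<KeyedRow> large, List<KeyedRow> small) {
    // Index the small ("broadcast") side by join key.
    Map<String, List<KeyedRow>> index = new HashMap<>();
    for (KeyedRow row : small) {
      index.computeIfAbsent(row.key, k -> new ArrayList<>()).add(row);
    }

    List<String> out = new ArrayList<>();
    for (KeyedRow left : large) {
      List<KeyedRow> matches = index.get(left.key);
      if (matches == null) {
        // No match: emit the left row padded with a null right side.
        out.add(left.value + ",null");
      } else {
        for (KeyedRow right : matches) {
          out.add(left.value + "," + right.value);
        }
      }
    }
    return out;
  }

  public static void main(String[] args) {
    List<KeyedRow> large = List.of(new KeyedRow("a", "L1"), new KeyedRow("b", "L2"));
    List<KeyedRow> small = List.of(new KeyedRow("a", "R1"));
    System.out.println(leftOuterBroadcastJoin(large, small));
  }
}
```

The trade-off is the usual one: no shuffle of the large side, at the cost of requiring the small side to fit in memory on every worker.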
Run Java PreCommit
Run SQL Postcommit
LGTM.
This change is rather large, so it is quite likely I missed some bugs. It would be nice to have some more tests. It also needs a JIRA before merging.
@apilloud Added a richer set of unit tests and a JIRA. Will merge once green.
Run SQL Postcommit
if (joinType == JoinRelType.LEFT) {
  context.output(combineTwoRowsIntoOne(leftRow, rightNullRow, swap, schema));
}
private static FieldAccessDescriptor getJoinColumn(
This actually hardcodes a bad assumption a bit further than it was before: that the join is only on columns. We want to move in the other direction, and allow join conditions to be more general RexNodes, many of which still work for CoGBK and side input lookup joins. This is BEAM-6112.
I noticed this because, in the current codebase on master, we could inline and delete SerializableRexNode entirely. So I went looking for how we encode a full expression to be joined on. I didn't see rules that precomputed all of them (in which case input refs would suffice).
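To make the distinction concrete, here is a small plain-Java sketch of an equi-join keyed on arbitrary expressions rather than on column references. It is not Beam or Calcite code: a `java.util.function.Function` stands in for an evaluated join expression, and `ExpressionJoinSketch` and `innerJoin` are hypothetical names chosen for illustration.

```java
import java.util.AbstractMap.SimpleImmutableEntry;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

public class ExpressionJoinSketch {
  /**
   * Inner hash join where each side supplies a key *expression*.
   * A plain column reference is just the special case row -> row.field.
   */
  public static <L, R, K> List<Map.Entry<L, R>> innerJoin(
      List<L> left, List<R> right, Function<L, K> leftKey, Function<R, K> rightKey) {
    // Group the right side by its computed key.
    Map<K, List<R>> index = new HashMap<>();
    for (R r : right) {
      index.computeIfAbsent(rightKey.apply(r), k -> new ArrayList<>()).add(r);
    }

    // Probe with the computed key of each left element.
    List<Map.Entry<L, R>> out = new ArrayList<>();
    for (L l : left) {
      for (R r : index.getOrDefault(leftKey.apply(l), List.of())) {
        out.add(new SimpleImmutableEntry<>(l, r));
      }
    }
    return out;
  }

  public static void main(String[] args) {
    // Join on a derived value (case-insensitive match), not on a raw column.
    List<String> left = List.of("Apple", "banana");
    List<String> right = List.of("APPLE", "cherry");
    System.out.println(
        innerJoin(left, right, s -> s.toLowerCase(), s -> s.toLowerCase()));
  }
}
```

Because the key is computed per element before grouping, the same shape works for a CoGBK-style grouping or a side-input lookup, which is the point made in the review comment about more general join conditions.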