
support leftsemijoin for sparkSQL #395

Closed
wants to merge 2 commits into from


adrian-wang
Contributor

No description provided.

@AmplabJenkins

Can one of the admins verify this patch?

@marmbrus
Contributor

ok to test

@chenghao-intel
Contributor

Besides the BroadcastNestedLoopJoin, I think the left semi join may also need to be implemented in the HashJoin.
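To illustrate the HashJoin suggestion, here is a minimal sketch of a hash-based left semi join over plain Scala collections. It is not Spark's HashJoin operator; `hashLeftSemiJoin`, `leftKey`, and `rightKey` are hypothetical names. It only handles equality keys: build a set of right-side keys, then keep each left row whose key appears in that set.

```scala
// Hypothetical sketch (not Spark's HashJoin): a hash-based left semi join
// over in-memory collections, illustrating build/probe for equi-joins only.
def hashLeftSemiJoin[L, R, K](
    left: Seq[L],
    right: Seq[R],
    leftKey: L => K,
    rightKey: R => K): Seq[L] = {
  // Build phase: only the distinct right-side keys are needed,
  // because a semi join never outputs right-side columns.
  val buildKeys: Set[K] = right.iterator.map(rightKey).toSet
  // Probe phase: emit each left row at most once if its key was seen.
  left.filter(row => buildKeys.contains(leftKey(row)))
}

// Hypothetical data: (orderId, customer) and (orderId, amount).
val orders = Seq((1, "alice"), (2, "bob"), (3, "carol"))
val payments = Seq((1, 10.0), (1, 5.0), (3, 7.5))
val paid = hashLeftSemiJoin(
  orders,
  payments,
  (o: (Int, String)) => o._1,
  (p: (Int, Double)) => p._1)
// paid == Seq((1, "alice"), (3, "carol"))
```

Because the build side is reduced to a set of keys, order 1 appears once even though it has two payments.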

@marmbrus
Contributor

Thanks for adding this! It would be great if you could create a JIRA to track this new feature. Also, HashJoin is currently only used for inner joins; it would be good to extend it at some point, though maybe not in this PR.

One design question is which of the following is better:

  • multiple operators that handle different kinds of joins, letting the planner pick the correct one
  • putting the switching logic inside of the operator as is done here

I need to look at this code closer, but will not have time to do that until after we start cutting release candidates for 1.0.
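The design question above can be made concrete with a toy sketch. None of these types are Spark's: `JoinType`, `JoinOperator`, `plan`, and the operator names are hypothetical, and both variants use a naive nested loop over in-memory rows.

```scala
// Hypothetical sketch of the two designs; these are not Spark classes.
sealed trait JoinType
case object Inner extends JoinType
case object LeftSemi extends JoinType

type Row = Vector[Any]

trait JoinOperator {
  def execute(left: Seq[Row], right: Seq[Row], cond: (Row, Row) => Boolean): Seq[Row]
}

// Design 1: one operator per join type; the planner picks among them.
object InnerNestedLoop extends JoinOperator {
  def execute(left: Seq[Row], right: Seq[Row], cond: (Row, Row) => Boolean): Seq[Row] =
    for (l <- left; r <- right if cond(l, r)) yield l ++ r
}

object LeftSemiNestedLoop extends JoinOperator {
  def execute(left: Seq[Row], right: Seq[Row], cond: (Row, Row) => Boolean): Seq[Row] =
    left.filter(l => right.exists(r => cond(l, r)))
}

def plan(joinType: JoinType): JoinOperator = joinType match {
  case Inner    => InnerNestedLoop
  case LeftSemi => LeftSemiNestedLoop
}

// Design 2: a single operator that switches on the join type internally.
class SwitchingNestedLoop(joinType: JoinType) extends JoinOperator {
  def execute(left: Seq[Row], right: Seq[Row], cond: (Row, Row) => Boolean): Seq[Row] =
    joinType match {
      case Inner    => for (l <- left; r <- right if cond(l, r)) yield l ++ r
      case LeftSemi => left.filter(l => right.exists(r => cond(l, r)))
    }
}

// Usage: both designs give the same left semi join result.
val leftRows: Seq[Row] = Seq(Vector(1, "a"), Vector(2, "b"))
val rightRows: Seq[Row] = Seq(Vector(1, "x"), Vector(1, "y"))
val sameKey = (a: Row, b: Row) => a(0) == b(0)
val viaPlanner = plan(LeftSemi).execute(leftRows, rightRows, sameKey)
val viaSwitch = new SwitchingNestedLoop(LeftSemi).execute(leftRows, rightRows, sameKey)
```

In this toy version, design 1 keeps each operator small and lets the planner add specialized variants (for example a hash-based one) per join type, while design 2 keeps the operator count down at the cost of branching inside `execute`.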

@adrian-wang
Contributor Author

I'll create a JIRA soon.

@adrian-wang
Contributor Author

Thanks for your comments! Here's SPARK-1495: https://issues.apache.org/jira/browse/SPARK-1495

```scala
// TODO: One bitset per partition instead of per row.
val broadcastedRow = broadcastedRelation.value(i)
if (boundCondition(joinedRow(streamedRow, broadcastedRow))) {
  matchedRows += buildRow(streamedRow)
```
Contributor


There is no need to call buildRow here, as you can just use streamedRow.
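The point of this review comment is that a left semi join outputs only the streamed (left) side's columns, so a matched streamed row can be emitted unchanged instead of being rebuilt. A minimal sketch over plain Scala collections, assuming rows are simple values; `nestedLoopLeftSemiJoin` and `condition` are hypothetical stand-ins for the operator and for `boundCondition`, not Spark's API:

```scala
// Hypothetical sketch of a broadcast nested-loop left semi join.
// `broadcasted` plays the role of broadcastedRelation.value; `condition`
// stands in for boundCondition applied to a (streamed, broadcast) pair.
def nestedLoopLeftSemiJoin[S, B](
    streamed: Iterator[S],
    broadcasted: IndexedSeq[B],
    condition: (S, B) => Boolean): Iterator[S] =
  streamed.filter { streamedRow =>
    // `exists` stops at the first matching broadcast row, so each streamed
    // row is emitted at most once, and it is emitted as-is (no buildRow).
    broadcasted.exists(broadcastedRow => condition(streamedRow, broadcastedRow))
  }

// A non-equi condition, which a hash join on equality keys cannot evaluate.
val prices = Iterator(5, 12, 30)
val thresholds = IndexedSeq(10, 20)
val aboveSomeThreshold =
  nestedLoopLeftSemiJoin(prices, thresholds, (p: Int, t: Int) => p > t).toList
// aboveSomeThreshold == List(12, 30)
```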

@marmbrus
Contributor

marmbrus commented May 8, 2014

Just checking in to see if there is anything I can help with here. Would be cool to have this feature!

@adrian-wang
Contributor Author

I've been busy with some other issues recently; I'll try to fix it this weekend.

@adrian-wang
Contributor Author

I'll switch to a newer branch that includes #418, to split left semi join out from the other joins.

pwendell added a commit to pwendell/spark that referenced this pull request May 12, 2014
…urn_scala

Remove simple redundant return statements for Scala methods/functions

Remove simple redundant return statements for Scala methods/functions:

-) Only change simple return statements at the end of method
-) Ignore the complex if-else check
-) Ignore the ones inside synchronized
-) Add small changes to making var to val if possible and remove () for simple get

This hopefully makes the review simpler =)

Pass compile and tests.
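The kind of cleanup this referenced commit describes can be illustrated with a small, hypothetical Scala class (`CounterBefore` and `CounterAfter` are made up for illustration and are not code from Spark):

```scala
// Hypothetical "before" version: explicit trailing `return`, a `var` that is
// never reassigned, and empty parentheses on a side-effect-free getter.
class CounterBefore(start: Int) {
  private var initial = start
  def getInitial(): Int = {
    return initial
  }
}

// Hypothetical "after" version: the last expression is the method's result,
// `var` becomes `val`, and the simple getter drops its `()`.
class CounterAfter(start: Int) {
  private val initial = start
  def getInitial: Int = initial
}
```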
@adrian-wang
Contributor Author

Just mentioning it here: I have submitted another solution as #837.

@marmbrus
Contributor

marmbrus commented Jun 2, 2014

Mind closing this version if it is subsumed by #837? Thanks!

@adrian-wang adrian-wang closed this Jun 2, 2014
@adrian-wang adrian-wang deleted the leftsemijoin branch June 2, 2014 22:06
@adrian-wang
Contributor Author

Thanks, I have closed this.

asfgit pushed a commit that referenced this pull request Jun 9, 2014
Just submit another solution for #395

Author: Daoyuan <daoyuan.wang@intel.com>
Author: Michael Armbrust <michael@databricks.com>
Author: Daoyuan Wang <daoyuan.wang@intel.com>

Closes #837 from adrian-wang/left-semi-join-support and squashes the following commits:

d39cd12 [Daoyuan Wang] Merge pull request #1 from marmbrus/pr/837
6713c09 [Michael Armbrust] Better debugging for failed query tests.
035b73e [Michael Armbrust] Add test for left semi that can't be done with a hash join.
5ec6fa4 [Michael Armbrust] Add left semi to SQL Parser.
4c726e5 [Daoyuan] improvement according to Michael
8d4a121 [Daoyuan] add golden files for leftsemijoin
83a3c8a [Daoyuan] scala style fix
14cff80 [Daoyuan] add support for left semi join

(cherry picked from commit 0cf6002)
Signed-off-by: Michael Armbrust <michael@databricks.com>
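What separates the left semi join added here from an ordinary inner join: an inner join repeats a left row once per match and appends the right side's columns, while a left semi join emits each qualifying left row once, with only its own columns (the same rows an `EXISTS` filter would keep). A small comparison over plain Scala collections; `employees`, `projects`, `innerJoin`, and `leftSemiJoin` are hypothetical names:

```scala
// Hypothetical data: (employeeId, name) and (employeeId, projectName).
val employees = Seq((1, "ann"), (2, "ben"), (3, "cal"))
val projects = Seq((1, "etl"), (1, "sql"), (3, "ml"))

// Inner join: one output row per matching pair, with columns from both sides.
def innerJoin(
    left: Seq[(Int, String)],
    right: Seq[(Int, String)]): Seq[(Int, String, String)] =
  for ((id, name) <- left; (pid, project) <- right if id == pid)
    yield (id, name, project)

// Left semi join: each left row at most once, left columns only.
def leftSemiJoin(
    left: Seq[(Int, String)],
    right: Seq[(Int, String)]): Seq[(Int, String)] =
  left.filter { case (id, _) => right.exists(_._1 == id) }

val inner = innerJoin(employees, projects)
// Seq((1, "ann", "etl"), (1, "ann", "sql"), (3, "cal", "ml"))
val semi = leftSemiJoin(employees, projects)
// Seq((1, "ann"), (3, "cal"))
```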
asfgit pushed a commit that referenced this pull request Jun 9, 2014
pdeyhim pushed a commit to pdeyhim/spark-1 that referenced this pull request Jun 25, 2014
xiliu82 pushed a commit to xiliu82/spark that referenced this pull request Sep 4, 2014
mccheah pushed a commit to mccheah/spark that referenced this pull request Nov 28, 2018
bzhaoopenstack pushed a commit to bzhaoopenstack/spark that referenced this pull request Sep 11, 2019
4 participants