[SPARK-5583][SQL][WIP] Support unique join in hive context #4354

scwf · 2015-02-04T04:09:47Z

Support unique join in the Hive context. The basic idea is to transform a unique join into an outer join plus a filter in Spark SQL:

FROM UNIQUEJOIN [PRESERVE] T1 a (a.key), [PRESERVE] T2 b (b.key), [PRESERVE] T3 c (c.key) ...

If all the tables have the PRESERVE keyword ==> T1 full outer join T2 full outer join T3 ...
Else if none of the tables has the PRESERVE keyword ==> T1 inner join T2 inner join T3 ...
Else ==>
T = (T1 full outer join T2 full outer join T3 ...)
Filter on T: keep the rows in which the key of at least one PRESERVE table is not null.
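The three-way rule above can be sketched as a small decision function. This is a hypothetical illustration in plain Scala, not the code in the PR: the names `UniqueJoinInput`, `Rewrite`, `planUniqueJoin` and `filterCondition` are made up for this sketch, and it only models which plan shape is chosen and what the filter looks like, not actual Catalyst logical plans.

```scala
// Hypothetical model of the UNIQUEJOIN rewrite; not the PR's HiveQL.scala code.
final case class UniqueJoinInput(alias: String, keyColumn: String, preserve: Boolean)

sealed trait Rewrite
// Every input is PRESERVE: a chain of full outer joins, no filter needed.
case object AllFullOuter extends Rewrite
// No input is PRESERVE: a chain of inner joins.
case object AllInner extends Rewrite
// Mixed: a chain of full outer joins, then keep rows where at least one
// preserved key column is not null.
final case class FullOuterThenFilter(preservedKeys: Seq[String]) extends Rewrite

def planUniqueJoin(inputs: Seq[UniqueJoinInput]): Rewrite =
  if (inputs.forall(_.preserve)) AllFullOuter
  else if (inputs.forall(!_.preserve)) AllInner
  else FullOuterThenFilter(inputs.filter(_.preserve).map(_.keyColumn))

// Renders the filter of the mixed case as a SQL-like predicate string.
def filterCondition(r: Rewrite): Option[String] = r match {
  case FullOuterThenFilter(keys) => Some(keys.map(k => s"$k IS NOT NULL").mkString(" OR "))
  case _                         => None
}

val mixed = Seq(
  UniqueJoinInput("a", "a.key", preserve = false),
  UniqueJoinInput("b", "b.key", preserve = true),
  UniqueJoinInput("c", "c.key", preserve = true))

println(filterCondition(planUniqueJoin(mixed)).getOrElse("<no filter>"))
```

The filter is only needed in the mixed case: with every table preserved, each full-outer-join row already has at least one non-null key, and with none preserved the inner joins already drop unmatched rows.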

For example:
1. T1 a (a.key), PRESERVE T2 b (b.key), PRESERVE T3 c (c.key) ==> keep the row if b.key is not null or c.key is not null
2. T1 a (a.key), T2 b (b.key), PRESERVE T3 c (c.key) ==> keep the row if c.key is not null
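To see what the filter keeps, example 2 can be simulated on in-memory data. This is a hypothetical sketch under a strong simplifying assumption: each table is modeled as a set of distinct, non-null keys (no other columns, no duplicate keys), so a full outer equi-join on the key yields exactly one row per key in the union, with `None` standing in for SQL NULL. The key values are made up for illustration.

```scala
// Hypothetical simulation; each table is reduced to a set of distinct keys.
final case class Table(alias: String, keys: Set[Int], preserve: Boolean)

// Full outer join of all tables on the key: one row per key in the union,
// with Some(key) for tables containing it and None (SQL NULL) otherwise.
def fullOuterJoin(tables: Seq[Table]): Seq[Seq[Option[Int]]] = {
  val allKeys = tables.flatMap(_.keys).distinct.sorted
  allKeys.map(k => tables.map(t => if (t.keys.contains(k)) Some(k) else None))
}

// Applies the rewrite: with no PRESERVE table this degenerates to an inner
// join (all keys present); otherwise keep rows where any preserved key is set.
def uniqueJoin(tables: Seq[Table]): Seq[Seq[Option[Int]]] = {
  val joined = fullOuterJoin(tables)
  val preservedIdx = tables.indices.filter(i => tables(i).preserve)
  if (preservedIdx.isEmpty) joined.filter(_.forall(_.isDefined))
  else joined.filter(row => preservedIdx.exists(i => row(i).isDefined))
}

// Example 2: only T3 (alias c) is PRESERVE.
val example2 = Seq(
  Table("a", Set(1, 2), preserve = false),
  Table("b", Set(2, 3), preserve = false),
  Table("c", Set(3, 4), preserve = true))

// Keeps only the rows for keys 3 and 4, i.e. those where c.key is not null.
println(uniqueJoin(example2))
```

Keys 1 and 2 appear only in non-preserved tables, so their rows have a null `c.key` and are dropped, even though key 2 matches in both T1 and T2.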

Correct me if I am wrong.

TODO: add tests for this.

SparkQA · 2015-02-04T04:12:50Z

Test build #26720 has started for PR 4354 at commit b7e89a9.

This patch merges cleanly.

SparkQA · 2015-02-04T04:13:48Z

Test build #26720 has finished for PR 4354 at commit b7e89a9.

This patch fails Scala style tests.
This patch merges cleanly.
This patch adds no public classes.

AmplabJenkins · 2015-02-04T04:13:49Z

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/26720/

SparkQA · 2015-02-04T04:22:34Z

Test build #26721 has started for PR 4354 at commit 015fe2f.

This patch does not merge cleanly.

SparkQA · 2015-02-04T04:27:32Z

Test build #26722 has started for PR 4354 at commit dd34ebf.

This patch merges cleanly.

rxin · 2015-02-04T04:59:04Z

Is this actually used by anybody?

SparkQA · 2015-02-04T05:13:17Z

Test build #26722 has finished for PR 4354 at commit dd34ebf.

This patch fails Spark unit tests.
This patch merges cleanly.
This patch adds no public classes.

AmplabJenkins · 2015-02-04T05:13:19Z

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/26722/

scwf · 2015-02-04T05:18:02Z

@rxin Not sure about that, but here I just translate it into join + filter in HiveQL.scala, with no changes to catalyst or sql/core. Maybe we can support it, since the cost is small?

rxin · 2015-02-04T05:20:51Z

Do you mind adding more inline comments? My worry is just complexity. If nobody uses this, it's going to be a bunch of code sitting there just for the sake of supporting a Hive feature.

Do any other database systems support this unique join syntax? (Or something similar)

scwf · 2015-02-04T05:26:59Z

As far as I know, this is Hive-specific syntax...

SparkQA · 2015-02-04T05:45:00Z

Test build #26721 has finished for PR 4354 at commit 015fe2f.

This patch fails Spark unit tests.
This patch does not merge cleanly.
This patch adds no public classes.

AmplabJenkins · 2015-02-04T05:45:04Z

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/26721/

rxin · 2015-02-04T05:46:45Z

Yeah, in that case maybe let's not support it. It's hard for me to imagine somebody using this :)

Thanks a lot for investigating this though. We can merge this patch in the future if there is stronger demand.

scwf · 2015-02-04T06:08:17Z

OK, I'm closing this.

scwf added 2 commits February 3, 2015 13:29

b7e89a9 support unique join in hive context
015fe2f fix style
dd34ebf fix conflicts

scwf closed this Feb 4, 2015
