[SPARK-5583][SQL][WIP] Support unique join in hive context #4354

Closed
wants to merge 3 commits into from

scwf
Contributor

@scwf scwf commented Feb 4, 2015

Support unique join in Hive context. The basic idea is to transform a unique join into an outer join plus a filter in Spark SQL:

FROM UNIQUEJOIN [PRESERVE] T1 a (a.key), [PRESERVE] T2 b (b.key), [PRESERVE] T3 c (c.key) ...

If all the tables have the PRESERVE keyword ==> T1 full outer join T2 full outer join T3 ...
else if none of the tables has the PRESERVE keyword ==> T1 inner join T2 inner join T3 ...
else ==>
T = (T1 full outer join T2 full outer join T3 ...)
Filter on T, filter condition = keep the rows where any preserved field is not null.

For example:
1. T1 a (a.key), PRESERVE T2 b (b.key), PRESERVE T3 c (c.key) ==> keep the row if b.key is not null or c.key is not null
2. T1 a (a.key), T2 b (b.key), PRESERVE T3 c (c.key) ==> keep the row if c.key is not null
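
To make the rules above easier to check, here is a minimal sketch of the rewrite in plain Python. It is not the code in this patch: the function names `full_outer_join_on_key` and `unique_join` are hypothetical, each table is modeled as a list of key values only, and it assumes keys are unique within a table and that rows are aligned by key value across all tables.

```python
from typing import List, Optional, Sequence, Tuple

Row = Tuple[Optional[int], ...]


def full_outer_join_on_key(tables: Sequence[Sequence[int]]) -> List[Row]:
    """Align all tables by key; a table that lacks the key contributes None.

    Assumption: keys are unique within each table.
    """
    all_keys = sorted(set().union(*map(set, tables)))
    return [tuple(k if k in t else None for t in tables) for k in all_keys]


def unique_join(tables: Sequence[Sequence[int]],
                preserve: Sequence[bool]) -> List[Row]:
    """Model the unique join as a full outer join followed by a filter."""
    rows = full_outer_join_on_key(tables)
    if all(preserve):
        # Every table is preserved: plain full outer join, no filter.
        return rows
    if not any(preserve):
        # No table is preserved: equivalent to an inner join.
        return [r for r in rows if all(v is not None for v in r)]
    # Mixed case: keep rows where any preserved field is not null.
    return [
        r for r in rows
        if any(v is not None for v, p in zip(r, preserve) if p)
    ]


t1, t2, t3 = [1, 2], [2, 3], [3, 4]

# Example 1: T1, PRESERVE T2, PRESERVE T3
print(unique_join([t1, t2, t3], [False, True, True]))
# Example 2: T1, T2, PRESERVE T3
print(unique_join([t1, t2, t3], [False, False, True]))
```

With these inputs, key 1 exists only in the non-preserved T1, so its row is dropped in both examples; in example 2 the row for key 2 is dropped as well, because c.key is null there.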

Correct me if I am wrong.

TODO: add tests for this.

@SparkQA

SparkQA commented Feb 4, 2015

Test build #26720 has started for PR 4354 at commit b7e89a9.

  • This patch merges cleanly.

@SparkQA

SparkQA commented Feb 4, 2015

Test build #26720 has finished for PR 4354 at commit b7e89a9.

  • This patch fails Scala style tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@AmplabJenkins

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/26720/

@SparkQA

SparkQA commented Feb 4, 2015

Test build #26721 has started for PR 4354 at commit 015fe2f.

  • This patch does not merge cleanly.

@SparkQA

SparkQA commented Feb 4, 2015

Test build #26722 has started for PR 4354 at commit dd34ebf.

  • This patch merges cleanly.

@rxin
Contributor

rxin commented Feb 4, 2015

Is this actually used by anybody?

@SparkQA

SparkQA commented Feb 4, 2015

Test build #26722 has finished for PR 4354 at commit dd34ebf.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@AmplabJenkins

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/26722/

@scwf
Contributor Author

scwf commented Feb 4, 2015

@rxin Not sure about that, but here I just adapt it as a filter + join in HiveQL.scala, with no changes in catalyst or sql/core. Maybe we can support it, since the cost is small?

@rxin
Contributor

rxin commented Feb 4, 2015

Do you mind adding more inline comments? My worry is just complexity. If nobody uses this, it's going to be a bunch of code that is there only for the sake of supporting a Hive feature.

Do any other database systems support this unique join syntax? (Or something similar)

@scwf
Contributor Author

scwf commented Feb 4, 2015

It seems this is Hive-specific syntax, as far as I know...

@SparkQA

SparkQA commented Feb 4, 2015

Test build #26721 has finished for PR 4354 at commit 015fe2f.

  • This patch fails Spark unit tests.
  • This patch does not merge cleanly.
  • This patch adds no public classes.

@AmplabJenkins

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/26721/

@rxin
Contributor

rxin commented Feb 4, 2015

Yeah, in that case maybe let's not support it. It's hard for me to imagine somebody using this :)

Thanks a lot for investigating this though. We can merge this patch in the future if there is stronger demand.

@scwf
Contributor Author

scwf commented Feb 4, 2015

OK, I am closing this.

@scwf scwf closed this Feb 4, 2015
4 participants