[SPARK-5583][SQL][WIP] Support unique join in hive context #4354
Test build #26720 has started for PR 4354 at commit
Test build #26720 has finished for PR 4354 at commit
Test FAILed.
Test build #26721 has started for PR 4354 at commit
Test build #26722 has started for PR 4354 at commit
Is this actually used by anybody?
Test build #26722 has finished for PR 4354 at commit
Test FAILed.
@rxin I'm not sure about that, but here I just adapt it as filter + join in HiveQL.scala, with no changes in catalyst or sql/core. Maybe we can support it, since the cost is small?
Do you mind adding more inline comments? My worry is just complexity. If nobody uses this, it's going to be a bunch of code that is there only for the sake of supporting a thing in Hive. Do any other database systems support this unique join syntax (or something similar)?
It seems this is Hive-specific syntax, as far as I know...
Test build #26721 has finished for PR 4354 at commit
Test FAILed.
Yea, in that case maybe let's not support it. It's hard for me to imagine somebody using this :) Thanks a lot for investigating this though. We can merge this patch in the future if there is stronger demand.
OK, I am closing this.
Support unique join in Hive context. The basic idea is to transform the unique join into an outer join + filter in Spark SQL:
FROM UNIQUEJOIN [PRESERVE] T1 a (a.key), [PRESERVE] T2 b (b.key), [PRESERVE] T3 c (c.key) ...
If all the tables have the PRESERVE keyword ==>
T1 full outer join T2 full outer join T3 ...
else if none of the tables has the PRESERVE keyword ==>
T1 inner join T2 inner join T3 ...
else ==>
T = (T1 full outer join T2 full outer join T3 ...)
Filter on T, with filter condition = keep the rows in which the key field of any PRESERVE table is not null.
For example:
1. T1 a (a.key), PRESERVE T2 b (b.key), PRESERVE T3 c (c.key)
==> if b.key is not null or c.key is not null, we'll keep the row
2. T1 a (a.key), T2 b (b.key), PRESERVE T3 c (c.key)
==> if c.key is not null, we'll keep the row
Correct me if I am wrong.
todos: add tests for this
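To make the rewrite rules in the description easier to check, here is a minimal sketch of them in plain Python. It is not the code from this patch and does not use Spark or Hive. It assumes each table is a dict with unique join keys, and the names `full_outer_join` and `unique_join` are hypothetical, chosen only for illustration. Each output row holds one key slot per table, with `None` standing in for a null.

```python
from typing import Dict, List, Optional, Sequence, Tuple

# Hypothetical in-memory model (an assumption, not Spark/Hive code):
# each table maps a unique join key to some row value.
Table = Dict[int, str]
Row = Tuple[Optional[int], ...]  # one key slot per table, None when unmatched


def full_outer_join(tables: Sequence[Table]) -> List[Row]:
    """Full outer join on the key: one row per key seen in any table."""
    all_keys = sorted(set().union(*(t.keys() for t in tables)))
    return [tuple(k if k in t else None for t in tables) for k in all_keys]


def unique_join(tables: Sequence[Table], preserve: Sequence[bool]) -> List[Row]:
    """Apply the three cases from the PR description.

    preserve[i] is True when table i carries the PRESERVE keyword.
    """
    if len(tables) != len(preserve):
        raise ValueError("need exactly one PRESERVE flag per table")

    rows = full_outer_join(tables)

    if all(preserve):
        # Case 1: every table is preserved, so this is a plain full outer join.
        return rows

    if not any(preserve):
        # Case 2: no table is preserved, so this is an inner join
        # (only keys present in every table survive).
        return [r for r in rows if all(k is not None for k in r)]

    # Case 3: mixed. Keep a row if the key of any PRESERVE table is not null.
    return [
        r for r in rows
        if any(k is not None for k, p in zip(r, preserve) if p)
    ]
```

With `t1 = {1: "a", 2: "b"}`, `t2 = {2: "c", 3: "d"}` and `t3 = {3: "e", 4: "f"}`, calling `unique_join([t1, t2, t3], [False, True, True])` corresponds to example 1 above, and `[False, False, True]` corresponds to example 2. Real tables can contain duplicate keys, which this sketch deliberately ignores.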