DRILL-4232: Support for EXCEPT and INTERSECT set operator #2599
@Leon-WTF Is this ready for review?
@cgivre Not yet. I'm still handling the EXCEPT case, which needs to remove duplicate records on the probe side. (The other three cases work like a semi-join; I just added the record count to the hash map.) I'm trying to add an Agg phase after the setop phase. Any suggestions?
Hi @Leon-WTF, how is this coming?
Hi @Leon-WTF
We are getting ready to release Drill 1.20.3. Once that's done, I'd like to start discussions around Drill 2.0. There has been a lot of major work in Drill and I'd like to see it get used.
Hi @Leon-WTF, thanks for adding this functionality! I have added several code review comments to address before we can merge it.
exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/aggregate/HashAggBatch.java
exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/common/HashTable.java
exec/java-exec/src/main/java/org/apache/drill/exec/planner/logical/DrillSetOpRel.java
exec/java-exec/src/main/java/org/apache/drill/exec/planner/logical/DrillSetOpRel.java
...ava-exec/src/main/java/org/apache/drill/exec/physical/impl/setop/HashSetOpProbeTemplate.java
@vvysotskyi Hi, any more comments on this PR?
Thanks for refactoring and addressing the code review comments. I have found a few more places to improve; once those are addressed, it will be ready to go.
.../java-exec/src/main/java/org/apache/drill/exec/planner/logical/DrillAddAggForExceptRule.java
ImmutableBitSet.range(0, drillExceptRel.getInput(0).getRowType().getFieldList().size()), ImmutableList.of(), ImmutableList.of());
call.transformTo(drillExceptRel.copy(ImmutableList.of(aggNode, drillExceptRel.getInput(1)), true));
} else {
call.transformTo(new DrillAggregateRel(drillExceptRel.getCluster(), drillExceptRel.getTraitSet(), drillExceptRel.copy(true),
Do we need to add aggregate on top of except?
@vvysotskyi It's for performance. If the data cardinality is high, aggregating before EXCEPT may not remove much data; if few rows remain after EXCEPT, aggregating afterwards only has to handle those few rows, which is faster than aggregating before. In the future this could be chosen automatically from statistics by the CBO.
But output values should be already distinct after the execution of except operator, so the aggregate will do nothing.
The left table's rows are not distinct after the except operator. I chose to reuse an aggregate operator to do the distinct.
Oh, ok. If that is specific to our implementation of the except operator, the aggregation added here could be removed by other Calcite rules that assume the results are already distinct.
I think it would be better to add an aggregation when converting it to physical rel nodes.
oh, yes, I will refactor that.
@vvysotskyi One more question about moving the conversion rule to the physical phase: I need to add a physical agg rel node, so I need to add both hash agg (distributed by all keys / a single key) and stream agg, right?
Drill doesn't use streaming aggregate for distinct calls, so only hash agg should be enough.
Drill doesn't use streaming aggregate for distinct calls, so only hash agg should be enough.
@vvysotskyi I see that StreamAggPrule checks aggregate.containsDistinctCall(), but it will still generate a stream agg for SQL like "select a,b,c from foo group by a,b,c".
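The exchange above about where to place the distinct step for EXCEPT can be illustrated with a minimal in-memory sketch. It is not Drill's operator code: the class ExceptDedupSketch and the helpers filterAbsent and distinct are hypothetical names, and plain Java collections stand in for record batches. It assumes the probe step behaves like an anti-join filter that lets left-side duplicates through, as described in the thread.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical illustration only; not part of the Drill code base.
class ExceptDedupSketch {

  // Probe step: keep every left row that has no match on the build (right) side.
  // Duplicates on the left side survive this step.
  static <T> List<T> filterAbsent(List<T> left, List<T> right) {
    Set<T> build = new HashSet<>(right);
    List<T> out = new ArrayList<>();
    for (T row : left) {
      if (!build.contains(row)) {
        out.add(row);
      }
    }
    return out;
  }

  // Stand-in for the aggregate used as a distinct; keeps first-seen order.
  static <T> List<T> distinct(List<T> rows) {
    return new ArrayList<>(new LinkedHashSet<>(rows));
  }

  // Distinct applied after the probe step: it only sees the surviving rows.
  static <T> List<T> exceptAggAfter(List<T> left, List<T> right) {
    return distinct(filterAbsent(left, right));
  }

  // Distinct applied before the probe step: it sees the whole left input.
  static <T> List<T> exceptAggBefore(List<T> left, List<T> right) {
    return filterAbsent(distinct(left), right);
  }
}
```

Both placements produce the same rows; they differ only in how many rows the distinct step has to process, which is the trade-off discussed above.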
exec/java-exec/src/main/java/org/apache/drill/exec/planner/logical/DrillSetOpRule.java
logical/src/main/java/org/apache/drill/common/logical/data/visitors/LogicalVisitor.java
Hey @Leon-WTF, any chance you could address @vvysotskyi's comments soon? This is one of the last PRs slated to be merged for the next release.
I will do it by this weekend, is that ok?
@Leon-WTF This weekend would be great! Very excited to get this merged.
Looks good, +1
@Leon-WTF Thanks for this. @vvysotskyi Thanks for the review. Merging now!
DRILL-4232: Support for EXCEPT and INTERSECT set operator
Description
Set operators can be implemented as a hash set operator or a sorted set operator; this PR implements only the hash version.
For each distinct tuple, compute the number of left-input duplicates (numLeft, probe side) and the number of right-input duplicates (numRight, build side):
INTERSECT: if numRight > 0 and numLeft > 0, output one tuple
INTERSECT ALL: if numRight > 0 and numLeft > 0, output min(numLeft, numRight) tuples
EXCEPT: if numRight = 0 and numLeft > 0, output one tuple
EXCEPT ALL: if numLeft >= numRight, output numLeft - numRight tuples
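The counting rules above can be sketched in plain Java as follows. This is only an illustration of the semantics, under the assumption that both inputs fit in memory as lists; the class SetOpSketch, the enum Kind, and the methods outputCount and apply are hypothetical names, not the HashSetOpProbeTemplate code from this PR.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only; not part of the Drill code base.
class SetOpSketch {

  enum Kind { INTERSECT, INTERSECT_ALL, EXCEPT, EXCEPT_ALL }

  // How many copies of one distinct tuple to emit, given its duplicate counts.
  static long outputCount(Kind kind, long numLeft, long numRight) {
    switch (kind) {
      case INTERSECT:
        return (numRight > 0 && numLeft > 0) ? 1 : 0;
      case INTERSECT_ALL:
        return Math.min(numLeft, numRight);
      case EXCEPT:
        return (numRight == 0 && numLeft > 0) ? 1 : 0;
      case EXCEPT_ALL:
        return Math.max(0L, numLeft - numRight);
      default:
        throw new IllegalArgumentException("Unknown set operator: " + kind);
    }
  }

  // Count duplicates on both sides, then emit each distinct left tuple
  // outputCount(...) times, in first-seen order of the left input.
  static <T> List<T> apply(Kind kind, List<T> left, List<T> right) {
    Map<T, Long> numRight = new HashMap<>();      // build side
    for (T tuple : right) {
      numRight.merge(tuple, 1L, Long::sum);
    }
    Map<T, Long> numLeft = new LinkedHashMap<>(); // probe side
    for (T tuple : left) {
      numLeft.merge(tuple, 1L, Long::sum);
    }
    List<T> out = new ArrayList<>();
    for (Map.Entry<T, Long> e : numLeft.entrySet()) {
      long n = outputCount(kind, e.getValue(), numRight.getOrDefault(e.getKey(), 0L));
      for (long i = 0; i < n; i++) {
        out.add(e.getKey());
      }
    }
    return out;
  }
}
```

For example, with left = [1, 1, 2, 3, 3, 3] and right = [1, 3, 3, 4], INTERSECT ALL emits the tuple 3 twice (min(3, 2)) and EXCEPT emits only the tuple 2.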
Documentation
TODO
Testing
See TestSetOp