New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[CALCITE-3089] Deprecate EquiJoin #1247
Conversation
@@ -49,7 +50,8 @@ | |||
public abstract class FilterJoinRule extends RelOptRule { | |||
/** Predicate that always returns true. With this predicate, every filter | |||
* will be pushed into the ON clause. */ | |||
public static final Predicate TRUE_PREDICATE = (join, joinType, exp) -> true; | |||
public static final Predicate TRUE_PREDICATE = (join, joinType, exp) -> | |||
join.getConvention() != EnumerableConvention.INSTANCE; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This will be changed back once CALCITE-2973 is done.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe add a TODO comment in the code to not forget about it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think we should add in a Calcite convention to a logical rule,
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We thought it is logical rule. Unfortunately, this rule also applies to physical operators. And the default Enumerable operators don't support hash join with non-equi conditions, that is why we have to rule it out.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are we likely to see a fix to CALCITE-2973 reasonably soon? I agree that the code should not depend on EnumerableConvention, but we can live with it short-term.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I checked with Lai Zhou, who said the fix to CALCITE-2973 can be done in a week.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
By the way, even if this is a short-term change, let's rename TRUE_PREDICATE.
It's bad form to redefine TRUE (or FALSE, 0, 1, Pi or most especially Planck's constant).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I saw you renamed TRUE_PREDICATE to TRUE. That doesn't solve the problem. It was called TRUE_PREDICATE because it always returned true. Now it doesn't, so its name is wrong.
Maybe leave TRUE_PREDICATE as is (i.e. returning true), deprecate it, and use an inline lambda.
join.getVariablesSet(), | ||
join.getJoinType()); | ||
RexNode nonEqui = info.getRemaining(cluster.getRexBuilder()); | ||
if (!nonEqui.isAlwaysTrue()) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This if check is redundant.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think so.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This check is necessary. But I prefer use !info.isEqui()
since it is more clear?.
this.leftKeys = Objects.requireNonNull(leftKeys); | ||
this.rightKeys = Objects.requireNonNull(rightKeys); | ||
this.nonEqui = nonEqui; | ||
assert leftKeys.size() == rightKeys.size(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nonEquiConditions or remaining ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nonEquiConditions might be better.
public boolean isEqui() { | ||
return (nonEqui == null || nonEqui.isAlwaysTrue()) | ||
&& leftKeys.size() == rightKeys.size(); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
leftKeys.size()
should always equals with rightKeys.size()
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Correct, this is sanity check.
// no longer an equi-join. SemiJoin is an example of this. | ||
return; | ||
} | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can not remove this, the behavior would be non-consistent, how about other systems use this rule and their join does not support non-equi conditions ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Then the other systems should update their own implementation, as EquiJoin is deprecated.
join.getVariablesSet(), | ||
join.getJoinType()); | ||
RexNode nonEqui = info.getRemaining(cluster.getRexBuilder()); | ||
if (!nonEqui.isAlwaysTrue()) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This check is necessary. But I prefer use !info.isEqui()
since it is more clear?.
@@ -58,35 +58,26 @@ | |||
final RelNode right = newInputs.get(1); | |||
final JoinInfo info = JoinInfo.of(left, right, join.getCondition()); | |||
if (!info.isEqui() && join.getJoinType() != JoinRelType.INNER) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
use join.analyzeCondition()
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
LGTM |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1
@@ -56,8 +55,7 @@ protected EnumerableHashJoin( | |||
RelNode right, | |||
RexNode condition, | |||
Set<CorrelationId> variablesSet, | |||
JoinRelType joinType) | |||
throws InvalidRelException { | |||
JoinRelType joinType) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would it make sense to add in this constructor a check on the condition (to verify that it is actually an equi-join, which is required by the underlying implementation algorithm)?
if (!JoinInfo.of(left, right, condition).isEqui())
throw new IllegalArgumentException("...");
or maybe at least an assert
assert JoinInfo.of(left, right, condition).isEqui();
Idem for EnumerableMergeJoin.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, make sense. Will do
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The change for the rules are kind of breaking, i'm -1 on this.
I don't think it is a breaking change, comparing with other changes like removing SemiJoin. |
I don't think so, removing SemiJoin have a fallback, user can use the Join instead, but with the rules change in this patch, there is no way to fix the regression if they use these rules from Calcite, users only have 2 way to fix this
Either is not an easy way to go. |
The other system's physical implementation should not limit what Calcite can do or can't do. Otherwise, there is no way to move forward. |
I agree with the point that other system's physical implementation should not limit what Calcite can do , but we at least let user choose whether they what to push non-equal join conditions into the join operator, just like Calcite itself, apparently, not every join operator support non-equal join conditions. |
The user can limit the rule to only apply on logical operators, instead of both logical operators and physical operators. And it makes more sense to apply on logical operators only, in which case, the problem you stated is not a concern anymore. |
@danny0405 @hsyuan What if we added a new method to |
Thanks, @rubenada , adding method to |
Ok @danny0405 , I understand |
@amitvc Please take a look, I guess you are interested in this change |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this a breaking change?That is, is there code that compiles and runs against 1.20 that will fail to compile and run against 1.21? If so, let's give this a special mention in the release notes.
This is a useful refactoring; thanks for pushing it through.
@@ -49,7 +50,8 @@ | |||
public abstract class FilterJoinRule extends RelOptRule { | |||
/** Predicate that always returns true. With this predicate, every filter | |||
* will be pushed into the ON clause. */ | |||
public static final Predicate TRUE_PREDICATE = (join, joinType, exp) -> true; | |||
public static final Predicate TRUE_PREDICATE = (join, joinType, exp) -> | |||
join.getConvention() != EnumerableConvention.INSTANCE; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are we likely to see a fix to CALCITE-2973 reasonably soon? I agree that the code should not depend on EnumerableConvention, but we can live with it short-term.
public final ImmutableIntList leftKeys; | ||
public final ImmutableIntList rightKeys; | ||
protected final RexNode nonEquiCondition; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My instinct is to make everything non-null. I don't like the test nonEquiCondition == null || nonEquiCondition.isAlwaysTrue()
, for instance.
Consider making nonEquiCondition
a conjunctive list (immutable, final and not null); if the list is empty, that means true. We do this in several other places, and it is useful, because caller can assume that ANDs have been flattened.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
getRemaining
is a public method and returns RexNode
. If we make nonEquiCondition
a conjunctive list, we have to convert to a RexNode
every time we call getRemainng
. How about checking the nullness of input nonEquiCondition
in the JoinInfo
constructor, set it to literal TRUE
rexnode if it is null
? At the same time, add @Nonnull
to getRemaining
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Revoke what I just said, the is no way to pass RexBuilder
in the constructor.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think you should allow availability of RexBuilder
to sway the design. When a JoinInfo
is created, there is always a RexBuilder
available, and that RexBuilder
(and its RelDataTypeFactory
) remain valid for the lifetime of the JoinInfo
.
Representing conditions as conjunctive lists seems to be very convenient. For example, consider the RelOptUtil.splitJoinCondition
method, which is the source of JoinInfo.remaining
. The last thing it does is to convert a conjunctive list into a RexNode
. Other callers of that method either ignore the return, or immediately convert the RexNode
back into a conjunctive list.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Make sense. I will change nonEquiCondition to conjunctive list. But looks like we have to keep public RexNode getRemaining(RexBuilder rexBuilder)
unchanged, since it is a public API.
@julianhyde I addressed your comments. Thanks! |
public abstract RexNode getRemaining(RexBuilder rexBuilder); | ||
public RexNode getRemaining(RexBuilder rexBuilder) { | ||
return RexUtil.composeConjunction(rexBuilder, nonEquiConditions); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's so weird that we pass in nonEquiConditions
into the constructor and return a RexNode
here, can we have a better solution ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree, but given this is a public API, we can't change it to returning list instead of a RexNode. What we can do is deprecating this method, and introduce another one returning conjunction list.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm +1 on deprecate this method and introduce another one returning conjunction list.
JIRA: https://issues.apache.org/jira/browse/CALCITE-3089
This patch is intended for v1.21.0