Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[CALCITE-3089] Deprecate EquiJoin #1247

Closed
wants to merge 4 commits into from
Closed

[CALCITE-3089] Deprecate EquiJoin #1247

wants to merge 4 commits into from

Conversation

hsyuan
Copy link
Member

@hsyuan hsyuan commented May 31, 2019

JIRA: https://issues.apache.org/jira/browse/CALCITE-3089

This patch is intended for v1.21.0

@@ -49,7 +50,8 @@
public abstract class FilterJoinRule extends RelOptRule {
/** Predicate that always returns true. With this predicate, every filter
* will be pushed into the ON clause. */
public static final Predicate TRUE_PREDICATE = (join, joinType, exp) -> true;
public static final Predicate TRUE_PREDICATE = (join, joinType, exp) ->
join.getConvention() != EnumerableConvention.INSTANCE;
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This will be changed back once CALCITE-2973 is done.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe add a TODO comment in the code to not forget about it?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think we should add in a Calcite convention to a logical rule,

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We thought it is logical rule. Unfortunately, this rule also applies to physical operators. And the default Enumerable operators don't support hash join with non-equi conditions, that is why we have to rule it out.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are we likely to see a fix to CALCITE-2973 reasonably soon? I agree that the code should not depend on EnumerableConvention, but we can live with it short-term.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I checked with Lai Zhou, who said the fix to CALCITE-2973 can be done in a week.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

By the way, even if this is a short-term change, let's rename TRUE_PREDICATE.

It's bad form to redefine TRUE (or FALSE, 0, 1, Pi or most especially Planck's constant).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I saw you renamed TRUE_PREDICATE to TRUE. That doesn't solve the problem. It was called TRUE_PREDICATE because it always returned true. Now it doesn't, so its name is wrong.

Maybe leave TRUE_PREDICATE as is (i.e. returning true), deprecate it, and use an inline lambda.

join.getVariablesSet(),
join.getJoinType());
RexNode nonEqui = info.getRemaining(cluster.getRexBuilder());
if (!nonEqui.isAlwaysTrue()) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This if check is redundant.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think so.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This check is necessary. But I prefer use !info.isEqui() since it is more clear?.

this.leftKeys = Objects.requireNonNull(leftKeys);
this.rightKeys = Objects.requireNonNull(rightKeys);
this.nonEqui = nonEqui;
assert leftKeys.size() == rightKeys.size();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nonEquiConditions or remaining ?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nonEquiConditions might be better.

public boolean isEqui() {
return (nonEqui == null || nonEqui.isAlwaysTrue())
&& leftKeys.size() == rightKeys.size();
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

leftKeys.size() should always equals with rightKeys.size()

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Correct, this is sanity check.

// no longer an equi-join. SemiJoin is an example of this.
return;
}
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can not remove this, the behavior would be non-consistent, how about other systems use this rule and their join does not support non-equi conditions ?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Then the other systems should update their own implementation, as EquiJoin is deprecated.

join.getVariablesSet(),
join.getJoinType());
RexNode nonEqui = info.getRemaining(cluster.getRexBuilder());
if (!nonEqui.isAlwaysTrue()) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This check is necessary. But I prefer use !info.isEqui() since it is more clear?.

@@ -58,35 +58,26 @@
final RelNode right = newInputs.get(1);
final JoinInfo info = JoinInfo.of(left, right, join.getCondition());
if (!info.isEqui() && join.getJoinType() != JoinRelType.INNER) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

use join.analyzeCondition()?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

@chunweilei
Copy link
Contributor

LGTM

Copy link
Contributor

@chunweilei chunweilei left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1

@chunweilei chunweilei added the LGTM-will-merge-soon Overall PR looks OK. Only minor things left. label Aug 6, 2019
@@ -56,8 +55,7 @@ protected EnumerableHashJoin(
RelNode right,
RexNode condition,
Set<CorrelationId> variablesSet,
JoinRelType joinType)
throws InvalidRelException {
JoinRelType joinType) {
Copy link
Contributor

@rubenada rubenada Aug 6, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would it make sense to add in this constructor a check on the condition (to verify that it is actually an equi-join, which is required by the underlying implementation algorithm)?

if (!JoinInfo.of(left, right, condition).isEqui())
  throw new IllegalArgumentException("...");

or maybe at least an assert
assert JoinInfo.of(left, right, condition).isEqui();
Idem for EnumerableMergeJoin.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, make sense. Will do

Copy link
Contributor

@danny0405 danny0405 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The change for the rules are kind of breaking, i'm -1 on this.

@hsyuan
Copy link
Member Author

hsyuan commented Aug 7, 2019

I don't think it is a breaking change, comparing with other changes like removing SemiJoin.

@danny0405
Copy link
Contributor

I don't think it is a breaking change, comparing with other changes like removing SemiJoin.

I don't think so, removing SemiJoin have a fallback, user can use the Join instead, but with the rules change in this patch, there is no way to fix the regression if they use these rules from Calcite, users only have 2 way to fix this

  1. copy the whole rule out and add a logic what you removed
  2. change their physical operators to allow non-equal join condition

Either is not an easy way to go.

@hsyuan
Copy link
Member Author

hsyuan commented Aug 7, 2019

The other system's physical implementation should not limit what Calcite can do or can't do. Otherwise, there is no way to move forward.

@danny0405
Copy link
Contributor

The other system's physical implementation should not limit what Calcite can do or can't do. Otherwise, there is no way to move forward.

I agree with the point that other system's physical implementation should not limit what Calcite can do , but we at least let user choose whether they what to push non-equal join conditions into the join operator, just like Calcite itself, apparently, not every join operator support non-equal join conditions.

@hsyuan
Copy link
Member Author

hsyuan commented Aug 7, 2019

The user can limit the rule to only apply on logical operators, instead of both logical operators and physical operators. And it makes more sense to apply on logical operators only, in which case, the problem you stated is not a concern anymore.

@rubenada
Copy link
Contributor

rubenada commented Aug 7, 2019

@danny0405 @hsyuan What if we added a new method to Join, something like: public boolean supportsNonEquiJoin() with a default implementation that just return true, and then each physical Join implementation that supports only EquiJoin conditions (like EnumerableHashJoin / EnumerableMergeJoin) will have to override it to return false. This method could be use as verification check by the rules, replacing somehow the current RelOptUtil#forceEquiJoin (which is removed by the current patch).

@danny0405
Copy link
Contributor

@danny0405 @hsyuan What if we added a new method to Join, something like: public boolean supportsNonEquiJoin() with a default implementation that just return true, and then each physical Join implementation that supports only EquiJoin conditions (like EnumerableHashJoin / EnumerableMergeJoin) will have to override it to return false. This method could be use as verification check by the rules, replacing somehow the current RelOptUtil#forceEquiJoin (which is removed by the current patch).

Thanks, @rubenada , adding method to Join does not seem a good idea, what we only need is just let user control the behavior of the 2 rules.

@rubenada
Copy link
Contributor

rubenada commented Aug 7, 2019

Ok @danny0405 , I understand

@eolivelli
Copy link
Contributor

@amitvc Please take a look, I guess you are interested in this change

Copy link
Contributor

@julianhyde julianhyde left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this a breaking change?That is, is there code that compiles and runs against 1.20 that will fail to compile and run against 1.21? If so, let's give this a special mention in the release notes.

This is a useful refactoring; thanks for pushing it through.

@@ -49,7 +50,8 @@
public abstract class FilterJoinRule extends RelOptRule {
/** Predicate that always returns true. With this predicate, every filter
* will be pushed into the ON clause. */
public static final Predicate TRUE_PREDICATE = (join, joinType, exp) -> true;
public static final Predicate TRUE_PREDICATE = (join, joinType, exp) ->
join.getConvention() != EnumerableConvention.INSTANCE;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are we likely to see a fix to CALCITE-2973 reasonably soon? I agree that the code should not depend on EnumerableConvention, but we can live with it short-term.

public final ImmutableIntList leftKeys;
public final ImmutableIntList rightKeys;
protected final RexNode nonEquiCondition;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My instinct is to make everything non-null. I don't like the test nonEquiCondition == null || nonEquiCondition.isAlwaysTrue(), for instance.

Consider making nonEquiCondition a conjunctive list (immutable, final and not null); if the list is empty, that means true. We do this in several other places, and it is useful, because caller can assume that ANDs have been flattened.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

getRemaining is a public method and returns RexNode. If we make nonEquiCondition a conjunctive list, we have to convert to a RexNode every time we call getRemainng. How about checking the nullness of input nonEquiCondition in the JoinInfo constructor, set it to literal TRUE rexnode if it is null? At the same time, add @Nonnull to getRemaining.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Revoke what I just said, the is no way to pass RexBuilder in the constructor.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think you should allow availability of RexBuilder to sway the design. When a JoinInfo is created, there is always a RexBuilder available, and that RexBuilder (and its RelDataTypeFactory) remain valid for the lifetime of the JoinInfo.

Representing conditions as conjunctive lists seems to be very convenient. For example, consider the RelOptUtil.splitJoinCondition method, which is the source of JoinInfo.remaining. The last thing it does is to convert a conjunctive list into a RexNode. Other callers of that method either ignore the return, or immediately convert the RexNode back into a conjunctive list.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Make sense. I will change nonEquiCondition to conjunctive list. But looks like we have to keep public RexNode getRemaining(RexBuilder rexBuilder) unchanged, since it is a public API.

@hsyuan
Copy link
Member Author

hsyuan commented Aug 13, 2019

@julianhyde I addressed your comments. Thanks!

public abstract RexNode getRemaining(RexBuilder rexBuilder);
public RexNode getRemaining(RexBuilder rexBuilder) {
return RexUtil.composeConjunction(rexBuilder, nonEquiConditions);
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's so weird that we pass in nonEquiConditions into the constructor and return a RexNode here, can we have a better solution ?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree, but given this is a public API, we can't change it to returning list instead of a RexNode. What we can do is deprecating this method, and introduce another one returning conjunction list.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm +1 on deprecate this method and introduce another one returning conjunction list.

julianhyde pushed a commit to julianhyde/calcite that referenced this pull request Aug 15, 2019
julianhyde pushed a commit to julianhyde/calcite that referenced this pull request Aug 16, 2019
julianhyde pushed a commit to julianhyde/calcite that referenced this pull request Aug 16, 2019
julianhyde pushed a commit to julianhyde/calcite that referenced this pull request Aug 20, 2019
@asfgit asfgit closed this in 36e3109 Aug 20, 2019
wangxlong pushed a commit to wangxlong/calcite that referenced this pull request Feb 13, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
LGTM-will-merge-soon Overall PR looks OK. Only minor things left.
Projects
None yet
6 participants