-
Notifications
You must be signed in to change notification settings - Fork 13.9k
[FLINK-18445][table] Add pre-filter optimization for lookup join #23316
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
swuferhong
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi @lincoln-lil. Thanks for your contribution, I left some comments.
...le-planner/src/main/scala/org/apache/flink/table/planner/codegen/FunctionCodeGenerator.scala
Show resolved
Hide resolved
...runtime/src/main/java/org/apache/flink/table/runtime/generated/GeneratedFilterCondition.java
Show resolved
Hide resolved
| /** Describes a generated {@link FilterCondition}. */ | ||
| public class GeneratedFilterCondition extends GeneratedFunction<FilterCondition> { | ||
|
|
||
| private static final long serialVersionUID = 1L; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Change serialVersionUID to 2L as other GeneratedXXX do.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As Flink code guideline, all new serializable class should contain the uid start from 1L, the existed GeneratedXXXs have the number 2L because they have been changed and increased.
| @JsonProperty(FIELD_NAME_PRE_FILTER_CONDITION) | ||
| @JsonInclude(JsonInclude.Include.NON_NULL) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why need add this JsonInclude? Add Itcase & Ut case to cover it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
pre-filter condition is a nullable attribute, we can omit it in the json plan which will not affect the serialization & deserialization. This has been covered by the LookupJoinJsonPlanTest, agree you that add another case into LookupJoinJsonPlanITCase
...va/org/apache/flink/table/planner/plan/optimize/StreamNonDeterministicUpdatePlanVisitor.java
Show resolved
Hide resolved
| finalPreFilterCondition.orNull, | ||
| finalRemainingCondition.orNull, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why are finalPreFilterCondition and finalRemainingCondition both orNull, but in CommonExexcLookupJoin, only preFilterCondition with json include NON_NULL, but remainingJoinCondition without.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Consider the compatibility, I didn't change the original finalRemainingCondition (also add a NON_NULL json annotation), but the newly added one (finalPreFilterCondition) can be tagged safely.
| val remainingCondition: Option[RexNode] = getRemainingJoinCondition( | ||
| // split remaining condition into pre-filter(used to filter the left input before lookup) and | ||
| // remaining parts(used to filter the joined records) | ||
| val (finalPreFilterCondition, finalRemainingCondition) = splitRemainingJoinCondition( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe we can rename this method. splitRemainingJoinCondition looks like split the finalRemainingCondition instead of split condition into pre-filter and remaining.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just simplified to splitJoinCondition?
...ala/org/apache/flink/table/planner/plan/nodes/physical/common/CommonPhysicalLookupJoin.scala
Outdated
Show resolved
Hide resolved
.../test/java/org/apache/flink/table/planner/plan/nodes/exec/stream/LookupJoinJsonPlanTest.java
Show resolved
Hide resolved
lincoln-lil
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@swuferhong thanks for reviewing this! I've udpated the pr according to your comments.
| @JsonProperty(FIELD_NAME_PRE_FILTER_CONDITION) | ||
| @JsonInclude(JsonInclude.Include.NON_NULL) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
pre-filter condition is a nullable attribute, we can omit it in the json plan which will not affect the serialization & deserialization. This has been covered by the LookupJoinJsonPlanTest, agree you that add another case into LookupJoinJsonPlanITCase
...va/org/apache/flink/table/planner/plan/optimize/StreamNonDeterministicUpdatePlanVisitor.java
Show resolved
Hide resolved
| val remainingCondition: Option[RexNode] = getRemainingJoinCondition( | ||
| // split remaining condition into pre-filter(used to filter the left input before lookup) and | ||
| // remaining parts(used to filter the joined records) | ||
| val (finalPreFilterCondition, finalRemainingCondition) = splitRemainingJoinCondition( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just simplified to splitJoinCondition?
| finalPreFilterCondition.orNull, | ||
| finalRemainingCondition.orNull, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Consider the compatibility, I didn't change the original finalRemainingCondition (also add a NON_NULL json annotation), but the newly added one (finalPreFilterCondition) can be tagged safely.
.../test/java/org/apache/flink/table/planner/plan/nodes/exec/stream/LookupJoinJsonPlanTest.java
Show resolved
Hide resolved
| /** Describes a generated {@link FilterCondition}. */ | ||
| public class GeneratedFilterCondition extends GeneratedFunction<FilterCondition> { | ||
|
|
||
| private static final long serialVersionUID = 1L; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As Flink code guideline, all new serializable class should contain the uid start from 1L, the existed GeneratedXXXs have the number 2L because they have been changed and increased.
...le-planner/src/main/scala/org/apache/flink/table/planner/codegen/FunctionCodeGenerator.scala
Show resolved
Hide resolved
lsyldliu
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@lincoln-lil Thanks for your contribution, the changes look good to me overall. I just left one minor comment. In addition, it make sense to if we can add some tests to cover the batch mode.
|
|
||
| /** remaining join condition except pre-filter & equi-conditions except lookup keys. */ | ||
| @JsonProperty(FIELD_NAME_REMAINING_JOIN_CONDITION) | ||
| private final @Nullable RexNode remainingJoinCondition; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
According to the context, so this field also can add JsonInclude annotation?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The main reason for not adding 'NON_NULL' json annotation is to support older versions of serialized json plan for compatibility reasons.
swuferhong
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM +1
lincoln-lil
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@lsyldliu thanks for your comments! For the testing, the existed batch case LookupJoinITCase#testLeftJoinTemporalTableWithLocalPredicate already covers the new pre-filter condition path, considering both batch and streaming share the same codegen & runtime operator, and also there's no json plan in batch mode, so I didn't add more case for batch.
|
|
||
| /** remaining join condition except pre-filter & equi-conditions except lookup keys. */ | ||
| @JsonProperty(FIELD_NAME_REMAINING_JOIN_CONDITION) | ||
| private final @Nullable RexNode remainingJoinCondition; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The main reason for not adding 'NON_NULL' json annotation is to support older versions of serialized json plan for compatibility reasons.
|
reorg the commits before merging. |
…enerating pre-filter condition for lookup join
fccc44a to
91d5531
Compare
What is the purpose of the change
As the issue shows there's some chance for optimizing the lookup join when do a left join (maybe full outer join as well in future) which has filter condition on left input in the join condition. We can achieve this by adding a prefilter in lookup join operator, this is what has been done in the pr.
Brief change log
Verifying this change
Does this pull request potentially affect one of the following parts:
Documentation