[FLINK-18445][table] Add pre-filter optimization for lookup join #23316

lincoln-lil · 2023-08-29T03:44:05Z

What is the purpose of the change

As the issue describes, a lookup join can be optimized when a left join (and possibly a full outer join in the future) has a filter on the left input in its join condition. This PR achieves that by adding a pre-filter to the lookup join operator.
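To make the intended semantics concrete, here is a minimal sketch of a pre-filtered left lookup join. It is not the operator code from this PR: the class `PreFilterLookupJoinSketch`, the method `leftLookupJoin`, and the in-memory `Map` standing in for the external table are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;

/** Hypothetical stand-in for a lookup join operator; not Flink's actual classes. */
final class PreFilterLookupJoinSketch {

    /** Counts how many times the simulated external table was queried. */
    static int lookupCalls = 0;

    /**
     * LEFT JOIN of {@code left} against {@code dim}, using the left value as the lookup key.
     * Rows that fail {@code preFilter} skip the lookup and are padded with null.
     */
    static List<String> leftLookupJoin(
            List<Integer> left, Map<Integer, String> dim, Predicate<Integer> preFilter) {
        List<String> out = new ArrayList<>();
        for (Integer key : left) {
            String match = null;
            if (preFilter.test(key)) {
                // Only rows that can possibly match pay for the external lookup.
                lookupCalls++;
                match = dim.get(key);
            }
            // Left join: the left row is always emitted, matched or not.
            out.add(key + "," + match);
        }
        return out;
    }

    public static void main(String[] args) {
        Map<Integer, String> dim = Map.of(1, "a", 5, "e");
        // Models: ... LEFT JOIN dim ON left.id = dim.id AND left.id > 3
        List<String> result = leftLookupJoin(List.of(1, 5, 7), dim, id -> id > 3);
        System.out.println(result + " lookups=" + lookupCalls);
    }
}
```

In this sketch the row with id 1 is emitted with a null right side without touching `dim`, so three input rows cause only two lookups.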

Brief change log

add a new FilterCondition to the codegen part
add pre-filter (via codegen) for lookup join operator

Verifying this change

json plan test (LookupJoinJsonPlanTest)
lookup join operator tests (LookupJoinHarnessTest KeyedLookupJoinHarnessTest AsyncLookupJoinHarnessTest)
lookup join itcase (LookupJoinITCase)

Does this pull request potentially affect one of the following parts:

Dependencies (does it add or upgrade a dependency): (no)
The public API, i.e., is any changed class annotated with @Public(Evolving): (no)
The serializers: (no)
The runtime per-record code paths (performance sensitive): (no)
Anything that affects deployment or recovery: JobManager (and its components), Checkpointing, Kubernetes/Yarn, ZooKeeper: (no)
The S3 file system connector: (no)

Documentation

Does this pull request introduce a new feature? (no)

flinkbot · 2023-08-29T03:51:59Z

CI report:

91d5531 Azure: SUCCESS

Bot commands

The @flinkbot bot supports the following commands:

@flinkbot run azure re-run the last Azure build

swuferhong

Hi @lincoln-lil. Thanks for your contribution, I left some comments.

...le-planner/src/main/scala/org/apache/flink/table/planner/codegen/FunctionCodeGenerator.scala

...runtime/src/main/java/org/apache/flink/table/runtime/generated/GeneratedFilterCondition.java

swuferhong · 2023-08-29T11:52:07Z

...runtime/src/main/java/org/apache/flink/table/runtime/generated/GeneratedFilterCondition.java

+/** Describes a generated {@link FilterCondition}. */
+public class GeneratedFilterCondition extends GeneratedFunction<FilterCondition> {
+
+    private static final long serialVersionUID = 1L;


Change serialVersionUID to 2L, as the other GeneratedXXX classes do.

Per the Flink code guidelines, every new serializable class should start its serialVersionUID at 1L. The existing GeneratedXXX classes have 2L because they have since been changed and the value was incremented.

swuferhong · 2023-08-29T12:21:11Z

...rc/main/java/org/apache/flink/table/planner/plan/nodes/exec/common/CommonExecLookupJoin.java

+    @JsonProperty(FIELD_NAME_PRE_FILTER_CONDITION)
+    @JsonInclude(JsonInclude.Include.NON_NULL)


Why do we need to add this JsonInclude? Please add an ITCase and a UT case to cover it.

The pre-filter condition is a nullable attribute, so we can omit it from the JSON plan without affecting serialization and deserialization. This is already covered by LookupJoinJsonPlanTest; I agree with you on adding another case to LookupJoinJsonPlanITCase.
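The round-trip argument can be illustrated with a plain-Java stand-in. This deliberately does not use Jackson; a `Map` plays the role of the JSON object, and the key name `preFilterCondition` as well as the class and method names are assumptions, since the value of FIELD_NAME_PRE_FILTER_CONDITION is not shown in this conversation.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical plan node with one nullable attribute; names are illustrative only. */
final class NullableFieldRoundTripSketch {

    /** Serializes the node, leaving out the pre-filter entry when it is null. */
    static Map<String, String> serialize(String joinType, String preFilterCondition) {
        Map<String, String> json = new LinkedHashMap<>();
        json.put("joinType", joinType);
        if (preFilterCondition != null) {
            json.put("preFilterCondition", preFilterCondition);
        }
        return json;
    }

    /** Reads the pre-filter back; an absent key yields null. */
    static String readPreFilter(Map<String, String> json) {
        return json.get("preFilterCondition");
    }

    public static void main(String[] args) {
        Map<String, String> withoutFilter = serialize("LEFT", null);
        Map<String, String> withFilter = serialize("LEFT", "a > 3");
        System.out.println(withoutFilter.containsKey("preFilterCondition"));
        System.out.println(readPreFilter(withoutFilter));
        System.out.println(readPreFilter(withFilter));
    }
}
```

A plan written without the key reads back as null, which is the same value that was serialized. The same reasoning suggests that a plan produced before the field existed would also read back with a null pre-filter.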

...va/org/apache/flink/table/planner/plan/optimize/StreamNonDeterministicUpdatePlanVisitor.java

swuferhong · 2023-08-29T12:31:05Z

...ala/org/apache/flink/table/planner/plan/nodes/physical/stream/StreamPhysicalLookupJoin.scala

+      finalPreFilterCondition.orNull,
+      finalRemainingCondition.orNull,


Why are finalPreFilterCondition and finalRemainingCondition both passed with orNull, while in CommonExecLookupJoin only preFilterCondition has JsonInclude NON_NULL and remainingJoinCondition does not?

For compatibility, I left the original finalRemainingCondition unchanged (that is, I did not add a NON_NULL JSON annotation to it), while the newly added finalPreFilterCondition can be tagged safely.

swuferhong · 2023-08-29T12:34:35Z

...ala/org/apache/flink/table/planner/plan/nodes/physical/common/CommonPhysicalLookupJoin.scala

-  val remainingCondition: Option[RexNode] = getRemainingJoinCondition(
+  // split remaining condition into pre-filter(used to filter the left input before lookup) and
+  // remaining parts(used to filter the joined records)
+  val (finalPreFilterCondition, finalRemainingCondition) = splitRemainingJoinCondition(


Maybe we can rename this method. splitRemainingJoinCondition reads as if it splits the finalRemainingCondition, rather than splitting the condition into pre-filter and remaining parts.

Just simplify it to splitJoinCondition?
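For readers following the naming discussion, the split can be pictured roughly as below. This is a sketch under assumptions, not the planner code: `Conjunct` is a hypothetical stand-in for a RexNode, and the rule "terms that read only the left input go to the pre-filter" is inferred from the code comment quoted above.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch; Conjunct stands in for a RexNode and is not a Flink or Calcite type. */
final class SplitJoinConditionSketch {

    /** One AND-ed term of the non-equi join condition. */
    static final class Conjunct {
        final String text;
        final boolean usesLeft;
        final boolean usesRight;

        Conjunct(String text, boolean usesLeft, boolean usesRight) {
            this.text = text;
            this.usesLeft = usesLeft;
            this.usesRight = usesRight;
        }
    }

    /** Terms evaluated before the lookup, and terms evaluated on joined records. */
    static final class Split {
        final List<String> preFilter = new ArrayList<>();
        final List<String> remaining = new ArrayList<>();
    }

    static Split splitJoinCondition(List<Conjunct> conjuncts) {
        Split split = new Split();
        for (Conjunct c : conjuncts) {
            // Assumption: only terms that read the left input alone are safe to pre-filter.
            if (c.usesLeft && !c.usesRight) {
                split.preFilter.add(c.text);
            } else {
                split.remaining.add(c.text);
            }
        }
        return split;
    }

    public static void main(String[] args) {
        Split split = splitJoinCondition(List.of(
                new Conjunct("l.age > 18", true, false),
                new Conjunct("l.amount < r.quota", true, true),
                new Conjunct("r.active", false, true)));
        System.out.println("pre-filter: " + split.preFilter);
        System.out.println("remaining:  " + split.remaining);
    }
}
```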

...ala/org/apache/flink/table/planner/plan/nodes/physical/common/CommonPhysicalLookupJoin.scala

.../test/java/org/apache/flink/table/planner/plan/nodes/exec/stream/LookupJoinJsonPlanTest.java

lincoln-lil

@swuferhong thanks for reviewing this! I've updated the PR according to your comments.

lincoln-lil · 2023-08-29T13:10:00Z

...rc/main/java/org/apache/flink/table/planner/plan/nodes/exec/common/CommonExecLookupJoin.java

+    @JsonProperty(FIELD_NAME_PRE_FILTER_CONDITION)
+    @JsonInclude(JsonInclude.Include.NON_NULL)


The pre-filter condition is a nullable attribute, so we can omit it from the JSON plan without affecting serialization and deserialization. This is already covered by LookupJoinJsonPlanTest; I agree with you on adding another case to LookupJoinJsonPlanITCase.

...va/org/apache/flink/table/planner/plan/optimize/StreamNonDeterministicUpdatePlanVisitor.java

lincoln-lil · 2023-08-29T13:17:55Z

...ala/org/apache/flink/table/planner/plan/nodes/physical/common/CommonPhysicalLookupJoin.scala

-  val remainingCondition: Option[RexNode] = getRemainingJoinCondition(
+  // split remaining condition into pre-filter(used to filter the left input before lookup) and
+  // remaining parts(used to filter the joined records)
+  val (finalPreFilterCondition, finalRemainingCondition) = splitRemainingJoinCondition(


Just simplify it to splitJoinCondition?

lincoln-lil · 2023-08-29T13:21:17Z

...ala/org/apache/flink/table/planner/plan/nodes/physical/stream/StreamPhysicalLookupJoin.scala

+      finalPreFilterCondition.orNull,
+      finalRemainingCondition.orNull,


For compatibility, I left the original finalRemainingCondition unchanged (that is, I did not add a NON_NULL JSON annotation to it), while the newly added finalPreFilterCondition can be tagged safely.

.../test/java/org/apache/flink/table/planner/plan/nodes/exec/stream/LookupJoinJsonPlanTest.java

lincoln-lil · 2023-08-29T13:32:04Z

...runtime/src/main/java/org/apache/flink/table/runtime/generated/GeneratedFilterCondition.java

+/** Describes a generated {@link FilterCondition}. */
+public class GeneratedFilterCondition extends GeneratedFunction<FilterCondition> {
+
+    private static final long serialVersionUID = 1L;


Per the Flink code guidelines, every new serializable class should start its serialVersionUID at 1L. The existing GeneratedXXX classes have 2L because they have since been changed and the value was incremented.

...le-planner/src/main/scala/org/apache/flink/table/planner/codegen/FunctionCodeGenerator.scala

lsyldliu

@lincoln-lil Thanks for your contribution, the changes look good to me overall. I just left one minor comment. In addition, it would make sense to add some tests to cover batch mode.

lsyldliu · 2023-08-30T09:40:09Z

...rc/main/java/org/apache/flink/table/planner/plan/nodes/exec/common/CommonExecLookupJoin.java

+
+    /** remaining join condition except pre-filter & equi-conditions except lookup keys. */
+    @JsonProperty(FIELD_NAME_REMAINING_JOIN_CONDITION)
+    private final @Nullable RexNode remainingJoinCondition;


Given the context, can this field also carry the JsonInclude annotation?

The main reason for not adding the NON_NULL JSON annotation here is compatibility: it keeps older versions of the serialized JSON plan supported.

swuferhong

LGTM +1

lincoln-lil

@lsyldliu thanks for your comments! As for testing, the existing batch case LookupJoinITCase#testLeftJoinTemporalTableWithLocalPredicate already covers the new pre-filter condition path. Batch and streaming share the same codegen and runtime operator, and there is no JSON plan in batch mode, so I didn't add more cases for batch.

lincoln-lil · 2023-08-30T10:06:10Z

...rc/main/java/org/apache/flink/table/planner/plan/nodes/exec/common/CommonExecLookupJoin.java

+
+    /** remaining join condition except pre-filter & equi-conditions except lookup keys. */
+    @JsonProperty(FIELD_NAME_REMAINING_JOIN_CONDITION)
+    private final @Nullable RexNode remainingJoinCondition;


The main reason for not adding the NON_NULL JSON annotation here is compatibility: it keeps older versions of the serialized JSON plan supported.

lincoln-lil · 2023-08-30T12:11:44Z

reorg the commits before merging.


swuferhong reviewed Aug 29, 2023

lincoln-lil commented Aug 29, 2023

lsyldliu reviewed Aug 30, 2023

swuferhong approved these changes Aug 30, 2023

lincoln-lil commented Aug 30, 2023

lincoln-lil added 2 commits August 30, 2023 20:12

[FLINK-18445][table] Pre-step: add a new FilterCondition to support generating pre-filter condition for lookup join

e785dc4

[FLINK-18445][table] Add pre-filtering optimization for lookup join

91d5531

lincoln-lil force-pushed the FLINK-18445 branch from fccc44a to 91d5531 Compare August 30, 2023 12:12

lincoln-lil merged commit 360b97a into apache:master Aug 31, 2023

lincoln-lil deleted the FLINK-18445 branch August 31, 2023 01:04

flinkbot added the component=TableSQL/Runtime label Apr 4, 2024
