Skip to content

Conversation

@snuyanzin
Copy link
Contributor

@snuyanzin snuyanzin commented Mar 15, 2023

What is the purpose of the change

The main idea is to backport optimization for FlinkFilterJoiRule from the Calcite's FilterJoinRule done in https://issues.apache.org/jira/browse/CALCITE-4499

Most of the things are done in core/src/main/java/org/apache/calcite/plan/RelOptUtil.java https://github.com/apache/calcite/pull/2349/files#diff-6da50f2ef212d40bcdda85600f1cc1b2022712667fd975fccc3d446122d12facR2804

Here only change usages to the improved method

Brief change log

FlinkFilterJoinRule
plans

Verifying this change

org.apache.flink.table.planner.plan.batch.sql.join.SemiAntiJoinTestBase
org.apache.flink.table.planner.plan.batch.sql.join.BroadcastHashSemiAntiJoinTest
org.apache.flink.table.planner.plan.batch.sql.join.NestedLoopSemiAntiJoinTest
org.apache.flink.table.planner.plan.batch.sql.join.SemiAntiJoinTest
org.apache.flink.table.planner.plan.batch.sql.join.ShuffledHashSemiAntiJoinTest
org.apache.flink.table.planner.plan.batch.sql.join.SortMergeSemiAntiJoinTest

Does this pull request potentially affect one of the following parts:

  • Dependencies (does it add or upgrade a dependency): ( no)
  • The public API, i.e., is any changed class annotated with @Public(Evolving): ( no)
  • The serializers: ( no)
  • The runtime per-record code paths (performance sensitive): ( no )
  • Anything that affects deployment or recovery: JobManager (and its components), Checkpointing, Kubernetes/Yarn, ZooKeeper: (no )
  • The S3 file system connector: (no )

Documentation

  • Does this pull request introduce a new feature? ( no)
  • If yes, how is the feature documented? (not applicable)

…filter to semijoin input to FlinkFilterJoinRule
@flinkbot
Copy link
Collaborator

flinkbot commented Mar 15, 2023

CI report:

Bot commands The @flinkbot bot supports the following commands:
  • @flinkbot run azure re-run the last Azure build

@snuyanzin
Copy link
Contributor Author

@lincoln-lil , @godfreyhe sorry for the poke,
can someone of you have a look please?

Copy link
Contributor

@lincoln-lil lincoln-lil left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@snuyanzin thanks for driving this!
I don't see any problem with this optimization itself, but during local validation I found that the empty output calc node like 'Calc(select=[]...' generated in plan test is not covered in the current stream itcase(SemiAntiJoinStreamITCase, the batch test SemiJoinITCase has covered the case), so it is better to add the corresponding tests like such sql case SELECT a + 10, c FROM l WHERE NOT( b > 10 OR (c like 'abc' OR NOT EXISTS (SELECT d FROM r))), WDYT?

…to empty cover output calc node like `Calc(select=[]...` in stream case
@snuyanzin
Copy link
Contributor Author

Thanks for having a look @lincoln-lil
Yes, makes sense.
I added a test to SemiAntiJoinStreamITCase

Copy link
Contributor

@lincoln-lil lincoln-lil left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@snuyanzin snuyanzin merged commit e91eb5e into apache:master Mar 22, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants