[FLINK-30841][table-planner] Fix incorrect calc merge to avoid wrong plans #21799

Merged
merged 1 commit into from Feb 3, 2023


lincoln-lil
Contributor

@lincoln-lil lincoln-lil commented Jan 31, 2023

What is the purpose of the change

Currently we have a FlinkCalcMergeRuleTest. Take one of its tests as an example:

  @Test
  def testCalcMergeWithNonDeterministicExpr1(): Unit = {
    val sqlQuery = "SELECT a, a1 FROM (SELECT a, random_udf(a) AS a1 FROM MyTable) t WHERE a1 > 10"
    util.verifyRelPlan(sqlQuery)
  }

The final optimized plan is currently wrong:

Calc(select=[a, random_udf(b) AS a1], where=[(random_udf(b) > 10)])
+- LegacyTableSourceScan(table=[[default_catalog, default_database, MyTable, source: [TestTableSource(a, b, c)]]], fields=[a, b, c])

The merged Calc contains two random_udf calls, and each call is evaluated independently. Users may therefore get a row that satisfied the WHERE predicate (> 10) but whose selected column is <= 10, which is counter-intuitive.
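To make the problem concrete, here is a minimal sketch in plain Scala with no Flink dependency. `ScriptedUdf` is a hypothetical stand-in for random_udf that returns a fixed, repeating sequence so the run is repeatable; `Row`, `mergedCalc` and `twoStageCalc` are illustrative names, not planner code. It only mimics how the two plan shapes evaluate the call per record.

```scala
object NonDeterministicMergeDemo {
  // Hypothetical stand-in for random_udf: ignores its argument and returns
  // the next value of a scripted, repeating sequence on every call.
  final class ScriptedUdf(values: Seq[Int]) {
    private var calls = 0
    def apply(b: Int): Int = {
      val v = values(calls % values.length)
      calls += 1
      v
    }
  }

  final case class Row(a: Int, b: Int)

  // Shape of the wrongly merged plan: the call appears in both the
  // predicate and the projection, so it is evaluated twice per record.
  def mergedCalc(rows: Seq[Row], udf: ScriptedUdf): Seq[(Int, Int)] =
    rows.flatMap { r =>
      if (udf(r.b) > 10) Some((r.a, udf(r.b))) else None
    }

  // Shape of the expected plan: the lower Calc evaluates the call once,
  // the upper Calc filters and projects the already computed a1.
  def twoStageCalc(rows: Seq[Row], udf: ScriptedUdf): Seq[(Int, Int)] =
    rows
      .map(r => (r.a, udf(r.b)))
      .filter { case (_, a1) => a1 > 10 }

  def main(args: Array[String]): Unit = {
    val rows = Seq(Row(1, 0), Row(2, 0))
    // Each plan gets a fresh udf that alternates between 20 and 5.
    println(mergedCalc(rows, new ScriptedUdf(Seq(20, 5))))
    println(twoStageCalc(rows, new ScriptedUdf(Seq(20, 5))))
  }
}
```

With the scripted values 20, 5, 20, 5, the merged shape emits rows whose a1 is 5 even though the predicate saw 20, while the two-stage shape only emits rows whose a1 really is greater than 10.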

The expected plan is:

Calc(select=[a, a1], where=[(a1 > 10)])
+- Calc(select=[a, random_udf(b) AS a1])
   +- LegacyTableSourceScan(table=[[default_catalog, default_database, MyTable, source: [TestTableSource(a, b, c)]]], fields=[a, b, c])

Brief change log

  • add two new rules related to this problem: FlinkFilterCalcMergeRule and FlinkFilterProjectTransposeRule
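The kind of guard such rules need can be sketched as follows. This is an illustration under stated assumptions, not the implementation in this PR: the tiny expression model (`Expr`, `InputRef`, `Literal`, `Call`) and the `isMergeable` helper are hypothetical and do not use Calcite's RexNode API. The assumed criterion is that two Calcs can be merged only if every non-deterministic expression of the lower Calc is referenced at most once by the upper Calc's projections and condition.

```scala
object CalcMergeGuardSketch {
  // Hypothetical, minimal expression model (not Calcite's RexNode).
  sealed trait Expr
  final case class InputRef(index: Int) extends Expr
  final case class Literal(value: Int) extends Expr
  final case class Call(name: String, args: Seq[Expr], deterministic: Boolean = true)
      extends Expr

  // An expression is deterministic if it and all of its operands are.
  def isDeterministic(e: Expr): Boolean = e match {
    case Call(_, args, det) => det && args.forall(isDeterministic)
    case _                  => true
  }

  // Collect every input reference, keeping duplicates so they can be counted.
  def inputRefs(e: Expr): Seq[Int] = e match {
    case InputRef(i)      => Seq(i)
    case Call(_, args, _) => args.flatMap(inputRefs)
    case Literal(_)       => Seq.empty
  }

  // topExprs: projections plus condition of the upper Calc, expressed
  // against the output fields of the lower Calc.
  // bottomProjects: output expressions of the lower Calc.
  def isMergeable(topExprs: Seq[Expr], bottomProjects: Seq[Expr]): Boolean = {
    val refCounts: Map[Int, Int] =
      topExprs.flatMap(inputRefs).groupBy(identity).map { case (i, refs) => i -> refs.size }
    bottomProjects.zipWithIndex.forall { case (expr, idx) =>
      // Inlining duplicates the expression once per reference, which is
      // only safe if it is deterministic or referenced at most once.
      isDeterministic(expr) || refCounts.getOrElse(idx, 0) <= 1
    }
  }
}
```

For the query above, the lower Calc outputs [a, random_udf(b) AS a1] and the upper Calc references a1 in both its projection and its condition, so this check returns false and the two Calcs stay separate, matching the expected plan.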

Verifying this change

Add a new plan test, CalcMergeTest

Does this pull request potentially affect one of the following parts:

  • Dependencies (does it add or upgrade a dependency): (no)
  • The public API, i.e., is any changed class annotated with @Public(Evolving): (no)
  • The serializers: (no)
  • The runtime per-record code paths (performance sensitive): (no)
  • Anything that affects deployment or recovery: JobManager (and its components), Checkpointing, Kubernetes/Yarn, ZooKeeper: (no)
  • The S3 file system connector: (no)

Documentation

  • Does this pull request introduce a new feature? (no)

@lincoln-lil lincoln-lil marked this pull request as ready for review January 31, 2023 12:10
@lincoln-lil lincoln-lil changed the title from "[FLINK-30841][table-planner] Fix incorrect calc merge in streaming to avoid wrong plans" to "[FLINK-30841][table-planner] Fix incorrect calc merge to avoid wrong plans" on Feb 1, 2023
@lincoln-lil
Contributor Author

I'll rebase onto the latest master after review. Now that the Calcite 1.29 upgrade is done, some work will be needed to adapt to the new rule config changes (the new immutable annotation may cause some inconvenience in the IDE).

Contributor

@godfreyhe godfreyhe left a comment

LGTM

@lincoln-lil
Contributor Author

Rebased onto the latest master, which includes the patches fixing the unstable tests.

@lincoln-lil lincoln-lil merged commit c3a376f into apache:master Feb 3, 2023
@lincoln-lil lincoln-lil deleted the FLINK-30841 branch February 3, 2023 13:15
akkinenivijay pushed a commit to krisnaru/flink that referenced this pull request Feb 11, 2023