[SPARK-39259][SQL][FOLLOWUP] Fix source and binary incompatibilities in transformDownWithSubqueries #36765

JoshRosen · 2022-06-04T01:44:01Z

What changes were proposed in this pull request?

This is a followup to #36654. That PR modified the existing QueryPlan.transformDownWithSubqueries to add additional arguments for tree pattern pruning.

In this PR, I roll back the change to that method's signature and instead add a new transformDownWithSubqueriesAndPruning method.

Why are the changes needed?

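The original change breaks binary and source compatibility in Catalyst. Technically speaking, Catalyst APIs are considered internal to Spark and are subject to change between minor releases (see source: https://github.com/apache/spark/blob/bb51add5c79558df863d37965603387d40cc4387/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/package.scala#L20-L24), but I think it's nice to try to avoid API breakage when possible.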

While trying to compile some custom Catalyst code, I ran into issues when calling the transformDownWithSubqueries method without supplying a tree pattern filter condition. If I write transformDownWithSubqueries() { f }, I get a compilation error. I think this is because every parameter in the first parameter group has a default value.

My PR's solution of adding a new transformDownWithSubqueriesAndPruning method solves this problem. It's also more consistent with the naming convention used for other pruning-enabled tree transformation methods.
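The approach described above can be sketched on a toy tree. This is only an illustration of the pattern, not Spark's code: `Node`, `transformDown`, and `transformDownWithPruning` are hypothetical stand-ins for `QueryPlan` and its methods, and the pruning condition here is a plain predicate on nodes rather than a tree pattern filter. The point is that the original one-argument method keeps its signature (so existing callers still compile and link) and simply delegates to a new, separately named method that carries the extra argument.

```scala
// Toy stand-in for a Catalyst plan node (hypothetical; not Spark's QueryPlan).
final case class Node(name: String, children: List[Node] = Nil) {

  // Original signature, left untouched so existing call sites keep working.
  // It delegates to the pruning variant with a condition that never prunes.
  def transformDown(rule: PartialFunction[Node, Node]): Node =
    transformDownWithPruning(_ => true)(rule)

  // New method: the extra argument lives here instead of on the old method.
  // Subtrees whose root fails `cond` are returned unchanged.
  def transformDownWithPruning(cond: Node => Boolean)(
      rule: PartialFunction[Node, Node]): Node =
    if (!cond(this)) this
    else {
      // Apply the rule to this node first; leave it unchanged if the rule does not match.
      val afterRule = rule.applyOrElse(this, (n: Node) => n)
      // Then recurse into the (possibly new) children.
      afterRule.copy(
        children = afterRule.children.map(_.transformDownWithPruning(cond)(rule)))
    }
}

object Demo {
  val plan: Node = Node("a", List(Node("b"), Node("c", List(Node("b")))))

  // Old-style call: a single block argument, no pruning condition needed.
  val renamed: Node = plan.transformDown { case Node("b", cs) => Node("B", cs) }

  // New-style call: skip every subtree rooted at a node named "c".
  val pruned: Node =
    plan.transformDownWithPruning(_.name != "c") { case Node("b", cs) => Node("B", cs) }
}
```

In this sketch the old call shape `plan.transformDown { ... }` keeps working as before, while callers that want pruning opt in by using the longer method name.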

Does this PR introduce any user-facing change?

No.

How was this patch tested?

Existing tests.

MaxGekk · 2022-06-04T06:11:44Z

+1, LGTM. Merging to master/3.3.
Thank you, @JoshRosen.

…in transformDownWithSubqueries

Closes #36765 from JoshRosen/SPARK-39259-binary-compatibility-followup.

Authored-by: Josh Rosen <joshrosen@databricks.com>
Signed-off-by: Max Gekk <max.gekk@gmail.com>
(cherry picked from commit eda6c4b)
Signed-off-by: Max Gekk <max.gekk@gmail.com>

HyukjinKwon · 2022-06-04T07:46:20Z

sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/QueryPlan.scala

+   * first to this node, then this node's subqueries and finally this node's children.
+   * When the partial function does not apply to a given node, it is left unchanged.
+   */
+  def transformDownWithSubqueriesAndPruning(


I was about to say we shouldn't make these changes for binary compatibility of an internal API (e.g., #35378), but reading the code, it looks more like a refactoring. So LGTM from me too.

Fix source and binary incompatibilities

583a9c7

JoshRosen requested review from MaxGekk and dongjoon-hyun June 4, 2022 01:44

github-actions bot added the SQL label Jun 4, 2022

JoshRosen mentioned this pull request Jun 4, 2022

[SPARK-39259][SQL] Evaluate timestamps consistently in subqueries #36654

Closed

JoshRosen changed the title ~~[SPARK-39259][FOLLOWUP] Fix source and binary incompatibilities in transformDownWithSubqueries~~ [SPARK-39259][SQL][FOLLOWUP] Fix source and binary incompatibilities in transformDownWithSubqueries Jun 4, 2022

MaxGekk approved these changes Jun 4, 2022

MaxGekk closed this in eda6c4b Jun 4, 2022

HyukjinKwon reviewed Jun 4, 2022

JoshRosen mentioned this pull request Jun 6, 2022

[SPARK-39259][SQL][3.2] Evaluate timestamps consistently in subqueries #36753

Closed

dongjoon-hyun mentioned this pull request Jun 9, 2022

Revert "[SPARK-37670][SQL] Support predicate pushdown and column pruning for de-duped CTEs" #36819

Closed
