Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SPARK-45760][SQL][FOLLOWUP] Inline With inside conditional branches #43978

Closed
wants to merge 2 commits into from

Conversation

cloud-fan
Copy link
Contributor

What changes were proposed in this pull request?

This is a followup of #43623 to fix a regression. For With inside conditional branches, they may not be evaluated at all and we should not pull out the common expressions into a Project, but just inline.

Why are the changes needed?

avoid perf regression

Does this PR introduce any user-facing change?

No

How was this patch tested?

new test

Was this patch authored or co-authored using generative AI tooling?

No

@github-actions github-actions bot added the SQL label Nov 23, 2023
@cloud-fan
Copy link
Contributor Author

@viirya @peter-toth

Comment on lines +99 to +100
val newAlwaysEvaluatedInputs = c.alwaysEvaluatedInputs.map(
rewriteWithExprAndInputPlans(_, inputPlans))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is dealing with common expressions only in always evaluated input e.g., predicate of If.

How about common expressions shared between predicate and branches?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I thought about it before. The problem is that it's hard to update the original ConditionalExpression with the new shared common expressions. alwaysEvaluatedInputs is static so that I can let every ConditionalExpression to implement a method to update it.

e match {
case w: With =>
// Rewrite nested With expression in CommonExpressionDef first.
val defs = w.defs.map(rewriteWithExprAndInputPlans(_, inputPlans))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Now that we have "manual" recursion (instead of transformExpressionsUp()), shall we deal with nested Withs in w.child too?

Copy link
Contributor

@peter-toth peter-toth Nov 23, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, the current logic seems to behave correctly if there is an inner With in an outer With's child and the inner has a definition with a reference to an outer definition . (The previous transformExpressionsUp() had issues in that case.) But the rule is not idempotent now, so maybe we should recurse into w.child after replacing CommonExpressionRefs?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a good catch! It seems doesn't matter when to recurse into w.child, either before replacing CommonExpressionRef or after is fine?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe before is better, as the expression tree may be much larger after replacing CommonExpressionRef

Copy link
Contributor

@peter-toth peter-toth Nov 24, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure. E.g. if we have With(With(x + x, Seq(x = y + y)), Seq(y = a + 1)) where x and y are references and a is an attribute and we would recurse into With(x + x, Seq(x = y + y)) before replacing the y references to actual attributes, that aliases a + 1, then the childProjectionIndex calculation for y + y won't find the right child, will it? But an UT covering this case would be good. :)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

oh correlated nested With! I'm not sure if we want to support it or not... But at least we should fail if we don't want to support it.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Then we may need a test for that (either supported or failed if not).

Comment on lines +104 to +106
case With(child, defs) =>
// For With in the conditional branches, they may not be evaluated at all and we can't
// pull the common expressions into a project which will always be evaluated. Inline it.
Copy link
Member

@viirya viirya Nov 24, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, for specific conditional expression, e.g., If, we can still extract common expression shared on both branches which will be evaluated for sure?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same as https://github.com/apache/spark/pull/43978/files#r1403392772 .

It's easy to find these common expressions shared on both branches, but it's hard to put them back to If. I think it's better to do it when we make it into a general rule that find shared common expressions and create With to deduplicate.

@Kimahriman
Copy link
Contributor

Trying to follow along with this since duplicate expression evaluations have been a huge pain for us for a while (mostly on the execution side, but having incomprehensible explain strings isn't fun either). Am I understanding this correctly that this effectively negates the NullIf use of the With expression? And this still won't have any real uses until some follow up (like "a general rule that find shared common expressions")?

@cloud-fan
Copy link
Contributor Author

And this still won't have any real uses until some follow up (like "a general rule that find shared common expressions")?

NullIf uses it already. We can avoid duplicating expressions in NullIf if it's not used in unsupported plan nodes like join condition.

@Kimahriman
Copy link
Contributor

NullIf uses it already. We can avoid duplicating expressions in NullIf if it's not used in unsupported plan nodes like join condition.

So is this just saying not to recurse into conditional expressions to find With expressions, but a With wrapping a conditional expression will still have the common expression pulled into a new project?

@cloud-fan
Copy link
Contributor Author

@Kimahriman Yes. With is a very conservative solution for now and fixes the problem only for NullIf in certain cases. We will improve it and expand the usage later.

@cloud-fan
Copy link
Contributor Author

The doc issue is unrelated

don't know which module to import for autodocumenting 'apply_batch' (try placing a "module" or "currentmodule" directive in the document, or giving an explicit module name)
make: *** [Makefile:35: html] Error 2

cc @HyukjinKwon @LuciferYang

I'm merging it to master, thanks for review!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
4 participants