
Add LogicalPlanSignature and use in the optimizer loop #5623

Merged
2 commits merged into apache:main on Mar 21, 2023

Conversation

mslapek
Contributor

@mslapek mslapek commented Mar 16, 2023

Which issue does this PR close?

Closes #3892

Removes a blocker for #4628 (allows keeping only one LogicalPlan in RAM)

An attractive alternative to returning Option<LogicalPlan> in the optimizer loop (see the comment #4628 (comment))

Rationale for this change

Explained in the issue.

What changes are included in this PR?

Added a new structure, LogicalPlanSignature, with a compute method.

Used LogicalPlanSignature in the main optimization loop to detect cycles with a hash set.

(Maybe we should raise the default max_passes in the optimizer, now that we can detect cycles?)
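The loop change described above can be sketched as follows. This is a minimal sketch, not the PR's actual code: `Plan`, `signature`, and `apply_rules` are hypothetical stand-ins for the real `LogicalPlan`, `LogicalPlanSignature`, and rule application.

```rust
use std::collections::HashSet;

// Hypothetical stand-in for a LogicalPlan: just a string form of the plan.
type Plan = String;

// Hypothetical signature: a hash of the plan, mirroring the idea of
// LogicalPlanSignature (the real type also ignores metadata).
fn signature(plan: &Plan) -> u64 {
    use std::hash::{Hash, Hasher};
    let mut h = std::collections::hash_map::DefaultHasher::new();
    plan.hash(&mut h);
    h.finish()
}

// Hypothetical rule application that cycles between two plans: a -> b -> a.
fn apply_rules(plan: &Plan) -> Plan {
    match plan.as_str() {
        "a" => "b".to_string(),
        _ => "a".to_string(),
    }
}

fn optimize(mut plan: Plan, max_passes: usize) -> (Plan, usize) {
    let mut previous = HashSet::new();
    previous.insert(signature(&plan));
    for pass in 0..max_passes {
        let new_plan = apply_rules(&plan);
        // HashSet::insert returns false if this signature was seen before,
        // i.e. the optimizer is cycling and further passes are pointless.
        if !previous.insert(signature(&new_plan)) {
            return (new_plan, pass + 1);
        }
        plan = new_plan;
    }
    (plan, max_passes)
}

fn main() {
    let (plan, passes) = optimize("a".to_string(), 100);
    println!("stopped after {passes} passes at plan {plan}");
}
```

With cycle detection, the loop exits on the second pass instead of burning through all `max_passes` iterations, which is why raising the default pass limit becomes a reasonable option.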

Are these changes tested?

Added a unit test for computation in LogicalPlanSignature (for the get_node_number function).

Added unit tests to optimizer.rs exercising its cycle-detection ability.

Existing tests guard against regressions.

Are there any user-facing changes?

No API changes. The new module containing LogicalPlanSignature is private (mod rather than pub use).

@github-actions github-actions bot added the optimizer Optimizer rules label Mar 16, 2023
@mslapek mslapek force-pushed the plan-signature branch 3 times, most recently from 8c0864a to 9239495 on March 17, 2023 07:44
Contributor

@alamb alamb left a comment


I really like this idea, and the code is well commented and tested - thank you @mslapek . 🏅

In general I think the chance of hash collisions is very small.

Even if a collision happens, the result is only that a plan is not optimized as much as it otherwise might be (the loop exits earlier than it otherwise would), which seems like a reasonable outcome, especially given the low chance of collisions.
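A back-of-the-envelope estimate (not from the PR) supports the low-collision claim: for a 64-bit hash and n signatures in one optimizer run, the birthday bound puts the collision probability at roughly n(n-1)/2 divided by 2^64.

```rust
// Birthday-bound approximation of the probability that any two of n
// uniformly random 64-bit hashes collide: n*(n-1)/2 pairs, each colliding
// with probability 2^-64.
fn collision_probability(n: u64) -> f64 {
    (n as f64) * (n as f64 - 1.0) / 2.0 / 2f64.powi(64)
}

fn main() {
    // Even 1000 signatures in a single run gives a vanishingly small chance
    // (on the order of 1e-14) of a spurious early exit.
    println!("{:e}", collision_probability(1000));
}
```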

@@ -419,6 +412,34 @@ impl Optimizer {
}
}

/// Returns an error if plans have different schemas.
Contributor


This is a nice refactor

Contributor Author


The Boy Scout Rule: "Leave the campground cleaner than you found it" 🤣

/// When two [`LogicalPlan`]s differ only in metadata, they will have the
/// same [`LogicalPlanSignature`] (due to the hash implementation in
/// [`LogicalPlan`]).
pub fn compute(plan: &LogicalPlan) -> Self {
Contributor


I think pub fn new() would be more idiomatic for constructing a new signature
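The suggested constructor style could look roughly like this. This is a sketch, not the PR's code: the stand-in `LogicalPlan` type and the use of `DefaultHasher` are assumptions for illustration.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

// Hypothetical stand-in for datafusion's LogicalPlan, just for illustration.
#[derive(Hash)]
struct LogicalPlan(&'static str);

// Sketch of the review suggestion: expose a `new` constructor (the idiomatic
// Rust name) instead of `compute`, hashing the plan it is built from.
#[derive(PartialEq, Eq, Debug)]
struct LogicalPlanSignature(u64);

impl LogicalPlanSignature {
    fn new(plan: &LogicalPlan) -> Self {
        let mut hasher = DefaultHasher::new();
        plan.hash(&mut hasher);
        Self(hasher.finish())
    }
}
```

Equal plans produce equal signatures, so the signature can serve as a cheap hash-set key in the optimizer loop.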

Member

@jackwener jackwener left a comment


Great job, thanks @mslapek

@alamb alamb merged commit 30dba58 into apache:main Mar 21, 2023
@mslapek
Contributor Author

mslapek commented Mar 22, 2023

@alamb @jackwener Thanks for the review! 🪀

// HashSet::insert returns whether the value was newly inserted.
let plan_is_fresh =
previous_plans.insert(LogicalPlanSignature::new(&new_plan));
if !plan_is_fresh {
Contributor


At this point we have detected an optimization cycle. Cycles are bad, so we exit. Exiting isn't ideal, because our plan isn't fully optimized yet; we simply stopped applying rules.

IMO a cycle should be an error condition.

Contributor Author


Yes, a cycle indicates a logical error in the optimizer, because each step should improve the plan.

At the same time, it does NOT imply a correctness error. The plan will probably still be correct (just not optimal).

IMO a cycle should be an error condition.

We should take the perspective of a user. If a data scientist is doing research, should we harm the availability of the database?

IMO a log of this situation could be useful, with a configuration flag to turn it into an error...

@findepi Btw, what motivated you to review this PR, merged one year ago?

Contributor


@mslapek thanks for the response!

What motivated you to review this PR, merged one year ago?

I find the source PR an efficient way to ask a question about the code and reach the people with context -- code author(s) and reviewers -- in one shot.

Yes, a cycle indicates a logical error in the optimizer, because each step should improve the plan.

Agreed!

At the same time, it does NOT imply a correctness error. The plan will probably still be correct (just not optimal).

Agreed!

We should take the perspective of a user. If a data scientist is doing research, should we harm the availability of the database?

A suboptimal plan can be orders of magnitude more expensive to execute, so allowing it to run may cause unavailability for others, but I see your point. It is especially difficult to transition from lenient treatment of bugs to a stricter one; it should be gradual. I like the idea of having this initially controlled by a flag. In tests and on CI the flag should be set to "fail"; then we can switch it on at runtime as well.

@alamb @jackwener thoughts?

Contributor


Thank you for bringing this up @mslapek and @findepi

A suboptimal plan can be orders of magnitude more expensive to execute, so allowing it to run may cause unavailability for others, but I see your point. It is especially difficult to transition from lenient treatment of bugs to a stricter one; it should be gradual. I like the idea of having this initially controlled by a flag. In tests and on CI the flag should be set to "fail"; then we can switch it on at runtime as well.

My thoughts are that I can see the tradeoffs of both behaviors:

  1. Fail fast (raise an error if a cycle is detected), which, as @findepi notes, increases the likelihood that such an error would be found and fixed quickly (as it would prevent the query in question from running)
  2. Lenient (ignore the error): the user still gets their answer (though it may take much longer)

One middle ground might be, as @findepi suggests, to use a flag -- we could default to raising an error if a cycle is detected, but give users a way to ignore the error and proceed anyway. As long as the error message says how to work around it, I think that would be fine.

In fact, we already have a similar setting for failed optimizer rules, added for many of the same reasons discussed here, that we could model the behavior on: datafusion.optimizer.skip_failed_rules (https://datafusion.apache.org/user-guide/configs.html)

datafusion.optimizer.skip_failed_rules (default: false) -- When set to true, the logical plan optimizer will produce warning messages if any optimization rules produce errors and then proceed to the next rule. When set to false, any rules that produce errors will cause the query to fail.
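The flag-controlled behavior being discussed might look roughly like this plain-Rust sketch. The enum and names (`OptimizeOutcome`, `handle_cycle`, `allow_cycles`) are hypothetical illustrations, not DataFusion APIs.

```rust
// Hypothetical outcome of the optimizer loop when a cycle is detected.
#[derive(Debug, PartialEq)]
enum OptimizeOutcome {
    Done(String),
    CycleError(String),
}

// Strict by default (fail fast so rule bugs surface quickly), with an
// opt-out flag, modeled on the existing skip_failed_rules setting.
fn handle_cycle(plan: String, allow_cycles: bool) -> OptimizeOutcome {
    if allow_cycles {
        // Lenient mode: warn and return the (correct but suboptimal) plan.
        eprintln!("warning: optimizer cycle detected; returning current plan");
        OptimizeOutcome::Done(plan)
    } else {
        // Strict mode: the error message tells the user how to work around it.
        OptimizeOutcome::CycleError(format!(
            "optimizer cycle detected at plan {plan}; enable the lenient flag to proceed anyway"
        ))
    }
}
```

In tests and on CI the flag would stay strict, while production deployments could opt into the lenient mode, matching the gradual transition suggested above.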

Contributor


filed #11285 for this

Labels
optimizer Optimizer rules

Successfully merging this pull request may close these issues.

Improve efficiency of multiple optimizer passes
4 participants