feat: RewriteCycle API for short-circuiting optimizer loops #10386

erratic-pattern · 2024-05-06T01:12:34Z

Which issue does this PR close?

Closes #1160.

Rationale for this change

This is a follow-up to #10358 with a new approach that should short-circuit earlier. See the previous discussion there.

What changes are included in this PR?

Are these changes tested?

yes

Are there any user-facing changes?

no

erratic-pattern · 2024-05-06T02:12:04Z

Benchmark results between the two PRs and main

jayzhan211 · 2024-05-06T03:00:20Z

datafusion/optimizer/src/simplify_expressions/expr_simplifier.rs

+impl RewriteCycle {
+    fn new() -> Self {
+        RewriteCycle {
+            // use usize::MAX as default to avoid checking null in is_done() comparison


Is cost the reason for avoiding the null check? I'm fine either way, but curious about the choice.

I think modeling this with num_rewriters: Option<usize> would make it clearer what is going on and help the compiler check the logic

> I think modeling this with num_rewriters: Option<usize> would make it clearer what is going on and help the compiler check the logic

Yes, I tried to get too fancy here with micro-optimizations when I saw benchmark regressions 😆

I think cycle_length would also be a clearer name for this variable 🤔 I think it is confusing that it is only initialized after the first iteration
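For readers following along, here is a minimal sketch of what the Option<usize> modeling suggested above could look like. The struct name `CycleState` is hypothetical and the fields are a simplified stand-in, not the PR's actual code; the point is only that `None` makes the "cycle length not known yet" case explicit instead of hiding it behind usize::MAX.

```rust
/// Hypothetical, simplified stand-in for the state discussed above.
#[derive(Debug, Default)]
struct CycleState {
    /// Number of rewrites in a row that left the node unchanged.
    consecutive_unchanged_count: usize,
    /// Rewriters per cycle; `None` until the first cycle has been counted.
    cycle_length: Option<usize>,
}

impl CycleState {
    /// Done once every rewriter in a full cycle has reported "unchanged".
    /// While the cycle length is still unknown we can never be done.
    fn is_done(&self) -> bool {
        match self.cycle_length {
            Some(len) => self.consecutive_unchanged_count >= len,
            None => false,
        }
    }
}
```

With the sentinel version the `None` arm is implicit (no count ever reaches usize::MAX); here the compiler forces the case to be handled.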

jayzhan211 · 2024-05-06T03:07:59Z

datafusion/optimizer/src/simplify_expressions/expr_simplifier.rs

+    }
+
+    fn is_done(&self) -> bool {
+        self.consecutive_unchanged_count >= self.num_rewriters


The loop currently stops once the count of untransformed results reaches the number of rewriters, but can we just stop as soon as any rewriter returns transformed = false? I think that is more straightforward, and we would not need a max_iter number either.

The idea is something like:

loop {
    transformed |= rewrite_rule()
    if not transformed { return }
}

🤔 I think there are cases where simplification does something that allows more constant propagation to proceed, so even if transformed=false is returned, another application of a different pass could still simplify things. For example, if the const evaluator returns false, simplify could still return true.

> so in this case for example if the const evaluator returns false, simplify could still return true

In this case, transformed is true, so we should run another pass.
I think once we find that no rule transformed anything in a pass, the optimization is done.

> I think once we find that no rule transformed anything in a pass, the optimization is done.

Yes, I agree -- I think this is what the counter is attempting to do

> 🤔 I think there are cases where simplification does something that allows more constant propagation to proceed, so even if transformed=false is returned, another application of a different pass could still simplify things. For example, if the const evaluator returns false, simplify could still return true.

Yes, I found this to be true when running tests against this code: an additional constant evaluation produced a new transformation.
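To make the point of this thread concrete, here is a small self-contained sketch. Everything in it is hypothetical (a toy `Expr` type and two toy rewriters), not DataFusion code, and it omits the max-cycle cap. It compares stopping at the first rewriter that reports no change with stopping only after every rewriter in a row has reported no change.

```rust
#[derive(Debug, Clone, PartialEq)]
enum Expr {
    Lit(i64),
    Add(Box<Expr>, Box<Expr>),
    Mul(Box<Expr>, Box<Expr>),
}

/// Toy constant evaluator: folds `Lit + Lit`. Returns (expr, transformed).
fn const_eval(e: Expr) -> (Expr, bool) {
    match e {
        Expr::Add(l, r) => {
            let (l, tl) = const_eval(*l);
            let (r, tr) = const_eval(*r);
            match (l, r) {
                (Expr::Lit(a), Expr::Lit(b)) => (Expr::Lit(a + b), true),
                (l, r) => (Expr::Add(Box::new(l), Box::new(r)), tl || tr),
            }
        }
        Expr::Mul(l, r) => {
            let (l, tl) = const_eval(*l);
            let (r, tr) = const_eval(*r);
            (Expr::Mul(Box::new(l), Box::new(r)), tl || tr)
        }
        lit => (lit, false),
    }
}

/// Toy simplifier: rewrites `x * 1` to `x`. Returns (expr, transformed).
fn simplify(e: Expr) -> (Expr, bool) {
    match e {
        Expr::Mul(l, r) => {
            let (l, tl) = simplify(*l);
            let (r, tr) = simplify(*r);
            match r {
                Expr::Lit(1) => (l, true),
                r => (Expr::Mul(Box::new(l), Box::new(r)), tl || tr),
            }
        }
        Expr::Add(l, r) => {
            let (l, tl) = simplify(*l);
            let (r, tr) = simplify(*r);
            (Expr::Add(Box::new(l), Box::new(r)), tl || tr)
        }
        lit => (lit, false),
    }
}

/// Stops at the first rewriter that reports no change.
fn stop_at_first_unchanged(mut e: Expr) -> Expr {
    let rules: [fn(Expr) -> (Expr, bool); 2] = [const_eval, simplify];
    loop {
        for rule in &rules {
            let (next, transformed) = rule(e);
            e = next;
            if !transformed {
                return e;
            }
        }
    }
}

/// Stops only after `rules.len()` rewriters in a row reported no change.
fn stop_after_full_unchanged_cycle(mut e: Expr) -> Expr {
    let rules: [fn(Expr) -> (Expr, bool); 2] = [const_eval, simplify];
    let mut consecutive_unchanged = 0;
    loop {
        for rule in &rules {
            let (next, transformed) = rule(e);
            e = next;
            consecutive_unchanged = if transformed { 0 } else { consecutive_unchanged + 1 };
            if consecutive_unchanged >= rules.len() {
                return e;
            }
        }
    }
}
```

For `2 * 1 + 3` the first strategy returns the input untouched, because the toy constant evaluator runs first and finds nothing to fold. The counter-based strategy lets the simplifier remove `* 1`, after which constant evaluation folds the sum to `5`.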

alamb

Thanks @erratic-pattern and @jayzhan211

I am going to run the planning benchmarks on this branch using a VM and report performance numbers

alamb · 2024-05-06T19:09:37Z

datafusion/optimizer/src/simplify_expressions/expr_simplifier.rs

+impl RewriteCycle {
+    fn new() -> Self {
+        RewriteCycle {
+            // use usize::MAX as default to avoid checking null in is_done() comparison


I think modeling this with num_rewriters: Option<usize> would make it clearer what is going on and help the compiler check the logic

alamb · 2024-05-06T19:47:45Z

Here are my measurements on my GCP machine (it does seem to help by 1-2%)

Details

++ critcmp main loop-expr-simplifier-static-dispatch
group                                         loop-expr-simplifier-static-dispatch    main
-----                                         ------------------------------------    ----
logical_aggregate_with_join                   1.00  1211.4±14.74µs        ? ?/sec     1.00  1214.7±64.80µs        ? ?/sec
logical_plan_tpcds_all                        1.00    156.4±1.31ms        ? ?/sec     1.01    158.4±2.20ms        ? ?/sec
logical_plan_tpch_all                         1.00     16.9±0.19ms        ? ?/sec     1.00     16.9±0.17ms        ? ?/sec
logical_select_all_from_1000                  1.00     18.6±0.17ms        ? ?/sec     1.01     18.8±0.13ms        ? ?/sec
logical_select_one_from_700                   1.00   814.0±10.94µs        ? ?/sec     1.00   815.5±32.65µs        ? ?/sec
logical_trivial_join_high_numbered_columns    1.00   758.0±10.12µs        ? ?/sec     1.00   759.1±19.82µs        ? ?/sec
logical_trivial_join_low_numbered_columns     1.00    743.0±8.02µs        ? ?/sec     1.01    749.9±9.83µs        ? ?/sec
physical_plan_tpcds_all                       1.00   1336.8±8.42ms        ? ?/sec     1.01   1354.8±8.50ms        ? ?/sec
physical_plan_tpch_all                        1.00     90.2±1.61ms        ? ?/sec     1.03     93.1±1.37ms        ? ?/sec
physical_plan_tpch_q1                         1.00      4.9±0.07ms        ? ?/sec     1.06      5.1±0.09ms        ? ?/sec
physical_plan_tpch_q10                        1.00      4.3±0.06ms        ? ?/sec     1.02      4.4±0.08ms        ? ?/sec
physical_plan_tpch_q11                        1.00      3.9±0.05ms        ? ?/sec     1.02      4.0±0.06ms        ? ?/sec
physical_plan_tpch_q12                        1.00      3.1±0.07ms        ? ?/sec     1.00      3.1±0.08ms        ? ?/sec
physical_plan_tpch_q13                        1.00      2.1±0.03ms        ? ?/sec     1.04      2.2±0.06ms        ? ?/sec
physical_plan_tpch_q14                        1.00      2.7±0.05ms        ? ?/sec     1.06      2.8±0.05ms        ? ?/sec
physical_plan_tpch_q16                        1.00      3.7±0.08ms        ? ?/sec     1.04      3.8±0.09ms        ? ?/sec
physical_plan_tpch_q17                        1.00      3.5±0.05ms        ? ?/sec     1.02      3.6±0.06ms        ? ?/sec
physical_plan_tpch_q18                        1.00      3.9±0.05ms        ? ?/sec     1.00      3.9±0.07ms        ? ?/sec
physical_plan_tpch_q19                        1.00      6.0±0.07ms        ? ?/sec     1.03      6.2±0.08ms        ? ?/sec
physical_plan_tpch_q2                         1.00      7.7±0.10ms        ? ?/sec     1.02      7.9±0.06ms        ? ?/sec
physical_plan_tpch_q20                        1.00      4.5±0.07ms        ? ?/sec     1.04      4.7±0.09ms        ? ?/sec
physical_plan_tpch_q21                        1.00      6.1±0.07ms        ? ?/sec     1.04      6.3±0.06ms        ? ?/sec
physical_plan_tpch_q22                        1.00      3.3±0.06ms        ? ?/sec     1.04      3.4±0.07ms        ? ?/sec
physical_plan_tpch_q3                         1.00      3.1±0.05ms        ? ?/sec     1.01      3.1±0.06ms        ? ?/sec
physical_plan_tpch_q4                         1.00      2.3±0.05ms        ? ?/sec     1.02      2.3±0.05ms        ? ?/sec
physical_plan_tpch_q5                         1.00      4.5±0.07ms        ? ?/sec     1.02      4.5±0.07ms        ? ?/sec
physical_plan_tpch_q6                         1.00  1538.7±18.47µs        ? ?/sec     1.03  1586.4±21.54µs        ? ?/sec
physical_plan_tpch_q7                         1.00      5.7±0.08ms        ? ?/sec     1.01      5.7±0.13ms        ? ?/sec
physical_plan_tpch_q8                         1.00      7.3±0.10ms        ? ?/sec     1.01      7.4±0.08ms        ? ?/sec
physical_plan_tpch_q9                         1.00      5.6±0.09ms        ? ?/sec     1.00      5.6±0.08ms        ? ?/sec
physical_select_all_from_1000                 1.00     60.6±0.25ms        ? ?/sec     1.01     61.4±0.29ms        ? ?/sec
physical_select_one_from_700                  1.00      3.6±0.05ms        ? ?/sec     1.02      3.7±0.04ms        ? ?/sec

alamb · 2024-05-06T21:02:21Z

My second run was consistent:

Details

++ critcmp main loop-expr-simplifier-static-dispatch
group                                         loop-expr-simplifier-static-dispatch    main
-----                                         ------------------------------------    ----
logical_aggregate_with_join                   1.00   1213.6±9.49µs        ? ?/sec     1.01  1220.2±37.22µs        ? ?/sec
logical_plan_tpcds_all                        1.00    158.8±1.65ms        ? ?/sec     1.01    160.0±1.55ms        ? ?/sec
logical_plan_tpch_all                         1.00     16.9±0.21ms        ? ?/sec     1.00     16.9±0.20ms        ? ?/sec
logical_select_all_from_1000                  1.00     18.8±0.12ms        ? ?/sec     1.00     18.7±0.10ms        ? ?/sec
logical_select_one_from_700                   1.00   809.7±10.72µs        ? ?/sec     1.00   812.4±11.32µs        ? ?/sec
logical_trivial_join_high_numbered_columns    1.00    756.6±7.03µs        ? ?/sec     1.01    761.9±7.95µs        ? ?/sec
logical_trivial_join_low_numbered_columns     1.00   742.9±13.09µs        ? ?/sec     1.01   748.3±14.13µs        ? ?/sec
physical_plan_tpcds_all                       1.00   1337.9±9.37ms        ? ?/sec     1.01   1346.9±8.49ms        ? ?/sec
physical_plan_tpch_all                        1.00     92.2±1.47ms        ? ?/sec     1.00     92.0±1.27ms        ? ?/sec
physical_plan_tpch_q1                         1.00      5.0±0.09ms        ? ?/sec     1.03      5.1±0.07ms        ? ?/sec
physical_plan_tpch_q10                        1.00      4.4±0.11ms        ? ?/sec     1.00      4.4±0.08ms        ? ?/sec
physical_plan_tpch_q11                        1.00      3.9±0.06ms        ? ?/sec     1.02      4.0±0.08ms        ? ?/sec
physical_plan_tpch_q12                        1.00      3.1±0.06ms        ? ?/sec     1.01      3.1±0.05ms        ? ?/sec
physical_plan_tpch_q13                        1.00      2.1±0.04ms        ? ?/sec     1.00      2.1±0.04ms        ? ?/sec
physical_plan_tpch_q14                        1.00      2.7±0.05ms        ? ?/sec     1.04      2.8±0.06ms        ? ?/sec
physical_plan_tpch_q16                        1.00      3.7±0.08ms        ? ?/sec     1.03      3.8±0.07ms        ? ?/sec
physical_plan_tpch_q17                        1.01      3.6±0.07ms        ? ?/sec     1.00      3.6±0.07ms        ? ?/sec
physical_plan_tpch_q18                        1.00      4.0±0.07ms        ? ?/sec     1.01      4.0±0.05ms        ? ?/sec
physical_plan_tpch_q19                        1.00      6.2±0.07ms        ? ?/sec     1.02      6.3±0.07ms        ? ?/sec
physical_plan_tpch_q2                         1.00      7.8±0.09ms        ? ?/sec     1.01      7.9±0.07ms        ? ?/sec
physical_plan_tpch_q20                        1.00      4.6±0.08ms        ? ?/sec     1.00      4.6±0.07ms        ? ?/sec
physical_plan_tpch_q21                        1.00      6.2±0.08ms        ? ?/sec     1.01      6.3±0.09ms        ? ?/sec
physical_plan_tpch_q22                        1.00      3.4±0.08ms        ? ?/sec     1.02      3.5±0.09ms        ? ?/sec
physical_plan_tpch_q3                         1.00      3.1±0.07ms        ? ?/sec     1.00      3.2±0.05ms        ? ?/sec
physical_plan_tpch_q4                         1.00      2.3±0.06ms        ? ?/sec     1.01      2.3±0.05ms        ? ?/sec
physical_plan_tpch_q5                         1.00      4.4±0.06ms        ? ?/sec     1.03      4.6±0.05ms        ? ?/sec
physical_plan_tpch_q6                         1.00  1551.7±25.84µs        ? ?/sec     1.03  1592.4±22.64µs        ? ?/sec
physical_plan_tpch_q7                         1.00      5.7±0.06ms        ? ?/sec     1.00      5.7±0.09ms        ? ?/sec
physical_plan_tpch_q8                         1.00      7.4±0.08ms        ? ?/sec     1.02      7.5±0.07ms        ? ?/sec
physical_plan_tpch_q9                         1.00      5.7±0.09ms        ? ?/sec     1.00      5.7±0.08ms        ? ?/sec
physical_select_all_from_1000                 1.00     61.2±0.81ms        ? ?/sec     1.00     61.3±0.33ms        ? ?/sec
physical_select_one_from_700                  1.01      3.7±0.05ms        ? ?/sec     1.00      3.6±0.04ms        ? ?/sec

alamb · 2024-05-07T10:44:28Z

Other PR has been merged in: #10358

I think we can merge / rebase this PR now and mark it ready for review

erratic-pattern · 2024-05-13T16:25:30Z

I will take another look at this and see if I can clean it up a bit more.

erratic-pattern · 2024-05-15T05:26:05Z

datafusion/common/src/tree_node.rs

@@ -503,7 +503,7 @@ pub trait TreeNodeVisitor: Sized {
 ///
 /// # See Also:
 /// * [`TreeNode::visit`] to inspect borrowed `TreeNode`s
-pub trait TreeNodeRewriter: Sized {


Not sure why this trait had Sized as a supertrait, but it can't if we want to use it as a trait object

erratic-pattern · 2024-05-15T05:26:23Z

datafusion/common/src/tree_node.rs

@@ -172,7 +172,7 @@ pub trait TreeNode: Sized {
    /// TreeNodeRewriter::f_up(ChildNode2)
    /// TreeNodeRewriter::f_up(ParentNode)
    /// ```
-    fn rewrite<R: TreeNodeRewriter<Node = Self>>(
+    fn rewrite<R: TreeNodeRewriter<Node = Self> + ?Sized>(


Needed in order to be callable with a trait object
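As background for the two tree_node.rs changes above, here is a minimal sketch with a hypothetical `Rewriter` trait (not the real TreeNodeRewriter). A trait with a `Sized` supertrait cannot be turned into a trait object, and a generic parameter is implicitly `Sized` unless the bound is relaxed with `?Sized`.

```rust
/// Hypothetical, minimal rewriter trait. Without a `Sized` supertrait it
/// can be used as a trait object (`dyn Rewriter`).
trait Rewriter {
    fn rewrite_one(&mut self, n: i64) -> i64;
}

struct AddOne;

impl Rewriter for AddOne {
    fn rewrite_one(&mut self, n: i64) -> i64 {
        n + 1
    }
}

/// Generic parameters are implicitly `Sized`; `?Sized` relaxes that so
/// `R` may also be the unsized type `dyn Rewriter`.
fn rewrite<R: Rewriter + ?Sized>(n: i64, rewriter: &mut R) -> i64 {
    rewriter.rewrite_one(n)
}

/// Static dispatch: `R` is the concrete type `AddOne`.
fn run_static(n: i64) -> i64 {
    rewrite(n, &mut AddOne)
}

/// Dynamic dispatch: `R` is `dyn Rewriter`, which only compiles
/// because of the `?Sized` bound on `rewrite`.
fn run_dynamic(n: i64) -> i64 {
    let mut boxed: Box<dyn Rewriter> = Box::new(AddOne);
    rewrite(n, &mut *boxed)
}
```

Removing `+ ?Sized` from `rewrite` makes `run_dynamic` fail to compile, and declaring `trait Rewriter: Sized` makes `Box<dyn Rewriter>` itself illegal.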

erratic-pattern · 2024-05-15T05:32:43Z

Alright, I think this is in a good state now. I added a dynamic dispatch API, which I think could be useful if we ever add a TreeNodeRewriter impl for OptimizerRule: then we could run the whole optimizer through this cycle logic.

I made some stuff public for doctests, but if we don't want to do that I can just mark the doctest as ignored

erratic-pattern · 2024-05-15T05:56:05Z

I removed the dynamic dispatch API for now because I think there is a better way to write it, but I can't really do that until we get to a point where we have a TreeNodeRewriter impl for Arc<dyn OptimizerRule>

jayzhan211 · 2024-05-15T09:30:00Z

datafusion/optimizer/src/rewrite_cycle.rs

+        }
+        // run remaining cycles
+        match (1..self.max_cycles).try_fold(state, |state, _| f(state)) {
+            ControlFlow::Break(result) => result?.finish(),


should we return here?

I think when we hit controlflow::break, it means is_done, so we can early return the result

ControlFlow::Break(result) => return result?.finish(),

@erratic-pattern how about this?

I tried this suggestion locally and it seems to work well 👍

Suggested change

-            ControlFlow::Break(result) => result?.finish(),
+            ControlFlow::Break(result) => return result?.finish(),

> I think when we hit controlflow::break, it means is_done, so we can early return the result
>
> ControlFlow::Break(result) => return result?.finish(),

It is not needed because of implicit return semantics, since we are already at the end of the function. I can add an explicit return, as it probably makes the code easier to read, but it may trigger a linter error.

you are right, ControlFlow::Break does the early return
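A small standalone illustration of the two mechanisms in this thread, using a hypothetical `sum_up_to` function rather than the PR's code: `try_fold` stops iterating at the first `ControlFlow::Break`, and a `match` in tail position yields the function's return value without the `return` keyword.

```rust
use std::ops::ControlFlow;

/// Adds numbers until the running total would exceed `limit`.
fn sum_up_to(limit: u32, items: &[u32]) -> Result<u32, String> {
    let folded = items.iter().try_fold(0u32, |acc, &x| {
        if acc + x > limit {
            // Short-circuit: no further items are visited.
            ControlFlow::Break(acc)
        } else {
            ControlFlow::Continue(acc + x)
        }
    });
    // This `match` is the last expression in the function, so each arm's
    // value is the return value. Writing `return` here would not change it.
    match folded {
        ControlFlow::Break(partial) => Err(format!("stopped early at {}", partial)),
        ControlFlow::Continue(total) => Ok(total),
    }
}
```

An explicit `return` only matters when more code follows the `match`.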

jayzhan211 · 2024-05-15T09:37:02Z

I think the logic is correct, although it took me some time to understand the difference between the first pass and the other passes. I don't have a better design for this, though.

erratic-pattern · 2024-05-15T16:57:38Z

> I think the logic is correct, although it took me some time to understand the difference between the first pass and the other passes. I don't have a better design for this, though.

We need to figure out how many rewrites are in a cycle. Since there is no Vec or other data structure, we cannot use a length, so we manually count and then record the cycle length at the end of the first pass.

Note that the dynamic dispatch API that I deleted from this PR does not have this problem, since we can simply set cycle_length to vec.len().

You could make this code not have a special first pass if you wanted to make it simpler to understand. If you check cycle_length.is_none() in record_cycle_length before setting the cycle_length, you could call it at the end of each cycle and then run everything in a single try_fold loop. However, I wanted to avoid unnecessary conditional checks, so I run the first cycle, record the cycle length, then continue with the remaining cycles.

This makes the code and the API kind of complicated for something that should be relatively simple. But I found that dynamic dispatch and additional conditional checks caused significant performance regressions when operating over a small number of rewriters (only 3 here), so I wanted to make an API usable with static dispatch.

A dynamic dispatch API is more appropriate for the top-level optimizer because:

- it's already using dynamic dispatch
- the overhead of dynamic dispatch should be a less significant % of execution time when we are operating over a larger number of rewriters
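The first-pass idea described above can be sketched as follows. This is a simplified, hypothetical model (a `u64` instead of a tree node, made-up rewriters, no error handling), not the code in this PR. The rewriters are chained statically, so there is no collection whose length could be read; the cycle length is counted during the first cycle and recorded exactly once.

```rust
use std::ops::ControlFlow;

/// Hypothetical state; field names follow the discussion, not the PR code.
struct State {
    value: u64,
    consecutive_unchanged: usize,
    rewrite_count: usize,
    cycle_length: Option<usize>,
}

impl State {
    fn new(value: u64) -> Self {
        State { value, consecutive_unchanged: 0, rewrite_count: 0, cycle_length: None }
    }

    /// Applies one rewriter. `impl Fn` is monomorphized, so no `dyn` call.
    fn rewrite(mut self, f: impl Fn(u64) -> u64) -> ControlFlow<State, State> {
        let next = f(self.value);
        self.consecutive_unchanged =
            if next == self.value { self.consecutive_unchanged + 1 } else { 0 };
        self.value = next;
        self.rewrite_count += 1;
        // Break once a full cycle's worth of rewriters changed nothing.
        let done = self
            .cycle_length
            .map_or(false, |len| self.consecutive_unchanged >= len);
        if done { ControlFlow::Break(self) } else { ControlFlow::Continue(self) }
    }
}

fn halve_if_even(v: u64) -> u64 {
    if v % 2 == 0 { v / 2 } else { v }
}

fn dec_if_odd(v: u64) -> u64 {
    if v % 2 == 1 && v > 1 { v - 1 } else { v }
}

/// One cycle: the rewriters are chained statically; `?` propagates a Break.
fn one_cycle(state: State) -> ControlFlow<State, State> {
    state.rewrite(halve_if_even)?.rewrite(dec_if_odd)
}

fn run(value: u64, max_cycles: usize) -> u64 {
    // First cycle: the cycle length is still unknown, so it cannot finish early.
    let mut state = match one_cycle(State::new(value)) {
        ControlFlow::Continue(s) => s,
        ControlFlow::Break(s) => return s.value,
    };
    // Record the length exactly once, outside the loop below.
    state.cycle_length = Some(state.rewrite_count);
    // Remaining cycles run in a single try_fold.
    match (1..max_cycles).try_fold(state, |s, _| one_cycle(s)) {
        ControlFlow::Break(s) | ControlFlow::Continue(s) => s.value,
    }
}
```

Starting from 12 this converges to 1 and stops as soon as two rewriters in a row leave the value alone; with `max_cycles = 2` it stops at 2 because the cap is reached first. With a `Vec` of boxed rewriters the first-cycle special case would disappear, since the length is known up front.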

alamb

@erratic-pattern -- thank you -- this is really cool

I have some suggestions on testing and not needing to make the parts of the expr simplifier public.

Let me know if you think you have time to do this -- if not maybe I can take a shot at helping

datafusion/optimizer/src/rewrite_cycle.rs

alamb · 2024-05-15T19:38:01Z

datafusion/optimizer/src/rewrite_cycle.rs

+    Result,
+};
+
+/// A builder with methods for executing a "rewrite cycle".


This is a very powerful API -- I can imagine using it for the optimizer rules in general 🤔 -- very neat

This is a very powerful API -- I can imagine using it for the optimizer rules in general 🤔 -- very neat

See 1886002 for an example of adding dynamic dispatch helpers. That code is actually much simpler and can leverage the same internal state logic.

alamb · 2024-05-15T19:40:20Z

datafusion/optimizer/src/rewrite_cycle.rs

+    /// individual [TreeNodeRewriter] in the cycle. [RewriteCycleState::rewrite] returns a [RewriteCycleControlFlow]
+    /// result, indicating whether the loop should break or continue.
+    ///
+    /// ```rust


This is a neat example, but I think it may be overly complicated to try to reuse the simplification logic

What if you wrote your own simple test / example tree node rewriter (especially given this API is generic)

Maybe something that adds 1 to integers but not floats 🤔

I think that would both:

1. Avoid having to make stuff in the simplifier pub

2. Make it clearer what this API does

I updated this doctest with a simple example of two rewriters for constant addition/multiplication. I think it is important for the example to show a rewriter that eventually stops making changes, and makes changes that another rewriter depends on.

alamb · 2024-05-15T19:41:00Z

datafusion/optimizer/src/rewrite_cycle.rs

+    }
+}
+
+pub type RewriteCycleControlFlow<T> = ControlFlow<Result<T>, T>;


Since you have this API so nicely broken out, I think it would be really neat to add tests for it that demonstrate it works properly in isolation as well as make its api clearer in code
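To illustrate the shape of the alias in the diff above: `Continue` carries the value to keep working on, while `Break` carries a `Result`, so a single early exit covers both "finished" and "failed". The sketch below uses a stand-in `Result` alias with a `String` error and a made-up `step` function; both are assumptions for illustration, not DataFusion's types.

```rust
use std::ops::ControlFlow;

/// Stand-in for DataFusion's `Result<T>` alias; the error type is an assumption.
type Result<T> = std::result::Result<T, String>;

/// Same shape as the alias in the diff above.
type RewriteCycleControlFlow<T> = ControlFlow<Result<T>, T>;

/// Hypothetical step: halves even numbers, decrements odd ones,
/// finishes on 1 and fails on 0.
fn step(n: u32) -> RewriteCycleControlFlow<u32> {
    match n {
        0 => ControlFlow::Break(Err("cannot rewrite zero".to_string())),
        1 => ControlFlow::Break(Ok(1)),
        n if n % 2 == 0 => ControlFlow::Continue(n / 2),
        n => ControlFlow::Continue(n - 1),
    }
}

fn run(start: u32, max_steps: usize) -> Result<u32> {
    match (0..max_steps).try_fold(start, |n, _| step(n)) {
        // Early exit: either finished (`Ok`) or failed (`Err`).
        ControlFlow::Break(result) => result,
        // The step limit was reached without an early exit.
        ControlFlow::Continue(n) => Ok(n),
    }
}
```

A test in this style exercises all three outcomes (finished early, hit the limit, error) without touching the expression simplifier.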

alamb · 2024-05-21T20:38:23Z

Is this PR ready for the next round of review @erratic-pattern ? Or do you plan to make further changes to it?

erratic-pattern · 2024-05-21T22:54:03Z

@alamb I will try to update this today or tomorrow. I've been putting this off a bit.

alamb · 2024-05-22T20:50:32Z

Marking as draft as I think this PR is no longer waiting on feedback. Please mark it as ready for review when it is ready for another look

erratic-pattern · 2024-06-10T01:09:54Z

datafusion/optimizer/src/rewrite_cycle.rs

+}
+
+pub type RewriteCycleControlFlow<T> = ControlFlow<Result<T>, T>;
+#[cfg(test)]


alamb · 2024-06-10T17:21:23Z

Checking this out...

alamb

Thank you @erratic-pattern and @jayzhan211 -- I think this PR looks really nice.

I have some documentation / naming suggestions, but nothing that I think is required

I also tried out @jayzhan211's suggestion https://github.com/apache/datafusion/pull/10386/files#r1632502092 and it seems to work well, so I think you should respond to that before we merge this PR

While I didn't see any performance change with this PR, I think it is worthwhile simply for the readability improvements

Performance

I ran the planning benchmark with this branch and in summary there is basically no performance change

logical_plan_tpcds_all                        1.00    155.0±1.26ms        ? ?/sec    1.00    155.2±1.27ms        ? ?/sec
logical_plan_tpch_all                         1.00     17.1±0.21ms        ? ?/sec    1.00     17.2±0.17ms        ? ?/sec

This may be due to the fact we have improved the performance of some of the optimizer passes now so the overhead of running them multiple times is not as great.

Details

++ critcmp main simplifier-static-dispatch
group                                         main                                   simplifier-static-dispatch
-----                                         ----                                   --------------------------
logical_aggregate_with_join                   1.00  1009.0±12.76µs        ? ?/sec    1.01  1014.2±13.57µs        ? ?/sec
logical_plan_tpcds_all                        1.00    155.0±1.26ms        ? ?/sec    1.00    155.2±1.27ms        ? ?/sec
logical_plan_tpch_all                         1.00     17.1±0.21ms        ? ?/sec    1.00     17.2±0.17ms        ? ?/sec
logical_select_all_from_1000                  1.00     19.0±0.12ms        ? ?/sec    1.00     19.1±0.17ms        ? ?/sec
logical_select_one_from_700                   1.01    825.9±8.11µs        ? ?/sec    1.00    819.5±8.51µs        ? ?/sec
logical_trivial_join_high_numbered_columns    1.00   768.0±11.59µs        ? ?/sec    1.06   811.6±41.08µs        ? ?/sec
logical_trivial_join_low_numbered_columns     1.00   758.8±23.88µs        ? ?/sec    1.00   756.8±10.68µs        ? ?/sec
physical_plan_tpcds_all                       1.00   1250.3±8.57ms        ? ?/sec    1.00   1248.0±4.97ms        ? ?/sec
physical_plan_tpch_all                        1.00     87.0±1.17ms        ? ?/sec    1.00     86.8±1.06ms        ? ?/sec
physical_plan_tpch_q1                         1.00      4.6±0.06ms        ? ?/sec    1.03      4.7±0.07ms        ? ?/sec
physical_plan_tpch_q10                        1.00      4.1±0.07ms        ? ?/sec    1.00      4.1±0.07ms        ? ?/sec
physical_plan_tpch_q11                        1.00      3.6±0.06ms        ? ?/sec    1.01      3.6±0.05ms        ? ?/sec
physical_plan_tpch_q12                        1.00      2.7±0.03ms        ? ?/sec    1.00      2.7±0.04ms        ? ?/sec
physical_plan_tpch_q13                        1.00      2.0±0.02ms        ? ?/sec    1.00      2.0±0.02ms        ? ?/sec
physical_plan_tpch_q14                        1.00      2.4±0.04ms        ? ?/sec    1.00      2.4±0.04ms        ? ?/sec
physical_plan_tpch_q16                        1.02      3.6±0.07ms        ? ?/sec    1.00      3.5±0.06ms        ? ?/sec
physical_plan_tpch_q17                        1.00      3.4±0.06ms        ? ?/sec    1.00      3.4±0.06ms        ? ?/sec
physical_plan_tpch_q18                        1.01      3.8±0.07ms        ? ?/sec    1.00      3.8±0.07ms        ? ?/sec
physical_plan_tpch_q19                        1.02      5.6±0.08ms        ? ?/sec    1.00      5.6±0.07ms        ? ?/sec
physical_plan_tpch_q2                         1.00      7.4±0.08ms        ? ?/sec    1.03      7.6±0.10ms        ? ?/sec
physical_plan_tpch_q20                        1.02      4.5±0.13ms        ? ?/sec    1.00      4.4±0.06ms        ? ?/sec
physical_plan_tpch_q21                        1.00      6.0±0.08ms        ? ?/sec    1.00      6.0±0.10ms        ? ?/sec
physical_plan_tpch_q22                        1.01      3.3±0.09ms        ? ?/sec    1.00      3.2±0.05ms        ? ?/sec
physical_plan_tpch_q3                         1.00      3.0±0.04ms        ? ?/sec    1.01      3.0±0.07ms        ? ?/sec
physical_plan_tpch_q4                         1.00      2.2±0.02ms        ? ?/sec    1.02      2.2±0.04ms        ? ?/sec
physical_plan_tpch_q5                         1.00      4.2±0.06ms        ? ?/sec    1.02      4.3±0.09ms        ? ?/sec
physical_plan_tpch_q6                         1.00  1449.0±66.64µs        ? ?/sec    1.01  1461.8±26.90µs        ? ?/sec
physical_plan_tpch_q7                         1.00      5.3±0.09ms        ? ?/sec    1.01      5.3±0.08ms        ? ?/sec
physical_plan_tpch_q8                         1.00      6.8±0.09ms        ? ?/sec    1.02      6.9±0.09ms        ? ?/sec
physical_plan_tpch_q9                         1.00      5.2±0.08ms        ? ?/sec    1.01      5.2±0.09ms        ? ?/sec
physical_select_all_from_1000                 1.00     61.3±0.26ms        ? ?/sec    1.01     61.7±0.24ms        ? ?/sec
physical_select_one_from_700                  1.00      3.6±0.04ms        ? ?/sec    1.01      3.6±0.03ms        ? ?/sec

alamb · 2024-06-10T18:20:14Z

datafusion/optimizer/src/rewrite_cycle.rs

+        self.max_cycles
+    }
+
+    /// Runs a rewrite cycle on the given [TreeNode] using the given callback function to


Could you please define how a "cycle" and "iteration" are related in this function? They are used in the apis below but I didn't see a definition of the terms and how they are related

I think it is that a cycle is composed of several iterations

Alternately, maybe "rewrites" is a better term than "iteration" as this is a "rewrite cycle" 🤔

alamb · 2024-06-10T18:21:07Z

datafusion/optimizer/src/rewrite_cycle.rs

+#[derive(Debug)]
+pub struct RewriteCycleState<Node: TreeNode> {
+    node: Node,
+    consecutive_unchanged_count: usize,


I think it would help me follow the code better if these fields used the terms "cycle" and "iteration" consistently

Like

    consecutive_unchanged_iterations: usize,
    iteration_count: usize,
    // the number of iterations in each cycle (set after the first cycle)
    cycle_length: Option<usize>,

alamb · 2024-06-10T18:22:14Z

datafusion/optimizer/src/rewrite_cycle.rs

+        self.consecutive_unchanged_count >= cycle_length
+    }
+
+    /// Finishes the iteration by consuming the state and returning a [TreeNode] and


This finishes the cycle, right (not the iteration)?

> This finishes the cycle, right (not the iteration)?

It finishes all cycles, as in, it runs all of the remaining iterations and terminates the loop. I will reword to clarify.

alamb · 2024-06-10T18:31:18Z

datafusion/optimizer/src/rewrite_cycle.rs

+        }
+        // run remaining cycles
+        match (1..self.max_cycles).try_fold(state, |state, _| f(state)) {
+            ControlFlow::Break(result) => result?.finish(),


I tried this suggestion locally and it seems to work well 👍

Suggested change

-            ControlFlow::Break(result) => result?.finish(),
+            ControlFlow::Break(result) => return result?.finish(),

alamb · 2024-06-10T18:33:38Z

I fixed a clippy lint and merged up from main to get the latest changes into this PR

erratic-pattern · 2024-06-12T20:04:20Z

> I ran the planning benchmark with this branch and in summary there is basically no performance change

Yes, that is what I observed as well. I assume the overhead offsets any potential improvement from skipping steps here. I think the value of this API comes from the potential reusability of the logic for the entire optimizer. It is not ready to do that in its current state, but with some improvements it could be used across the optimizer.

erratic-pattern · 2024-06-12T20:05:13Z

I will update this PR with review suggestions (maybe) this weekend

alamb · 2024-06-12T22:21:03Z

Marking as draft as feedback is incorporated

github-actions bot added optimizer Optimizer rules core Core datafusion crate labels May 6, 2024

erratic-pattern force-pushed the adam/loop-expr-simplifier-static-dispatch branch 2 times, most recently from cdb389a to 7330dfe Compare May 6, 2024 01:16

erratic-pattern marked this pull request as draft May 6, 2024 01:17

erratic-pattern force-pushed the adam/loop-expr-simplifier-static-dispatch branch 2 times, most recently from ae31462 to dcc9140 Compare May 6, 2024 01:21

erratic-pattern mentioned this pull request May 6, 2024

feat: run expression simplifier in a loop until a fixedpoint or 3 cycles #10358

Merged

jayzhan211 reviewed May 6, 2024

alamb mentioned this pull request May 6, 2024

DataFusion weekly project plan (Andrew Lamb) - May 6, 2024 #10395

Closed

7 tasks

alamb reviewed May 6, 2024

erratic-pattern force-pushed the adam/loop-expr-simplifier-static-dispatch branch 2 times, most recently from a0b8397 to 4700004 Compare May 15, 2024 04:34

erratic-pattern marked this pull request as ready for review May 15, 2024 04:34

erratic-pattern force-pushed the adam/loop-expr-simplifier-static-dispatch branch 5 times, most recently from 83de911 to f8e5c75 Compare May 15, 2024 05:19

erratic-pattern changed the title ~~feat: short-circuiting expression simplifier (second version)~~ feat: RewriteCycle API for short-circuiting optimizer loops May 15, 2024

erratic-pattern requested review from alamb and jayzhan211 May 15, 2024 05:24

erratic-pattern commented May 15, 2024

jayzhan211 reviewed May 15, 2024

alamb mentioned this pull request May 15, 2024

DataFusion weekly project plan (Andrew Lamb) - May 13, 2024 #10482

Closed

8 tasks

alamb reviewed May 15, 2024

alamb marked this pull request as draft May 22, 2024 20:50

erratic-pattern force-pushed the adam/loop-expr-simplifier-static-dispatch branch from 1886002 to c1ffdfa Compare June 9, 2024 21:51

erratic-pattern marked this pull request as ready for review June 10, 2024 01:08

erratic-pattern requested a review from alamb June 10, 2024 01:08

erratic-pattern commented Jun 10, 2024

erratic-pattern force-pushed the adam/loop-expr-simplifier-static-dispatch branch from efb2966 to 3161f14 Compare June 10, 2024 01:15

alamb approved these changes Jun 10, 2024

alamb marked this pull request as draft June 12, 2024 22:21

erratic-pattern and others added 6 commits June 16, 2024 18:55

feat: RewriteCycle API for short-circuiting optimizer loops

c707ccf

remove dynamic dispatch API for now

21616ff

Update datafusion/optimizer/src/rewrite_cycle.rs

4aee80a

Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>

update doctest and add new tests

161cbed

refactor tests

d8c8cff

clippy

f5d6ed4

erratic-pattern force-pushed the adam/loop-expr-simplifier-static-dispatch branch from d64dbe3 to f5d6ed4 Compare June 16, 2024 22:55
