Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: RewriteCycle API for short-circuiting optimizer loops #10386

Draft
wants to merge 6 commits into
base: main
Choose a base branch
from

Conversation

erratic-pattern
Copy link
Contributor

@erratic-pattern erratic-pattern commented May 6, 2024

Which issue does this PR close?

Closes #1160.

Rationale for this change

This is a follow up to #10358 with a new approach that should short-circuit earlier. See previous discussion there.

What changes are included in this PR?

Are these changes tested?

yes

Are there any user-facing changes?

no

@github-actions github-actions bot added optimizer Optimizer rules core Core datafusion crate labels May 6, 2024
@erratic-pattern erratic-pattern force-pushed the adam/loop-expr-simplifier-static-dispatch branch 2 times, most recently from cdb389a to 7330dfe Compare May 6, 2024 01:16
@erratic-pattern erratic-pattern marked this pull request as draft May 6, 2024 01:17
@erratic-pattern erratic-pattern force-pushed the adam/loop-expr-simplifier-static-dispatch branch 2 times, most recently from ae31462 to dcc9140 Compare May 6, 2024 01:21
@erratic-pattern
Copy link
Contributor Author

Benchmark results between the two PRs and main

image

impl RewriteCycle {
fn new() -> Self {
RewriteCycle {
// use usize::MAX as default to avoid checking null in is_done() comparison
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is the reason to avoid checking null because of cost? I'm fine with either way, but curious on the choice.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think modeling this with num_rewriters: Option<usize> would make it clearer what is going on and help the compiler check the logic

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think modeling this with num_rewriters: Option<usize> would make it clearer what is going on and help the compiler check the logic

yes I tried to get too fancy here with micro-optimizations when I saw benchmark regressions 😆

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think cycle_length would also be a more clear name for this variable 🤔 I think it is confusing that it is only initialized after the first iteration

}

fn is_done(&self) -> bool {
self.consecutive_unchanged_count >= self.num_rewriters
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The logic to stop looping is checking if transformed meet the num of rewriters, but can we just stop if any of the transformed is false? I think it is more straightforward and we don't need to have a max_iter num too.

Idea is something like

loop {
  transformed |= rewrite_rule()
  if not transformed {
     return
  }
}

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🤔 I think there are cases when simplification will do something that allows more constant propagation to proceed so even if transformed=false is returned another application of a different pass could actually simplify things -- so in this case for example if the const evaluator returns false, simpliy could still return true

Copy link
Contributor

@jayzhan211 jayzhan211 May 7, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

so in this case for example if the const evaluator returns false, simpliy could still return true

In this case, transformed is true, so we should run another pass.
I think once we found every rules is not transformed in a pass that means we done the optimization

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think once we found every rules is not transformed in a pass that means we done the optimization

Yes, I agree -- I think this is what the counter is attempting to do

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🤔 I think there are cases when simplification will do something that allows more constant propagation to proceed so even if transformed=false is returned another application of a different pass could actually simplify things -- so in this case for example if the const evaluator returns false, simpliy could still return true

yes I found this to be true when running tests against this code. an additional constant evaluation produced a new transformation

Copy link
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @erratic-pattern and @jayzhan211

I am going to run the planning benchmarks on this branch using a VM and report performance numbers

impl RewriteCycle {
fn new() -> Self {
RewriteCycle {
// use usize::MAX as default to avoid checking null in is_done() comparison
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think modeling this with num_rewriters: Option<usize> would make it clearer what is going on and help the compiler check the logic

@alamb
Copy link
Contributor

alamb commented May 6, 2024

Here are my measurements on my gcp machine (it does seem to help by 1-2%)

Details

++ critcmp main loop-expr-simplifier-static-dispatch
group                                         loop-expr-simplifier-static-dispatch    main
-----                                         ------------------------------------    ----
logical_aggregate_with_join                   1.00  1211.4±14.74µs        ? ?/sec     1.00  1214.7±64.80µs        ? ?/sec
logical_plan_tpcds_all                        1.00    156.4±1.31ms        ? ?/sec     1.01    158.4±2.20ms        ? ?/sec
logical_plan_tpch_all                         1.00     16.9±0.19ms        ? ?/sec     1.00     16.9±0.17ms        ? ?/sec
logical_select_all_from_1000                  1.00     18.6±0.17ms        ? ?/sec     1.01     18.8±0.13ms        ? ?/sec
logical_select_one_from_700                   1.00   814.0±10.94µs        ? ?/sec     1.00   815.5±32.65µs        ? ?/sec
logical_trivial_join_high_numbered_columns    1.00   758.0±10.12µs        ? ?/sec     1.00   759.1±19.82µs        ? ?/sec
logical_trivial_join_low_numbered_columns     1.00    743.0±8.02µs        ? ?/sec     1.01    749.9±9.83µs        ? ?/sec
physical_plan_tpcds_all                       1.00   1336.8±8.42ms        ? ?/sec     1.01   1354.8±8.50ms        ? ?/sec
physical_plan_tpch_all                        1.00     90.2±1.61ms        ? ?/sec     1.03     93.1±1.37ms        ? ?/sec
physical_plan_tpch_q1                         1.00      4.9±0.07ms        ? ?/sec     1.06      5.1±0.09ms        ? ?/sec
physical_plan_tpch_q10                        1.00      4.3±0.06ms        ? ?/sec     1.02      4.4±0.08ms        ? ?/sec
physical_plan_tpch_q11                        1.00      3.9±0.05ms        ? ?/sec     1.02      4.0±0.06ms        ? ?/sec
physical_plan_tpch_q12                        1.00      3.1±0.07ms        ? ?/sec     1.00      3.1±0.08ms        ? ?/sec
physical_plan_tpch_q13                        1.00      2.1±0.03ms        ? ?/sec     1.04      2.2±0.06ms        ? ?/sec
physical_plan_tpch_q14                        1.00      2.7±0.05ms        ? ?/sec     1.06      2.8±0.05ms        ? ?/sec
physical_plan_tpch_q16                        1.00      3.7±0.08ms        ? ?/sec     1.04      3.8±0.09ms        ? ?/sec
physical_plan_tpch_q17                        1.00      3.5±0.05ms        ? ?/sec     1.02      3.6±0.06ms        ? ?/sec
physical_plan_tpch_q18                        1.00      3.9±0.05ms        ? ?/sec     1.00      3.9±0.07ms        ? ?/sec
physical_plan_tpch_q19                        1.00      6.0±0.07ms        ? ?/sec     1.03      6.2±0.08ms        ? ?/sec
physical_plan_tpch_q2                         1.00      7.7±0.10ms        ? ?/sec     1.02      7.9±0.06ms        ? ?/sec
physical_plan_tpch_q20                        1.00      4.5±0.07ms        ? ?/sec     1.04      4.7±0.09ms        ? ?/sec
physical_plan_tpch_q21                        1.00      6.1±0.07ms        ? ?/sec     1.04      6.3±0.06ms        ? ?/sec
physical_plan_tpch_q22                        1.00      3.3±0.06ms        ? ?/sec     1.04      3.4±0.07ms        ? ?/sec
physical_plan_tpch_q3                         1.00      3.1±0.05ms        ? ?/sec     1.01      3.1±0.06ms        ? ?/sec
physical_plan_tpch_q4                         1.00      2.3±0.05ms        ? ?/sec     1.02      2.3±0.05ms        ? ?/sec
physical_plan_tpch_q5                         1.00      4.5±0.07ms        ? ?/sec     1.02      4.5±0.07ms        ? ?/sec
physical_plan_tpch_q6                         1.00  1538.7±18.47µs        ? ?/sec     1.03  1586.4±21.54µs        ? ?/sec
physical_plan_tpch_q7                         1.00      5.7±0.08ms        ? ?/sec     1.01      5.7±0.13ms        ? ?/sec
physical_plan_tpch_q8                         1.00      7.3±0.10ms        ? ?/sec     1.01      7.4±0.08ms        ? ?/sec
physical_plan_tpch_q9                         1.00      5.6±0.09ms        ? ?/sec     1.00      5.6±0.08ms        ? ?/sec
physical_select_all_from_1000                 1.00     60.6±0.25ms        ? ?/sec     1.01     61.4±0.29ms        ? ?/sec
physical_select_one_from_700                  1.00      3.6±0.05ms        ? ?/sec     1.02      3.7±0.04ms        ? ?/sec

@alamb
Copy link
Contributor

alamb commented May 6, 2024

My second run was consistent:

Details

++ critcmp main loop-expr-simplifier-static-dispatch
group                                         loop-expr-simplifier-static-dispatch    main
-----                                         ------------------------------------    ----
logical_aggregate_with_join                   1.00   1213.6±9.49µs        ? ?/sec     1.01  1220.2±37.22µs        ? ?/sec
logical_plan_tpcds_all                        1.00    158.8±1.65ms        ? ?/sec     1.01    160.0±1.55ms        ? ?/sec
logical_plan_tpch_all                         1.00     16.9±0.21ms        ? ?/sec     1.00     16.9±0.20ms        ? ?/sec
logical_select_all_from_1000                  1.00     18.8±0.12ms        ? ?/sec     1.00     18.7±0.10ms        ? ?/sec
logical_select_one_from_700                   1.00   809.7±10.72µs        ? ?/sec     1.00   812.4±11.32µs        ? ?/sec
logical_trivial_join_high_numbered_columns    1.00    756.6±7.03µs        ? ?/sec     1.01    761.9±7.95µs        ? ?/sec
logical_trivial_join_low_numbered_columns     1.00   742.9±13.09µs        ? ?/sec     1.01   748.3±14.13µs        ? ?/sec
physical_plan_tpcds_all                       1.00   1337.9±9.37ms        ? ?/sec     1.01   1346.9±8.49ms        ? ?/sec
physical_plan_tpch_all                        1.00     92.2±1.47ms        ? ?/sec     1.00     92.0±1.27ms        ? ?/sec
physical_plan_tpch_q1                         1.00      5.0±0.09ms        ? ?/sec     1.03      5.1±0.07ms        ? ?/sec
physical_plan_tpch_q10                        1.00      4.4±0.11ms        ? ?/sec     1.00      4.4±0.08ms        ? ?/sec
physical_plan_tpch_q11                        1.00      3.9±0.06ms        ? ?/sec     1.02      4.0±0.08ms        ? ?/sec
physical_plan_tpch_q12                        1.00      3.1±0.06ms        ? ?/sec     1.01      3.1±0.05ms        ? ?/sec
physical_plan_tpch_q13                        1.00      2.1±0.04ms        ? ?/sec     1.00      2.1±0.04ms        ? ?/sec
physical_plan_tpch_q14                        1.00      2.7±0.05ms        ? ?/sec     1.04      2.8±0.06ms        ? ?/sec
physical_plan_tpch_q16                        1.00      3.7±0.08ms        ? ?/sec     1.03      3.8±0.07ms        ? ?/sec
physical_plan_tpch_q17                        1.01      3.6±0.07ms        ? ?/sec     1.00      3.6±0.07ms        ? ?/sec
physical_plan_tpch_q18                        1.00      4.0±0.07ms        ? ?/sec     1.01      4.0±0.05ms        ? ?/sec
physical_plan_tpch_q19                        1.00      6.2±0.07ms        ? ?/sec     1.02      6.3±0.07ms        ? ?/sec
physical_plan_tpch_q2                         1.00      7.8±0.09ms        ? ?/sec     1.01      7.9±0.07ms        ? ?/sec
physical_plan_tpch_q20                        1.00      4.6±0.08ms        ? ?/sec     1.00      4.6±0.07ms        ? ?/sec
physical_plan_tpch_q21                        1.00      6.2±0.08ms        ? ?/sec     1.01      6.3±0.09ms        ? ?/sec
physical_plan_tpch_q22                        1.00      3.4±0.08ms        ? ?/sec     1.02      3.5±0.09ms        ? ?/sec
physical_plan_tpch_q3                         1.00      3.1±0.07ms        ? ?/sec     1.00      3.2±0.05ms        ? ?/sec
physical_plan_tpch_q4                         1.00      2.3±0.06ms        ? ?/sec     1.01      2.3±0.05ms        ? ?/sec
physical_plan_tpch_q5                         1.00      4.4±0.06ms        ? ?/sec     1.03      4.6±0.05ms        ? ?/sec
physical_plan_tpch_q6                         1.00  1551.7±25.84µs        ? ?/sec     1.03  1592.4±22.64µs        ? ?/sec
physical_plan_tpch_q7                         1.00      5.7±0.06ms        ? ?/sec     1.00      5.7±0.09ms        ? ?/sec
physical_plan_tpch_q8                         1.00      7.4±0.08ms        ? ?/sec     1.02      7.5±0.07ms        ? ?/sec
physical_plan_tpch_q9                         1.00      5.7±0.09ms        ? ?/sec     1.00      5.7±0.08ms        ? ?/sec
physical_select_all_from_1000                 1.00     61.2±0.81ms        ? ?/sec     1.00     61.3±0.33ms        ? ?/sec
physical_select_one_from_700                  1.01      3.7±0.05ms        ? ?/sec     1.00      3.6±0.04ms        ? ?/sec

@alamb
Copy link
Contributor

alamb commented May 7, 2024

Other PR has been merged in: #10358

I think we can merge / rebase this PR now and mark it ready for review

@erratic-pattern
Copy link
Contributor Author

I will take another look at this and see if I can clean it up a bit more.

@erratic-pattern erratic-pattern force-pushed the adam/loop-expr-simplifier-static-dispatch branch 2 times, most recently from a0b8397 to 4700004 Compare May 15, 2024 04:34
@erratic-pattern erratic-pattern marked this pull request as ready for review May 15, 2024 04:34
@erratic-pattern erratic-pattern force-pushed the adam/loop-expr-simplifier-static-dispatch branch 5 times, most recently from 83de911 to f8e5c75 Compare May 15, 2024 05:19
@erratic-pattern erratic-pattern changed the title feat: short-circuiting expression simplifier (second version) feat: RewriteCycle API for short-circuiting optimizer loops May 15, 2024
@@ -503,7 +503,7 @@ pub trait TreeNodeVisitor: Sized {
///
/// # See Also:
/// * [`TreeNode::visit`] to inspect borrowed `TreeNode`s
pub trait TreeNodeRewriter: Sized {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure why this was a supertrait of Sized but it can't be if we want trait objects

@@ -172,7 +172,7 @@ pub trait TreeNode: Sized {
/// TreeNodeRewriter::f_up(ChildNode2)
/// TreeNodeRewriter::f_up(ParentNode)
/// ```
fn rewrite<R: TreeNodeRewriter<Node = Self>>(
fn rewrite<R: TreeNodeRewriter<Node = Self> + ?Sized>(
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Needed in order to be callable with a trait object

@erratic-pattern
Copy link
Contributor Author

Alright I think this is in a good state now. I added a dynamic dispatch API which I think could be useful if we ever add TreeNodeRewriter impl for OptimizerRule, then we could run the whole optimizer through this cycle logic.

I made some stuff public for doctests, but if we don't want to do that I can just mark the doctest as ignored

@erratic-pattern
Copy link
Contributor Author

I removed the dynamic dispatch API for now because I think there is a better way to write it, but I can't really do that until we get to a point where we have a TreeNodeRewriter impl for Arc<dyn OptimizerRule>

}
// run remaining cycles
match (1..self.max_cycles).try_fold(state, |state, _| f(state)) {
ControlFlow::Break(result) => result?.finish(),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should we return here?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think when we hit controlflow::break, it means is_done, so we can early return the result

ControlFlow::Break(result) => return result?.finish(),

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@erratic-pattern how about this?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tried this suggestion locally and it seems to work well 👍

Suggested change
ControlFlow::Break(result) => result?.finish(),
ControlFlow::Break(result) => return result?.finish(),

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think when we hit controlflow::break, it means is_done, so we can early return the result

ControlFlow::Break(result) => return result?.finish(),

It is not needed because of implicit return semantics, since we are already at the end of the function. I can add an explicit return as it probably makes the code easier to read, but there may be some linter error that gets triggered

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you are right, ControlFlow::Break does the early return

@jayzhan211
Copy link
Contributor

jayzhan211 commented May 15, 2024

I think the logic is correct, although it takes me sometime to understand the difference between the first pass and other pass, but I did not have a better design about this.

@erratic-pattern
Copy link
Contributor Author

erratic-pattern commented May 15, 2024

I think the logic is correct, although it takes me sometime to understand the difference between the first pass and other pass, but I did not have a better design about this.

We need to figure out how many rewrites are in a cycle. Since there is no Vec or other data structure, we cannot use a length, so we manually count and then record the cycle length at the end of the first pass.

Note that the dynamic dispatch API that I deleted from this PR does not have this problem, since we can simply set cycle_length to vec.len().

You can make this code to not have a special first pass if you wanted to make it simpler to understand. If you check cycle_length.is_none() in record_cycle_length before setting the cycle_length, you could call it at the end of each cycle and then you can just run everything in a single try_fold loop. However, I wanted to avoid unneccessary conditional checks, so I run the first cycle, record the cycle length, the continue with the remaining cycles.

This makes the code and the API kind of complicated for something that should be relatively simple. But I found dynamic dispatch and additional conditional checks to have a significant performance regressions when we're operating over a small number of rewriters (only 3 rewriters here), so I wanted to make an API useable with static dispatch.

A dynamic dispatch API is more appropriate for the top-level optimizer because:

  1. it's already using dynamic dispatch
  2. the overhead of dynamic dispatch should be less significant of a % of execution time when we are operating over a larger number of rewriters

Copy link
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@erratic-pattern -- thank you -- this is really cool

I have some suggestions on testing and not needing to make the parts of the expr simplifier public.

Let me know if you think you have time to do this -- if not maybe I can take a shot at helping

datafusion/optimizer/src/rewrite_cycle.rs Outdated Show resolved Hide resolved
Result,
};

/// A builder with methods for executing a "rewrite cycle".
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a very powerful API -- I can imagine using it for the optimizer rules in general 🤔 -- very neat

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a very powerful API -- I can imagine using it for the optimizer rules in general 🤔 -- very neat

See 1886002 for an example of adding dynamic dispatch helpers. That code is actually much simpler and can leverage the same internal state logic.

/// individual [TreeNodeRewriter] in the cycle. [RewriteCycleState::rewrite] returns a [RewriteCycleControlFlow]
/// result, indicating whether the loop should break or continue.
///
/// ```rust
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a neat example, but I think it may overly complicated to try and reuse the simplification logic

What if you wrote your own simple test / example tree node rewriter (especially given this API is generic)

Maybe something that adds 1 to integers but not floats 🤔

I think that would both:

  1. Avoid having to make stuff in the simplifier pub
  2. Make it clearer what this API did

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I updated this doctest with a simple example of two rewriters for constant addition/multiplication. I think it is important for the example to show a rewriter that eventually stops making changes, and makes changes that another rewriter depends on.

}
}

pub type RewriteCycleControlFlow<T> = ControlFlow<Result<T>, T>;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since you have this API so nicely broken out, I think it would be really neat to add tests for it that demonstrate it works properly in isolation as well as make its api clearer in code

@alamb
Copy link
Contributor

alamb commented May 21, 2024

Is this PR ready for the next round of review @erratic-pattern ? Or do you plan to make further changes to it?

@erratic-pattern
Copy link
Contributor Author

@alamb I will try to update this today or tomorrow. I've been putting this off a bit.

@alamb
Copy link
Contributor

alamb commented May 22, 2024

Marking as draft as I think this PR is no longer waiting on feedback. Please mark it as ready for review when it is ready for another look

@alamb alamb marked this pull request as draft May 22, 2024 20:50
@erratic-pattern erratic-pattern force-pushed the adam/loop-expr-simplifier-static-dispatch branch from 1886002 to c1ffdfa Compare June 9, 2024 21:51
@erratic-pattern erratic-pattern marked this pull request as ready for review June 10, 2024 01:08
}

pub type RewriteCycleControlFlow<T> = ControlFlow<Result<T>, T>;
#[cfg(test)]
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

new tests

@erratic-pattern erratic-pattern force-pushed the adam/loop-expr-simplifier-static-dispatch branch from efb2966 to 3161f14 Compare June 10, 2024 01:15
@alamb
Copy link
Contributor

alamb commented Jun 10, 2024

Checking this out...

Copy link
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you @erratic-pattern and @jayzhan211 -- I think this PR looks really nice.

I have some documentation / naming suggestions, but nothing that I think is required

I also tried out @jayzhan211 's suggestion https://github.com/apache/datafusion/pull/10386/files#r1632502092 and it seems to work well , so I think you should respond to that before we merge this PR

While I didn't see any performance change with this PR, I think it is worthwhile simply for the readability improvements

Performance

I ran the planning benchmark with this brach an in summary there is basically no performance change

logical_plan_tpcds_all                        1.00    155.0±1.26ms        ? ?/sec    1.00    155.2±1.27ms        ? ?/sec
logical_plan_tpch_all                         1.00     17.1±0.21ms        ? ?/sec    1.00     17.2±0.17ms        ? ?/sec

This may be due to the fact we have improved the performance of some of the optimizer passes now so the overhead of running them multiple times is not as great.

Details

++ critcmp main simplifier-static-dispatch
group                                         main                                   simplifier-static-dispatch
-----                                         ----                                   --------------------------
logical_aggregate_with_join                   1.00  1009.0±12.76µs        ? ?/sec    1.01  1014.2±13.57µs        ? ?/sec
logical_plan_tpcds_all                        1.00    155.0±1.26ms        ? ?/sec    1.00    155.2±1.27ms        ? ?/sec
logical_plan_tpch_all                         1.00     17.1±0.21ms        ? ?/sec    1.00     17.2±0.17ms        ? ?/sec
logical_select_all_from_1000                  1.00     19.0±0.12ms        ? ?/sec    1.00     19.1±0.17ms        ? ?/sec
logical_select_one_from_700                   1.01    825.9±8.11µs        ? ?/sec    1.00    819.5±8.51µs        ? ?/sec
logical_trivial_join_high_numbered_columns    1.00   768.0±11.59µs        ? ?/sec    1.06   811.6±41.08µs        ? ?/sec
logical_trivial_join_low_numbered_columns     1.00   758.8±23.88µs        ? ?/sec    1.00   756.8±10.68µs        ? ?/sec
physical_plan_tpcds_all                       1.00   1250.3±8.57ms        ? ?/sec    1.00   1248.0±4.97ms        ? ?/sec
physical_plan_tpch_all                        1.00     87.0±1.17ms        ? ?/sec    1.00     86.8±1.06ms        ? ?/sec
physical_plan_tpch_q1                         1.00      4.6±0.06ms        ? ?/sec    1.03      4.7±0.07ms        ? ?/sec
physical_plan_tpch_q10                        1.00      4.1±0.07ms        ? ?/sec    1.00      4.1±0.07ms        ? ?/sec
physical_plan_tpch_q11                        1.00      3.6±0.06ms        ? ?/sec    1.01      3.6±0.05ms        ? ?/sec
physical_plan_tpch_q12                        1.00      2.7±0.03ms        ? ?/sec    1.00      2.7±0.04ms        ? ?/sec
physical_plan_tpch_q13                        1.00      2.0±0.02ms        ? ?/sec    1.00      2.0±0.02ms        ? ?/sec
physical_plan_tpch_q14                        1.00      2.4±0.04ms        ? ?/sec    1.00      2.4±0.04ms        ? ?/sec
physical_plan_tpch_q16                        1.02      3.6±0.07ms        ? ?/sec    1.00      3.5±0.06ms        ? ?/sec
physical_plan_tpch_q17                        1.00      3.4±0.06ms        ? ?/sec    1.00      3.4±0.06ms        ? ?/sec
physical_plan_tpch_q18                        1.01      3.8±0.07ms        ? ?/sec    1.00      3.8±0.07ms        ? ?/sec
physical_plan_tpch_q19                        1.02      5.6±0.08ms        ? ?/sec    1.00      5.6±0.07ms        ? ?/sec
physical_plan_tpch_q2                         1.00      7.4±0.08ms        ? ?/sec    1.03      7.6±0.10ms        ? ?/sec
physical_plan_tpch_q20                        1.02      4.5±0.13ms        ? ?/sec    1.00      4.4±0.06ms        ? ?/sec
physical_plan_tpch_q21                        1.00      6.0±0.08ms        ? ?/sec    1.00      6.0±0.10ms        ? ?/sec
physical_plan_tpch_q22                        1.01      3.3±0.09ms        ? ?/sec    1.00      3.2±0.05ms        ? ?/sec
physical_plan_tpch_q3                         1.00      3.0±0.04ms        ? ?/sec    1.01      3.0±0.07ms        ? ?/sec
physical_plan_tpch_q4                         1.00      2.2±0.02ms        ? ?/sec    1.02      2.2±0.04ms        ? ?/sec
physical_plan_tpch_q5                         1.00      4.2±0.06ms        ? ?/sec    1.02      4.3±0.09ms        ? ?/sec
physical_plan_tpch_q6                         1.00  1449.0±66.64µs        ? ?/sec    1.01  1461.8±26.90µs        ? ?/sec
physical_plan_tpch_q7                         1.00      5.3±0.09ms        ? ?/sec    1.01      5.3±0.08ms        ? ?/sec
physical_plan_tpch_q8                         1.00      6.8±0.09ms        ? ?/sec    1.02      6.9±0.09ms        ? ?/sec
physical_plan_tpch_q9                         1.00      5.2±0.08ms        ? ?/sec    1.01      5.2±0.09ms        ? ?/sec
physical_select_all_from_1000                 1.00     61.3±0.26ms        ? ?/sec    1.01     61.7±0.24ms        ? ?/sec
physical_select_one_from_700                  1.00      3.6±0.04ms        ? ?/sec    1.01      3.6±0.03ms        ? ?/sec

self.max_cycles
}

/// Runs a rewrite cycle on the given [TreeNode] using the given callback function to
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you please define how a "cycle" and "iteration" are related in this function? They are used in the apis below but I didn't see a definition of the terms and how they are related

I think it is that a cycle is composed of several iterations

Alternately, maybe "rewrites" is a better term than "iteration" as this is a "rewrite cycle" 🤔

#[derive(Debug)]
pub struct RewriteCycleState<Node: TreeNode> {
node: Node,
consecutive_unchanged_count: usize,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it would help me follow the code better if these fields used the terms "cycle" and "iteration" consistently

Like

    consecutive_unchanged_iterations: usize,
    iteration_count: usize,
    // the number of iterations in each cycle (set after the first cycle)
    cycle_length: Option<usize>,

self.consecutive_unchanged_count >= cycle_length
}

/// Finishes the iteration by consuming the state and returning a [TreeNode] and
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this finishes the cycle right (not the iteration?)

Copy link
Contributor Author

@erratic-pattern erratic-pattern Jun 12, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this finishes the cycle right (not the iteration?)

it finishes all cycles. as in, it runs all of the remaining iterations and terminates the loop. I will reword to clarify

}
// run remaining cycles
match (1..self.max_cycles).try_fold(state, |state, _| f(state)) {
ControlFlow::Break(result) => result?.finish(),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tried this suggestion locally and it seems to work well 👍

Suggested change
ControlFlow::Break(result) => result?.finish(),
ControlFlow::Break(result) => return result?.finish(),

@alamb
Copy link
Contributor

alamb commented Jun 10, 2024

I fixed a clippy lint and merged up from main to get the latest changes into this PR

@erratic-pattern
Copy link
Contributor Author

I ran the planning benchmark with this brach an in summary there is basically no performance change

Yes that is what I observed as well. I assume the overhead offsets any potential improvement from skipping steps here. I think the value of this API comes from the potential reuseability of the logic for the entire optimizer. It is not ready to do that in its current state, but with some improvements it could be used across the optimizer.

@erratic-pattern
Copy link
Contributor Author

I will update this PR with review suggestions (maybe) this weekend

@alamb
Copy link
Contributor

alamb commented Jun 12, 2024

Marking as draft as feedback is incorporated

@alamb alamb marked this pull request as draft June 12, 2024 22:21
@erratic-pattern erratic-pattern force-pushed the adam/loop-expr-simplifier-static-dispatch branch from d64dbe3 to f5d6ed4 Compare June 16, 2024 22:55
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
core Core datafusion crate optimizer Optimizer rules
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Support simplification that requires multiple applications of constant folding / simplification
3 participants