feat: run expression simplifier in a loop until a fixedpoint or 3 cycles #10358

Merged
merged 14 commits into from
May 7, 2024

Conversation

erratic-pattern
Contributor

Which issue does this PR close?

Closes #1160.

Rationale for this change

What changes are included in this PR?

Are these changes tested?

yes

Are there any user-facing changes?

no

@github-actions github-actions bot added optimizer Optimizer rules core Core datafusion crate labels May 2, 2024
@erratic-pattern
Contributor Author

erratic-pattern commented May 2, 2024

The test failure here is caused by a bug in the log UDF simplifier.

I've filed an issue #10359 and submitted a PR #10360 to fix it.

@@ -107,6 +110,7 @@ impl<S: SimplifyInfo> ExprSimplifier<S> {
info,
guarantees: vec![],
canonicalize: true,
max_simplifier_iterations: DEFAULT_MAX_SIMPLIFIER_ITERATIONS,
Contributor Author

Right now there's no reason to have a struct field, but the idea is that we might want to make this configurable later.

Contributor

Perhaps we could have an API to update it to mirror

    pub fn with_canonicalize(mut self, canonicalize: bool) -> Self {
        self.canonicalize = canonicalize;
        self
    }

Something like

    pub fn with_max_simplifier_iterations(mut self, max_simplifier_iterations: usize) -> Self {
        self.max_simplifier_iterations = max_simplifier_iterations;
        self
    }

Perhaps


i += 1;
if !transformed || i >= self.max_simplifier_iterations {
return Ok(expr);
Contributor Author

Returning (Expr, usize) would be nice for testing. Especially if the number of iterations becomes a configurable parameter since that opens up the door for new simplification bugs; we might want to test that specifically X number of iterations runs on a given expression.

Contributor

Maybe we could add a function like simplify_inner that did that and then use it in tests

    pub fn simplify(&self, expr: Expr) -> Result<Expr> {
        // Discard the iteration count and keep only the simplified expression.
        Ok(self.simplify_inner(expr)?.1)
    }

    /// Returns the number of iterations required and the simplified expr.
    fn simplify_inner(&self, mut expr: Expr) -> Result<(usize, Expr)> {
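The loop-with-a-cap idea and the "also return the count" suggestion can be sketched together on a toy expression type. This is only an illustration under stated assumptions: `Expr`, `simplify_once`, `simplify_with_cycle_count` and `DEFAULT_MAX_CYCLES` below are hypothetical stand-ins, not DataFusion's types or the code in this PR.

```rust
/// Toy expression type (hypothetical stand-in for DataFusion's `Expr`).
#[derive(Debug, Clone, PartialEq)]
enum Expr {
    Lit(i64),
    Col(String),
    Add(Box<Expr>, Box<Expr>),
}

/// Hypothetical default cap, mirroring the "3 cycles" in the PR title.
const DEFAULT_MAX_CYCLES: usize = 3;

/// One top-down pass: folds `Lit + Lit` where it is already visible,
/// otherwise recurses into the children. Returns the new expression and
/// whether anything changed. Because the node is inspected before its
/// children are rewritten, nested constants need more than one pass.
fn simplify_once(expr: Expr) -> (Expr, bool) {
    match expr {
        Expr::Add(l, r) => match (*l, *r) {
            // Wrapping add keeps the toy free of overflow panics.
            (Expr::Lit(a), Expr::Lit(b)) => (Expr::Lit(a.wrapping_add(b)), true),
            (l, r) => {
                let (l, changed_l) = simplify_once(l);
                let (r, changed_r) = simplify_once(r);
                (Expr::Add(Box::new(l), Box::new(r)), changed_l || changed_r)
            }
        },
        other => (other, false),
    }
}

/// Runs `simplify_once` until a pass reports no change or `max_cycles`
/// passes have run. Returns the expression and the number of passes run.
fn simplify_with_cycle_count(mut expr: Expr, max_cycles: usize) -> (Expr, usize) {
    let mut cycles = 0;
    loop {
        let (next, changed) = simplify_once(expr);
        expr = next;
        cycles += 1;
        if !changed || cycles >= max_cycles {
            return (expr, cycles);
        }
    }
}

/// Public-style entry point that drops the count, as suggested above.
fn simplify(expr: Expr, max_cycles: usize) -> Expr {
    simplify_with_cycle_count(expr, max_cycles).0
}
```

In this toy, `(1 + 2) + 3` takes three passes: two that change the tree and a third that confirms nothing is left to do. Exposing the count makes that easy to assert in a test.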

Contributor

@alamb alamb left a comment

Thank you @erratic-pattern -- this looks really neat.

Closes #1160.

One of the lower numbered tickets we have closed recently

This PR looks awesome to me. Once we get the CI sorted out I think it would be good to go. I left some suggestions on how to improve the tests and I do think we should add some coverage if possible.

datafusion/core/tests/simplification.rs
@github-actions github-actions bot removed the core Core datafusion crate label May 2, 2024
assert_eq!(num_iter, 3);

// NOTE: this currently does not simplify
// (((c4 - 10) + 10) *100) / 100
Contributor Author

Note that one of @devinjdangelo's examples did not simplify. I think we would need to rebalance parens in the simplifier or maybe the canonicalizer for that to work.

Contributor

I agree the simplifier needs to be made more sophisticated to handle some of these cases. It might be cool to add a ticket explaining the case.

Maybe it is something like ((<expr> + CONST) + CONST) --> <expr> + (CONST + CONST) (basically, apply associativity rules).
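The associativity rewrite described above can be sketched on a toy expression type. This is an assumption-laden illustration, not DataFusion code: `Expr` and `reassociate_add` are hypothetical, only integer addition is handled, and a real rule would have to decide how to treat overflow and non-associative types such as floats.

```rust
/// Toy expression type (hypothetical stand-in for DataFusion's `Expr`).
#[derive(Debug, Clone, PartialEq)]
enum Expr {
    Lit(i64),
    Col(String),
    Add(Box<Expr>, Box<Expr>),
}

/// Bottom-up rewrite of `(x + c1) + c2` into `x + (c1 + c2)`, folding the
/// two constants immediately. Anything else is rebuilt unchanged.
fn reassociate_add(expr: Expr) -> Expr {
    match expr {
        Expr::Add(left, right) => {
            let left = reassociate_add(*left);
            let right = reassociate_add(*right);
            match (left, right) {
                // Plain constant folding. Wrapping add avoids overflow
                // panics in this toy; a real engine needs its own policy.
                (Expr::Lit(a), Expr::Lit(b)) => Expr::Lit(a.wrapping_add(b)),
                // (inner + c1) + c2: merge the constants if c1 is a literal.
                (Expr::Add(inner, c1), Expr::Lit(c2)) => match *c1 {
                    Expr::Lit(c1) => {
                        Expr::Add(inner, Box::new(Expr::Lit(c1.wrapping_add(c2))))
                    }
                    // Not a constant after all: rebuild the original shape.
                    other => Expr::Add(
                        Box::new(Expr::Add(inner, Box::new(other))),
                        Box::new(Expr::Lit(c2)),
                    ),
                },
                (l, r) => Expr::Add(Box::new(l), Box::new(r)),
            }
        }
        other => other,
    }
}
```

The `(((c4 - 10) + 10) * 100) / 100` case from the test comment would additionally need subtraction, multiplication and division rules, which this sketch does not attempt.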

let expected = lit(true);
let (expr, num_iter) = simplify_count(expr);
assert_eq!(expr, expected);
assert_eq!(num_iter, 3);
Contributor Author

@alamb I've added the other examples from the comments you linked. So far this one is the only one that runs up to 3 iterations.

@@ -323,6 +326,14 @@ impl<S: SimplifyInfo> ExprSimplifier<S> {
self.canonicalize = canonicalize;
self
}

pub fn with_max_simplifier_iterations(
Contributor Author

needs documentation

let Transformed { data, transformed, .. } = expr
.rewrite(&mut const_evaluator)?
.transform_data(|expr| expr.rewrite(&mut simplifier))?
.transform_data(|expr| expr.rewrite(&mut guarantee_rewriter))?
Contributor Author

guarantee_rewriter and shorten_in_list_simplifier only ran once in the previous version of this code. Should we continue only running them once here?

Contributor

Yeah, that would probably be a good idea

Contributor Author

Yeah, that would probably be a good idea

I can put shorten_in_list_simplifier at the end of the loop, since that's where it was previously. I am unsure where guarantee_rewriter is supposed to go, as it previously ran once in between the simplifier/constant evaluator passes.

Contributor

@alamb alamb left a comment

I think this PR may actually speed up planning too (as it will avoid re-simplifying expressions that don't need to be simplified).

I am running the benchmarks now and will post results

@erratic-pattern erratic-pattern force-pushed the adam/loop-expr-simplifier branch 3 times, most recently from c6efe00 to 559ad14 Compare May 3, 2024 01:34
@alamb
Contributor

alamb commented May 3, 2024

My benchmark runs show this may help q19 a bit. I'll rerun to be sure

++ critcmp main loop-expr-simplifier
group                                         loop-expr-simplifier                   main
-----                                         --------------------                   ----
logical_aggregate_with_join                   1.00  1211.9±11.64µs        ? ?/sec    1.00  1210.9±15.89µs        ? ?/sec
logical_plan_tpcds_all                        1.00    157.6±1.80ms        ? ?/sec    1.01    159.2±2.01ms        ? ?/sec
logical_plan_tpch_all                         1.00     16.9±0.21ms        ? ?/sec    1.00     17.0±0.20ms        ? ?/sec
logical_select_all_from_1000                  1.00     18.8±0.09ms        ? ?/sec    1.00     18.9±0.08ms        ? ?/sec
logical_select_one_from_700                   1.00   808.4±10.14µs        ? ?/sec    1.02    823.6±9.81µs        ? ?/sec
logical_trivial_join_high_numbered_columns    1.00   765.1±11.51µs        ? ?/sec    1.00   763.0±11.30µs        ? ?/sec
logical_trivial_join_low_numbered_columns     1.00   751.1±20.24µs        ? ?/sec    1.00   753.6±26.51µs        ? ?/sec
physical_plan_tpcds_all                       1.00   1348.1±9.21ms        ? ?/sec    1.01   1357.7±7.90ms        ? ?/sec
physical_plan_tpch_all                        1.01     93.8±0.79ms        ? ?/sec    1.00     92.6±1.47ms        ? ?/sec
physical_plan_tpch_q1                         1.00      5.0±0.10ms        ? ?/sec    1.03      5.2±0.06ms        ? ?/sec
physical_plan_tpch_q10                        1.00      4.4±0.04ms        ? ?/sec    1.01      4.4±0.06ms        ? ?/sec
physical_plan_tpch_q11                        1.00      3.9±0.05ms        ? ?/sec    1.04      4.1±0.07ms        ? ?/sec
physical_plan_tpch_q12                        1.06      3.3±0.04ms        ? ?/sec    1.00      3.1±0.07ms        ? ?/sec
physical_plan_tpch_q13                        1.00      2.1±0.03ms        ? ?/sec    1.02      2.2±0.04ms        ? ?/sec
physical_plan_tpch_q14                        1.00      2.8±0.05ms        ? ?/sec    1.02      2.8±0.06ms        ? ?/sec
physical_plan_tpch_q16                        1.00      3.6±0.07ms        ? ?/sec    1.05      3.8±0.07ms        ? ?/sec
physical_plan_tpch_q17                        1.00      3.5±0.05ms        ? ?/sec    1.02      3.6±0.06ms        ? ?/sec
physical_plan_tpch_q18                        1.00      4.0±0.06ms        ? ?/sec    1.00      4.0±0.05ms        ? ?/sec
physical_plan_tpch_q19                        1.11      7.0±0.07ms        ? ?/sec    1.00      6.3±0.11ms        ? ?/sec
physical_plan_tpch_q2                         1.00      7.8±0.06ms        ? ?/sec    1.02      7.9±0.12ms        ? ?/sec
physical_plan_tpch_q20                        1.00      4.5±0.06ms        ? ?/sec    1.03      4.7±0.07ms        ? ?/sec
physical_plan_tpch_q21                        1.00      6.2±0.07ms        ? ?/sec    1.01      6.2±0.08ms        ? ?/sec
physical_plan_tpch_q22                        1.00      3.4±0.06ms        ? ?/sec    1.01      3.5±0.07ms        ? ?/sec
physical_plan_tpch_q3                         1.00      3.2±0.06ms        ? ?/sec    1.03      3.3±0.07ms        ? ?/sec
physical_plan_tpch_q4                         1.00      2.3±0.03ms        ? ?/sec    1.02      2.3±0.05ms        ? ?/sec
physical_plan_tpch_q5                         1.00      4.5±0.06ms        ? ?/sec    1.02      4.6±0.09ms        ? ?/sec
physical_plan_tpch_q6                         1.00  1570.2±23.85µs        ? ?/sec    1.02  1596.3±26.21µs        ? ?/sec
physical_plan_tpch_q7                         1.00      5.8±0.06ms        ? ?/sec    1.02      5.9±0.07ms        ? ?/sec
physical_plan_tpch_q8                         1.00      7.3±0.07ms        ? ?/sec    1.02      7.4±0.12ms        ? ?/sec
physical_plan_tpch_q9                         1.00      5.4±0.07ms        ? ?/sec    1.05      5.7±0.07ms        ? ?/sec
physical_select_all_from_1000                 1.00     61.2±0.25ms        ? ?/sec    1.01     61.7±0.41ms        ? ?/sec
physical_select_one_from_700                  1.00      3.7±0.04ms        ? ?/sec    1.02      3.7±0.05ms        ? ?/sec

@alamb alamb changed the title feat: run expression simplifier in a loop feat: run expression simplifier in a loop until no changes May 3, 2024
@alamb
Contributor

alamb commented May 3, 2024

I reran benchmarks and this PR certainly doesn't slow things down and maybe makes it marginally faster

++ critcmp main loop-expr-simplifier
group                                         loop-expr-simplifier                   main
-----                                         --------------------                   ----
logical_aggregate_with_join                   1.02  1222.1±13.04µs        ? ?/sec    1.00  1203.3±14.34µs        ? ?/sec
logical_plan_tpcds_all                        1.00    158.8±1.61ms        ? ?/sec    1.00    159.2±1.57ms        ? ?/sec
logical_plan_tpch_all                         1.00     17.0±0.17ms        ? ?/sec    1.00     17.0±0.22ms        ? ?/sec
logical_select_all_from_1000                  1.00     18.8±0.15ms        ? ?/sec    1.00     18.8±0.14ms        ? ?/sec
logical_select_one_from_700                   1.00   815.8±10.12µs        ? ?/sec    1.00    818.8±8.69µs        ? ?/sec
logical_trivial_join_high_numbered_columns    1.00    762.9±8.56µs        ? ?/sec    1.00   759.7±26.05µs        ? ?/sec
logical_trivial_join_low_numbered_columns     1.00   747.9±10.40µs        ? ?/sec    1.00   748.6±22.28µs        ? ?/sec
physical_plan_tpcds_all                       1.00   1337.1±9.07ms        ? ?/sec    1.02   1364.5±7.75ms        ? ?/sec
physical_plan_tpch_all                        1.00     91.0±1.13ms        ? ?/sec    1.03     93.8±1.39ms        ? ?/sec
physical_plan_tpch_q1                         1.00      5.0±0.08ms        ? ?/sec    1.03      5.1±0.08ms        ? ?/sec
physical_plan_tpch_q10                        1.00      4.4±0.06ms        ? ?/sec    1.00      4.4±0.07ms        ? ?/sec
physical_plan_tpch_q11                        1.01      4.0±0.07ms        ? ?/sec    1.00      3.9±0.06ms        ? ?/sec
physical_plan_tpch_q12                        1.00      3.1±0.05ms        ? ?/sec    1.02      3.2±0.08ms        ? ?/sec
physical_plan_tpch_q13                        1.00      2.1±0.03ms        ? ?/sec    1.00      2.1±0.03ms        ? ?/sec
physical_plan_tpch_q14                        1.00      2.7±0.06ms        ? ?/sec    1.02      2.7±0.05ms        ? ?/sec
physical_plan_tpch_q16                        1.00      3.7±0.06ms        ? ?/sec    1.04      3.8±0.06ms        ? ?/sec
physical_plan_tpch_q17                        1.00      3.6±0.07ms        ? ?/sec    1.02      3.6±0.09ms        ? ?/sec
physical_plan_tpch_q18                        1.00      3.9±0.06ms        ? ?/sec    1.02      4.0±0.06ms        ? ?/sec
physical_plan_tpch_q19                        1.00      6.1±0.06ms        ? ?/sec    1.03      6.3±0.08ms        ? ?/sec
physical_plan_tpch_q2                         1.00      7.7±0.07ms        ? ?/sec    1.02      7.9±0.07ms        ? ?/sec
physical_plan_tpch_q20                        1.00      4.5±0.08ms        ? ?/sec    1.04      4.7±0.07ms        ? ?/sec
physical_plan_tpch_q21                        1.00      6.1±0.09ms        ? ?/sec    1.02      6.2±0.09ms        ? ?/sec
physical_plan_tpch_q22                        1.00      3.4±0.06ms        ? ?/sec    1.03      3.5±0.07ms        ? ?/sec
physical_plan_tpch_q3                         1.00      3.1±0.05ms        ? ?/sec    1.02      3.2±0.06ms        ? ?/sec
physical_plan_tpch_q4                         1.00      2.3±0.04ms        ? ?/sec    1.00      2.3±0.04ms        ? ?/sec
physical_plan_tpch_q5                         1.00      4.4±0.07ms        ? ?/sec    1.02      4.5±0.06ms        ? ?/sec
physical_plan_tpch_q6                         1.00  1569.9±20.55µs        ? ?/sec    1.03  1612.7±22.24µs        ? ?/sec
physical_plan_tpch_q7                         1.00      5.7±0.08ms        ? ?/sec    1.02      5.8±0.06ms        ? ?/sec
physical_plan_tpch_q8                         1.00      7.4±0.17ms        ? ?/sec    1.01      7.4±0.08ms        ? ?/sec
physical_plan_tpch_q9                         1.00      5.5±0.07ms        ? ?/sec    1.01      5.6±0.06ms        ? ?/sec
physical_select_all_from_1000                 1.00     60.9±0.40ms        ? ?/sec    1.01     61.3±0.40ms        ? ?/sec
physical_select_one_from_700                  1.00      3.7±0.04ms        ? ?/sec    1.00      3.7±0.04ms        ? ?/sec

Contributor

@alamb alamb left a comment

Thanks @erratic-pattern -- this is looking good

Seems like we need:

  1. Some docs (per https://github.com/apache/datafusion/pull/10358/files#r1588554192)
  2. Remove the datafusion-function dependency

Otherwise this seems like it is almost ready to go

Thank you very much @erratic-pattern and @jayzhan211 -- something @rdettai and I discussed a long time ago is finally happening 🎉

datafusion/optimizer/Cargo.toml
let expected = lit_bool_null();
let (expr, num_iter) = simplify_count(expr);
assert_eq!(expr, expected);
assert_eq!(num_iter, 2);
Contributor

cool

@alamb alamb marked this pull request as draft May 3, 2024 16:17
@alamb
Contributor

alamb commented May 3, 2024

Marking as draft so it doesn't show up on the "ready to review list" as I think @erratic-pattern has some additional plans

@github-actions github-actions bot added the core Core datafusion crate label May 3, 2024
@erratic-pattern
Contributor Author

I future-proofed the naming a bit by renaming "iterations" to "cycles", because I can improve the algorithm further to short-circuit mid-cycle, and so we might later want to capture the actual "iteration" or "step" count.

@erratic-pattern erratic-pattern requested a review from alamb May 3, 2024 20:15
@alamb alamb changed the title feat: run expression simplifier in a loop until no changes feat: run expression simplifier in a loop until no changes or 3 times May 5, 2024
@alamb alamb changed the title feat: run expression simplifier in a loop until no changes or 3 times feat: run expression simplifier in a loop until a fixedpoint or 3 cycles May 5, 2024
Contributor

@alamb alamb left a comment

Thanks @erratic-pattern -- I think this PR looks great!

I left some small suggestions for improvements but I think we could do them as follow on PRs (or never)

Prior to merging this PR I think we should file follow on tickets to track the additional potential improvements. I can do this next week if someone doesn't beat me to it.

  1. Improve the const evaluator / simplifier to report more accurately when it changed expressions
  2. Improve the "did anything change" detection logic to track how many simplifications are applied rather than iterations
  3. Simplification of expressions like ((c1 + 5) + 10)

let expr = cast(now(), DataType::Int64)
.lt(cast(to_timestamp(vec![lit(0)]), DataType::Int64) + lit(i64::MAX));
let expected = lit(true);
test_simplify_with_cycle_count(expr, expected, 3);
Contributor

nice

@@ -323,6 +337,63 @@ impl<S: SimplifyInfo> ExprSimplifier<S> {
self.canonicalize = canonicalize;
self
}

/// Specifies the maximum number of simplification cycles to run.
Contributor

😍
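For readers without the diff open, a documented builder method of this kind might look roughly like the sketch below. Everything here is an assumption for illustration: `ToySimplifier`, `with_max_cycles`, the accessor methods and `DEFAULT_MAX_SIMPLIFIER_CYCLES` are hypothetical names modelled on the `with_canonicalize` pattern quoted earlier in the thread, not the merged API.

```rust
/// Hypothetical default cap, matching the "3 cycles" in the PR title.
pub const DEFAULT_MAX_SIMPLIFIER_CYCLES: usize = 3;

/// Toy stand-in for `ExprSimplifier` holding only the two options
/// discussed in this thread.
#[derive(Debug, Clone, PartialEq)]
pub struct ToySimplifier {
    canonicalize: bool,
    max_simplifier_cycles: usize,
}

impl ToySimplifier {
    /// Creates a simplifier with canonicalization on and the default cap.
    pub fn new() -> Self {
        Self {
            canonicalize: true,
            max_simplifier_cycles: DEFAULT_MAX_SIMPLIFIER_CYCLES,
        }
    }

    /// Enables or disables canonicalization before simplification.
    pub fn with_canonicalize(mut self, canonicalize: bool) -> Self {
        self.canonicalize = canonicalize;
        self
    }

    /// Specifies the maximum number of simplification cycles to run.
    ///
    /// A cycle is one pass of every rewriter over the expression. The
    /// loop is expected to stop earlier if a cycle changes nothing.
    pub fn with_max_cycles(mut self, max_simplifier_cycles: usize) -> Self {
        self.max_simplifier_cycles = max_simplifier_cycles;
        self
    }

    /// Returns the configured cycle cap.
    pub fn max_cycles(&self) -> usize {
        self.max_simplifier_cycles
    }

    /// Returns whether canonicalization is enabled.
    pub fn canonicalize(&self) -> bool {
        self.canonicalize
    }
}
```

Taking `self` by value and returning `Self` keeps the option chainable, for example `ToySimplifier::new().with_canonicalize(false).with_max_cycles(1)`.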

@alamb alamb marked this pull request as ready for review May 5, 2024 11:18
@erratic-pattern
Contributor Author

I've made a new algorithm for this that should in theory reduce the amount of work needed to be done by short-circuiting earlier once there is a consecutive sequence of unchanged expression trees equal to the number of rewriters. However, it ended up being harder than I thought to get actual performance improvements when comparing with local benchmarks. I tried several different approaches, but most were actually slower than the code that I have here in this PR.

I did eventually come up with something that could possibly compete with the simpler algorithm in this PR. You can compare the branch here

My guess is that the additional overhead of checking each iteration is actually significant when we are only running 3 rewriters. I think we would see bigger improvements in cases where the number of optimization rules is larger. Many of the approaches I tried required dynamic dispatch on the TreeNodeRewriters, which was surprisingly a larger cost than I expected. The approach I ended up with avoids the dynamic dispatch, which seems to be the main reason it's faster.

I would be interested in scaling this up to run across all of the OptimizationRules, where we should see a bigger improvement. However, I don't think everything has fully migrated from try_optimize, so we would have to wait for that. Once that happens, it should be possible to generalize the new branch code to work with both OptimizationRules and TreeNodeRewriters.
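The short-circuit condition described above (stop once as many consecutive rewriter runs as there are rewriters have left the tree unchanged) can be sketched generically. This is a hypothetical illustration, not the code from the comparison branch: `Rewriter`, `run_until_stable`, `strip_twos` and `strip_threes` are made-up names, an `i64` stands in for the expression tree, and plain function pointers are used for brevity, so the sketch does not reproduce the dispatch-avoiding design mentioned in the comment.

```rust
/// A rewriter takes a value and returns the new value plus a "changed" flag.
type Rewriter<T> = fn(T) -> (T, bool);

/// Applies the rewriters round-robin. Stops when `rewriters.len()`
/// consecutive applications report no change (every rewriter has seen the
/// current value and left it alone) or when `max_steps` is reached.
/// Returns the final value and the number of rewriter applications.
fn run_until_stable<T>(mut value: T, rewriters: &[Rewriter<T>], max_steps: usize) -> (T, usize) {
    if rewriters.is_empty() {
        return (value, 0);
    }
    let mut unchanged_streak = 0;
    let mut steps = 0;
    while steps < max_steps && unchanged_streak < rewriters.len() {
        let rewriter = rewriters[steps % rewriters.len()];
        let (next, changed) = rewriter(value);
        value = next;
        steps += 1;
        // Any change resets the streak; otherwise extend it.
        unchanged_streak = if changed { 0 } else { unchanged_streak + 1 };
    }
    (value, steps)
}

/// Toy rewriter: removes one factor of 2 if present.
fn strip_twos(n: i64) -> (i64, bool) {
    if n != 0 && n % 2 == 0 { (n / 2, true) } else { (n, false) }
}

/// Toy rewriter: removes one factor of 3 if present.
fn strip_threes(n: i64) -> (i64, bool) {
    if n != 0 && n % 3 == 0 { (n / 3, true) } else { (n, false) }
}
```

Starting from 12 with these two toy rewriters, the loop stops after five applications, whereas running whole cycles until one full cycle reports no change would take six. With only a handful of rewriters the saving is small, which is consistent with the observation above that the per-step bookkeeping can outweigh it.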

@erratic-pattern
Contributor Author

Benchmark results #10386 (comment)

@jayzhan211
Contributor

when we are only running 3 rewriters.

After reading the code in guarantee_rewriter, I think we can even move it into the simplifier. Not sure why it is an independent rule 🤔

Contributor

@alamb alamb left a comment

I pushed a commit to this branch and merged up from main. Once CI passes I intend to merge it in

@alamb alamb merged commit f0e96c6 into apache:main May 7, 2024
23 checks passed
@alamb
Contributor

alamb commented May 7, 2024

🚀

Labels
core Core datafusion crate optimizer Optimizer rules
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Support simplification that requires multiple applications of constant folding / simplification
3 participants