Skip to content

Conversation

@leiysky
Copy link
Contributor

@leiysky leiysky commented Mar 10, 2023

I hereby agree to the terms of the CLA available at: https://databend.rs/dev/policies/cla/

Summary

@vercel
Copy link

vercel bot commented Mar 10, 2023

The latest updates on your projects. Learn more about Vercel for Git ↗︎

1 Ignored Deployment
Name Status Preview Comments Updated
databend ⬜️ Ignored (Inspect) Mar 10, 2023 at 8:13AM (UTC)

@mergify mergify bot added the pr-chore this PR only has small changes that no need to record, like coding styles. label Mar 10, 2023
@leiysky leiysky marked this pull request as ready for review March 11, 2023 04:09
@BohuTANG BohuTANG merged commit 5a7820d into databendlabs:main Mar 11, 2023
@dusx1981
Copy link
Contributor

I did a test, and the rules can be classified and managed by modifying Rule.rs and heuristic.rs, so that all rule applications can be correctly mapped to each operator's own unique rule set. I originally submitted a PR today , but the logic of seeing all rule-related bitmaps was abandoned, so I closed the PR. As there are more and more rules, the impact on efficiency of each full traversal rule will become greater and greater. If necessary, I can resubmit a PR.
Here is the modified code snippet:

pub enum RuleID {
    // Rewrite rules
    // Filter
    NormalizeScalarFilter,
    NormalizeDisjunctiveFilter,
    PushDownFilterAggregate,
    PushDownFilterEvalScalar,
    PushDownFilterUnion,
    PushDownFilterJoin,
    PushDownFilterScan,
    PushDownFilterSort,
    EliminateFilter,
    MergeFilter,

    // EvalScalar
    EliminateEvalScalar,
    MergeEvalScalar,

    // Limit
    PushDownLimitUnion,
    PushDownLimitOuterJoin,
    RulePushDownLimitExpression,
    PushDownLimitSort,
    PushDownLimitAggregate,
    PushDownLimitScan,

    // Agg
    SplitAggregate,
    FoldCountAggregate,

    PushDownPrewhere,

    // Sort
    PushDownSortScan,
pub static DEFAULT_REWRITE_RULES: Lazy<Vec<RuleID>> = Lazy::new(|| {
    vec![
        // Filter
        RuleID::NormalizeDisjunctiveFilter,
        RuleID::NormalizeScalarFilter,
        RuleID::EliminateFilter,
        RuleID::MergeFilter,
        RuleID::PushDownFilterUnion,
        RuleID::PushDownFilterAggregate,
        RuleID::PushDownFilterSort,
        RuleID::PushDownFilterEvalScalar,
        RuleID::PushDownFilterJoin,
        RuleID::PushDownFilterScan,
        // EvalScalar
        RuleID::EliminateEvalScalar,
        RuleID::MergeEvalScalar,
        // Limit
        RuleID::PushDownLimitUnion,
        RuleID::RulePushDownLimitExpression,
        RuleID::PushDownLimitSort,
        RuleID::PushDownLimitAggregate,
        RuleID::PushDownLimitOuterJoin,
        RuleID::PushDownLimitScan,
        // Agg
        RuleID::FoldCountAggregate,
        RuleID::SplitAggregate,
        RuleID::PushDownPrewhere, /* PushDownPrwhere should be after all rules except PushDownFilterScan */
        // Sort
        RuleID::PushDownSortScan, // PushDownFilterScan should be after PushDownPrewhere
    ]
});

@leiysky leiysky deleted the fix-ruleset branch March 13, 2023 06:16
@leiysky
Copy link
Contributor Author

leiysky commented Mar 13, 2023

@dusx1981 Thanks for your concern.

I removed the logic about operator-specific rule sets because in the current heuristic optimizer, the order of rule application matters(and that's why we use a RuleList instead of RuleSet to store the rewrite rules). RuleSet cannot represent this semantic well and will bring confusion.

@dusx1981
Copy link
Contributor

dusx1981 commented Mar 13, 2023 via email

@dusx1981
Copy link
Contributor

dusx1981 commented Mar 13, 2023

@leiysky

@dusx1981 Thanks for your concern.

I removed the logic about operator-specific rule sets because in the current heuristic optimizer, the order of rule application matters(and that's why we use a RuleList instead of RuleSet to store the rewrite rules). RuleSet cannot represent this semantic well and will bring confusion.

After the rules are classified and managed, it is easier to add new rules later, coupled with the impact of efficiency, isn’t it necessary to optimize the application of its own rules for each operator?

optimized operator

  1. filter
  2. limit
  3. eval scala
  4. Agg
  5. Sort
  6. Scan
  7. ...

@dusx1981
Copy link
Contributor

dusx1981 commented Mar 14, 2023

@leiysky

@dusx1981 Thanks for your concern.

I removed the logic about operator-specific rule sets because in the current heuristic optimizer, the order of rule application matters(and that's why we use a RuleList instead of RuleSet to store the rewrite rules). RuleSet cannot represent this semantic well and will bring confusion.

Optimization of the heuristic optimizer

  1. Filter: Reduce the number of loops from 22 to 10
  2. EvalScalar: Reduce the number of loops from 22 to 2
  3. Limit: Reduce the number of loops from 22 to 6
  4. Agg: Reduce the number of loops from 22 to 2
  5. Sort: Reduce the number of loops from 22 to 1
  6. Scan: Reduce the number of loops from 22 to 0
  7. DumpTableScan: Reduce the number of loops from 22 to 0

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

pr-chore this PR only has small changes that no need to record, like coding styles.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants