Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Move Array specific rewrites to datafusion_functions_array AnalyzerRule #9557

Closed
wants to merge 7 commits into from

Conversation

jayzhan211
Copy link
Contributor

@jayzhan211 jayzhan211 commented Mar 11, 2024

Which issue does this PR close?

Closes #9519

Rationale for this change

What changes are included in this PR?

Use LinkedList for analyzer rules, unlike udfs, we need to preserve order and duplication is acceptable.
Move AnalyzerRule into datafusion_expr, and register the rule into optimizer through SessionState

Are these changes tested?

Are there any user-facing changes?

@github-actions github-actions bot added logical-expr Logical plan and expressions optimizer Optimizer rules core Core datafusion crate labels Mar 11, 2024
@jayzhan211 jayzhan211 marked this pull request as ready for review March 11, 2024 14:49
@jayzhan211 jayzhan211 requested a review from alamb March 11, 2024 14:49
@alamb alamb changed the title Enable to register analyzer rule Move Array specific rewrites to datafusion_functions_array AnalyzerRule Mar 12, 2024
Copy link
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you so much @jayzhan211 -- this is looking really nice.

My biggest suggestion after seeing the code in this PR is about the API design (rather than exposing the entire AnalyzerRule maybe we can simply expose a RewriteRule)

Let me know what you think. I can also find time to help with this project too if it doesn't make sense to you

@@ -2144,6 +2128,25 @@ impl FunctionRegistry for SessionState {
})
}

/// Add `analyzer_rule` to the `index` of the list of
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How would a user know what index to add I wonder 🤔

Also, if the built in analyzer rules were changed, then any hard coded index might become incorrect

Instead of this API, what if we followed the model of the physical_optimizers() APIL #9575

Specifically, we added a SesssionContext::analyzer_rules() API that returned all the existing rules? Using that and with_analyzer_rules the user could put the rules wherever they wanted

For example

let ctx = SessionContext::new();

let mut analyzer_rules = ctx.analyzer_rules();
// insert new rule at index 3
analyzer_rules.insert(3, new_rule);
// update rules in session context
let ctx = ctx.with_analyzer_rules(analyzer_rules);

(I think adding access to the analyzer rules makes sense even if we go with a different approach for rewrite)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You are right, the example above is easier for the user to overwrite the rules. Let me add analyzer_rules to return the existing rules.

btw, is #9575 a correct link?

Copy link
Contributor Author

@jayzhan211 jayzhan211 Mar 12, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think i need to move analyzer_rules and with_analyzer_rules under FunctionRegistry, so it can be called in register_all

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

btw, is #9575 a correct link?

No sorry -- I meant to link to the SessionCOntext::optimizers() API. sorry about that

@@ -2179,6 +2182,21 @@ impl FunctionRegistry for SessionState {
fn deregister_udwf(&mut self, name: &str) -> Result<Option<Arc<WindowUDF>>> {
Ok(self.window_functions.remove(name))
}

fn deregister_analyzer_rule(&mut self, index: usize) -> Result<()> {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we added SessionContext::analyzer_rules I don't think this API is necessary -- instead the user could just override the existing list.

@@ -90,5 +92,19 @@ pub fn register_all(registry: &mut dyn FunctionRegistry) -> Result<()> {
}
Ok(()) as Result<()>
})?;

let rules: Vec<(AnalyzerRuleRef, usize)> = vec![
// The order here should be adjusted based on the rules defined in the optimizer
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems quite brittle as it would potentially break if either the users introduced their own analyzer rules or we introduced new ones in DataFusion. I think the RewriteRule API might avoid this potential problem.

Signed-off-by: jayzhan211 <jayzhan211@gmail.com>
Signed-off-by: jayzhan211 <jayzhan211@gmail.com>
Signed-off-by: jayzhan211 <jayzhan211@gmail.com>
Signed-off-by: jayzhan211 <jayzhan211@gmail.com>
Signed-off-by: jayzhan211 <jayzhan211@gmail.com>
Signed-off-by: jayzhan211 <jayzhan211@gmail.com>
let mut rules = registry.analyzer_rules();
// OperatorToFunction should be run before TypeCoercion, since it rewrite based on the argument types (List or Scalar),
// and TypeCoercion may cast the argument types from Scalar to List.
rules.insert(1, Arc::new(OperatorToFunction::new()));
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

didn't notice that vec has insert...

Signed-off-by: jayzhan211 <jayzhan211@gmail.com>
@jayzhan211
Copy link
Contributor Author

Thank you so much @jayzhan211 -- this is looking really nice.

My biggest suggestion after seeing the code in this PR is about the API design (rather than exposing the entire AnalyzerRule maybe we can simply expose a RewriteRule)

Let me know what you think. I can also find time to help with this project too if it doesn't make sense to you

I change the design, feel free to improve or change the code if you have time to help, thanks

Copy link
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @jayzhan211 - I will try and make a proposal shortly

@@ -2156,6 +2138,16 @@ impl FunctionRegistry for SessionState {
})
}

/// Override the `AnalyzerRule`s optimizer plan rules.
fn with_analyzer_rules(&mut self, rules: Vec<AnalyzerRuleRef>) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nice

@alamb
Copy link
Contributor

alamb commented Mar 12, 2024

Thank you so much @jayzhan211 -- this is looking really nice.
My biggest suggestion after seeing the code in this PR is about the API design (rather than exposing the entire AnalyzerRule maybe we can simply expose a RewriteRule)
Let me know what you think. I can also find time to help with this project too if it doesn't make sense to you

I change the design, feel free to improve or change the code if you have time to help, thanks

@jayzhan211 here is the alternate design I was hinting at -- #9583

Please let me know what you think if you have time

@alamb alamb closed this in #9583 Mar 13, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
core Core datafusion crate logical-expr Logical plan and expressions optimizer Optimizer rules
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Move datafusion_array_function specific rewrite rules like to datafusion_functions_array crate
2 participants