-
Notifications
You must be signed in to change notification settings - Fork 1.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Make OptimizerConfig a trait (#4631) (#4638) #4645
Conversation
@@ -1557,14 +1557,6 @@ impl SessionState { | |||
.register_catalog(config.default_catalog.clone(), default_catalog); | |||
} | |||
|
|||
let optimizer_config = OptimizerConfig::new().filter_null_keys( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Moving this into the OptimizerConfig
passed at optimize time is the fix for #4638
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍
]; | ||
if config.filter_null_keys { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is the other half of the fix for #4638 - determine the rules dynamically
fn query_execution_start_time(&self) -> DateTime<Utc>; | ||
|
||
/// Returns false if the given rule should be skipped | ||
fn rule_enabled(&self, name: &str) -> bool; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I felt this formulation was more future proof than adding a specific function for enabling/disabling FilterNullJoins
/// Create optimizer config | ||
pub fn new() -> Self { | ||
Self { | ||
query_execution_start_time: Utc::now(), | ||
next_id: 0, // useful for generating things like unique subquery aliases | ||
next_id: AtomicUsize::new(1), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We use 1, as previously the implementation returned the value post-increment which the atomics don't provide
// TODO this should not be on the config, | ||
// it should be its own 'OptimizerState' or something) | ||
next_id: usize, | ||
next_id: AtomicUsize, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this limited interior mutability is better than the potential confusion around rules taking a mutable OptimizerConfig
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree -- cc @waynexia
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is also good that OptimizerContext
doesn't implement Clone
which will discourage making copies that could get out of date
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good 👍
.with_query_execution_start_time( | ||
self.execution_props.query_execution_start_time, | ||
); | ||
// TODO: Implement OptimizerContext directly on DataFrame (#4631) (#4626) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is the ultimate motivation for this change, to allow a single config container, that then just implements the traits needed by the various sub-systems
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think having the sub crates depend on a traits that are implemented in the core trait sounds like a good idea to me
8ce5714
to
babfb99
Compare
I have some PR about I will wait for this PR merge, and then do them. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks like a (really) nice improvement to me -- thank you @tustvold 🏆
I think we should gather some more input and leave this open a while before merging it even though I think the actual required API change downstream are relatively low
cc @andygrove @jackwener @yahoNanJing @Dandandan @thinkharderdev @xudong963
.with_query_execution_start_time( | ||
self.execution_props.query_execution_start_time, | ||
); | ||
// TODO: Implement OptimizerContext directly on DataFrame (#4631) (#4626) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think having the sub crates depend on a traits that are implemented in the core trait sounds like a good idea to me
@@ -1557,14 +1557,6 @@ impl SessionState { | |||
.register_catalog(config.default_catalog.clone(), default_catalog); | |||
} | |||
|
|||
let optimizer_config = OptimizerConfig::new().filter_null_keys( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍
// TODO this should not be on the config, | ||
// it should be its own 'OptimizerState' or something) | ||
next_id: usize, | ||
next_id: AtomicUsize, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree -- cc @waynexia
// TODO this should not be on the config, | ||
// it should be its own 'OptimizerState' or something) | ||
next_id: usize, | ||
next_id: AtomicUsize, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is also good that OptimizerContext
doesn't implement Clone
which will discourage making copies that could get out of date
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The change makes sense to me, thanks @tustvold
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Makes sense to me
I plan to merge this in the next few hours unless anybody objects / needs more time to review |
Benchmark runs are scheduled for baseline = 8944581 and contender = ca8985e. ca8985e is a master commit associated with this PR. Results will be available as each benchmark for each run completes. |
🎉 really nice work @tustvold |
Which issue does this PR close?
Closes #4631
Closes #4638
Rationale for this change
What changes are included in this PR?
Are these changes tested?
Are there any user-facing changes?