Implement/fix Eq and Hash for Expr and LogicalPlan #5421
alamb
left a comment
Thank you for the contribution @mslapek -- this looks good except for user defined logical nodes and PartialEq for TableScan.
impl Hash for DFSchema {
    fn hash<H: std::hash::Hasher>(&self, state: &mut H) {
        self.fields.hash(state);
        self.metadata.len().hash(state); // HashMap is not hashable
I agree it is ok to hash just the metadata's length, as that satisfies the Eq constraint:
https://doc.rust-lang.org/std/hash/trait.Hash.html#hash-and-eq
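The Hash/Eq rule linked above only requires that equal values produce equal hashes; unequal values are allowed to collide. A minimal sketch of that property, using a hypothetical `Schema` struct as a stand-in for `DFSchema` (the field types and the `schema` and `hash_of` helpers are assumptions made for illustration):

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::HashMap;
use std::hash::{Hash, Hasher};

// Hypothetical stand-in for DFSchema; not the real DataFusion type.
#[derive(Debug, PartialEq, Eq)]
struct Schema {
    fields: Vec<String>,
    metadata: HashMap<String, String>,
}

impl Hash for Schema {
    fn hash<H: Hasher>(&self, state: &mut H) {
        self.fields.hash(state);
        // HashMap does not implement Hash, so only its length is fed in.
        self.metadata.len().hash(state);
    }
}

// Helper (assumed for this sketch): hashes any value with DefaultHasher.
fn hash_of<T: Hash + ?Sized>(value: &T) -> u64 {
    let mut hasher = DefaultHasher::new();
    value.hash(&mut hasher);
    hasher.finish()
}

// Helper (assumed for this sketch): builds a Schema from string slices.
fn schema(fields: &[&str], metadata: &[(&str, &str)]) -> Schema {
    Schema {
        fields: fields.iter().map(|f| f.to_string()).collect(),
        metadata: metadata
            .iter()
            .map(|(k, v)| (k.to_string(), v.to_string()))
            .collect(),
    }
}

fn main() {
    let a = schema(&["id"], &[("owner", "x")]);
    let b = schema(&["id"], &[("owner", "x")]);
    let c = schema(&["id"], &[("owner", "y")]);

    // Equal values must hash equally: this is the required direction.
    assert_eq!(a, b);
    assert_eq!(hash_of(&a), hash_of(&b));

    // Unequal values may collide; that costs performance, not correctness.
    assert_ne!(a, c);
    assert_eq!(hash_of(&a), hash_of(&c));
}
```

The trade-off is that schemas differing only in metadata values land in the same hash bucket, so a map keyed on them falls back to `eq` to tell them apart.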
impl PartialEq for TableScan {
    fn eq(&self, other: &Self) -> bool {
        self.table_name == other.table_name
I think this also needs to check that the source is equal as well -- I think we can do so via https://doc.rust-lang.org/std/sync/struct.Arc.html#method.ptr_eq
This Arc::ptr_eq might be risky... The docs for both Arc::ptr_eq and std::ptr::eq say that comparisons of dyn Trait pointers are unreliable. 😕
Even clippy reports an error for it: vtable_address_comparisons, from the correctness 🔞 category.
I suggest reconsidering the request to compare sources.
Upon consideration, within a single plan the table_name should be unique. I was originally worried about a false positive in the case where two TableScans have different sources but are identical in all other fields.
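For reference, the PR ends up not comparing sources at all. If source identity were ever needed, one commonly used workaround for the vtable concern is to compare only the data addresses after casting the fat pointers to thin ones. A minimal sketch, with a hypothetical `Source` trait, `MemSource` type and `same_allocation` helper that are not part of DataFusion:

```rust
use std::sync::Arc;

// Hypothetical stand-in for a table source trait.
trait Source {
    fn name(&self) -> String;
}

// Hypothetical source implementation holding only a name.
struct MemSource(String);

impl Source for MemSource {
    fn name(&self) -> String {
        self.0.clone()
    }
}

/// Compares only the data addresses of two trait-object Arcs.
/// Casting to `*const ()` drops the vtable half of the fat pointer,
/// so differing vtable addresses cannot influence the result.
fn same_allocation(a: &Arc<dyn Source>, b: &Arc<dyn Source>) -> bool {
    Arc::as_ptr(a) as *const () == Arc::as_ptr(b) as *const ()
}

fn main() {
    let a: Arc<dyn Source> = Arc::new(MemSource("t".to_string()));
    let b = Arc::clone(&a);
    let c: Arc<dyn Source> = Arc::new(MemSource("t".to_string()));

    assert!(same_allocation(&a, &b)); // clones share one allocation
    assert!(!same_allocation(&a, &c)); // same contents, different allocation
    assert_eq!(a.name(), c.name());
}
```

Note that this is identity, not structural equality: two sources with the same contents but separate allocations compare as different.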
impl Hash for TableScan {
    fn hash<H: Hasher>(&self, state: &mut H) {
        self.table_name.hash(state);
        self.projection.hash(state);
I think it is ok that Hash doesn't also include source
impl Eq for Extension {}

impl Hash for Extension {
I am not sure about using the textual output for equality comparison, as someone who has implemented Extension may have a constant output, for example. I think a safer (though backwards incompatible) change would be to make UserDefinedLogicalNode also be Hash and PartialEq
Like
trait UserDefinedLogicalNode: Hash + PartialEq {
...
}
Couldn't add Hash + PartialEq, because UserDefinedLogicalNode must be object-safe.
Instead added dyn_eq and dyn_hash methods serving the same purpose.
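A rough sketch of how such `dyn_eq` and `dyn_hash` methods can keep a trait object-safe while still giving the trait object `PartialEq`, `Eq` and `Hash`. The `Node` trait, its `as_any` method, and the `Limit` and `Sort` types are hypothetical; the actual `UserDefinedLogicalNode` signatures may differ:

```rust
use std::any::Any;
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

// Hypothetical object-safe trait; no generic methods, no Self by value.
trait Node {
    fn as_any(&self) -> &dyn Any;
    fn dyn_eq(&self, other: &dyn Node) -> bool;
    fn dyn_hash(&self, state: &mut dyn Hasher);
}

#[derive(PartialEq, Eq, Hash)]
struct Limit {
    n: usize,
}

#[derive(PartialEq, Eq, Hash)]
struct Sort {
    ascending: bool,
}

impl Node for Limit {
    fn as_any(&self) -> &dyn Any {
        self
    }
    fn dyn_eq(&self, other: &dyn Node) -> bool {
        // Nodes of a different concrete type are never equal.
        match other.as_any().downcast_ref::<Self>() {
            Some(o) => self == o,
            None => false,
        }
    }
    fn dyn_hash(&self, state: &mut dyn Hasher) {
        // `&mut dyn Hasher` itself implements Hasher, so it can be passed on.
        let mut s = state;
        self.hash(&mut s);
    }
}

impl Node for Sort {
    fn as_any(&self) -> &dyn Any {
        self
    }
    fn dyn_eq(&self, other: &dyn Node) -> bool {
        match other.as_any().downcast_ref::<Self>() {
            Some(o) => self == o,
            None => false,
        }
    }
    fn dyn_hash(&self, state: &mut dyn Hasher) {
        let mut s = state;
        self.hash(&mut s);
    }
}

// The trait object delegates to the dyn_* methods.
impl PartialEq for dyn Node {
    fn eq(&self, other: &Self) -> bool {
        self.dyn_eq(other)
    }
}

impl Eq for dyn Node {}

impl Hash for dyn Node {
    fn hash<H: Hasher>(&self, state: &mut H) {
        self.dyn_hash(state)
    }
}

// Helper (assumed for this sketch): hashes any value with DefaultHasher.
fn hash_of<T: Hash + ?Sized>(value: &T) -> u64 {
    let mut hasher = DefaultHasher::new();
    value.hash(&mut hasher);
    hasher.finish()
}

fn main() {
    let a: Box<dyn Node> = Box::new(Limit { n: 10 });
    let b: Box<dyn Node> = Box::new(Limit { n: 10 });
    let c: Box<dyn Node> = Box::new(Limit { n: 5 });
    let d: Box<dyn Node> = Box::new(Sort { ascending: true });

    assert!(a.as_ref() == b.as_ref());
    assert_eq!(hash_of(a.as_ref()), hash_of(b.as_ref()));
    assert!(a.as_ref() != c.as_ref());
    assert!(a.as_ref() != d.as_ref()); // different concrete types
}
```

Unlike a comparison of textual output, this delegates to each implementor's own notion of equality, which addresses the constant-output concern raised above.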
  /// Subquery
- #[derive(Clone)]
+ #[derive(Clone, PartialEq, Eq, Hash)]
🎉
  // instead
- let new_plan_str = format!("{}", new_plan.display_indent());
- if plan_str == new_plan_str {
+ if old_plan.as_ref() == &new_plan {
👍 nice
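To illustrate the idea behind this change, here is a toy fixed-point loop that stops once a rewrite pass leaves the plan structurally unchanged, using derived `PartialEq` instead of comparing formatted strings. The `Plan` enum and the `rewrite` and `optimize` functions are hypothetical and much simpler than the real optimizer:

```rust
// Hypothetical toy plan type; the real LogicalPlan is far richer.
#[derive(Clone, Debug, PartialEq, Eq, Hash)]
enum Plan {
    Scan(String),
    Filter(Box<Plan>),
}

/// One hypothetical rewrite pass: collapses Filter(Filter(x)) into Filter(x).
fn rewrite(plan: &Plan) -> Plan {
    match plan {
        Plan::Filter(inner) => match inner.as_ref() {
            // Drop the outer filter and keep rewriting the inner one.
            Plan::Filter(_) => rewrite(inner),
            other => Plan::Filter(Box::new(rewrite(other))),
        },
        Plan::Scan(name) => Plan::Scan(name.clone()),
    }
}

/// Re-runs the pass until the plan stops changing or max_passes is reached.
fn optimize(mut plan: Plan, max_passes: usize) -> Plan {
    for _ in 0..max_passes {
        let new_plan = rewrite(&plan);
        // Structural equality replaces the old stringified comparison.
        if new_plan == plan {
            break;
        }
        plan = new_plan;
    }
    plan
}

fn main() {
    let scan = Plan::Scan("t".to_string());
    let nested = Plan::Filter(Box::new(Plan::Filter(Box::new(Plan::Filter(
        Box::new(scan.clone()),
    )))));

    let optimized = optimize(nested, 3);
    assert_eq!(optimized, Plan::Filter(Box::new(scan)));
}
```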
alamb
left a comment
Thank you @mslapek
Benchmark runs are scheduled for baseline = be6efbc and contender = 61fc514. 61fc514 is a master commit associated with this PR. Results will be available as each benchmark for each run completes.
@alamb Thanks for the review! 🎉
Which issue does this PR close?
Closes #5400.
Rationale for this change
Makes datafusion::logical_expr::Subquery respect the requirements of std::cmp::Eq.
Because Expr contains Subquery, it also fixes Expr::eq(..).
What changes are included in this PR?
Fixed Eq for Expr.
Added PartialEq, Eq and Hash traits to LogicalPlan (because Expr contains LogicalPlan through Subquery).
Replaced the comparison of stringified plans with eq(...) comparison in optimizer.rs.
Are these changes tested?
No new tests - most of the PR's content is generated by derive(..) macros.
Optimizer loop already has existing tests.
Are there any user-facing changes?
Added PartialEq, Eq and Hash traits to LogicalPlan and a few other structures.
Breaking change: Added dyn_eq and dyn_hash to UserDefinedLogicalNode.