Implement/fix Eq and Hash for Expr and LogicalPlan #5421

mslapek · 2023-02-27T16:57:29Z

Which issue does this PR close?

Closes #5400.

Rationale for this change

Makes datafusion::logical_expr::Subquery respect the requirements of std::cmp::Eq.

Because Expr contains Subquery, it also fixes Expr::eq(..).

What changes are included in this PR?

Fixed Eq for Expr.

Added PartialEq, Eq and Hash traits to LogicalPlan (because Expr contains LogicalPlan through Subquery).

Replaced the comparison of stringified plans with eq(...) comparison in optimizer.rs.

Are these changes tested?

No new tests: most of the PR's contents are generated by derive(..) macros.

Optimizer loop already has existing tests.

Are there any user-facing changes?

Added PartialEq, Eq and Hash traits to LogicalPlan and a few other structures.

Breaking change: Added dyn_eq and dyn_hash to UserDefinedLogicalNode.

alamb

Thank you for the contribution @mslapek -- this looks good except for user defined logical nodes and PartialEq for TableScan.

alamb · 2023-02-28T20:52:13Z

datafusion/common/src/dfschema.rs

+impl Hash for DFSchema {
+    fn hash<H: std::hash::Hasher>(&self, state: &mut H) {
+        self.fields.hash(state);
+        self.metadata.len().hash(state); // HashMap is not hashable


I agree it is OK to hash only the metadata's length: equal maps have equal lengths, so the requirement that k1 == k2 implies hash(k1) == hash(k2) still holds.

https://doc.rust-lang.org/std/hash/trait.Hash.html#hash-and-eq
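To illustrate why hashing only the map length is sound, here is a minimal sketch. `ToySchema`, `schema` and `hash_of` are hypothetical names made up for this example; this is not the actual DFSchema code.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::HashMap;
use std::hash::{Hash, Hasher};

/// Hypothetical stand-in for a schema: ordered fields plus free-form metadata.
#[derive(Debug, PartialEq, Eq)]
struct ToySchema {
    fields: Vec<String>,
    metadata: HashMap<String, String>,
}

impl Hash for ToySchema {
    fn hash<H: Hasher>(&self, state: &mut H) {
        self.fields.hash(state);
        // HashMap does not implement Hash, so feed only its length.
        // Equal maps always have equal lengths, so `a == b` still implies
        // `hash(a) == hash(b)`. Unequal maps may collide, which Hash allows.
        self.metadata.len().hash(state);
    }
}

/// Hash a value with the standard library's default hasher.
fn hash_of<T: Hash>(value: &T) -> u64 {
    let mut hasher = DefaultHasher::new();
    value.hash(&mut hasher);
    hasher.finish()
}

/// Convenience constructor used only by this example.
fn schema(fields: &[&str], metadata: &[(&str, &str)]) -> ToySchema {
    ToySchema {
        fields: fields.iter().map(|f| f.to_string()).collect(),
        metadata: metadata
            .iter()
            .map(|(k, v)| (k.to_string(), v.to_string()))
            .collect(),
    }
}
```

Two schemas that differ only in a metadata value compare unequal but hash to the same value. That costs a possible collision in a hash table, not correctness.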

alamb · 2023-02-28T20:53:33Z

datafusion/expr/src/logical_plan/plan.rs


+impl PartialEq for TableScan {
+    fn eq(&self, other: &Self) -> bool {
+        self.table_name == other.table_name


I think this also needs to check that the source is equal -- I think we can do so via https://doc.rust-lang.org/std/sync/struct.Arc.html#method.ptr_eq

This Arc::ptr_eq might be risky... Arc::ptr_eq and std::ptr::eq say that dyn trait comparisons are unreliable. 😕

Even clippy reports an error for this: the vtable_address_comparisons lint, from the correctness 🔞 category.

I suggest reconsidering the request about source comparison.

Upon consideration, within a single plan the table_name should be unique. I was originally worried about a false positive in the case where two TableScans have different sources but all other fields are the same.

alamb · 2023-02-28T20:53:51Z

datafusion/expr/src/logical_plan/plan.rs

+impl Hash for TableScan {
+    fn hash<H: Hasher>(&self, state: &mut H) {
+        self.table_name.hash(state);
+        self.projection.hash(state);


I think it is OK that Hash doesn't also include source.
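The trade-off from these two threads can be sketched as follows: equality and hashing both skip the `Arc<dyn ...>` field and rely on the table name (plus projection) to identify the scan. `ToySource`, `InMemory`, `ToyScan`, `scan` and `hash_of` are hypothetical; which fields the real TableScan compares beyond table_name is not shown in the excerpt, so the field set here is an assumption.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};
use std::sync::Arc;

/// Hypothetical stand-in for a table provider behind a trait object.
trait ToySource {
    fn row_count(&self) -> usize;
}

struct InMemory(usize);

impl ToySource for InMemory {
    fn row_count(&self) -> usize {
        self.0
    }
}

struct ToyScan {
    table_name: String,
    projection: Option<Vec<usize>>,
    // Deliberately ignored by eq/hash: comparing `Arc<dyn Trait>` by pointer
    // is unreliable because the vtable half of the fat pointer may differ.
    source: Arc<dyn ToySource>,
}

impl PartialEq for ToyScan {
    fn eq(&self, other: &Self) -> bool {
        self.table_name == other.table_name && self.projection == other.projection
    }
}

impl Eq for ToyScan {}

impl Hash for ToyScan {
    fn hash<H: Hasher>(&self, state: &mut H) {
        // Hash a subset of the fields used by `eq`; this keeps Hash
        // consistent with Eq.
        self.table_name.hash(state);
        self.projection.hash(state);
    }
}

/// Hash a value with the standard library's default hasher.
fn hash_of<T: Hash>(value: &T) -> u64 {
    let mut hasher = DefaultHasher::new();
    value.hash(&mut hasher);
    hasher.finish()
}

/// Convenience constructor used only by this example.
fn scan(name: &str, rows: usize) -> ToyScan {
    ToyScan {
        table_name: name.to_string(),
        projection: Some(vec![0, 1]),
        source: Arc::new(InMemory(rows)),
    }
}
```

Under this sketch, two scans named the same but backed by different sources compare equal. That is the false positive discussed above, accepted on the assumption that table names are unique within one plan.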

alamb · 2023-02-28T20:57:43Z

datafusion/expr/src/logical_plan/plan.rs

+
+impl Eq for Extension {}
+
+impl Hash for Extension {


I am not sure about using the textual output for equality comparison: someone who has implemented Extension may produce constant output, for example. I think a safer (though backwards-incompatible) change would be to make UserDefinedLogicalNode also implement Hash and PartialEq

Like

trait UserDefinedLogicalNode: Hash + PartialEq { ... }

Couldn't add Hash + PartialEq, because UserDefinedLogicalNode must be object-safe.

Instead added dyn_eq and dyn_hash methods serving the same purpose.
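A minimal sketch of this object-safe pattern, assuming a downcast through `as_any`. `ToyNode`, `TopK`, `Limit`, `ToyExtension` and `hash_of` are hypothetical names for illustration; the real UserDefinedLogicalNode trait has more methods and its exact signatures may differ.

```rust
use std::any::Any;
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Object-safe trait: no `Self` in argument position, no generic methods.
trait ToyNode {
    fn as_any(&self) -> &dyn Any;
    fn dyn_eq(&self, other: &dyn ToyNode) -> bool;
    fn dyn_hash(&self, state: &mut dyn Hasher);
}

#[derive(PartialEq, Eq, Hash)]
struct TopK {
    k: usize,
}

#[derive(PartialEq, Eq, Hash)]
struct Limit {
    n: usize,
}

/// Forward the dyn_* methods to the derived PartialEq/Hash of each type.
macro_rules! impl_toy_node {
    ($t:ty) => {
        impl ToyNode for $t {
            fn as_any(&self) -> &dyn Any {
                self
            }
            fn dyn_eq(&self, other: &dyn ToyNode) -> bool {
                // Nodes of a different concrete type are never equal.
                match other.as_any().downcast_ref::<Self>() {
                    Some(o) => self == o,
                    None => false,
                }
            }
            fn dyn_hash(&self, state: &mut dyn Hasher) {
                // `&mut dyn Hasher` itself implements Hasher.
                let mut s = state;
                self.hash(&mut s);
            }
        }
    };
}

impl_toy_node!(TopK);
impl_toy_node!(Limit);

/// Wrapper that holds a trait object yet is itself Eq + Hash.
struct ToyExtension {
    node: Box<dyn ToyNode>,
}

impl PartialEq for ToyExtension {
    fn eq(&self, other: &Self) -> bool {
        self.node.dyn_eq(other.node.as_ref())
    }
}

impl Eq for ToyExtension {}

impl Hash for ToyExtension {
    fn hash<H: Hasher>(&self, state: &mut H) {
        self.node.dyn_hash(state);
    }
}

/// Hash a value with the standard library's default hasher.
fn hash_of<T: Hash>(value: &T) -> u64 {
    let mut hasher = DefaultHasher::new();
    value.hash(&mut hasher);
    hasher.finish()
}
```

`trait ToyNode: Hash + PartialEq` would not compile as a trait object because `Hash::hash` is generic and `PartialEq` mentions `Self`; moving both behind `&dyn` arguments avoids that, at the cost of every implementor having to supply the two methods.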

alamb · 2023-02-28T20:57:59Z

datafusion/expr/src/logical_plan/plan.rs


 /// Subquery
-#[derive(Clone)]
+#[derive(Clone, PartialEq, Eq, Hash)]


alamb · 2023-02-28T20:58:55Z

datafusion/optimizer/src/optimizer.rs

            // instead
-            let new_plan_str = format!("{}", new_plan.display_indent());
-            if plan_str == new_plan_str {
+            if old_plan.as_ref() == &new_plan {
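The effect of this change on a fixed-point optimizer loop can be sketched with a toy plan type. `ToyPlan`, `strip_one_trivial_filter` and `optimize` are hypothetical; the real loop in optimizer.rs applies many rules and handles errors, which this sketch omits.

```rust
/// Toy logical plan; the derives play the role of the traits added in this PR.
#[derive(Debug, Clone, PartialEq, Eq, Hash)]
enum ToyPlan {
    Scan(String),
    Filter { predicate: String, input: Box<ToyPlan> },
}

/// Hypothetical rule: remove the outermost filter if its predicate is `true`.
fn strip_one_trivial_filter(plan: ToyPlan) -> ToyPlan {
    match plan {
        ToyPlan::Filter { predicate, input } if predicate == "true" => *input,
        other => other,
    }
}

/// Apply the rule until the plan stops changing or `max_passes` is reached.
/// Returns the final plan and the number of passes executed.
fn optimize(plan: ToyPlan, max_passes: usize) -> (ToyPlan, usize) {
    let mut current = plan;
    let mut passes = 0;
    while passes < max_passes {
        passes += 1;
        let next = strip_one_trivial_filter(current.clone());
        // Structural comparison instead of comparing formatted strings.
        if next == current {
            break;
        }
        current = next;
    }
    (current, passes)
}
```

Comparing with `==` avoids formatting both plans on every pass and does not depend on the display output capturing every field.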


alamb

Thank you @mslapek

ursabot · 2023-03-03T12:33:08Z

Benchmark runs are scheduled for baseline = be6efbc and contender = 61fc514. 61fc514 is a master commit associated with this PR. Results will be available as each benchmark for each run completes.
Conbench compare runs links:
[Skipped ⚠️ Benchmarking of arrow-datafusion-commits is not supported on ec2-t3-xlarge-us-east-2] ec2-t3-xlarge-us-east-2
[Skipped ⚠️ Benchmarking of arrow-datafusion-commits is not supported on test-mac-arm] test-mac-arm
[Skipped ⚠️ Benchmarking of arrow-datafusion-commits is not supported on ursa-i9-9960x] ursa-i9-9960x
[Skipped ⚠️ Benchmarking of arrow-datafusion-commits is not supported on ursa-thinkcentre-m75q] ursa-thinkcentre-m75q
Buildkite builds:
Supported benchmarks:
ec2-t3-xlarge-us-east-2: Supported benchmark langs: Python, R. Runs only benchmarks with cloud = True
test-mac-arm: Supported benchmark langs: C++, Python, R
ursa-i9-9960x: Supported benchmark langs: Python, R, JavaScript
ursa-thinkcentre-m75q: Supported benchmark langs: C++, Java

mslapek · 2023-03-03T12:37:16Z

@alamb Thanks for the review! 🎉

github-actions bot added the logical-expr and optimizer labels Feb 27, 2023

Implement/fix Eq and Hash for Expr and LogicalPlan

47b3a38

alamb reviewed Feb 28, 2023


github-actions bot added the core label Mar 1, 2023

mslapek added 3 commits March 1, 2023 20:13

CR fix

369250d

Merge remote-tracking branch 'upstream/main' into fix-logic-plan-eq

61d686c

Fix merge from main

302a81e

alamb approved these changes Mar 2, 2023


alamb merged commit 61fc514 into apache:main Mar 3, 2023

alamb mentioned this pull request Mar 8, 2023

Minor: Improve docs for UserDefinedLogicalNode dyn_eq and dyn_hash #5515

Merged
