Skip to content

Conversation

@mslapek
Copy link
Contributor

@mslapek mslapek commented Feb 27, 2023

Which issue does this PR close?

Closes #5400.

Rationale for this change

Makes datafusion::logical_expr::Subquery to respect the requirements of std::cmp::Eq.

Because Expr contains Subquery, it also fixes Expr::eq(..).

What changes are included in this PR?

Fixed Eq for Expr.

Added PartialEq, Eq and Hash traits to LogicalPlan (because Expr contains LogicalPlan through Subquery).

Replaced the comparison of stringified plans with eq(...) comparison in optimizer.rs.

Are these changes tested?

No new tests - most of the PRs contents is generated by derive(..) macros.

Optimizer loop already has existing tests.

Are there any user-facing changes?

Added PartialEq, Eq and Hash traits to LogicalPlan and a few other structures.

Breaking change: Added dyn_eq and dyn_hash to UserDefinedLogicalNode.

@github-actions github-actions bot added logical-expr Logical plan and expressions optimizer Optimizer rules labels Feb 27, 2023
Copy link
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for the contribution @mslapek -- this looks good except for user defined logical nodes and PartialEq for TableScan.

impl Hash for DFSchema {
fn hash<H: std::hash::Hasher>(&self, state: &mut H) {
self.fields.hash(state);
self.metadata.len().hash(state); // HashMap is not hashable
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree it is ok to just use the metadata's length to hash as it satisfies the EQ constraint

https://doc.rust-lang.org/std/hash/trait.Hash.html#hash-and-eq


impl PartialEq for TableScan {
fn eq(&self, other: &Self) -> bool {
self.table_name == other.table_name
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this also needs to check that the source is equal as well -- I think we can do so via https://doc.rust-lang.org/std/sync/struct.Arc.html#method.ptr_eq

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This Arc::ptr_eq might be risky... Arc::ptr_eq and std::ptr::eq say that dyn trait comparisons are unreliable. 😕

Even clippy gives an error vtable_address_comparisons from correctness 🔞 category.

I suggest to reconsider the request about source comparison.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Upon consideration within a single plan, the table_name should be unique. I was originally worried about the case where two TableScan's that had different sources but all other fields are the same resulting in a false positive.

impl Hash for TableScan {
fn hash<H: Hasher>(&self, state: &mut H) {
self.table_name.hash(state);
self.projection.hash(state);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hash is ok that it doesn't also include source I think


impl Eq for Extension {}

impl Hash for Extension {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not sure about using the textual output for equality comparison as someone who has implemented Extension may have a constant output, for example. I think a safer (though backwards incompatible change) would be to make UserDefinedLogicalNode also be Hash and PartialEq

Like

trait UserDefinedLogicalNode: Hash + PartialEq {
...
}

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Couldn't add Hash + PartialEq, because UserDefinedLogicalNode must be object-safe.

Instead added dyn_eq and dyn_hash methods serving the same purpose.


/// Subquery
#[derive(Clone)]
#[derive(Clone, PartialEq, Eq, Hash)]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🎉

// instead
let new_plan_str = format!("{}", new_plan.display_indent());
if plan_str == new_plan_str {
if old_plan.as_ref() == &new_plan {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍 nice

@github-actions github-actions bot added the core Core DataFusion crate label Mar 1, 2023
Copy link
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you @mslapek

@alamb alamb merged commit 61fc514 into apache:main Mar 3, 2023
@ursabot
Copy link

ursabot commented Mar 3, 2023

Benchmark runs are scheduled for baseline = be6efbc and contender = 61fc514. 61fc514 is a master commit associated with this PR. Results will be available as each benchmark for each run completes.
Conbench compare runs links:
[Skipped ⚠️ Benchmarking of arrow-datafusion-commits is not supported on ec2-t3-xlarge-us-east-2] ec2-t3-xlarge-us-east-2
[Skipped ⚠️ Benchmarking of arrow-datafusion-commits is not supported on test-mac-arm] test-mac-arm
[Skipped ⚠️ Benchmarking of arrow-datafusion-commits is not supported on ursa-i9-9960x] ursa-i9-9960x
[Skipped ⚠️ Benchmarking of arrow-datafusion-commits is not supported on ursa-thinkcentre-m75q] ursa-thinkcentre-m75q
Buildkite builds:
Supported benchmarks:
ec2-t3-xlarge-us-east-2: Supported benchmark langs: Python, R. Runs only benchmarks with cloud = True
test-mac-arm: Supported benchmark langs: C++, Python, R
ursa-i9-9960x: Supported benchmark langs: Python, R, JavaScript
ursa-thinkcentre-m75q: Supported benchmark langs: C++, Java

@mslapek
Copy link
Contributor Author

mslapek commented Mar 3, 2023

@alamb Thanks for the review! 🎉

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

core Core DataFusion crate logical-expr Logical plan and expressions optimizer Optimizer rules

Projects

None yet

Development

Successfully merging this pull request may close these issues.

std::cmp::Eq reflexivity assumptions are violated

3 participants