Encapsulate EquivalenceClass into a struct #8034

Merged
merged 11 commits into from
Nov 14, 2023

@alamb alamb commented Nov 2, 2023

Which issue does this PR close?

related to #8006

Rationale for this change

As I proposed in #8006 (comment), I think using structs rather than typedefs leads to code that is easier to maintain: the functionality is easier to find, and it allows / encourages more documentation and encapsulation.

This also was a good excuse for me to try and work with the code some

What changes are included in this PR?

  1. Make EquivalenceClass a real structure rather than a typedef
  2. Add docs
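
To illustrate the difference between the two styles, here is a minimal sketch rather than the code from this PR. The `PhysicalExpr` trait, `Column` type, and `col` helper below are simplified, hypothetical stand-ins for the DataFusion ones, and the method set is only an assumption about what such a struct might expose.

```rust
use std::sync::Arc;

/// Hypothetical, heavily simplified stand-in for DataFusion's `PhysicalExpr`.
pub trait PhysicalExpr: std::fmt::Debug {
    fn name(&self) -> &str;
}

/// Hypothetical column expression used only for this illustration.
#[derive(Debug)]
pub struct Column(pub String);

impl PhysicalExpr for Column {
    fn name(&self) -> &str {
        &self.0
    }
}

/// Illustration helper: builds a column expression as a trait object.
pub fn col(name: &str) -> Arc<dyn PhysicalExpr> {
    Arc::new(Column(name.to_string()))
}

/// Typedef style: every `Vec` method is exposed and there is no natural
/// place for documentation or invariants.
pub type EquivalenceClassAlias = Vec<Arc<dyn PhysicalExpr>>;

/// Struct style: the vector is private, so the API surface is explicit
/// and each method can carry its own documentation.
#[derive(Debug, Clone, Default)]
pub struct EquivalenceClass {
    exprs: Vec<Arc<dyn PhysicalExpr>>,
}

impl EquivalenceClass {
    /// Creates a class from existing expressions.
    pub fn new(exprs: Vec<Arc<dyn PhysicalExpr>>) -> Self {
        Self { exprs }
    }

    /// Number of expressions in the class.
    pub fn len(&self) -> usize {
        self.exprs.len()
    }

    /// Returns true if the class holds no expressions.
    pub fn is_empty(&self) -> bool {
        self.exprs.is_empty()
    }

    /// Iterates over the expressions without exposing the backing `Vec`.
    pub fn iter(&self) -> impl Iterator<Item = &Arc<dyn PhysicalExpr>> {
        self.exprs.iter()
    }
}
```

With the struct, callers can no longer reach into the vector directly, which is also why the change is technically breaking for anyone who relied on the alias.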

I actually think maybe the right thing to do is to call this thing PhysicalExprSet or something as it is more general than just equivalence classes

I plan to move all the functions into a new module, rather than calling functions in utils, as a follow-on PR

Are these changes tested?

Existing tests

Are there any user-facing changes?

Yes. Since the APIs in question are pub, this is technically a breaking API change. However, given they are all quite new (added this week) and not yet released, I don't think there will be much direct impact.

@alamb alamb marked this pull request as draft November 2, 2023 18:24
@github-actions github-actions bot added development-process Related to development process of DataFusion physical-expr Physical Expressions core Core DataFusion crate sqllogictest SQL Logic Tests (.slt) labels Nov 2, 2023
@alamb alamb force-pushed the alamb/encapsulate_equivalence_class branch from d611a07 to 59c1f5c Compare November 3, 2023 17:27
@github-actions github-actions bot removed development-process Related to development process of DataFusion core Core DataFusion crate sqllogictest SQL Logic Tests (.slt) labels Nov 3, 2023
/// equality predicates (e.g. `a = b`), typically equi-join conditions and
/// equality conditions in filters.
#[derive(Debug, Clone)]
pub struct EquivalenceClass {
Contributor Author

@ozankabak and @mustafasrepo I am curious what you think about this change to encapsulate free functions into a struct in the equivalence code? If you like it, I can pull most of the rest of the code in datafusion/physical-expr/src/physical_expr.rs to this structure instead of free functions

Contributor

I like this style, EquivalenceClass should be its own type. When comparing two equivalence classes, calling eq on this type is much more readable than calling a utility that operates on PhysicalExpr slices directly.

You seem to be keeping the lower-level functional primitives around, which is also good -- we can reuse them to create encapsulations like EquivalenceClass as we build more.

Contributor Author

> You seem to be keeping the lower-level functional primitives around, which is also good -- we can reuse them to create encapsulations like EquivalenceClass as we build more.

I did this mostly to keep the size of the initial diff down to make the proposal easier to review.

It seems to me like EquivalenceClass has very few functions actually related to equivalence calculations -- it is mostly a container of PhysicalExprs -- maybe it would be better named something like PhysicalExprList 🤔 But then the equivalence calculations would be less readable perhaps.

Contributor

Exactly. I see two options:

  1. Following your approach where we encapsulate free functions in structs and methods whenever we can define a coherent struct for them, and we also keep these free functions so that similar structs can share them.
  2. Create agnostic structs (like PhysicalExprList) and convert the free functions to their methods, and use type aliases like EquivalenceClass = PhysicalExprList to increase readability depending on the context.
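
As a rough illustration of the second option, a context-agnostic container plus a type alias could look like the sketch below. `Expr` is a hypothetical stand-in for `Arc<dyn PhysicalExpr>`, `PhysicalExprList` is only the name floated in this thread, and `are_equivalent` is an invented helper, so none of this reflects actual DataFusion code.

```rust
/// Hypothetical stand-in for `Arc<dyn PhysicalExpr>`.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct Expr(pub &'static str);

/// Context-agnostic container: the former free functions become methods.
#[derive(Debug, Clone, Default)]
pub struct PhysicalExprList {
    exprs: Vec<Expr>,
}

impl PhysicalExprList {
    pub fn new(exprs: Vec<Expr>) -> Self {
        Self { exprs }
    }

    /// Returns true if `expr` is in the list.
    pub fn contains(&self, expr: &Expr) -> bool {
        self.exprs.contains(expr)
    }

    pub fn len(&self) -> usize {
        self.exprs.len()
    }
}

/// The alias improves readability at call sites, but it is the *same*
/// type: the compiler will not distinguish it from `PhysicalExprList`.
pub type EquivalenceClass = PhysicalExprList;

/// Invented helper: two expressions are considered equivalent if both
/// belong to the same class.
pub fn are_equivalent(class: &EquivalenceClass, left: &Expr, right: &Expr) -> bool {
    class.contains(left) && class.contains(right)
}
```

The trade-off is that an alias cannot carry equivalence-specific methods or invariants of its own, whereas a dedicated struct (the first option) can.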

Contributor Author

I would like to propose starting by making EquivalenceClass its own struct, and we can keep the utility functions around for a while to see if they are reusable

@alamb alamb added the api change Changes the API exposed to users of the crate label Nov 10, 2023
@alamb alamb marked this pull request as ready for review November 10, 2023 19:24
@alamb alamb changed the title Encapsulate EquivalenceClass Encapsulate EquivalenceClass into a struct Nov 10, 2023
Contributor Author

alamb commented Nov 10, 2023

I believe this PR is ready for review @ozankabak and @mustafasrepo -- however, it may cause non-trivial merge conflicts with #8107. That said, I think the style in this PR would make stuff like #8107 easier to implement 🤔

Contributor

@ozankabak ozankabak left a comment

Left a few notes for improvements, thank you. Don't worry about the merge conflicts with #8107, we can handle those.

Comment on lines 49 to 51
fn eq(&self, other: &Self) -> bool {
    physical_exprs_equal(&self.inner, &other.inner)
}
Contributor

Mathematically, an equivalence class is a set of expressions. Therefore, the eq implementation should use set equality. AFAIK we only check for list equality during testing (and if we don't, that could be a sign of a bug or a bad design!). Therefore, I suggest renaming this to eq_strict and implementing eq with set equality (and getting rid of eq_bag).
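
A minimal sketch of what set-based `eq` alongside an order-sensitive `eq_strict` could look like is shown below. It assumes a simplified, hypothetical `Expr` type in place of `Arc<dyn PhysicalExpr>` and assumes the class never stores duplicates, so it illustrates the idea rather than the implementation that was merged.

```rust
/// Hypothetical stand-in for `Arc<dyn PhysicalExpr>`; the real type does not
/// simply derive `PartialEq` like this.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct Expr(pub &'static str);

#[derive(Debug, Clone)]
pub struct EquivalenceClass {
    /// Invariant assumed by `eq`: no duplicate expressions (enforced in `new`).
    exprs: Vec<Expr>,
}

impl EquivalenceClass {
    /// Builds a class, dropping duplicates while keeping first occurrences.
    pub fn new(exprs: Vec<Expr>) -> Self {
        let mut unique: Vec<Expr> = Vec::with_capacity(exprs.len());
        for expr in exprs {
            if !unique.contains(&expr) {
                unique.push(expr);
            }
        }
        Self { exprs: unique }
    }

    /// Returns true if `expr` is a member of this class.
    pub fn contains(&self, expr: &Expr) -> bool {
        self.exprs.contains(expr)
    }

    /// Order-sensitive (list) equality, intended mainly for tests.
    pub fn eq_strict(&self, other: &Self) -> bool {
        self.exprs == other.exprs
    }
}

impl PartialEq for EquivalenceClass {
    /// Set equality: same members, in any order. Because neither side holds
    /// duplicates, equal lengths plus one-way containment is sufficient.
    fn eq(&self, other: &Self) -> bool {
        self.exprs.len() == other.exprs.len()
            && self.exprs.iter().all(|expr| other.contains(expr))
    }
}
```

This check is quadratic in the class size, which is presumably acceptable for the handful of expressions an equivalence class typically holds.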

/// equality conditions in filters.
#[derive(Debug, Clone)]
pub struct EquivalenceClass {
    inner: Vec<Arc<dyn PhysicalExpr>>,
Contributor

Maybe exprs or items is a better name here as this is the vector containing elements (expressions) that belong to this equivalence class.

datafusion/physical-expr/src/equivalence.rs (outdated)
}

/// Removes all duplicated exprs in this class
// TODO should we deduplicate on insert?
Contributor

We can do a quick contains check on push, and avoid adding the expression if the former check returns true. And you can call deduplicate_physical_exprs in new, and there would be no need for a deduplicate function anymore.

Contributor Author

This fits well with the definition of this being a set (rather than an ordered check). I will try it
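
The suggestion in this thread (deduplicate once in `new`, then guard `push` with a `contains` check) might be sketched as follows. `Expr` is a hypothetical stand-in for `Arc<dyn PhysicalExpr>`, and `deduplicate_exprs` is an illustrative analogue of the `deduplicate_physical_exprs` utility mentioned above, whose real signature is not shown in this conversation.

```rust
/// Hypothetical stand-in for `Arc<dyn PhysicalExpr>`.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct Expr(pub &'static str);

/// Illustrative free function: removes duplicates, keeping first occurrences.
pub fn deduplicate_exprs(exprs: Vec<Expr>) -> Vec<Expr> {
    let mut unique: Vec<Expr> = Vec::with_capacity(exprs.len());
    for expr in exprs {
        if !unique.contains(&expr) {
            unique.push(expr);
        }
    }
    unique
}

#[derive(Debug, Clone, Default)]
pub struct EquivalenceClass {
    /// Never contains duplicates: `new` and `push` both maintain this.
    exprs: Vec<Expr>,
}

impl EquivalenceClass {
    /// Creates an empty class.
    pub fn new_empty() -> Self {
        Self { exprs: Vec::new() }
    }

    /// Creates a class from existing expressions, deduplicating once up front.
    pub fn new(exprs: Vec<Expr>) -> Self {
        Self {
            exprs: deduplicate_exprs(exprs),
        }
    }

    /// Returns true if `expr` is already a member.
    pub fn contains(&self, expr: &Expr) -> bool {
        self.exprs.contains(expr)
    }

    /// Adds `expr` only if it is not already present, so no separate
    /// `deduplicate` method is needed afterwards.
    pub fn push(&mut self, expr: Expr) {
        if !self.contains(&expr) {
            self.exprs.push(expr);
        }
    }

    pub fn len(&self) -> usize {
        self.exprs.len()
    }
}
```

Because the no-duplicates invariant holds after every constructor and mutation, other methods can rely on it without re-checking.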

}

// Create a new equivalence class from a pre-existing `Vec`
pub fn new_from_vec(inner: Vec<Arc<dyn PhysicalExpr>>) -> Self {
Contributor

If we have a new_empty, why don't we simply call this new(exprs: Vec<Arc<dyn PhysicalExpr>>)?

Contributor Author

alamb commented Nov 13, 2023

I implemented @ozankabak 's suggestion to treat EquivalenceClass actually like a class (rather than a vec with order). I think it is quite good now.

Contributor

@ozankabak ozankabak left a comment

Thank you @alamb

Contributor Author

alamb commented Nov 14, 2023

Any concerns with merging this @mustafasrepo ?

@mustafasrepo
Contributor

> Any concerns with merging this @mustafasrepo ?

Not at all. Thanks @alamb as always for this PR.

Contributor

@mustafasrepo mustafasrepo left a comment

LGTM!

@alamb alamb merged commit f390f15 into apache:main Nov 14, 2023
22 checks passed
Labels
api change Changes the API exposed to users of the crate physical-expr Physical Expressions
3 participants