-
Notifications
You must be signed in to change notification settings - Fork 1.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Encapsulate EquivalenceClass
into a struct
#8034
Encapsulate EquivalenceClass
into a struct
#8034
Conversation
d611a07
to
59c1f5c
Compare
/// equality predicates (e.g. `a = b`), typically equi-join conditions and | ||
/// equality conditions in filters. | ||
#[derive(Debug, Clone)] | ||
pub struct EquivalenceClass { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@ozankabak and @mustafasrepo I am curious what you think about this change to encapsulate free functions into a struct in the equivalence code? If you like it, I can pull most of the rest of the code in datafusion/physical-expr/src/physical_expr.rs to this structure instead of free functions
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I like this style, EquivalenceClass
should be its own type. When comparing two equivalence classes, calling eq
on this type is much more readable than calling a utility that operates on PhysicalExpr
slices directly.
You seem to be keeping the lower-level functional primitives around, which is also good -- we can reuse them to create encapsulations like EquivalenceClass
as we build more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You seem to be keeping the lower-level functional primitives around, which is also good -- we can reuse them to create encapsulations like EquivalenceClass as we build more.
I did this mostly to keep the size of the initial diff down to make the proposal easier to review.
It seems to me like EquivalenceClass
has very few functions actually related to equivalence calculations -- it is mostly a container of PhysicalExpr
s -- maybe it would be better named something like PhysicalExprList
🤔 But then the equivalence calculations would be less readable perhaps.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Exactly. I see two options:
- Following your approach where we encapsulate free functions in structs and methods whenever we can define a coherent struct for them, and we also keep these free functions so that similar structs can share them.
- Create agnostic structs (like
PhysicalExprList
) and convert the free functions to their methods, and use type aliases likeEquivalenceClass
=PhysicalExprList
to increase readability depending on the context.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would like to propose with starting with EquivalenceClass into its own struct and we can keep the utility functions around for a while to see if they are reusable
EquivalenceClass
EquivalenceClass
into a struct
I believe this PR is ready for reivew @ozankabak and @mustafasrepo -- however, I believe it may cause non trivial merge conflicts with #8107. However, I think the style in this PR would make stuff like #8107 easier to implement 🤔 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Left a few notes for improvements, thank you. Don't worry about the merge conflicts with #8107, we can handle those.
fn eq(&self, other: &Self) -> bool { | ||
physical_exprs_equal(&self.inner, &other.inner) | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Mathematically, an equivalence class is a set of expressions. Therefore, eq
implementation should use set equality. AFAIK we only check for list-equality during testing (and if we don't, that could be sign of a bug or a bad design!). Therefore, I suggest changing this eq_strict
and implementing eq
with set equality (and getting rid of eq_bag
).
/// equality conditions in filters. | ||
#[derive(Debug, Clone)] | ||
pub struct EquivalenceClass { | ||
inner: Vec<Arc<dyn PhysicalExpr>>, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe exprs
or items
is a better name here as this is the vector containing elements (expressions) that belong to this equivalence class.
} | ||
|
||
/// Removes all duplicated exprs in this class | ||
// TODO should we deduplicate on insert? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can do a quick contains
check on push
, and avoid adding the expression if the former check returns true
. And you can call deduplicate_physical_exprs
in new
, and there would be no need for a deduplicate
function anymore.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This fits well with the definition of this being a set (rather than an ordered check). I will try it
} | ||
|
||
// Create a new equivalence class from a pre-existing `Vec` | ||
pub fn new_from_vec(inner: Vec<Arc<dyn PhysicalExpr>>) -> Self { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we have a new_empty
, why don't we simply call this new(exprs: Vec<Arc<dyn PhysicalExpr>>)
?
Co-authored-by: Mehmet Ozan Kabak <ozankabak@gmail.com>
…b/arrow-datafusion into alamb/encapsulate_equivalence_class
I implemented @ozankabak 's suggestion to treat |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you @alamb
Any concerns with merging this @mustafasrepo ? |
Not at all. Thanks @alamb as always for this PR. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM!.
Which issue does this PR close?
related to #8006
Rationale for this change
As I proposed in #8006 (comment) I think making stucts rather than typedefs leads to easier to maintain code as the functionality is easier to find and it allows / encourages more documentation and encapsulation
This also was a good excuse for me to try and work with the code some
What changes are included in this PR?
EquivalenceClass
a real structure rather than a typedefI actually think maybe the right thing to do is to call this thing
PhysicalExprSet
or something as it is more general than just equivalence classesI plan to move all the functions into a new module rather than calling functions in
utils
as a follow on PRAre these changes tested?
Existing tests
Are there any user-facing changes?
yes, since the APIs in question are
pub
this is technically a breaking API change. However, given they are all quite new (within this week) and not yet released, I don't think there will be as much direct impact.