-
Notifications
You must be signed in to change notification settings - Fork 1.8k
Compute Dynamic Filters only when a consumer supports them #18938
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Compute Dynamic Filters only when a consumer supports them #18938
Conversation
ebc401a to
3b31893
Compare
| pub fn is_used(self: &Arc<Self>) -> bool { | ||
| // Strong count > 1 means at least one consumer is holding a reference beyond the producer. | ||
| Arc::strong_count(self) > 1 | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not really sure how to test a condition where is_used() returns false without adding too much machinery or making the dynamic_filter attribute from HashJoin public which would make it easy to mess with the Arc reference count.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can’t you just make a new DynamicFilterPhysicalExpr and check that is_used is False?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks like you have a test already?
adriangb
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I know this is draft but the current code looks good to me, I’ll approve once it’s ready :)
|
Something funky is going on here with the Arc count, some queries are not pushing down the filter to the probe because Arc count remains at 1 -> It doesn't happen in all the tests though, which is strange Edit: For the tests that fails seems like partition 0 never gets to see a I think this might be because we clone the In any case, seems like the strong_count approach might hit some edge cases here... |
Which issue does this PR close?
Closes #17527
Rationale for this change
Currently, DataFusion computes bounds for all queries that contain a HashJoinExec node whenever the option enable_dynamic_filter_pushdown is set to true (default). It might make sense to compute these bounds only when we explicitly know there is a consumer that will use them.
What changes are included in this PR?
As suggested in #17527 (comment), this PR adds an is_used() method to DynamicFilterPhysicalExpr that checks if any consumers are holding a reference to the filter using Arc::strong_count().
During filter pushdown, consumers that accept the filter in order to replace the node with the current filter and later use it execution, so a reference to Arc is retained. For example here it will keep a reference to the filters in the node.
Are these changes tested?
I added a unit test in dynamic_filters.rs (test_is_used) that verifies the Arc reference counting behavior.
Existing integration tests in datafusion/core/tests/physical_optimizer/filter_pushdown/mod.rs validate the end-to-end behavior. These tests verify that dynamic filters are computed and filled when consumers are present.
Are there any user-facing changes?
new is_used() function