Reduce Expr copies in ParquetExec
#4283
```diff
     metrics: ExecutionPlanMetricsSet,
     /// Optional predicate for row filtering during parquet scan
-    predicate: Option<Expr>,
+    predicate: Option<Arc<Expr>>,
```
Without these `Arc`s, these predicates get copied once for each parquet file.
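To make the cost concrete, here is a minimal sketch using a hypothetical three-node `Expr` (a stand-in, not DataFusion's real type) whose `Clone` impl counts every node it copies. Cloning an `Option<Expr>` once per file deep-copies the whole tree each time, while cloning an `Option<Arc<Expr>>` only bumps a reference count. The function `node_copies_per_strategy` is illustrative only.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::Arc;

// Counts every Expr node copied by a deep clone.
static NODE_COPIES: AtomicUsize = AtomicUsize::new(0);

// Hypothetical stand-in for a predicate tree; not DataFusion's `Expr`.
enum Expr {
    Column(String),
    Literal(i64),
    Gt(Box<Expr>, Box<Expr>),
}

impl Clone for Expr {
    fn clone(&self) -> Self {
        // Each node copied bumps the counter, so one clone of a
        // 3-node tree adds 3.
        NODE_COPIES.fetch_add(1, Ordering::SeqCst);
        match self {
            Expr::Column(name) => Expr::Column(name.clone()),
            Expr::Literal(v) => Expr::Literal(*v),
            Expr::Gt(l, r) => Expr::Gt(l.clone(), r.clone()),
        }
    }
}

// Builds the predicate `a > 5` without cloning anything.
fn sample_predicate() -> Expr {
    Expr::Gt(
        Box::new(Expr::Column("a".to_string())),
        Box::new(Expr::Literal(5)),
    )
}

/// Returns (node copies with `Option<Expr>`, node copies with `Option<Arc<Expr>>`)
/// when the predicate is cloned once per file.
fn node_copies_per_strategy(num_files: usize) -> (usize, usize) {
    let owned: Option<Expr> = Some(sample_predicate());
    let shared: Option<Arc<Expr>> = Some(Arc::new(sample_predicate()));

    NODE_COPIES.store(0, Ordering::SeqCst);
    let per_file_owned: Vec<Option<Expr>> = (0..num_files).map(|_| owned.clone()).collect();
    let owned_copies = NODE_COPIES.swap(0, Ordering::SeqCst);

    // Cloning an Arc copies a pointer and bumps a reference count;
    // the tree itself is never touched.
    let per_file_shared: Vec<Option<Arc<Expr>>> =
        (0..num_files).map(|_| shared.clone()).collect();
    let shared_copies = NODE_COPIES.load(Ordering::SeqCst);

    assert_eq!(per_file_owned.len(), per_file_shared.len());
    (owned_copies, shared_copies)
}

fn main() {
    let (owned, shared) = node_copies_per_strategy(100);
    println!("Option<Expr>:      {owned} node copies"); // 300
    println!("Option<Arc<Expr>>: {shared} node copies"); // 0
}
```

With 100 files and a 3-node predicate, the owned version copies 300 nodes and the `Arc` version copies none; real predicates are usually larger, so the gap grows with both file count and expression size.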
> these predicates get copied once for each parquet file

Just a question: I cannot find where the code clones them for each file 🤔, but this improvement is reasonable 👍.
I believe that the following, which clones the predicates, is called once per file:

```rust
impl FileOpener for ParquetOpener {
    fn open(
        &self,
        _: Arc<dyn ObjectStore>,
        file_meta: FileMeta,
    ) -> Result<FileOpenFuture> {
        // ... clones the predicates for this file ...
    }
}
```
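A simplified sketch of that per-file pattern, assuming a hypothetical `OpenerSketch` and `FileTask` in place of `ParquetOpener` and `FileOpenFuture` (no `ObjectStore`, no async, and a toy `Expr`). Each call to `open` clones the `Option<Arc<Expr>>`, so every file's task points at the same predicate allocation instead of owning a deep copy.

```rust
use std::sync::Arc;

// Hypothetical, simplified predicate type (stands in for DataFusion's `Expr`).
enum Expr {
    Column(String),
    Literal(i64),
    Gt(Box<Expr>, Box<Expr>),
}

// Hypothetical per-file work item; the real code returns a `FileOpenFuture`.
struct FileTask {
    path: String,
    predicate: Option<Arc<Expr>>,
}

// Hypothetical, trimmed-down stand-in for `ParquetOpener`.
struct OpenerSketch {
    predicate: Option<Arc<Expr>>,
}

impl OpenerSketch {
    // Called once per file, like `FileOpener::open`.
    fn open(&self, path: &str) -> FileTask {
        FileTask {
            path: path.to_string(),
            // Cheap: copies a pointer and increments the reference count.
            predicate: self.predicate.clone(),
        }
    }
}

fn main() {
    // The predicate `a > 5`, allocated once.
    let predicate = Arc::new(Expr::Gt(
        Box::new(Expr::Column("a".into())),
        Box::new(Expr::Literal(5)),
    ));
    let opener = OpenerSketch { predicate: Some(Arc::clone(&predicate)) };

    let tasks: Vec<FileTask> = (0..3)
        .map(|i| opener.open(&format!("file_{i}.parquet")))
        .collect();

    for task in &tasks {
        // Every task shares the original allocation.
        assert!(Arc::ptr_eq(task.predicate.as_ref().unwrap(), &predicate));
        println!("{} shares the predicate", task.path);
    }
    // 1 (original) + 1 (opener) + 3 (tasks)
    println!("strong count = {}", Arc::strong_count(&predicate)); // 5
}
```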
xudong963 left a comment:
Nice!
Benchmark runs are scheduled for baseline = 1bcb333 and contender = eac254c. eac254c is a master commit associated with this PR. Results will be available as each benchmark for each run completes.
Draft, as it builds on #4279.
Which issue does this PR close?
Re: #4020
Rationale for this change
IOx makes parquet plans that have hundreds of parquet files with non-trivial predicates. The current code copies each predicate several times per file (in the pruning predicate, in the normal predicate, etc.).
#4279 makes this even worse.
Let's not do so much copying.
What changes are included in this PR?
Sprinkles `Arc`s around so the predicates are shared instead of copied.
Are these changes tested?
Yes, by existing tests and the Rust type system.
Are there any user-facing changes?
No