-
Notifications
You must be signed in to change notification settings - Fork 1.8k
chore: Enforce lint rule clippy::needless_pass_by_value to datafusion-datasource
#18682
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
chore: Enforce lint rule clippy::needless_pass_by_value to datafusion-datasource
#18682
Conversation
…on-datasource This commit enforces the `clippy::needless_pass_by_value` lint rule to prevent unnecessary data clones and improve performance in the datafusion-datasource crate. Changes: - Added lint rule to datafusion/datasource/src/mod.rs - Fixed 11 violations across 5 files by changing pass-by-value to pass-by-reference - Updated callers in datafusion-core and datafusion-catalog-listing Fixes apache#18611 Part of apache#18503
clippy::needless_pass_by_value to datafusi…clippy::needless_pass_by_value to datafusion-datasource
2010YOUY01
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for working on it. I suggest to revert several public API fixes, otherwise it looks good.
| pub fn project( | ||
| &mut self, | ||
| file_batch: RecordBatch, | ||
| file_batch: &RecordBatch, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I suggest to revert the change and suppress it with #[expect(clippy::needless_pass_by_value)], due to
- This is a public API, it's better to avoid changing it so that the downstream datafusion dependents don't have to update it during ugprades
- Cloning
RecordBatchis a shallow clone for inner heavy payloads.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
| } | ||
|
|
||
| /// Create a new execution plan from a list of constant values (`ValuesExec`) | ||
| pub fn try_new_as_values( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ditto
datafusion/datasource/src/memory.rs
Outdated
| /// batches are provided. | ||
| pub fn try_new_from_batches( | ||
| schema: SchemaRef, | ||
| schema: &SchemaRef, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ditto
| Self::new(&min_max_sort_order, &min_max_schema, &min_batch, &max_batch) | ||
| } | ||
|
|
||
| pub fn new( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ditto
| /// A tuple containing: | ||
| /// * The processed file groups with their individual statistics attached | ||
| /// * The summary statistics across all file groups, aka all files summary statistics | ||
| pub fn compute_all_files_statistics( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ditto
- Reverted public API changes to maintain stability - Added #[expect(clippy::needless_pass_by_value)] to public methods - Kept all internal/private function improvements - RecordBatch clone is shallow, performance impact is minimal
2010YOUY01
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks again.
The CI failure on linux might be due to #18692, on MacOS it's passing, so I don't think we should worry about it.
yay all tests were passing on my device!!! |
…on-datasource (apache#18682) ## Rationale for this change This PR enforces the `clippy::needless_pass_by_value` lint rule to prevent unnecessary data clones and improve performance in the `datafusion-datasource` crate. This is part of the effort tracked in apache#18503 to enforce this lint rule across all DataFusion crates. Functions that take ownership of values (pass-by-value) when they only need to read them force callers to `.clone()` data unnecessarily, which degrades performance. By changing these functions to accept references instead, we eliminate these unnecessary clones. ## What changes are included in this PR? - Added lint rule enforcement to `datafusion/datasource/src/mod.rs` - Fixed 11 violations of `clippy::needless_pass_by_value` across 5 files: - `file_scan_config.rs`: 2 fixes - `memory.rs`: 3 fixes - `source.rs`: 1 fix - `statistics.rs`: 4 fixes - `write/demux.rs`: 1 fix - Updated callers in `datafusion-core` and `datafusion-catalog-listing` to pass references ## Are these changes tested? Yes, all changes are tested: - ✅ All 82 unit tests pass (`cargo test -p datafusion-datasource`) - ✅ All 7 doc tests pass - ✅ Strict clippy checks pass with `-D warnings` - ✅ CI lint script passes (`./dev/rust_lint.sh`) - ✅ Dependent crates (`datafusion-catalog-listing`, `datafusion-core`) pass all tests and clippy checks Tests are covered by existing tests as this is a refactoring that changes internal function signatures without changing behavior. ## Are there any user-facing changes? No user-facing changes. All changes are internal to the `datafusion-datasource` crate. The public API remains unchanged - only internal function signatures were modified to accept references instead of owned values. Then at the bottom add: Fixes apache#18611 Part of apache#18503
Rationale for this change
This PR enforces the
clippy::needless_pass_by_valuelint rule to prevent unnecessary data clones and improve performance in thedatafusion-datasourcecrate. This is part of the effort tracked in #18503 to enforce this lint rule across all DataFusion crates.Functions that take ownership of values (pass-by-value) when they only need to read them force callers to
.clone()data unnecessarily, which degrades performance. By changing these functions to accept references instead, we eliminate these unnecessary clones.What changes are included in this PR?
datafusion/datasource/src/mod.rsclippy::needless_pass_by_valueacross 5 files:file_scan_config.rs: 2 fixesmemory.rs: 3 fixessource.rs: 1 fixstatistics.rs: 4 fixeswrite/demux.rs: 1 fixdatafusion-coreanddatafusion-catalog-listingto pass referencesAre these changes tested?
Yes, all changes are tested:
cargo test -p datafusion-datasource)-D warnings./dev/rust_lint.sh)datafusion-catalog-listing,datafusion-core) pass all tests and clippy checksTests are covered by existing tests as this is a refactoring that changes internal function signatures without changing behavior.
Are there any user-facing changes?
No user-facing changes. All changes are internal to the
datafusion-datasourcecrate. The public API remains unchanged - only internal function signatures were modified to accept references instead of owned values.Then at the bottom add:
Fixes #18611
Part of #18503