-
Notifications
You must be signed in to change notification settings - Fork 1.8k
feat: allow custom caching via logical node #18688
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
datafusion/core/src/dataframe/mod.rs
Outdated
| }) | ||
| } | ||
|
|
||
| /// Cache DataFrame as a memory table. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This needs to be updated to mention the custom cache path.
Something like:
Cache this DataFrame using the configured cache producer, falling back to an in-memory table when none is set
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for catching this. Updated the doc.
| Limit: skip=0, fetch=1 | ||
| TableScan: aggregate_test_100, fetch=1 | ||
| "### | ||
| ); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| ); | |
| ); | |
| let df_results = df.collect().await?; | |
| let cached_df_results = cached_df.collect().await?; | |
| assert_eq!(&df_results, &cached_df_results); |
to test the physical plan too
Does it need a custom ExtensionPlanner too for that ?!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I dont think we need to cover physical plan for this case, as it would be up to user to provide it
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It could be used as an example how to do it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
perhaps we can do it as a follow up in examples
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, once this PR is approved and merged, I can work on writing up an example.
milenkovicm
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I looks good, will have a look in more details later.
few comments to start with
| Limit: skip=0, fetch=1 | ||
| TableScan: aggregate_test_100, fetch=1 | ||
| "### | ||
| ); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I dont think we need to cover physical plan for this case, as it would be up to user to provide it
| pub trait CacheProducer: Debug + Sync + Send { | ||
| /// Create a custom logical node for caching | ||
| /// given a logical plan (of DF to cache). | ||
| fn create(&self, plan: LogicalPlan) -> Result<Arc<dyn UserDefinedLogicalNode>>; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
maybe we should provide SessionState as parameter as well ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, updated the method signature to include session state.
| /// Interface for applying a custom caching strategy. | ||
| /// Implement this trait and register via [`SessionState`] | ||
| /// to create a custom logical node for caching. | ||
| pub trait CacheProducer: Debug + Sync + Send { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could we explain that when cache producer is provided user should provide its own logical query planner to handle that specific case
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, updated the doc for trait.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
just for discussion, do we need to create new trait for this or simple closure |SessionState, LogicalPlan| -> LogicalPlan would do?
…pache#18686) ## Which issue does this PR close? <!-- We generally require a GitHub issue to be filed for all bug fixes and enhancements and this helps us generate change logs for our releases. You can link an issue to this PR using the GitHub syntax. For example `Closes apache#123` indicates that this PR will close issue apache#123. --> - Closes #. ## Rationale for this change <!-- Why are you proposing this change? If this is already explained clearly in the issue then this section is not needed. Explaining clearly why changes are proposed helps reviewers understand your changes and offer better suggestions for fixes. --> When suppressing certain clippy lint violations, use `expect` instead of `allow` to ensure there is an actual lint violation. ## What changes are included in this PR? <!-- There is no need to duplicate the description in the issue here but it is sometimes worth providing a summary of the individual changes in this PR. --> ## Are these changes tested? <!-- We typically require tests for all PRs in order to: 1. Prevent the code from being accidentally broken by subsequent changes 2. Serve as another way to document the expected behavior of the code If tests are not included in your PR, please explain why (for example, are they covered by existing tests)? --> ## Are there any user-facing changes? <!-- If there are user-facing changes then we may require documentation to be updated before approving the PR. --> <!-- If there are any breaking changes to public APIs, please add the `api change` label. -->
…che#18693) Bumps [taiki-e/install-action](https://github.com/taiki-e/install-action) from 2.62.50 to 2.62.51. <details> <summary>Release notes</summary> <p><em>Sourced from <a href="https://github.com/taiki-e/install-action/releases">taiki-e/install-action's releases</a>.</em></p> <blockquote> <h2>2.62.51</h2> <ul> <li> <p>Update <code>typos@latest</code> to 1.39.2.</p> </li> <li> <p>Update <code>mise@latest</code> to 2025.11.4.</p> </li> <li> <p>Update <code>uv@latest</code> to 0.9.9.</p> </li> <li> <p>Update <code>protoc@latest</code> to 3.33.1.</p> </li> <li> <p>Update <code>just@latest</code> to 1.43.1.</p> </li> </ul> </blockquote> </details> <details> <summary>Changelog</summary> <p><em>Sourced from <a href="https://github.com/taiki-e/install-action/blob/main/CHANGELOG.md">taiki-e/install-action's changelog</a>.</em></p> <blockquote> <h1>Changelog</h1> <p>All notable changes to this project will be documented in this file.</p> <p>This project adheres to <a href="https://semver.org">Semantic Versioning</a>.</p> <!-- raw HTML omitted --> <h2>[Unreleased]</h2> <h2>[2.62.51] - 2025-11-14</h2> <ul> <li> <p>Update <code>typos@latest</code> to 1.39.2.</p> </li> <li> <p>Update <code>mise@latest</code> to 2025.11.4.</p> </li> <li> <p>Update <code>uv@latest</code> to 0.9.9.</p> </li> <li> <p>Update <code>protoc@latest</code> to 3.33.1.</p> </li> <li> <p>Update <code>just@latest</code> to 1.43.1.</p> </li> </ul> <h2>[2.62.50] - 2025-11-12</h2> <ul> <li> <p>Update <code>wasmtime@latest</code> to 38.0.4.</p> </li> <li> <p>Update <code>coreutils@latest</code> to 0.4.0.</p> </li> </ul> <h2>[2.62.49] - 2025-11-09</h2> <ul> <li> <p>Update <code>cargo-binstall@latest</code> to 1.15.11.</p> </li> <li> <p>Update <code>cargo-auditable@latest</code> to 0.7.2.</p> </li> <li> <p>Update <code>vacuum@latest</code> to 0.20.2.</p> </li> </ul> <h2>[2.62.48] - 2025-11-08</h2> <ul> <li> <p>Update <code>mise@latest</code> to 2025.11.3.</p> </li> <li> <p>Update <code>cargo-audit@latest</code> to 0.22.0.</p> </li> <li> <p>Update <code>vacuum@latest</code> to 0.20.1.</p> </li> <li> <p>Update <code>uv@latest</code> to 0.9.8.</p> </li> <li> <p>Update <code>cargo-udeps@latest</code> to 0.1.60.</p> </li> </ul> <!-- raw HTML omitted --> </blockquote> <p>... (truncated)</p> </details> <details> <summary>Commits</summary> <ul> <li><a href="https://github.com/taiki-e/install-action/commit/0be4756f42223b67aa4b7df5effad59010cbf4b9"><code>0be4756</code></a> Release 2.62.51</li> <li><a href="https://github.com/taiki-e/install-action/commit/e1c1ebb6a3f58bc641d557ff58d8de1faa37a03d"><code>e1c1ebb</code></a> Update changelog</li> <li><a href="https://github.com/taiki-e/install-action/commit/d78637d17c2c9ab112cbdf9c6277eafa393f3ec0"><code>d78637d</code></a> Update <code>typos@latest</code> to 1.39.2</li> <li><a href="https://github.com/taiki-e/install-action/commit/107556f337c67c5f10c7af70e9324b346e6ac455"><code>107556f</code></a> Update <code>mise@latest</code> to 2025.11.4</li> <li><a href="https://github.com/taiki-e/install-action/commit/2913759b209bf66eb7759a04851e0da7684bf2a7"><code>2913759</code></a> Update <code>uv@latest</code> to 0.9.9</li> <li><a href="https://github.com/taiki-e/install-action/commit/31f5779141e1b4c6c3fbe44984c10450860022d6"><code>31f5779</code></a> Update <code>typos@latest</code> to 1.39.1</li> <li><a href="https://github.com/taiki-e/install-action/commit/146aadaacec340c79628ca7c4822a43924e760ed"><code>146aada</code></a> Update <code>protoc@latest</code> to 3.33.1</li> <li><a href="https://github.com/taiki-e/install-action/commit/8402573e586684a3b1c90042b6bbaecca4183d20"><code>8402573</code></a> Update <code>just@latest</code> to 1.43.1</li> <li>See full diff in <a href="https://github.com/taiki-e/install-action/compare/6cc14f7f2f4b3129aff07a8b071d2d4f2733465d...0be4756f42223b67aa4b7df5effad59010cbf4b9">compare view</a></li> </ul> </details> <br /> [](https://docs.github.com/en/github/managing-security-vulnerabilities/about-dependabot-security-updates#about-compatibility-scores) Dependabot will resolve any conflicts with this PR as long as you don't alter it yourself. You can also trigger a rebase manually by commenting `@dependabot rebase`. [//]: # (dependabot-automerge-start) [//]: # (dependabot-automerge-end) --- <details> <summary>Dependabot commands and options</summary> <br /> You can trigger Dependabot actions by commenting on this PR: - `@dependabot rebase` will rebase this PR - `@dependabot recreate` will recreate this PR, overwriting any edits that have been made to it - `@dependabot merge` will merge this PR after your CI passes on it - `@dependabot squash and merge` will squash and merge this PR after your CI passes on it - `@dependabot cancel merge` will cancel a previously requested merge and block automerging - `@dependabot reopen` will reopen this PR if it is closed - `@dependabot close` will close this PR and stop Dependabot recreating it. You can achieve the same result by closing it manually - `@dependabot show <dependency name> ignore conditions` will show all of the ignore conditions of the specified dependency - `@dependabot ignore this major version` will close this PR and stop Dependabot creating any more for this major version (unless you reopen the PR or upgrade to it yourself) - `@dependabot ignore this minor version` will close this PR and stop Dependabot creating any more for this minor version (unless you reopen the PR or upgrade to it yourself) - `@dependabot ignore this dependency` will close this PR and stop Dependabot creating any more for this dependency (unless you reopen the PR or upgrade to it yourself) </details> Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Bumps [crate-ci/typos](https://github.com/crate-ci/typos) from 1.39.1 to 1.39.2. <details> <summary>Release notes</summary> <p><em>Sourced from <a href="https://github.com/crate-ci/typos/releases">crate-ci/typos's releases</a>.</em></p> <blockquote> <h2>v1.39.2</h2> <h2>[1.39.2] - 2025-11-13</h2> <h3>Fixes</h3> <ul> <li>Don't offer <code>entry</code> as a correction for <code>entrys</code></li> </ul> </blockquote> </details> <details> <summary>Changelog</summary> <p><em>Sourced from <a href="https://github.com/crate-ci/typos/blob/master/CHANGELOG.md">crate-ci/typos's changelog</a>.</em></p> <blockquote> <h1>Change Log</h1> <p>All notable changes to this project will be documented in this file.</p> <p>The format is based on <a href="https://keepachangelog.com/">Keep a Changelog</a> and this project adheres to <a href="https://semver.org/">Semantic Versioning</a>.</p> <!-- raw HTML omitted --> <h2>[Unreleased] - ReleaseDate</h2> <h2>[1.39.2] - 2025-11-13</h2> <h3>Fixes</h3> <ul> <li>Don't offer <code>entry</code> as a correction for <code>entrys</code></li> </ul> <h2>[1.39.1] - 2025-11-12</h2> <h3>Features</h3> <ul> <li>Make <code>--help</code> more vibrant</li> </ul> <h2>[1.39.0] - 2025-10-31</h2> <h3>Features</h3> <ul> <li>Updated the dictionary with the <a href="https://redirect.github.com/crate-ci/typos/issues/1383">October 2025</a> changes</li> </ul> <h3>Fixes</h3> <ul> <li>When a typo is pluralized, prefer pluralized corrections</li> </ul> <h2>[1.38.1] - 2025-10-07</h2> <h3>Fixes</h3> <ul> <li>Ignore common golang identifiers</li> </ul> <h2>[1.38.0] - 2025-10-06</h2> <h3>Features</h3> <ul> <li>Update type list</li> </ul> <h3>Fixes</h3> <ul> <li>Don't correct <code>typ</code></li> <li>Consistently error on unused config fields</li> </ul> <h2>[1.37.3] - 2025-10-06</h2> <!-- raw HTML omitted --> </blockquote> <p>... (truncated)</p> </details> <details> <summary>Commits</summary> <ul> <li><a href="https://github.com/crate-ci/typos/commit/626c4bedb751ce0b7f03262ca97ddda9a076ae1c"><code>626c4be</code></a> chore: Release</li> <li><a href="https://github.com/crate-ci/typos/commit/c6b458db05d00c3037bc9a1102b84febc9fff2f4"><code>c6b458d</code></a> docs: Update changelog</li> <li><a href="https://github.com/crate-ci/typos/commit/eed04198a67af7f32b16141261aa8f911cba1f5f"><code>eed0419</code></a> Merge pull request <a href="https://redirect.github.com/crate-ci/typos/issues/1423">#1423</a> from epage/entrys</li> <li><a href="https://github.com/crate-ci/typos/commit/40383f41a2f90743ef28b8c8b1c2d5a42b7651a0"><code>40383f4</code></a> fix(dict): Don't offer 'entry' as a correction for 'entrys'</li> <li>See full diff in <a href="https://github.com/crate-ci/typos/compare/1af53e3774f068183ffd0c7193eb061a2b65a531...626c4bedb751ce0b7f03262ca97ddda9a076ae1c">compare view</a></li> </ul> </details> <br /> [](https://docs.github.com/en/github/managing-security-vulnerabilities/about-dependabot-security-updates#about-compatibility-scores) Dependabot will resolve any conflicts with this PR as long as you don't alter it yourself. You can also trigger a rebase manually by commenting `@dependabot rebase`. [//]: # (dependabot-automerge-start) [//]: # (dependabot-automerge-end) --- <details> <summary>Dependabot commands and options</summary> <br /> You can trigger Dependabot actions by commenting on this PR: - `@dependabot rebase` will rebase this PR - `@dependabot recreate` will recreate this PR, overwriting any edits that have been made to it - `@dependabot merge` will merge this PR after your CI passes on it - `@dependabot squash and merge` will squash and merge this PR after your CI passes on it - `@dependabot cancel merge` will cancel a previously requested merge and block automerging - `@dependabot reopen` will reopen this PR if it is closed - `@dependabot close` will close this PR and stop Dependabot recreating it. You can achieve the same result by closing it manually - `@dependabot show <dependency name> ignore conditions` will show all of the ignore conditions of the specified dependency - `@dependabot ignore this major version` will close this PR and stop Dependabot creating any more for this major version (unless you reopen the PR or upgrade to it yourself) - `@dependabot ignore this minor version` will close this PR and stop Dependabot creating any more for this minor version (unless you reopen the PR or upgrade to it yourself) - `@dependabot ignore this dependency` will close this PR and stop Dependabot creating any more for this dependency (unless you reopen the PR or upgrade to it yourself) </details> Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
…upport (apache#18630) 0## Which issue does this PR close? <!-- We generally require a GitHub issue to be filed for all bug fixes and enhancements and this helps us generate change logs for our releases. You can link an issue to this PR using the GitHub syntax. For example `Closes apache#123` indicates that this PR will close issue apache#123. --> - Closes apache#18646 ## Rationale for this change Cleans up the plan by removing `CoalesceBatchesExec`. I do not expect any performance improvement. <!-- Why are you proposing this change? If this is already explained clearly in the issue then this section is not needed. Explaining clearly why changes are proposed helps reviewers understand your changes and offer better suggestions for fixes. --> Making plans better to read and avoid useless `CoalesceBatchesExec` Also adds back fetch support by adding it to FilterExec ## What changes are included in this PR? <!-- There is no need to duplicate the description in the issue here but it is sometimes worth providing a summary of the individual changes in this PR. --> ## Are these changes tested? <!-- We typically require tests for all PRs in order to: 1. Prevent the code from being accidentally broken by subsequent changes 2. Serve as another way to document the expected behavior of the code If tests are not included in your PR, please explain why (for example, are they covered by existing tests)? --> ## Are there any user-facing changes? <!-- If there are user-facing changes then we may require documentation to be updated before approving the PR. --> <!-- If there are any breaking changes to public APIs, please add the `api change` label. -->
…ource` (apache#18697) ## Which issue does this PR close? <!-- We generally require a GitHub issue to be filed for all bug fixes and enhancements and this helps us generate change logs for our releases. You can link an issue to this PR using the GitHub syntax. For example `Closes apache#123` indicates that this PR will close issue apache#123. --> - Part of apache#18613 ## Rationale for this change <!-- Why are you proposing this change? If this is already explained clearly in the issue then this section is not needed. Explaining clearly why changes are proposed helps reviewers understand your changes and offer better suggestions for fixes. --> ## What changes are included in this PR? <!-- There is no need to duplicate the description in the issue here but it is sometimes worth providing a summary of the individual changes in this PR. --> ## Are these changes tested? <!-- We typically require tests for all PRs in order to: 1. Prevent the code from being accidentally broken by subsequent changes 2. Serve as another way to document the expected behavior of the code If tests are not included in your PR, please explain why (for example, are they covered by existing tests)? --> ## Are there any user-facing changes? <!-- If there are user-facing changes then we may require documentation to be updated before approving the PR. --> <!-- If there are any breaking changes to public APIs, please add the `api change` label. -->
…on-datasource (apache#18682) ## Rationale for this change This PR enforces the `clippy::needless_pass_by_value` lint rule to prevent unnecessary data clones and improve performance in the `datafusion-datasource` crate. This is part of the effort tracked in apache#18503 to enforce this lint rule across all DataFusion crates. Functions that take ownership of values (pass-by-value) when they only need to read them force callers to `.clone()` data unnecessarily, which degrades performance. By changing these functions to accept references instead, we eliminate these unnecessary clones. ## What changes are included in this PR? - Added lint rule enforcement to `datafusion/datasource/src/mod.rs` - Fixed 11 violations of `clippy::needless_pass_by_value` across 5 files: - `file_scan_config.rs`: 2 fixes - `memory.rs`: 3 fixes - `source.rs`: 1 fix - `statistics.rs`: 4 fixes - `write/demux.rs`: 1 fix - Updated callers in `datafusion-core` and `datafusion-catalog-listing` to pass references ## Are these changes tested? Yes, all changes are tested: - ✅ All 82 unit tests pass (`cargo test -p datafusion-datasource`) - ✅ All 7 doc tests pass - ✅ Strict clippy checks pass with `-D warnings` - ✅ CI lint script passes (`./dev/rust_lint.sh`) - ✅ Dependent crates (`datafusion-catalog-listing`, `datafusion-core`) pass all tests and clippy checks Tests are covered by existing tests as this is a refactoring that changes internal function signatures without changing behavior. ## Are there any user-facing changes? No user-facing changes. All changes are internal to the `datafusion-datasource` crate. The public API remains unchanged - only internal function signatures were modified to accept references instead of owned values. Then at the bottom add: Fixes apache#18611 Part of apache#18503
…18677) ## Which issue does this PR close? <!-- We generally require a GitHub issue to be filed for all bug fixes and enhancements and this helps us generate change logs for our releases. You can link an issue to this PR using the GitHub syntax. For example `Closes apache#123` indicates that this PR will close issue apache#123. --> - Closes apache#18659. ## Rationale for this change Fix an edge case in `corr` and `NaN` <!-- Why are you proposing this change? If this is already explained clearly in the issue then this section is not needed. Explaining clearly why changes are proposed helps reviewers understand your changes and offer better suggestions for fixes. --> ## What changes are included in this PR? <!-- There is no need to duplicate the description in the issue here but it is sometimes worth providing a summary of the individual changes in this PR. --> ## Are these changes tested? <!-- We typically require tests for all PRs in order to: 1. Prevent the code from being accidentally broken by subsequent changes 2. Serve as another way to document the expected behavior of the code If tests are not included in your PR, please explain why (for example, are they covered by existing tests)? --> ## Are there any user-facing changes? <!-- If there are user-facing changes then we may require documentation to be updated before approving the PR. --> <!-- If there are any breaking changes to public APIs, please add the `api change` label. -->
## Which issue does this PR close? <!-- We generally require a GitHub issue to be filed for all bug fixes and enhancements and this helps us generate change logs for our releases. You can link an issue to this PR using the GitHub syntax. For example `Closes apache#123` indicates that this PR will close issue apache#123. --> - part of apache#17558 (comment) - ## Rationale for this change Forward port changes from branch-51 to main: - apache#18705 ## What changes are included in this PR? Bring changes from this PR to main - apache#18705 ## Are these changes tested? <!-- We typically require tests for all PRs in order to: 1. Prevent the code from being accidentally broken by subsequent changes 2. Serve as another way to document the expected behavior of the code If tests are not included in your PR, please explain why (for example, are they covered by existing tests)? --> ## Are there any user-facing changes? <!-- If there are user-facing changes then we may require documentation to be updated before approving the PR. --> <!-- If there are any breaking changes to public APIs, please add the `api change` label. -->
## Which issue does this PR close? <!-- We generally require a GitHub issue to be filed for all bug fixes and enhancements and this helps us generate change logs for our releases. You can link an issue to this PR using the GitHub syntax. For example `Closes apache#123` indicates that this PR will close issue apache#123. --> Part of apache#12725 ## Rationale for this change <!-- Why are you proposing this change? If this is already explained clearly in the issue then this section is not needed. Explaining clearly why changes are proposed helps reviewers understand your changes and offer better suggestions for fixes. --> Prefer to avoid user_defined for consistency in function definitions. ## What changes are included in this PR? <!-- There is no need to duplicate the description in the issue here but it is sometimes worth providing a summary of the individual changes in this PR. --> Refactor signatures of crc32 & sha1 to use coercion API instead of being user defined. Also add support for FixedSizeBinary inputs to these functions. ## Are these changes tested? <!-- We typically require tests for all PRs in order to: 1. Prevent the code from being accidentally broken by subsequent changes 2. Serve as another way to document the expected behavior of the code If tests are not included in your PR, please explain why (for example, are they covered by existing tests)? --> Existing tests ## Are there any user-facing changes? <!-- If there are user-facing changes then we may require documentation to be updated before approving the PR. --> No. <!-- If there are any breaking changes to public APIs, please add the `api change` label. -->
## Which issue does this PR close? Partially implements apache#15914 ## Rationale for this change Spark has support for cosecant https://spark.apache.org/docs/latest/api/sql/index.html#csc . This function is not there othere in DB's like postgres, mysql and sqlite3. Hence I have added this function into datafusion-spark ## What changes are included in this PR? Adds support for cosecant in datafusion-spark ## Are these changes tested? Unit tests with inputs/outputs obtained from spark ## Are there any user-facing changes? Yes
…8709) Closes apache#18692 (hopefully) Trying to get CI to pass consistently, try various techniques.
…apache#18715) ## Which issue does this PR close? - Part of parent issue apache#18503 ## What changes are included in this PR? enforce clippy lint `needless_pass_by_value` to `datafusion-proto` ## Are these changes tested? yes ## Are there any user-facing changes? no
…apache#18714) ## Which issue does this PR close? - Part of parent issue apache#18503 ## What changes are included in this PR? enforce clippy lint `needless_pass_by_value` to `datafusion-spark` ## Are these changes tested? yes ## Are there any user-facing changes? no
milenkovicm
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thanks @jizezhang i have mostly cosmetic comments, please tell me what you think
| } | ||
|
|
||
| /// Register a [`CacheProducer`] to provide custom caching strategy | ||
| pub fn with_cache_producer(self, cache_producer: Arc<dyn CacheProducer>) -> Self { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
would it make sense to call it cache_factory rather than cache_producer to align with other similar methods like function_factory?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
do we need to expose this on session context, its not going to be used that often, maybe just having it on session state would be enough ?
| &self, | ||
| plan: LogicalPlan, | ||
| session_state: &SessionState, | ||
| ) -> Result<Arc<dyn UserDefinedLogicalNode>>; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
maybe it would make sense to return LogicalNode here, so we do not limit just on UserDefinedLogicalNode?
| /// It will be invoked on `CREATE FUNCTION` statements. | ||
| /// thus, changing dialect o PostgreSql is required | ||
| function_factory: Option<Arc<dyn FunctionFactory>>, | ||
| cache_producer: Option<Arc<dyn CacheProducer>>, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
same like previous, cache_factory maybe?
| /// Interface for applying a custom caching strategy. | ||
| /// Implement this trait and register via [`SessionState`] | ||
| /// to create a custom logical node for caching. | ||
| pub trait CacheProducer: Debug + Sync + Send { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
just for discussion, do we need to create new trait for this or simple closure |SessionState, LogicalPlan| -> LogicalPlan would do?
Which issue does this PR close?
DataFrame.cache()does not work in distributed environments #17297.Rationale for this change
See #17297.
What changes are included in this PR?
SessionStateto enable registering aCacheProducerthat creates a logical node for caching.DataFrame::cache()to apply logical node for caching on top of original dataframe plan if cache producer is supplied, otherwise use current implementation as is.Are these changes tested?
Yes
Are there any user-facing changes?
Change in
SessionStateis user-facing.