Skip to content

Conversation

@jizezhang
Copy link
Contributor

@jizezhang jizezhang commented Nov 14, 2025

Which issue does this PR close?

Rationale for this change

See #17297.

What changes are included in this PR?

  • Added methods in SessionState to enable registering a CacheProducer that creates a logical node for caching.
  • Added branching in DataFrame::cache() to apply logical node for caching on top of original dataframe plan if cache producer is supplied, otherwise use current implementation as is.

Are these changes tested?

Yes

Are there any user-facing changes?

Change in SessionState is user-facing.

@github-actions github-actions bot added the core Core DataFusion crate label Nov 14, 2025
@jizezhang jizezhang marked this pull request as ready for review November 14, 2025 05:58
})
}

/// Cache DataFrame as a memory table.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This needs to be updated to mention the custom cache path.
Something like:

Cache this DataFrame using the configured cache producer, falling back to an in-memory table when none is set

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for catching this. Updated the doc.

Limit: skip=0, fetch=1
TableScan: aggregate_test_100, fetch=1
"###
);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
);
);
let df_results = df.collect().await?;
let cached_df_results = cached_df.collect().await?;
assert_eq!(&df_results, &cached_df_results);

to test the physical plan too
Does it need a custom ExtensionPlanner too for that ?!

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I dont think we need to cover physical plan for this case, as it would be up to user to provide it

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It could be used as an example how to do it.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

perhaps we can do it as a follow up in examples

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, once this PR is approved and merged, I can work on writing up an example.

Copy link
Contributor

@milenkovicm milenkovicm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I looks good, will have a look in more details later.
few comments to start with

Limit: skip=0, fetch=1
TableScan: aggregate_test_100, fetch=1
"###
);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I dont think we need to cover physical plan for this case, as it would be up to user to provide it

pub trait CacheProducer: Debug + Sync + Send {
/// Create a custom logical node for caching
/// given a logical plan (of DF to cache).
fn create(&self, plan: LogicalPlan) -> Result<Arc<dyn UserDefinedLogicalNode>>;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe we should provide SessionState as parameter as well ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, updated the method signature to include session state.

/// Interface for applying a custom caching strategy.
/// Implement this trait and register via [`SessionState`]
/// to create a custom logical node for caching.
pub trait CacheProducer: Debug + Sync + Send {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could we explain that when cache producer is provided user should provide its own logical query planner to handle that specific case

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, updated the doc for trait.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

just for discussion, do we need to create new trait for this or simple closure |SessionState, LogicalPlan| -> LogicalPlan would do?

2010YOUY01 and others added 14 commits November 14, 2025 22:44
…pache#18686)

## Which issue does this PR close?

<!--
We generally require a GitHub issue to be filed for all bug fixes and
enhancements and this helps us generate change logs for our releases.
You can link an issue to this PR using the GitHub syntax. For example
`Closes apache#123` indicates that this PR will close issue apache#123.
-->

- Closes #.

## Rationale for this change

<!--
Why are you proposing this change? If this is already explained clearly
in the issue then this section is not needed.
Explaining clearly why changes are proposed helps reviewers understand
your changes and offer better suggestions for fixes.
-->
When suppressing certain clippy lint violations, use `expect` instead of
`allow` to ensure there is an actual lint violation.

## What changes are included in this PR?

<!--
There is no need to duplicate the description in the issue here but it
is sometimes worth providing a summary of the individual changes in this
PR.
-->

## Are these changes tested?

<!--
We typically require tests for all PRs in order to:
1. Prevent the code from being accidentally broken by subsequent changes
2. Serve as another way to document the expected behavior of the code

If tests are not included in your PR, please explain why (for example,
are they covered by existing tests)?
-->

## Are there any user-facing changes?

<!--
If there are user-facing changes then we may require documentation to be
updated before approving the PR.
-->

<!--
If there are any breaking changes to public APIs, please add the `api
change` label.
-->
…che#18693)

Bumps
[taiki-e/install-action](https://github.com/taiki-e/install-action) from
2.62.50 to 2.62.51.
<details>
<summary>Release notes</summary>
<p><em>Sourced from <a
href="https://github.com/taiki-e/install-action/releases">taiki-e/install-action's
releases</a>.</em></p>
<blockquote>
<h2>2.62.51</h2>
<ul>
<li>
<p>Update <code>typos@latest</code> to 1.39.2.</p>
</li>
<li>
<p>Update <code>mise@latest</code> to 2025.11.4.</p>
</li>
<li>
<p>Update <code>uv@latest</code> to 0.9.9.</p>
</li>
<li>
<p>Update <code>protoc@latest</code> to 3.33.1.</p>
</li>
<li>
<p>Update <code>just@latest</code> to 1.43.1.</p>
</li>
</ul>
</blockquote>
</details>
<details>
<summary>Changelog</summary>
<p><em>Sourced from <a
href="https://github.com/taiki-e/install-action/blob/main/CHANGELOG.md">taiki-e/install-action's
changelog</a>.</em></p>
<blockquote>
<h1>Changelog</h1>
<p>All notable changes to this project will be documented in this
file.</p>
<p>This project adheres to <a href="https://semver.org">Semantic
Versioning</a>.</p>
<!-- raw HTML omitted -->
<h2>[Unreleased]</h2>
<h2>[2.62.51] - 2025-11-14</h2>
<ul>
<li>
<p>Update <code>typos@latest</code> to 1.39.2.</p>
</li>
<li>
<p>Update <code>mise@latest</code> to 2025.11.4.</p>
</li>
<li>
<p>Update <code>uv@latest</code> to 0.9.9.</p>
</li>
<li>
<p>Update <code>protoc@latest</code> to 3.33.1.</p>
</li>
<li>
<p>Update <code>just@latest</code> to 1.43.1.</p>
</li>
</ul>
<h2>[2.62.50] - 2025-11-12</h2>
<ul>
<li>
<p>Update <code>wasmtime@latest</code> to 38.0.4.</p>
</li>
<li>
<p>Update <code>coreutils@latest</code> to 0.4.0.</p>
</li>
</ul>
<h2>[2.62.49] - 2025-11-09</h2>
<ul>
<li>
<p>Update <code>cargo-binstall@latest</code> to 1.15.11.</p>
</li>
<li>
<p>Update <code>cargo-auditable@latest</code> to 0.7.2.</p>
</li>
<li>
<p>Update <code>vacuum@latest</code> to 0.20.2.</p>
</li>
</ul>
<h2>[2.62.48] - 2025-11-08</h2>
<ul>
<li>
<p>Update <code>mise@latest</code> to 2025.11.3.</p>
</li>
<li>
<p>Update <code>cargo-audit@latest</code> to 0.22.0.</p>
</li>
<li>
<p>Update <code>vacuum@latest</code> to 0.20.1.</p>
</li>
<li>
<p>Update <code>uv@latest</code> to 0.9.8.</p>
</li>
<li>
<p>Update <code>cargo-udeps@latest</code> to 0.1.60.</p>
</li>
</ul>
<!-- raw HTML omitted -->
</blockquote>
<p>... (truncated)</p>
</details>
<details>
<summary>Commits</summary>
<ul>
<li><a
href="https://github.com/taiki-e/install-action/commit/0be4756f42223b67aa4b7df5effad59010cbf4b9"><code>0be4756</code></a>
Release 2.62.51</li>
<li><a
href="https://github.com/taiki-e/install-action/commit/e1c1ebb6a3f58bc641d557ff58d8de1faa37a03d"><code>e1c1ebb</code></a>
Update changelog</li>
<li><a
href="https://github.com/taiki-e/install-action/commit/d78637d17c2c9ab112cbdf9c6277eafa393f3ec0"><code>d78637d</code></a>
Update <code>typos@latest</code> to 1.39.2</li>
<li><a
href="https://github.com/taiki-e/install-action/commit/107556f337c67c5f10c7af70e9324b346e6ac455"><code>107556f</code></a>
Update <code>mise@latest</code> to 2025.11.4</li>
<li><a
href="https://github.com/taiki-e/install-action/commit/2913759b209bf66eb7759a04851e0da7684bf2a7"><code>2913759</code></a>
Update <code>uv@latest</code> to 0.9.9</li>
<li><a
href="https://github.com/taiki-e/install-action/commit/31f5779141e1b4c6c3fbe44984c10450860022d6"><code>31f5779</code></a>
Update <code>typos@latest</code> to 1.39.1</li>
<li><a
href="https://github.com/taiki-e/install-action/commit/146aadaacec340c79628ca7c4822a43924e760ed"><code>146aada</code></a>
Update <code>protoc@latest</code> to 3.33.1</li>
<li><a
href="https://github.com/taiki-e/install-action/commit/8402573e586684a3b1c90042b6bbaecca4183d20"><code>8402573</code></a>
Update <code>just@latest</code> to 1.43.1</li>
<li>See full diff in <a
href="https://github.com/taiki-e/install-action/compare/6cc14f7f2f4b3129aff07a8b071d2d4f2733465d...0be4756f42223b67aa4b7df5effad59010cbf4b9">compare
view</a></li>
</ul>
</details>
<br />


[![Dependabot compatibility
score](https://dependabot-badges.githubapp.com/badges/compatibility_score?dependency-name=taiki-e/install-action&package-manager=github_actions&previous-version=2.62.50&new-version=2.62.51)](https://docs.github.com/en/github/managing-security-vulnerabilities/about-dependabot-security-updates#about-compatibility-scores)

Dependabot will resolve any conflicts with this PR as long as you don't
alter it yourself. You can also trigger a rebase manually by commenting
`@dependabot rebase`.

[//]: # (dependabot-automerge-start)
[//]: # (dependabot-automerge-end)

---

<details>
<summary>Dependabot commands and options</summary>
<br />

You can trigger Dependabot actions by commenting on this PR:
- `@dependabot rebase` will rebase this PR
- `@dependabot recreate` will recreate this PR, overwriting any edits
that have been made to it
- `@dependabot merge` will merge this PR after your CI passes on it
- `@dependabot squash and merge` will squash and merge this PR after
your CI passes on it
- `@dependabot cancel merge` will cancel a previously requested merge
and block automerging
- `@dependabot reopen` will reopen this PR if it is closed
- `@dependabot close` will close this PR and stop Dependabot recreating
it. You can achieve the same result by closing it manually
- `@dependabot show <dependency name> ignore conditions` will show all
of the ignore conditions of the specified dependency
- `@dependabot ignore this major version` will close this PR and stop
Dependabot creating any more for this major version (unless you reopen
the PR or upgrade to it yourself)
- `@dependabot ignore this minor version` will close this PR and stop
Dependabot creating any more for this minor version (unless you reopen
the PR or upgrade to it yourself)
- `@dependabot ignore this dependency` will close this PR and stop
Dependabot creating any more for this dependency (unless you reopen the
PR or upgrade to it yourself)


</details>

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Bumps [crate-ci/typos](https://github.com/crate-ci/typos) from 1.39.1 to
1.39.2.
<details>
<summary>Release notes</summary>
<p><em>Sourced from <a
href="https://github.com/crate-ci/typos/releases">crate-ci/typos's
releases</a>.</em></p>
<blockquote>
<h2>v1.39.2</h2>
<h2>[1.39.2] - 2025-11-13</h2>
<h3>Fixes</h3>
<ul>
<li>Don't offer <code>entry</code> as a correction for
<code>entrys</code></li>
</ul>
</blockquote>
</details>
<details>
<summary>Changelog</summary>
<p><em>Sourced from <a
href="https://github.com/crate-ci/typos/blob/master/CHANGELOG.md">crate-ci/typos's
changelog</a>.</em></p>
<blockquote>
<h1>Change Log</h1>
<p>All notable changes to this project will be documented in this
file.</p>
<p>The format is based on <a href="https://keepachangelog.com/">Keep a
Changelog</a>
and this project adheres to <a href="https://semver.org/">Semantic
Versioning</a>.</p>
<!-- raw HTML omitted -->
<h2>[Unreleased] - ReleaseDate</h2>
<h2>[1.39.2] - 2025-11-13</h2>
<h3>Fixes</h3>
<ul>
<li>Don't offer <code>entry</code> as a correction for
<code>entrys</code></li>
</ul>
<h2>[1.39.1] - 2025-11-12</h2>
<h3>Features</h3>
<ul>
<li>Make <code>--help</code> more vibrant</li>
</ul>
<h2>[1.39.0] - 2025-10-31</h2>
<h3>Features</h3>
<ul>
<li>Updated the dictionary with the <a
href="https://redirect.github.com/crate-ci/typos/issues/1383">October
2025</a> changes</li>
</ul>
<h3>Fixes</h3>
<ul>
<li>When a typo is pluralized, prefer pluralized corrections</li>
</ul>
<h2>[1.38.1] - 2025-10-07</h2>
<h3>Fixes</h3>
<ul>
<li>Ignore common golang identifiers</li>
</ul>
<h2>[1.38.0] - 2025-10-06</h2>
<h3>Features</h3>
<ul>
<li>Update type list</li>
</ul>
<h3>Fixes</h3>
<ul>
<li>Don't correct <code>typ</code></li>
<li>Consistently error on unused config fields</li>
</ul>
<h2>[1.37.3] - 2025-10-06</h2>
<!-- raw HTML omitted -->
</blockquote>
<p>... (truncated)</p>
</details>
<details>
<summary>Commits</summary>
<ul>
<li><a
href="https://github.com/crate-ci/typos/commit/626c4bedb751ce0b7f03262ca97ddda9a076ae1c"><code>626c4be</code></a>
chore: Release</li>
<li><a
href="https://github.com/crate-ci/typos/commit/c6b458db05d00c3037bc9a1102b84febc9fff2f4"><code>c6b458d</code></a>
docs: Update changelog</li>
<li><a
href="https://github.com/crate-ci/typos/commit/eed04198a67af7f32b16141261aa8f911cba1f5f"><code>eed0419</code></a>
Merge pull request <a
href="https://redirect.github.com/crate-ci/typos/issues/1423">#1423</a>
from epage/entrys</li>
<li><a
href="https://github.com/crate-ci/typos/commit/40383f41a2f90743ef28b8c8b1c2d5a42b7651a0"><code>40383f4</code></a>
fix(dict): Don't offer 'entry' as a correction for 'entrys'</li>
<li>See full diff in <a
href="https://github.com/crate-ci/typos/compare/1af53e3774f068183ffd0c7193eb061a2b65a531...626c4bedb751ce0b7f03262ca97ddda9a076ae1c">compare
view</a></li>
</ul>
</details>
<br />


[![Dependabot compatibility
score](https://dependabot-badges.githubapp.com/badges/compatibility_score?dependency-name=crate-ci/typos&package-manager=github_actions&previous-version=1.39.1&new-version=1.39.2)](https://docs.github.com/en/github/managing-security-vulnerabilities/about-dependabot-security-updates#about-compatibility-scores)

Dependabot will resolve any conflicts with this PR as long as you don't
alter it yourself. You can also trigger a rebase manually by commenting
`@dependabot rebase`.

[//]: # (dependabot-automerge-start)
[//]: # (dependabot-automerge-end)

---

<details>
<summary>Dependabot commands and options</summary>
<br />

You can trigger Dependabot actions by commenting on this PR:
- `@dependabot rebase` will rebase this PR
- `@dependabot recreate` will recreate this PR, overwriting any edits
that have been made to it
- `@dependabot merge` will merge this PR after your CI passes on it
- `@dependabot squash and merge` will squash and merge this PR after
your CI passes on it
- `@dependabot cancel merge` will cancel a previously requested merge
and block automerging
- `@dependabot reopen` will reopen this PR if it is closed
- `@dependabot close` will close this PR and stop Dependabot recreating
it. You can achieve the same result by closing it manually
- `@dependabot show <dependency name> ignore conditions` will show all
of the ignore conditions of the specified dependency
- `@dependabot ignore this major version` will close this PR and stop
Dependabot creating any more for this major version (unless you reopen
the PR or upgrade to it yourself)
- `@dependabot ignore this minor version` will close this PR and stop
Dependabot creating any more for this minor version (unless you reopen
the PR or upgrade to it yourself)
- `@dependabot ignore this dependency` will close this PR and stop
Dependabot creating any more for this dependency (unless you reopen the
PR or upgrade to it yourself)


</details>

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
…upport (apache#18630)

0## Which issue does this PR close?

<!--
We generally require a GitHub issue to be filed for all bug fixes and
enhancements and this helps us generate change logs for our releases.
You can link an issue to this PR using the GitHub syntax. For example
`Closes apache#123` indicates that this PR will close issue apache#123.
-->

- Closes apache#18646

## Rationale for this change

Cleans up the plan by removing `CoalesceBatchesExec`. I do not expect
any performance improvement.

<!--
Why are you proposing this change? If this is already explained clearly
in the issue then this section is not needed.
Explaining clearly why changes are proposed helps reviewers understand
your changes and offer better suggestions for fixes.
-->

Making plans better to read and avoid useless `CoalesceBatchesExec`

Also adds back fetch support by adding it to FilterExec

## What changes are included in this PR?

<!--
There is no need to duplicate the description in the issue here but it
is sometimes worth providing a summary of the individual changes in this
PR.
-->

## Are these changes tested?

<!--
We typically require tests for all PRs in order to:
1. Prevent the code from being accidentally broken by subsequent changes
2. Serve as another way to document the expected behavior of the code

If tests are not included in your PR, please explain why (for example,
are they covered by existing tests)?
-->

## Are there any user-facing changes?

<!--
If there are user-facing changes then we may require documentation to be
updated before approving the PR.
-->

<!--
If there are any breaking changes to public APIs, please add the `api
change` label.
-->
…ource` (apache#18697)

## Which issue does this PR close?

<!--
We generally require a GitHub issue to be filed for all bug fixes and
enhancements and this helps us generate change logs for our releases.
You can link an issue to this PR using the GitHub syntax. For example
`Closes apache#123` indicates that this PR will close issue apache#123.
-->

- Part of apache#18613

## Rationale for this change

<!--
Why are you proposing this change? If this is already explained clearly
in the issue then this section is not needed.
Explaining clearly why changes are proposed helps reviewers understand
your changes and offer better suggestions for fixes.
-->

## What changes are included in this PR?

<!--
There is no need to duplicate the description in the issue here but it
is sometimes worth providing a summary of the individual changes in this
PR.
-->

## Are these changes tested?

<!--
We typically require tests for all PRs in order to:
1. Prevent the code from being accidentally broken by subsequent changes
2. Serve as another way to document the expected behavior of the code

If tests are not included in your PR, please explain why (for example,
are they covered by existing tests)?
-->

## Are there any user-facing changes?

<!--
If there are user-facing changes then we may require documentation to be
updated before approving the PR.
-->

<!--
If there are any breaking changes to public APIs, please add the `api
change` label.
-->
…on-datasource (apache#18682)

## Rationale for this change
This PR enforces the `clippy::needless_pass_by_value` lint rule to
prevent unnecessary data clones and improve performance in the
`datafusion-datasource` crate. This is part of the effort tracked in
apache#18503 to enforce this lint rule across all DataFusion crates.
Functions that take ownership of values (pass-by-value) when they only
need to read them force callers to `.clone()` data unnecessarily, which
degrades performance. By changing these functions to accept references
instead, we eliminate these unnecessary clones.

 ## What changes are included in this PR?
- Added lint rule enforcement to `datafusion/datasource/src/mod.rs`
- Fixed 11 violations of `clippy::needless_pass_by_value` across 5
files:
    - `file_scan_config.rs`: 2 fixes
    - `memory.rs`: 3 fixes
    - `source.rs`: 1 fix
    - `statistics.rs`: 4 fixes
    - `write/demux.rs`: 1 fix
- Updated callers in `datafusion-core` and `datafusion-catalog-listing`
to pass references


## Are these changes tested?
Yes, all changes are tested:
  - ✅ All 82 unit tests pass (`cargo test -p datafusion-datasource`)
  - ✅ All 7 doc tests pass
  - ✅ Strict clippy checks pass with `-D warnings`
  - ✅ CI lint script passes (`./dev/rust_lint.sh`)
- ✅ Dependent crates (`datafusion-catalog-listing`, `datafusion-core`)
pass all tests and clippy checks

Tests are covered by existing tests as this is a refactoring that
changes internal function signatures without changing behavior.

## Are there any user-facing changes?
No user-facing changes. All changes are internal to the
`datafusion-datasource` crate. The public API remains unchanged - only
internal function signatures were modified to accept references instead
of owned values.

Then at the bottom add:
Fixes apache#18611
Part of apache#18503
…18677)

## Which issue does this PR close?

<!--
We generally require a GitHub issue to be filed for all bug fixes and
enhancements and this helps us generate change logs for our releases.
You can link an issue to this PR using the GitHub syntax. For example
`Closes apache#123` indicates that this PR will close issue apache#123.
-->

- Closes apache#18659.

## Rationale for this change

Fix an edge case in `corr` and `NaN`

<!--
Why are you proposing this change? If this is already explained clearly
in the issue then this section is not needed.
Explaining clearly why changes are proposed helps reviewers understand
your changes and offer better suggestions for fixes.
-->

## What changes are included in this PR?

<!--
There is no need to duplicate the description in the issue here but it
is sometimes worth providing a summary of the individual changes in this
PR.
-->

## Are these changes tested?

<!--
We typically require tests for all PRs in order to:
1. Prevent the code from being accidentally broken by subsequent changes
2. Serve as another way to document the expected behavior of the code

If tests are not included in your PR, please explain why (for example,
are they covered by existing tests)?
-->

## Are there any user-facing changes?

<!--
If there are user-facing changes then we may require documentation to be
updated before approving the PR.
-->

<!--
If there are any breaking changes to public APIs, please add the `api
change` label.
-->
## Which issue does this PR close?

<!--
We generally require a GitHub issue to be filed for all bug fixes and
enhancements and this helps us generate change logs for our releases.
You can link an issue to this PR using the GitHub syntax. For example
`Closes apache#123` indicates that this PR will close issue apache#123.
-->

- part of
apache#17558 (comment)
- 

## Rationale for this change
Forward port changes  from branch-51 to main:
- apache#18705

## What changes are included in this PR?

Bring changes from this PR to main
- apache#18705


## Are these changes tested?

<!--
We typically require tests for all PRs in order to:
1. Prevent the code from being accidentally broken by subsequent changes
2. Serve as another way to document the expected behavior of the code

If tests are not included in your PR, please explain why (for example,
are they covered by existing tests)?
-->

## Are there any user-facing changes?

<!--
If there are user-facing changes then we may require documentation to be
updated before approving the PR.
-->

<!--
If there are any breaking changes to public APIs, please add the `api
change` label.
-->
## Which issue does this PR close?

<!--
We generally require a GitHub issue to be filed for all bug fixes and
enhancements and this helps us generate change logs for our releases.
You can link an issue to this PR using the GitHub syntax. For example
`Closes apache#123` indicates that this PR will close issue apache#123.
-->

Part of apache#12725

## Rationale for this change

<!--
Why are you proposing this change? If this is already explained clearly
in the issue then this section is not needed.
Explaining clearly why changes are proposed helps reviewers understand
your changes and offer better suggestions for fixes.
-->

Prefer to avoid user_defined for consistency in function definitions.

## What changes are included in this PR?

<!--
There is no need to duplicate the description in the issue here but it
is sometimes worth providing a summary of the individual changes in this
PR.
-->

Refactor signatures of crc32 & sha1 to use coercion API instead of being
user defined.

Also add support for FixedSizeBinary inputs to these functions.

## Are these changes tested?

<!--
We typically require tests for all PRs in order to:
1. Prevent the code from being accidentally broken by subsequent changes
2. Serve as another way to document the expected behavior of the code

If tests are not included in your PR, please explain why (for example,
are they covered by existing tests)?
-->

Existing tests

## Are there any user-facing changes?

<!--
If there are user-facing changes then we may require documentation to be
updated before approving the PR.
-->

No.

<!--
If there are any breaking changes to public APIs, please add the `api
change` label.
-->
## Which issue does this PR close?

Partially implements apache#15914 

## Rationale for this change

Spark has support for cosecant
https://spark.apache.org/docs/latest/api/sql/index.html#csc .

This function is not there othere in DB's like postgres, mysql and
sqlite3. Hence I have added this function into datafusion-spark


## What changes are included in this PR?

Adds support for cosecant in datafusion-spark 


## Are these changes tested?

Unit tests with inputs/outputs obtained from spark

## Are there any user-facing changes?

Yes
…8709)

Closes apache#18692 (hopefully)

Trying to get CI to pass consistently, try various techniques.
…apache#18715)

## Which issue does this PR close?
- Part of parent issue apache#18503

## What changes are included in this PR?
enforce clippy lint `needless_pass_by_value` to `datafusion-proto`

## Are these changes tested?
yes

## Are there any user-facing changes?

no
…apache#18714)

## Which issue does this PR close?
- Part of parent issue apache#18503 

## What changes are included in this PR?
enforce clippy lint `needless_pass_by_value` to `datafusion-spark`

## Are these changes tested?
yes

## Are there any user-facing changes?
no
@github-actions github-actions bot added development-process Related to development process of DataFusion logical-expr Logical plan and expressions physical-expr Changes to the physical-expr crates optimizer Optimizer rules sqllogictest SQL Logic Tests (.slt) proto Related to proto crate functions Changes to functions implementation datasource Changes to the datasource crate labels Nov 15, 2025
@github-actions github-actions bot added physical-plan Changes to the physical-plan crate spark labels Nov 15, 2025
@github-actions github-actions bot removed development-process Related to development process of DataFusion logical-expr Logical plan and expressions physical-expr Changes to the physical-expr crates optimizer Optimizer rules sqllogictest SQL Logic Tests (.slt) proto Related to proto crate functions Changes to functions implementation datasource Changes to the datasource crate physical-plan Changes to the physical-plan crate spark labels Nov 15, 2025
Copy link
Contributor

@milenkovicm milenkovicm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks @jizezhang i have mostly cosmetic comments, please tell me what you think

}

/// Register a [`CacheProducer`] to provide custom caching strategy
pub fn with_cache_producer(self, cache_producer: Arc<dyn CacheProducer>) -> Self {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

would it make sense to call it cache_factory rather than cache_producer to align with other similar methods like function_factory?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do we need to expose this on session context, its not going to be used that often, maybe just having it on session state would be enough ?

&self,
plan: LogicalPlan,
session_state: &SessionState,
) -> Result<Arc<dyn UserDefinedLogicalNode>>;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe it would make sense to return LogicalNode here, so we do not limit just on UserDefinedLogicalNode?

/// It will be invoked on `CREATE FUNCTION` statements.
/// thus, changing dialect o PostgreSql is required
function_factory: Option<Arc<dyn FunctionFactory>>,
cache_producer: Option<Arc<dyn CacheProducer>>,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same like previous, cache_factory maybe?

/// Interface for applying a custom caching strategy.
/// Implement this trait and register via [`SessionState`]
/// to create a custom logical node for caching.
pub trait CacheProducer: Debug + Sync + Send {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

just for discussion, do we need to create new trait for this or simple closure |SessionState, LogicalPlan| -> LogicalPlan would do?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

core Core DataFusion crate

Projects

None yet

Development

Successfully merging this pull request may close these issues.

DataFrame.cache() does not work in distributed environments