@manasa-manoj-nbr

Which issue does this PR close?

Rationale for this change

The DataFusion project has an extensive benchmarking infrastructure with many different benchmark types (TPCH, ClickBench, IMDB, H2O.ai, micro-benchmarks, etc.) scattered across README files and code comments. Contributors need a centralized, easily discoverable resource to understand what benchmarks are available, how to use them for validating performance changes, and where to add new benchmark code. This addresses the maintainer's request in issue #17811 to create a dedicated documentation page describing all the benchmark code we have.

What changes are included in this PR?

  • Created docs/source/contributor-guide/benchmarking.md: A comprehensive documentation page covering all DataFusion benchmarks, organized by categories (Performance Benchmarks, Specialized Benchmarks, Micro-benchmarks)
  • Updated docs/source/index.rst: Added the new benchmarking page to the Contributor Guide navigation structure
  • Updated docs/source/contributor-guide/testing.md: Added cross-reference to the new dedicated benchmarking page in the existing benchmarks section

The new documentation consolidates information about:

  • All major benchmark suites (TPCH, ClickBench, IMDB, H2O.ai, Sort, External Aggregation, etc.)
  • Usage instructions for bench.sh script and dfbench binary
  • Configuration options and environment variables
  • Guidelines for adding new benchmarks
  • Troubleshooting common issues

Are these changes tested?

  • Documentation builds successfully without warnings or errors
  • Navigation structure tested - new page appears correctly in Contributor Guide menu
  • Internal links verified - all cross-references and links work properly
  • Content accuracy verified - all benchmark information sourced from official /benchmarks/README.md and existing documentation

Are there any user-facing changes?

No Breaking Changes:
- No changes to APIs, CLIs, or runtime behavior
- No changes to existing benchmark functionality
- Purely additive documentation enhancement

@github-actions github-actions bot added the documentation Improvements or additions to documentation label Oct 29, 2025
manasa-manoj-nbr and others added 28 commits October 30, 2025 01:55
## Which issue does this PR close?

- Closes #17862

## Rationale for this change

Most logical plan expressions now propagate metadata; however,
parameters with extension types or other field metadata cannot
participate in placeholder/parameter binding.

## What changes are included in this PR?

The `DataType` in the `Placeholder` struct was replaced with a `FieldRef`,
along with anything that stored the `DataType` of a parameter.

Strictly speaking one could bind parameters with an extension type by
copy/pasting the placeholder replacer, which I figured out towards the
end of this change. I still think this change makes sense and opens up
the door for things like handling UUID in SQL with full parameter
binding support.
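To illustrate why a full field matters for parameter binding, here is a minimal Rust sketch. `DataType`, `Field`, `Placeholder` and `check_binding` below are simplified, hypothetical stand-ins, not the arrow-rs or DataFusion types; only the `ARROW:extension:name` metadata key and the `arrow.uuid` extension name come from the Arrow specification. With only a data type, a UUID parameter is indistinguishable from any 16-byte binary value; carrying the field lets binding check the extension type as well.

```rust
use std::collections::HashMap;

/// Simplified stand-in for an Arrow data type (hypothetical, not arrow-rs).
#[derive(Debug, Clone, PartialEq)]
pub enum DataType {
    Utf8,
    FixedSizeBinary(i32),
}

/// Simplified stand-in for an Arrow field: a data type plus key/value metadata.
#[derive(Debug, Clone, PartialEq)]
pub struct Field {
    pub name: String,
    pub data_type: DataType,
    pub metadata: HashMap<String, String>,
}

/// Metadata key the Arrow spec uses to tag extension types.
pub const EXTENSION_NAME_KEY: &str = "ARROW:extension:name";

/// Hypothetical placeholder that records a whole field, not only a DataType.
#[derive(Debug, Clone)]
pub struct Placeholder {
    pub id: String,
    pub field: Option<Field>,
}

/// Hypothetical check run when binding a value (described by its own field)
/// to a placeholder. Because whole fields are available, the extension type
/// is compared as well as the physical storage type.
pub fn check_binding(placeholder: &Placeholder, value_field: &Field) -> Result<(), String> {
    match &placeholder.field {
        // No declared type: accept the value as-is.
        None => Ok(()),
        Some(expected) => {
            if expected.data_type != value_field.data_type {
                return Err(format!("type mismatch for {}", placeholder.id));
            }
            let want = expected.metadata.get(EXTENSION_NAME_KEY);
            let got = value_field.metadata.get(EXTENSION_NAME_KEY);
            if want != got {
                return Err(format!("extension type mismatch for {}", placeholder.id));
            }
            Ok(())
        }
    }
}

/// Build a UUID-typed field: 16-byte storage tagged with `arrow.uuid`.
pub fn uuid_field(name: &str) -> Field {
    let mut metadata = HashMap::new();
    metadata.insert(EXTENSION_NAME_KEY.to_string(), "arrow.uuid".to_string());
    Field {
        name: name.to_string(),
        data_type: DataType::FixedSizeBinary(16),
        metadata,
    }
}
```

A plain `FixedSizeBinary(16)` value without the extension tag is rejected for a UUID placeholder, which a `DataType`-only placeholder could not detect.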

## Are these changes tested?

Yes

## Are there any user-facing changes?

Yes, one new function was added to extract the placeholder fields from a
plan.

This is a breaking change for code that specifically interacts with the
pub fields of the modified structs (ParamValues, Placeholder, and
Prepare are the main ones).

---------

Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>
This logic would previously swallow all errors, including things like
DNS resolution failures. If the path was indeed a path and not a prefix,
that would result in the path being dropped, since the prefix wouldn't
include any further files.
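A minimal sketch of the corrected behavior, assuming a `head`-style metadata lookup that reports a missing object as `ErrorKind::NotFound`. `resolve` and `Resolved` are hypothetical names, and `std::io::Error` stands in for the object store error type: only "not found" falls back to treating the path as a prefix, while any other failure is returned to the caller.

```rust
use std::io::{Error, ErrorKind};

/// What a table path resolves to (hypothetical simplification).
#[derive(Debug, PartialEq)]
pub enum Resolved {
    /// The path names a single object.
    File(String),
    /// Nothing exists at the exact path, so treat it as a directory-like prefix.
    Prefix(String),
}

/// Resolve `path` using a caller-supplied metadata lookup (a stand-in for an
/// object store `head` call). Only "not found" means "this is a prefix";
/// every other error (DNS failure, permission denied, ...) is propagated
/// instead of being silently swallowed.
pub fn resolve<F>(path: &str, head: F) -> Result<Resolved, Error>
where
    F: Fn(&str) -> Result<u64, Error>,
{
    match head(path) {
        Ok(_size) => Ok(Resolved::File(path.to_string())),
        Err(e) if e.kind() == ErrorKind::NotFound => Ok(Resolved::Prefix(path.to_string())),
        Err(e) => Err(e),
    }
}
```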

Fixes #18242.
Updated links in the README to point to the correct URLs.
…rquet source (#18196)

## Which issue does this PR close?


Part of #18116

## Rationale for this change

The below configuration can be used to let `EXPLAIN ANALYZE` only show
important high-level insights.
```sql
set datafusion.explain.analyze_level = summary;
```

This PR sets `summary` level metrics for the Parquet data source:
### `summary` level metrics for `DataSourceExec` with `Parquet` source
- File-level pruning metrics
- Row-group-level pruning metrics
- Bytes scanned
- Metadata load time

These metrics are defined in
https://github.com/apache/datafusion/blob/155b56e521d75186776a65f1634ee03058899a79/datafusion/datasource-parquet/src/metrics.rs#L29

The remaining metrics are kept in the `dev` level. I'm not sure whether the
page-level pruning metrics should also be included in the `summary`
level; I'm open to suggestions on this, or on any other metrics that
should also be included.
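As a rough illustration of how the two levels relate, the sketch below tags each metric with a level and filters by the configured analyze level. `MetricLevel`, `Metric` and `visible` are hypothetical stand-ins rather than DataFusion's metrics API, and the metric names are only examples.

```rust
/// Hypothetical stand-in for the two analyze levels. The derive order makes
/// `Summary < Dev`, so "more detailed" compares as greater.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
pub enum MetricLevel {
    Summary,
    Dev,
}

/// A metric name tagged with the lowest level at which it is shown.
pub struct Metric {
    pub name: &'static str,
    pub level: MetricLevel,
}

/// Names of the metrics shown at `analyze_level`: `Summary` shows only
/// summary metrics, while `Dev` shows everything.
pub fn visible(metrics: &[Metric], analyze_level: MetricLevel) -> Vec<&'static str> {
    metrics
        .iter()
        .filter(|m| m.level <= analyze_level)
        .map(|m| m.name)
        .collect()
}
```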

While implementing this, I came up with a few ideas to further improve
metrics tracking in the Parquet scanner. I’ve documented them in
#18195

## What changes are included in this PR?

Set the above metrics to the `summary` analyze level.

## Are these changes tested?
Unit tests

## Are there any user-facing changes?
No
This changes the `decode` built-in function so that SQL like the
following works:

```sql
SELECT decode('cXdlcXc=', 'base64');
```

Padding isn't required to decode correctly, but it's surprising to users
if we just reject it outright.
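A minimal, dependency-free Rust sketch of padding-tolerant decoding, assuming the standard base64 alphabet. `decode_base64_lenient` is a hypothetical helper, not the function this PR changes: it strips up to two trailing `=` characters and then decodes, so padded and unpadded input give the same bytes.

```rust
/// Map one standard-alphabet base64 character to its 6-bit value.
fn sextet(c: u8) -> Option<u32> {
    match c {
        b'A'..=b'Z' => Some((c - b'A') as u32),
        b'a'..=b'z' => Some((c - b'a') as u32 + 26),
        b'0'..=b'9' => Some((c - b'0') as u32 + 52),
        b'+' => Some(62),
        b'/' => Some(63),
        _ => None,
    }
}

/// Decode standard base64, accepting input with or without trailing `=` padding.
pub fn decode_base64_lenient(input: &str) -> Result<Vec<u8>, String> {
    // Padding carries no data, so strip up to two trailing `=` characters.
    let trimmed = input
        .strip_suffix("==")
        .or_else(|| input.strip_suffix('='))
        .unwrap_or(input);
    let bytes = trimmed.as_bytes();
    // A single leftover character cannot encode a whole byte.
    if bytes.len() % 4 == 1 {
        return Err("invalid base64 length".to_string());
    }
    let mut out = Vec::with_capacity(bytes.len() * 3 / 4);
    for chunk in bytes.chunks(4) {
        // Pack up to four 6-bit groups into the top of a 24-bit buffer.
        let mut buf: u32 = 0;
        for (i, &c) in chunk.iter().enumerate() {
            let v = sextet(c).ok_or_else(|| format!("invalid base64 character {:?}", c as char))?;
            buf |= v << (18 - 6 * i);
        }
        // A chunk of n characters yields n - 1 whole bytes (4 chars -> 3 bytes).
        for i in 0..chunk.len() - 1 {
            out.push((buf >> (16 - 8 * i)) as u8);
        }
    }
    Ok(out)
}
```

With this sketch, both `cXdlcXc=` and `cXdlcXc` decode to the bytes of `qweqw`. It is deliberately lenient: it does not reject non-zero trailing bits, which a stricter decoder might.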
## Which issue does this PR close?

- Follow on to #17986 from
@paleolimbot


## Rationale for this change

As we thread Field through more of the DataFusion APIs, making it easy to
convert back and forth with Field will be increasingly important. We
added some helper methods in
#17986, but I think they could
be better documented (I wrote them, so this is not a dig at @paleolimbot!)

Let's add some more documentation and examples so it is clearer what this
code is doing.

## What changes are included in this PR?

1. Add more documentation and examples so it is clearer what this code
is doing.


## Are these changes tested?

By CI

## Are there any user-facing changes?
More docs, no functional changes
## Which issue does this PR close?


Part of #18095

## Rationale for this change


## What changes are included in this PR?


In the contributor guide documentation, better explain why large
AI-generated PRs without understanding should be rejected. See more in
the original issue discussion.

## Are these changes tested?


## Are there any user-facing changes?


---------

Co-authored-by: Adrian Garcia Badaracco <1755071+adriangb@users.noreply.github.com>
Co-authored-by: Alex Huang <huangweijun1001@gmail.com>
Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>
## Which issue does this PR close?

- Follow on to #17986 from
@paleolimbot

## Rationale for this change

As we thread Field through more of the DataFusion APIs, making it easy to
convert back and forth will be increasingly important. We added
`ScalarAndMetadata`, and I think it is a good idea to add some helper
methods to make it easy to create `ScalarAndMetadata`.

## What changes are included in this PR?

Add some `From` impls that make conversions easier.
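A sketch of the kind of `From` impls described here, using hypothetical stand-in types (`Scalar`, `ScalarWithMetadata`) rather than DataFusion's `ScalarValue` and `ScalarAndMetadata`, whose exact impls may differ: a bare value converts with no metadata, and a `(value, metadata)` tuple converts with metadata attached.

```rust
use std::collections::BTreeMap;

/// Simplified stand-in for a scalar value (hypothetical, not `ScalarValue`).
#[derive(Debug, Clone, PartialEq)]
pub enum Scalar {
    Int64(Option<i64>),
    Utf8(Option<String>),
}

/// Hypothetical stand-in pairing a value with optional field metadata.
#[derive(Debug, Clone, PartialEq)]
pub struct ScalarWithMetadata {
    pub value: Scalar,
    pub metadata: Option<BTreeMap<String, String>>,
}

/// A bare scalar converts with no metadata attached.
impl From<Scalar> for ScalarWithMetadata {
    fn from(value: Scalar) -> Self {
        Self { value, metadata: None }
    }
}

/// A (scalar, metadata) pair converts with the metadata attached.
impl From<(Scalar, BTreeMap<String, String>)> for ScalarWithMetadata {
    fn from((value, metadata): (Scalar, BTreeMap<String, String>)) -> Self {
        Self { value, metadata: Some(metadata) }
    }
}

/// Convenience conversion for a common literal type, via the impl above.
impl From<i64> for ScalarWithMetadata {
    fn from(v: i64) -> Self {
        Scalar::Int64(Some(v)).into()
    }
}
```

Callers can then write `let v: ScalarWithMetadata = 5i64.into();` instead of spelling out the struct.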

## Are these changes tested?

By CI

## Are there any user-facing changes?

Some new APIs
…benchmarks (#18209)

## Which issue does this PR close?


- Closes #.

## Rationale for this change
Help users understand when to change certain join configurations.

## What changes are included in this PR?
Add a README to the joins physical plan.

## Are these changes tested?


## Are there any user-facing changes?

…#18238)

## Which issue does this PR close?


- Closes #18234.

## Rationale for this change



https://discord.com/channels/885562378132000778/1430237388474552380/1430618776751313018

## What changes are included in this PR?


```
cargo run -q --bin datafusion-cli
DataFusion CLI v50.3.0
> DESCRIBE SELECT 1;
+-------------+-----------+-------------+
| column_name | data_type | is_nullable |
+-------------+-----------+-------------+
| Int64(1)    | Int64     | NO          |
+-------------+-----------+-------------+
1 row(s) fetched.
Elapsed 0.022 seconds.
```

## Are these changes tested?

No, looking for feedback on approach first... happy to add a test.


## Are there any user-facing changes?

Yes, it changes the behavior of `DESCRIBE SELECT` from explaining the
physical plan (like `EXPLAIN`) to describing the schema of the query (like
`DESCRIBE table`).

…8229)

## Which issue does this PR close?

- Part of #18160

## Rationale for this change

As we spend more effort optimizing the number of IO requests made during
various scenarios, we need to ensure we have test coverage to:
1. Verify that the optimizations are working as intended
2. Prevent regressions in the future as code changes are made

## What changes are included in this PR?

Add a new integration test that verifies what IO operations happen when
creating and querying listing tables
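One way to observe IO in such a test is to wrap the store in a recorder that logs each request before forwarding it. The sketch below assumes a simplified, hypothetical `Store` trait with `get` and `list`; it is not the `object_store` crate's API, and it uses `RefCell` for brevity where a real async store would need thread-safe interior mutability.

```rust
use std::cell::RefCell;
use std::collections::HashMap;

/// Minimal hypothetical storage trait standing in for an object store.
pub trait Store {
    fn get(&self, path: &str) -> Option<Vec<u8>>;
    fn list(&self, prefix: &str) -> Vec<String>;
}

/// In-memory store used as the "real" backend in this sketch.
pub struct MemStore {
    pub objects: HashMap<String, Vec<u8>>,
}

impl Store for MemStore {
    fn get(&self, path: &str) -> Option<Vec<u8>> {
        self.objects.get(path).cloned()
    }
    fn list(&self, prefix: &str) -> Vec<String> {
        let mut keys: Vec<String> = self
            .objects
            .keys()
            .filter(|k| k.starts_with(prefix))
            .cloned()
            .collect();
        keys.sort(); // deterministic order for assertions
        keys
    }
}

/// Wrapper that records every request before forwarding it, so a test can
/// assert on exactly which IO operations a query performed.
pub struct RecordingStore<S: Store> {
    pub inner: S,
    pub log: RefCell<Vec<String>>,
}

impl<S: Store> RecordingStore<S> {
    pub fn new(inner: S) -> Self {
        Self { inner, log: RefCell::new(Vec::new()) }
    }
    /// Snapshot of the requests recorded so far, in order.
    pub fn requests(&self) -> Vec<String> {
        self.log.borrow().clone()
    }
}

impl<S: Store> Store for RecordingStore<S> {
    fn get(&self, path: &str) -> Option<Vec<u8>> {
        self.log.borrow_mut().push(format!("GET {path}"));
        self.inner.get(path)
    }
    fn list(&self, prefix: &str) -> Vec<String> {
        self.log.borrow_mut().push(format!("LIST {prefix}"));
        self.inner.list(prefix)
    }
}
```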

## Are these changes tested?

It is all tests.

## Are there any user-facing changes?

No, only tests.
Removes a downcast match in favor of using the trait method. This mirrors
the changes in #15852, which made `DataSourceExec` use
`partition_statistics` instead of `statistics`.
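A minimal sketch of the difference, using a hypothetical `PlanNode` trait and `Stats` type in place of `ExecutionPlan` and `Statistics`: the downcast version only knows about the concrete types it names, while the trait version works for any implementation.

```rust
use std::any::Any;

/// Hypothetical statistics type.
#[derive(Debug, Clone, PartialEq)]
pub struct Stats {
    pub num_rows: Option<usize>,
}

/// Hypothetical plan-node trait: every node can report per-partition statistics.
pub trait PlanNode {
    fn as_any(&self) -> &dyn Any;
    /// `None` means statistics for the whole node across all partitions.
    fn partition_statistics(&self, partition: Option<usize>) -> Stats;
}

pub struct ScanNode {
    pub rows_per_partition: Vec<usize>,
}

impl PlanNode for ScanNode {
    fn as_any(&self) -> &dyn Any {
        self
    }
    fn partition_statistics(&self, partition: Option<usize>) -> Stats {
        let num_rows = match partition {
            Some(i) => self.rows_per_partition.get(i).copied(),
            None => Some(self.rows_per_partition.iter().sum::<usize>()),
        };
        Stats { num_rows }
    }
}

/// A node type the downcast below does not know about.
pub struct EmptyNode;

impl PlanNode for EmptyNode {
    fn as_any(&self) -> &dyn Any {
        self
    }
    fn partition_statistics(&self, _partition: Option<usize>) -> Stats {
        Stats { num_rows: Some(0) }
    }
}

/// Before: only concrete types matched by downcasting get statistics.
pub fn stats_via_downcast(node: &dyn PlanNode) -> Stats {
    match node.as_any().downcast_ref::<ScanNode>() {
        Some(scan) => Stats { num_rows: Some(scan.rows_per_partition.iter().sum::<usize>()) },
        None => Stats { num_rows: None },
    }
}

/// After: ask the trait, so every implementation is handled uniformly.
pub fn stats_via_trait(node: &dyn PlanNode) -> Stats {
    node.partition_statistics(None)
}
```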
## Which issue does this PR close?


- Closes #16244

## Rationale for this change


Support `output_bytes` in `BaselineMetrics` (a common metrics set for
almost all operators)

```
DataFusion CLI v50.3.0
> explain analyze select * from generate_series(1, 1000000) as t1(v1) order by v1 desc;
+-------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| plan_type         | plan                                                                                                                                                                                                            |
+-------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Plan with Metrics | SortExec: expr=[v1@0 DESC], preserve_partitioning=[false], metrics=[output_rows=1000000, elapsed_compute=96.421534ms, output_bytes=7.6 MB, spill_count=0, spilled_bytes=0.0 B, spilled_rows=0, batches_split=0] |
|                   |   ProjectionExec: expr=[value@0 as v1], metrics=[output_rows=1000000, elapsed_compute=34.125µs, output_bytes=7.7 MB]                                                                                            |
|                   |     LazyMemoryExec: partitions=1, batch_generators=[generate_series: start=1, end=1000000, batch_size=8192], metrics=[output_rows=1000000, elapsed_compute=2.262626ms, output_bytes=7.7 MB]                     |
|                   |                                                                                                                                                                                                                 |
+-------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
1 row(s) fetched.
Elapsed 0.080 seconds.
```

Note that this metric may overestimate memory usage due to a known issue;
see the snippet from this PR for details:
```rs

    /// Memory usage of all output batches.
    ///
    /// Note: This value may be overestimated. If multiple output `RecordBatch`
    /// instances share underlying memory buffers, their sizes will be counted
    /// multiple times.
    /// Issue: <#16841>
    output_bytes: Count,
```

I think this metric provides valuable insight, so it's better for it to
overestimate than not exist at all.

## What changes are included in this PR?

1. Add `output_bytes` to `BaselineMetrics` and set it to the `summary`
analyze level (see the `datafusion.explain.analyze_level` config for
details).
2. This metric is tracked automatically through the `record_poll()`
API, a common interface that most operators use when they generate a
new output batch.
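A simplified sketch of how such a counter can be updated on every poll, and why shared buffers inflate it. `Batch`, `Metrics` and this `record_poll` signature are hypothetical stand-ins (the real API works on `RecordBatch` and `Poll<...>` values); batch size here is just the sum of its column buffer lengths.

```rust
use std::sync::Arc;

/// Hypothetical batch whose columns are reference-counted byte buffers, so
/// two batches (e.g. slices or projections) can share one allocation.
pub struct Batch {
    pub num_rows: usize,
    pub columns: Vec<Arc<Vec<u8>>>,
}

impl Batch {
    /// Sum of column buffer sizes. A shared buffer is counted in full by
    /// every batch that references it.
    pub fn memory_size(&self) -> usize {
        self.columns.iter().map(|c| c.len()).sum()
    }
}

/// Hypothetical stand-in for a baseline metrics set.
#[derive(Default)]
pub struct Metrics {
    pub output_rows: usize,
    pub output_bytes: usize,
}

impl Metrics {
    /// Called each time an operator's stream is polled: update the counters
    /// when a batch is produced, then pass the poll result through unchanged.
    pub fn record_poll(&mut self, poll: Option<Batch>) -> Option<Batch> {
        if let Some(batch) = &poll {
            self.output_rows += batch.num_rows;
            self.output_bytes += batch.memory_size();
        }
        poll
    }
}
```

Two batches that share one 100-byte buffer report 200 bytes, which is the kind of overestimation the doc comment above warns about.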

## Are these changes tested?
Unit tests

## Are there any user-facing changes?

…/floor (#18265)

## Which issue does this PR close?


- Closes #18175 

## Rationale for this change
`CEIL`/`FLOOR` calls in SQL were parsed as taking two arguments instead
of one. The second argument is not currently supported, yet it was
silently ignored.



## What changes are included in this PR?

The second parameter (`field`) is of type `CeilFloorKind` from the
`sqlparser` crate. Neither of the enum's two variants (`DateTimeField`
and `Scale`) is implemented, so both are now rejected with an
appropriate error. The call succeeds only when the `DateTimeField`
variant holds `NoDateTime`, i.e. when the field is effectively empty.
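A minimal sketch of the check described above. The enums are simplified, hypothetical mirrors of the `sqlparser` types (the real `DateTimeField` has many more variants and `Scale` wraps a `Value`), and `check_ceil_floor_field` is an illustrative name, not the planner function.

```rust
/// Simplified mirror of sqlparser's `DateTimeField` (most variants omitted).
#[derive(Debug, Clone, PartialEq)]
pub enum DateTimeField {
    Day,
    NoDateTime,
}

/// Simplified mirror of sqlparser's `CeilFloorKind` (scale reduced to an i64).
#[derive(Debug, Clone, PartialEq)]
pub enum CeilFloorKind {
    DateTimeField(DateTimeField),
    Scale(i64),
}

/// Accept only the "no field" form of the optional second part of a
/// `CEIL`/`FLOOR` call; every other form is reported as unsupported
/// instead of being silently ignored.
pub fn check_ceil_floor_field(func: &str, field: &CeilFloorKind) -> Result<(), String> {
    match field {
        CeilFloorKind::DateTimeField(DateTimeField::NoDateTime) => Ok(()),
        CeilFloorKind::DateTimeField(f) => {
            Err(format!("{func} with datetime field {f:?} is not supported"))
        }
        CeilFloorKind::Scale(s) => Err(format!("{func} with scale {s} is not supported")),
    }
}
```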

## Are these changes tested?

All unit tests pass.

---------

Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>
#18273)

## Which issue does this PR close?

Closes #18058

## Rationale for this change

When adding the `bitmap_count` function to Comet, we got the following
error: `org.apache.comet.CometNativeException: Error from DataFusion:
bitmap_count expects Binary/BinaryView/FixedSizeBinary/LargeBinary as
argument, got Dictionary(Int32, Binary)`.
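One way to handle dictionary-encoded input is to resolve each key to its dictionary value before counting bits. The sketch below uses plain Rust slices as hypothetical stand-ins for Arrow's `BinaryArray` and `DictionaryArray`; it is not the arrow-rs API or the code in this PR.

```rust
/// Count set bits in one binary value (the core of a `bitmap_count`-style function).
pub fn count_bits(bytes: &[u8]) -> u32 {
    bytes.iter().map(|b| b.count_ones()).sum()
}

/// Plain binary input: one count per row; null rows stay null.
pub fn bitmap_count_plain(values: &[Option<Vec<u8>>]) -> Vec<Option<u32>> {
    values.iter().map(|v| v.as_deref().map(count_bits)).collect()
}

/// Dictionary-encoded input: each row holds a key into a shared values list.
/// Resolve the key, then count bits in the referenced value. Null keys give
/// null results; out-of-range (or negative) keys are an error.
pub fn bitmap_count_dictionary(
    keys: &[Option<i32>],
    dict_values: &[Vec<u8>],
) -> Result<Vec<Option<u32>>, String> {
    keys.iter()
        .map(|k| match k {
            None => Ok(None),
            Some(k) => dict_values
                .get(*k as usize)
                .map(|v| Some(count_bits(v)))
                .ok_or_else(|| format!("dictionary key {k} out of bounds")),
        })
        .collect()
}
```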

## Are these changes tested?

Added a new unit test.

---------

Co-authored-by: Kazantsev Maksim <mn.kazantsev@gmail.com>
## Which issue does this PR close?


- Closes #18288

## Rationale for this change

`cargo audit` says that the current version of `half` we have in our
Cargo.lock file was yanked




```
Crate:     half
Version:   2.7.0
Warning:   yanked
Dependency tree:
half 2.7.0
```

And indeed it is:
https://crates.io/crates/half/versions
<img width="1193" height="830" alt="Screenshot 2025-10-26 at 7 20 54 AM"
src="https://github.com/user-attachments/assets/ad6944c6-912c-4c56-9d1d-efe760ae85ee"
/>

So let's update to a non-yanked version.

## What changes are included in this PR?

Run `cargo update -p half` and check in the result.

## Are these changes tested?


## Are there any user-facing changes?

)

Bumps
[taiki-e/install-action](https://github.com/taiki-e/install-action) from
2.62.36 to 2.62.38.
<details>
<summary>Release notes</summary>
<p><em>Sourced from <a
href="https://github.com/taiki-e/install-action/releases">taiki-e/install-action's
releases</a>.</em></p>
<blockquote>
<h2>2.62.38</h2>
<ul>
<li>
<p>Update <code>coreutils@latest</code> to 0.3.0.</p>
</li>
<li>
<p>Update <code>wasmtime@latest</code> to 38.0.3.</p>
</li>
<li>
<p>Update <code>mise@latest</code> to 2025.10.17.</p>
</li>
<li>
<p>Update <code>cargo-tarpaulin@latest</code> to 0.34.1.</p>
</li>
</ul>
<h2>2.62.37</h2>
<ul>
<li>
<p>Update <code>cargo-binstall@latest</code> to 1.15.8.</p>
</li>
<li>
<p>Update <code>zizmor@latest</code> to 1.16.0.</p>
</li>
<li>
<p>Update <code>mise@latest</code> to 2025.10.16.</p>
</li>
</ul>
</blockquote>
</details>
<details>
<summary>Changelog</summary>
<p><em>Sourced from <a
href="https://github.com/taiki-e/install-action/blob/main/CHANGELOG.md">taiki-e/install-action's
changelog</a>.</em></p>
<blockquote>
<h1>Changelog</h1>
<p>All notable changes to this project will be documented in this
file.</p>
<p>This project adheres to <a href="https://semver.org">Semantic
Versioning</a>.</p>
<!-- raw HTML omitted -->
<h2>[Unreleased]</h2>
<ul>
<li>Update <code>mise@latest</code> to 2025.10.18.</li>
</ul>
<h2>[2.62.38] - 2025-10-25</h2>
<ul>
<li>
<p>Update <code>coreutils@latest</code> to 0.3.0.</p>
</li>
<li>
<p>Update <code>wasmtime@latest</code> to 38.0.3.</p>
</li>
<li>
<p>Update <code>mise@latest</code> to 2025.10.17.</p>
</li>
<li>
<p>Update <code>cargo-tarpaulin@latest</code> to 0.34.1.</p>
</li>
</ul>
<h2>[2.62.37] - 2025-10-24</h2>
<ul>
<li>
<p>Update <code>cargo-binstall@latest</code> to 1.15.8.</p>
</li>
<li>
<p>Update <code>zizmor@latest</code> to 1.16.0.</p>
</li>
<li>
<p>Update <code>mise@latest</code> to 2025.10.16.</p>
</li>
</ul>
<h2>[2.62.36] - 2025-10-23</h2>
<ul>
<li>
<p>Update <code>syft@latest</code> to 1.36.0.</p>
</li>
<li>
<p>Update <code>vacuum@latest</code> to 0.19.0.</p>
</li>
<li>
<p>Update <code>mise@latest</code> to 2025.10.15.</p>
</li>
</ul>
<h2>[2.62.35] - 2025-10-22</h2>
<ul>
<li>
<p>Update <code>wasmtime@latest</code> to 38.0.2.</p>
</li>
<li>
<p>Update <code>cargo-nextest@latest</code> to 0.9.108.</p>
</li>
<li>
<p>Update <code>mise@latest</code> to 2025.10.14.</p>
</li>
<li>
<p>Update <code>vacuum@latest</code> to 0.18.9.</p>
</li>
</ul>
<!-- raw HTML omitted -->
</blockquote>
<p>... (truncated)</p>
</details>
<details>
<summary>Commits</summary>
<ul>
<li><a
href="https://github.com/taiki-e/install-action/commit/c5b1b6f479c32f356cc6f4ba672a47f63853b13b"><code>c5b1b6f</code></a>
Release 2.62.38</li>
<li><a
href="https://github.com/taiki-e/install-action/commit/7cd74f6aac6a2a6c13632c29a30ffc0ef8053cf2"><code>7cd74f6</code></a>
Update <code>coreutils@latest</code> to 0.3.0</li>
<li><a
href="https://github.com/taiki-e/install-action/commit/def9901333773abdceeb414c2c2a68cc4276eea9"><code>def9901</code></a>
Update <code>wasmtime@latest</code> to 38.0.3</li>
<li><a
href="https://github.com/taiki-e/install-action/commit/a9d3853729137d6a76fdb344e3fdba064bb51dd5"><code>a9d3853</code></a>
Update coreutils manifest</li>
<li><a
href="https://github.com/taiki-e/install-action/commit/958d48b0c9eb6cf8c0edca899e787eb73a91794c"><code>958d48b</code></a>
Update <code>mise@latest</code> to 2025.10.17</li>
<li><a
href="https://github.com/taiki-e/install-action/commit/fb485991fd79e393a6a4e3715369bdd7a96fc12d"><code>fb48599</code></a>
Update <code>cargo-tarpaulin@latest</code> to 0.34.1</li>
<li><a
href="https://github.com/taiki-e/install-action/commit/1c7b1d35fcc8f6525be0cbdacbf5977079a3f94c"><code>1c7b1d3</code></a>
Release 2.62.37</li>
<li><a
href="https://github.com/taiki-e/install-action/commit/18cba62798fa05dd5849e62a3759a8ef249feefc"><code>18cba62</code></a>
Update <code>cargo-binstall@latest</code> to 1.15.8</li>
<li><a
href="https://github.com/taiki-e/install-action/commit/f3c0c6962aed40004323e265015332d9d9cf90f9"><code>f3c0c69</code></a>
Update <code>zizmor@latest</code> to 1.16.0</li>
<li><a
href="https://github.com/taiki-e/install-action/commit/99fc3e5b1e80c12d05e5bff5af81a035ab4e98b5"><code>99fc3e5</code></a>
Update <code>mise@latest</code> to 2025.10.16</li>
<li>See full diff in <a
href="https://github.com/taiki-e/install-action/compare/ebb229c6baa68383264f2822689b07b4916d9177...c5b1b6f479c32f356cc6f4ba672a47f63853b13b">compare
view</a></li>
</ul>
</details>

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
## Which issue does this PR close?


- Closes #11336 

Since this is my first contribution, I'm mentioning @alamb, the author
of issue #11336.

Could you please trigger the CI? Thanks!

## Rationale for this change


The Arrow introduction guide (#11336) needed improvements to make it
more accessible for newcomers while providing better navigation to
advanced topics.


## What changes are included in this PR?


Issue #11336 requested a gentle introduction to Apache Arrow and
RecordBatches to help DataFusion users understand the foundational
concepts. This PR enhances the existing Arrow introduction guide with
clearer explanations, practical examples, visual aids, and comprehensive
navigation links to make it more accessible for newcomers while
providing pathways to advanced topics.

I was unsure whether this belongs in `docs/source/user-guide/dataframe.md`.

## Are these changes tested?


Applied `prettier`, as described.

## Are there any user-facing changes?

Yes - improved documentation for the Arrow introduction guide at
`docs/source/user-guide/arrow-introduction.md`


---------

Co-authored-by: Martin <your.email@example.com>
Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>
Bumps [regex](https://github.com/rust-lang/regex) from 1.11.3 to 1.12.2.
<details>
<summary>Changelog</summary>
<p><em>Sourced from <a
href="https://github.com/rust-lang/regex/blob/master/CHANGELOG.md">regex's
changelog</a>.</em></p>
<blockquote>
<h1>1.12.2 (2025-10-13)</h1>
<p>This release fixes a <code>cargo doc</code> breakage on nightly when
<code>--cfg docsrs</code> is
enabled. This caused documentation to fail to build on docs.rs.</p>
<p>Bug fixes:</p>
<ul>
<li>[BUG <a
href="https://redirect.github.com/rust-lang/regex/issues/1305">#1305</a>](<a
href="https://redirect.github.com/rust-lang/regex/issues/1305">rust-lang/regex#1305</a>):
Switches the <code>doc_auto_cfg</code> feature to <code>doc_cfg</code>
on nightly for docs.rs builds.</li>
</ul>
<h1>1.12.1 (2025-10-10)</h1>
<p>This release makes a bug fix in the new
<code>regex::Captures::get_match</code> API
introduced in <code>1.12.0</code>. There was an oversight with the
lifetime parameter
for the <code>Match</code> returned. This is technically a breaking
change, but given
that it was caught almost immediately and I've yanked the
<code>1.12.0</code> release,
I think this is fine.</p>
<h1>1.12.0 (2025-10-10)</h1>
<p>This release contains a smattering of bug fixes, a fix for excessive
memory
consumption in some cases and a new
<code>regex::Captures::get_match</code> API.</p>
<p>Improvements:</p>
<ul>
<li>[FEATURE <a
href="https://redirect.github.com/rust-lang/regex/issues/1146">#1146</a>](<a
href="https://redirect.github.com/rust-lang/regex/issues/1146">rust-lang/regex#1146</a>):
Add <code>Capture::get_match</code> for returning the overall match
without <code>unwrap()</code>.</li>
</ul>
<p>Bug fixes:</p>
<ul>
<li>[BUG <a
href="https://redirect.github.com/rust-lang/regex/issues/1083">#1083</a>](<a
href="https://redirect.github.com/rust-lang/regex/issues/1083">rust-lang/regex#1083</a>):
Fixes a panic in the lazy DFA (can only occur for especially large
regexes).</li>
<li>[BUG <a
href="https://redirect.github.com/rust-lang/regex/issues/1116">#1116</a>](<a
href="https://redirect.github.com/rust-lang/regex/issues/1116">rust-lang/regex#1116</a>):
Fixes a memory usage regression for large regexes (introduced in
<code>regex 1.9</code>).</li>
<li>[BUG <a
href="https://redirect.github.com/rust-lang/regex/issues/1195">#1195</a>](<a
href="https://redirect.github.com/rust-lang/regex/issues/1195">rust-lang/regex#1195</a>):
Fix universal start states in sparse DFA.</li>
<li>[BUG <a
href="https://redirect.github.com/rust-lang/regex/issues/1295">#1295</a>](<a
href="https://redirect.github.com/rust-lang/regex/pull/1295">rust-lang/regex#1295</a>):
Fixes a panic when deserializing a corrupted dense DFA.</li>
<li><a
href="https://github.com/rust-lang/regex/commit/8f5d9479d0f1da5726488a530d7fd66a73d05b80">BUG
8f5d9479</a>:
Make <code>regex_automata::meta::Regex::find</code> consistently return
<code>None</code> when
<code>WhichCaptures::None</code> is used.</li>
</ul>
</blockquote>
</details>
<details>
<summary>Commits</summary>
<ul>
<li><a
href="https://github.com/rust-lang/regex/commit/5ea3eb1e95f0338e283f5f0b4681f0891a1cd836"><code>5ea3eb1</code></a>
1.12.2</li>
<li><a
href="https://github.com/rust-lang/regex/commit/ab0b07171b82d1d4fdc8359505d12b2e818514d4"><code>ab0b071</code></a>
regex-automata-0.4.13</li>
<li><a
href="https://github.com/rust-lang/regex/commit/691d51457db276bbdf9ca3de2cafe285c662c59f"><code>691d514</code></a>
regex-syntax-0.8.8</li>
<li><a
href="https://github.com/rust-lang/regex/commit/1dd90777791dbc6bbf389157d05ac8176c6ad051"><code>1dd9077</code></a>
docs: swap <code>doc_auto_cfg</code> with <code>doc_cfg</code></li>
<li><a
href="https://github.com/rust-lang/regex/commit/0089034cb37b0bf3785f2e0211f7eca74033f4d1"><code>0089034</code></a>
regex-cli-0.2.3</li>
<li><a
href="https://github.com/rust-lang/regex/commit/140f8949da3f575490bac80ff23dfc29458b82c7"><code>140f894</code></a>
regex-lite-0.1.8</li>
<li><a
href="https://github.com/rust-lang/regex/commit/27d6d65263cb80266a62e3189408a44f201a0975"><code>27d6d65</code></a>
1.12.1</li>
<li><a
href="https://github.com/rust-lang/regex/commit/85398ad5002048bbeaa90f1fe37fbb31df2bc0d6"><code>85398ad</code></a>
changelog: 1.12.1</li>
<li><a
href="https://github.com/rust-lang/regex/commit/764efbd305d3a7b817ec8892ff0a656ec657d660"><code>764efbd</code></a>
api: tweak the lifetime of <code>Captures::get_match</code></li>
<li><a
href="https://github.com/rust-lang/regex/commit/ee6aa55e01786e4d2c11eb1be805835bbb3bfa99"><code>ee6aa55</code></a>
rure-0.2.4</li>
<li>Additional commits viewable in <a
href="https://github.com/rust-lang/regex/compare/1.11.3...1.12.2">compare
view</a></li>
</ul>
</details>
<br />


[![Dependabot compatibility
score](https://dependabot-badges.githubapp.com/badges/compatibility_score?dependency-name=regex&package-manager=cargo&previous-version=1.11.3&new-version=1.12.2)](https://docs.github.com/en/github/managing-security-vulnerabilities/about-dependabot-security-updates#about-compatibility-scores)

Dependabot will resolve any conflicts with this PR as long as you don't
alter it yourself. You can also trigger a rebase manually by commenting
`@dependabot rebase`.

[//]: # (dependabot-automerge-start)
[//]: # (dependabot-automerge-end)

---

<details>
<summary>Dependabot commands and options</summary>
<br />

You can trigger Dependabot actions by commenting on this PR:
- `@dependabot rebase` will rebase this PR
- `@dependabot recreate` will recreate this PR, overwriting any edits
that have been made to it
- `@dependabot merge` will merge this PR after your CI passes on it
- `@dependabot squash and merge` will squash and merge this PR after
your CI passes on it
- `@dependabot cancel merge` will cancel a previously requested merge
and block automerging
- `@dependabot reopen` will reopen this PR if it is closed
- `@dependabot close` will close this PR and stop Dependabot recreating
it. You can achieve the same result by closing it manually
- `@dependabot show <dependency name> ignore conditions` will show all
of the ignore conditions of the specified dependency
- `@dependabot ignore this major version` will close this PR and stop
Dependabot creating any more for this major version (unless you reopen
the PR or upgrade to it yourself)
- `@dependabot ignore this minor version` will close this PR and stop
Dependabot creating any more for this minor version (unless you reopen
the PR or upgrade to it yourself)
- `@dependabot ignore this dependency` will close this PR and stop
Dependabot creating any more for this dependency (unless you reopen the
PR or upgrade to it yourself)


</details>

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Bumps [clap](https://github.com/clap-rs/clap) from 4.5.48 to 4.5.50.
<details>
<summary>Release notes</summary>
<p><em>Sourced from <a
href="https://github.com/clap-rs/clap/releases">clap's
releases</a>.</em></p>
<blockquote>
<h2>v4.5.50</h2>
<h2>[4.5.50] - 2025-10-20</h2>
<h3>Features</h3>
<ul>
<li>Accept <code>Cow</code> where <code>String</code> and
<code>&amp;str</code> are accepted</li>
</ul>
</blockquote>
</details>
<details>
<summary>Changelog</summary>
<p><em>Sourced from <a
href="https://github.com/clap-rs/clap/blob/master/CHANGELOG.md">clap's
changelog</a>.</em></p>
<blockquote>
<h2>[4.5.50] - 2025-10-20</h2>
<h3>Features</h3>
<ul>
<li>Accept <code>Cow</code> where <code>String</code> and
<code>&amp;str</code> are accepted</li>
</ul>
<h2>[4.5.49] - 2025-10-13</h2>
<h3>Fixes</h3>
<ul>
<li><em>(help)</em> Correctly wrap when ANSI escape codes are
present</li>
</ul>
</blockquote>
</details>
<details>
<summary>Commits</summary>
<ul>
<li><a
href="https://github.com/clap-rs/clap/commit/d8acd4729878ca72d305e6cf7adf7acc0da36738"><code>d8acd47</code></a>
chore: Release</li>
<li><a
href="https://github.com/clap-rs/clap/commit/7c2b8d9ad4d22650f969bd455d80b4181a7e25ff"><code>7c2b8d9</code></a>
docs: Update changelog</li>
<li><a
href="https://github.com/clap-rs/clap/commit/e69a2ea55bc9076d95caf60d79e481581f688724"><code>e69a2ea</code></a>
Merge pull request <a
href="https://redirect.github.com/clap-rs/clap/issues/5987">#5987</a>
from mernen/fix-bash-comp-words-loop</li>
<li><a
href="https://github.com/clap-rs/clap/commit/e03cc2e798183e9528f53d42d8b2699f034fc667"><code>e03cc2e</code></a>
Merge pull request <a
href="https://redirect.github.com/clap-rs/clap/issues/5988">#5988</a>
from cordx56/fix-builder-custom-version-docs</li>
<li><a
href="https://github.com/clap-rs/clap/commit/5ab2579844a47a26b4567f77a7b9d198be006f0a"><code>5ab2579</code></a>
fix: Minor fix for builder docs about version</li>
<li><a
href="https://github.com/clap-rs/clap/commit/2f66432721bd24602455dc3e31765195c6107c34"><code>2f66432</code></a>
fix(complete): Only parse arguments before current</li>
<li><a
href="https://github.com/clap-rs/clap/commit/4d9d2100f75693645ea68180ed4b6b3ecacb9923"><code>4d9d210</code></a>
test(complete): Illustrate current behavior in Bash</li>
<li><a
href="https://github.com/clap-rs/clap/commit/6abe2f8c61e31d8d43fee42c18414926c60893be"><code>6abe2f8</code></a>
chore: Release</li>
<li><a
href="https://github.com/clap-rs/clap/commit/d5c74542ce628b57424caec88efee1a231c436a0"><code>d5c7454</code></a>
docs: Update changelog</li>
<li><a
href="https://github.com/clap-rs/clap/commit/5b2e960267b94d4811c9c3b99c62899a87505413"><code>5b2e960</code></a>
Merge pull request <a
href="https://redirect.github.com/clap-rs/clap/issues/5985">#5985</a>
from mernen/bash-cur</li>
<li>Additional commits viewable in <a
href="https://github.com/clap-rs/clap/compare/clap_complete-v4.5.48...clap_complete-v4.5.50">compare
view</a></li>
</ul>
</details>
<br />


[![Dependabot compatibility
score](https://dependabot-badges.githubapp.com/badges/compatibility_score?dependency-name=clap&package-manager=cargo&previous-version=4.5.48&new-version=4.5.50)](https://docs.github.com/en/github/managing-security-vulnerabilities/about-dependabot-security-updates#about-compatibility-scores)

Dependabot will resolve any conflicts with this PR as long as you don't
alter it yourself. You can also trigger a rebase manually by commenting
`@dependabot rebase`.

[//]: # (dependabot-automerge-start)
[//]: # (dependabot-automerge-end)

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
## Which issue does this PR close?

- Related to apache/arrow-rs#7835
- Closes #3666

Note: while this PR looks massive, a large portion of the diff consists of
display updates due to the improved display of Fields and DataTypes.

## Rationale for this change

Upgrade to the latest arrow

Also, there are several new features in arrow-57 that I want to be able
to test, including Variant, arrow-avro, and a new parquet metadata
reader.

## What changes are included in this PR?

1. Update arrow/parquet
2. Update prost
3. Update substrait
4. Update pbjson
5. Make API changes to avoid deprecated APIs

## Are these changes tested?

By CI

## Are there any user-facing changes?
A new arrow version.
Bumps [syn](https://github.com/dtolnay/syn) from 2.0.106 to 2.0.108.
<details>
<summary>Release notes</summary>
<p><em>Sourced from <a
href="https://github.com/dtolnay/syn/releases">syn's
releases</a>.</em></p>
<blockquote>
<h2>2.0.108</h2>
<ul>
<li>Parse unrecognized or invalid literals as Lit::Verbatim (<a
href="https://redirect.github.com/dtolnay/syn/issues/1925">#1925</a>)</li>
</ul>
<h2>2.0.107</h2>
<ul>
<li>Improve panic message when constructing a LitInt, LitFloat, or Lit
from invalid syntax (<a
href="https://redirect.github.com/dtolnay/syn/issues/1917">#1917</a>)</li>
<li>Improve panic message on Punctuated index out of bounds (<a
href="https://redirect.github.com/dtolnay/syn/issues/1922">#1922</a>)</li>
</ul>
</blockquote>
</details>
<details>
<summary>Commits</summary>
<ul>
<li><a
href="https://github.com/dtolnay/syn/commit/7a7e331255822d49bea01e29c326ee7a5cd5415c"><code>7a7e331</code></a>
Release 2.0.108</li>
<li><a
href="https://github.com/dtolnay/syn/commit/30463afa201abc30e086bd1fb1deb714eb8910f4"><code>30463af</code></a>
Merge pull request <a
href="https://redirect.github.com/dtolnay/syn/issues/1926">#1926</a>
from dtolnay/litfuzz</li>
<li><a
href="https://github.com/dtolnay/syn/commit/1cc9167f60d209865e91bf73a949d25914e6bf18"><code>1cc9167</code></a>
Add fuzzer for literal parsing</li>
<li><a
href="https://github.com/dtolnay/syn/commit/c49e1d3a65ab423beee54ed730ea3f849ec49e0b"><code>c49e1d3</code></a>
Merge pull request <a
href="https://redirect.github.com/dtolnay/syn/issues/1925">#1925</a>
from dtolnay/litparse</li>
<li><a
href="https://github.com/dtolnay/syn/commit/d047536103b7edfb0408dab8ec65cde19e73a88f"><code>d047536</code></a>
Report unexpected verbatim literals in test</li>
<li><a
href="https://github.com/dtolnay/syn/commit/ce9776747974555e30cd890b9e1d3030e02efc13"><code>ce97767</code></a>
Parse unrecognized or invalid literals as Lit::Verbatim</li>
<li><a
href="https://github.com/dtolnay/syn/commit/e4a8957feb1b86e6da4309c9886ca15ddfd7b7ad"><code>e4a8957</code></a>
Release 2.0.107</li>
<li><a
href="https://github.com/dtolnay/syn/commit/1792e83acfcc4810ccca70c22952986a6ea09d7e"><code>1792e83</code></a>
Merge pull request <a
href="https://redirect.github.com/dtolnay/syn/issues/1922">#1922</a>
from dtolnay/outofbounds</li>
<li><a
href="https://github.com/dtolnay/syn/commit/532e4af53355f8c4585251e1507336bed8d39f14"><code>532e4af</code></a>
Improve panic message on Punctuated index out of bounds</li>
<li><a
href="https://github.com/dtolnay/syn/commit/909c2221dd582e18f748988384e8ec4edd7544cf"><code>909c222</code></a>
Add test of Punctuated indexing</li>
<li>Additional commits viewable in <a
href="https://github.com/dtolnay/syn/compare/2.0.106...2.0.108">compare
view</a></li>
</ul>
</details>
<br />


[![Dependabot compatibility
score](https://dependabot-badges.githubapp.com/badges/compatibility_score?dependency-name=syn&package-manager=cargo&previous-version=2.0.106&new-version=2.0.108)](https://docs.github.com/en/github/managing-security-vulnerabilities/about-dependabot-security-updates#about-compatibility-scores)

Dependabot will resolve any conflicts with this PR as long as you don't
alter it yourself. You can also trigger a rebase manually by commenting
`@dependabot rebase`.

[//]: # (dependabot-automerge-start)
[//]: # (dependabot-automerge-end)

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
## Which issue does this PR close?

- Related to #14993

## Rationale for this change

To enable expression pushdown to file sources, we need to plumb
expressions through the `FileScanConfig` layer. Currently,
`FileScanConfig` only tracks column indices for projection, which limits
us to simple and naive column selection.

This PR begins the expression pushdown implementation by having
`FileScanConfig` own a list of `ProjectionExpr`s, instead of column
indices. This allows file sources to eventually receive and optimize
based on the actual expressions being projected.
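To illustrate why owning expressions generalizes index-based projection, here is a minimal sketch. `Expr`, `ProjectionExpr`, `from_indices`, and `as_indices` are hypothetical toy stand-ins, not DataFusion's actual types: a list of column indices always maps onto a list of column expressions, but the reverse only works while every projection is a plain column.

```rust
/// Toy expression type standing in for a physical expression (hypothetical).
#[derive(Debug, Clone, PartialEq)]
enum Expr {
    /// Reference to a file-schema column by name and index.
    Column { name: String, index: usize },
    /// Any non-trivial expression (e.g. `a + 1`), kept opaque here.
    Other(String),
}

/// A projected expression plus its output name (hypothetical shape).
#[derive(Debug, Clone, PartialEq)]
struct ProjectionExpr {
    expr: Expr,
    alias: String,
}

/// Index-based projection expressed as plain column expressions.
fn from_indices(schema: &[&str], indices: &[usize]) -> Vec<ProjectionExpr> {
    indices
        .iter()
        .map(|&i| ProjectionExpr {
            expr: Expr::Column { name: schema[i].to_string(), index: i },
            alias: schema[i].to_string(),
        })
        .collect()
}

/// Recovers column indices only if every projection is a plain column;
/// returns `None` as soon as a non-column expression is present.
fn as_indices(projection: &[ProjectionExpr]) -> Option<Vec<usize>> {
    projection
        .iter()
        .map(|p| match &p.expr {
            Expr::Column { index, .. } => Some(*index),
            Expr::Other(_) => None,
        })
        .collect()
}
```

In this sketch, a source that only understands column indices could call `as_indices` and, when it returns `None`, evaluate the remaining expressions after the scan.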


## Notes about this PR
- The first commit is based on
#18231
- To avoid a very large diff and a harder review, I've decided to split
#14993 into two tasks:
  - Have the `DataSource` (`FileScanConfig`) actually hold projection
    expressions (this PR)
  - Flow the projection expressions from `DataSourceExec` all the way to
    the `FileSource`

---------

Co-authored-by: Adrian Garcia Badaracco <1755071+adriangb@users.noreply.github.com>
## Which issue does this PR close?

- Doing some prework for #15881

## Rationale for this change

`Range` and `GenSeries` are essentially the same except for whether they
include the upper bound; this PR unifies their function code to reduce
duplication and make future changes easier.
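A minimal sketch of the idea, assuming integer inputs: one hypothetical `gen_range` function with an `include_upper` flag covers both the `range`-style (exclusive upper bound) and `generate_series`-style (inclusive upper bound) behavior. This is an illustration, not the code in this PR.

```rust
/// Generates `start, start + step, ...` up to `stop`.
/// `include_upper` is the only difference between the two variants.
fn gen_range(start: i64, stop: i64, step: i64, include_upper: bool) -> Result<Vec<i64>, String> {
    if step == 0 {
        return Err("step must not be zero".to_string());
    }
    let mut out = Vec::new();
    let mut current = start;
    loop {
        // Direction of travel and bound inclusivity decide the stop test.
        let in_bounds = match (step > 0, include_upper) {
            (true, true) => current <= stop,
            (true, false) => current < stop,
            (false, true) => current >= stop,
            (false, false) => current > stop,
        };
        if !in_bounds {
            break;
        }
        out.push(current);
        // Stop instead of wrapping around on i64 overflow.
        match current.checked_add(step) {
            Some(next) => current = next,
            None => break,
        }
    }
    Ok(out)
}
```

For example, `gen_range(1, 5, 1, false)` yields `[1, 2, 3, 4]` and `gen_range(1, 5, 1, true)` yields `[1, 2, 3, 4, 5]`.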

## What changes are included in this PR?

Remove the `GenSeries` struct, folding it into `Range`, and do some minor
refactoring of their shared code.

## Are these changes tested?

Existing tests (updated some error messages).

## Are there any user-facing changes?

Not really (updated some error messages).

…18303)

## Which issue does this PR close?

- Closes #18302

## Rationale for this change

As described in the issue, this is a low-effort QoL fix for now.

## What changes are included in this PR?

Uses the existing function for naming fields to replace the hardcoded
`"is_set"` with a field-dependent name. Example output:

```
Field {
    name: "first_value(records_partitioned.trace_id)[first_value]",
    data_type: Utf8View,
    nullable: true,
    dict_id: 0,
    dict_is_ordered: false,
    metadata: {},
},
Field {
    name: "first_value(records_partitioned.trace_id)[first_value_is_set]",
    data_type: Boolean,
    nullable: true,
    dict_id: 0,
    dict_is_ordered: false,
    metadata: {},
},
Field {
    name: "first_value(records_partitioned.value)[first_value]",
    data_type: Int32,
    nullable: true,
    dict_id: 0,
    dict_is_ordered: false,
    metadata: {},
},
Field {
    name: "first_value(records_partitioned.value)[first_value_is_set]",
    data_type: Boolean,
    nullable: true,
    dict_id: 0,
    dict_is_ordered: false,
    metadata: {},
},
```
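The naming scheme visible in this output can be sketched as follows. `state_field_name` and `is_set_field_name` are hypothetical helpers that assume the existing naming function composes names as `<aggregate>[<state>]`, as the output suggests; they are not DataFusion's API.

```rust
/// Builds an aggregate state field name in the `<aggregate>[<state>]` shape.
fn state_field_name(aggregate: &str, state: &str) -> String {
    format!("{aggregate}[{state}]")
}

/// Derives the "is set" flag name from the value state's name instead of
/// using a hardcoded `"is_set"`, so each flag name is field-dependent.
fn is_set_field_name(aggregate: &str, value_state: &str) -> String {
    state_field_name(aggregate, &format!("{value_state}_is_set"))
}
```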

## Are these changes tested?

No tests have been added; hopefully this is covered by existing
tests.

## Are there any user-facing changes?

There should not be any; I assume `is_set` is never user-visible.
## Which issue does this PR close?

N/A

## Rationale for this change

Add the missing f16 support for `abs`.

## What changes are included in this PR?

Support `abs` on f16; also do some cleanup.

## Are these changes tested?

Added SLT.

## Are there any user-facing changes?

No.

pepijnve and others added 12 commits October 30, 2025 01:55
## Which issue does this PR close?

- Improvement in the context of
#18075
- Continues on #17898

## Rationale for this change

Case evaluation currently uses `PhysicalExpr::evaluate_selection` for
each branch of the case expression. This implementation is fine, but
because `evaluate_selection` is not specific to the `case` logic, we're
missing some optimisation opportunities. The main consequence is that
too much work is being done filtering record batches and scattering
results. This PR introduces specialised filtering logic and result
interleaving for case.

A more detailed description and diagrams are available at
#18075 (comment)

## What changes are included in this PR?

Rewrite the `case_when_no_expr` and `case_when_with_expr` evaluation
loops to avoid as much unnecessary work as possible. In particular, the
rows that remain to be evaluated are retained across loop iterations.
This allows the record batch that needs to be filtered to shrink as the
loop progresses, which reduces the number of rows that need to be
refiltered. If a `WHEN` predicate does not match any rows at all,
filtering is avoided entirely.

The final result is also no longer merged on every loop iteration.
Instead, an index vector is constructed and used to compose the final
result once, via a custom 'multi zip'/'interleave'-like operation.
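The strategy above can be sketched row-at-a-time in plain Rust. This is a simplified analogue under stated assumptions (`i64` rows, `None` standing in for NULL, and a hypothetical `eval_case` function), not the Arrow-based implementation in this PR: the set of undecided rows shrinks after each `WHEN`, a branch with no matches skips evaluation, and the output is assembled once from recorded (branch, position) picks.

```rust
/// One WHEN/THEN pair of a toy `CASE WHEN ... THEN ... END` over i64 rows.
type Branch = (fn(i64) -> bool, fn(i64) -> i64);

/// Evaluates `CASE WHEN p1 THEN t1 WHEN p2 THEN t2 ... ELSE e END`.
fn eval_case(rows: &[i64], branches: &[Branch], else_value: Option<i64>) -> Vec<Option<i64>> {
    // Rows no WHEN predicate has matched yet; shrinks every iteration.
    let mut remaining: Vec<usize> = (0..rows.len()).collect();
    // THEN values per branch, computed only for the rows that branch matched.
    let mut branch_values: Vec<Vec<i64>> = Vec::with_capacity(branches.len());
    // Per input row: (branch index, position within that branch's values).
    let mut picks: Vec<Option<(usize, usize)>> = vec![None; rows.len()];

    for (b, (when, then)) in branches.iter().enumerate() {
        if remaining.is_empty() {
            break; // every row is already decided
        }
        // Only still-undecided rows are tested and filtered.
        let (matched, unmatched): (Vec<usize>, Vec<usize>) =
            remaining.iter().copied().partition(|&r| when(rows[r]));
        if matched.is_empty() {
            branch_values.push(Vec::new()); // no match: nothing to evaluate
            continue;
        }
        let values: Vec<i64> = matched.iter().map(|&r| then(rows[r])).collect();
        for (pos, &r) in matched.iter().enumerate() {
            picks[r] = Some((b, pos));
        }
        branch_values.push(values);
        remaining = unmatched;
    }

    // Compose the final result once, interleaving values by recorded picks.
    picks
        .iter()
        .map(|pick| match pick {
            Some((b, pos)) => Some(branch_values[*b][*pos]),
            None => else_value,
        })
        .collect()
}
```

In Arrow terms, the final step plays the role of an interleave kernel driven by an index vector, rather than merging partial results after every branch.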

## Are these changes tested?

Covered by existing unit tests and SLTs

## Are there any user-facing changes?

No
## Which issue does this PR close?
Part of #16915

## Rationale for this change
Format code examples in documentation comments to improve readability
and maintain consistent code style across the codebase. This is part of
a multi-PR effort to format all doc comment examples and eventually
enable CI checks to enforce this formatting.

## What changes are included in this PR?
Run `cargo fmt -p datafusion-execution -- --config
format_code_in_doc_comments=true`

## Are these changes tested?
No testing needed - this is purely a formatting change with no
functional modifications.

## Are there any user-facing changes?
No - this only affects documentation formatting.
## Which issue does this PR close?
Part of #16915

## Rationale for this change
Format code examples in documentation comments to improve readability
and maintain consistent code style across the codebase. This is part of
a multi-PR effort to format all doc comment examples and eventually
enable CI checks to enforce this formatting.

## What changes are included in this PR?
Run `cargo fmt -p datafusion-common -- --config
format_code_in_doc_comments=true`

## Are these changes tested?
No testing needed - this is purely a formatting change with no
functional modifications.

## Are there any user-facing changes?
No - this only affects documentation formatting.
## Which issue does this PR close?

- Closes #18323 .

## Rationale for this change

Adds more detailed metrics, so it is easier to identify which parts of
the aggregate streams are actually slow.

## What changes are included in this PR?

Added a metrics struct, and used it in the functions common to the
aggregate streams.
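As a rough sketch of the kind of timing such metrics capture, the hypothetical `AggregateMetrics` struct and `timed` helper below accumulate wall-clock time per phase using only `std::time`. The field names are illustrative assumptions, not the metric names added by this PR.

```rust
use std::time::{Duration, Instant};

/// Hypothetical per-stream timing metrics (field names are illustrative).
#[derive(Debug, Default)]
struct AggregateMetrics {
    time_calculating_group_ids: Duration,
    time_updating_accumulators: Duration,
    time_emitting: Duration,
}

/// Runs `f` and adds its elapsed wall-clock time to `slot`.
fn timed<T>(slot: &mut Duration, f: impl FnOnce() -> T) -> T {
    let start = Instant::now();
    let out = f();
    *slot += start.elapsed();
    out
}
```

Timing whole batch-level calls rather than individual rows keeps the overhead of `Instant::now()` small, which is the concern the groupby benchmark run in this PR checks.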

## Are these changes tested?

Yes, added some tests to verify that the metrics are actually updated and
can be retrieved.

I've also run the groupby benchmarks to ensure we don't create timers in
a way that could impact performance, and it seems OK: all the changes
are within what I'd expect as normal run-to-run variation on a local
machine.
```
Comparing main and agg-metrics
--------------------
Benchmark h2o.json
--------------------
┏━━━━━━━━━━━━━━┳━━━━━━━━━━━━┳━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┓
┃ Query        ┃       main ┃ agg-metrics ┃        Change ┃
┡━━━━━━━━━━━━━━╇━━━━━━━━━━━━╇━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━┩
│ QQuery 1     │ 1252.42 ms │  1196.62 ms │     no change │
│ QQuery 2     │ 3976.62 ms │  3392.89 ms │ +1.17x faster │
│ QQuery 3     │ 3448.29 ms │  2918.47 ms │ +1.18x faster │
│ QQuery 4     │ 1909.15 ms │  1632.98 ms │ +1.17x faster │
│ QQuery 5     │ 3056.36 ms │  2831.82 ms │ +1.08x faster │
│ QQuery 6     │ 2663.13 ms │  2594.64 ms │     no change │
│ QQuery 7     │ 2802.28 ms │  2592.43 ms │ +1.08x faster │
│ QQuery 8     │ 4489.29 ms │  4199.00 ms │ +1.07x faster │
│ QQuery 9     │ 7001.75 ms │  6622.98 ms │ +1.06x faster │
│ QQuery 10    │ 4725.80 ms │  4619.37 ms │     no change │
└──────────────┴────────────┴─────────────┴───────────────┘
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━┓
┃ Benchmark Summary          ┃            ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━┩
│ Total Time (main)          │ 35325.09ms │
│ Total Time (agg-metrics)   │ 32601.19ms │
│ Average Time (main)        │  3532.51ms │
│ Average Time (agg-metrics) │  3260.12ms │
│ Queries Faster             │          7 │
│ Queries Slower             │          0 │
│ Queries with No Change     │          3 │
│ Queries with Failure       │          0 │
└────────────────────────────┴────────────┘

```
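The Change column can be reproduced from the two timing columns. The sketch below assumes a 5% noise band, which is consistent with every row in this table (Query 1 at about 1.047x is reported as no change, while Query 9 at about 1.057x is reported as faster). The threshold, the `describe_change` name, and the "slower" wording are assumptions, not values taken from the comparison script.

```rust
/// Classifies a timing change between a baseline and a candidate run.
/// Assumption: ratios within 5% in either direction count as noise.
fn describe_change(baseline_ms: f64, candidate_ms: f64) -> String {
    const NOISE: f64 = 0.05;
    let speedup = baseline_ms / candidate_ms;
    let slowdown = candidate_ms / baseline_ms;
    if speedup > 1.0 + NOISE {
        format!("+{:.2}x faster", speedup)
    } else if slowdown > 1.0 + NOISE {
        format!("{:.2}x slower", slowdown)
    } else {
        "no change".to_string()
    }
}
```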

## Are there any user-facing changes?

Nothing directly user-facing: additional metrics are now available, but
there are no breaking changes.

---------

Co-authored-by: Raz Luvaton <16746759+rluvaton@users.noreply.github.com>
Co-authored-by: Eshed Schacham <ashdnazg@gmail.com>
## Which issue does this PR close?
Part of #16915

## Rationale for this change
Format code examples in documentation comments to improve readability
and maintain consistent code style across the codebase. This is part of
a multi-PR effort to format all doc comment examples and eventually
enable CI checks to enforce this formatting.

## What changes are included in this PR?
Run `cargo fmt -p datafusion -- --config
format_code_in_doc_comments=true`

## Are these changes tested?
No testing needed - this is purely a formatting change with no
functional modifications.

## Are there any user-facing changes?
No - this only affects documentation formatting.

---------

Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>
## Which issue does this PR close?
Part of #16915

## Rationale for this change
Format code examples in documentation comments to improve readability
and maintain consistent code style across the codebase. This is part of
a multi-PR effort to format all doc comment examples and eventually
enable CI checks to enforce this formatting.

## What changes are included in this PR?
Run `cargo fmt -p <crate> -- --config format_code_in_doc_comments=true`
for the following expression-related crates:
  - `datafusion-expr`
  - `datafusion-expr-common`

## Are these changes tested?
No testing needed - this is purely a formatting change with no
functional modifications.

## Are there any user-facing changes?
No - this only affects documentation formatting.
## Which issue does this PR close?
Part of #16915

## Rationale for this change
Format code examples in documentation comments to improve readability
and maintain consistent code style across the codebase. This is part of
a multi-PR effort to format all doc comment examples and eventually
enable CI checks to enforce this formatting.

## What changes are included in this PR?
Run `cargo fmt -p <crate> -- --config format_code_in_doc_comments=true`
for the following datasource-related crates:
  - `datafusion-datasource`
  - `datafusion-datasource-arrow`
  - `datafusion-datasource-avro`
  - `datafusion-datasource-csv`
  - `datafusion-datasource-json`
  - `datafusion-datasource-parquet`

## Are these changes tested?
No testing needed - this is purely a formatting change with no
functional modifications.

## Are there any user-facing changes?
No - this only affects documentation formatting.

---------

Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>
- Part of #15791

All easy cases from #18185
(those that display reasonably well in a git diff).

Note on preserving comments: if a comment described what should happen
(or what is being tested), it is placed above the snapshot. If it
commented on part of the plan, I put it below the plan.
## Which issue does this PR close?

- Closes #18171

## Rationale for this change

Described in the issue.

## Are these changes tested?

I have tested this locally with a TPC-DS-like dataset, but I would
appreciate suggestions for a good way to add automated tests for it
😅

---------

Co-authored-by: Adrian Garcia Badaracco <1755071+adriangb@users.noreply.github.com>
Co-authored-by: Claude <noreply@anthropic.com>
## Which issue does this PR close?
Part of #16915

## Rationale for this change
Format code examples in documentation comments to improve readability
and maintain consistent code style across the codebase. This is part of
a multi-PR effort to format all doc comment examples and eventually
enable CI checks to enforce this formatting.

## What changes are included in this PR?
Run `cargo fmt -p <crate> -- --config format_code_in_doc_comments=true`
for the following crates:
  - `datafusion-macros`
  - `datafusion-optimizer`

## Are these changes tested?
No testing needed - this is purely a formatting change with no
functional modifications.

## Are there any user-facing changes?
No - this only affects documentation formatting.
…18358)

## Which issue does this PR close?
Part of #16915

## Rationale for this change
Format code examples in documentation comments to improve readability
and maintain consistent code style across the codebase. This is part of
a multi-PR effort to format all doc comment examples and eventually
enable CI checks to enforce this formatting.

## What changes are included in this PR?
Run `cargo fmt -p <crate> -- --config format_code_in_doc_comments=true`
for the following crates:
  - `datafusion-proto`
  - `datafusion-proto-common`
  - `datafusion-pruning`
  - `datafusion-session`

## Are these changes tested?
No testing needed - this is purely a formatting change with no
functional modifications.

## Are there any user-facing changes?
No - this only affects documentation formatting.
@github-actions github-actions bot added sql SQL Planner development-process Related to development process of DataFusion logical-expr Logical plan and expressions physical-expr Changes to the physical-expr crates optimizer Optimizer rules core Core DataFusion crate sqllogictest SQL Logic Tests (.slt) substrait Changes to the substrait crate catalog Related to the catalog crate common Related to common crate execution Related to the execution crate proto Related to proto crate functions Changes to functions implementation datasource Changes to the datasource crate physical-plan Changes to the physical-plan crate spark labels Oct 29, 2025
@manasa-manoj-nbr manasa-manoj-nbr deleted the benchmark branch October 29, 2025 20:33

Labels

catalog Related to the catalog crate common Related to common crate core Core DataFusion crate datasource Changes to the datasource crate development-process Related to development process of DataFusion documentation Improvements or additions to documentation execution Related to the execution crate functions Changes to functions implementation logical-expr Logical plan and expressions optimizer Optimizer rules physical-expr Changes to the physical-expr crates physical-plan Changes to the physical-plan crate proto Related to proto crate spark sql SQL Planner sqllogictest SQL Logic Tests (.slt) substrait Changes to the substrait crate

Development

Successfully merging this pull request may close these issues.

Add a page to describe the bench code we have.