feat(benchmark) : added a dedicated page for benchmarking in contribu… #18365
Closed
## Which issue does this PR close? - Closes #17862 ## Rationale for this change Most logical plan expressions now propagate metadata; however, parameters with extension types or other field metadata cannot participate in placeholder/parameter binding. ## What changes are included in this PR? The DataType in the Placeholder struct was replaced with a FieldRef along with anything that stored the "DataType" of a parameter. Strictly speaking one could bind parameters with an extension type by copy/pasting the placeholder replacer, which I figured out towards the end of this change. I still think this change makes sense and opens up the door for things like handling UUID in SQL with full parameter binding support. ## Are these changes tested? Yes ## Are there any user-facing changes? Yes, one new function was added to extract the placeholder fields from a plan. This is a breaking change for code that specifically interacts with the pub fields of the modified structs (ParamValues, Placeholder, and Prepare are the main ones). --------- Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>
This logic would previously swallow all errors, including things like DNS resolution failures. If the path was indeed a path and not a prefix, that would result in the path being dropped, since the prefix wouldn't include any further files. Fixes #18242.
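The failure mode described above can be illustrated with a minimal sketch. It assumes the fix amounts to treating only a "not found" result as "this is a prefix" and propagating every other error; the source does not show the actual change, and `Target` and `classify_probe` are hypothetical names, not DataFusion or `object_store` APIs.

```rust
use std::io::{Error, ErrorKind};

/// Hypothetical classification of a listing target.
#[derive(Debug, PartialEq)]
enum Target {
    /// The path names a single object.
    Object,
    /// The path should be listed as a prefix.
    Prefix,
}

/// Hypothetical helper: interpret the result of probing a path as a single object.
/// Only `NotFound` falls back to prefix listing. Any other error (for example a
/// DNS or network failure) is returned to the caller instead of being swallowed.
fn classify_probe(probe: Result<(), Error>) -> Result<Target, Error> {
    match probe {
        Ok(()) => Ok(Target::Object),
        Err(e) if e.kind() == ErrorKind::NotFound => Ok(Target::Prefix),
        Err(e) => Err(e),
    }
}
```

With this shape, a transient failure surfaces as an error rather than silently turning a real path into an empty prefix listing.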
Updated links in the README to point to the correct URLs.
…rquet source (#18196)

## Which issue does this PR close?

Part of #18116

## Rationale for this change

The configuration below lets `EXPLAIN ANALYZE` show only important high-level insights.

```
set datafusion.explain.analyze_level = summary;
```

This PR sets `summary` level metrics for the parquet data source:

### `summary` level metrics for `DataSourceExec` with `Parquet` source

- File level pruning metrics
- Row-group level pruning metrics
- Bytes scanned
- metadata load time

In https://github.com/apache/datafusion/blob/155b56e521d75186776a65f1634ee03058899a79/datafusion/datasource-parquet/src/metrics.rs#L29

The remaining metrics are kept in the `dev` level.

I'm not sure whether the page level pruning metrics should also be included in the `summary` level. I'm open to suggestions on this, or on any other metrics that should be included.

While implementing this, I came up with a few ideas to further improve metrics tracking in the Parquet scanner. I’ve documented them in #18195

## What changes are included in this PR?

Set the above metrics to the `summary` analyze level

## Are these changes tested?

UTs

## Are there any user-facing changes?

No
This changes the `decode` built-in function so that SQL like the
following works:
```sql
decode('cXdlcXc=', 'base64');
```
Padding isn't required to decode correctly, but it's surprising to users
if we just reject it outright.
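As a rough illustration of why padding is optional, here is a minimal, std-only sketch of a base64 decoder that accepts input with or without trailing `=`. The function name `decode_base64_lenient` is hypothetical and this is not the code path DataFusion uses; it also skips the strict length and trailing-bit validation a production decoder would perform.

```rust
/// Hypothetical helper: decode standard-alphabet base64, accepting input
/// with or without trailing `=` padding.
fn decode_base64_lenient(input: &str) -> Result<Vec<u8>, String> {
    // Drop optional padding so padded and unpadded input take the same path.
    let trimmed = input.trim_end_matches('=');
    let mut out = Vec::with_capacity(trimmed.len() * 3 / 4);
    let mut acc: u32 = 0; // bit accumulator
    let mut bits: u32 = 0; // number of valid bits currently held in `acc`
    for c in trimmed.bytes() {
        // Map each character to its 6-bit value.
        let v = match c {
            b'A'..=b'Z' => c - b'A',
            b'a'..=b'z' => c - b'a' + 26,
            b'0'..=b'9' => c - b'0' + 52,
            b'+' => 62,
            b'/' => 63,
            _ => return Err(format!("invalid base64 byte: {c:#x}")),
        };
        acc = (acc << 6) | u32::from(v);
        bits += 6;
        // Emit a byte as soon as 8 bits are available, keep the remainder.
        if bits >= 8 {
            bits -= 8;
            out.push((acc >> bits) as u8);
            acc &= (1u32 << bits) - 1;
        }
    }
    Ok(out)
}
```

Both `decode_base64_lenient("cXdlcXc=")` and `decode_base64_lenient("cXdlcXc")` yield the bytes of `qweqw`: the padding carries no data, so a decoder can accept either form.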
## Which issue does this PR close?

- Follow on to #17986 from @paleolimbot

## Rationale for this change

As we thread Field through more of the DataFusion APIs, making it easy to convert back and forth with Field will be increasingly important.

We added some helper methods in #17986, but I think they could be better documented (I wrote them so this is not a dig at @paleolimbot !)

Let's add some more documentation and examples so it is clearer what this code is doing.

## What changes are included in this PR?

1. Add more documentation and examples so it is clearer what this code is doing.

## Are these changes tested?

By CI

## Are there any user-facing changes?

More docs, no functional changes
## Which issue does this PR close?

Part of #18095

## Rationale for this change

## What changes are included in this PR?

In the contributor guide documentation, better explain why large AI-generated PRs without understanding should be rejected. See more in the original issue discussion.

## Are these changes tested?

## Are there any user-facing changes?

---------

Co-authored-by: Adrian Garcia Badaracco <1755071+adriangb@users.noreply.github.com>
Co-authored-by: Alex Huang <huangweijun1001@gmail.com>
Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>
## Which issue does this PR close?

- Follow on to #17986 from @paleolimbot

## Rationale for this change

As we thread Field through more of the DataFusion APIs, making it easy to convert back and forth will be increasingly important.

We added `ScalarAndMetadata`, and I think it is a good idea to add some helper methods to make it easy to create `ScalarAndMetadata`.

## What changes are included in this PR?

Add some `From` impls that make conversions easier

## Are these changes tested?

By CI

## Are there any user-facing changes?

Some new APIs
…benchmarks (#18209)

## Which issue does this PR close?

- Closes #.

## Rationale for this change

Help users understand when to change certain join configurations.

## What changes are included in this PR?

Add a readme to the joins physical plan

## Are these changes tested?

## Are there any user-facing changes?
…#18238)

## Which issue does this PR close?

- Closes #18234.

## Rationale for this change

https://discord.com/channels/885562378132000778/1430237388474552380/1430618776751313018

## What changes are included in this PR?

```
cargo run -q --bin datafusion-cli
DataFusion CLI v50.3.0
> DESCRIBE SELECT 1;
+-------------+-----------+-------------+
| column_name | data_type | is_nullable |
+-------------+-----------+-------------+
| Int64(1)    | Int64     | NO          |
+-------------+-----------+-------------+
1 row(s) fetched.
Elapsed 0.022 seconds.
```

## Are these changes tested?

No, looking for feedback on approach first... happy to add a test.

## Are there any user-facing changes?

Yes, it changes the behavior of `DESCRIBE SELECT` from explaining the physical plan (EXPLAIN) to describing the schema of the query (like DESCRIBE table).
…8229) ## Which issue does this PR close? - Part of #18160 ## Rationale for this change As we spend more effort optimizing the number of IO requests made during various scenarios, we need to ensure we have test coverage to: 1. Verify that the optimizations are working as intended 2. Prevent regressions in the future as code changes are made ## What changes are included in this PR? Add a new integration test that verifies what IO operations happen when creating and querying listing tables ## Are these changes tested? It is all tests ## Are there any user-facing changes? No, only tests
Removes a downcast match in favor of use of the trait. This mirrors the changes to DataSourceExec to use partition_statistics instead of statistics from #15852
## Which issue does this PR close?

- Closes #16244

## Rationale for this change

Support `output_bytes` in `BaselineMetrics` (a common metrics set for almost all operators)

```
DataFusion CLI v50.3.0
> explain analyze select * from generate_series(1, 1000000) as t1(v1) order by v1 desc;
+-------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| plan_type         | plan |
+-------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Plan with Metrics | SortExec: expr=[v1@0 DESC], preserve_partitioning=[false], metrics=[output_rows=1000000, elapsed_compute=96.421534ms, output_bytes=7.6 MB, spill_count=0, spilled_bytes=0.0 B, spilled_rows=0, batches_split=0] |
|                   |   ProjectionExec: expr=[value@0 as v1], metrics=[output_rows=1000000, elapsed_compute=34.125µs, output_bytes=7.7 MB] |
|                   |     LazyMemoryExec: partitions=1, batch_generators=[generate_series: start=1, end=1000000, batch_size=8192], metrics=[output_rows=1000000, elapsed_compute=2.262626ms, output_bytes=7.7 MB] |
|                   | |
+-------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
1 row(s) fetched.
Elapsed 0.080 seconds.
```

Note that it might overestimate memory due to a well-known issue. See the PR snippet for details:

```rs
/// Memory usage of all output batches.
///
/// Note: This value may be overestimated. If multiple output `RecordBatch`
/// instances share underlying memory buffers, their sizes will be counted
/// multiple times.
/// Issue: <#16841>
output_bytes: Count,
```

I think this metric provides valuable insight, so it's better for it to overestimate than not exist at all.

## What changes are included in this PR?

1. Add `output_bytes` to `BaselineMetrics`, set to the `summary` analyze level (see config `datafusion.explain.analyze_level` for details).
2. This metric is tracked automatically through the `record_poll()` API, a common interface most operators use when a new output batch is generated.

## Are these changes tested?

UT

## Are there any user-facing changes?
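The overestimation noted above can be reproduced in miniature. The sketch below is an assumption-laden stand-in, not Arrow or DataFusion code: `Batch` is a hypothetical type holding reference-counted byte buffers, and the two functions contrast per-batch summation (which double counts shared buffers) with counting each distinct buffer once.

```rust
use std::collections::HashSet;
use std::rc::Rc;

/// Hypothetical stand-in for a record batch: a list of shared byte buffers.
struct Batch {
    buffers: Vec<Rc<Vec<u8>>>,
}

/// Sums the size of every buffer in every batch.
/// A buffer shared by two batches is counted twice.
fn naive_bytes(batches: &[Batch]) -> usize {
    batches
        .iter()
        .flat_map(|b| b.buffers.iter())
        .map(|buf| buf.len())
        .sum()
}

/// Counts each distinct allocation once, identified by its pointer.
fn deduplicated_bytes(batches: &[Batch]) -> usize {
    let mut seen = HashSet::new();
    batches
        .iter()
        .flat_map(|b| b.buffers.iter())
        .filter(|buf| seen.insert(Rc::as_ptr(*buf)))
        .map(|buf| buf.len())
        .sum()
}
```

For two batches that both reference one 100-byte buffer, `naive_bytes` reports 200 while `deduplicated_bytes` reports 100. The naive form is cheaper to maintain per batch, which is consistent with the trade-off the description accepts.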
…/floor (#18265)

## Which issue does this PR close?

- Closes #18175

## Rationale for this change

`ceil`/`floor` calls made via SQL were parsed as taking two arguments instead of one. The second argument is not currently needed, and it was silently ignored.

## What changes are included in this PR?

The second parameter (`field`) is of the `CeilFloorKind` enum from the `sqlparser` crate. Neither of the enum's two variants (`DateTimeField` and `Scale`) is implemented, so they are now rejected with an appropriate error. The call only succeeds if the `DateTimeField` variant holds `NoDateTime`, i.e. it is treated as empty.

## Are these changes tested?

All unit tests pass successfully.

---------

Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>
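The validation described above might look roughly like the following sketch. The enums here are simplified local stand-ins that borrow the names mentioned in the description; they are not the real `sqlparser` types (the stand-in `Scale` carries a plain integer, and `Year` is just one example of a non-empty field), and `check_ceil_floor_field` is a hypothetical function name.

```rust
/// Simplified stand-in for a date/time field.
#[derive(Debug)]
enum DateTimeField {
    /// No field was supplied, as in plain `ceil(x)`.
    NoDateTime,
    /// Example of a supplied field, as in `ceil(x TO YEAR)`.
    Year,
}

/// Simplified stand-in for the second argument of `ceil`/`floor`.
enum CeilFloorKind {
    DateTimeField(DateTimeField),
    Scale(i64),
}

/// Hypothetical check: accept only the "empty" field and reject everything
/// else with an explicit error instead of silently ignoring it.
fn check_ceil_floor_field(field: &CeilFloorKind) -> Result<(), String> {
    match field {
        CeilFloorKind::DateTimeField(DateTimeField::NoDateTime) => Ok(()),
        CeilFloorKind::DateTimeField(other) => {
            Err(format!("ceil/floor with datetime field {other:?} is not supported"))
        }
        CeilFloorKind::Scale(scale) => {
            Err(format!("ceil/floor with scale {scale} is not supported"))
        }
    }
}
```

Matching exhaustively on the enum means any variant added later has to be handled explicitly rather than falling through unnoticed.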
#18273) ## Which issue does this PR close? Closes #18058 ## Rationale for this change When adding the bitmap_count function to Comet, we get the following error - org.apache.comet.CometNativeException: Error from DataFusion: bitmap_count expects Binary/BinaryView/FixedSizeBinary/LargeBinary as argument, got Dictionary(Int32, Binary). ## Are these changes tested? Added new UT --------- Co-authored-by: Kazantsev Maksim <mn.kazantsev@gmail.com>
## Which issue does this PR close?

- Closes #18288

## Rationale for this change

`cargo audit` says that the current version of `half` we have in our Cargo.lock file was yanked

```
Crate: half
Version: 2.7.0
Warning: yanked
Dependency tree:
half 2.7.0
```

And indeed it is: https://crates.io/crates/half/versions

<img width="1193" height="830" alt="Screenshot 2025-10-26 at 7 20 54 AM" src="https://github.com/user-attachments/assets/ad6944c6-912c-4c56-9d1d-efe760ae85ee" />

So let's update to a non-yanked version

## What changes are included in this PR?

Run `cargo update -p half` and check the result in

## Are these changes tested?

## Are there any user-facing changes?
) Bumps [taiki-e/install-action](https://github.com/taiki-e/install-action) from 2.62.36 to 2.62.38. <details> <summary>Release notes</summary> <p><em>Sourced from <a href="https://github.com/taiki-e/install-action/releases">taiki-e/install-action's releases</a>.</em></p> <blockquote> <h2>2.62.38</h2> <ul> <li> <p>Update <code>coreutils@latest</code> to 0.3.0.</p> </li> <li> <p>Update <code>wasmtime@latest</code> to 38.0.3.</p> </li> <li> <p>Update <code>mise@latest</code> to 2025.10.17.</p> </li> <li> <p>Update <code>cargo-tarpaulin@latest</code> to 0.34.1.</p> </li> </ul> <h2>2.62.37</h2> <ul> <li> <p>Update <code>cargo-binstall@latest</code> to 1.15.8.</p> </li> <li> <p>Update <code>zizmor@latest</code> to 1.16.0.</p> </li> <li> <p>Update <code>mise@latest</code> to 2025.10.16.</p> </li> </ul> </blockquote> </details> <details> <summary>Changelog</summary> <p><em>Sourced from <a href="https://github.com/taiki-e/install-action/blob/main/CHANGELOG.md">taiki-e/install-action's changelog</a>.</em></p> <blockquote> <h1>Changelog</h1> <p>All notable changes to this project will be documented in this file.</p> <p>This project adheres to <a href="https://semver.org">Semantic Versioning</a>.</p> <!-- raw HTML omitted --> <h2>[Unreleased]</h2> <ul> <li>Update <code>mise@latest</code> to 2025.10.18.</li> </ul> <h2>[2.62.38] - 2025-10-25</h2> <ul> <li> <p>Update <code>coreutils@latest</code> to 0.3.0.</p> </li> <li> <p>Update <code>wasmtime@latest</code> to 38.0.3.</p> </li> <li> <p>Update <code>mise@latest</code> to 2025.10.17.</p> </li> <li> <p>Update <code>cargo-tarpaulin@latest</code> to 0.34.1.</p> </li> </ul> <h2>[2.62.37] - 2025-10-24</h2> <ul> <li> <p>Update <code>cargo-binstall@latest</code> to 1.15.8.</p> </li> <li> <p>Update <code>zizmor@latest</code> to 1.16.0.</p> </li> <li> <p>Update <code>mise@latest</code> to 2025.10.16.</p> </li> </ul> <h2>[2.62.36] - 2025-10-23</h2> <ul> <li> <p>Update <code>syft@latest</code> to 1.36.0.</p> </li> <li> <p>Update 
<code>vacuum@latest</code> to 0.19.0.</p> </li> <li> <p>Update <code>mise@latest</code> to 2025.10.15.</p> </li> </ul> <h2>[2.62.35] - 2025-10-22</h2> <ul> <li> <p>Update <code>wasmtime@latest</code> to 38.0.2.</p> </li> <li> <p>Update <code>cargo-nextest@latest</code> to 0.9.108.</p> </li> <li> <p>Update <code>mise@latest</code> to 2025.10.14.</p> </li> <li> <p>Update <code>vacuum@latest</code> to 0.18.9.</p> </li> </ul> <!-- raw HTML omitted --> </blockquote> <p>... (truncated)</p> </details> <details> <summary>Commits</summary> <ul> <li><a href="https://github.com/taiki-e/install-action/commit/c5b1b6f479c32f356cc6f4ba672a47f63853b13b"><code>c5b1b6f</code></a> Release 2.62.38</li> <li><a href="https://github.com/taiki-e/install-action/commit/7cd74f6aac6a2a6c13632c29a30ffc0ef8053cf2"><code>7cd74f6</code></a> Update <code>coreutils@latest</code> to 0.3.0</li> <li><a href="https://github.com/taiki-e/install-action/commit/def9901333773abdceeb414c2c2a68cc4276eea9"><code>def9901</code></a> Update <code>wasmtime@latest</code> to 38.0.3</li> <li><a href="https://github.com/taiki-e/install-action/commit/a9d3853729137d6a76fdb344e3fdba064bb51dd5"><code>a9d3853</code></a> Update coreutils manifest</li> <li><a href="https://github.com/taiki-e/install-action/commit/958d48b0c9eb6cf8c0edca899e787eb73a91794c"><code>958d48b</code></a> Update <code>mise@latest</code> to 2025.10.17</li> <li><a href="https://github.com/taiki-e/install-action/commit/fb485991fd79e393a6a4e3715369bdd7a96fc12d"><code>fb48599</code></a> Update <code>cargo-tarpaulin@latest</code> to 0.34.1</li> <li><a href="https://github.com/taiki-e/install-action/commit/1c7b1d35fcc8f6525be0cbdacbf5977079a3f94c"><code>1c7b1d3</code></a> Release 2.62.37</li> <li><a href="https://github.com/taiki-e/install-action/commit/18cba62798fa05dd5849e62a3759a8ef249feefc"><code>18cba62</code></a> Update <code>cargo-binstall@latest</code> to 1.15.8</li> <li><a 
href="https://github.com/taiki-e/install-action/commit/f3c0c6962aed40004323e265015332d9d9cf90f9"><code>f3c0c69</code></a> Update <code>zizmor@latest</code> to 1.16.0</li> <li><a href="https://github.com/taiki-e/install-action/commit/99fc3e5b1e80c12d05e5bff5af81a035ab4e98b5"><code>99fc3e5</code></a> Update <code>mise@latest</code> to 2025.10.16</li> <li>See full diff in <a href="https://github.com/taiki-e/install-action/compare/ebb229c6baa68383264f2822689b07b4916d9177...c5b1b6f479c32f356cc6f4ba672a47f63853b13b">compare view</a></li> </ul> </details> <br /> Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
## Which issue does this PR close?

- Closes #11336

Since this is my first contribution, I believe I should mention @alamb, author of issue #11336. Could you please trigger the CI? Thanks!

## Rationale for this change

The Arrow introduction guide (#11336) needed improvements to make it more accessible for newcomers while providing better navigation to advanced topics.

## What changes are included in this PR?

Issue #11336 requested a gentle introduction to Apache Arrow and RecordBatches to help DataFusion users understand the foundational concepts. This PR enhances the existing Arrow introduction guide with clearer explanations, practical examples, visual aids, and comprehensive navigation links to make it more accessible for newcomers while providing pathways to advanced topics.

I was unsure whether this fits into `docs/source/user-guide/dataframe.md`.

## Are these changes tested?

Applied prettier, as described.

## Are there any user-facing changes?

Yes - improved documentation for the Arrow introduction guide at `docs/source/user-guide/arrow-introduction.md`

---------

Co-authored-by: Martin <your.email@example.com>
Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>
Bumps [regex](https://github.com/rust-lang/regex) from 1.11.3 to 1.12.2. <details> <summary>Changelog</summary> <p><em>Sourced from <a href="https://github.com/rust-lang/regex/blob/master/CHANGELOG.md">regex's changelog</a>.</em></p> <blockquote> <h1>1.12.2 (2025-10-13)</h1> <p>This release fixes a <code>cargo doc</code> breakage on nightly when <code>--cfg docsrs</code> is enabled. This caused documentation to fail to build on docs.rs.</p> <p>Bug fixes:</p> <ul> <li>[BUG <a href="https://redirect.github.com/rust-lang/regex/issues/1305">#1305</a>](<a href="https://redirect.github.com/rust-lang/regex/issues/1305">rust-lang/regex#1305</a>): Switches the <code>doc_auto_cfg</code> feature to <code>doc_cfg</code> on nightly for docs.rs builds.</li> </ul> <h1>1.12.1 (2025-10-10)</h1> <p>This release makes a bug fix in the new <code>regex::Captures::get_match</code> API introduced in <code>1.12.0</code>. There was an oversight with the lifetime parameter for the <code>Match</code> returned. This is technically a breaking change, but given that it was caught almost immediately and I've yanked the <code>1.12.0</code> release, I think this is fine.</p> <h1>1.12.0 (2025-10-10)</h1> <p>This release contains a smattering of bug fixes, a fix for excessive memory consumption in some cases and a new <code>regex::Captures::get_match</code> API.</p> <p>Improvements:</p> <ul> <li>[FEATURE <a href="https://redirect.github.com/rust-lang/regex/issues/1146">#1146</a>](<a href="https://redirect.github.com/rust-lang/regex/issues/1146">rust-lang/regex#1146</a>): Add <code>Capture::get_match</code> for returning the overall match without <code>unwrap()</code>.</li> </ul> <p>Bug fixes:</p> <ul> <li>[BUG <a href="https://redirect.github.com/rust-lang/regex/issues/1083">#1083</a>](<a href="https://redirect.github.com/rust-lang/regex/issues/1083">rust-lang/regex#1083</a>): Fixes a panic in the lazy DFA (can only occur for especially large regexes).</li> <li>[BUG <a 
href="https://redirect.github.com/rust-lang/regex/issues/1116">#1116</a>](<a href="https://redirect.github.com/rust-lang/regex/issues/1116">rust-lang/regex#1116</a>): Fixes a memory usage regression for large regexes (introduced in <code>regex 1.9</code>).</li> <li>[BUG <a href="https://redirect.github.com/rust-lang/regex/issues/1195">#1195</a>](<a href="https://redirect.github.com/rust-lang/regex/issues/1195">rust-lang/regex#1195</a>): Fix universal start states in sparse DFA.</li> <li>[BUG <a href="https://redirect.github.com/rust-lang/regex/issues/1295">#1295</a>](<a href="https://redirect.github.com/rust-lang/regex/pull/1295">rust-lang/regex#1295</a>): Fixes a panic when deserializing a corrupted dense DFA.</li> <li><a href="https://github.com/rust-lang/regex/commit/8f5d9479d0f1da5726488a530d7fd66a73d05b80">BUG 8f5d9479</a>: Make <code>regex_automata::meta::Regex::find</code> consistently return <code>None</code> when <code>WhichCaptures::None</code> is used.</li> </ul> </blockquote> </details> <details> <summary>Commits</summary> <ul> <li><a href="https://github.com/rust-lang/regex/commit/5ea3eb1e95f0338e283f5f0b4681f0891a1cd836"><code>5ea3eb1</code></a> 1.12.2</li> <li><a href="https://github.com/rust-lang/regex/commit/ab0b07171b82d1d4fdc8359505d12b2e818514d4"><code>ab0b071</code></a> regex-automata-0.4.13</li> <li><a href="https://github.com/rust-lang/regex/commit/691d51457db276bbdf9ca3de2cafe285c662c59f"><code>691d514</code></a> regex-syntax-0.8.8</li> <li><a href="https://github.com/rust-lang/regex/commit/1dd90777791dbc6bbf389157d05ac8176c6ad051"><code>1dd9077</code></a> docs: swap <code>doc_auto_cfg</code> with <code>doc_cfg</code></li> <li><a href="https://github.com/rust-lang/regex/commit/0089034cb37b0bf3785f2e0211f7eca74033f4d1"><code>0089034</code></a> regex-cli-0.2.3</li> <li><a href="https://github.com/rust-lang/regex/commit/140f8949da3f575490bac80ff23dfc29458b82c7"><code>140f894</code></a> regex-lite-0.1.8</li> <li><a 
href="https://github.com/rust-lang/regex/commit/27d6d65263cb80266a62e3189408a44f201a0975"><code>27d6d65</code></a> 1.12.1</li> <li><a href="https://github.com/rust-lang/regex/commit/85398ad5002048bbeaa90f1fe37fbb31df2bc0d6"><code>85398ad</code></a> changelog: 1.12.1</li> <li><a href="https://github.com/rust-lang/regex/commit/764efbd305d3a7b817ec8892ff0a656ec657d660"><code>764efbd</code></a> api: tweak the lifetime of <code>Captures::get_match</code></li> <li><a href="https://github.com/rust-lang/regex/commit/ee6aa55e01786e4d2c11eb1be805835bbb3bfa99"><code>ee6aa55</code></a> rure-0.2.4</li> <li>Additional commits viewable in <a href="https://github.com/rust-lang/regex/compare/1.11.3...1.12.2">compare view</a></li> </ul> </details> <br /> Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Bumps [clap](https://github.com/clap-rs/clap) from 4.5.48 to 4.5.50. <details> <summary>Release notes</summary> <p><em>Sourced from <a href="https://github.com/clap-rs/clap/releases">clap's releases</a>.</em></p> <blockquote> <h2>v4.5.50</h2> <h2>[4.5.50] - 2025-10-20</h2> <h3>Features</h3> <ul> <li>Accept <code>Cow</code> where <code>String</code> and <code>&str</code> are accepted</li> </ul> </blockquote> </details> <details> <summary>Changelog</summary> <p><em>Sourced from <a href="https://github.com/clap-rs/clap/blob/master/CHANGELOG.md">clap's changelog</a>.</em></p> <blockquote> <h2>[4.5.50] - 2025-10-20</h2> <h3>Features</h3> <ul> <li>Accept <code>Cow</code> where <code>String</code> and <code>&str</code> are accepted</li> </ul> <h2>[4.5.49] - 2025-10-13</h2> <h3>Fixes</h3> <ul> <li><em>(help)</em> Correctly wrap when ANSI escape codes are present</li> </ul> </blockquote> </details> <details> <summary>Commits</summary> <ul> <li><a href="https://github.com/clap-rs/clap/commit/d8acd4729878ca72d305e6cf7adf7acc0da36738"><code>d8acd47</code></a> chore: Release</li> <li><a href="https://github.com/clap-rs/clap/commit/7c2b8d9ad4d22650f969bd455d80b4181a7e25ff"><code>7c2b8d9</code></a> docs: Update changelog</li> <li><a href="https://github.com/clap-rs/clap/commit/e69a2ea55bc9076d95caf60d79e481581f688724"><code>e69a2ea</code></a> Merge pull request <a href="https://redirect.github.com/clap-rs/clap/issues/5987">#5987</a> from mernen/fix-bash-comp-words-loop</li> <li><a href="https://github.com/clap-rs/clap/commit/e03cc2e798183e9528f53d42d8b2699f034fc667"><code>e03cc2e</code></a> Merge pull request <a href="https://redirect.github.com/clap-rs/clap/issues/5988">#5988</a> from cordx56/fix-builder-custom-version-docs</li> <li><a href="https://github.com/clap-rs/clap/commit/5ab2579844a47a26b4567f77a7b9d198be006f0a"><code>5ab2579</code></a> fix: Minor fix for builder docs about version</li> <li><a 
href="https://github.com/clap-rs/clap/commit/2f66432721bd24602455dc3e31765195c6107c34"><code>2f66432</code></a> fix(complete): Only parse arguments before current</li> <li><a href="https://github.com/clap-rs/clap/commit/4d9d2100f75693645ea68180ed4b6b3ecacb9923"><code>4d9d210</code></a> test(complete): Illustrate current behavior in Bash</li> <li><a href="https://github.com/clap-rs/clap/commit/6abe2f8c61e31d8d43fee42c18414926c60893be"><code>6abe2f8</code></a> chore: Release</li> <li><a href="https://github.com/clap-rs/clap/commit/d5c74542ce628b57424caec88efee1a231c436a0"><code>d5c7454</code></a> docs: Update changelog</li> <li><a href="https://github.com/clap-rs/clap/commit/5b2e960267b94d4811c9c3b99c62899a87505413"><code>5b2e960</code></a> Merge pull request <a href="https://redirect.github.com/clap-rs/clap/issues/5985">#5985</a> from mernen/bash-cur</li> <li>Additional commits viewable in <a href="https://github.com/clap-rs/clap/compare/clap_complete-v4.5.48...clap_complete-v4.5.50">compare view</a></li> </ul> </details> <br /> [](https://docs.github.com/en/github/managing-security-vulnerabilities/about-dependabot-security-updates#about-compatibility-scores) Dependabot will resolve any conflicts with this PR as long as you don't alter it yourself. You can also trigger a rebase manually by commenting `@dependabot rebase`. 
Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
## Which issue does this PR close? - Related to apache/arrow-rs#7835 - Closes #3666 Note while this PR looks massive, a large portion is display updates due to better display of Fields and DataTypes ## Rationale for this change Upgrade to the latest arrow Also, there are several new features in arrow-57 that I want to be able to test including Variant, arrow-avro, and a new parquet metadata reader. ## What changes are included in this PR? 1. Update arrow/parquet 2. Update prost 3. Update substrait 4. Update pbjson 5. Make API changes to avoid deprecated APIs ## Are these changes tested? By CI ## Are there any user-facing changes? New arrow
Bumps [syn](https://github.com/dtolnay/syn) from 2.0.106 to 2.0.108. <details> <summary>Release notes</summary> <p><em>Sourced from <a href="https://github.com/dtolnay/syn/releases">syn's releases</a>.</em></p> <blockquote> <h2>2.0.108</h2> <ul> <li>Parse unrecognized or invalid literals as Lit::Verbatim (<a href="https://redirect.github.com/dtolnay/syn/issues/1925">#1925</a>)</li> </ul> <h2>2.0.107</h2> <ul> <li>Improve panic message when constructing a LitInt, LitFloat, or Lit from invalid syntax (<a href="https://redirect.github.com/dtolnay/syn/issues/1917">#1917</a>)</li> <li>Improve panic message on Punctuated index out of bounds (<a href="https://redirect.github.com/dtolnay/syn/issues/1922">#1922</a>)</li> </ul> </blockquote> </details> <details> <summary>Commits</summary> <ul> <li><a href="https://github.com/dtolnay/syn/commit/7a7e331255822d49bea01e29c326ee7a5cd5415c"><code>7a7e331</code></a> Release 2.0.108</li> <li><a href="https://github.com/dtolnay/syn/commit/30463afa201abc30e086bd1fb1deb714eb8910f4"><code>30463af</code></a> Merge pull request <a href="https://redirect.github.com/dtolnay/syn/issues/1926">#1926</a> from dtolnay/litfuzz</li> <li><a href="https://github.com/dtolnay/syn/commit/1cc9167f60d209865e91bf73a949d25914e6bf18"><code>1cc9167</code></a> Add fuzzer for literal parsing</li> <li><a href="https://github.com/dtolnay/syn/commit/c49e1d3a65ab423beee54ed730ea3f849ec49e0b"><code>c49e1d3</code></a> Merge pull request <a href="https://redirect.github.com/dtolnay/syn/issues/1925">#1925</a> from dtolnay/litparse</li> <li><a href="https://github.com/dtolnay/syn/commit/d047536103b7edfb0408dab8ec65cde19e73a88f"><code>d047536</code></a> Report unexpected verbatim literals in test</li> <li><a href="https://github.com/dtolnay/syn/commit/ce9776747974555e30cd890b9e1d3030e02efc13"><code>ce97767</code></a> Parse unrecognized or invalid literals as Lit::Verbatim</li> <li><a 
href="https://github.com/dtolnay/syn/commit/e4a8957feb1b86e6da4309c9886ca15ddfd7b7ad"><code>e4a8957</code></a> Release 2.0.107</li> <li><a href="https://github.com/dtolnay/syn/commit/1792e83acfcc4810ccca70c22952986a6ea09d7e"><code>1792e83</code></a> Merge pull request <a href="https://redirect.github.com/dtolnay/syn/issues/1922">#1922</a> from dtolnay/outofbounds</li> <li><a href="https://github.com/dtolnay/syn/commit/532e4af53355f8c4585251e1507336bed8d39f14"><code>532e4af</code></a> Improve panic message on Punctuated index out of bounds</li> <li><a href="https://github.com/dtolnay/syn/commit/909c2221dd582e18f748988384e8ec4edd7544cf"><code>909c222</code></a> Add test of Punctuated indexing</li> <li>Additional commits viewable in <a href="https://github.com/dtolnay/syn/compare/2.0.106...2.0.108">compare view</a></li> </ul> </details> <br /> [](https://docs.github.com/en/github/managing-security-vulnerabilities/about-dependabot-security-updates#about-compatibility-scores) Dependabot will resolve any conflicts with this PR as long as you don't alter it yourself. You can also trigger a rebase manually by commenting `@dependabot rebase`. [//]: # (dependabot-automerge-start) [//]: # (dependabot-automerge-end) --- <details> <summary>Dependabot commands and options</summary> <br /> You can trigger Dependabot actions by commenting on this PR: - `@dependabot rebase` will rebase this PR - `@dependabot recreate` will recreate this PR, overwriting any edits that have been made to it - `@dependabot merge` will merge this PR after your CI passes on it - `@dependabot squash and merge` will squash and merge this PR after your CI passes on it - `@dependabot cancel merge` will cancel a previously requested merge and block automerging - `@dependabot reopen` will reopen this PR if it is closed - `@dependabot close` will close this PR and stop Dependabot recreating it. 
You can achieve the same result by closing it manually - `@dependabot show <dependency name> ignore conditions` will show all of the ignore conditions of the specified dependency - `@dependabot ignore this major version` will close this PR and stop Dependabot creating any more for this major version (unless you reopen the PR or upgrade to it yourself) - `@dependabot ignore this minor version` will close this PR and stop Dependabot creating any more for this minor version (unless you reopen the PR or upgrade to it yourself) - `@dependabot ignore this dependency` will close this PR and stop Dependabot creating any more for this dependency (unless you reopen the PR or upgrade to it yourself) </details> Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
## Which issue does this PR close? - Related to #14993 ## Rationale for this change To enable expression pushdown to file sources, we need to plumb expressions through the `FileScanConfig` layer. Currently, `FileScanConfig` only tracks column indices for projection, which limits us to simple and naive column selection. This PR begins expression pushdown implementation by having `FileScanConfig` own a list of `ProjectionExpr`s, instead of column indices. This allows file sources to eventually receive and optimize based on the actual expressions being projected. ## Notes about this PR - The first commit is based off of #18231 - To avoid a super large diff and a harder review, I've decided to break (#14993) into 2 tasks: - Have the `DataSource` (`FileScanConfig`) actually hold projection expressions (this PR) - Flow the projection expressions from `DataSourceExec` all the way to the `FileSource` --------- Co-authored-by: Adrian Garcia Badaracco <1755071+adriangb@users.noreply.github.com>
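To illustrate why a list of projection expressions is more general than a list of column indices, here is a minimal sketch. The `Expr` and `ProjectionExpr` types and the helper functions below are simplified, hypothetical stand-ins written for this example; they are not the DataFusion definitions, and the real `FileScanConfig` is not modelled at all.

```rust
// Hypothetical, simplified types for illustration only; these are NOT the
// DataFusion `FileScanConfig` or `ProjectionExpr` definitions.

/// A tiny expression language over a row of integers.
#[derive(Debug, Clone, PartialEq)]
enum Expr {
    /// Reference to the input column at this index.
    Column(usize),
    /// Sum of two sub-expressions.
    Add(Box<Expr>, Box<Expr>),
}

/// One output column: an expression plus the name it is exposed under.
#[derive(Debug, Clone, PartialEq)]
struct ProjectionExpr {
    expr: Expr,
    alias: String,
}

/// Index-based projection can only pick existing columns.
fn project_by_index(row: &[i64], indices: &[usize]) -> Vec<i64> {
    indices.iter().map(|&i| row[i]).collect()
}

/// Evaluate an expression against a single row.
fn eval(expr: &Expr, row: &[i64]) -> i64 {
    match expr {
        Expr::Column(i) => row[*i],
        Expr::Add(l, r) => eval(l, row) + eval(r, row),
    }
}

/// Expression-based projection can pick columns and also compute new ones.
fn project_by_expr(row: &[i64], exprs: &[ProjectionExpr]) -> Vec<i64> {
    exprs.iter().map(|p| eval(&p.expr, row)).collect()
}

/// Any index projection can be lifted into an equivalent list of
/// column-reference expressions, so the expression form loses nothing.
fn indices_to_exprs(indices: &[usize], names: &[&str]) -> Vec<ProjectionExpr> {
    indices
        .iter()
        .map(|&i| ProjectionExpr {
            expr: Expr::Column(i),
            alias: names[i].to_string(),
        })
        .collect()
}
```

In this sketch, `indices_to_exprs(&[2, 0], &["a", "b", "c"])` yields the same output as the index projection `[2, 0]`, while an `Expr::Add` projection has no index-based equivalent.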
## Which issue does this PR close? - Doing some prework for #15881 ## Rationale for this change `Range` and `GenSeries` are essentially the same except for whether they include upper bounds or not; unify their function code to reduce duplication, making future changes easier. ## What changes are included in this PR? Remove `GenSeries` struct, folding it into `Range`. Do some more minor refactoring to their code. ## Are these changes tested? Existing tests (updated some error messages). ## Are there any user-facing changes? Not really (updated some error messages).
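The unification described above can be sketched as a single generator with one flag for the upper bound. This is a standalone illustration on plain `i64` values; the function names and the `include_upper_bound` parameter are assumptions made for this example, not the names used in the DataFusion code.

```rust
/// Hypothetical unified generator: `include_upper_bound` is the only
/// behavioural difference between the two public functions below.
fn series(
    start: i64,
    stop: i64,
    step: i64,
    include_upper_bound: bool,
) -> Result<Vec<i64>, String> {
    if step == 0 {
        return Err("step must not be zero".to_string());
    }
    let mut out = Vec::new();
    let mut current = start;
    loop {
        // The bound check depends on the direction of the step and on
        // whether the upper bound itself is part of the result.
        let in_bounds = match (step > 0, include_upper_bound) {
            (true, true) => current <= stop,
            (true, false) => current < stop,
            (false, true) => current >= stop,
            (false, false) => current > stop,
        };
        if !in_bounds {
            break;
        }
        out.push(current);
        // Stop instead of overflowing near the ends of the i64 domain.
        match current.checked_add(step) {
            Some(next) => current = next,
            None => break,
        }
    }
    Ok(out)
}

/// Exclusive upper bound.
fn range(start: i64, stop: i64, step: i64) -> Result<Vec<i64>, String> {
    series(start, stop, step, false)
}

/// Inclusive upper bound.
fn generate_series(start: i64, stop: i64, step: i64) -> Result<Vec<i64>, String> {
    series(start, stop, step, true)
}
```

Keeping the difference in one boolean means a change to stepping or error handling only has to be made once.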
…18303) ## Which issue does this PR close? - Closes #18302 ## Rationale for this change As described in the issue, this is a low-effort QoL fix for now. ## What changes are included in this PR? Uses the existing function for naming fields to replace the hardcoded `"is_set"` with a field-dependent name. Example output: ``` Field { name: "first_value(records_partitioned.trace_id)[first_value]", data_type: Utf8View, nullable: true, dict_id: 0, dict_is_ordered: false, metadata: {}, }, Field { name: "first_value(records_partitioned.trace_id)[first_value_is_set]", data_type: Boolean, nullable: true, dict_id: 0, dict_is_ordered: false, metadata: {}, }, Field { name: "first_value(records_partitioned.value)[first_value]", data_type: Int32, nullable: true, dict_id: 0, dict_is_ordered: false, metadata: {}, }, Field { name: "first_value(records_partitioned.value)[first_value_is_set]", data_type: Boolean, nullable: true, dict_id: 0, dict_is_ordered: false, metadata: {}, }, ``` ## Are these changes tested? No tests have been added, hopefully it should be covered by existing changes. ## Are there any user-facing changes? There should not be any, I assume `is_set` is never user visible.
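The naming pattern visible in the example output, `aggregate[state]`, can be sketched as follows. Both helper functions are hypothetical and written only for this illustration; the PR itself reuses an existing DataFusion naming function whose signature is not shown in this description.

```rust
/// Hypothetical helper that reproduces the `name[state]` pattern seen in the
/// example output; it is not the DataFusion function itself.
fn state_field_name(aggregate_name: &str, state_name: &str) -> String {
    format!("{}[{}]", aggregate_name, state_name)
}

/// Names of the two state fields of a `first_value` aggregate under the
/// scheme described above: the value itself and the "is set" flag.
fn first_value_state_fields(aggregate_name: &str) -> Vec<String> {
    vec![
        state_field_name(aggregate_name, "first_value"),
        state_field_name(aggregate_name, "first_value_is_set"),
    ]
}
```

Because the flag's name is derived from the aggregate's name, two `first_value` aggregates in the same plan no longer share an identical `is_set` field name.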
## Which issue does this PR close? N/A ## Rationale for this change Cover missing f16 type for `abs` ## What changes are included in this PR? Support `abs` on f16; also do some cleanup. ## Are these changes tested? Added SLT. ## Are there any user-facing changes? No.
## Which issue does this PR close? - Improvement in the context of #18075 - Continues on #17898 ## Rationale for this change Case evaluation currently uses `PhysicalExpr::evaluate_selection` for each branch of the case expression. This implementation is fine, but because `evaluate_selection` is not specific to the `case` logic we're missing some optimisation opportunities. The main consequence is that too much work is being done filtering record batches and scattering results. This PR introduces specialised filtering logic and result interleaving for case. A more detailed description and diagrams are available at #18075 (comment) ## What changes are included in this PR? Rewrite the `case_when_no_expr` and `case_when_with_expr` evaluation loops to avoid as much unnecessary work as possible. In particular the remaining rows to be evaluated are retained across loop iterations. This allows the record batch that needs to be filtered to shrink as the loop is being evaluated which reduces the number of rows that needs to be refiltered. If a when predicate does not match any rows at all, filtering is avoided entirely. The final result is also not merged every loop iteration. Instead an index vector is constructed which is used to compose the final result once using a custom 'multi zip'/'interleave' like operation. ## Are these changes tested? Covered by existing unit tests and SLTs ## Are there any user-facing changes? No
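The evaluation strategy described above can be sketched on plain vectors. This is a deliberately simplified model: it works on `i64` rows rather than Arrow record batches, ignores nulls, and the `Predicate` and `Then` aliases and the `eval_case` function are hypothetical names chosen for this example.

```rust
/// Hypothetical stand-ins for a WHEN predicate and a THEN expression.
type Predicate = fn(i64) -> bool;
type Then = fn(i64) -> i64;

/// Evaluate `CASE WHEN .. THEN .. ELSE else_value END` over `input`.
fn eval_case(input: &[i64], branches: &[(Predicate, Then)], else_value: i64) -> Vec<i64> {
    // Rows not yet claimed by an earlier branch; this shrinks as we go.
    let mut remaining: Vec<usize> = (0..input.len()).collect();
    // One partial result per branch, holding values only for matched rows.
    let mut partial: Vec<Vec<i64>> = Vec::with_capacity(branches.len());
    // For every input row: (branch index, offset into that branch's result).
    let mut source: Vec<Option<(usize, usize)>> = vec![None; input.len()];

    for (when, then) in branches {
        if remaining.is_empty() {
            break; // every row already has a result
        }
        // Evaluate the predicate on the remaining rows only.
        let (matched, unmatched): (Vec<usize>, Vec<usize>) = remaining
            .iter()
            .copied()
            .partition(|&row| when(input[row]));

        let branch = partial.len();
        if matched.is_empty() {
            // Nothing matched: skip filtering and THEN evaluation entirely.
            partial.push(Vec::new());
            continue;
        }
        // Evaluate THEN only for the rows this branch claimed.
        partial.push(matched.iter().map(|&row| then(input[row])).collect());
        for (offset, &row) in matched.iter().enumerate() {
            source[row] = Some((branch, offset));
        }
        remaining = unmatched;
    }

    // Compose the final result once, instead of merging after every branch.
    source
        .iter()
        .map(|slot| match slot {
            Some((branch, offset)) => partial[*branch][*offset],
            None => else_value,
        })
        .collect()
}
```

The `source` vector plays the role of the index vector mentioned above: branches only append to their own partial result, and a single final pass interleaves those partial results into row order.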
## Which issue does this PR close? Part of #16915 ## Rationale for this change Format code examples in documentation comments to improve readability and maintain consistent code style across the codebase. This is part of a multi-PR effort to format all doc comment examples and eventually enable CI checks to enforce this formatting. ## What changes are included in this PR? Run `cargo fmt -p datafusion-execution -- --config format_code_in_doc_comments=true` ## Are these changes tested? No testing needed - this is purely a formatting change with no functional modifications. ## Are there any user-facing changes? No - this only affects documentation formatting.
## Which issue does this PR close? Part of #16915 ## Rationale for this change Format code examples in documentation comments to improve readability and maintain consistent code style across the codebase. This is part of a multi-PR effort to format all doc comment examples and eventually enable CI checks to enforce this formatting. ## What changes are included in this PR? Run `cargo fmt -p datafusion-common -- --config format_code_in_doc_comments=true` ## Are these changes tested? No testing needed - this is purely a formatting change with no functional modifications. ## Are there any user-facing changes? No - this only affects documentation formatting.
## Which issue does this PR close? - Closes #18323. ## Rationale for this change Adds more detailed metrics, so it is easier to identify which parts of the aggregate streams are actually slow. ## What changes are included in this PR? Added a metrics struct, and used it in the functions common to the aggregate streams. ## Are these changes tested? Yes, added some tests to verify the metrics are actually updated and can be retrieved. I've also run the groupby benchmarks to ensure we don't create timers in a way that could impact performance, and it seems OK: all the changes are within what I'd expect as std variation on a local machine.
```
Comparing main and agg-metrics
--------------------
Benchmark h2o.json
--------------------
┏━━━━━━━━━━━━━━┳━━━━━━━━━━━━┳━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┓
┃ Query        ┃       main ┃ agg-metrics ┃        Change ┃
┡━━━━━━━━━━━━━━╇━━━━━━━━━━━━╇━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━┩
│ QQuery 1     │ 1252.42 ms │  1196.62 ms │     no change │
│ QQuery 2     │ 3976.62 ms │  3392.89 ms │ +1.17x faster │
│ QQuery 3     │ 3448.29 ms │  2918.47 ms │ +1.18x faster │
│ QQuery 4     │ 1909.15 ms │  1632.98 ms │ +1.17x faster │
│ QQuery 5     │ 3056.36 ms │  2831.82 ms │ +1.08x faster │
│ QQuery 6     │ 2663.13 ms │  2594.64 ms │     no change │
│ QQuery 7     │ 2802.28 ms │  2592.43 ms │ +1.08x faster │
│ QQuery 8     │ 4489.29 ms │  4199.00 ms │ +1.07x faster │
│ QQuery 9     │ 7001.75 ms │  6622.98 ms │ +1.06x faster │
│ QQuery 10    │ 4725.80 ms │  4619.37 ms │     no change │
└──────────────┴────────────┴─────────────┴───────────────┘
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━┓
┃ Benchmark Summary          ┃            ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━┩
│ Total Time (main)          │ 35325.09ms │
│ Total Time (agg-metrics)   │ 32601.19ms │
│ Average Time (main)        │  3532.51ms │
│ Average Time (agg-metrics) │  3260.12ms │
│ Queries Faster             │          7 │
│ Queries Slower             │          0 │
│ Queries with No Change     │          3 │
│ Queries with Failure       │          0 │
└────────────────────────────┴────────────┘
```
## Are there any user-facing changes? Nothing that is direct to the user, additional metrics will now be available, but no breaking changes. --------- Co-authored-by: Raz Luvaton <16746759+rluvaton@users.noreply.github.com> Co-authored-by: Eshed Schacham <ashdnazg@gmail.com>
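As a rough sketch of the idea of per-phase timing metrics, the following uses only the Rust standard library. The `PhaseMetrics` struct, its field names, and the `time` helper are hypothetical and chosen for this example; the PR's actual metrics struct and its integration with DataFusion's metrics types are not shown in this description.

```rust
use std::sync::atomic::{AtomicU64, Ordering};
use std::time::{Duration, Instant};

/// Hypothetical per-phase timings for an aggregation stream.
/// The counters are created once and reused, so the hot path only pays for
/// reading the clock and one atomic add per timed section.
#[derive(Debug, Default)]
struct PhaseMetrics {
    group_by_nanos: AtomicU64,
    aggregate_nanos: AtomicU64,
}

impl PhaseMetrics {
    /// Run `work`, add its elapsed wall-clock time to `counter`, and return
    /// whatever `work` returned.
    fn time<T>(counter: &AtomicU64, work: impl FnOnce() -> T) -> T {
        let start = Instant::now();
        let out = work();
        counter.fetch_add(start.elapsed().as_nanos() as u64, Ordering::Relaxed);
        out
    }

    /// Total time recorded for evaluating group-by keys.
    fn group_by_time(&self) -> Duration {
        Duration::from_nanos(self.group_by_nanos.load(Ordering::Relaxed))
    }

    /// Total time recorded for updating the aggregates.
    fn aggregate_time(&self) -> Duration {
        Duration::from_nanos(self.aggregate_nanos.load(Ordering::Relaxed))
    }
}
```

A caller would wrap each phase, for example `PhaseMetrics::time(&metrics.aggregate_nanos, || update_accumulators())`, where `update_accumulators` stands for whatever work that phase performs, and read the totals back afterwards.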
## Which issue does this PR close? Part of #16915 ## Rationale for this change Format code examples in documentation comments to improve readability and maintain consistent code style across the codebase. This is part of a multi-PR effort to format all doc comment examples and eventually enable CI checks to enforce this formatting. ## What changes are included in this PR? Run `cargo fmt -p datafusion -- --config format_code_in_doc_comments=true` ## Are these changes tested? No testing needed - this is purely a formatting change with no functional modifications. ## Are there any user-facing changes? No - this only affects documentation formatting. --------- Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>
## Which issue does this PR close? Part of #16915 ## Rationale for this change Format code examples in documentation comments to improve readability and maintain consistent code style across the codebase. This is part of a multi-PR effort to format all doc comment examples and eventually enable CI checks to enforce this formatting. ## What changes are included in this PR? Run `cargo fmt -p <crate> -- --config format_code_in_doc_comments=true` for the following crates: - `datafusion-expr` - `datafusion-expr-common` ## Are these changes tested? No testing needed - this is purely a formatting change with no functional modifications. ## Are there any user-facing changes? No - this only affects documentation formatting.
## Which issue does this PR close? Part of #16915 ## Rationale for this change Format code examples in documentation comments to improve readability and maintain consistent code style across the codebase. This is part of a multi-PR effort to format all doc comment examples and eventually enable CI checks to enforce this formatting. ## What changes are included in this PR? Run `cargo fmt -p <crate> -- --config format_code_in_doc_comments=true` for the following datasource-related crates: - `datafusion-datasource` - `datafusion-datasource-arrow` - `datafusion-datasource-avro` - `datafusion-datasource-csv` - `datafusion-datasource-json` - `datafusion-datasource-parquet` ## Are these changes tested? No testing needed - this is purely a formatting change with no functional modifications. ## Are there any user-facing changes? No - this only affects documentation formatting. --------- Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>
## Which issue does this PR close? - Closes #18171 ## Rationale for this change Included in the issue ## Are these changes tested? While I have tested this on local with a local TPCDS-like dataset, I would appreciate if someone provides me a good way to add tests for the same 😅 --------- Co-authored-by: Adrian Garcia Badaracco <1755071+adriangb@users.noreply.github.com> Co-authored-by: Claude <noreply@anthropic.com>
## Which issue does this PR close? Part of #16915 ## Rationale for this change Format code examples in documentation comments to improve readability and maintain consistent code style across the codebase. This is part of a multi-PR effort to format all doc comment examples and eventually enable CI checks to enforce this formatting. ## What changes are included in this PR? Run `cargo fmt -p <crate> -- --config format_code_in_doc_comments=true` for the following crates: - `datafusion-macros` - `datafusion-optimizer` ## Are these changes tested? No testing needed - this is purely a formatting change with no functional modifications. ## Are there any user-facing changes? No - this only affects documentation formatting.
…18358) ## Which issue does this PR close? Part of #16915 ## Rationale for this change Format code examples in documentation comments to improve readability and maintain consistent code style across the codebase. This is part of a multi-PR effort to format all doc comment examples and eventually enable CI checks to enforce this formatting. ## What changes are included in this PR? Run `cargo fmt -p <crate> -- --config format_code_in_doc_comments=true` for the following crates: - `datafusion-proto` - `datafusion-proto-common` - `datafusion-pruning` - `datafusion-session` ## Are these changes tested? No testing needed - this is purely a formatting change with no functional modifications. ## Are there any user-facing changes? No - this only affects documentation formatting.
Labels
catalog
Related to the catalog crate
common
Related to common crate
core
Core DataFusion crate
datasource
Changes to the datasource crate
development-process
Related to development process of DataFusion
documentation
Improvements or additions to documentation
execution
Related to the execution crate
functions
Changes to functions implementation
logical-expr
Logical plan and expressions
optimizer
Optimizer rules
physical-expr
Changes to the physical-expr crates
physical-plan
Changes to the physical-plan crate
proto
Related to proto crate
spark
sql
SQL Planner
sqllogictest
SQL Logic Tests (.slt)
substrait
Changes to the substrait crate
## Which issue does this PR close?
## Rationale for this change
The DataFusion project has an extensive benchmarking infrastructure with many different benchmark types (TPCH, ClickBench, IMDB, H2O.ai, micro-benchmarks, etc.) scattered across README files and code comments. Contributors need a centralized, easily discoverable resource to understand what benchmarks are available, how to use them for validating performance changes, and where to add new benchmark code. This addresses the maintainer's request in issue #17811 to create a dedicated documentation page describing all the benchmark code we have.
## What changes are included in this PR?
The new documentation consolidates information about:
## Are these changes tested?
## Are there any user-facing changes?
No Breaking Changes:
- No changes to APIs, CLIs, or runtime behavior
- No changes to existing benchmark functionality
- Purely additive documentation enhancement