(test benchmark runnner) morsel driven execution by adriangb · Pull Request #6 · adriangb/datafusion

adriangb · 2026-03-09T05:31:54Z

No description provided.

This PR implements morsel-driven execution for Parquet files in DataFusion, enabling row-group level work sharing across partitions to mitigate data skew.

Key changes:
- Introduced `WorkQueue` in `datafusion/datasource/src/file_stream.rs` as a shared pool of work.
- Added a `morselize` method to the `FileOpener` trait to allow dynamic splitting of files into morsels.
- Implemented `morselize` for `ParquetOpener` to split files into individual row groups.
- Cached `ParquetMetaData` in `ParquetMorsel` extensions to avoid redundant I/O.
- Modified `FileStream` to support work stealing from the shared queue.
- Implemented a `Weak` pointer pattern for `WorkQueue` in `FileScanConfig` to support plan re-executability.
- Added `MorselizingGuard` to ensure shared state consistency on cancellation.
- Added the `allow_morsel_driven` configuration option (enabled by default for Parquet).
- Implemented row-group pruning during the morselization phase for better efficiency.

Tests:
- Added the `parquet_morsel_driven_execution` test to verify work distribution and re-executability.
- Added `parquet_morsel_driven_enabled_by_default` to verify the default configuration.

Co-authored-by: Dandandan <163737+Dandandan@users.noreply.github.com>
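The work-sharing idea in the list above can be sketched with the standard library alone. This is an illustration under stated assumptions, not the PR's code: `Morsel`, `SharedQueue`, `morselize`, and `run_partitions` are hypothetical names, threads stand in for DataFusion's async partition streams, and pruning, metadata caching, and cancellation handling are left out.

```rust
use std::collections::VecDeque;
use std::sync::{Arc, Mutex};
use std::thread;

/// Hypothetical unit of work: one row group of one file.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct Morsel {
    pub file: String,
    pub row_group: usize,
}

/// Hypothetical shared pool that every partition pulls from.
pub struct SharedQueue {
    inner: Mutex<VecDeque<Morsel>>,
}

impl SharedQueue {
    pub fn new(morsels: impl IntoIterator<Item = Morsel>) -> Self {
        Self { inner: Mutex::new(morsels.into_iter().collect()) }
    }

    /// Take the next morsel, or `None` once the pool is drained.
    pub fn pop(&self) -> Option<Morsel> {
        self.inner.lock().unwrap().pop_front()
    }
}

/// Split each (file, row-group count) pair into one morsel per row group.
pub fn morselize(files: &[(&str, usize)]) -> Vec<Morsel> {
    files
        .iter()
        .flat_map(|(file, n)| {
            (0..*n).map(move |rg| Morsel { file: file.to_string(), row_group: rg })
        })
        .collect()
}

/// Run `partitions` workers that drain the queue.
/// Returns how many morsels each worker processed.
pub fn run_partitions(queue: Arc<SharedQueue>, partitions: usize) -> Vec<usize> {
    let handles: Vec<_> = (0..partitions)
        .map(|_| {
            let queue = Arc::clone(&queue);
            thread::spawn(move || {
                let mut processed = 0;
                // A worker that finishes early simply asks for more work,
                // so a skewed file no longer pins one partition.
                while let Some(_morsel) = queue.pop() {
                    processed += 1;
                }
                processed
            })
        })
        .collect();
    handles.into_iter().map(|h| h.join().unwrap()).collect()
}
```

Because every worker pulls from the same pool, the per-worker counts vary from run to run, but their sum always equals the number of morsels.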

…en-execution-237164415184908839

## Which issue does this PR close?

- Closes apache#17035

## Rationale for this change

Now that we have proper `FFI_ConfigOptions` we can pass these to scalar UDFs via FFI.

## What changes are included in this PR?

Instead of passing default options, pass in converted config options from the input. Also did a drive-by cleanup, switching to `FFI_ColumnarValue` since it is now available.

## Are these changes tested?

Unit test added.

## Are there any user-facing changes?

This is a breaking API change, but not one that users will interact with directly. It breaks the ABI for FFI libraries, which is currently unstable.

@alamb

…cking in instrumentedObjectStore (apache#20802)

## Which issue does this PR close?

Related to apache#18138 but does not close any issue.

## Rationale for this change

`TimeToFirstItemStream` held an `Arc<Mutex<Vec<RequestDetails>>>` and a `request_index` to write the duration back into the shared request list. I saw @alamb's and @BlakeOrth's reviews on PR apache#19127 about possible improvements and wanted to make the change.

## What changes are included in this PR?

- Replace `Arc<Mutex<Vec<RequestDetails>>>` + index in `TimeToFirstItemStream` with a per-request `Arc<AtomicU64>`
- Store the duration as nanoseconds in the `AtomicU64` (0 = not yet set) with Release/Acquire ordering
- Start the timer lazily on the first `poll_next` call instead of at stream creation, so only actual storage latency is measured

## Are these changes tested?

Covered by existing tests; I've also added a time comparison.

## Are there any user-facing changes?

No
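The atomic slot and the lazy timer described above might look roughly like this. It is a sketch only: `FirstItemLatency` and `LazyTimer` are hypothetical names, the real stream wrapper implements `Stream::poll_next` rather than the two plain methods used here, and clamping a recorded value to at least 1 ns (so it is never mistaken for "unset") is an assumption of this example, not something the commit states.

```rust
use std::sync::atomic::{AtomicU64, Ordering};
use std::sync::Arc;
use std::time::{Duration, Instant};

/// Hypothetical per-request slot: 0 means "not yet set".
#[derive(Clone, Default)]
pub struct FirstItemLatency(Arc<AtomicU64>);

impl FirstItemLatency {
    /// Writer side: publish the duration as nanoseconds.
    pub fn record(&self, elapsed: Duration) {
        // Assumption of this sketch: clamp to >= 1 ns so 0 stays reserved for "unset".
        let nanos = (elapsed.as_nanos().min(u64::MAX as u128) as u64).max(1);
        self.0.store(nanos, Ordering::Release);
    }

    /// Reader side: `None` until a duration has been recorded.
    pub fn get(&self) -> Option<Duration> {
        match self.0.load(Ordering::Acquire) {
            0 => None,
            nanos => Some(Duration::from_nanos(nanos)),
        }
    }
}

/// Hypothetical stand-in for the stream wrapper: the clock starts on the first poll.
pub struct LazyTimer {
    started: Option<Instant>,
    slot: FirstItemLatency,
    done: bool,
}

impl LazyTimer {
    pub fn new(slot: FirstItemLatency) -> Self {
        Self { started: None, slot, done: false }
    }

    /// Call at the top of every poll; only the first call starts the clock.
    pub fn on_poll(&mut self) {
        self.started.get_or_insert_with(Instant::now);
    }

    /// Call when the first item is ready; records the latency exactly once.
    pub fn on_first_item(&mut self) {
        if let (false, Some(start)) = (self.done, self.started) {
            self.slot.record(start.elapsed());
            self.done = true;
        }
    }
}
```

Each request owns its own slot, so a reader never needs a lock or an index into a shared list to find the value.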

## Which issue does this PR close?

- Part of apache#20585

## Rationale for this change

String UDFs should preserve string representation where feasible. `translate` previously accepted Utf8View input but emitted Utf8, causing an unnecessary type downgrade. This aligns `translate` with the expected behavior of returning the same string type as its primary input.

## What changes are included in this PR?

1. Updated `translate` return type inference to emit Utf8View when input is Utf8View, while preserving existing behavior for Utf8 and LargeUtf8.
2. Refactored `translate` and `translate_with_map` to use explicit string builders (via a local `TranslateOutput` helper trait) instead of `.collect::<GenericStringArray<T>>()`, so the correct output array type is produced for each input type.
3. Added unit tests for Utf8View input (basic, null, non-ASCII) and sqllogictests verifying `arrow_typeof` output for all three string types.

## Are these changes tested?

Yes. Unit tests and sqllogictests are included.

## Are there any user-facing changes?

No.

…pache#20822)

## Which issue does this PR close?

- Related to apache#20603

## Rationale for this change

This PR enables Parquet row-level filter pushdown for struct field access expressions. These previously fell back to a full scan followed by a separate filtering pass, a significant performance penalty for queries filtering on struct fields in large Parquet files (like Variant types!).

Filters on struct fields like `WHERE s['foo'] > 67` were not being pushed into the Parquet decoder. This is because `PushdownChecker` sees that the underlying `Column("s")` has a `Struct` type and unconditionally rejects it, without considering that `get_field` resolves to a primitive leaf.

With this change, deeply nested access like `s['outer']['inner']` will also get pushed down, because the logical simplifier flattens it before it reaches the physical plan.

Note: this does not address the projection side and should not be blocked by it. `SELECT s['foo']` still reads the entire struct rather than just the needed leaf column. That requires separate changes to how the opener builds its projection mask.

…bytes (apache#20719)

## Which issue does this PR close?

- Closes apache#19569.

## Rationale for this change

This was the last remaining usage as far as I can see, so I've changed it. I think this is not on the hot path, so if you prefer we can close the PR and the issue with it.

## What changes are included in this PR?

Instead of using a `write!` format string, write hex using a constant char mapping.

## Are these changes tested?

Ran the debug display tests:

```
running 1 test
test scalar::tests::test_binary_display ... ok

test result: ok. 1 passed; 0 failed; 0 ignored; 0 measured; 364 filtered out; finished in 0.00s
```

## Are there any user-facing changes?

No
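A constant-table hex writer of the kind described above can be sketched as follows. The names `HEX_CHARS_UPPER` and `write_hex` are hypothetical, and the choice of uppercase digits is an assumption of this example rather than something the commit message states.

```rust
use std::fmt::{self, Write};

/// Lookup table mapping a nibble (0..=15) to its hex digit.
/// Uppercase is an assumption of this sketch.
const HEX_CHARS_UPPER: &[u8; 16] = b"0123456789ABCDEF";

/// Write `bytes` as hex into any `fmt::Write` sink without going through
/// the formatting machinery for each byte.
pub fn write_hex<W: Write>(out: &mut W, bytes: &[u8]) -> fmt::Result {
    for &b in bytes {
        // High nibble first, then low nibble.
        out.write_char(HEX_CHARS_UPPER[(b >> 4) as usize] as char)?;
        out.write_char(HEX_CHARS_UPPER[(b & 0x0f) as usize] as char)?;
    }
    Ok(())
}
```

Writing `[0x00, 0xAB, 0x7F]` into a `String` this way yields `00AB7F`, the same text that formatting each byte with `{:02X}` would produce.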

## Which issue does this PR close?

- Closes apache#16281

## Rationale for this change

The Substrait sqllogictest was failing for subqueries.

```
query failed: DataFusion error: This feature is not implemented: Cannot convert <subquery> to Substrait
```

## What changes are included in this PR?

- Added support for `ScalarSubquery` and `Exists` expressions in the Substrait producer.

## Are these changes tested?

Yes

## Are there any user-facing changes?

…er proves zero selectivity (apache#20743)

## Which issue does this PR close?

- Closes apache#20742

## Rationale for this change

- See apache#20742

## What changes are included in this PR?

In `collect_new_statistics`, when a filter proves no rows can match, use a typed null (e.g., `ScalarValue::Int32(None)`) instead of the untyped `ScalarValue::Null` for column min/max/sum values. The column's data type is looked up from the schema so that downstream interval analysis can still intersect intervals of the same type.

## Are these changes tested?

Added one test case.

## Are there any user-facing changes?

…verhead (apache#20623)

## Which issue does this PR close?

- Closes apache#20622.

## Rationale for this change

Several array set operations (e.g., `array_distinct`, `array_union`, `array_intersect`, `array_except`) share a similar structure:

* Convert the input(s) using `RowConverter`, ideally in bulk
* Apply the set operation as appropriate, which involves adding or removing elements from the candidate set of result `Rows`
* Convert the final set of `Rows` back into `ArrayRef`

We can do better for the final step: instead of converting from `Rows` back into `ArrayRef`, we can just track which indices in the input(s) correspond to the values we want to return. We can then grab those values with a single `take`, which avoids the `Row` -> `ArrayRef` deserialization overhead. This is a 5-20% performance win, depending on the set operation and the characteristics of the input.

The only wrinkle is that for `intersect` and `union`, because there are multiple inputs, we need to concatenate the inputs together so that we have a single index space. It turns out that this optimization is a win, even incurring the `concat` overhead.

## What changes are included in this PR?

* Add a benchmark for `array_except`
* Implement this optimization for `array_distinct`, `array_union`, `array_intersect`, `array_except`

## Are these changes tested?

Yes, and benchmarked.

## Are there any user-facing changes?

No.
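The index-tracking idea can be illustrated on plain slices. This is a sketch under stated assumptions: `distinct_indices`, `take`, and `union_via_indices` are hypothetical helpers, a `HashSet` over the values stands in for the set of converted `Rows`, and the `take` here is a simple gather rather than Arrow's `take` kernel.

```rust
use std::collections::HashSet;
use std::hash::Hash;

/// Indices of the first occurrence of each distinct value, in input order.
pub fn distinct_indices<T: Eq + Hash>(values: &[T]) -> Vec<usize> {
    let mut seen = HashSet::new();
    values
        .iter()
        .enumerate()
        // `insert` returns true only the first time a value is seen.
        .filter(|(_, v)| seen.insert(*v))
        .map(|(i, _)| i)
        .collect()
}

/// Gather the values at `indices` in one pass (stand-in for a `take` kernel).
pub fn take<T: Clone>(values: &[T], indices: &[usize]) -> Vec<T> {
    indices.iter().map(|&i| values[i].clone()).collect()
}

/// Union of two inputs: concatenate first so both share one index space,
/// then keep the first occurrence of each value.
pub fn union_via_indices<T: Eq + Hash + Clone>(left: &[T], right: &[T]) -> Vec<T> {
    let combined: Vec<T> = left.iter().chain(right).cloned().collect();
    let indices = distinct_indices(&combined);
    take(&combined, &indices)
}
```

The set operation only decides which positions survive; the output is then built directly from the original values, so nothing has to be decoded back from an intermediate representation.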

…che#20842) Bumps [taiki-e/install-action](https://github.com/taiki-e/install-action) from 2.68.16 to 2.68.25.

Release notes, sourced from [taiki-e/install-action's releases](https://github.com/taiki-e/install-action/releases), with dates from [taiki-e/install-action's changelog](https://github.com/taiki-e/install-action/blob/main/CHANGELOG.md) where listed:

2.68.25 (2026-03-08)
- Update `zizmor@latest` to 1.23.1.
- Update `tombi@latest` to 0.9.4.
- Update `cargo-semver-checks@latest` to 0.47.0.

2.68.24 (2026-03-08)
- Avoid triggering [zizmor ref-confusion](https://docs.zizmor.sh/audits/#ref-confusion) when using this action in form of `uses: taiki-e/install-action@v2` or `uses: taiki-e/install-action@<tool_name>`.

2.68.23 (2026-03-08)
- Update `zizmor@latest` to 1.23.0.
- Update `tombi@latest` to 0.9.3.
- Update `mise@latest` to 2026.3.5.

2.68.22 (2026-03-07)
- Update `release-plz@latest` to 0.3.157.
- Update `cargo-binstall@latest` to 1.17.7.
- Update `mise@latest` to 2026.3.4.

2.68.21 (2026-03-07)
- Update `tombi@latest` to 0.9.2.
- Update `uv@latest` to 0.10.9.
- Update `rclone@latest` to 1.73.2.
- Update `cargo-sort@latest` to 2.1.1.

2.68.20
- Update `tombi@latest` to 0.9.1.
- Update `cargo-neat@latest` to 0.3.2.

2.68.19
- Update `mise@latest` to 2026.3.3.
- Update `cargo-auditable@latest` to 0.7.4.
- Update `cargo-sort@latest` to 2.1.0.

2.68.18
- Update `uv@latest` to 0.10.8.
- Update `grcov@latest` to 0.10.7.

... (truncated)

Commits:
- [`a37010d`](https://github.com/taiki-e/install-action/commit/a37010ded18ff788be4440302bd6830b1ae50d8b) Release 2.68.25
- [`ffc2b1c`](https://github.com/taiki-e/install-action/commit/ffc2b1c2fffc4b45c7ec734311b1717dc5b6c320) Update `zizmor@latest` to 1.23.1
- [`8f3b52a`](https://github.com/taiki-e/install-action/commit/8f3b52a0c24fe848f8b36e972306155b0f668242) Update `tombi@latest` to 0.9.4
- [`df9c07a`](https://github.com/taiki-e/install-action/commit/df9c07a392aaa960ce05346ab056a01a0d3b4dd0) Update `cargo-semver-checks@latest` to 0.47.0
- [`3c19ebd`](https://github.com/taiki-e/install-action/commit/3c19ebdd96e392c121ba23d56d739b7c23e79dc1) zizmor: Enable ref-confusion
- [`b18b9d9`](https://github.com/taiki-e/install-action/commit/b18b9d93a43496aeda12369e7563d9251abc2fe1) Release 2.68.24
- [`5ccf629`](https://github.com/taiki-e/install-action/commit/5ccf6295e62df96d2686cb3c579301c6d3da6a89) codegen: Avoid allocation in workspace_root()
- [`93ea0b3`](https://github.com/taiki-e/install-action/commit/93ea0b33c357ab5e56584967e551b351d558ff99) Avoid triggering zizmor ref-confusion
- [`7c8485f`](https://github.com/taiki-e/install-action/commit/7c8485f1068cb2274a4b805d3d8ec77237d4fdf3) Update script and CI config
- [`fc2a2b3`](https://github.com/taiki-e/install-action/commit/fc2a2b349fea94690f6a04dcec522c55f51fe2fd) Release 2.68.23
- Additional commits viewable in [compare view](https://github.com/taiki-e/install-action/compare/d6e286fa45544157a02d45a43742857ebbc25d12...a37010ded18ff788be4440302bd6830b1ae50d8b)

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

…20843) Bumps [github/codeql-action](https://github.com/github/codeql-action) from 4.32.5 to 4.32.6.

Release notes, sourced from [github/codeql-action's releases](https://github.com/github/codeql-action/releases):

v4.32.6
- Update default CodeQL bundle version to [2.24.3](https://github.com/github/codeql-action/releases/tag/codeql-bundle-v2.24.3). [#3548](https://redirect.github.com/github/codeql-action/pull/3548)

Changelog, sourced from [github/codeql-action's changelog](https://github.com/github/codeql-action/blob/main/CHANGELOG.md). See the [releases page](https://github.com/github/codeql-action/releases) for the relevant changes to the CodeQL CLI and language packs.

[UNRELEASED]
- Fixed [a bug](https://redirect.github.com/github/codeql-action/issues/3555) which caused the CodeQL Action to fail loading repository properties if a "Multi select" repository property was configured for the repository. [#3557](https://redirect.github.com/github/codeql-action/pull/3557)
- The CodeQL Action now loads [custom repository properties](https://docs.github.com/en/organizations/managing-organization-settings/managing-custom-properties-for-repositories-in-your-organization) on GitHub Enterprise Server, enabling the customization of features such as `github-codeql-disable-overlay` that was previously only available on GitHub.com. [#3559](https://redirect.github.com/github/codeql-action/pull/3559)

4.32.6 - 05 Mar 2026
- Update default CodeQL bundle version to [2.24.3](https://github.com/github/codeql-action/releases/tag/codeql-bundle-v2.24.3). [#3548](https://redirect.github.com/github/codeql-action/pull/3548)

4.32.5 - 02 Mar 2026
- Repositories owned by an organization can now set up the `github-codeql-disable-overlay` custom repository property to disable [improved incremental analysis for CodeQL](https://redirect.github.com/github/roadmap/issues/1158). First, create a custom repository property with the name `github-codeql-disable-overlay` and the type "True/false" in the organization's settings. Then in the repository's settings, set this property to `true` to disable improved incremental analysis. For more information, see [Managing custom properties for repositories in your organization](https://docs.github.com/en/organizations/managing-organization-settings/managing-custom-properties-for-repositories-in-your-organization). This feature is not yet available on GitHub Enterprise Server. [#3507](https://redirect.github.com/github/codeql-action/pull/3507)
- Added an experimental change so that when [improved incremental analysis](https://redirect.github.com/github/roadmap/issues/1158) fails on a runner (potentially due to insufficient disk space), the failure is recorded in the Actions cache so that subsequent runs will automatically skip improved incremental analysis until something changes (e.g. a larger runner is provisioned or a new CodeQL version is released). We expect to roll this change out to everyone in March. [#3487](https://redirect.github.com/github/codeql-action/pull/3487)
- The minimum memory check for improved incremental analysis is now skipped for CodeQL 2.24.3 and later, which has reduced peak RAM usage. [#3515](https://redirect.github.com/github/codeql-action/pull/3515)
- Reduced log levels for best-effort private package registry connection check failures to reduce noise from workflow annotations. [#3516](https://redirect.github.com/github/codeql-action/pull/3516)
- Added an experimental change which lowers the minimum disk space requirement for [improved incremental analysis](https://redirect.github.com/github/roadmap/issues/1158), enabling it to run on standard GitHub Actions runners. We expect to roll this change out to everyone in March. [#3498](https://redirect.github.com/github/codeql-action/pull/3498)
- Added an experimental change which allows the `start-proxy` action to resolve the CodeQL CLI version from feature flags instead of using the linked CLI bundle version. We expect to roll this change out to everyone in March. [#3512](https://redirect.github.com/github/codeql-action/pull/3512)
- The previously experimental changes from versions 4.32.3, 4.32.4, 3.32.3 and 3.32.4 are now enabled by default. [#3503](https://redirect.github.com/github/codeql-action/pull/3503), [#3504](https://redirect.github.com/github/codeql-action/pull/3504)

4.32.4 - 20 Feb 2026
- Update default CodeQL bundle version to [2.24.2](https://github.com/github/codeql-action/releases/tag/codeql-bundle-v2.24.2). [#3493](https://redirect.github.com/github/codeql-action/pull/3493)
- Added an experimental change which improves how certificates are generated for the authentication proxy that is used by the CodeQL Action in Default Setup when [private package registries are configured](https://docs.github.com/en/code-security/how-tos/secure-at-scale/configure-organization-security/manage-usage-and-access/giving-org-access-private-registries). This is expected to generate more widely compatible certificates and should have no impact on analyses which are working correctly already. We expect to roll this change out to everyone in February. [#3473](https://redirect.github.com/github/codeql-action/pull/3473)
- When the CodeQL Action is run [with debugging enabled in Default Setup](https://docs.github.com/en/code-security/how-tos/scan-code-for-vulnerabilities/troubleshooting/troubleshooting-analysis-errors/logs-not-detailed-enough#creating-codeql-debugging-artifacts-for-codeql-default-setup) and [private package registries are configured](https://docs.github.com/en/code-security/how-tos/secure-at-scale/configure-organization-security/manage-usage-and-access/giving-org-access-private-registries), the "Setup proxy for registries" step will output additional diagnostic information that can be used for troubleshooting. [#3486](https://redirect.github.com/github/codeql-action/pull/3486)
- Added a setting which allows the CodeQL Action to enable network debugging for Java programs. This will help GitHub staff support customers with troubleshooting issues in GitHub-managed CodeQL workflows, such as Default Setup. This setting can only be enabled by GitHub staff. [#3485](https://redirect.github.com/github/codeql-action/pull/3485)
- Added a setting which enables GitHub-managed workflows, such as Default Setup, to use a [nightly CodeQL CLI release](https://github.com/dsp-testing/codeql-cli-nightlies) instead of the latest, stable release that is used by default. This will help GitHub staff support customers whose analyses for a given repository or organization require early access to a change in an upcoming CodeQL CLI release. This setting can only be enabled by GitHub staff. [#3484](https://redirect.github.com/github/codeql-action/pull/3484)

4.32.3 - 13 Feb 2026
- Added experimental support for testing connections to [private package registries](https://docs.github.com/en/code-security/how-tos/secure-at-scale/configure-organization-security/manage-usage-and-access/giving-org-access-private-registries). This feature is not currently enabled for any analysis. In the future, it may be enabled by default for Default Setup. [#3466](https://redirect.github.com/github/codeql-action/pull/3466)

4.32.2 - 05 Feb 2026
- Update default CodeQL bundle version to [2.24.1](https://github.com/github/codeql-action/releases/tag/codeql-bundle-v2.24.1). [#3460](https://redirect.github.com/github/codeql-action/pull/3460)

4.32.1 - 02 Feb 2026
- A warning is now shown in Default Setup workflow logs if a [private package registry is configured](https://docs.github.com/en/code-security/how-tos/secure-at-scale/configure-organization-security/manage-usage-and-access/giving-org-access-private-registries) using a GitHub Personal Access Token (PAT), but no username is configured. [#3422](https://redirect.github.com/github/codeql-action/pull/3422)
- Fixed a bug which caused the CodeQL Action to fail when repository properties cannot successfully be retrieved. [#3421](https://redirect.github.com/github/codeql-action/pull/3421)

4.32.0 - 26 Jan 2026
- Update default CodeQL bundle version to [2.24.0](https://github.com/github/codeql-action/releases/tag/codeql-bundle-v2.24.0). [#3425](https://redirect.github.com/github/codeql-action/pull/3425)

4.31.11 - 23 Jan 2026

... (truncated)

Commits:
- [`0d579ff`](https://github.com/github/codeql-action/commit/0d579ffd059c29b07949a3cce3983f0780820c98) Merge pull request [#3551](https://redirect.github.com/github/codeql-action/issues/3551) from github/update-v4.32.6-72d2d850d
- [`d4c6be7`](https://github.com/github/codeql-action/commit/d4c6be7cf1c47a33a06fa9183269e133e6863574) Update changelog for v4.32.6
- [`72d2d85`](https://github.com/github/codeql-action/commit/72d2d850d1f91d4e1e024f4cf4276fd16bb68462) Merge pull request [#3548](https://redirect.github.com/github/codeql-action/issues/3548) from github/update-bundle/codeql-bundle-v2.24.3
- [`23f983c`](https://github.com/github/codeql-action/commit/23f983ce00d9a853697a6aaa9eae8d5abbf14849) Merge pull request [#3544](https://redirect.github.com/github/codeql-action/issues/3544) from github/dependabot/github_actions/dot-github/wor...
- [`832e97c`](https://github.com/github/codeql-action/commit/832e97ccad228ef72e06ffee26f6251bceeb7e5f) Merge pull request [#3545](https://redirect.github.com/github/codeql-action/issues/3545) from github/dependabot/github_actions/dot-github/wor...
- [`5ef38c0`](https://github.com/github/codeql-action/commit/5ef38c0b13c2f0f5ce928cb7706f5fb19fc97ae2) Merge pull request [#3546](https://redirect.github.com/github/codeql-action/issues/3546) from github/dependabot/npm_and_yarn/tar-7.5.10
- [`80c9cda`](https://github.com/github/codeql-action/commit/80c9cda73902bba67939606c4bf3a1d9606bb150) Add changelog note
- [`f2669dd`](https://github.com/github/codeql-action/commit/f2669dd916c673b2811839169929a8ba71bb7634) Update default bundle to codeql-bundle-v2.24.3
- [`bd03c44`](https://github.com/github/codeql-action/commit/bd03c44cf40965f5476f66fad404194e4cb35710) Merge branch 'main' into dependabot/github_actions/dot-github/workflows/actio...
- [`102d762`](https://github.com/github/codeql-action/commit/102d7627b63c066871badf0743c11b2f6dd9c9e9) Bump tar from 7.5.7 to 7.5.10
- Additional commits viewable in [compare view](https://github.com/github/codeql-action/compare/c793b717bc78562f491db7b0e93a3a178b099162...0d579ffd059c29b07949a3cce3983f0780820c98)

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

## Which issue does this PR close?

- Closes #.

## Rationale for this change

```
Crate: paste
Version: 1.0.15
Warning: unmaintained
Title: paste - no longer maintained
Date: 2024-10-07
ID: RUSTSEC-2024-0436
```

We also need to remove `paste` from the project.

## What changes are included in this PR?

## Are these changes tested?

## Are there any user-facing changes?

…he#20231)

## Which issue does this PR close?

Closes apache#20194

## Rationale for this change

A query with `ROW_NUMBER() OVER (... ORDER BY CASE WHEN col='0' THEN 1 ELSE 0 END)` combined with a filter `nvl(t2.value_2_3,'0')='0'` fails with a `SanityCheckPlan` error. This worked in 50.3.0 but broke in 52.1.0.

## What changes are included in this PR?

**Root cause**: `collect_columns_from_predicate_inner` was extracting equality pairs where neither side was a `Column` (e.g. `nvl(col, '0') = '0'`), creating equivalence classes between complex expressions and literals. `normalize_expr`'s deep traversal would then replace the literal `'0'` inside unrelated sort/window CASE WHEN expressions with the complex NVL expression, corrupting the sort ordering and causing a mismatch between `SortExec`'s reported output ordering and `BoundedWindowAggExec`'s expected ordering.

**Fix** (two changes in `filter.rs`):

1. **`collect_columns_from_predicate_inner`**: Only extract equality pairs where at least one side is a `Column` reference. This matches the function's documented intent ("Column-Pairs") and prevents complex-expression-to-literal equivalence classes from being created.
2. **`extend_constants`**: Recognize `Literal` expressions as inherently constant (previously only checked `is_expr_constant` on the input's equivalence properties, which doesn't know about literals). This ensures constant propagation still works for `complex_expr = literal` predicates; e.g. `nvl(col, '0')` is properly marked as constant after the filter.

## How was this tested?

- Unit test `test_collect_columns_skips_non_column_pairs` verifying the filtering logic
- Sqllogictest reproducing the exact query from the issue
- Full test suites: equivalence tests (51 passed), physical-plan tests (1255 passed), physical-optimizer tests (20 passed)
- Manual verification with datafusion-cli running the reproduction query

## Test plan

- [x] Unit test for `collect_columns_from_predicate_inner` column filtering
- [x] Sqllogictest regression test for apache#20194
- [x] Existing test suites pass
- [x] Manual reproduction query succeeds

---------

Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>
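The two rules in the fix can be illustrated on a toy expression tree. Everything here is hypothetical and deliberately simplified: the `Expr` enum, `collect_equal_pairs`, and `is_inherently_constant` are stand-ins written for this sketch and do not mirror DataFusion's physical expression types or the actual code in `filter.rs`.

```rust
/// Toy expression tree (hypothetical, not DataFusion's `PhysicalExpr`).
#[derive(Debug, Clone, PartialEq)]
pub enum Expr {
    Column(String),
    Literal(String),
    Call { name: String, args: Vec<Expr> },
    Eq(Box<Expr>, Box<Expr>),
    And(Box<Expr>, Box<Expr>),
}

fn is_column(expr: &Expr) -> bool {
    matches!(expr, Expr::Column(_))
}

/// Rule 1: walk a conjunction and keep only equality pairs where at least
/// one side is a plain column reference.
pub fn collect_equal_pairs<'a>(predicate: &'a Expr, out: &mut Vec<(&'a Expr, &'a Expr)>) {
    match predicate {
        Expr::And(left, right) => {
            collect_equal_pairs(left, out);
            collect_equal_pairs(right, out);
        }
        Expr::Eq(left, right) => {
            // `nvl(col, '0') = '0'` is skipped: neither side is a column.
            if is_column(left) || is_column(right) {
                out.push((left.as_ref(), right.as_ref()));
            }
        }
        _ => {}
    }
}

/// Rule 2: a literal is constant regardless of what is known about the input.
pub fn is_inherently_constant(expr: &Expr) -> bool {
    matches!(expr, Expr::Literal(_))
}
```

For a predicate such as `a = '1' AND nvl(a, '0') = '0'`, only the first equality is collected as a pair, which is what keeps a complex expression from being substituted for a literal elsewhere in the plan.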

Bumps [quinn-proto](https://github.com/quinn-rs/quinn) from 0.11.13 to 0.11.14.

Release notes, sourced from [quinn-proto's releases](https://github.com/quinn-rs/quinn/releases):

> ## quinn-proto 0.11.14
>
> [@jxs](https://github.com/jxs) reported a denial of service issue in quinn-proto 5 days ago:
>
> - https://github.com/quinn-rs/quinn/security/advisories/GHSA-6xvm-j4wr-6v98
>
> We coordinated with them to release this version to patch the issue. Unfortunately the maintainers missed these issues during code review and we did not have enough fuzzing coverage -- we regret the oversight and have added an additional fuzzing target.
>
> Organizations that want to participate in coordinated disclosure can contact us privately to discuss terms.
>
> ## What's Changed
>
> - Fix over-permissive proto dependency edge by [@Ralith](https://github.com/Ralith) in [quinn-rs/quinn#2385](https://redirect.github.com/quinn-rs/quinn/pull/2385)
> - 0.11.x: avoid unwrapping VarInt decoding during parameter parsing by [@djc](https://github.com/djc) in [quinn-rs/quinn#2559](https://redirect.github.com/quinn-rs/quinn/pull/2559)

Commits:

- [`2c315aa`](https://github.com/quinn-rs/quinn/commit/2c315aa7f9c2a6c1db87f8f51f40623a427c78fd) proto: bump version to 0.11.14
- [`8ad47f4`](https://github.com/quinn-rs/quinn/commit/8ad47f431e7deb82c08b09c2e33ef85aa88fd212) Use newer rustls-pki-types PEM parser API
- [`c81c028`](https://github.com/quinn-rs/quinn/commit/c81c0289abe30d8437ccbf9b6304e2bc9c707cea) ci: fix workflow syntax
- [`0050172`](https://github.com/quinn-rs/quinn/commit/0050172969f7e69e136c433181330da7790d8d73) ci: pin wasm-bindgen-cli version
- [`8a6f82c`](https://github.com/quinn-rs/quinn/commit/8a6f82c58d1c565eab78f986e614223e6ed76a85) Take semver-compatible dependency updates
- [`e52db4a`](https://github.com/quinn-rs/quinn/commit/e52db4ad8df0f9720e7b0e32ecc0e48c9a93de0f) Apply suggestions from clippy 1.91
- [`6df7275`](https://github.com/quinn-rs/quinn/commit/6df7275c582ca9b7225e0ccf9f9871a55eb73155) chore: Fix `unnecessary_unwrap` clippy
- [`c8eefa0`](https://github.com/quinn-rs/quinn/commit/c8eefa07e087b06d8f2b78ff262ce8ac952994f1) proto: avoid unwrapping varint decoding during parameters parsing
- [`9723a97`](https://github.com/quinn-rs/quinn/commit/9723a977754c8662001b0fef97aab8f3ddf1df92) fuzz: add fuzzing target for parsing transport parameters
- [`eaf0ef3`](https://github.com/quinn-rs/quinn/commit/eaf0ef30252cef4acec21f150427e604cd4271c9) Fix over-permissive proto dependency edge ([#2385](https://redirect.github.com/quinn-rs/quinn/issues/2385))
- Additional commits viewable in [compare view](https://github.com/quinn-rs/quinn/compare/quinn-proto-0.11.13...quinn-proto-0.11.14)

Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

## Which issue does this PR close? - Closes apache#20841 ## Rationale for this change We want to split IO and CPU to allow for more (NUMA-aware) parallelism and utilizing IO and CPU better. This allows for e.g. more coalescing, prefetching, parallel IO, more parallel / incremental decoding etc. Also this allows doing morsels only on a CPU level and not doing IO multiple times for each morsel. ## What changes are included in this PR? Just refactor `ParquetOpener` to use `ParquetPushDecoder`. I used claude to rewrite it and to keep changes small. ## Are these changes tested? Existing tests. Nothing should change, the arrow-rs code also uses `ParquetPushDecoder`. ## Are there any user-facing changes? --------- Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
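The IO/CPU split described in this commit message can be illustrated with a toy push-style decoder. This is only a sketch under stated assumptions: `PushDecoder`, `DecodeStep`, and `decode_all` are hypothetical names invented here, and the code does not use or reproduce the actual `ParquetPushDecoder` API from arrow-rs. The point it shows is that the decoder never performs IO; it only says which byte range it needs, and a separate driver decides how to fetch it.

```rust
use std::collections::VecDeque;
use std::ops::Range;

/// What the decoder reports back to its driver (hypothetical type).
enum DecodeStep {
    /// The decoder cannot proceed until the driver supplies these bytes.
    NeedsData(Range<usize>),
    /// One decoded unit of output.
    Data(Vec<u8>),
    /// Nothing left to decode.
    Finished,
}

/// Hypothetical push decoder: CPU-only, it never reads the file itself.
struct PushDecoder {
    /// Byte ranges still to decode, one per unit of work.
    pending: VecDeque<Range<usize>>,
    /// Bytes the driver pushed for the range at the front of `pending`.
    buffered: Option<Vec<u8>>,
}

impl PushDecoder {
    fn new(ranges: Vec<Range<usize>>) -> Self {
        Self { pending: ranges.into(), buffered: None }
    }

    /// The driver hands over the bytes it fetched for the requested range.
    fn push(&mut self, bytes: Vec<u8>) {
        self.buffered = Some(bytes);
    }

    /// Either request bytes or decode what was pushed; no IO happens here.
    fn try_decode(&mut self) -> DecodeStep {
        let range = match self.pending.front().cloned() {
            Some(range) => range,
            None => return DecodeStep::Finished,
        };
        match self.buffered.take() {
            None => DecodeStep::NeedsData(range),
            Some(bytes) => {
                self.pending.pop_front();
                // Placeholder "decoding": uppercase the bytes.
                DecodeStep::Data(bytes.iter().map(|b| b.to_ascii_uppercase()).collect())
            }
        }
    }
}

/// Driver: owns all IO. A real driver could coalesce or prefetch ranges here.
fn decode_all(file: &[u8], ranges: Vec<Range<usize>>) -> Vec<Vec<u8>> {
    let mut decoder = PushDecoder::new(ranges);
    let mut out = Vec::new();
    loop {
        match decoder.try_decode() {
            DecodeStep::NeedsData(range) => decoder.push(file[range].to_vec()),
            DecodeStep::Data(batch) => out.push(batch),
            DecodeStep::Finished => return out,
        }
    }
}
```

Because the fetch happens only in `decode_all`, the IO strategy can change without touching the decoding logic.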

…ache#20780) ## Which issue does this PR close? - Closes #. ## Rationale for this change ClickBench queries (Q7, Q15, Q16, Q18) have some redundant projections for sorting based on count. Probably not a (measurable) improvement, but the plan looks better (in case of non-TopK it could probably be measurable). ## What changes are included in this PR? ## Are these changes tested? Existing tests. ## Are there any user-facing changes? --------- Co-authored-by: Claude <noreply@anthropic.com>

## Which issue does this PR close?  - Closes #. ## Rationale for this change  Besides pushing `LimitExec` down the query plan, there is another optimization that allows plan nodes to *absorb* a limit, so it can potentially stop early. I’ve noticed that this form of limit absorption has not been implemented by many operators. This suggests the optimization is non-obvious, so I’d like to improve the documentation for it. A recent PR that implements this optimization is: - apache#20228 ## What changes are included in this PR?  ## Are these changes tested?  ## Are there any user-facing changes?
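To make the idea of an operator absorbing a limit concrete, here is a minimal standalone sketch. It does not use DataFusion's operator traits: `FilterOp`, `with_fetch`, and `rows_pulled` are hypothetical names chosen for illustration, and rows are plain `i64` values rather than record batches. The operator remembers how many rows it may still emit and stops pulling from its input once that budget is used up.

```rust
/// Hypothetical filter operator over a row iterator.
struct FilterOp<I, P> {
    input: I,
    predicate: P,
    /// Absorbed limit: how many rows may still be emitted, if a limit was set.
    remaining: Option<usize>,
    /// Rows pulled from the input, tracked only to make early termination visible.
    rows_pulled: usize,
}

impl<I, P> FilterOp<I, P>
where
    I: Iterator<Item = i64>,
    P: Fn(&i64) -> bool,
{
    fn new(input: I, predicate: P) -> Self {
        Self { input, predicate, remaining: None, rows_pulled: 0 }
    }

    /// Absorb a limit into the operator instead of placing a separate limit above it.
    fn with_fetch(mut self, fetch: Option<usize>) -> Self {
        self.remaining = fetch;
        self
    }
}

impl<I, P> Iterator for FilterOp<I, P>
where
    I: Iterator<Item = i64>,
    P: Fn(&i64) -> bool,
{
    type Item = i64;

    fn next(&mut self) -> Option<i64> {
        // Stop before touching the input once the absorbed limit is exhausted.
        if self.remaining == Some(0) {
            return None;
        }
        loop {
            let row = self.input.next()?;
            self.rows_pulled += 1;
            if (self.predicate)(&row) {
                if let Some(n) = self.remaining.as_mut() {
                    *n -= 1;
                }
                return Some(row);
            }
        }
    }
}
```

With a fetch of 3 over the even numbers in `1..=1_000_000`, this operator reads only six input rows before it stops.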

## Which issue does this PR close?  - Closes apache#20797 ## Rationale for this change - see apache#20797 ## What changes are included in this PR? impl ser/de for preserve_order in RepartitionExec ## Are these changes tested? add one test case ## Are there any user-facing changes?

…ache#20627) ## Summary - include synchronous `start_next_file()` / `FileOpener::open()` setup time in `time_elapsed_scanning_total` - keep existing `time_opening` and scanning timers lifecycle intact - avoid timer overlap by scoping the temporary timer before calling `time_scanning_total.start()` ## Details In `FileStreamState::Open`, `start_next_file()` is invoked before `time_scanning_total.start()`. If `open()` performs synchronous work before returning the future, that time was previously unaccounted for in `time_elapsed_scanning_total`. This change wraps the `start_next_file()` call in a scoped timer on the same `time_scanning_total` metric so the missing segment is recorded. - Fixes apache#20571 ## Validation I tested by reading CSV files via AWS S3. --------- Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>
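The scoped-timer approach described above can be sketched in isolation. This is an illustrative sketch, not the code from the change: `ElapsedMetric`, `ScopedTimer`, and `open_with_accounting` are hypothetical stand-ins, and the `start_next_file` below is a dummy that only sleeps to simulate synchronous setup work. The temporary timer is confined to an inner block, so its elapsed time is recorded on drop before any longer-lived timer on the same metric would start.

```rust
use std::cell::Cell;
use std::time::{Duration, Instant};

/// Hypothetical stand-in for a cumulative elapsed-time metric.
#[derive(Default)]
struct ElapsedMetric {
    total: Cell<Duration>,
}

impl ElapsedMetric {
    /// Start a scoped timer; its elapsed time is added when the guard drops.
    fn timer(&self) -> ScopedTimer<'_> {
        ScopedTimer { metric: self, start: Instant::now() }
    }

    fn total(&self) -> Duration {
        self.total.get()
    }
}

/// Guard that records elapsed time into its metric on drop.
struct ScopedTimer<'a> {
    metric: &'a ElapsedMetric,
    start: Instant,
}

impl Drop for ScopedTimer<'_> {
    fn drop(&mut self) {
        let elapsed = self.start.elapsed();
        self.metric.total.set(self.metric.total.get() + elapsed);
    }
}

/// Dummy synchronous setup, standing in for work done before a future is returned.
fn start_next_file() -> &'static str {
    std::thread::sleep(Duration::from_millis(5));
    "file opened"
}

fn open_with_accounting(scanning_total: &ElapsedMetric) -> &'static str {
    // The inner block bounds the temporary timer's lifetime: it is dropped,
    // and its time recorded, at the closing brace.
    let opened = {
        let _timer = scanning_total.timer();
        start_next_file()
    };
    // A longer-lived scanning timer could be started here without overlapping.
    opened
}
```

Binding the guard to `_timer` rather than `_` matters: a bare `_` pattern would drop the guard immediately and record almost nothing.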

## Which issue does this PR close?  - Closes #. ``` Crate: generational-arena Version: 0.2.9 Warning: unmaintained Title: `generational-arena` is unmaintained Date: 2024-02-11 ID: RUSTSEC-2024-0014 URL: https://rustsec.org/advisories/RUSTSEC-2024-0014 ``` ## Rationale for this change  ## What changes are included in this PR?  ## Are these changes tested?  ## Are there any user-facing changes?

## Which issue does this PR close?  - Closes #. ## Rationale for this change Move dependencies from main area to dev  ## What changes are included in this PR?  ## Are these changes tested?  ## Are there any user-facing changes?

…oder Resolve merge conflicts in opener.rs, clickbench.slt, and projection_pushdown.slt. Adapt the morsel-driven bloom filter pruning in open() to use a separate ParquetRecordBatchStreamBuilder (as upstream now does) since prune_by_bloom_filters requires that type, not the new ParquetPushDecoderBuilder. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

This reverts commit 4fc8edd.

google-labs-jules bot and others added 30 commits February 22, 2026 13:12

- Proto (`a67f9ac`)
- Proto (`d0da5da`)
- Fmt (`32eec3c`)
- Merge remote-tracking branch 'upstream/main' into parquet-morsel-driv… …en-execution-237164415184908839 (`5dc895c`)
- Proto (`cc73788`)
- Fix (`d517b5d`)
- Fix (`de1606d`)
- Clippy (`950f6db`)
- Refactor (`7f57317`)
- WIP (`fd6d7fd`)
- WIP (`37126bf`)
- WIP (`2d3c33e`)
- WIP (`98f0ea9`)
- WIP (`a389b02`)
- Update (`4065448`)
- Update (`415315d`)
- Config (`13b4977`)
- Test (`a30c3f8`)
- Refactor (`8b32ca8`)
- Update test (`876c296`)
- Update test (`d2df36b`)
- Autofix (`869b7d3`)
- Prune files (`67ea9ab`)
- Update test (`e845675`)
- Update test (`6885981`)
- Update morsel_driven (`3384b8f`)
- Update morsel_driven (`211d4fc`)
- fmt (`2db61f1`)
- move pruning (`c859d6a`)

timsaucer and others added 21 commits March 10, 2026 11:34

github-actions bot added functions development-process sql logical-expr ffi optimizer spark labels Mar 12, 2026

Dandandan added 2 commits March 12, 2026 17:57

- Add the exp. prefetching (`4fc8edd`)
- Revert "Add the exp. prefetching" (`b02a9c0`): This reverts commit 4fc8edd.

(test benchmark runnner) morsel driven execution#6
adriangb wants to merge 146 commits into adriangb:main from Dandandan:parquet-morsel-driven-execution-237164415184908839

adriangb commented Mar 9, 2026

20 participants
