(test benchmark runner) morsel driven execution #6
Open
adriangb wants to merge 146 commits into adriangb:main from
This PR implements morsel-driven execution for Parquet files in DataFusion, enabling row-group-level work sharing across partitions to mitigate data skew.

Key changes:

- Introduced `WorkQueue` in `datafusion/datasource/src/file_stream.rs` as a shared pool of work.
- Added a `morselize` method to the `FileOpener` trait so that files can be split dynamically into morsels.
- Implemented `morselize` for `ParquetOpener` to split files into individual row groups.
- Cached `ParquetMetaData` in `ParquetMorsel` extensions to avoid redundant I/O.
- Modified `FileStream` to support work stealing from the shared queue.
- Implemented a `Weak` pointer pattern for `WorkQueue` in `FileScanConfig` so that the plan can be re-executed.
- Added `MorselizingGuard` to keep shared state consistent on cancellation.
- Added an `allow_morsel_driven` configuration option (enabled by default for Parquet).
- Implemented row-group pruning during the morselization phase for better efficiency.

Tests:

- Added a `parquet_morsel_driven_execution` test to verify work distribution and re-executability.
- Added `parquet_morsel_driven_enabled_by_default` to verify the default configuration.

Co-authored-by: Dandandan <163737+Dandandan@users.noreply.github.com>
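The shared queue and the `Weak` handle described above can be sketched roughly as follows. This is a minimal, synchronous illustration under stated assumptions: `Morsel`, `ScanConfig`, `push_all`, and `work_queue` are hypothetical names, the queue is a plain `Mutex<VecDeque<_>>`, and the PR's real `WorkQueue`/`FileScanConfig` may be structured differently. The point is that the config holds only a `Weak` reference, so once every stream of one execution has dropped its `Arc`, the next execution starts from a fresh queue instead of a drained one.

```rust
use std::collections::VecDeque;
use std::sync::{Arc, Mutex, Weak};

/// Hypothetical unit of work: one row group of one file.
#[derive(Debug, Clone, PartialEq)]
pub struct Morsel {
    pub file: String,
    pub row_group: usize,
}

/// Shared pool of morsels that every partition's stream pulls from.
#[derive(Debug, Default)]
pub struct WorkQueue {
    morsels: Mutex<VecDeque<Morsel>>,
}

impl WorkQueue {
    /// Adds morsels produced by the (hypothetical) morselization step.
    pub fn push_all(&self, items: impl IntoIterator<Item = Morsel>) {
        self.morsels.lock().unwrap().extend(items);
    }

    /// Any partition, including one that finished its own files early,
    /// takes the next available morsel from here.
    pub fn pop(&self) -> Option<Morsel> {
        self.morsels.lock().unwrap().pop_front()
    }
}

/// Hypothetical stand-in for the scan config: it keeps only a `Weak`
/// handle, so it never keeps a finished execution's queue alive.
#[derive(Debug, Default)]
pub struct ScanConfig {
    shared: Mutex<Weak<WorkQueue>>,
}

impl ScanConfig {
    /// Returns the queue of the execution that is currently running, or
    /// creates a fresh one if no live stream still holds the previous queue.
    pub fn work_queue(&self) -> Arc<WorkQueue> {
        let mut slot = self.shared.lock().unwrap();
        if let Some(queue) = slot.upgrade() {
            return queue;
        }
        let queue = Arc::new(WorkQueue::default());
        *slot = Arc::downgrade(&queue);
        queue
    }
}
```

Because `Weak::upgrade` succeeds only while some stream still holds the `Arc`, all partitions of one execution share a single queue, while a later re-execution gets its own.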
…en-execution-237164415184908839
## Which issue does this PR close?

- Closes apache#17035

## Rationale for this change

Now that we have proper `FFI_ConfigOptions`, we can pass these to scalar UDFs via FFI.

## What changes are included in this PR?

Instead of passing default options, pass in config options converted from the input. As a drive-by cleanup, also switched to using `FFI_ColumnarValue`, since it is now available.

## Are these changes tested?

Unit test added.

## Are there any user-facing changes?

This is a breaking API change, but not one that users will interact with directly. It breaks the ABI for FFI libraries, which is currently unstable.
…cking in instrumentedObjectStore (apache#20802)

## Which issue does this PR close?

Related to apache#18138 but does not close any issue.

## Rationale for this change

`TimeToFirstItemStream` held an `Arc<Mutex<Vec<RequestDetails>>>` and a `request_index` so that it could write the duration back into the shared request list. I saw @alamb's and @BlakeOrth's review comments on apache#19127 about these improvements and wanted to make the change.

## What changes are included in this PR?

- Replace the `Arc<Mutex<Vec<RequestDetails>>>` + index in `TimeToFirstItemStream` with a per-request `Arc<AtomicU64>`
- Store the duration as nanoseconds in the `AtomicU64` (0 = not yet set) with Release/Acquire ordering
- Start the timer lazily on the first `poll_next` call instead of at stream creation, so only actual storage latency is measured

## Are these changes tested?

Covered by existing tests; I've also added a timing comparison.

## Are there any user-facing changes?

No
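The per-request timing slot described above can be sketched synchronously. This is an illustration under assumptions: it wraps an `Iterator` rather than an async `Stream`, so `next()` plays the role of `poll_next`, and `TimeToFirstItem` and `time_to_first_item` are hypothetical names rather than the instrumented object store's actual types. The writer stores with `Release` and readers load with `Acquire`; because 0 means "not yet set", the recorded value is clamped to at least 1 ns.

```rust
use std::sync::atomic::{AtomicU64, Ordering};
use std::sync::Arc;
use std::time::Instant;

/// Hypothetical iterator wrapper; the PR wraps an async stream instead.
pub struct TimeToFirstItem<I> {
    inner: I,
    /// Set on the first `next()` call, not at construction.
    started: Option<Instant>,
    recorded: bool,
    /// Per-request slot: nanoseconds to the first item, 0 = not yet set.
    ttfi_nanos: Arc<AtomicU64>,
}

impl<I> TimeToFirstItem<I> {
    pub fn new(inner: I, ttfi_nanos: Arc<AtomicU64>) -> Self {
        Self { inner, started: None, recorded: false, ttfi_nanos }
    }
}

impl<I: Iterator> Iterator for TimeToFirstItem<I> {
    type Item = I::Item;

    fn next(&mut self) -> Option<I::Item> {
        // Start the timer lazily so only time spent pulling items is measured.
        let start = *self.started.get_or_insert_with(Instant::now);
        let item = self.inner.next();
        if item.is_some() && !self.recorded {
            self.recorded = true;
            // Clamp to 1 so a very fast first item is not read back as "unset".
            let nanos = (start.elapsed().as_nanos() as u64).max(1);
            // Release pairs with the Acquire load in `time_to_first_item`.
            self.ttfi_nanos.store(nanos, Ordering::Release);
        }
        item
    }
}

/// Reader side: `None` until the first item has been observed.
pub fn time_to_first_item(slot: &AtomicU64) -> Option<u64> {
    match slot.load(Ordering::Acquire) {
        0 => None,
        nanos => Some(nanos),
    }
}
```

Each request owns its own `Arc<AtomicU64>`, so recording the duration needs neither a lock nor an index into a shared list.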
## Which issue does this PR close?

- Part of apache#20585

## Rationale for this change

String UDFs should preserve the input's string representation where feasible. `translate` previously accepted `Utf8View` input but emitted `Utf8`, causing an unnecessary type downgrade. This change aligns `translate` with the expected behavior of returning the same string type as its primary input.

## What changes are included in this PR?

1. Updated `translate` return type inference to emit `Utf8View` when the input is `Utf8View`, while preserving the existing behavior for `Utf8` and `LargeUtf8`.
2. Refactored `translate` and `translate_with_map` to use explicit string builders (via a local `TranslateOutput` helper trait) instead of `.collect::<GenericStringArray<T>>()`, so the correct output array type is produced for each input type.
3. Added unit tests for `Utf8View` input (basic, null, non-ASCII) and sqllogictests verifying `arrow_typeof` output for all three string types.

## Are these changes tested?

Yes. Unit tests and sqllogictests are included.

## Are there any user-facing changes?

No.
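The refactor above changes how the output array is built, not what `translate` computes. For reference, the character mapping that a `translate_with_map`-style function precomputes can be sketched over plain `&str`/`String`. This sketch assumes PostgreSQL-style `translate` semantics (a character in `from` with no counterpart in `to` is deleted, and only the first occurrence of a repeated `from` character counts); `build_translate_map` is a hypothetical helper, and the real implementation may differ, for example in how it segments characters. Arrow arrays and builders, where the `Utf8View` change actually lives, are deliberately left out.

```rust
use std::collections::HashMap;

/// Maps each character of `from` to its replacement in `to`, or to `None`
/// if `to` is shorter (meaning: delete the character).
pub fn build_translate_map(from: &str, to: &str) -> HashMap<char, Option<char>> {
    let to_chars: Vec<char> = to.chars().collect();
    let mut map = HashMap::new();
    for (i, c) in from.chars().enumerate() {
        // Keep only the first occurrence of a character in `from`.
        map.entry(c).or_insert_with(|| to_chars.get(i).copied());
    }
    map
}

/// Applies a precomputed map to one input string.
pub fn translate_with_map(input: &str, map: &HashMap<char, Option<char>>) -> String {
    let mut out = String::with_capacity(input.len());
    for c in input.chars() {
        match map.get(&c) {
            Some(Some(replacement)) => out.push(*replacement),
            Some(None) => {} // in `from` but past the end of `to`: drop it
            None => out.push(c), // not in `from`: keep as-is
        }
    }
    out
}
```

Building the map once and reusing it per row is what makes a precomputed-map variant worthwhile when `from` and `to` are constant across the batch.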
…pache#20822)

## Which issue does this PR close?

- Related to apache#20603

## Rationale for this change

This PR enables Parquet row-level filter pushdown for struct field access expressions. Previously these fell back to a full scan followed by a separate filtering pass, a significant performance penalty for queries filtering on struct fields in large Parquet files (like Variant types!).

Filters on struct fields like `WHERE s['foo'] > 67` were not being pushed into the Parquet decoder. This is because `PushdownChecker` sees that the underlying `Column("s")` has a `Struct` type and unconditionally rejects it, without considering that `get_field` resolves to a primitive leaf.

With this change, deeply nested access like `s['outer']['inner']` is also pushed down, because the logical simplifier flattens it before it reaches the physical plan.

Note: this does not address the projection side and should not be blocked by it. `SELECT s['foo']` still reads the entire struct rather than just the needed leaf column; fixing that requires separate changes to how the opener builds its projection mask.
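The pushdown decision described above can be sketched with simplified stand-in types. `DataType`, `Expr`, `resolve_type`, and `can_push_down` here are hypothetical; DataFusion's `PushdownChecker` works on physical expressions and Arrow schemas. The sketch follows the `get_field` chain down to the accessed leaf and accepts the filter only when that leaf is not itself a struct, whereas the old check rejected any reference to a struct-typed column.

```rust
/// Simplified stand-in for Arrow's `DataType`.
#[derive(Debug, Clone, PartialEq)]
pub enum DataType {
    Int64,
    Utf8,
    Struct(Vec<(String, DataType)>),
}

/// Simplified stand-in for a physical filter operand.
#[derive(Debug, Clone)]
pub enum Expr {
    Column(String),
    /// `get_field(base, name)`, i.e. `s['foo']` in SQL.
    GetField(Box<Expr>, String),
}

/// Resolves the type an expression produces, or `None` if a name is unknown.
fn resolve_type(expr: &Expr, schema: &[(String, DataType)]) -> Option<DataType> {
    match expr {
        Expr::Column(name) => schema
            .iter()
            .find(|(n, _)| n == name)
            .map(|(_, t)| t.clone()),
        Expr::GetField(base, field) => match resolve_type(base, schema)? {
            DataType::Struct(fields) => fields
                .into_iter()
                .find(|(n, _)| n == field)
                .map(|(_, t)| t),
            // Field access on a non-struct value cannot be resolved.
            _ => None,
        },
    }
}

/// Old rule: reject any filter touching a struct-typed column.
/// Sketched new rule: accept when the accessed leaf is not itself a struct.
pub fn can_push_down(expr: &Expr, schema: &[(String, DataType)]) -> bool {
    match resolve_type(expr, schema) {
        Some(DataType::Struct(_)) | None => false,
        Some(_) => true,
    }
}
```

Unlike the real planner, which relies on the logical simplifier to flatten nested access, this sketch simply resolves `GetField` chains recursively.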
…bytes (apache#20719)

## Which issue does this PR close?

- Closes apache#19569.

## Rationale for this change

As far as I can see, this was the last remaining usage, so I've changed it. I don't think this is on the hot path, so if you prefer, we can close both this PR and the issue.

## What changes are included in this PR?

Instead of using `write!` with a format string, write the hex digits using a constant character mapping.

## Are these changes tested?

Ran the debug display tests:

```
running 1 test
test scalar::tests::test_binary_display ... ok

test result: ok. 1 passed; 0 failed; 0 ignored; 0 measured; 364 filtered out; finished in 0.00s
```

## Are there any user-facing changes?

No
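A constant-lookup-table hex encoder of the kind described above might look like this. It is a sketch: `hex_encode` and `HEX_DIGITS` are illustrative names, lowercase digits are an assumption, and the actual display code writes into a formatter rather than returning a `String`. Each byte is split into its high and low nibble, and each nibble indexes the table directly, avoiding a `write!(f, "{:02x}", b)` call (and the formatting machinery behind it) per byte.

```rust
/// Lookup table mapping a nibble (0..=15) to its lowercase hex digit.
const HEX_DIGITS: &[u8; 16] = b"0123456789abcdef";

/// Encodes bytes as two lowercase hex digits each.
pub fn hex_encode(bytes: &[u8]) -> String {
    let mut out = String::with_capacity(bytes.len() * 2);
    for &b in bytes {
        out.push(HEX_DIGITS[(b >> 4) as usize] as char); // high nibble
        out.push(HEX_DIGITS[(b & 0x0f) as usize] as char); // low nibble
    }
    out
}
```

The output matches `format!("{:02x}", b)` applied to each byte, so the table-driven version can be checked against the formatted one in a test.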
## Which issue does this PR close?

- Closes apache#16281

## Rationale for this change

The Substrait sqllogictests were failing for queries containing subqueries:

```
query failed: DataFusion error: This feature is not implemented: Cannot convert <subquery> to Substrait
```

## What changes are included in this PR?

- Added support for `ScalarSubquery` and `Exists` expressions in the Substrait producer.

## Are these changes tested?

Yes

## Are there any user-facing changes?
…er proves zero selectivity (apache#20743)

## Which issue does this PR close?

- Closes apache#20742

## Rationale for this change

- See apache#20742

## What changes are included in this PR?

In `collect_new_statistics`, when a filter proves no rows can match, use a typed null (e.g., `ScalarValue::Int32(None)`) instead of the untyped `ScalarValue::Null` for column min/max/sum values. The column's data type is looked up from the schema so that downstream interval analysis can still intersect intervals of the same type.

## Are these changes tested?

Added one test case.

## Are there any user-facing changes?
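Why a typed null matters can be shown with miniature stand-ins for `DataType` and `ScalarValue`. These are hypothetical simplified enums, not the Arrow/DataFusion types, and `typed_null`, `can_intersect`, and `empty_column_bounds` are illustrative helpers. The sketch assumes interval analysis only combines bounds whose types agree, so an untyped `Null` bound (whose type is `Null`) cannot be combined with an `Int32` bound, while `Int32(None)` can.

```rust
/// Miniature stand-in for Arrow's `DataType`.
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum DataType {
    Null,
    Int32,
    Float64,
    Utf8,
}

/// Miniature stand-in for DataFusion's `ScalarValue`.
#[derive(Debug, Clone, PartialEq)]
pub enum ScalarValue {
    /// Untyped null: its data type is `DataType::Null`.
    Null,
    Int32(Option<i32>),
    Float64(Option<f64>),
    Utf8(Option<String>),
}

impl ScalarValue {
    /// A null that still carries the column's type, e.g. `Int32(None)`.
    pub fn typed_null(dt: DataType) -> Self {
        match dt {
            DataType::Null => ScalarValue::Null,
            DataType::Int32 => ScalarValue::Int32(None),
            DataType::Float64 => ScalarValue::Float64(None),
            DataType::Utf8 => ScalarValue::Utf8(None),
        }
    }

    pub fn data_type(&self) -> DataType {
        match self {
            ScalarValue::Null => DataType::Null,
            ScalarValue::Int32(_) => DataType::Int32,
            ScalarValue::Float64(_) => DataType::Float64,
            ScalarValue::Utf8(_) => DataType::Utf8,
        }
    }
}

/// Hypothetical precondition: bounds can only be intersected when types agree.
pub fn can_intersect(a: &ScalarValue, b: &ScalarValue) -> bool {
    a.data_type() == b.data_type()
}

/// (min, max) per column after a filter that provably matches no rows,
/// with each column's type taken from the schema.
pub fn empty_column_bounds(schema: &[DataType]) -> Vec<(ScalarValue, ScalarValue)> {
    schema
        .iter()
        .map(|&dt| (ScalarValue::typed_null(dt), ScalarValue::typed_null(dt)))
        .collect()
}
```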
…verhead (apache#20623)

## Which issue does this PR close?

- Closes apache#20622.

## Rationale for this change

Several array set operations (e.g., `array_distinct`, `array_union`, `array_intersect`, `array_except`) share a similar structure:

* Convert the input(s) using `RowConverter`, ideally in bulk
* Apply the set operation as appropriate, which involves adding or removing elements from the candidate set of result `Rows`
* Convert the final set of `Rows` back into `ArrayRef`

We can do better for the final step: instead of converting from `Rows` back into `ArrayRef`, we can track which indices in the input(s) correspond to the values we want to return, then grab those values with a single `take`. This avoids the `Row` -> `ArrayRef` deserialization overhead and is a 5-20% performance win, depending on the set operation and the characteristics of the input.

The only wrinkle is that `intersect` and `union` have multiple inputs, so we need to concatenate the inputs together to get a single index space. It turns out that this optimization is still a win, even after paying the `concat` overhead.

## What changes are included in this PR?

* Add a benchmark for `array_except`
* Implement this optimization for `array_distinct`, `array_union`, `array_intersect`, `array_except`

## Are these changes tested?

Yes, and benchmarked.

## Are there any user-facing changes?

No.
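The index-tracking idea above can be illustrated on plain slices. In this sketch, hashing values directly stands in for `RowConverter` rows and the `take` helper stands in for Arrow's `take` kernel; `distinct_indices`, `union_indices`, and `take` are hypothetical names, and null handling is omitted. The set operation only records which input positions survive, and the output is gathered in one pass at the end.

```rust
use std::collections::HashSet;
use std::hash::Hash;

/// Positions of the first occurrence of each distinct value: the `take`
/// indices for an `array_distinct`-style operation.
pub fn distinct_indices<T: Hash + Eq>(values: &[T]) -> Vec<usize> {
    let mut seen = HashSet::new();
    values
        .iter()
        .enumerate()
        .filter(|(_, v)| seen.insert(*v)) // true only the first time a value is seen
        .map(|(i, _)| i)
        .collect()
}

/// Gathers `values[i]` for each index, standing in for Arrow's `take` kernel.
pub fn take<T: Clone>(values: &[T], indices: &[usize]) -> Vec<T> {
    indices.iter().map(|&i| values[i].clone()).collect()
}

/// `array_union`-style: concatenate both inputs so they share one index
/// space, then keep the first occurrence of each value.
pub fn union_indices<T: Hash + Eq + Clone>(left: &[T], right: &[T]) -> (Vec<T>, Vec<usize>) {
    let combined: Vec<T> = left.iter().chain(right.iter()).cloned().collect();
    let indices = distinct_indices(&combined);
    (combined, indices)
}
```

The concatenation in `union_indices` mirrors the wrinkle noted above: without a single index space, indices from two inputs could not be fed to one `take`.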
…che#20842) Bumps [taiki-e/install-action](https://github.com/taiki-e/install-action) from 2.68.16 to 2.68.25. <details> <summary>Release notes</summary> <p><em>Sourced from <a href="https://github.com/taiki-e/install-action/releases">taiki-e/install-action's releases</a>.</em></p> <blockquote> <h2>2.68.25</h2> <ul> <li> <p>Update <code>zizmor@latest</code> to 1.23.1.</p> </li> <li> <p>Update <code>tombi@latest</code> to 0.9.4.</p> </li> <li> <p>Update <code>cargo-semver-checks@latest</code> to 0.47.0.</p> </li> </ul> <h2>2.68.24</h2> <ul> <li>Avoid triggering <a href="https://docs.zizmor.sh/audits/#ref-confusion">zizmor ref-confusion</a> when using this action in form of <code>uses: taiki-e/install-action@v2</code> or <code>uses: taiki-e/install-action@&lt;tool_name&gt;</code>.</li> </ul> <h2>2.68.23</h2> <ul> <li> <p>Update <code>zizmor@latest</code> to 1.23.0.</p> </li> <li> <p>Update <code>tombi@latest</code> to 0.9.3.</p> </li> <li> <p>Update <code>mise@latest</code> to 2026.3.5.</p> </li> </ul> <h2>2.68.22</h2> <ul> <li> <p>Update <code>release-plz@latest</code> to 0.3.157.</p> </li> <li> <p>Update <code>cargo-binstall@latest</code> to 1.17.7.</p> </li> <li> <p>Update <code>mise@latest</code> to 2026.3.4.</p> </li> </ul> <h2>2.68.21</h2> <ul> <li> <p>Update <code>tombi@latest</code> to 0.9.2.</p> </li> <li> <p>Update <code>uv@latest</code> to 0.10.9.</p> </li> <li> <p>Update <code>rclone@latest</code> to 1.73.2.</p> </li> <li> <p>Update <code>cargo-sort@latest</code> to 2.1.1.</p> </li> </ul> <h2>2.68.20</h2> <ul> <li> <p>Update <code>tombi@latest</code> to 0.9.1.</p> </li> <li> <p>Update <code>cargo-neat@latest</code> to 0.3.2.</p> </li> </ul> <h2>2.68.19</h2> <ul> <li> <p>Update <code>mise@latest</code> to 2026.3.3.</p> </li> <li> <p>Update <code>cargo-auditable@latest</code> to 0.7.4.</p> </li> <li> <p>Update <code>cargo-sort@latest</code> to 2.1.0.</p> </li> </ul> <h2>2.68.18</h2> <ul> <li> <p>Update <code>uv@latest</code> to 0.10.8.</p> </li> <li> <p>Update
<code>grcov@latest</code> to 0.10.7.</p> </li> </ul> <!-- raw HTML omitted --> </blockquote> <p>... (truncated)</p> </details> <details> <summary>Changelog</summary> <p><em>Sourced from <a href="https://github.com/taiki-e/install-action/blob/main/CHANGELOG.md">taiki-e/install-action's changelog</a>.</em></p> <blockquote> <h1>Changelog</h1> <p>All notable changes to this project will be documented in this file.</p> <p>This project adheres to <a href="https://semver.org">Semantic Versioning</a>.</p> <!-- raw HTML omitted --> <h2>[Unreleased]</h2> <h2>[2.68.25] - 2026-03-08</h2> <ul> <li> <p>Update <code>zizmor@latest</code> to 1.23.1.</p> </li> <li> <p>Update <code>tombi@latest</code> to 0.9.4.</p> </li> <li> <p>Update <code>cargo-semver-checks@latest</code> to 0.47.0.</p> </li> </ul> <h2>[2.68.24] - 2026-03-08</h2> <ul> <li>Avoid triggering <a href="https://docs.zizmor.sh/audits/#ref-confusion">zizmor ref-confusion</a> when using this action in form of <code>uses: taiki-e/install-action@v2</code> or <code>uses: taiki-e/install-action@&lt;tool_name&gt;</code>.</li> </ul> <h2>[2.68.23] - 2026-03-08</h2> <ul> <li> <p>Update <code>zizmor@latest</code> to 1.23.0.</p> </li> <li> <p>Update <code>tombi@latest</code> to 0.9.3.</p> </li> <li> <p>Update <code>mise@latest</code> to 2026.3.5.</p> </li> </ul> <h2>[2.68.22] - 2026-03-07</h2> <ul> <li> <p>Update <code>release-plz@latest</code> to 0.3.157.</p> </li> <li> <p>Update <code>cargo-binstall@latest</code> to 1.17.7.</p> </li> <li> <p>Update <code>mise@latest</code> to 2026.3.4.</p> </li> </ul> <h2>[2.68.21] - 2026-03-07</h2> <ul> <li> <p>Update <code>tombi@latest</code> to 0.9.2.</p> </li> <li> <p>Update <code>uv@latest</code> to 0.10.9.</p> </li> <li> <p>Update <code>rclone@latest</code> to 1.73.2.</p> </li> <li> <p>Update <code>cargo-sort@latest</code> to 2.1.1.</p> </li> </ul> <!-- raw HTML omitted --> </blockquote> <p>...
(truncated)</p> </details> <details> <summary>Commits</summary> <ul> <li><a href="https://github.com/taiki-e/install-action/commit/a37010ded18ff788be4440302bd6830b1ae50d8b"><code>a37010d</code></a> Release 2.68.25</li> <li><a href="https://github.com/taiki-e/install-action/commit/ffc2b1c2fffc4b45c7ec734311b1717dc5b6c320"><code>ffc2b1c</code></a> Update <code>zizmor@latest</code> to 1.23.1</li> <li><a href="https://github.com/taiki-e/install-action/commit/8f3b52a0c24fe848f8b36e972306155b0f668242"><code>8f3b52a</code></a> Update <code>tombi@latest</code> to 0.9.4</li> <li><a href="https://github.com/taiki-e/install-action/commit/df9c07a392aaa960ce05346ab056a01a0d3b4dd0"><code>df9c07a</code></a> Update <code>cargo-semver-checks@latest</code> to 0.47.0</li> <li><a href="https://github.com/taiki-e/install-action/commit/3c19ebdd96e392c121ba23d56d739b7c23e79dc1"><code>3c19ebd</code></a> zizmor: Enable ref-confusion</li> <li><a href="https://github.com/taiki-e/install-action/commit/b18b9d93a43496aeda12369e7563d9251abc2fe1"><code>b18b9d9</code></a> Release 2.68.24</li> <li><a href="https://github.com/taiki-e/install-action/commit/5ccf6295e62df96d2686cb3c579301c6d3da6a89"><code>5ccf629</code></a> codegen: Avoid allocation in workspace_root()</li> <li><a href="https://github.com/taiki-e/install-action/commit/93ea0b33c357ab5e56584967e551b351d558ff99"><code>93ea0b3</code></a> Avoid triggering zizmor ref-confusion</li> <li><a href="https://github.com/taiki-e/install-action/commit/7c8485f1068cb2274a4b805d3d8ec77237d4fdf3"><code>7c8485f</code></a> Update script and CI config</li> <li><a href="https://github.com/taiki-e/install-action/commit/fc2a2b349fea94690f6a04dcec522c55f51fe2fd"><code>fc2a2b3</code></a> Release 2.68.23</li> <li>Additional commits viewable in <a href="https://github.com/taiki-e/install-action/compare/d6e286fa45544157a02d45a43742857ebbc25d12...a37010ded18ff788be4440302bd6830b1ae50d8b">compare view</a></li> </ul> </details> <br /> 
Dependabot will resolve any conflicts with this PR as long as you don't alter it yourself. You can also trigger a rebase manually by commenting `@dependabot rebase`.

[//]: # (dependabot-automerge-start)
[//]: # (dependabot-automerge-end)

---

<details>
<summary>Dependabot commands and options</summary>
<br />

You can trigger Dependabot actions by commenting on this PR:
- `@dependabot rebase` will rebase this PR
- `@dependabot recreate` will recreate this PR, overwriting any edits that have been made to it
- `@dependabot show <dependency name> ignore conditions` will show all of the ignore conditions of the specified dependency
- `@dependabot ignore this major version` will close this PR and stop Dependabot creating any more for this major version (unless you reopen the PR or upgrade to it yourself)
- `@dependabot ignore this minor version` will close this PR and stop Dependabot creating any more for this minor version (unless you reopen the PR or upgrade to it yourself)
- `@dependabot ignore this dependency` will close this PR and stop Dependabot creating any more for this dependency (unless you reopen the PR or upgrade to it yourself)

</details>

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
…20843) Bumps [github/codeql-action](https://github.com/github/codeql-action) from 4.32.5 to 4.32.6. <details> <summary>Release notes</summary> <p><em>Sourced from <a href="https://github.com/github/codeql-action/releases">github/codeql-action's releases</a>.</em></p> <blockquote> <h2>v4.32.6</h2> <ul> <li>Update default CodeQL bundle version to <a href="https://github.com/github/codeql-action/releases/tag/codeql-bundle-v2.24.3">2.24.3</a>. <a href="https://redirect.github.com/github/codeql-action/pull/3548">#3548</a></li> </ul> </blockquote> </details> <details> <summary>Changelog</summary> <p><em>Sourced from <a href="https://github.com/github/codeql-action/blob/main/CHANGELOG.md">github/codeql-action's changelog</a>.</em></p> <blockquote> <h1>CodeQL Action Changelog</h1> <p>See the <a href="https://github.com/github/codeql-action/releases">releases page</a> for the relevant changes to the CodeQL CLI and language packs.</p> <h2>[UNRELEASED]</h2> <ul> <li>Fixed <a href="https://redirect.github.com/github/codeql-action/issues/3555">a bug</a> which caused the CodeQL Action to fail loading repository properties if a "Multi select" repository property was configured for the repository. <a href="https://redirect.github.com/github/codeql-action/pull/3557">#3557</a></li> <li>The CodeQL Action now loads <a href="https://docs.github.com/en/organizations/managing-organization-settings/managing-custom-properties-for-repositories-in-your-organization">custom repository properties</a> on GitHub Enterprise Server, enabling the customization of features such as <code>github-codeql-disable-overlay</code> that was previously only available on GitHub.com. <a href="https://redirect.github.com/github/codeql-action/pull/3559">#3559</a></li> </ul> <h2>4.32.6 - 05 Mar 2026</h2> <ul> <li>Update default CodeQL bundle version to <a href="https://github.com/github/codeql-action/releases/tag/codeql-bundle-v2.24.3">2.24.3</a>. 
<a href="https://redirect.github.com/github/codeql-action/pull/3548">#3548</a></li> </ul> <h2>4.32.5 - 02 Mar 2026</h2> <ul> <li>Repositories owned by an organization can now set up the <code>github-codeql-disable-overlay</code> custom repository property to disable <a href="https://redirect.github.com/github/roadmap/issues/1158">improved incremental analysis for CodeQL</a>. First, create a custom repository property with the name <code>github-codeql-disable-overlay</code> and the type "True/false" in the organization's settings. Then in the repository's settings, set this property to <code>true</code> to disable improved incremental analysis. For more information, see <a href="https://docs.github.com/en/organizations/managing-organization-settings/managing-custom-properties-for-repositories-in-your-organization">Managing custom properties for repositories in your organization</a>. This feature is not yet available on GitHub Enterprise Server. <a href="https://redirect.github.com/github/codeql-action/pull/3507">#3507</a></li> <li>Added an experimental change so that when <a href="https://redirect.github.com/github/roadmap/issues/1158">improved incremental analysis</a> fails on a runner — potentially due to insufficient disk space — the failure is recorded in the Actions cache so that subsequent runs will automatically skip improved incremental analysis until something changes (e.g. a larger runner is provisioned or a new CodeQL version is released). We expect to roll this change out to everyone in March. <a href="https://redirect.github.com/github/codeql-action/pull/3487">#3487</a></li> <li>The minimum memory check for improved incremental analysis is now skipped for CodeQL 2.24.3 and later, which has reduced peak RAM usage. <a href="https://redirect.github.com/github/codeql-action/pull/3515">#3515</a></li> <li>Reduced log levels for best-effort private package registry connection check failures to reduce noise from workflow annotations. 
<a href="https://redirect.github.com/github/codeql-action/pull/3516">#3516</a></li> <li>Added an experimental change which lowers the minimum disk space requirement for <a href="https://redirect.github.com/github/roadmap/issues/1158">improved incremental analysis</a>, enabling it to run on standard GitHub Actions runners. We expect to roll this change out to everyone in March. <a href="https://redirect.github.com/github/codeql-action/pull/3498">#3498</a></li> <li>Added an experimental change which allows the <code>start-proxy</code> action to resolve the CodeQL CLI version from feature flags instead of using the linked CLI bundle version. We expect to roll this change out to everyone in March. <a href="https://redirect.github.com/github/codeql-action/pull/3512">#3512</a></li> <li>The previously experimental changes from versions 4.32.3, 4.32.4, 3.32.3 and 3.32.4 are now enabled by default. <a href="https://redirect.github.com/github/codeql-action/pull/3503">#3503</a>, <a href="https://redirect.github.com/github/codeql-action/pull/3504">#3504</a></li> </ul> <h2>4.32.4 - 20 Feb 2026</h2> <ul> <li>Update default CodeQL bundle version to <a href="https://github.com/github/codeql-action/releases/tag/codeql-bundle-v2.24.2">2.24.2</a>. <a href="https://redirect.github.com/github/codeql-action/pull/3493">#3493</a></li> <li>Added an experimental change which improves how certificates are generated for the authentication proxy that is used by the CodeQL Action in Default Setup when <a href="https://docs.github.com/en/code-security/how-tos/secure-at-scale/configure-organization-security/manage-usage-and-access/giving-org-access-private-registries">private package registries are configured</a>. This is expected to generate more widely compatible certificates and should have no impact on analyses which are working correctly already. We expect to roll this change out to everyone in February. 
<a href="https://redirect.github.com/github/codeql-action/pull/3473">#3473</a></li> <li>When the CodeQL Action is run <a href="https://docs.github.com/en/code-security/how-tos/scan-code-for-vulnerabilities/troubleshooting/troubleshooting-analysis-errors/logs-not-detailed-enough#creating-codeql-debugging-artifacts-for-codeql-default-setup">with debugging enabled in Default Setup</a> and <a href="https://docs.github.com/en/code-security/how-tos/secure-at-scale/configure-organization-security/manage-usage-and-access/giving-org-access-private-registries">private package registries are configured</a>, the "Setup proxy for registries" step will output additional diagnostic information that can be used for troubleshooting. <a href="https://redirect.github.com/github/codeql-action/pull/3486">#3486</a></li> <li>Added a setting which allows the CodeQL Action to enable network debugging for Java programs. This will help GitHub staff support customers with troubleshooting issues in GitHub-managed CodeQL workflows, such as Default Setup. This setting can only be enabled by GitHub staff. <a href="https://redirect.github.com/github/codeql-action/pull/3485">#3485</a></li> <li>Added a setting which enables GitHub-managed workflows, such as Default Setup, to use a <a href="https://github.com/dsp-testing/codeql-cli-nightlies">nightly CodeQL CLI release</a> instead of the latest, stable release that is used by default. This will help GitHub staff support customers whose analyses for a given repository or organization require early access to a change in an upcoming CodeQL CLI release. This setting can only be enabled by GitHub staff. 
<a href="https://redirect.github.com/github/codeql-action/pull/3484">#3484</a></li> </ul> <h2>4.32.3 - 13 Feb 2026</h2> <ul> <li>Added experimental support for testing connections to <a href="https://docs.github.com/en/code-security/how-tos/secure-at-scale/configure-organization-security/manage-usage-and-access/giving-org-access-private-registries">private package registries</a>. This feature is not currently enabled for any analysis. In the future, it may be enabled by default for Default Setup. <a href="https://redirect.github.com/github/codeql-action/pull/3466">#3466</a></li> </ul> <h2>4.32.2 - 05 Feb 2026</h2> <ul> <li>Update default CodeQL bundle version to <a href="https://github.com/github/codeql-action/releases/tag/codeql-bundle-v2.24.1">2.24.1</a>. <a href="https://redirect.github.com/github/codeql-action/pull/3460">#3460</a></li> </ul> <h2>4.32.1 - 02 Feb 2026</h2> <ul> <li>A warning is now shown in Default Setup workflow logs if a <a href="https://docs.github.com/en/code-security/how-tos/secure-at-scale/configure-organization-security/manage-usage-and-access/giving-org-access-private-registries">private package registry is configured</a> using a GitHub Personal Access Token (PAT), but no username is configured. <a href="https://redirect.github.com/github/codeql-action/pull/3422">#3422</a></li> <li>Fixed a bug which caused the CodeQL Action to fail when repository properties cannot successfully be retrieved. <a href="https://redirect.github.com/github/codeql-action/pull/3421">#3421</a></li> </ul> <h2>4.32.0 - 26 Jan 2026</h2> <ul> <li>Update default CodeQL bundle version to <a href="https://github.com/github/codeql-action/releases/tag/codeql-bundle-v2.24.0">2.24.0</a>. <a href="https://redirect.github.com/github/codeql-action/pull/3425">#3425</a></li> </ul> <h2>4.31.11 - 23 Jan 2026</h2> <!-- raw HTML omitted --> </blockquote> <p>... 
(truncated)</p> </details> <details> <summary>Commits</summary> <ul> <li><a href="https://github.com/github/codeql-action/commit/0d579ffd059c29b07949a3cce3983f0780820c98"><code>0d579ff</code></a> Merge pull request <a href="https://redirect.github.com/github/codeql-action/issues/3551">#3551</a> from github/update-v4.32.6-72d2d850d</li> <li><a href="https://github.com/github/codeql-action/commit/d4c6be7cf1c47a33a06fa9183269e133e6863574"><code>d4c6be7</code></a> Update changelog for v4.32.6</li> <li><a href="https://github.com/github/codeql-action/commit/72d2d850d1f91d4e1e024f4cf4276fd16bb68462"><code>72d2d85</code></a> Merge pull request <a href="https://redirect.github.com/github/codeql-action/issues/3548">#3548</a> from github/update-bundle/codeql-bundle-v2.24.3</li> <li><a href="https://github.com/github/codeql-action/commit/23f983ce00d9a853697a6aaa9eae8d5abbf14849"><code>23f983c</code></a> Merge pull request <a href="https://redirect.github.com/github/codeql-action/issues/3544">#3544</a> from github/dependabot/github_actions/dot-github/wor...</li> <li><a href="https://github.com/github/codeql-action/commit/832e97ccad228ef72e06ffee26f6251bceeb7e5f"><code>832e97c</code></a> Merge pull request <a href="https://redirect.github.com/github/codeql-action/issues/3545">#3545</a> from github/dependabot/github_actions/dot-github/wor...</li> <li><a href="https://github.com/github/codeql-action/commit/5ef38c0b13c2f0f5ce928cb7706f5fb19fc97ae2"><code>5ef38c0</code></a> Merge pull request <a href="https://redirect.github.com/github/codeql-action/issues/3546">#3546</a> from github/dependabot/npm_and_yarn/tar-7.5.10</li> <li><a href="https://github.com/github/codeql-action/commit/80c9cda73902bba67939606c4bf3a1d9606bb150"><code>80c9cda</code></a> Add changelog note</li> <li><a href="https://github.com/github/codeql-action/commit/f2669dd916c673b2811839169929a8ba71bb7634"><code>f2669dd</code></a> Update default bundle to codeql-bundle-v2.24.3</li> <li><a 
href="https://github.com/github/codeql-action/commit/bd03c44cf40965f5476f66fad404194e4cb35710"><code>bd03c44</code></a> Merge branch 'main' into dependabot/github_actions/dot-github/workflows/actio...</li> <li><a href="https://github.com/github/codeql-action/commit/102d7627b63c066871badf0743c11b2f6dd9c9e9"><code>102d762</code></a> Bump tar from 7.5.7 to 7.5.10</li> <li>Additional commits viewable in <a href="https://github.com/github/codeql-action/compare/c793b717bc78562f491db7b0e93a3a178b099162...0d579ffd059c29b07949a3cce3983f0780820c98">compare view</a></li> </ul> </details> <br />

Dependabot will resolve any conflicts with this PR as long as you don't alter it yourself. You can also trigger a rebase manually by commenting `@dependabot rebase`.

[//]: # (dependabot-automerge-start)
[//]: # (dependabot-automerge-end)

---

<details>
<summary>Dependabot commands and options</summary>
<br />

You can trigger Dependabot actions by commenting on this PR:
- `@dependabot rebase` will rebase this PR
- `@dependabot recreate` will recreate this PR, overwriting any edits that have been made to it
- `@dependabot show <dependency name> ignore conditions` will show all of the ignore conditions of the specified dependency
- `@dependabot ignore this major version` will close this PR and stop Dependabot creating any more for this major version (unless you reopen the PR or upgrade to it yourself)
- `@dependabot ignore this minor version` will close this PR and stop Dependabot creating any more for this minor version (unless you reopen the PR or upgrade to it yourself)
- `@dependabot ignore this dependency` will close this PR and stop Dependabot creating any more for this dependency (unless you reopen the PR or upgrade to it yourself)

</details>

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
## Which issue does this PR close?

- Closes #.

## Rationale for this change

```
Crate: paste
Version: 1.0.15
Warning: unmaintained
Title: paste - no longer maintained
Date: 2024-10-07
ID: RUSTSEC-2024-0436
```

The `paste` crate is flagged as unmaintained (RUSTSEC-2024-0436), so we need to remove it from the project.

## What changes are included in this PR?

## Are these changes tested?

## Are there any user-facing changes?
…he#20231)

## Which issue does this PR close?

Closes apache#20194

## Rationale for this change

A query with `ROW_NUMBER() OVER (... ORDER BY CASE WHEN col='0' THEN 1 ELSE 0 END)` combined with a filter `nvl(t2.value_2_3,'0')='0'` fails with a `SanityCheckPlan` error. This worked in 50.3.0 but broke in 52.1.0.

## What changes are included in this PR?

**Root cause**: `collect_columns_from_predicate_inner` was extracting equality pairs where neither side was a `Column` (e.g. `nvl(col, '0') = '0'`), creating equivalence classes between complex expressions and literals. The deep traversal in `normalize_expr` would then replace the literal `'0'` inside unrelated sort/window `CASE WHEN` expressions with the complex `nvl` expression, corrupting the sort ordering and causing a mismatch between the output ordering reported by `SortExec` and the ordering expected by `BoundedWindowAggExec`.

**Fix** (two changes in `filter.rs`):

1. **`collect_columns_from_predicate_inner`**: Only extract equality pairs where at least one side is a `Column` reference. This matches the function's documented intent ("Column-Pairs") and prevents complex-expression-to-literal equivalence classes from being created.
2. **`extend_constants`**: Recognize `Literal` expressions as inherently constant (previously it only checked `is_expr_constant` on the input's equivalence properties, which does not know about literals). This ensures constant propagation still works for `complex_expr = literal` predicates, e.g. `nvl(col, '0')` is properly marked as constant after the filter.

## How was this tested?

- Unit test `test_collect_columns_skips_non_column_pairs` verifying the filtering logic
- Sqllogictest reproducing the exact query from the issue
- Full test suites: equivalence tests (51 passed), physical-plan tests (1255 passed), physical-optimizer tests (20 passed)
- Manual verification with datafusion-cli running the reproduction query

## Test plan

- [x] Unit test for `collect_columns_from_predicate_inner` column filtering
- [x] Sqllogictest regression test for apache#20194
- [x] Existing test suites pass
- [x] Manual reproduction query succeeds

---------

Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>
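The two rules in the fix can be illustrated with a much-simplified sketch. It uses a hypothetical `Expr` enum instead of DataFusion's physical expression types, and `collect_column_pairs` / `constants_from_equalities` are illustrative names, not the actual functions in `filter.rs`.

```rust
/// Hypothetical, heavily simplified stand-in for physical expressions.
#[derive(Debug, Clone, PartialEq)]
enum Expr {
    Column(String),
    Literal(String),
    /// Any other expression, e.g. a call such as `nvl(col, '0')`.
    Other(String),
}

impl Expr {
    fn is_column(&self) -> bool {
        matches!(self, Expr::Column(_))
    }

    fn is_literal(&self) -> bool {
        matches!(self, Expr::Literal(_))
    }
}

/// Rule 1: keep only equality pairs where at least one side is a column,
/// so `complex_expr = literal` never produces an equivalence class.
fn collect_column_pairs(equalities: &[(Expr, Expr)]) -> Vec<(Expr, Expr)> {
    equalities
        .iter()
        .filter(|(l, r)| l.is_column() || r.is_column())
        .cloned()
        .collect()
}

/// Rule 2: in `a = b`, if one side is constant (a literal, or already known
/// to be constant), the other side is constant after the filter.
fn constants_from_equalities(
    equalities: &[(Expr, Expr)],
    known_constant: impl Fn(&Expr) -> bool,
) -> Vec<Expr> {
    let is_const = |e: &Expr| e.is_literal() || known_constant(e);
    let mut constants = Vec::new();
    for (l, r) in equalities {
        if is_const(r) && !is_const(l) {
            constants.push(l.clone());
        }
        if is_const(l) && !is_const(r) {
            constants.push(r.clone());
        }
    }
    constants
}
```

For the predicate `nvl(t2.value_2_3, '0') = '0'`, the pair is dropped by rule 1 (neither side is a column), yet rule 2 still marks the `nvl(...)` call as constant because its other side is a literal.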
Bumps [quinn-proto](https://github.com/quinn-rs/quinn) from 0.11.13 to 0.11.14. <details> <summary>Release notes</summary> <p><em>Sourced from <a href="https://github.com/quinn-rs/quinn/releases">quinn-proto's releases</a>.</em></p> <blockquote> <h2>quinn-proto 0.11.14</h2> <p><a href="https://github.com/jxs"><code>@jxs</code></a> reported a denial of service issue in quinn-proto 5 days ago:</p> <ul> <li><a href="https://github.com/quinn-rs/quinn/security/advisories/GHSA-6xvm-j4wr-6v98">https://github.com/quinn-rs/quinn/security/advisories/GHSA-6xvm-j4wr-6v98</a></li> </ul> <p>We coordinated with them to release this version to patch the issue. Unfortunately the maintainers missed these issues during code review and we did not have enough fuzzing coverage -- we regret the oversight and have added an additional fuzzing target.</p> <p>Organizations that want to participate in coordinated disclosure can contact us privately to discuss terms.</p> <h2>What's Changed</h2> <ul> <li>Fix over-permissive proto dependency edge by <a href="https://github.com/Ralith"><code>@Ralith</code></a> in <a href="https://redirect.github.com/quinn-rs/quinn/pull/2385">quinn-rs/quinn#2385</a></li> <li>0.11.x: avoid unwrapping VarInt decoding during parameter parsing by <a href="https://github.com/djc"><code>@djc</code></a> in <a href="https://redirect.github.com/quinn-rs/quinn/pull/2559">quinn-rs/quinn#2559</a></li> </ul> </blockquote> </details> <details> <summary>Commits</summary> <ul> <li><a href="https://github.com/quinn-rs/quinn/commit/2c315aa7f9c2a6c1db87f8f51f40623a427c78fd"><code>2c315aa</code></a> proto: bump version to 0.11.14</li> <li><a href="https://github.com/quinn-rs/quinn/commit/8ad47f431e7deb82c08b09c2e33ef85aa88fd212"><code>8ad47f4</code></a> Use newer rustls-pki-types PEM parser API</li> <li><a href="https://github.com/quinn-rs/quinn/commit/c81c0289abe30d8437ccbf9b6304e2bc9c707cea"><code>c81c028</code></a> ci: fix workflow syntax</li> <li><a 
href="https://github.com/quinn-rs/quinn/commit/0050172969f7e69e136c433181330da7790d8d73"><code>0050172</code></a> ci: pin wasm-bindgen-cli version</li> <li><a href="https://github.com/quinn-rs/quinn/commit/8a6f82c58d1c565eab78f986e614223e6ed76a85"><code>8a6f82c</code></a> Take semver-compatible dependency updates</li> <li><a href="https://github.com/quinn-rs/quinn/commit/e52db4ad8df0f9720e7b0e32ecc0e48c9a93de0f"><code>e52db4a</code></a> Apply suggestions from clippy 1.91</li> <li><a href="https://github.com/quinn-rs/quinn/commit/6df7275c582ca9b7225e0ccf9f9871a55eb73155"><code>6df7275</code></a> chore: Fix <code>unnecessary_unwrap</code> clippy</li> <li><a href="https://github.com/quinn-rs/quinn/commit/c8eefa07e087b06d8f2b78ff262ce8ac952994f1"><code>c8eefa0</code></a> proto: avoid unwrapping varint decoding during parameters parsing</li> <li><a href="https://github.com/quinn-rs/quinn/commit/9723a977754c8662001b0fef97aab8f3ddf1df92"><code>9723a97</code></a> fuzz: add fuzzing target for parsing transport parameters</li> <li><a href="https://github.com/quinn-rs/quinn/commit/eaf0ef30252cef4acec21f150427e604cd4271c9"><code>eaf0ef3</code></a> Fix over-permissive proto dependency edge (<a href="https://redirect.github.com/quinn-rs/quinn/issues/2385">#2385</a>)</li> <li>Additional commits viewable in <a href="https://github.com/quinn-rs/quinn/compare/quinn-proto-0.11.13...quinn-proto-0.11.14">compare view</a></li> </ul> </details> <br /> Dependabot will resolve any conflicts with this PR as long as you don't alter it yourself. You can also trigger a rebase manually by commenting `@dependabot rebase`. --- <details> <summary>Dependabot commands and options</summary> <br /> You can trigger Dependabot actions by commenting on this PR: - `@dependabot rebase` will rebase this PR - `@dependabot recreate` will recreate this PR, overwriting any edits that have been made to it - `@dependabot show <dependency name> ignore conditions` will show all of the ignore conditions of the specified dependency - `@dependabot ignore this major version` will close this PR and stop Dependabot creating any more for this major version (unless you reopen the PR or upgrade to it yourself) - `@dependabot ignore this minor version` will close this PR and stop Dependabot creating any more for this minor version (unless you reopen the PR or upgrade to it yourself) - `@dependabot ignore this dependency` will close this PR and stop Dependabot creating any more for this dependency (unless you reopen the PR or upgrade to it yourself) You can disable automated security fix PRs for this repo from the [Security Alerts page](https://github.com/apache/datafusion/network/alerts). </details> Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
## Which issue does this PR close?

- Closes apache#20841

## Rationale for this change

We want to split IO and CPU work to allow more (NUMA-aware) parallelism and better utilization of both IO and CPU. This enables, for example, more coalescing, prefetching, parallel IO, and more parallel/incremental decoding. It also allows morsels to be split purely at the CPU level, without repeating the IO for each morsel.

## What changes are included in this PR?

This PR only refactors `ParquetOpener` to use `ParquetPushDecoder`. I used Claude to rewrite it while keeping the changes small.

## Are these changes tested?

By existing tests. Behavior should not change; the arrow-rs code also uses `ParquetPushDecoder`.

## Are there any user-facing changes?

---------

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
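A toy sketch of the push-decoding pattern that makes this IO/CPU split possible: the decoder only reports which byte ranges it needs, and the caller decides how and when to fetch them. `ToyPushDecoder`, `DecodeResult`, and `drive` are hypothetical and much simpler than the arrow-rs `ParquetPushDecoder` API; here "decoding" just returns the raw bytes of each range.

```rust
use std::collections::HashMap;
use std::ops::Range;

/// Result of asking the decoder to make progress (hypothetical type).
#[derive(Debug, PartialEq)]
enum DecodeResult {
    /// The decoder cannot continue until these byte ranges are pushed.
    NeedsData(Vec<Range<u64>>),
    /// One decoded unit of output (stands in for a RecordBatch).
    Data(Vec<u8>),
    Finished,
}

/// Toy decoder: each "row group" is a byte range of the file.
struct ToyPushDecoder {
    row_groups: Vec<Range<u64>>,
    next: usize,
    buffered: HashMap<u64, Vec<u8>>, // fetched bytes keyed by range start
}

impl ToyPushDecoder {
    fn new(row_groups: Vec<Range<u64>>) -> Self {
        Self { row_groups, next: 0, buffered: HashMap::new() }
    }

    /// IO side: hand fetched bytes to the decoder.
    fn push_range(&mut self, range: Range<u64>, bytes: Vec<u8>) {
        self.buffered.insert(range.start, bytes);
    }

    /// CPU side: decode if the bytes are present, otherwise report what is missing.
    fn try_decode(&mut self) -> DecodeResult {
        let Some(range) = self.row_groups.get(self.next).cloned() else {
            return DecodeResult::Finished;
        };
        match self.buffered.remove(&range.start) {
            Some(bytes) => {
                self.next += 1;
                DecodeResult::Data(bytes)
            }
            None => DecodeResult::NeedsData(vec![range]),
        }
    }
}

/// Drive the decoder; `fetch` is the only place that performs "IO".
fn drive(decoder: &mut ToyPushDecoder, file: &[u8]) -> Vec<Vec<u8>> {
    let fetch = |r: &Range<u64>| file[r.start as usize..r.end as usize].to_vec();
    let mut out = Vec::new();
    loop {
        match decoder.try_decode() {
            DecodeResult::NeedsData(ranges) => {
                for r in ranges {
                    let bytes = fetch(&r);
                    decoder.push_range(r, bytes);
                }
            }
            DecodeResult::Data(batch) => out.push(batch),
            DecodeResult::Finished => return out,
        }
    }
}
```

Because all IO is funneled through `fetch`, a real caller could coalesce, prefetch, or parallelize range requests there while decoding stays CPU-only.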
…ache#20780)

## Which issue does this PR close?

- Closes #.

## Rationale for this change

ClickBench queries (Q7, Q15, Q16, Q18) contain redundant projections in plans that sort by a count. Removing them is probably not a measurable improvement for these queries, but the plans are cleaner, and for sorts that are not TopK the improvement could be measurable.

## What changes are included in this PR?

## Are these changes tested?

Existing tests.

## Are there any user-facing changes?

---------

Co-authored-by: Claude <noreply@anthropic.com>
## Which issue does this PR close?

- Closes #.

## Rationale for this change

Besides pushing `LimitExec` down the query plan, there is another optimization that allows plan nodes to *absorb* a limit, so they can potentially stop early. I’ve noticed that few operators implement this form of limit absorption, which suggests the optimization is non-obvious, so I’d like to improve its documentation.

A recent PR that implements this optimization is:

- apache#20228

## What changes are included in this PR?

## Are these changes tested?

## Are there any user-facing changes?
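A minimal sketch of what absorbing a limit means for an operator, using a plain Rust iterator of batches rather than DataFusion's streams. `FilterWithFetch` and its `fetch` field are hypothetical names: the point is that once `fetch` rows have been emitted, the operator stops pulling input entirely instead of merely truncating its output.

```rust
/// Hypothetical filter operator that can absorb a limit via `fetch`.
struct FilterWithFetch<I> {
    input: I,
    predicate: fn(i32) -> bool,
    /// Absorbed limit: stop after emitting this many rows (None = no limit).
    fetch: Option<usize>,
    produced: usize,
}

impl<I: Iterator<Item = Vec<i32>>> FilterWithFetch<I> {
    fn new(input: I, predicate: fn(i32) -> bool, fetch: Option<usize>) -> Self {
        Self { input, predicate, fetch, produced: 0 }
    }
}

impl<I: Iterator<Item = Vec<i32>>> Iterator for FilterWithFetch<I> {
    type Item = Vec<i32>;

    fn next(&mut self) -> Option<Self::Item> {
        loop {
            let remaining = match self.fetch {
                Some(limit) => limit.saturating_sub(self.produced),
                None => usize::MAX,
            };
            // Early stop: once the limit is satisfied, do not pull more input.
            if remaining == 0 {
                return None;
            }
            let batch = self.input.next()?;
            let predicate = self.predicate;
            let mut out: Vec<i32> = batch.into_iter().filter(|v| predicate(*v)).collect();
            // Never emit more rows than the limit allows.
            out.truncate(remaining);
            if out.is_empty() {
                continue; // every row was filtered out; try the next batch
            }
            self.produced += out.len();
            return Some(out);
        }
    }
}
```

With `fetch: Some(2)` and an even-number predicate, the input `[[1, 2, 3, 4], [5, 6, 7, 8], ...]` yields `[2, 4]` after reading only the first batch; a separate limit operator placed above a non-absorbing filter could not prevent the filter from reading ahead.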
## Which issue does this PR close?

- Closes apache#20797

## Rationale for this change

- See apache#20797

## What changes are included in this PR?

Implement serialization/deserialization of `preserve_order` for `RepartitionExec`.

## Are these changes tested?

Added one test case.

## Are there any user-facing changes?
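The round-trip property that such a test checks can be sketched without dependencies: a flag that is never written cannot survive decoding, so it silently falls back to a default. DataFusion's plan serialization uses protobuf; `RepartitionNode`, `encode`, and `decode` below are hypothetical, hand-rolled stand-ins.

```rust
/// Hypothetical, simplified serialized form of a repartition node.
#[derive(Debug, Clone, PartialEq)]
struct RepartitionNode {
    partitions: u32,
    preserve_order: bool,
}

/// Encode as 4 little-endian bytes of `partitions` followed by one flag byte.
fn encode(node: &RepartitionNode) -> Vec<u8> {
    let mut buf = node.partitions.to_le_bytes().to_vec();
    buf.push(node.preserve_order as u8);
    buf
}

/// Decode; returns None if the payload is too short.
fn decode(buf: &[u8]) -> Option<RepartitionNode> {
    let b = buf.get(0..4)?;
    let partitions = u32::from_le_bytes([b[0], b[1], b[2], b[3]]);
    // A payload without the flag byte decodes with `preserve_order = false`,
    // which is exactly how the setting is lost if it is never serialized.
    let preserve_order = buf.get(4).map_or(false, |flag| *flag != 0);
    Some(RepartitionNode { partitions, preserve_order })
}
```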
…ache#20627)

## Summary

- Include synchronous `start_next_file()` / `FileOpener::open()` setup time in `time_elapsed_scanning_total`
- Keep the existing lifecycle of the `time_opening` and scanning timers intact
- Avoid timer overlap by scoping the temporary timer so that it ends before `time_scanning_total.start()` is called

## Details

In `FileStreamState::Open`, `start_next_file()` is invoked before `time_scanning_total.start()`. If `open()` performs synchronous work before returning the future, that time was previously not accounted for in `time_elapsed_scanning_total`. This change wraps the `start_next_file()` call in a scoped timer on the same `time_scanning_total` metric so the missing segment is recorded.

- Fixes apache#20571

## Validation

Tested by reading CSV files from AWS S3.

---------

Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>
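The scoped-timer technique can be sketched as an RAII guard that adds its elapsed time to a shared metric when dropped. `TimeMetric`, `ScopedTimer`, `simulated_open_setup`, and `open_next_file_timed` are hypothetical names, not DataFusion's metrics API; the sleep stands in for synchronous work inside `open()`.

```rust
use std::sync::atomic::{AtomicU64, Ordering};
use std::sync::Arc;
use std::thread;
use std::time::{Duration, Instant};

/// Hypothetical time metric: total elapsed nanoseconds, shared across clones.
#[derive(Clone, Default)]
struct TimeMetric(Arc<AtomicU64>);

impl TimeMetric {
    fn nanos(&self) -> u64 {
        self.0.load(Ordering::Relaxed)
    }

    /// Start a guard that records elapsed time into this metric when dropped.
    fn timer(&self) -> ScopedTimer {
        ScopedTimer { metric: self.clone(), start: Instant::now() }
    }
}

struct ScopedTimer {
    metric: TimeMetric,
    start: Instant,
}

impl Drop for ScopedTimer {
    fn drop(&mut self) {
        let elapsed = self.start.elapsed().as_nanos() as u64;
        self.metric.0.fetch_add(elapsed, Ordering::Relaxed);
    }
}

/// Hypothetical stand-in for synchronous setup work done inside `open()`.
fn simulated_open_setup() {
    thread::sleep(Duration::from_millis(5));
}

/// Record the synchronous setup time on the scanning metric.
fn open_next_file_timed(scanning_total: &TimeMetric) {
    {
        let _timer = scanning_total.timer();
        simulated_open_setup();
    } // `_timer` is dropped here, so this segment is recorded before any
      // longer-running scanning timer on the same metric is started.
}
```

Binding the guard to `_timer` rather than `_` matters: `let _ = metric.timer();` would drop the guard immediately and record almost nothing.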
## Which issue does this PR close?

- Closes #.

```
Crate: generational-arena
Version: 0.2.9
Warning: unmaintained
Title: `generational-arena` is unmaintained
Date: 2024-02-11
ID: RUSTSEC-2024-0014
URL: https://rustsec.org/advisories/RUSTSEC-2024-0014
```

## Rationale for this change

## What changes are included in this PR?

## Are these changes tested?

## Are there any user-facing changes?
## Which issue does this PR close?

- Closes #.

## Rationale for this change

Move dependencies from `[dependencies]` to `[dev-dependencies]`.

## What changes are included in this PR?

## Are these changes tested?

## Are there any user-facing changes?
…oder

Resolve merge conflicts in `opener.rs`, `clickbench.slt`, and `projection_pushdown.slt`.

Adapt the morsel-driven bloom filter pruning in `open()` to use a separate `ParquetRecordBatchStreamBuilder` (as upstream now does), since `prune_by_bloom_filters` requires that type rather than the new `ParquetPushDecoderBuilder`.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
This reverts commit 4fc8edd.
No description provided.