Closed
…STSEC-2026-0001` (apache#19666) (apache#19686)

## Which issue does this PR close?

- part of apache#18566

## Rationale for this change

Let's fix the audit workflow on branch-52 by removing our dependency on the `rust_decimal` library, per @xudong963's request here: apache#19666 (review)

## What changes are included in this PR?

- Backport apache#19666 to branch-52

## Are these changes tested?

By CI

## Are there any user-facing changes?

No, this is an internal dev tool change.
… constructors (apache#19681)

Backport of apache#19668 to branch-52. This PR cherry-picks commit 680ddcc from main.

Includes:

- Split of `BatchPartitioner::try_new` into hash and round-robin constructors
- Documentation improvements
- No behavior changes

Part of apache#18566

Co-authored-by: Your Name <youremail@example.com>
…sCache table scoped (apache#19704)

## Which issue does this PR close?

- part of apache#18566

## Rationale for this change

Backport the fix for this regression into the 52 release branch:

- apache#19573

## What changes are included in this PR?

Backport these two commits to `branch-52` (the cherry-picks were clean):

- 1037f0a / apache#19388
- e6049de / apache#19616

<details><summary>Commands</summary>
<p>

```shell
andrewlamb@Andrews-MacBook-Pro-3:~/Software/datafusion$ git cherry-pick 1037f0a
[branch-52 1fc70ac] feat: add list_files_cache table function for `datafusion-cli` (apache#19388)
 Author: jizezhang <jizez@uw.edu>
 Date: Tue Jan 6 05:23:39 2026 -0800
 5 files changed, 446 insertions(+), 31 deletions(-)
andrewlamb@Andrews-MacBook-Pro-3:~/Software/datafusion$ git cherry-pick e6049de
Auto-merging datafusion/core/src/execution/context/mod.rs
[branch-52 aa3d413] Make default ListingFilesCache table scoped (apache#19616)
 Author: jizezhang <jizez@uw.edu>
 Date: Thu Jan 8 06:34:10 2026 -0800
 10 files changed, 474 insertions(+), 184 deletions(-)
```

</p>
</details>

## Are these changes tested?

By CI and new tests

## Are there any user-facing changes?

A new `datafusion-cli` table function (`list_files_cache`), and dropping an external table now clears the listing cache.

---------

Co-authored-by: jizezhang <jizez@uw.edu>
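One way to read "table scoped" is that cache entries remember which table they belong to, so dropping that table can evict exactly its entries. The following is a minimal sketch under that assumption; `ListingFilesCache`, `put`, `get` and `drop_table` here are hypothetical simplifications, not DataFusion's actual cache API.

```rust
use std::collections::HashMap;

/// Hypothetical, simplified listing-files cache whose entries are scoped
/// to a table: the key is (table name, listing path).
#[derive(Default)]
struct ListingFilesCache {
    entries: HashMap<(String, String), Vec<String>>,
}

impl ListingFilesCache {
    /// Remember the files listed under `path` for `table`.
    fn put(&mut self, table: &str, path: &str, files: Vec<String>) {
        self.entries.insert((table.to_string(), path.to_string()), files);
    }

    /// Look up a cached listing for this table and path.
    fn get(&self, table: &str, path: &str) -> Option<&Vec<String>> {
        self.entries.get(&(table.to_string(), path.to_string()))
    }

    /// Evict every entry owned by `table`, e.g. when the external table is dropped.
    fn drop_table(&mut self, table: &str) {
        self.entries.retain(|(t, _), _| t != table);
    }
}
```

In this sketch, two tables over the same path get separate entries; that duplication is the price of being able to evict one table's listings without touching the other's.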
…pache#19659) (apache#19705)

## Which issue does this PR close?

- part of apache#18566

## Rationale for this change

I propose backporting the fix for apache#19641 to the 52 release.

## What changes are included in this PR?

- Backport apache#19659

## Are these changes tested?

Yes

## Are there any user-facing changes?

Bug fix

Co-authored-by: Kumar Ujjawal <ujjawalpathak6@gmail.com>
…he#19661)

## Which issue does this PR close?

- Part of apache#18566

## Rationale for this change

## What changes are included in this PR?

## Are these changes tested?

## Are there any user-facing changes?
## Which issue does this PR close?

- part of apache#19784.

## Rationale for this change

This is a follow-up to apache#19573.

## What changes are included in this PR?

Backport:

- apache#19804

## Are these changes tested?

Existing tests

## Are there any user-facing changes?

No
…itedBatchCoalescer (apache#19785) (apache#19836)

## Which issue does this PR close?

- part of apache#19784
- related to apache#19785
- backport fix of apache#19781

## Rationale for this change

Backport the fix for a regression found by @bert-beyondloops into the 52 release line.

## What changes are included in this PR?

Backport the fix for apache#19781 / apache#19785 to branch-52.

## Are these changes tested?

Yes

## Are there any user-facing changes?

Bug fix

Co-authored-by: Bert Vermeiren <103956021+bert-beyondloops@users.noreply.github.com>
Co-authored-by: Bert Vermeiren <bert.vermeiren@datadobi.com>
Brings apache#19853 into `branch-52`.

Co-authored-by: Pierre Lacave <pierre.lacave@datadoghq.com>
…pache#19877)

- part of apache#19784
- Brings apache#19716 from @adriangb into `branch-52`

Co-authored-by: Adrian Garcia Badaracco <1755071+adriangb@users.noreply.github.com>
## Which issue does this PR close?

- part of apache#19784

## Rationale for this change

Prepare for release

## What changes are included in this PR?

Update release version to 52.1.0 and add changelog

## Are these changes tested?

By CI

## Are there any user-facing changes?

New version
(cherry picked from commit f6450d6)

Co-authored-by: Gabriel Musat Mestre <gabriel.musatmestre@datadoghq.com>
* Fix dynamic filter is_used function (apache#19734)

## Which issue does this PR close?

- Closes apache#19715.

## Rationale for this change

The `is_used()` API incorrectly returned false for custom `DataSource` implementations that didn't call `reassign_expr_columns()` -> `with_new_children()`. This caused `HashJoinExec` to skip computing dynamic filters even when they were actually being used.

## What changes are included in this PR?

Updated `is_used()` to check both the outer and inner `Arc` counts.

## Are these changes tested?

Functionality is covered by the existing test `test_hashjoin_dynamic_filter_pushdown_is_used`. I was not sure whether to add a reproducer, since it would require adding a custom `DataSource`; the current tests in `datafusion/core/tests/physical_optimizer/filter_pushdown/mod.rs` use `FileScanConfig`.

## Are there any user-facing changes?

No

(cherry picked from commit 278950a)

* Simplify wait_complete function (apache#19937)

## Which issue does this PR close?

## Rationale for this change

The current v52 signature `pub async fn wait_complete(self: &Arc<Self>)` (introduced in apache#19546) is unergonomic. The method requires `&Arc<DynamicFilterPhysicalExpr>`, but when working with `Arc<dyn PhysicalExpr>`, downcasting only gives you `&DynamicFilterPhysicalExpr`. Since you can't convert `&DynamicFilterPhysicalExpr` to `Arc<DynamicFilterPhysicalExpr>`, the method becomes impossible to call.

The `&Arc<Self>` parameter was used to check `is_used()` via the `Arc` strong count, but this was overly defensive.

## What changes are included in this PR?

- Changed the `DynamicFilterPhysicalExpr::wait_complete` signature from `pub async fn wait_complete(self: &Arc<Self>)` to `pub async fn wait_complete(&self)`.
- Removed the `is_used()` check from `wait_complete()`. This method, like `wait_update()`, should only be called on filters that have consumers. If the caller doesn't know whether the filter has consumers, they should call `is_used()` first to avoid waiting indefinitely. This approach avoids complex signatures and dependencies between the API's methods.

## Are these changes tested?

Yes, existing tests cover this functionality. I removed the "mock" consumer from `test_hash_join_marks_filter_complete_empty_build_side` and `test_hash_join_marks_filter_complete`, since the fix in apache#19734 makes `is_used()` check the outer struct's `strong_count` as well.

## Are there any user-facing changes?

The signature of `wait_complete` changed.

(cherry picked from commit bef1368)
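The following is a minimal std-only sketch of how these two changes fit together. `Expr`, `DynamicFilter`, `Inner`, `mark_complete` and the blocking `Condvar` wait are hypothetical stand-ins (the real `wait_complete` is async and the real types live in DataFusion). It shows `is_used()` checking both the outer and inner `Arc` counts, and why a `&self` receiver lets `wait_complete` be called on the reference returned by a downcast from a trait object.

```rust
use std::any::Any;
use std::sync::{Arc, Condvar, Mutex};

/// Simplified stand-in for `PhysicalExpr` (hypothetical).
trait Expr: Send + Sync {
    fn as_any(&self) -> &dyn Any;
}

/// State shared between a filter and copies rebuilt via `with_new_children`.
struct Inner {
    complete: Mutex<bool>,
    cond: Condvar,
}

/// Simplified stand-in for `DynamicFilterPhysicalExpr`.
struct DynamicFilter {
    inner: Arc<Inner>,
}

impl DynamicFilter {
    fn new() -> Self {
        let inner = Inner { complete: Mutex::new(false), cond: Condvar::new() };
        Self { inner: Arc::new(inner) }
    }

    /// A consumer that rebuilds the expression shares only `inner`.
    fn with_new_children(&self) -> Self {
        Self { inner: Arc::clone(&self.inner) }
    }

    /// The producer holds one reference. An extra reference to the outer
    /// `Arc` (a consumer that kept the expression as-is) or to the inner
    /// state (a consumer that rebuilt it) means the filter is in use.
    fn is_used(self: &Arc<Self>) -> bool {
        Arc::strong_count(self) > 1 || Arc::strong_count(&self.inner) > 1
    }

    /// Producer side: mark the filter as final and wake all waiters.
    fn mark_complete(&self) {
        *self.inner.complete.lock().unwrap() = true;
        self.inner.cond.notify_all();
    }

    /// Takes `&self`, so it is callable on the `&DynamicFilter` returned by
    /// a downcast. Blocks until `mark_complete` runs; if nothing uses the
    /// filter the producer may never call it, so check `is_used()` first.
    fn wait_complete(&self) {
        let mut done = self.inner.complete.lock().unwrap();
        while !*done {
            done = self.inner.cond.wait(done).unwrap();
        }
    }
}

impl Expr for DynamicFilter {
    fn as_any(&self) -> &dyn Any {
        self
    }
}
```

With the old `self: &Arc<Self>` receiver, `df.wait_complete()` on a `df: &DynamicFilter` obtained through `as_any().downcast_ref()` would not compile, which is the ergonomic problem the PR describes.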
…output` (apache#19019)" (#73) This reverts commit 2626fc2.
(cherry picked from commit 09c29d7)
(cherry picked from commit 5bab65b)
fix: datatype_is_logically_equal for Dictionaries
This reverts commit 9f3ddce.
- Closes apache#17508

The previous implementation used UUID-based aliasing as a workaround to prevent duplicate names for literals in Substrait plans. This approach had several drawbacks:

- Non-deterministic plan names that made testing difficult (requiring UUID regex filters)
- Only addressed literal naming conflicts, not the broader issue of name deduplication
- Added an unnecessary dependency on the `uuid` crate
- Didn't properly handle cases where the same qualified name could appear with different schema representations

1. **Enhanced NameTracker**: Refactored to detect two types of conflicts:
   - Duplicate schema names: tracked via `schema_name()` to prevent `validate_unique_names` failures (e.g., two `Utf8(NULL)` literals)
   - Ambiguous references: tracked via `qualified_name()` to prevent `DFSchema::check_names` failures when a qualified field (e.g., `left.Utf8(NULL)`) and an unqualified field (e.g., `Utf8(NULL)`) share the same column name
2. **Removed UUID dependency**: Eliminated the `uuid` crate from `datafusion/substrait`
3. **Removed literal-specific aliasing**: The UUID-based workaround in `project_rel.rs` is no longer needed, as the improved NameTracker handles all naming conflicts consistently
4. **Deterministic naming**: Name conflicts now use predictable `__temp__N` suffixes instead of random UUIDs

Note: this doesn't fully fix all the issues in apache#17508; some special-casing of `CAST` is not included here.

Yes:

- Updated snapshot tests to reflect the new deterministic naming (e.g., `Utf8("people")__temp__0` instead of UUID-based names)
- Modified some roundtrip tests to verify semantic equivalence (schema matching and execution) rather than exact string matching, which is more robust
- All existing integration tests pass with the new naming scheme

Minimal. The generated plan names are now deterministic and more readable (using `__temp__N` suffixes instead of UUIDs), but this is primarily an internal representation change. The functional behavior and query results remain unchanged.

(cherry picked from commit d59cdfe)

Co-authored-by: Xander <zander181@googlemail.com>
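A minimal sketch of deterministic deduplication with `__temp__N` suffixes, assuming a single counter per tracker. `NameTracker::unique_name` here is a hypothetical simplification: it only tracks schema names, does not model the qualified-name (ambiguous reference) check described above, and the real suffix numbering may differ.

```rust
use std::collections::HashSet;

/// Hypothetical, simplified name tracker: hands out each name once and
/// renames later duplicates with a deterministic `__temp__N` suffix.
#[derive(Default)]
struct NameTracker {
    seen: HashSet<String>,
    next_suffix: usize,
}

impl NameTracker {
    fn unique_name(&mut self, name: &str) -> String {
        // The first occurrence keeps its original name.
        if self.seen.insert(name.to_string()) {
            return name.to_string();
        }
        // Duplicates get the next free `__temp__N` suffix. A counter, unlike
        // a random UUID, gives the same plan names on every run.
        loop {
            let candidate = format!("{name}__temp__{}", self.next_suffix);
            self.next_suffix += 1;
            if self.seen.insert(candidate.clone()) {
                return candidate;
            }
        }
    }
}
```

For example, three `Utf8(NULL)` literals in one projection would come out as `Utf8(NULL)`, `Utf8(NULL)__temp__0` and `Utf8(NULL)__temp__1`, which is what makes snapshot tests stable without UUID regex filters.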
…ostgres compatible (apache#20499) (#89)

## Which issue does this PR close?

- Closes apache#20502

## Rationale for this change

The following query fails:

`SELECT date_bin(NULL, TIMESTAMP '2023-01-01 12:30:00', TIMESTAMP '2023-01-01 12:00:00')`

with this error:

`Error: Error during planning: Failed to coerce arguments to satisfy a call to 'date_bin' function: coercion from Null, Timestamp(ns), Timestamp(ns) to the signature OneOf([....])`

## What changes are included in this PR?

Fix `date_bin(NULL, ...)` to return `NULL` instead of a planning error by allowing `Null` to coerce to `Interval`.

## Are these changes tested?

I added a sqllogictest case to verify that the query executes and returns `NULL`.

## Are there any user-facing changes?

Yes: previously `date_bin(NULL, ...)` returned a planning error. It now returns `NULL`.

(cherry picked from commit e937cad)
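A minimal sketch of the two steps involved, under simplifying assumptions: at planning time an untyped `Null` is accepted where an interval stride is expected, and at execution time a NULL stride propagates to a NULL result. `ArgType`, `coerce_stride` and this nanosecond-based `date_bin` are hypothetical stand-ins, not DataFusion's signature or coercion machinery, and the non-positive-stride error is an assumption of the sketch.

```rust
/// Simplified argument types seen by the planner (hypothetical).
#[derive(Debug, Clone, Copy, PartialEq)]
enum ArgType {
    Null,
    Interval,
    Timestamp,
}

/// Coerce the stride argument of `date_bin`: an untyped NULL is accepted
/// as an interval instead of failing planning.
fn coerce_stride(t: ArgType) -> Result<ArgType, String> {
    match t {
        ArgType::Interval | ArgType::Null => Ok(ArgType::Interval),
        other => Err(format!("cannot coerce {other:?} to Interval")),
    }
}

/// Evaluate `date_bin` on nanosecond values; any NULL input yields NULL.
fn date_bin(
    stride: Option<i64>,
    ts: Option<i64>,
    origin: Option<i64>,
) -> Result<Option<i64>, String> {
    let (Some(stride), Some(ts), Some(origin)) = (stride, ts, origin) else {
        return Ok(None);
    };
    if stride <= 0 {
        return Err("stride must be positive".to_string());
    }
    // Start of the bin that contains `ts`, counting whole strides from `origin`.
    Ok(Some(origin + (ts - origin).div_euclid(stride) * stride))
}
```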
…he#20542) (#88)

- Closes apache#20503

`split_part` did not handle empty delimiters in a PostgreSQL-compatible way (in Rust, `split("")` produces leading and trailing empty fields). This could return unexpected results for positions like `1` / `-1` and for out-of-range values. This PR aligns the behavior with PostgreSQL semantics for empty delimiters.

The change itself is small: it adjusts how positions `1` and `-1` are treated.

The change is tested.

Yes, the behavior of `split_part(str, '', n)` is now more consistent with PostgreSQL. No API changes.

(cherry picked from commit 3ab1301)

Co-authored-by: Gabriel Ferraté <gabriel.ferrate@datadoghq.com>
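A minimal single-row sketch of the empty-delimiter rule, assuming PostgreSQL treats the whole input as one field when the delimiter is empty: positions `1` and `-1` return the input, any other position returns an empty string, and position `0` is an error. This standalone `split_part` is illustrative only and is not DataFusion's array-based implementation.

```rust
/// Sketch of PostgreSQL-style `split_part(s, delim, n)` for a single row.
fn split_part(s: &str, delim: &str, n: i64) -> Result<String, String> {
    if n == 0 {
        return Err("field position must not be zero".to_string());
    }
    if delim.is_empty() {
        // Do not use `s.split("")`: it yields a leading and a trailing empty
        // field plus one field per character. Treat `s` as a single field.
        let whole = n == 1 || n == -1;
        return Ok(if whole { s.to_string() } else { String::new() });
    }
    let parts: Vec<&str> = s.split(delim).collect();
    let idx = if n > 0 {
        usize::try_from(n - 1).ok()
    } else {
        // Negative positions count from the end: -1 is the last field.
        parts.len().checked_sub(n.unsigned_abs() as usize)
    };
    // Out-of-range positions return an empty string rather than an error.
    Ok(idx
        .and_then(|i| parts.get(i))
        .map_or_else(String::new, |p| p.to_string()))
}
```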
…che#19369)

## Which issue does this PR close?

Closes apache#19348

## Rationale for this change

Previously, `pow()` on decimal types would error for negative exponents and non-integer exponents, with messages like:

- `Arrow error: Arithmetic overflow: Unsupported exp value: -5`
- `Compute error: Cannot use non-integer exp`

This was a regression from when decimals were cast to float before `pow()`. The efficient integer-based algorithm for computing powers of scaled integers cannot handle these cases.

## What changes are included in this PR?

- Modified `pow_decimal_int` to fall back to `pow_decimal_float` for negative exponents
- Modified `pow_decimal_float` to use an efficient integer path for non-negative integer exponents, otherwise falling back to `f64` computation
- Added a `pow_decimal_float_fallback` function that:
  - Converts the decimal to `f64`
  - Computes `powf(exp)`
  - Converts back to the original decimal type with proper scaling
- Added a `decimal_from_i128` helper to convert `i128` results back to generic decimal types (needed for `Decimal256` support)
- Updated sqllogictests to expect success for negative/non-integer exponents

## Are these changes tested?

Yes:

- Unit tests for `pow_decimal_float_fallback` covering negative exponents, fractional exponents, and cube roots
- Updated SQL logic tests in `decimal.slt`

## Are there any user-facing changes?

Yes. The following queries now work instead of returning errors:

```sql
-- Negative exponent
SELECT power(4::decimal(38, 5), -1); -- Returns 0.25

-- Non-integer exponent
SELECT power(2.5, 4.2); -- Returns 46.9

-- Square root via power
SELECT power(4::decimal, 0.5); -- Returns 2
```

(cherry picked from commit 1d5d63c)
Feat: Allow pow with negative & non-integer exponent on decimals (#19…

Co-authored-by: Yuvraj-cyborg <124506015+Yuvraj-cyborg@users.noreply.github.com>
Co-authored-by: ben.bellick <ben.bellick@datadoghq.com>
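A minimal sketch of the integer-path / float-fallback split, for a decimal stored as an unscaled `i128` with a fixed scale (value = raw / 10^scale). `pow_decimal` is hypothetical and folds `pow_decimal_int`, `pow_decimal_float` and `pow_decimal_float_fallback` into one function. It assumes the result keeps the input scale and that the integer path truncates when rescaling, which may not match DataFusion's exact rounding.

```rust
/// Raise `raw / 10^scale` to `exp`, returning the unscaled result at the same
/// scale, or `None` on overflow or a non-finite result.
fn pow_decimal(raw: i128, scale: u32, exp: f64) -> Option<i128> {
    let factor = 10_i128.checked_pow(scale)?;
    let is_nonneg_int = exp >= 0.0 && exp.fract() == 0.0 && exp <= u32::MAX as f64;
    if is_nonneg_int {
        // Integer fast path on the scaled value.
        let e = exp as u32;
        if e == 0 {
            return Some(factor); // x^0 = 1, i.e. 10^scale unscaled
        }
        // (raw / f)^e = raw^e / f^e; expressed at `scale` this is
        // raw^e / f^(e - 1), truncated by integer division.
        let num = raw.checked_pow(e)?;
        let den = factor.checked_pow(e - 1)?;
        return Some(num / den);
    }
    // Negative or fractional exponent: convert to f64, use powf, rescale.
    let result = (raw as f64 / factor as f64).powf(exp) * factor as f64;
    if result.is_finite() && result.abs() < i128::MAX as f64 {
        Some(result.round() as i128)
    } else {
        None // e.g. a negative base with a fractional exponent gives NaN
    }
}
```

As a check against the first SQL example above, `4` at scale 5 is `raw = 400_000`, and `pow_decimal(400_000, 5, -1.0)` yields `25_000`, i.e. `0.25`.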
## Which issue does this PR close?

- Closes #.

## Rationale for this change

`ExecutionPlan::with_new_state()` allows developers to inject custom information into their nodes, which they can use to tweak their `ExecutionPlan` implementations. This mechanism does not work today if the `ExecutionPlan` is a `DataSourceExec`, because it does not implement the `with_new_state()` method from `ExecutionPlan`. To let people also use this mechanism for their own `DataSource` implementations, this PR adds the method to `DataSource`.

## What changes are included in this PR?

Propagates `with_new_state()` to the `DataSource` trait, so that custom `DataSourceExec` instances can also benefit from it.

## Are these changes tested?

As this is just method plumbing, my impression is that a test for it would be overkill.

## Are there any user-facing changes?

Users can now implement `with_new_state()` in `DataSource` as well, not only in `ExecutionPlan`.

(cherry picked from commit 46ac990)
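A minimal sketch of the plumbing, using simplified stand-in traits rather than DataFusion's real `ExecutionPlan` and `DataSource` (which have many more methods). The `with_new_state` shapes here are modeled on the description above and are assumptions, as are `describe` and the hypothetical `MySource` that accepts a tenant id as its state.

```rust
use std::any::Any;
use std::sync::Arc;

/// Simplified stand-in for the `DataSource` trait (hypothetical shape).
trait DataSource: Send + Sync {
    /// Default: the source does not understand injected state.
    fn with_new_state(&self, _state: Arc<dyn Any + Send + Sync>) -> Option<Arc<dyn DataSource>> {
        None
    }
    fn describe(&self) -> String;
}

/// Simplified stand-in for the `ExecutionPlan` trait (hypothetical shape).
trait ExecutionPlan {
    fn with_new_state(&self, _state: Arc<dyn Any + Send + Sync>) -> Option<Arc<dyn ExecutionPlan>> {
        None
    }
    fn describe(&self) -> String;
}

struct DataSourceExec {
    source: Arc<dyn DataSource>,
}

impl ExecutionPlan for DataSourceExec {
    /// The plumbing: forward the state to the wrapped source and re-wrap
    /// whatever new source it returns.
    fn with_new_state(&self, state: Arc<dyn Any + Send + Sync>) -> Option<Arc<dyn ExecutionPlan>> {
        let source = self.source.with_new_state(state)?;
        Some(Arc::new(DataSourceExec { source }))
    }
    fn describe(&self) -> String {
        format!("DataSourceExec: {}", self.source.describe())
    }
}

/// Hypothetical custom source that accepts a tenant id (a `String`) as state.
struct MySource {
    tenant: Option<String>,
}

impl DataSource for MySource {
    fn with_new_state(&self, state: Arc<dyn Any + Send + Sync>) -> Option<Arc<dyn DataSource>> {
        // State of any other type is ignored (returns `None`).
        let tenant = state.downcast_ref::<String>()?.clone();
        Some(Arc::new(MySource { tenant: Some(tenant) }))
    }
    fn describe(&self) -> String {
        format!("MySource(tenant={:?})", self.tenant)
    }
}
```

Returning `None` for state a node does not recognize keeps the default behavior unchanged for sources that never opt in.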