Brian.myers/fix submodule stuff #20889

Closed
bcmyers wants to merge 26 commits into apache:main from DataDog:brian.myers/fix-submodule-stuff

@bcmyers bcmyers commented Mar 11, 2026

Turn off submodules

alamb and others added 26 commits January 8, 2026 11:26
…STSEC-2026-0001` (apache#19666) (apache#19686)

## Which issue does this PR close?

- part of apache#18566

## Rationale for this change

Let's resolve the audit workflow on branch-52 by removing our dependency
on the `rust_decimal` library, per @xudong963's request here:
apache#19666 (review)

## What changes are included in this PR?

- Backport apache#19666 to branch-52

## Are these changes tested?

By CI

## Are there any user-facing changes?
No, this is an internal dev tool change
… constructors (apache#19681)

Backport of apache#19668 to branch-52.

This PR cherry-picks commit 680ddcc from main.

Includes:
- Split of BatchPartitioner::try_new into hash and round-robin
constructors
- Documentation improvements
- No behavior changes


part of apache#18566

Co-authored-by: Your Name <youremail@example.com>
…sCache table scoped (apache#19704)

## Which issue does this PR close?

- part of apache#18566

## Rationale for this change

Backport the fix for this regression into the 52 release branch:
-  apache#19573 

## What changes are included in this PR?

Backport these two commits to `branch-52` (cherry-pick was clean)
- 1037f0a / apache#19388
- e6049de / apache#19616

<details><summary>Commands</summary>
<p>

```shell
andrewlamb@Andrews-MacBook-Pro-3:~/Software/datafusion$ git cherry-pick 1037f0a
[branch-52 1fc70ac] feat: add list_files_cache table function for `datafusion-cli` (apache#19388)
 Author: jizezhang <jizez@uw.edu>
 Date: Tue Jan 6 05:23:39 2026 -0800
 5 files changed, 446 insertions(+), 31 deletions(-)
andrewlamb@Andrews-MacBook-Pro-3:~/Software/datafusion$ git cherry-pick  e6049de
Auto-merging datafusion/core/src/execution/context/mod.rs
[branch-52 aa3d413] Make default ListingFilesCache table scoped (apache#19616)
 Author: jizezhang <jizez@uw.edu>
 Date: Thu Jan 8 06:34:10 2026 -0800
 10 files changed, 474 insertions(+), 184 deletions(-)
```

</p>
</details> 

## Are these changes tested?

By CI and new tests

## Are there any user-facing changes?

A new datafusion-cli function, and dropping an external table now clears
the listing cache

---------

Co-authored-by: jizezhang <jizez@uw.edu>
…pache#19659) (apache#19705)

## Which issue does this PR close?

<!--
We generally require a GitHub issue to be filed for all bug fixes and
enhancements and this helps us generate change logs for our releases.
You can link an issue to this PR using the GitHub syntax. For example
`Closes apache#123` indicates that this PR will close issue apache#123.
-->

- part of apache#18566

## Rationale for this change

I propose backporting the fix for
apache#19641 to the 52 release

## What changes are included in this PR?

- Backport apache#19659

## Are these changes tested?
Yes

## Are there any user-facing changes?

bug fix

Co-authored-by: Kumar Ujjawal <ujjawalpathak6@gmail.com>
…he#19661)

## Which issue does this PR close?

- Part of apache#18566

## Rationale for this change

<!--
Why are you proposing this change? If this is already explained clearly
in the issue then this section is not needed.
Explaining clearly why changes are proposed helps reviewers understand
your changes and offer better suggestions for fixes.
-->

## What changes are included in this PR?

<!--
There is no need to duplicate the description in the issue here but it
is sometimes worth providing a summary of the individual changes in this
PR.
-->

## Are these changes tested?

<!--
We typically require tests for all PRs in order to:
1. Prevent the code from being accidentally broken by subsequent changes
2. Serve as another way to document the expected behavior of the code

If tests are not included in your PR, please explain why (for example,
are they covered by existing tests)?
-->

## Are there any user-facing changes?

<!--
If there are user-facing changes then we may require documentation to be
updated before approving the PR.
-->

<!--
If there are any breaking changes to public APIs, please add the `api
change` label.
-->
## Which issue does this PR close?

- part of apache#19784.

## Rationale for this change

This is a follow-up of apache#19573

## What changes are included in this PR?

Backport 
- apache#19804

## Are these changes tested?

Existing tests

## Are there any user-facing changes?

no
…itedBatchCoalescer (apache#19785) (apache#19836)

## Which issue does this PR close?

- part of apache#19784 
- related to apache#19785
- backport fix of apache#19781

## Rationale for this change

Backport the fix for a regression found by @bert-beyondloops into the 52 release
line

## What changes are included in this PR?
Backport fix for apache#19781,
apache#19785 to branch-52

## Are these changes tested?

Yes
## Are there any user-facing changes?

bug fix

Co-authored-by: Bert Vermeiren <103956021+bert-beyondloops@users.noreply.github.com>
Co-authored-by: Bert Vermeiren <bert.vermeiren@datadobi.com>
Brings apache#19853 into `branch-52`

Co-authored-by: Pierre Lacave <pierre.lacave@datadoghq.com>
…pache#19877)

- part of apache#19784

- Brings apache#19716 from @adriangb
into `branch-52`

Co-authored-by: Adrian Garcia Badaracco <1755071+adriangb@users.noreply.github.com>
## Which issue does this PR close?

- part of apache#19784
## Rationale for this change

Prepare for release

## What changes are included in this PR?

Update release version to 52.1.0 and add changelog

## Are these changes tested?

By CI
## Are there any user-facing changes?

New version
(cherry picked from commit f6450d6)

Co-authored-by: Gabriel Musat Mestre <gabriel.musatmestre@datadoghq.com>
* downgrade substrait

(cherry picked from commit 40242b4)

* downgrade prost

(cherry picked from commit 3ae6613)

* downgrade prost for ffi

(cherry picked from commit 42c8585)

* Fix clippy warning

---------

Co-authored-by: Ahmed Mezghani <ahmed.mezghani@datadoghq.com>
* Fix dynamic filter is_used function (apache#19734)

## Which issue does this PR close?

- Closes apache#19715.

## Rationale for this change

The `is_used()` API incorrectly returned false for custom `DataSource`
implementations that didn't call `reassign_expr_columns()` ->
`with_new_children()`. This caused `HashJoinExec` to skip computing
dynamic filters even when they were actually being used.

## What changes are included in this PR?

Updated is_used() to check both outer and inner Arc counts
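
As a rough illustration of the idea (not the code from apache#19734), the sketch below models a filter as an outer struct wrapping shared inner state. `DynamicFilter`, `Inner`, and `is_used_inner_only` are hypothetical stand-ins. It assumes a consumer can hold the filter either by cloning the outer `Arc` directly or by rebuilding the outer struct around the same inner `Arc`.

```rust
use std::sync::Arc;

/// Hypothetical stand-in for the state shared between copies of a filter.
struct Inner;

/// Hypothetical stand-in for `DynamicFilterPhysicalExpr`.
struct DynamicFilter {
    inner: Arc<Inner>,
}

impl DynamicFilter {
    fn new() -> Arc<Self> {
        Arc::new(Self { inner: Arc::new(Inner) })
    }

    /// Models `with_new_children()`: a new outer struct that shares the inner state.
    fn with_new_children(&self) -> Arc<Self> {
        Arc::new(Self { inner: Arc::clone(&self.inner) })
    }

    /// Inner-only check: misses consumers that cloned the outer `Arc` directly.
    fn is_used_inner_only(self: &Arc<Self>) -> bool {
        Arc::strong_count(&self.inner) > 1
    }

    /// Checks both counts, so either way of holding the filter is detected.
    fn is_used(self: &Arc<Self>) -> bool {
        Arc::strong_count(self) > 1 || Arc::strong_count(&self.inner) > 1
    }
}
```

In this model, a custom source that only clones the outer `Arc` is invisible to the inner-only check but is seen by the combined one.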

## Are these changes tested?

Functionality is covered by the existing test
`test_hashjoin_dynamic_filter_pushdown_is_used`. I was not sure whether to
add a repro, since it would require adding a custom `DataSource`; the
current tests in
datafusion/core/tests/physical_optimizer/filter_pushdown/mod.rs use
`FileScanConfig`.

## Are there any user-facing changes?

no

(cherry picked from commit 278950a)

* Simplify wait_complete function (apache#19937)

## Which issue does this PR close?

## Rationale for this change

The current v52 signature `pub async fn wait_complete(self: &Arc<Self>)`
(introduced in apache#19546) is a bit unergonomic. The method requires
`&Arc<DynamicFilterPhysicalExpr>`, but when working with `Arc<dyn
PhysicalExpr>`, downcasting only gives you `&DynamicFilterPhysicalExpr`.
Since you can't convert `&DynamicFilterPhysicalExpr` to
`Arc<DynamicFilterPhysicalExpr>`, the method becomes impossible to call.

The `&Arc<Self>` param was used to check `is_used()` via Arc strong
count, but this was overly defensive.
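
The receiver problem can be reproduced in a few lines. This is a simplified, synchronous sketch: the `PhysicalExpr` trait is trimmed to a single `as_any` method, and `DynamicFilter`, `is_complete_arc`, `is_complete`, and `check` are hypothetical names, not DataFusion APIs.

```rust
use std::any::Any;
use std::sync::Arc;

/// Hypothetical, trimmed-down stand-in for the `PhysicalExpr` trait.
trait PhysicalExpr {
    fn as_any(&self) -> &dyn Any;
}

/// Hypothetical stand-in for `DynamicFilterPhysicalExpr`.
struct DynamicFilter {
    complete: bool,
}

impl PhysicalExpr for DynamicFilter {
    fn as_any(&self) -> &dyn Any {
        self
    }
}

impl DynamicFilter {
    /// Old-style receiver: callable only when the caller holds an `Arc<DynamicFilter>`.
    fn is_complete_arc(self: &Arc<Self>) -> bool {
        self.complete
    }

    /// New-style receiver: callable on any `&DynamicFilter`.
    fn is_complete(&self) -> bool {
        self.complete
    }
}

/// Downcasting a trait object yields `&DynamicFilter`, never `&Arc<DynamicFilter>`.
fn check(expr: &Arc<dyn PhysicalExpr>) -> Option<bool> {
    let filter: &DynamicFilter = expr.as_any().downcast_ref::<DynamicFilter>()?;
    // filter.is_complete_arc(); // does not compile: the receiver must be `&Arc<DynamicFilter>`
    Some(filter.is_complete())
}
```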

## What changes are included in this PR?

- Changed `DynamicFilterPhysicalExpr::wait_complete` signature from `pub
async fn wait_complete(self: &Arc<Self>)` to `pub async fn
wait_complete(&self)`.

- Removed the `is_used()` check from `wait_complete()` - this method,
like `wait_update()`, should only be called on filters that have
consumers. If the caller doesn't know whether the filter has consumers,
they should call `is_used()` first to avoid waiting indefinitely. This
approach avoids complex signatures and dependencies between the API's
methods.

## Are these changes tested?

Yes, existing tests cover this functionality. I removed the "mock"
consumer from `test_hash_join_marks_filter_complete_empty_build_side`
and `test_hash_join_marks_filter_complete` since the fix in
apache#19734 makes `is_used` check the
outer struct's `strong_count` as well.

## Are there any user-facing changes?

The signature of `wait_complete` changed.

(cherry picked from commit bef1368)
fix: datatype_is_logically_equal for Dictionaries
- Closes apache#17508

The previous implementation used UUID-based aliasing as a workaround to
prevent duplicate names for literals in Substrait plans. This approach
had several drawbacks:
- Non-deterministic plan names that made testing difficult (requiring
UUID regex filters)
- Only addressed literal naming conflicts, not the broader issue of name
deduplication
- Added unnecessary dependency on the `uuid` crate
- Didn't properly handle cases where the same qualified name could
appear with different schema representations

1. **Enhanced NameTracker**: Refactored to detect two types of conflicts:
   - Duplicate schema names: Tracked via schema_name() to prevent
     validate_unique_names failures (e.g., two Utf8(NULL) literals)
   - Ambiguous references: Tracked via qualified_name() to prevent
     DFSchema::check_names failures when a qualified field (e.g.,
     left.Utf8(NULL)) and unqualified field (e.g., Utf8(NULL)) share the same
     column name
2. **Removed UUID dependency**: Eliminated the `uuid` crate from
`datafusion/substrait`
3. **Removed literal-specific aliasing**: The UUID-based workaround in
`project_rel.rs` is no longer needed as the improved NameTracker handles
all naming conflicts consistently
4. **Deterministic naming**: Name conflicts now use predictable
`__temp__N` suffixes instead of random UUIDs
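
A minimal sketch of deterministic suffixing, assuming a single set of already-used names. The real `NameTracker` described above also distinguishes schema names from qualified names, which this sketch omits. The struct layout, the `unique_name` method, and starting the counter at 0 are assumptions made for illustration.

```rust
use std::collections::HashSet;

/// Hypothetical, simplified tracker: one set of names seen so far.
#[derive(Default)]
struct NameTracker {
    seen: HashSet<String>,
}

impl NameTracker {
    /// Returns `name` unchanged if it is new, otherwise the first free
    /// `name__temp__N`, counting N up from 0.
    fn unique_name(&mut self, name: &str) -> String {
        if self.seen.insert(name.to_string()) {
            return name.to_string();
        }
        let mut counter = 0usize;
        loop {
            let candidate = format!("{name}__temp__{counter}");
            if self.seen.insert(candidate.clone()) {
                return candidate;
            }
            counter += 1;
        }
    }
}
```

Because the suffix depends only on the order in which names are seen, repeated runs produce the same plan names, which is what lets snapshot tests drop the UUID regex filters.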

Note: This doesn't fully fix all the issues in apache#17508 which allow some
special casing of `CAST` which are not included here.

Yes:
- Updated snapshot tests to reflect the new deterministic naming (e.g.,
`Utf8("people")__temp__0` instead of UUID-based names)
- Modified some roundtrip tests to verify semantic equivalence (schema
matching and execution) rather than exact string matching, which is more
robust
- All existing integration tests pass with the new naming scheme

Minimal. The generated plan names are now deterministic and more
readable (using `__temp__N` suffixes instead of UUIDs), but this is
primarily an internal representation change. The functional behavior and
query results remain unchanged.

(cherry picked from commit d59cdfe)

Co-authored-by: Xander <zander181@googlemail.com>
…ostgres compatible (apache#20499) (#89)

## Which issue does this PR close?

- Closes apache#20502

## Rationale for this change

The following query is failing with the following error:

`SELECT date_bin(NULL, TIMESTAMP '2023-01-01 12:30:00', TIMESTAMP
'2023-01-01 12:00:00')
`

`Error: Error during planning: Failed to coerce arguments to satisfy a
call to 'date_bin' function: coercion from Null, Timestamp(ns),
Timestamp(ns) to the signature OneOf([....])`

## What changes are included in this PR?

Fix `date_bin(NULL, ...)` to return `NULL` instead of a planning error
by allowing Nulls to coerce to Interval.

## Are these changes tested?

I added a sqllogictest case to verify the query executes and returns
`NULL`.

## Are there any user-facing changes?

Yes, previously `date_bin(NULL, ...)` returned a planning error. It now
returns NULL.

(cherry picked from commit e937cad)
…he#20542) (#88)

- Closes apache#20503

`split_part` did not handle empty delimiters in a PostgreSQL-compatible
way (`split("")` in Rust creates leading/trailing empty fields).
This could return unexpected results for positions like `1` / `-1` and
out-of-range values.
This PR aligns behavior with Postgres semantics for empty delimiters.
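
To make the mismatch concrete, the sketch below contrasts Rust's `str::split("")` with the empty-delimiter rule as I understand PostgreSQL applies it: the whole string is the only field, so positions `1` and `-1` return it and every other position returns an empty string. `naive_fields` and `split_part_empty_delimiter` are hypothetical helpers, not the functions changed in this PR, and rejecting position `0` is left out.

```rust
/// What a naive implementation sees: Rust's `str::split("")` yields an empty
/// field before the first character and after the last one.
fn naive_fields(input: &str) -> Vec<&str> {
    input.split("").collect()
}

/// Hypothetical helper for the empty-delimiter case only. It assumes the
/// whole input counts as a single field, so only positions 1 and -1 match.
fn split_part_empty_delimiter(input: &str, position: i64) -> &str {
    match position {
        1 | -1 => input,
        _ => "",
    }
}
```

With the naive split, position `1` of `"abc"` would be the leading empty field rather than `"abc"`.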

Small change in how we treat the 1, -1

Indeed!

Yes, behavior is now more consistent with PostgreSQL for
`split_part(str, '', n)`.
No API changes.

(cherry picked from commit 3ab1301)

Co-authored-by: Gabriel Ferraté <gabriel.ferrate@datadoghq.com>
…che#19369)

## Which issue does this PR close?
Closes apache#19348

## Rationale for this change
Previously, pow() on decimal types would error for negative exponents
and non-integer exponents with messages like:

- Arrow error: Arithmetic overflow: Unsupported exp value: -5
- Compute error: Cannot use non-integer exp

This was a regression from when decimals were cast to float before
pow(). The efficient integer-based algorithm for computing power on
scaled integers cannot handle these cases.

## What changes are included in this PR?
- Modified pow_decimal_int to fall back to pow_decimal_float for negative
exponents
- Modified pow_decimal_float to use an efficient integer path for
non-negative integer exponents, otherwise fall back to f64 computation
- Added pow_decimal_float_fallback function that:
  - Converts the decimal to f64
  - Computes powf(exp)
  - Converts back to the original decimal type with proper scaling
- Added decimal_from_i128 helper to convert i128 results back to generic
decimal types (needed for Decimal256 support)
- Updated sqllogictests to expect success for negative/non-integer
exponents
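
The float fallback steps listed above can be sketched as follows. This is not the PR's implementation: `pow_scaled_via_f64` is a hypothetical function, and it assumes a decimal is stored as an unscaled `i128` plus a `scale` (number of fractional digits), with the result kept at the same scale.

```rust
/// Hypothetical float fallback for `power` on a scaled-integer decimal.
/// The represented value is `value / 10^scale`; the result uses the same scale.
/// Returns `None` when the result is not finite or does not fit in `i128`.
fn pow_scaled_via_f64(value: i128, scale: u32, exp: f64) -> Option<i128> {
    let factor = 10f64.powi(scale as i32);
    // Step 1: convert the decimal to f64.
    let base = value as f64 / factor;
    // Step 2: compute the power in floating point.
    let result = base.powf(exp);
    if !result.is_finite() {
        return None;
    }
    // Step 3: rescale and round back to an integer at the original scale.
    let scaled = (result * factor).round();
    if scaled.abs() >= i128::MAX as f64 {
        return None;
    }
    Some(scaled as i128)
}
```

For example, `4.00000` at scale 5 is stored as `400000`; raising it to `-1` gives `0.25`, stored as `25000`. Going through `f64` trades precision for coverage, which is why the integer path is still preferred for non-negative integer exponents.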

## Are these changes tested?
Yes:

- Unit tests for pow_decimal_float_fallback covering negative exponents,
fractional exponents, cube roots
- Updated SQL logic tests in decimal.slt

## Are there any user-facing changes?
Yes. The following queries now work instead of returning errors:

```sql
-- Negative exponent
SELECT power(4::decimal(38, 5), -1);  -- Returns 0.25

-- Non-integer exponent
SELECT power(2.5, 4.2);  -- Returns 46.9

-- Square root via power
SELECT power(4::decimal, 0.5);  -- Returns 2
```

(cherry picked from commit 1d5d63c)
Feat: Allow pow with negative & non-integer exponent on decimals (#19…

Co-authored-by: Yuvraj-cyborg <124506015+Yuvraj-cyborg@users.noreply.github.com>
Co-authored-by: ben.bellick <ben.bellick@datadoghq.com>
## Which issue does this PR close?

- Closes #.

## Rationale for this change

`ExecutionPlan::with_new_state()` allows devs to inject custom
information in their nodes that they can use for tweaking their
`ExecutionPlan` implementations.

This mechanism does not work today if the `ExecutionPlan` is a
`DataSourceExec`, as this one does not implement the `with_new_state()`
method from `ExecutionPlan`.

In order to let people use this also for their own `DataSource`
implementations, this PR adds this method to it.

## What changes are included in this PR?

Propagates `with_new_state()` to the `DataSource` trait, so that custom
`DataSourceExec` can also benefit from it.
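
A sketch of what this plumbing could look like, using trimmed-down, hypothetical versions of the types. The trait shape, the `label` method, and `TenantSource` are assumptions for illustration; the actual signatures in DataFusion may differ.

```rust
use std::any::Any;
use std::sync::Arc;

/// Hypothetical, trimmed-down `DataSource` trait showing only the new hook.
trait DataSource {
    /// Default: the source ignores injected state.
    fn with_new_state(&self, _state: Arc<dyn Any + Send + Sync>) -> Option<Arc<dyn DataSource>> {
        None
    }

    /// Illustrative method so the effect of the state is observable.
    fn label(&self) -> String;
}

/// Hypothetical stand-in for `DataSourceExec`: it only forwards to its source.
struct DataSourceExec {
    source: Arc<dyn DataSource>,
}

impl DataSourceExec {
    fn with_new_state(&self, state: Arc<dyn Any + Send + Sync>) -> Option<DataSourceExec> {
        self.source
            .with_new_state(state)
            .map(|source| DataSourceExec { source })
    }
}

/// Hypothetical custom source that accepts a `String` as its state.
struct TenantSource {
    tenant: String,
}

impl DataSource for TenantSource {
    fn with_new_state(&self, state: Arc<dyn Any + Send + Sync>) -> Option<Arc<dyn DataSource>> {
        // Only react to state of the expected type; ignore everything else.
        let tenant = state.downcast_ref::<String>()?;
        let rebuilt: Arc<dyn DataSource> = Arc::new(TenantSource { tenant: tenant.clone() });
        Some(rebuilt)
    }

    fn label(&self) -> String {
        format!("tenant={}", self.tenant)
    }
}
```

The default `None` keeps existing sources unaffected, while a source that recognizes the state type returns a rebuilt copy of itself.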

## Are these changes tested?

As it's just method plumbing, my impression is that having a test for it
is overkill.

## Are there any user-facing changes?

Users can now implement their `with_new_state()` also in `DataSource`,
not only in `ExecutionPlan`

(cherry picked from commit 46ac990)
@bcmyers bcmyers closed this Mar 11, 2026
@github-actions github-actions bot added documentation Improvements or additions to documentation development-process Related to development process of DataFusion logical-expr Logical plan and expressions labels Mar 11, 2026
@github-actions github-actions bot added physical-expr Changes to the physical-expr crates optimizer Optimizer rules core Core DataFusion crate sqllogictest SQL Logic Tests (.slt) substrait Changes to the substrait crate catalog Related to the catalog crate common Related to common crate execution Related to the execution crate functions Changes to functions implementation datasource Changes to the datasource crate physical-plan Changes to the physical-plan crate labels Mar 11, 2026
9 participants