Najib.boutaib/cherry pick commit 5765a08 #18801

N-Boutaib · 2025-11-18T12:39:14Z

Which issue does this PR close?

Closes #.

Rationale for this change

What changes are included in this PR?

Are these changes tested?

Are there any user-facing changes?

…ds instead of computing them after data is accumulated (apache#17444) (#46) (cherry picked from commit 5b833b9) Co-authored-by: Adrian Garcia Badaracco <1755071+adriangb@users.noreply.github.com>

* Downgrade sha2 from 0.10.9 to 0.10.8 (#29) (cherry picked from commit ca48c4d)
* retrigger checks

---------

Co-authored-by: Gabriel <45515538+gabotechs@users.noreply.github.com>

(cherry picked from commit 368d6f4) Co-authored-by: Gabriel Musat Mestre <gabriel.musatmestre@datadoghq.com>

* Revert "Upgrade arrow/parquet to 56.0.0 (apache#16690)"
  This reverts commit fa1f8c1.
* Revert "refactor: use upstream inline_key_fast (apache#17044)"
  This reverts commit 71b92bc.
* Revert "fix: respect inexact flags in row group metadata (apache#16412)"
  This reverts commit afc90f7.
* Revert "Test grouping by FixedSizeList (apache#17415)"
  This reverts commit 03f39e5.
* Spelling (got reverted)
* Also allow Byt from tests
* Adjust sqllogictests

…#51)

* fix(SubqueryAlias): use maybe_project_redundant_column
  Fixes apache#17405
* chore: format
* ci: retry
* chore(SubqueryAlias): restructure duplicate detection and add tests
* docs: add examples and context to the reproducer

(cherry picked from commit c910db4)

Co-authored-by: Filippo Rossi <12383260+notfilippo@users.noreply.github.com>

* Revert "Revert arrow upgrade and related changes (#50)"
  This reverts commit 5506e69.
* Allow typo

apache#18099)

## Which issue does this PR close?

Fixes comparison errors when using dictionary-encoded types with comparison functions like NULLIF.

## Rationale for this change

When using dictionary-encoded columns (e.g., Dictionary(Int32, Utf8)) in comparison operations with literals or other types, DataFusion would throw an error stating the types are not comparable. This was particularly problematic for functions like NULLIF which rely on comparison coercion. The issue was that comparison_coercion_numeric didn't handle dictionary types, even though the general comparison_coercion function did have dictionary support.

## What changes are included in this PR?

1. Refactored dictionary comparison logic: Extracted common dictionary coercion logic into dictionary_comparison_coercion_generic to avoid code duplication.
2. Added numeric-specific dictionary coercion: Introduced dictionary_comparison_coercion_numeric that uses numeric-preferring comparison rules when dealing with dictionary value types.
3. Updated comparison_coercion_numeric: Added a call to dictionary_comparison_coercion_numeric in the coercion chain to properly handle dictionary types.
4. Added sqllogictest cases demonstrating the fix works for various dictionary comparison scenarios.

## Are these changes tested?

Yes, added tests in datafusion/sqllogictest/test_files/nullif.slt covering:

- Dictionary type compared with string literal
- String compared with dictionary type
- Dictionary compared with dictionary

All tests pass with the fix and would fail without it.

## Are there any user-facing changes?

This is a bug fix that enables previously failing queries to work correctly. No breaking changes or API modifications.
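The coercion idea described above can be illustrated with a small, self-contained sketch. This is not DataFusion's code: `ToyType`, `coerce_plain`, and `coerce_with_dictionary` are hypothetical names invented for illustration, and the toy rule simply compares on the dictionary's value type, which may differ from what the real `dictionary_comparison_coercion_*` functions return.

```rust
// Toy model of dictionary-aware comparison coercion.
// `ToyType` is a hypothetical stand-in, NOT arrow's `DataType`.
#[derive(Debug, Clone, PartialEq)]
enum ToyType {
    Int32,
    Int64,
    Utf8,
    // (key type, value type)
    Dictionary(Box<ToyType>, Box<ToyType>),
}

/// Hypothetical base rule that knows nothing about dictionaries.
fn coerce_plain(lhs: &ToyType, rhs: &ToyType) -> Option<ToyType> {
    use ToyType::*;
    match (lhs, rhs) {
        (a, b) if a == b => Some(a.clone()),
        (Int32, Int64) | (Int64, Int32) => Some(Int64),
        _ => None,
    }
}

/// Dictionary-aware wrapper: unwrap dictionaries to their value type,
/// then fall back to the plain rule.
fn coerce_with_dictionary(lhs: &ToyType, rhs: &ToyType) -> Option<ToyType> {
    use ToyType::*;
    match (lhs, rhs) {
        (Dictionary(_, l), Dictionary(_, r)) => coerce_with_dictionary(l, r),
        (Dictionary(_, v), other) | (other, Dictionary(_, v)) => {
            coerce_with_dictionary(v, other)
        }
        _ => coerce_plain(lhs, rhs),
    }
}
```

In this toy model, a rule without the dictionary branch rejects `Dictionary(Int32, Utf8)` against `Utf8`, while the wrapper resolves the pair to `Utf8`, mirroring the three cases listed in the tests above.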

Cherry-pick 7d294f1 (dict type coercion fix)

* Update to arrow/parquet 56.1.0
* Adjust for new parquet sizes, update for deprecated API
* Thread through max_predicate_cache_size, add test

(cherry picked from commit 980c948)

Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>

@asolimando

* Allow filter pushdown through AggregateExec (apache#18404)

## Which issue does this PR close?

- Closes apache#18399

## Rationale for this change

Right now filters cannot pass through `AggregateExec` nodes, preventing filter pushdown optimization in queries with GROUP BY/DISTINCT operations.

## What changes are included in this PR?

- Implemented `gather_filters_for_pushdown()` for `AggregateExec` that allows filters on grouping columns to pass through to children
- Supports both Pre phase (static filters) and Post phase (dynamic filters from joins)

Essentially, filter will pass through in the scenarios @asolimando mentioned [here](apache#18399 (comment))

## Are these changes tested?

Yes, added three tests:

- `test_aggregate_filter_pushdown`: Positive case with aggregate functions
- `test_no_pushdown_aggregate_filter_on_non_grouping_column`: Negative case ensuring filters on aggregate results are not pushed

## Are there any user-facing changes?

(cherry picked from commit 076b091)

* physical-plan: push filters down to UnionExec children (apache#18054)

Filters are safe to be pushed down, so we can override the default behavior here.

Signed-off-by: Alfonso Subiotto Marques <alfonso.subiotto@polarsignals.com>

(cherry picked from commit 0ecd59b)

* fix: prevent UnionExec panic with empty inputs (apache#17449)

* fix: prevent UnionExec panic with empty inputs

This commit fixes a panic in UnionExec when constructed with empty inputs. Previously, UnionExec::new(vec![]) would cause an index out of bounds panic at union.rs:542 when trying to access inputs[0].

Changes:

- Made UnionExec::new() return Result<Self> with proper validation
- Made union_schema() return Result<SchemaRef> with empty input checks
- Added descriptive error messages for empty input cases
- Updated all call sites to handle the new Result return type
- Added comprehensive tests for edge cases

Error messages:

- "UnionExec requires at least one input"
- "Cannot create union schema from empty inputs"

The fix maintains backward compatibility for valid inputs while preventing crashes and providing clear error messages for invalid usage.

Fixes apache#17052

* refactor: address PR review comments for UnionExec empty inputs fix

- Add new try_new method that returns Result<Arc<dyn ExecutionPlan>>
- Deprecate existing new method in favor of try_new
- Optimize single-input case: try_new returns the input directly
- Remove redundant assert!(result.is_err()) from tests
- Rename test_union_multiple_inputs_still_works to test_union_schema_multiple_inputs
- Update all call sites to use appropriate API (try_new for new code, deprecated new for tests)

This maintains backward compatibility while providing better error handling and optimization for single-input cases.

* Fix cargo fmt and clippy warnings

- Add proper feature gates for parquet_encryption in datasource-parquet
- Format code to pass cargo fmt checks
- All tests passing

* Fix clippy

---------

Co-authored-by: Eeshan <eeshan@Eeshans-MacBook-Pro.local>
Co-authored-by: ebembi-crdb <ebembi@cockroachlabs.com>
Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>

(cherry picked from commit b122a16)

---------

Signed-off-by: Alfonso Subiotto Marques <alfonso.subiotto@polarsignals.com>
Co-authored-by: Alfonso Subiotto Marqués <alfonso.subiotto@polarsignals.com>
Co-authored-by: EeshanBembi <33062610+EeshanBembi@users.noreply.github.com>
Co-authored-by: Eeshan <eeshan@Eeshans-MacBook-Pro.local>
Co-authored-by: ebembi-crdb <ebembi@cockroachlabs.com>
Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>
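The `try_new` behavior described in the UnionExec commit (error on empty inputs, return the single input directly, otherwise build the union) can be sketched as follows. This is a minimal illustration under stated assumptions, not the DataFusion implementation: `Plan`, `Scan`, and `Union` are hypothetical toy types standing in for `ExecutionPlan` and `UnionExec`, and the error is a plain `String` rather than DataFusion's error type. Only the error message text is taken from the commit message.

```rust
use std::sync::Arc;

/// Hypothetical stand-in for `ExecutionPlan`; not DataFusion's trait.
trait Plan {
    fn name(&self) -> String;
}

/// Toy leaf node used only to have something to union.
struct Scan(&'static str);

impl Plan for Scan {
    fn name(&self) -> String {
        format!("Scan({})", self.0)
    }
}

/// Toy union node holding two or more inputs.
struct Union {
    inputs: Vec<Arc<dyn Plan>>,
}

impl Plan for Union {
    fn name(&self) -> String {
        format!("Union[{}]", self.inputs.len())
    }
}

impl Union {
    /// Fallible constructor: validates instead of indexing `inputs[0]` blindly.
    fn try_new(mut inputs: Vec<Arc<dyn Plan>>) -> Result<Arc<dyn Plan>, String> {
        match inputs.len() {
            // Empty input is an error rather than an index-out-of-bounds panic.
            0 => Err("UnionExec requires at least one input".to_string()),
            // A union of one input is just that input.
            1 => Ok(inputs.remove(0)),
            _ => Ok(Arc::new(Union { inputs }) as Arc<dyn Plan>),
        }
    }
}
```

Returning `Arc<dyn Plan>` rather than `Self` is what makes the single-input shortcut possible in this sketch, since the caller may get back something that is not a union at all.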

…7509)

Closes apache#15342

Reduces the duplicate dependencies. We currently depend on bzip2 in two different ways. In attempting to reduce this, I needed to update `async-compression` which caused two different libraries to link to the system lzma library. This is not allowed in Rust. This PR updates avro-rs, but we cannot merge this PR until that crate merges apache/avro-rs#284 and we remove the crates.io patch this PR contains.

Update avro-rs and switch from the unmaintained xz2 crate to liblzma.

Unit tests.

None. This is simply a dependency update to a more recent crate.

(cherry picked from commit 5765a08)

LiaCastaneda and others added 14 commits September 29, 2025 17:07

Refactor HashJoinExec to progressively accumulate dynamic filter boun…

99cedf6

…ds instead of computing them after data is accumulated (apache#17444) (#46) (cherry picked from commit 5b833b9) Co-authored-by: Adrian Garcia Badaracco <1755071+adriangb@users.noreply.github.com>
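The difference between progressive accumulation and computing bounds after all data has been collected can be shown with a toy accumulator. This is only a sketch of the general technique named in the commit title, assuming `i64` join keys; `BoundsAccumulator` is a hypothetical name and does not reflect how `HashJoinExec` actually stores its dynamic filter bounds.

```rust
/// Hypothetical running min/max over a join key column (toy: i64 keys only).
#[derive(Debug, Default, Clone, Copy, PartialEq)]
struct BoundsAccumulator {
    /// `None` until the first value is seen, then `(min, max)`.
    bounds: Option<(i64, i64)>,
}

impl BoundsAccumulator {
    /// Fold one batch into the running bounds as soon as it arrives,
    /// so no second pass over the accumulated data is needed.
    fn update(&mut self, batch: &[i64]) {
        for &v in batch {
            self.bounds = Some(match self.bounds {
                None => (v, v),
                Some((lo, hi)) => (lo.min(v), hi.max(v)),
            });
        }
    }
}
```

Calling `update` once per incoming batch keeps only two values of state per key column, and an empty batch leaves the bounds unchanged.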

Downgrade sha2 from 0.10.9 to 0.10.8 (#29) (#48)

e18a1a4

* Downgrade sha2 from 0.10.9 to 0.10.8 (#29) (cherry picked from commit ca48c4d)
* retrigger checks

---------

Co-authored-by: Gabriel <45515538+gabotechs@users.noreply.github.com>

Make datafusion-datasource not depend on default features (#49)

0be6cd9

(cherry picked from commit 368d6f4) Co-authored-by: Gabriel Musat Mestre <gabriel.musatmestre@datadoghq.com>

Set update = none for datafusion-testing git submodule

5035d76

fix typos

ff2533c

Revert "Revert arrow upgrade and related changes" (#52)

89f1b7c

* Revert "Revert arrow upgrade and related changes (#50)"
  This reverts commit 5506e69.
* Allow typo

turn off submodule updating for all testing submodules

d95ac0c

Merge pull request #55 from DataDog/ahmed/cherry-pick-7d294f1dc

1075a01

Cherry-pick 7d294f1 (dict type coercion fix)

Upgrade to arrow 56.1.0 (apache#17275) (#57)

f209f98

* Update to arrow/parquet 56.1.0
* Adjust for new parquet sizes, update for deprecated API
* Thread through max_predicate_cache_size, add test

(cherry picked from commit 980c948)

Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>

N-Boutaib closed this Nov 18, 2025

5 participants
