Improve sqllogictest speed by creating only a single large file rather than 2 by Tim-53 · Pull Request #20586 · apache/datafusion

Tim-53 · 2026-02-26T23:00:59Z

Draft as it builds on #20576

Which issue does this PR close?

Part of Speedup execution of sqllogictests with more parallelization #20524
Follow on to Speedup sqllogictests by running long running tests first #20576 from @alamb

Rationale for this change

The execution time of this test is dominated by writing the Parquet files. Reusing a single file instead of writing two cuts the execution time by around 30%.

What changes are included in this PR?

Building on #20576, we reuse the Parquet file the test needs instead of recreating it.

Are these changes tested?

Ran the test with the following results:

| | Baseline (2 files) | Optimized (1 file) |
| --- | --- | --- |
| Min | 33.000s | 22.653s |
| Max | 37.662s | 25.489s |
| Avg | 34.427s | 24.092s |
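As a quick sanity check on the "around 30%" figure, the relative reduction can be recomputed from the timings reported above. This is only a sketch of the arithmetic; the helper name `reduction_pct` is made up for illustration and is not part of the PR.

```python
# Timings in seconds, copied from the results reported above:
# (baseline with 2 files, optimized with 1 file).
timings = {
    "Min": (33.000, 22.653),
    "Max": (37.662, 25.489),
    "Avg": (34.427, 24.092),
}


def reduction_pct(baseline: float, optimized: float) -> float:
    """Share of the baseline runtime that was saved, in percent."""
    return (baseline - optimized) / baseline * 100.0


# Round to one decimal place for readability.
reductions = {
    name: round(reduction_pct(baseline, optimized), 1)
    for name, (baseline, optimized) in timings.items()
}

for name, pct in reductions.items():
    print(f"{name}: {pct}% less time")
```

The average works out to roughly 30.0%, with the min and max rows slightly above that (about 31.4% and 32.3%), which is consistent with the claim in the rationale.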

One open question: does the correctness of this regression test rely on having two physically separate files? The race condition in #17197 was in the execution layer — both scans would still be independent DataSourceExec nodes with independent readers, so I believe the behavior is preserved. But if there's any concern, we could use system cp to copy the file and register two physical files while still only paying the generate_series cost once.
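The copy-based fallback mentioned above could look roughly like the following. This is a minimal sketch of the idea only, not the sqllogictest change itself: `generate_expensive_file` is a hypothetical stand-in for the costly `generate_series` write, the payload is placeholder bytes rather than real Parquet data, and the file names are invented.

```python
import shutil
import tempfile
from pathlib import Path

# Counts how often the (hypothetical) expensive generation step runs.
calls = 0


def generate_expensive_file(path: Path) -> None:
    """Hypothetical stand-in for the costly generate_series + Parquet write."""
    global calls
    calls += 1
    # Placeholder bytes only; a real test would write Parquet data here.
    path.write_bytes(b"placeholder payload" * 1024)


workdir = Path(tempfile.mkdtemp())
first = workdir / "t1.parquet"
second = workdir / "t2.parquet"

generate_expensive_file(first)   # pay the generation cost once
shutil.copyfile(first, second)   # second physical file via a cheap byte copy

print(f"generated {calls} time(s), files: {first.name}, {second.name}")
```

Both paths can then be registered as separate tables, so the test would still see two physically distinct files while the generation step runs only once.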

Are there any user-facing changes?

- Implement tests for push down filters in outer joins, ensuring filters are applied correctly based on join conditions.
- Introduce tests for push down filters with Parquet files, including scenarios with limits and dynamic filters.
- Add regression tests to address specific issues related to filter pushdown, ensuring stability and correctness.
- Include tests for unnest operations with filters, verifying that filters are pushed down appropriately based on the context.

alamb · 2026-02-27T11:56:06Z

Thank you 🙏

I left a note on Fix HashJoinExec sideways information passing for partitioned queries #17197 (comment), asking the original authors if they could double check.

adriangb · 2026-02-27T12:03:39Z

datafusion/optimizer/src/simplify_expressions/expr_simplifier.rs

    }
 }

+// trigger ci test


Can this be removed? Also, just in case it's helpful: git commit -m "ci" --allow-empty --no-verify

(I think this is left over from #20566 -- when this PR gets rebased it should be removed)

adriangb

I don't think the test needs two physically distinct files. As long as it's two different execution nodes that should be good enough!

kosiew and others added 4 commits February 26, 2026 13:50

trigger sqllogictest

e8369bb

Run log running .slt tests first

b2d2635

reuse parquet file in push_down_filter_regression test

acce9a4

github-actions bot added the optimizer (Optimizer rules) and sqllogictest (SQL Logic Tests (.slt)) labels Feb 26, 2026

Tim-53 mentioned this pull request Feb 26, 2026

Speedup sqllogictests by running long running tests first #20576

Open

alamb changed the title ~~Perf/reuse parquet file push down filter regression~~ Improve sqllogictest speed by creating only a single large file rather than 2 Feb 27, 2026

alamb mentioned this pull request Feb 27, 2026

Fix HashJoinExec sideways information passing for partitioned queries #17197

Merged

alamb mentioned this pull request Feb 27, 2026

Split push_down_filter.slt into standalone sqllogictest files to reduce long-tail runtime #20566

Merged

adriangb reviewed Feb 27, 2026

Tim-53 wants to merge 4 commits into apache:main from Tim-53:perf/reuse-parquet-file-push-down-filter-regression
