Improve sqllogictest speed by creating only a single large file rather than 2 #20586
Draft
Tim-53 wants to merge 4 commits into apache:main
- Implement tests for push down filters in outer joins, ensuring filters are applied correctly based on join conditions.
- Introduce tests for push down filters with Parquet files, including scenarios with limits and dynamic filters.
- Add regression tests to address specific issues related to filter pushdown, ensuring stability and correctness.
- Include tests for unnest operations with filters, verifying that filters are pushed down appropriately based on the context.
Contributor
Thank you 🙏 I left a note on Asking the original authors if they could double check
adriangb
reviewed
Feb 27, 2026
}
}

// trigger ci test
Contributor
Can be removed? Also just in case it's helpful: git commit -m "ci" --allow-empty --no-verify
Contributor
(I think this is left over from #20566 -- when this PR gets rebased it should be removed)
adriangb
reviewed
Feb 27, 2026
Contributor
adriangb
left a comment
I don't think the test needs two physically distinct files. As long as it's two different execution nodes that should be good enough!
Draft as it builds on #20576
Which issue does this PR close?
Rationale for this change
The execution time of the test is dominated by writing the parquet files. By reusing the file we cut the execution time by around 30%.
What changes are included in this PR?
Building on #20576, we reuse the parquet file the test needs instead of recreating it.
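A minimal shell sketch of the reuse idea, not the actual test code. `generate_fixture` is a hypothetical stand-in for the expensive write (in the test that is a parquet file, here just a text file), and all file names are made up for illustration. It shows that the cost is paid once, after which two consumers can either share the path or, if two physical files are wanted, use a cheap copy.

```shell
#!/bin/sh
set -eu

# Hypothetical stand-in for the expensive generation step.
# In the real test this is writing a large parquet file.
generate_fixture() {
    seq 1 100000 > "$1"
}

workdir=$(mktemp -d)
fixture="$workdir/data.txt"

# Pay the generation cost exactly once.
generate_fixture "$fixture"

# Variant 1: both consumers point at the same path.
left="$fixture"
right="$fixture"

# Variant 2: a second physical file, produced by copying
# instead of running generate_fixture again.
copy="$workdir/data_copy.txt"
cp "$fixture" "$copy"

# The copy has identical contents to the original.
if cmp -s "$fixture" "$copy"; then
    echo "copy matches original"
fi
```

Either variant avoids a second run of the generation step; the copy only costs a file copy.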
Are these changes tested?
Ran the test with the following results:
One open question: does the correctness of this regression test rely on having two physically separate files? The race condition in #17197 was in the execution layer; both scans would still be independent DataSourceExec nodes with independent readers, so I believe the behavior is preserved. But if there's any concern, we could use system cp to copy the file and register two physical files while still only paying the generate_series cost once.
Are there any user-facing changes?