Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Internal #215: Window EXCLUDE Functionality #9220

Merged
merged 55 commits into from
Oct 16, 2023
Merged

Conversation

hawkfish
Copy link
Contributor

@hawkfish hawkfish commented Oct 4, 2023

Implement the SQL windowing EXCLUDE functionality started by @j-w-moebius. This includes:

  • Parser and Serialisation extensions
  • Implementations and tests for all accelerators
  • Changes to the custom window API

Despite the increased complexity of the calling sequences, performance appears unaffected.

fixes: duckdblabs/duckdb-internal#215

j-w-moebius and others added 30 commits July 19, 2023 12:28
Reverse the expected order of one test
now that we have deterministic ordering of single partitions.
Move the code for WindowSegmentTree::EvaluateInternal to WindowSegmentTreeState.
This hides the implementation details and prepares to fix some memory leaks.
Fix the right side memory leak.
Avoid unnecessary per-chunk memory allocation.
Plumb the setting through and throw if it is not supported.
Change API to pass frame lists instead of window setting.
Support the EXCLUDE modifiers for COUNT(*).
Use swap instead of copying.
Richard Wesley added 15 commits September 16, 2023 15:04
Convert ReuseIndexes to handle multiple ranges.
Only tested for single range.
Reduce the insert range of Reuse to the current frames
instead of the union of current and previous frames.
Back out range restriction.
Add median tests.
Make the new exclude clause member optional so we don't have to bump the file version.
Still broken but should be closer. Probably just needs a `GetDefault`?
Updated json file with default value. Works now.
Add sorting to handle new work stealing.
Since not all custom window functions need the previous frame,
push responsibility down to the function states that do.
Pass around vectors so we have fewer silly arguments.
Factor out the frame intersection logic for reuse.
Fix the ModeState to have proper constructors and destructors
and then call them.
Hack around memory leak in PerfectAggregateHashTable.
Copy link
Contributor

@taniabogatsch taniabogatsch left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey @hawkfish! Thanks for the PR, EXCLUDE looks like a pretty neat extension to the windowing functionality. I added some comments and nits.

From my understanding, the significant change here is the switch from a single FrameBounds parameter to a vector of those? From what I can tell, we then determine what to exclude by setting the validity mask accordingly.

test/sql/window/test_invalid_window.test Outdated Show resolved Hide resolved
src/function/aggregate/distributive/count.cpp Outdated Show resolved Hide resolved
src/execution/window_executor.cpp Show resolved Hide resolved
src/execution/window_segment_tree.cpp Outdated Show resolved Hide resolved
Incorporate review feedback.
Most notably, the ValidityMask object is smart enough to handle
the AllValid situation without constructing extra bitmasks.
@github-actions github-actions bot marked this pull request as draft October 5, 2023 23:40
@taniabogatsch
Copy link
Contributor

Thanks for all the comments and explanations on my notes! LGTM now.

  • I mentioned the size_t stuff because of our C++ Guidelines written down here
  • As for using, it is personal preference, and I understand your reasoning, I was mostly commenting because I hadn't seen it much/anywhere else in our codebase yet! :)

@hawkfish hawkfish marked this pull request as ready for review October 12, 2023 20:05
@Mytherin Mytherin merged commit e47be52 into duckdb:feature Oct 16, 2023
45 checks passed
@Mytherin
Copy link
Collaborator

Thanks!

@hawkfish hawkfish deleted the exclude branch October 16, 2023 14:22
krlmlr added a commit to duckdb/duckdb-r that referenced this pull request Dec 11, 2023
Merge pull request duckdb/duckdb#9164 from Mause/feature/jdbc-uuid-param
Merge pull request duckdb/duckdb#9185 from pdet/adbc_07
Merge pull request duckdb/duckdb#9126 from Maxxen/parquet-kv-metadata
Merge pull request duckdb/duckdb#9123 from lnkuiper/parquet_schema
Merge pull request duckdb/duckdb#9086 from lnkuiper/json_inconsistent_structure
Merge pull request duckdb/duckdb#8977 from Tishj/python_readcsv_multi_v2
Merge pull request duckdb/duckdb#9279 from hawkfish/nsdate-cast
Merge pull request duckdb/duckdb#8851 from taniabogatsch/binary_lambdas
Merge pull request duckdb/duckdb#8983 from Maxxen/types/fixedsizelist
Merge pull request duckdb/duckdb#9318 from Maxxen/fix-unused
Merge pull request duckdb/duckdb#9220 from hawkfish/exclude
Merge pull request duckdb/duckdb#9230 from Maxxen/json-plan-serialization
Merge pull request duckdb/duckdb#9011 from Tmonster/add_create_statement_support_to_fuzzer
Merge pull request duckdb/duckdb#9400 from Maxxen/array-fixes
Merge pull request duckdb/duckdb#8741 from Tishj/python_import_cache_upgrade
Merge fixes
Merge pull request duckdb/duckdb#9395 from taniabogatsch/lambda-performance
Merge pull request duckdb/duckdb#9427 from Tishj/python_table_support_replacement_scan
Merge pull request duckdb/duckdb#9516 from carlopi/fixformat
Merge pull request duckdb/duckdb#9485 from Maxxen/fix-parquet-serialization
Merge pull request duckdb/duckdb#9388 from chrisiou/issue217
Merge pull request duckdb/duckdb#9565 from Maxxen/fix-array-vector-sizes
Merge pull request duckdb/duckdb#9583 from carlopi/feature
Merge pull request duckdb/duckdb#8907 from cryoEncryp/new-list-functions
Merge pull request duckdb/duckdb#8642 from Virgiel/capi-streaming-arrow
Merge pull request duckdb/duckdb#8658 from Tishj/pytype_optional
Merge pull request duckdb/duckdb#9040 from Light-City/feature/set_mg
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants