fix: window logical plan by f4t4nt · Pull Request #3920 · Eventual-Inc/Daft

f4t4nt · 2025-03-05T23:41:43Z

More work towards implementing window functions for test_basic.py. It's unclear where the gap in my understanding is; I keep getting unexpected errors when running the tests.

codspeed-hq · 2025-03-05T23:55:26Z

CodSpeed Performance Report

Merging #3920 will not alter performance

Comparing fix/window-logical-plan (338d56a) with main (3be3321)

Summary

✅ 24 untouched benchmarks

codecov · 2025-03-06T19:49:52Z

Codecov Report

Attention: Patch coverage is 69.92481% with 280 lines in your changes missing coverage. Please review.

Project coverage is 76.51%. Comparing base (567ae9a) to head (338d56a).
Report is 122 commits behind head on main.

| Files with missing lines | Patch % | Lines |
| --- | --- | --- |
| ...local-execution/src/sinks/window_partition_only.rs | 82.03% | 76 Missing ⚠️ |
| src/daft-dsl/src/python.rs | 46.46% | 53 Missing ⚠️ |
| src/daft-dsl/src/expr/window.rs | 42.22% | 52 Missing ⚠️ |
| src/daft-logical-plan/src/ops/window.rs | 41.97% | 47 Missing ⚠️ |
| daft/window.py | 54.90% | 23 Missing ⚠️ |
| src/daft-logical-plan/src/logical_plan.rs | 45.83% | 13 Missing ⚠️ |
| .../src/optimization/rules/extract_window_function.rs | 95.83% | 4 Missing ⚠️ |
| ...ft-physical-plan/src/physical_planner/translate.rs | 0.00% | 4 Missing ⚠️ |
| src/daft-local-plan/src/translate.rs | 80.00% | 3 Missing ⚠️ |
| src/daft-local-plan/src/plan.rs | 88.88% | 2 Missing ⚠️ |

... and 3 more

Additional details and impacted files

@@            Coverage Diff             @@
##             main    #3920      +/-   ##
==========================================
- Coverage   78.46%   76.51%   -1.96%     
==========================================
  Files         767      788      +21     
  Lines       97108   104323    +7215     
==========================================
+ Hits        76193    79819    +3626     
- Misses      20915    24504    +3589

| Files with missing lines | Coverage Δ | |
| --- | --- | --- |
| src/daft-dsl/src/expr/mod.rs | `79.04% <ø> (+0.09%)` | ⬆️ |
| src/daft-dsl/src/functions/mod.rs | `81.35% <100.00%> (+0.32%)` | ⬆️ |
| src/daft-dsl/src/lib.rs | `100.00% <100.00%> (ø)` | |
| ...rc/daft-logical-plan/src/optimization/optimizer.rs | `92.35% <100.00%> (-3.12%)` | ⬇️ |
| ...lan/src/optimization/rules/push_down_projection.rs | `89.71% <100.00%> (-4.04%)` | ⬇️ |
| ...cal-plan/src/optimization/rules/unnest_subquery.rs | `89.59% <ø> (ø)` | |
| daft/__init__.py | `19.23% <0.00%> (-3.85%)` | ⬇️ |
| daft/expressions/expressions.py | `93.96% <88.88%> (+0.11%)` | ⬆️ |
| src/daft-local-execution/src/pipeline.rs | `85.24% <90.90%> (-3.99%)` | ⬇️ |
| src/daft-local-plan/src/plan.rs | `94.16% <88.88%> (+0.23%)` | ⬆️ |

... and 9 more

... and 282 files with indirect coverage changes

kevinzwang

As discussed offline, let's also add some tests to the optimizer rule.

daft/daft/__init__.pyi

daft/expressions/expressions.py

src/daft-dsl/src/functions/mod.rs

src/daft-dsl/src/expr/mod.rs

kevinzwang · 2025-03-10T08:04:35Z

src/daft-logical-plan/src/optimization/rules/detect_window_function.rs

Remember to remove the prints from this file too. Also, wondering if we could generalize the extraction of expressions into logical ops since we're bound to have several of these.
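
One way to picture the generalization suggested above: a single helper walks each projection expression, replaces every subexpression matched by a caller-supplied predicate with a column reference, and hands back the extracted subexpressions so a dedicated logical op can compute them. The sketch below is a toy model in Python, not Daft's Rust code. `Col`, `Call`, `Over`, `extract_subexpressions`, and the `__window_` prefix are all hypothetical names chosen for illustration.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass(frozen=True)
class Col:
    """Reference to a named column (hypothetical toy expression node)."""
    name: str


@dataclass(frozen=True)
class Call:
    """Function call over argument expressions, e.g. add(a, b) or sum(a)."""
    func: str
    args: tuple


@dataclass(frozen=True)
class Over:
    """An aggregation evaluated over a window partitioned by some columns."""
    agg: Call
    partition_by: tuple


def extract_subexpressions(
    exprs: List[object],
    should_extract: Callable[[object], bool],
    prefix: str,
) -> Tuple[List[object], List[Tuple[str, object]]]:
    """Replace matching subexpressions with column references.

    Returns the rewritten expressions plus (generated_name, expression)
    pairs for everything that was pulled out. Identical subexpressions
    share one generated name.
    """
    extracted = {}  # expression -> generated column name, in first-seen order

    def rewrite(expr):
        if should_extract(expr):
            if expr not in extracted:
                extracted[expr] = f"{prefix}{len(extracted)}"
            return Col(extracted[expr])
        if isinstance(expr, Call):
            return Call(expr.func, tuple(rewrite(arg) for arg in expr.args))
        return expr  # leaf nodes and unmatched node types are left alone

    rewritten = [rewrite(expr) for expr in exprs]
    return rewritten, [(name, expr) for expr, name in extracted.items()]


# Usage: select category, sum(value) over (partition by category) + value
window = Over(Call("sum", (Col("value"),)), (Col("category"),))
projection = [Col("category"), Call("add", (window, Col("value")))]
rewritten, extracted = extract_subexpressions(
    projection, lambda e: isinstance(e, Over), "__window_"
)
```

Here `extracted` would feed a window op placed below the projection, and `rewritten` becomes the new projection on top of it. Swapping the predicate is all it takes to reuse the helper for another kind of expression.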

kevinzwang · 2025-03-10T08:06:29Z

src/daft-logical-plan/src/optimization/rules/push_down_projection.rs

+            LogicalPlan::Window(_) => {
+                // Cannot push down past a Window because it changes the window calculation results
+                Ok(Transformed::no(plan))
+            }


We might be able to push a projection past a window if the projection keeps all the columns necessary for the window. Either way, it's ok to not optimize it for now.
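
A rough sketch of the condition described here, under the assumption that "pushing past" means placing a pruning projection below the window: the child must still provide every column the window reads, plus every projected column that the window does not itself produce. This is a plain-Python illustration, not Daft's optimizer code, and `columns_needed_below_window` and its parameters are hypothetical names.

```python
from typing import Iterable, List


def columns_needed_below_window(
    projected: Iterable[str],
    window_outputs: Iterable[str],
    window_inputs: Iterable[str],
    child_columns: List[str],
) -> List[str]:
    """Return the child columns a pruning projection below the window must keep.

    projected:      columns the projection above the window selects
    window_outputs: columns the window op produces (e.g. aggregate results)
    window_inputs:  columns the window op reads (partition keys, agg inputs)
    child_columns:  columns available from the window's input, in order
    """
    # Projected columns that simply pass through the window from its child.
    passthrough = set(projected) - set(window_outputs)
    required = passthrough | set(window_inputs)

    missing = required - set(child_columns)
    if missing:
        raise ValueError(f"columns not available below the window: {sorted(missing)}")

    # Preserve the child's column order so the result is deterministic.
    return [name for name in child_columns if name in required]


# Usage: the projection keeps `category` and the window result `value_sum`;
# the window partitions by `category` and aggregates `value`.
kept = columns_needed_below_window(
    projected=["category", "value_sum"],
    window_outputs=["value_sum"],
    window_inputs=["category", "value"],
    child_columns=["category", "value", "note", "ts"],
)
print(kept)  # ['category', 'value']
```

In this example `note` and `ts` could be dropped before the window runs, which is the saving the pushdown would buy.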

src/daft-logical-plan/src/optimization/rules/push_down_projection.rs

src/daft-logical-plan/src/optimization/rules/unnest_subquery.rs

src/daft-physical-plan/src/physical_planner/translate.rs

universalmind303

I think we could implement logical window functions using only exprs. I don't think we need a logical plan node for it.

For reference, neither DataFusion nor Polars has window functions in its logical plan; both implement them only as exprs (polars, datafusion).

kevinzwang · 2025-03-11T06:37:02Z

@universalmind303 I don't think that's true, at least for datafusion. DataFusion has a LogicalPlan::Window which it constructs when it sees a window function in a select: https://github.com/apache/datafusion/blob/6e422e0311df1ac0ea9a9d549a1b69a0ee520dc4/datafusion/core/src/dataframe/mod.rs#L346-L351

universalmind303 · 2025-03-11T17:36:13Z

> @universalmind303 I don't think that's true, at least for datafusion. DataFusion has a LogicalPlan::Window which it constructs when it sees a window function in a select: https://github.com/apache/datafusion/blob/6e422e0311df1ac0ea9a9d549a1b69a0ee520dc4/datafusion/core/src/dataframe/mod.rs#L346-L351

oh how did i miss that 🤦. Go ahead and disregard that comment then.

kevinzwang

Reviewed everything except the optimization rule and the execution.

This PR is starting to get pretty unwieldy to review, and probably also to follow up on reviews from your side. Could you please separate out the PR into three stacked PRs?

1. Just the definitions on the Python and logical plan side. Basically the skeleton, with the ability to create project nodes with window function expressions
2. The ExtractWindowFunction optimizer rule, along with Rust tests for the rule
3. Swordfish execution for the window op

That way we can get the ball rolling a little more, and get things fully completed and into main incrementally.

src/daft-dsl/src/python.rs

src/daft-dsl/src/expr/mod.rs

src/daft-dsl/src/expr/window.rs

src/daft-local-execution/src/pipeline.rs

src/daft-logical-plan/src/ops/window.rs

daft/window.py

src/daft-logical-plan/src/logical_plan.rs

src/daft-logical-plan/src/optimization/rules/extract_window_function.rs

f4t4nt added 10 commits February 14, 2025 13:08

feat(window): add skeleton code for window functions and expressions

f4d2730

feat(window): implement window functions skeleton in rust

fc93093

feat(window): add window function bindings and expressions

7b7b380

feat(window): implement window function skeleton and update tests

4f5ba5c

feat(window): fix df.select in tests

0998d49

feat(window): implement window functions core and bindings

71c5fdc

test(window): retain only test_basic.py for window tests

1918fb8

fix(window): fix linter errors during window functions implementation

2173c13

fix(window): resolve logical plan errors in window functions implementation

ea7197c

feat(window): implement window functions support in logical plan

efa0223

f4t4nt added 2 commits March 5, 2025 16:38

feat(window): initial implementation of window functions

ffca1bb

feat(window): implement window functions and add test helper to ignore row ordering

52e49c9

f4t4nt changed the title ~~fix/window logical plan~~ fix: window logical plan Mar 6, 2025

github-actions bot added the fix label Mar 6, 2025

f4t4nt requested a review from kevinzwang March 6, 2025 01:54

fix(window): skip hardcoded window test

bd5dcaf

kevinzwang mentioned this pull request Mar 6, 2025

feat/window functions implementation #3911

Closed

fix(tests): correct pytest skip decorator in window tests

f7aa71d

kevinzwang reviewed Mar 10, 2025

f4t4nt added 3 commits March 10, 2025 14:13

refactor(window): remove unused window function code

435aa14

refactor(window): simplify naming convention for empty input window functions

ee2ba53

refactor(window): clarify min_periods comment

bd44148

universalmind303 reviewed Mar 11, 2025

f4t4nt added 3 commits March 11, 2025 14:31

feat(window): implement window partition only sink

407c7ec

refactor(window): remove non-agg window functions

dd58c15

refactor(window): rename DetectWindowFunction to ExtractWindowFunction

7dee0b5

f4t4nt added 13 commits March 11, 2025 15:21

refactor(window): remove unused WindowExpr struct

261ba6b

refactor(window): remove println debugging statements

40c51c0

refactor(logical-plan): undo changes to arguments in PushDownProjection methods

d2dcc40

refactor(logical-plan): use WindowSpec in Window struct instead of separate fields

cb0d889

fix(unnest-subquery): move Window operator to non-pullable case

a7197ec

refactor(window): streamline WindowFrameBoundary with unified offset representation

26a48ab

refactor(window): make WindowSpec methods non-consuming

ee07b73

refactor(window): remove redundant WindowBoundary wrapper

678d380

fix(window): properly expose window boundary classes in Python

7ed367c

fix(window): Window -> WindowSpec in __init__.pyi

2d1ca94

refactor(window): some clean up of unused window functions ie lag/lead

0c3eec6

fix(window): implement join-based window partition aggregation

d0cab64

fix(window): implement initial optimization for partition-only window functions

79c2ad1

f4t4nt requested a review from kevinzwang March 20, 2025 00:43

fix(window): use partition_by_value to avoid hash collisions in window functions

fa64285

f4t4nt force-pushed the fix/window-logical-plan branch from 0902f33 to fa64285 Compare March 21, 2025 00:20

f4t4nt added 10 commits March 20, 2025 17:54

fix(window): ensure deterministic partition keys for cross-platform consistency

0cd6211

fix(window): add debug logging for cross-platform partition issues

5832091

fix(window): replace logging with println for better cross-platform debugging

37b3685

fix(window): add debug prints to diagnose platform differences

f249599

fix(window): process all partitions together for consistent behavior across platforms

eaa128c

feat(window): implement parallel processing for window partition-only operations

66d4e2f

refactor(window): clean up window partition code and fix linter warnings

1714d09

fix(window): create unique names for multiple window functions

4fe92d1

test(window): skip window tests if not running with native runner

5302a05

test(window): rename test file to test_partition_only.py and add arithmetic window tests

338d56a

kevinzwang reviewed Mar 26, 2025

f4t4nt changed the base branch from main to feat/window-optimizer March 27, 2025 22:03

f4t4nt changed the base branch from feat/window-optimizer to main March 27, 2025 22:03

f4t4nt closed this Apr 1, 2025
