Skip to content

fix: window logical plan#3920

Closed
f4t4nt wants to merge 44 commits intomainfrom
fix/window-logical-plan
Closed

fix: window logical plan#3920
f4t4nt wants to merge 44 commits intomainfrom
fix/window-logical-plan

Conversation

@f4t4nt
Copy link
Contributor

@f4t4nt f4t4nt commented Mar 5, 2025

Some more work towards implementing window functions for test_basic.py, unclear where gap in my understanding is - keep getting unexpected errors when running tests

@codspeed-hq
Copy link

codspeed-hq bot commented Mar 5, 2025

CodSpeed Performance Report

Merging #3920 will not alter performance

Comparing fix/window-logical-plan (338d56a) with main (3be3321)

Summary

✅ 24 untouched benchmarks

@f4t4nt f4t4nt changed the title fix/window logical plan fix: window logical plan Mar 6, 2025
@github-actions github-actions bot added the fix label Mar 6, 2025
@f4t4nt f4t4nt requested a review from kevinzwang March 6, 2025 01:54
@codecov
Copy link

codecov bot commented Mar 6, 2025

Codecov Report

Attention: Patch coverage is 69.92481% with 280 lines in your changes missing coverage. Please review.

Project coverage is 76.51%. Comparing base (567ae9a) to head (338d56a).
Report is 122 commits behind head on main.

Files with missing lines Patch % Lines
...local-execution/src/sinks/window_partition_only.rs 82.03% 76 Missing ⚠️
src/daft-dsl/src/python.rs 46.46% 53 Missing ⚠️
src/daft-dsl/src/expr/window.rs 42.22% 52 Missing ⚠️
src/daft-logical-plan/src/ops/window.rs 41.97% 47 Missing ⚠️
daft/window.py 54.90% 23 Missing ⚠️
src/daft-logical-plan/src/logical_plan.rs 45.83% 13 Missing ⚠️
.../src/optimization/rules/extract_window_function.rs 95.83% 4 Missing ⚠️
...ft-physical-plan/src/physical_planner/translate.rs 0.00% 4 Missing ⚠️
src/daft-local-plan/src/translate.rs 80.00% 3 Missing ⚠️
src/daft-local-plan/src/plan.rs 88.88% 2 Missing ⚠️
... and 3 more
Additional details and impacted files

Impacted file tree graph

@@            Coverage Diff             @@
##             main    #3920      +/-   ##
==========================================
- Coverage   78.46%   76.51%   -1.96%     
==========================================
  Files         767      788      +21     
  Lines       97108   104323    +7215     
==========================================
+ Hits        76193    79819    +3626     
- Misses      20915    24504    +3589     
Files with missing lines Coverage Δ
src/daft-dsl/src/expr/mod.rs 79.04% <ø> (+0.09%) ⬆️
src/daft-dsl/src/functions/mod.rs 81.35% <100.00%> (+0.32%) ⬆️
src/daft-dsl/src/lib.rs 100.00% <100.00%> (ø)
...rc/daft-logical-plan/src/optimization/optimizer.rs 92.35% <100.00%> (-3.12%) ⬇️
...lan/src/optimization/rules/push_down_projection.rs 89.71% <100.00%> (-4.04%) ⬇️
...cal-plan/src/optimization/rules/unnest_subquery.rs 89.59% <ø> (ø)
daft/__init__.py 19.23% <0.00%> (-3.85%) ⬇️
daft/expressions/expressions.py 93.96% <88.88%> (+0.11%) ⬆️
src/daft-local-execution/src/pipeline.rs 85.24% <90.90%> (-3.99%) ⬇️
src/daft-local-plan/src/plan.rs 94.16% <88.88%> (+0.23%) ⬆️
... and 9 more

... and 282 files with indirect coverage changes

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

Copy link
Contributor

@kevinzwang kevinzwang left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As discussed offline, let's also add some tests to the optimizer rule.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Remember to remove the prints from this file too. Also, wondering if we could generalize the extraction of expressions into logical ops since we're bound to have several of these.

Comment on lines +547 to +550
LogicalPlan::Window(_) => {
// Cannot push down past a Window because it changes the window calculation results
Ok(Transformed::no(plan))
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We might be able to push a projection past a window if the projection keeps all the columns necessary for window. Either way it's ok to not optimize it for now

Copy link
Member

@universalmind303 universalmind303 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we could implement logical window functions using only exprs. I don't think we need a logical plan node for it.

For reference, neither datafusion or polars have window functions on their logical plan, and only have it implemented as exprs (polars, datafusion)

@kevinzwang
Copy link
Contributor

@universalmind303 I don't think that's true, at least for datafusion. DataFusion has a LogicalPlan::Window which it constructs when it sees a window function in a select: https://github.com/apache/datafusion/blob/6e422e0311df1ac0ea9a9d549a1b69a0ee520dc4/datafusion/core/src/dataframe/mod.rs#L346-L351

@universalmind303
Copy link
Member

@universalmind303 I don't think that's true, at least for datafusion. DataFusion has a LogicalPlan::Window which it constructs when it sees a window function in a select: https://github.com/apache/datafusion/blob/6e422e0311df1ac0ea9a9d549a1b69a0ee520dc4/datafusion/core/src/dataframe/mod.rs#L346-L351

oh how did i miss that 🤦. Go ahead and disregard that comment then.

@f4t4nt f4t4nt requested a review from kevinzwang March 20, 2025 00:43
@f4t4nt f4t4nt force-pushed the fix/window-logical-plan branch from 0902f33 to fa64285 Compare March 21, 2025 00:20
Copy link
Contributor

@kevinzwang kevinzwang left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed everything except the optimization rule and the execution.

This PR is starting to get pretty unweildy to review, and probably also to follow-up on reviews from your side. Could you please separate out the PR into three stacked PRs?

  1. Just the definitions on the Python and logical plan side. Basically the skeleton, with the ability to create project nodes with window function expressions
  2. The ExtractWindowFunction optimizer rule, along with rust tests for the rule
  3. Swordfish execution for the window op

That way we can get the ball rolling a little more, and get things fully completed and into main incrementally.

@f4t4nt f4t4nt changed the base branch from main to feat/window-optimizer March 27, 2025 22:03
@f4t4nt f4t4nt changed the base branch from feat/window-optimizer to main March 27, 2025 22:03
@f4t4nt f4t4nt closed this Apr 1, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants