[FEAT] [New Query Planner] Groupby support, aggregation fixes, support for remaining aggregation ops #1272

clarkzinzow · 2023-08-15T01:03:00Z

This PR adds support for `df.groupby()`, fixes miscellaneous issues with aggregations, and adds support for the remaining (non-sum) aggregation ops.

This PR builds on #1257, where @xcharleslin implemented the core of this work; this PR wires things together and fixes a few minor issues. From that PR's description:

"Ported over the logic in our existing AggregationPlanBuilder. Groupby-aggregates should now be fully supported (including multi-partition).

Additionally, this PR improves on Daft's existing aggregation logic by using semantic IDs in intermediate results, so that redundant intermediates are not computed.

E.g. before, getting the Sum and Mean of a column would compute and carry around two copies of the intermediate sum, one for the Sum and one for the Mean. Now, all stages address their required intermediates by semantic ID, eliminating these duplicates."
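
To illustrate the idea in the quoted description, here is a minimal pure-Python sketch of a two-stage groupby-aggregate that addresses intermediates by semantic ID. It is not Daft's implementation: the `INTERMEDIATES` table, the `"column.op"` ID format, and all function names are hypothetical and chosen only for this example. It shows Sum and Mean of the same column sharing one intermediate sum across partitions.

```python
from collections import defaultdict

# Hypothetical mapping from a requested aggregation to the intermediate ops it needs.
INTERMEDIATES = {
    "sum": ("sum",),
    "count": ("count",),
    "mean": ("sum", "count"),
}


def plan_intermediates(aggs):
    """Return deduplicated intermediates keyed by semantic ID, in first-seen order."""
    ids = {}
    for column, agg in aggs:
        for op in INTERMEDIATES[agg]:
            # The semantic ID (assumed format "column.op") identifies what is
            # computed, so two requests for the same intermediate collapse into one.
            ids.setdefault(f"{column}.{op}", (column, op))
    return ids


def partial_aggregate(rows, key, ids):
    """Stage 1: compute each intermediate once per group within a single partition."""
    out = defaultdict(lambda: dict.fromkeys(ids, 0))
    for row in rows:
        group = out[row[key]]
        for sid, (column, op) in ids.items():
            group[sid] += row[column] if op == "sum" else 1
    return out


def final_aggregate(partials, aggs, ids):
    """Stage 2: merge per-partition intermediates, then derive the requested outputs."""
    merged = defaultdict(lambda: dict.fromkeys(ids, 0))
    for partial in partials:
        for group_key, values in partial.items():
            for sid, value in values.items():
                merged[group_key][sid] += value  # sums and counts both merge by addition
    result = {}
    for group_key, values in merged.items():
        row = {}
        for column, agg in aggs:
            if agg == "mean":
                row[f"{column}_mean"] = values[f"{column}.sum"] / values[f"{column}.count"]
            else:
                row[f"{column}_{agg}"] = values[f"{column}.{agg}"]
        result[group_key] = row
    return result


aggs = [("v", "sum"), ("v", "mean")]
ids = plan_intermediates(aggs)
partitions = [
    [{"k": "a", "v": 1}, {"k": "b", "v": 10}],
    [{"k": "a", "v": 3}],
]
partials = [partial_aggregate(p, "k", ids) for p in partitions]
result = final_aggregate(partials, aggs, ids)
```

Here `plan_intermediates` yields two intermediates (`v.sum` and `v.count`) rather than three, because the Sum and the Mean both refer to the same `v.sum` ID instead of each carrying its own copy.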

codecov · 2023-08-15T01:11:17Z

Codecov Report

Merging #1272 (3fcf3f6) into main (c43c76c) will increase coverage by 0.09%.
The diff coverage is 82.35%.

Additional details and impacted files

@@            Coverage Diff             @@
##             main    #1272      +/-   ##
==========================================
+ Coverage   87.66%   87.75%   +0.09%     
==========================================
  Files          61       61              
  Lines        6009     6021      +12     
==========================================
+ Hits         5268     5284      +16     
+ Misses        741      737       -4

Files Changed	Coverage Δ
daft/logical/rust_logical_plan.py	`89.03% <78.57%> (+3.71%)`	⬆️
daft/execution/rust_physical_plan_shim.py	`98.27% <100.00%> (ø)`

Xiayue Charles Lin and others added 2 commits August 14, 2023 13:51

Groupbys, and fixing implementations of aggregations

d3ac815

Add remaining aggregation support.

3fcf3f6

github-actions bot added the enhancement New feature or request label Aug 15, 2023

clarkzinzow merged commit a5c702b into main Aug 15, 2023
19 of 20 checks passed

clarkzinzow deleted the clark/groupby branch August 15, 2023 01:15

clarkzinzow mentioned this pull request Aug 15, 2023

[WIP] [FEAT] Groupbys, and fixing implementations of aggregations #1257

Closed

2 tasks
