[FEAT] [New Query Planner] Groupby support, aggregation fixes, support for remaining aggregation ops #1272

Merged
clarkzinzow merged 2 commits into main from clark/groupby
Aug 15, 2023

@clarkzinzow clarkzinzow commented Aug 15, 2023

This PR adds support for `df.groupby()`, fixes miscellaneous issues with aggregations, and adds support for the remaining (non-sum) aggregation ops.

This PR builds on #1257, where @xcharleslin implemented the core logic; this PR just wires things together and fixes a few minor issues. From that PR's description:

"Ported over the logic in our existing AggregationPlanBuilder. Groupby-aggregates should now be fully supported (including multi-partition).

Additionally, this PR improves on Daft's existing aggregation logic by using semantic IDs in intermediate results, so that redundant intermediates are not computed.

E.g. before, getting the Sum and Mean of a column would compute and carry around two copies of the intermediate sum, one for the Sum and one for the Mean. Now, all stages address their required intermediates by semantic ID, eliminating these duplicates."
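To make the semantic-ID idea concrete, here is a minimal pure-Python sketch of a two-stage, multi-partition groupby-aggregate in which Sum and Mean share a single intermediate sum. It is not Daft's implementation: the `"<column>.<op>"` ID format and every function name below (`plan_intermediates`, `partial_agg`, `final_agg`, `groupby_agg`) are hypothetical and chosen only for illustration.

```python
from collections import defaultdict

# Hypothetical mapping from a final aggregation to the intermediates it
# needs, each named by a semantic ID of the form "<column>.<op>".
REQUIRED = {
    "sum": lambda col: [f"{col}.sum"],
    "mean": lambda col: [f"{col}.sum", f"{col}.count"],
}

# How each intermediate is computed within one partition.
PARTIAL_OPS = {"sum": sum, "count": len}


def plan_intermediates(aggs):
    """Return the de-duplicated, ordered semantic IDs required by `aggs`."""
    seen = {}
    for op, col in aggs:
        for sid in REQUIRED[op](col):
            seen.setdefault(sid, None)  # a shared ID is only recorded once
    return list(seen)


def partial_agg(partition, key, semantic_ids):
    """Stage 1: per-partition, per-group intermediates keyed by semantic ID."""
    groups = defaultdict(list)
    for row in partition:
        groups[row[key]].append(row)
    out = {}
    for k, rows in groups.items():
        out[k] = {}
        for sid in semantic_ids:
            col, op = sid.rsplit(".", 1)
            out[k][sid] = PARTIAL_OPS[op]([r[col] for r in rows])
    return out


def final_agg(partials, aggs):
    """Stage 2: merge partials across partitions, then derive final values."""
    merged = defaultdict(lambda: defaultdict(int))
    for part in partials:
        for k, inter in part.items():
            for sid, value in inter.items():
                merged[k][sid] += value  # sums and counts both merge by addition
    result = {}
    for k, inter in merged.items():
        row = {}
        for op, col in aggs:
            if op == "sum":
                row[f"sum({col})"] = inter[f"{col}.sum"]
            elif op == "mean":
                row[f"mean({col})"] = inter[f"{col}.sum"] / inter[f"{col}.count"]
        result[k] = row
    return result


def groupby_agg(partitions, key, aggs):
    """Run both stages over a list of partitions (lists of row dicts)."""
    sids = plan_intermediates(aggs)
    partials = [partial_agg(p, key, sids) for p in partitions]
    return final_agg(partials, aggs)
```

In this sketch, requesting both `("sum", "x")` and `("mean", "x")` plans only two intermediates, `x.sum` and `x.count`, so the sum is computed and carried once per group rather than twice.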

@github-actions github-actions bot added the enhancement New feature or request label Aug 15, 2023

codecov bot commented Aug 15, 2023

Codecov Report

Merging #1272 (3fcf3f6) into main (c43c76c) will increase coverage by 0.09%.
The diff coverage is 82.35%.

Additional details and impacted files

@@            Coverage Diff             @@
##             main    #1272      +/-   ##
==========================================
+ Coverage   87.66%   87.75%   +0.09%     
==========================================
  Files          61       61              
  Lines        6009     6021      +12     
==========================================
+ Hits         5268     5284      +16     
+ Misses        741      737       -4     

| Files Changed | Coverage Δ |
|---|---|
| daft/logical/rust_logical_plan.py | 89.03% <78.57%> (+3.71%) ⬆️ |
| daft/execution/rust_physical_plan_shim.py | 98.27% <100.00%> (ø) |

@clarkzinzow clarkzinzow merged commit a5c702b into main Aug 15, 2023
19 of 20 checks passed
@clarkzinzow clarkzinzow deleted the clark/groupby branch August 15, 2023 01:15