Hook in agg expressions to the table.agg API (global mode only). #702
Conversation
pub fn agg_global(&self, to_agg: &[(Expr, &str)]) -> DaftResult<Table> {
    // Convert the input (child, name) exprs to the enum form,
    // e.g. (expr, "sum") to Expr::Agg(AggExpr::Sum(expr)).
    // (We could do this at the pyo3 layer, but I'm not sure why they're strings in the first place.)
@jaychia @samster25 I was a bit confused about why we do (expr, agg name) in the Table.agg API. In the interest of speed I'm just rolling with it for now, but let me know if there's something I'm missing here.
So I did some more digging after you asked about it yesterday.
This is an artifact of our old code, where we didn't allow evaluating aggregation expressions: the evaluation was handled separately, outside of eval_expressions, as its own call to Block.agg, which takes a string argument: https://github.com/Eventual-Inc/Daft/blob/main/daft/runners/partitioning.py#L461-L484
It does seem a little odd, though, and we can probably change that behavior. But yes, in the interest of speed we can keep this API for now until we're fully integrated.
Maybe @samster25 can add some more color here.
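To make the conversion concrete, here is a minimal, self-contained sketch of turning the `(expr, agg_name)` string pairs accepted by `Table.agg` into a strongly-typed `AggExpr` enum, the way `agg_global` does internally. The enum variants and the `to_agg_expr` helper are simplified stand-ins, not Daft's actual types:

```rust
// Simplified stand-ins for Daft's Expr / AggExpr (illustrative only).
#[derive(Debug, PartialEq)]
enum Expr {
    Column(String),
    Agg(AggExpr),
}

#[derive(Debug, PartialEq)]
enum AggExpr {
    Sum(Box<Expr>),
    Count(Box<Expr>),
}

// Hypothetical helper mirroring the (child, name) -> enum conversion;
// the "sum"/"count" names are assumptions based on the snippet above.
fn to_agg_expr(child: Expr, name: &str) -> Result<Expr, String> {
    match name {
        "sum" => Ok(Expr::Agg(AggExpr::Sum(Box::new(child)))),
        "count" => Ok(Expr::Agg(AggExpr::Count(Box::new(child)))),
        other => Err(format!("unsupported aggregation: {other}")),
    }
}

fn main() {
    // ("input", "sum") becomes Expr::Agg(AggExpr::Sum(col)).
    let e = to_agg_expr(Expr::Column("input".to_string()), "sum").unwrap();
    assert_eq!(
        e,
        Expr::Agg(AggExpr::Sum(Box::new(Expr::Column("input".to_string()))))
    );
}
```

Doing this conversion once at the API boundary keeps the rest of the Rust kernel working on typed enums instead of strings.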
daft_table = Table.from_pydict({"input": input})
daft_table = daft_table.agg(
    [
        (col("input").cast(DataType.int32()).alias("count"), "count"),
These casts are necessary because the null columns get typed as Null and the aggregations are not implemented on them.
It would not be too hard to implement all aggregations on NullType, and then users can e.g. call sum() on columns without worrying if they're empty.
If this sounds like a good thing to do, I can do it as a fast follow. (@jaychia the type checking logic would also have to change though)
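The proposed NullType aggregation semantics can be sketched with a toy series type: a column whose dtype is Null only knows its length, so count (which ignores nulls) is always 0, and the sum of no valid values is null. These types are illustrative assumptions, not Daft's real Series implementation:

```rust
#[derive(Debug, PartialEq)]
enum Scalar {
    Null,
    Int(i64),
}

// A NullType column: every value is null; only the length is stored.
struct NullSeries {
    len: usize,
}

impl NullSeries {
    // count skips nulls, so a NullType column always counts to 0.
    fn count(&self) -> Scalar {
        Scalar::Int(0)
    }
    // Summing zero valid values yields null (matching, e.g., Arrow's
    // null-skipping aggregation semantics).
    fn sum(&self) -> Scalar {
        Scalar::Null
    }
}

fn main() {
    let s = NullSeries { len: 4 };
    assert_eq!(s.count(), Scalar::Int(0));
    assert_eq!(s.sum(), Scalar::Null);
}
```

With this in place, callers could sum() an empty or all-null column directly, without the defensive casts shown in the test above.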
Yes, the type-checking logic is entirely in your control, so feel free to make any changes you deem fit. I also ran into issues with some of the table tests that run on empty arrays, and resorted to casting as you do here.
The one thing we should try to adhere to though is that things that pass type-checks should actually be runnable and vice versa :P
I can run you through the unit test which I use to do that tomorrow (https://github.com/Eventual-Inc/Daft/blob/main/tests/test_runtime_matches_static_types.py#L70-L71)
Codecov Report
Additional details and impacted files

@@            Coverage Diff             @@
##             main     #702      +/-   ##
==========================================
+ Coverage   85.86%   85.91%   +0.05%
==========================================
  Files         143      146       +3
  Lines       11991    12155     +164
==========================================
+ Hits        10296    10443     +147
- Misses       1695     1712      +17
Implement the table.agg API, but only for global mode. This is so we can start using ungrouped aggregations right away instead of going through eval_expression_list.