How should aggregate work in the VM #16

tyt2y3 · 2022-08-06T09:29:19Z

I have read through some of the existing code, but I still want to start a discussion.

I think aggregate should work like project: it has a source table, a destination table, and some expressions to evaluate. Only the evaluation model is different.
In essence, this is how 'group by' works conceptually: for each row, construct a tuple from the values of the group-by columns (there may be more than one). Use that tuple as the key in a 'hash map' (which should be our own table + index implementation), and run the 'reduce' function whenever a row's key is already present.
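To make the idea concrete, here is a minimal sketch in Python. It is not the project's code: the `aggregate` function, the dict-based rows, and the `reducers` mapping are all hypothetical names chosen for illustration, and a plain `dict` stands in for the table + index implementation.

```python
def aggregate(source, group_by, reducers):
    """Group rows by the given columns and fold each group with reduce functions.

    source   -- iterable of rows, each a dict of column name -> value (assumed shape)
    group_by -- list of column names that form the group key
    reducers -- dict of output name -> (input column, binary reduce function)
    """
    groups = {}  # stands in for "our own table + index implementation"
    for row in source:
        # Build the key tuple from the group-by column values of this row.
        key = tuple(row[col] for col in group_by)
        state = groups.get(key)
        if state is None:
            # First row of a group: seed each accumulator with the row's value.
            groups[key] = {name: row[col] for name, (col, _) in reducers.items()}
        else:
            # Key already present: run the reduce function for each expression.
            for name, (col, fn) in reducers.items():
                state[name] = fn(state[name], row[col])
    # Materialize the destination table: key columns followed by aggregates.
    return [dict(zip(group_by, key), **state) for key, state in groups.items()]


rows = [
    {"col1": "a", "col2": 1, "col3": 5},
    {"col1": "a", "col2": 1, "col3": 12},
    {"col1": "b", "col2": 2, "col3": 7},
]
dest = aggregate(
    rows,
    ["col1", "col2"],
    {"max_col3": ("col3", max), "sum_col3": ("col3", lambda a, b: a + b)},
)
```

Seeding the accumulator from the first row avoids needing an identity value per reduce function, which only works for aggregates such as MAX or SUM; something like COUNT or AVG would need a separate initial state.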
We might be missing an Aggregate instruction here, along with the expressions to evaluate.
The HAVING construct in SQL is way too powerful. For example, we can write HAVING MAX(col3) + 1 > 10, where we have to recognize that MAX(col3) is an expression that has already been evaluated, and we still have to evaluate the + 1 part. It might be that our eager execution model does not align well with SQL. Anyway, we can leave this problem for later. Maybe for now we only allow a single binary operator in a HAVING clause, and its left operand must match one of the aggregate expressions.
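The restricted form of HAVING could look roughly like the sketch below. Everything here is an assumption made for illustration: the `having` function, the `OPS` table, and the convention that each aggregated row is a dict keyed by the text of the aggregate expression.

```python
import operator

# Binary comparison operators allowed in the restricted HAVING clause.
OPS = {
    ">": operator.gt,
    ">=": operator.ge,
    "<": operator.lt,
    "<=": operator.le,
    "=": operator.eq,
    "<>": operator.ne,
}


def having(groups, aggregate_exprs, left, op, right):
    """Filter aggregated rows with a single `left op right` predicate.

    groups          -- aggregated rows, each a dict keyed by expression text
    aggregate_exprs -- aggregate expressions already evaluated for each group
    left            -- must match one of aggregate_exprs exactly
    right           -- a constant to compare against
    """
    if left not in aggregate_exprs:
        raise ValueError(f"left operand {left!r} is not an evaluated aggregate")
    if op not in OPS:
        raise ValueError(f"unsupported operator {op!r}")
    compare = OPS[op]
    # The aggregate value is looked up, not re-evaluated.
    return [g for g in groups if compare(g[left], right)]


groups = [
    {"col1": "a", "MAX(col3)": 12},
    {"col1": "b", "MAX(col3)": 7},
]
kept = having(groups, ["MAX(col3)"], "MAX(col3)", ">", 9)
```

Under this restriction, HAVING MAX(col3) + 1 > 10 would be rejected as written, and the query author would have to express it as MAX(col3) > 9 instead.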

Or is there some simpler way to implement this?


Samyak2 mentioned this issue Jan 24, 2023: GROUP BY support #24 (Open, 3 tasks)

Samyak2 mentioned this issue Feb 21, 2023: Groupby and Aggregates - Instructions and Codegen overhaul #26 (Open, 6 tasks)
