This repository has been archived by the owner on May 12, 2024. It is now read-only.

How should aggregate work in the VM #16

Open
tyt2y3 opened this issue Aug 6, 2022 · 0 comments

Comments

@tyt2y3 (Member) commented Aug 6, 2022

I have been reading through the existing code, but I still want to start a discussion.

https://github.com/SeaQL/sql-assembly/blob/e38f564785a6ba8a3542d512690ef90a059274d4/src/ic.rs#L373-L427

  1. I think aggregate should work like project: it has a source table, a destination table, and some expressions to evaluate. Only the evaluation model is different.

  2. In essence, this is how GROUP BY works conceptually: for each row, construct a tuple from that row's values in the group-by columns (there may be more than one). Use that tuple as the key in a 'hash map' (which should be our own table + index implementation), and run the 'reduce' function on each collision, i.e. whenever a row's key matches an existing group.

  3. We might be missing an Aggregate instruction here, along with the expressions it evaluates.

  4. The HAVING construct in SQL is very powerful. For example, we can write HAVING MAX(col3) + 1 > 10, where we have to recognize that MAX(col3) is an expression that has already been evaluated, and we still have to evaluate the + 1 part. It might be that our eager execution model does not align well with SQL. Anyway, we can leave this problem for later. Maybe for now we only allow a binary operator in HAVING clauses, and the left operand must match one of the aggregate expressions.
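As a rough illustration of items 1 to 3, here is a minimal Rust sketch of a hypothetical `Aggregate` instruction that, like project, reads a source table and produces a destination table. It assumes integer-only values, uses `std::collections::HashMap` as a stand-in for our own table + index implementation, and the names `Aggregate`, `AggExpr` and `AggFunc` are made up for this example rather than taken from ic.rs.

```rust
use std::collections::HashMap;

// Hypothetical value/row types; the real VM has its own representation.
type Value = i64;
type Row = Vec<Value>;

// Hypothetical aggregate functions supported by this sketch.
#[derive(Clone, Copy, Debug)]
enum AggFunc {
    Count,
    Sum,
    Max,
}

// One aggregate expression: a function applied to a source column.
#[derive(Clone, Copy, Debug)]
struct AggExpr {
    func: AggFunc,
    column: usize,
}

// Hypothetical `Aggregate` instruction: group-by column indices plus the
// aggregate expressions to evaluate per group.
struct Aggregate {
    group_by: Vec<usize>,
    exprs: Vec<AggExpr>,
}

impl AggFunc {
    // Accumulator for the first row of a new group.
    fn init(self, v: Value) -> Value {
        match self {
            AggFunc::Count => 1,
            AggFunc::Sum | AggFunc::Max => v,
        }
    }

    // The "reduce" step, run when a row's key collides with an existing group.
    fn reduce(self, acc: Value, v: Value) -> Value {
        match self {
            AggFunc::Count => acc + 1,
            AggFunc::Sum => acc + v,
            AggFunc::Max => acc.max(v),
        }
    }
}

impl Aggregate {
    // Reads the source table and returns the destination table:
    // each output row is the group key followed by the aggregate results.
    fn execute(&self, source: &[Row]) -> Vec<Row> {
        let mut order: Vec<Row> = Vec::new(); // first-seen order, for deterministic output
        let mut groups: HashMap<Row, Vec<Value>> = HashMap::new();
        for row in source {
            // Build the key tuple from this row's group-by columns.
            let key: Row = self.group_by.iter().map(|&c| row[c]).collect();
            match groups.get_mut(&key) {
                Some(accs) => {
                    for (acc, e) in accs.iter_mut().zip(&self.exprs) {
                        *acc = e.func.reduce(*acc, row[e.column]);
                    }
                }
                None => {
                    let accs: Vec<Value> =
                        self.exprs.iter().map(|e| e.func.init(row[e.column])).collect();
                    order.push(key.clone());
                    groups.insert(key, accs);
                }
            }
        }
        order
            .into_iter()
            .map(|key| {
                let mut out = key.clone();
                out.extend(&groups[&key]);
                out
            })
            .collect()
    }
}

fn main() {
    // Columns (col1, col2, col3); GROUP BY col1 with COUNT(*), SUM(col3), MAX(col3).
    let source: Vec<Row> = vec![vec![1, 10, 5], vec![2, 20, 7], vec![1, 30, 9]];
    let agg = Aggregate {
        group_by: vec![0],
        exprs: vec![
            AggExpr { func: AggFunc::Count, column: 2 },
            AggExpr { func: AggFunc::Sum, column: 2 },
            AggExpr { func: AggFunc::Max, column: 2 },
        ],
    };
    for row in agg.execute(&source) {
        println!("{:?}", row); // prints [1, 2, 14, 9] then [2, 1, 7, 7]
    }
}
```

Because every aggregate is reduced incrementally as rows arrive, the instruction only needs one pass over the source table, which fits an eager execution model.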
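For the restricted HAVING proposed in item 4, a minimal sketch might resolve the left operand against the aggregate expressions before execution and reject anything else. It assumes the right operand is a literal and that aggregated rows hold the group-by keys followed by the aggregate results; `Having`, `CmpOp`, `resolve_having` and `apply_having` are hypothetical names for illustration only.

```rust
// Hypothetical value/row types for this sketch.
type Value = i64;
type Row = Vec<Value>;

#[derive(Clone, Copy, Debug, PartialEq)]
enum AggFunc {
    Count,
    Sum,
    Max,
}

#[derive(Clone, Copy, Debug, PartialEq)]
struct AggExpr {
    func: AggFunc,
    column: usize,
}

#[derive(Clone, Copy, Debug)]
enum CmpOp {
    Eq,
    Lt,
    Gt,
}

// Restricted HAVING: `<aggregate expr> <op> <literal>` (literal rhs is an assumption).
struct Having {
    lhs: AggExpr,
    op: CmpOp,
    rhs: Value,
}

impl CmpOp {
    fn eval(self, a: Value, b: Value) -> bool {
        match self {
            CmpOp::Eq => a == b,
            CmpOp::Lt => a < b,
            CmpOp::Gt => a > b,
        }
    }
}

// Maps the left operand to its column in the aggregated rows, or rejects it
// if it is not one of the already-evaluated aggregate expressions.
fn resolve_having(h: &Having, group_by_len: usize, exprs: &[AggExpr]) -> Result<usize, String> {
    exprs
        .iter()
        .position(|e| *e == h.lhs)
        .map(|i| group_by_len + i)
        .ok_or_else(|| format!("unsupported HAVING operand: {:?}", h.lhs))
}

// Keeps only the aggregated rows that satisfy the HAVING comparison.
fn apply_having(rows: Vec<Row>, h: &Having, group_by_len: usize, exprs: &[AggExpr]) -> Result<Vec<Row>, String> {
    let col = resolve_having(h, group_by_len, exprs)?;
    Ok(rows.into_iter().filter(|r| h.op.eval(r[col], h.rhs)).collect())
}

fn main() {
    let exprs = [
        AggExpr { func: AggFunc::Count, column: 2 },
        AggExpr { func: AggFunc::Max, column: 2 },
    ];
    // Aggregated rows: (col1, COUNT(*), MAX(col3)).
    let rows: Vec<Row> = vec![vec![1, 2, 9], vec![2, 1, 7]];
    // HAVING MAX(col3) > 8
    let h = Having { lhs: AggExpr { func: AggFunc::Max, column: 2 }, op: CmpOp::Gt, rhs: 8 };
    println!("{:?}", apply_having(rows.clone(), &h, 1, &exprs)); // prints Ok([[1, 2, 9]])
    // HAVING SUM(col3) > 8 is rejected: SUM(col3) is not among the aggregates.
    let bad = Having { lhs: AggExpr { func: AggFunc::Sum, column: 2 }, op: CmpOp::Gt, rhs: 8 };
    println!("{:?}", apply_having(rows, &bad, 1, &exprs).is_err()); // prints true
}
```

Under this restriction, HAVING MAX(col3) + 1 > 10 would be rejected; for integer columns a user could rewrite it as HAVING MAX(col3) > 9.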

Or is there a simpler way to implement this?

Labels
None yet
Projects
Status: Triage
Development

No branches or pull requests

1 participant