Skip to content
This repository has been archived by the owner on May 12, 2024. It is now read-only.

Groupby and Aggregates - Instructions and Codegen overhaul #26

Open
wants to merge 26 commits into
base: main
Choose a base branch
from

Conversation

Samyak2
Copy link
Collaborator

@Samyak2 Samyak2 commented Feb 21, 2023

Description

We took a second look at the GroupBy instruction and found it very lacking (see #16). This PR represents an overhaul of it.

Partially fixes #24

Changes

  • Add a separate Aggregate instruction to calculate aggregations on a grouped table
  • Common variables used in codegen are moved to a CodegenContext struct to make passing them around easier.
  • GroupBy will now produce a grouped table which will be lazily evaluated. Note: the evaluation is not in scope for this PR.
  • Add tests for these cases.
  • Update existing tests.
  • Non-grouped aggregate functions.

Some questions/concerns

  • This design was chosen because it introduces minimal changes to the existing architecture. Mainly that only aggregate operations need to be specialized for Grouped tables, the rest remains the same. But with the codegen, the non-aggregate projects are done on the same register indexes. That needs to be fixed.

@Samyak2 Samyak2 added the enhancement New feature or request label Feb 21, 2023
@Samyak2 Samyak2 requested a review from tyt2y3 February 21, 2023 19:00
@tyt2y3
Copy link
Member

tyt2y3 commented Feb 22, 2023

Are there test cases illustrating what code will be generated given the sample SQL we use in the blog post?

@Samyak2
Copy link
Collaborator Author

Samyak2 commented Feb 23, 2023

Not yet. I'm working on it currently.

Copy link
Collaborator Author

@Samyak2 Samyak2 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is now complete. Took a lot longer than expected :(

For this query:

select
    col1,
    col3 + 1,
    max(col2) + 1,
    count(1) as count
from abc
group by col1, col3

This is the generated code:

Source { index: %0, name: abc }
Empty { index: %1 }
Project { input: %0, output: %1, expr: column 'col2', alias: __otter_temp_col_1 }
Project { input: %0, output: %1, expr: 1, alias: __otter_temp_col_3 }
Empty { index: %2 }
GroupBy { input: %1, output: %2, expr: column 'col1' }
Empty { index: %3 }
GroupBy { input: %2, output: %3, expr: column 'col3' }
Empty { index: %4 }
Aggregate { input: %3, output: %4, func: Agg(max), col_name: __otter_temp_col_1, alias: __otter_temp_col_2 }
Aggregate { input: %3, output: %4, func: Agg(count), col_name: __otter_temp_col_3, alias: __otter_temp_col_4 }
Empty { index: %5 }
Project { input: %0, output: %5, expr: column 'col1', alias: None }
Project { input: %0, output: %5, expr: (column 'col3' + 1), alias: None }
Project { input: %4, output: %5, expr: (column '__otter_temp_col_2' + 1), alias: None }
Project { input: %4, output: %5, expr: column '__otter_temp_col_4', alias: count }
Return { index: %5 }

@tyt2y3
Copy link
Member

tyt2y3 commented May 17, 2023

Interesting, is there an example for the following too?

select
    col1,
    col3 + 1,
    max(col2 + 1)
from abc
group by col1, col3

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
enhancement New feature or request
Projects
Status: Next Up
Development

Successfully merging this pull request may close these issues.

GROUP BY support
2 participants