Generalize aggregation on primitive (non-link) grouping columns #5

Open
asavinov opened this issue Jul 4, 2021 · 0 comments
Labels
operation Tasks related to how data operations work
asavinov commented Jul 4, 2021

Problem: the aggregation operation works only with link columns for grouping. It is a column operation which adds a new aggregate column to the group table. The group table must exist, and a link from a fact table (with the data to be aggregated) must also exist. However, it cannot be applied to the traditional groupby use case, where we take a fact table and specify one of its columns as the grouping criterion. The problem is that the group table does not exist, and hence we cannot define a new column for it.
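To make the difference concrete, here is a minimal sketch in plain pandas (not the prosto API). The table and column names (`products`, `sales`, `product_link`, `total`) are invented for illustration, and the link column is assumed to hold row ids of the group table.

```python
import pandas as pd

# Group table: it already exists before aggregation (hypothetical data)
products = pd.DataFrame({"name": ["a", "b", "c"]})

# Fact table: "product_link" is assumed to store row ids of `products`,
# while "product" is a primitive (non-link) attribute with the same meaning
sales = pd.DataFrame({
    "product_link": [0, 1, 0, 2],
    "product": ["a", "b", "a", "c"],
    "amount": [10.0, 20.0, 5.0, 7.0],
})

# Link-based aggregation: a new column is added to the existing group table
products["total"] = (
    sales.groupby("product_link")["amount"].sum().reindex(products.index)
)

# Traditional groupby: the grouping criterion is a primitive column and
# the result table is created implicitly by the operation itself
totals = sales.groupby("product", as_index=False)["amount"].sum()
```

In the first case the target table is an input; in the second case it is an output, which is exactly what the current aggregate operation cannot express.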

In this task, we want to make the aggregate operation work when the table to which the new aggregate column has to be added does not exist:

  • Since the table for a new aggregate column does not exist, it has to be created, and hence it becomes a table-column operation which produces three new elements: one table, one link column and one aggregate column
  • The grouping criterion can be an attribute of the fact table with the source data to be aggregated (not a link)
  • We distinguish two cases:
    • Our currently implemented use case where the grouping criterion is an already existing link column
    • The use case still to be implemented, where the grouping criterion is a list of attributes (or columns?) without an existing target table
  • We actually need to combine two sets of definition parameters:
    • define a projection (source attributes, link name, group table name)
    • define an aggregation (measure columns, link name, group table name, new aggregate column name)

It seems difficult and unnatural to combine the two operations, projection and aggregation, into one. Therefore, the groupby use case should probably be implemented as two separate operations: project and aggregate. In that case, this task does not have to be implemented.
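A rough sketch of the two-operation variant, again in plain pandas rather than the prosto API. The functions `project` and `aggregate` and their parameters are hypothetical; they only mirror the two parameter sets listed above. The sketch assumes that `link_name` is not already a column of the fact table.

```python
import pandas as pd

def project(facts, attributes, link_name):
    """Create a group table from unique attribute combinations and
    add a link column (group row ids) to a copy of the fact table."""
    groups = facts[attributes].drop_duplicates().reset_index(drop=True)
    # Expose the group row id as a column so it can be merged into the facts
    keyed = groups.reset_index().rename(columns={"index": link_name})
    linked = facts.merge(keyed, on=attributes, how="left")
    return groups, linked

def aggregate(groups, facts, link_name, measure, func, agg_name):
    """Add an aggregate column to a copy of the group table, following the link."""
    values = facts.groupby(link_name)[measure].agg(func)
    result = groups.copy()
    result[agg_name] = values.reindex(result.index)
    return result

# Hypothetical fact table with two primitive grouping attributes
facts = pd.DataFrame({
    "region": ["eu", "eu", "us", "eu"],
    "product": ["a", "b", "a", "a"],
    "amount": [1.0, 2.0, 3.0, 4.0],
})

# Step 1: projection produces the new table and the new link column
groups, linked = project(facts, ["region", "product"], "group_link")

# Step 2: aggregation produces the new aggregate column
groups = aggregate(groups, linked, "group_link", "amount", "sum", "total_amount")
```

Keeping the steps separate means each function has one parameter set and one kind of output, and the link column created by `project` can be reused by several later aggregations.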

@asavinov asavinov added this to To do in prosto via automation Jul 4, 2021
@asavinov asavinov added the operation Tasks related to how data operations work label Jul 11, 2021
@asavinov asavinov changed the title [Operations] Generalize aggregation on primitive (non-link) grouping columns Generalize aggregation on primitive (non-link) grouping columns Jul 11, 2021