[FLINK-6373] Add runtime support for distinct aggregation over grouped windows #3765

haohui · 2017-04-24T21:18:01Z

No description provided.

haohui · 2017-04-24T21:27:42Z

Note that this PR contains minimal amount of tests. Would love the feedbacks on what kinds of tests are required here.

fhueske · 2017-04-25T08:03:41Z

Hi @haohui, thanks for the PR! I like the approach of the wrapping distinct aggregator.

Unfortunately, this approach won't work with the upcoming changes for the the UDAGG interface. The AggregateFunction interface won't define methods to accumulate, etc. Instead, these methods can be implemented for different types, will be identified by reflection and called from generated code. We handle scalar and table functions the same way. See PR #3762 for how the AggregateFunction interface will evolve.

Best, Fabian

haohui · 2017-04-27T21:26:09Z

Updated the PR to codegen the parts used by distinct accumulator. Each column is calculated independently.

fhueske · 2017-04-28T07:50:57Z

Hi @haohui,

I suggested before that PR #3771 might be used for DISTINCT group window functions. However, this does not work because we cannot register state for an AggregateFunction. The benefit of the approach of #3771 would have been that it does not need to deserialize the Map every time a record is accumulated (or retracted). Instead the distinct values are kept in a MapState that can be accessed (and deserialized) per look up key. But this approach does not work with the AggregateFunction that we use for early aggregation.

To be honest, I'm a bit concerned about the performance of the approach of this PR because the state of the DistinctAccumulator accumulator (i.e., the complete map) will be de/serialized every time we access it.

I think we can use this approach for now, but should look out, whether we can use an approach similar to the batch side where distinct aggregations (on different keys) are translated into multiple aggregations which are later joined together (the join would be rather cheap because its a 1-to-1 join).

I'll have a look at this PR later today.
Thanks, Fabian

sunjincheng121 · 2017-05-02T02:09:57Z

Hi @haohui @fhueske I am very interested in DISTINCT, Let me share some ideas about this:
First up, in standard database there are two situations can using DISTINCT keyword.

in SELECT Clause, e.g.: SELECT DISTINCT name FROM table
in AGG Clause, e.g.: COUNT([ALL|DISTINCT] expression),SUM([ALL|DISTINCT] expression), etc.

In this post,we talk about AGG Clause. The DISTINCT keyword tells the database system to aggregate only the distinct, or unique, values within the scope of the aggregate function. i.e. database system will deal with the DISTINCT keyword, and put the unique value into AGG. Based on this understanding, I think FLINK FRAMEWORK(not the AGG) should deal with the DISTINCT keyword. we do not need DistinctAccumulator.java. About GROUP WINDOW, I think I like analyze whether the data is duplicated in XXXWindowFunction and DataSetXXXAggFunction, And add Map[String, Boolean] variable Mapping aggName -> isFirstTimeProcess( identifies whether the data is duplicated as a parameter of GeneratedAggregationsFunction. GeneratedAggregationsFunction process data according to aggCall.isDistinct(we must keep this boolean value in GeneratedAggregationsFunction) and isFirstTimeProcess. What do you think? @haohui @fhueske

Best,
SunJincheng

…d windows.

fhueske · 2018-05-07T15:56:13Z

The features that this PR was going to implement has been resolved by PR #5555.
I will close it.

…eaming tables. This closes apache#3764. This closes apache#3765. // Has been resolved by another PR.

haohui force-pushed the FLINK-6373 branch from e3993fc to 12f5e01 Compare April 25, 2017 07:31

haohui force-pushed the FLINK-6373 branch from 12f5e01 to 4923366 Compare April 27, 2017 21:24

haohui force-pushed the FLINK-6373 branch from 4923366 to 8c8ec62 Compare May 6, 2017 18:27

[FLINK-6335] Parse DISTINCT over grouped windows in stream SQL.

c8c1d24

haohui force-pushed the FLINK-6373 branch from 8c8ec62 to 824cc85 Compare May 8, 2017 21:36

[FLINK-6373] Add runtime support for distinct aggregation over groupe…

e4a571e

…d windows.

haohui force-pushed the FLINK-6373 branch from 824cc85 to e4a571e Compare May 8, 2017 23:20

fhueske pushed a commit to fhueske/flink that referenced this pull request May 7, 2018

[FLINK-8690] [table] Add DISTINCT aggregates for group windows on str…

1743055

…eaming tables. This closes apache#3764. This closes apache#3765. // Has been resolved by another PR.

asfgit closed this in d65d932 May 7, 2018

sampathBhat pushed a commit to sampathBhat/flink that referenced this pull request Jul 26, 2018

[FLINK-8690] [table] Add DISTINCT aggregates for group windows on str…

901d44a

…eaming tables. This closes apache#3764. This closes apache#3765. // Has been resolved by another PR.

rmetzger added the component=API/TableSQL label Mar 14, 2019

flinkbot added component=TableSQL/API and removed component=API/TableSQL labels Mar 17, 2022

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

[FLINK-6373] Add runtime support for distinct aggregation over grouped windows #3765

[FLINK-6373] Add runtime support for distinct aggregation over grouped windows #3765

Uh oh!

haohui commented Apr 24, 2017

Uh oh!

haohui commented Apr 24, 2017

Uh oh!

fhueske commented Apr 25, 2017

Uh oh!

haohui commented Apr 27, 2017

Uh oh!

fhueske commented Apr 28, 2017

Uh oh!

sunjincheng121 commented May 2, 2017 •

edited

Loading

Uh oh!

fhueske commented May 7, 2018

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

5 participants

[FLINK-6373] Add runtime support for distinct aggregation over grouped windows #3765

[FLINK-6373] Add runtime support for distinct aggregation over grouped windows #3765

Uh oh!

Conversation

haohui commented Apr 24, 2017

Uh oh!

haohui commented Apr 24, 2017

Uh oh!

fhueske commented Apr 25, 2017

Uh oh!

haohui commented Apr 27, 2017

Uh oh!

fhueske commented Apr 28, 2017

Uh oh!

sunjincheng121 commented May 2, 2017 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

fhueske commented May 7, 2018

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

5 participants

sunjincheng121 commented May 2, 2017 •

edited

Loading