[FLINK-6373] Add runtime support for distinct aggregation over grouped windows #3765
|
Note that this PR contains only a minimal set of tests. I would love feedback on what kinds of tests are required here. |
|
Hi @haohui, thanks for the PR! I like the approach of the wrapping distinct aggregator. Unfortunately, this approach won't work with the upcoming changes to the UDAGG interface. The Best, Fabian |
|
Updated the PR to codegen the parts used by distinct accumulator. Each column is calculated independently. |
|
Hi @haohui, I suggested before that PR #3771 might be used for DISTINCT group window functions. However, this does not work because we cannot register state for an AggregateFunction. The benefit of the approach in #3771 would have been that it does not need to deserialize the Map every time a record is accumulated (or retracted). Instead, the distinct values are kept in a MapState that can be accessed (and deserialized) per lookup key. But this approach does not work with the AggregateFunction that we use for early aggregation.

To be honest, I'm a bit concerned about the performance of the approach in this PR, because the state of the DistinctAccumulator (i.e., the complete map) will be de/serialized every time we access it.

I think we can use this approach for now, but we should check whether we can use an approach similar to the batch side, where distinct aggregations (on different keys) are translated into multiple aggregations which are later joined together (the join would be rather cheap because it's a 1-to-1 join). I'll have a look at this PR later today. |
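For readers following the thread, here is a minimal sketch of the "wrapping distinct accumulator" idea discussed above. It is written in plain Java and is not the code from this PR or any Flink API: `DistinctSketch`, `SumAccumulator`, and `DistinctSumAccumulator` are hypothetical names, and a running sum stands in for the wrapped aggregate. The wrapper keeps a map from value to occurrence count and only forwards a value to the inner aggregate when its count goes from 0 to 1 (accumulate) or from 1 to 0 (retract).

```java
import java.util.HashMap;
import java.util.Map;

final class DistinctSketch {

    /** Hypothetical inner aggregate: a plain running sum. */
    static final class SumAccumulator {
        long sum = 0;
        void accumulate(long v) { sum += v; }
        void retract(long v) { sum -= v; }
    }

    /** Hypothetical wrapper that makes the inner aggregate see each distinct value once. */
    static final class DistinctSumAccumulator {
        // value -> number of times it is currently present in the window
        private final Map<Long, Integer> counts = new HashMap<>();
        private final SumAccumulator inner = new SumAccumulator();

        void accumulate(long v) {
            // merge returns the new count; forward only the first occurrence
            if (counts.merge(v, 1, Integer::sum) == 1) {
                inner.accumulate(v);
            }
        }

        void retract(long v) {
            Integer c = counts.get(v);
            if (c == null) {
                return; // value was never accumulated, nothing to retract
            }
            if (c == 1) {
                // last occurrence leaves the window: drop it from the inner aggregate
                counts.remove(v);
                inner.retract(v);
            } else {
                counts.put(v, c - 1);
            }
        }

        long getValue() { return inner.sum; }
        int distinctCount() { return counts.size(); }
    }

    public static void main(String[] args) {
        DistinctSumAccumulator acc = new DistinctSumAccumulator();
        for (long v : new long[] {1, 2, 2, 3, 3, 3}) {
            acc.accumulate(v);
        }
        System.out.println(acc.getValue()); // 6 (1 + 2 + 3)
        acc.retract(3); // two 3s remain, sum unchanged
        acc.retract(2); // one 2 remains, sum unchanged
        acc.retract(1); // the only 1 leaves
        System.out.println(acc.getValue()); // 5 (2 + 3)
    }
}
```

In this sketch the whole `counts` map lives inside the accumulator object. That is the part the performance concern above refers to: if the accumulator is stored as a single state value, the complete map would be de/serialized on every access, whereas a per-key map state would only touch the entry being looked up.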
|
Hi @haohui @fhueske, I am very interested in
In this post, we talk about Best, |
|
The features that this PR was going to implement have been covered by PR #5555. |
…eaming tables. This closes apache#3764. This closes apache#3765. // Has been resolved by another PR.