[FLINK-1293] Add support for out-of-place aggregations #243
Conversation
import org.junit.Before;
import org.junit.Test;

public class AggregationApi1Test {
rename to AggregationTest?
AggregationTest is a better name because the alternative API is no longer in the branch.
However, I'm not sure about the location of the test. It looks more like a unit test because it checks various correct call patterns and error conditions, but many of the tests construct a complete execution environment and consequently run slowly.
In addition, it cannot live in flink-java because the execution environment is in flink-runtime, which depends on flink-java, so that would create a circular dependency.
I think the best place would be flink-tests but I don't want to have it run as an integration test.
Can you separate the pre-execution tests and the tests that check for result correctness into two classes?
The other operators have pre-flight tests as unit tests in flink-java and tests that execute programs as integration tests in flink-tests.
Thanks for your pull request! Apart from Scala API support, there are several ways in which this PR could be extended:
The question is how much we want to squeeze into this PR. I'd say let's add support for Scala and treat the other issues as improvements over this work. @he-sk If you want to try the Scala API support, you could have a look at how the old Aggregation operator was ported.
Thank you for reviewing the pull request! Regarding the second point (POJO support): I once started working on that. I think the best approach would be to add a method to the serializers that allows access to fields. Once the serialization for POJOs has become more efficient, we can actually throw away the Tuple-specific code and handle Tuples like POJOs.
Force-pushed from 6343785 to ee24d78
@he-sk are you working on this PR?
@fhueske I'm updating it right now.
Cool, thanks! :-) Let me know when you want somebody to have a look again.
@fhueske No, I just updated it with the suggestions that you provided in your comments on the code. |
@fhueske First, thanks for your review. I've included the suggestions that you had above. I've also separated the unit tests and the integration tests and moved them from flink-java-examples to flink-java and flink-tests, respectively. I hope to work on the Scala API next.
Sounds good! What do you think? Would that work? |
I think it would work. I just checked that a Java enum (i.e., the old
Hi @he-sk |
Hi @fhueske, |
Is this PR still alive? |
Thanks for the ping @tonycox. |
Ok, cool. I can go through abandoned PRs and add them to https://issues.apache.org/jira/browse/FLINK-5384. You don't mind, do you, @fhueske?
That would be nice @tonycox :-) |
…lose(). This closes apache#3133 This closes apache#243 // closing stale PR
This patch adds support for multiple aggregations on the same field and aggregations which change the output type, e.g., count and average.
It adds the following aggregation syntax to the Java API:
Five aggregation functions are supported: min, max, sum, count and average. Notice that count does not take a field reference in the example above.
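The syntax example referred to above did not survive extraction. As a hedged, self-contained illustration of the call pattern the description implies (the names min, max, sum, count, average, and aggregate below are hypothetical stand-ins, not the PR's actual API), the five functions could be modeled like this:

```java
import java.util.List;
import java.util.function.BinaryOperator;

// Hypothetical sketch of the described aggregation call pattern.
// Tuples are simulated as double[] rows; field references are array indices.
public class AggregationSyntaxSketch {
    interface Agg { double apply(List<double[]> rows); }

    // Field-based aggregations take a tuple field index ...
    static Agg min(int f) { return rows -> fold(rows, f, Math::min); }
    static Agg max(int f) { return rows -> fold(rows, f, Math::max); }
    static Agg sum(int f) { return rows -> fold(rows, f, Double::sum); }
    // ... while count takes no field reference, mirroring the text above.
    static Agg count() { return rows -> rows.size(); }
    static Agg average(int f) { return rows -> sum(f).apply(rows) / rows.size(); }

    static double fold(List<double[]> rows, int f, BinaryOperator<Double> op) {
        double acc = rows.get(0)[f];
        for (int i = 1; i < rows.size(); i++) acc = op.apply(acc, rows.get(i)[f]);
        return acc;
    }

    // aggregate(...) applies several aggregations at once,
    // possibly several on the same field.
    static double[] aggregate(List<double[]> rows, Agg... aggs) {
        double[] out = new double[aggs.length];
        for (int i = 0; i < aggs.length; i++) out[i] = aggs[i].apply(rows);
        return out;
    }
}
```

A call such as `aggregate(rows, min(1), max(1), count(), average(1))` then mirrors the described ability to aggregate the same field several times in one pass.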
Internally, the aggregation is implemented by the following operator chain:
Average is implemented to reuse an existing sum and/or count function. In the last example above, the intermediate tuple contains only 3 fields: the key, a field to compute the sum, and a field to compute the count. The field holding the average is added after the Reduce operator in the Map2 operator.
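The chain described above can be sketched in plain Java without Flink (a minimal sketch under assumed tuple shapes, not the PR's actual operator code): a first Map emits an intermediate tuple with a sum field and a count field, a Reduce combines intermediates per key, and a second Map appends the average only at the end.

```java
import java.util.Arrays;
import java.util.List;

// Sketch of the Map -> Reduce -> Map2 chain for average(value) per key.
// Intermediate tuples are long[] {key, sumField, countField}.
public class AverageChainSketch {
    // Map: expand each (key, value) record into an intermediate tuple.
    // The average itself is not materialized yet, only sum and count.
    static long[] map1(long key, long value) {
        return new long[] {key, value, 1L};
    }

    // Reduce: combine two intermediate tuples of the same key by
    // summing the sum and count fields.
    static long[] reduce(long[] a, long[] b) {
        return new long[] {a[0], a[1] + b[1], a[2] + b[2]};
    }

    // Map2: append the average, computed from sum and count,
    // after the Reduce has finished.
    static double[] map2(long[] t) {
        return new double[] {t[0], (double) t[1] / t[2]};
    }

    public static void main(String[] args) {
        // Two records for key 1: values 10 and 20.
        long[] acc = reduce(map1(1, 10), map1(1, 20));
        double[] result = map2(acc);
        System.out.println(result[0] + " -> avg " + result[1]); // 1.0 -> avg 15.0
    }
}
```

Because the intermediate tuple carries only the key, sum, and count, a count or sum aggregation on the same field can share these fields instead of adding new ones, which matches the three-field intermediate described above.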
Currently, only the Java API is implemented. My Scala knowledge is fairly limited, so it would be great if somebody else could pick that up.
Also, the result type of aggregate is simply <T extends Tuple>. To support type inference at compile time, it is necessary to integrate the work by Chen (#194).