-
Notifications
You must be signed in to change notification settings - Fork 13k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
New operator map partition function #42
New operator map partition function #42
Conversation
@@ -23,52 +23,19 @@ | |||
import java.util.Map; | |||
import java.util.Set; | |||
|
|||
import eu.stratosphere.api.common.operators.base.*; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please run "mvn verify" locally. We have maven checkstyle in place. It forbids using star imports.
Ufuk Celebi on dev@flink.incubator.apache.org replies:
if you only want to run checkstyle. |
I think this will be a nice addition to the API. Would it make sense to rename the operator to flatMapPartition? It might be confusing in relation to the existing map and flatMap functions. Map is a 1:1 mapping wheras a flatMap is the 1:n mapping. The new operator is called |
*/ | ||
@Override | ||
protected void computeOperatorSpecificDefaultEstimates(DataStatistics statistics) { | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why did you leave this method empty?
I think it should be something similar to the regular map:
this.estimatedNumRecords = getPredecessorNode().getEstimatedNumRecords();
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Because we cannot do any predictions about the output of this operator.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In FlatMapNode we use getPredecessorNode().getEstimatedNumRecords() * 5
as an estimate and in MapNode getPredecessorNode().getEstimatedNumRecords()
. What's the difference which prevents us from doing the same here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would guess that a PartitionMap is used when multiple values should be somehow (in a non-deterministic way) aggregated. Otherwise one would (and should!) use a Map- or FlatMapFunction.
An estimate which is based on the input card is not a good idea, IMHO.
I would assume that the output card is more likely to be around 1 * DOP, but this is just a gut feeling ;-)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree with fabian. Let's leave this empty, because there are really no good assumptions to make for this function.
I agree with @uce's argumentation regarding the name. I would like to merge this change soon. |
|
||
@Override | ||
public DriverStrategy getStrategy() { | ||
return DriverStrategy.MAP; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This needs to be MAP_PARTITION.
In your code, I guess the strategy is never taken from here, but still, this is a setup for future bugs.
This is a good addition. There are some comments that need addressing before this is mergeable. Also, can you undo changes to files that have nothing to do with the code changes:
|
Any updates here? |
how can i revert my changes from the wordcount example? I just revert to the last version, is that correct? |
Yes, If you want to undo the changes from the last commit, just nuke it away and force push into the branch that the PR is based on. The commit will then disappear. |
Borrow some code from openjdk's version of this script (from which this image is downstream) to automatically determine the architectures supported by the upstream image. Closes apache#41
This closes apache#42
No description provided.