[FLINK-2590] fixing DataSetUtils.zipWithUniqueId() #1075
s1ck
commented
Aug 29, 2015
- modified algorithm as explained in the issue
- updated method documentation
Thanks a lot for the contribution.
@rmetzger +1. I think adding a test would be helpful.
@@ -121,6 +122,7 @@ public void mapPartition(Iterable<T> values, Collector<Tuple2<Long, T>> out) thr
return input.mapPartition(new RichMapPartitionFunction<T, Tuple2<Long, T>>() {

long maxLength = log2(Long.MAX_VALUE);
You can make this static final
+1 for a test, otherwise this looks good!
There is already a test case for zipWithUniqueId() in https://github.com/apache/flink/blob/master/flink-tests/src/test/java/org/apache/flink/test/util/DataSetUtilsITCase.java#L66
@StephanEwen I wanted to do this, but static doesn't work with anonymous classes. However, I can declare the UDF as a private inner class (didn't want to change much code).
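For readers unfamiliar with the restriction mentioned here: before Java 16, an anonymous or inner class could declare a static field only if it was a compile-time constant, so a `static final` field initialized by a method call such as `log2(Long.MAX_VALUE)` was rejected. The sketch below shows one way to read the "private inner class" suggestion, using a private static nested class. The names `StaticConstantSketch`, `Labeler`, `bitSize` and `MAX_BIT_SIZE` are hypothetical and are not taken from the PR.

```java
public class StaticConstantSketch {

    // Hypothetical helper standing in for the PR's log2 call:
    // number of bits needed to represent a non-negative value.
    static int bitSize(long value) {
        return 64 - Long.numberOfLeadingZeros(value);
    }

    // Hypothetical stand-in for the user-defined function, declared as a
    // static nested class instead of an anonymous class.
    private static final class Labeler {
        // Allowed in a static nested class: a static final field whose
        // initializer is a method call rather than a compile-time constant.
        private static final int MAX_BIT_SIZE = bitSize(Long.MAX_VALUE);
    }

    // Exposes the constant so it can be inspected from outside.
    static int maxBitSize() {
        return Labeler.MAX_BIT_SIZE;
    }
}
```

The value is then computed once per class load instead of once per function instance, which is presumably the point of the review comment.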
@s1ck Good idea. You can also call
@s1ck, the +1 for renaming
@tillrohrmann While writing the new tests for both methods, I encountered that
There is an issue that tracks the
@StephanEwen thx for the hint. works fine! Will cleanup and commit now.
- added tests for parallel execution of both zip functions
- renamed log2 -> getBitSize
- updated documentation
@tillrohrmann I did not include the
Ah, thank you for the proof.
@s1ck, it's important to note that
* maximum bit size is changed to getNumberOfParallelSubTasks() - 1
@tillrohrmann of course you are right, I thought wrong about it. it's committed
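To illustrate the bit-shift idea discussed in this thread, here is a minimal, Flink-free sketch. It assumes the scheme is: each parallel subtask keeps a local counter, shifts it left by the bit size of (number of parallel subtasks - 1), and adds its own subtask index. That reading is inferred from the commit messages, not copied from the merged code; the class `UniqueIdSketch` and the method `idsForTask` are hypothetical names, and `getBitSize` here is a simplified stand-in for the renamed helper.

```java
import java.util.ArrayList;
import java.util.List;

public class UniqueIdSketch {

    // Number of bits needed to represent a non-negative value (0 needs 0 bits).
    static int getBitSize(long value) {
        return 64 - Long.numberOfLeadingZeros(value);
    }

    // IDs that one hypothetical subtask would assign to `count` elements.
    // Assumption: id = (localCounter << shifter) + taskId.
    static List<Long> idsForTask(int taskId, int parallelism, int count) {
        // Largest subtask index is parallelism - 1, so only its bit size is reserved.
        int shifter = getBitSize(parallelism - 1);
        List<Long> ids = new ArrayList<>();
        for (long counter = 0; counter < count; counter++) {
            // Guard against leaving the positive range of a long (63 usable bits).
            if (getBitSize(counter) + shifter > 63) {
                throw new IllegalStateException("Exceeded long value range");
            }
            ids.add((counter << shifter) + taskId);
        }
        return ids;
    }
}
```

With a parallelism of 4, the shift is 2 bits, so subtask 1 would produce 1, 5, 9 and subtask 3 would produce 3, 7, 11. The low bits carry the subtask index and the high bits carry the counter, so IDs from different subtasks cannot collide.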
@s1ck, looks really good. Thanks for your contribution. Will merge it now.
Sorry, I did not see that there are also identical test cases in Scala which now fail due to the
No problem @s1ck. It might be a bit redundant but it tests that the forwarding is done correctly. Therefore, I fixed the test case.
Ok, thank you.
…ipWithIndex()
- modified algorithm as explained in the issue
- updated method documentation
[FLINK-2590] reducing required bit shift size
- maximum bit size is changed to getNumberOfParallelSubTasks() - 1
This closes #1075.