Skip to content

Comments

[BEAM-1815] Avoid shuffling twice in GABW.#2334

Closed
amitsela wants to merge 1 commit intoapache:masterfrom
amitsela:force-partitioner-gbk
Closed

[BEAM-1815] Avoid shuffling twice in GABW.#2334
amitsela wants to merge 1 commit intoapache:masterfrom
amitsela:force-partitioner-gbk

Conversation

@amitsela
Copy link
Member

@amitsela amitsela commented Mar 27, 2017

Be sure to do all of the following to help us incorporate your contribution
quickly and easily:

  • Make sure the PR title is formatted like:
    [BEAM-<Jira issue #>] Description of pull request
  • Make sure tests pass via mvn clean verify. (Even better, enable
    Travis-CI on your fork and ensure the whole test matrix passes).
  • Replace <Jira issue #> in the title with the actual Jira issue
    number, if there is one.
  • If this contribution is large, please file an Apache
    Individual Contributor License Agreement.

@amitsela
Copy link
Member Author

R: @aviemzur

@amitsela
Copy link
Member Author

Run Spark RunnableOnService

@asfbot
Copy link

asfbot commented Mar 27, 2017

Refer to this link for build results (access rights to CI server needed):
https://builds.apache.org/job/beam_PostCommit_Java_RunnableOnService_Spark/1390/
--none--

@coveralls
Copy link

Coverage Status

Coverage remained the same at 70.18% when pulling 6221366 on amitsela:force-partitioner-gbk into 026aec8 on apache:master.

@asfbot
Copy link

asfbot commented Mar 27, 2017

Refer to this link for build results (access rights to CI server needed):
https://builds.apache.org/job/beam_PreCommit_Java_MavenInstall/8822/
--none--

Copy link
Member

@aviemzur aviemzur left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Awesome fix and improvement!

A few comments, nothing major.

A general idea: if possible, can write a util to convert a PairFunction to a FlatMapPairFunction
this could decrease the amount of changes in the code (and be useful in similar cases in the future).

.map(WindowingHelpers.<KV<K, WindowedValue<V>>>unwindowFunction())
.mapToPair(TranslationUtils.<K, WindowedValue<V>>toPairFunction())
.mapToPair(CoderHelpers.toByteFunction(keyCoder, wvCoder));
// Use the RDD partitioner, if exists.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this comment correct? Looks like we're creating a HashPartitioner not using the RDD partitioner.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Leftovers.. I'll fix that.

Tuple2<ByteArray, Iterable<byte[]>>, Tuple2<K, Iterable<V>>>() {
@Override
public Tuple2<K, Iterable<V>> apply(Tuple2<ByteArray, Iterable<byte[]>> t2) {
K k = fromByteArray(t2._1().getValue(), keyCoder);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Indentation seems off here.

}
}));
public Iterable<Tuple2<K, Iterable<V>>> call(
Iterator<Tuple2<ByteArray, Iterable<byte[]>>> itr) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Indentation seems off here.

@amitsela
Copy link
Member Author

@aviemzur PTAL

@coveralls
Copy link

Coverage Status

Coverage decreased (-0.02%) to 70.162% when pulling 002989e on amitsela:force-partitioner-gbk into 026aec8 on apache:master.

@amitsela
Copy link
Member Author

org.apache.beam.examples.WindowedWordCountIT.testWindowedWordCountInStreaming failure is Flink, and org.apache.beam.runners.apex.ApexRunnerTest.testConfigProperties is Apex so disregard the failures.

@amitsela
Copy link
Member Author

Run Spark RunnableOnService

@asfbot
Copy link

asfbot commented Mar 28, 2017

Refer to this link for build results (access rights to CI server needed):
https://builds.apache.org/job/beam_PostCommit_Java_RunnableOnService_Spark/1408/
--none--

Copy link
Member

@aviemzur aviemzur left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Minor comments re: Javadoc and method name suggestions.


/** A pair to {@link KV} function . */
/** {@link KV} to pair flatmap function. */
public static <K, V> PairFlatMapFunction<Iterator<KV<K, V>>, K, V> toPairFlatMapFunction() {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Rename suggestion: flatMapKVToPair / kvToPairFlatMapFunction?

toPairByKeyInWindowedValue() {
return new PairFunction<WindowedValue<KV<K, V>>, K, WindowedValue<KV<K, V>>>() {
/** A pair to {@link KV} flatmap function . */
static <K, V> FlatMapFunction<Iterator<Tuple2<K, V>>, KV<K, V>> fromPairFlatMapFunction() {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Rename suggestion: flatMapPairToKV / pairToKVFlatMapFunction?

};
}

public static <T, K, V> PairFlatMapFunction<Iterator<T>, K, V> pairFunctionToPairFlatMapFunction(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a v. useful public function, we should probably have a Javadoc for it.
Same comment for functionToFlatMapFunction

…void unnecessary shuffles in

the composite GBK implementation.

Add Javadoc.
@amitsela amitsela force-pushed the force-partitioner-gbk branch from 002989e to 9ab77be Compare March 28, 2017 12:51
@amitsela
Copy link
Member Author

Run Spark RunnableOnService

@asfbot
Copy link

asfbot commented Mar 28, 2017

Refer to this link for build results (access rights to CI server needed):
https://builds.apache.org/job/beam_PostCommit_Java_RunnableOnService_Spark/1410/
--none--

@coveralls
Copy link

Coverage Status

Coverage remained the same at 70.172% when pulling 9ab77be on amitsela:force-partitioner-gbk into d35e1b0 on apache:master.

@asfbot
Copy link

asfbot commented Mar 28, 2017

Refer to this link for build results (access rights to CI server needed):
https://builds.apache.org/job/beam_PreCommit_Java_MavenInstall/8856/
--none--

@asfgit asfgit closed this in 434eadb Mar 28, 2017
@amitsela amitsela deleted the force-partitioner-gbk branch March 28, 2017 13:46
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants