Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[BEAM-2049] Remove KeyedCombineFn #2636

Merged
merged 2 commits into from
May 1, 2017

Conversation

kennknowles
Copy link
Member

@kennknowles kennknowles commented Apr 21, 2017

Be sure to do all of the following to help us incorporate your contribution
quickly and easily:

  • Make sure the PR title is formatted like:
    [BEAM-<Jira issue #>] Description of pull request
  • Make sure tests pass via mvn clean verify. (Even better, enable
    Travis-CI on your fork and ensure the whole test matrix passes).
  • Replace <Jira issue #> in the title with the actual Jira issue
    number, if there is one.
  • If this contribution is large, please file an Apache
    Individual Contributor License Agreement.

I think it is probably best to leave tweaking StateTag and StateSpec until a later PR...

@kennknowles kennknowles force-pushed the delete-KeyedCombineFn branch 3 times, most recently from b04586d to b076a0f Compare April 21, 2017 21:22
@kennknowles kennknowles force-pushed the delete-KeyedCombineFn branch 5 times, most recently from 349ccb8 to 07f69b0 Compare April 25, 2017 17:15
@kennknowles
Copy link
Member Author

R: @tgroh (also any more eyes are helpful)

I'll be running all sorts of test suites offline until Jenkins is back in working order. Most are pretty easy and quick.

@kennknowles
Copy link
Member Author

Confirmed the following offline, with some unrelated metrics flakes.

mvn clean verify -P release -pl sdks/java/core/ -am -amd

CombineFnUtil.toFnWithContext(Sum.ofIntegers().<String>asKeyedFn());
String key = "key";
accum = keyedFnWithContext.createAccumulator(key, nullContext);
CombineFnWithContext<Integer, int[], Integer> keyedFnWithContext =
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not keyed

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

@@ -73,8 +72,8 @@ public void testToFnWithContextIdempotent() throws Exception {
CombineFnUtil.toFnWithContext(Sum.ofIntegers());
assertTrue(fnWithContext == CombineFnUtil.toFnWithContext(fnWithContext));

KeyedCombineFnWithContext<Object, Integer, int[], Integer> keyedFnWithContext =
CombineFnUtil.toFnWithContext(Sum.ofIntegers().asKeyedFn());
CombineFnWithContext<Integer, int[], Integer> keyedFnWithContext =
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not keyed.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

@@ -40,14 +40,14 @@


/**
* A {@link org.apache.beam.sdk.transforms.CombineFnBase.PerKeyCombineFn}
* A {@link org.apache.beam.sdk.transforms.CombineFnBase.GlobalCombineFn}
* with a {@link org.apache.beam.sdk.transforms.CombineWithContext.Context} for the SparkRunner.
*/
public class SparkKeyedCombineFn<K, InputT, AccumT, OutputT> extends SparkAbstractCombineFn {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should be renamed.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In fact the code paths are different enough that we should address this separately. It is easy, but not "just a rename", mostly due to keeping the WindowedValue wrappers in place everywhere. It is definitely a runner implementation detail that it chooses to specialize.


/**
* Static utility methods that provide {@link GlobalCombineFnRunner} implementations for different
* keyed combine functions.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not keyed

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

private final FlinkBroadcastStateInternals<K> flinkStateInternals;

FlinkKeyedCombiningState(
DefaultOperatorStateBackend flinkStateBackend,
StateTag<? super K, CombiningState<InputT, AccumT, OutputT>> address,
Combine.KeyedCombineFn<? super K, InputT, AccumT, OutputT> combineFn,
Combine.CombineFn<InputT, AccumT, OutputT> combineFn,
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ditto here FWIW. Leaving runner-specific constructs in place.

@kennknowles
Copy link
Member Author

OK. I hit a few more straggling names and tests.

@kennknowles
Copy link
Member Author

retest this please

Copy link
Member

@tgroh tgroh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM if tests pass.

@kennknowles
Copy link
Member Author

@kennknowles kennknowles force-pushed the delete-KeyedCombineFn branch 3 times, most recently from 15cd836 to e629d49 Compare April 30, 2017 01:55
@kennknowles
Copy link
Member Author

retest this please

@kennknowles
Copy link
Member Author

run dataflow validatesrunner

@coveralls
Copy link

Coverage Status

Coverage increased (+0.3%) to 70.152% when pulling 1717916 on kennknowles:delete-KeyedCombineFn into 14d60b2 on apache:master.

@kennknowles
Copy link
Member Author

run dataflow validatesrunner

@coveralls
Copy link

Coverage Status

Coverage increased (+0.3%) to 70.133% when pulling 07ca542 on kennknowles:delete-KeyedCombineFn into a198f8d on apache:master.

@kennknowles
Copy link
Member Author

Dataflow ValidatesRunner is stalling reliably at HEAD; nothing to do with this PR. It may be infrastructure or pom or another issue. Certainly it is urgent, but I don't think this PR should wait on it. We have lots of assurances:

  • Jenkins did run WordCountIT and WindowedWordCountIT in precommit, so basic per key combines work
  • Building the new Dataflow worker involved testing akin to the Dataflow ValidatesRunner
  • I have since then manually run all of CombineTest on Dataflow
  • Other runners have compile time protection against this sort of change
  • I have manually run other runners' ValidatesRunner tests anyhow

Given the "LGTM if tests pass" I am going to merge this.

Incidentally, all of this testing I've done against the merger of HEAD and this PR also gives some assurance that our code at HEAD is OK.

@asfgit asfgit merged commit 07ca542 into apache:master May 1, 2017
asfgit pushed a commit that referenced this pull request May 1, 2017
  Update Dataflow worker version to beam-master-20170430
  Remove KeyedCombineFn
@coveralls
Copy link

coveralls commented May 1, 2017

Coverage Status

Coverage decreased (-31.7%) to 38.139% when pulling 07ca542 on kennknowles:delete-KeyedCombineFn into a198f8d on apache:master.

@kennknowles kennknowles deleted the delete-KeyedCombineFn branch May 26, 2017 05:18
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants