[BEAM-646] Get the #apply out of the DirectRunner#2006
[BEAM-646] Get the #apply out of the DirectRunner#2006tgroh wants to merge 3 commits intoapache:masterfrom
Conversation
|
R: @amitsela Whaboom! |
|
Changes Unknown when pulling a8bf815 on tgroh:surgery_ish into ** on apache:master**. |
|
Refer to this link for build results (access rights to CI server needed): |
|
Good news! I removed the need for the crazy |
|
Refer to this link for build results (access rights to CI server needed): |
| .put( | ||
| PTransformMatchers.stateOrTimerParDoSingle(), | ||
| new ParDoSingleViaMultiOverrideFactory()) | ||
| .put(PTransformMatchers.splittableParDoMulti(), new ParDoMultiOverrideFactory()) |
There was a problem hiding this comment.
When you have a sensitivity to ordering, comment about it it. For example "ParDoSingleViaMulti comes before ParDoMultiOverrideFactory because state and splittable are only set up to work on multi" and then later "ParDoSingleViaMulti comes after ParDoMultiOverrideFactory because the resulting expansions contains yet more ParDo singles"
There was a problem hiding this comment.
Done. Most of the other things are independent of this ParDo chain, and/or are commented above.
| @Override | ||
| public DirectPipelineResult run(Pipeline pipeline) { | ||
| for (Map.Entry<PTransformMatcher, PTransformOverrideFactory> override : | ||
| defaultTransformOverrides.entrySet()) { |
There was a problem hiding this comment.
Really nice. This is the perfect "finish line" for this API.
| new DirectGroupAlsoByWindow<String, WindowedValue<KV<String, Integer>>>( | ||
| WindowingStrategy.globalDefault(), WindowingStrategy.globalDefault())) | ||
| .apply( | ||
| ParDo.of(new ParDoMultiOverrideFactory.ToKeyedWorkItem<String, Integer>()) |
There was a problem hiding this comment.
This is a good change. It tests only that the analysis works on the post-surgery graph containing real direct runner primitives.
| !isSplittable(getFn()), | ||
| "%s does not support Splittable DoFn", | ||
| input.getPipeline().getOptions().getRunner().getName()); | ||
| // SplittableDoFn should be forbidden on the runner-side. |
There was a problem hiding this comment.
Are there JIRAs for the other runners to reject it? Otherwise I think this will just be silently busted.
There was a problem hiding this comment.
+1 isn't SplittableDoFn to be handled by runners via SplittableParDo ?
There was a problem hiding this comment.
I've explicitly rejected SplittableDoFn in all of the existing runners.
|
Besides joining Kenn's comment on removing assertions in ParDo, looks fine, but unless I'm missing something here it shouldn't be removed, unless this responsibility is passed to runners now, and if so it should still be as part of a separate PR (so still, not here). Spark runner currently doesn't use |
Remove Existing Outputs from Producer Map in replace. This permits replacements to produce the same PValue as the transform they are replacing, for example in CreatePCollectionView. In replace(), add the replacement input to Unexpanded Inputs.
Remove DirectRunner#apply(). This migrates the DirectRunner to work on a runner-agnostic graph.
|
Refer to this link for build results (access rights to CI server needed): |
kennknowles
left a comment
There was a problem hiding this comment.
Just nits on the error messages. LGTM when you fix them up.
|
|
||
| if (signature.processElement().isSplittable()) { | ||
| throw new UnsupportedOperationException( | ||
| String.format("SparkRunner does not support SplittableDoFn: %s", doFn)); |
There was a problem hiding this comment.
Error message nits:
- For proper identifiers, interpolate when possible, as in
SparkRunner.class.getSimpleName() SplittableDoFnis not a proper identifier. It is just asplittable DoFn.
|
|
||
| if (signature.processElement().isSplittable()) { | ||
| throw new UnsupportedOperationException( | ||
| String.format("ApexRunner does not support Splittable DoFn: %s", doFn)); |
There was a problem hiding this comment.
Error message nits:
- For proper identifiers, interpolate when possible, as in
ApexRunner.class.getSimpleName() Splittableshouldn't be capitalized. It is just asplittable DoFn.
|
|
||
| if (signature.processElement().isSplittable()) { | ||
| throw new UnsupportedOperationException( | ||
| String.format("ApexRunner does not support Splittable DoFn: %s", doFn)); |
There was a problem hiding this comment.
Error message nits:
- For proper identifiers, interpolate when possible, as in
ApexRunner.class.getSimpleName() Splittableshouldn't be capitalized. It is just asplittable DoFn.
| DoFnSignature signature = DoFnSignatures.getSignature(doFn.getClass()); | ||
| if (signature.processElement().isSplittable()) { | ||
| throw new UnsupportedOperationException( | ||
| String.format("FlinkRunner does not currently support Splittable DoFn: %s", doFn)); |
There was a problem hiding this comment.
Error message nits:
- For proper identifiers, interpolate when possible, as in
FlinkRunner.class.getSimpleName() Splittableshouldn't be capitalized. It is just asplittable DoFn.
| DoFnSignature signature = DoFnSignatures.getSignature(doFn.getClass()); | ||
| if (signature.processElement().isSplittable()) { | ||
| throw new UnsupportedOperationException( | ||
| String.format("FlinkRunner does not currently support Splittable DoFn: %s", doFn)); |
There was a problem hiding this comment.
Error message nits:
- For proper identifiers, interpolate when possible, as in
FlinkRunner.class.getSimpleName() Splittableshouldn't be capitalized. It is just asplittable DoFn.
| DoFnSignature signature = DoFnSignatures.getSignature(fn.getClass()); | ||
| if (signature.processElement().isSplittable()) { | ||
| throw new UnsupportedOperationException( | ||
| String.format("FlinkRunner does not currently support splittable DoFn: %s", fn)); |
There was a problem hiding this comment.
This is the DataflowRunner. Using identifier interpolation would have caught this :-)
There was a problem hiding this comment.
The one time I don't artisinally craft a locally sourced error message I also fail to remember to fix my copypaste.
|
Refer to this link for build results (access rights to CI server needed): |
Be sure to do all of the following to help us incorporate your contribution
quickly and easily:
[BEAM-<Jira issue #>] Description of pull requestmvn clean verify. (Even better, enableTravis-CI on your fork and ensure the whole test matrix passes).
<Jira issue #>in the title with the actual Jira issuenumber, if there is one.
Individual Contributor License Agreement.
Use the Graph Surgery API in the DirectRunner.
The Forwarding View is currently required to ensure that the
WriteViewtransformis marked as the producer of the View proper. It is then replaced by the original view,
so the Pipeline writes the view to the correct location.
I'm going to revisit the producer-marking code soon, but this is safe to review.
I also missed invariant maintenance within TransformHierarchy within #1998,
so that's added here as well.