[BEAM-3565] Add FusedPipeline#toPipeline #4777
40cd510 to 025f9f7
*/
public Components asComponents(Components base) {
Builder newComponents = base.toBuilder().clearTransforms();
private Map<String, PTransform> getTopLevelTransforms(Components base) {
getRunnerExecutedPrimitiveTransforms?
If we go with my suggestion of treating ExecutableStage as a primitive, then this can be called getPrimitiveTransforms or something along those lines. Even asComponents would still make sense.
getExecutableTransforms is what I am calling this now.
All of the transforms are primitives, but there are non-executable transforms (within a stage) that get merged into the components as well.
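To illustrate the distinction, here is a minimal sketch over hypothetical, simplified types: plain string IDs stand in for the `RunnerApi` protos, and `Stage`, `executableTransforms`, and `componentTransforms` are illustrative names, not Beam API. Under these assumptions, the executable transforms are the runner-executed primitives plus one entry per fused stage, while the components additionally contain every transform fused inside a stage.

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical, simplified model of a fused pipeline; not the Beam RunnerApi classes. */
public class FusedPipelineSketch {
  /** A fused stage: its own ID plus the IDs of the (non-executable) transforms fused into it. */
  record Stage(String id, List<String> fusedTransformIds) {}

  /** Transforms a runner must execute: runner-executed primitives plus one entry per stage. */
  static Set<String> executableTransforms(Set<String> runnerExecuted, List<Stage> stages) {
    Set<String> result = new LinkedHashSet<>(runnerExecuted);
    for (Stage stage : stages) {
      result.add(stage.id());
    }
    return result;
  }

  /** Transforms merged into the components: the executable ones plus everything inside a stage. */
  static Set<String> componentTransforms(Set<String> runnerExecuted, List<Stage> stages) {
    Set<String> result = executableTransforms(runnerExecuted, stages);
    for (Stage stage : stages) {
      result.addAll(stage.fusedTransformIds());
    }
    return result;
  }

  public static void main(String[] args) {
    Set<String> runnerExecuted = Set.of("impulse", "gbk");
    List<Stage> stages = List.of(new Stage("stage0", List.of("parDoA", "parDoB")));
    // Iteration order of Set.of is unspecified, so the printed order may vary.
    System.out.println(executableTransforms(runnerExecuted, stages));
    System.out.println(componentTransforms(runnerExecuted, stages));
  }
}
```

In this model, the difference between the two sets is exactly the non-executable transforms that live inside stages.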
/**
* Return a {@link Components} like the {@code base} components, but with the only transforms
* equal to this fused pipeline.
but with the only transforms equal to this fused pipeline. -> but with the set of transforms required to be executed by a runner.
* PCollections} are not yet modelled by {@link QueryablePipeline}, so the input {@link
* Components} should be treatable as though each node is a primitive.
*/
static QueryablePipeline forComponents(Components components) {
It would make sense to make ExecutableStage a primitive and have its payload be the subtransforms, coders, PCollections, ...
This will address some of the coder issues and the duality where you want the runner-based pipeline representation to differ from the SDK-based pipeline segment, which will now be completely embedded inside the executable stage.
#4844 performs a lot of this change, without embedding the components within the stage; it will still require us to create a partial Components, but should ultimately cause this construction to go through the same path as the original QueryablePipeline.
025f9f7 to 2641890
Please address all the comments.
e2ba80f to 361211e
PTAL
361211e to 6e82d2e
* <p>The only composites will be the stages returned by {@link #getFusedStages()}.
* <p>The transforms that are present in the returned map are the union of the results of {@link
* #getRunnerExecutedTransforms()} and {@link #getFusedStages()}, where each {@link
* ExecutableStage}.
/**
* Return a {@link Components} like the {@code base} components, but with the only transforms
* equal to this fused pipeline.
* Return a {@link Components} like the {@code base} components, but with the set of transforms to
This comment seems like it should apply to toPipeline and not to the private method getExecutableStages.
Also, it should read "Return a {@link Pipeline}".
This comment still seems out of date since it refers to base components.
/** The {@link PTransform PTransforms} that a runner is responsible for executing. */
public abstract Set<PTransformNode> getRunnerExecutedTransforms();
public RunnerApi.Pipeline toPipeline(Components initialComponents) {
We were given the Pipeline when we constructed the FusedStage via GreedyPipelineFuser; why do we need initialComponents to be passed in again here?
* Get the PCollections which are not consumed by any {@link PTransformNode} in this {@link
* QueryablePipeline}.
*/
private Set<PCollectionNode> getLeafPCollections() {
This isn't used anywhere; what is the intent of adding it right now?
Pulled in accidentally, I think.
fused.getFusedStages().size() == 2,
"Unexpected number of fused stages %s",
fused.getFusedStages());
RunnerApi.Pipeline fusedProto = fused.toPipeline(protoPipeline.getComponents());
fusedProto -> fusedProtoPipeline?
PCollection.class.getSimpleName(), fusedProto.getRootTransformIds(i)),
producedPCollections,
hasItems(rootTransform.getInputsMap().values().toArray(new String[0])));
for (String consumed : consumedPCollections) {
Don't all roots have zero consumed PCollections? If so, this is just checking that no two roots produce the same PCollection.
The Pipeline's root transform IDs are really the top-level transforms (transforms not contained in any other transform).
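As a rough illustration of "top-level" in this sense, the following minimal sketch finds the transforms that no other transform lists as a subtransform. It uses a hypothetical map from transform ID to subtransform IDs rather than the real `RunnerApi.Components` proto; `TopLevelTransforms` and `topLevel` are illustrative names, not Beam API.

```java
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical sketch: find top-level transforms from a map of transform ID to subtransform IDs. */
public class TopLevelTransforms {
  static Set<String> topLevel(Map<String, List<String>> subtransformsById) {
    // Collect every ID that appears as a child of some composite.
    Set<String> contained = new HashSet<>();
    for (List<String> children : subtransformsById.values()) {
      contained.addAll(children);
    }
    // A transform is top-level if no other transform contains it.
    Set<String> result = new LinkedHashSet<>();
    for (String id : subtransformsById.keySet()) {
      if (!contained.contains(id)) {
        result.add(id);
      }
    }
    return result;
  }

  public static void main(String[] args) {
    Map<String, List<String>> transforms = Map.of(
        "read", List.of("impulse", "split"),
        "impulse", List.of(),
        "split", List.of(),
        "write", List.of());
    // Contains "read" and "write"; iteration order of Map.of is not guaranteed.
    System.out.println(topLevel(transforms));
  }
}
```

Note that under this definition a root may still consume PCollections produced by another root, so roots are not necessarily transforms with zero inputs.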
lukecwik left a comment
Please address minor comment nits.
* PTransforms} present in the original Pipeline that this {@link FusedPipeline} was created from,
* plus all of the {@link ExecutableStage ExecutableStages} contained within this {@link
* FusedPipeline}. The Root Transform IDs will contain all of the runner executed transforms and
* all of the ExecutableStages contained within the Pipeline.
ExecutableStages -> {@link ExecutableStage executable stages}
* <p>The {@link Components} of the returned pipeline will contain all of the {@link PTransform
* PTransforms} present in the original Pipeline that this {@link FusedPipeline} was created from,
* plus all of the {@link ExecutableStage ExecutableStages} contained within this {@link
* FusedPipeline}. The Root Transform IDs will contain all of the runner executed transforms and
The upper casing of "Root Transform IDs" is strange; would you rather link to the Pipeline root transform IDs method?
Any pipeline node should be identifiable within the components the pipeline was built from, and this allows cleaner comparisons based on the ID.
Given a Pipeline's Components, this constructs the fused representation, including a shallow, topologically ordered set of root transforms.
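A shallow topological ordering of root transforms can be sketched with Kahn's algorithm: each transform is emitted only after every transform that produces one of its input PCollections. The `Node` record and `topologicalOrder` method are hypothetical simplifications, and this is not necessarily how Beam's `QueryablePipeline` or `GreedyPipelineFuser` compute the order.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch: topologically order root transforms so producers precede consumers. */
public class RootTransformOrder {
  /** A simplified transform: its input and output PCollection IDs. */
  record Node(List<String> inputs, List<String> outputs) {}

  static List<String> topologicalOrder(Map<String, Node> transforms) {
    // Map each PCollection to the transform that produces it.
    Map<String, String> producerOf = new HashMap<>();
    transforms.forEach((id, node) -> node.outputs().forEach(pc -> producerOf.put(pc, id)));

    // Count unmet dependencies and record which transforms consume each producer's outputs.
    Map<String, Integer> pending = new HashMap<>();
    Map<String, List<String>> dependents = new HashMap<>();
    transforms.forEach((id, node) -> {
      int count = 0;
      for (String pc : node.inputs()) {
        String producer = producerOf.get(pc);
        if (producer != null) {
          count++;
          dependents.computeIfAbsent(producer, k -> new ArrayList<>()).add(id);
        }
      }
      pending.put(id, count);
    });

    // Kahn's algorithm: repeatedly emit transforms with no unmet dependencies.
    Deque<String> ready = new ArrayDeque<>();
    pending.forEach((id, count) -> {
      if (count == 0) {
        ready.add(id);
      }
    });
    List<String> order = new ArrayList<>();
    while (!ready.isEmpty()) {
      String id = ready.poll();
      order.add(id);
      for (String next : dependents.getOrDefault(id, List.of())) {
        if (pending.merge(next, -1, Integer::sum) == 0) {
          ready.add(next);
        }
      }
    }
    if (order.size() != transforms.size()) {
      throw new IllegalArgumentException("Transforms contain a cycle");
    }
    return order;
  }

  public static void main(String[] args) {
    Map<String, Node> transforms = new LinkedHashMap<>();
    transforms.put("write", new Node(List.of("pc2"), List.of()));
    transforms.put("stage0", new Node(List.of("pc1"), List.of("pc2")));
    transforms.put("impulse", new Node(List.of(), List.of("pc1")));
    System.out.println(topologicalOrder(transforms)); // [impulse, stage0, write]
  }
}
```

The ordering is "shallow" in that only the root transforms are ordered; anything nested inside a stage is left to the stage itself.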
e74aa23 to d254651
Done to all.
The FusedPipeline is the physical plan corresponding to an original, logical Pipeline. Converting it back into a proto allows that plan to be manipulated with existing libraries, and allows runners to interact with that plan for runner-specific implementation reasons.