-
Notifications
You must be signed in to change notification settings - Fork 4.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[BEAM-2333] Go to proto and back before running a pipeline in Java DirectRunner #3334
Conversation
eff523d
to
8dd7dfb
Compare
113973e
to
f4b11b3
Compare
Seems to pass most tests (like all of |
c3918c6
to
c80fb3f
Compare
The only remaining failure is DisplayData, which is simply a TODO, and then turning the hacks into better designs. |
94724e4
to
6cae654
Compare
6cae654
to
22e2a02
Compare
R: @tgroh There are a couple of hacks left in here:
But it still seems like a decent milestone to perhaps checkpoint where things are with the DirectRunner, depending on how egregious you find said hacks. Suggestions welcome. |
bbeba80
to
735673b
Compare
735673b
to
e99b8f8
Compare
Yup, hacking uniqueness works. Probably we should fix it. Filed https://issues.apache.org/jira/browse/BEAM-2632 as a starter ticket. |
e99b8f8
to
182fdd7
Compare
@@ -274,7 +317,8 @@ public Node getCurrent() { | |||
|
|||
private final String fullName; | |||
|
|||
// Nodes for sub-transforms of a composite transform. | |||
// Nodes for sub-transforms of a composite transform, in shallow topological | |||
// order so visiting in this order yields a recursively topological traversal. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is not strictly required; visit
will ensure that the topological order is respected.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will it? That's new. Great! Reverted.
@@ -296,15 +297,18 @@ private static void writeToStreamAndClose(List<String> lines, OutputStream outpu | |||
private void assertReadingCompressedFileMatchesExpected( | |||
File file, CompressionType compressionType, List<String> expected) { | |||
|
|||
int thisUniquifier = ++uniquifier; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can this be pulled out as an independent change?
RehydratedPTransform.of( | ||
transformSpec.getUrn(), transformSpec.getParameter(), additionalInputs); | ||
|
||
// HACK: A primitive is something with outputs that are not in its input |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't know if this is as hacky as this comment claims, as we discussed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Worth factoring into an isPrimitive
method, though
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done. And reversed polarity of if
accordingly.
|
||
@Override | ||
public void visitPrimitiveTransform(Node node) { | ||
// TODO: Include DisplayData in the proto |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you include a stub implementation that puts in an empty DisplayData
(including a DisplayDataTranslation
class, so once it's implemented it's already wired in?
An explicit JIRA for "Implement Runner API Display Data" is probably worthwhile as well.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done. Moved that to PTransformTranslation
. Filed BEAM-2645.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Consider moving the TODO to that location as well.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
.withAllowedLateness(Duration.standardMinutes(3L))); | ||
final WindowingStrategy<?, ?> windowedStrategy = windowed.getWindowingStrategy(); | ||
PCollection<KV<String, Long>> keyed = windowed.apply(WithKeys.<String, Long>of("foo")); | ||
PCollection<KV<String, Iterable<Long>>> grouped = |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Worth including a combine and something with side inputs?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this done?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done now. Minor change to the verification. We discussed that the verification could be made both more thorough and less brittle by doing something other than counting, but I'd love to do that in a follow-up...
} catch (IOException exc) { | ||
throw new IllegalArgumentException( | ||
String.format( | ||
"%s received transform that did not contain a Java-serialized %s: %s", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That specific reason seems like an exception that should be thrown out of the getWindowFn
call, as it's returning an object of type WindowFn
; This should just report that it couldn't get the WindowFn out of the transform and attach the cause.
* @param inputs the expanded inputs to the transform | ||
* @param outputs the expanded outputs of the transform | ||
*/ | ||
private Node( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should finalized
be set to true here? all places we use it we then immediately set the bit to true.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No strong opinion. I think it is a bit uniform and readable as-is.
9db4a17
to
834339c
Compare
834339c
to
48905bd
Compare
48905bd
to
a2b9260
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm also going to break out some of these sub-pieces for quick-and-easy reviews.
|
||
@Override | ||
public void visitPrimitiveTransform(Node node) { | ||
// TODO: Include DisplayData in the proto |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done. Moved that to PTransformTranslation
. Filed BEAM-2645.
RehydratedPTransform.of( | ||
transformSpec.getUrn(), transformSpec.getParameter(), additionalInputs); | ||
|
||
// HACK: A primitive is something with outputs that are not in its input |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done. And reversed polarity of if
accordingly.
ec93c33
to
8cca62b
Compare
Seem this PR lost a race with some changes to |
12a285f
to
5c293f8
Compare
Turns out they were just broken tests actually passing |
// Only ParDo really separates main from side inputs | ||
Map<TupleTag<?>, PValue> additionalInputs = Collections.emptyMap(); | ||
|
||
// TODO: move this ownership into the ParDoTranslator |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This bothers me a bit. I think we've discussed and the obvious plan is to also have ParDoTranslator
registered by URN to do the rehydration, rather than one uniform class for RehydratedPTransform
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think that's sensible. Doesn't need to happen here, but it's likely worth filing a JIRA to follow-up.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
additionalInputs = PCollectionViews.toAdditionalInputs(views); | ||
} | ||
|
||
// TODO: move this ownership into the CombineTranslator |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
See above comment for PAR_DO
. This is another case where we want a custom rehydration to produce a RawPTransform
that knows what components to register. In this case, the issue is not correct execution, but correct re-dehydration. A bit of a corner case, but should be designed right, and helpful for testing.
Changes Unknown when pulling 5c293f8 on kennknowles:Pipeline-rehydrate into ** on apache:master**. |
Changes Unknown when pulling 5c293f8 on kennknowles:Pipeline-rehydrate into ** on apache:master**. |
|
||
@Override | ||
public void visitPrimitiveTransform(Node node) { | ||
// TODO: Include DisplayData in the proto |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Consider moving the TODO to that location as well.
/** Utilities for going to/from DisplayData protos. */ | ||
public class DisplayDataTranslation { | ||
public static RunnerApi.DisplayData toProto(DisplayData displayData) { | ||
// TODO https://issues.apache.org/jira/browse/BEAM-2645 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We could consider putting in an "isStubImplementation=true
" to signal to anyone looking at the result that it's not empty because it was actually empty.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
// Only ParDo really separates main from side inputs | ||
Map<TupleTag<?>, PValue> additionalInputs = Collections.emptyMap(); | ||
|
||
// TODO: move this ownership into the ParDoTranslator |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think that's sensible. Doesn't need to happen here, but it's likely worth filing a JIRA to follow-up.
.withAllowedLateness(Duration.standardMinutes(3L))); | ||
final WindowingStrategy<?, ?> windowedStrategy = windowed.getWindowingStrategy(); | ||
PCollection<KV<String, Long>> keyed = windowed.apply(WithKeys.<String, Long>of("foo")); | ||
PCollection<KV<String, Iterable<Long>>> grouped = |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this done?
fb5097f
to
8ca4591
Compare
Be sure to do all of the following to help us incorporate your contribution
quickly and easily:
[BEAM-<Jira issue #>] Description of pull request
mvn clean verify
.<Jira issue #>
in the title with the actual Jira issuenumber, if there is one.
Individual Contributor License Agreement.
This turn a Pipeline into a protobuf message and then decodes it prior to running it with the Java DirectRunner. Everything is still hardcoded to Java, but this proves that there is adequate information in the protobuf to correctly run a pipeline.