[BEAM-3202] Ensure that PipelineOptions.getOptionsId is always populated. #4140
R: @staslev
@@ -123,6 +125,19 @@ public void testAppNameIsSetWhenUsingAs() {
  }

  @Test
  public void testOptionsIdIsSet() throws Exception {
I'm wondering if this test should be placed in the SerializablePipelineOptionsTest suite, since it's highly related to the functionality of serializing/deserializing PipelineOptions.
Placing it there can provide test coverage for the fundamentals of SerializablePipelineOptions, like so:
@Test
public void testPipelineOptionsIdIsConsistent() {
final PipelineOptions options = PipelineOptionsFactory.create();
final SerializablePipelineOptions serializablePipelineOptions =
new SerializablePipelineOptions(options);
final PipelineOptions clone =
SerializableUtils.clone(serializablePipelineOptions).get();
final PipelineOptions anotherClone =
SerializableUtils.clone(serializablePipelineOptions).get();
assertThat("getOptionsId() was not consistent across original and deserialized instances",
options.getOptionsId(),
is(clone.getOptionsId()));
assertThat("getOptionsId() was not consistent across two deserialized instances",
clone.getOptionsId(),
is(anotherClone.getOptionsId()));
}
It's really a property of PipelineOptions that the options id is always populated, and that encoding/decoding via Jackson always leaves the options id populated.
SerializablePipelineOptions is a wrapper which makes it easier for runners to encode/decode PipelineOptions relying on Java's serialization mechanism.
I agree that it is indeed a property of PipelineOptions. However, SerializablePipelineOptions uses the very same Jackson under the hood to support Java serialization via SerializablePipelineOptions#readObject().
So as far as runner authors are concerned, SerializablePipelineOptions is the API for serializing/deserializing PipelineOptions, and it would be nice if we had test coverage for it.
I guess that in a way the lack of such coverage is what made me stumble across this in the first place, since I had numerous deserializations of SerializablePipelineOptions instances taking place.
(I guess what I'm trying to say is that at the moment SerializablePipelineOptions is more than just a wrapper, and is in fact an API which I believe would benefit from having the test I mentioned.)
SerializablePipelineOptions is a wrapper, and moving the test would only enforce that the wrapper works, not that the underlying implementation is doing the right thing. SerializablePipelineOptions could always enforce that getOptionsId had been called before deserialization, yet another runner may just use Jackson directly (e.g. Dataflow).
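To make the concern in this thread concrete, here is a minimal, self-contained sketch of how a lazily defaulted id can diverge across deserialized copies, and how populating it up front avoids that. It is a toy model only: ToyOptions, createLazy, createEager and roundTrip are hypothetical names invented for this illustration, it uses plain Java serialization rather than Jackson, and it is not how Beam implements PipelineOptions.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.HashMap;
import java.util.concurrent.atomic.AtomicLong;

public class OptionsIdSketch {

  /** Hypothetical stand-in for PipelineOptions; not Beam code. */
  public static final class ToyOptions implements Serializable {
    private static final long serialVersionUID = 1L;
    // Process-wide counter used to default the id; static, so it is not serialized.
    private static final AtomicLong NEXT_ID = new AtomicLong();
    // Only explicitly populated values end up in the serialized form.
    private final HashMap<String, Long> values = new HashMap<>();

    /** Leaves the id unset until the first getOptionsId() call. */
    public static ToyOptions createLazy() {
      return new ToyOptions();
    }

    /** Populates the id at creation time so it is part of the serialized state. */
    public static ToyOptions createEager() {
      ToyOptions options = new ToyOptions();
      options.getOptionsId();
      return options;
    }

    /** Returns the stored id, generating and storing a fresh one if it is absent. */
    public long getOptionsId() {
      return values.computeIfAbsent("optionsId", key -> NEXT_ID.incrementAndGet());
    }
  }

  /** Serializes and deserializes the options, returning an independent copy. */
  public static ToyOptions roundTrip(ToyOptions options) {
    try {
      ByteArrayOutputStream bytes = new ByteArrayOutputStream();
      try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
        out.writeObject(options);
      }
      try (ObjectInputStream in =
          new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
        return (ToyOptions) in.readObject();
      }
    } catch (IOException | ClassNotFoundException e) {
      throw new IllegalStateException(e);
    }
  }

  public static void main(String[] args) {
    ToyOptions lazy = ToyOptions.createLazy();
    // Each copy was serialized without an id, so each one defaults its own.
    System.out.println("lazy copies agree: "
        + (roundTrip(lazy).getOptionsId() == roundTrip(lazy).getOptionsId()));

    ToyOptions eager = ToyOptions.createEager();
    // The id was populated before serialization, so every copy carries it.
    System.out.println("eager copies agree: "
        + (roundTrip(eager).getOptionsId() == roundTrip(eager).getOptionsId()));
  }
}
```

In this model the guarantee lives in the options object itself rather than in any wrapper, so it holds regardless of which serialization path a caller picks. That is the argument made above for keeping the test next to PipelineOptions.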
I can see your point now.
It would be nice if we could eliminate the duplicate Jackson code (i.e., the mapper construction and usage) by having (all?) runners use the same method for serializing and deserializing PipelineOptions. Out of curiosity, why doesn't the Dataflow runner use SerializablePipelineOptions like the other runners?
Because the Dataflow service isn't written in Java and it would be hard to decode Java serialized objects. Part of the RPC to create the Dataflow job takes the JSON format which is much easier to decode.
Thanks for the info.
This seems to resolve the issue so it LGTM.
@lukecwik Do you think it would be possible to have this cherry-picked for the upcoming |
If the current release candidate doesn't make it through validation, I'll try to get it cherry-picked. Otherwise, I would rather get this release out as is because of how long it has been taking.