Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[BEAM-3202] Ensure that PipelineOptions.getOptionsId is always populated. #4140

Closed
wants to merge 1 commit into from

Conversation

lukecwik
Copy link
Member

@lukecwik lukecwik commented Nov 16, 2017

Follow this checklist to help us incorporate your contribution quickly and easily:

  • Make sure there is a JIRA issue filed for the change (usually before you start working on it). Trivial changes like typos do not require a JIRA issue. Your pull request should address just this issue, without pulling in other changes.
  • Each commit in the pull request should have a meaningful subject line and body.
  • Format the pull request title like [BEAM-XXX] Fixes bug in ApproximateQuantiles, where you replace BEAM-XXX with the appropriate JIRA issue.
  • Write a pull request description that is detailed enough to understand what the pull request does, how, and why.
  • Run mvn clean verify to make sure basic checks pass. A more thorough check will be performed on your pull request automatically.
  • If this contribution is large, please file an Apache Individual Contributor License Agreement.

@lukecwik
Copy link
Member Author

R: @staslev

@asfgit
Copy link

asfgit commented Nov 16, 2017

SUCCESS

--none--

@@ -123,6 +125,19 @@ public void testAppNameIsSetWhenUsingAs() {
}

@Test
public void testOptionsIdIsSet() throws Exception {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm wondering if this test should be placed in the SerializablePipelineOptionsTest suite since it's highly related to the functionality of serializing/deserializing PipelineOptions.
Placing it there can provide test coverage for the fundamentals of SerializablePipelineOptions like so:

  @Test
  public void testPipelineOptionsIdIsConsistent() {
    final PipelineOptions options = PipelineOptionsFactory.create();
    final SerializablePipelineOptions serializablePipelineOptions =
        new SerializablePipelineOptions(options);

    final PipelineOptions clone =
        SerializableUtils.clone(serializablePipelineOptions).get();

    final PipelineOptions anotherClone =
        SerializableUtils.clone(serializablePipelineOptions).get();

    assertThat("getOptionsId() was not consistent across original and deserialized instances",
               options.getOptionsId(),
               is(clone.getOptionsId()));

    assertThat("getOptionsId() was not consistent across two deserialized instances",
               clone.getOptionsId(),
               is(anotherClone.getOptionsId()));
  }

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Its really a property of PipelineOptions that options id is always populated and that encoding/decoding via Jackson always has options id populated.

SerializablePipelineOptions is a wrapper which makes it easier for runners to encode/decode PipelineOptions relying on Java's serialization mechanism.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree that it is indeed a property of PipelineOptions, however, SerializablePipelineOptions uses the very same Jackson under the hood to support Java serialization via SerializablePipelineOptions#readObject().
So as far as runner authors are concerned SerializablePipelineOptions is the API for serializing/deserializing PipelineOptions, and it would be nice if we had test coverage for it.

I guess that in a way the lack of such coverage is what made me stumble across this in the first place, since I had numerous deserializations of a SerializablePipelineOptions instances taking place.

(I guess what I'm trying to say is that at the moment SerializablePipelineOptions is more than just a wrapper, and is in fact an API which I believe would benefit from having the test I mentioned)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

SerializablePipelineOptions is a wrapper and moving the test would only enforce that the wrapper works and not that the underlying implementation is doing the right thing since SerializablePipelineOptions could always enforce that getOptionsId had been called before deserialization yet another runner may just use Jackson directly (i.e. Dataflow).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can see your point now.

It would be nice if we could eliminate the duplicate Jackson code (i.e., the mapper construction and usage) by having (all?) runners use the same method for serializing and deserializing PipelineOptions. Out of curiosity, why doesn't the Dataflow runner use SerializablePipelineOptions like the other runners?

Copy link
Member Author

@lukecwik lukecwik Nov 16, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because the Dataflow service isn't written in Java and it would be hard to decode Java serialized objects. Part of the RPC to create the Dataflow job takes the JSON format which is much easier to decode.

Copy link
Contributor

@staslev staslev Nov 16, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the info.

This seems to resolve the issue so it LGTM.

@asfgit
Copy link

asfgit commented Nov 16, 2017

SUCCESS

--none--

1 similar comment
@asfgit
Copy link

asfgit commented Nov 16, 2017

SUCCESS

--none--

@asfgit
Copy link

asfgit commented Nov 16, 2017

FAILURE

--none--

1 similar comment
@asfgit
Copy link

asfgit commented Nov 16, 2017

FAILURE

--none--

@asfgit
Copy link

asfgit commented Nov 16, 2017

SUCCESS

--none--

@asfgit asfgit closed this in 40100db Nov 16, 2017
@staslev
Copy link
Contributor

staslev commented Nov 19, 2017

@lukecwik Do you think it would be possible to have this cherry-picked for the upcoming 2.2.0 release?

@lukecwik
Copy link
Member Author

If the current release candidate doesn't make it through validation, I'll try to get it cherry picked otherwise I would rather get this release out as is because of how long it has been taking.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants