Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[BEAM-12767] Improve PipelineOption parsing UX #15338

Merged
merged 1 commit into from
Aug 31, 2021

Conversation

steveniemitz
Copy link
Contributor

This adds two improvements to the PipelineOptionFactory command line parser:

  • Failure to parse complex types due to missing quotes is retried with the input automatically quoted
  • @JsonDeserialize is now supported on getters, the specified deserializer in the annotation will be used to parse the input string.

R: @lukecwik @kennknowles since you two are the last to make any changes here.


Thank you for your contribution! Follow this checklist to help us incorporate your contribution quickly and easily:

  • Choose reviewer(s) and mention them in a comment (R: @username).
  • Format the pull request title like [BEAM-XXX] Fixes bug in ApproximateQuantiles, where you replace BEAM-XXX with the appropriate JIRA issue, if applicable. This will automatically link the pull request to the issue.
  • Update CHANGES.md with noteworthy changes.
  • If this contribution is large, please file an Apache Individual Contributor License Agreement.

See the Contributor Guide for more tips on how to make review process smoother.

ValidatesRunner compliance status (on master branch)

Lang ULR Dataflow Flink Samza Spark Twister2
Go --- Build Status Build Status Build Status Build Status ---
Java Build Status Build Status
Build Status
Build Status
Build Status
Build Status
Build Status
Build Status
Build Status
Build Status
Build Status
Build Status
Build Status
Build Status
Build Status
Build Status
Python --- Build Status
Build Status
Build Status
Build Status
Build Status
Build Status Build Status ---
XLang Build Status Build Status Build Status Build Status Build Status ---

Examples testing status on various runners

Lang ULR Dataflow Flink Samza Spark Twister2
Go --- --- --- --- --- --- ---
Java --- Build Status
Build Status
Build Status
--- --- --- --- ---
Python --- --- --- --- --- --- ---
XLang --- --- --- --- --- --- ---

Post-Commit SDK/Transform Integration Tests Status (on master branch)

Go Java Python
Build Status Build Status Build Status
Build Status
Build Status

Pre-Commit Tests Status (on master branch)

--- Java Python Go Website Whitespace Typescript
Non-portable Build Status
Build Status
Build Status
Build Status
Build Status
Build Status Build Status Build Status Build Status
Portable --- Build Status Build Status --- --- ---

See .test-infra/jenkins/README for trigger phrase, status and link of all Jenkins jobs.

GitHub Actions Tests Status (on master branch)

Build python source distribution and wheels
Python tests
Java tests

See CI.md for more information about GitHub Actions CI.

@codecov
Copy link

codecov bot commented Aug 17, 2021

Codecov Report

Merging #15338 (cbb363f) into master (12d624b) will increase coverage by 0.03%.
The diff coverage is n/a.

❗ Current head cbb363f differs from pull request most recent head 16224d3. Consider uploading reports for the commit 16224d3 to get more accurate results
Impacted file tree graph

@@            Coverage Diff             @@
##           master   #15338      +/-   ##
==========================================
+ Coverage   83.79%   83.83%   +0.03%     
==========================================
  Files         440      440              
  Lines       59804    59886      +82     
==========================================
+ Hits        50115    50207      +92     
+ Misses       9689     9679      -10     
Impacted Files Coverage Δ
...n/apache_beam/ml/gcp/recommendations_ai_test_it.py 56.81% <0.00%> (-12.95%) ⬇️
.../apache_beam/io/gcp/datastore/v1new/datastoreio.py 86.45% <0.00%> (-0.99%) ⬇️
sdks/python/apache_beam/io/gcp/bigquery.py 75.53% <0.00%> (-0.85%) ⬇️
sdks/python/apache_beam/io/gcp/bigquery_tools.py 86.67% <0.00%> (-0.74%) ⬇️
sdks/python/apache_beam/dataframe/frames.py 94.87% <0.00%> (-0.07%) ⬇️
...ks/python/apache_beam/runners/worker/sdk_worker.py 88.95% <0.00%> (-0.06%) ⬇️
setup.py 0.00% <0.00%> (ø)
...dks/python/apache_beam/options/pipeline_options.py 95.16% <0.00%> (ø)
.../python/apache_beam/io/gcp/resource_identifiers.py 100.00% <0.00%> (ø)
.../python/apache_beam/portability/api/metrics_pb2.py 100.00% <0.00%> (ø)
... and 18 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 80da0ec...16224d3. Read the comment docs.

@steveniemitz steveniemitz force-pushed the better-option-parsing branch 2 times, most recently from 183b6d8 to 1c80016 Compare August 18, 2021 16:23
@steveniemitz
Copy link
Contributor Author

hm, this says there's changes requested still but I can't actually find them in the review... @lukecwik I think I addressed all your comments.

@steveniemitz
Copy link
Contributor Author

Run Java PreCommit

@lukecwik
Copy link
Member

hm, this says there's changes requested still but I can't actually find them in the review... @lukecwik I think I addressed all your comments.

Thats just what the UI will say forever until another review happens by the person who requested changes.

Copy link
Member

@lukecwik lukecwik left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this ready to review yet since I don't see any logic around @JsonSerialize/@JsonIgnore as we discussed. Everything else looks pretty good.

*
* <p>If the getter method is annotated with {@link
* com.fasterxml.jackson.databind.annotation.JsonDeserialize} the specified deserializer will be
* used, otherwise the default ObjectMapper deserialization strategy.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
* used, otherwise the default ObjectMapper deserialization strategy.
* used, otherwise the default ObjectMapper deserialization strategy is used.

PipelineOptionsFactory.fromArgs("--complexType=test").as(OptionsWithJsonDeserialize1.class);

expectedException.expect(IllegalArgumentException.class);
expectedException.expectMessage(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ExpectedException is deprecated, please use assertThrows(..., lambda that throws exception);

@@ -1572,6 +1601,80 @@ public int compare(Method o1, Method o2) {
return builder.build();
}

private static JsonDeserializer<Object> deserializerForMethod(Method method, JavaType type)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: method comment

lukecwik
lukecwik previously approved these changes Aug 18, 2021
@lukecwik lukecwik self-requested a review August 18, 2021 20:45
@lukecwik lukecwik dismissed their stale review August 18, 2021 20:45

Didn't mean to LGTM

Copy link
Member

@lukecwik lukecwik left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See my earlier review #15338 (review)

@steveniemitz
Copy link
Contributor Author

steveniemitz commented Aug 18, 2021

I don't see any logic around @JsonSerialize/@JsonIgnore as we discussed.

I'm not sure what I'd be testing/adding logic for here? The places that serialize PipelineOptions I've seen pass the typed PipelineOptions object to an ObjectMapper directly, is there some other path you were thinking about?

never mind I see what you mean now, I didn't realize the PipelineOptions had a custom serializer/deserializer as well. Will update that.

@steveniemitz steveniemitz force-pushed the better-option-parsing branch 2 times, most recently from 13807ab to c45ce17 Compare August 19, 2021 17:45
@lukecwik
Copy link
Member

I don't see any logic around @JsonSerialize/@JsonIgnore as we discussed.

I'm not sure what I'd be testing/adding logic for here? The places that serialize PipelineOptions I've seen pass the typed PipelineOptions object to an ObjectMapper directly, is there some other path you were thinking about?

never mind I see what you mean now, I didn't realize the PipelineOptions had a custom serializer/deserializer as well. Will update that.

Great, thanks.

@steveniemitz steveniemitz force-pushed the better-option-parsing branch 2 times, most recently from 7afc8c6 to 9e0c248 Compare August 19, 2021 19:21
@steveniemitz
Copy link
Contributor Author

alright, all hooked up and I added some more tests to ProxyInvocationHandlerTest as well to test the round-trip.

@steveniemitz
Copy link
Contributor Author

Run Java PreCommit

@steveniemitz
Copy link
Contributor Author

Run Java_Examples_Dataflow PreCommit

@steveniemitz
Copy link
Contributor Author

steveniemitz commented Aug 19, 2021

I'm still tracking down one more issue w/ serialization now when serializing custom ValueProviders

edit: nm there wasn't an issue, I was just using an out-of-date build on my tests =/

@steveniemitz
Copy link
Contributor Author

Run Java_Examples_Dataflow PreCommit

@steveniemitz
Copy link
Contributor Author

Run Java PreCommit

@steveniemitz
Copy link
Contributor Author

@lukecwik should be good to review now.

@lukecwik
Copy link
Member

Run Java PreCommit

1 similar comment
@steveniemitz
Copy link
Contributor Author

Run Java PreCommit

@steveniemitz
Copy link
Contributor Author

Run Java_Examples_Dataflow PreCommit

@steveniemitz
Copy link
Contributor Author

Run Java_Examples_Dataflow PreCommit

@steveniemitz
Copy link
Contributor Author

Run Java PreCommit

@steveniemitz
Copy link
Contributor Author

Run Java_Examples_Dataflow_Java11 PreCommit

@steveniemitz
Copy link
Contributor Author

lol one day these tests will succeed...

@steveniemitz
Copy link
Contributor Author

Run Java PreCommit

@steveniemitz
Copy link
Contributor Author

Run Java_Examples_Dataflow PreCommit

@steveniemitz
Copy link
Contributor Author

Run Java PreCommit

@steveniemitz
Copy link
Contributor Author

@lukecwik finally jammed all the tests through!

@lukecwik lukecwik merged commit 0592083 into apache:master Aug 31, 2021
@lukecwik
Copy link
Member

Thanks, if you have some time and can look at why the test is flaking it would be greatly appreciated.

@kw2542
Copy link
Contributor

kw2542 commented Sep 1, 2021

@steveniemitz
Copy link
Contributor Author

hm, weird. Yeah feel free to revert, I can look into the failures

@kw2542
Copy link
Contributor

kw2542 commented Sep 1, 2021

I am not 100% sure if this is the RC but looks like it.

java.lang.RuntimeException: Failed to invoke samza job
	at org.apache.beam.runners.samza.SamzaPipelineRunner.run(SamzaPipelineRunner.java:83)
	at org.apache.beam.runners.jobsubmission.JobInvocation.runPipeline(JobInvocation.java:86)
	at org.apache.beam.vendor.guava.v26_0_jre.com.google.common.util.concurrent.TrustedListenableFutureTask$TrustedFutureInterruptibleTask.runInterruptibly(TrustedListenableFutureTask.java:125)
	at org.apache.beam.vendor.guava.v26_0_jre.com.google.common.util.concurrent.InterruptibleTask.run(InterruptibleTask.java:57)
	at org.apache.beam.vendor.guava.v26_0_jre.com.google.common.util.concurrent.TrustedListenableFutureTask.run(TrustedListenableFutureTask.java:78)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at java.lang.Thread.run(Thread.java:748)
Caused by: java.lang.RuntimeException: java.lang.IllegalArgumentException: Failed to serialize PipelineOptions
	at org.apache.beam.runners.samza.translation.ConfigBuilder.build(ConfigBuilder.java:110)
	at org.apache.beam.runners.samza.SamzaRunner.runPortablePipeline(SamzaRunner.java:93)
	at org.apache.beam.runners.samza.SamzaPipelineRunner.run(SamzaPipelineRunner.java:73)
	... 7 more
Caused by: java.lang.IllegalArgumentException: Failed to serialize PipelineOptions
	at org.apache.beam.runners.core.construction.SerializablePipelineOptions.serializeToJson(SerializablePipelineOptions.java:70)
	at org.apache.beam.runners.core.construction.SerializablePipelineOptions.<init>(SerializablePipelineOptions.java:44)
	at org.apache.beam.runners.samza.translation.ConfigBuilder.build(ConfigBuilder.java:102)
	... 9 more
Caused by: com.fasterxml.jackson.databind.JsonMappingException: Error while populating display data for component 'org.apache.beam.sdk.options.ProxyInvocationHandler$PipelineOptionsDisplayData': null
	at com.fasterxml.jackson.databind.ser.DefaultSerializerProvider._wrapAsIOE(DefaultSerializerProvider.java:509)
	at com.fasterxml.jackson.databind.ser.DefaultSerializerProvider._serialize(DefaultSerializerProvider.java:482)
	at com.fasterxml.jackson.databind.ser.DefaultSerializerProvider.serializeValue(DefaultSerializerProvider.java:319)
	at com.fasterxml.jackson.databind.ObjectMapper._writeValueAndClose(ObjectMapper.java:4487)
	at com.fasterxml.jackson.databind.ObjectMapper.writeValueAsString(ObjectMapper.java:3742)
	at org.apache.beam.runners.core.construction.SerializablePipelineOptions.serializeToJson(SerializablePipelineOptions.java:68)
	... 11 more
Caused by: org.apache.beam.sdk.transforms.display.DisplayData$InternalBuilder$PopulateDisplayDataException: Error while populating display data for component 'org.apache.beam.sdk.options.ProxyInvocationHandler$PipelineOptionsDisplayData': null
	at org.apache.beam.sdk.transforms.display.DisplayData$InternalBuilder.include(DisplayData.java:784)
	at org.apache.beam.sdk.transforms.display.DisplayData$InternalBuilder.delegate(DisplayData.java:740)
	at org.apache.beam.sdk.options.ProxyInvocationHandler.invoke(ProxyInvocationHandler.java:157)
	at com.sun.proxy.$Proxy0.populateDisplayData(Unknown Source)
	at org.apache.beam.sdk.transforms.display.DisplayData$InternalBuilder.include(DisplayData.java:775)
	at org.apache.beam.sdk.transforms.display.DisplayData$InternalBuilder.access$100(DisplayData.java:703)
	at org.apache.beam.sdk.transforms.display.DisplayData.from(DisplayData.java:81)
	at org.apache.beam.sdk.options.ProxyInvocationHandler$Serializer.serialize(ProxyInvocationHandler.java:706)
	at org.apache.beam.sdk.options.ProxyInvocationHandler$Serializer.serialize(ProxyInvocationHandler.java:654)
	at com.fasterxml.jackson.databind.ser.DefaultSerializerProvider._serialize(DefaultSerializerProvider.java:480)
	... 15 more
Caused by: java.lang.NullPointerException
	at com.fasterxml.jackson.databind.DeserializationContext.isEnabled(DeserializationContext.java:395)
	at com.fasterxml.jackson.databind.deser.std.StdDeserializer._checkFromStringCoercion(StdDeserializer.java:1364)
	at com.fasterxml.jackson.databind.deser.std.StdDeserializer._parseIntPrimitive(StdDeserializer.java:728)
	at com.fasterxml.jackson.databind.deser.std.NumberDeserializers$IntegerDeserializer.deserialize(NumberDeserializers.java:529)
	at com.fasterxml.jackson.databind.deser.std.NumberDeserializers$IntegerDeserializer.deserialize(NumberDeserializers.java:506)
	at org.apache.beam.sdk.options.PipelineOptionsFactory.deserializeNode(PipelineOptionsFactory.java:1791)
	at org.apache.beam.sdk.options.ProxyInvocationHandler.getValueFromJson(ProxyInvocationHandler.java:517)
	at org.apache.beam.sdk.options.ProxyInvocationHandler.getValueFromJson(ProxyInvocationHandler.java:512)
	at org.apache.beam.sdk.options.ProxyInvocationHandler.access$400(ProxyInvocationHandler.java:93)
	at org.apache.beam.sdk.options.ProxyInvocationHandler$PipelineOptionsDisplayData.populateDisplayData(ProxyInvocationHandler.java:359)
	at org.apache.beam.sdk.transforms.display.DisplayData$InternalBuilder.include(DisplayData.java:775)
	... 24 more
ERROR:root:java.lang.NullPointerException

This is the stacktrace in Samza runner that failed these tests, where a similar stacktrace in Flink and Sparker Runner, what do you think?

@steveniemitz
Copy link
Contributor Author

I think the patch is actually really simple, I can get another PR going right now, or we can revert and re-apply with it fixed. I'm fine with either.

diff --git a/sdks/java/core/src/main/java/org/apache/beam/sdk/options/PipelineOptionsFactory.java b/sdks/java/core/src/main/java/org/apache/beam/sdk/options/PipelineOptionsFactory.java
index 030c56ba59..e20a8641b6 100644
--- a/sdks/java/core/src/main/java/org/apache/beam/sdk/options/PipelineOptionsFactory.java
+++ b/sdks/java/core/src/main/java/org/apache/beam/sdk/options/PipelineOptionsFactory.java
@@ -44,6 +44,7 @@ import com.fasterxml.jackson.databind.node.TreeTraversingParser;
 import com.fasterxml.jackson.databind.ser.DefaultSerializerProvider;
 import com.fasterxml.jackson.databind.type.TypeBindings;
 import com.fasterxml.jackson.databind.util.SimpleBeanPropertyDefinition;
+import com.fasterxml.jackson.databind.util.TokenBuffer;
 import edu.umd.cs.findbugs.annotations.SuppressFBWarnings;
 import java.beans.BeanInfo;
 import java.beans.IntrospectionException;
@@ -499,7 +500,8 @@ public class PipelineOptionsFactory {
 
   private static final DefaultDeserializationContext DESERIALIZATION_CONTEXT =
       new DefaultDeserializationContext.Impl(MAPPER.getDeserializationContext().getFactory())
-          .createInstance(MAPPER.getDeserializationConfig(), null, null);
+          .createInstance(
+              MAPPER.getDeserializationConfig(), new TokenBuffer(MAPPER, false).asParser(), null);

@lukecwik
Copy link
Member

lukecwik commented Sep 1, 2021

I think the patch is actually really simple, I can get another PR going right now, or we can revert and re-apply with it fixed. I'm fine with either.

diff --git a/sdks/java/core/src/main/java/org/apache/beam/sdk/options/PipelineOptionsFactory.java b/sdks/java/core/src/main/java/org/apache/beam/sdk/options/PipelineOptionsFactory.java
index 030c56ba59..e20a8641b6 100644
--- a/sdks/java/core/src/main/java/org/apache/beam/sdk/options/PipelineOptionsFactory.java
+++ b/sdks/java/core/src/main/java/org/apache/beam/sdk/options/PipelineOptionsFactory.java
@@ -44,6 +44,7 @@ import com.fasterxml.jackson.databind.node.TreeTraversingParser;
 import com.fasterxml.jackson.databind.ser.DefaultSerializerProvider;
 import com.fasterxml.jackson.databind.type.TypeBindings;
 import com.fasterxml.jackson.databind.util.SimpleBeanPropertyDefinition;
+import com.fasterxml.jackson.databind.util.TokenBuffer;
 import edu.umd.cs.findbugs.annotations.SuppressFBWarnings;
 import java.beans.BeanInfo;
 import java.beans.IntrospectionException;
@@ -499,7 +500,8 @@ public class PipelineOptionsFactory {
 
   private static final DefaultDeserializationContext DESERIALIZATION_CONTEXT =
       new DefaultDeserializationContext.Impl(MAPPER.getDeserializationContext().getFactory())
-          .createInstance(MAPPER.getDeserializationConfig(), null, null);
+          .createInstance(
+              MAPPER.getDeserializationConfig(), new TokenBuffer(MAPPER, false).asParser(), null);

You can get both PRs out (forward fix and rollback) and run the additional failing tests in the forward fix.

@steveniemitz
Copy link
Contributor Author

and run the additional failing tests in the forward fix.

Forgive the noob question, is there a way to run these post-commit tests on a PR?

@kw2542
Copy link
Contributor

kw2542 commented Sep 1, 2021

and run the additional failing tests in the forward fix.

Forgive the noob question, is there a way to run these post-commit tests on a PR?

I am testing out the rollback at #15438 where you can see how these tests are being triggered.

@steveniemitz
Copy link
Contributor Author

Thanks!

@steveniemitz
Copy link
Contributor Author

steveniemitz commented Sep 1, 2021

roll-forward PR up as well: #15439 , I'm not sure if the trigger phrase worked for that one though, maybe I don't have permissions?

I'd love to figure out how to reproduce this in a unit test as well, it seems like we're missing coverage over something pretty fundamental.

@steveniemitz
Copy link
Contributor Author

looks like the failing tests are fixed in #15439 now, so we can either roll forward with that or roll back w/ the revert. I'd prefer to roll forward but I'll leave that up to you :)

@kw2542
Copy link
Contributor

kw2542 commented Sep 1, 2021

looks like the failing tests are fixed in #15439 now, so we can either roll forward with that or roll back w/ the revert. I'd prefer to roll forward but I'll leave that up to you :)

+1, let's fix forward.

@tvalentyn
Copy link
Contributor

Forgive the noob question, is there a way to run these post-commit tests on a PR?

There is a link in PR template: See .test-infra/jenkins/README for trigger phrase, status and link of all Jenkins jobs.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants