Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[BEAM-8374] Enable returning missing PublishResult fields in SnsIO.Write #9758

Merged
merged 1 commit into from
Mar 17, 2020

Conversation

jfarr
Copy link
Contributor

@jfarr jfarr commented Oct 10, 2019

This PR adds new coders for returning the entire PublishResult object from SnsIO, or the entire PublishResult without HTTP headers. This is necessary in order to check the HTTP status for errors since the current implementation only encodes the messageId field. These new coders can be selected by calling the new withFullPublishResult() and withFullPublishResultWithoutHeaders() methods on SnsIO.Write. The withCoder() method also allows setting a custom coder. This PR also adds coders for common AWS SDK objects ResponseMetadata and SdkHttpMetadata.
R: @lukecwik


Thank you for your contribution! Follow this checklist to help us incorporate your contribution quickly and easily:

  • Choose reviewer(s) and mention them in a comment (R: @username).
  • Format the pull request title like [BEAM-XXX] Fixes bug in ApproximateQuantiles, where you replace BEAM-XXX with the appropriate JIRA issue, if applicable. This will automatically link the pull request to the issue.
  • If this contribution is large, please file an Apache Individual Contributor License Agreement.

Post-Commit Tests Status (on master branch)

Lang SDK Apex Dataflow Flink Gearpump Samza Spark
Go Build Status --- --- Build Status --- --- Build Status
Java Build Status Build Status Build Status Build Status
Build Status
Build Status
Build Status Build Status Build Status
Build Status
Python Build Status
Build Status
Build Status
Build Status
--- Build Status
Build Status
Build Status
Build Status
--- --- Build Status
XLang --- --- --- Build Status --- --- ---

Pre-Commit Tests Status (on master branch)

--- Java Python Go Website
Non-portable Build Status Build Status
Build Status
Build Status Build Status
Portable --- Build Status --- ---

See .test-infra/jenkins/README for trigger phrase, status and link of all Jenkins jobs.

@kennknowles
Copy link
Member

@jhalaria @iemejia

Copy link
Member

@lukecwik lukecwik left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note that this is a breaking change to any Dataflow streaming pipelines using the PublishResult.

@jfarr
Copy link
Contributor Author

jfarr commented Oct 12, 2019

Note that this is a breaking change to any Dataflow streaming pipelines using the PublishResult.

@lukecwik Thanks for the feedback. I did not realize that, I'm very new to Beam. What do you suggest would be the best way to deal with that?

Copy link
Member

@iemejia iemejia left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Last week we merged SnsIO based on Amazon SDK for Java v2 I was wondering if these fixes could apply there too. We may consider this optional for this PR but maybe worth for consistency to check if we can have a similar approach there too. CC @cmachgodaddy

@iemejia
Copy link
Member

iemejia commented Oct 14, 2019

One extra issue that I have not really thought about is the size difference that we will have by adding the full headers/sdk metadata to all answers, mmm... now I am hesitating about maybe having a V2 like approach for the cases where we care about the full metadata (akin to FileIO).
What is the use case that is pushing into this approach?

@jfarr
Copy link
Contributor Author

jfarr commented Oct 17, 2019

One extra issue that I have not really thought about is the size difference that we will have by adding the full headers/sdk metadata to all answers, mmm... now I am hesitating about maybe having a V2 like approach for the cases where we care about the full metadata (akin to FileIO).
What is the use case that is pushing into this approach?

@iemejia for our use case we only need the HTTP status. We have written an SNS sink that uses this to check for error responses. I just included HTTP headers for completeness but I can remove them if it's a concern. In that case I wonder if a separate coder for SdkHttpMetadata makes sense though.

@jfarr
Copy link
Contributor Author

jfarr commented Oct 17, 2019

Last week we merged SnsIO based on Amazon SDK for Java v2 I was wondering if these fixes could apply there too. We may consider this optional for this PR but maybe worth for consistency to check if we can have a similar approach there too. CC @cmachgodaddy

Sure, I work with Cam and we discussed it today. I can apply the changes to v2 in this PR, no problem.

@cmachgodaddy
Copy link
Contributor

cmachgodaddy commented Oct 18, 2019

I wonder if we should add unit test for this coder? I know we did not have it, b/c we had just one property to encode and decode. But, now we have quite a few.

@jfarr
Copy link
Contributor Author

jfarr commented Oct 23, 2019

@iemejia @lukecwik How does this look? If you're happy with it I'll port it over to amazon-web-services2.

Copy link
Member

@lukecwik lukecwik left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would suggest keeping the "existing" coder and allowing users to choose a coder that encodes all the additional metadata and http response code. This way users who need the additional details could use a builder style method within the IO class like "withMetadata()" or "withFullMessage" which ensures that the full message is encoded. The class comment could say by default we only provide fields X and Y but if you need everything then specify "with..."

By making this change you won't break existing users and it won't impact their performance and it will enable your usecase. Finally for the V2 version we may want to use the "full" coder by default and use the stripped down coder when told via a "withMessageIdOnly()" or something along those lines.

@jfarr
Copy link
Contributor Author

jfarr commented Nov 4, 2019

@lukecwik @iemejia I reverted the original coder and added a new "full" (all fields) and "minimal" (no HTTP headers) coder, and also factored out new coders for the common SDK objects. If this looks more like it then I'll flesh out the unit tests, please let me know.

@jfarr jfarr changed the title [BEAM-8374] Fixes bug in SnsIO PublishResultCoder [BEAM-8374] Enable returning missing PublishResult fields in SnsIO.Write Nov 4, 2019
@jfarr jfarr changed the title [BEAM-8374] Enable returning missing PublishResult fields in SnsIO.Write [WIP][BEAM-8374] Enable returning missing PublishResult fields in SnsIO.Write Nov 4, 2019
Copy link
Member

@lukecwik lukecwik left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The direction looks good, will let you follow up on the comments and fill out the tests.

@jfarr
Copy link
Contributor Author

jfarr commented Dec 2, 2019

Hi @lukecwik thank you for the feedback. I believe all of the requested changes have been addressed now.

@jfarr jfarr changed the title [WIP][BEAM-8374] Enable returning missing PublishResult fields in SnsIO.Write [BEAM-8374] Enable returning missing PublishResult fields in SnsIO.Write Dec 2, 2019
@stale
Copy link

stale bot commented Jan 31, 2020

This pull request has been marked as stale due to 60 days of inactivity. It will be closed in 1 week if no further activity occurs. If you think that’s incorrect or this pull request requires a review, please simply write any comment. If closed, you can revive the PR at any time and @mention a reviewer or discuss it on the dev@beam.apache.org list. Thank you for your contributions.

@stale stale bot added the stale label Jan 31, 2020
@jfarr
Copy link
Contributor Author

jfarr commented Feb 1, 2020

Hi @lukecwik @iemejia this PR was marked stale. I believe I have addressed all of your feedback. Is there anything else you would like me to change?

@stale stale bot removed the stale label Feb 1, 2020
@lukecwik
Copy link
Member

lukecwik commented Feb 4, 2020

Sorry for the delay, taking a look now.

Copy link
Member

@lukecwik lukecwik left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks great, mostly minor feedback.

@jfarr
Copy link
Contributor Author

jfarr commented Mar 8, 2020

Hi @lukecwik and @iemejia, I believe I have incorporated all of your review feedback. I also made a few changes to the interface that I hope will improve clarity:

  • Instead of the confusingly named fullSdkHttpMetadata() and minimalSdkHttpMetadata() methods, the AwsCoders class now has a method sdkHttpMetadata() which creates a coder that by default does not include the response headers, and if you want to include the headers there is also a method sdkHttpMetadata(boolean includeHeaders). I'm on the fence about getting rid of the boolean parameter and just including a method like sdkHttpMetadataIncludingHeaders() so let me know if you have a preference.

  • I changed the names of the SnsIO methods to withFullPublishResultNoHeaders and withFullPublishResultIncludingHeaders. The default is still to encode only the messageId for backward compatibility. I also added a method to specify your own coder for future extensibility.

I hope this PR is ready to merge now, please let me know if it needs any additional changes.

@lukecwik
Copy link
Member

Thanks for returning back to this. Taking a look now.

Copy link
Member

@lukecwik lukecwik left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added my suggestions to name changes inline as responses to your comments.

Copy link
Member

@lukecwik lukecwik left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Run Java PreCommit

@lukecwik
Copy link
Member

Run Java PreCommit

@jfarr
Copy link
Contributor Author

jfarr commented Mar 11, 2020

Thanks for the feedback @lukecwik. I've addressed your comments and also added one more unit test in SnsIOTest. Please let me know what you think now.

@lukecwik
Copy link
Member

retest this please

@jfarr
Copy link
Contributor Author

jfarr commented Mar 13, 2020

@lukecwik thanks for the suggestions!

@jfarr
Copy link
Contributor Author

jfarr commented Mar 13, 2020

One last question @lukecwik, would you prefer to apply these changes in amazon-web-services2 in this PR? Or should we take that to a separate PR? That might be a good opportunity to discuss making some improvements to how errors are handled in this IO. In my opinion it's somewhat suboptimal that IOExceptions and the like are handled in the IO but HTTP error responses have to be handled in the caller, for example.

Copy link
Contributor

@cmachgodaddy cmachgodaddy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the changes!

* @return the SdkHttpMetadata coder
*/
public static Coder<SdkHttpMetadata> sdkHttpMetadata() {
return new SdkHttpMetadataCoder(true);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just curious, why this coder is not following the pattern coder.of(xxx) ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@cmachgodaddy That's because this class is designed as a factory for coders, with the coder implementation being an internal detail, as @lukecwik suggested in an earlier comment thread.

Style note, you can create one class called AwsCoders which has three static methods minimalSdkHttpMetadata() and fullSdkHttpMetadata() and responseMetadata() that return the coders. This way you can make all the coder implementations private inner static classes and they all become implementation details without needing to expose them to users. Users can still get an instance if they need one but it gives us flexibility from a maintenance point of view how the coders are organized.

I could create a static factory method of() on SdkHttpMetadata but it wouldn't serve any purpose, it would only be called by sdkHttpMetadata() which is already a static factory method. SdkHttpMetadata is not meant to be instantiated directly by the caller.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

of() prevents moving the class while a top level static method allows for stuff to still be moved around.

}

@Override
public void verifyDeterministic() throws NonDeterministicException {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And one more curiosity, why we have to implement verifyDeterministic for this coder?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

By default coders are considered non-deterministic which prevent them from being used as the key coder for many things (GBK, map/multimap side input keys, map state keys, ...).

Implementing it allows us to be deterministic if the coders we use are deterministic. So if the internally used coders become deterministic then we get that for free.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

CustomCoder's implementation always throws NonDeterministicException so this seemed more correct.

@Override
public void verifyDeterministic() throws NonDeterministicException {
throw new NonDeterministicException(
this,
"CustomCoder implementations must override verifyDeterministic,"
+ " or they are presumed nondeterministic.");
}

@@ -32,6 +32,7 @@
@Override
public List<CoderProvider> getCoderProviders() {
return ImmutableList.of(
CoderProviders.forCoder(TypeDescriptor.of(PublishResult.class), PublishResultCoder.of()));
CoderProviders.forCoder(
TypeDescriptor.of(PublishResult.class), PublishResultCoders.defaultPublishResult()));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same here, any reasons this is not XXXcoder.of(...) pattern?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's sort of the same reason as above, PublishResultCoders is a factory for creating several different coder implementations so the factory methods have different names to indicate which one. I could rename this method to of() but since we are introducing new behavior I wanted to make it very clear that this is the default, or existing, behavior. Also I'm not sure if it makes sense to have a method called of() on the factory not on the coder itself.

@lukecwik
Copy link
Member

lukecwik commented Mar 13, 2020

One last question @lukecwik, would you prefer to apply these changes in amazon-web-services2 in this PR? Or should we take that to a separate PR? That might be a good opportunity to discuss making some improvements to how errors are handled in this IO. In my opinion it's somewhat suboptimal that IOExceptions and the like are handled in the IO but HTTP error responses have to be handled in the caller, for example.

I would prefer a separate PR so that people can start using this sooner then later. Also smaller PRs are easier to review.

@lukecwik
Copy link
Member

@iemejia Any final comments?

Otherwise I intend to merge when the tests are green.

@lukecwik
Copy link
Member

Run Java PreCommit

1 similar comment
@lukecwik
Copy link
Member

Run Java PreCommit

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

6 participants