Skip to content

[BEAM-2863] Add the ability to length prefix unknown coders#4377

Merged
lukecwik merged 1 commit intoapache:masterfrom
lukecwik:side_input4
Jan 12, 2018
Merged

[BEAM-2863] Add the ability to length prefix unknown coders#4377
lukecwik merged 1 commit intoapache:masterfrom
lukecwik:side_input4

Conversation

@lukecwik
Copy link
Member

@lukecwik lukecwik commented Jan 10, 2018

Follow this checklist to help us incorporate your contribution quickly and easily:

  • Make sure there is a JIRA issue filed for the change (usually before you start working on it). Trivial changes like typos do not require a JIRA issue. Your pull request should address just this issue, without pulling in other changes.
  • Each commit in the pull request should have a meaningful subject line and body.
  • Format the pull request title like [BEAM-XXX] Fixes bug in ApproximateQuantiles, where you replace BEAM-XXX with the appropriate JIRA issue.
  • Write a pull request description that is detailed enough to understand what the pull request does, how, and why.
  • Run mvn clean verify to make sure basic checks pass. A more thorough check will be performed on your pull request automatically.
  • If this contribution is large, please file an Apache Individual Contributor License Agreement.

Add the ability to length prefix unknown coders using the portable representation allowing a Runner to not need to know about all coder representations.

This is towards supporting the side inputs over the portability framework but can also be used for the data plane.

RunnerApi.Components components,
boolean replaceWithByteArrayCoder) {

MessageWithComponents.Builder builder = MessageWithComponents.newBuilder();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the variable name builder is giving me headaches when combined with getSpecBuilder().getSpecBuilder() and other builders flying around; Mind converting into something like resultBuilder?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

* @param replaceWithByteArrayCoder whether to replace an unknown coder with a
* {@link ByteArrayCoder}.
* @return A {@link MessageWithComponents} with the
* {@link MessageWithComponents#getCoder() root coder} and its component coders.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe it to be notable that no IDs in the result will collide with any IDs in the input components for different coders.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

builder.setComponents(components);
}
} else if (WELL_KNOWN_CODER_URNS.contains(currentCoder.getSpec().getSpec().getUrn())) {
RunnerApi.Coder.Builder updatedCoder = currentCoder.toBuilder();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems sufficiently involved (given the recursive subcomponent rewriting) that it could stand being in a separate method. This might be true for the bulk of the replaceWithByteArrayCoder branches as well

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Split out to different methods covering the different cases.


/** Tests for {@link LengthPrefixUnknownCoders}. */
@RunWith(JUnit4.class)
public class LengthPrefixUnknownCodersTest {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems like it would read better as a Parameterized test (with the [Original, Result, replaceWithByteArray] as your parameters)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

}

/** Test replacing unknown coders with {@code LengthPrefixCoder<ByteArray>}. */
@Test
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Test including LengthPrefixCoder at the top level?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

LengthPrefixCoder.of(ByteArrayCoder.of())),
GlobalWindow.Coder.INSTANCE);

/** Test wrapping unknown coders with {@code LengthPrefixCoder}. */
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I never remember if UTF-8 Strings are well known or not; maybe consider a CustomCoder just for the obviousness of it being unknown?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

Copy link
Member

@tgroh tgroh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

minor style things, otherwise lgtm

String lengthPrefixComponentCoderId = coderId;
if (replaceWithByteArrayCoder) {
return createLengthPrefixByteArrayCoder(coderId, components);
// lengthPrefixComponentCoderId = generateUniqueId(coderId + "-byte_array",
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

rm the comments

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done. Forgot that this was here.

public static Collection<Object[]> data() {
return ImmutableList.of(
/** Test wrapping unknown coders with {@code LengthPrefixCoder}. */
new Object[] {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I prefer the use of an AutoValue inner class here instead of an Object[]

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm going to stick with the Object[] for now since it will make the migration to JUnit 5 parameterized tests simpler.

Coder<?> original, Coder<?> expected, boolean replaceWithByteArray) throws IOException {
@Test
public void test() throws IOException {
MessageWithComponents messageWithComponents = CoderTranslation.toProto(original);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Something like 'originalCoderProto', 'lengthPrefixedCoderProto', etc?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

… portable representation allowing a Runner to not need to know about all coder representations.

This is towards supporting the side inputs over the portability framework.
@lukecwik lukecwik merged commit 02bd77a into apache:master Jan 12, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants