[BEAM-2863] Add the ability to length prefix unknown coders #4377
lukecwik merged 1 commit into apache:master

    RunnerApi.Components components,
    boolean replaceWithByteArrayCoder) {

    MessageWithComponents.Builder builder = MessageWithComponents.newBuilder();

The variable name `builder` is giving me headaches when combined with `getSpecBuilder().getSpecBuilder()` and the other builders flying around; mind renaming it to something like `resultBuilder`?

    * @param replaceWithByteArrayCoder whether to replace an unknown coder with a
    * {@link ByteArrayCoder}.
    * @return A {@link MessageWithComponents} with the
    * {@link MessageWithComponents#getCoder() root coder} and its component coders.

I think it's worth documenting that no ID in the result will collide with an ID in the input components that refers to a different coder.
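One way to provide that guarantee is to derive every new ID from a prefix and bump a suffix until it is free. The sketch below is a minimal illustration, not the code in this PR: `generateUniqueId` borrows the name from a commented-out line in this diff, but the `(String, Set<String>)` signature and the `-N` suffix scheme are assumptions.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

/** Hypothetical helper; the signature and suffix scheme are assumptions, not Beam's code. */
final class UniqueIds {
  private UniqueIds() {}

  /**
   * Returns {@code prefix} if no existing component uses it, otherwise the first of
   * {@code prefix-1}, {@code prefix-2}, ... that is free. The caller must register the result.
   */
  static String generateUniqueId(String prefix, Set<String> existingIds) {
    String candidate = prefix;
    for (int i = 1; existingIds.contains(candidate); i++) {
      candidate = prefix + "-" + i;
    }
    return candidate;
  }

  public static void main(String[] args) {
    Set<String> ids = new HashSet<>(Arrays.asList("c1", "c1-byte_array"));
    System.out.println(generateUniqueId("c1-byte_array", ids)); // prints: c1-byte_array-1
    System.out.println(generateUniqueId("c2-byte_array", ids)); // prints: c2-byte_array
  }
}
```

Because the helper only returns IDs absent from the input set, a newly added coder can never shadow a different coder already present in the input components.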

        builder.setComponents(components);
      }
    } else if (WELL_KNOWN_CODER_URNS.contains(currentCoder.getSpec().getSpec().getUrn())) {
      RunnerApi.Coder.Builder updatedCoder = currentCoder.toBuilder();

This seems involved enough (given the recursive rewriting of subcomponents) that it could go in a separate method. The same may be true for most of the `replaceWithByteArrayCoder` branches.
Split these out into separate methods, one per case.
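To make the per-case split concrete, here is a self-contained sketch over a toy coder graph (a map from ID to a URN plus component IDs) instead of the real `RunnerApi` protos. The URN strings, the `Coder` class, and the `rewrite`, `rewriteComponents`, `lengthPrefixedBytes`, and `addWithFreshId` helpers are all hypothetical. Well-known coders keep their URN and have their components rewritten recursively; unknown coders are wrapped in a length-prefix coder, or replaced by length-prefix(bytes) when `replaceWithByteArray` is set. Treating an existing length-prefix coder as already framed is an assumption of this sketch.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Toy model of length-prefixing unknown coders in a coder graph (hypothetical, not Beam's API). */
final class LengthPrefixSketch {
  // Illustrative URNs only; not the real Beam constants.
  static final String BYTES = "bytes";
  static final String KV = "kv";
  static final String VARINT = "varint";
  static final String LENGTH_PREFIX = "length_prefix";
  static final Set<String> WELL_KNOWN =
      new HashSet<>(Arrays.asList(BYTES, KV, VARINT, LENGTH_PREFIX));

  /** Stand-in for RunnerApi.Coder: a URN plus the IDs of its component coders. */
  static final class Coder {
    final String urn;
    final List<String> componentIds;

    Coder(String urn, String... componentIds) {
      this.urn = urn;
      this.componentIds =
          Collections.unmodifiableList(new ArrayList<>(Arrays.asList(componentIds)));
    }
  }

  /** Returns the ID of an equivalent coder whose unknown parts are length-prefixed. */
  static String rewrite(String coderId, Map<String, Coder> components, boolean replaceWithByteArray) {
    Coder coder = components.get(coderId);
    if (coder.urn.equals(LENGTH_PREFIX)) {
      // Assumption: an existing length prefix already frames its payload, so never wrap it twice.
      return replaceWithByteArray ? lengthPrefixedBytes(coderId, components) : coderId;
    }
    if (WELL_KNOWN.contains(coder.urn)) {
      return rewriteComponents(coderId, coder, components, replaceWithByteArray);
    }
    // Unknown coder: a runner cannot parse it, so frame it with a length prefix.
    return replaceWithByteArray
        ? lengthPrefixedBytes(coderId, components)
        : addWithFreshId(coderId + "-length_prefix", new Coder(LENGTH_PREFIX, coderId), components);
  }

  /** Well-known coder: keep its URN, rewrite each component, copy only if something changed. */
  private static String rewriteComponents(
      String coderId, Coder coder, Map<String, Coder> components, boolean replaceWithByteArray) {
    List<String> newIds = new ArrayList<>();
    boolean changed = false;
    for (String componentId : coder.componentIds) {
      String newId = rewrite(componentId, components, replaceWithByteArray);
      changed |= !newId.equals(componentId);
      newIds.add(newId);
    }
    return changed
        ? addWithFreshId(coderId, new Coder(coder.urn, newIds.toArray(new String[0])), components)
        : coderId;
  }

  /** Builds length_prefix(bytes) under fresh IDs. */
  private static String lengthPrefixedBytes(String coderId, Map<String, Coder> components) {
    String bytesId = addWithFreshId(coderId + "-byte_array", new Coder(BYTES), components);
    return addWithFreshId(coderId + "-length_prefix", new Coder(LENGTH_PREFIX, bytesId), components);
  }

  /** Stores {@code coder} under {@code prefix}, or prefix-1, prefix-2, ...; never overwrites. */
  private static String addWithFreshId(String prefix, Coder coder, Map<String, Coder> components) {
    String id = prefix;
    for (int i = 1; components.containsKey(id); i++) {
      id = prefix + "-" + i;
    }
    components.put(id, coder);
    return id;
  }

  public static void main(String[] args) {
    Map<String, Coder> components = new LinkedHashMap<>();
    components.put("custom", new Coder("my:custom:coder"));
    components.put("varint", new Coder(VARINT));
    components.put("kv", new Coder(KV, "varint", "custom"));
    String root = rewrite("kv", components, false);
    System.out.println(root + " -> " + components.get(root).componentIds);
    // prints: kv-1 -> [varint, custom-length_prefix]
  }
}
```

Each branch lives in its own small method, and the input entries are never modified: rewritten coders are added alongside them under fresh IDs.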

    /** Tests for {@link LengthPrefixUnknownCoders}. */
    @RunWith(JUnit4.class)
    public class LengthPrefixUnknownCodersTest {

This would read better as a `Parameterized` test, with `[original, result, replaceWithByteArray]` as the parameters.

    }

    /** Test replacing unknown coders with {@code LengthPrefixCoder<ByteArray>}. */
    @Test

Could you add a test with a `LengthPrefixCoder` at the top level?

        LengthPrefixCoder.of(ByteArrayCoder.of())),
        GlobalWindow.Coder.INSTANCE);

    /** Test wrapping unknown coders with {@code LengthPrefixCoder}. */

I never remember whether UTF-8 strings are well known; maybe use a `CustomCoder` so it's obvious that the coder is unknown?
tgroh left a comment:

Minor style things, otherwise LGTM.

    String lengthPrefixComponentCoderId = coderId;
    if (replaceWithByteArrayCoder) {
      return createLengthPrefixByteArrayCoder(coderId, components);
      // lengthPrefixComponentCoderId = generateUniqueId(coderId + "-byte_array",

Done. Forgot that this was here.

    public static Collection<Object[]> data() {
      return ImmutableList.of(
          /** Test wrapping unknown coders with {@code LengthPrefixCoder}. */
          new Object[] {

I'd prefer an `AutoValue` inner class here instead of an `Object[]`.
I'm going to stick with the `Object[]` for now, since it will make the migration to JUnit 5 parameterized tests simpler.
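For reference, the `Object[]`-per-case layout can be sketched as a plain-Java table-driven check that runs without JUnit. Each row is `{original, expected, replaceWithByteArray}`, the same shape a JUnit4 `Parameterized` `data()` method returns. The `CoderCases` class, the `Node` tree, and its `rewrite` rules are a hypothetical toy model, not the Beam classes; the last row exercises a length-prefix coder at the top level.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Table-driven check in the Object[]-per-row style; toy model, not the Beam classes. */
final class CoderCases {
  static final Set<String> WELL_KNOWN =
      new HashSet<>(Arrays.asList("bytes", "kv", "varint", "length_prefix"));

  /** Toy coder tree: a URN and its component coders (no IDs). */
  static final class Node {
    final String urn;
    final List<Node> children;

    Node(String urn, Node... children) {
      this.urn = urn;
      this.children = Collections.unmodifiableList(Arrays.asList(children.clone()));
    }

    @Override
    public String toString() {
      if (children.isEmpty()) {
        return urn;
      }
      StringBuilder sb = new StringBuilder(urn).append('(');
      for (int i = 0; i < children.size(); i++) {
        sb.append(i > 0 ? "," : "").append(children.get(i));
      }
      return sb.append(')').toString();
    }
  }

  /** Hypothetical rules: wrap (or replace with bytes) unknown coders; recurse into known ones. */
  static Node rewrite(Node n, boolean replaceWithByteArray) {
    if (n.urn.equals("length_prefix")) {
      return replaceWithByteArray ? new Node("length_prefix", new Node("bytes")) : n;
    }
    if (!WELL_KNOWN.contains(n.urn)) {
      return new Node("length_prefix", replaceWithByteArray ? new Node("bytes") : n);
    }
    Node[] kids = new Node[n.children.size()];
    for (int i = 0; i < kids.length; i++) {
      kids[i] = rewrite(n.children.get(i), replaceWithByteArray);
    }
    return new Node(n.urn, kids);
  }

  /** One row per case: {original, expected, replaceWithByteArray}. */
  static List<Object[]> data() {
    Node custom = new Node("my:custom");
    return Arrays.asList(
        new Object[] {custom, new Node("length_prefix", custom), false},
        new Object[] {custom, new Node("length_prefix", new Node("bytes")), true},
        new Object[] {
          new Node("kv", new Node("varint"), custom),
          new Node("kv", new Node("varint"), new Node("length_prefix", custom)),
          false
        },
        new Object[] {
          new Node("length_prefix", custom), new Node("length_prefix", new Node("bytes")), true
        });
  }

  /** Runs every row and returns the number of failures. */
  static int runAll() {
    int failures = 0;
    for (Object[] row : data()) {
      Node actual = rewrite((Node) row[0], (Boolean) row[2]);
      if (!actual.toString().equals(row[1].toString())) {
        failures++;
        System.out.println("FAIL: " + row[0] + " -> " + actual + ", expected " + row[1]);
      }
    }
    return failures;
  }
}
```

Swapping `Object[]` for a typed holder, such as the `AutoValue` class proposed in this thread, would only change how `data()` builds rows and how `runAll` reads them.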

        Coder<?> original, Coder<?> expected, boolean replaceWithByteArray) throws IOException {
    @Test
    public void test() throws IOException {
      MessageWithComponents messageWithComponents = CoderTranslation.toProto(original);

Maybe use names like `originalCoderProto`, `lengthPrefixedCoderProto`, etc.?
Add the ability to length prefix unknown coders using the portable representation, so that a runner does not need to know about every coder representation.
This is towards supporting side inputs over the portability framework, but it can also be used for the data plane.
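Why a length prefix lets a runner skip unknown coders comes down to the wire format. As we understand Beam's `LengthPrefixCoder`, it writes the encoded element's byte length as a VarInt, followed by the encoded bytes. A runner that knows only this framing can split a stream into elements and pass each payload along as opaque bytes, which appears to be what the `LengthPrefixCoder<ByteArrayCoder>` substitution relies on. The stdlib-only sketch below illustrates the idea; `LengthPrefixFraming`, `frame`, and `readOpaque` are hypothetical names, not Beam APIs.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

/** Hypothetical stdlib-only illustration of length-prefix framing; not a Beam API. */
final class LengthPrefixFraming {
  private LengthPrefixFraming() {}

  /** Unsigned varint: 7 bits per byte, low-order group first, high bit set on all but the last. */
  static void writeVarInt(long value, ByteArrayOutputStream out) {
    while ((value & ~0x7FL) != 0) {
      out.write((int) ((value & 0x7F) | 0x80));
      value >>>= 7;
    }
    out.write((int) value);
  }

  static long readVarInt(ByteArrayInputStream in) {
    long result = 0;
    for (int shift = 0; shift < 64; shift += 7) {
      int b = in.read();
      if (b < 0) {
        throw new IllegalArgumentException("truncated varint");
      }
      result |= (long) (b & 0x7F) << shift;
      if ((b & 0x80) == 0) {
        return result;
      }
    }
    throw new IllegalArgumentException("varint too long");
  }

  /** Frames an already-encoded element as varint(length) followed by the payload. */
  static byte[] frame(byte[] encodedElement) {
    ByteArrayOutputStream out = new ByteArrayOutputStream();
    writeVarInt(encodedElement.length, out);
    out.write(encodedElement, 0, encodedElement.length);
    return out.toByteArray();
  }

  /** Reads one framed element without knowing its coder: the payload stays opaque bytes. */
  static byte[] readOpaque(ByteArrayInputStream in) {
    long length = readVarInt(in);
    if (length > Integer.MAX_VALUE) {
      throw new IllegalArgumentException("element too large: " + length);
    }
    byte[] payload = new byte[(int) length];
    if (payload.length > 0 && in.read(payload, 0, payload.length) != payload.length) {
      throw new IllegalArgumentException("truncated payload");
    }
    return payload;
  }

  public static void main(String[] args) {
    ByteArrayOutputStream stream = new ByteArrayOutputStream();
    for (String s : new String[] {"a", "hello"}) {
      byte[] framed = frame(s.getBytes(StandardCharsets.UTF_8));
      stream.write(framed, 0, framed.length);
    }
    ByteArrayInputStream in = new ByteArrayInputStream(stream.toByteArray());
    while (in.available() > 0) {
      // Decoding as UTF-8 is only for display; the splitting itself never looks inside the payload.
      System.out.println(new String(readOpaque(in), StandardCharsets.UTF_8));
    }
  }
}
```

The element boundaries come entirely from the prefix, so the same reader works no matter which SDK-side coder produced the payload.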