[BEAM-498] Replace ParDo with MapElements and FlatMapElements where possible#756
kennknowles wants to merge 1 commit into apache:master
| private final SerializableFunction<InputT, OutputT> fn; | ||
| private final transient TypeDescriptor<OutputT> outputType; | ||
| private final SimpleFunction<InputT, OutputT> fn; | ||
| private final Class<?> fnClass; // for display data purposes |
Careful, storing Class<?> instances can be problematic. Anonymous classes backed by Java 8 lambdas will explode during serialization. See CombineJava8Test#testLambdaSerialization(). In Combine.java, we store the DisplayData.Item<?> instead.
Recommend writing a Java 8 test for this class as well. I wonder: would it be easy to write some FindBugs-like static analysis that would catch Serializable classes with non-transient Class<?> instance fields?
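The check suggested above can be approximated at runtime with plain reflection. This is only a minimal sketch, not FindBugs and not anything in Beam: the class name ClassFieldScan, the method riskyFields, and the two example classes are all hypothetical names chosen for illustration.

```java
import java.io.Serializable;
import java.lang.reflect.Field;
import java.lang.reflect.Modifier;
import java.util.ArrayList;
import java.util.List;

public class ClassFieldScan {

  /**
   * Lists the non-static, non-transient fields of declared type Class in a
   * Serializable type and its superclasses. Returns an empty list when the
   * type is not Serializable.
   */
  public static List<String> riskyFields(Class<?> type) {
    List<String> found = new ArrayList<>();
    if (!Serializable.class.isAssignableFrom(type)) {
      return found;
    }
    // Walk up the hierarchy because getDeclaredFields() skips inherited fields.
    for (Class<?> c = type; c != null; c = c.getSuperclass()) {
      for (Field f : c.getDeclaredFields()) {
        int mods = f.getModifiers();
        boolean instanceField = !Modifier.isStatic(mods);
        boolean serialized = !Modifier.isTransient(mods);
        if (f.getType() == Class.class && instanceField && serialized) {
          found.add(c.getSimpleName() + "." + f.getName());
        }
      }
    }
    return found;
  }

  /** Hypothetical example: holds a Class<?> that would be serialized. */
  public static class Risky implements Serializable {
    private final Class<?> fnClass = String.class;
  }

  /** Hypothetical example: the Class<?> field is transient, so it is skipped. */
  public static class Safe implements Serializable {
    private transient Class<?> fnClass = String.class;
    private final String fnName = "example";
  }
}
```

A unit test could call riskyFields on each transform class and fail when the returned list is non-empty. A real static analysis would inspect bytecode instead, so it would not need the classes to be loadable.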
Thank you! I did not identify the cause of the failure, but this was certainly the issue. Whatever I store does have to be serialized and non-transient, because of the unfortunate issue with ParDo. Your idea solved it. I'm going to drop the commit from this PR, please review #757 that is focused on just that.
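For readers following this thread, here is a rough illustration of the general idea of storing a plain serializable value in place of the Class<?> field. It is a sketch under assumptions, not the code from #757: Combine.java stores a DisplayData.Item<?>, whereas this stand-in keeps only the class name as a String, and the names DisplayNameSketch, SerFn, and Mapper are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.function.Function;

public class DisplayNameSketch {

  /** Hypothetical serializable function type, so lambdas can be serialized. */
  public interface SerFn<A, B> extends Function<A, B>, Serializable {}

  /** Hypothetical transform-like holder that must itself be serializable. */
  public static final class Mapper<A, B> implements Serializable {
    private final SerFn<A, B> fn;
    // A String captured at construction time, kept instead of a Class<?> field.
    private final String fnDisplayName;

    public Mapper(SerFn<A, B> fn) {
      this.fn = fn;
      this.fnDisplayName = fn.getClass().getName();
    }

    public B apply(A input) {
      return fn.apply(input);
    }

    public String displayName() {
      return fnDisplayName;
    }
  }

  /** Serializes and deserializes a value in memory. */
  public static <T> T roundTrip(T value) {
    try {
      ByteArrayOutputStream bytes = new ByteArrayOutputStream();
      try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
        out.writeObject(value);
      }
      try (ObjectInputStream in =
          new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
        @SuppressWarnings("unchecked")
        T copy = (T) in.readObject();
        return copy;
      }
    } catch (IOException | ClassNotFoundException e) {
      throw new IllegalStateException("round trip failed", e);
    }
  }

  /** Builds a Mapper backed by a Java 8 lambda. */
  public static Mapper<String, Integer> lengthMapper() {
    return new Mapper<>(s -> s.length());
  }
}
```

The display name survives the round trip because it is an ordinary String. Its exact value for a lambda is JVM-generated and should not be relied on beyond display purposes.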
This is small enough to review as two commits, but I'd recommend fixing up the history on merge to separate out the changes in two commits.
R: -@swegner love the feedback but this PR may or may not remain interesting to you
Force-pushed from 30c5559 to 79cd37b.
PTAL. Rebased and fixed up.
| */ | ||
| public class MapElements<InputT, OutputT> | ||
| extends PTransform<PCollection<InputT>, PCollection<OutputT>> { | ||
| extends PTransform<PCollection<? extends InputT>, PCollection<OutputT>> { |
Were the changes to MapElements not in a different PR already?
This particular change was not. There were cases that could not be ported without it, since the new generic makes it the same type as ParDo.
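To illustrate why the wildcard matters, here is a small stand-alone sketch that uses java.util.List in place of PCollection. Everything in it (WildcardInputSketch, Mapper, expand, demo) is a hypothetical stand-in and not Beam API; it only shows the Java generics behaviour under discussion.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Function;

public class WildcardInputSketch {

  /** Hypothetical stand-in for a transform whose input type is fixed per instance. */
  public static final class Mapper<InputT, OutputT> {
    private final Function<InputT, OutputT> fn;

    public Mapper(Function<InputT, OutputT> fn) {
      this.fn = fn;
    }

    // With List<InputT> here, a Mapper<Number, String> would reject List<Integer>,
    // because generic types are invariant. The wildcard accepts any subtype.
    public List<OutputT> expand(List<? extends InputT> input) {
      List<OutputT> output = new ArrayList<>();
      for (InputT element : input) {
        output.add(fn.apply(element));
      }
      return output;
    }
  }

  /** Applies a Number-typed mapper to a list of Integer. */
  public static List<String> demo() {
    Mapper<Number, String> describe = new Mapper<>(n -> "v" + n.intValue());
    List<Integer> ints = Arrays.asList(1, 2);
    return describe.expand(ints);
  }
}
```

Changing the parameter of expand to List<InputT> makes demo() fail to compile, which is the kind of porting obstacle described above.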
Force-pushed from 0e3b412 to 85e3da9.
This actually looks like a Jenkins race condition or some such. The errors in that build are not part of this PR. Rebasing to kick...
Force-pushed from ac329b1 to 6befb9c.
There are a number of places in the Java SDK where we use ParDo.of(DoFn) when MapElements or other higher-level composites are applicable and readable. This change alters a number of those.
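The readability argument can be sketched without the SDK. The code below is a plain-Java analogy, assuming a simplified callback model: ElementFn, parDo, mapElements, and flatMapElements are hypothetical stand-ins that only mirror the shape of ParDo.of(DoFn), MapElements, and FlatMapElements, and they are not the Beam signatures.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Consumer;
import java.util.function.Function;

public class MapVsDoFnSketch {

  /** Hypothetical DoFn-like callback: sees one element, may emit any number of outputs. */
  public interface ElementFn<I, O> {
    void process(I element, Consumer<O> out);
  }

  /** General form: the caller has to manage emission explicitly. */
  public static <I, O> List<O> parDo(List<I> input, ElementFn<I, O> fn) {
    List<O> output = new ArrayList<>();
    for (I element : input) {
      fn.process(element, output::add);
    }
    return output;
  }

  /** One output per input: the caller supplies only a function. */
  public static <I, O> List<O> mapElements(List<I> input, Function<I, O> fn) {
    return MapVsDoFnSketch.<I, O>parDo(input, (element, out) -> out.accept(fn.apply(element)));
  }

  /** Zero or more outputs per input: the caller returns a list per element. */
  public static <I, O> List<O> flatMapElements(List<I> input, Function<I, List<O>> fn) {
    return MapVsDoFnSketch.<I, O>parDo(
        input,
        (element, out) -> {
          for (O o : fn.apply(element)) {
            out.accept(o);
          }
        });
  }

  /** The same mapping written in the general form and in the higher-level form. */
  public static List<Integer> lengthsViaParDo() {
    List<String> words = Arrays.asList("a", "bb");
    return MapVsDoFnSketch.<String, Integer>parDo(words, (w, out) -> out.accept(w.length()));
  }

  public static List<Integer> lengthsViaMap() {
    List<String> words = Arrays.asList("a", "bb");
    return MapVsDoFnSketch.<String, Integer>mapElements(words, String::length);
  }

  public static List<String> splitViaFlatMap() {
    List<String> lines = Arrays.asList("a b", "c");
    return MapVsDoFnSketch.<String, String>flatMapElements(
        lines, line -> Arrays.asList(line.split(" ")));
  }
}
```

Both lengths methods compute the same result. The higher-level form states only the per-element function, which is the property this change relies on.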
R: -@bjchambers I had added you since you were very in-the-loop on both the new R: @aljoscha care to take a look? The tests have been passing for a while, but with some Travis timeouts. I just rebased to force Travis to run again in its new, faster, configuration.
LGTM
Be sure to do all of the following to help us incorporate your contribution quickly and easily:
- Make sure the PR title is formatted like: [BEAM-<Jira issue #>] Description of pull request
- Make sure tests pass via mvn clean verify. (Even better, enable Travis-CI on your fork and ensure the whole test matrix passes).
- Replace <Jira issue #> in the title with the actual Jira issue number, if there is one.
- If this contribution is large, please file an Apache Individual Contributor License Agreement.
The commits ended up having fairly separate topics, but can be reviewed individually or as a medium-sized change.
- Replace ParDo with MapElements and FlatMapElements where it is easy to do so.
- … DoFn used a less-powerful form of TypeDescriptor and switched trivially to the enhanced version.
- … MapElements and FlatMapElements was a lack of use of the input type descriptor. Making it available involved a moderate refactor. In the process I broke some tests to do with display data and fixed them, plus enhancements to display data for SimpleFunction.

If reviewers insist, I can try to alter this commit history.
R: @bjchambers AND @swegner