Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[BEAM-11042] Remove DirectRunner dependency on deprecated CreatePCollectionView primitive #13043

Merged

Conversation

kennknowles
Copy link
Member

@kennknowles kennknowles commented Oct 7, 2020

I built this change on top of #13042 and confirmed that the direct runner's ValidatesRunner still passed. I confirmed with grep -r CreatePCollectionView runners/direct-java that there are no more mentions in the direct runner.

Summary of how the change works:

  • For all the ParDo.MultiOutput primitives, attach a WriteView direct runner step to each side input. This has the same behavior as today but does not depend on having a CreatePCollectionView primitive in the SDK graph.
  • Since CreatePCollectionView is still in the graph and it is not that easy to delete, just attach a noop evaluation. When CreatePCollectionView is deleted soon, this can be removed.

Thank you for your contribution! Follow this checklist to help us incorporate your contribution quickly and easily:

  • Choose reviewer(s) and mention them in a comment (R: @username).
  • Format the pull request title like [BEAM-XXX] Fixes bug in ApproximateQuantiles, where you replace BEAM-XXX with the appropriate JIRA issue, if applicable. This will automatically link the pull request to the issue.
  • Update CHANGES.md with noteworthy changes.
  • If this contribution is large, please file an Apache Individual Contributor License Agreement.

See the Contributor Guide for more tips on how to make review process smoother.

Post-Commit Tests Status (on master branch)

Lang SDK Dataflow Flink Samza Spark Twister2
Go Build Status --- Build Status --- Build Status ---
Java Build Status Build Status
Build Status
Build Status
Build Status
Build Status
Build Status
Build Status Build Status
Build Status
Build Status
Build Status
Python Build Status
Build Status
Build Status
Build Status
Build Status
Build Status
Build Status
Build Status
--- Build Status ---
XLang Build Status --- Build Status --- Build Status ---

Pre-Commit Tests Status (on master branch)

--- Java Python Go Website Whitespace Typescript
Non-portable Build Status Build Status
Build Status
Build Status
Build Status
Build Status Build Status Build Status Build Status
Portable --- Build Status --- --- --- ---

See .test-infra/jenkins/README for trigger phrase, status and link of all Jenkins jobs.

GitHub Actions Tests Status (on master branch)

Build python source distribution and wheels
Python tests
Java tests

See CI.md for more information about GitHub Actions CI.

@kennknowles
Copy link
Member Author

As it turns out, I tested this on a branch where I had removed CreatePCollectionView from the expansion, and when it is added batch everything crashes of course because a primitive handler is missing.

@kennknowles kennknowles force-pushed the portable-PCollectionView-DirectRunner branch 3 times, most recently from fa8231d to 9c97490 Compare October 7, 2020 23:57
@kennknowles
Copy link
Member Author

This is fun. If the rewrites are in one order, just FileIOTest.testFileIoDynamicNaming fails, because the view is not written. If you swap the overrides and the view writing, then all sorts of SDF and stateful DoFn tests fail for a similar reason.

@kennknowles kennknowles force-pushed the portable-PCollectionView-DirectRunner branch 2 times, most recently from 45ac573 to bb30001 Compare October 8, 2020 00:44
@kennknowles kennknowles force-pushed the portable-PCollectionView-DirectRunner branch 5 times, most recently from b7b51e4 to de34ae4 Compare October 8, 2020 06:43
@kennknowles
Copy link
Member Author

Fixed up the issue. It was the heavily mocked out direct runner tests that needed to be adjusted.

Please take a look.

Copy link
Contributor

@boyuanzz boyuanzz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the changes. Overall looks good to me except one minor comment.

import org.apache.beam.sdk.util.WindowedValue;
import org.checkerframework.checker.nullness.qual.Nullable;

class CreateViewNoopEvaluatorFactory implements TransformEvaluatorFactory {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we add some comments here to explain why we need a no-op here? Also can we attach a TODO here to clean it up in the future?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added a big comment and BEAM-11049.

@kennknowles kennknowles force-pushed the portable-PCollectionView-DirectRunner branch from de34ae4 to e2b129f Compare October 9, 2020 16:49
Copy link
Member Author

@kennknowles kennknowles left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks! Because you approved with one comment, I already fixed the commit history and force pushed to the PR branch. I will wait for some green to make sure I didn't accidentally insert an error when editing the comments, then merge.

import org.apache.beam.sdk.util.WindowedValue;
import org.checkerframework.checker.nullness.qual.Nullable;

class CreateViewNoopEvaluatorFactory implements TransformEvaluatorFactory {
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added a big comment and BEAM-11049.

@kennknowles
Copy link
Member Author

Run Java PreCommit

@kennknowles
Copy link
Member Author

Run Java PreCommit

(flake in attempted metrics)

@kennknowles
Copy link
Member Author

Run Java PreCommit

@kennknowles
Copy link
Member Author

Same failure. Since it was green before and I only changed comments, I still believe it is a flake. I wonder if this PR changed some flake rate.

@kennknowles kennknowles merged commit bd93c5f into apache:master Oct 12, 2020
@kennknowles kennknowles deleted the portable-PCollectionView-DirectRunner branch October 12, 2020 20:48
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants