[BEAM-11934] Remove Dataflow override of streaming WriteFiles with runner determined sharding #15178
R: @chamikaramj |
Do you think we should keep the old behavior as an option in case some customers run into issues? |
We should make it the default for runner v2. @reuvenlax Is it worth making this the default and requiring users to pass in a flag to maintain support for pipeline update for older versions? |
The override was not used because, before the introduction of auto sharding, a streaming write had to specify a fixed number of shards or it would fail:
I forgot to remove the override when I loosened the check to allow runner determined sharding for unbounded data :\ |
I assume you mean making runner determined sharding the default for runner v2. Older versions (strictly speaking, versions before 2.29.0) don't support runner determined sharding for unbounded data, so the fixed sharding option will be carried over on pipeline update. We could make new pipelines opt in by default. |
I guess the only concern will be users who are upgrading from Beam 2.29.0 or later then, right? Since that's just a couple of versions (approximately three months), I don't think we need to be concerned about preserving backwards compatibility. |
Run Java PreCommit |
Run Java_Examples_Dataflow PreCommit |
SGTM |
PreCommit failures look unrelated: org.apache.beam.sdk.transforms.ParDoLifecycleTest.testTeardownCalledAfterExceptionInProcessElementStateful |
Run Java PreCommit |
cc: |
All tests are passing now :D |
Thanks. LGTM. |
I suspect this PR is causing the WordCount test to become extremely flaky. I think this code actually was being exercised before, and now without it every bundle becomes a separate file to write. |
… with runner determined sharding (apache#15178)" This reverts commit ee32c23.
…nner determined sharding (apache#15178) * Remove Dataflow override of streaming WriteFiles * Update the documentation in FileIO * spotless * Fix checkStyle