[BEAM-8191] Fixes potentially large number of tasks on Spark after Flatten.pCollections() #9544
In the Spark runner (Beam 2.14.0 and 2.15.0), a Flatten.pCollections() PTransform is translated into a Spark union operation. The unioned RDD keeps the partitions of all of its inputs, so flattening many PCollections can produce a very large number of partitions, and therefore tasks, which can overload the driver.
This PR adds a coalesce operation after the union, which reduces the number of partitions in the RDD at virtually no cost.
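The partition arithmetic behind this change can be illustrated with a small toy model. The sketch below does not use Spark or Beam and is not the code from this PR: the class `FlattenCoalesceSketch`, its methods, and the target of 4 partitions are all hypothetical, chosen only to show that a union adds up the partition counts of its inputs and that a coalesce merges neighbouring partitions without dropping or reordering elements.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model only (no Spark): a "dataset" is a list of partitions,
// and each partition is a list of elements.
public class FlattenCoalesceSketch {

    // Union keeps every input partition, so the partition counts add up.
    static <T> List<List<T>> union(List<List<List<T>>> inputs) {
        List<List<T>> out = new ArrayList<>();
        for (List<List<T>> input : inputs) {
            out.addAll(input);
        }
        return out;
    }

    // Coalesce merges adjacent partitions down to at most `target` partitions.
    // Elements are only regrouped; none are dropped or reordered.
    static <T> List<List<T>> coalesce(List<List<T>> partitions, int target) {
        if (target <= 0) {
            throw new IllegalArgumentException("target must be positive");
        }
        int n = partitions.size();
        if (target >= n) {
            return partitions; // nothing to reduce
        }
        List<List<T>> out = new ArrayList<>();
        for (int i = 0; i < target; i++) {
            // Split the n partitions into `target` contiguous groups.
            int start = (int) ((long) i * n / target);
            int end = (int) ((long) (i + 1) * n / target);
            List<T> merged = new ArrayList<>();
            for (int j = start; j < end; j++) {
                merged.addAll(partitions.get(j));
            }
            out.add(merged);
        }
        return out;
    }

    // Hypothetical helper: a dataset with one element per partition.
    static List<List<Integer>> dataset(int numPartitions, int firstValue) {
        List<List<Integer>> parts = new ArrayList<>();
        for (int i = 0; i < numPartitions; i++) {
            List<Integer> p = new ArrayList<>();
            p.add(firstValue + i);
            parts.add(p);
        }
        return parts;
    }

    public static void main(String[] args) {
        // Three inputs of 4 partitions each, as if three PCollections were flattened.
        List<List<List<Integer>>> inputs = new ArrayList<>();
        for (int i = 0; i < 3; i++) {
            inputs.add(dataset(4, i * 4));
        }
        List<List<Integer>> unioned = union(inputs);
        List<List<Integer>> coalesced = coalesce(unioned, 4);
        System.out.println("partitions after union: " + unioned.size());      // 12
        System.out.println("partitions after coalesce: " + coalesced.size()); // 4
    }
}
```

In this model the regrouping is contiguous and touches each element once, which mirrors the idea of reducing the partition count without redistributing the data. How the real Spark runner picks the target partition count is not stated here, so the value 4 is purely illustrative.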
See the JIRA ticket (BEAM-8191) for more detail.
This pull request has been marked as stale due to 60 days of inactivity. It will be closed in 1 week if no further activity occurs. If you think that’s incorrect or this pull request requires a review, please simply write any comment. If closed, you can revive the PR at any time and @mention a reviewer or discuss it on the email@example.com list. Thank you for your contributions.