[BEAM-7750] Don't map transforms to PubSub subscriptions unless necessary #9146
Merged
charlesccychen merged 1 commit into apache:master on Aug 1, 2019
Member
LGTM. cc: @charlesccychen in case he has additional comments, since he is the original author.

Contributor
Thanks! This LGTM.
Streaming pipelines that read from a PubSub topic and are executed using `BundleBasedDirectRunner` are never garbage collected because the `_PubSubReadEvaluator._subscription_cache` class attribute keeps references to all `ReadFromPubSub` transforms for the duration of the Python session. This is not ideal for interactive pipeline prototyping and development, where the user may start and stop multiple pipelines from the same session.

This PR makes two functional changes:

- Changes `_PubSubReadEvaluator._subscription_cache` to keep references only to those subscriptions that are created by `_PubSubReadEvaluator`. This means that pipelines will be correctly garbage collected if the user supplies a subscription, rather than a topic, to `ReadFromPubSub`.
- Uses `atexit.register` rather than `__del__` to clean up subscriptions created by the `_PubSubReadEvaluator` class. According to the CPython documentation, "it is not guaranteed that `__del__()` methods are called for objects that still exist when the interpreter exits". Since the `_PubSubReadEvaluator._subscription_cache` class attribute keeps references to all created subscriptions until the interpreter exits, it is not guaranteed that `__del__()` is called for any of those subscriptions. Using `atexit.register` is a better fit in this case.

This partially (or fully?) addresses BEAM-2988.

CC: @ivant @ananvay
R: @aaltay
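
To illustrate the two changes described above, here is a minimal standalone sketch. It is not the Beam implementation: `FakeRead`, `FakeEvaluator`, `deleted`, and `collected_after_release` are hypothetical stand-ins, and the "subscription" is just a string recorded in a list rather than a real Pub/Sub resource. It only shows the pattern: cache an entry solely for subscriptions the evaluator creates itself, and register the cleanup with `atexit.register` instead of relying on `__del__`.

```python
import atexit
import gc
import weakref


class FakeRead:
    """Hypothetical stand-in for a ReadFromPubSub transform."""

    def __init__(self, topic=None, subscription=None):
        self.topic = topic
        self.subscription = subscription


class FakeEvaluator:
    """Hypothetical stand-in for _PubSubReadEvaluator."""

    # Class-level cache: it lives for the whole Python session, so every
    # key stored here stays reachable until the cache is cleared.
    _subscription_cache = {}
    # Records cleanup calls in place of a real Pub/Sub delete request.
    deleted = []

    def __init__(self, transform):
        if transform.subscription:
            # User-supplied subscription: nothing is cached, so the
            # evaluator class keeps no reference to the transform.
            self.subscription = transform.subscription
        else:
            # Topic only: create (here, just name) a subscription and
            # remember it so that it can be cleaned up later.
            cache = FakeEvaluator._subscription_cache
            if transform not in cache:
                cache[transform] = "%s/auto-sub-%d" % (transform.topic, len(cache))
            self.subscription = cache[transform]

    @classmethod
    def _cleanup(cls):
        """Delete every subscription this class created, then drop the cache."""
        cls.deleted.extend(cls._subscription_cache.values())
        cls._subscription_cache.clear()


# Runs at normal interpreter exit, even though the cache still holds
# references at that point (the case where __del__ is not guaranteed to run).
atexit.register(FakeEvaluator._cleanup)


def collected_after_release(**kwargs):
    """Return True if a FakeRead built from kwargs is freed once it is dropped."""
    transform = FakeRead(**kwargs)
    ref = weakref.ref(transform)
    FakeEvaluator(transform)
    del transform
    gc.collect()
    return ref() is None
```

In this sketch, `collected_after_release(subscription=...)` reports that the transform is freed, while `collected_after_release(topic=...)` reports that it is still held by the class-level cache until `_cleanup` runs.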