Skip to content

[BEAM-7750] Don't map transforms to PubSub subscriptions unless necessary#9146

Merged
charlesccychen merged 1 commit intoapache:masterfrom
ostrokach:bugfix/pubsub_reader_global_scope
Aug 1, 2019
Merged

[BEAM-7750] Don't map transforms to PubSub subscriptions unless necessary#9146
charlesccychen merged 1 commit intoapache:masterfrom
ostrokach:bugfix/pubsub_reader_global_scope

Conversation

@ostrokach
Copy link
Contributor

@ostrokach ostrokach commented Jul 24, 2019

Streaming pipelines that read from a PubSub topic and are executed using BundleBasedDirectRunner are never garbage collected because the _PubSubReadEvaluator._subscription_cache class attribute keeps references to all ReadFromPubSub transforms for the duration of the Python session. This is not ideal in the case of interactive pipeline prototyping and development, where the user may start and stop multiple pipelines from the same session

This PR makes two functional changes:

  • Change _PubSubReadEvaluator._subscription_cache to keep references to only those subscriptions that are created by _PubSubReadEvaluator. This means that Pipelines will be correctly garbage collected if the user supplies a subscription, rather than a topic, to ReadFromPubSub.
  • Use atexit.register rather than __del__ to clean up subscriptions created by the _PubSubReadEvaluator class. According to the CPython documentation, "it is not guaranteed that __del__() methods are called for objects that still exist when the interpreter exits". Since the _PubSubReadEvaluator._subscription_cache class attribute will keep references to all created subscriptions until interpreter exists, it is not guaranteed that __del__() is called for any of those subscriptions. Using atexit.register is a better fit in this case. This partially (or fully?) addresses BEAM-2988.

CC: @ivant @ananvay
R: @aaltay


Thank you for your contribution! Follow this checklist to help us incorporate your contribution quickly and easily:

  • Choose reviewer(s) and mention them in a comment (R: @username).
  • Format the pull request title like [BEAM-XXX] Fixes bug in ApproximateQuantiles, where you replace BEAM-XXX with the appropriate JIRA issue, if applicable. This will automatically link the pull request to the issue.
  • If this contribution is large, please file an Apache Individual Contributor License Agreement.

Post-Commit Tests Status (on master branch)

Lang SDK Apex Dataflow Flink Gearpump Samza Spark
Go Build Status --- --- Build Status --- --- Build Status
Java Build Status Build Status Build Status Build Status
Build Status
Build Status
Build Status Build Status Build Status
Build Status
Python Build Status
Build Status
Build Status
Build Status
--- Build Status
Build Status
Build Status --- --- Build Status

Pre-Commit Tests Status (on master branch)

--- Java Python Go Website
Non-portable Build Status Build Status Build Status Build Status
Portable --- Build Status --- ---

See .test-infra/jenkins/README for trigger phrase, status and link of all Jenkins jobs.

@ostrokach ostrokach changed the title Don't map transforms to PubSub subscriptions unless necessary [BEAM-7750] Don't map transforms to PubSub subscriptions unless necessary Jul 24, 2019
@ostrokach ostrokach force-pushed the bugfix/pubsub_reader_global_scope branch from fa58da1 to 1a4a4f0 Compare July 24, 2019 20:29
@aaltay aaltay requested a review from charlesccychen July 27, 2019 01:19
@aaltay
Copy link
Member

aaltay commented Jul 27, 2019

LGTM.

cc: @charlesccychen in case he has additional comments since he is the original author.

@charlesccychen
Copy link
Contributor

Thanks! This LGTM.

@charlesccychen charlesccychen merged commit 0226799 into apache:master Aug 1, 2019
@ostrokach ostrokach deleted the bugfix/pubsub_reader_global_scope branch August 1, 2019 17:12
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants