Skip to content

Conversation

@rohdesamuel
Copy link
Contributor

@rohdesamuel rohdesamuel commented Mar 13, 2020

The problem was that the StreamingCache needed a way to communicate with the BackgroundCachingJob that it was done, so it knew when to stop tailing the file. This was previously done with capturing the user_pipeline when making the StreamingCache. However, the StreamingCache is used as a global cache for all pipelines. Thus, when the pipeline was redefined the StreamingCache had a stale reference to the previous pipeline and would loop forever.

The fix is to change the cache key of PCollection to be pipeline specific. In this PR, it can be seen that the pipeline id is used in the cache key. This is used to tell the StreamingCache what pipeline it's associated with. The StreamingCache can't be created with the user_pipeline because the pipeline_instrument object is dependent on the cache which creates the user pipeline.


Thank you for your contribution! Follow this checklist to help us incorporate your contribution quickly and easily:

  • Choose reviewer(s) and mention them in a comment (R: @username).
  • Format the pull request title like [BEAM-XXX] Fixes bug in ApproximateQuantiles, where you replace BEAM-XXX with the appropriate JIRA issue, if applicable. This will automatically link the pull request to the issue.
  • Update CHANGES.md with noteworthy changes.
  • If this contribution is large, please file an Apache Individual Contributor License Agreement.

See the Contributor Guide for more tips on how to make review process smoother.

Post-Commit Tests Status (on master branch)

Lang SDK Apex Dataflow Flink Gearpump Samza Spark
Go Build Status --- --- Build Status --- --- Build Status
Java Build Status Build Status Build Status
Build Status
Build Status
Build Status
Build Status
Build Status Build Status Build Status
Build Status
Build Status
Python Build Status
Build Status
Build Status
Build Status
--- Build Status
Build Status
Build Status
Build Status
Build Status
--- --- Build Status
XLang --- --- --- Build Status --- --- Build Status

Pre-Commit Tests Status (on master branch)

--- Java Python Go Website
Non-portable Build Status Build Status
Build Status
Build Status Build Status
Portable --- Build Status --- ---

See .test-infra/jenkins/README for trigger phrase, status and link of all Jenkins jobs.

@rohdesamuel rohdesamuel force-pushed the streamingcache_refactor branch 3 times, most recently from 8f2efd8 to 68f2c6d Compare March 16, 2020 21:56
@rohdesamuel rohdesamuel changed the title Streamingcache refactor [BEAM-9524] Fix for ib.show() executing indefinitely Mar 16, 2020
@rohdesamuel rohdesamuel force-pushed the streamingcache_refactor branch from 68f2c6d to 8cb62c1 Compare March 16, 2020 22:11
@rohdesamuel rohdesamuel marked this pull request as ready for review March 16, 2020 22:11
@rohdesamuel
Copy link
Contributor Author

R: @pabloem

@rohdesamuel rohdesamuel force-pushed the streamingcache_refactor branch from 8cb62c1 to 3e1d606 Compare March 16, 2020 22:49
@pabloem
Copy link
Member

pabloem commented Mar 17, 2020

retest this please

…estart

Change-Id: I53aa32a75645086efffa091a53880a076c3a689d
Change-Id: I1ab6e7036172d7e2d07c774778a50e165df6bdca
@rohdesamuel rohdesamuel force-pushed the streamingcache_refactor branch from 3e1d606 to 54a7e19 Compare March 19, 2020 01:02
@pabloem
Copy link
Member

pabloem commented Mar 19, 2020

retest this please

1 similar comment
@pabloem
Copy link
Member

pabloem commented Mar 19, 2020

retest this please


@staticmethod
def from_str(r):
split = r.split('|')
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Check length of the list?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think probably a better way is to do the following:

var, version, producer_version, pipeline_id = r.split('|')
return CacheKey(var, version, producer_version, pipeline_id)

@pabloem
Copy link
Member

pabloem commented Mar 19, 2020

retest this please

1 similar comment
@pabloem
Copy link
Member

pabloem commented Mar 19, 2020

retest this please

Change-Id: I247f37cd7acffb6ad796ce0fa8b54b0feff400d1
@pabloem
Copy link
Member

pabloem commented Mar 19, 2020

retest this please

2 similar comments
@pabloem
Copy link
Member

pabloem commented Mar 19, 2020

retest this please

@pabloem
Copy link
Member

pabloem commented Mar 19, 2020

retest this please

@pabloem pabloem merged commit d545ece into apache:master Mar 20, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants