Skip to content

[BEAM-818,BEAM-828] Make tempLocation an inaccessible ValueProvider#2847

Closed
kennknowles wants to merge 1 commit intoapache:masterfrom
kennknowles:runtime-ValueProvider
Closed

[BEAM-818,BEAM-828] Make tempLocation an inaccessible ValueProvider#2847
kennknowles wants to merge 1 commit intoapache:masterfrom
kennknowles:runtime-ValueProvider

Conversation

@kennknowles
Copy link
Member

@kennknowles kennknowles commented May 2, 2017

Be sure to do all of the following to help us incorporate your contribution
quickly and easily:

  • Make sure the PR title is formatted like:
    [BEAM-<Jira issue #>] Description of pull request
  • Make sure tests pass via mvn clean verify.
  • Replace <Jira issue #> in the title with the actual Jira issue
    number, if there is one.
  • If this contribution is large, please file an Apache
    Individual Contributor License Agreement.

This is my first foray into ValueProvider and PipelineOptions excitement. The goal: allow users to access the tempLocation that they specify but make it inaccessible to PTransform.expand(..) implementations.

The natural way to do this would be to clone and alter the pipeline options, something that runners also ought to be doing when they tweak them like TestDataflowRunner does. I am informed this is not possible or feasible in a short timeframe.

So this is a hack that might not actually work that lets get() succeed but makes isAccessible() false. Throwing it out there as a strawman as I don't see a way to do this correctly given today's constraints.

Once an approach is decided upon, I will propagate as needed, to e.g. gcpTempLocation.

@kennknowles
Copy link
Member Author

R: @lukecwik @dhalperi please tell me how wrong this is :-)

@kennknowles kennknowles force-pushed the runtime-ValueProvider branch from ebfb87a to b970b22 Compare May 3, 2017 02:14
@kennknowles
Copy link
Member Author

kennknowles commented May 3, 2017

I think I am going to close this, based on this logic:

  1. If a transform's expand(input) uses input.getPipeline().getOptions() to access required always-available things like tempLocation, this can be mostly mechanically replaced with ProcessContext.getPipelineOptions(). I think this is every use in BigQueryIO.
  2. If a transform's expand(input) uses input.getPipeline().getOptions() to configure itself, then these should be builder methods accepting ValueProvider and the existing types and design patterns for that are just right. I don't know of any uses like this at the moment.
  3. A transform's validate(PipelineOptions) method is now called after construction is complete; it is OK to give it PipelineOptions, though they should be moved to the actual execution environment. This is infeasible in the short term and needs more design work.

I picked up this work sort of out of nowhere and didn't glean this context from my readings. Now I think my approach is not just a hack, but entirely wrong.

@kennknowles kennknowles closed this May 3, 2017
@kennknowles kennknowles deleted the runtime-ValueProvider branch May 26, 2017 05:18
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant