Skip to content

[BEAM-3074] Stage the portable pipeline; put URL in pipeline options#4090

Merged
asfgit merged 1 commit into
apache:masterfrom
kennknowles:py-stage-pipeline-with-options
Nov 13, 2017
Merged

[BEAM-3074] Stage the portable pipeline; put URL in pipeline options#4090
asfgit merged 1 commit into
apache:masterfrom
kennknowles:py-stage-pipeline-with-options

Conversation

@kennknowles
Copy link
Copy Markdown
Member

Follow this checklist to help us incorporate your contribution quickly and easily:

  • Make sure there is a JIRA issue filed for the change (usually before you start working on it). Trivial changes like typos do not require a JIRA issue. Your pull request should address just this issue, without pulling in other changes.
  • Each commit in the pull request should have a meaningful subject line and body.
  • Format the pull request title like [BEAM-XXX] Fixes bug in ApproximateQuantiles, where you replace BEAM-XXX with the appropriate JIRA issue.
  • Write a pull request description that is detailed enough to understand what the pull request does, how, and why.
  • Run mvn clean verify to make sure basic checks pass. A more thorough check will be performed on your pull request automatically.
  • If this contribution is large, please file an Apache Individual Contributor License Agreement.

@kennknowles
Copy link
Copy Markdown
Member Author

R: @aaltay

Same as last time; new place to put the info. The way I inspect pipeline options in the tests seems unpleasant.

@kennknowles
Copy link
Copy Markdown
Member Author

R: @robertwb wdyt?

Copy link
Copy Markdown
Contributor

@robertwb robertwb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree these tests are kind of icky, but it seems in line with the rest of the tests here. Some questions about the deleted lines though.

kind='local' if self.local else 'harness',
packages=package_descriptors,
# https://issues.apache.org/jira/browse/BEAM-3116
# metadata=dataflow.WorkerPool.MetadataValue(),
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this line be uncommented rather than deleted?

servicePath=self.google_cloud_options.dataflow_endpoint)))

# https://issues.apache.org/jira/browse/BEAM-3116
# pool.metadata.additionalProperties.append(
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same.

if additionalProperty.key == 'options':
recovered_options = additionalProperty.value
break
self.assertIsNotNone(pipeline_options,
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

for loops can have an "else" clause that's triggered if no break was encountered.

if additionalProperty.key == 'options':
recovered_options = additionalProperty.value
break
self.assertIsNotNone(pipeline_options,
Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

robertwb wrote:
for loops can have an "else" clause that's triggered if no break was encountered.

Nice. Done.

kind='local' if self.local else 'harness',
packages=package_descriptors,
# https://issues.apache.org/jira/browse/BEAM-3116
# metadata=dataflow.WorkerPool.MetadataValue(),
Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

robertwb wrote:
Should this line be uncommented rather than deleted?

Actually they can't work right now. The issue is an internal Dataflow bug that manifests as BEAM-3116. TL;DR: we cannot use worker pool metadata at all.

Template jobs are encoded in proto3 JSON format, more or less, and when they are run they are decoded via proto2 JSON format. This bug went undetected until now because there were no nonempty maps. Longer term, templates should use a proto binary encoding and perhaps be created service-side.

For now, putting this in pipeline options instead, as a generic "bag of values".

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't we leave these comments here as a reminder that we need to fix the bug and pass the pool.metadata (here or elsewhere)? Or is this just some cleanup (that's irrelevant to the other changes in this PR?)

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have an internal bug filed, and discussion on going, on fixing the pool metadata situation as well as designing the very best place for this sort of thing. IMO for the purposes of this PR both pool metadata and pipeline options are both sort of fine places to put this data, but neither are obviously the Right Thing To Do. So I really do mean to fully abandon ever trying to put this data in the pool metadata.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the context. LGTM.

@kennknowles kennknowles force-pushed the py-stage-pipeline-with-options branch from da2a280 to 6e682c6 Compare November 8, 2017 04:33
@kennknowles
Copy link
Copy Markdown
Member Author

It looks like the usual failure in universal_local_runner. I think it should probably be sickbayed given how many builds it hits.

@kennknowles
Copy link
Copy Markdown
Member Author

run python precommit

@kennknowles kennknowles force-pushed the py-stage-pipeline-with-options branch 2 times, most recently from 84691da to 101339d Compare November 9, 2017 04:38
@kennknowles kennknowles force-pushed the py-stage-pipeline-with-options branch from 101339d to 97bfdcb Compare November 9, 2017 22:07
@kennknowles
Copy link
Copy Markdown
Member Author

PTAL - fixed up lint issues.

@asfgit asfgit merged commit 97bfdcb into apache:master Nov 13, 2017
asfgit pushed a commit that referenced this pull request Nov 13, 2017
…n pipeline options

  Stage the portable pipeline; put URL in pipeline options
@kennknowles kennknowles deleted the py-stage-pipeline-with-options branch April 24, 2018 21:20
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants