-
Notifications
You must be signed in to change notification settings - Fork 4.6k
[BEAM-3074] Stage the portable pipeline; put URL in pipeline options #4090
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Changes from all commits
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -190,18 +190,11 @@ def __init__(self, packages, options, environment_version, pipeline_url): | |
| pool = dataflow.WorkerPool( | ||
| kind='local' if self.local else 'harness', | ||
| packages=package_descriptors, | ||
| # https://issues.apache.org/jira/browse/BEAM-3116 | ||
| # metadata=dataflow.WorkerPool.MetadataValue(), | ||
|
Member
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
Actually they can't work right now. The issue is an internal Dataflow bug that manifests as BEAM-3116. TL;DR: we cannot use worker pool metadata at all. Template jobs are encoded in proto3 JSON format, more or less, and when they are run they are decoded via proto2 JSON format. This bug went undetected until now because there were no nonempty maps. Longer term, templates should use a proto binary encoding and perhaps be created service-side. For now, putting this in pipeline options instead, as a generic "bag of values".
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Shouldn't we leave these comments here as a reminder that we need to fix the bug and pass the pool.metadata (here or elsewhere)? Or is this just some cleanup (that's irrelevant to the other changes in this PR?)
Member
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I have an internal bug filed, and discussion on going, on fixing the pool metadata situation as well as designing the very best place for this sort of thing. IMO for the purposes of this PR both pool metadata and pipeline options are both sort of fine places to put this data, but neither are obviously the Right Thing To Do. So I really do mean to fully abandon ever trying to put this data in the pool metadata.
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Thanks for the context. LGTM. |
||
| taskrunnerSettings=dataflow.TaskRunnerSettings( | ||
| parallelWorkerSettings=dataflow.WorkerSettings( | ||
| baseUrl=GoogleCloudOptions.DATAFLOW_ENDPOINT, | ||
| servicePath=self.google_cloud_options.dataflow_endpoint))) | ||
|
|
||
| # https://issues.apache.org/jira/browse/BEAM-3116 | ||
| # pool.metadata.additionalProperties.append( | ||
|
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Same. |
||
| # dataflow.WorkerPool.MetadataValue.AdditionalProperty( | ||
| # key=names.STAGED_PIPELINE_URL_METADATA_FIELD, value=pipeline_url)) | ||
|
|
||
| pool.autoscalingSettings = dataflow.AutoscalingSettings() | ||
| # Set worker pool options received through command line. | ||
| if self.worker_options.num_workers: | ||
|
|
@@ -260,6 +253,7 @@ def __init__(self, packages, options, environment_version, pipeline_url): | |
| options_dict = {k: v | ||
| for k, v in sdk_pipeline_options.iteritems() | ||
| if v is not None} | ||
| options_dict["pipelineUrl"] = pipeline_url | ||
| self.proto.sdkPipelineOptions.additionalProperties.append( | ||
| dataflow.Environment.SdkPipelineOptionsValue.AdditionalProperty( | ||
| key='options', value=to_json_value(options_dict))) | ||
|
|
||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should this line be uncommented rather than deleted?