
[BEAM-7824] Sets a default environment for Dataflow runner #9165

Merged 1 commit into apache:master on Jul 30, 2019

Conversation

@chamikaramj (Contributor):

  • Sets a default environment for the Dataflow runner.
  • Also sets the default environment, and environments received from external transforms, in the runner API proto (see the sketch below).
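
A rough sketch (not the merged code) of what the bullets above describe: build a default Docker environment proto and record it, together with any environments received from external transforms, in the pipeline's runner API proto. The helper names and the 'default_environment' id are made up for illustration; only beam_runner_api_pb2 and its Environment/DockerPayload messages come from Beam.

from apache_beam.portability.api import beam_runner_api_pb2


def make_default_docker_environment(container_image):
  # 'beam:env:docker:v1' is the standard Docker environment URN; the
  # payload names the SDK harness container image to use.
  return beam_runner_api_pb2.Environment(
      urn='beam:env:docker:v1',
      payload=beam_runner_api_pb2.DockerPayload(
          container_image=container_image).SerializeToString())


def set_environments(pipeline_proto, default_env, external_envs):
  # pipeline_proto is a beam_runner_api_pb2.Pipeline; its components hold
  # an id -> Environment map.  'default_environment' is a made-up id.
  pipeline_proto.components.environments['default_environment'].CopyFrom(
      default_env)
  for env_id, env in external_envs.items():
    pipeline_proto.components.environments[env_id].CopyFrom(env)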

Thank you for your contribution! Follow this checklist to help us incorporate your contribution quickly and easily:

  • Choose reviewer(s) and mention them in a comment (R: @username).
  • Format the pull request title like [BEAM-XXX] Fixes bug in ApproximateQuantiles, where you replace BEAM-XXX with the appropriate JIRA issue, if applicable. This will automatically link the pull request to the issue.
  • If this contribution is large, please file an Apache Individual Contributor License Agreement.

Post-Commit Tests Status (on master branch): build-status badge table omitted (Go, Java and Python SDKs across the Apex, Dataflow, Flink, Gearpump, Samza and Spark runners).

Pre-Commit Tests Status (on master branch): build-status badge table omitted (Java, Python, Go and Website; portable and non-portable).

See .test-infra/jenkins/README for trigger phrase, status and link of all Jenkins jobs.

@chamikaramj (Author):

R: @ihji

CC: @robertwb @angoenka

@chamikaramj (Author):

Run Python PostCommit

@chamikaramj (Author):

Run Python 2 PostCommit

@@ -952,7 +969,8 @@ def _get_required_container_version(job_type=None):
      current version of the SDK.
  """
  if 'dev' in beam_version.__version__:
-    if job_type == 'FNAPI_BATCH' or job_type == 'FNAPI_STREAMING':
+    if (job_type == JOB_TYPE_PYTHON_BATCH or
Reviewer (Contributor):

python-batch or python-fnapi-streaming?

@chamikaramj (Author):

Reverted.

@@ -927,7 +944,7 @@ def get_default_container_image_for_current_sdk(job_type):
        % str(sys.version_info[0:2]))

  # TODO(tvalentyn): Use enumerated type instead of strings for job types.
-  if job_type == 'FNAPI_BATCH' or job_type == 'FNAPI_STREAMING':
+  if job_type == JOB_TYPE_PYTHON_BATCH or job_type == JOB_TYPE_FNAPI_STREAMING:
Reviewer (Contributor):

FNAPI

@chamikaramj (Author):

Reverted.

from apache_beam.portability.api import beam_runner_api_pb2
default_container_image = (
apiclient.get_default_container_image_for_current_sdk(
apiclient.get_job_type(options)))
Reviewer (Contributor):

This value is only used for FnApi. Additionally, FnApi doesn't have different containers for streaming vs. batch. It's too bad we have to go through the (irrelevant for portability) job type to get this.

@chamikaramj (Author):

Updated function to take a bool (use_fnapi) instead of job type.
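
A hypothetical sketch of the reworked helper described above; the image names are placeholders rather than the real registry paths, and the exact signature in the merged change may differ.

from apache_beam import version as beam_version


def get_default_container_image_for_current_sdk(use_fnapi):
  # FnAPI uses a single container regardless of batch vs. streaming, so a
  # boolean flag is enough to pick the image.  Image names are placeholders.
  if use_fnapi:
    image_name = 'gcr.io/example/beam-python-fnapi'
  else:
    image_name = 'gcr.io/example/beam-python'
  # The real helper derives the tag from the required container version;
  # the SDK version is used here only to keep the sketch self-contained.
  return '%s:%s' % (image_name, beam_version.__version__)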

@chamikaramj (Author):

Thanks, Robert. PTAL.

@robertwb (Contributor) left a comment:

Looks good, just one minor comment.


  def proto(self):
    """Runner API payload for a `PTransform`"""
    return self._proto

  def to_runner_api(self, context, has_parts=False):
    id_to_proto_map = self._context.environments.get_id_to_proto_map()
Reviewer (Contributor):

This seems a bit fragile in that we're assuming that if the ids match, the protos match (which could be bad for auto-generated names like env0). Could you add a check for this?

It also seems that we're copying too much (every environment from the context, not just the one(s) referenced from this proto), but perhaps there's no good way to get around that. Could you at least add a TODO referencing the JIRA about making environment a top-level attribute of transforms?

@chamikaramj (Author):

Added a check for environments.

Created https://issues.apache.org/jira/browse/BEAM-7850 for making Environment a top level attribute of PTransform.
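
A minimal sketch of the kind of consistency check described above (the helper name and error text are illustrative, not the merged code): when an environment id already exists in the target map, the two protos must be identical before the id is reused.

def copy_environments(source_env_map, target_env_map):
  # Both maps are id -> beam_runner_api_pb2.Environment.  Proto messages
  # support ==, so a direct comparison catches id collisions that refer to
  # different environments (e.g. two auto-generated 'env0' ids).
  for env_id, env_proto in source_env_map.items():
    if env_id in target_env_map:
      if target_env_map[env_id] != env_proto:
        raise RuntimeError(
            'Environment id %s refers to two different environments.' % env_id)
    else:
      target_env_map[env_id] = env_proto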

@chamikaramj (Author) left a comment:

Thanks.

@chamikaramj (Author):

Run Python_PVR_Flink PreCommit

@chamikaramj (Author):

Flink failure seems to be unrelated.

@chamikaramj merged commit 0f2c9dd into apache:master on Jul 30, 2019.