[BEAM-7761] Add transform_name_mapping pipeline option for python sdk#9072

Merged
aaltay merged 2 commits into apache:master from y1chi:transform_name_mapping on Jul 20, 2019
@y1chi y1chi commented Jul 16, 2019

Add a transform_name_mapping pipeline option to the Python SDK.
It is useful when updating an existing pipeline on Google Cloud Dataflow: https://cloud.google.com/dataflow/docs/guides/updating-a-pipeline
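
For illustration, here is a minimal standalone sketch of how such an option could be parsed. It uses only the standard library rather than the Beam SDK. The flag name comes from this PR; the assumption that the value is a JSON object of old-name to new-name pairs, the `build_parser` helper, and the help text are hypothetical and not taken from the merged diff.

```python
import argparse
import json


def build_parser():
    """Build a parser with the update-related flags (hypothetical helper)."""
    parser = argparse.ArgumentParser()
    parser.add_argument('--update', action='store_true', default=False,
                        help='Update an existing job.')
    parser.add_argument('--streaming', action='store_true', default=False,
                        help='Run the pipeline in streaming mode.')
    # Assumption: the mapping arrives as a JSON object on the command line,
    # so json.loads turns it into a dict. Invalid JSON raises ValueError,
    # which argparse reports as a usage error.
    parser.add_argument('--transform_name_mapping', type=json.loads,
                        default=None,
                        help='JSON object mapping old transform names to '
                             'new transform names.')
    return parser


if __name__ == '__main__':
    args = build_parser().parse_args([
        '--update', '--streaming',
        '--transform_name_mapping', '{"oldTransform": "newTransform"}',
    ])
    print(args.transform_name_mapping)
```

Parsing into a dict up front means later validation can work with keys and values directly instead of re-parsing a string.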



@y1chi y1chi force-pushed the transform_name_mapping branch 3 times, most recently from 6454014 to dfb2704 Compare July 16, 2019 17:34
Author

y1chi commented Jul 16, 2019

R: @aaltay

Hi Ahmet, would you mind taking a look at this PR?

Member

@aaltay aaltay left a comment

LGTM.

Have you tried running a test on Dataflow? Perhaps we can add an integration test for Dataflow.

Could you also add a JIRA issue to the PR?

@y1chi y1chi changed the title Add transform_name_mapping pipeline option for python sdk [BEAM-7761] Add transform_name_mapping pipeline option for python sdk Jul 17, 2019
Member

aaltay commented Jul 17, 2019

R: @dustin12 could you take a look at this?

Author

y1chi commented Jul 18, 2019

> LGTM.
>
> Have you tried running a test on Dataflow? Perhaps we can add an integration test for Dataflow.
>
> Could you also add a JIRA issue to the PR?

I tried to add an integration test for Dataflow, but I guess it will only work once the SDK has the option.

Contributor

@dustin12 dustin12 left a comment

I believe our Java streaming numbers integration test and streaming drain test test this option at least somewhat. Does Python have equivalent tests yet?

if not view.job_name:
  errors.extend(self._validate_error(
      'Existing job name must be provided when updating a pipeline.'))
if view.transform_name_mapping:
Contributor

Can we also confirm here that the job is streaming and is an update?

Author

Added streaming option validation below.
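
A minimal sketch of the kind of check discussed in this thread, assuming the mapping has already been parsed into a dict. The function name, signature, and error messages are hypothetical and are not the code merged in this PR.

```python
def validate_transform_name_mapping(mapping, update, streaming):
    """Return a list of error strings; an empty list means the option is valid.

    Hypothetical helper: it only mirrors the rule raised in review, that the
    mapping is meaningful only for a streaming job that is being updated.
    """
    errors = []
    if not mapping:
        # Option not set, so there is nothing to check.
        return errors
    if not update or not streaming:
        errors.append('transform_name_mapping requires both --update and '
                      '--streaming to be set.')
    if not isinstance(mapping, dict):
        errors.append('transform_name_mapping must be a mapping of old '
                      'names to new names.')
        return errors
    for old_name, new_name in mapping.items():
        if not isinstance(old_name, str) or not isinstance(new_name, str):
            errors.append('transform_name_mapping keys and values must be '
                          'strings.')
            break
    return errors


if __name__ == '__main__':
    # A batch (non-streaming) update is rejected in this sketch.
    print(validate_transform_name_mapping({'old': 'new'},
                                          update=True, streaming=False))
```

Collecting errors in a list instead of raising on the first one matches the `errors.extend(...)` pattern visible in the diff context above.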

- 'See https://cloud.google.com/dataflow/pipelines/'
+ 'See https://cloud.google.com/dataflow/docs/guides/'
  'updating-a-pipeline')
  parser.add_argument('--transform_name_mapping',
Contributor

As far as I can tell, Python does not have a place for streaming-specific options. Is it worth creating one and moving this there?

Author

This option is probably more update-related, and update is currently only supported with the Dataflow runner, I guess?

Member

That is correct. Update is only supported by Dataflow.

@y1chi y1chi force-pushed the transform_name_mapping branch from b80c4c8 to 3115bab Compare July 19, 2019 19:40
Author

y1chi commented Jul 19, 2019

> I believe our Java streaming numbers integration test and streaming drain test test this option at least somewhat. Does Python have equivalent tests yet?

I'm adding similar update integration tests for Python, but did not find any tests related to transform name mapping.

Author

y1chi commented Jul 19, 2019

Run Portable_Python PreCommit

Author

y1chi commented Jul 19, 2019

Run Python_PVR_Flink PreCommit

@aaltay aaltay merged commit 7fe54a0 into apache:master Jul 20, 2019
@y1chi y1chi deleted the transform_name_mapping branch July 22, 2019 21:07
3 participants