Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add drain option when canceling Dataflow pipelines #11374

Merged
merged 4 commits into from Oct 29, 2020

Conversation

TobKed
Copy link
Contributor

@TobKed TobKed commented Oct 9, 2020


^ Add meaningful description above

Read the Pull Request Guidelines for more information.
In case of fundamental code change, Airflow Improvement Proposal (AIP) is needed.
In case of a new dependency, check compliance with the ASF 3rd Party License Policy.
In case of backwards incompatible changes please leave a note in UPDATING.md.

@boring-cyborg boring-cyborg bot added the provider:google Google (including GCP) related issues label Oct 9, 2020
@TobKed TobKed marked this pull request as draft October 9, 2020 12:07
@TobKed TobKed changed the title Add drain option when canceling Dataflow pipelines [WIP] Add drain option when canceling Dataflow pipelines Oct 9, 2020
@TobKed TobKed mentioned this pull request Oct 9, 2020
6 tasks
@github-actions
Copy link

github-actions bot commented Oct 9, 2020

The Workflow run is cancelling this PR. It has some failed jobs matching ^Pylint$,^Static checks$,^Build docs$,^Spell check docs$,^Backport packages$,^Checks: Helm tests$,^Test OpenAPI*.

for job_id in job_ids:
for job in jobs:
requested_state = (
DataflowJobType.JOB_TYPE_STREAMING
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

DataflowJobStatus.JOB_STATE_DRAINED instead of DataflowJobType.JOB_TYPE_STREAMING ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks! Fixed.

Copy link
Member

@aaltay aaltay left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great, thank you.

I know little about airflow, 2 clarifying questions:

  • Would users generally decide on drain vs cancel when they are starting the operation? (Would they change their minds before onkill?)
  • Would they want to wait until drain is completed? Drain may or may not fully complete in 5 minutes timeout window.

@TobKed
Copy link
Contributor Author

TobKed commented Oct 13, 2020

* Would users generally decide on drain vs cancel when they are starting the operation? (Would they change their minds before onkill?)

Users define this parameter in the code which defines the DAG. AFAIK when DAG is running it is not possible to change this parameters afterwards. User can edit DAG and run it again with the new parameters.
@turbaszek could you confirm am I right, please?

* Would they want to wait until drain is completed? Drain may or may not fully complete in 5 minutes timeout window.

I created separate PR to handle timeout, I think it is good idea to allow configure it so users could decide what timeout is suitable for them.
#11501

@TobKed TobKed changed the title [WIP] Add drain option when canceling Dataflow pipelines Add drain option when canceling Dataflow pipelines Oct 14, 2020
@TobKed
Copy link
Contributor Author

TobKed commented Oct 14, 2020

Thanks @aaltay
I rebased on the latest master and added tests.
cc @mik-laj @turbaszek @potiuk

@TobKed TobKed marked this pull request as ready for review October 14, 2020 10:19
@github-actions
Copy link

The Workflow run is cancelling this PR. It has some failed jobs matching ^Pylint$,^Static checks$,^Build docs$,^Spell check docs$,^Backport packages$,^Checks: Helm tests$,^Test OpenAPI*.

@TobKed TobKed force-pushed the dataflow-add-drain-option branch 3 times, most recently from 8e51958 to 07e3e33 Compare October 22, 2020 07:06
@github-actions
Copy link

The Workflow run is cancelling this PR. It has some failed jobs matching ^Pylint$,^Static checks$,^Build docs$,^Spell check docs$,^Backport packages$,^Checks: Helm tests$,^Test OpenAPI*.

@TobKed TobKed force-pushed the dataflow-add-drain-option branch 2 times, most recently from a34b65e to ee87cda Compare October 23, 2020 11:03
@github-actions
Copy link

The Workflow run is cancelling this PR. It has some failed jobs matching ^Pylint$,^Static checks$,^Build docs$,^Spell check docs$,^Backport packages$,^Checks: Helm tests$,^Test OpenAPI*.

@github-actions
Copy link

The Workflow run is cancelling this PR. It has some failed jobs matching ^Pylint$,^Static checks$,^Build docs$,^Spell check docs$,^Backport packages$,^Checks: Helm tests$,^Test OpenAPI*.

@mik-laj mik-laj merged commit e5713e0 into apache:master Oct 29, 2020
@mik-laj mik-laj deleted the dataflow-add-drain-option branch October 29, 2020 13:22
michalmisiewicz pushed a commit to michalmisiewicz/airflow that referenced this pull request Oct 30, 2020
* Add drain option when cancel Dataflow pipelines

* fixup! Add drain option when cancel Dataflow pipelines

* fixup! fixup! Add drain option when cancel Dataflow pipelines

* fixup! fixup! fixup! Add drain option when cancel Dataflow pipelines
szn pushed a commit to szn/airflow that referenced this pull request Nov 1, 2020
* Add drain option when cancel Dataflow pipelines

* fixup! Add drain option when cancel Dataflow pipelines

* fixup! fixup! Add drain option when cancel Dataflow pipelines

* fixup! fixup! fixup! Add drain option when cancel Dataflow pipelines
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
provider:google Google (including GCP) related issues
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants