
Fix CloudRunExecuteJobOperator ignoring task failures in deferrable mode #67767

Closed
shahar1 wants to merge 1 commit into apache:main from shahar1:fix/cloud-run-deferrable-cancellation

Conversation

Contributor

@shahar1 shahar1 commented May 30, 2026

This PR continues and replaces #62278 by @Ajay9704.

Summary

When CloudRunExecuteJobOperator runs with deferrable=True and the underlying Cloud Run Job is cancelled (e.g. via the Google Cloud UI or API), the long-running operation (LRO) completes without setting operation.error: cancelled tasks appear in cancelled_count rather than failed_count. The original deferrable path therefore silently returned success instead of failing, diverging from the non-deferrable behavior.

The trigger (CloudRunJobFinishedTrigger) was already fixed in main to detect this by deserialising the Execution proto and emitting a FAIL event. This PR adds a matching defensive check in execute_complete for the case where a SUCCESS event still carries task-count fields indicating an incomplete or partially-failed run — mirroring the validation already done on the non-deferrable path via _fail_if_execution_failed.

Changes

  • execute_complete: raises RuntimeError when succeeded_count + failed_count != task_count or when failed_count > 0.
  • Tests: two new unit tests covering the cancelled-job and failed-tasks scenarios.
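
For illustration only, the check described above could look roughly like the sketch below. It is not the code from this PR: the helper name `fail_if_task_counts_incomplete` is hypothetical, and it assumes the trigger event is a plain mapping carrying `task_count`, `succeeded_count` and `failed_count` keys. Only the two conditions themselves come from the description.

```python
from typing import Any, Mapping


def fail_if_task_counts_incomplete(event: Mapping[str, Any]) -> None:
    """Raise RuntimeError if the task counts describe a failed or incomplete run.

    Hypothetical helper: the name and the event key names are assumptions
    made for this sketch, not the provider's actual API.
    """
    task_count = event.get("task_count")
    if task_count is None:
        # The event carries no task-count information, so there is nothing to validate.
        return

    succeeded_count = event.get("succeeded_count") or 0
    failed_count = event.get("failed_count") or 0

    # Condition 1: at least one task failed outright.
    if failed_count > 0:
        raise RuntimeError(f"{failed_count} of {task_count} tasks failed.")

    # Condition 2: some tasks neither succeeded nor failed (e.g. they were cancelled).
    if succeeded_count + failed_count != task_count:
        raise RuntimeError(
            f"Only {succeeded_count + failed_count} of {task_count} tasks finished."
        )


# A fully successful run passes the check without raising.
fail_if_task_counts_incomplete(
    {"status": "success", "task_count": 3, "succeeded_count": 3, "failed_count": 0}
)
```

In this sketch a cancelled execution is caught by the second condition: its tasks are counted as neither succeeded nor failed, so the sum falls short of `task_count` even though no error was reported.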

closes: #57791


Was generative AI tooling used to co-author this PR?
  • Yes — Claude Code (claude-sonnet-4-6)

Generated-by: Claude Code (claude-sonnet-4-6) following the guidelines

@boring-cyborg boring-cyborg Bot added area:providers provider:google Google (including GCP) related issues labels May 30, 2026
@shahar1 shahar1 force-pushed the fix/cloud-run-deferrable-cancellation branch from 32d0167 to 13d1be3 Compare May 30, 2026 06:21
When a Cloud Run Job's trigger receives a SUCCESS event it may still
carry task-count fields indicating that not all tasks finished (e.g.
a cancelled execution that resolved without an operation error).

Add a defensive check in execute_complete that raises RuntimeError when
succeeded_count + failed_count != task_count or when failed_count > 0,
mirroring the validation already done in the non-deferrable path via
_fail_if_execution_failed.

Add unit tests covering both the cancelled (incomplete tasks) and
failed-tasks scenarios.

closes: apache#57791
@shahar1 shahar1 force-pushed the fix/cloud-run-deferrable-cancellation branch from 13d1be3 to 635559d Compare May 30, 2026 06:30
@shahar1 shahar1 changed the title Fix CloudRunExecuteJobOperator fails when Cloud Run job is cancelled in deferrable mode Fix CloudRunExecuteJobOperator ignoring task failures in deferrable mode May 30, 2026
Contributor Author

shahar1 commented May 30, 2026

Already fixed in main

@shahar1 shahar1 closed this May 30, 2026
Development

Successfully merging this pull request may close these issues.

CloudRunExecuteJobOperator with deferrable=True succeeds when CRJ is canceled