DataprocCreateClusterOperator: Fix non-deferrable reconciliation and handle deletion during creation #61951

Open
SameerMesiah97 wants to merge 1 commit into apache:main from
SameerMesiah97:61947-DataprocCreateClusterOperator-Recon-Fix

@SameerMesiah97

Description

This change refactors the DataprocCreateClusterOperator.execute method to ensure cluster state reconciliation is consistently applied in non-deferrable mode.

After submitting the cluster creation request and waiting for the long-running operation (LRO) to complete, the operator now explicitly fetches the current cluster state and passes it through a dedicated _reconcile_cluster_state method before returning success.

The reconciliation logic, previously embedded inline in execute, has been consolidated into _reconcile_cluster_state. This method handles clusters in CREATING, DELETING, and STOPPED states by waiting, recreating, or restarting as appropriate.
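The state dispatch described above can be sketched roughly as follows. This is a minimal illustration, not the PR's code: `State`, `FakeDataproc`, and `reconcile_cluster_state` are hypothetical stand-ins, and the real method works against the Dataproc hook and operator configuration.

```python
from enum import Enum


class State(Enum):
    # Stand-in for the Dataproc cluster status values named in this PR.
    CREATING = "CREATING"
    DELETING = "DELETING"
    STOPPED = "STOPPED"
    RUNNING = "RUNNING"


class FakeDataproc:
    """Hypothetical in-memory client, used only to make the sketch runnable."""

    def __init__(self, state):
        self.state = state
        self.calls = []

    def wait_for_creation(self):
        self.calls.append("wait_for_creation")
        self.state = State.RUNNING

    def wait_for_deletion(self):
        self.calls.append("wait_for_deletion")
        self.state = None  # the cluster no longer exists

    def create(self):
        self.calls.append("create")
        self.state = State.RUNNING

    def start(self):
        self.calls.append("start")
        self.state = State.RUNNING


def reconcile_cluster_state(client):
    """Drive the cluster towards RUNNING by waiting, recreating, or restarting."""
    if client.state is State.CREATING:
        client.wait_for_creation()
    elif client.state is State.DELETING:
        client.wait_for_deletion()
        client.create()
    elif client.state is State.STOPPED:
        client.start()
    return client.state
```

For example, a client that starts in `State.DELETING` records `wait_for_deletion` followed by `create`, which mirrors the "wait for deletion, then re-create" path.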

Rationale

The operator docstring specifies that when use_if_exists=True, the operator should:

  • Wait if the cluster is in CREATING
  • Wait for deletion and then create a new cluster if in DELETING
  • Handle ERROR state appropriately

Although this state-handling logic already existed, the non-deferrable execution path returned immediately after the create LRO completed, so reconciliation never ran in some scenarios (e.g. a cluster transitioning to DELETING during creation).

This change ensures the pre-existing reconciliation behavior is executed consistently, aligning runtime behavior with the documented contract.

Notes

  • Added explicit NotFound handling after the LRO completes, surfacing a clear AirflowException if the cluster was deleted before its state could be reconciled.
  • Added logging, and clarified or cleaned up some existing log messages, for better observability during state transitions.
  • Added or clarified comments and variable names where appropriate.
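The NotFound handling in the first note might look something like the sketch below. The two exception classes are local stand-ins for `google.api_core.exceptions.NotFound` and `airflow.exceptions.AirflowException` so that the snippet runs without either package installed; `fetch_cluster_after_create` and its `get_cluster` callable are hypothetical names, and the error message is illustrative rather than the one used in the PR.

```python
class NotFound(Exception):
    """Stand-in for google.api_core.exceptions.NotFound."""


class AirflowException(Exception):
    """Stand-in for airflow.exceptions.AirflowException."""


def fetch_cluster_after_create(get_cluster, cluster_name):
    """Fetch the cluster once the create LRO has finished.

    `get_cluster` is any callable that returns the cluster or raises NotFound.
    """
    try:
        return get_cluster(cluster_name)
    except NotFound as err:
        # Convert the low-level API error into a task failure with a clear cause.
        raise AirflowException(
            f"Cluster {cluster_name} was not found after the create operation "
            "completed; it may have been deleted during creation."
        ) from err
```

Chaining with `from err` keeps the original API error in the traceback, which helps when diagnosing why the cluster disappeared.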

Tests

Unit tests have been added to cover reconciliation scenarios:

  • CREATING: verifies the operator waits for creation to complete and transitions correctly to RUNNING.
  • DELETING: verifies the operator waits for deletion to complete and then re-creates the cluster.
  • DELETING (timeout): verifies the operator raises an AirflowException when the cluster remains in DELETING state and deletion is not triggered.
  • STOPPED: verifies the operator triggers cluster start logic.
  • ERROR: verifies error-state handling and deletion behavior when delete_on_error=True.
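As a rough illustration of how the DELETING scenarios can be simulated, the sketch below feeds a sequence of states to a poll loop through `unittest.mock.Mock(side_effect=...)`. The `wait_until_deleted` helper and both test functions are hypothetical and much simpler than the PR's tests, which exercise the operator and hook directly.

```python
from unittest import mock


def wait_until_deleted(get_state, max_polls):
    """Hypothetical poll loop: True once the cluster is gone, error on timeout."""
    for _ in range(max_polls):
        if get_state() is None:
            return True
    raise TimeoutError("cluster still in DELETING state")


def test_deleting_then_gone():
    # Two polls report DELETING, the third reports that the cluster is gone.
    get_state = mock.Mock(side_effect=["DELETING", "DELETING", None])
    assert wait_until_deleted(get_state, max_polls=5) is True
    assert get_state.call_count == 3


def test_deleting_timeout():
    # The cluster never leaves DELETING, so the loop must give up.
    get_state = mock.Mock(return_value="DELETING")
    try:
        wait_until_deleted(get_state, max_polls=3)
    except TimeoutError:
        pass
    else:
        raise AssertionError("expected a timeout")
    assert get_state.call_count == 3
```

Sequencing return values with `side_effect` avoids real sleeps or API calls while still covering both the happy path and the timeout path.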

Existing tests have been updated to align with the new reconciliation flow and state handling behavior.

Backwards Compatibility

There is no intended change to the operator’s public contract. The implementation now consistently executes the previously defined reconciliation logic in non-deferrable mode.

Closes: #61947

@boring-cyborg boring-cyborg bot added area:providers provider:google Google (including GCP) related issues labels Feb 15, 2026

@shahar1 shahar1 left a comment

mypy currently fails :(

@SameerMesiah97

> mypy currently fails :(

Yes. I will fix it.

…iliation runs after creation completes.

– Extract reconciliation logic into `_reconcile_cluster_state()`
– Ensure DELETING state waits for deletion and re-creates the cluster
– Ensure CREATING state is fully reconciled before returning
– Handle STOPPED state via restart path
– Raise explicit exception if cluster is not found after LRO completion
– Return reconciled cluster to avoid stale state

Update and extend unit tests to cover reconciliation scenarios in the non-deferrable path (CREATING, DELETING, STOPPED, ERROR, and timeout cases).
@SameerMesiah97 SameerMesiah97 force-pushed the 61947-DataprocCreateClusterOperator-Recon-Fix branch from 613a843 to 747bf37 Compare February 15, 2026 18:06

Development

Successfully merging this pull request may close these issues.

DataprocCreateClusterOperator incorrectly succeeds when cluster is deleted during creation (non-deferrable)