Fix DataprocCreateBatchOperator stuck in deferred state for a long time #67638

Open
VladaZakharova wants to merge 1 commit into apache:main from VladaZakharova:dataproc-create-batch-fix

Conversation

@VladaZakharova
Contributor

DataprocCreateBatchOperator can remain stuck in the deferred state even after the Dataproc batch has completed successfully.
This PR fixes DataprocBatchTrigger so that it reuses a single Dataproc batch async client while polling the batch state.

This is similar to the issue recently fixed for DataprocSubmitJobOperator: #62082
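
To illustrate the pattern described above, here is a minimal, self-contained sketch of polling with one shared client versus building a new client on every poll. It does not use the real Airflow or Dataproc classes: `FakeBatchClient`, `poll_with_shared_client`, and `poll_with_client_per_iteration` are hypothetical names made up for this example, and the state names are assumed for illustration only.

```python
import asyncio

TERMINAL_STATES = {"SUCCEEDED", "FAILED", "CANCELLED"}  # assumed state names


class FakeBatchClient:
    """Hypothetical stand-in for an async batch client; counts constructions."""

    created = 0

    def __init__(self, states):
        FakeBatchClient.created += 1
        self._states = states  # shared iterator simulating the remote batch state

    async def get_batch(self, batch_id):
        return next(self._states)


async def poll_with_shared_client(states, batch_id="b-1", interval=0.0):
    # One client is built before the loop and reused for every poll.
    client = FakeBatchClient(states)
    while True:
        state = await client.get_batch(batch_id)
        if state in TERMINAL_STATES:
            return state
        await asyncio.sleep(interval)


async def poll_with_client_per_iteration(states, batch_id="b-1", interval=0.0):
    # A new client is built on every poll; this is the pattern to avoid.
    while True:
        client = FakeBatchClient(states)
        state = await client.get_batch(batch_id)
        if state in TERMINAL_STATES:
            return state
        await asyncio.sleep(interval)


if __name__ == "__main__":
    result = asyncio.run(poll_with_shared_client(iter(["RUNNING", "SUCCEEDED"])))
    print(result, "clients created:", FakeBatchClient.created)
```

The sketch only shows how many clients each loop constructs. It makes no claim about why a fresh client per poll leaves the real trigger stuck.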


Was generative AI tooling used to co-author this PR?
  • Yes (please specify the tool below)

  • Read the Pull Request Guidelines for more information. Note: commit author/co-author name and email in commits become permanently public when merged.
  • For fundamental code changes, an Airflow Improvement Proposal (AIP) is needed.
  • When adding dependency, check compliance with the ASF 3rd Party License Policy.
  • For significant user-facing changes create newsfragment: {pr_number}.significant.rst, in airflow-core/newsfragments. You can add this file in a follow-up commit after the PR is created so you know the PR number.

@VladaZakharova requested a review from shahar1 as a code owner on May 28, 2026, 09:58
@boring-cyborg (bot) added the area:providers and provider:google (Google (including GCP) related issues) labels on May 28, 2026
@potiuk added the ready for maintainer review label (set after triaging when all criteria pass) on May 28, 2026

async def run(self):
    hook = self.get_async_hook()
    await hook.get_batch_client(region=self.region)
Contributor

Since get_batch() already calls get_batch_client(region) internally, do we need this explicit pre-loop get_batch_client() call? Would the first get_batch() call initialize and cache the client anyway?
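
A small sketch of the scenario this question describes, under the assumption that `get_batch_client()` caches the client per region on the hook instance (not verified against the hook in this PR). `FakeAsyncHook` and `poll_three_times` are hypothetical names for illustration, and the region and batch ID values are placeholders.

```python
import asyncio


class FakeAsyncHook:
    """Hypothetical hook; ASSUMES get_batch_client caches one client per region."""

    def __init__(self):
        self._clients = {}
        self.clients_built = 0

    async def get_batch_client(self, region):
        # Build the client lazily on first use, then return the cached one.
        if region not in self._clients:
            self.clients_built += 1
            self._clients[region] = object()  # placeholder for a real client
        return self._clients[region]

    async def get_batch(self, region, batch_id):
        # Mirrors the comment above: get_batch resolves the client internally.
        client = await self.get_batch_client(region)
        return {"batch_id": batch_id, "client_id": id(client)}


async def poll_three_times(warm_up):
    hook = FakeAsyncHook()
    if warm_up:
        # Explicit pre-loop call, as in the diff under review.
        await hook.get_batch_client(region="us-central1")
    seen_clients = set()
    for _ in range(3):
        batch = await hook.get_batch(region="us-central1", batch_id="b-1")
        seen_clients.add(batch["client_id"])
    return hook.clients_built, len(seen_clients)


if __name__ == "__main__":
    print(asyncio.run(poll_three_times(warm_up=True)))
    print(asyncio.run(poll_three_times(warm_up=False)))
```

If the caching assumption holds, both variants build exactly one client, so the pre-loop call would change nothing. If the real hook does not cache, the explicit call alone would not help either.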

3 participants