Skip to content

Fix Google Dataflow hook failing import when apache-beam not installed#65659

Merged
shahar1 merged 3 commits intoapache:mainfrom
shahar1:fix/google-dataflow-beam-hard-import
Apr 22, 2026
Merged

Fix Google Dataflow hook failing import when apache-beam not installed#65659
shahar1 merged 3 commits intoapache:mainfrom
shahar1:fix/google-dataflow-beam-hard-import

Conversation

@shahar1
Copy link
Copy Markdown
Contributor

@shahar1 shahar1 commented Apr 22, 2026

DataflowHook imported BeamHook, BeamRunnerType, and beam_options_to_args at module level from airflow.providers.apache.beam. Since apache-airflow-providers-apache-beam is not a declared dependency of the google provider, this hard import causes a ModuleNotFoundError whenever the google provider is imported in an environment where beam is not installed.

The issue was discovered during the provider release process (#65614), where the check-provider-yaml-valid pre-commit hook imports all modules registered in provider.yaml to verify class registrations. When beam was absent from the environment, the hook aborted with:

ModuleNotFoundError: No module named 'airflow.providers.apache.beam'

The fix moves the two beam imports to lazy form inside the methods that actually require them (DataflowHook.__init__ and DataflowHook._build_gcloud_command), so importing the module has no dependency on beam being installed.


Was generative AI tooling used to co-author this PR?
  • Yes — GitHub Copilot (Claude Sonnet 4.6)

Generated-by: GitHub Copilot (Claude Sonnet 4.6) following the guidelines

@boring-cyborg boring-cyborg Bot added area:providers provider:google Google (including GCP) related issues labels Apr 22, 2026
Comment thread providers/google/src/airflow/providers/google/cloud/hooks/dataflow.py Outdated
shahar1 added 3 commits April 22, 2026 16:10
DataflowHook imported BeamHook, BeamRunnerType, and beam_options_to_args
at module level, making the entire google provider fail to import when
apache-airflow-providers-apache-beam is not installed.

apache-beam is not a declared dependency of the google provider, so this
hard import was always incorrect. The import error surfaced during the
provider release process (apache#65614) when the provider yaml check script
imports all registered provider modules and beam was not present in the
environment.

Move the imports to lazy form inside the two methods that actually use
them: DataflowHook.__init__ and DataflowHook._build_gcloud_command.
@shahar1 shahar1 force-pushed the fix/google-dataflow-beam-hard-import branch from 1c7648d to a39983b Compare April 22, 2026 13:14
@shahar1 shahar1 merged commit 578ab8e into apache:main Apr 22, 2026
93 checks passed
@boring-cyborg
Copy link
Copy Markdown

boring-cyborg Bot commented Apr 22, 2026

Awesome work, congrats on your first merged pull request! You are invited to check our Issue Tracker for additional contributions.

@shahar1 shahar1 deleted the fix/google-dataflow-beam-hard-import branch April 22, 2026 14:29
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

area:providers provider:google Google (including GCP) related issues

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants