
Add BigQuery routine operators and existence sensor #65499

Open
alamashir wants to merge 9 commits into apache:main from alamashir:feat/bigquery-routine-operators

Conversation

@alamashir
Contributor

alamashir commented Apr 19, 2026

Summary

Adds first-class Airflow operators for BigQuery routines (user-defined functions, stored procedures, table-valued and aggregate functions), so DAGs can own their routine definitions declaratively instead of embedding CREATE FUNCTION / CREATE PROCEDURE DDL inside BigQueryInsertJobOperator.

New operators (providers/google/src/airflow/providers/google/cloud/operators/bigquery.py):

  • BigQueryCreateRoutineOperator — creates any routine type; if_exists of fail | skip | replace
  • BigQueryUpdateRoutineOperator — updates selected fields on an existing routine. Because BigQuery's routines.update REST API is a full-resource PUT (not a PATCH), the hook fetches the current routine, merges the requested changes, and writes the complete resource back.
  • BigQueryDeleteRoutineOperator (ignore_if_missing toggle)
  • BigQueryGetRoutineOperator — pushes the serialized Routine resource to XCom
  • BigQueryListRoutinesOperator — dataset-scoped list via XCom
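The three if_exists modes listed for BigQueryCreateRoutineOperator can be pictured with a small in-memory model. This is only a sketch of the branching, not the operator's code: FakeRoutineStore, RoutineAlreadyExists and this create_routine function are hypothetical stand-ins, and modelling replace as delete-then-create is an assumption.

```python
class RoutineAlreadyExists(Exception):
    """Hypothetical stand-in for the conflict error raised by the real API."""


class FakeRoutineStore:
    """Hypothetical in-memory stand-in for the routines of one dataset."""

    def __init__(self):
        self.routines = {}

    def create(self, routine_id, resource):
        if routine_id in self.routines:
            raise RoutineAlreadyExists(routine_id)
        self.routines[routine_id] = dict(resource)

    def delete(self, routine_id):
        del self.routines[routine_id]


def create_routine(store, routine_id, resource, if_exists="fail"):
    """Create a routine, handling an existing one according to if_exists."""
    if if_exists not in ("fail", "skip", "replace"):
        raise ValueError(f"unsupported if_exists value: {if_exists!r}")
    try:
        store.create(routine_id, resource)
        return "created"
    except RoutineAlreadyExists:
        if if_exists == "fail":
            raise  # surface the conflict to the caller
        if if_exists == "skip":
            return "skipped"  # leave the existing routine untouched
        # replace (assumed here to be delete-then-create)
        store.delete(routine_id)
        store.create(routine_id, resource)
        return "replaced"
```

With skip, the existing definition stays as it was; with replace, the new definition wins.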

New sensor (sensors/bigquery.py):

  • BigQueryRoutineExistenceSensor

New hook methods on BigQueryHook: create_routine, update_routine, delete_routine, get_routine, list_routines. They wrap google-cloud-bigquery's existing routine client APIs — no new provider dependency.

Docs: new bigquery_routines.rst guide with per-routine-type examples, cross-linked from bigquery.rst.

System test: example_bigquery_routines.py exercises the full lifecycle (create scalar UDF / procedure / TVF → sensor → update → get → list → delete).

Unit tests: 24 tests covering success, failure and edge cases (e.g. if_exists branches, missing routineReference, not_found_ok).

closes: #65467
related: #47579

Test plan

  • uv run --project providers/google pytest providers/google/tests/unit/google/cloud/{hooks,operators,sensors}/test_bigquery.py -k Routine — 24 passed
  • ruff format + ruff check clean on all touched files
  • System test (example_bigquery_routines.py) run end-to-end against a live GCP project — passed (DAG state success, ~5 min run). The system run exercised every operator and the sensor against real BigQuery:
    • create_dataset → created a fresh dataset
    • create_scalar_routine / create_procedure / create_tvf → created SCALAR_FUNCTION, PROCEDURE, and TABLE_VALUED_FUNCTION routines via BigQueryCreateRoutineOperator
    • wait_for_routine → BigQueryRoutineExistenceSensor polled and returned True once the scalar UDF was visible
    • update_routine → mutated the scalar UDF's description via BigQueryUpdateRoutineOperator (this is what flushed out the PUT-vs-PATCH semantic and the retry-sentinel bug fixed in this PR)
    • get_routine / list_routines → fetched the updated routine and listed all three routines
    • delete_scalar_routine / delete_procedure / delete_tvf → deleted each routine via BigQueryDeleteRoutineOperator
    • delete_dataset → tore down the dataset
    • The DAG's watcher task (trigger_rule=ONE_FAILED) did not fire, confirming no upstream failures
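The wait_for_routine step in the run above is an existence poll. A minimal sketch of that pattern, assuming the check is "fetch the routine and treat not-found as False": RoutineNotFound, routine_exists and wait_for_routine are hypothetical names, not the sensor's actual implementation.

```python
import time


class RoutineNotFound(Exception):
    """Hypothetical stand-in for the API's not-found error."""


def routine_exists(get_routine, routine_id):
    """Return True if get_routine succeeds, False if it reports not-found."""
    try:
        get_routine(routine_id)
    except RoutineNotFound:
        return False
    return True


def wait_for_routine(get_routine, routine_id, poke_interval=0.0, max_pokes=10):
    """Poll until the routine is visible or max_pokes attempts are used up."""
    for _ in range(max_pokes):
        if routine_exists(get_routine, routine_id):
            return True
        time.sleep(poke_interval)
    return False
```

Any other exception from get_routine propagates, so a permissions or quota error is not mistaken for "routine not there yet".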

Was generative AI tooling used to co-author this PR?
  • Yes — Claude Opus 4.7 (Claude Code)

Generated-by: Claude Opus 4.7 (Claude Code) following the guidelines

Introduces first-class Airflow operators for BigQuery routines (UDFs,
stored procedures, table-valued and aggregate functions), so DAGs can
own their routine definitions declaratively instead of embedding
CREATE FUNCTION / CREATE PROCEDURE DDL inside BigQueryInsertJobOperator.

New operators:

* BigQueryCreateRoutineOperator (with if_exists: fail|skip|replace)
* BigQueryUpdateRoutineOperator (explicit field mask)
* BigQueryDeleteRoutineOperator (ignore_if_missing)
* BigQueryGetRoutineOperator (routine resource via XCom)
* BigQueryListRoutinesOperator (dataset-scoped list via XCom)

New sensor:

* BigQueryRoutineExistenceSensor

New BigQueryHook methods wrap the google-cloud-bigquery client's
routine APIs. No new provider dependency is added.

Related: apache#47579

Covers success, failure and edge cases for the five new routine
operators, the existence sensor, and the underlying hook methods
(including if_exists fail/skip/replace, not_found_ok and missing
routineReference validation).

Declare template_ext on BigQueryCreateRoutineOperator (.sql) and
BigQueryUpdateRoutineOperator (.json, .sql) so users can pass file
paths like "routines/add_one.sql" to templated fields and have Airflow
render the file contents at runtime.
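
The effect of template_ext described above can be approximated with a simplified model: a templated field whose string value ends in a listed extension is treated as a file path and replaced by the file's contents. resolve_templated_value and search_dir are hypothetical names for illustration; Airflow's real mechanism additionally renders the contents as a Jinja template, which this sketch leaves out.

```python
from pathlib import Path


def resolve_templated_value(value, template_ext, search_dir):
    """Return the file contents if value names a file with a listed extension.

    Simplified model only: no Jinja rendering is performed here.
    """
    if isinstance(value, str) and value.endswith(tuple(template_ext)):
        return (Path(search_dir) / value).read_text()
    # Anything else is treated as an inline value and returned unchanged.
    return value
```

Under this model, "routines/add_one.sql" resolves to the contents of that file, while an inline body such as "x + 1" passes through untouched.
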
Restores the pre-existing duplicate license header in bigquery.rst so
this PR's diff contains only the routine-operator additions.

BigQuery's routines.update REST API is a full-resource PUT (not a PATCH),
so sending only the changed fields fails with "Routine type must be
specified". The hook now fetches the existing routine, merges the
requested field changes, and writes the complete resource back.
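
A minimal sketch of the fetch-merge-write step described above, working on plain dicts shaped like the REST Routine resource. build_full_update is a hypothetical helper, not the hook's actual code; the example values are illustrative.

```python
import copy


def build_full_update(current, changes, fields):
    """Merge the selected fields of changes into a copy of current.

    The result is a complete resource, suitable for a full-resource PUT.
    """
    merged = copy.deepcopy(current)
    for field in fields:
        if field not in changes:
            raise KeyError(f"field {field!r} listed for update but not provided")
        merged[field] = copy.deepcopy(changes[field])
    return merged


# Illustrative values: the fetched routine, then a description-only change.
current = {
    "routineType": "SCALAR_FUNCTION",
    "language": "SQL",
    "definitionBody": "x + 1",
    "description": "old description",
}
payload = build_full_update(current, {"description": "Adds one"}, ["description"])
```

The payload still carries routineType and the other untouched fields, which is what a PUT requires; sending only {"description": ...} is the case that fails with "Routine type must be specified".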

Also replace the gapic _MethodDefault retry sentinel with the
BigQuery client's DEFAULT_RETRY callable on all five routine
operators -- the handwritten BQ client calls retry(call) directly
and cannot accept the sentinel.
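
Why a sentinel breaks here can be shown with stand-ins. The sketch below assumes only what the message above states, namely that the client applies the retry by calling retry(call). METHOD_DEFAULT, default_retry and call_api are hypothetical illustrations, not the gapic or BigQuery objects themselves.

```python
import functools

# Hypothetical stand-in for a non-callable "use the default" sentinel.
METHOD_DEFAULT = object()


def default_retry(func):
    """Hypothetical callable retry policy: retries once on ConnectionError."""

    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        try:
            return func(*args, **kwargs)
        except ConnectionError:
            return func(*args, **kwargs)

    return wrapper


def call_api(request, retry=None):
    """Mimic a client that applies the retry by calling it on the request."""
    call = request
    if retry:
        call = retry(call)  # raises TypeError if retry is a plain sentinel
    return call()
```

Passing default_retry works because it is callable; passing METHOD_DEFAULT raises TypeError at the retry(call) line.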

Tested end-to-end against a live GCP project with the
example_bigquery_routines system DAG (all 11 lifecycle tasks pass).
alamashir force-pushed the feat/bigquery-routine-operators branch from f6e5cba to f71f50b on April 19, 2026 21:33
CI's provider-docs consistency check failed because the new
how-to guide page was added to the docs directory but not
listed in provider.yaml.
alamashir marked this pull request as ready for review on April 19, 2026 23:17
alamashir requested a review from shahar1 as a code owner on April 19, 2026 23:17
Generated from provider.yaml by the update-providers-build-files
prek hook after adding the bigquery_routines.rst how-to guide.
Comment thread providers/google/provider.yaml
@potiuk
Member

potiuk commented Apr 22, 2026

@alamashir This PR has been converted to draft because it does not yet meet our Pull Request quality criteria.

Issues found:

  • Unresolved review comments (3 threads from maintainers): please walk through each unresolved review thread. Even if a suggestion looks incorrect or irrelevant — and some of them will be, especially any comments left by automated reviewers like GitHub Copilot — it is still the author's responsibility to respond: apply the fix, reply in-thread with a brief explanation of why the suggestion does not apply, or resolve the thread if the feedback is no longer relevant. Leaving threads unaddressed for weeks blocks the PR from moving forward.

What to do next:

  • Walk through each unresolved review thread and respond as described above.
  • Make sure static checks pass locally: prek run --from-ref main --stage pre-commit.
  • Mark the PR as "Ready for review" when you're done.

Converting a PR to draft is not a rejection — it is an invitation to bring the PR up to the project's standards so that maintainer review time is spent productively. There is no rush — take your time and work at your own pace. We appreciate your contribution and are happy to wait for updates. If you have questions, feel free to ask on the Airflow Slack.

@potiuk
Member

potiuk commented Apr 22, 2026

Quick follow-up to the triage comment above — one clarification on the "Unresolved review comments" item:

Once you believe a thread has been addressed — whether by pushing a fix, or by replying in-thread with an explanation of why the suggestion doesn't apply — please mark the thread as resolved yourself by clicking the "Resolve conversation" button at the bottom of each thread. Reviewers don't auto-close their own threads, so an addressed-but-unresolved thread reads as "still waiting on the author" and keeps the PR from moving forward. The author doing the resolve-click is the expected convention on this project.

@alamashir
Contributor Author

Hi @potiuk — thanks for the triage and the clarification on thread-resolution convention.

Addressed:

  • All 3 review threads resolved (each had an in-thread reply).
  • prek run --from-ref main --stage pre-commit run locally — all hooks passed.

Marking ready for review.

alamashir marked this pull request as ready for review on April 23, 2026 02:45

Labels

area:providers kind:documentation provider:google Google (including GCP) related issues

Development

Successfully merging this pull request may close these issues.

Add BigQuery routine operators (procedures, UDFs, TVFs, remote & Spark)

3 participants