Add BigQuery routine operators and existence sensor #65499
alamashir wants to merge 9 commits into apache:main
Introduces first-class Airflow operators for BigQuery routines (UDFs, stored procedures, table-valued and aggregate functions), so DAGs can own their routine definitions declaratively instead of embedding CREATE FUNCTION / CREATE PROCEDURE DDL inside BigQueryInsertJobOperator.

New operators:
* BigQueryCreateRoutineOperator (with if_exists: fail|skip|replace)
* BigQueryUpdateRoutineOperator (explicit field mask)
* BigQueryDeleteRoutineOperator (ignore_if_missing)
* BigQueryGetRoutineOperator (routine resource via XCom)
* BigQueryListRoutinesOperator (dataset-scoped list via XCom)

New sensor:
* BigQueryRoutineExistenceSensor

New BigQueryHook methods wrap the google-cloud-bigquery client's routine APIs. No new provider dependency is added.

Related: apache#47579
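The three if_exists modes listed above can be illustrated with a minimal sketch. This is not the PR's implementation: the dict stands in for a BigQuery dataset, and create_routine, RoutineExistsError, and the return value are hypothetical names chosen only for illustration.

```python
class RoutineExistsError(Exception):
    """Hypothetical error for if_exists='fail' when the routine already exists."""


def create_routine(store: dict, routine_id: str, definition: dict, if_exists: str = "fail") -> dict:
    """Create a routine in an in-memory store, honouring the if_exists mode."""
    if if_exists not in ("fail", "skip", "replace"):
        raise ValueError(f"Unsupported if_exists value: {if_exists!r}")
    if routine_id in store:
        if if_exists == "fail":
            raise RoutineExistsError(routine_id)
        if if_exists == "skip":
            # Leave the existing definition untouched and return it.
            return store[routine_id]
        # if_exists == "replace": fall through and overwrite.
    store[routine_id] = dict(definition)
    return store[routine_id]
```

With this shape, "skip" makes a create task idempotent across DAG reruns, while "replace" lets the DAG remain the source of truth for the routine body.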
Covers success, failure and edge cases for the five new routine operators, the existence sensor, and the underlying hook methods (including if_exists fail/skip/replace, not_found_ok and missing routineReference validation).
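As an illustration of the not_found_ok edge case mentioned above, here is a minimal sketch of the delete semantics. The in-memory dict and the local NotFound class are stand-ins (the real client raises google.api_core.exceptions.NotFound); delete_routine here is a hypothetical helper, not the hook method from this PR.

```python
class NotFound(Exception):
    """Local stand-in for google.api_core.exceptions.NotFound."""


def delete_routine(store: dict, routine_id: str, not_found_ok: bool = False) -> None:
    """Delete a routine; optionally treat a missing routine as success."""
    try:
        del store[routine_id]
    except KeyError:
        if not not_found_ok:
            # Surface the missing routine as an error unless the caller opted out.
            raise NotFound(routine_id) from None
```

A test for this branch would typically assert both outcomes: the error when not_found_ok is False, and a silent no-op when it is True.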
Declare template_ext on BigQueryCreateRoutineOperator (.sql) and BigQueryUpdateRoutineOperator (.json, .sql) so users can pass file paths like "routines/add_one.sql" to templated fields and have Airflow render the file contents at runtime.
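The effect of template_ext can be approximated with a short sketch. It assumes a simplified model: a templated string that ends with one of the declared extensions is treated as a file path and replaced by the file's contents. In Airflow the loaded text is also rendered through Jinja; the sketch omits that step, and resolve_template_file and demo are hypothetical names.

```python
import tempfile
from pathlib import Path

TEMPLATE_EXT = (".sql",)  # mirrors the extension declared on the create operator


def resolve_template_file(value: str, template_ext: tuple, search_dir: Path) -> str:
    """Return the file contents if value looks like a template file path, else value unchanged."""
    if value.endswith(template_ext):
        return (search_dir / value).read_text()
    return value


def demo() -> str:
    """Write routines/add_one.sql into a temp dir and resolve it by relative path."""
    with tempfile.TemporaryDirectory() as tmp:
        sql_path = Path(tmp) / "routines" / "add_one.sql"
        sql_path.parent.mkdir()
        sql_path.write_text("x + 1")
        return resolve_template_file("routines/add_one.sql", TEMPLATE_EXT, Path(tmp))
```

An inline body such as "x + 1" does not end with a declared extension, so it passes through unchanged.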
Restores the pre-existing duplicate license header in bigquery.rst so this PR's diff contains only the routine-operator additions.
BigQuery's routines.update REST API is a full-resource PUT (not a PATCH), so sending only the changed fields fails with "Routine type must be specified". The hook now fetches the existing routine, merges the requested field changes, and writes the complete resource back. Also replace the gapic _MethodDefault retry sentinel with the BigQuery client's DEFAULT_RETRY callable on all five routine operators -- the handwritten BQ client calls retry(call) directly and cannot accept the sentinel. Tested end-to-end against a live GCP project with the example_bigquery_routines system DAG (all 11 lifecycle tasks pass).
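The fetch-merge-write flow described above can be sketched as follows. FakeRoutinesApi is a hypothetical in-memory stand-in for a full-resource PUT endpoint, and update_routine is an illustrative helper rather than the hook method from this PR; only the field names (routineType, definitionBody, description) come from the BigQuery Routine resource.

```python
import copy


class FakeRoutinesApi:
    """Hypothetical in-memory endpoint whose put() replaces the whole resource."""

    def __init__(self) -> None:
        self._routines: dict = {}

    def get(self, routine_id: str) -> dict:
        return copy.deepcopy(self._routines[routine_id])

    def put(self, routine_id: str, resource: dict) -> dict:
        # A full-resource PUT rejects partial bodies, as the real API does.
        if "routineType" not in resource:
            raise ValueError("Routine type must be specified")
        self._routines[routine_id] = copy.deepcopy(resource)
        return copy.deepcopy(resource)


def update_routine(api: FakeRoutinesApi, routine_id: str, changes: dict, fields: list) -> dict:
    """Fetch the current routine, overlay only the requested fields, and PUT it back whole."""
    current = api.get(routine_id)
    for field in fields:
        current[field] = changes[field]
    return api.put(routine_id, current)
```

Sending only {"description": ...} to put() fails in this model, while update_routine succeeds because the untouched fields travel back with the request.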
f6e5cba to f71f50b
CI's provider-docs consistency check failed because the new how-to guide page was added to the docs directory but not listed in provider.yaml.
Generated from provider.yaml by the update-providers-build-files prek hook after adding the bigquery_routines.rst how-to guide.
@alamashir This PR has been converted to draft because it does not yet meet our Pull Request quality criteria. Issues found:
What to do next:
Converting a PR to draft is not a rejection; it is an invitation to bring the PR up to the project's standards so that maintainer review time is spent productively. There is no rush: take your time and work at your own pace. We appreciate your contribution and are happy to wait for updates. If you have questions, feel free to ask on the Airflow Slack.
Quick follow-up to the triage comment above, with one clarification on the "Unresolved review comments" item: once you believe a thread has been addressed (whether by pushing a fix, or by replying in-thread with an explanation of why the suggestion doesn't apply), please mark the thread as resolved yourself by clicking the "Resolve conversation" button at the bottom of each thread. Reviewers don't auto-close their own threads, so an addressed-but-unresolved thread reads as "still waiting on the author" and keeps the PR from moving forward. The author doing the resolve-click is the expected convention on this project.
Hi @potiuk — thanks for the triage and the clarification on thread-resolution convention. Addressed:
Marking ready for review.
Summary
Adds first-class Airflow operators for BigQuery routines (user-defined functions, stored procedures, table-valued and aggregate functions), so DAGs can own their routine definitions declaratively instead of embedding CREATE FUNCTION / CREATE PROCEDURE DDL inside BigQueryInsertJobOperator.

New operators (providers/google/src/airflow/providers/google/cloud/operators/bigquery.py):
* BigQueryCreateRoutineOperator: creates any routine type; if_exists of fail|skip|replace
* BigQueryUpdateRoutineOperator: updates selected fields on an existing routine. Because BigQuery's routines.update REST API is a full-resource PUT (not a PATCH), the hook fetches the current routine, merges the requested changes, and writes the complete resource back.
* BigQueryDeleteRoutineOperator: ignore_if_missing toggle
* BigQueryGetRoutineOperator: pushes the serialized Routine resource to XCom
* BigQueryListRoutinesOperator: dataset-scoped list via XCom

New sensor (sensors/bigquery.py):
* BigQueryRoutineExistenceSensor

New hook methods on BigQueryHook: create_routine, update_routine, delete_routine, get_routine, list_routines. They wrap google-cloud-bigquery's existing routine client APIs, so no new provider dependency is added.

Docs: new bigquery_routines.rst guide with per-routine-type examples, cross-linked from bigquery.rst.

System test: example_bigquery_routines.py exercises the full lifecycle (create scalar UDF / procedure / TVF → sensor → update → get → list → delete).

Unit tests: 24 tests covering success, failure and edge cases (e.g. if_exists branches, missing routineReference, not_found_ok).

closes: #65467
related: #47579
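The polling behaviour of an existence sensor like the one summarized above can be sketched in plain Python. This is a simplified model under stated assumptions: RoutineNotFound, poke, and wait_for_routine are hypothetical names, the lookup is passed in as a callable, and the loop returns False on timeout where a real Airflow sensor would raise a timeout error instead.

```python
import time


class RoutineNotFound(Exception):
    """Hypothetical stand-in for the client's not-found error."""


def poke(get_routine, routine_id: str) -> bool:
    """Return True if the routine lookup succeeds, False if it is not found."""
    try:
        get_routine(routine_id)
    except RoutineNotFound:
        return False
    return True


def wait_for_routine(get_routine, routine_id: str, poke_interval: float = 0.0, timeout: float = 5.0) -> bool:
    """Poll until the routine exists or the timeout elapses."""
    deadline = time.monotonic() + timeout
    while True:
        if poke(get_routine, routine_id):
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(poke_interval)
```

Only the not-found case is mapped to False; any other exception from the lookup propagates, so permission or network errors are not mistaken for "not created yet".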
Test plan
* uv run --project providers/google pytest providers/google/tests/unit/google/cloud/{hooks,operators,sensors}/test_bigquery.py -k Routine: 24 passed
* ruff format + ruff check clean on all touched files
* System test (example_bigquery_routines.py) run end-to-end against a live GCP project: passed (DAG state success, ~5 min run). The system run exercised every operator and the sensor against real BigQuery:
  * create_dataset → created a fresh dataset
  * create_scalar_routine / create_procedure / create_tvf → created SCALAR_FUNCTION, PROCEDURE, and TABLE_VALUED_FUNCTION routines via BigQueryCreateRoutineOperator
  * wait_for_routine → BigQueryRoutineExistenceSensor polled and returned True once the scalar UDF was visible
  * update_routine → mutated the scalar UDF's description via BigQueryUpdateRoutineOperator (this is what flushed out the PUT-vs-PATCH semantic and the retry-sentinel bug fixed in this PR)
  * get_routine / list_routines → fetched the updated routine and listed all three routines
  * delete_scalar_routine / delete_procedure / delete_tvf → deleted each routine via BigQueryDeleteRoutineOperator
  * delete_dataset → tore down the dataset
  * watcher task (trigger_rule=ONE_FAILED) did not fire, confirming no upstream failures

Was generative AI tooling used to co-author this PR?
Generated-by: Claude Opus 4.7 (Claude Code) following the guidelines