Description
Add first-class operators for managing BigQuery routines — scalar UDFs, stored procedures, table-valued functions, aggregate functions, remote functions, and Spark stored procedures — in the Google provider.
Today the only way to deploy a BigQuery routine from Airflow is to embed raw CREATE FUNCTION / CREATE PROCEDURE DDL inside BigQueryInsertJobOperator. The google-cloud-bigquery client already exposes create_routine, get_routine, update_routine, delete_routine, and list_routines, but BigQueryHook does not wrap them.
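For context, the status-quo workaround looks roughly like this: the routine DDL is written by hand and passed through `BigQueryInsertJobOperator` as an opaque query string (dataset and function names below are illustrative, not from a real project):

```python
# Today's pattern: raw CREATE FUNCTION DDL embedded in a query job.
# Airflow sees only an opaque string -- no typed validation of the
# routine's arguments, return type, or body before the job runs.
create_udf_ddl = """
CREATE OR REPLACE FUNCTION my_dataset.multiply_by_two(x INT64)
RETURNS INT64 AS (x * 2)
"""

# The job configuration that BigQueryInsertJobOperator submits:
job_configuration = {
    "query": {
        "query": create_udf_ddl,
        "useLegacySql": False,
    }
}

# In a DAG (requires apache-airflow-providers-google):
# deploy_udf = BigQueryInsertJobOperator(
#     task_id="deploy_udf",
#     configuration=job_configuration,
# )
```

The only idempotency lever here is the `OR REPLACE` clause inside the SQL string itself, which is exactly the gap the proposed operators would close.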
Proposed new operators in airflow.providers.google.cloud.operators.bigquery:
BigQueryCreateRoutineOperator — supports every routine type; if_exists={"skip","replace","fail"}
BigQueryUpdateRoutineOperator
BigQueryDeleteRoutineOperator
BigQueryGetRoutineOperator
BigQueryListRoutinesOperator
New sensor in airflow.providers.google.cloud.sensors.bigquery:
BigQueryRoutineExistenceSensor
Plus five matching hook methods in BigQueryHook. No new provider dependency — everything is already shipped with google-cloud-bigquery.
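To make the proposed `if_exists` semantics concrete, here is a minimal sketch of the dispatch logic `BigQueryCreateRoutineOperator` could use; the names and structure are illustrative only, not a final API:

```python
# Sketch of the proposed if_exists={"skip","replace","fail"} behavior.
# Names are illustrative; the real operator would call
# BigQueryHook.get_routine / create_routine / update_routine.
from enum import Enum


class IfExists(str, Enum):
    SKIP = "skip"
    REPLACE = "replace"
    FAIL = "fail"


def resolve_create_action(routine_exists: bool, if_exists: str) -> str:
    """Decide what the operator should do: 'create', 'skip', or 'replace'."""
    mode = IfExists(if_exists)  # raises ValueError on an unknown mode
    if not routine_exists:
        return "create"
    if mode is IfExists.SKIP:
        return "skip"
    if mode is IfExists.REPLACE:
        return "replace"
    raise RuntimeError("Routine already exists and if_exists='fail'")
```

This mirrors the create-or-replace handling already found in other Google provider operators, but surfaces it as a typed parameter instead of a clause buried in DDL.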
Use case/motivation
Deploying a routine and a pipeline that depends on it today requires either (a) hand-rolled DDL in BigQueryInsertJobOperator with no idempotency beyond CREATE OR REPLACE, no typed validation, and no structured deployment record, or (b) managing routines out-of-band via Terraform (google_bigquery_routine) or dbt hacks — which splits routine deployment from the pipeline that consumes it.
First-class operators let a single DAG own both the routine definition and the pipeline that uses it, with idempotent re-runs and typed kwargs that validate at author time instead of at job execution. Remote functions (BQ → Cloud Functions/Cloud Run) and Spark stored procedures (BQ → Dataproc Serverless) particularly benefit, since both tightly couple routine deployment to other infra.
Related issues
No response
Are you willing to submit a PR?
Code of Conduct