Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Prevent templated field logic checks in __init__ of operators automatically #33786

Merged
merged 2 commits into from
Jan 20, 2024
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Jump to
Jump to file
Failed to load files.
Diff view
Diff view
25 changes: 25 additions & 0 deletions .pre-commit-config.yaml
Original file line number Diff line number Diff line change
Expand Up @@ -293,6 +293,31 @@ repos:
# changes quickly - especially when we want the early modifications from the first local group
# to be applied before the non-local pre-commits are run
hooks:
- id: validate-operators-init
name: Prevent templated field logic checks in operators' __init__
language: python
entry: ./scripts/ci/pre_commit/pre_commit_validate_operators_init.py
pass_filenames: true
files: ^airflow/providers/.*/(operators|transfers|sensors)/.*\.py$
additional_dependencies: [ 'rich>=12.4.4' ]
# TODO: Handle the provider-specific exclusions and remove them from the list, see:
# https://github.com/apache/airflow/issues/36484
exclude: |
(?x)^(
^.*__init__\.py$|
^airflow\/providers\/google\/cloud\/operators\/bigquery\.py$|
^airflow\/providers\/amazon\/aws\/transfers\/gcs_to_s3\.py$|
^airflow\/providers\/databricks\/operators\/databricks\.py$|
^airflow\/providers\/google\/cloud\/operators\/cloud_storage_transfer_service\.py$|
^airflow\/providers\/google\/cloud\/transfers\/bigquery_to_mysql\.py$|
^airflow\/providers\/google\/cloud\/operators\/vertex_ai\/auto_ml\.py$|
^airflow\/providers\/amazon\/aws\/transfers\/redshift_to_s3\.py$|
^airflow\/providers\/google\/cloud\/operators\/compute\.py$|
^airflow\/providers\/google\/cloud\/operators\/vertex_ai\/custom_job\.py$|
^airflow\/providers\/cncf\/kubernetes\/operators\/pod\.py$|
^airflow\/providers\/amazon\/aws\/operators\/emr\.py$|
^airflow\/providers\/amazon\/aws\/operators\/eks\.py$
)$
- id: ruff
name: Run 'ruff' for extremely fast Python linting
description: "Run 'ruff' for extremely fast Python linting"
Expand Down
2 changes: 2 additions & 0 deletions STATIC_CODE_CHECKS.rst
Original file line number Diff line number Diff line change
Expand Up @@ -414,6 +414,8 @@ require Breeze Docker image to be built locally.
+-----------------------------------------------------------+--------------------------------------------------------------+---------+
| update-version | Update version to the latest version in the documentation | |
+-----------------------------------------------------------+--------------------------------------------------------------+---------+
| validate-operators-init | Prevent templated field logic checks in operators' __init__ | |
+-----------------------------------------------------------+--------------------------------------------------------------+---------+
| yamllint | Check YAML files with yamllint | |
+-----------------------------------------------------------+--------------------------------------------------------------+---------+

Expand Down
1 change: 1 addition & 0 deletions dev/breeze/src/airflow_breeze/pre_commit_ids.py
Original file line number Diff line number Diff line change
Expand Up @@ -133,5 +133,6 @@
"update-supported-versions",
"update-vendored-in-k8s-json-schema",
"update-version",
"validate-operators-init",
"yamllint",
]
82 changes: 82 additions & 0 deletions docs/apache-airflow/howto/custom-operator.rst
Original file line number Diff line number Diff line change
Expand Up @@ -269,6 +269,88 @@ Currently available lexers:

If you use a non-existing lexer then the value of the template field will be rendered as a pretty-printed object.

Limitations
^^^^^^^^^^^
To prevent misuse, the following limitations must be observed when defining and assigning templated fields in the
operator's constructor (when such exists, otherwise - see below):

1. Templated fields' corresponding parameters passed into the constructor must be named exactly
as the fields. The following example is invalid, as the parameter passed into the constructor is not the same as the
templated field:

.. code-block:: python

class HelloOperator(BaseOperator):
template_fields = "field_a"

def __init__(field_a_id) -> None: # <- should be def __init__(field_a)-> None
self.field_a = field_a_id # <- should be self.field_a = field_a

2. Templated fields' instance members must be assigned with their corresponding parameter from the constructor,
either by a direct assignment or by calling the parent's constructor (in which these fields are
defined as ``template_fields``) with explicit an assignment of the parameter.
The following example is invalid, as the instance member ``self.field_a`` is not assigned at all, despite being a
templated field:

.. code-block:: python

class HelloOperator(BaseOperator):
template_fields = ("field_a", "field_b")

def __init__(field_a, field_b) -> None:
self.field_b = field_b


The following example is also invalid, as the instance member ``self.field_a`` of ``MyHelloOperator`` is initialized
implicitly as part of the ``kwargs`` passed to its parent constructor:

.. code-block:: python

class HelloOperator(BaseOperator):
template_fields = "field_a"

def __init__(field_a) -> None:
self.field_a = field_a


class MyHelloOperator(HelloOperator):
template_fields = ("field_a", "field_b")

def __init__(field_b, **kwargs) -> None: # <- should be def __init__(field_a, field_b, **kwargs)
super().__init__(**kwargs) # <- should be super().__init__(field_a=field_a, **kwargs)
self.field_b = field_b

3. Applying actions on the parameter during the assignment in the constructor is not allowed.
Any action on the value should be applied in the ``execute()`` method.
Therefore, the following example is invalid:

.. code-block:: python

class HelloOperator(BaseOperator):
template_fields = "field_a"

def __init__(field_a) -> None:
self.field_a = field_a.lower() # <- assignment should be only self.field_a = field_a

When an operator inherits from a base operator and does not have a constructor defined on its own, the limitations above
do not apply. However, the templated fields must be set properly in the parent according to those limitations.

Thus, the following example is valid:

.. code-block:: python

class HelloOperator(BaseOperator):
template_fields = "field_a"

def __init__(field_a) -> None:
self.field_a = field_a


class MyHelloOperator(HelloOperator):
template_fields = "field_a"

The limitations above are enforced by a pre-commit named 'validate-operators-init'.

Add template fields with subclassing
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
A common use case for creating a custom operator is for simply augmenting existing ``template_fields``.
Expand Down