Skip to content

Confusing paragraph in Airflow Bestpractice documentaion (Airflow Variables) #33993

@KadeRyu

Description

@KadeRyu

What do you see as an issue?

In the Airflow Best Practices article, I found a paragraph that was confusing to readers:


Bad example:

from airflow.models import Variable

foo_var = Variable.get("foo")  # DON'T DO THAT
bash_use_variable_bad_1 = BashOperator(
    task_id="bash_use_variable_bad_1", bash_command="echo variable foo=${foo_env}", env={"foo_env": foo_var}
)

bash_use_variable_bad_2 = BashOperator(
    task_id="bash_use_variable_bad_2",
    bash_command=f"echo variable foo=${Variable.get('foo')}",  # DON'T DO THAT
)

bash_use_variable_bad_3 = BashOperator(
    task_id="bash_use_variable_bad_3",
    bash_command="echo variable foo=${foo_env}",
    env={"foo_env": Variable.get("foo")},  # DON'T DO THAT
)

Good example:

bash_use_variable_good = BashOperator(
    task_id="bash_use_variable_good",
    bash_command="echo variable foo=${foo_env}",
    env={"foo_env": "{{ var.value.get('foo') }}"},
)
@task
def my_task():
    var = Variable.get("foo")  # this is fine, because func my_task called only run task, not scan DAGs.
    print(var)

In the "Bad Example" section, it states that Variable.get("foo") should not be specified in an Operator. Instead, it recommends calling it using a Jinja template, as described in the "Good Example" section.
However, it explains that it is possible to call it in the @task operator. This might make it seem like users who want to call the Variable.get() method shouldn't do so in a traditional Operator.

Secondly, to simplify the code, the Operators are not in the DAG and are not indented, so there is room for interpretation that the BashOperators in the bad example are outside the DAG (top-level code), and therefore the Variable.get() method is also top-level.

Solving the problem

I think some sentences should be added to explain why Variable.get() methods called from inside an Operator are a bad example.

I don't know enough about the logic that the scheduler is parsing to make a precise suggestion, but I think there are cases like this

  • If there is no case where you need to use the Variable.get() method inside an Operator, and using a Jinja template is absolutely fine: add to the documentation why calling Variable.get() inside an Operator is recognized as top-level code, and calling Variable.get() inside a @task operator is fine.
  • When it's okay to use the Variable.get() method inside an Operator: add appropriate examples to the documentation

Anything else

No response

Are you willing to submit PR?

  • Yes I am willing to submit a PR!

Code of Conduct

Metadata

Metadata

Assignees

No one assigned

    Labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions