Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Create UndefinedJinjaVariablesRule to ease upgrade to 2.0 #11144

Closed
potiuk opened this issue Sep 25, 2020 · 8 comments · Fixed by #11241
Closed

Create UndefinedJinjaVariablesRule to ease upgrade to 2.0 #11144

potiuk opened this issue Sep 25, 2020 · 8 comments · Fixed by #11241
Assignees
Labels
area:upgrade Facilitating migration to a newer version of Airflow good first issue kind:feature Feature Requests

Comments

@potiuk
Copy link
Member

potiuk commented Sep 25, 2020

This issue is part of #8765

Rule

Create UndefinedJinjaVariablesRule which corresponds to

Prior to Airflow 2.0 Jinja Templates would permit the use of undefined variables. They would render as an
empty string, with no indication to the user an undefined variable was used. With this release, any template
rendering involving undefined variables will fail the task, as well as displaying an error in the UI when
rendering.

entry in UPDATING.md. Introduced in #11016
This rule should check if any of the jinja templates use undefined variables. This might not be always possible, as this might involve the dynamic generation of some context variables, but likely we can at least print some warning or "double-check this" statements.

How to guide

To implement a new rule, create a class that inherits from airflow.upgrade.rules.base_rule.BaseRule.
It will be auto-registered and used by airflow upgrade-check command. The custom rule class has to have title,
description properties and should implement check method which returns a list of error messages in case of
incompatibility.

For example:

class ConnTypeIsNotNullableRule(BaseRule):
title = "Connection.conn_type is not nullable"
description = """\
The `conn_type` column in the `connection` table must contain content. Previously, this rule was \
enforced by application logic, but was not enforced by the database schema.
If you made any modifications to the table directly, make sure you don't have null in the conn_type column.\
"""
@provide_session
def check(self, session=None):
invalid_connections = session.query(Connection).filter(Connection.conn_type.is_(None))
return (
'Connection<id={}", conn_id={}> have empty conn_type field.'.format(conn.id, conn.conn_id)
for conn in invalid_connections
)

Remember to open the PR against v1-10-test branch.

@potiuk potiuk added the kind:feature Feature Requests label Sep 25, 2020
@mik-laj mik-laj added the area:upgrade Facilitating migration to a newer version of Airflow label Sep 25, 2020
@potiuk potiuk added this to To do in Airflow upgrade check via automation Sep 26, 2020
@ashmeet13
Copy link
Contributor

Hi, I am new to both Airflow and contributing to FOSS.
Saw this issue unpicked and I would like to pick this up.

I had one quick query -
The base branch from which I create a new feature branch in my fork would be v1-10-test?

Thanks! Looking forward to contribute and learn more!

@potiuk
Copy link
Member Author

potiuk commented Sep 28, 2020

Yep.

@turbaszek turbaszek moved this from To do to In progress in Airflow upgrade check Sep 28, 2020
@ashmeet13
Copy link
Contributor

I had a few doubts asking them here -

Currently I was writing the following logic for check -
For a given dag -

  1. It will gather all the tasks - dag.tasks
  2. Iterate over the task list and for every task
    - Get Jinja2 environment from task.get_template_env()
    - Find Jinja2 variables for all task.template_fields
    - Verify that all the above gathered variables are present in task.params
  3. Return a list of messages for every missing variable in task.params

Below is the script version

            for task in dag.tasks:
                env = task.get_template_env()
                params = set(task.params.keys())
                for attr_name in task.template_fields:
                    content = getattr(task, attr_name)
                    parsed = env.parse(content)
                    template_varaibles = meta.find_undeclared_variables(parsed) # Returns a set of variables present in the template
                    undefined_variables = template_varaibles-params # Set difference

Does this logic look like something we could follow?
And if yes then how can I write this logic to check for every DAG?

I did have a look at DagBag but I do not know how should I write a test for it.
Sorry for the newbie questions - would be really helpful if you could guide on these points.

@ashb
Copy link
Member

ashb commented Sep 29, 2020

@ashmeet13 That check looks like a good start yes, but not quite there. I don't think your params is right. TaskInstance.get_template_context() is the parameters that templates are executed in, and for that you will need to create a TaskInstance (it can be in memory, no need to write it to the DB)

@ashb
Copy link
Member

ashb commented Sep 29, 2020

Also check our BaseOperator.render_template for all the possible locations/field types we render templates on. You'll need to replicate much of that logic to handle all the types (inside lists, dicts, etc. etc.)

@ashmeet13
Copy link
Contributor

Got it - thank you so much. Will update on the progress or in case of any further doubts!

@ashmeet13
Copy link
Contributor

ashmeet13 commented Sep 30, 2020

Some progress that I have made. Thanks @ashb for guiding towards TaskInstance for getting the complete context.
Also towards BaseOperator.render_template

With this I was able to improve the logic to extract undefined variables in a template to this -

def check_rendered(rendered):
    if isinstance(rendered, six.string_types):
        return set(re.findall(r"{{(.*?)}}", rendered))

    elif isinstance(rendered, (tuple,list,set)):
        parsed_templates = set()
        for element in rendered:
            parsed_templates.union(check_rendered(element))
        return parsed_templates

    elif isinstance(rendered, dict):
        parsed_templates = set()
        for key, value in rendered.items():
            parsed_templates.union(check_rendered(value))
        return parsed_templates 



def check(dag):
    dag.template_undefined = jinja2.DebugUndefined
    for task in dag.tasks:
        jinja_env = task.get_template_env()
        task_instance = TaskInstance(task=task, execution_date=timezone.utcnow())
        template_context = task_instance.get_template_context()

        for attr_name in task.template_fields:
            content = getattr(task, attr_name)
            if content:
                rendered = task.render_template(content, template_context)
                undefined = check_rendered(rendered)
                for element in undefined:
                    print(element) 

Take into the consideration the following DAG -

dag = DAG(
    'tutorial',
    default_args=default_args,
    description='A simple tutorial DAG',
    schedule_interval=timedelta(days=1),
)

templated_command_string = """
{% for i in range(5) %}
    echo "{{ params.defined }}"
    echo "{{ execution_date.month }}"
    echo "{{ params.undefined }}"
    echo "{{ foo }}"
{% endfor %}
"""

t1 = BashOperator(
    task_id='templated_string',
    depends_on_past=False,
    bash_command=templated_command_string,
    params={'defined': 'test_val'},
    dag=dag,
)

The above check would print/return this -

foo
no such element: dict object['undefined'] 

One blocker still that I have is how exactly will I be able to get the dags when the user runs airflow upgrade-check
Thanks again for the help!

Edited: Adding the Stack Overflow source I used

@potiuk potiuk linked a pull request Oct 3, 2020 that will close this issue
ashmeet13 added a commit to ashmeet13/airflow that referenced this issue Oct 6, 2020
Part of Issue apache#8765 - adding a rule to check for
undefined jinja variables when upgrading to Airflow2.0
Logic - Use a DagBag to pull all dags and iterate over
every dag. For every dag the task will be rendered using
an updated Jinja Environment using - jinja2.DebugUndefined
This will render the template leaving undefined variables
as they were. Using regex we can extract the variables
and present possible error cases when upgrading.
ashmeet13 added a commit to ashmeet13/airflow that referenced this issue Oct 8, 2020
Adding a rule to check for undefined jinja variables when upgrading to Airflow2.0
Logic - Use a DagBag to pull all dags and iterate over
every dag. For every dag the task will be rendered using
an updated Jinja Environment using - jinja2.DebugUndefined
This will render the template leaving undefined variables
as they were. Using regex we can extract the variables
and present possible error cases when upgrading.
ashmeet13 added a commit to ashmeet13/airflow that referenced this issue Oct 23, 2020
Adding a rule to check for undefined jinja variables when upgrading to Airflow2.0
Logic - Use a DagBag to pull all dags and iterate over
every dag. For every dag the task will be rendered using
an updated Jinja Environment using - jinja2.DebugUndefined
This will render the template leaving undefined variables
as they were. Using regex we can extract the variables
and present possible error cases when upgrading.
ashmeet13 added a commit to ashmeet13/airflow that referenced this issue Nov 19, 2020
Adding a rule to check for undefined jinja variables when upgrading to Airflow2.0
turbaszek pushed a commit that referenced this issue Nov 19, 2020
Adding a rule to check for undefined jinja variables when upgrading to Airflow2.0
@turbaszek
Copy link
Member

Closed via #11241

Airflow upgrade check automation moved this from In progress to Done Nov 19, 2020
kaxil pushed a commit that referenced this issue Nov 20, 2020
Adding a rule to check for undefined jinja variables when upgrading to Airflow2.0

(cherry picked from commit 18100a0)
kaxil pushed a commit that referenced this issue Nov 21, 2020
Adding a rule to check for undefined jinja variables when upgrading to Airflow2.0

(cherry picked from commit 18100a0)
cfei18 pushed a commit to cfei18/incubator-airflow that referenced this issue Mar 5, 2021
)

Adding a rule to check for undefined jinja variables when upgrading to Airflow2.0

(cherry picked from commit 18100a0)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area:upgrade Facilitating migration to a newer version of Airflow good first issue kind:feature Feature Requests
Projects
No open projects
Development

Successfully merging a pull request may close this issue.

6 participants